Read arbitrary YAML files in Rust


When we need to read a YAML file in Rust, ideally we would define a struct that mirrors the fields of the YAML file. That helps with validating the data and with handling it later. However, we can't always do that, and often we just want to read in the YAML file and think about the exact definition later.

In this example we'll see how to do this.

We are using serde_yaml for this:

examples/read-arbitrary-yaml/Cargo.toml

[package]
name = "read-arbitrary-yaml"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde_yaml = "0.9"

We have two YAML files.

Invalid YAML format

One of them has invalid YAML format, just to show what happens when the YAML parsing fails.

examples/read-arbitrary-yaml/broken.yaml

fname: Foo
- lname: Bar

If we run our code on this file we get an error message.

cargo run broken.yaml

There was an error parsing the YAML file did not find expected key at line 2 column 1, while parsing a block mapping

Valid YAML file

We also have a good YAML file with some fields and values.

examples/read-arbitrary-yaml/data.yaml

fname: Foo
lname: Bar
year: 2023
height: 6.1
# A comment

numbers:
  - 23
  - 19
  - 42
children:
  - name: Alpha
    birthdate: 2020
  - name: Beta
    birthdate: 2022

The code

This is the code:

examples/read-arbitrary-yaml/src/main.rs

use std::env;
use std::fs::File;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    let filename = &args[1];

    let data: serde_yaml::Value = match File::open(filename) {
        Ok(file) => match serde_yaml::from_reader(file) {
            Ok(data) => data,
            Err(err) => {
                eprintln!("There was an error parsing the YAML file {}", err);
                std::process::exit(1);
            }
        },
        Err(error) => {
            eprintln!("Error opening file {}: {}", filename, error);
            std::process::exit(1);
        }
    };
    println!("{:#?}", &data);
    println!();

    let fname = data["fname"].as_str().unwrap();
    assert_eq!(fname, "Foo");
    assert_eq!(&data["fname"], "Foo");

    for field in ["fname", "lname", "address", "year", "height"] {
        println!("field: {}", field);
        let value = match data.get(field) {
            Some(val) => val,
            None => continue,
        };

        if field == "fname" || field == "lname" {
            println!("{}={}", field, &data[field].as_str().unwrap());
            println!("{}={}", field, value.as_str().unwrap());
        }
        if field == "year" {
            println!("{}={}", field, &data[field].as_u64().unwrap());
            println!("{}={}", field, value.as_u64().unwrap());
        }
        if field == "height" {
            println!("{}={}", field, &data[field].as_f64().unwrap());
            println!("{}={}", field, value.as_f64().unwrap());
        }
    }
    println!();

    let field = "year";
    let value = data.get(field).unwrap().as_u64().unwrap();
    println!("{}={}", field, value);
    assert_eq!(value, 2023);
    println!();

    let field = "height";
    let value = match data.get(field) {
        Some(val) => val.as_f64().unwrap(),
        None => return,
    };
    assert_eq!(value, 6.1);
    println!("height={}", value);

    println!();
    // Iterate over list of values from the YAML file
    let numbers = data.get("numbers").unwrap();
    println!("{:?}", numbers);
    println!("{:?}", numbers.as_sequence().unwrap());

    for num in numbers.as_sequence().unwrap() {
        println!("{}", num.as_u64().unwrap());
    }

    println!();
    // Iterate over a list of mappings from the YAML file
    let children = data.get("children").unwrap();
    for child in children.as_sequence().unwrap() {
        println!("child: {:?}", child);
        println!("name: {}", child.get("name").unwrap().as_str().unwrap());
        println!(
            "birthdate: {}",
            child.get("birthdate").unwrap().as_u64().unwrap()
        );
    }
}

In the first few lines we are just accepting a filename on the command line.

Then we open the file using std::fs::File::open and use serde_yaml::from_reader to read the YAML file and convert it to an internal data structure.

This data structure, assigned to the data variable name, is of type serde_yaml::Value.

let data: serde_yaml::Value =

We printed it out using the first println! statement.

Mapping {
    "fname": String("Foo"),
    "lname": String("Bar"),
    "year": Number(2023),
    "height": Number(6.1),
    "numbers": Sequence [
        Number(23),
        Number(19),
        Number(42),
    ],
    "children": Sequence [
        Mapping {
            "name": String("Alpha"),
            "birthdate": Number(2020),
        },
        Mapping {
            "name": String("Beta"),
            "birthdate": Number(2022),
        },
    ],
}

From this point we can access the elements of this Mapping using either the data[FIELD] syntax or data.get(FIELD). When reading, the former does not fail on a FIELD that does not exist: it returns Value::Null, so the panic only happens later, when we call unwrap on a conversion such as as_str that returned None. The latter, the get method, returns an Option. We can either call unwrap on it, disregarding the possibility that get returns None, or use match to handle the None case as well. In this code we used both strategies in different places, primarily to show how each can be done.

In some cases you will see println! statements that show the values; in other cases there are also assert_eq! statements that show what the expected values are.

Besides knowing the name of the field you'd like to fetch, you also need to know its type, and then use the matching conversion method of the Value enum.

In the code you can see examples of as_str, as_u64, and as_f64 to fetch the primitive values.

You can also see as_sequence used to access a sequence of values.

The output

field: fname
fname=Foo
fname=Foo
field: lname
lname=Bar
lname=Bar
field: address
field: year
year=2023
year=2023
field: height
height=6.1
height=6.1

year=2023

height=6.1

Sequence [Number(23), Number(19), Number(42)]
[Number(23), Number(19), Number(42)]
23
19
42

child: Mapping {"name": String("Alpha"), "birthdate": Number(2020)}
name: Alpha
birthdate: 2020
child: Mapping {"name": String("Beta"), "birthdate": Number(2022)}
name: Beta
birthdate: 2022

Read YAML file containing sequence

In some cases the YAML file is not a mapping, but a sequence at its root as in this example:

examples/read-yaml-sequence/data.yaml

- name: Foo
- name: Bar
  year: 2023
  married: true


We can use the same method here as well. In this case we also see how to access the first element as data[0], the second as data[1], and any other element of the sequence by its position. One of the fields contains a boolean value, and we use the as_bool method to convert it to a real bool. We use assert_eq! to compare strings and numbers to the expected values, and assert! to check that the boolean value is indeed true.

examples/read-yaml-sequence/src/main.rs

use std::env;
use std::fs::File;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    let filename = &args[1];

    let data: serde_yaml::Value = match File::open(filename) {
        Ok(file) => match serde_yaml::from_reader(file) {
            Ok(data) => data,
            Err(err) => {
                eprintln!("There was an error parsing the YAML file {}", err);
                std::process::exit(1);
            }
        },
        Err(error) => {
            eprintln!("Error opening file {}: {}", filename, error);
            std::process::exit(1);
        }
    };
    println!("{:#?}", &data);
    println!();

    println!("{:#?}", &data[0]);
    let name = data[0].get("name").unwrap().as_str().unwrap();
    println!("{}", name);
    assert_eq!(name, "Foo");
    println!();

    println!("{:#?}", &data[1]);
    let name = data[1].get("name").unwrap().as_str().unwrap();
    let year = data[1].get("year").unwrap().as_u64().unwrap();
    let married = data[1].get("married").unwrap().as_bool().unwrap();
    println!("{}", name);
    println!("{}", year);
    println!("{}", married);
    assert_eq!(name, "Bar");
    assert_eq!(year, 2023);
    assert!(married);
    println!();
}

Conclusion

Accessing the values this way can be a bit cumbersome, but it needs less up-front work than creating a struct that mirrors the YAML file.

If you have a YAML file with a data structure that differs from this and you'd like me to add such an example, let me know by opening an issue via the link below.

Related Pages

YAML and Rust
Read a simple YAML file into a struct

Author

Gabor Szabo (szabgab)

Gabor Szabo, the author of the Rust Maven web site, maintains several open source projects in Rust. While he still feels he has tons of new things to learn about Rust, he already offers training courses in Rust and still teaches Python, Perl, git, GitHub, GitLab, CI, and testing.
