YAML and Rust

YAML is a file format often used for configuration files. Most programming languages have a way to deserialize YAML into some internal data structure. So does Rust, via serde.


YAML is a human-readable and human-writable file format often used for configuration. I maintain several projects where people collect data in thousands of YAML files and then some program collects the data and generates a web site.

This is a collection of articles on dealing with YAML in the Rust programming language.

Read arbitrary YAML files in Rust

How to read a YAML file without knowing its structure up front?

  • YAML
  • serde
  • serde_yaml
  • as_str
  • as_u64
  • as_f64
  • as_bool
  • as_sequence
  • get
  • assert_eq!
  • assert!

When we need to read a YAML file in Rust ideally we would define a struct that maps the fields of the YAML file. That would help with data validation and later with the handling of the data. However, we can't always do that and often we might just want to read in the YAML file and think about the specific definition later.

In this example we'll see how to do this.

We are using serde_yaml for this:

[package]
name = "read-arbitrary-yaml"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde_yaml = "0.9"

We have two YAML files.

Invalid YAML format

One of them has invalid YAML format, just to show what happens when the YAML parsing fails.

fname: Foo
- lname: Bar

If we run our code on this file we get an error message.

cargo run broken.yaml

There was an error parsing the YAML file did not find expected key at line 2 column 1, while parsing a block mapping

Valid YAML file

We also have a good YAML file with some fields and values.

fname: Foo
lname: Bar
year: 2023
height: 6.1
# A comment

numbers:
  - 23
  - 19
  - 42
children:
  - name: Alpha
    birthdate: 2020
  - name: Beta
    birthdate: 2022

The code

This is the code:

use std::env;
use std::fs::File;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    let filename = &args[1];

    let data: serde_yaml::Value = match File::open(filename) {
        Ok(file) => match serde_yaml::from_reader(file) {
            Ok(data) => data,
            Err(err) => {
                eprintln!("There was an error parsing the YAML file {}", err);
                std::process::exit(1);
            }
        },
        Err(error) => {
            eprintln!("Error opening file {}: {}", filename, error);
            std::process::exit(1);
        }
    };
    println!("{:#?}", &data);
    println!();

    let fname = data["fname"].as_str().unwrap();
    assert_eq!(fname, "Foo");
    assert_eq!(&data["fname"], "Foo");

    for field in ["fname", "lname", "address", "year", "height"] {
        println!("field: {}", field);
        let value = match data.get(field) {
            Some(val) => val,
            None => continue,
        };

        if field == "fname" || field == "lname" {
            println!("{}={}", field, &data[field].as_str().unwrap());
            println!("{}={}", field, value.as_str().unwrap());
        }
        if field == "year" {
            println!("{}={}", field, &data[field].as_u64().unwrap());
            println!("{}={}", field, value.as_u64().unwrap());
        }
        if field == "height" {
            println!("{}={}", field, &data[field].as_f64().unwrap());
            println!("{}={}", field, value.as_f64().unwrap());
        }
    }
    println!();

    let field = "year";
    let value = data.get(field).unwrap().as_u64().unwrap();
    println!("{}={}", field, value);
    assert_eq!(value, 2023);
    println!();

    let field = "height";
    let value = match data.get(field) {
        Some(val) => val.as_f64().unwrap(),
        None => return,
    };
    assert_eq!(value, 6.1);
    println!("height={}", value);

    println!();
    // Iterate over list of values from the YAML file
    let numbers = data.get("numbers").unwrap();
    println!("{:?}", numbers);
    println!("{:?}", numbers.as_sequence().unwrap());

    for num in numbers.as_sequence().unwrap() {
        println!("{}", num.as_u64().unwrap());
    }

    println!();
    // Iterate over a list of mappings from the YAML file
    let children = data.get("children").unwrap();
    for child in children.as_sequence().unwrap() {
        println!("child: {:?}", child);
        println!("name: {}", child.get("name").unwrap().as_str().unwrap());
        println!(
            "birthdate: {}",
            child.get("birthdate").unwrap().as_u64().unwrap()
        );
    }
}

In the first few lines we are just accepting a filename on the command line.

Then we open the file using std::fs::File::open and use serde_yaml::from_reader to read the YAML file and convert it to an internal data structure.

This data structure, assigned to the data variable name, is of type serde_yaml::Value.

let data: serde_yaml::Value =

We printed it out using the first println! statement.

Mapping {
    "fname": String("Foo"),
    "lname": String("Bar"),
    "year": Number(2023),
    "height": Number(6.1),
    "numbers": Sequence [
        Number(23),
        Number(19),
        Number(42),
    ],
    "children": Sequence [
        Mapping {
            "name": String("Alpha"),
            "birthdate": Number(2020),
        },
        Mapping {
            "name": String("Beta"),
            "birthdate": Number(2022),
        },
    ],
}

From this point we can access the elements of this Mapping using either the data[FIELD] format or data.get(FIELD). The former returns Value::Null if we supply a FIELD that does not exist, so the panic! only comes later, when we call unwrap on the result of a conversion such as as_str. The latter, the get function, returns an Option, so we can either call unwrap on it, disregarding the possibility that get returns None, or use match to handle the None case as well. In this code we used both strategies in different places, primarily to show how each is done.

In some cases you will see println! statements to show the values; in others there are also assert_eq! statements to show the expected values.

Besides knowing the name of the field you'd like to fetch you also need to know the type of the field and then you need to use one of the conversion methods of the Value enum.

In the code you can see examples of as_str, as_u64, and as_f64 to fetch the primitive values.

You can also see as_sequence used to access a sequence of values.

The output

field: fname
fname=Foo
fname=Foo
field: lname
lname=Bar
lname=Bar
field: address
field: year
year=2023
year=2023
field: height
height=6.1
height=6.1

year=2023

height=6.1

Sequence [Number(23), Number(19), Number(42)]
[Number(23), Number(19), Number(42)]
23
19
42

child: Mapping {"name": String("Alpha"), "birthdate": Number(2020)}
name: Alpha
birthdate: 2020
child: Mapping {"name": String("Beta"), "birthdate": Number(2022)}
name: Beta
birthdate: 2022

Read YAML file containing sequence

In some cases the YAML file is not a mapping, but a sequence at its root as in this example:

- name: Foo
- name: Bar
  year: 2023
  married: true

We can use the same method here as well. In this case we also see how to access the first element with data[0], the second with data[1], and any other element of the sequence by its position. We also see a field that contains a boolean value, and we use the as_bool function to convert it to a real bool. We use assert_eq! to compare strings and numbers to expected values and assert! to check that a boolean value is indeed true.

use std::env;
use std::fs::File;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    let filename = &args[1];

    let data: serde_yaml::Value = match File::open(filename) {
        Ok(file) => match serde_yaml::from_reader(file) {
            Ok(data) => data,
            Err(err) => {
                eprintln!("There was an error parsing the YAML file {}", err);
                std::process::exit(1);
            }
        },
        Err(error) => {
            eprintln!("Error opening file {}: {}", filename, error);
            std::process::exit(1);
        }
    };
    println!("{:#?}", &data);
    println!();

    println!("{:#?}", &data[0]);
    let name = data[0].get("name").unwrap().as_str().unwrap();
    println!("{}", name);
    assert_eq!(name, "Foo");
    println!();

    println!("{:#?}", &data[1]);
    let name = data[1].get("name").unwrap().as_str().unwrap();
    let year = data[1].get("year").unwrap().as_u64().unwrap();
    let married = data[1].get("married").unwrap().as_bool().unwrap();
    println!("{}", name);
    println!("{}", year);
    println!("{}", married);
    assert_eq!(name, "Bar");
    assert_eq!(year, 2023);
    assert!(married);
    println!();
}

Conclusion

It might be a bit cumbersome to access the values this way, but it needs less start-up work of creating the struct mapping the YAML file.

If you have a YAML file with a data structure that differs from this and you'd like me to add such an example, let me know by opening an issue via the link below.

Read a simple YAML file into a struct

You can define a struct that represents the fields of a YAML file to get automatic data conversion.

  • struct
  • serde
  • Deserialize

In an earlier article we saw how to read an arbitrary YAML file and then access the individual fields.

A more time-consuming but more robust way is to define a struct mapping all the fields of the YAML file. We'll see several such examples.

For all of them we'll need both serde_yaml and serde as you can see in the Cargo.toml file:

[package]
name = "read-simple-yaml"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"

Data

Let's see this simple YAML file:

fname: Foo
lname: Bar
year: 2023
height: 6.1
married: true
# A comment

The code

use std::env;
use std::fs::File;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Data {
    fname: String,
    lname: String,
    year: u16,
    height: f32,
    married: bool,
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    let filename = &args[1];

    let data: Data = match File::open(filename) {
        Ok(file) => match serde_yaml::from_reader(file) {
            Ok(data) => data,
            Err(err) => {
                eprintln!("There was an error parsing the YAML file {}", err);
                std::process::exit(1);
            }
        },
        Err(error) => {
            eprintln!("Error opening file {}: {}", filename, error);
            std::process::exit(1);
        }
    };
    println!("{:#?}", &data);
    println!();

    println!("{}", data.fname);
    assert_eq!(data.lname, "Bar");
    assert_eq!(data.year, 2023);
    assert_eq!(data.height, 6.1);
    assert!(data.married);

}

Before getting to the main function we define a struct with the fields of the YAML file and the types of the values the YAML file has. We derive the Deserialize trait for it.

The first few lines of the main function accept the name of the YAML file on the command line, as described in the article on how to expect one command line parameter.

Then we open the YAML file using the std::fs::File::open function and call the serde_yaml::from_reader function to read the content of the file and parse it. The most important part is to assign the result to a variable declared with the type of the struct we defined earlier. In this example I cleverly used the name Data. Don't do that! Find some more descriptive name in real code!

let data: Data =

The content of the resulting variable looks like this:

Data {
    fname: "Foo",
    lname: "Bar",
    year: 2023,
    height: 6.1,
    married: true,
}

Very nice.

We can access the individual values using the dot-notation. We can then print the values or, as shown in this case, we can use assert_eq! or assert! to verify the values.

Getting started and extra data in the YAML file

What happens if there are extra fields in the YAML file that were not declared in the struct?

In this file there is an extra field called address that was not defined in the struct.

fname: Foo
lname: Bar
year: 2023
height: 6.1
married: true
address: Some place
# A comment

By default the YAML parser of Serde will ignore these extra fields. This is great as it allows us to start using the struct even before we manage to map out all the fields.

On the other hand this is also problematic, as it means we won't notice when the YAML contains fields that we don't handle. If we also set up default values for some of the fields then a typo in the name of a field will be hard to notice.

Disallow extra, unknown fields

Serde has various container attributes we can apply to the struct. One of them is called deny_unknown_fields.

We can add it to the definition of the struct:

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct Data {
    fname: String,
    lname: String,
    year: u16,
    height: f32,
    married: bool,
}

If we make that addition and run the program again with the YAML file that has the extra field

cargo run more.yaml

we get an error message and the program exits:

There was an error parsing the YAML file unknown field address, expected one of fname, lname, year, height, married at line 6 column 1

Set default values while deserializing YAML in Rust

Some YAML files might be missing some values. In some cases we might want to set default values in the deserialized struct.

We saw how to read and deserialize a simple YAML file into a struct. What happens if some of the fields we are expecting in the YAML file are missing?

For the following example we created a struct like this

#[derive(Deserialize)]
struct Person {
    name: String,
    email: String,
    year: u32,
    married: bool,
}

If we supply the following YAML file, each field is filled.

name: Foo Bar
email: foo@bar.com
year: 1990
married: true

However if we provide the following file:

email: foo@bar.com
year: 1990
married: true

We will get an error message in the err variable:

missing field `name`

We get a similar error if more than one field is missing:

name: Foo Bar

Set the default values

One of the solutions is to set default values for some or all of the fields. We can do that by using the default attribute and passing in the name of a function that is going to return the default value.

#[derive(Deserialize)]
struct Person {
    name: String,

    #[serde(default = "get_default_email")]
    email: String,

    #[serde(default = "get_default_year")]
    year: u32,

    #[serde(default = "get_default_married")]
    married: bool,
}

fn get_default_email() -> String {
    String::from("default@address")
}

fn get_default_year() -> u32 {
    2000
}

fn get_default_married() -> bool {
    false
}

The full example

use serde::Deserialize;
use std::fs;

#[derive(Deserialize)]
struct Person {
    name: String,

    #[serde(default = "get_default_email")]
    email: String,

    #[serde(default = "get_default_year")]
    year: u32,

    #[serde(default = "get_default_married")]
    married: bool,
}

fn get_default_email() -> String {
    String::from("default@address")
}

fn get_default_year() -> u32 {
    2000
}

fn get_default_married() -> bool {
    false
}

fn main() {
    let filename = get_filename();
    let text = fs::read_to_string(filename).unwrap();

    let data: Person = serde_yaml::from_str(&text).unwrap_or_else(|err| {
        eprintln!("Could not parse YAML file: {err}");
        std::process::exit(1);
    });

    println!("name: {}", data.name);
    println!("email: {}", data.email);
    println!("year: {}", data.year);
    println!("married: {}", data.married);
}

fn get_filename() -> String {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    args[1].to_string()
}

In this example we have a function called get_filename that gets the name of the file from the command line.

A problem - what if we have a typo?

What if this is the YAML file

name: Foo Bar
email: foo@bar.com
year: 1990
maried: true

Have you noticed the typo I made in one of the fields? I typed in "maried" instead of "married", but I could have mixed up the field called "color" and typed in "colour", if there indeed was such a field.

The current code will happily disregard the field with the typo and use the default value for the "married" field.

That's not ideal.

Dependencies

See the Cargo.toml we had:

[package]
name = "yaml-default-values"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"

Deserializing YAML - deny unknown fields

If fields have default values or are optional, we might never notice a typo in one of the field names.

  • deny_unknown_fields

Defining default values for fields in YAML or making fields optional are very useful features, but if there is a typo in the YAML file we might never notice it. This is certainly a source for a lot of frustration. Luckily there is a solution. We can tell serde to deny_unknown_fields. That way if there is a typo in the names of one of the fields, the parser will return an error.

This is basically what we need to do:

#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
struct Person {
    name: String,

    #[serde(default = "get_default_married")]
    married: bool,
}

fn get_default_married() -> bool {
    false
}

In this struct we expect two fields. The name field is required, but if there is no married field then we set it to false.

This works well when the YAML file has all the fields:

name: Foo Bar
married: true

The output:

name: Foo Bar
married: true

or when the married field is missing:

name: Foo Bar

The output:

name: Foo Bar
married: false

However if there is a typo and we have maried instead of married:

name: Foo Bar
maried: true

Then without the deny_unknown_fields we get:

name: Foo Bar
married: false

Adding the deny_unknown_fields attribute would yield the following error:

Could not parse YAML file: unknown field `maried`, expected `name` or `married` at line 2 column 1

Full example

use serde::Deserialize;
use std::fs;

#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
struct Person {
    name: String,

    #[serde(default = "get_default_married")]
    married: bool,
}

fn get_default_married() -> bool {
    false
}

fn main() {
    let filename = get_filename();
    let text = fs::read_to_string(filename).unwrap();

    let data: Person = serde_yaml::from_str(&text).unwrap_or_else(|err| {
        eprintln!("Could not parse YAML file: {err}");
        std::process::exit(1);
    });

    println!("name: {}", data.name);
    println!("married: {}", data.married);
}

fn get_filename() -> String {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} FILENAME", args[0]);
        std::process::exit(1);
    }
    args[1].to_string()
}

Dependencies in Cargo.toml

[package]
name = "yaml-deny-unknown-fields"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"

A potential problem

What if we get the files from some external source and the provider decides to add a new field? Our code will stop functioning. On one hand it is good that we immediately notice the extra field, on the other hand we would not want our service to stop working at 2am just because the data supplier decided to roll out their changes at that time.

I am not sure what the right solution is. How do we balance the two needs: avoiding the silent use of default values when there was a typo, and allowing the seamless addition of new fields?

Slides

YAML in Rust

Read YAML file

  • serde
  • serde_yml
  • from_reader
  • as_i64
  • as_str
  • struct
  • TODO: if the number of dashes at the top is not correct (e.g. 4), the parser will panic. How to handle this properly?

Hello World!
3
data = Point { x: 1, y: 2, text: "Hello World!" }
3
Hello World!

---
x: 1
y: 2
text: Hello World!

[package]
name = "read-yaml-file"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_yml = "0.0.12"


```rust
use std::fs::File;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
    text: String,
}

fn main() {
    read_any_yaml();
    read_struct_yaml();
}

fn read_any_yaml() {
    let filename = "data.yaml";
    match File::open(filename) {
        Ok(file) => {
            let data: serde_yml::Value = serde_yml::from_reader(file).expect("YAML parsing error");
            dbg!(&data);

            let text = match data.get("text") {
                Some(val) => val.as_str().unwrap(),
                None => panic!("Field text does not exist"),
            };
            println!("{}", text);

            let x = match data.get("x") {
                Some(val) => val.as_i64().unwrap(),
                None => panic!("Field x does not exist"),
            };
            let y = match data.get("y") {
                Some(val) => val.as_i64().unwrap(),
                None => panic!("Field y does not exist"),
            };
            println!("{}", x + y);
        }
        Err(error) => {
            println!("Error opening file {}: {}", filename, error);
        }
    }
}

fn read_struct_yaml() {
    let filename = "data.yaml";
    match File::open(filename) {
        Ok(file) => {
            let data: Point = serde_yml::from_reader(file).unwrap();
            println!("data = {:?}", data);
            println!("{}", data.x + data.y);
            println!("{}", data.text);
            assert_eq!(data.x, 1);
            assert_eq!(data.y, 2);
            assert_eq!(data.text, "Hello World!");
        }
        Err(error) => {
            println!("Error opening file {}: {}", filename, error);
        }
    }
}

```


<footer><p>Copyright © 2025 • Created with ❤️  by <a href="https://szabgab.com/">Gábor Szabó</a></p>
</footer>

Read YAML file where some field can have arbitrary values

---
title: Sample file
jobs:
  test:
    runs-on: ubuntu
  build:
    runs-on: windows

# Defined fields: title, jobs, runs-on
# Values selected from a well defined list: ubuntu, windows
# User supplied values: test, build, "Sample file"

use std::collections::HashMap;

use serde::{Deserialize, Serialize};

#[allow(non_camel_case_types)]
#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum Platform {
    linux,
    ubuntu,
    windows,
    macos,
}

#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
struct Job {
    #[serde(rename = "runs-on")]
    runs_on: Platform,
}

#[derive(Serialize, Deserialize, Debug)]
#[serde(deny_unknown_fields)]
struct Config {
    title: String,
    jobs: HashMap<String, Job>,
}

fn main() {
    let filename = "data.yaml";

    read_any_yaml(filename);
    read_struct_yaml(filename);
}

fn read_any_yaml(filename: &str) {
    let content = std::fs::read_to_string(filename).expect("File not found");
    let data: serde_yml::Value = serde_yml::from_str(&content).expect("YAML parsing error");
    println!("{:#?}", &data);
    println!("--------");

    let title = match data.get("title") {
        Some(val) => val.as_str().unwrap(),
        None => panic!("Field title does not exist"),
    };
    println!("title: {title}");

    let jobs = match data.get("jobs") {
        Some(val) => val.as_mapping().unwrap(),
        None => panic!("Field jobs does not exist"),
    };
    println!("{:#?}", &jobs);
    for (key, value) in jobs.iter() {
        println!("key: {:?}  value: {:?}", key, value);
    }
    println!("--------");
}

fn read_struct_yaml(filename: &str) {
    let content = std::fs::read_to_string(filename).expect("File not found");
    let data: Config = serde_yml::from_str(&content).expect("YAML parsing error");
    println!("data = {:?}", data);
    println!("title: {}", data.title);
    println!("{:?}", data.jobs.keys());
    for (key, value) in data.jobs.iter() {
        println!("key: {:?} {:?}", key, value);
    }
    assert_eq!(data.title, "Sample file");
    assert_eq!(data.jobs["test"].runs_on, Platform::ubuntu);
    assert_eq!(data.jobs["build"].runs_on, Platform::windows);
}

Deserialize date in YAML

title: Some title
start: 2025-02-08T10:00:00-08:00

[package]
name = "load-datetime-field"
version = "0.1.0"
edition = "2021"

[dependencies]
chrono = { version = "0.4.39", features = ["serde"] }
serde = { version = "1.0.217", features = ["derive"] }
serde_yml = "0.0.12"

use chrono::{DateTime, Timelike, Utc};

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct Event1 {
    title: String,
    start: String,
}

#[derive(Serialize, Deserialize, Debug)]
struct Event2 {
    title: String,
    start: DateTime<Utc>,
}

fn main() {
    let filename = "data.yaml";
    let content = std::fs::read_to_string(filename).expect("File not found");
    let data: Event1 = serde_yml::from_str(&content).expect("YAML parsing error");
    println!("{:?}", data);

    let data: Event2 = serde_yml::from_str(&content).expect("YAML parsing error");
    println!("{:?}", data);
    println!("hour: {}", data.start.hour());
    println!("timezone: {}", data.start.timezone());
}

Event1 { title: "Some title", start: "2025-02-08T10:00:00-08:00" }
Event2 { title: "Some title", start: 2025-02-08T18:00:00Z }
hour: 18
timezone: UTC