Embedding simple CSV file in Rust application

include_str! CSV HashMap

We have already seen how to embed a simple string in our Rust application, and also how to use preprocessing to embed a list of values.

Now we need to embed a simple CSV file that looks like this:

examples/embedded-simple-csv-file/data/languages.csv

rs,rust
sh,bash
toml,toml
lock,toml

It is actually part of the code base behind the Rust Maven web site, where it maps file extensions to format types.

We need to read this file and store its content in a HashMap, so we can easily get the format type for a given file extension.
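As a sketch of the lookup step, assuming we start from a file name rather than a bare extension: std::path::Path::extension can pull out the extension, and HashMap::get returns an Option instead of panicking on an unknown key the way map["md"] would. The helper format_for_file is hypothetical and not part of the original code; the map is built by hand here so the example runs on its own.

```rust
use std::collections::HashMap;
use std::path::Path;

// Hypothetical helper: find the format type for a file name by taking its
// extension and looking it up in the map.
// Returns None if the file has no extension or the extension is unknown.
fn format_for_file<'a>(map: &'a HashMap<String, String>, filename: &str) -> Option<&'a str> {
    let ext = Path::new(filename).extension()?.to_str()?;
    map.get(ext).map(|s| s.as_str())
}

fn main() {
    // Build a small map by hand, mirroring the CSV file above.
    let mut map = HashMap::new();
    for (ext, format) in [("rs", "rust"), ("sh", "bash"), ("toml", "toml"), ("lock", "toml")] {
        map.insert(ext.to_string(), format.to_string());
    }

    assert_eq!(format_for_file(&map, "src/main.rs"), Some("rust"));
    assert_eq!(format_for_file(&map, "Cargo.lock"), Some("toml"));
    assert_eq!(format_for_file(&map, "README.md"), None); // unknown extension
    assert_eq!(format_for_file(&map, "Makefile"), None); // no extension at all
    println!("all lookups behave as expected");
}
```

Returning Option<&str> leaves it to the caller to decide what an unknown extension means, for example falling back to plain text.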

In the case of the list of values, I wrote that storing the original text file in memory would be a waste of memory, and thus opted to preprocess it. But then I realized that the size of these data files is really small relative to the size of the compiled code. For example, in our sample crate the result of cargo build --release is a file of 4,681,888 bytes, while the data file is only 36 bytes.
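To double-check the 36-byte figure, here is a tiny sketch that measures the same content as a string literal. It assumes the file has exactly the four lines shown, each ending with a newline and with no blank lines in between; with that layout the bytes add up to 8 + 8 + 10 + 10 = 36. The constant names are made up for this example.

```rust
// Content of data/languages.csv, assuming every line (including the last)
// ends with '\n' and there are no blank lines.
const LANGUAGES_CSV: &str = "rs,rust\nsh,bash\ntoml,toml\nlock,toml\n";

// Release binary size quoted above, in bytes.
const BINARY_SIZE: usize = 4_681_888;

fn main() {
    // len() counts bytes, not characters (the same thing here, as it is all ASCII).
    let csv_size = LANGUAGES_CSV.len();
    let share = csv_size as f64 / BINARY_SIZE as f64 * 100.0;
    println!("CSV: {} bytes, {:.5}% of the binary", csv_size, share);
}
```

The data is well under a thousandth of a percent of the executable, which is why keeping the raw text around is not a real concern here.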

Embedding the file

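In this case we take a different approach and embed the file as it is. For this we use the include_str! macro, which reads the file at compile time and embeds its content in the binary as a &'static str. The path is resolved relative to the source file that calls the macro.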

examples/embedded-simple-csv-file/src/main.rs

use std::collections::HashMap;

fn main() {
    let ext_to_languages = get_languages();

    println!("{:?}", ext_to_languages);
    println!("{:?}", ext_to_languages["rs"]);

    assert_eq!(ext_to_languages["rs"], "rust");
}

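fn get_languages() -> HashMap<String, String> {
    // The path is relative to this source file; the content is embedded at compile time.
    let text = include_str!("../data/languages.csv");

    let mut data = HashMap::new();
    for line in text.split('\n') {
        // Skip empty lines, e.g. the one produced after the trailing newline.
        if line.is_empty() {
            continue;
        }
        // Split "extension,format" into its two fields.
        let parts = line.split(',');
        let parts: Vec<&str> = parts.collect();
        // The same collect written with the "turbofish" syntax:
        // let parts = parts.collect::<Vec<&str>>();
        data.insert(parts[0].to_string(), parts[1].to_string());
    }

    data
}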
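The loop above panics with an index-out-of-bounds error if a line has no comma, and because it splits only on '\n', a trailing '\r' would end up in the values if the file ever got Windows line endings. A minimal sketch of a more defensive parser, assuming we simply want to skip such lines, can use str::lines (which also strips the '\r' of a "\r\n" ending), str::split_once, and collect. The name parse_languages is hypothetical, and this is not the functional version from the follow-up article; a string literal stands in for include_str! so the sketch runs without the data file.

```rust
use std::collections::HashMap;

// Parse "extension,format" lines into a map.
// Blank lines and lines without a comma are skipped instead of panicking.
fn parse_languages(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once(','))
        .map(|(ext, format)| (ext.trim().to_string(), format.trim().to_string()))
        .collect()
}

fn main() {
    // In the real program this would be include_str!("../data/languages.csv").
    let text = "rs,rust\r\nsh,bash\n\ntoml,toml\nlock,toml\n";
    let ext_to_languages = parse_languages(text);

    assert_eq!(ext_to_languages["rs"], "rust"); // no stray '\r'
    assert_eq!(ext_to_languages["lock"], "toml");
    assert_eq!(ext_to_languages.len(), 4); // the blank line was skipped
    println!("{:?}", ext_to_languages);
}
```

split_once splits at the first comma only, so a value that itself contains commas would be kept whole; for real CSV with quoting, a dedicated CSV parser would be the safer choice.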


We can now run cargo build --release and move the resulting executable anywhere. The CSV file is already baked into the executable, so we won't need to distribute it separately.

Compiled size change

I expected the change in the compiled size to be around the size of the file we embed, but I ran a little experiment and the difference was much larger. I commented out the code that uses the "rs" extension and compiled the code. The resulting file size was 4,677,496 bytes. Then I emptied the CSV file and compiled the code again. This time I got a file of 4,670,952 bytes. So the difference is 6,544 bytes. That is still only about 0.14% of the total file size, but way more than the 36 bytes I expected. I'll have to investigate this.

This is especially strange, as there was no size difference in the case of embedding a simple string.
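The arithmetic behind the experiment, as a small sketch. The helper size_delta is hypothetical; the sizes are the ones reported above.

```rust
// Difference in bytes between two build sizes, and that difference as a
// percentage of the larger build.
fn size_delta(larger: u64, smaller: u64) -> (u64, f64) {
    let diff = larger - smaller;
    (diff, diff as f64 / larger as f64 * 100.0)
}

fn main() {
    // Full CSV file vs. emptied CSV file, both with the "rs" code commented out.
    let (diff, percent) = size_delta(4_677_496, 4_670_952);
    let csv_bytes: u64 = 36;

    println!("difference: {} bytes", diff); // 6544 bytes
    println!("share of the binary: {:.2}%", percent);
    println!("ratio to the CSV size: {:.1}", diff as f64 / csv_bytes as f64);
}
```

The growth is roughly 180 times the size of the CSV file itself, which is what makes it worth investigating.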

Improved version

After publishing this I got some suggestions. Based on those, I created an improved version with more functional programming elements, which is probably much better than this solution. Check out Embedding simple CSV file and processing in a functional way.


Author

Gabor Szabo (szabgab)

Gabor Szabo, the author of the Rust Maven web site, maintains several open source projects in Rust. While he still feels he has tons of new things to learn about Rust, he already offers training courses in Rust and still teaches Python, Perl, git, GitHub, GitLab, CI, and testing.
