Embedding simple CSV file in Rust application

include_str! CSV HashMap

We already saw how to embed a simple string in our Rust application and also how to use preprocessing to embed a list of values.

Now we need to embed a simple CSV file that looks like this:




The file is actually part of the code-base running the Rust Maven web site, where it maps file extensions to format types.

We need to read in this file and store it in a HashMap so we'll be able to easily get the format type from a file extension.

In the case of the list of values I wrote that storing the original text file in memory would be a waste and thus opted for preprocessing it. But then I thought: the size of these data files is really small relative to the size of the compiled code. For example, in our sample crate the result of cargo build --release is a file of 4,681,888 bytes while the data file is only 36 bytes.

Embedding the file

In this case we take a different approach and embed the file as it is. For this we use the include_str! macro.


use std::collections::HashMap;

fn main() {
    let ext_to_languages = get_languages();

    println!("{:?}", ext_to_languages);
    println!("{:?}", ext_to_languages["rs"]);

    assert_eq!(ext_to_languages["rs"], "rust");
}

fn get_languages() -> HashMap<String, String> {
    // The file content is embedded in the binary at compile time.
    let text = include_str!("../data/languages.csv");

    let mut data = HashMap::new();
    for line in text.split('\n') {
        // Skip empty lines, e.g. the one after the trailing newline.
        if line.is_empty() {
            continue;
        }
        let parts = line.split(',');
        let parts: Vec<&str> = parts.collect();
        // let parts = parts.collect::<Vec<&str>>();
        data.insert(parts[0].to_string(), parts[1].to_string());
    }

    data
}


We can now run cargo build --release and move the resulting executable anywhere. It will already have the CSV file baked into the code so we won't need to distribute it separately.
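One thing to keep in mind when using the resulting HashMap: indexing with square brackets panics if the extension is not in the map. A minimal sketch of a lookup that falls back to a default instead could look like this. The language_for helper and the "unknown" default are made up for this illustration, and the map is filled by hand here instead of reading the real CSV file.

```rust
use std::collections::HashMap;

// Hypothetical helper: returns the format type for an extension,
// or "unknown" if the extension is not in the map.
fn language_for<'a>(map: &'a HashMap<String, String>, ext: &str) -> &'a str {
    map.get(ext).map(String::as_str).unwrap_or("unknown")
}

fn main() {
    // Filled by hand for this sketch; in the article the map comes from get_languages().
    let mut ext_to_languages = HashMap::new();
    ext_to_languages.insert("rs".to_string(), "rust".to_string());

    // A known extension returns its format type.
    assert_eq!(language_for(&ext_to_languages, "rs"), "rust");
    // An unknown extension returns the default instead of panicking.
    assert_eq!(language_for(&ext_to_languages, "xyz"), "unknown");

    println!("{}", language_for(&ext_to_languages, "rs"));
}
```

Whether a default or an explicit Option is the better choice depends on how the caller wants to handle unknown extensions.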

Compiled size change

I thought the change in the compiled size would be around the size of the file we embed, but I ran a little experiment and it was way more. I commented out the code that accesses the "rs" extension and compiled the code. The resulting file size was 4,677,496 bytes. Then I emptied the CSV file and compiled the code again. This time I got a file of 4,670,952 bytes. So the difference is 6,544 bytes. That is still only about 0.14% of the total file size, but way more than the 36 bytes I expected. I'll have to investigate this.

This is especially strange as there was no size difference in the case of embedding a simple string.
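To double-check the arithmetic, here is a small sketch that computes the difference between the two builds and its share of the binary. The size_change function is a hypothetical helper written for this illustration; the numbers are the file sizes measured above.

```rust
// Hypothetical helper: returns the size difference in bytes and that
// difference as a percentage of the build that contains the data.
fn size_change(with_data: u64, without_data: u64) -> (u64, f64) {
    let diff = with_data - without_data;
    let percent = diff as f64 / with_data as f64 * 100.0;
    (diff, percent)
}

fn main() {
    // Sizes of the executable with the CSV content and with the emptied CSV file.
    let (diff, percent) = size_change(4_677_496, 4_670_952);

    println!("difference: {} bytes ({:.2}% of the binary)", diff, percent);
    // Integer division: how many times the 36-byte data file fits in the difference.
    println!("about {} times the size of the data file", diff / 36);
}
```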

Improved version

After publishing this I got some suggestions. Based on those I created an improved version with more functional programming elements, which is probably way better than this solution. Check out the Embedding simple CSV file and processing in a functional way.
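To give a rough idea of what a more functional style could look like, here is a sketch using an iterator chain. It is not necessarily the code from the linked article. The parse_languages function and the SAMPLE constant are made up for this illustration, and the "md,markdown" row is invented sample data; in the real application the text would come from include_str!.

```rust
use std::collections::HashMap;

// Invented sample standing in for include_str!("../data/languages.csv").
const SAMPLE: &str = "rs,rust\nmd,markdown\n";

// Hypothetical function: builds the map from CSV text with an iterator chain.
fn parse_languages(text: &str) -> HashMap<String, String> {
    text.lines() // handles both \n and \r\n line endings
        // Lines without a comma (including empty lines) are skipped.
        .filter_map(|line| line.split_once(','))
        .map(|(ext, language)| (ext.trim().to_string(), language.trim().to_string()))
        .collect()
}

fn main() {
    let ext_to_languages = parse_languages(SAMPLE);

    assert_eq!(ext_to_languages["rs"], "rust");
    println!("{:?}", ext_to_languages.get("md"));
}
```

Taking the text as a parameter, instead of calling include_str! inside the function, also makes the parsing easy to test with any input.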