Regex

The goal of this book is to teach regexes (regular expressions, if you prefer) as they can be used in Rust with the regex crate.

Using regular expressions is also called "pattern matching", which might be confusing because Rust uses the same term for its match expressions and destructuring patterns.

TODO

  • Match a single digit [0-9] or also the non 0-9 digits?
  • Match the first number.
  • Match all the numbers.
  • Parse this: /* core */ some 1 text /* rust */ some 2 text /* perl */ some 3 text.
  • The characters used in regexes.
  • ...

Regex match exact text

In the first example we do something really simple, something for which we don't even need a regex, but it shows the basic syntax in Rust.

We have the string "The black cat climbed the green tree". We assume it was read from somewhere, so it is stored as a String.

We would like to check whether the character sequence "cat" appears in the string.

So we create a Regex from the pattern cat and call its captures method. This returns an Option: Some holding the captures of the first match, or None if the pattern does not match.

Cargo.toml

[package]
name = "regex-simple-match"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
regex = "1.9.6"

main.rs

use regex::Regex;

fn main() {
    let text = String::from("The black cat climbed the green tree");
    println!("{text}");

    let re = Regex::new(r"cat").unwrap();
    match re.captures(&text) {
        Some(value) => println!("Full match: {:?}", &value[0]),
        None => println!("No match"),
    };

    let re = Regex::new(r"dog").unwrap();
    match re.captures(&text) {
        Some(value) => println!("Full match: {:?}", &value[0]),
        None => println!("No match"),
    };
}

Output

The black cat climbed the green tree
Full match: "cat"
No match

Regex match numbers - capture using parentheses

use regex::Regex;

fn main() {
    let text = String::from("There is the number 23 and another number here: 19");
    println!("{}", text);

    let re = Regex::new(r"[0-9]+").unwrap();

    let number = match re.captures(&text) {
        Some(value) => value,
        None => {
            println!("No match");
            return;
        }
    };
    // len() counts the capture groups, including group 0 (the full match), not the matches
    println!("Number of capture groups: {}", number.len());
    println!("Full match: '{}'", &number[0]);

    // match the number that comes after colon (:)
    // but the match now includes the : as well
    let re = Regex::new(r": [0-9]+").unwrap();
    let number = match re.captures(&text) {
        Some(value) => value,
        None => {
            println!("No match");
            return;
        }
    };
    println!("Full match: '{}'", &number[0]);

    // Use parentheses to capture parts of the string
    let re = Regex::new(r": ([0-9]+)").unwrap();
    let number = match re.captures(&text) {
        Some(value) => value,
        None => {
            println!("No match");
            return;
        }
    };
    println!("Full match: '{}'", &number[0]);
    println!("Matched: '{}'", &number[1]);
}

Regex capture all the numbers - multiple match

main.rs

#![allow(unused)]
fn main() {
{{# examples/regex-capture-multiple-numbers/src/main.rs }}
}

Output

There is the number 23 and another number here: 19
23
19
["23", "19"]

Substitute

  • Captures
  • replace
  • replace_all

Cargo.toml

[package]
name = "regex-substitute"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
regex = "1.9.3"

main.rs

use regex::Captures;
use regex::Regex;

fn main() {
    let text = "The black cat climbed the green tree";
    println!("'{}'", &text);

    let re = Regex::new(r"cat").unwrap();
    let result = re.replace_all(text, "dog");
    println!("'{}'", &result);

    // We can use the captured substring by location
    let text = "abcde";
    println!("'{}'", &text);
    let re = Regex::new(r"(.)(.)").unwrap();
    let result = re.replace_all(text, r"$2$1");
    println!("'{}'", &result);

    // We can use named captures; the result might be clearer, and adding another pair of ()
    // won't impact the code.
    let re = Regex::new(r"(?<first>.)(?<second>.)").unwrap();
    let result = re.replace_all(text, r"$second$first");
    println!("'{}'", &result);

    let text = "12345";
    println!("'{}'", &text);
    let re = Regex::new(r"(.)(.)").unwrap();
    let result = re.replace_all(text, |caps: &Captures| {
        let a: i32 = caps[1].parse().unwrap();
        let b: i32 = caps[2].parse().unwrap();
        format!("{} ", a + b)
    });
    println!("'{}'", &result);
}

Compile once

If a function contains a regex, then every time we call that function the regex engine has to "compile" the regex into its internal format. This takes time, and it is unnecessary: the same regex compiles the same way no matter how many times the function is called.

Repeated compilation can be avoided by storing the compiled regex in a lazily initialized static, for example with the once_cell crate.

Cargo.toml

[package]
name = "regex-compile-once"
version = "0.1.0"
edition = "2024"

[dependencies]
once_cell = "1.21.3"
regex = "1.11.1"

Code

use once_cell::sync::Lazy;
use regex::Regex;

fn main() {
    let rows = vec![
        String::from("Hello, world!"),
        String::from("This is some text"),
    ];

    for row in rows {
        check_something(&row);
    }
}


fn check_something(text: &str) {
    static RE: Lazy<Regex> = Lazy::new(|| {
        println!("Compiling regex");
        Regex::new(r"Hello").unwrap()
    });

    let is = RE.is_match(text);
    println!("Is match: {}", is);
}

In the output we can see that it was compiled only once.

Compiling regex
Is match: true
Is match: false