Regex
The goal of this book is to teach regexes (regular expressions) as they can be used in Rust with the regex crate.
Using regular expressions is sometimes also called "pattern matching", which can be confusing because in Rust that term usually refers to match expressions and destructuring patterns.
TODO
- Match a single digit: [0-9], or also the non 0-9 digits?
- Match the first number.
- Match all the numbers.
- Parse this:
/* core */ some 1 text /* rust */ some 2 text /* perl */ some 3 text
- The characters used in regexes.
- ...
Regex match exact text
- Regex
- captures
The first example does something really simple, something we don't actually even need a regex for, but it shows the basic syntax in Rust.
We have the string "The black cat climbed the green tree" that we read from somewhere, so we assume it is a String.
We would like to check whether the series of characters cat appears in the string.
So we create a Regex using cat as the pattern and call its captures method. This returns an Option that is either Some, holding the captures of the first match, or None if nothing matched.
Cargo.toml
[package]
name = "regex-simple-match"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
regex = "1.9.6"
main.rs
use regex::Regex;

fn main() {
    let text = String::from("The black cat climbed the green tree");
    println!("{text}");

    // "cat" appears in the text, so captures returns Some
    let re = Regex::new(r"cat").unwrap();
    match re.captures(&text) {
        // Index 0 is always the full match
        Some(value) => println!("Full match: {:?}", &value[0]),
        None => println!("No match"),
    };

    // "dog" does not appear in the text, so captures returns None
    let re = Regex::new(r"dog").unwrap();
    match re.captures(&text) {
        Some(value) => println!("Full match: {:?}", &value[0]),
        None => println!("No match"),
    };
}
Output
The black cat climbed the green tree
Full match: "cat"
No match
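As noted above, this particular check does not really need a regex. A minimal sketch of the same test using only the standard library's str::contains and str::find; the helper name has_substring is made up for this illustration and is not part of the original example.

```rust
// Hypothetical helper, not part of the original example.
// Returns true if `needle` occurs anywhere in `text`.
fn has_substring(text: &str, needle: &str) -> bool {
    text.contains(needle)
}

fn main() {
    let text = String::from("The black cat climbed the green tree");

    println!("cat: {}", has_substring(&text, "cat"));
    println!("dog: {}", has_substring(&text, "dog"));

    // str::find also reports where the first occurrence starts (as a byte offset).
    if let Some(pos) = text.find("cat") {
        println!("found at byte {pos}");
    }
}
```

A regex becomes worthwhile once the thing we are looking for is a pattern rather than a fixed piece of text.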
Regex match numbers - capture using parentheses
- Regex
- captures
- (
- )
use regex::Regex;

fn main() {
    let text = String::from("There is the number 23 and another number here: 19");
    println!("{}", text);

    // Match the first run of digits
    let re = Regex::new(r"[0-9]+").unwrap();
    let number = match re.captures(&text) {
        Some(value) => value,
        None => {
            println!("No match");
            return;
        }
    };
    // len() counts the capture groups, including group 0 (the full match),
    // so it is 1 here even though the text contains two numbers.
    println!("Number of matches: {}", number.len());
    println!("Full match: '{}'", &number[0]);

    // match the number that comes after colon (:)
    // but the match now includes the : as well
    let re = Regex::new(r": [0-9]+").unwrap();
    let number = match re.captures(&text) {
        Some(value) => value,
        None => {
            println!("No match");
            return;
        }
    };
    println!("Full match: '{}'", &number[0]);

    // Use parentheses to capture parts of the string
    let re = Regex::new(r": ([0-9]+)").unwrap();
    let number = match re.captures(&text) {
        Some(value) => value,
        None => {
            println!("No match");
            return;
        }
    };
    println!("Full match: '{}'", &number[0]);
    println!("Matched: '{}'", &number[1]);
}
Regex capture all the numbers - multiple match
- Captures
- captures_iter
main.rs
#![allow(unused)] fn main() { {{# examples/regex-capture-multiple-numbers/src/main.rs }} }
Output
There is the number 23 and another number here: 19
23
19
["23", "19"]
Substitute
- Captures
- replace
- replace_all
Cargo.toml
[package]
name = "regex-substitute"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
regex = "1.9.3"
main.rs
use regex::Captures;
use regex::Regex;

fn main() {
    // Replace every match with a fixed string
    let text = "The black cat climbed the green tree";
    println!("'{}'", &text);
    let re = Regex::new(r"cat").unwrap();
    let result = re.replace_all(text, "dog");
    println!("'{}'", &result);

    // We can use the captured substring by location
    let text = "abcde";
    println!("'{}'", &text);
    let re = Regex::new(r"(.)(.)").unwrap();
    let result = re.replace_all(text, r"$2$1");
    println!("'{}'", &result);

    // We can use named captures; the result might be clearer, and adding another pair of ()
    // won't impact the code.
    let re = Regex::new(r"(?<first>.)(?<second>.)").unwrap();
    let result = re.replace_all(text, r"$second$first");
    println!("'{}'", &result);

    // The replacement can also be a closure that receives the captures of each match
    let text = "12345";
    println!("'{}'", &text);
    let re = Regex::new(r"(.)(.)").unwrap();
    let result = re.replace_all(text, |caps: &Captures| {
        let a: i32 = caps[1].parse().unwrap();
        let b: i32 = caps[2].parse().unwrap();
        format!("{} ", a + b)
    });
    println!("'{}'", &result);
}
Compile once
If a function contains a regex, then every time we call that function the regex engine has to "compile" the regex into its internal format. This takes time, and it is unnecessary: the same regex compiles the same way no matter how many times the function is called.
The repeated compilation can be avoided by using the once_cell crate.
Cargo.toml
[package]
name = "regex-compile-once"
version = "0.1.0"
edition = "2024"
[dependencies]
once_cell = "1.21.3"
regex = "1.11.1"
Code
use once_cell::sync::Lazy;
use regex::Regex;

fn main() {
    let rows = vec![
        String::from("Hello, world!"),
        String::from("This is some text"),
    ];
    for row in rows {
        check_something(&row);
    }
}

fn check_something(text: &str) {
    // The closure runs only on first access; later calls reuse the compiled Regex.
    static RE: Lazy<Regex> = Lazy::new(|| {
        println!("Compiling regex");
        Regex::new(r"Hello").unwrap()
    });
    let is = RE.is_match(text);
    println!("Is match: {}", is);
}
In the output we can see that it was compiled only once.
Compiling regex
Is match: true
Is match: false