Channel: Random Hacks

Rust lifetimes: Getting away with things that would be reckless in C++

Over the years, I've learned to be cautious with C++ pointers. In particular, I'm always very careful about who owns a given pointer, and who's in charge of calling delete on it. But my caution often forces me to write deliberately inefficient functions. For example:

vector<string> tokenize_string(const string &text);

Here, we have a large string text, and we want to split it into a vector of tokens. This function is nice and safe, but it allocates one string for every token in the input. Now, if we were feeling reckless, we could avoid these allocations by returning a vector of pointers into text:

vector<pair<const char *, const char *>> tokenize_string2(const string &text);

In this version, each token is represented by two pointers into text: one pointing to the first character, and one pointing just beyond the last character. But this can go horribly wrong:

// Disaster strikes!
auto v = tokenize_string2(get_input_string());
munge(v);

Why does this fail? The function get_input_string returns a temporary string, and tokenize_string2 builds an array of pointers into that string. Unfortunately, the temporary string only lives until the end of the current expression, and then the underlying memory is released. And so all our pointers in v now point into oblivion—and our program just wound up getting featured in a CERT advisory. So personally, I'm going to prefer the inefficient tokenize_string function almost every time.

Rust lifetimes to the rescue!

Going back to our original design, let's declare a type Token. Each token is either a Word or an Other, and each token contains pointers into a pre-existing string. In Rust, we can declare this as follows:

#[deriving(Show, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str)
}
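To make the payoff concrete, here is a minimal sketch of a tokenizer built on this type. It is an illustration under stated assumptions, not the article's own code: the names tokenize_string3 and make_token are hypothetical, the rule that a "word" is a run of alphabetic characters is a guess at what the two variants mean, and the attribute is written in current Rust syntax (`#[derive(Debug, PartialEq)]`) rather than the older `deriving(Show, ...)` form above. The lifetime 'a in the signature ties every returned token to the input string, so no per-token allocation is needed.

```rust
// Current-Rust spelling of the Token type declared above.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str),
}

// Hypothetical helper: wrap a slice in the matching variant.
fn make_token(slice: &str, is_word: bool) -> Token<'_> {
    if is_word {
        Token::Word(slice)
    } else {
        Token::Other(slice)
    }
}

// Hypothetical tokenizer. Assumption: a Word is a maximal run of
// alphabetic characters, and an Other is a maximal run of anything else.
// Every returned token borrows from `text`; nothing is copied.
fn tokenize_string3<'a>(text: &'a str) -> Vec<Token<'a>> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev_is_word: Option<bool> = None;

    for (i, c) in text.char_indices() {
        let is_word = c.is_alphabetic();
        if let Some(prev) = prev_is_word {
            if prev != is_word {
                // The character class changed, so close the current run.
                tokens.push(make_token(&text[start..i], prev));
                start = i;
            }
        }
        prev_is_word = Some(is_word);
    }

    // Close the final run, if the input was non-empty.
    if let Some(prev) = prev_is_word {
        tokens.push(make_token(&text[start..], prev));
    }
    tokens
}

// The Rust analogue of the C++ disaster is rejected at compile time.
// Assuming a hypothetical get_input_string() -> String and munge():
//
//     let v = tokenize_string3(&get_input_string());
//     munge(&v); // error: temporary value dropped while borrowed
```

Called on a String holding "Hello, world!", this sketch should yield Word("Hello"), Other(", "), Word("world"), Other("!"), with each token pointing into the original buffer. The caller only has to keep the input string alive for as long as the tokens are in use, and the compiler checks that for us.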


