Channel: Random Hacks

Unscientific column store benchmarking in Rust

I've been fooling around with some natural language data from OPUS, the “open parallel corpus.” It contains many gigabytes of movie subtitles, UN documents and other text, much of it tagged by part of speech and aligned across multiple languages. In total, there's over 50 GB of data, compressed.

“50 GB, compressed” is an awkward quantity of data:

Let's look at various ways to tackle this.

Read more…

