I've been fooling around with some natural language data from OPUS, the “open parallel corpus.” It contains many gigabytes of movie subtitles, UN documents, and other text, much of it part-of-speech tagged and aligned across multiple languages. In total, there's over 50 GB of data, compressed.
“50 GB, compressed” is an awkward quantity of data:
- It's large enough that Pandas can't load it all into memory.
- It's large enough that PostgreSQL stops being fun, and starts feeling like work. (Although cstore_fdw might help.)
- It's too small to justify heavyweight distributed tools like Hadoop. As the saying goes, “If it fits on your laptop’s SSD, it’s not big data.” I have USB sticks large enough to hold 50 GB!
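That first point deserves a caveat: Pandas can't hold 50 GB at once, but it can still stream a file in chunks and aggregate as it goes, using constant memory. Here's a minimal sketch of the idea, with a tiny in-memory CSV standing in for the real corpus (the column names and data are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical stand-in for a multi-gigabyte dump: a small in-memory CSV
# so the sketch runs as-is. A real run would pass a file path instead.
csv_data = io.StringIO("token,pos\nhello,UH\nworld,NN\n" + "the,DT\n" * 10)

# Stream in fixed-size chunks and fold each one into a running tally,
# rather than loading the whole table before doing any work.
counts = {}
for chunk in pd.read_csv(csv_data, chunksize=4):
    for pos, n in chunk["pos"].value_counts().items():
        counts[pos] = counts.get(pos, 0) + int(n)

print(counts)
```

This works fine for one-pass aggregations, but it gets tedious as soon as you want joins or ad-hoc queries, which is part of why 50 GB is such an awkward size.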
Let's look at various ways to tackle this.