SQLite as a Data REPL
Published on: 2023-10-10
When I have a large CSV of data to look at, I've typically reached first for pandas in a standard python or ipython REPL, but I'm writing this to remind myself to consider sqlite instead.
Anecdotally (I'm typing it at the command-line now), the sqlite3 CLI starts up much faster than the python3 REPL. Add in the cost of import pandas as pd, and you've probably already spent 6-8 seconds. Reading in data with sqlite3 also seems much quicker than pd.read_csv, and I get the benefit of easy persistence with .backup FILE instead of saving a bunch of additional CSV files with the results of different filters.
.import FILENAME TABLE --csv is all you need to read a CSV file in as a new table, so give it a shot next time you need to munge some data quickly.
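
To make the workflow concrete, here is a minimal sketch of a session, scripted through a heredoc so it can be pasted into a shell. The file name sales.csv, the table names sales and north, and the backup name sales.db are all made up for illustration, and it assumes a sqlite3 CLI recent enough to accept the --csv flag on .import.

```shell
# Hypothetical file and table names, chosen only for this sketch.
workdir=$(mktemp -d)
cd "$workdir"

# A tiny stand-in for the "large csv".
cat > sales.csv <<'EOF'
region,amount
north,120
south,80
north,45
EOF

# Dot-commands and SQL are fed to the sqlite3 CLI on stdin;
# these are the same lines you would type interactively.
# With no database argument, sqlite3 works on an in-memory database.
sqlite3 <<'EOF'
.import sales.csv sales --csv
CREATE TABLE north AS SELECT * FROM sales WHERE region = 'north';
.backup sales.db
EOF

# The backup is an ordinary database file that can be reopened later.
sqlite3 sales.db "SELECT count(*) FROM north;"
```

Because the target table does not exist yet, .import treats the first CSV row as column names. The filtered result lives in its own table inside the same backup file, which is what replaces the pile of intermediate CSV files.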