Is there a way to read the CSV column data as JSON?
I'm using PSDuckDB, and currently I use plain PowerShell to extract the 'AuditData' column (which contains JSON data) and write it to a separate JSON file before I can start with DuckDB queries.
Import-Csv -Path 'C:\M365-UnifiedAuditLog.csv' | Select-Object -ExpandProperty AuditData | Out-File 'C:\AuditData.json' -Encoding UTF8NoBOM
I'm new to DuckDB, but the speed is very promising!
If the CSV is valid, and if there is a properly encoded JSON column (i.e., encoded according to the CSV encoding rules), then reading the JSON column can be done trivially.
Example: a CSV file (json-in-csv.csv) with a header, where column j contains encoded JSON
id,j
1,"[1,2]"
2,"{""a"":3}"
$ duckdb
D from read_csv('json-in-csv.csv', types={'j': 'JSON'});
┌───────┬─────────┐
│  id   │    j    │
│ int64 │  json   │
├───────┼─────────┤
│     1 │ [1,2]   │
│     2 │ {"a":3} │
└───────┴─────────┘
D
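Once the column is typed as JSON, DuckDB's JSON operators can be used on it directly. As a minimal sketch against the toy file above (the 'a' key comes from the second row; the result is NULL for rows where that key is absent):
select id, j->>'a' as a from read_csv('json-in-csv.csv', types={'j': 'JSON'});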
One can also use columns or auto_type_candidates, e.g. as follows:
from read_csv('json-in-csv.csv', columns={'id': 'INT64', 'j': 'JSON'});
from read_csv('json-in-csv.csv', auto_type_candidates=['json', 'BIGINT']);
Things would of course be much simpler if you were dealing with a TSV file, since raw tabs cannot appear in valid JSON.
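Applied to the file from the question, something along these lines should remove the need for the PowerShell pre-processing step entirely. The 'AuditData' column name comes from the question; 'Operation' and 'CreationTime' are assumed field names inside that JSON, so substitute whichever fields your log actually contains:
select AuditData->>'Operation' as operation,
       AuditData->>'CreationTime' as creation_time
from read_csv('C:\M365-UnifiedAuditLog.csv', types={'AuditData': 'JSON'});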