I am trying to set up a simple but large DuckDB database with a single column of unique values as read from a parquet file. For faster inference of single-point existence checking (WHERE id = test_id
), I want to convert the parquet file to a DuckDB (as recommended) and add a primary key to it.
I tried it as follows:
CREATE OR REPLACE TABLE data (
id UUID PRIMARY KEY
)
AS SELECT DISTINCT id FROM 'my-parquet-file.parquet'
but got the error:
duckdb.duckdb.ParserException: Parser Error: syntax error at or near "AS"
So what is the correct way of setting up the table for efficient single-point existence checking on huge tables?
I am trying to set up a simple but large DuckDB database with a single column of unique values as read from a parquet file. For faster inference of single-point existence checking (WHERE id = test_id
), I want to convert the parquet file to a DuckDB (as recommended) and add a primary key to it.
I tried it as follows:
CREATE OR REPLACE TABLE data (
id UUID PRIMARY KEY
)
AS SELECT DISTINCT id FROM 'my-parquet-file.parquet'
but got the error:
duckdb.duckdb.ParserException: Parser Error: syntax error at or near "AS"
So what is the correct way of setting up the table for efficient single-point existence checking on huge tables?
Share Improve this question edited Jan 19 at 10:09 NateDhaliwal 11210 bronze badges asked Jan 19 at 9:27 Bram VanroyBram Vanroy 28.4k26 gold badges147 silver badges263 bronze badges1 Answer
Reset to default 1You can add primary key via alter table ... add primary key ...
after creation:
create or replace table data
as
select distinct id
from 'my-parquet-file.parquet';
alter table data add primary key (id);
or create table with primary key and then insert data into it:
create or replace table data (
id uuid primary key
);
insert into data (id)
select distinct id
from 'my-parquet-file.parquet';