I have a couple of BigQuery tables for Apache Iceberg, created as described on their documentation page. This created the corresponding metadata and data folders, where the Parquet files are stored, in my GCS bucket.
I'm trying to use a REST Catalog to manage these Iceberg tables because I want to access them with Trino, which, according to its documentation, does not seem to support a BigQuery/Google Cloud metastore.
I have the basic structure of the rest.properties file in the "Catalog" folder of my Trino configuration (I'm running Trino in Docker), but I can't work out exactly how to authenticate and connect to my GCS bucket. I have not found any examples while searching the web. What I have so far is the following:
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://storage.googleapis/<bucket_name>/thelook_ecommerce/iceberg (is this correct?)
iceberg.rest-catalog.security=OAUTH2 (is this what I want in this case?)
????
????
If authentication with a service account is possible, that would be my preferred option, but I don't see in the documentation how to do it.
I apologize in advance if this seems too basic or if I said something terribly wrong, but this is really uncharted territory for me ahah
Thank you very much!
Asked Mar 12 at 15:06 by Filipe Pereira, edited Mar 12 at 17:22
2 Answers
I'm not personally aware of anyone showing an example of using Google Cloud as an externally usable Iceberg REST Catalog, but that doesn't mean nobody is doing it. When I look at the Google doc page you linked, I don't see any mention of them supporting a REST Catalog for engines like Trino and Spark. Even the diagram shows those engines going directly to the metadata files (bypassing the BigQuery Metastore?), with the comment "OS engines can query (read-only) using metadata snapshots". Usually, the REST Catalog gives the query engine the location of the table's current metadata file, and the engine is off to the races from there.
Even the "view iceberg table metadata snapshot" section talks about manually figuring out the metadata snapshot file instead of getting it from a REST Catalog. Additionally, it looks like the "read iceberg tables with spark" section isn't using a REST Catalog either -- it seems to be pointing to the HadoopCatalog provider which I'm thinking just allows you to hand-jam the metadata file stuff too.
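For reference, a HadoopCatalog-style Spark setup generally looks like the fragment below. This is only a sketch of the general Iceberg-on-Spark pattern, not a copy of the BigQuery page: the catalog name gcs_iceberg is made up for illustration, and the warehouse path is assumed from the bucket layout in the question. Note that no REST endpoint appears anywhere; the catalog is just a path.

```properties
# Hypothetical catalog name "gcs_iceberg"; any name works.
spark.sql.catalog.gcs_iceberg=org.apache.iceberg.spark.SparkCatalog
# "hadoop" selects Iceberg's file-system-based HadoopCatalog (no catalog service involved).
spark.sql.catalog.gcs_iceberg.type=hadoop
# Assumed warehouse root, taken from the bucket path in the question.
spark.sql.catalog.gcs_iceberg.warehouse=gs://<bucket_name>/thelook_ecommerce/iceberg
```

Whether this resolves a BigQuery-managed table depends on how BigQuery lays out the metadata folder, so treat it as an illustration of the mechanism rather than a tested recipe.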
Again, not suggesting this all can't work, but I surely haven't seen anyone do it yet. I'd look for that BQ doc page to show an example of how they imagine Trino would connect to one of their Iceberg tables.
In addition to chasing Google on this, there are Slack servers for Trino and for Iceberg where you might find someone else who has attempted this. Sorry I don't have any real suggestions to offer, just my $0.02's worth. ;)
Did you create a REST service that will serve as the catalog?
According to the Trino documentation you've shared, you need to connect to the REST service:
iceberg.rest-catalog.uri: REST server API endpoint URI (required). Example: http://iceberg-with-rest:8181
In your example, you're connecting to the GCS bucket, which isn't a REST service:
iceberg.rest-catalog.uri=https://storage.googleapis/<bucket_name>/thelook_ecommerce/iceberg (is this correct?)
These are projects that provide a REST Catalog service. You'll need to run such a service to connect Trino to your tables on GCS:
Apache Polaris
gravitino.apache
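Once such a service is running, the Trino catalog file could look roughly like the sketch below. Treat everything here as an assumption to check against your setup: the URI is just the example host from the Trino documentation, the warehouse value depends on how the REST service is configured, the key file path is hypothetical, and the GCS file system property names are the ones used by recent Trino releases (older versions use different names).

```properties
# rest.properties in the Trino catalog folder
connector.name=iceberg
iceberg.catalog.type=rest
# Must point at the running REST Catalog service, not at the bucket.
iceberg.rest-catalog.uri=http://iceberg-with-rest:8181
# Assumption: the warehouse the REST service manages (optional property).
iceberg.rest-catalog.warehouse=gs://<bucket_name>/thelook_ecommerce/iceberg
# GCS access for reading the data and metadata files with a service account.
fs.native-gcs.enabled=true
# Hypothetical path where the service account JSON key is mounted in the container.
gcs.json-key-file-path=/etc/trino/gcs-service-account.json
```

The split is the point: iceberg.rest-catalog.security only governs how Trino authenticates to the REST service, while the service account key governs how Trino reads the files in the bucket.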
To my understanding, Google has not yet exposed a REST API endpoint in any of its ways of managing Iceberg tables (see "ICEBERG on BigQuery options").