
pyspark - Simba JDBC Null pointer exception when querying tables via BigQuery Databricks connection - Stack Overflow


I have a federated connection to BigQuery that has GA events tables for each of our projects. I'm trying to query each daily table, which contains about 400,000 rows per day, and load it into another table, but I keep seeing this Simba JDBC exception.

I've even chunked the query (using an offset) to fetch/append 5,000 rows at a time, with a sleep in between, but I still see this error:

SparkException: Job aborted due to stage failure: Task 0 in stage 2947.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2947.0 (TID 15843) (10.21.40.215 executor 20): java.sql.SQLException: [Simba][JDBC](11380) Null pointer exception.
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroStructToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTParser.avroToString(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQHTDataHandler.retrieveData(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.googlebigquery.dataengine.BQResultSet.getData(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.jdbc.common.SForwardResultSet.getData(Unknown Source)
    at bigquery.shaded.simba.googlebigquery.jdbc.common.SForwardResultSet.getString(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13(JdbcUtils.scala:484)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$13$adapted(JdbcUtils.scala:482)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFac...
File <command-6291825545273755>, line 88
 85 df_chunk = df_chunk.withColumn("event_date", lit(event_date))
 87 # Append chunk to Bronze table 
---> 88 df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)
 90 offset += BATCH_SIZE
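For context, the chunked fetch/append loop described above can be sketched as follows. This is a minimal stand-in, not the poster's actual notebook: `copy_in_chunks`, `read_chunk`, `append_chunk`, and `pause_s` are hypothetical names, with only `BATCH_SIZE` and the offset/append/sleep pattern taken from the question.

```python
import time

BATCH_SIZE = 5000  # chunk size described in the question

def copy_in_chunks(read_chunk, append_chunk, total_rows, pause_s=0.0):
    """Page through the source with LIMIT/OFFSET and append each chunk.

    read_chunk(offset, limit) stands in for the federated BigQuery read,
    and append_chunk(rows) stands in for the mergeSchema append write
    shown in the traceback. Both callables are illustrative placeholders.
    """
    offset = 0
    copied = 0
    while offset < total_rows:
        rows = read_chunk(offset, BATCH_SIZE)
        if not rows:
            break  # source exhausted earlier than expected
        append_chunk(rows)
        copied += len(rows)
        offset += BATCH_SIZE
        time.sleep(pause_s)  # the "sleep in between" chunks
    return copied
```

In the notebook itself, the read would be the OFFSET-paginated query against the federated BigQuery table and the append would be the `df_chunk.write.option("mergeSchema", "true").mode("append").saveAsTable(bronze_table)` call from the traceback; note that since the NPE is thrown inside the driver's Avro struct-to-string conversion during the read, shrinking the chunk size or sleeping between chunks would not be expected to avoid it.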
