I have a CSV file in GCS, and there is one huge table in BigQuery called emp_target.
Currently I read this CSV file using Spark like this:
df = spark.read.format("csv").option()...load()
df.createOrReplaceTempView("empTempView")
Now I need to join this view (empTempView) with emp_target using a query like:
query = "select e.empid,e.empname,e.salary,e.department, t.managerID from empTempView e inner join dataset.emp_target t on e.empid=t.empid"
I tried to execute this using two methods:
Method 1:
res_df = spark.sql(query)
Method 1 did not work and gave me an error saying that empTempView does not exist in BigQuery.
Method 2:
res_df = spark.read.format("bigquery").option(..).option("dbtable",query)...
Method 2 gave me the same error.
Note: I do not have the option to write the temp view into BigQuery and do the join there, and I cannot load emp_target into a Spark DataFrame since it is huge.
How can I join these two datasets in Spark and process the result on Dataproc?
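
For reference, here is a minimal sketch of the direction I am considering; it is not a confirmed solution. It assumes the spark-bigquery-connector is available on the Dataproc cluster. Spark DataFrames are lazy and distributed, so referencing emp_target through the connector should not pull the whole table into the driver, and selecting only the needed columns lets the connector read just those columns. The join itself would still run in Spark, not in BigQuery. The function name join_csv_with_bigquery and the view name empTargetView are hypothetical names introduced only for this illustration.

```python
# Join query that refers only to Spark temp views, so Spark SQL can resolve
# both sides. "empTargetView" is a hypothetical view name for this sketch.
JOIN_QUERY = (
    "select e.empid, e.empname, e.salary, e.department, t.managerID "
    "from empTempView e "
    "inner join empTargetView t on e.empid = t.empid"
)


def join_csv_with_bigquery(spark, csv_df, table="dataset.emp_target"):
    """Join a CSV-backed DataFrame with a BigQuery table inside Spark.

    Assumes `spark` is a SparkSession with the spark-bigquery-connector
    on its classpath and `csv_df` is the DataFrame read from the CSV.
    """
    # Lazily reference the BigQuery table; nothing is read until an action
    # (show, write, count, ...) runs on the result.
    target_df = (
        spark.read.format("bigquery")
        .load(table)
        # Keep only the columns the join needs.
        .select("empid", "managerID")
    )

    # Register both sides as temp views in the same Spark session.
    csv_df.createOrReplaceTempView("empTempView")
    target_df.createOrReplaceTempView("empTargetView")

    return spark.sql(JOIN_QUERY)
```

It would be called as res_df = join_csv_with_bigquery(spark, df), where df is the DataFrame read from the CSV above. Whether this performs acceptably for a table of this size is exactly what I am unsure about.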