c# - Databricks with temp view with syntax error [PARSE_SYNTAX_ERROR]

I am using .NET C# with the SparkSQL ODBC driver to run a query against Databricks. Before adding a query to my code, I first run it in a SQL notebook in the Databricks portal. There I create a TEMP VIEW and then use it in a subsequent SELECT, and it works great.

In the notebook it looks like so:

CREATE OR REPLACE TEMP VIEW budget AS
SELECT 1 as ID, 2025 as OPYEAR, 1 as OPMONTH, 13.2 as BGQTY
UNION ALL
SELECT 2, 2025, 2, 97.1
UNION ALL
SELECT 3, 2025, 3, 105.8;

SELECT
    SUM(if(date_format(purchdate, "yyyyMMdd")='20250313',budget.BGQTY,0)) as daySum
FROM
    CoreData
JOIN budget on budget.OPYEAR= cast(date_format(purchdate, "yyyy") as int) 
          and budget.OPMONTH= cast(date_format(purchdate, "MM") as int) 
WHERE
         location = 'HDQ';

Having verified the query in a notebook, I then add it to my C# code. I use the SparkSQL ODBC driver to get my data, and I have confirmed that my connection works and that all other standard queries run fine. With this query, however, I get this error:

Driver={Simba Spark ODBC Driver};Server=xxxxxxxxx;
Exception thrown: 'System.Data.Odbc.OdbcException' in System.Data.dll
ERROR [42601] [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. 
Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'SELECT': extra input 'SELECT'. SQLSTATE: 42601 (line xx, pos 0)

The line reported in the error message is the line where the second SELECT statement (the one after the first semicolon) begins.

So I am unsure why it works in the notebook but not in my .NET code.


asked Mar 18 at 19:18 by sinDizzy

It looks like your query returns 2 data sets (2 different selects). If that is the expected result, the C# code may need to be adjusted to accept both, using a different call/return method that accepts 2 data sets or an array of data sets. – Brad, Mar 18 at 19:21
It does not. The first SELECT creates a table on the fly, which the second SELECT then joins to. That is, if I put only the first SELECT in the notebook, nothing gets "returned". You would need something like a SELECT * FROM budget; to get a result back. – sinDizzy, Mar 18 at 20:28


1 Answer


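OK, I found a solution to my problem. As far as I can tell, the notebook runs each semicolon-separated statement on its own, whereas over ODBC the whole command text reaches the server as a single statement, so the second SELECT is reported as extra input. The key is to connect to Databricks via ODBC under ONE session (so the temp view stays visible), call ExecuteNonQuery twice (once for the drop, once for the create), and then run the final query on that same connection: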

using (OdbcConnection cn = new OdbcConnection(databricksConnStr))
{
    cn.ConnectionTimeout = 15 * 60; //15 minutes

    Debug.Print("Attempting to connect to Databricks...");
    cn.Open();
    Debug.Print("Connection to Databricks was successful.");

    // DATABRICKS STEP 1: drop the temp table if it exists
    OdbcCommand cmdDrop = new OdbcCommand("DROP TABLE IF EXISTS budget;", cn);
    cmdDrop.CommandTimeout = 3 * 60; //3 minutes
    int dropNum = cmdDrop.ExecuteNonQuery();
    if (dropNum != -1) throw new Exception("Databricks temp table was NOT dropped.");

    // DATABRICKS STEP 2: now create the temp table from the data we got from secondary source.
    // This is the first part in my example from CREATE to the first semi-colon.
    OdbcCommand cmdCreate = new OdbcCommand(budgetTableSql, cn);
    cmdCreate.CommandTimeout = 3 * 60; //3 minutes
    int createNum = cmdCreate.ExecuteNonQuery();
    if (createNum != -1) throw new Exception("Databricks temp table was NOT created.");

    // DATABRICKS STEP 3: at this point we have the temp table in the same session. So we can then use
    // our main query which uses the budget temp table to compute the final data. This is the second part
    // in my example from SELECT SUM to the second semi-colon.
    DataTable dtFinal = new DataTable();
    OdbcCommand cmdFinal = new OdbcCommand(sql, cn);
    cmdFinal.CommandTimeout = 10 * 60; //10 minutes
    OdbcDataAdapter daFinal = new OdbcDataAdapter(cmdFinal);
    daFinal.Fill(dtFinal);
    Debug.Print("row count=" + dtFinal.Rows.Count);
}
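
The same idea can be generalized to a script with any number of statements. The following is a minimal sketch and not part of the original solution: the names `SqlScript`, `SplitStatements`, and `ExecuteScript` are hypothetical, and the splitter is deliberately naive. It assumes that semicolons only need to be ignored inside `'...'`, `"..."`, or backtick-quoted text, and it does not understand SQL comments or backslash-escaped quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Odbc;
using System.Text;

public static class SqlScript
{
    // Hypothetical helper: splits a script on semicolons that are not inside
    // a quoted literal ('...' or "...") or a backtick-quoted identifier.
    // Assumption: no SQL comments and no backslash-escaped quotes in the script.
    public static List<string> SplitStatements(string script)
    {
        var statements = new List<string>();
        var current = new StringBuilder();
        char quote = '\0'; // '\0' means "not inside a quoted region"

        foreach (char c in script)
        {
            if (quote != '\0')
            {
                if (c == quote) quote = '\0'; // closing quote
                current.Append(c);
            }
            else if (c == '\'' || c == '"' || c == '`')
            {
                quote = c;                    // opening quote
                current.Append(c);
            }
            else if (c == ';')
            {
                AddIfNotEmpty(statements, current);
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        AddIfNotEmpty(statements, current);   // last statement may lack a ';'
        return statements;
    }

    private static void AddIfNotEmpty(List<string> statements, StringBuilder sb)
    {
        string text = sb.ToString().Trim();
        if (text.Length > 0) statements.Add(text);
    }

    // Runs every statement except the last with ExecuteNonQuery, then returns
    // a reader for the last one. All commands share the same open connection,
    // so a temp view created earlier is still visible to the final query.
    public static OdbcDataReader ExecuteScript(OdbcConnection cn, string script)
    {
        List<string> statements = SplitStatements(script);
        if (statements.Count == 0)
            throw new ArgumentException("Script contains no statements.", nameof(script));

        for (int i = 0; i < statements.Count - 1; i++)
        {
            using (var cmd = new OdbcCommand(statements[i], cn))
            {
                cmd.ExecuteNonQuery();
            }
        }

        var last = new OdbcCommand(statements[statements.Count - 1], cn);
        return last.ExecuteReader();
    }
}
```

With this in place, the notebook text could be passed unchanged, for example `using (var reader = SqlScript.ExecuteScript(cn, notebookSql)) { ... }`, where `notebookSql` is assumed to hold the CREATE and SELECT statements from the question. Keeping the split on the client side is a design choice that mirrors what the notebook appears to do, rather than relying on the driver to accept multiple statements.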
