I'm developing a web application that interacts with Databricks using the Databricks SQL connector. The application allows users from different teams to add/rename/delete columns of existing tables. The column information is received as HTTP parameters. Snyk started raising SQL injection vulnerability since in the initial version we were passing parameters to the query through format string style. To resolve the issue, we tried using IDENTIFIER() on the schema/table name, but the same does not work on column names under ALTER, UPDATE command. You can refer to from this blog post
Here's the relevant part of my code:
def add_config(self, user_name, action, schema_name, table_name, field_names, field_datatypes):
alter_query = "ALTER TABLE IDENTIFIER(:sch || '.' || :tab) ADD COLUMNS ({})".format(columns_str)
cursor.execute(alter_query, {'sch': schema_name,'tab': table_name})
self.connmit()
I've also implemented a basic input check for safe identifiers:
def is_safe_identifier(self, identifier):
return re.match(r'^[a-zA-Z0-9_]+$', identifier) is not None
This is to ensure input Sanitisation for the http parameters.
How can I modify my code to achieve resolve the SQL Injection while also getting the feature being implemented (add columns dynamically)? Not able to think of any other method than format string.
I'm developing a web application that interacts with Databricks using the Databricks SQL connector. The application allows users from different teams to add/rename/delete columns of existing tables. The column information is received as HTTP parameters. Snyk started raising SQL injection vulnerability since in the initial version we were passing parameters to the query through format string style. To resolve the issue, we tried using IDENTIFIER() on the schema/table name, but the same does not work on column names under ALTER, UPDATE command. You can refer to from this blog post https://community.databricks/t5/technical-blog/how-not-to-build-an-execute-immediate-demo/ba-p/82167
Here's the relevant part of my code:
def add_config(self, user_name, action, schema_name, table_name, field_names, field_datatypes):
alter_query = "ALTER TABLE IDENTIFIER(:sch || '.' || :tab) ADD COLUMNS ({})".format(columns_str)
cursor.execute(alter_query, {'sch': schema_name,'tab': table_name})
self.connmit()
I've also implemented a basic input check for safe identifiers:
def is_safe_identifier(self, identifier):
return re.match(r'^[a-zA-Z0-9_]+$', identifier) is not None
This is to ensure input Sanitisation for the http parameters.
How can I modify my code to achieve resolve the SQL Injection while also getting the feature being implemented (add columns dynamically)? Not able to think of any other method than format string.
Share Improve this question asked Mar 17 at 8:26 Vinay YogeeshVinay Yogeesh 111 bronze badge2 Answers
Reset to default 0Your code is not vulnerable to sql-injection attacks if you only take in strings matching whitelisted characters '^[a-zA-Z0-9_]+$'
. This is also a very reasonable restriction on column names (though I would be a bit stricter and say '^[a-zA-Z][a-zA-Z0-9_]*$'
). Only 'attack' might be to add a column that already exists, causing the statement to fail.
So the question is how do we
(1) ensure the dynamic string in the query actually fulfills this criterion in the future, too
(2) make snyk not report this false positive
You can tell snyk to ignore a certain problem, but that is a last resort: https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/ignore-issues
Now I would have hoped that snyk provides some way to mark a value as 'safe' in some way for the taint-analysis, but I found none. That would be preferable if it gets added. But as it is, I would tell snyk to ignore the issue. Be sure to link the article you posted here as a comment in the code and in the form of the web-UI of snyk. Future developers will be worried by a string-concatenation of SQL and will look for a better alternative using normal sql-injection-prevention methods - a blog-post explaining this does not work in this specific case will save them time and also help to uphold standards for the rest of the codebase.
I would also recommend that, instead of having a 'check'-function returning boolean, have a validation-function that returns its argument on successful validation. You can then use the returned result in the string-concatenation. This make your code safer with respect to future refactorings.
These elements (columns) cannot be bound into JDBC which is why this mechanism will not support them as parameterized. There are two options to do this safely - ideally you should use both:
Validate the columns in these via positive / whitelist validation. Each column name should be checked for existence in the associated tables or against a positive list of characters (which you have done)
You should enquote each column name - adding single quotes around the columns. If you do this, you need to be careful to validate there are no quotes in the name, and error out or escape any quotes. You also need to be aware that adding quotes may make the name case sensitive in many databases.
The second is important because characters can sometimes be used to end a column and add a SQL injection. I believe the current characters are safe but you want to future proof this against someone adding to the list.
Within your function is would be better (as @juliane mentions) to return the value in your validation function. That will allow you to mark the return value as "sanitized" for SQL injection purposes in many code checking tools. Snyk seems to allow you do this with custom santizers but I couldn't track down a lot of documentation on how to do this. The benifit here is that everywhere you use this validation function would then be automatically recognized by Snyk.