We are connecting Spark to MSSQL over JDBC and want the connection to authenticate with Kerberos. Using the keytab and principal we created, a simple Java test project connects successfully. When we use the same keytab and principal from Spark, the connection fails.
What is causing this, and how can we resolve it?
Below are the error we received, the spark-submit command I used, the jaas.conf file, and the relevant parts of the PySpark project.
ERROR:
Caused by: java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at com.microsoft.sqlserver.jdbc.KerbAuthentication.getClientCredential(KerbAuthentication.java:179)
    at com.microsoft.sqlserver.jdbc.KerbAuthentication.initAuthInit(KerbAuthentication.java:139)
    ... 47 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
jaas.conf:
SQLJDBCDriver {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/home/spark/apacheservacc.keytab"
    storeKey=true
    useTicketCache=false
    principal="apacheservacc/xxx:1433@xxx"
    debug=true;
};
spark-submit:
spark-submit \
    --driver-class-path /home/spark/mssql-jdbc-12.8.1.jre8.jar \
    --jars /home/spark/mssql-jdbc-12.8.1.jre8.jar \
    --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/etc/jaas.conf" \
    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/etc/jaas.conf" \
    spark_kerberos_v2.py
pyspark:
from pyspark.sql import SparkSession


def create_spark_session():
    # Build a YARN client-mode session with the MSSQL JDBC driver on the classpath.
    spark = SparkSession.builder \
        .appName("MSSQL Kerberos Connection") \
        .master("yarn") \
        .config("spark.submit.deployMode", "client") \
        .config("spark.executor.instances", 4) \
        .config("spark.default.parallelism", 8) \
        .config("spark.jars", "/home/spark/mssql-jdbc-12.8.1.jre8.jar") \
        .config("spark.executor.extraJavaOptions",
                "-Djava.security.auth.login.config=/etc/jaas.conf "
                "-Djava.security.krb5.conf=/etc/krb5.conf "
                "-Dsun.security.krb5.debug=true") \
        .config("spark.driver.extraJavaOptions",
                "-Djava.security.auth.login.config=/etc/jaas.conf "
                "-Djava.security.krb5.conf=/etc/krb5.conf "
                "-Dsun.security.krb5.debug=true") \
        .getOrCreate()
    return spark


def connect_mssql_with_spark(spark, server, database, keytab_path, principal):
    # JDBC URL requesting integrated security through the driver's JavaKerberos scheme
    jdbc_url = (
        f"jdbc:sqlserver://{server}:1433;databaseName={database};"
        "encrypt=false;integratedSecurity=true;authenticationScheme=JavaKerberos"
    )
    # Connection properties
    connection_properties = {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "keytab": "/home/spark/apacheservacc.keytab",
        "principal": "apacheservacc/xxx:1433@xxx"
    }
    print(jdbc_url)
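
The excerpt above stops before the actual read. For context, here is a minimal sketch of how the URL and the connection properties would typically be handed to Spark. The helper names `build_jdbc_url`, `build_connection_properties`, and `read_table`, as well as any table name passed in, are hypothetical and only for illustration; they are not taken from our project. The only Spark call used is `DataFrameReader.jdbc`.

```python
def build_jdbc_url(server, database, port=1433):
    """Build the same JavaKerberos JDBC URL as in connect_mssql_with_spark."""
    return (
        f"jdbc:sqlserver://{server}:{port};"
        f"databaseName={database};"
        "encrypt=false;"
        "integratedSecurity=true;"
        "authenticationScheme=JavaKerberos"
    )


def build_connection_properties(keytab_path, principal):
    """Collect the driver class, keytab path, and principal in one dict."""
    return {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "keytab": keytab_path,
        "principal": principal,
    }


def read_table(spark, jdbc_url, table, properties):
    """Read one table over JDBC (hypothetical helper, not from the project)."""
    # DataFrameReader.jdbc takes the URL, the table name, and a dict of
    # connection properties.
    return spark.read.jdbc(url=jdbc_url, table=table, properties=properties)
```

With a session from `create_spark_session()`, a call such as `read_table(spark, build_jdbc_url(server, database), table, props)` is the step that opens the JDBC connection, so an authentication failure like the GSSException above would be expected to show up there rather than while the session is being built.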