apache spark - Databricks with kerberos client

I am trying to install krb5-user and sssd-krb5 via an init script on Databricks all-purpose compute running the 15.4 LTS runtime (includes Apache Spark 3.5.0, Scala 2.12).

The cluster fails to start, and this is the error I get:

Spark startup failure: Spark was not able to start in time. This issue can be caused by a malfunctioning Hive metastore, invalid Spark configurations, or malfunctioning init scripts. Please refer to the Spark driver logs to troubleshoot this issue, and contact Databricks if the problem persists.

Internal error message: Spark failed to start: INTERNAL_ERROR: Starting worker failed. Failed to run start slave command in container. command: bash ${DB_HOME:-/home/ubuntu/databricks}/spark/scripts/start_spark_slave.sh
10.139.64.11 7077 10.139.64.10 40000 4 stdout: stderr: lxc-attach: 0117-102328-c55l7skk_1e5bc43bbceb40c98680e6fccbe6f304: attach.c: get_attach_context: 405 Connection refused - Failed to get init pid lxc-attach: 0117-102328-c55l7skk_1e5bc43bbceb40c98680e6fccbe6f304: attach.c: lxc_attach: 1469 Connection refused - Failed to get attach context

The Spark driver logs do not show anything meaningful:

appcds_setup elapsed time: 0.000
SLF4J: Failed to load class ".slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See .html#StaticLoggerBinder for further details.
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
chown: invalid group: ‘:spark-users’
Fri Jan 31 18:34:47 2025 Connection to spark from PID  1376
Fri Jan 31 18:34:47 2025 Initialized gateway on port 44977
Fri Jan 31 18:34:47 2025 Connected to spark.

The init script logs do not contain any errors, but the installation of the Kerberos libraries appears to simply stop. Here are the last few lines:

Setting up systemd (256.5-2ubuntu3) ...
Installing new version of config file /etc/systemd/journald.conf ...
Installing new version of config file /etc/systemd/logind.conf ...
Installing new version of config file /etc/systemd/networkd.conf ...
Installing new version of config file /etc/systemd/pstore.conf ...
Installing new version of config file /etc/systemd/sleep.conf ...
Installing new version of config file /etc/systemd/system.conf ...
Installing new version of config file /etc/systemd/user.conf ...
/usr/lib/tmpfiles.d/legacy.conf:13: Duplicate line for path "/run/lock", ignoring.
Created symlink '/run/systemd/system/tmp.mount' → '/dev/null'.
/usr/lib/tmpfiles.d/legacy.conf:13: Duplicate line for path "/run/lock", ignoring.
Removing obsolete conffile /etc/systemd/resolved.conf ...

And here is my init script:

#!/bin/bash

export DEBIAN_FRONTEND=noninteractive

echo "deb  oracular main universe" | sudo tee -a /etc/apt/sources.list
sudo apt-get update

sudo apt-get -y install krb5-user
sudo apt-get -y install sssd-krb5
cp /Volumes/main/default/configuration_volume/jaas.config .
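
The last init script log lines show systemd 256.5-2ubuntu3 being configured, which suggests that the added oracular source pulled in a systemd upgrade along with the Kerberos packages. One way to check this is to compare the suite being added with the release the image is actually running. The following is a minimal sketch, not a confirmed fix: `release_codename` and `suite_matches_release` are hypothetical helper names, and it assumes `/etc/os-release` defines `VERSION_CODENAME`.

```shell
#!/bin/bash
# Hypothetical helper: print VERSION_CODENAME from an os-release style file.
release_codename() {
  local file="${1:-/etc/os-release}"
  # Source in a subshell so the file's variables do not leak into this shell.
  ( . "$file" && printf '%s\n' "${VERSION_CODENAME:-}" )
}

# Hypothetical helper: succeed only if the suite matches the running release.
suite_matches_release() {
  local suite="$1" file="${2:-/etc/os-release}"
  [ "$suite" = "$(release_codename "$file")" ]
}

# Example: report whether the "oracular" suite belongs to the running release.
if suite_matches_release "oracular"; then
  echo "suite matches the running release"
else
  echo "suite differs from the running release ($(release_codename)); not adding it"
fi
```

If the suites differ, installing krb5-user from the sources already configured on the image, without appending a new line to /etc/apt/sources.list, may avoid replacing core packages such as systemd.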

Is there a way to install the Kerberos client libraries, or is there another option for running Spark streaming jobs that consume data from a Kerberized Kafka cluster?
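
For context on the other options: the Kafka Java client performs Kerberos (GSSAPI) login through the JVM's own `Krb5LoginModule`, driven by a JAAS configuration, a keytab, and a krb5.conf, so keytab-based authentication may not need the native krb5-user tools at all. The following is a minimal sketch of what a JAAS file such as the jaas.config copied above typically contains. `render_jaas_config` is a hypothetical helper, and the principal and keytab path are placeholders, not values from this setup.

```shell
#!/bin/bash
# Hypothetical helper: print a JAAS "KafkaClient" section for keytab-based
# Kerberos login. Both arguments are placeholders supplied by the caller.
render_jaas_config() {
  local principal="$1" keytab="$2"
  cat <<EOF
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="${keytab}"
  principal="${principal}";
};
EOF
}

# Example with placeholder values (assumptions, not taken from the post).
render_jaas_config "svc-user@EXAMPLE.COM" "/tmp/svc-user.keytab"
```

The JVM is usually pointed at such a file with `-Djava.security.auth.login.config=<path>`, and the Kafka client settings `security.protocol`, `sasl.mechanism=GSSAPI`, and `sasl.kerberos.service.name` are passed to the Spark Kafka source with a `kafka.` prefix.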

Thanks.
