
bash - Spark-submit warns about non-Spark properties from an ini file but works when passed directly - Stack Overflow


I have an ini file that sets some environment-specific properties.

cmds.ini
[DEV]
spark_submit=spark3-submit

[PROD]
spark_submit=spark3-submit

I am parsing this file in a shell script and defining a spark-submit() function that replaces the original spark-submit command (the intermediate output of the parsing pipeline is shown right after the script):

source_cmds.sh
#!/bin/bash
env=${1^^}
eval $(
  awk -v section="[$env]" '
    $0 == section {found=1; next}
    /^\[/{found=0}
    found && /^[^#;]/ {
      gsub(/^[ \t]+|[ \t]+$/, "")
      print
    }
  ' /path/to/cmds.ini |
  sed 's/ *= */=/g'
)

spark-submit(){
  $spark_submit "$@"
}
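
For reference, here is what the parsing stage produces for DEV before eval runs it (a hedged illustration, not part of the original post; only the pipeline from source_cmds.sh is repeated):

awk -v section="[DEV]" '
  $0 == section {found=1; next}
  /^\[/{found=0}
  found && /^[^#;]/ {
    gsub(/^[ \t]+|[ \t]+$/, "")
    print
  }
' /path/to/cmds.ini | sed 's/ *= */=/g'
# output:
# spark_submit=spark3-submit
# eval then executes that line as an ordinary shell assignment,
# so $spark_submit expands to spark3-submit afterwards.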

Here's how I source and use this script in another wrapper script (wrapper.sh):

wrapper.sh
#!/bin/bash
source /path/to/source_cmds.sh DEV

spark-submit /path/to/pyspark_script.py

I wanted to include additional environment-specific properties in cmds.ini and updated it as follows:

[DEV]
spark_submit=spark3-submit
conf_args="--conf 'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/' --conf 'spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path'"

[PROD]
spark_submit=spark3-submit
conf_args=

I also modified source_cmds.sh to pass the conf_args to the spark-submit function:


spark-submit(){
  $spark_submit $conf_args "$@"
}

Now, when I run wrapper.sh, Spark shows the following warnings:

Warning: Ignoring non-Spark config property: 'spark.driver.extraJavaOptions

Warning: Ignoring non-Spark config property: 'spark.executor.extraJavaOptions

However, running the same properties directly via the spark-submit command works without any issues:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/' \
--conf 'spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path' \
/path/to/pyspark_script.py

Questions:

  1. Why does Spark treat the properties as "non-Spark" when they are read from cmds.ini but work fine when passed directly?
  2. Do I need to change the way conf_args is defined in cmds.ini?
  3. Are there any changes needed in my spark-submit function implementation to properly handle such arguments?


1 Answer


It turns out the issue lies in the way I store conf_args. Because the whole value is kept as one string, the embedded single quotes are not interpreted as shell quoting when $conf_args is expanded inside the function; they are passed to spark-submit as literal characters, so Spark sees 'spark.driver.extraJavaOptions (with a leading quote) as the property name and ignores it.
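
A quick way to see this (a minimal sketch with a shortened value, not from the original post) is to print every word the unquoted expansion produces:

conf_args="--conf 'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/'"
# printf prints one line per argument after word splitting;
# the single quotes survive as literal characters instead of being removed.
printf '<%s>\n' $conf_args
# <--conf>
# <'spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/'>

When the same command is typed directly on the command line, the shell itself strips those quotes before spark-submit ever sees them, which is why the direct invocation works.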

Fix:

Store conf_args as an array and expand it with "${conf_args[@]}" as shown below. That way every array element is passed to spark-submit as a separate, properly quoted argument.

cmds.ini

[DEV]
spark_submit=spark3-submit
conf_args=(--conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/tmp/path/" --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/tmp/path")

[PROD]
spark_submit=spark3-submit
conf_args=()

source_cmds.sh

spark-submit(){
  $spark_submit "${conf_args[@]}" "$@"
}

With this, Spark receives each --conf option as a separate argument with the property name intact, and the warnings disappear.
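
If you want to confirm that the array assignment survives the eval in source_cmds.sh, a minimal check (using the same placeholder paths as above) is:

source /path/to/source_cmds.sh DEV
# declare -p shows the variable exactly as the shell stores it;
# for DEV it should report an indexed array with four elements and no stray quotes.
declare -p conf_args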
