I installed Apache Spark to the best of my knowledge; however, it does not work :-( To test my installation, I use the following Python script:
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder \
    .appName("SimpleApp") \
    .getOrCreate()

# Sample data
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]

# Create a DataFrame from the data
df = spark.createDataFrame(data, ["Name", "Age"])

# Show the DataFrame content
df.show()

# Stop the Spark session
spark.stop()
The pyspark interpreter starts without problems; however, when it comes to df.show() I see the following exception:
>>> df = spark.createDataFrame(data, ["Name", "Age"])
>>> df.show()
Exception ignored in: <_io.BufferedRWPair object at 0x000001C42C971140>+ 1) / 1]
25/02/05 10:15:06 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
25/02/05 10:15:06 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
File "<python-input-4>", line 1, in <module>
df.show()
~~~~~~~^^
File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\pyspark\sql\dataframe.py", line 947, in show
print(self._show_string(n, truncate, vertical))
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\pyspark\sql\dataframe.py", line 965, in _show_string
return self._jdf.showString(n, 20, vertical)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\java_gateway.py", line 1322, in __call__
return_value = get_return_value(
answer, self.gateway_client, self.target_id, self.name)
File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\pyspark\errors\exceptions\captured.py", line 179, in deco
return f(*a, **kw)
File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
"An error occurred while calling {0}{1}{2}.\n".
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o56.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (MDXN01072079.mshome executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
To describe my environment, I created the following script:
#!/usr/bin/bash
which python
python --version
echo $PYSPARK_PYTHON
echo $SPARK_HOME
which spark-shell
echo $HADOOP_HOME
which winutils
....
The result of the script is:
/c/Python313/python
Python 3.13.1
C:\Python313\python.exe
D:\Tools2\spark-3.5.4-bin-hadoop3
/c/Python313/Scripts/spark-shell
D:\Tools2\hadoop
/d/Tools2/hadoop/bin/winutils
And $SPARK_HOME/bin is in the path :-)
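For completeness, this is how I double-check from inside the pyspark shell which interpreter is in play; it only reads the standard PYSPARK_* variables and the driver's own executable path:

import os, sys

# Interpreter running the driver (the pyspark shell itself)
print("driver executable:", sys.executable)

# Environment variables PySpark consults when launching worker processes
print("PYSPARK_PYTHON       :", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON:", os.environ.get("PYSPARK_DRIVER_PYTHON"))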
1 Answer
The code is fine. I suspect the issue is an incompatibility between the Spark version you are using (3.5.4) and Python 3.13. Please downgrade to Python 3.12 and see if that helps.
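If you install Python 3.12 alongside 3.13, you can point both the driver and the worker processes at it before building the session. A minimal sketch; the path C:\Python312\python.exe is an example, adjust it to wherever your 3.12 interpreter lives:

import os

# Tell PySpark which interpreter to launch for worker processes
# and which one runs the driver (example path; adjust to your install).
os.environ["PYSPARK_PYTHON"] = r"C:\Python312\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\Python312\python.exe"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
spark.createDataFrame([("Alice", 25), ("Bob", 30)], ["Name", "Age"]).show()
spark.stop()

Setting PYSPARK_PYTHON permanently (e.g. as a Windows environment variable) has the same effect and avoids doing it in code.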