
Apache Spark and pyspark not working, simple python script causes exception - Stack Overflow


I installed Apache Spark to the best of my knowledge; however, it does not work :-( To test my installation, I use the following python script:

    from pyspark.sql import SparkSession
    
    # Create a Spark session
    spark = SparkSession.builder \
        .appName("SimpleApp") \
        .getOrCreate()
    
    # Sample data
    data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
    
    # Create a DataFrame from the data
    df = spark.createDataFrame(data, ["Name", "Age"])
    
    # Show the DataFrame content
    df.show()
    
    # Stop the Spark session
    spark.stop()

The pyspark interpreter starts without problems; however, when it comes to df.show() I see the following exception:

>>> df = spark.createDataFrame(data, ["Name", "Age"])
>>> df.show()

Exception ignored in: <_io.BufferedRWPair object at 0x000001C42C971140>+ 1) / 1]

25/02/05 10:15:06 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
25/02/05 10:15:06 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    df.show()
    ~~~~~~~^^
  File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\pyspark\sql\dataframe.py", line 947, in show
    print(self._show_string(n, truncate, vertical))
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\pyspark\sql\dataframe.py", line 965, in _show_string
    return self._jdf.showString(n, 20, vertical)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
        answer, self.gateway_client, self.target_id, self.name)
  File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\pyspark\errors\exceptions\captured.py", line 179, in deco
    return f(*a, **kw)
  File "D:\Tools2\spark-3.5.4-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
        "An error occurred while calling {0}{1}{2}.\n".
        format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o56.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (MDXN01072079.mshome executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
        at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
        at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)

To describe my environment, I created the following script:

    #!/usr/bin/bash
    which python
    
    python --version
    
    echo $PYSPARK_PYTHON
    
    echo $SPARK_HOME
    
    which spark-shell
    
    echo $HADOOP_HOME
    
    which winutils
....

The result of the script is:

    /c/Python313/python
    
    Python 3.13.1
    
    C:\Python313\python.exe
    
    D:\Tools2\spark-3.5.4-bin-hadoop3
    
    /c/Python313/Scripts/spark-shell
    
    D:\Tools2\hadoop
    
    /d/Tools2/hadoop/bin/winutils

And $SPARK_HOME/bin is in the path :-)
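
For reference, the same environment information can also be checked from inside the Python session itself. A minimal sketch (driver-side only; the worker processes use whatever interpreter PYSPARK_PYTHON points to, falling back to the driver's own python if it is unset):

    import os
    import sys

    import pyspark

    # Driver-side view of the setup. The executors launch the interpreter
    # named by PYSPARK_PYTHON; if it is unset they fall back to the driver's
    # own python, so the two should agree.
    print("driver python :", sys.version)
    print("pyspark       :", pyspark.__version__)
    print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON", "<not set>"))
    print("SPARK_HOME    :", os.environ.get("SPARK_HOME", "<not set>"))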



1 Answer


The code itself is fine. I suspect the issue is an incompatibility between the Spark version you are using and your Python version (3.13). Please downgrade to Python 3.12 and see if that helps.
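
If a Python 3.12 interpreter is installed alongside 3.13, one way to try this without changing the system default is to point the worker processes at it before the session is created. A minimal sketch, assuming 3.12 lives at C:\Python312\python.exe (adjust to your install); ideally run the script itself with Python 3.12 as well so driver and workers stay on the same version:

    import os

    # Assumed location of a Python 3.12 install; adjust to your machine.
    os.environ["PYSPARK_PYTHON"] = r"C:\Python312\python.exe"

    from pyspark.sql import SparkSession

    # PYSPARK_PYTHON must be set before the session is created, because the
    # executors inherit it when they spawn their Python workers.
    spark = SparkSession.builder.appName("SimpleApp").getOrCreate()

    df = spark.createDataFrame([("Alice", 25), ("Bob", 30)], ["Name", "Age"])
    df.show()
    spark.stop()

Setting PYSPARK_PYTHON (and PYSPARK_DRIVER_PYTHON) as system environment variables before launching pyspark achieves the same effect without touching the script.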
