
apache spark - why are two jobs created for 1 action in pyspark? - Stack Overflow


Below is the data used in my CSV file:

empid,empname,empsal,empdept,empblock
1,abc,2000,cse,A
2,def,1000,ece,C
3,ghi,8000,eee,D
4,jkl,4000,ece,B
5,mno,3000,itd,F
6,pqr,6000,mec,C

1) Running the statement below creates one job in the Spark UI to determine the column names, even though it is known not to be an action. Attached below is the job created in the Spark UI.

df1 = spark.read.format("csv").option("header", True).load("csv_file_location")
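(For context, that eager job comes from Spark scanning the file to read the header line. A minimal sketch of a way to avoid it, assuming the same placeholder file location as above, is to declare the schema explicitly so load() has nothing to discover:)

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# With an explicit schema, spark.read does not need to touch the file
# to find the column names, so no job is launched at load() time.
schema = StructType([
    StructField("empid", IntegerType()),
    StructField("empname", StringType()),
    StructField("empsal", IntegerType()),
    StructField("empdept", StringType()),
    StructField("empblock", StringType()),
])
df1 = spark.read.format("csv").option("header", True).schema(schema).load("csv_file_location")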

2) Running the statement below does not create any job at this point, since it is a transformation.

from pyspark.sql.functions import avg, col

x = df1.groupBy("empblock").agg(avg("empsal").alias("avgsal")).filter(col("avgsal") > 2000).orderBy("empblock")
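(As a quick sanity check that this chain really is lazy, the query plan can be printed without launching any job; explain() is a standard DataFrame method:)

# Prints the logical/physical plan for x without executing anything,
# so the Jobs tab in the Spark UI stays unchanged.
x.explain()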

3) When I run the statement below, it creates 2 jobs. Isn't one action supposed to create one job? What is the reason for multiple jobs being created? Doesn't the number of jobs depend on the number of actions called?

x.show()
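(One way to observe this from code rather than the UI is to tag the action with a job group and ask the status tracker how many jobs ran under it. A minimal sketch, assuming the same spark session as above; "show-call" is just an arbitrary group id for illustration:)

# Everything triggered between setJobGroup and the action is tagged
# with this group id, so we can count the jobs Spark submitted for it.
spark.sparkContext.setJobGroup("show-call", "jobs triggered by x.show()")
x.show()
tracker = spark.sparkContext.statusTracker()
print(len(tracker.getJobIdsForGroup("show-call")))  # prints 2 here, not 1

(One commonly cited cause, which may or may not apply here, is that show() only needs a handful of rows, so Spark first runs a small job over a few partitions and launches follow-up jobs if those do not return enough rows.)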
