
apache spark - why are two jobs created for 1 action in pyspark? - Stack Overflow


Below is the data used in my CSV file:

empid,empname,empsal,empdept,empblock
1,abc,2000,cse,A
2,def,1000,ece,C
3,ghi,8000,eee,D
4,jkl,4000,ece,B
5,mno,3000,itd,F
6,pqr,6000,mec,C

1) Running the statement below creates one job in the Spark UI to determine the column names, even though it is known not to be an action. Attached below is the job created in the Spark UI.

df1 = spark.read.format("csv").option("header", True).load("csv_file_location")
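(For context, that eager job comes from Spark scanning the file to read the header line. A minimal sketch of a way to avoid it, assuming the same placeholder file location as above, is to declare the schema explicitly so load() has nothing to discover:)

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# With an explicit schema, spark.read does not need to touch the file
# to find the column names, so no job is launched at load() time.
schema = StructType([
    StructField("empid", IntegerType()),
    StructField("empname", StringType()),
    StructField("empsal", IntegerType()),
    StructField("empdept", StringType()),
    StructField("empblock", StringType()),
])
df1 = spark.read.format("csv").option("header", True).schema(schema).load("csv_file_location")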

2) Running the statement below does not create any job at this point, since it is a transformation.

from pyspark.sql.functions import avg, col

x = df1.groupBy("empblock").agg(avg("empsal").alias("avgsal")).filter(col("avgsal") > 2000).orderBy("empblock")
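(As a quick sanity check that this chain really is lazy, the query plan can be printed without launching any job; explain() is a standard DataFrame method:)

# Prints the logical/physical plan for x without executing anything,
# so the Jobs tab in the Spark UI stays unchanged.
x.explain()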

3) When I run the statement below, it creates 2 jobs. Isn't one action supposed to create one job? What is the reason for multiple jobs being created? Doesn't the number of jobs depend on the number of actions called?

x.show()
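(One way to observe this from code rather than the UI is to tag the action with a job group and ask the status tracker how many jobs ran under it. A minimal sketch, assuming the same spark session as above; "show-call" is just an arbitrary group id for illustration:)

# Everything triggered between setJobGroup and the action is tagged
# with this group id, so we can count the jobs Spark submitted for it.
spark.sparkContext.setJobGroup("show-call", "jobs triggered by x.show()")
x.show()
tracker = spark.sparkContext.statusTracker()
print(len(tracker.getJobIdsForGroup("show-call")))  # prints 2 here, not 1

(One commonly cited cause, which may or may not apply here, is that show() only needs a handful of rows, so Spark first runs a small job over a few partitions and launches follow-up jobs if those do not return enough rows.)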
