最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Local Spark and Yarn Spark giving different results - Stack Overflow

programmeradmin1浏览0评论

I have a function called runSparkFlow() which takes SparkSession object and hashmap of Dataset<Row>. This hashmap of datasets is created from a function called getSourceDatasets(). I am running both of these functions in local and yarn and they are reading the same tables or precisely same data.

But there's a difference in the number of output rows in the local output and the yarn output. I looked at both the DAG visualisation but not able to figure out at which point the difference is getting created.

I just want to understand conceptually, what are the possible points from where the difference can occur? Like one of the case is filtering on timestamp column and due to timezone difference this can occur (although this is not the case).

发布评论

评论列表(0)

  1. 暂无评论