I have a Spark job on EMR Serverless that loads shapefile data (geolocation data) from an S3 bucket; all components are deployed in eu-west-1. The job is scheduled to run hourly via Airflow. Lately, many runs fail (while others succeed), even though nothing has changed in the code or in the shapefile data.
I interpret this part as a failure to allocate resources:
25/03/14 12:15:53 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: No executor found for 2a05:d018:11aa:2d01:10cd:9bc3:78cf:e47e:40020
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_6_piece0 on [2a05:d018:11aa:2d01:567f:c521:d8d9:f236]:43269 in memory (size: 5.9 KiB, free: 5.2 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_6_piece0 on [2a05:d018:11aa:2d01:aa36:89ee:797f:553c]:34481 in memory (size: 5.9 KiB, free: 9.0 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_4_piece0 on [2a05:d018:11aa:2d01:567f:c521:d8d9:f236]:43269 in memory (size: 5.9 KiB, free: 5.2 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_4_piece0 on [2a05:d018:11aa:2d01:aa36:89ee:797f:553c]:34481 in memory (size: 5.9 KiB, free: 9.0 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_2_piece0 on [2a05:d018:11aa:2d01:567f:c521:d8d9:f236]:43269 in memory (size: 5.9 KiB, free: 5.2 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_2_piece0 on [2a05:d018:11aa:2d01:aa36:89ee:797f:553c]:34481 in memory (size: 5.9 KiB, free: 9.0 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_3_piece0 on [2a05:d018:11aa:2d01:aa36:89ee:797f:553c]:34481 in memory (size: 17.5 KiB, free: 9.0 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_3_piece0 on [2a05:d018:11aa:2d01:567f:c521:d8d9:f236]:43269 in memory (size: 17.5 KiB, free: 5.2 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_5_piece0 on [2a05:d018:11aa:2d01:567f:c521:d8d9:f236]:43269 in memory (size: 17.5 KiB, free: 5.2 GiB)
25/03/14 12:44:49 INFO BlockManagerInfo: Removed broadcast_5_piece0 on [2a05:d018:11aa:2d01:aa36:89ee:797f:553c]:34481 in memory (size: 17.5 KiB, free: 9.0 GiB)
25/03/14 13:40:01 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 6) ([2a05:d018:11aa:2d01:33f:cc5b:6e5b:9bc5] executor 2): org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on s3a://platform-prod-112233445566-eu-west-1/shapefiles-raw/shapefiles-raw-folder/latest-version/CONTOURS-structure.shp: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com:443 [platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/16.xxx.xx.162, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/52.217.46.88, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/52.217.232.218, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/52.217.200.242, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/16.15.200.252, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/52.217.108.192, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/3.5.9.55, platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com/54.xxx.xxx.106] failed: connect timed out
at org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:392)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3799)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:983)
at com.dummy.shapefiles_reader.shp.ShpFile$.apply(ShpFile.scala:65)
at com.dummy.shapefiles_reader.shp.ShpRDD.compute(ShpRDD.scala:53)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:378)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1508)
What I don't understand is why the error mentions us-east-1, when all my components are in eu-west-1:
Unable to execute HTTP request: Connect to platform-prod-112233445566-eu-west-1.s3.us-east-1.amazonaws.com:443
The complete log is quoted above.
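As a possible workaround, I am thinking of pinning the S3A client to the bucket's region so it cannot fall back to the us-east-1 global endpoint. This is only a sketch using the standard Hadoop S3A properties (fs.s3a.endpoint and fs.s3a.endpoint.region, the latter needing Hadoop 3.3.2+); the SparkSession setup, app name, and timeout/retry values are placeholders, not my actual code:

import org.apache.spark.sql.SparkSession

// Sketch: force the S3A connector onto the regional eu-west-1 endpoint
// instead of letting it resolve the bucket through us-east-1.
// Timeout/retry values below are illustrative, not tuned.
val spark = SparkSession.builder()
  .appName("shapefiles-loader") // placeholder name
  .config("spark.hadoop.fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
  .config("spark.hadoop.fs.s3a.endpoint.region", "eu-west-1")
  .config("spark.hadoop.fs.s3a.connection.timeout", "10000") // ms
  .config("spark.hadoop.fs.s3a.attempts.maximum", "10")
  .getOrCreate()

Would pinning the endpoint like this be the right fix, or is the us-east-1 hostname a symptom of something else on the EMR Serverless side (e.g. DNS or VPC endpoint resolution) that I should address instead?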