
amazon s3 - Back pressure when using Flink FileSink - Stack Overflow


We are using Flink’s FileSink to publish events to S3, with multiple Kafka topics as sources. The stream is experiencing back pressure, which is causing a huge and growing lag in processing the Kafka topics. How should I resolve this back pressure and make the sink fast enough to catch up with the source throughput?

We tried increasing parallelism by adding more KPUs, but that didn’t help: the lag keeps increasing and so does the checkpoint duration. Next, we tried a RollingPolicy that increases the part size so that we make fewer S3 calls, but that didn’t help either. Our thinking was that we were making a lot of S3 calls, that this was causing throttling, and that pushing records was therefore taking a long time. How can I resolve the back pressure so that checkpointing is faster and keeps up with the source throughput?
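
For reference, increasing the part size through a rolling policy (the approach mentioned above) looks roughly like the sketch below. This is only a sketch assuming a row-encoded FileSink on a recent Flink version (1.15+); the bucket name, prefix, and thresholds are made up. Note that for bulk formats such as Parquet only OnCheckpointRollingPolicy is supported, so part size is then governed by the checkpoint interval rather than a rolling policy.

    import java.time.Duration;

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.configuration.MemorySize;
    import org.apache.flink.connector.file.sink.FileSink;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

    public class S3FileSinkSketch {

        // Hypothetical output location -- replace with your own bucket/prefix.
        private static final String OUTPUT_PATH = "s3://my-bucket/events/";

        public static FileSink<String> buildSink() {
            return FileSink
                .forRowFormat(new Path(OUTPUT_PATH), new SimpleStringEncoder<String>("UTF-8"))
                .withRollingPolicy(
                    DefaultRollingPolicy.builder()
                        // Roll a part file once it reaches ~512 MiB ...
                        .withMaxPartSize(MemorySize.ofMebiBytes(512))
                        // ... or after 15 minutes, whichever comes first.
                        .withRolloverInterval(Duration.ofMinutes(15))
                        // Also roll if the bucket has been idle for 5 minutes.
                        .withInactivityInterval(Duration.ofMinutes(5))
                        .build())
                .build();
        }
    }

The sink is then attached with stream.sinkTo(S3FileSinkSketch.buildSink()); fewer, larger part files mean fewer objects committed to S3.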

asked yesterday by Vivek Rajendran (new contributor)
  • When you view your Flink job via the Web UI, where is the back-pressure starting? And do you see a steady increase in the checkpoint size, as well as the duration? – kkrugler Commented yesterday
  • From the web UI, the back pressure shows up at the source level, which I believe is not the actual place. I added some logs within each operator and could see that all the operators are fast; only when records reach the sink does it take a lot of time. So I think S3 is the real bottleneck here, as it fails to keep up with the source throughput. Yes, both the large checkpoint size and the long duration can be seen every time. – Vivek Rajendran Commented yesterday

1 Answer


Although the documentation isn't clear on this point, I believe you can use entropy injection with the S3 file sink, which should help. Have you tried this?
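
For reference, entropy injection for Flink’s S3 filesystems is enabled via the s3.entropy.key and s3.entropy.length options, with the marker placed in the output path. The sketch below shows the idea; the bucket and prefix are made up, and Flink’s documentation describes this mechanism for checkpoint data files, so whether the FileSink’s output paths are rewritten the same way is exactly the point that is unclear.

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.connector.file.sink.FileSink;
    import org.apache.flink.core.fs.Path;

    public class EntropyInjectedSinkSketch {

        /*
         * flink-conf.yaml:
         *
         *   s3.entropy.key: _entropy_
         *   s3.entropy.length: 4
         *
         * Flink replaces the "_entropy_" marker in a path with 4 random
         * characters, spreading objects across S3 key prefixes and thereby
         * reducing per-prefix request throttling.
         */

        // Hypothetical bucket; the "_entropy_" segment must match s3.entropy.key.
        private static final String OUTPUT_PATH = "s3://my-bucket/_entropy_/events/";

        public static FileSink<String> buildSink() {
            return FileSink
                .forRowFormat(new Path(OUTPUT_PATH), new SimpleStringEncoder<String>("UTF-8"))
                .build();
        }
    }

If the marker turns out to be resolved only for checkpoint paths, a fallback is to add your own randomized prefix to the bucket path when building the sink.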
