
Apache Flink job state compatibility after resuming it from savepoint - Stack Overflow


I am experimenting with Apache Flink migration. By migration I mean introducing changes of any kind to an already existing Apache Flink job. I couldn't find relevant info in the official docs, and the info I found in other sources was inconsistent. I have a few concerns so far:

  1. What about increasing the parallelism, e.g. from 8 to 16? Will Flink rebalance the key groups on its own, or do I need to perform any action? I want all the threads to be equally busy. The same question applies to decreasing the parallelism. I read somewhere that it only works out of the box when increasing/decreasing parallelism by a multiple of the current value, e.g. 4 -> 8 rather than 4 -> 9.

  2. Adding/amending a field in the state object. Let's assume we want to add a new field to our state. I found a piece of info saying I can do this using a TypeSerializer attached to the StateDescriptor. Does such a serializer need to be attached to the StateDescriptor in the new version of the job? Or should it be a separate job that runs once, amends the state, and only then do we run the new version of the existing Flink job with the new state?


asked Feb 5 at 13:24 by DamDev

1 Answer

  1. Yes, Flink will rebalance the keys on its own. The maximum parallelism determines the number of key groups that the keys will be hashed into, so you'll want to make sure to set this configuration parameter large enough to avoid data skew. If max parallelism is large enough, there's no reason to think about scaling by integer multiples. (And with the RocksDB state backend, there's no reason not to simply set the max parallelism to 32768.)
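To see why max parallelism matters for skew, here is a simplified sketch of Flink's key-group arithmetic: keys are hashed into key groups, and each key group is mapped to an operator subtask index. The class and method names are my own; `keyGroupFor` also omits the murmur-hash step that Flink's real `KeyGroupRangeAssignment` applies on top of `hashCode()`.

```java
class KeyGroupRescaling {
    // Simplified: hash a key into one of maxParallelism key groups.
    // (Real Flink additionally murmur-hashes key.hashCode().)
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // Map a key group to an operator subtask index at the current parallelism.
    // This mirrors the formula keyGroup * parallelism / maxParallelism.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int kg = keyGroupFor("user-42", maxParallelism);
        // The key group for a given key never changes; only its
        // operator index changes when the job is rescaled 8 -> 16.
        System.out.println("key group:     " + kg);
        System.out.println("index at p=8:  " + operatorIndexFor(kg, maxParallelism, 8));
        System.out.println("index at p=16: " + operatorIndexFor(kg, maxParallelism, 16));
    }
}
```

Because state is redistributed at key-group granularity, a max parallelism much larger than the parallelism gives each subtask many key groups and keeps the distribution even at any parallelism, multiple or not.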

  2. If you are using Flink's POJO serializer, or Avro, and you follow the rules for state evolution, then everything will just work and you don't have to worry about it. Adding a field is an example of a case where state migration will just work. For more details, see https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/.
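As an illustration of the POJO evolution rules, here is a hypothetical state class (the `SessionState` name and its fields are invented for this example): a public no-arg constructor and public fields keep it a valid Flink POJO, and a newly added field is simply filled with its type's default value when restoring from a savepoint written by the old job version.

```java
// Hypothetical state POJO that follows Flink's POJO rules:
// public class, public no-arg constructor, public fields.
public class SessionState {
    public long eventCount;
    public long lastTimestamp;

    // Field added in the new job version. When restoring from a
    // savepoint taken by the old version, the POJO serializer
    // initializes it to the type's default value (null for String).
    public String lastEventType;

    public SessionState() {}
}
```

No one-off migration job is needed for this case: you deploy the new job version with the extended class, resume from the savepoint, and the serializer migrates each state entry lazily as it is read.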
