I am building an application that has to run a job indefinitely, i.e. in a while(true) loop. To increase the job's throughput, the work is split across partitions.
This is comparable to a Kafka consumer, which is also aware of partition rebalancing. We can scale the number of consumer threads up or down based on the expected throughput, and after a rebalance the load is distributed evenly. Even if there is only a single consumer instance/thread, it still consumes from all of the partitions.
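To make the expected behaviour concrete, here is a minimal sketch of the distribution I have in mind. It is not Kafka or Helix code; the class RoundRobinAssigner and its assign method are hypothetical names used only for illustration, and a plain round-robin is assumed rather than any real rebalancing strategy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Kafka or Helix API: spreads partition ids
// round-robin over whichever instances are currently alive.
class RoundRobinAssigner {
    static Map<String, List<Integer>> assign(List<String> liveInstances, int numPartitions) {
        if (liveInstances.isEmpty()) {
            throw new IllegalArgumentException("at least one live instance is required");
        }
        // Keep insertion order so the result is easy to read.
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String instance : liveInstances) {
            assignment.put(instance, new ArrayList<>());
        }
        // Partition p goes to instance (p mod number of live instances).
        for (int p = 0; p < numPartitions; p++) {
            String owner = liveInstances.get(p % liveInstances.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }
}
```

With a single live instance every partition lands on that instance; with more instances the partitions are spread so that the counts differ by at most one.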
I have defined the job as a Helix resource with the OnlineOffline StateModel, with the same number of partitions as the job described above. I have defined a StateModelFactory and a StateModel. In the StateModel implementation, I start and stop execution of the partition.
To ensure controller availability, I have deployed 3 controllers in cluster mode, with the LeaderStandby StateModel.
I might have missed something, or my understanding of Helix may be wrong, but I am running into a problem when running Participants as Spot instances within an ASG. Sometimes, when partitions get re-assigned, StateModelFactory.createNewStateModel is not called on the new participant that the partition was assigned to, and neither is the StateModel method for the OFFLINE -> ONLINE transition.
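Roughly, the start/stop logic behind the two transitions looks like the sketch below. This is a simplified, Helix-free stand-in rather than my actual code: PartitionJobRunner, processOnce and isRunning are hypothetical names, and the transition methods take a plain partition number here, whereas a real Helix StateModel receives a Message and a NotificationContext.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical per-instance registry of running partition loops.
// Method names mirror the OnlineOffline transitions; signatures are simplified.
class PartitionJobRunner {
    // Daemon threads so a forgotten loop never keeps the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final Map<Integer, Future<?>> running = new ConcurrentHashMap<>();

    // OFFLINE -> ONLINE: start the endless loop for this partition (idempotent).
    void onBecomeOnlineFromOffline(int partition) {
        running.computeIfAbsent(partition, p -> pool.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    processOnce(p);
                } catch (InterruptedException e) {
                    // Restore the flag so the while condition ends the loop.
                    Thread.currentThread().interrupt();
                }
            }
        }));
    }

    // ONLINE -> OFFLINE: interrupt the loop so the partition can move elsewhere.
    void onBecomeOfflineFromOnline(int partition) {
        Future<?> task = running.remove(partition);
        if (task != null) {
            task.cancel(true);
        }
    }

    boolean isRunning(int partition) {
        return running.containsKey(partition);
    }

    void shutdown() {
        pool.shutdownNow();
    }

    // Placeholder for one unit of work on the partition (assumption: 10 ms of work).
    private void processOnce(int partition) throws InterruptedException {
        Thread.sleep(10);
    }
}
```

The point of the sketch is that a partition's loop only ever starts from the OFFLINE -> ONLINE callback, so if that callback is never invoked on the new participant, nothing processes the partition.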
I followed this recipe :
Please help.
Any other suggestions for implementing this would also be very welcome. I wanted to avoid implementing partition assignment directly on ZooKeeper, as that would be error-prone and inflexible, and I have limited time for the implementation.