How can I save the application state for a Node.js application that consists mostly of HTTP requests?
I have a script in Node.js that works with a RESTful API to import a large number (10,000+) of products into an e-commerce application. The API limits the number of requests that can be made, and we are starting to brush up against that limit. On a previous run the script exited with an Error: connect ETIMEDOUT, probably due to exceeding the API limit. I would like to be able to try connecting 5 times and, if that fails, resume after an hour when the limit has been restored.
It would also be beneficial to save progress throughout, in case of a crash (power failure, network outage, etc.), and to be able to resume the script from the point where it left off.
I know that Node.js operates as a giant event queue: all HTTP requests and their callbacks get put into that queue (together with some other events). This makes it a prime target for saving the state of the current execution. Another pleasant feature (not strictly necessary for this project) would be the ability to distribute the work among several machines on different networks to increase throughput.
So is there an existing way to do this? A framework, perhaps? Or do I need to implement it myself? In that case, any useful resources on how this can be done would be appreciated.
Asked Jun 16, 2012 at 19:37 by Michael Yagudaev

- What you want is a persistent job queue. There are many of them; one rather good-looking one is Kue (built on Redis). – Dan D., Jun 16, 2012 at 23:32
- I think you are absolutely right. Kue is quite brilliant. So basically, in the producer I would be reading my data file, and in the consumer I could be adding/updating each product. So each product would be a separate job. – Michael Yagudaev, Jun 18, 2012 at 14:49
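The producer/consumer split from the comment above could look roughly like this with Kue. This is a sketch, not a definitive implementation: the job type name 'product', the payload shape, and the upsertProduct helper are all made up for illustration, and it assumes a local Redis server that both the kue library and this script can reach.

```javascript
var kue = require('kue');
var jobs = kue.createQueue(); // assumes Redis running on localhost

// Turn one record from the data file into a job payload.
// (Payload shape is hypothetical; 'title' just makes the job
// readable in Kue's web UI.)
function toJob(product) {
  return { title: 'import ' + product.sku, fields: product };
}

// Producer: read the data file and enqueue one job per product.
function enqueueAll(products) {
  products.forEach(function (product) {
    jobs.create('product', toJob(product))
        .attempts(5) // let Kue retry each job up to 5 times
        .save();
  });
}

// Consumer: process one job at a time; upsertProduct is assumed
// to wrap the add-or-update API call and invoke done(err) when finished.
jobs.process('product', function (job, done) {
  upsertProduct(job.data.fields, done);
});
```

Because jobs live in Redis, the queue survives a crash, and consumers on other machines pointed at the same Redis instance would drain it in parallel.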
1 Answer
I'm not sure what you mean when you say
I know that Node.js operates as a giant event queue: all HTTP requests and their callbacks get put into that queue (together with some other events). This makes it a prime target for saving the state of the current execution.
Please feel free to comment or expound on this if you find it relevant to the answer.
That said, if you're simply looking for a persistence mechanism for this particular task, I might recommend Redis, for a few reasons:
- It allows atomic operations on many data types. For example, if you had an entry in Redis called num_requests_made that represented the number of requests made, you could increment it easily with INCR num_requests_made, and the operation is guaranteed to be atomic, making it easier to scale to multiple workers.
- It has several data types that could prove useful for your needs. For example, a simple string could represent the number of API requests made during a certain period of time (as in the previous bullet point); you might store details of failed API requests that need to be resubmitted in a list; etc.
- It provides pub/sub mechanisms which would allow you to communicate easily between multiple instances of the program.
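The counter idea from the first bullet might be wired up roughly like this with the node_redis client. The key name num_requests_made comes from the answer; the 10,000-per-hour limit is a placeholder for whatever the API actually allows, and the sketch assumes a local Redis server:

```javascript
var redis = require('redis');
var client = redis.createClient(); // assumes Redis on localhost

var HOURLY_LIMIT = 10000; // placeholder: whatever the API allows per hour

// Atomically bump the counter before each API call. Because INCR is
// atomic, several worker processes can share one counter without racing.
function trackRequest(cb) {
  client.incr('num_requests_made', function (err, count) {
    if (err) return cb(err);
    // true means we are still under the limit and may proceed.
    cb(null, count <= HOURLY_LIMIT);
  });
}

// Reset the counter when a new hour starts (setting an EXPIRE on the
// key when it is first created would achieve the same thing).
function resetCounter(cb) {
  client.set('num_requests_made', 0, cb);
}
```

A worker would call trackRequest before each API request and back off until the next reset whenever it gets false.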
If this sounds interesting or useful and you're not already familiar with Redis, I highly recommend trying out the interactive tutorial, which introduces you to a few data types and the commands for them. Another good piece of reading material is A fifteen minute introduction to Redis data types.