
node.js - Thinking in JavaScript promises (Bluebird in this case) - Stack Overflow


I'm trying to get my head around some not quite so trivial promise/asynchronous use-cases. In an example I'm wrestling with at the moment, I have an array of books returned from a knex query (thenable array) I wish to insert into a database:

books.map(function(book) {

  // Insert into DB

});

Each book item looks like:

var book = {
    title: 'Book title',
    author: 'Author name'
};

However, before I insert each book, I need to retrieve the author's ID from a separate table since this data is normalised. The author may or may not exist, so I need to:

  • Check if the author is present in the DB
  • If it is, use this ID
  • Otherwise, insert the author and use the new ID

However, the above operations are also all asynchronous.

I can just use a promise within the original map (fetch and/or insert ID) as a prerequisite of the insert operation. But the problem here is that, because everything is run asynchronously, the code may well insert duplicate authors, because the initial check-if-author-exists is decoupled from the insert-a-new-author block.
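A minimal sketch of the naive version makes the race concrete. The in-memory `authorsTable` and the `findAuthor`/`insertAuthor`/`getOrCreateAuthor` helpers are hypothetical stand-ins for the real knex queries, with a short timer playing the part of a database round-trip. Both lookups start before either insert finishes, so both miss and the same author is inserted twice:

```javascript
// Hypothetical in-memory stand-in for the authors table; each call
// resolves after a short delay to mimic a database round-trip.
const authorsTable = [];
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function findAuthor(name) {
  await delay(10);
  return authorsTable.find((a) => a.name === name) || null;
}

async function insertAuthor(name) {
  await delay(10);
  const author = { id: authorsTable.length + 1, name };
  authorsTable.push(author);
  return author;
}

// Naive check-then-insert: the check and the insert are separate round-trips.
async function getOrCreateAuthor(name) {
  const existing = await findAuthor(name);
  return existing || insertAuthor(name);
}

const books = [
  { title: 'Book one', author: 'Author name' },
  { title: 'Book two', author: 'Author name' },
];

// Both lookups run before either insert has happened, so both miss.
const done = Promise.all(books.map((book) => getOrCreateAuthor(book.author)))
  .then(() => {
    console.log(authorsTable.length); // 2: the same author was inserted twice
    return authorsTable;
  });
```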

I can think of a few ways to achieve the above, but they all involve splitting up the promise chain and generally seem a bit messy. This seems like the kind of problem that must arise quite commonly. I'm sure I'm missing something fundamental here!

Any tips?


Asked May 12, 2015 at 14:05 by russx2
  • Don't you want an "Upsert" type method, i.e. one that will always return you an author's ID? If the author doesn't exist it will create it; if it does exist it will just return the existing author's ID. That way you lessen the chance of something jumping in between the check and the insert. You likely want to use some kind of locking mechanism here also – Liam Commented May 12, 2015 at 14:14
  • How many books are you inserting, how big is your array? It might be better to return a set of all author IDs first and merge that data client-side. Also, a lot of this logic may be better performed in the database, rather than in JavaScript. – vol7ron Commented May 12, 2015 at 14:21
  • For what it's worth, the correct way to do this would be an upsert, letting the database take care of this for you. The problem is that upserts are seriously new (implemented in Pg only 5 days ago), so knex hasn't caught up yet. – Benjamin Gruenbaum Commented May 13, 2015 at 5:46
  • Also, instead of batching n inserts - you really want to send a single query if possible. Implementing this logic and concurrency management on the client side seems very silly to me. It would probably be simpler to use a transaction. – Benjamin Gruenbaum Commented May 13, 2015 at 5:48

2 Answers


Let's assume that you can process each book in parallel. Then everything is quite simple (using only ES6 API):

Promise
  .all(books.map(book => {
    return getAuthor(book.author)
          .catch(createAuthor.bind(null, book.author))
          .then(author => Object.assign(book, { author: author.id }))
          .then(saveBook);
  }))
  .then(() => console.log('All done'))

The problem is that there is a race condition between getting an author and creating a new one. Consider the following order of events:

  • we try to get author A for book B;
  • getting author A fails;
  • we request creating author A, but it is not created yet;
  • we try to get author A for book C;
  • getting author A fails;
  • we request creating author A (again!);
  • first request completes;
  • second request completes;

Now we have two instances of A in the author table. This is bad! To solve this problem we can use the traditional approach: locking. We need to keep a table of per-author locks. Before sending a creation request we acquire the lock for that author, and after the request completes we release it. All other operations involving the same author need to acquire the lock first before doing anything.
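A rough sketch of that traditional per-author locking, assuming nothing beyond plain ES2017 JavaScript: `withLock`, `findAuthor`, `insertAuthor` and the in-memory `authorsTable` are hypothetical names, and each lock is just the tail of a promise chain kept in a `Map`. Such a lock only serialises work inside one Node.js process; it does not stop a second process from inserting the same author.

```javascript
// Hypothetical per-key lock: tasks queued under the same key run one after
// another; tasks under different keys still run concurrently.
const locks = new Map();

function withLock(key, task) {
  const previous = locks.get(key) || Promise.resolve();
  // Start this task only after the previous holder has settled.
  const run = previous.then(() => task());
  // The next waiter only needs to know when this run settled, not its result.
  const tail = run.then(() => undefined, () => undefined);
  locks.set(key, tail);
  // Drop the map entry if nobody queued up behind this run.
  tail.then(() => {
    if (locks.get(key) === tail) locks.delete(key);
  });
  return run;
}

// Hypothetical in-memory stand-ins for the database queries.
const authorsTable = [];
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function findAuthor(name) {
  await delay(10);
  return authorsTable.find((a) => a.name === name) || null;
}

async function insertAuthor(name) {
  await delay(10);
  const author = { id: authorsTable.length + 1, name };
  authorsTable.push(author);
  return author;
}

// The check-then-insert now happens while holding the lock for this author.
function getOrCreateAuthor(name) {
  return withLock(name, async () => (await findAuthor(name)) || insertAuthor(name));
}

const books = [
  { title: 'Book one', author: 'A' },
  { title: 'Book two', author: 'A' },
  { title: 'Book three', author: 'B' },
];

const done = Promise.all(books.map((book) => getOrCreateAuthor(book.author)))
  .then((authors) => ({ authors, rows: authorsTable.slice() }));
```

Books by different authors still proceed in parallel; only the check-then-insert for the same author is queued.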

This seems hard, but can be simplified a lot in our case, since we can use our request promises instead of locks:

const authorPromises = {};

function getAuthor(authorName) {

  if (authorPromises[authorName]) {
    return authorPromises[authorName];
  }

  const promise = getAuthorFromDatabase(authorName)
    .catch(createAuthor.bind(null, authorName))
    .then(author => {
      delete authorPromises[authorName];
      return author;
    });

  authorPromises[authorName] = promise;

  return promise;
}

Promise
  .all(books.map(book => {
    return getAuthor(book.author)
          .then(author => Object.assign(book, { author: author.id }))
          .then(saveBook);
  }))
  .then(() => console.log('All done'))

That's it! Now if a request for an author is already in flight, the same promise will be returned.
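Below is a self-contained version of the same promise-sharing idea that runs without a database. `getAuthorFromDatabase` and `createAuthor` are faked with timers and an in-memory `authorsTable` (assumptions, not real APIs), and the shared promises live in a `Map` rather than a plain object, so an author named `constructor` cannot collide with `Object.prototype`. Two books by the same author trigger only one `createAuthor` call:

```javascript
// Hypothetical in-memory stand-ins for the database calls used above.
const authorsTable = [];
let createCalls = 0;
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Rejects when the author is missing, matching the .catch(createAuthor) usage.
function getAuthorFromDatabase(name) {
  return delay(10).then(() => {
    const author = authorsTable.find((a) => a.name === name);
    if (!author) throw new Error(`No author named ${name}`);
    return author;
  });
}

function createAuthor(name) {
  createCalls += 1;
  return delay(10).then(() => {
    const author = { id: authorsTable.length + 1, name };
    authorsTable.push(author);
    return author;
  });
}

// Share one in-flight promise per author name.
const authorPromises = new Map();

function getAuthor(authorName) {
  if (authorPromises.has(authorName)) {
    return authorPromises.get(authorName);
  }
  const promise = getAuthorFromDatabase(authorName)
    .catch(() => createAuthor(authorName))
    .then((author) => {
      authorPromises.delete(authorName);
      return author;
    });
  authorPromises.set(authorName, promise);
  return promise;
}

const books = [
  { title: 'Book one', author: 'A' },
  { title: 'Book two', author: 'A' },
];

const done = Promise.all(
  books.map((book) =>
    getAuthor(book.author).then((author) => Object.assign(book, { author: author.id }))
  )
);
```

As in the answer's code, the entry is only deleted on success, so if `createAuthor` rejects, the rejected promise stays in the map.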

Here is how I would implement it. I think some important requirements are:

  • No duplicate authors are ever created (this should be a constraint in the database itself too).
  • If the server does not reply in the middle - no inconsistent data is inserted.
  • Possibility to enter multiple authors.
  • Don't make n queries to the database for n things - avoiding the classic "n+1" problem.

I'd use a transaction to make sure that updates are atomic: if the client dies in the middle of the operation, no authors are created without books. It's also important that a temporary failure does not cause a memory leak (like in the answer with the authors map that keeps failed promises).
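The memory-leak remark refers to the `authorPromises` map in the other answer: its entry is removed only on the success path, so a failed lookup stays cached and is handed to every later caller. One way to avoid that is sketched below, assuming `Promise.prototype.finally` is available (Node.js 10+; Bluebird promises have `.finally` too). `sharedByKey` and the deliberately flaky `lookupAuthor` are hypothetical:

```javascript
// Hypothetical helper: share one in-flight promise per key and forget it as
// soon as it settles, whether it fulfilled or rejected.
function sharedByKey(fn) {
  const inFlight = new Map();
  return (key) => {
    if (inFlight.has(key)) return inFlight.get(key);
    const promise = Promise.resolve()
      .then(() => fn(key))
      .finally(() => inFlight.delete(key));
    inFlight.set(key, promise);
    return promise;
  };
}

// Hypothetical flaky lookup: the first call fails, later calls succeed.
let attempts = 0;
function lookupAuthor(name) {
  const attempt = ++attempts;
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (attempt === 1) reject(new Error('timeout'));
      else resolve({ id: 1, name });
    }, 10);
  });
}

const getAuthor = sharedByKey(lookupAuthor);

// The first call fails; because its entry was dropped, the retry hits the
// lookup again instead of receiving the cached rejection.
const done = getAuthor('A')
  .catch((err) => `first call failed: ${err.message}`)
  .then((first) => getAuthor('A').then((second) => ({ first, second })));
```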

var Promise = require("bluebird");

knex.transaction(Promise.coroutine(function*(t) {
    // get books inside the transaction, then collect the unique author names
    var authors = [...new Set((yield books).map(x => x.author))];
    // name should be indexed, this is a single query
    var inDb = yield t.select("id", "name").from("authors").whereIn("name", authors);
    var known = new Set(inDb.map(row => row.name));
    var notIn = authors.filter(author => !known.has(author));
    // now, perform a single multi row insert on the transaction
    // I'm assuming PostgreSQL here (returning IDs), this is a bit different for SQLite
    var ids = yield t("authors").insert(notIn.map(name => ({ name: name }))).returning("id");
    // update books _inside the transaction_ now with the IDs array
})).then(() => console.log("All done!"));

This has the advantage of only making a fixed number of queries and is likely to be safer and perform better. Moreover, your database is never left in an inconsistent state (although you may have to retry the operation for multiple instances).
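The answer leaves the final "update books" step as a comment. Below is a database-free sketch of the whole flow under stated assumptions: the `db` object and its methods are hypothetical in-memory stand-ins for the knex calls (no transaction is modelled), and it counts round-trips to show that the number of queries stays fixed however many books are saved:

```javascript
// Hypothetical in-memory "database" that counts round-trips; each method
// stands in for one of the knex queries in the transaction above.
const db = {
  queries: 0,
  authors: [{ id: 1, name: 'Existing author' }],
  books: [],
  // Like: SELECT id, name FROM authors WHERE name IN (...)
  async selectAuthorsByName(names) {
    this.queries += 1;
    return this.authors.filter((a) => names.includes(a.name));
  },
  // Like: a multi-row INSERT INTO authors ... RETURNING id, name
  async insertAuthors(names) {
    this.queries += 1;
    return names.map((name) => {
      const row = { id: this.authors.length + 1, name };
      this.authors.push(row);
      return row;
    });
  },
  // Like: a multi-row INSERT INTO books
  async insertBooks(rows) {
    this.queries += 1;
    this.books.push(...rows);
  },
};

async function saveBooks(books) {
  // Deduplicate so each author is looked up and inserted at most once.
  const names = [...new Set(books.map((b) => b.author))];
  const found = await db.selectAuthorsByName(names);
  const known = new Set(found.map((a) => a.name));
  const missing = names.filter((name) => !known.has(name));
  // Skip the insert entirely when every author already exists.
  const created = missing.length ? await db.insertAuthors(missing) : [];
  // Merge both result sets into a name -> id lookup and rewrite the books.
  const idByName = new Map([...found, ...created].map((a) => [a.name, a.id]));
  await db.insertBooks(
    books.map((b) => ({ title: b.title, author: idByName.get(b.author) }))
  );
}

const done = saveBooks([
  { title: 'Book one', author: 'Existing author' },
  { title: 'Book two', author: 'New author' },
  { title: 'Book three', author: 'New author' },
]).then(() => db);
```

Against a real database, the unique constraint on author names is still what prevents duplicates when two such transactions run at once; one of them would then fail and need the retry mentioned above.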
