
javascript - Find and Replace Strings in Documents Efficiently - Stack Overflow


I have the following queries to find &nbsp; entities in a name field and replace them with an empty string, to get rid of them.
Name strings can contain one or many &nbsp; entities, e.g.

AA&nbsp;aa
AA&nbsp;&nbsp;aa
AA&nbsp;&nbsp;&nbsp;aa
AA&nbsp;&nbsp;&nbsp;&nbsp;aa
AA&nbsp;AA&nbsp;aaaaaaaa

... like that.

db.tests.find({'name':/.*&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;","");
    db.tests.save(test);
});

db.tests.find({'name':/.*&nbsp;&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;&nbsp;","");
    db.tests.save(test);
});

db.tests.find({'name':/.*&nbsp;&nbsp;&nbsp;.*/}).forEach(function(test){
    test.name = test.name.replace("&nbsp;&nbsp;&nbsp;","");
    db.tests.save(test);
});

Other than repeating the same query pattern, is there a better solution to handle this scenario, in terms of less duplication and higher performance?

Asked Mar 4, 2015 at 23:11 by Sanath; edited Sep 22, 2017 at 18:01.
3 Answers

Surely if all you want to do is strip the &nbsp; entities from your text, then you just do a global match and replace:

db.tests.find({ "name": /&nbsp;/ }).forEach(function(doc) {
    // The filter only tests for presence; the /g flag matters in replace().
    doc.name = doc.name.replace(/&nbsp;/g,"");
    db.tests.update({ "_id": doc._id },{ "$set": { "name": doc.name } });
});

So there should be no need to write out every combination: with the /g option, the regex replaces every match. The /m flag is not needed even if your "name" string contains newline characters, since it only changes how ^ and $ match.
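
To see the difference the /g flag makes, here is a minimal sketch in plain JavaScript that runs without a database. The sample string and the helper name stripNbsp are assumptions made up for illustration; they are not part of the question.

```javascript
// Hypothetical sample value containing three literal "&nbsp;" entities.
const sample = "AA&nbsp;&nbsp;aa&nbsp;bb";

// A string pattern replaces only the first occurrence.
const firstOnly = sample.replace("&nbsp;", "");

// A regex with the g flag replaces every occurrence in one pass.
function stripNbsp(text) {
  return text.replace(/&nbsp;/g, "");
}

console.log(firstOnly);         // AA&nbsp;aa&nbsp;bb
console.log(stripNbsp(sample)); // AAaabb
```

This is also why the original queries needed one pass per entity count: replace() with a string argument stops after the first match.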

It is also recommended to use $set in order to only modify the field(s) you really want to rather than .save() the whole document back. There is less traffic and less chance of overwriting changes that might have been made by another process since the document was read.

Ideally you would use the Bulk Operations API with MongoDB versions 2.6 and greater. This allows the updates to "batch" so there is again less traffic between the client and the server:

var bulk = db.tests.initializeOrderedBulkOp();
var count = 0;

db.tests.find({ "name": /&nbsp;/ }).forEach(function(doc) {
    doc.name = doc.name.replace(/&nbsp;/g,"");
    bulk.find({ "_id": doc._id })
        .updateOne({ "$set": { "name": doc.name } });
    count++;

    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.tests.initializeOrderedBulkOp();
    }
});

if  ( count % 1000 != 0 )
    bulk.execute();

Those are your primary ways to improve this. Unfortunately, there is no way for a MongoDB update statement to use an existing value as part of its update expression in this way, so the only way is looping, but you can do a lot to reduce the number of operations, as shown.
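
The batching idea can also be sketched without a live database. The helper below only builds operation documents in the shape that db.collection.bulkWrite (MongoDB 3.2 and later) accepts; it does not send anything. The function name buildBatches, the batch size of 2, and the sample documents are assumptions for illustration.

```javascript
// Build batches of updateOne operations for documents whose name
// contains "&nbsp;". Each inner array could be passed to
// db.tests.bulkWrite(batch) in the shell.
function buildBatches(docs, batchSize) {
  const batches = [];
  let current = [];
  for (const doc of docs) {
    const cleaned = doc.name.replace(/&nbsp;/g, "");
    if (cleaned === doc.name) continue; // nothing to change, skip the write
    current.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { name: cleaned } },
      },
    });
    if (current.length === batchSize) {
      batches.push(current);
      current = [];
    }
  }
  if (current.length > 0) batches.push(current); // flush the remainder
  return batches;
}

// Hypothetical sample data.
const sampleDocs = [
  { _id: 1, name: "AA&nbsp;aa" },
  { _id: 2, name: "plain" },
  { _id: 3, name: "AA&nbsp;&nbsp;aa" },
  { _id: 4, name: "AA&nbsp;AA&nbsp;aaaaaaaa" },
];
const opBatches = buildBatches(sampleDocs, 2);
console.log(JSON.stringify(opBatches, null, 2));
```

Skipping documents whose name would not change keeps the batches small, and flushing the remainder mirrors the final bulk.execute() call in the loop above.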

Nowadays,

  • starting Mongo 4.2, db.collection.updateMany (like db.collection.update) can accept an aggregation pipeline, finally allowing the update of a field based on its own value.
  • starting Mongo 4.4, the new aggregation operator $replaceAll makes it very easy to replace parts of a string.
// { "name" : "AA&nbsp;aa" }
// { "name" : "AA&nbsp;&nbsp;aa" }
// { "name" : "AA&nbsp;AA&nbsp;aaaaaaaa" }
db.collection.updateMany(
  { name: { $regex: /\&nbsp\;/ } },
  [{
    $set: { name: {
      $replaceAll: { input: "$name", find: "&nbsp;", replacement: "" }
    }}
  }]
)
// { "name" : "AAaa" }
// { "name" : "AAaa" }
// { "name" : "AAAAaaaaaaaa" }
  • The first part ({ name: { $regex: /\&nbsp\;/ } }) is the match query, filtering which documents to update (the ones containing "&nbsp;")
  • The second part ($set: { name: {...) is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
    • $set is a new aggregation operator (Mongo 4.2) which in this case replaces the value of a field.
    • The new value is computed with the new $replaceAll operator. Note how name is modified directly based on its own value ($name).
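
If more than one substring has to be stripped (for example both the literal text "&nbsp;" and the non-breaking space character U+00A0), one option is to nest one $replaceAll per substring. The sketch below only builds the pipeline as a plain object, so it runs without a server; the helper name buildStripPipeline and the list of substrings are assumptions for illustration.

```javascript
// Build an update pipeline that strips every string in `needles` from
// `field` by nesting one $replaceAll expression per needle.
function buildStripPipeline(field, needles) {
  let expr = "$" + field; // start from the field's current value
  for (const needle of needles) {
    expr = { $replaceAll: { input: expr, find: needle, replacement: "" } };
  }
  return [{ $set: { [field]: expr } }];
}

const stripPipeline = buildStripPipeline("name", ["&nbsp;", "\u00a0"]);
console.log(JSON.stringify(stripPipeline, null, 2));

// Possible use in the shell (MongoDB 4.4 or later assumed):
// db.collection.updateMany({ name: { $type: "string" } }, stripPipeline)
```

The $type filter in the commented usage line is there because $replaceAll returns null when its input is missing, which would otherwise write a null name into documents that have none.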

Since &nbsp; does not appear as a string in a MongoDB search, I used its Unicode escape, \u00a0, instead of the string, as shown below:

db.tests.find({}).forEach(function (x) {
    x.name = x.name.replace(/\u00a0/g, ' ');

    db.tests.save(x);
});

Here, I am replacing &nbsp; (the non-breaking space character) in the name field with a regular space.
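
The entity text and the character are different things, which is why a pattern written for one will not match the other. A minimal sketch in plain JavaScript, with made-up sample strings and a hypothetical helper normalizeNbsp that handles both forms:

```javascript
// "&nbsp;" is six ASCII characters; "\u00a0" is a single
// non-breaking space character.
const entityText = "AA&nbsp;aa";
const charText = "AA\u00a0aa";

// Normalize both forms to a regular space in one pass.
function normalizeNbsp(text) {
  return text.replace(/&nbsp;|\u00a0/g, " ");
}

console.log(entityText.length, charText.length);                    // 10 5
console.log(normalizeNbsp(entityText) === normalizeNbsp(charText)); // true
```

Which form is actually stored depends on how the data was written, so it is worth checking a sample document before choosing the pattern.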
