Sample documents of a collection
DB: MYDB
Collection: MYCOLL
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622"
}
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=h242;AF=9y87;C=8w622"
}
I would like to have expected output as following:
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622",
"A":"5242",
"AF":"987",
"C":"82622"
}
I have tried to do it myself, but for now I created the following aggregate code for a specific value in CSV. which is not what I want to do. Can you please tell me how can I make it better in performance and create (updating the current collection document) in the form as I showed above?
db["MYCOLL"].aggregate([
{
$project: {
INFO_ARR: {
$split:["$INFO",";"]
}
}
},
{
$project: {
AF_ARR: {
$arrayElemAt: [ "$INFO_ARR" , 1 ] }
}
},
{
$project: {
AF_FREQ_ARR: {
$split: [ "$AF_ARR" , "=" ]}
}
},
{
$project: {
AF: {
$arrayElemAt: [ "$AF_FREQ_ARR" , 1 ] }
}
},
{ $merge : { into: { db: "MYDB", coll: "MYCOLL" }, on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
Note: I can not use javascript as the collection is of size 400GB and is having millions of records. Hence, Javascript may be too slow to update all of the records. Am I wrong about this ?
Sample documents of a collection
DB: MYDB
Collection: MYCOLL
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622"
}
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=h242;AF=9y87;C=8w622"
}
I would like to have expected output as following:
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622",
"A":"5242",
"AF":"987",
"C":"82622"
}
I have tried to do it myself, but for now I created the following aggregate code for a specific value in CSV. which is not what I want to do. Can you please tell me how can I make it better in performance and create (updating the current collection document) in the form as I showed above?
db["MYCOLL"].aggregate([
{
$project: {
INFO_ARR: {
$split:["$INFO",";"]
}
}
},
{
$project: {
AF_ARR: {
$arrayElemAt: [ "$INFO_ARR" , 1 ] }
}
},
{
$project: {
AF_FREQ_ARR: {
$split: [ "$AF_ARR" , "=" ]}
}
},
{
$project: {
AF: {
$arrayElemAt: [ "$AF_FREQ_ARR" , 1 ] }
}
},
{ $merge : { into: { db: "MYDB", coll: "MYCOLL" }, on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
Note: I can not use javascript as the collection is of size 400GB and is having millions of records. Hence, Javascript may be too slow to update all of the records. Am I wrong about this ?
Share Improve this question asked Jan 30 at 12:57 ShahanShahan 491 silver badge7 bronze badges 02 Answers
Reset to default 3After splitting the array on semi-colons ;
, split each item of that result on =
. That gives an array of pairs like [ ["A", "5242"], ["AF", "987"], ["C", "82622"]
.
Then use arrayToObject
which will treat the 1st item as the key and the 2nd item as the value. After that, merge with Root and remove the temp array.
I've done the transformations as separate steps but you can combine them into one:
db.mycoll.aggregate([
{
$set: {
INFO_ARR: { $split: ["$INFO", ";"] }
}
},
{
$set: {
INFO_ARR: {
$map: {
input: "$INFO_ARR",
in: { $split: ["$$this", "="] }
}
}
}
},
{
$set: {
INFO_ARR: { $arrayToObject: "$INFO_ARR" }
}
},
{
$replaceWith: {
$mergeObjects: ["$$ROOT", "$INFO_ARR"]
}
},
{
$unset: "INFO_ARR"
},
{
$merge: {
into: {
db: "MYDB",
coll: "MYCOLL"
},
on: "_id",
whenMatched: "merge",
whenNotMatched: "insert"
}
}
])
Mongo Playground
Also, this can be done as an updateMany
query since every doc needs to be updated anyway: Mongo Playground with update & aggregation syntax
You can use the following aggregation pipeline to split the contents of the INFO
property and add the values as properties:
[
{
$project: {
INFO: 1
}
},
{
$set: {
info_arr: {
"$split": [
"$INFO",
";"
]
}
}
},
{
$set: {
info_arr2: {
"$map": {
"input": "$info_arr",
"in": {
k: {
$first: {
$split: [
"$$this",
"="
]
}
},
v: {
$last: {
$split: [
"$$this",
"="
]
}
}
}
}
}
}
},
{
$set: {
info_obj: {
"$arrayToObject": "$info_arr2"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$$ROOT",
"$info_obj"
]
}
}
},
{
$unset: [
"info_arr",
"info_arr2",
"info_obj"
]
}
]
- The pipeline first splits the field content at the semicolons, then splits the individual lines at the equality sign and assigns the key to a
k
field, the value to av
field. - At this point, the
$arrayToObject
operator is used to create an object from the array contents. - Afterwards, the properties of the newly created subdocument are merged with the
$$ROOT
document. - At the end, the temporary properties are removed using an
$unset
stage.
At the end, the documents look like this:
{
"A": "5242",
"AF": "987",
"C": "82622",
"INFO": "A=5242;AF=987;C=82622",
"_id": ObjectId("5a934e000102030405000001")
}
If you add a $merge
stage, you can add the new properties to existing documents in the collection.
See this playground to test.