Fetching Nested MongoDB Subdocuments in Golang using the mgo Driver

6 min readMay 2, 2018

Context

MongoDB makes it easy to organize and store data in hierarchical document structures. However, fetching specific data that is embedded in nested subarrays of a deep hierarchical document can present a challenge, especially when you want to fetch only the data that you need from the database, and nothing more. I will illustrate a technique to efficiently fetch deeply nested subdocuments in MongoDB, including an implementation of the solution in Golang using the mgo driver.

Note: The code examples have been tested with Golang 1.10.1, MongoDB 3.6 and mgo 2.0.

Understanding the Problem

Let’s assume we have a MongoDB collection to store catalogs of luxury fashion items. Here’s a sample JSON document from such a collection.

{
    "_id" : "5ab3a7f781aeb8000181a6b7",
    "catalogName" : "Handbags",
    "publishDate" : "2018-02-28T12:00:04.258699255-05:00",
    "brands" : [ 
        {
            "brandName" : "Gucci",
            "retailers" : [ 
                "Zara", 
                "Dillards", 
                "Lord & Taylor"
            ],
            "items" : [ 
                {
                    "name" : "Olphia",
                    "origin" : "Italy",
                    "price" : 1200.0
                }, 
                {
                    "name" : "Mormont",
                    "origin" : "Italy",
                    "price" : 1300.0
                }, 
                {
                    "name" : "Impreza",
                    "origin" : "UK",
                    "price" : 1400.0
                }
            ]
        }, 
        {
            "brandName" : "Coach",
            "retailers" : [ 
                "Macys", 
                "Harrahs", 
                "Lord & Taylor"
            ],
            "items" : [ 
                {
                    "name" : "Rogue",
                    "origin" : "United States",
                    "price" : 1500.0
                }, 
                {
                    "name" : "Charlie",
                    "origin" : "France",
                    "price" : 900.0
                }
            ]
        }
    ]
}

Each document in this collection represents a catalog of items, with the items further organized by brand. This is a nested structure, with item arrays appearing at the third level. If you’ve worked with MongoDB queries, you are probably familiar with the fact that the query framework is largely focussed on fetching entire documents matching the query criteria. For example, consider following query on this collection:

db.catalog.find({  
   "brands.items":{  
      $elemMatch:{"origin":"Italy",price:{ $gte:500 }
      }
   }
})

The above query will yield all catalog documents in the collection that have one or more items that originate in Italy, and are priced 500 or higher. In the case of our collection, we will get back the entire “Handbags” catalog, with all its brands and items, even if we are interested in only the items that meet our criteria. It will be up to the client application to iterate through the items and extract the relevant data. MongoDB does have a projection framework to limit the query results, but it works for only the first subdocument level of the collection. For example, the above query could use projection as follows:

db.catalog.find({  
   "brands.items":{  
      $elemMatch:{"origin":"Italy",price:{ $gte:500 }
      }
   }
},
{  
   "brands.$":1
})

This would limit the “brands” sub-documents in the result to the first one in each catalog document where the matching item is found. While this projection approach limits the data coming back from the server, it might lead to unexpected results. For example, in this case only items in the first brand (in each catalog document) that has a matching item would be returned. What if there were matching items in other brands?

So, how do you make sure that your MongoDB query gives you the exact data you are looking for in the nested subdocuments, and nothing more? For example, in this case, how do you fetch ALL items across all catalog documents in the collection that match your criteria, and ensure that the result set coming back from MongoDB contains only the item details, and nothing more?

Aggregation to the Rescue

Fortunately, MongoDB comes with an aggregation framework that can be utilized to achieve the desired outcome of fetching specific nested subarray data back from the database. Let’s say our goal is to fetch a list of all items that originate in Italy, and are priced at 500 or higher. We want to ensure that the database operation returns only the items that match the criteria, and nothing else. The following aggregation query accomplishes this goal:

db.getCollection('catalog').aggregate([
  { "$match": {
    "brands": {
      "$elemMatch": { 
        "items.origin": "Italy",
        "items.price": { "$gte": 500 }
      }
    }
  }},
  { "$project": { "_id":0, "brands":1 } },
  { "$addFields": {
    "brands": {
      "$filter": {
        "input": {
          "$map": {
            "input": "$brands",
            "as": "b",
            "in": {
              "items": {
                "$filter": {
                  "input": "$$b.items",
                  "as": "i",
                  "cond": {
                    "$and": [
                      { "$eq": [ "$$i.origin", "Italy" ] },
                      { "$gte": [ "$$i.price", 500 ] }
                    ]
                  }
                }
              }
            }
          }
        },
        "as": "b",
        "cond": 
            { "$gt": [ { "$size": "$$b.items" }, 0 ] }
      }
    }
  }}
])

When this aggregation query is executed on the sample collection (populated using data.json from the GitHub repo referenced later), the result is the following:


{
    "brands" : [ 
        {
            "items" : [ 
                {
                    "name" : "Olphia",
                    "origin" : "Italy",
                    "price" : 1200.0
                }, 
                {
                    "name" : "Mormont",
                    "origin" : "Italy",
                    "price" : 1300.0
                }
            ]
        }
    ]
}
{
    "brands" : [ 
        {
            "items" : [ 
                {
                    "name" : "Racer",
                    "origin" : "Italy",
                    "price" : 600.0
                }
            ]
        }
    ]
}

As you can see, the result set consists of only the data for items that matched the criteria, and nothing else. The query used here utilizes an aggregation pipeline to fetch and progressively refine the data that is returned, using the $match, $project, and $addFields aggregator pipeline stages. It also utilizes the $map aggregation operator to refine the items included in the results. Following is a very high-level explanation of the pipeline, it might be worthwhile to look into the details of the aggregation stages and operators in MongoDB documentation.

All this happens on the MongoDB server utilizing available indexes, making this an overall efficient approach to querying deeply nested subarray data. This approach can be extrapolated to subarrays that are nested deeper into the document structure.

Implementing the Solution in Golang

Now that we have identified a MongoDB technique to get the database to fetch and return only the data we need, how do we implement this in Golang? The aggregation pipeline can be implemented using the mgo MongoDB driver as illustrated below. Here, ‘c’ is an instance of the mgo Collection type. ‘origin’ and ‘minPrice’ are the parameterized values for the query criteria.

pipe := c.Pipe([]bson.M{
   {"$match": bson.M{
      "brands": bson.M{
         "$elemMatch": bson.M{
            "items.origin": bson.M{"$eq": origin},
            "items.price":  bson.M{"$gte": minPrice},
         },
      },
   }},
   {"$project": bson.M{ "_id":0, "brands":1 } },
   {"$addFields": bson.M{
      "brands": bson.M{
         "$filter": bson.M{
            "input": bson.M{
               "$map": bson.M{
                  "input": "$brands",
                  "as":    "b",
                  "in": bson.M{
                     "items": bson.M{
                        "$filter": bson.M{
                           "input": "$$b.items",
                           "as":    "i",
                           "cond": bson.M{
                              "$and": []interface{}{
                                 bson.M{"$eq": []interface{}{"$$i.origin", origin}},
                                 bson.M{"$gte": []interface{}{"$$i.price", minPrice}},
                              },
                           },
                        },
                     },
                  },
               },
            },
            "as":   "b",
            "cond": bson.M{"$gt": []interface{}{bson.M{"$size": "$$b.items"}, 0}},
         },
      },
   },
   }})

The pipeline can be executed and the response obtained as follows:

var resp []bson.M
pipe.All(&resp)

The ‘resp’ returned by the execution has the following map structure, when origin is “Italy”, and minPrice is 500.

[map[brands:[map[items:[map[name:Olphia origin:Italy price:1200] map[name:Mormont origin:Italy price:1300]]]]] map[brands:[map[items:[map[price:600 name:Racer origin:Italy]]]]]]

Notice that this is identical to the result set we obtained earlier by the direct execution of the aggregation pipeline using the MongoDB shell. All that remains now is to unmarshall this into an appropriate Go struct.

type Item struct {
   Name   string  `bson:"name" json:"name"`
   Origin string  `bson:"brandName" json:"origin"`
   Price  float64 `bson:"price" json:"price"`
}//traverse the bson Map returned by the aggregation and extract the //items
var itemsFound []Item
for _, catalogMap := range resp {
   brands := catalogMap["brands"].([]interface{})
   for _, b := range brands {
      brandsMap := b.(bson.M)
      items := brandsMap["items"].([]interface{})
      for _, b := range items {
         itemsMap := b.(bson.M)
         data, _ := json.Marshal(itemsMap)
         var item Item
         if err := json.Unmarshal(data, &item); err != nil {
            return nil, err
         }
         itemsFound = append(itemsFound, item)
      }
   }
}

And that’s it! The ‘itemsFound’ array now contains the item data that we sought out to query and retrieve from MongoDB collection. With this approach, we avoided bringing entire catalog documents from the MongoDB database into our Go program, and traversing the documents on the client side to extract the item data we were interested in. The heavylift for the data refinement now happens on the database.

Source Code

The source code for the example illustrated above is available in the following Github repo:https://github.com/psinghal04/mgonestedarrays

References and Further Reading

MongoDB aggregation : https://docs.mongodb.com/manual/aggregation/
mgo driver for MongoDB: https://labix.org/mgo