Getting Started With Indexes

9-Nov-2018

MongoDB has many different kinds of indexes, depending on the type of data being indexed and the query forms being used.

Types of Indexes in MongoDB

Types
- Single Field / Compound
- Hashed
- Text Indexes
- Geospatial Indexes
Modifiers
- Unique
- TTL
- Partial
- Sparse

The most commonly used index type is the first one; the other types are more specialized.

Creating Indexes

Creating indexes in MongoDB is very simple. The general form of the command is:

db.<collection>.createIndex(<keys>, <options>)

The first parameter is the list of keys to be included in the index. The second parameter takes an options document. This second parameter is optional, but can be used to set index options (such as the name) and modifiers.

To create an index on the week field in the teams collection, we would simply run

db.teams.createIndex({ week: 1 })

Single Field / Compound Indexes

These types of indexes are the most straightforward to create and use. They center around the following form:

{ week: 1 }
{ week: 1, teams: 1 }

While these are MongoDB documents, at their core, index definitions are simply a list of keys.

Single Field Indexes

A single field index is just that: an index on one field. This tells the query engine to build an index on just the specified field. One of the things that makes index lookups faster than collection scans is that indexes are inherently sorted. This means we need to provide a sort order when creating an index, hence the 1 above, which sorts the data in ascending order. To sort in descending order, use -1.
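
To see why a sorted structure helps, here is a minimal sketch in plain JavaScript. It is a toy model, not how MongoDB stores indexes internally: the helper names buildIndex and findInIndex and the sample games are made up for illustration, and the index is modelled as a sorted array searched with binary search.

```javascript
// Toy model of a single field index: a sorted list of { key, id } entries.
// Assumption: keys are numbers. direction is 1 (ascending) or -1 (descending).
function buildIndex(docs, field, direction = 1) {
  return docs
    .map((doc) => ({ key: doc[field], id: doc._id }))
    .sort((x, y) => direction * (x.key - y.key));
}

// Binary search over an ASCENDING index. Returns the ids of all
// entries whose key equals value, without scanning every entry.
function findInIndex(index, value) {
  let lo = 0;
  let hi = index.length;
  // Find the first entry whose key is >= value.
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < index.length && index[i].key === value; i++) {
    ids.push(index[i].id);
  }
  return ids;
}

// Hypothetical sample data.
const games = [
  { _id: "g1", week: 3 },
  { _id: "g2", week: 1 },
  { _id: "g3", week: 2 },
  { _id: "g4", week: 1 },
];

const weekIndex = buildIndex(games, "week", 1);
const weekOne = findInIndex(weekIndex, 1); // ids of the two week 1 games
```

Because the entries are sorted, the lookup can jump straight to the first matching key instead of visiting every document.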

Compound Indexes

A compound index is simply an index with multiple fields. The fields are indexed in the order specified in the create command. Each field can have its own sort order.
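
As a rough illustration of how per-field sort orders combine, the sketch below sorts a few rows the way a compound key specification would order them. This is plain JavaScript, not MongoDB code; compareBySpec, the scalar team field, and the sample rows are assumptions made for the example.

```javascript
// Builds a comparator from a key specification such as { week: 1, team: -1 }.
// Fields are compared left to right; later fields only break ties.
function compareBySpec(spec) {
  const fields = Object.entries(spec);
  return (x, y) => {
    for (const [field, direction] of fields) {
      if (x[field] < y[field]) return -direction;
      if (x[field] > y[field]) return direction;
    }
    return 0;
  };
}

// Hypothetical sample rows.
const rows = [
  { week: 2, team: "BOP" },
  { week: 1, team: "BOP" },
  { week: 1, team: "NNYM" },
];

// week ascending, then team descending within each week.
const ordered = [...rows].sort(compareBySpec({ week: 1, team: -1 }));
// ordered: week 1 NNYM, week 1 BOP, week 2 BOP
```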

A Note on Querying Compound Indexes

When querying with a compound index, all preceding fields must be included in the query for the index to be used. For example, given this index:

{ a: 1, b: -1, c: 1, d: 1 }

The following query cannot use this index:

{ b: 12, c: "foo" }

This is because all values of b in the underlying data structure are prefixed by values of a. Without a, it is not feasible to search for values of b in the index.
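
The prefix rule can be sketched as a small check in plain JavaScript. This is a deliberately simplified model that only looks at which fields appear in the query; the real query planner also weighs ranges, sorts, and other factors. The function name usablePrefix is made up for illustration.

```javascript
// Returns the leading index fields that a query can use.
// An empty result means the query does not start at the index prefix.
function usablePrefix(indexSpec, query) {
  const prefix = [];
  for (const field of Object.keys(indexSpec)) {
    if (!(field in query)) break; // the first missing field ends the prefix
    prefix.push(field);
  }
  return prefix;
}

const index = { a: 1, b: -1, c: 1, d: 1 };

const noPrefix = usablePrefix(index, { b: 12, c: "foo" });   // []
const fullMatch = usablePrefix(index, { a: 5, b: 12 });      // ["a", "b"]
const partial = usablePrefix(index, { a: 5, c: "foo" });     // ["a"]
```

In this model, the last query can still narrow on a, but c cannot help until b is also supplied.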

When creating a compound index, you generally want to order fields from highest to lowest cardinality¹ (left to right). However, this is not a hard requirement.

¹ High cardinality: fields whose values are mostly distinct. Good examples are the _id field and email addresses.
Low cardinality: fields with few distinct values, such as booleans. Keep in mind that a boolean field can be as selective as a high cardinality field, depending on how your application uses it.
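
One rough way to get a feel for cardinality is to count the distinct values of a field across a sample of documents. The sketch below does this in plain JavaScript; the cardinality helper and the sample users are hypothetical, and it only handles primitive values.

```javascript
// Counts the distinct values of a field in a sample of documents.
// Assumption: the field holds primitives (strings, numbers, booleans).
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Hypothetical sample data.
const users = [
  { email: "a@example.com", active: true },
  { email: "b@example.com", active: true },
  { email: "c@example.com", active: false },
  { email: "d@example.com", active: true },
];

const emailCardinality = cardinality(users, "email");   // 4: every value is distinct
const activeCardinality = cardinality(users, "active"); // 2: only true and false
```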

In MongoDB, a compound index has a maximum limit of 31 fields. In practice, this limit is rarely encountered.

Building on the example from my last post: if we index on teams first, a lookup returns a large number of entries, because each team plays many games in a season, and those must then be filtered by week number. If we reverse the fields, a week narrows the search to a small subset of all games, and a team generally plays only a few games per week. So we might want to index on the week number first and then the team. This decision depends more on our access pattern than on our data cardinality.

Querying Sub-documents

What happens if the field specified in the index is a sub-document? You can do this, but the field ordering and values must all match. For example, let's imagine that week is a document.

{
    week: {
        number: 1,
        startDate: new ISODate("2018-09-01T00:00:00Z")
    },
    teams: [
        {
            name: "Boston Poindexters",
            abbreviation: "BOP"
        },
        {
            name: "New New York Mets",
            abbreviation: "NNYM"
        }
    ]
}

With the week index we already created, the following query will not match this document:

db.games.find({ week: { startDate: new ISODate("2018-09-01T00:00:00Z"), number: 1 } })

However, this query will match.

db.games.find({ week: { number: 1, startDate: new ISODate("2018-09-01T00:00:00Z") } })

While the two queries look equivalent, from the perspective of the index they are two different values. This is because the index is on the week object as a whole, not on the individual fields that make up the week object, so field order matters.
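
A loose analogy in plain JavaScript: if we treat the serialized form of the whole object as the indexed value, swapping the field order produces a different value. Here JSON.stringify stands in for the ordered field layout of a stored document, and the dates are written as plain strings; both are simplifications for illustration.

```javascript
// Serialize the whole sub-document, preserving field order.
const asKey = (value) => JSON.stringify(value);

const stored = { number: 1, startDate: "2018-09-01T00:00:00Z" };
const sameOrder = { number: 1, startDate: "2018-09-01T00:00:00Z" };
const swapped = { startDate: "2018-09-01T00:00:00Z", number: 1 };

const matchesSameOrder = asKey(stored) === asKey(sameOrder); // true
const matchesSwapped = asKey(stored) === asKey(swapped);     // false: same fields, different order
```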

A Note on Multikey Indexes

These are a subtle variation of single field and compound indexes. They are created implicitly when one of the indexed fields is an array. Looking at our example document above, we can see that the teams field is an array. This means that when we create the following index, we are actually creating a compound multikey index. However, mongod detects this and handles it with no input from us.

{ week: 1, teams: 1 }

The main limitation to be aware of is that multikey indexes can only contain one array field. You cannot have a compound index with multiple array fields.
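
Conceptually, a multikey index holds one entry per array element. The sketch below models that in plain JavaScript, including the one-array-field restriction. It is a simplified picture rather than mongod's actual behavior: multikeyEntries is a made-up helper, the teams array holds plain abbreviations instead of sub-documents, and empty arrays are not handled specially.

```javascript
// Produces the index entries for one document under a compound key.
// If one indexed field is an array, emit one entry per element.
function multikeyEntries(doc, fields) {
  const arrayFields = fields.filter((field) => Array.isArray(doc[field]));
  if (arrayFields.length > 1) {
    // Models the restriction: only one indexed field may be an array.
    throw new Error("more than one array field: " + arrayFields.join(", "));
  }
  if (arrayFields.length === 0) {
    return [fields.map((field) => doc[field])];
  }
  const [arrayField] = arrayFields;
  return doc[arrayField].map((element) =>
    fields.map((field) => (field === arrayField ? element : doc[field]))
  );
}

// Hypothetical, simplified version of the example document.
const game = { week: 1, teams: ["BOP", "NNYM"] };
const entries = multikeyEntries(game, ["week", "teams"]);
// entries: [[1, "BOP"], [1, "NNYM"]]
```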

As I mentioned, there are some subtle differences between “regular” indexes and multikey indexes, namely on index bounds. Please be sure to read the MongoDB docs on Multikey Indexes and Multikey Index Bounds.

Single Field/Compound Index Summary

- Index from left to right, high to low cardinality
- MongoDB has a 31 field limit on compound indexes
- When indexing a document (instead of a field), queries must supply the same field values in the same order

Next time, we’ll look at Hashed and Text indexes. As always, I hope this helps and happy programming.

Indexing In MongoDB Series

MongoDB Indexes
Getting Started With Indexes
Text Indexes
Index Modifiers
Indexes in Sharded Clusters
Index Performance

Pete Garafano