How to store, query, and index JSON data using AWS DocumentDB?


NoSQL databases are a great choice for many modern applications such as mobile, web, and gaming that require flexible, scalable, high-performance, and highly functional databases to provide great user experiences. In this blog, we will discuss AWS DocumentDB, then connect to an Amazon DocumentDB cluster from an AWS Cloud9 environment with the mongo shell and run a few queries.

A VP at Samsung Electronics, the world’s largest consumer electronics maker, says:

“With ever-increasing data volumes, and growing demand for schema flexibility, our log collection service faced challenges with managing data on a traditional relational database. Amazon DocumentDB (with MongoDB compatibility)’s support for a flexible document model and a fully managed service has relieved us from handling rigid schema and enabled us to operate mission-critical workloads at scale with ease.”

In this blog, we will cover:

  • SQL & NoSQL Databases
  • Difference between SQL and NoSQL
  • Which database can be the right fit for your business?
  • Terminologies used in SQL vs NoSQL 
  • Types of NoSQL databases
  • How does NoSQL work
  • NoSQL databases offered on AWS
  • What is DocumentDB?
  • Use cases of AWS DocumentDB
  • Benefits of AWS DocumentDB
  • When not to use AWS DocumentDB
  • Companies using AWS DocumentDB
  • Hands-on 
  • Conclusion

SQL & NoSQL databases

SQL, short for Structured Query Language, is a programming language that is used to manage data in relational databases. Relational databases use relations (typically called tables) to store data and then match that data by using common characteristics within the dataset.

Some common relational database management systems that use SQL:

  • Oracle
  • Sybase
  • Microsoft SQL Server
  • Microsoft Access
  • Ingres

The concept of NoSQL databases became popular with Internet giants like Google, Facebook, Amazon, etc. who deal with huge volumes of data. The system response time becomes slow when you use RDBMS for massive volumes of data.


NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications. They are widely recognized for their ease of development, functionality, and performance at scale. Document databases in particular are self-describing, so they do not require a predefined schema, nor do they enforce relations between tables in all cases. Their documents are JSON documents, which are complete entities that one can readily read and understand.

NoSQL database examples include:

  • MongoDB
  • MarkLogic
  • Couchbase
  • CloudDB
  • Amazon DynamoDB
  • Amazon DocumentDB

Difference between SQL & NoSQL

There are many differences between SQL and NoSQL, and it is important to understand them when deciding on the best data management system for your organization. The differences are summarized below:

[Image: comparison of SQL and NoSQL databases]

Which database can be the right fit for your business?

The first and primary factor in making the SQL vs. NoSQL decision is what your data looks like.

If your data is primarily structured, a SQL database is likely the right choice. Organizations benefit from a predefined structure and set schemas, particularly if they require multi-row transactions or work in situations where all data must be consistent without leaving room for error, such as accounting systems.

NoSQL is a good choice for companies experiencing rapid growth with no clear schema definitions. NoSQL offers much more flexibility than a relational database and is a solid option for companies that must analyze large quantities of data or that manage variable data structures.

One of the most important decisions is whether to go with a SQL or NoSQL database as your primary database and whether you may need both to meet your needs.

You’ll need to think about what your data looks like, how you’ll query your data, and the scalability you’ll need in the future.

SQL databases provide great benefits for transactional data whose structure doesn’t change frequently (or at all) and where data integrity is paramount. They are also well suited to fast analytical queries.

NoSQL databases provide much more flexibility and scalability, which lend themselves to rapid development and iteration.

Terminologies used in SQL Vs NoSQL

The following table compares the terminology used in SQL and NoSQL databases.

[Image: table of SQL and NoSQL terminology]

Types of NoSQL Databases


Each type solves a problem that can’t be solved with relational databases.

Key-Value Pair Based

Data is stored in key/value pairs. Key-value databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.

Examples of key-value stores are Redis, Voldemort, Riak, and Amazon’s DynamoDB.
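As a rough illustration of the key-value model, the sketch below uses a JavaScript Map as a stand-in for the hash table. The key names and values are made up for illustration and do not come from any particular product.

```javascript
// In-memory stand-in for a key-value store: each key is unique,
// and the value can be any type (string, JSON-like object, binary data).
const store = new Map();

store.set("session:42", "logged-in");                      // string value
store.set("user:1", { name: "john", level: 1 });           // JSON-like value
store.set("avatar:1", new Uint8Array([137, 80, 78, 71]));  // binary (BLOB-like) value

// Writing to an existing key replaces the old value; it does not add a second entry.
store.set("session:42", "logged-out");

console.log(store.get("session:42")); // "logged-out"
console.log(store.size);              // 3
```

Lookups go through the key only, which is why this model is simple to partition across many machines.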

Document Oriented

A document store does assume a certain document structure that can be specified with a schema. Document stores appear the most natural among the NoSQL database types because they’re designed to store everyday documents as is, and they allow for complex querying and calculations on this often already aggregated form of data.

Examples of document-oriented databases are Amazon SimpleDB, CouchDB, MongoDB, Riak, and Amazon DocumentDB.

Column Based

Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately, and the values of a single column are stored contiguously.

Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM, and library card catalogs.

Examples of column-based databases are HBase, Cassandra, Hypertable, etc.
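To make the idea of contiguous column storage concrete, here is a simplified sketch that converts a row-oriented layout into a column-oriented one. The sample records are hypothetical, and real column stores add far more (column families, compression, distribution) than this shows.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { id: 1, name: "john", score: 10 },
  { id: 2, name: "rob", score: 50 },
];

// Column-oriented layout: the values of each column are stored contiguously.
const columns = { id: [], name: [], score: [] };
for (const row of rows) {
  for (const key of Object.keys(columns)) {
    columns[key].push(row[key]);
  }
}

// An aggregate over one column only needs to read that column's array.
const totalScore = columns.score.reduce((sum, value) => sum + value, 0);
console.log(columns.score); // [ 10, 50 ]
console.log(totalScore);    // 60
```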

Graph-Based

A graph-type database stores entities as well as the relations among those entities. Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature.

Graph-based databases are mostly used for social networks, logistics, and spatial data.

Examples of graph-based databases are Neo4J, Infinite Graph, OrientDB, and FlockDB.
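The sketch below shows the basic shape of graph data: nodes for entities and typed edges for relations. The names and relation types are invented for illustration, and the neighbors helper is a hypothetical function, not the API of any graph database listed above.

```javascript
// Entities (nodes) and the relations between them (edges) are both stored as data.
const nodes = new Set(["alice", "bob", "carol"]);
const edges = [
  { from: "alice", to: "bob", type: "FOLLOWS" },
  { from: "bob", to: "carol", type: "FOLLOWS" },
  { from: "alice", to: "carol", type: "BLOCKS" },
];

// Return the nodes reachable from `start` over one edge of the given type.
function neighbors(start, type) {
  if (!nodes.has(start)) return [];
  return edges
    .filter((edge) => edge.from === start && edge.type === type)
    .map((edge) => edge.to);
}

console.log(neighbors("alice", "FOLLOWS")); // [ 'bob' ]
```

Because one pair of nodes can be linked by several edge types, the same entities can take part in many kinds of relations at once.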

How does NoSQL work

NoSQL databases use a variety of data models for accessing and managing data. These types of databases are optimized specifically for applications that require large data volume, low latency, and flexible data models, which are achieved by relaxing some of the data consistency restrictions of other databases.

Consider the example of modeling the schema for a simple book database:

In a relational database, a book record is often stored in separate tables, and relationships are defined by primary and foreign key constraints. In this example, the Books table has columns for ISBN, Book Title, and Edition Number, the Authors table has columns for AuthorID and Author Name, and finally, the Author-ISBN table has columns for AuthorID and ISBN. 

The relational model is designed to enable the database to enforce referential integrity between tables in the database, normalized to reduce redundancy, and generally optimized for storage.

In a NoSQL database, a book record is usually stored as a JSON document. For each book, the item, ISBN, Book Title, Edition Number, Author Name, and AuthorID are stored as attributes in a single document. In this model, data is optimized for intuitive development and horizontal scalability.
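The two models described above can be sketched side by side. The example below starts from the three relational tables and folds them into a single document per book; all sample values (ISBN, titles, author names) and the toDocument helper are hypothetical.

```javascript
// Relational shape: three tables linked by keys (sample values are hypothetical).
const books = [{ isbn: "111-1", title: "Sample Book", edition: 2 }];
const authors = [
  { authorId: 1, name: "A. Writer" },
  { authorId: 2, name: "B. Author" },
];
const authorIsbn = [
  { authorId: 1, isbn: "111-1" },
  { authorId: 2, isbn: "111-1" },
];

// Document shape: join the tables once and embed the authors in the book.
function toDocument(book) {
  const bookAuthors = authorIsbn
    .filter((link) => link.isbn === book.isbn)
    .map((link) => authors.find((author) => author.authorId === link.authorId));
  return { ...book, authors: bookAuthors };
}

const bookDocument = toDocument(books[0]);
console.log(JSON.stringify(bookDocument, null, 2));
```

Reading the book now takes one lookup instead of a three-table join, at the cost of repeating author data in every book that author wrote.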

NoSQL Databases Offered on AWS


Key-value: Key-value databases are highly partitionable and allow horizontal scaling at scales that other types of databases cannot achieve. Amazon DynamoDB is designed to provide consistent single-digit millisecond latency for any scale of workloads. This consistent performance is a big part of why the Snapchat Stories feature, which includes Snapchat’s largest storage write workload, moved to DynamoDB.

Document: Document databases make it easier for developers to store and query data in a database by using the same document model format. The flexible, semistructured, and hierarchical nature of documents and document databases allows them to evolve with applications’ needs. Amazon DocumentDB (with MongoDB compatibility) and MongoDB are popular document databases.

Graph: A graph database’s purpose is to make it easy to build and run applications that work with highly connected datasets. Amazon Neptune is a fully-managed graph database service. 

In-memory: Gaming and ad-tech applications have use cases such as leaderboards, session stores, and real-time analytics that require microsecond response times. Amazon ElastiCache offers Memcached and Redis to serve low-latency, high-throughput workloads, such as McDonald’s, that cannot be served with disk-based data stores.

Search: Amazon Elasticsearch Service (Amazon ES) is purpose-built for providing near-real-time visualizations and analytics of machine-generated data by indexing, aggregating, and searching semi-structured logs and metrics.

In this blog, we will focus on one of the document-based databases, AWS DocumentDB, in detail.

What is AWS DocumentDB?

AWS DocumentDB is a non-relational database service designed from the ground up to give you the performance, scalability, and availability you need when operating mission-critical MongoDB workloads at scale. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data.


Use cases of AWS DocumentDB

Benefits of AWS DocumentDB


MongoDB-compatible: Amazon DocumentDB implements the Apache 2.0 open source MongoDB 3.6 and 4.0 APIs by emulating the responses that a MongoDB client expects from a MongoDB server, allowing you to use your existing MongoDB drivers and tools with Amazon DocumentDB.

Fully managed: With Amazon DocumentDB, you don’t need to worry about database management tasks, such as hardware provisioning, patching, setup, configuration, backups, or scaling. Amazon DocumentDB automatically and continuously monitors and backs up your cloud database to Amazon S3, enabling point-in-time recovery.

Performance at scale: According to AWS, Amazon DocumentDB achieves twice the throughput of currently available MongoDB managed services. Amazon DocumentDB uses a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per database cluster.

When not to use AWS DocumentDB


Companies using AWS DocumentDB


Hands-On

In this hands-on, we will learn how to get started with Amazon DocumentDB using AWS Cloud9. We will see how to connect to an Amazon DocumentDB cluster from an AWS Cloud9 environment with the mongo shell and run a few queries.

The following diagram shows the final architecture of this walkthrough.

[Image: architecture diagram of the walkthrough]

Here are the steps to follow:

  1. Creating an AWS Cloud9 Environment
  2. Creating a security group
  3. Creating an Amazon DocumentDB cluster
  4. Installing the mongo shell
  5. Connect to DocumentDB cluster
  6. Inserting and querying data

First, create a Cloud9 environment by following the steps below.

1. Creating an AWS Cloud9 Environment

1.1 – On the AWS Cloud9 management console, choose Create environment.

1.2 – Enter the name workfall-documentdb and choose Next step.

1.3 – In the Configure Settings section, accept all defaults.

1.4 – Choose Next step.

1.5 – In the Review section, choose Create environment.

2. Creating a security group

2.1 – On the Amazon EC2 management console, under Network & Security, choose Security groups.

2.2 – Choose Create security group.

2.3 – For the Security group name, enter workfall-SG.

2.4 – For Description, enter a description.

2.5 – For VPC, accept the usage of your default VPC

2.6 – In the Inbound rules section, choose Add rule.

2.7 – For Type, choose Custom TCP Rule.

2.8 – For Port Range, enter 27017.

2.9 – The source security group is the security group for the AWS Cloud9 environment you just created. Keep the Source as the default value of Custom and enter “cloud9” in the field adjacent to Custom to see a list of available security groups.

2.10 – Choose the security group with the name aws-cloud9-<environment name>.

2.11 – Accept all other defaults and choose Create security group. You do not need to modify the outbound rules.

3.  Creating an Amazon DocumentDB cluster

3.1 – On the Amazon DocumentDB management console, under Clusters, choose Create.

3.2 – On the Create Amazon DocumentDB cluster page, select db.t3.medium under Instance class, and then choose 1 for Number of instances. These options will help minimize costs.

3.3 – Leave other settings at their default.

3.4 – In the Authentication section, enter a username and password.

3.5 – Turn on the advanced settings.

3.6 – In the Network settings section, for VPC security groups, choose workfall-SG, and then choose Create cluster.

4. Installing the mongo shell

4.1 – If your AWS Cloud9 environment is still open, you can skip to step 4.4.

4.2 – On the AWS Cloud9 management console, under Your environments, choose the name created earlier.

4.3 – Choose Open IDE.

4.4 – At the command prompt, create the repository file with the following code:

echo -e "[mongodb-org-3.6] \nname=MongoDB Repository\nbaseurl=https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.6/x86_64/\ngpgcheck=1 \nenabled=1 \ngpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc" | sudo tee /etc/yum.repos.d/mongodb-org-3.6.repo

4.5 – When it is complete, install the mongo shell with the following code:

sudo yum install -y mongodb-org-shell

4.6 – To encrypt data in transit, download the CA certificate for Amazon DocumentDB. See the following code:

wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem

4.7 – You are now ready to connect to your Amazon DocumentDB cluster.

5. Connect to DocumentDB Cluster

5.1 – On the Amazon DocumentDB management console, under Clusters, locate your cluster. 

5.2 – Choose the cluster you created by clicking on the cluster identifier 

5.3 – Copy the connection string provided under “Connect to this cluster with the mongo shell”

Omit <insertYourPassword> so that you are prompted for the password by the mongo shell when you connect. This way, you don’t have to type your password in cleartext.

5.4 – Your connection string should look like the following code (see screenshot)

5.5 – When you enter your password and can see the rs0:PRIMARY> prompt, you are successfully connected to your Amazon DocumentDB cluster
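For orientation, the command copied in step 5.3 generally has the shape assembled below. This is only a sketch: the endpoint and username are placeholders rather than values from this walkthrough, buildConnectCommand is a hypothetical helper, and the exact string shown in your console takes precedence. The flags used (--ssl, --host, --sslCAFile, --username) are options of the mongo shell installed in step 4.

```javascript
// Assemble the mongo shell command for a TLS connection to the cluster.
// `endpoint` and `username` are hypothetical placeholders; use your own values.
function buildConnectCommand(endpoint, username, caFile = "rds-combined-ca-bundle.pem") {
  return [
    "mongo",
    "--ssl",
    `--host ${endpoint}:27017`,
    `--sslCAFile ${caFile}`,
    `--username ${username}`,
    // --password is deliberately left out so the shell prompts for it.
  ].join(" ");
}

console.log(buildConnectCommand("<your-cluster-endpoint>", "<your-username>"));
```

Port 27017 matches the inbound rule added to the security group in step 2.8, and the CA file is the bundle downloaded in step 4.6.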

6. Inserting and querying data

6.1 – Now that you are connected to your cluster, you can run a few queries to get familiar with using a document database.

To insert a single document, enter the following code:

db.collection.insert({"hi":"hello"})

You get the following output:

WriteResult({ "nInserted" : 1 })

6.2 – You can read the document that you wrote with the findOne() command (because it only returns a single document). See the following code:

db.collection.findOne()

You get the following output:

{ "_id" : ObjectId("5e401fe56056fda7321fbd67"), "hi" : "hello" }

6.3 – To perform a few more queries, consider the gaming profiles use case. First, insert a few entries into a collection entitled profiles. See the following code:

db.profiles.insertMany([

{ "_id" : 1, "name" : "john", "status": "active", "level": 1, "score":10},

{ "_id" : 2, "name" : "rob", "status": "inactive", "level": 2, "score":50},])

You get the following output:

{ "acknowledged" : true, "insertedIds" : [ 1, 2] }

6.4 – Use the find() command to return all the documents in the profiles collection. See the following code:

db.profiles.find()

You get the following output:

{ "_id" : 1, "name" : "john", "status": "active", "level": 1, "score":10}

{ "_id" : 2, "name" : "rob", "status": "inactive", "level": 2, "score":50}

6.5 – Use a query for a single document using a filter. See the following code.

db.profiles.find({name: "john"})

You get the following output:

{ "_id" : 1, "name" : "john", "status": "active", "level": 1, "score":10}

6.6 – A common use case in gaming is finding the profile for a given user and incrementing a value in that profile. Here we increase the user’s score by 10.

To do that, use the findAndModify command:

db.profiles.findAndModify({

   query: { name: "john", status: "active"},

   update: { $inc: { score: 10 } }})

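You get the following output. By default, findAndModify returns the document as it was before the update, so the score still shows 10 here:

{
      "_id" : 1,
      "name" : "john",
      "status" : "active",
      "level" : 1,
      "score" : 10
}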

6.7 – You can verify the result with the following query:

db.profiles.find({name: "john"})

You get the following output:

{ "_id" : 1, "name" : "john", "status" : "active", "level" : 1, "score" : 20 }
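One detail worth knowing about findAndModify: by default it returns the document as it was before the update, and it returns the updated document only when new: true is passed. The sketch below imitates that behavior in plain JavaScript on an in-memory array. The findAndModify function here is a simplified stand-in written for illustration, not the shell's implementation, and it supports only equality filters and $inc.

```javascript
// Simplified stand-in for findAndModify: equality filter plus $inc only.
function findAndModify(collection, { query, update, new: returnNew = false }) {
  const doc = collection.find((candidate) =>
    Object.entries(query).every(([field, value]) => candidate[field] === value)
  );
  if (!doc) return null;

  const before = { ...doc }; // snapshot taken before the update is applied
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    doc[field] = (doc[field] ?? 0) + amount;
  }
  return returnNew ? { ...doc } : before;
}

const profiles = [
  { _id: 1, name: "john", status: "active", level: 1, score: 10 },
  { _id: 2, name: "rob", status: "inactive", level: 2, score: 50 },
];

const returned = findAndModify(profiles, {
  query: { name: "john", status: "active" },
  update: { $inc: { score: 10 } },
});

console.log(returned.score);    // 10 (the pre-update document)
console.log(profiles[0].score); // 20 (the stored document after the update)
```

This is why a follow-up find() is a useful check: the returned document alone does not show the new value unless new: true is set.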

Conclusion

In this blog, we looked at traditional RDBMS and how NoSQL databases are becoming the ideal choice for enterprises. We explored which options and parameters to consider when selecting the right type of database for a business use case. We also looked at the different types of NoSQL databases and took a deep dive into DocumentDB, its benefits, and how to start working with it. We will discuss other NoSQL databases offered by AWS in our upcoming blogs. Stay tuned for updates on our upcoming blogs on AWS and relevant technologies.

For any further queries, feel free to post your comments. We are happy to help!

Meanwhile …

Keep Exploring -> Keep Learning -> Keep Mastering

This blog is part of our effort towards building a knowledgeable and kick-ass tech community. At Workfall, we strive to provide the best tech and pay opportunities to AWS-certified talents. If you’re looking to work with global clients, build kick-ass products while making big bucks doing so, give it a shot at workfall.com/partner today.
