The NoSQL-Relational Comparison Series Continues- Relational SQL vs MongoDB Methods
When we compare relational and NoSQL Systems, one of the critical analyses we have to perform is data access mechanisms. As we’ll learn over the next few articles of this series on data access, the SQL language used by relational database management systems is much more feature rich and powerful than its NoSQL counterpart. This statement isn’t intended to sway readers to relational systems, it is just the author’s evaluation of both systems’ access languages.
The acronym CRUD stands for Create, Read, Update and Delete. They are the operations used to add, retrieve, manipulate and remove data stored in database management systems. Relational databases use an extremely powerful, industry-standard Structured Query Language (SQL) to perform data interaction operations.
Their NoSQL counterparts provide simpler APIs and languages to store and access data. Most of the NoSQL product offerings have unique languages and mechanisms that provide CRUD capabilities. In this installment of our relational to NoSQL comparison series, we’ll compare SQL CRUD operations with MongoDB’s access methods. Our discussion will include data manipulation as well as advanced query constructs that include joins, or lack thereof. In the next article of this series, we’ll take a look at data manipulation and removal.
Relational SQL
Relational’s Structured Query Language, or SQL, is a language specifically designed to interact with data stored in a relational database management system. The language’s foundation is based upon algebra which is a branch of mathematics in which symbols are used to represent numbers or members of a specified set. The symbols represent quantities and general relationships for the set’s members. The set is combined with binary operations that are defined on the set. EF CODD, the father of relational database management systems, first defined relational algebra’s form in 1970. The SQL language can be further decomposed into:
- Data Definition Language (DDL) Used to build schemas and related storage objects in the database management system
- Data Manipulation Language (DML) Statements that allow applications to interact with database data – SELECT, INSERT, UPDATE, DELETE. Also known as CRUD operations
- Data Control Language (DCL) Statements used to control security within the database. Who can access what
Over the years, SQL has become the standard language used to interact with relational databases. It has been adopted as a standard by various organizations that include the American National Standards Institute (ANSI) and the International Standards Organization that is affiliated with the International Electrotechnical Commission (ISO-IEC). The intent of the standardization is to make the language portable between systems. It has become a common language used to interact with all relational database management systems (RDBMS). Since all major RDBMS offerings provide support for the SQL language, developers and DBAs can easily transfer their skills from one relational database to another. We need to use the term “portable” somewhat loosely, in that all database vendors offer product-specific enhancements to the language. Product-specific enhancements are most often not portable between the vendor offerings.
MongoDB Methods
MongoDB provides methods and commands to administer the database environment and interact with stored data. Methods can be accessed from the abstraction layer, and a subset of them are wrappers for a command or set of commands. In this article, we’ll keep the focus on the MongoDB methods that the application layer uses to interact with data stored in a MongoDB database. Before we begin our comparison of MongoDB and SQL query statements, a few definitions and comparisons are in order. Since I’m an Oracle DBA, I’ll use that product as my representative for the relational systems.
Relational (Oracle) | NoSQL (MongoDB) |
Database | Database |
Table | Collection |
Row | Document |
Column | Field |
Index | Index |
Primary Key | _ID field |
Join | Reference or Embedded Document |
Check Constraint | Validation Rules |
Operation | Relational (Oracle) | NoSQL (MongoDB) Method |
Query Data | SELECT | db.collection.FIND() – where collection is the name of the collection being searched |
Insert Data | INSERT | db.collection.insertOne () – inserts single document db.collection.insertMany() – inserts multiple documents db.collection.insert() – inserts one or more documents |
Update Data | UPDATE | db.collection.updateOne() – updates single document db.collection.updateMany() – updates multiple documents db.collection.replaceOne() – replaces a single document db.collection.update() – updates one or more documents |
Delete Data | DELETE | db.collection.deleteOne() – deletes single document db.collection.deleteMany() –deletes multiple documents db.collection.remove() – deletes one or more documents |
Term | Definition |
Projection | Clause that limits the relational columns or MongoDB fields being returned by the query |
Selection | Clause that limits the relational rows or MongoDB documents being returned by the query |
Cursor | Control structure that allows an application to traverse the results returned by a given query |
Let’s take a look at a basic query:
Relational | |
SELECT empno, ename | Projection |
FROM hr. employees | Schema Object |
WHERE salary > 35000 | Selection |
MongoDB | |
db.employees.find( | Schema Object |
{ salary: {$gt: 35000 } } | Selection |
{ empno: 1, ename: 1 } | Projection |
Both Oracle and MongoDB use query operators to enable simple and complex data selection to occur:
Operator | Oracle | MongoDB |
Equal to | = | $eq |
Greater than | > | $gt |
Greater than or equal to | >= | $gte |
Less than | < | $lt |
Less than or equal to | <= | $lte |
Not equal to | != | $ne |
Matches and values in array | IN | $in |
Does not match values in array | NOT IN | $nin |
Matches conditions of either clause | OR | $or |
Matches conditions of both clauses | AND | $and |
Does not match query expression | NOT | $not |
Does not match both query expressions | NOT | $nor |
String pattern searches | LIKE | $regex |
Lack of a value | NULL | NULL* |
Matches documents with specified field | N/A | $exists |
Matches documents with specified type | N/A | $type |
I purposely didn’t include all possible operators as both databases have literally dozens. Oracle, by far, is the clear winner when it comes to the breadth and depth of operators available. This article is intended to be a general comparison of NoSQL and relational systems data access mechanisms, not an in-depth analysis of SQL, NoSQL syntax. I’ll provide a more in-depth comparison in the last article of the Relational vs NoSQL data access series. Here’s an example of sorting the results being returned:
Relational | |
SELECT empno, ename, address | Projection |
FROM hr. employees | Schema Object |
WHERE salary > 35000 | Selection |
ORDER BY empno | Sorting Results (use DESC keyword for descending) |
MongoDB | |
db.employees.find( | Schema Object |
{ salary: {$gt: 35000 } } | Selection |
{ empno: 1, ename: 1 } ) | Projection |
.sort ( { empno: 1 } ) | Sorting Results (use -1 keyword for descending) |
Relational Joins
A SQL join clause combines columns from two or more relational tables. Attributes that are common to both tables are used to relate the objects together. Let’s take a look at an Employee and Department table as an example. Our sample Employee table stores information related to personnel working in our organization. The table stores their employee number, name, job and department identifiers. The DEPTNO column in the employee’s row contains the number of the department that the employee is assigned to. But, there may be other attributes that further describe a given department that we want to store in our database. Departments have a department number, but they also have department name and location attributes.
Employee Table | |
EMPNO | Employee’s Identification Number |
ENAME | Employee’s Name |
JOBNO | Employee’s Job Number |
MGRNO | Employee Manager’s Number |
DEPTNO | Department Employee Works In |
… | Additional Information for Employee |
Department Table | |
DEPTNO | Department Number |
DEPTNAME | Department Name |
DEPTLOC | Location of Department |
… | Additional Information for Department |
When we want to retrieve information from both tables that is related to a given employee, we would use a SQL join operation that combines, or joins, both tables together. SQL supports various join syntaxes. Here’s some of the more common join clauses:
- Inner Join – Join between two tables with an explicit join clause. Returns data based on matching values contained in the columns specified in the join clause
- Left Outer Join – Join between two tables with an explicit join clause but also returns unmatched rows from first table
- Right Outer Join – Join between two tables with an explicit join clause but also returns unmatched rows from second table
The SQL statement below retrieves the names of our employees and the departments they work in. We are joining the two tables together using the DEPTNO column which is an attribute that is common to both tables. Information from both tables will be returned to our application when the DEPTNO column from our Employee table matches the value from the DEPTNO column in our Department table.
SELECT employee.ename, department.deptname
FROM employee
INNER JOIN department
ON employee.deptno = department.deptno
Normalization and Relational Schema Design
Here is where relational schema normalization becomes important. Normalization is the process that has the goal of storing a fact in its most appropriate place. No schema design is perfect, but the goal of normalization is to store a fact in one place to minimize update, delete and insert anomalies. The more times a fact is duplicated in the schema, the more challenging it is to ensure its validity. The graphic below depicts our sample Employee table. In a denormalized model, when we need to change the location (DEPTLOC column) for Department 10 from “Pittsburgh” to “New York,” we are required to find and update every row that has a DEPTNO=10. The application would be required to change all rows of the employee table that have that DEPTNO value, in this case, FOOT and ALLEN. The second table is our newly created Department table. If we move all of the related attributes for a department to a separate table, we are able to more easily ensure their quality. When our applications need to update information related to a department, the applications don’t have to update multiple rows, and oftentimes multiple tables depending on schema design, to ensure all information related to a department is consistent.
MongoDB Embedded Documents, Manual References and DBRefs
MongoDB, like the majority of NoSQL offerings, does not support join operations. However, MongoDB does provide a few alternatives. I’ll provide examples of the first two, which are the most common implementations:
Embedded documents
The data is denormalized. In our example, the employee’s department information would be stored as an embedded document in our Employee document. If any information related to our departments needs to be changed, all employee documents would have to be searched and updated accordingly.
{ empno: 7388,
ename: ‘Foot’,
jobno: 87,
mgrno: 6599,
hiredate: new Date(‘Dec 12, 2015’),
department:
{ deptno: 10,
deptname: ‘sales’,
depltloc: ‘Pittsburgh’
}
}
Manual references Users would save the _ID field of the Department document in the Employee document. The application would execute a second query using the Department’s _ID field to return the related data from the Document collection. The application would have to maintain the _ID fields in the Employee documents programmatically to ensure their validity.
{ empno: 7388,
ename: ‘Foot’,
jobno: 87,
mgrno: 6599,
hiredate: new Date(‘Dec 12, 2015’),
department_id_locator: <object ID of Department 10 Document>
}
DBRefs
DBRefs are references from one document to another using the value of the first document’s _id field, collection name, and, optionally, its database name. By including these names, DBRefs allow documents located in multiple collections to be more easily linked with documents from a single collection. To resolve DBRefs, our application must perform additional queries to return the referenced documents. Some MongoDB drivers have helper methods that form the query for the DBRef automatically.
$LOOKUP (Aggregation Stage)
MongoDB 3.2 introduces the $lookup operator that can now be included as a stage in an aggregation pipeline. $lookup performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
Wrap-up
In the next article of the series, we’ll discuss data manipulation and removal. We’ll finish the data access series with an overview of the strengths and weaknesses of both relational and NoSQL systems. We’ll then move forward with our Relational-NoSQL comparison series by looking at transaction support and locking mechanisms. Thanks for reading!