Table of Contents
DATA ANALYTICS REFERENCE DOCUMENT |
|
---|---|
Document Title: | Document Title |
Document No.: | 1552560766 |
Author(s): | Gerhard van der Linde, Rita Raher |
Contributor(s): |
REVISION HISTORY
Revision | Details of Modification(s) | Reason for modification | Date | By |
---|---|---|---|---|
0 | Draft release | Document description here | 2019/03/14 10:52 | Gerhard van der Linde, Rita Raher |
Topic 6 - MongoDB I
Why NoSQL Databases?
Scalability
Unstructured Data
MongoDB
- Document Database
- Schemaless
- Horizontal Scalability Through sharding1)
- Duplication of data
JSON
- JSON - JavaScript Object Notation
- Lightweight data-interchange format
- Machine/Human readable
- Language independent
- JSON Structure
- Name/Value pair
- Ordered Lists
JSON Datatypes
Number
{ "id" : 1 }
{ "id" : 3.14 }
Note that there is no distinction between integer and floating point numbers.
String
{ "id" : 1, "fname" : "John" }
Boolean
{ "reg" : "09-G-13" "hybrid" : false }
Array
{ "student" : "G00257854" "subjects" : ["Databases", "Java", "Mobile Apps"] }
Object Document
{ "student" : "G00257854" "address" : { "street" : "Castle Street" "town" : "Athenry" "county" : "Galway" } }
JSON USES
MongoDB, JSON and BSON
- JSON object = MongoDB document
- Internally, MongoDB represents JSON documents in binary-encoded format called BSON (Binary JavaScript Object Notation)
- BSON extends JSOM model to provide additional data types as well as indexes.
MongoDB Structures
Document - slide 12….
A document is record in a MongoDB collection and the basic unit of data in MongoDB. Documents are analogous to JSON objects or records in an RDBMS.
Collection
- A grouping of MongoDB documents.
- Collections are analogous to RDBMS tables.
- A collection exists within a single database.
- Collections do not enforce a schema. Documents within a collection can have different fields.
- Typically, all documents in a collection have a similar or related purpose.
Database A number of databases can be run on a single MongoDB server.
MongoDB Commands
- show dbs - Show Databases
- use myDB - Switch to databases named “myDB” (If it doesn't exist, Mongo creates it)
- db - Show current Database.
- show collections - Show collections in the current database
MongoDB Rules for creating a Document
- Rules for MongoDB documents
- A document must have an _id field. if one is not provided, it will be automatically generated
- The _id cannot be an array
Create a document - save()
Query the database - find()
$and
$or
$in
Attribute
Attribute and age is greater than 20
findOne()
sort()
MongoDB -_id
- As previously described, the document ID (_id) attribute of a mongoDB document is the only mandatory part of a document.
- It can be any value, except an array.
more on save()
insert()
update()
- Modifies an existing document or docents in a collection
- Update (query, update, options)2)
Does not update Mary
$set
deleteOne()
deleteMany()
Operators
Update Operators
Logical Query Operators
Comparison Query Operators
Topic 7 - MongoDB II
More on find()
db.user.find()
To find only documents that have an email attribute and age is greater than 20
db.user.find({$and:[{email: {$exists:true}}, {age:{$gt:20}}]})
find(query, projection)
db.User.find()
Return only the email attribute of documents where age is greater than 18
db.User.find({age: {$gt: 20}}, {email:1})
Return only the first_name and surname attributes of all documents
db.User.find({}, {_id:false, first_name:1, surname:1})
aggregate()
- Calculates aggregate values for the data in a collection
- db.collection.aggregate(pipeline, options)
- pipeline stages
- pipeline Operators
Example
Get the average gpa for all students
db.users.aggregate([{$group:{_id:null, Average{$avg:"$gpa"}}}])
$group same as Group by in MYSQL
Result:
Get the Maximum GPA per age group
db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}])
To sort: $sort
Indexing
Psuedo code example
for each document d in 'user'{ if(d.age == 35){ return d; } }
- Indexes support the efficient execution of queries in MongoDB.
- Without indexes, MongoDB must perform a collection scan, i.e scan every document in a collection, to select those documents that match the query statement.
- Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form.
- Indexes hold mappings from field values to document locations.
getIndexes()
- By default the only index on a document is on the _id field.
- To find the indexes on a collection:
db.collection.getIndexes()
Which returns information in the following format, detailing the index field (_id) and the order of the indexes(1 is ascending:-1 is descending):
"key":{ "_id":1 }
createIndex()
- To create an index on a field other than _id:
- db.collection.createIndex()
db.user.createIndex({age:1})
dropIndex()
- To drop an index on a field use:
db.collection.dropIndex()
- To drop the index on the age field we just created use:
db.collection.dropIndex({age:1})
- Note: The index on _id cannot be dropped
sort()
- When a sort() is performed on a field that is not an index, MongoDB will sort the results in memory.
- If the sort() method consumes more than 32MB of memory, MongoDB aborts the sort.
- To avoid this error, create an index supporting the sort operation.
__Relationships__ in MongoDB
- Modelling relationships between documents
- One-to-One Relationships with Embedded Documents
- One-to-many Relationships with embedded Documents
- One-to-many relationships with document references
One-to-One relationships with embedded documents
db.student.save({_id:"G00789445", name: "John", address:{_id: 100, town: "Athenry", county:"Galway"}})
db.student.find({}, {address:1})
{ "_id" : "G00789445", "address" : { "_id" : 100, "town" : "Athenry", "county" : "Galway" } }
- Show only the county field of documents that have an address field.
db.student.find({address:{$exists: true}}, {_id:0, "address.county":1})
Note: Observe details in projection part of find, i.e. find(query, projection), {_id:0, “address.county”:1}
{_id:0, “address.county”:1} | Meaning |
---|---|
_id:0 | Do NOT output _id field |
“address.county”:1 | Only output county field |
One-to-Many Relationships with Embedded Documents
Create the document with the relationships
db.student.save({_id:"G00101224", name:"Mary", modules:[{_id:"M100", module:"Databases"}, _id:"M101", module:"Java"}]})
Show the student's _id and module of all modules taken by student G00101224
db.student.find({_id:"G00101224"}, {"modules.module":1})
projection - only show the module of the modules field
{"_id":"G00101224", "modules":[{"module":"Databases"},{"module":"Java"}]}
One-to-Many relationships with document References
In the example the document has only two field, but in reality it can be a very long document with much more information, so it makes sense to use relationships instead.
with referencing
//save the modules to the docs collection db.docs.save({_id:"M100", module:"Databases"}) db.docs.save({_id:"M101", module:"Java"}) //save the students to the docs collection with references to the modules using the module _id fields. db.docs.save({_id:"G00101224", name:"Mary", modules["M100", "M101"]}) db.docs.save({_id:"G00205266", name:"Sarah", modules["M100", "M101"]})
$lookup
Using the $lookup pipeline… 3)
Similar to a join in MySQL…
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The $lookup stage passes these reshaped documents to the next stage.
{ $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } }
from - specifies the collection in the same databases to perform the join with. The from collection cannot be sharded.
localField - The value to search for.
foreignField - The field to search for the value specified by localField.
as - The name of the output.
Return all documents including the complete referenced documents
db.docs.aggregate([{$lookup:{from:"docs", localField:"modules", foreignField: "_id", as:"Details"}}])
Embedded Documents vs Referenced Documents
Features of embedded Documents
- Better performance
- Atomic
Features of Referenced Documents
- Slower
- No repetition
- More complex relationships
MongoDB vs MySQL
Features of MongoDB
- Huge amounts of data
- Unstructured
- Doesn't really support relationships
Features of MySQL
- Very Stable
- Structured
- Integrity