Table of Contents


DATA ANALYTICS REFERENCE DOCUMENT

Document Title:Document Title
Document No.:1552560766
Author(s):Gerhard van der Linde, Rita Raher
Contributor(s):

REVISION HISTORY


Revision


Details of Modification(s)

Reason for modification

Date

By
0 Draft releaseDocument description here 2019/03/14 10:52 Gerhard van der Linde, Rita Raher

Topic 6 - MongoDB I

Why NoSQL Databases?

Scalability

Scale Up/Vertically: means moving the database to a bigger server.

Scale Out/Horizontally

Unstructured Data

Add on new features later on like email and then twitter etc…

MongoDB

JSON

JSON Datatypes

Number

{
  "id" : 1
}
{
  "id" : 3.14
}

Note that there is no distinction between integer and floating point numbers.

String

{
  "id" : 1,
  "fname" : "John"
}

Boolean

{
  "reg" : "09-G-13"
  "hybrid" : false 
}

Array

{
  "student" : "G00257854"
  "subjects" : ["Databases", "Java", "Mobile Apps"]
}

Object Document

{
  "student" : "G00257854"
  "address" : {
    "street" : "Castle Street"
    "town" : "Athenry"
    "county" : "Galway"
  }
}

JSON USES

MongoDB, JSON and BSON

MongoDB Structures

Document FIXME - slide 12….

A document is record in a MongoDB collection and the basic unit of data in MongoDB. Documents are analogous to JSON objects or records in an RDBMS.

Collection

Database A number of databases can be run on a single MongoDB server.

MongoDB Commands

MongoDB Rules for creating a Document

Create a document - save()

Query the database - find()

pretty()

$and

$or

$in

Attribute

Attribute and age is greater than 20

findOne()

sort()

MongoDB -_id

more on save()

insert()

update()

Does not update Mary

$set

deleteOne()

deleteMany()

Operators

https://docs.mongodb.com/manual/reference/operator/

Update Operators

https://docs.mongodb.com/manual/reference/operator/update/

Logical Query Operators

https://docs.mongodb.com/manual/reference/operator/query-logical/

Comparison Query Operators

https://docs.mongodb.com/manual/reference/operator/query-comparison/

Topic 7 - MongoDB II

More on find()

db.user.find()

To find only documents that have an email attribute and age is greater than 20

db.user.find({$and:[{email: {$exists:true}}, {age:{$gt:20}}]})

find(query, projection)

db.User.find()

Return only the email attribute of documents where age is greater than 18

db.User.find({age: {$gt: 20}}, {email:1})

Return only the first_name and surname attributes of all documents

db.User.find({}, {_id:false, first_name:1, surname:1})

aggregate()

Example

Get the average gpa for all students

db.users.aggregate([{$group:{_id:null, Average{$avg:"$gpa"}}}])

$group same as Group by in MYSQL

Result:

Get the Maximum GPA per age group

db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}])

To sort: $sort

db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}, {$sort:{_id:1}}])

Indexing

db.user.find()

Return all documents where age is greater than 18

Psuedo code example

for each document d in 'user'{
  if(d.age == 35){
    return d;
  }
}

getIndexes()

db.collection.getIndexes()

Which returns information in the following format, detailing the index field (_id) and the order of the indexes(1 is ascending:-1 is descending):

"key":{
 
  "_id":1
}

createIndex()

db.user.createIndex({age:1})

dropIndex()

db.collection.dropIndex()
db.collection.dropIndex({age:1})

sort()

__Relationships__ in MongoDB

One-to-One relationships with embedded documents

db.student.save({_id:"G00789445", 
                 name: "John", 
                 address:{_id: 100, 
                          town: "Athenry", 
                          county:"Galway"}})
db.student.find({}, {address:1})
{ "_id" : "G00789445",
  "address" : {
    "_id" : 100,
    "town" : "Athenry",
    "county" : "Galway"
  }
}
db.student.find({address:{$exists: true}}, {_id:0, "address.county":1})                 

Note: Observe details in projection part of find, i.e. find(query, projection), {_id:0, “address.county”:1}

{_id:0, “address.county”:1}Meaning
_id:0Do NOT output _id field
“address.county”:1 Only output county field

One-to-Many Relationships with Embedded Documents

Create the document with the relationships

db.student.save({_id:"G00101224", 
                 name:"Mary", 
                 modules:[{_id:"M100", module:"Databases"},
                           _id:"M101", module:"Java"}]})            

Show the student's _id and module of all modules taken by student G00101224

db.student.find({_id:"G00101224"}, {"modules.module":1})          

projection - only show the module of the modules field

{"_id":"G00101224", "modules":[{"module":"Databases"},{"module":"Java"}]}

One-to-Many relationships with document References

In the example the document has only two field, but in reality it can be a very long document with much more information, so it makes sense to use relationships instead.

with referencing

//save the modules to the docs collection
db.docs.save({_id:"M100", module:"Databases"})      
db.docs.save({_id:"M101", module:"Java"})        
//save the students to the docs collection with references to the modules using the module _id fields.
db.docs.save({_id:"G00101224", name:"Mary", modules["M100", "M101"]})
db.docs.save({_id:"G00205266", name:"Sarah", modules["M100", "M101"]})

$lookup

Using the $lookup pipeline3)

Similar to a join in MySQL…

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The $lookup stage passes these reshaped documents to the next stage.

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from - specifies the collection in the same databases to perform the join with. The from collection cannot be sharded.

localField - The value to search for.

foreignField - The field to search for the value specified by localField.

as - The name of the output.

Return all documents including the complete referenced documents

db.docs.aggregate([{$lookup:{from:"docs", localField:"modules", foreignField: "_id", as:"Details"}}])

Embedded Documents vs Referenced Documents

Features of embedded Documents

Features of Referenced Documents

MongoDB vs MySQL

Features of MongoDB

Features of MySQL

1)
Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.