User Tools

Site Tools


modules:52553_mongodb

DATA ANALYTICS REFERENCE DOCUMENT

Document Title:Document Title
Document No.:1552560766
Author(s):Gerhard van der Linde, Rita Raher
Contributor(s):

REVISION HISTORY


Revision


Details of Modification(s)

Reason for modification

Date

By
0 Draft releaseDocument description here 2019/03/14 10:52 Gerhard van der Linde, Rita Raher

Topic 6 - MongoDB I

Why NoSQL Databases?

Scalability

Scale Up/Vertically: means moving the database to a bigger server.

Scale Out/Horizontally

Unstructured Data

Add on new features later on like email and then twitter etc…

MongoDB

  • Document Database
  • Schemaless
  • Horizontal Scalability Through sharding1)
  • Duplication of data

JSON

  • JSON - JavaScript Object Notation
  • Lightweight data-interchange format
  • Machine/Human readable
  • Language independent
  • JSON Structure
    • Name/Value pair
    • Ordered Lists

JSON Datatypes

Number

{
  "id" : 1
}
{
  "id" : 3.14
}

Note that there is no distinction between integer and floating point numbers.

String

{
  "id" : 1,
  "fname" : "John"
}

Boolean

{
  "reg" : "09-G-13"
  "hybrid" : false 
}

Array

{
  "student" : "G00257854"
  "subjects" : ["Databases", "Java", "Mobile Apps"]
}

Object Document

{
  "student" : "G00257854"
  "address" : {
    "street" : "Castle Street"
    "town" : "Athenry"
    "county" : "Galway"
  }
}

JSON USES

MongoDB, JSON and BSON

  • JSON object = MongoDB document
  • Internally, MongoDB represents JSON documents in binary-encoded format called BSON (Binary JavaScript Object Notation)
  • BSON extends JSOM model to provide additional data types as well as indexes.

MongoDB Structures

Document FIXME - slide 12….

A document is record in a MongoDB collection and the basic unit of data in MongoDB. Documents are analogous to JSON objects or records in an RDBMS.

Collection

  • A grouping of MongoDB documents.
  • Collections are analogous to RDBMS tables.
  • A collection exists within a single database.
  • Collections do not enforce a schema. Documents within a collection can have different fields.
  • Typically, all documents in a collection have a similar or related purpose.

Database A number of databases can be run on a single MongoDB server.

MongoDB Commands

  • show dbs - Show Databases
  • use myDB - Switch to databases named “myDB” (If it doesn't exist, Mongo creates it)
  • db - Show current Database.
  • show collections - Show collections in the current database

MongoDB Rules for creating a Document

  • Rules for MongoDB documents
    • A document must have an _id field. if one is not provided, it will be automatically generated
    • The _id cannot be an array

Create a document - save()

Query the database - find()

pretty()

$and

$or

$in

Attribute

Attribute and age is greater than 20

findOne()

sort()

MongoDB -_id

  • As previously described, the document ID (_id) attribute of a mongoDB document is the only mandatory part of a document.
  • It can be any value, except an array.

more on save()

insert()

  • Insert a document or documents into a collection.

update()

  • Modifies an existing document or docents in a collection
  • Update (query, update, options)2)

Does not update Mary

$set

deleteOne()

  • Removes a single document from a collection

deleteMany()

Operators

Update Operators

Logical Query Operators

Comparison Query Operators

Topic 7 - MongoDB II

More on find()

db.user.find()

To find only documents that have an email attribute and age is greater than 20

db.user.find({$and:[{email: {$exists:true}}, {age:{$gt:20}}]})

find(query, projection)

db.User.find()

Return only the email attribute of documents where age is greater than 18

db.User.find({age: {$gt: 20}}, {email:1})

Return only the first_name and surname attributes of all documents

db.User.find({}, {_id:false, first_name:1, surname:1})

aggregate()

  • Calculates aggregate values for the data in a collection
  • db.collection.aggregate(pipeline, options)
    • pipeline stages
    • pipeline Operators

Example

Get the average gpa for all students

db.users.aggregate([{$group:{_id:null, Average{$avg:"$gpa"}}}])

$group same as Group by in MYSQL

Result:

Get the Maximum GPA per age group

db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}])

To sort: $sort

db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}, {$sort:{_id:1}}])

Indexing

db.user.find()

Return all documents where age is greater than 18

Psuedo code example

for each document d in 'user'{
  if(d.age == 35){
    return d;
  }
}
  • Indexes support the efficient execution of queries in MongoDB.
  • Without indexes, MongoDB must perform a collection scan, i.e scan every document in a collection, to select those documents that match the query statement.
  • Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form.
  • Indexes hold mappings from field values to document locations.

getIndexes()

  • By default the only index on a document is on the _id field.
  • To find the indexes on a collection:
db.collection.getIndexes()

Which returns information in the following format, detailing the index field (_id) and the order of the indexes(1 is ascending:-1 is descending):

"key":{
 
  "_id":1
}

createIndex()

  • To create an index on a field other than _id:
  • db.collection.createIndex()

db.user.createIndex({age:1})

dropIndex()

  • To drop an index on a field use:
db.collection.dropIndex()
  • To drop the index on the age field we just created use:
db.collection.dropIndex({age:1})
  • Note: The index on _id cannot be dropped

sort()

  • When a sort() is performed on a field that is not an index, MongoDB will sort the results in memory.
  • If the sort() method consumes more than 32MB of memory, MongoDB aborts the sort.
  • To avoid this error, create an index supporting the sort operation.

__Relationships__ in MongoDB

  • Modelling relationships between documents
    • One-to-One Relationships with Embedded Documents
    • One-to-many Relationships with embedded Documents
    • One-to-many relationships with document references

One-to-One relationships with embedded documents

db.student.save({_id:"G00789445", 
                 name: "John", 
                 address:{_id: 100, 
                          town: "Athenry", 
                          county:"Galway"}})
db.student.find({}, {address:1})
{ "_id" : "G00789445",
  "address" : {
    "_id" : 100,
    "town" : "Athenry",
    "county" : "Galway"
  }
}
  • Show only the county field of documents that have an address field.
db.student.find({address:{$exists: true}}, {_id:0, "address.county":1})                 

Note: Observe details in projection part of find, i.e. find(query, projection), {_id:0, “address.county”:1}

{_id:0, “address.county”:1}Meaning
_id:0Do NOT output _id field
“address.county”:1 Only output county field

One-to-Many Relationships with Embedded Documents

Create the document with the relationships

db.student.save({_id:"G00101224", 
                 name:"Mary", 
                 modules:[{_id:"M100", module:"Databases"},
                           _id:"M101", module:"Java"}]})            

Show the student's _id and module of all modules taken by student G00101224

db.student.find({_id:"G00101224"}, {"modules.module":1})          

projection - only show the module of the modules field

{"_id":"G00101224", "modules":[{"module":"Databases"},{"module":"Java"}]}

One-to-Many relationships with document References

In the example the document has only two field, but in reality it can be a very long document with much more information, so it makes sense to use relationships instead.

with referencing

//save the modules to the docs collection
db.docs.save({_id:"M100", module:"Databases"})      
db.docs.save({_id:"M101", module:"Java"})        
//save the students to the docs collection with references to the modules using the module _id fields.
db.docs.save({_id:"G00101224", name:"Mary", modules["M100", "M101"]})
db.docs.save({_id:"G00205266", name:"Sarah", modules["M100", "M101"]})

$lookup

Using the $lookup pipeline3)

Similar to a join in MySQL…

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The $lookup stage passes these reshaped documents to the next stage.

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from - specifies the collection in the same databases to perform the join with. The from collection cannot be sharded.

localField - The value to search for.

foreignField - The field to search for the value specified by localField.

as - The name of the output.

Return all documents including the complete referenced documents

db.docs.aggregate([{$lookup:{from:"docs", localField:"modules", foreignField: "_id", as:"Details"}}])

Embedded Documents vs Referenced Documents

Features of embedded Documents

  • Better performance
  • Atomic

Features of Referenced Documents

  • Slower
  • No repetition
  • More complex relationships

MongoDB vs MySQL

Features of MongoDB

  • Huge amounts of data
  • Unstructured
  • Doesn't really support relationships

Features of MySQL

  • Very Stable
  • Structured
  • Integrity
1)
Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.
modules/52553_mongodb.txt · Last modified: 2020/06/20 14:39 by 127.0.0.1