~~CLOSETOC~~
|<100% 25% - >|
^ \\ DATA ANALYTICS REFERENCE DOCUMENT\\ \\ ^^
^ Document Title:|Document Title|
^ Document No.:|1552560766|
^ Author(s):|Gerhard van der Linde, Rita Raher|
^ Contributor(s):| |
**REVISION HISTORY**
|< 100% 10% - - 10% 17% 10% >|
^ \\ Revision\\ \\ ^\\ Details of Modification(s)^\\ Reason for modification^ \\ Date ^ \\ By ^
| [[:doku.php?id=modules:52553_mongodb&do=revisions|0]] |Draft release|Document description here| 2019/03/14 10:52 | Gerhard van der Linde, Rita Raher |
----
====== Topic 6 - MongoDB I ======
===== Why NoSQL Databases? =====
==== Scalability ====
{{:modules:screenshot_2019-03-12_at_18.22.59.png?600|}}
**Scale Up/Vertically**: means moving the database to a bigger server.
Scale Out/Horizontally
{{:modules:screenshot_2019-03-12_at_18.26.04.png?600|}}
===== Unstructured Data =====
{{:modules:screenshot_2019-03-12_at_18.28.22.png?600|}}
Add on new features later on like email and then twitter etc...
===== MongoDB =====
* Document Database
* Schemaless
* Horizontal Scalability Through sharding((Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.))
* Duplication of data
===== JSON =====
* JSON - JavaScript Object Notation
* Lightweight data-interchange format
* Machine/Human readable
* Language independent
* JSON Structure
* Name/Value pair
* Ordered Lists
===== JSON Datatypes =====
==== Number ====
{
"id" : 1
}
{
"id" : 3.14
}
Note that there is no distinction between integer and floating point numbers.
==== String ====
{
"id" : 1,
"fname" : "John"
}
==== Boolean ====
{
"reg" : "09-G-13"
"hybrid" : false
}
==== Array ====
{
"student" : "G00257854"
"subjects" : ["Databases", "Java", "Mobile Apps"]
}
==== Object Document ====
{
"student" : "G00257854"
"address" : {
"street" : "Castle Street"
"town" : "Athenry"
"county" : "Galway"
}
}
===== JSON USES =====
{{:modules:screenshot_2019-03-12_at_18.47.58.png?600|}}
{{:modules:screenshot_2019-03-12_at_18.48.31.png?600|}}
===== MongoDB, JSON and BSON =====
* JSON object = MongoDB document
* Internally, MongoDB represents JSON documents in binary-encoded format called BSON (Binary JavaScript Object Notation)
* BSON extends JSOM model to provide additional data types as well as indexes.
===== MongoDB Structures =====
**Document** FIXME - slide 12....
A document is record in a MongoDB collection and the basic unit of data in MongoDB.
Documents are analogous to JSON objects or records in an RDBMS.
{{:modules:screenshot_2019-03-12_at_18.55.03.png?600|}}
**Collection**
* A grouping of MongoDB documents.
* Collections are analogous to RDBMS tables.
* A collection exists within a single database.
* Collections do not enforce a schema. Documents within a collection can have different fields.
* Typically, all documents in a collection have a similar or related purpose.
{{:modules:screenshot_2019-03-12_at_18.58.27.png?600|}}
**Database**
A number of databases can be run on a single MongoDB server.
**MongoDB Commands**
* show dbs - Show Databases
* use myDB - Switch to databases named "myDB" (If it doesn't exist, Mongo creates it)
* db - Show current Database.
* show collections - Show collections in the current database
===== MongoDB Rules for creating a Document =====
* Rules for MongoDB documents
* A document must have an _id field. if one is not provided, it will be automatically generated
* The _id cannot be an array
===== Create a document - save() =====
{{:modules:screenshot_2019-03-12_at_19.05.28.png?600|}}
===== Query the database - find() =====
{{:modules:screenshot_2019-03-12_at_19.07.16.png?600|}}
**pretty()**
{{:modules:screenshot_2019-03-12_at_19.07.56.png?600|}}
{{:modules:screenshot_2019-03-12_at_19.09.49.png?600|}}
==== $and ====
{{:modules:screenshot_2019-03-12_at_19.10.35.png?600|}}
==== $or ====
{{:modules:screenshot_2019-03-12_at_19.12.03.png?600|}}
==== $in ====
{{:modules:screenshot_2019-03-12_at_19.12.47.png?600|}}
==== Attribute ====
{{:modules:screenshot_2019-03-12_at_19.13.57.png?600|}}
==== Attribute and age is greater than 20 ====
{{:modules:screenshot_2019-03-12_at_19.14.59.png?600|}}
==== findOne() ====
{{:modules:screenshot_2019-03-12_at_19.15.56.png?600|}}
==== sort() ====
{{:modules:screenshot_2019-03-12_at_19.17.48.png?600|}}
==== MongoDB -_id ====
* As previously described, the document ID (_id) attribute of a mongoDB document is the only mandatory part of a document.
* It can be any value, except an array.
{{:modules:screenshot_2019-03-12_at_19.20.12.png?600|}}
==== more on save() ====
{{:modules:screenshot_2019-03-12_at_19.21.36.png?600|}}
==== insert() ====
* Insert a document or documents into a collection.
{{:modules:screenshot_2019-03-12_at_19.24.06.png?600|}}
{{:modules:screenshot_2019-03-12_at_19.24.59.png?600|}}
==== update() ====
* Modifies an existing document or docents in a collection
* Update (query, update, options)((https://docs.mongodb.com/manual/reference/method/db.collection.update/#db.collection.update))
Does not update Mary
{{:modules:screenshot_2019-03-12_at_19.27.47.png?600|}}
===== $set =====
{{:modules:screenshot_2019-03-12_at_19.28.58.png?600|}}
{{:modules:screenshot_2019-03-12_at_19.29.55.png?600|}}
==== deleteOne() ====
* Removes a single document from a collection
{{:modules:screenshot_2019-03-12_at_19.31.24.png?600|}}
==== deleteMany() ====
{{:modules:screenshot_2019-03-12_at_19.32.28.png?600|}}
==== Operators ====
https://docs.mongodb.com/manual/reference/operator/
==== Update Operators ====
https://docs.mongodb.com/manual/reference/operator/update/
==== Logical Query Operators ====
https://docs.mongodb.com/manual/reference/operator/query-logical/
==== Comparison Query Operators ====
https://docs.mongodb.com/manual/reference/operator/query-comparison/
====== Topic 7 - MongoDB II ======
===== More on find() =====
db.user.find()
{{:modules:screenshot_2019-03-19_at_16.45.15.png?600|}}
To find only documents that have an email attribute and age is greater than 20
db.user.find({$and:[{email: {$exists:true}}, {age:{$gt:20}}]})
{{:modules:screenshot_2019-03-19_at_16.48.22.png?600|}}
===== find(query, projection) =====
db.User.find()
{{:modules:screenshot_2019-03-19_at_16.45.15.png?600|}}
Return only the email attribute of documents where age is greater than 18
db.User.find({age: {$gt: 20}}, {email:1})
{{:modules:screenshot_2019-03-19_at_16.52.24.png?600|}}
Return only the first_name and surname attributes of all documents
db.User.find({}, {_id:false, first_name:1, surname:1})
{{:modules:screenshot_2019-03-19_at_16.55.24.png?600|}}
===== aggregate() =====
* Calculates aggregate values for the data in a collection
* db.collection.aggregate(pipeline, options)
* pipeline __stages__
* pipeline __Operators__
Example
{{:modules:screenshot_2019-03-19_at_16.58.16.png?400|}}
Get the average gpa for all students
db.users.aggregate([{$group:{_id:null, Average{$avg:"$gpa"}}}])
**$group** same as Group by in MYSQL
Result:
{{:modules:screenshot_2019-03-19_at_17.02.00.png?400|}}
Get the Maximum GPA per age group
db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}])
{{:modules:screenshot_2019-03-19_at_17.06.13.png?400|}}
==== To sort: $sort ====
db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}, {$sort:{_id:1}}])
{{:modules:screenshot_2019-03-19_at_17.07.38.png?400|}}
===== Indexing =====
db.user.find()
{{:modules:screenshot_2019-03-19_at_17.09.26.png?400|}}
Return all documents where age is greater than 18
==== Psuedo code example ====
for each document d in 'user'{
if(d.age == 35){
return d;
}
}
* Indexes support the efficient execution of queries in MongoDB.
* Without indexes, MongoDB must perform a collection scan, i.e scan every document in a collection, to select those documents that match the query statement.
* Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form.
* Indexes hold mappings from field values to document locations.
{{:modules:screenshot_2019-03-19_at_17.15.52.png?600|}}
{{:modules:screenshot_2019-03-19_at_17.28.38.png?600|}}
===== getIndexes() =====
* By default the only index on a document is on the _id field.
* To find the indexes on a collection:
db.collection.getIndexes()
Which returns information in the following format, detailing the index field (_id) and the order of the indexes(1 is ascending:-1 is descending):
"key":{
"_id":1
}
===== createIndex() =====
* To create an index on a field other than **_id:**
* **db.collection.createIndex()**
{{:modules:screenshot_2019-03-19_at_18.39.31.png?600|}}
db.user.createIndex({age:1})
===== dropIndex() =====
* To drop an index on a field use:
db.collection.dropIndex()
* To drop the index on the age field we just created use:
db.collection.dropIndex({age:1})
* Note: The index on **_id** cannot be dropped
===== sort() =====
* When a sort() is performed on a field that is not an index, MongoDB will sort the results in memory.
* If the sort() method consumes more than 32MB of memory, MongoDB aborts the sort.
* To avoid this error, create an index supporting the sort operation.
===== __Relationships__ in MongoDB =====
* Modelling relationships between documents
* One-to-One Relationships with Embedded Documents
* One-to-many Relationships with embedded Documents
* One-to-many relationships with document references
===== One-to-One relationships with embedded documents =====
{{:modules:screenshot_2019-03-19_at_17.41.38.png?600|}}
db.student.save({_id:"G00789445",
name: "John",
address:{_id: 100,
town: "Athenry",
county:"Galway"}})
db.student.find({}, {address:1})
{ "_id" : "G00789445",
"address" : {
"_id" : 100,
"town" : "Athenry",
"county" : "Galway"
}
}
* Show only the county field of documents that have an address field.
db.student.find({address:{$exists: true}}, {_id:0, "address.county":1})
**Note:** Observe details in projection part of find, i.e. find(query, projection), {_id:0, "address.county":1}
^{_id:0, "address.county":1}^Meaning^
^_id:0|Do NOT output **_id** field|
^"address.county":1| Only output **county** field|
{{:modules:screenshot_2019-03-19_at_17.49.18.png?600|}}
===== One-to-Many Relationships with Embedded Documents =====
{{:modules:screenshot_2019-03-19_at_17.50.31.png?600|}}
Create the document with the relationships
db.student.save({_id:"G00101224",
name:"Mary",
modules:[{_id:"M100", module:"Databases"},
_id:"M101", module:"Java"}]})
Show the student's **_id** and **module** of all modules taken by student G00101224
db.student.find({_id:"G00101224"}, {"modules.module":1})
projection - only show the **module** of the **modules field**
{"_id":"G00101224", "modules":[{"module":"Databases"},{"module":"Java"}]}
===== One-to-Many relationships with document References =====
In the example the document has only two field, but in reality it can be a very long document with much more information, so it makes sense to use relationships instead.
{{ :modules:screenshot_2019-03-19_at_17.56.54.png?600 |}}
==== with referencing ====
{{ :modules:screenshot_2019-03-19_at_17.58.56.png?600 |}}
//save the modules to the docs collection
db.docs.save({_id:"M100", module:"Databases"})
db.docs.save({_id:"M101", module:"Java"})
//save the students to the docs collection with references to the modules using the module _id fields.
db.docs.save({_id:"G00101224", name:"Mary", modules["M100", "M101"]})
db.docs.save({_id:"G00205266", name:"Sarah", modules["M100", "M101"]})
==== $lookup ====
Using the $lookup //pipeline//... ((https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/))
Similar to a join in MySQL...
Performs a **left outer join** to an //unsharded// collection in the __same database__ to filter in documents from the “joined”
collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the
matching documents from the “joined” collection. The **$lookup** stage passes these reshaped documents to the next stage.
{
$lookup:
{
from: ,
localField: ,
foreignField: ,
as:
**from** - specifies the collection in the same databases to perform the join with.
The from collection cannot be //sharded//.
**localField** - The value to search for.
**foreignField** - The field to search for the value specified by localField.
**as** - The name of the output.
{{:modules:screenshot_2019-03-19_at_18.15.04.png?600|}}
Return all documents including the complete referenced documents
db.docs.aggregate([{$lookup:{from:"docs", localField:"modules", foreignField: "_id", as:"Details"}}])
{{:modules:screenshot_2019-03-19_at_18.18.22.png?600|}}
===== Embedded Documents vs Referenced Documents =====
__Features of embedded Documents__
* Better performance
* Atomic
__Features of Referenced Documents__
* Slower
* No repetition
* More complex relationships
===== MongoDB vs MySQL =====
__Features of MongoDB__
* Huge amounts of data
* Unstructured
* Doesn't really support relationships
__Features of MySQL__
* Very Stable
* Structured
* Integrity