~~CLOSETOC~~ |<100% 25% - >| ^ \\ DATA ANALYTICS REFERENCE DOCUMENT\\ \\ ^^ ^ Document Title:|Document Title| ^ Document No.:|1552560766| ^ Author(s):|Gerhard van der Linde, Rita Raher| ^ Contributor(s):| | **REVISION HISTORY** |< 100% 10% - - 10% 17% 10% >| ^ \\ Revision\\ \\ ^\\ Details of Modification(s)^\\ Reason for modification^ \\ Date ^ \\ By ^ | [[:doku.php?id=modules:52553_mongodb&do=revisions|0]] |Draft release|Document description here| 2019/03/14 10:52 | Gerhard van der Linde, Rita Raher | ---- ====== Topic 6 - MongoDB I ====== ===== Why NoSQL Databases? ===== ==== Scalability ==== {{:modules:screenshot_2019-03-12_at_18.22.59.png?600|}} **Scale Up/Vertically**: means moving the database to a bigger server. Scale Out/Horizontally {{:modules:screenshot_2019-03-12_at_18.26.04.png?600|}} ===== Unstructured Data ===== {{:modules:screenshot_2019-03-12_at_18.28.22.png?600|}} Add on new features later on like email and then twitter etc... ===== MongoDB ===== * Document Database * Schemaless * Horizontal Scalability Through sharding((Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.)) * Duplication of data ===== JSON ===== * JSON - JavaScript Object Notation * Lightweight data-interchange format * Machine/Human readable * Language independent * JSON Structure * Name/Value pair * Ordered Lists ===== JSON Datatypes ===== ==== Number ==== { "id" : 1 } { "id" : 3.14 } Note that there is no distinction between integer and floating point numbers. ==== String ==== { "id" : 1, "fname" : "John" } ==== Boolean ==== { "reg" : "09-G-13" "hybrid" : false } ==== Array ==== { "student" : "G00257854" "subjects" : ["Databases", "Java", "Mobile Apps"] } ==== Object Document ==== { "student" : "G00257854" "address" : { "street" : "Castle Street" "town" : "Athenry" "county" : "Galway" } } ===== JSON USES ===== {{:modules:screenshot_2019-03-12_at_18.47.58.png?600|}} {{:modules:screenshot_2019-03-12_at_18.48.31.png?600|}} ===== MongoDB, JSON and BSON ===== * JSON object = MongoDB document * Internally, MongoDB represents JSON documents in binary-encoded format called BSON (Binary JavaScript Object Notation) * BSON extends JSOM model to provide additional data types as well as indexes. ===== MongoDB Structures ===== **Document** FIXME - slide 12.... A document is record in a MongoDB collection and the basic unit of data in MongoDB. Documents are analogous to JSON objects or records in an RDBMS. {{:modules:screenshot_2019-03-12_at_18.55.03.png?600|}} **Collection** * A grouping of MongoDB documents. * Collections are analogous to RDBMS tables. * A collection exists within a single database. * Collections do not enforce a schema. Documents within a collection can have different fields. * Typically, all documents in a collection have a similar or related purpose. {{:modules:screenshot_2019-03-12_at_18.58.27.png?600|}} **Database** A number of databases can be run on a single MongoDB server. **MongoDB Commands** * show dbs - Show Databases * use myDB - Switch to databases named "myDB" (If it doesn't exist, Mongo creates it) * db - Show current Database. * show collections - Show collections in the current database ===== MongoDB Rules for creating a Document ===== * Rules for MongoDB documents * A document must have an _id field. if one is not provided, it will be automatically generated * The _id cannot be an array ===== Create a document - save() ===== {{:modules:screenshot_2019-03-12_at_19.05.28.png?600|}} ===== Query the database - find() ===== {{:modules:screenshot_2019-03-12_at_19.07.16.png?600|}} **pretty()** {{:modules:screenshot_2019-03-12_at_19.07.56.png?600|}} {{:modules:screenshot_2019-03-12_at_19.09.49.png?600|}} ==== $and ==== {{:modules:screenshot_2019-03-12_at_19.10.35.png?600|}} ==== $or ==== {{:modules:screenshot_2019-03-12_at_19.12.03.png?600|}} ==== $in ==== {{:modules:screenshot_2019-03-12_at_19.12.47.png?600|}} ==== Attribute ==== {{:modules:screenshot_2019-03-12_at_19.13.57.png?600|}} ==== Attribute and age is greater than 20 ==== {{:modules:screenshot_2019-03-12_at_19.14.59.png?600|}} ==== findOne() ==== {{:modules:screenshot_2019-03-12_at_19.15.56.png?600|}} ==== sort() ==== {{:modules:screenshot_2019-03-12_at_19.17.48.png?600|}} ==== MongoDB -_id ==== * As previously described, the document ID (_id) attribute of a mongoDB document is the only mandatory part of a document. * It can be any value, except an array. {{:modules:screenshot_2019-03-12_at_19.20.12.png?600|}} ==== more on save() ==== {{:modules:screenshot_2019-03-12_at_19.21.36.png?600|}} ==== insert() ==== * Insert a document or documents into a collection. {{:modules:screenshot_2019-03-12_at_19.24.06.png?600|}} {{:modules:screenshot_2019-03-12_at_19.24.59.png?600|}} ==== update() ==== * Modifies an existing document or docents in a collection * Update (query, update, options)((https://docs.mongodb.com/manual/reference/method/db.collection.update/#db.collection.update)) Does not update Mary {{:modules:screenshot_2019-03-12_at_19.27.47.png?600|}} ===== $set ===== {{:modules:screenshot_2019-03-12_at_19.28.58.png?600|}} {{:modules:screenshot_2019-03-12_at_19.29.55.png?600|}} ==== deleteOne() ==== * Removes a single document from a collection {{:modules:screenshot_2019-03-12_at_19.31.24.png?600|}} ==== deleteMany() ==== {{:modules:screenshot_2019-03-12_at_19.32.28.png?600|}} ==== Operators ==== https://docs.mongodb.com/manual/reference/operator/ ==== Update Operators ==== https://docs.mongodb.com/manual/reference/operator/update/ ==== Logical Query Operators ==== https://docs.mongodb.com/manual/reference/operator/query-logical/ ==== Comparison Query Operators ==== https://docs.mongodb.com/manual/reference/operator/query-comparison/ ====== Topic 7 - MongoDB II ====== ===== More on find() ===== db.user.find() {{:modules:screenshot_2019-03-19_at_16.45.15.png?600|}} To find only documents that have an email attribute and age is greater than 20 db.user.find({$and:[{email: {$exists:true}}, {age:{$gt:20}}]}) {{:modules:screenshot_2019-03-19_at_16.48.22.png?600|}} ===== find(query, projection) ===== db.User.find() {{:modules:screenshot_2019-03-19_at_16.45.15.png?600|}} Return only the email attribute of documents where age is greater than 18 db.User.find({age: {$gt: 20}}, {email:1}) {{:modules:screenshot_2019-03-19_at_16.52.24.png?600|}} Return only the first_name and surname attributes of all documents db.User.find({}, {_id:false, first_name:1, surname:1}) {{:modules:screenshot_2019-03-19_at_16.55.24.png?600|}} ===== aggregate() ===== * Calculates aggregate values for the data in a collection * db.collection.aggregate(pipeline, options) * pipeline __stages__ * pipeline __Operators__ Example {{:modules:screenshot_2019-03-19_at_16.58.16.png?400|}} Get the average gpa for all students db.users.aggregate([{$group:{_id:null, Average{$avg:"$gpa"}}}]) **$group** same as Group by in MYSQL Result: {{:modules:screenshot_2019-03-19_at_17.02.00.png?400|}} Get the Maximum GPA per age group db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}]) {{:modules:screenshot_2019-03-19_at_17.06.13.png?400|}} ==== To sort: $sort ==== db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}, {$sort:{_id:1}}]) {{:modules:screenshot_2019-03-19_at_17.07.38.png?400|}} ===== Indexing ===== db.user.find() {{:modules:screenshot_2019-03-19_at_17.09.26.png?400|}} Return all documents where age is greater than 18 ==== Psuedo code example ==== for each document d in 'user'{ if(d.age == 35){ return d; } } * Indexes support the efficient execution of queries in MongoDB. * Without indexes, MongoDB must perform a collection scan, i.e scan every document in a collection, to select those documents that match the query statement. * Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form. * Indexes hold mappings from field values to document locations. {{:modules:screenshot_2019-03-19_at_17.15.52.png?600|}} {{:modules:screenshot_2019-03-19_at_17.28.38.png?600|}} ===== getIndexes() ===== * By default the only index on a document is on the _id field. * To find the indexes on a collection: db.collection.getIndexes() Which returns information in the following format, detailing the index field (_id) and the order of the indexes(1 is ascending:-1 is descending): "key":{ "_id":1 } ===== createIndex() ===== * To create an index on a field other than **_id:** * **db.collection.createIndex()** {{:modules:screenshot_2019-03-19_at_18.39.31.png?600|}} db.user.createIndex({age:1}) ===== dropIndex() ===== * To drop an index on a field use: db.collection.dropIndex() * To drop the index on the age field we just created use: db.collection.dropIndex({age:1}) * Note: The index on **_id** cannot be dropped ===== sort() ===== * When a sort() is performed on a field that is not an index, MongoDB will sort the results in memory. * If the sort() method consumes more than 32MB of memory, MongoDB aborts the sort. * To avoid this error, create an index supporting the sort operation. ===== __Relationships__ in MongoDB ===== * Modelling relationships between documents * One-to-One Relationships with Embedded Documents * One-to-many Relationships with embedded Documents * One-to-many relationships with document references ===== One-to-One relationships with embedded documents ===== {{:modules:screenshot_2019-03-19_at_17.41.38.png?600|}} db.student.save({_id:"G00789445", name: "John", address:{_id: 100, town: "Athenry", county:"Galway"}}) db.student.find({}, {address:1}) { "_id" : "G00789445", "address" : { "_id" : 100, "town" : "Athenry", "county" : "Galway" } } * Show only the county field of documents that have an address field. db.student.find({address:{$exists: true}}, {_id:0, "address.county":1}) **Note:** Observe details in projection part of find, i.e. find(query, projection), {_id:0, "address.county":1} ^{_id:0, "address.county":1}^Meaning^ ^_id:0|Do NOT output **_id** field| ^"address.county":1| Only output **county** field| {{:modules:screenshot_2019-03-19_at_17.49.18.png?600|}} ===== One-to-Many Relationships with Embedded Documents ===== {{:modules:screenshot_2019-03-19_at_17.50.31.png?600|}} Create the document with the relationships db.student.save({_id:"G00101224", name:"Mary", modules:[{_id:"M100", module:"Databases"}, _id:"M101", module:"Java"}]}) Show the student's **_id** and **module** of all modules taken by student G00101224 db.student.find({_id:"G00101224"}, {"modules.module":1}) projection - only show the **module** of the **modules field** {"_id":"G00101224", "modules":[{"module":"Databases"},{"module":"Java"}]} ===== One-to-Many relationships with document References ===== In the example the document has only two field, but in reality it can be a very long document with much more information, so it makes sense to use relationships instead. {{ :modules:screenshot_2019-03-19_at_17.56.54.png?600 |}} ==== with referencing ==== {{ :modules:screenshot_2019-03-19_at_17.58.56.png?600 |}} //save the modules to the docs collection db.docs.save({_id:"M100", module:"Databases"}) db.docs.save({_id:"M101", module:"Java"}) //save the students to the docs collection with references to the modules using the module _id fields. db.docs.save({_id:"G00101224", name:"Mary", modules["M100", "M101"]}) db.docs.save({_id:"G00205266", name:"Sarah", modules["M100", "M101"]}) ==== $lookup ==== Using the $lookup //pipeline//... ((https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/)) Similar to a join in MySQL... Performs a **left outer join** to an //unsharded// collection in the __same database__ to filter in documents from the “joined” collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The **$lookup** stage passes these reshaped documents to the next stage. { $lookup: { from: , localField: , foreignField: , as: } } **from** - specifies the collection in the same databases to perform the join with. The from collection cannot be //sharded//. **localField** - The value to search for. **foreignField** - The field to search for the value specified by localField. **as** - The name of the output. {{:modules:screenshot_2019-03-19_at_18.15.04.png?600|}} Return all documents including the complete referenced documents db.docs.aggregate([{$lookup:{from:"docs", localField:"modules", foreignField: "_id", as:"Details"}}]) {{:modules:screenshot_2019-03-19_at_18.18.22.png?600|}} ===== Embedded Documents vs Referenced Documents ===== __Features of embedded Documents__ * Better performance * Atomic __Features of Referenced Documents__ * Slower * No repetition * More complex relationships ===== MongoDB vs MySQL ===== __Features of MongoDB__ * Huge amounts of data * Unstructured * Doesn't really support relationships __Features of MySQL__ * Very Stable * Structured * Integrity