L4-2 Mongo

MongoDB
First Light
MongoDB Basics
• MongoDB is a document-based NoSQL database.
– A document is just a JSON object.
– A collection is just a (large) set of documents
– A database is just a set of collections.
• JSON = JavaScript Object Notation
– Actually stored as BSON = binary-encoded JSON
• Mongo shell is a JavaScript interpreter!
– (And I have never coded JavaScript, ahem...)
CS@AU
Henrik B Christensen
An In-between NoSQL
• The ends of the spectrum
– Key-value stores
• Know the key to access opaque blob of anything
• Fire-and-forget (write-and-forget)
– RDB
• Elaborate ad-hoc queries over highly structured data (schema)
• Normalized, meaning ‘lots’ of tables
• Transactions
• MongoDB sits somewhere in the middle
• Documents have elaborate (OO) structure (but not fixed!)
• Rather powerful query language (no joins though)
• From fire-and-forget to ‘acknowledge write on all replicas’
JSON
• Get used to key/value pairs!
• { course: "SAiP", semester: "E12", teacher: "hbc" }
• Basically close to fields of OO languages
– The architectural mismatch between programming
language and DB concepts is lessened!
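A minimal sketch of the point above in plain JavaScript, using the course document from this slide. It only shows that a document reads like an object with fields and that JSON is its textual form; nothing here talks to MongoDB.

```javascript
// The course document from the slide: key/value pairs, much like object fields.
const course = { course: "SAiP", semester: "E12", teacher: "hbc" };

// Fields are read like ordinary object fields.
const label = course.course + " (" + course.semester + ")";

// JSON is the textual form; parsing it gives an equal object back.
const text = JSON.stringify(course);
const copy = JSON.parse(text);

console.log(label);
console.log(text);
console.log(copy.teacher);
```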
Basic commands…
• MongoDB creates objects and collections on the fly…
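As a rough illustration of 'on the fly', here is an in-memory stand-in written in plain JavaScript. The `insert` helper and the `Map`-based database are hypothetical, not the MongoDB API, and the second document is made-up sample data; in the shell the corresponding call would be along the lines of `db.courses.insertOne({...})`.

```javascript
// In-memory stand-in for a database: collection name -> array of documents.
// (Hypothetical helper for illustration, not the MongoDB API.)
const db = new Map();

function insert(database, collectionName, doc) {
  // Create the collection on first use, as MongoDB does on the first insert.
  if (!database.has(collectionName)) database.set(collectionName, []);
  database.get(collectionName).push(doc);
  return doc;
}

// Shell equivalent: db.courses.insertOne({ course: "SAiP", ... })
insert(db, "courses", { course: "SAiP", semester: "E12", teacher: "hbc" });

// A second document with different fields is accepted just the same.
insert(db, "courses", { course: "dSoftArk", teacher: "hbc", students: 120 });

console.log([...db.keys()], db.get("courses").length);
```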
No schema enforced...
Schema: Pro and Con
• A schema can provide a lot of data safety
– Validating data, avoiding hard-to-find bugs in clients, ...
• However, schemas are also costly to migrate
• MongoDB is pretty handy in agile and early
development when the ‘schema’ changes often...
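Without an enforced schema, the validation moves to the client. A small sketch of what that might look like; the list of required fields and the `missingFields` helper are assumptions made for this example only.

```javascript
// Hypothetical client-side check: the required fields are an assumption.
const REQUIRED_FIELDS = ["course", "semester", "teacher"];

function missingFields(doc) {
  // Report every required field that the document does not carry.
  return REQUIRED_FIELDS.filter((field) => !(field in doc));
}

const ok = { course: "SAiP", semester: "E12", teacher: "hbc" };
const typo = { course: "SAiP", semster: "E12", teacher: "hbc" }; // misspelled key

console.log(missingFields(ok));   // nothing missing
console.log(missingFields(typo)); // reports the field hidden by the typo
```

The database would store both documents without complaint; only a check like this one catches the misspelled key.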
find()
• You can formulate simple queries using ‘find()’ on
a collection. Of course, the parameter of find is
– A JSON object!
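The query-by-example idea can be sketched in plain JavaScript: a document matches when every key in the query object has an equal value in the document. The `find` function below is a hypothetical in-memory matcher (plain equality only), and all courses except SAiP are made-up sample data.

```javascript
// Hypothetical in-memory matcher: a document matches when every key in the
// query has an equal value in the document (plain equality only).
function find(collection, query) {
  return collection.filter((doc) =>
    Object.keys(query).every((key) => doc[key] === query[key])
  );
}

const courses = [
  { course: "SAiP", semester: "E12", teacher: "hbc" },
  { course: "dSoftArk", semester: "E12", teacher: "hbc" },
  { course: "Algorithms", semester: "F13", teacher: "xyz" },
];

// Shell equivalent: db.courses.find({ teacher: "hbc", semester: "E12" })
const hits = find(courses, { teacher: "hbc", semester: "E12" });
console.log(hits.map((doc) => doc.course));

// An empty query object matches everything, like db.courses.find({}).
console.log(find(courses, {}).length);
```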
More complex queries
• Regular expressions, and, or...
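A sketch of how such a query document is evaluated, assuming only three rules: several keys in one query are combined with 'and', a regular expression value is tested against the field, and `$or` takes an array of sub-queries. The `matches` function is a hypothetical in-memory evaluator, and the sample data beyond SAiP is made up.

```javascript
// Hypothetical evaluator for a small subset of the query language.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    // $or: at least one of the sub-queries must match.
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    // Regular expression: test it against a string field.
    if (condition instanceof RegExp) {
      return typeof doc[key] === "string" && condition.test(doc[key]);
    }
    // Otherwise: plain equality.
    return doc[key] === condition;
  });
}

const courses = [
  { course: "SAiP", semester: "E12", teacher: "hbc" },
  { course: "dSoftArk", semester: "E12", teacher: "hbc" },
  { course: "Algorithms", semester: "F13", teacher: "xyz" },
];

// Shell equivalent:
//   db.courses.find({ $or: [ { course: /^S/ }, { semester: "F13" } ] })
const either = courses.filter((doc) =>
  matches(doc, { $or: [{ course: /^S/ }, { semester: "F13" }] })
);

// Keys side by side are an implicit 'and':
//   db.courses.find({ teacher: "hbc", $or: [ { course: /^S/ }, { semester: "F13" } ] })
const both = courses.filter((doc) =>
  matches(doc, { teacher: "hbc", $or: [{ course: /^S/ }, { semester: "F13" }] })
);

console.log(either.map((doc) => doc.course), both.map((doc) => doc.course));
```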
Hey – what about updates?
• Update
– 1st argument: the document to find
– 2nd argument: the values to add/set/update
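The two-argument shape can be sketched in memory as follows. The `updateOne` function here is a hypothetical stand-in that supports only `$set` and equality filters; the room name is made-up sample data.

```javascript
// Hypothetical in-memory version of a one-document update with $set.
function updateOne(collection, filter, update) {
  // 1st argument: find the first document matching the filter.
  const doc = collection.find((candidate) =>
    Object.keys(filter).every((key) => candidate[key] === filter[key])
  );
  if (!doc) return 0; // nothing matched
  // 2nd argument: add new fields and overwrite existing ones.
  Object.assign(doc, update.$set || {});
  return 1; // number of documents modified
}

const courses = [{ course: "SAiP", semester: "E12", teacher: "hbc" }];

// Shell equivalent (recent shells):
//   db.courses.updateOne({ course: "SAiP" },
//                        { $set: { semester: "E13", room: "Ada-333" } })
const modified = updateOne(
  courses,
  { course: "SAiP" },
  { $set: { semester: "E13", room: "Ada-333" } }
);

console.log(modified, courses[0]);
```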
Adding more structure
• Now, after I go home you decide to give my talk
grades.
– No new tables, schema, etc.
– We just add more structure, similar to OO
• Ahh – one late grade arrives – just $push it
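A sketch of the idea in plain JavaScript: the grades become an array of sub-objects inside the existing document, and a late grade is appended. The `push` helper is a hypothetical in-memory stand-in for `$push`, and the student names and grades are made up.

```javascript
// Hypothetical in-memory $push: append to an array field, creating it if absent.
function push(doc, field, value) {
  if (doc[field] === undefined) doc[field] = [];
  if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
  doc[field].push(value);
}

const talk = { course: "SAiP", teacher: "hbc" };

// Add more structure: an array of sub-objects, no schema change needed.
talk.grades = [
  { student: "student1", grade: 10 },
  { student: "student2", grade: 7 },
];

// One late grade arrives.
// Shell equivalent:
//   db.courses.updateOne({ course: "SAiP" },
//     { $push: { grades: { student: "student3", grade: 12 } } })
push(talk, "grades", { student: "student3", grade: 12 });

console.log(talk.grades.length);
```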
Or - using SkyCave
RoomRecord like stuff
pretty() is pretty nice
RegExps
Sorting on fields
Bounded result: ‘limit’
Wall exercise?
Adding msg
Players
Now…
• How do we compose the ‘getShortRoomDesc()’?
• SELECT r.desc FROM room r, player p
WHERE p.name = 'Mikkel'
AND p.pos = r.pos
• ???
The NoSQL answer
• The NoSQL answer: Manual references!
– It is client-side responsibility to join
• Find p.pos using query 1; next find r.desc using query 2
– (§4.4.2 in MongoDB manual 3.0.6)
• Exercise
– Why is this the right answer in a NoSQL world?
• Hint: Think 10,000 clients, think CPU cycles – where?
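The two-query pattern can be sketched in plain JavaScript with in-memory arrays standing in for the player and room collections. The field names (name, pos, desc) follow the SQL on the previous slide; the position strings and room descriptions are made-up sample data.

```javascript
// In-memory stand-ins for the two collections (sample data is made up).
const players = [
  { name: "Mikkel", pos: "(0,0,0)" },
  { name: "Mathilde", pos: "(0,1,0)" },
];
const rooms = [
  { pos: "(0,0,0)", desc: "You are standing at the end of a road." },
  { pos: "(0,1,0)", desc: "You are in open forest." },
];

// The client performs the join itself, using two lookups.
function getShortRoomDesc(playerName) {
  // Query 1, shell equivalent: db.player.findOne({ name: playerName })
  const player = players.find((p) => p.name === playerName);
  if (!player) return null;
  // Query 2, shell equivalent: db.room.findOne({ pos: player.pos })
  const room = rooms.find((r) => r.pos === player.pos);
  return room ? room.desc : null;
}

console.log(getShortRoomDesc("Mikkel"));
```

The joining work runs in each client rather than in the shared database server.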
Alternatives
• Solution 2:
– Denormalize / Embedded documents
• But not always possible for complex data structures
• But may actually slow queries down depending on search
patterns
– Searching inside documents is more tedious
• Solution 3:
– DBRefs
• Special MongoDB feature to make it even more SQL-like
MongoDB modeling
Comparing Documents to Tables
Entry on social network site:
Schema
As RDB Schema
• The RDB version
Discussion
• Thus Mongo has less need for joining because
the data model is richer
– Arrays of complex objects
– Sub objects
• Avoids the RDB idioms for modeling OneToMany
relations
• ManyToMany handled by manual references
– Two ‘find()’ instead of one ‘Select’
• And
– Replaces many random reads with fewer sequential reads
Going Large
Durability, Scaling,
Replication and Sharding
Durability
• RDBs guarantee durability
– Once a data update is acknowledged, data is stored
• MongoDB is configurable (write concern)
– Unacknowledged: fire-and-forget
– Acknowledged: the write operation is acknowledged
– Journaled: at least one node will store the data
– Replica acknowledged: at least N replicas have received the write operation
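For the write concern levels on the Durability slide, a sketch of how they might map to write concern documents, using the `w` and `j` fields of MongoDB's write concern specification. The level names and the `writeConcernFor` helper are hypothetical, chosen for this example.

```javascript
// Hypothetical helper mapping a durability level to a write concern document.
// 'w' is the number of nodes that must acknowledge; 'j' requests journaling.
function writeConcernFor(level, replicas = 2) {
  switch (level) {
    case "unacknowledged":
      return { w: 0 };          // fire-and-forget
    case "acknowledged":
      return { w: 1 };          // the primary acknowledges the write
    case "journaled":
      return { w: 1, j: true }; // acknowledged after the journal write
    case "replica":
      return { w: replicas };   // acknowledged by this many nodes
    default:
      throw new Error("unknown level: " + level);
  }
}

// Shell usage would be along the lines of:
//   db.courses.insertOne(doc, { writeConcern: writeConcernFor("journaled") })
console.log(writeConcernFor("journaled"));
console.log(writeConcernFor("replica", 3));
```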
Scaling out
• To get more power/space – just add more...
Replication
• Replica sets
– Primary (handles writes/reads)
– N secondaries (only reads)
– Eventual consistency!
• Failover is automatic
– Secondaries vote
– New primary selected
• Experience: Easy!
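What 'eventual consistency' means for a read from a secondary can be shown with a toy model. This is not MongoDB's replication protocol, just two plain objects and a hypothetical `replicate` step that in a real system happens asynchronously.

```javascript
// Toy model, not MongoDB's protocol: the secondary copies the primary's data later.
const primary = { data: {} };
const secondary = { data: {} };

// Writes always go to the primary.
function write(key, value) {
  primary.data[key] = value;
}

// Reads from a secondary may return stale (or missing) data.
function readFromSecondary(key) {
  return secondary.data[key];
}

// Replication step; in a real system this happens asynchronously.
function replicate() {
  secondary.data = { ...primary.data };
}

write("room", "cave");
const beforeSync = readFromSecondary("room"); // the write has not arrived yet
replicate();
const afterSync = readFromSecondary("room");  // now the secondary has caught up

console.log(beforeSync, afterSync);
```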
Sharding
• Key goals
– No change in the client side API!
• When our EcoSense data grows out of its boxes we do not
have to change our client programs!
– Auto sharding
• You configure your shard key as ranges on your document
keys
– Shard balancing
• Migrates data automatically if one shard grows too large
• Experience: Quite easy 
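Range-based sharding can be sketched as a lookup table from key ranges to shards. The range boundaries, shard names, and the `shardFor` helper are all hypothetical; in MongoDB this routing happens behind the unchanged client API.

```javascript
// Hypothetical range table: each shard owns shard keys in [min, max).
const ranges = [
  { shard: "shard0", min: "", max: "h" },
  { shard: "shard1", min: "h", max: "p" },
  { shard: "shard2", min: "p", max: "\uffff" },
];

// Route a shard key to the shard whose range contains it.
function shardFor(key) {
  const range = ranges.find((r) => key >= r.min && key < r.max);
  if (!range) throw new Error("no shard owns key: " + key);
  return range.shard;
}

console.log(shardFor("anna"), shardFor("mikkel"), shardFor("zoe"));
```

Balancing would then amount to moving a boundary in the range table and migrating the affected documents; the client code that calls the store never sees it.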