Documents stored as collections Indexing for deep properties

SAFE BY DEFAULT, OPTIMIZED FOR EFFICIENCY
RAVENDB
▪ Open source
▪ Document database
▪ Built with C#
▪ Data saved as JSON
▪ Uses Lucene.NET for indexing
▪ Uses Esent for storage
▪ Latest stable version: 3.5.3
▪ Available only for Windows OS and Linux 64 bit
RAVENDB users
And much more…
Fundamentals
▪ No schema
▪ Documents are stored as JSON
{ "Address": "Vinarska", "City": "I love Brno", "PostalCode": 60200 }
▪ The main focus of RavenDB is to allow developers to build high-performance, low-latency applications quickly and efficiently.
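To make the "no schema" point concrete, here is a minimal Python sketch (not RavenDB client code). The second document and its "Country" field are made up for illustration; the point is only that two JSON documents in the same collection need not share the same fields.

```python
import json

# Two documents from the same hypothetical collection. No schema forces
# them to carry the same set of fields.
raw_documents = [
    '{ "Address": "Vinarska", "City": "I love Brno", "PostalCode": 60200 }',
    '{ "Address": "Vinarska", "Country": "Czech Republic" }',  # assumed example
]

documents = [json.loads(raw) for raw in raw_documents]

# A missing field is simply absent; .get() returns None for it.
postal_codes = [doc.get("PostalCode") for doc in documents]
print(postal_codes)
```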
Queries and Indexes
▪ Documents stored as collections
▪ Indexing for deep properties
▪ MapReduce support
Map(k1,v1) → list(k2,v2)
Reduce(k2, list (v2)) → list(v3)
▪ It supports static and ad-hoc indexes
▪ Indexing performed in background
Scaling & Replication
▪ Sharding support
▪ Replication support
▪ Full backup support
MapReduce
▪ MapReduce done as indexes
▪ In RavenDB, MapReduce is defined as an index and is
precalculated in the background
▪ It doesn’t support MapReduce pipeline
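The Map and Reduce signatures shown earlier, and the idea that the result is precalculated like an index, can be sketched in plain Python. This is only an illustration of the concept, not RavenDB's index syntax; the documents, `map_fn`, `reduce_fn` and `build_index` are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical documents: k1 is the document id, v1 is the document body.
documents = {
    "addresses/1": {"City": "Brno", "PostalCode": 60200},
    "addresses/2": {"City": "Brno", "PostalCode": 60300},
    "addresses/3": {"City": "Praha", "PostalCode": 11000},
}

def map_fn(doc_id, doc):
    """Map(k1, v1) -> list(k2, v2): emit one (city, 1) pair per document."""
    return [(doc["City"], 1)]

def reduce_fn(city, counts):
    """Reduce(k2, list(v2)) -> list(v3): collapse the counts into one total."""
    return [sum(counts)]

def build_index(docs):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values)[0] for key, values in grouped.items()}

# The aggregate is computed ahead of time; a query is then a simple lookup.
city_counts = build_index(documents)
print(city_counts["Brno"])
```

Because the totals are computed before any query arrives, answering "how many addresses are in Brno" costs a lookup rather than a scan over all documents.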
Querying
The basic operation on RavenDB
Insertion
• Done on the server side
Querying
Filtering
• Return records that match the given condition
Searching
• Using the WHERE clause to create conditions
Querying
Paging
• Splitting the results into pages and reading one page at a time.
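Paging amounts to skipping a number of results and taking the next few. The sketch below shows the arithmetic in plain Python; `get_page` and the result ids are hypothetical, and this is not the RavenDB client API.

```python
def get_page(results, page_number, page_size):
    """Return one page of results; page_number starts at 0."""
    start = page_number * page_size          # how many results to skip
    return results[start:start + page_size]  # how many results to take

results = [f"doc/{i}" for i in range(1, 8)]  # 7 hypothetical result ids

first_page = get_page(results, 0, 3)
last_page = get_page(results, 2, 3)
print(first_page, last_page)
```

The last page may be shorter than the page size, and a page past the end is simply empty.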
Indexing
▪ Indexes are server-side functions that define which fields (and which
values) documents can be searched on. They are the only way to satisfy
queries in RavenDB.
▪ The whole indexing process is done in the background and is triggered
whenever data is added or changed.
▪ The core of every index is its mapping function, written in a LINQ-like syntax. The
result of such a mapping is converted to a Lucene index entry, which is
persisted for future use to avoid re-indexing each time a query is issued
and to achieve fast response times.
▪ Even when you do not create an index, RavenDB will use one to execute
queries. In general, RavenDB queries involve no O(N) operations: because they
use indexes, queries are O(log N) operations.
Indexing
▪ RavenDB is safe by default and whenever you make a query, the query
optimizer will try to select an appropriate index to use. If there is no such
appropriate index, then the query optimizer will create an index for you.
▪ Map indexes (sometimes referred to as simple indexes) contain one or more
mapping functions that indicate which fields from documents should be
indexed (in other words, which documents can be searched by
which fields).
▪ Multi-map indexes allow you to index data from multiple collections, e.g.
polymorphic data.
Indexing: Map-Reduce Indexes
▪ Map-Reduce indexes allow complex aggregations to be performed in a
two-step process: first by selecting the appropriate records (using the Map
function), then by applying the specified Reduce function to these records to
produce a smaller set of results.
▪ In essence, it is just a way to take a big task and divide it into discrete tasks
that can be done in parallel.
▪ The notion of stale indexes comes from an observation deep in RavenDB's
design: the user should never suffer from assigning the
server big tasks. As far as RavenDB is concerned, it is better to be stale than
offline, so it will return results to queries even if it knows they may
not be as up to date as possible.
▪ A fanout index is an index that outputs multiple index entries for each
document.
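A fanout can be pictured as a map function that returns several entries for a single document. The Python sketch below assumes a hypothetical order document with a list of lines; the field names and `fanout_map` are illustrative only and do not reproduce RavenDB's index definition syntax.

```python
# A hypothetical order document with several order lines.
order = {
    "Id": "orders/1",
    "Lines": [
        {"Product": "products/1", "Quantity": 2},
        {"Product": "products/7", "Quantity": 1},
        {"Product": "products/9", "Quantity": 5},
    ],
}

def fanout_map(doc):
    """Emit one index entry per order line rather than one per document."""
    return [
        {"OrderId": doc["Id"], "Product": line["Product"], "Quantity": line["Quantity"]}
        for line in doc["Lines"]
    ]

entries = fanout_map(order)
print(len(entries))
```

One document produced three index entries here, so the index grows with the number of lines rather than with the number of documents.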
Customizing using sort
▪ Indexes in RavenDB are lexicographically sorted by default, so all queries
return results which are ordered lexicographically. When putting a static
index in RavenDB, you can specify custom sorting requirements to ensure
results are sorted the way you want them to be.
▪ Dates are written to the index in a form which preserves lexicographic order
and is readable by both human and machine (like so: 2011-04-04T11:28:46.0404749+03:00), so they require no user intervention either.
▪ Numerical values, on the other hand, are stored as text and therefore
require the user to specify explicitly which number type is used so that a
correct sorting mechanism is enforced. This is easily done by
declaring the required sorting setup in SortOptions.
▪ With a numeric sort option, values are ordered 1, 2, 3, 11.
▪ With a string sort option, values are ordered 1, 11, 2, 3.
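The two orderings are easy to reproduce in plain Python, which also shows why ISO-style date strings need no special handling. The values below are illustrative; this is not RavenDB code.

```python
values = ["1", "2", "3", "11"]

# Text comparison: "11" sorts before "2" because "1" < "2" character by character.
as_strings = sorted(values)

# Numeric comparison: the same values compared as integers.
as_numbers = sorted(values, key=int)

# Date strings in year-month-day order sort chronologically even as plain text.
dates = ["2011-04-04T11:28:46", "2010-12-31T23:59:59", "2011-01-15T08:00:00"]
sorted_dates = sorted(dates)

print(as_strings, as_numbers, sorted_dates)
```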
Boosting & Analyzers
▪ Boosting is another feature that the Lucene engine provides and RavenDB leverages. It
gives the user the ability to manually tune the relevance level of matching
documents when performing a query.
▪ From the index perspective, we can associate a boosting factor with an index entry;
the higher its value, the more relevant the term will be. To do this, we must
use the Boost extension method from Raven.Client.Linq.Indexing.
▪ The indexes each RavenDB server instance uses to facilitate fast queries are powered
by Lucene, the full-text search engine.
Boosting & Analyzers
▪ Lucene takes a Document, breaks it down into Fields, and then splits all the text in a
Field into tokens (Terms) in a process called Tokenization. Those tokens are what will
be stored in the index and later searched on.
▪ After a successful indexing operation, RavenDB feeds Lucene with each entity from
the results as a Document and marks every property in it as a Field. Every
property then goes through the Tokenization process using an object called a "Lucene
Analyzer", and is finally stored in the index.
Boosting & Analyzers
▪ After the tokenization and analysis process is complete, the resulting tokens are
stored in an index, which is then ready to be searched. Only fields in the final index
projection can be used for searches, and the actual tokens stored for each depend
on how the selected analyzer processed the original text.
▪ Lucene allows storing the original token text for fields, and RavenDB exposes this
feature in the index definition object via Stores.
▪ Lucene offers several out-of-the-box Analyzers, and new ones can be created
easily. Analyzers differ in the way they split the text stream ("tokenize") and
in the way they process those tokens post-tokenization.
Boosting & Analyzers
▪ StandardAnalyzer
▪ StopAnalyzer
▪ SimpleAnalyzer
▪ WhitespaceAnalyzer
▪ KeywordAnalyzer
▪ By default, RavenDB uses a custom analyzer called LowerCaseKeywordAnalyzer for all
content. This implementation behaves like Lucene's KeywordAnalyzer, but it also performs
case normalization by converting all characters to lower case.
▪ In other words, by default, RavenDB stores the entire term as a single token, in lower-case
form. Given any sample text, LowerCaseKeywordAnalyzer will
produce a single token: the whole text converted to lower case.
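As a rough illustration of how analyzers differ, the following Python sketch imitates three of the behaviours described above. The functions are simplified stand-ins written for this example (the letter-based one handles ASCII letters only), not Lucene's or RavenDB's implementations, and the sample sentence is an assumption.

```python
import re

def whitespace_tokens(text):
    """Imitates a whitespace analyzer: split on whitespace, keep case and punctuation."""
    return text.split()

def simple_tokens(text):
    """Imitates a simple analyzer: split on non-letters and lowercase each token."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]

def lowercase_keyword_tokens(text):
    """Imitates the default behaviour: the whole text as one lowercased token."""
    return [text.lower()]

sample = "The Quick brown-fox jumped"  # hypothetical sample text

print(whitespace_tokens(sample))
print(simple_tokens(sample))
print(lowercase_keyword_tokens(sample))
```

With the single-token behaviour, only an exact match on the whole value (ignoring case) finds the document, which is why full-text search fields are usually given a different analyzer.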
Term Vectors & Dynamic fields
▪ A term vector is a representation of a text document as a vector of identifiers that can
be used for similarity searches, information filtering, information retrieval, and
indexing. In RavenDB, features like MoreLikeThis and text highlighting
rely on term vectors to accomplish their purposes.
▪ While strongly typed entities are well processed by LINQ expressions, some
scenarios demand the use of dynamic properties. To support searching in object
graphs that cannot have their entire structure declared upfront, RavenDB exposes
a low-level API for creating fields from within index definitions.
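The idea behind term vectors and similarity search can be sketched with term counts and cosine similarity. This is a generic textbook technique shown in plain Python, not the algorithm RavenDB or Lucene uses for MoreLikeThis; the texts and function names are assumptions for illustration.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a mapping from term to number of occurrences."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 when either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

doc_a = term_vector("ravendb document database")
doc_b = term_vector("ravendb document store")
doc_c = term_vector("lucene analyzer")

similar = cosine_similarity(doc_a, doc_b)    # two of three terms shared
unrelated = cosine_similarity(doc_a, doc_c)  # no terms shared
print(similar, unrelated)
```

Documents that share more terms score closer to 1, which is the intuition behind "more like this" style queries.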
Testing Indexes and Side-by-Side Indexes
▪ A common problem, especially when the data set is big and indexing takes a very
long time, is the need to change the index definition. Each change of
the definition resets the index and starts the indexing process (for this index) from
scratch. In many cases this is fine, but not during development, when you are
shaping the index and need immediate feedback from the server with the results
(or at least partial results).
▪ To resolve this issue, RavenDB provides the ability to test indexes on a limited data
set. This way developers get index results immediately from a limited data set, so
they can proceed with the index creation process without resetting the main index until
the new definition is ready.
Testing Indexes and Side-by-Side Indexes
▪ The side-by-side feature enables you to create a new index that replaces an existing one after
one of the following conditions is met:
• the new index becomes non-stale (non-optional)
• the new index reaches the last indexed etag (at the moment the new side-by-side index was created) of the index that will be replaced (optional)
• a particular date is reached (optional)
