C-Store: A Column-oriented DBMS
Speaker: Zhu Xinjie
Supervisor: Ben Kao
C-Store: A Column-oriented DBMS
• Introduction
• Data model
• RS (read-optimized store)
• WS (writeable store)
• Tuple mover
• Performance comparison
Introduction
• Most existing DBMSs are record-oriented (row-oriented)
storage systems, whose major features are:
  • They store complete tuples of tabular data along with
  auxiliary B-tree indexes on attributes in the table
  • They store values in their native data format
  • They are effective for OLTP-style applications
Introduction
Deficiencies of row-oriented stores:
• Irrelevant attributes are brought into memory when
processing a given query
• Ineffective in read-mostly (ad hoc query) environments,
i.e., not read-optimized
• Shifting data values onto byte or word boundaries in
main memory is expensive
Introduction
• C-Store physically stores a collection of column-oriented
overlapping projections, each sorted on some attributes.
• Data elements are coded into a more compact form
• Query executor operates on the compressed
representation to avoid the cost of decompression.
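A rough illustration of operating on a compressed column without decompressing it. This sketch assumes a simple run-length form of (value, run length) pairs; the function names are hypothetical and are not C-Store operators.

```python
# Minimal sketch: aggregate over a run-length-encoded column directly.
# Each run is (value, run_length); both helper names are illustrative only.

def count_equal(runs, target):
    """Count occurrences of target by adding run lengths, not scanning values."""
    return sum(n for v, n in runs if v == target)

def column_sum(runs):
    """Sum the column as value * run_length per run."""
    return sum(v * n for v, n in runs)

# A column holding eleven 3's followed by seven 4's, stored as two runs.
runs = [(3, 11), (4, 7)]
```

Both helpers touch one entry per run rather than one entry per row, which is where the saving comes from.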
Introduction
• C-Store targets a grid environment of G nodes, each
with private disk and private memory.
• Storing redundant objects in different sort orders
provides higher retrieval performance and high availability
(K-safety)
• Simultaneously achieve very high performance on
queries and reasonable speed on OLTP-style
transactions
Introduction
• Architecture of C-Store:
• Updates and transactions
are sent to WS
• Queries are sent to RS
• Tuple mover moves tuples from WS to RS
Data Model
• C-Store implements only projections.
• Each projection is anchored on a given logical
table T, and contains one or more attributes from
T.
• In addition, a projection may also contain
attributes from other, non-anchor tables.
Data Model
• EMP1, EMP2 and EMP3 are anchored on Table EMP.
DEPT1 is anchored on Table DEPT.
Data Model
• If there are k attributes in a projection, then k data
structures store k columns, respectively, each of which is
sorted on the same sort key (any column or columns).
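A minimal sketch of this layout, assuming rows are given as dictionaries. The helper name build_projection and the sample rows are illustrative only.

```python
# Sketch: store a projection as one list per attribute, with every
# column ordered by the same sort key. build_projection is hypothetical.

def build_projection(rows, attributes, sort_key):
    """Return {attribute: column}, with all columns in sort_key order."""
    ordered = sorted(rows, key=lambda r: tuple(r[a] for a in sort_key))
    return {a: [r[a] for r in ordered] for a in attributes}

# Sample rows (made up for illustration).
emp = [
    {"name": "Bob", "age": 25},
    {"name": "Jill", "age": 24},
    {"name": "Alice", "age": 23},
]

# A two-attribute projection sorted on age: k = 2 columns.
emp1 = build_projection(emp, ["name", "age"], sort_key=["age"])
```

Because both columns share one order, the i-th entry of each column belongs to the same logical row.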
Data Model
• Every projection is horizontally partitioned into one or
more segments identified by a segment identifier Sid.
Data Model
• For every table, there must be a covering set of
projections such that every column is stored in at least
one projection.
• Reconstructing complete rows of tables from the stored
segments requires:
• Storage Key
• Join Indices
Data Model
• Storage Key: each segment associates every data value of
every column with a storage key, SK.
• Values from different columns in the same segment with
matching SKs belong to the same logical row.
• SKs are integers. They are not physically stored in RS,
but are physically stored in WS.
Data Model
• Join Indices: if T1 and T2 are two projections anchored
on a table T, a join index from T1 to T2 is logically
a collection of tables, one per segment of T1, consisting
of rows of the form (s: Sid in T2, k: SK in s)
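A small sketch of following a join index to reconstruct rows. It assumes one segment per projection and storage keys equal to 1-based positions; the data layout and the function name reconstruct are illustrative assumptions.

```python
# Sketch: T1 is sorted on age, T2 on name. Each projection has one
# segment (Sid 1). Storage keys are assumed to be 1-based positions.
t1 = {1: {"age": [23, 24, 25]}}                 # Sid -> columns
t2 = {1: {"name": ["Alice", "Bob", "Jill"]}}    # Sid -> columns

# Join index from T1 to T2: one table per T1 segment. Entry i belongs
# to the T1 row with SK i+1 and holds (s: Sid in T2, k: SK in s).
join_index = {1: [(1, 1), (1, 3), (1, 2)]}

def reconstruct(t1, t2, join_index, sid):
    """Rebuild full rows for one T1 segment by following the join index."""
    segment = t1[sid]
    n_rows = len(next(iter(segment.values())))
    rows = []
    for i in range(n_rows):
        s, k = join_index[sid][i]
        row = {attr: col[i] for attr, col in segment.items()}
        row.update({attr: col[k - 1] for attr, col in t2[s].items()})
        rows.append(row)
    return rows

rows = reconstruct(t1, t2, join_index, sid=1)
```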
RS
• Any segment of any projection is broken into columns,
each of which is stored in order of the sort key for the
projection.
• The choice among four encoding schemes for a column
depends on its ordering (self-order or foreign order) and
the proportion of distinct values it contains.
RS
• Type 1: self-order, few distinct values
The column is represented by a sequence of (v,f,n) triples,
where v is the value, f is the position where v first appears
and n is the number of times v appears, e.g. (4,12,7) means a
group of 4's appears in positions 12, 13, …, 18 of the column.
• Type 2: foreign-order, few distinct values
The column is represented by a sequence of (v,b) pairs, where
v is the value and b is a bitmap indicating the positions
where v appears, e.g. 0,0,1,1,2,1,0,2 can be encoded as
(0,11000010),(1,00110100),(2,00001001).
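The two encodings above can be sketched as follows. The function names are hypothetical, positions are 1-based as in the slide's example, and bitmaps are shown as strings for readability rather than as packed bits.

```python
# Sketch of the Type 1 and Type 2 encodings (names are illustrative).

def encode_type1(column):
    """Encode a sorted column as (v, f, n) triples with 1-based f."""
    triples = []
    for pos, v in enumerate(column, start=1):
        if triples and triples[-1][0] == v:
            val, first, count = triples[-1]
            triples[-1] = (val, first, count + 1)   # extend the current run
        else:
            triples.append((v, pos, 1))             # start a new run at pos
    return triples

def encode_type2(column):
    """Encode a column as (v, bitmap) pairs, one per distinct value."""
    return [
        (v, "".join("1" if x == v else "0" for x in column))
        for v in sorted(set(column))
    ]

type1 = encode_type1([3] * 11 + [4] * 7)
type2 = encode_type2([0, 0, 1, 1, 2, 1, 0, 2])
```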
RS
• Type 3: self-order, many distinct values
Every value is represented as a delta from the previous
one, e.g. 1,4,7,7,8,12 would be represented as 1,3,3,0,1,4.
• Type 4: foreign-order, many distinct values
The values are left unencoded.
• Join indices can be stored as normal columns.
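The Type 3 delta scheme above can be sketched as follows. This is a simplified whole-column version; the function names are hypothetical, and the first entry is kept as the literal starting value, as in the slide's example.

```python
from itertools import accumulate

# Sketch of Type 3 delta encoding (names are illustrative).

def encode_type3(column):
    """Keep the first value, then store each value's delta from the previous one."""
    deltas = []
    previous = 0
    for v in column:
        deltas.append(v - previous)
        previous = v
    return deltas

def decode_type3(deltas):
    """Recover the original column as the running sum of the deltas."""
    return list(accumulate(deltas))

encoded = encode_type3([1, 4, 7, 7, 8, 12])
decoded = decode_type3(encoded)
```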
WS
• Implements the same physical design as RS
• Each column in a WS projection is represented as a
collection of pairs (v,sk), where v is the value and sk is
its corresponding storage key. The pairs are kept in a
B-tree on the second field.
• “Name” is represented as (Alice,1), (Jill,2), (Bob,3)
• “Age” is represented as (23,1), (24,2), (25,3)
WS
• The sort key(s) of each projection are represented by
pairs (s,sk), where s is the sort key value and sk is the
storage key describing where s first appears. The pairs are
kept in a B-tree on the sort key field(s).
• To perform searches, use the latter B-tree to find the
storage keys of interest, then use the former B-tree to find
the other fields in the record.
• The sort key of EMP1 is “age”, so the sort key for EMP1
is represented as (23,1), (24,2), (25,3)
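A minimal sketch of the two-step WS lookup, using the slide's EMP1 data. Plain dictionaries and a sorted list with bisect stand in for the B-trees; that substitution and the function name lookup_by_age are assumptions of this sketch.

```python
import bisect

# Stand-ins for the per-column B-trees keyed on storage key (sk).
name_by_sk = {1: "Alice", 2: "Jill", 3: "Bob"}
age_by_sk = {1: 23, 2: 24, 3: 25}

# Stand-in for the sort-key B-tree: (s, sk) pairs ordered by s.
sort_index = [(23, 1), (24, 2), (25, 3)]

def lookup_by_age(low, high):
    """Find storage keys with low <= age <= high, then fetch the other fields."""
    sort_values = [s for s, _ in sort_index]
    start = bisect.bisect_left(sort_values, low)
    end = bisect.bisect_right(sort_values, high)
    return [(name_by_sk[sk], age_by_sk[sk]) for _, sk in sort_index[start:end]]

matches = lookup_by_age(24, 25)
```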
Tuple Mover
• Create a new RS segment named RS’
• Read in unmarked records (those not marked as deleted)
from the columns of the old RS segment and merge in column
values from WS
• Update any join indices
• Free the disk space used by the old RS segment
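The merge step can be sketched as below, for a single sort key and whole rows rather than separate columns. The function name merge_out, the set of deleted storage keys, and the sample record "Carol" are illustrative assumptions; join index maintenance and freeing disk space are omitted.

```python
import heapq

# Sketch of building RS' from the old RS segment plus WS (names hypothetical).

def merge_out(rs_rows, deleted_sks, ws_rows, sort_key):
    """rs_rows: (sk, row) pairs in sort order; ws_rows: row dicts in any order."""
    # Keep only RS records that are not marked as deleted.
    kept = [row for sk, row in rs_rows if sk not in deleted_sks]
    # Sort the WS records, then merge the two sorted streams.
    incoming = sorted(ws_rows, key=lambda r: r[sort_key])
    merged = heapq.merge(kept, incoming, key=lambda r: r[sort_key])
    # Assign new storage keys 1..n by position in RS'.
    return list(enumerate(merged, start=1))

old_rs = [
    (1, {"name": "Alice", "age": 23}),
    (2, {"name": "Jill", "age": 24}),
    (3, {"name": "Bob", "age": 25}),
]
new_rs = merge_out(old_rs, deleted_sks={2},
                   ws_rows=[{"name": "Carol", "age": 24}], sort_key="age")
```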
Performance Comparison
• Performance analysis is limited to read-only queries
• Results are reported for a single site only
• Experiment data: TPC-H scale_10 totals 60,000,000 line
items (1.8GB)
• Seven queries are run on each system: a commercial
row-store, a commercial column-store and C-Store
Performance Comparison
• Space-constrained case:
Performance Comparison
• Space-unconstrained case:
Conclusion
• A column store representation with an associated query
execution engine
• A hybrid architecture allowing transactions on a column
store
• A focus on economizing storage representation on disk
• A data model consisting of overlapping projections of
tables