
Conditions Database
Oracle implementation
Andrea Valassi
(CERN IT-DB)
Largely based on previous presentations by Emil Pilecki
LCG Conditions Database Workshop
08-Dec-2003
Overview
• Oracle implementation developed by Emil Pilecki in IT-DB
– Work started in early 2002 (initial interest mainly in LHCb)
– Production release 0.4.1.6 in August 2002
• Implementation essentially frozen at its August 2002 status
– Emil left the group in late 2002
– No production users before the Harp migration
• Harp condition data migrated from Objy in November 2003
– Only minor ad-hoc changes (by A.V.) with respect to Emil’s 0.4.1.6 version
– Actual data migration using tools derived from Emil’s migration tools
• Emil’s 0.4.1.6 version ready to be re-released for LCG
– Ported to SCRAM and LCG CVS repository
Andrea Valassi IT-DB
Oracle implementation
08-Dec-2003
Implementation choices
• Oracle 9i server
– At CERN: devdb9 (development) or pdb01 (production)
• Relational data model
– Oracle 9i object features not used
• Client access through the OCCI library
– More user-friendly and better suited for C++ than OCI
– Use of OCCI is transparent to users
• Some performance optimization for read access (queries)
– Data insertion not optimized yet
Why relational data model?
• Data model is simpler
• Sufficient for condition data
• Well known and reliable
• Less storage overhead
• Less client-side processing
Relational Design (ERD): folder(set)s, objects, data
• Tables and columns
– Folder_set: # folder_set_id, * name, o description, o attributes, r parent_set_id
– Folder: # folder_id, * name, o description, o attributes, r parent_set_id
– Condition_object: # object_id, * since, * till, * insertion_time, * layer, o description, r folder_id, r data_id
– Condition_data: # data_id, o data_value
• Column markers
– # attribute is part of the primary key
– * attribute cannot be null
– o null value allowed for this attribute
– r attribute is a foreign key
– u attribute is part of a unique constraint
• Relation types shown in the diagram: necessary data relation, possible data relation, one-to-many relation, foreign key that is part of the primary key of its table
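To make the relational model concrete, here is a minimal in-memory sketch in Python of the four tables and of a lookup of the condition object valid at a given time. It is not the Oracle implementation: the class and function names are hypothetical, and it assumes half-open [since, till) validity intervals and that, where several intervals cover the same time, the object on the highest `layer` is the visible one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FolderSet:
    folder_set_id: int                  # primary key
    name: str                           # not null
    parent_set_id: Optional[int]        # FK to FolderSet, None for a root set
    description: Optional[str] = None
    attributes: Optional[str] = None


@dataclass
class Folder:
    folder_id: int                      # primary key
    name: str                           # not null
    parent_set_id: int                  # FK to FolderSet
    description: Optional[str] = None
    attributes: Optional[str] = None


@dataclass
class ConditionData:
    data_id: int                        # primary key
    data_value: Optional[bytes] = None


@dataclass
class ConditionObject:
    object_id: int                      # primary key
    since: int                          # start of validity (inclusive, assumed)
    till: int                           # end of validity (exclusive, assumed)
    insertion_time: int
    layer: int
    folder_id: int                      # FK to Folder
    data_id: int                        # FK to ConditionData
    description: Optional[str] = None


def find_object(objects: List[ConditionObject],
                data: Dict[int, ConditionData],
                folder_id: int,
                t: int) -> Optional[Tuple[ConditionObject, ConditionData]]:
    """Return the object valid at time t in a folder and its data, or None.

    Assumption: among the objects whose [since, till) contains t,
    the one on the highest layer is the visible one.
    """
    candidates = [o for o in objects
                  if o.folder_id == folder_id and o.since <= t < o.till]
    if not candidates:
        return None
    best = max(candidates, key=lambda o: o.layer)
    return best, data[best.data_id]     # join on the data_id foreign key
```

The join on `data_id` mirrors the foreign key from Condition_object to Condition_data; keeping the payload in a separate table lets several objects share one data row.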
Relational Design (ERD): tags
• Tables and columns (same column markers as in the previous diagram)
– Tag: # tag_id, u name, * creation_time, o description
– Object_tag: #r tag_id, #r object_id, * assignment_time
– Folder_tag: #r tag_id, #r folder_id, * assignment_time
– Folder_set: # folder_set_id, * name, o description, o attributes, r parent_set_id
– Folder: # folder_id, * name, o description, o attributes, r parent_set_id
– Condition_object: # object_id, * since, * till, * insertion_time, * layer, o description, r folder_id, r data_id
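A similar sketch for tag resolution, again in Python with hypothetical names: a tag name is mapped to its `tag_id` through the unique `name` column, and the Object_tag rows then select which condition objects belong to the tag. It assumes that the tagged objects of one folder never overlap in time, so at most one matches; Folder_tag is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    tag_id: int                  # primary key
    name: str                    # unique
    creation_time: int
    description: Optional[str] = None


@dataclass
class ObjectTag:
    tag_id: int                  # FK to Tag, part of the primary key
    object_id: int               # FK to ConditionObject, part of the primary key
    assignment_time: int


@dataclass
class ConditionObject:
    object_id: int
    folder_id: int
    since: int                   # inclusive (assumed)
    till: int                    # exclusive (assumed)


def find_tagged_object(tags: List[Tag], object_tags: List[ObjectTag],
                       objects: List[ConditionObject], folder_id: int,
                       tag_name: str, t: int) -> Optional[ConditionObject]:
    """Return the object of folder_id tagged tag_name and valid at t, or None."""
    tag_id = next((tg.tag_id for tg in tags if tg.name == tag_name), None)
    if tag_id is None:
        return None                          # unknown tag
    tagged = {ot.object_id for ot in object_tags if ot.tag_id == tag_id}
    matches = [o for o in objects
               if o.object_id in tagged and o.folder_id == folder_id
               and o.since <= t < o.till]
    # Assumption: tagged intervals of one folder never overlap.
    return matches[0] if matches else None
```

Because tag membership lives in a separate Object_tag table, an object can belong to many tags without duplicating its row in Condition_object.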
Use of materialized views
• Materialized views for data that is frequently accessed:
– Full folder and folder set paths
• Built from hierarchical queries
• Folder_paths: * full_path, * folder_id
• Folder_sets_paths: * full_path, * folder_set_id, * parent_fs_id
– Current HEAD of each folder
• To simplify computation of overlapping intervals on insertion
• To speed up read access to the HEAD intervals
• Heads: * object_id, * since, * till, * insertion_time, * layer, * parent_folder_id
• Limitations
– Update operations are auto-committed
– Rollback and bulk updates not possible
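The full paths stored in Folder_sets_paths and Folder_paths can be derived by walking the parent links, which is what a hierarchical query (for example Oracle's CONNECT BY) does on the server. A minimal Python sketch of the resulting rows, with hypothetical function names, assuming "/" as the path separator, that a folder set with no parent is a root, and that the parent links contain no cycles:

```python
from typing import Dict, Optional, Tuple


def folder_set_paths(folder_sets: Dict[int, Tuple[str, Optional[int]]]) -> Dict[int, str]:
    """Map folder_set_id -> full path, from {id: (name, parent_set_id)}.

    A set whose parent is None is a root (assumption); "/" separates levels.
    """
    paths: Dict[int, str] = {}

    def path_of(fs_id: int) -> str:
        if fs_id not in paths:                 # memoize each set's path
            name, parent = folder_sets[fs_id]
            prefix = "" if parent is None else path_of(parent)
            paths[fs_id] = prefix + "/" + name
        return paths[fs_id]

    for fs_id in folder_sets:
        path_of(fs_id)
    return paths


def folder_paths(folders: Dict[int, Tuple[str, int]],
                 set_paths: Dict[int, str]) -> Dict[int, str]:
    """Map folder_id -> full path, from {id: (name, parent_set_id)}."""
    return {fid: set_paths[parent] + "/" + name
            for fid, (name, parent) in folders.items()}
```

Precomputing these rows once is what makes path lookups cheap at query time; the price is that the view must be refreshed whenever folders or folder sets change.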
Partitioning
• Partitioning by folder
– The object and data tables have a separate partition for each folder
• These tables are also hash-subpartitioned by object id and data id
• Advantages
– Performance enhancements for large databases
• Limitations
– The partitioning schema is hardcoded and the same for all folders
• Too many partitions for simple folders with few rows
• Large storage overhead (each partition is a segment of at least 64 kB)
– No partitioning by time range yet
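The partitioning rule and its storage cost can be sketched as follows. This is an illustration, not the actual DDL: the function names are hypothetical, a plain modulo stands in for Oracle's internal hash function, the number of hash subpartitions is a hypothetical parameter (the slide does not give it), and every subpartition is assumed to allocate its own segment of at least 64 kB.

```python
from typing import Tuple

MIN_SEGMENT_KB = 64  # minimum segment size quoted on the slide


def partition_key(folder_id: int, row_id: int, n_subpartitions: int) -> Tuple[int, int]:
    """Return (partition, subpartition) for a row of the object or data table.

    One partition per folder; rows are spread over hash subpartitions by
    object_id (or data_id). A modulo stands in for Oracle's internal hash.
    """
    return folder_id, row_id % n_subpartitions


def min_storage_kb(n_folders: int, n_subpartitions: int, n_tables: int = 2) -> int:
    """Lower bound on allocated space, assuming one segment per subpartition."""
    return n_folders * n_subpartitions * n_tables * MIN_SEGMENT_KB
```

For example, 100 folders with 8 hash subpartitions in each of the two tables already reserve at least 100 × 8 × 2 × 64 kB = 102,400 kB (about 100 MB), however few rows the folders contain.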
Other implementation features
• User-defined indexes for main columns
– Explicitly used in queries through optimizer hints
• Stored PL/SQL procedures
– Move processing to the server side and reduce network traffic
– Most obvious example: computation of overlapping intervals on insertion
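The slide names the overlap computation but not its algorithm. A minimal sketch in Python rather than PL/SQL, with a hypothetical function name, assuming half-open [since, till) intervals and that a newly inserted object hides, in the HEAD, the parts of older intervals that it overlaps, while non-overlapping remnants stay visible (so an older interval can be split in two). Layer bookkeeping is omitted.

```python
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (since, till, object_id), half-open [since, till)


def insert_on_top(head: List[Interval], since: int, till: int,
                  object_id: int) -> List[Interval]:
    """Return the new HEAD after inserting [since, till) for object_id.

    head must be sorted and non-overlapping. Older intervals are clipped
    where the new one covers them; their remnants stay in the HEAD.
    """
    if since >= till:
        raise ValueError("empty validity interval")
    new_head: List[Interval] = []
    for s, t, oid in head:
        if t <= since or s >= till:      # no overlap: keep as is
            new_head.append((s, t, oid))
            continue
        if s < since:                    # part before the new interval
            new_head.append((s, since, oid))
        if t > till:                     # part after the new interval
            new_head.append((till, t, oid))
    new_head.append((since, till, object_id))
    new_head.sort()
    return new_head
```

For example, inserting [50, 150) over a HEAD made of [0, 100) and [100, 200) leaves [0, 50) and [150, 200) of the older objects visible. Running this loop inside the database avoids shipping the whole HEAD of a folder to the client on every insertion.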
Comments from Harp migration
• A few ad-hoc changes in the implementation
– Specify separate data, index, BLOB tablespaces
– Use tables and packages from the schema of a different user
• Objy to Oracle migration via export/import to/from files
– Standard export/import tools using the API work but are not optimal
– Modified import tool (breaking the API) used for Harp migration
• “Clone” mode: keep the same insertion date and layer number
• Bulk updates to increase insert speed by a factor of 10 (to 600 rows/sec)
• Some of the items on the to-do list
– Reengineer data insertion (and materialized view usage) to use bulk updates
– Reengineer data retrieval for BLOBs to use bulk reads
– Keep track of which objects are system-inserted
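The "clone" mode and bulk updates of the modified import tool can be sketched as below. The names are hypothetical and no Oracle call is made: `flush` stands in for one array-insert round trip, clone mode keeps the exported `insertion_time` and `layer`, and the non-clone branch simply stamps rows with the current time and marks `layer` as `None`, to be recomputed on insertion (an assumption).

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ExportedObject:
    since: int
    till: int
    insertion_time: int
    layer: Optional[int]          # None: to be recomputed on insertion (assumed)
    payload: bytes


def import_objects(rows: Iterable[ExportedObject],
                   flush: Callable[[List[ExportedObject]], None],
                   batch_size: int = 100,
                   clone: bool = True,
                   now: int = 0) -> int:
    """Send rows to flush in batches and return the number of round trips.

    clone=True keeps the exported insertion_time and layer ("clone" mode);
    clone=False stamps the rows with now and leaves the layer unset.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[ExportedObject] = []
    round_trips = 0
    for row in rows:
        if not clone:
            row = replace(row, insertion_time=now, layer=None)
        batch.append(row)
        if len(batch) == batch_size:
            flush(batch)                 # one bulk insert per full batch
            round_trips += 1
            batch = []
    if batch:                            # last, partial batch
        flush(batch)
        round_trips += 1
    return round_trips
```

With batches of 100 rows, 250 objects need 3 round trips instead of 250; the factor of 10 quoted above means roughly 60 rows/sec before bulk updates.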
Platforms
• Red Hat Linux 7.3
– Compilers: gcc 2.95.2 and 2.96
– The OCCI libraries are not yet released for gcc 3.2
• Sun Solaris 5.7 and 5.8
– Compiler: CC Sun WorkShop 6 C++ 5.2
• Windows 2000
– Microsoft Visual Studio 6.0