B6_MetadataCataloging

Archive Metadata (Keyword) Cataloging
This document describes how Metadata (Keyword) Cataloging is done using the Ingest
Manifest. Many Ingest Manifests will be created from FITS files. Please read the Ingest
Manifest Schema and design document before reading this document. WSS is used as the
example throughout, but the approach applies to any file with metadata.
The key to Metadata Ingesting is the KeywordFieldMap view. This is a view on three tables, one
of which has a foreign key on FileType.
KeywordFieldMap Tables
Fig. 1 – ERD for Metadata Processing Static Tables
7/29/2017
Archive MetaData (Keyword) Cataloging
There are 2 catalog lookup tables.
1. FileType – A lookup table that defines the different types of files that can be stored in
the Archive. There are two unique keys in this table which map one-to-one to each other:
FileTypeID (an integer identity field) and fileType (describes the type of data). The
FileTypeID is used in ArchiveFile and FileTypeKeywordField. There must be an entry in
this table before an entry can be written to either of those two tables.
Note: Only the first two columns of the WSS entries are displayed, since that is all that is
needed for WSS metadata cataloging.
FileTypeID  fileDescription
24          WSS WEX
25          WSS MCS
26          WSS WAS
27          WSS OPD
Fig. 1.1 WSS Entries in FileType Table
2. MetaTableInfo – A lookup table listing the order in which the tables should be processed.
E.g. Wss must be processed before WssZipFile since there is a foreign key on WssZipFile
pointing to Wss. This controls the order in which the stored procedures are called. It
also contains multInserts, which the software uses to populate multiple rows in the table,
usually from the different header extensions. This will also be true for Ingest
Manifest TableRow entries; see the Ingest Manifest document. An entry must be in
MetaTableInfo before an entry can be put into KeywordField.
tableName      tableProcessOrder  multInserts  Instrument  productLevels
Wss            1                  0            NULL        NULL
WssOpd         1                  0            NULL        NULL
WssOpdDefocus  2                  1            NULL        NULL
WssVisitMap    2                  1            NULL        NULL
WssZipFile     2                  1            NULL        NULL
Fig. 1.2 WSS Entries in TableHierarchy Table
3. KeywordField – A lookup table that maps a keyword and extension to a field in a table.
This is part of a view, KeywordFieldMap, that Ingest uses to call stored procedures to
populate the metadata tables. This table is populated using the Keyword json files. The
field definitions are as follows:
KeywordFieldID – The primary key, which uniquely maps to the keyword, extName
and tableName. It is a foreign key into the FileTypeKeywordField table.
a. Keyword – Name of the Keyword.
b. extName – Name of the header extension from which to extract the keyword
value.
c. tableName – Name of the database table where the Keyword’s value should be
stored. There must be an entry in TableHierarchy for this tableName.
d. fieldName – Name of the field within a table where this Keyword’s value is
stored.
e. quoteIt – A Boolean indicating whether or not this Keyword’s value should be
quoted when passed as a parameter in the stored procedure call.
f. specialProcess – A function name that should be coded in the Ingest code and
executed there to determine the stored procedure parameter value.
g. dataModelName – The path to this keyword in the json file. Only populated for
keywords listed in a json file. It is not listed in the table below since it is not
used in cataloging keywords.
KeywordFieldID  keyword   extName       tableName      fieldName  quoteIt  specialProcess
220             AP_TYPE   PRIMARY       Wss            ap_type    1        NULL
221             CORR_ID   PRIMARY       Wss            corr_id    1        NULL
222             DATE      PRIMARY       Wss            date       1        CONCAT(' ',TIME)
223             TIME      PRIMARY       Wss            date       1        IGNORE()
224             ENVIRON   PRIMARY       Wss            environ    1        NULL
225             FILECNT   ZIPFILES      Wss            filecnt    0        NULL
226             OPER      PRIMARY       Wss            oper       1        NULL
227             MCS_OP    PRIMARY       Wss            operation  1        NULL
228             WAS_OP    PRIMARY       Wss            operation  1        NULL
229             SEQ_NUM   PRIMARY       Wss            seq_num    1        NULL
230             USER      PRIMARY       Wss            username   1        NULL
231             ZIPNAME   PRIMARY       Wss            zipname    1        NULL
232             APERNAME  PRIMARY       WssOpd         apername   1        NULL
233             CORR_ID   PRIMARY       WssOpd         corr_id    1        NULL
234             DATE-OBS  PRIMARY       WssOpd         date_obs   1        CONCAT(' ',TIME-OBS)
235             TIME-OBS  PRIMARY       WssOpd         date_obs   1        IGNORE()
236             DETECTOR  PRIMARY       WssOpd         detector   1        NULL
237             FILECONT  PRIMARY       WssOpd         filecont   1        NULL
238             FPX       PRIMARY       WssOpd         fpx        0        NULL
239             FPY       PRIMARY       WssOpd         fpy        0        NULL
240             GROUP_ID  PRIMARY       WssOpd         group_id   1        NULL
241             GRP_CNT   PRIMARY       WssOpd         grp_cnt    1        NULL
242             HEXA1Z0   RESULT_PHASE  WssOpd         hexa1z0    0        NULL
243             HEXA1Z1   RESULT_PHASE  WssOpd         hexa1z1    0        NULL
244             HEXA1Z2   RESULT_PHASE  WssOpd         hexa1z2    0        NULL
…
412             HEXGZ8    RESULT_PHASE  WssOpd         hexgz8     0        NULL
413             INSTRUME  PRIMARY       WssOpd         instrume   1        NULL
414             MODE      PRIMARY       WssOpd         mode       1        NULL
415             OBS_ID    PRIMARY       WssOpd         obs_id     1        NULL
416             OP_TYPE   PRIMARY       WssOpd         op_type    1        NULL
417             OPER      PRIMARY       WssOpd         oper       1        NULL
418             RMS_WFE   EXPECTED      WssOpd         rms_wfe_e  0        NULL
419             RMS_WFE   RESULT_PHASE  WssOpd         rms_wfe_r  0        NULL
420             TSTAMP    PRIMARY       WssOpd         tstamp     1        NULL
421             XCONTENT  EXPECTED      WssOpd         xcontent   1        NULL
422             DEFOCUS   RAW_PSF       WssOpdDefocus  defocus    0        NULL
423             EXTNUM    RAW_PSF       WssOpdDefocus  extnum     0        NULL
424             CORR_ID   VISITMAP      WssVisitMap    corr_id    1        NULL
425             VISIT_ID  VISITMAP      WssVisitMap    visit_id   1        NULL
426             NAME      ZIPFILES      WssZipFile     name       1        NULL
427             TSTAMP    ZIPFILES      WssZipFile     tstamp     1        NULL
428             TYPE      ZIPFILES      WssZipFile     type       1        NULL
Fig. 1.3 WSS Entries in KeywordField table
4. FileTypeKeywordField – A mapping table that maps the keywords in KeywordField to a
file type in FileType.
a. FileTypeID – Foreign key into FileType
b. KeywordFieldID – Foreign key into KeywordField
FileTypeID  KeywordFieldID
24          220
25          220
26          220
24          221
25          221
26          221
…           …
27          412
27          413
27          414
27          415
27          416
27          417
27          418
27          419
27          420
27          421
27          422
27          423
26          424
26          425
24          426
25          426
26          426
Fig. 1.4 Partial WSS Entries in FileTypeKeywordField Table
5. KeywordFieldMap – a view that combines all the information in FileType,
MetaTableInfo, KeywordField and FileTypeKeywordField. This is what is used by Ingest
code to populate the metadata tables from the keywords.
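The definition of the view itself is not given here. As a minimal sketch, assuming it is a plain inner join of the four tables on the keys described above, the following Python snippet builds a scaled-down version in an in-memory SQLite database. SQLite, the reduced column set and the tableOrder alias are illustrative assumptions; the sample rows are taken from Fig. 1.1 through Fig. 1.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FileType (FileTypeID INTEGER PRIMARY KEY, fileDescription TEXT UNIQUE);
CREATE TABLE MetaTableInfo (tableName TEXT PRIMARY KEY,
                            tableProcessOrder INTEGER, multInserts INTEGER);
CREATE TABLE KeywordField (
    KeywordFieldID INTEGER PRIMARY KEY, keyword TEXT, extName TEXT,
    tableName TEXT REFERENCES MetaTableInfo(tableName),
    fieldName TEXT, quoteIt INTEGER, specialProcess TEXT);
CREATE TABLE FileTypeKeywordField (
    FileTypeID INTEGER REFERENCES FileType(FileTypeID),
    KeywordFieldID INTEGER REFERENCES KeywordField(KeywordFieldID));

-- Assumed view definition: an inner join of the four tables.
CREATE VIEW KeywordFieldMap AS
SELECT ft.FileTypeID, kf.keyword, kf.extName, kf.tableName, kf.fieldName,
       kf.quoteIt, mti.multInserts, kf.specialProcess,
       mti.tableProcessOrder AS tableOrder
FROM FileTypeKeywordField ftkf
JOIN FileType ft ON ft.FileTypeID = ftkf.FileTypeID
JOIN KeywordField kf ON kf.KeywordFieldID = ftkf.KeywordFieldID
JOIN MetaTableInfo mti ON mti.tableName = kf.tableName;

INSERT INTO FileType VALUES (25, 'WSS MCS'), (26, 'WSS WAS');
INSERT INTO MetaTableInfo VALUES ('Wss', 1, 0);
INSERT INTO KeywordField VALUES
    (227, 'MCS_OP', 'PRIMARY', 'Wss', 'operation', 1, NULL),
    (228, 'WAS_OP', 'PRIMARY', 'Wss', 'operation', 1, NULL);
INSERT INTO FileTypeKeywordField VALUES (25, 227), (26, 228);
""")

# Look up what a WSS WAS file (FileTypeID 26) should catalog.
rows = conn.execute(
    "SELECT keyword, tableName, fieldName, tableOrder "
    "FROM KeywordFieldMap WHERE FileTypeID = ? ORDER BY tableOrder",
    (26,),
).fetchall()
print(rows)
```

Ordering the query by tableOrder is one way Ingest could obtain the tables in the sequence MetaTableInfo prescribes.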
Keyword Cataloging – How it works
There are different ways the keyword cataloging can be implemented. Below is an option to be
considered.
1. Process the Metadata element
   for each HeaderExtension
      get the extName and hduNum attributes
      for each element in HeaderExtension
         if element = "keyword" then
            process the keyword, passing in hduNum (as a Ctr)
         end if
         if element = "TableRow" then
            increment rowCtr
            for each column in TableRow
               process the column, passing in rowCtr
            end loop
         end if
      end loop
   end loop
   loop through the table name and Ctr information, in table order, that was saved while
   processing each keyword
      create the stored procedure call by joining all parameters in the hash
      execute the stored procedure call
   end loop
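The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Metadata element is represented by a hypothetical list of dictionaries rather than the real Ingest Manifest XML, FIELD_MAP is a hand-copied subset of KeywordFieldMap for FileTypeID 26, rowCtr is reset for each HeaderExtension, every value is quoted, and the EXEC prefix is an assumed call syntax. The calls are returned as strings instead of being executed.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the Metadata element of an Ingest Manifest.
metadata = [
    {"extName": "PRIMARY", "hduNum": 0, "elements": [
        ("keyword", "CORR_ID", "R2015032606"),
    ]},
    {"extName": "VISITMAP", "hduNum": 1, "elements": [
        ("TableRow", [("CORR_ID", "R2015032606"), ("VISIT_ID", "V80500016001")]),
        ("TableRow", [("CORR_ID", "R2015032606"), ("VISIT_ID", "V80500051001")]),
    ]},
]

# Assumed subset of KeywordFieldMap for FileTypeID 26:
# (keyword, extName) -> (tableOrder, tableName, fieldName, multInserts)
FIELD_MAP = {
    ("CORR_ID", "PRIMARY"): (1, "Wss", "corr_id", False),
    ("CORR_ID", "VISITMAP"): (2, "WssVisitMap", "corr_id", True),
    ("VISIT_ID", "VISITMAP"): (2, "WssVisitMap", "visit_id", True),
}


def collect_calls(metadata):
    """Walk the manifest and return stored procedure calls in table order."""
    params = defaultdict(list)  # (tableOrder, tableName, ctr) -> ["@fld = value"]

    def process(name, value, ext_name, ctr):
        entry = FIELD_MAP.get((name, ext_name))
        if entry is None:
            return  # keyword is not cataloged for this file type
        order, table, field, mult = entry
        params[(order, table, ctr if mult else 0)].append(f"@{field} = '{value}'")

    for ext in metadata:
        row_ctr = 0  # assumption: rows are counted per HeaderExtension
        for element in ext["elements"]:
            if element[0] == "keyword":
                _, name, value = element
                process(name, value, ext["extName"], ext["hduNum"])
            elif element[0] == "TableRow":
                row_ctr += 1
                for name, value in element[1]:
                    process(name, value, ext["extName"], row_ctr)

    # Sorting the hash keys puts tableOrder first, so Wss precedes WssVisitMap.
    return [f"EXEC spIngest_Insert{table} {', '.join(plist)}"
            for (order, table, ctr), plist in sorted(params.items())]


for call in collect_calls(metadata):
    print(call)
```

Because the hash key starts with the table order, a plain sort of the keys is enough to honor the foreign key dependencies between the tables.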
The following is an option for processing the Keyword/Column elements. The idea is to save the
stored procedure parameters in a hash so they can be easily joined at the end. The key of the
hash is the table order and table name together with either the extCtr or rowCtr. The value of
the hash is a field/value pair in the form: @fld = value.
Do not process the keyword under any of the following circumstances:
   a. There is no entry in KeywordFieldMap for the FileTypeID, keyword name and
      extension name.
   b. The extension name in the Ingest Manifest does not match the extension name
      in KeywordFieldMap for the same keyword.
   c. multInserts is false and the keyword has already been processed.
   d. specialProcess for the entry is IGNORE.
Loop through table name, field name, quoteIt, multInserts and specialProcess from
KeywordFieldMap using FileTypeID, keyword name and extension name.
   Store the keyword by table name to indicate it was processed.
   If specialProcess is not null then make a call to the special process function.
   if quoteIt is true then
      quote the keyword value
   end if
   if multInserts is true then
      The hash key is the table order + table name and the extCtr or rowCtr.
   else
      The hash key is the table order + table name and a zero.
   end if
   Store the parameter information in the hash based on the hash key with the value
   being: @<field> = value (with appropriate quoting).
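A minimal Python sketch of the keyword step described above is given here. The function name, the dictionary layout of the KeywordFieldMap rows and the way "already processed" is tracked (by table name, keyword and extension name) are assumptions for illustration. Special processes other than IGNORE() are left out, and the quoting is naive.

```python
def process_keyword(params, seen, entries, keyword, ext_name, value, ctr):
    """Apply the skip rules, then store '@field = value' under the hash key.

    entries: KeywordFieldMap rows for one FileTypeID, as dictionaries.
    params:  the hash, keyed by (tableOrder, tableName, ctr).
    seen:    set of (tableName, keyword, extName) already processed.
    """
    for e in entries:
        if (e["keyword"], e["extName"]) != (keyword, ext_name):
            continue  # rules a and b: no matching entry for this extension
        if e["specialProcess"] == "IGNORE()":
            continue  # rule d
        marker = (e["tableName"], keyword, ext_name)
        if not e["multInserts"] and marker in seen:
            continue  # rule c
        seen.add(marker)
        text = f"'{value}'" if e["quoteIt"] else str(value)
        key = (e["tableOrder"], e["tableName"], ctr if e["multInserts"] else 0)
        params.setdefault(key, []).append(f"@{e['fieldName']} = {text}")


# Two rows copied from Fig. 1.3, with multInserts and tableOrder from Fig. 1.2.
ENTRIES = [
    {"keyword": "DEFOCUS", "extName": "RAW_PSF", "tableName": "WssOpdDefocus",
     "fieldName": "defocus", "quoteIt": 0, "multInserts": 1,
     "specialProcess": None, "tableOrder": 2},
    {"keyword": "TIME-OBS", "extName": "PRIMARY", "tableName": "WssOpd",
     "fieldName": "date_obs", "quoteIt": 1, "multInserts": 0,
     "specialProcess": "IGNORE()", "tableOrder": 1},
]

params, seen = {}, set()
process_keyword(params, seen, ENTRIES, "DEFOCUS", "RAW_PSF", -3.5, 1)
process_keyword(params, seen, ENTRIES, "DEFOCUS", "RAW_PSF", -1.61321, 2)
process_keyword(params, seen, ENTRIES, "TIME-OBS", "PRIMARY", "01:30:23", 0)
print(params)
```

With multInserts set, each counter value gets its own hash key and therefore its own stored procedure call; the IGNORE() entry contributes nothing.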
Keyword Cataloging – Examples
The metadata tables are populated using stored procedure calls. The stored procedure name is
defined as spIngest_Insert<tableName> with a list of parameters defined as @<field> = value
(with appropriate quoting).
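As a hedged illustration of this naming rule, the Python helper below assembles the call text from a table name and a list of fields. The helper name, the tuple layout, the NULL handling and the doubling of embedded single quotes are assumptions; the document does not specify how Ingest escapes values.

```python
def build_call(table_name, fields):
    """Build 'spIngest_Insert<tableName> @<field> = value, ...'.

    fields: list of (fieldName, value, quoteIt) tuples.
    """
    parts = []
    for field, value, quote_it in fields:
        if value is None:
            text = "NULL"  # assumption: missing values are passed as NULL
        elif quote_it:
            # Double any embedded single quote, the usual SQL escaping.
            text = "'" + str(value).replace("'", "''") + "'"
        else:
            text = str(value)
        parts.append(f"@{field} = {text}")
    return f"spIngest_Insert{table_name} " + ", ".join(parts)


print(build_call("Wss", [("ap_type", "WAS", True), ("filecnt", 18, False)]))
print(build_call("WssOpdDefocus", [("extnum", 1, False), ("defocus", -3.5, False)]))
```

In production code a parameterized call through the database driver would be safer than string assembly; the string form is used here only because it mirrors the description above.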
Example 1: Same field filled by same keyword from different types of files
FileTypeID  Keyword  extName  tableName  fieldName  quoteIt  multInserts  specialProcess  tableOrder
25          AP_TYPE  PRIMARY  Wss        ap_type    1        0            NULL            1
26          AP_TYPE  PRIMARY  Wss        ap_type    1        0            NULL            1
24          AP_TYPE  PRIMARY  Wss        ap_type    1        0            NULL            1
Any file with a file type of WSS WEX, WSS WAS or WSS MCS populates the Wss table, ap_type
field with the AP_TYPE keyword from the PRIMARY header and this value should be quoted in
the stored procedure call. Since tableOrder is 1 this stored procedure should be called before
any stored procedure with a tableOrder of 2.
Example 2: Same field filled by two different keywords from different types of files
FileTypeID  Keyword  extName  tableName  fieldName  quoteIt  multInserts  specialProcess  tableOrder
25          MCS_OP   PRIMARY  Wss        operation  1        0            NULL            1
26          WAS_OP   PRIMARY  Wss        operation  1        0            NULL            1
Take the quoted MCS_OP keyword from the PRIMARY header in the WSS MCS file and put it in
the Wss.operation field for an MCS type of file. Take the quoted WAS_OP keyword from the
PRIMARY header in the WSS WAS file and put it in the same Wss.operation field (but different
row) for a WAS type of file. Since tableOrder is 1 this stored procedure should be called before
any stored procedure with a tableOrder of 2.
Example 3: Same Keyword populating multiple rows in table from same file but different
header extensions
FileTypeID  Keyword  extName  tableName      fieldName  quoteIt  multInserts  specialProcess  tableOrder
27          DEFOCUS  RAW_PSF  WssOpdDefocus  defocus    0        1            NULL            2
27          EXTNUM   RAW_PSF  WssOpdDefocus  extnum     0        1            NULL            2
EXTNUM and DEFOCUS are two examples of keywords that populate multiple rows in a table.
They appear in multiple RAW_PSF headers in the WSS OPD file. One row is added to the
WssOpdDefocus table for each RAW_PSF extension header, holding both keyword values from
that header. So if there are 4 RAW_PSF header extensions there will be 4 rows in
WssOpdDefocus. quoteIt is false since these fields are both numbers; multInserts is true since
a table entry is added for the data from each RAW_PSF header extension. The tableOrder here
is 2, which means in this case that the WssOpd table must be populated before this table can
be populated.
Example 4: Same Keyword in different header extensions within the same file populating
different fields in the same table.
FileTypeID  Keyword  extName       tableName  fieldName  quoteIt  multInserts  specialProcess  tableOrder
27          RMS_WFE  EXPECTED      WssOpd     rms_wfe_e  0        0            NULL            1
27          RMS_WFE  RESULT_PHASE  WssOpd     rms_wfe_r  0        0            NULL            1
RMS_WFE is found in two different extension headers within the same WSS OPD file. We want
the keyword values from both header extensions to populate different fields in the same row in
the same table. The RMS_WFE keyword in the EXPECTED header extension should populate
rms_wfe_e. The RMS_WFE keyword in the RESULT_PHASE header extension should populate
rms_wfe_r. quoteIt is false since the keyword values are numbers and multInserts is false since
both keyword values are going into the same row in the table. TableOrder is 1 so no other
table has to be populated before this table.
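The effect of multInserts being false in this example can be sketched in a few lines of Python. FIELD_MAP and row_params are hypothetical names, and the hash key layout (tableOrder, tableName, 0) follows the option described under "How it works"; the two values are the ones shown in the sample WssOpd contents.

```python
# Assumed KeywordFieldMap subset for FileTypeID 27, keyed by (keyword, extName).
FIELD_MAP = {
    ("RMS_WFE", "EXPECTED"): ("WssOpd", "rms_wfe_e"),
    ("RMS_WFE", "RESULT_PHASE"): ("WssOpd", "rms_wfe_r"),
}


def row_params(header_values):
    """Group parameters by hash key for entries whose multInserts is 0.

    header_values: iterable of (keyword, extName, value).
    Every parameter shares the key (tableOrder 1, tableName, 0), so both
    RMS_WFE values end up in the same stored procedure call (same row).
    """
    row = {}
    for keyword, ext_name, value in header_values:
        table, field = FIELD_MAP[(keyword, ext_name)]
        row.setdefault((1, table, 0), []).append(f"@{field} = {value}")
    return row


result = row_params([
    ("RMS_WFE", "EXPECTED", 0.27134),
    ("RMS_WFE", "RESULT_PHASE", 0.276277),
])
print(result)
```

The extension name is what distinguishes the two entries, which is why the lookup key has to include extName and not only the keyword.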
Example 5: Keywords that need special processing.
FileTypeID  Keyword   extName  tableName  fieldName  quoteIt  multInserts  specialProcess        tableOrder
27          DATE-OBS  PRIMARY  WssOpd     date_obs   1        0            CONCAT(' ',TIME-OBS)  1
27          TIME-OBS  PRIMARY  WssOpd     date_obs   1        0            IGNORE()              1
To date we only have two functions for specialProcess: CONCAT and IGNORE. In this case we
have two separate keywords that we want to concatenate into one database field. The
DATE-OBS keyword has a specialProcess of CONCAT(' ',TIME-OBS), which means take the
DATE-OBS keyword value, append the TIME-OBS keyword value to it with a space between them,
and put the resulting value into the WssOpd.date_obs field. This is a DATETIME2 field, so the
value should be quoted. The TIME-OBS keyword has a specialProcess of IGNORE(), which means
this entry should not be used in creating the stored procedure call. We still have TIME-OBS in
KeywordFieldMap since we want to know which keywords are needed for Ingest Metadata
processing and it’s useful to know into which field the TIME-OBS keyword value was placed.
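One way the two special processes could be evaluated is sketched below in Python. Parsing the specialProcess text with a regular expression is an assumption made for illustration (the text above only says the functions are coded in Ingest), and apply_special_process is a hypothetical name. The sketch returns None for IGNORE(), which a real implementation would need to distinguish from a genuinely missing value.

```python
import re


def apply_special_process(special, value, keywords):
    """Return the parameter value, or None when the entry must be skipped.

    special:  specialProcess text from KeywordFieldMap (or None).
    value:    value of the keyword being processed.
    keywords: keyword name -> value for the same header extension.
    """
    if special is None:
        return value
    if special == "IGNORE()":
        return None
    match = re.fullmatch(r"CONCAT\('(.*)',\s*([\w-]+)\)", special)
    if match:
        separator, other = match.groups()
        return f"{value}{separator}{keywords[other]}"
    raise ValueError(f"unknown specialProcess: {special}")


header = {"DATE-OBS": "2011-05-11", "TIME-OBS": "01:30:23"}
combined = apply_special_process("CONCAT(' ',TIME-OBS)", header["DATE-OBS"], header)
print(combined)
```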
WSS Tables
There are currently 5 WSS tables.
1. Wss – Contains one row for each FITS file archived for types WEX, WAS and MCS. All the
table fields should be filled in, except that WEX does not fill in operation or seq_num. The
fields in this table are primarily filled from the PRIMARY extension, with the exception of
FILECNT, which is in the ZIPFILES extension. This table has a foreign key on ArchiveFile.
Fig 3.3 is not in a MS Word table because there were too many columns.
ArchiveFileID  ap_type  corr_id      userName  environ  operation  seq_num  oper  zipname                           date                         filecnt
466            WAS      R2015032606  kulp      ANA      MIMF       1        F     R2015032606-ANA-kulp-WAS-MIMF-01  2015-03-26 21:02:48.0000000  18
468            WEX      N2015032703  kulp      OPS      NULL       NULL     F     N2015032703-ANA-kulp-WEX.zip      2015-03-27 16:38:57.0000000  6
Fig. 3.3 Sample Entries in Wss table
2. WssZipFile – Contains one row for each row in the ZIPFILES table extension. The column
names are derived from the TTYPEn Keyword in the table extension where the n stands
for the column number, e.g. TTYPE1 = name, TTYPE2 = tstamp, TTYPE3 = type. This table
has a foreign key on Wss.
ArchiveFileID  name                                                                                  tstamp                       type
466            alg_fa01_session_data.xml                                                             2015-03-18 22:50:00.0000000  WAS Session Data
466            jw80500016001_02101_00001_MIRIFULONG_uncal_MiriSloperPipeline.fits                    2015-03-26 19:54:46.0000000  Calibrated Science Image
466            jw80500016001_02101_00001_MIRIFULONG_uncal_MiriSloperPipeline_MiriSpec2Pipeline.fits  2015-03-26 19:54:46.0000000  Calibrated Science Image
466            jw80500051001_02102_00001_MIRIFUSHORT_uncal_MiriSloperPipeline.fits                   2015-03-26 20:12:38.0000000  Calibrated Science Image
466            jw82500001003_02101_00001_NRCA1_uncal_SloperPipeline.fits                             2015-03-26 20:15:03.0000000  Calibrated Science Image
466            jw82500001003_02101_00001_NRCA1_uncal_SloperPipeline_Image2Pipeline.fits              2015-03-26 20:15:02.0000000  Calibrated Science Image
466            jw82500001003_02101_00001_NRCB1_uncal_SloperPipeline.fits                             2015-03-26 20:15:01.0000000  Calibrated Science Image
466            jw82500001003_02101_00001_NRCB1_uncal_SloperPipeline_Image2Pipeline.fits              2015-03-26 20:15:04.0000000  Calibrated Science Image
466            jw84500002001_02101_00001_NRS1_uncal_SloperPipeline.fits                              2015-03-26 20:15:22.0000000  Calibrated Science Image
466            jw87500020001_02101_00001_NIS_uncal_Sloper_Pipeline.fits                              2015-03-26 20:16:01.0000000  Calibrated Science Image
466            jw87500020001_02101_00001_NIS_uncal_SloperPipeline_Spec2Pipeline.fits                 2015-03-26 20:16:01.0000000  Calibrated Science Image
466            MCS_CONFIG_TEST.XML                                                                   2015-01-23 14:47:51.0000000  MCS Configuration Data
466            operation_file.xml                                                                    2015-03-26 20:32:01.0000000  WAS Operation File
466            SimulatedMSDB.xml                                                                     2015-02-23 22:33:27.0000000  Mirror State Database
466            was_activity_log.xml                                                                  2015-03-26 20:32:00.0000000  WAS Activity Log
466            WAS_CONFIG_wss_i_n_t.xml                                                              2015-03-26 20:43:17.0000000  WAS Configuration Data
466            was_status_file.xml                                                                   2015-03-26 20:32:00.0000000  WAS Status File
466            wex_activity_log.xml                                                                  2015-03-26 20:32:01.0000000  WEx Activity Log
468            CorrectionIdDB.xml                                                                    2015-03-27 16:30:46.0000000  Correction Id Database
468            FileAssociations.xml                                                                  2015-03-27 14:13:59.0000000  File Associations
468            Mementos.xml                                                                          2015-03-27 16:38:53.0000000  Mementos File
468            mirrorStateHistory.xml                                                                2015-03-27 16:30:45.0000000  Mirror State History File
468            wavefrontAnalysisHistory.xml                                                          2015-03-26 15:30:03.0000000  Wavefront Analysis History File
468            WssConfig.xml                                                                         2015-03-27 16:29:20.0000000  WSS Configuration File
Fig. 3.4 Sample WssZipFile Contents
3. WssVisitMap – Contains one row for each row in the VISITMAP table extension. The
column names are derived from the TTYPEn keyword in the table extension, where the n
stands for the column number, e.g. TTYPE1 = corr_id, TTYPE2 = visit_id. This table has a
foreign key on Wss.
ArchiveFileID  corr_id      visit_id
466            R2015032606  V80500016001
466            R2015032606  V80500051001
466            R2015032606  V82500001003
466            R2015032606  V84500002001
466            R2015032606  V87500020001
Fig. 3.5 Sample WSSVisitMap Contents
4. WssOpd – Contains 1 row for each OPD FITS file. The keywords in this table are a
combination of the PRIMARY extension plus specific keywords from the other
extensions. For the contents below, only 4 of the hex* keywords are listed. There is an
rms_wfe_r from the RESULT_PHASE extension (keyword=RMS_WFE) and an rms_wfe_e
from the EXPECTED extension (keyword=RMS_WFE). This table has a foreign key on
ArchiveFile. Fig 3.6 is not in a MS Word table because there were too many columns.
Field          Value
ArchiveFileID  464
filecont       Analysis output products
tstamp         2015-04-08 15:50:13.0000000
date_obs       2011-05-11 01:30:23.0000000
obs_id         V12345024003P3456701703202
op_type        WAVEFRONT_MAINTENANCE
mode           AUTOMATIC
oper           F
corr_id        R2015040801
group_id       1
grp_cnt        4
instrume       NIRCam-A
detector       NULL
apername       APERNAME
fpx            0.910331
fpy            -0.674433
rms_wfe_r      0.276277
hexa1z0        -0.0778056
hexa1z1        0.0133364
hexa6z7        0.0751282
hexc1z1        0.0218403
rms_wfe_e      0.27134
xcontent       Optical path difference (in micrometers)
Fig. 3.6 Sample WSSOpd Contents
5. WssOpdDefocus – Contains 1 row for each DEFOCUS keyword found in each RAW_PSF
HeaderExtension. This table has a foreign key on WssOpd.
ArchiveFileID  extNum  defocus
464            1       -3.5
464            2       -1.61321
464            3       -3.5
464            4       3.77359
Fig. 3.7 Sample WSSOpdDefocus Contents
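The TTYPEn rule mentioned for WssZipFile and WssVisitMap can be illustrated with a short Python function. It is a sketch that assumes the table extension header is available as a plain dictionary of keyword to value; column_names is a hypothetical helper, not part of Ingest.

```python
import re


def column_names(header):
    """Return the column names of a table extension, ordered by column number.

    header: dictionary of header keyword -> value; only TTYPEn keys are used.
    """
    columns = {}
    for key, value in header.items():
        match = re.fullmatch(r"TTYPE(\d+)", key)
        if match:
            columns[int(match.group(1))] = value
    return [columns[n] for n in sorted(columns)]


zipfiles_header = {"TFIELDS": 3, "TTYPE2": "tstamp", "TTYPE1": "name", "TTYPE3": "type"}
print(column_names(zipfiles_header))
```

Sorting on the numeric suffix rather than the key text keeps TTYPE10 after TTYPE9 for wider tables.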
File Tracking Table
Only one File Tracking table matters here, because it provides the ArchiveFileID to all the
Wss tables in the design.
1. ArchiveFile – Contains the ArchiveFileID that is used in the primary key for all the WSS
tables.
ArchiveFileID  fileName                               FileTypeID
464            R2015040801-APERNAME-1.FITS            27
465            R2015032606-ANA-kulp-WAS-MIMF-01.zip   26
466            R2015032606-ANA-kulp-WAS-MIMF-01.FITS  26
467            N2015032703-ANA-kulp-WEX.zip           24
468            N2015032703-ANA-kulp-WEX.FITS          24
Fig. 3.8 Sample WSS ArchiveFile Contents