www.visceral.eu

Data format definition focusing on Competition 2 and beyond

Deliverable number: D2.2.2
Dissemination level: Public
Delivery date: 31 October 2013
Status: Final
Author(s): Tomas Salas

This project is supported by the European Commission under the Information and Communication Technologies (ICT) Theme of the 7th Framework Programme for Research and Technological Development. Grant Agreement Number: 318068

Executive Summary

VISCERAL will provide a very large data set of medical images, which will be used for an image retrieval benchmark and for the automated annotation of these images. These data come mostly from electronic health records and were collected in order to provide health care. The original data will have to go through a series of transformations in order to address legal issues and to ensure that they conform to the needs of the benchmarking process.

This deliverable describes the format conventions for the collection, storage, and distribution of data in the VISCERAL project, with a focus on Competition 2. It provides a detailed description of these conventions and of how they should be implemented in the project.

To keep data management overhead to a minimum, the conventions are fixed at the beginning of the project, and the plan is to keep them throughout the project lifetime. The conventions are formulated so that additional information can be appended later if relevant annotation aspects arise. In any case, newer conventions should be backwards compatible, so that existing pipelines can remain unchanged.

Table of Contents

1 Introduction
2 Original data
3 Information classification
4 Final dataset
4.1 Objects included in the final dataset
4.2 Objects excluded from final dataset
4.3 Transformations performed on the final data set
4.4 DICOM headers
4.5 Metadata
4.6 Pixel data
4.7 Quality control
4.7.1 Conformity with DICOM standard
4.7.2 Information within the study
4.8 Remaining re-identification risks
5 Conclusion
6 References

List of Abbreviations

DICOM  Digital Imaging and Communications in Medicine
MRI    Magnetic Resonance Imaging
CT     Computed Tomography
RSNA   Radiological Society of North America
HIPAA  Health Insurance Portability and Accountability Act
CDA    Clinical Document Architecture
PDF    Portable Document Format

1 Introduction

Images provided by GENCAT to the VISCERAL project are a subset of the imaging studies stored in its central medical imaging archive. This information consists solely of DICOM[1] objects.

This deliverable covers the main characteristics of the original data, the criteria used to obtain a subset of this data, the transformations performed on the final data set to meet legal data protection requirements, and a detailed description of the final dataset.

Prior to their distribution, the original DICOM objects and images have to be processed in order to meet specific project requirements, such as ensuring data privacy or providing basic metadata that allows targeted extractions.

2 Original data

Data to be provided to the VISCERAL project will be a subset obtained from a central medical imaging archive containing more than 8,000,000 procedures. It is important to note that the system contains only DICOM objects, and will provide only images (pixel data) and some metadata associated with them.

The current system is designed to grant authorised professionals access to the information they need in order to provide healthcare. While it provides comprehensive information about a single patient, it does not allow exploitation for research purposes. Prior to exploitation, pre-processing, data analysis and modelling of the dataset are needed in order to obtain a subset which addresses VISCERAL needs.
The main issues to solve through this process are:
- The system contains personal health data (patient information) and may also include non-health personal data (data identifying healthcare professionals, patients' relatives, etc.).
- The original system only provides aggregated information about the modality performing the study, with no information about the procedure that has been performed.

Considering this, the goals of the pre-processing are:
- Make the original information as close to anonymous as possible.
- Obtain from the DICOM headers information that helps to create a dataset according to VISCERAL needs: patient's sex and age, body part examined and, optionally, some additional details about the performed procedure (anatomical focus, reason for study, etc.).

3 Information classification

As a result of the processing carried out prior to data extraction, the information provided to classify imaging procedures will be:
- Modality
- Patient's age
- Patient's sex
- Body Region
- Anatomical Focus (optional)
- Modality modifier (optional)

Body Region, Anatomical Focus, and Modality modifier have been obtained from the Study Description, which is, in most cases, the description of the procedure requested from the imaging service.

RadLex Playbook has been used for standardized classification of the information obtained from the Study Description. RadLex Playbook[2] is a component of the RadLex controlled terminology that provides a standard lexicon for radiology orderables. The RadLex terminology has been developed and is maintained by the Radiological Society of North America.

The final data set will be created on demand, according to the available modalities, body regions, anatomical focuses and modality modifiers. The information available at the time of writing this document is as follows.
MRI

Body Region        Anatomic Focus
Abdomen            Gastrointestinal Tract, Kidney, Liver, Pancreas
Bone               -
Breast             -
Cervical Spine     Cervical
Chest              Chest Wall, Heart, Mediastinum, Pulmonary Veins, Ribs, Sternoclavicular Joint, Thoracic
Face               Maxillofacial, Parotid Gland
Head               Brain, Internal Auditory Canal, Paranasal Sinuses, Pituitary Gland, Sella Turcica
Lower Extremity    Ankle, Femur, Fingers, Foot, Knee, Leg, Thigh
Lumbar Spine       Lumbar
Lumbosacral Spine  Lumbar
Neck               -
Pelvis             Hip, Prostate, Rectum, Sacrum
Spine              -
Thoracic Spine     Thoracic
Trunk              -
Upper Extremity    Arm, Carpal Bone, Elbow, Fingers, Forearm, Hand, Humerus, Shoulder, Wrist

Modality Modifier (MRI): Angiography, Arthrography, Cholangiography, Colonography, Enterography, Total

CT

Body Region        Anatomic Focus
Abdomen            Gastrointestinal Tract, Kidney, Liver, Pancreas, Peritoneum
Bone               -
Cervical Spine     Cervical
Chest              Chest Wall, Clavicle, Coronary Arteries, Heart, Lung, Pulmonary Veins, Ribs, Sternoclavicular Joint, Sternum, Thoracic
Face               Maxillofacial, Orbits, Paranasal Sinuses
Head               Brain, Ear, Internal Auditory Canal, Middle Ear, Paranasal Sinuses, Sella Turcica
Lower Extremity    Ankle, Femur, Fingers, Foot, Knee, Leg, Thigh
Lumbar Spine       Lumbar
Lumbosacral Spine  Lumbar
Neck               Larynx, Thyroid Gland
Pelvis             Hip, Sacrum
Spine              -
Thoracic Spine     Thoracic
Trunk              -
Upper Extremity    Arm, Carpal Bone, Elbow, Forearm, Humerus, Shoulder, Wrist

Modality Modifier (CT): Angiography, Colonography, Enterography, Perfusion

4 Final dataset

The final dataset will be produced from studies selected from the previous list. It will consist of a Windows file system storing a modified copy of most of the original DICOM objects, and a database or Excel file with metadata describing those objects and their location within the file system.

The process will preserve the original structure of the study, with objects grouped into series and series grouped into studies. This structure will be conveyed both through the metadata database and through the image headers.

No modification will be performed on pixel data.

The reasons to modify DICOM objects or to exclude them from the final dataset are related to privacy concerns or ethical issues.

4.1 Objects included in the final dataset

An imaging procedure generates a collection of DICOM objects, mainly images. However, other DICOM objects may be present within the study. The following table shows the objects that could be included within the study; image objects are identified by an asterisk (*). The remaining objects offer additional information consisting of annotations, measurements and visualization parameters.
The table includes the DICOM unique identifier for each object (SOP Class UID). Detailed information on object definitions can be found in PS 3.3-2012 Digital Imaging and Communications in Medicine (DICOM) Part 3: Information Object Definitions[3].

     SOP Class UID                  Description
*    1.2.840.10008.5.1.4.1.1.2      CTImageStorage
*    1.2.840.10008.5.1.4.1.1.2.1    Enhanced CT Image Storage
*    1.2.840.10008.5.1.4.1.1.4      MRImageStorage
*    1.2.840.10008.5.1.4.1.1.4.1    Enhanced MR Image Storage
*    1.2.840.10008.5.1.4.1.1.7      SecondaryCaptureImageStorage
     1.2.840.10008.5.1.4.1.1.66.4   Segmentation Storage
     1.2.840.10008.5.1.4.1.1.11.1   GrayscaleSoftcopyPresentationState
     1.2.840.10008.5.1.4.1.1.88.59  KeyObjectSelectionDocument

Comments:
- SecondaryCaptureImageStorage: Secondary Captures are strong candidates for containing personal information burned into the pixel data. As described in section 4.2, instances with the tag 'BurnedInAnnotation' set to 'YES' will be removed from the data set. Privacy concerns make a manual review of the Secondary Captures present in the final dataset necessary. The results of this check will determine whether or not Secondary Captures are included in the final dataset.
- Segmentation Storage, GrayscaleSoftcopyPresentationState and KeyObjectSelectionDocument: whether each of these objects is included, and under which conditions, should be discussed, as it may potentially introduce bias in the benchmark process.

4.2 Objects excluded from final dataset

1. The final data set will not include imaging procedures from patients under 18 years of age.
2. Studies known to have been performed because of rare diseases or, more generally, studies in a category for which the number of available procedures is too small, will not be included in the final dataset.
3. Objects which may include personal health data as non-structured or binary data will be excluded from the final dataset. These objects are mainly reports in different forms: DICOM Structured Reports (SR), Adobe PDFs or CDAs (XML documents following the HL7 specification for clinical documents), but also objects containing raw data or procedure logs. Anonymization of this kind of information not only presents technical problems, but also lacks a clear definition of how to deal with it, since the law does not properly address the need to process personal data in order to anonymize it. As above, the DICOM unique identifier for each object is included, and additional information can be found in the DICOM Part 3 document[3].

   SOP Class UID                  Description
   1.2.840.10008.5.1.4.1.1.66     Raw Data
   1.2.840.10008.5.1.4.1.1.88.11  Basic Text SR
   1.2.840.10008.5.1.4.1.1.88.22  Enhanced SR
   1.2.840.10008.5.1.4.1.1.88.33  Comprehensive SR
   1.2.840.10008.5.1.4.1.1.88.40  Procedure Log
   1.2.840.10008.5.1.4.1.1.88.65  Chest CAD SR
   1.2.840.10008.5.1.4.1.1.88.67  X-Ray Radiation Dose SR
   1.2.840.10008.5.1.4.1.1.88.69  Colon CAD SR
   1.2.840.10008.5.1.4.1.1.88.70  Implantation Plan SR Document
   1.2.840.10008.5.1.4.1.1.104.1  Encapsulated PDF
   1.2.840.10008.5.1.4.1.1.104.2  Encapsulated CDA

4. Individual instances from a study which present privacy concerns will be excluded from the final dataset. At the moment this restriction affects DICOM objects that include the following tags with their value set to 'YES':

   DICOM Tag    Description                 Value  Exclusion reason
   (0028,0301)  BurnedInAnnotation          YES    Indicates that personal data has been included as pixels within the object.
   (0028,0302)  RecognizableVisualFeatures  YES    Indicates that the object contains sufficiently recognizable visual features to allow the image, or a reconstruction from a set of images, to identify the patient.

4.3 Transformations performed on the final data set

The final data set will be processed in order to remove personal data and personal health data. The transformations applied to DICOM objects take into account the HIPAA safe harbour specifications[4] and DICOM Supplement 142[5], and include:

1. Removal of 'identifiers' and 'quasi identifiers'. Those most commonly found in image headers are:
   - Names: patients' names, names of patients' relatives and healthcare professionals' names
   - Any reference to the healthcare provider ordering, performing, or reporting the study
   - Dates: birth date, admission date, discharge date, study date
   - Medical record numbers
   - Device identifiers and serial numbers
2. As an exception to the above, the patient's age will be provided, but with some limitations:
   a. Ages have been grouped into ranges:
      18-24  25-29  30-34  35-39
      40-44  45-49  50-54  55-59
      60-64  65-69  70-74  75-79
      As DICOM only supports a single value for the patient's age, each patient has been assigned an age within their range.
   b. Patients aged 80 or older have been grouped, and the age has been set to '80' for all of them.
3. All dates within the object have been modified or removed; specifically, the patient's date of birth has been removed.
4. UIDs have been modified, with the exception of the SOP Class UID, which has been preserved.
5. Tags intended to contain free-text information have been extracted to the database and removed from the DICOM object.
6. Vendor proprietary tags have been removed.

4.4 DICOM headers

The contents of the DICOM header depend on the modality, the manufacturer, the configuration chosen by the image provider, decisions made by professionals while performing the test, and further post-processing of the images. Although the exact content cannot be predicted, it will be very similar to the following real examples.

CT

Tag          Attribute Name                   VR     Value
(0002,0001)  FileMetaInformationVersion       OB     00\01
(0002,0002)  MediaStorageSOPClassUID          UI     1.2.840.10008.5.1.4.1.1.2
(0002,0003)  MediaStorageSOPInstanceUID       UI     1.2.3.4.5.296729645083015812638688310940537360590
(0002,0010)  TransferSyntaxUID                UI     1.2.840.10008.1.2.1
(0002,0012)  ImplementationClassUID           UI     1.2.40.0.13.1.1
(0002,0013)  ImplementationVersionName        SH     dcm4che-1.4.27
(0008,0005)  SpecificCharacterSet             CS     ISO_IR 100
(0008,0008)  ImageType                        CS     ORIGINAL\PRIMARY\AXIAL
(0008,0012)  InstanceCreationDate             DA     20081211
(0008,0016)  SOPClassUID                      UI     1.2.840.10008.5.1.4.1.1.2
(0008,0018)  SOPInstanceUID                   UI     1.2.3.4.5.296729645083015812638688310940537360590
(0008,0020)  StudyDate                        DA     20081211
(0008,0021)  SeriesDate                       DA     20081211
(0008,0023)  ContentDate                      DA     20081211
(0008,0030)  StudyTime                        TM
(0008,0031)  SeriesTime                       TM
(0008,0050)  AccessionNumber                  SH     153940361384910000735631294983738438801
(0008,0060)  Modality                         CS     CT
(0008,0090)  ReferringPhysiciansName          PN
(0010,0010)  PatientName                      PN     Anonymous
(0010,0020)  PatientID                        LO     Anonymous-ID
(0010,0040)  PatientSex                       CS     M
(0010,1010)  PatientAge                       AS     048Y
(0012,0062)  Undefined                        UN     YES
(0012,0063)  Undefined                        UN     DICOM-S142-Baseline
(0018,0022)  ScanOptions                      CS     AXIAL MODE
(0018,0050)  SliceThickness                   DS     10.0
(0018,0060)  KVP                              DS     100.0
(0018,0090)  DataCollectionDiameter           DS     250.0
(0018,1020)  SoftwareVersion                  LO     07MW11.10
(0018,1100)  ReconstructionDiameter           DS     250.0
(0018,1110)  DistanceSourceToDetector         DS     949.075
(0018,1111)  DistanceSourceToPatient          DS     541.0
(0018,1120)  GantryDetectorTilt               DS     0.0
(0018,1130)  TableHeight                      DS     179.9
(0018,1140)  RotationDirection                CS     CW
(0018,1150)  ExposureTime                     IS     500
(0018,1151)  XRayTubeCurrent                  IS     100
(0018,1152)  Exposure                         IS     50
(0018,1160)  FilterType                       SH     HEAD FILTER
(0018,1170)  GeneratorPower                   IS     10000
(0018,1190)  FocalSpot                        DS     1.2
(0018,1210)  ConvolutionKernel                SH     SOFT
(0018,5100)  PatientPosition                  CS     HFS
(0020,000D)  StudyInstanceUID                 UI     1.2.3.4.5.87617896750017244363385293660019016200
(0020,000E)  SeriesInstanceUID                UI     1.2.3.4.5.32665506069341097807191958223385598552
(0020,0010)  StudyID                          SH
(0020,0011)  SeriesNumber                     IS     200
(0020,0012)  AcquisitionNumber                IS     5
(0020,0013)  InstanceNumber                   IS     5
(0020,0032)  ImagePositionPatient             DS     -128.0\-119.7\-243.5
(0020,0037)  ImageOrientationPatient          DS     1.0\0.0\0.0\0.0\1.0\0.0
(0020,0052)  FrameOfReferenceUID              UI     1.2.3.4.5.185257057009278285923460146789534059097
(0020,1040)  PositionReferenceIndicator       LO     SN
(0020,1041)  SliceLocation                    DS     -243.5
(0028,0002)  SamplesPerPixel                  US     1
(0028,0004)  PhotometricInterpretation        CS     MONOCHROME2
(0028,0010)  Rows                             US     512
(0028,0011)  Columns                          US     512
(0028,0030)  PixelSpacing                     DS     0.488281\0.488281
(0028,0100)  BitsAllocated                    US     16
(0028,0101)  BitsStored                       US     16
(0028,0102)  HighBit                          US     15
(0028,0103)  PixelRepresentation              US     1
(0028,0120)  PixelPaddingValue                US|SS  -2000
(0028,1050)  WindowCenter                     DS     150.0
(0028,1051)  WindowWidth                      DS     700.0
(0028,1052)  RescaleIntercept                 DS     -1024.0
(0028,1053)  RescaleSlope                     DS     1.0
(0028,1054)  RescaleType                      LO     HU
(0040,0244)  PerformedProcedureStepStartDate  DA     20081211

MR

Tag          Attribute Name                   VR     Value
(0002,0001)  FileMetaInformationVersion       OB     00\01
(0002,0002)  MediaStorageSOPClassUID          UI     1.2.840.10008.5.1.4.1.1.4
(0002,0003)  MediaStorageSOPInstanceUID       UI     1.2.3.4.5.69439814550354249985023682275304301129
(0002,0010)  TransferSyntaxUID                UI     1.2.840.10008.1.2.4.70
(0002,0012)  ImplementationClassUID           UI     1.2.40.0.13.1.1
(0002,0013)  ImplementationVersionName        SH     dcm4che-1.4.27
(0008,0005)  SpecificCharacterSet             CS     ISO_IR 100
(0008,0008)  ImageType                        CS     ORIGINAL\PRIMARY\DIFFUSION\NONE\ND\NORM
(0008,0012)  InstanceCreationDate             DA     20090311
(0008,0016)  SOPClassUID                      UI     1.2.840.10008.5.1.4.1.1.4
(0008,0018)  SOPInstanceUID                   UI     1.2.3.4.5.69439814550354249985023682275304301129
(0008,0020)  StudyDate                        DA     20090311
(0008,0021)  SeriesDate                       DA     20090311
(0008,0023)  ContentDate                      DA     20090311
(0008,0030)  StudyTime                        TM
(0008,0031)  SeriesTime                       TM
(0008,0050)  AccessionNumber                  SH     135515459970539662550887127401933904768
(0008,0060)  Modality                         CS     MR
(0008,0090)  ReferringPhysiciansName          PN
(0010,0010)  PatientName                      PN     Anonymous
(0010,0020)  PatientID                        LO     Anonymous-ID
(0010,0040)  PatientSex                       CS     M
(0010,1010)  PatientAge                       AS     057Y
(0012,0062)  Undefined                        UN     YES
(0012,0063)  Undefined                        UN     DICOM-S142-Baseline
(0018,0020)  ScanningSequence                 CS     EP
(0018,0021)  SequenceVariant                  CS     SK\SP
(0018,0022)  ScanOptions                      CS     PFP\FS
(0018,0023)  MRAcquisitionType                CS     2D
(0018,0024)  SequenceName                     SH     *ep_b1000#5
(0018,0025)  AngioFlag                        CS     N
(0018,0050)  SliceThickness                   DS     4.0
(0018,0080)  RepetitionTime                   DS     6300.0
(0018,0081)  EchoTime                         DS     100.0
(0018,0083)  NumberOfAverages                 DS     1.0
(0018,0084)  ImagingFrequency                 DS     123.259445
(0018,0085)  ImagedNucleus                    SH     1H
(0018,0086)  EchoNumber                       IS     1
(0018,0087)  MagneticFieldStrength            DS     3.0
(0018,0088)  SpacingBetweenSlices             DS     5.2
(0018,0089)  NumberOfPhaseEncodingSteps       IS     143
(0018,0091)  EchoTrainLength                  IS     1
(0018,0093)  PercentSampling                  DS     100.0
(0018,0094)  PercentPhaseFieldOfView          DS     100.0
(0018,0095)  PixelBandwidth                   DS     1002.0
(0018,1020)  SoftwareVersion                  LO     syngo MR B17
(0018,1251)  TransmittingCoil                 SH     Body
(0018,1310)  AcquisitionMatrix                US     192\0\0\192
(0018,1312)  PhaseEncodingDirection           CS     COL
(0018,1314)  FlipAngle                        DS     90.0
(0018,1315)  VariableFlipAngleFlag            CS     N
(0018,1316)  SAR                              DS     0.109896615
(0018,1318)  dBdt                             DS     0.0
(0018,5100)  PatientPosition                  CS     HFS
(0020,000D)  StudyInstanceUID                 UI     1.2.3.4.5.183201517982418290293516120352495178384
(0020,000E)  SeriesInstanceUID                UI     1.2.3.4.5.66601054998603300312696841387804239516
(0020,0010)  StudyID                          SH
(0020,0011)  SeriesNumber                     IS     2
(0020,0012)  AcquisitionNumber                IS     6
(0020,0013)  InstanceNumber                   IS     150
(0020,0032)  ImagePositionPatient             DS     -126.54206\-70.64552\91.29098
(0020,0037)  ImageOrientationPatient          DS     0.9985181\-0.040072843\0.036820725\0.048835676\0.9583674\0.28133065
(0020,0052)  FrameOfReferenceUID              UI     1.2.3.4.5.82849207386744743147565708052495260877
(0020,1040)  PositionReferenceIndicator       LO
(0020,1041)  SliceLocation                    DS     70.60576
(0028,0002)  SamplesPerPixel                  US     1
(0028,0004)  PhotometricInterpretation        CS     MONOCHROME2
(0028,0010)  Rows                             US     192
(0028,0011)  Columns                          US     192
(0028,0030)  PixelSpacing                     DS     1.25\1.25
(0028,0100)  BitsAllocated                    US     16
(0028,0101)  BitsStored                       US     12
(0028,0102)  HighBit                          US     11
(0028,0103)  PixelRepresentation              US     0
(0028,0106)  SmallestImagePixelValue          US|SS  0
(0028,0107)  LargestImagePixelValue           US|SS  138
(0028,1050)  WindowCenter                     DS     81.0
(0028,1051)  WindowWidth                      DS     221.0
(0028,1055)  WindowCenterWidthExplanation     LO     Algo1
(0040,0244)  PerformedProcedureStepStartDate  DA     20090311

4.5 Metadata

Metadata will include the following columns.
Table     Column                             DICOM Tag    Comments
STUDY     STUDYDATETIME
STUDY     NUMBEROFSERIES
STUDY     MODALITIESINSTUDY                  (0008,0061)
STUDY     PATIENTSAGE                        (0010,1010)
STUDY     PATIENTSAGETYPE                    (0010,1010)  D = Days, W = Weeks, M = Months, Y = Years
STUDY     PATIENTSWEIGHT                     (0010,1030)
STUDY     PATIENTSSEX                        (0010,0040)  M = male, F = female, O = other
STUDY     TOTALINSTANCIES                                 Total instances in the study
STUDY     ADMITTINGDIAGNOSISDESCRIPTION      (0008,1080)
STUDY     MEDICALALERTS                      (0010,2000)
STUDY     ALLERGIES                          (0010,2110)
STUDY     ADDITIONALPATIENTHISTORY           (0010,21B0)
STUDY     PREGNANCYSTATUS                    (0010,21C0)  0001 = not pregnant, 0002 = possibly pregnant, 0003 = definitely pregnant, 0004 = unknown
STUDY     PATIENTCOMMENTS                    (0010,4000)
STUDY     MAGNETICFIELDSTRENGTH              (0018,0087)
STUDY     STUDYCOMMENTS                      (0032,4000)
STUDY     SPECIALNEEDS                       (0038,0050)
STUDY     PATIENTSTATE                       (0038,0500)
STUDY     PREMEDICATION                      (0040,0012)
STUDY     BODYREGION                         NA           Values:
          Head, Face, Neck, Chest, Breast, Upper Extremity, Abdomen, Pelvis, Lower Extremity, Spine, Cervical Spine, Lumbar Spine, Lumbosacral Spine, Thoracic Spine, Thoracolumbar Spine, Trunk, Bone
STUDY     ANATOMICFOCUS                      NA           Values:
          Acetabulum, Aorta, Appendix, Aortic Root, Carotid Arteries, Coronary Arteries, Perforator Arteries, Pulmonary Arteries, Joint, Sternoclavicular Joint, Temporomandibular Joint, Left Atrium, Forearm, Arm, Bladder, Popliteal Fossa, Leg, Wrist, Oral Cavity, Brain, Cervical, Clavicle, Elbow, Internal Auditory Canal, Heart, Vocal Cord, Ribs, Thigh, Fingers, Epidural Space, Shoulder, Sternum, Stomach, Femur, Liver, Posterior Cranial Fossa, Knee, Parotid Gland, Salivary Gland, Thyroid Gland, Pituitary Gland, Humerus, Small Bowel, Larynx, Lumbar, Hand, Hip, Maxillofacial, Mediastinum, Spleen, Muscle, Middle Ear, Orbits, Ear, Long Bone, Temporal Bone, Carpal Bone, Facial Bones, Pancreas, Lung Parenchyma, Chest Wall, Peritoneum, Fibula, Foot, Pleura, Circle Of Willis, Prostate, Lung, Cyst, Renal Cyst, Rectum, Retroperitoneum, Kidney, Sacrum, Sella Turcica, Paranasal Sinuses, Thoracic Outlet, Subdiaphragm, Adrenal, Soft Tissue Of The Neck, Tibia, Thoracic, Gastrointestinal Tract, Trachea, Ankle, Coronary Veins, Pulmonary Veins, Airway
STUDY     REASONFOREXAM                      NA           Values:
          Ablation, Radiofrequency Ablation, ARVD, Needle Aspiration, Biopsy, Nerve Block, Calcium Score, Calculus, Stroke, Kyphoplasty, Screw Placement, Needle Placement, Foreign Body, Craniosynostosis, Cryoablation, Diagnostic, Donor, Drainage, Embolism, Structure, Screening, Facet Block, Fiducial, Fistula, Fracture, Function, Gout, Hematuria, Hemorrhage, Inflammation, Congenital Disease, Interstitial Disease, Myelopathy, Morphology, Nanoknife, Malignant Neoplasm, Nodule, Paracentesis, Pericardiocentesis, Post Op, Pre Op, Follow-Up Procedure, Prosthesis, Puncture, Chemoembolization, Radiculopathy, Thoracentesis, Trauma, Tube, Tumor, Vascular, Vertebroplasty
STUDY     MODALITYMODIFIER                   NA           Values:
          3D Imaging Processing, High Resolution, Angiography, Arthrography, Bronchography, Cisternography, Cystography, Placement, Cholangiography, Colonography, Densitometry, Dynamic, Discogram, Low Dose, Enterography, Surgical Equipment, Dental Scan, Scanogram, Guidance, Limited, Localization, Measurement, Myelography, Single Phase, Multiphase, Pelvimetry, Perfusion, Portography, Reconstruction, Stereotaxis, Triphasic, Urography, Venography
SERIES    SERIESDATETIME
SERIES    MODALITY                           (0008,0060)
SERIES    SOPCLASSUID                        (0008,0016)
SERIES    SOPCLASSUID DESCRIPTION
SERIES    SERIESDESCRIPTION                  (0008,103E)
SERIES    BODYPARTEXAMINED                   (0018,0015)
SERIES    PROTOCOLNAME                       (0018,1030)
SERIES    IMAGETYPE                          (0008,0008)
SERIES    PERFORMEDPROCEDURETYPEDESCRIPTION  (0040,0255)
SERIES    SCHEDULEDPROCEDURESTEPDESCRIPTION  (0040,0007)
SERIES    REQUESTPROCEDUREDESCRIPTION        (0032,1060)
SERIES    MANUFACTURER                       (0008,0070)
SERIES    MANUFACTURERMODELNAME              (0008,1090)
SERIES    REASONFORSTUDY                     (0032,1030)  NA
SERIES    REASONFORTHEREQUESTEDPROCEDURE     (0040,1002)
SERIES    NUMEROINSTANCIAS                   NA           Total instances in the series
SERIES    CONTRASTBOLUSAGENT                 (0018,0010)
SERIES    SCANNINSEQUENCE                    (0018,0020)
SERIES    SEQUENCEVARIANT                    (0018,0021)
SERIES    SCANOPTIONS                        (0018,0022)
SERIES    MRAACQUISITIONTYPE                 (0018,0023)
SERIES    PATIENTPOSITION                    (0018,5100)  HFP = Head First-Prone, HFS = Head First-Supine, HFDR = Head First-Decubitus Right, HFDL = Head First-Decubitus Left, FFDR = Feet First-Decubitus Right, FFDL = Feet First-Decubitus Left, FFP = Feet First-Prone, FFS = Feet First-Supine
SERIES    LATERALITY                         (0020,0060)  R = right, L = left
SERIES    REQUESTEDCONTRASTAGENT             (0032,1070)
SERIES    SLICETHICKNESS                     (0018,0050)
SERIES    SPACINGBETWEENSLICES               (0018,0088)
INSTANCE  INSTANCEDATETIME                   NA
INSTANCE  RELATIVEPATH                       NA
INSTANCE  IMAGECOMMENTS                      (0020,4000)

4.6 Pixel data

Pixel data has not been modified. Below is a view of one image as retrieved through a DICOM viewer.

4.7 Quality control

The final data will be verified prior to its transfer. The verifications are designed to ensure that DICOM objects are well formed and that each study contains at least the minimum amount of information required for VISCERAL.

4.7.1 Conformity with DICOM standard

Two types of verification will be performed:
1. Manual retrieval of some studies using several DICOM viewers, checking that the information in the database (series and instances within the study) is consistent with what the viewer shows.
2. Automated review of the conformance of the resulting objects with the DICOM standard.

4.7.2 Information within the study

For a number of reasons, a well-formed study may not include enough information for automated processing. The most common case relates to the post-processing of image instances, which should result in a new series within the original study. Even so, it is not uncommon for the resulting instances to be saved into a new study. As this new study does not include the original image instances, it is considered to have no utility for the VISCERAL project.

The most effective way to address this and similar issues is to set a minimum number of instances for a study to be considered valid. This number is currently set to a minimum of 100 instances. Studies below this threshold will be removed from the dataset.

4.8 Remaining re-identification risks

Personal information is any combination of values that can make a person identifiable, so no definitive list of such values exists. It also has to be considered that, in order to anonymize data effectively, a better understanding of how data can be re-identified would be necessary. In the present situation, it is not known which information is required or available to re-identify a given dataset, nor is it possible to assess the probability that an attacker could build or obtain a re-identifying database.

Under these circumstances it cannot be assured that the whole set of data is completely anonymous. The re-identification risk could be non-significant for small datasets, but not acceptable for big ones.
This issue will have to be addressed through organizational, technical and legal measures, among which:
- Access control policies and mechanisms
- Commitment 'not to':
  o produce additional copies of the dataset
  o re-use the dataset
  o try to re-identify the data
- Delete from the data set any records containing personal information, and notify the data provider
- Communicate any data breach

5 Conclusion

Personal health data collected in order to provide healthcare cannot be used for research activities without prior preparation. In the case of GENCAT data, the main tasks performed to obtain the final data set to be provided to VISCERAL have been:

1. A semi-automated process of information classification based on modality and body part examined, with the body part examined obtained from the study description. The product of this classification process is a catalogue that classifies the original information according to these criteria. This catalogue allows further processing to be targeted at specific subsets of the original information.
2. Assessment of DICOM objects according to the type of information they contain and the viability of de-identifying them where applicable. This process yields a list of DICOM objects not to be extracted.
3. Data extraction from the production systems according to the selections made on the above catalogue and the restrictions obtained from the assessment process. This extraction process creates a candidate encrypted copy of the original objects and a database containing metadata obtained from the DICOM headers.
4. De-identification of candidate objects.
5. Review of de-identified objects in order to remove any remaining personal health information.
6. Quality controls.

As a result of these activities a transferable data set is obtained.
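The selection and age-grouping rules stated in sections 4.2, 4.3 and 4.7.2 can be sketched in Python as follows. This is a minimal illustration and not the tooling actually used in the project. The function names, the representation of a header as a plain dictionary keyed by attribute name, and the choice of drawing the replacement age at random within the patient's range are all assumptions; the deliverable only states that each patient is assigned an age within their range.

```python
import random

# Age ranges from section 4.3; the first range (18-24) is wider than the rest.
AGE_BANDS = [(18, 24)] + [(low, low + 4) for low in range(25, 80, 5)]
TOP_CODED_AGE = 80             # patients aged 80 or older are all reported as 80
MIN_INSTANCES_PER_STUDY = 100  # threshold from section 4.7.2

# SOP Class UIDs excluded from the final dataset (section 4.2, item 3).
EXCLUDED_SOP_CLASSES = {
    "1.2.840.10008.5.1.4.1.1.66",     # Raw Data
    "1.2.840.10008.5.1.4.1.1.88.11",  # Basic Text SR
    "1.2.840.10008.5.1.4.1.1.88.22",  # Enhanced SR
    "1.2.840.10008.5.1.4.1.1.88.33",  # Comprehensive SR
    "1.2.840.10008.5.1.4.1.1.88.40",  # Procedure Log
    "1.2.840.10008.5.1.4.1.1.88.65",  # Chest CAD SR
    "1.2.840.10008.5.1.4.1.1.88.67",  # X-Ray Radiation Dose SR
    "1.2.840.10008.5.1.4.1.1.88.69",  # Colon CAD SR
    "1.2.840.10008.5.1.4.1.1.88.70",  # Implantation Plan SR Document
    "1.2.840.10008.5.1.4.1.1.104.1",  # Encapsulated PDF
    "1.2.840.10008.5.1.4.1.1.104.2",  # Encapsulated CDA
}


def age_band(age):
    """Return the (low, high) range for an integer age, or None if under 18."""
    if age < 18:
        return None
    if age >= TOP_CODED_AGE:
        return (TOP_CODED_AGE, TOP_CODED_AGE)
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (low, high)
    return None  # only reached for non-integer ages that fall between ranges


def deidentified_age(age, rng):
    """Format a replacement age as a DICOM Age String such as '048Y'.

    Assumption: the value is drawn uniformly from the patient's range.
    """
    band = age_band(age)
    if band is None:
        raise ValueError("patients under 18 are not part of the dataset")
    low, high = band
    return f"{rng.randint(low, high):03d}Y"


def keep_instance(header):
    """Apply the per-instance exclusion rules of section 4.2 to a header dict."""
    if header.get("SOPClassUID") in EXCLUDED_SOP_CLASSES:
        return False
    if header.get("BurnedInAnnotation") == "YES":
        return False
    if header.get("RecognizableVisualFeatures") == "YES":
        return False
    return True


def keep_study(patient_age, instance_count):
    """Apply the study-level rules: adult patient and enough instances."""
    return patient_age >= 18 and instance_count >= MIN_INSTANCES_PER_STUDY
```

For example, `keep_study(40, 99)` rejects a study that falls one instance short of the threshold, and `deidentified_age(85, random.Random())` always yields `080Y` because of the top-coding rule. Passing the random generator in explicitly keeps the sketch reproducible when a seed is supplied.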
6 References

[1] http://medical.nema.org/
[2] http://www.rsna.org/RadLex_Playbook.aspx
[3] http://medical.nema.org/Dicom/2011/11_03pu.pdf
[4] http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/Deidentification/guidance.html
[5] ftp://medical.nema.org/medical/dicom/final/sup142_ft.pdf