I-CORE Computation Center
The Hebrew University of Jerusalem and Hadassah Medical Center
Coordinators: Hanah Margalit, Tommy Kaplan
Motivation
• New large-scale technologies generate huge amounts of data
• Computational analysis is often the bottleneck in many large-scale experiments
• Hardware
- Strong computers
- Large data storage unit
• Software
- Automatic pipeline
- Bioinformatics Support Unit
Strategy
• Appropriate hardware
- Consultant: Dr. Lior Amar, Rotem Technologies
• Pipeline development
- Programmer: Hagai Cohen (Moshe Roseman)
• Bioinformatics Support Unit
- Support Team: Dr. Sharona Elgavish, Dr. Yuval Nevo
- In collaboration with HUJI team
Users
• Hardware
• Experimental
• Experimental / computational
• Pipeline
• Computational
• Bioinformatic Support
Hardware
• Modular structure
• Storage unit: 250-300 TB
• Computer cluster
- 32 servers of 16 cores each (= 512 cores)
- 64-128 GB memory per server
- InfiniBand (communication between servers, and between servers and storage)
• System administration (Eliyahu Rosenberg)
Automatic Computational Pipeline
Sequencing pipeline
• DNA sample
• Library preparation
• Sequencing: Illumina HiSeq 2000 (example), 1 billion (10^9) 50 nt single-end reads
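As a rough sense of scale, the example run above can be converted into a total number of sequenced bases. This is only back-of-the-envelope arithmetic on the figures from the slide; the variable names are illustrative.

```python
# Size of the example run, using the read count and read length quoted above.
reads = 10**9          # 1 billion single-end reads
read_length_nt = 50    # 50 nt per read

total_bases = reads * read_length_nt

# Report the total both in scientific notation and in gigabases (Gb).
print(f"{total_bases:.1e} bases = {total_bases / 10**9:.0f} Gb")
```

A single run of this size therefore yields about 50 gigabases of raw sequence before any alignment or analysis.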
Automatic Computational Pipeline
Computational pipeline
Identifiers: genome, experiment, read length
• ChIP-seq
• RNA-seq
• DNA methylation (bisulfite-seq)
• Genetic variation (SNP)
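One way to picture how the three identifiers could select an analysis is a small lookup keyed on the experiment type. This is a minimal sketch under assumptions: the class `RunIdentifiers`, the function `select_pipeline`, the dictionary keys, and the genome name `hg19` are all hypothetical, since the slides do not describe how the actual pipeline represents or dispatches runs.

```python
from dataclasses import dataclass

# Hypothetical mapping: the four analysis types come from the slide,
# the lookup keys are invented for illustration.
PIPELINES = {
    "chip-seq": "ChIP-seq",
    "rna-seq": "RNA-seq",
    "bisulfite-seq": "DNA methylation (bisulfite-seq)",
    "snp": "Genetic variation (SNP)",
}


@dataclass(frozen=True)
class RunIdentifiers:
    genome: str        # reference genome name (illustrative value below)
    experiment: str    # one of the keys in PIPELINES, case-insensitive
    read_length: int   # read length in nucleotides


def select_pipeline(ids: RunIdentifiers) -> str:
    """Return the analysis type for a run, or raise if the input is invalid."""
    key = ids.experiment.lower()
    if key not in PIPELINES:
        raise ValueError(f"unknown experiment type: {ids.experiment!r}")
    if ids.read_length <= 0:
        raise ValueError("read length must be positive")
    return PIPELINES[key]


run = RunIdentifiers(genome="hg19", experiment="ChIP-seq", read_length=50)
print(select_pipeline(run))
```

Keeping the identifiers in one immutable record makes it easy to validate them once, before any compute time is spent.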
ChIP-seq pipeline
• Input: 20-60 FASTQ files (250-800 Mb each)
• QC
• Alignment of each file with bowtie2
• Merge
• Peak calling (MACS, Grizzly)
• Output
- List of peaks (position, height, nearby gene)
- Genomic visualization
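The per-file alignment and merge steps of the ChIP-seq pipeline can be sketched as a list of commands. This is an illustration, not the center's actual implementation: the function `chipseq_plan`, the file names, and the index prefix are hypothetical, and the use of `samtools` for sorting and merging is an assumption (the slides name only bowtie2, MACS and Grizzly). The commands are built but not executed, and QC and peak calling are left out.

```python
from pathlib import Path


def chipseq_plan(fastq_files, index, out_dir="out"):
    """Build (not run) one bowtie2 command per FASTQ file plus a merge step.

    Assumptions not taken from the slides: single-end reads, an existing
    bowtie2 index prefix, and samtools for sorting and merging.
    """
    out = Path(out_dir)
    commands, bams = [], []
    for fq in map(Path, fastq_files):
        stem = fq.name.split(".")[0]
        sam = out / f"{stem}.sam"
        bam = out / f"{stem}.sorted.bam"
        # Each file is aligned independently, so these steps can run in parallel.
        commands.append(["bowtie2", "-x", index, "-U", str(fq), "-S", str(sam)])
        # Sort each alignment so the per-file results can be merged.
        commands.append(["samtools", "sort", "-o", str(bam), str(sam)])
        bams.append(str(bam))
    # Merge the per-file alignments into a single input for peak calling.
    commands.append(["samtools", "merge", str(out / "merged.bam"), *bams])
    return commands


for cmd in chipseq_plan(["lane1.fastq", "lane2.fastq"], index="genome_index"):
    print(" ".join(cmd))
```

Splitting the work per FASTQ file is what lets a run of 20-60 files spread across many cluster cores before the single merge step.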
Pipeline Overview
• Initial automatic analysis provides:
- List of results
- Visualization of results
• Allows the researcher to:
- Browse results
- Re-run analysis with different parameters
- Follow up using computational/bioinformatic tools (including internal Galaxy platform)
Bioinformatics Support Team
• Bioinformatic support and guidance from the stage of designing the experiment to advanced analysis
• Collaboration in designing validation and follow-up experiments after receiving initial results
• Close work with students of the research teams (establishment of a student forum and crosstalk between students)
• Workshops and tutorials, teaching novel computational approaches and tools
ChIP-seq advanced analysis as an example
• Based on the initial analysis (e.g. by the pipeline), identify common targets of a TF and their binding motif
• Integrate with gene expression data (RNA-seq and microarray data) to identify functional binding
• Describe regulatory networks and integrate with other regulation levels (e.g. PPI, histone modification)
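The first two steps above can be illustrated with plain set operations on the pipeline's peak list. This is a minimal sketch under assumptions: "common targets" is read here as genes with a nearby peak in every experiment, the function names are hypothetical, and the peak and expression data are made up for the example. Motif discovery is not covered.

```python
def common_targets(peaks_by_experiment):
    """Genes that have a nearby peak in every experiment."""
    gene_sets = [
        {gene for _pos, _height, gene in peaks}
        for peaks in peaks_by_experiment.values()
    ]
    return set.intersection(*gene_sets) if gene_sets else set()


def functional_targets(targets, expression_changed):
    """Targets whose expression also changes: candidates for functional binding."""
    return targets & set(expression_changed)


# Toy, made-up data: one (position, height, nearby gene) tuple per peak.
peaks = {
    "experiment_1": [(1200, 35.0, "geneA"), (5400, 12.5, "geneB")],
    "experiment_2": [(1190, 28.0, "geneA"), (9100, 8.0, "geneC")],
}

targets = common_targets(peaks)
print(targets)
print(functional_targets(targets, ["geneA", "geneC"]))
```

In this toy input only `geneA` is bound in both experiments and also changes expression, so it is the single candidate that survives both filters.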
Contact
Prof. Hanah Margalit
[email protected]
Dr. Tommy Kaplan
[email protected]