Genome Browser Custom Tracks

Short-reads Custom Tracks
Assaf Gordon
[email protected]
Hannon Lab
CSHL
July 8, 2010
Contents
I Visualization
1 BED/Interval files
2 SAM files
3 PSLX (blat) files
4 Coverage
4.1 Coverage of BED files
4.2 Coverage of SAM/BAM files
4.3 Coverage by Strand
4.4 Coverage by Exons
II Genome Browser Custom Tracks
5 Uploading tracks using FTP
6 Using CSHL’s local Genome Browser server (http://foxtrot.cshl.edu)
7 Track display options
III Technical Details
8 Formatting conventions
9 [CHROM_SIZE] file
10 Bluehelix setup
11 Direct MySQL access
12 Programs reference
13 Compiling programs from source code
IV Troubleshooting
Introduction
Visualizing a small number of intervals (up to 1,000,000 intervals, or files smaller than 50MB) is easily done by simply uploading the file to the UCSC Genome Browser. Visualizing a large number of reads presents some technical difficulties.
This document shows how to visualize large files of millions of short reads (long reads will work just as well).
• Text in fixed font shows unix commands. See section 8 for more details.
• [CHROM_SIZE] is a text file containing the names and sizes of chromosomes. See section 9 for more details.
• All programs mentioned here are available on BlueHelix. See section 10 for more details.
Part I
Visualization
The general method is the same for all file formats:
1. Convert your input files into one of several ‘common’ textual formats (BED, BedGraph, Wiggle, SAM).
2. Convert the BED/BedGraph/Wiggle/SAM files into a binary file (BigBed, BigWig, BAM).
3. Add a custom track in the UCSC Genome Browser, pointing to the binary files:
• With the public UCSC Genome Browser:
Upload the binary files to an FTP site, and point the custom track to the correct URL.
See section 5 for FTP usage.
• With the CSHL mirror Genome Browser:
Put the binary files on BlueHelix, and point the custom track to the correct path.
See section 6 for BlueHelix usage.
1 BED/Interval files
Task
Display a BED file:
chr2L 13774500 13774548 seq-1 0 +
chr2L 17984104 17984148 seq-2 0 +
chr2L 13675851 13675900 seq-3 0 +
chr2L 18884003 18884049 seq-4 0 +
chr2L  3358603  3358646 seq-5 0 +
As genomic intervals on the UCSC Genome Browser:
[Screenshot: the individual reads drawn as intervals in the custom track "dummy.bb", together with the Gap Locations, FlyBase Protein-Coding Genes, FlyBase Noncoding Genes, RefSeq Genes and D. melanogaster mRNAs from GenBank tracks; 50-base scale.]
Solution
1. Sort the input file by chromosome name and start position (the input file must be sorted before it can be converted into a binary BigBed file).
2. Create a binary BigBed file from the sorted BED file.
Commands:
$ sort -k1,1 -k2,2n < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ bedToBigBed [SAMPLE.SORTED.BED] [CHROM_SIZE] [SAMPLE.BB]
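Why both sort keys matter can be seen on a small, made-up BED file (the read names and coordinates are hypothetical, and the files are written to a scratch directory). Without the n in -k2,2n, start positions would be compared as text, so 13675851 would sort before 3358603.

```shell
cd "$(mktemp -d)"            # work in a scratch directory

# Hypothetical BED lines, deliberately out of order (tab-separated).
printf 'chr2L\t17984104\t17984148\tseq-2\t0\t+\n'  >  sample.bed
printf 'chr3L\t3390\t3434\tseq-7\t0\t+\n'          >> sample.bed
printf 'chr2L\t3358603\t3358646\tseq-5\t0\t+\n'    >> sample.bed
printf 'chr2L\t13675851\t13675900\tseq-3\t0\t+\n'  >> sample.bed

# -k1,1  : compare the chromosome column as text
# -k2,2n : break ties on the start column, compared numerically
sort -k1,1 -k2,2n < sample.bed > sample.sorted.bed
cut -f1,2,4 sample.sorted.bed
```

The chr2L lines come first, in numeric start order (3358603, 13675851, 17984104), followed by the chr3L line.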
The file [SAMPLE.BB] can then be added to the UCSC Genome Browser as a custom track (the file must first be made reachable by the browser, see Part II). On the "Add Custom Tracks" page (clade: Insect, genome: D. melanogaster, assembly: Apr. 2006 (BDGP R5/dm3)), paste the following line into the "Paste URLs or data" box and click "Submit":
track name="BigBed Track" type=bigBed bigDataUrl=http://myserver.edu/sample.bb
2 SAM files
Task
Display a SAM file[1]:
ZAPHOD_FC42T13AAXX:9:1:503:868   0  chr3L      3390  255  44M  *  0  0  ACATATT...
ZAPHOD_FC42T13AAXX:9:1:877:655  16  chr3L  11651753  255  43M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:839:364  16  chr3L  12316404  255  44M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:125:213   0  chr2L  12315946  255  44M  *  0  0  TAATATA...
As a custom track in the UCSC Genome Browser:
[Screenshot: the reads shown in the custom track "BAMTrack", next to the UCSC Genes, Vertebrate Multiz Alignment & Conservation (44 Species), Placental Mammal Basewise Conservation by PhyloP and Multiz Alignments of 44 Vertebrates tracks; 500-base scale.]
Solution
1. Convert the SAM file to a BAM file.
2. Sort the BAM file.
3. Create an index (BAI file) to accompany the BAM file.
Commands:
$ samtools view -S -b -o [SAMPLE.BAM] [SAMPLE.SAM]
# The .bam extension will be added automatically to the ’SAMPLE.SORTED’ output name.
$ samtools sort [SAMPLE.BAM] [SAMPLE.SORTED]
# (Newer samtools versions take the full output name instead: samtools sort -o [SAMPLE.SORTED.BAM] [SAMPLE.BAM])
# A new index file will be created with the same name and a .bai extension.
$ samtools index [SAMPLE.SORTED.BAM]
[1] SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. See http://samtools.sourceforge.net/
The two files [SAMPLE.SORTED.BAM] and [SAMPLE.SORTED.BAM.BAI] must be placed in the same directory on the server (the Genome Browser looks for the index at the BAM file's URL with .bai appended). They can then be added to the UCSC Genome Browser as a custom track.
Custom track line, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)):
track name="BAMTrack" type=bam bigDataUrl=http://myserver.edu/sample.sorted.bam
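The FLAG column (0 or 16 in the example above) encodes the strand: when bit 16 (0x10) is set, the read aligned to the reverse strand. A minimal awk sketch, assuming simple ungapped CIGAR strings such as 44M, turns SAM records into BED6 lines to show how the columns map (SAM POS is 1-based, BED starts are 0-based). It is an illustration only; BEDTools' bamToBed handles the general case. The file names are hypothetical.

```shell
cd "$(mktemp -d)"

# Two of the SAM records above (no header), SEQ and QUAL shortened to '*'.
printf 'ZAPHOD_FC42T13AAXX:9:1:503:868\t0\tchr3L\t3390\t255\t44M\t*\t0\t0\t*\t*\n'      >  reads.sam
printf 'ZAPHOD_FC42T13AAXX:9:1:877:655\t16\tchr3L\t11651753\t255\t43M\t*\t0\t0\t*\t*\n' >> reads.sam

# int(FLAG / 16) % 2 tests bit 16 portably (POSIX awk has no bitwise operators).
awk 'BEGIN { OFS = "\t" }
     /^@/ { next }                              # skip header lines
     $6 ~ /^[0-9]+M$/ {                         # only ungapped CIGARs such as 44M
       strand  = (int($2 / 16) % 2) ? "-" : "+"
       start   = $4 - 1                         # SAM POS is 1-based, BED is 0-based
       bed_end = start + substr($6, 1, length($6) - 1)
       print $3, start, bed_end, $1, 0, strand
     }' reads.sam > reads.bed
cat reads.bed
```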
3 PSLX (blat) files
Task
Display a PSL file (the output of the BLAT program):
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6678238 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6680033 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6878866 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6879800 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6880518 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 7817848 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 7818749 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 7860041 ...
As a custom track in the UCSC Genome Browser:
[Screenshot: the aligned reads in the custom track "PSLX example", next to the FlyBase Protein-Coding Genes, FlyBase Noncoding Genes, RefSeq Genes and Repeating Elements by RepeatMasker tracks; 500-base scale.]
Solution
1. Convert the PSLX file to a BED file.
2. Sort the BED file by chromosome and start position.
3. Convert the BED file to a BigBed file.
Commands:
$ pslToBed [SAMPLE.PSLX] [SAMPLE.BED]
$ sort -k1,1 -k2,2n < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ bedToBigBed [SAMPLE.SORTED.BED] [CHROM_SIZE] [SAMPLE.BB]
The file [SAMPLE.BB] can then be added to the UCSC Genome Browser as a custom track.
Custom track line, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)):
track name="BigBed Track" type=bigBed bigDataUrl=http://myserver.edu/sample.bb
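pslToBed does the real conversion (and keeps the alignment blocks). To show which PSL columns end up where, here is a simplified awk sketch that emits one BED6 line per alignment covering its whole target span, ignoring blocks. It skips the header lines BLAT writes by default, uses a hypothetical single-block record, and sets the score column to 0 as in the BED examples above.

```shell
cd "$(mktemp -d)"

# Hypothetical PSL record (21 tab-separated columns): a 50-base query "read1"
# aligned without gaps to chrU:1000-1050 on the + strand.
printf '50\t0\t0\t0\t0\t0\t0\t0\t+\tread1\t50\t0\t50\tchrU\t10049037\t1000\t1050\t1\t50,\t0,\t1000,\n' > sample.psl

# Columns used: 9 = strand, 10 = qName, 14 = tName, 16 = tStart, 17 = tEnd.
# PSL target coordinates are 0-based and half-open, like BED.
# Lines whose first column is not a number (BLAT header lines) are skipped.
awk 'BEGIN { FS = OFS = "\t" }
     NF >= 21 && $1 ~ /^[0-9]+$/ { print $14, $16, $17, $10, 0, $9 }' sample.psl > sample.bed
cat sample.bed
```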
4 Coverage
4.1 Coverage of BED files
Task
Display the nucleotide coverage of a BED file:
chr2L 13774500 13774548 seq-1 0 +
chr2L 17984104 17984148 seq-2 0 +
chr2L 13675851 13675900 seq-3 0 +
chr2L 18884003 18884049 seq-4 0 +
chr2L  3358603  3358646 seq-5 0 +
chr2L  3212400  3212446 seq-6 0 +
As a custom Wiggle track in the UCSC Genome Browser:
[Screenshot: the coverage shown in the custom track "Coverage (BigWig)" around the gene CG3639, next to the FlyBase Protein-Coding Genes, FlyBase Noncoding Genes and RefSeq Genes tracks; 500-base scale.]
Solution
1. Sort the BED file (unlike with bedToBigBed, sorting by chromosome name is sufficient; there is no need to sort by start position).
2. Use genomeCoverageBed to calculate coverage over each genomic position.
3. Use bedGraphToBigWig to convert the textual BedGraph into a BigWig file.
Commands:
$ sort -k1,1 < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ genomeCoverageBed -bg -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] > [SAMPLE.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.BEDGRAPH] [CHROM_SIZE] [SAMPLE.BW]
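To make the meaning of the BedGraph output concrete, the following sketch computes the same kind of per-position depth with sort and awk alone, by sweeping over interval start/end events. It is an illustration on made-up reads, not a replacement for genomeCoverageBed.

```shell
cd "$(mktemp -d)"

# Three hypothetical reads on chr2L; r1 and r2 overlap between 120 and 150.
printf 'chr2L\t100\t150\tr1\t0\t+\n'  >  reads.bed
printf 'chr2L\t120\t170\tr2\t0\t+\n'  >> reads.bed
printf 'chr2L\t300\t350\tr3\t0\t-\n'  >> reads.bed

# 1) Each interval becomes two events: depth +1 at its start and -1 at its
#    end (BED ends are exclusive, so coverage drops at the end coordinate).
# 2) Events are sorted by chromosome, then numerically by position.
# 3) Walking the events, each stretch between two consecutive positions
#    with depth > 0 is printed as one bedGraph line (chrom, start, end, depth).
awk 'BEGIN { OFS = "\t" } { print $1, $2, 1; print $1, $3, -1 }' reads.bed |
  sort -k1,1 -k2,2n |
  awk 'BEGIN { OFS = "\t" }
       $1 != chrom { chrom = $1; pos = $2; depth = 0 }
       $2 != pos   { if (depth > 0) print chrom, pos, $2, depth; pos = $2 }
                   { depth += $3 }' > reads.bedgraph
cat reads.bedgraph
```

This prints chr2L 100 120 1, chr2L 120 150 2, chr2L 150 170 1 and chr2L 300 350 1 (tab-separated). Unlike genomeCoverageBed, the sketch does not merge two adjacent stretches that happen to have the same depth, and it does not read [CHROM_SIZE].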
[SAMPLE.BW] is the BigWig file, which can be used as the custom track.
Custom track line, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)):
track name="Wiggle Track" type=bigWig bigDataUrl=http://myserver.edu/sample.bw
4.2 Coverage of SAM/BAM files
Task
Display the nucleotide coverage of a SAM (or BAM) file:
ZAPHOD_FC42T13AAXX:9:1:503:868   0  chr3L      3390  255  44M  *  0  0  ACATATT...
ZAPHOD_FC42T13AAXX:9:1:877:655  16  chr3L  11651753  255  43M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:839:364  16  chr3L  12316404  255  44M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:935:985  16  chr3L  11651753  255  43M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:125:213  16  chr2L  12315946  255  44M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:953:789  16  chr2L  11651753  255  44M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:31:108   16  chr2L  12315946  255  44M  *  0  0  TAATATA...
ZAPHOD_FC42T13AAXX:9:1:454:503  16  chr2L  11651254  255  45M  *  0  0  GCTCTCT...
We want to view the coverage of each genomic position as a custom track:
[Screenshot: the coverage shown in the custom track "Coverage (BigWig)" around the gene CG3639, next to the FlyBase Protein-Coding Genes, FlyBase Noncoding Genes and RefSeq Genes tracks; 500-base scale.]
Solution
1. Convert the SAM file to a BAM file (if needed).
2. Sort the BAM file.
3. Use genomeCoverageBed to calculate coverage over each genomic position.
4. Use bedGraphToBigWig to convert the textual BedGraph into a BigWig file.
Commands:
$ samtools view -S -b -o [SAMPLE.BAM] [SAMPLE.SAM]
$ samtools sort [SAMPLE.BAM] [SAMPLE.SORTED]
$ genomeCoverageBed -bg -ibam [SAMPLE.SORTED.BAM] \
-g [CHROM_SIZE] > [SAMPLE.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.BEDGRAPH] [CHROM_SIZE] [SAMPLE.BW]
[SAMPLE.BW] is the BigWig file, which can be used as the custom track.
Custom track line, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)):
track name="Wiggle Track" type=bigWig bigDataUrl=http://myserver.edu/sample.bw
4.3 Coverage by Strand
Task
Display the coverage of intervals in a BED (or BAM) file:
chr2L 13774500 13774548 seq-1 0 +
chr2L 17984104 17984148 seq-2 0 +
chr2L 13675851 13675900 seq-3 0 +
chr2L 18884003 18884049 seq-4 0
chr2L  3358603  3358646 seq-5 0
As two tracks in the UCSC Genome Browser, one for positive-strand reads and one for negative-strand reads:
[Screenshot: the coverage tracks "positive_strand" and "negative_strand" above the RefSeq Genes track; 100-base scale.]
Solution
1. Sort the BED file (sorting by chromosome name is sufficient)
2. Use genomeCoverageBed with the -strand option to calculate coverage of each strand.
3. For the negative strand track, use awk to negate the coverage values.
4. Use bedGraphToBigWig to create a BigWig track for each strand.
Commands:
$ sort -k1,1 < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ genomeCoverageBed -bg -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] -strand + \
> [SAMPLE.POS.BEDGRAPH]
$ genomeCoverageBed -bg -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] -strand - | \
awk 'BEGIN { OFS = "\t" } { $4 = -$4; print }' > [SAMPLE.NEG.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.POS.BEDGRAPH] [CHROM_SIZE] [SAMPLE.POS.BW]
$ bedGraphToBigWig [SAMPLE.NEG.BEDGRAPH] [CHROM_SIZE] [SAMPLE.NEG.BW]
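Applied to a couple of made-up bedGraph lines, the negation step looks like this. When a field is assigned, awk rebuilds the whole line using the output field separator OFS, which defaults to a single space; setting it to a tab keeps the bedGraph tab-separated. The file names are hypothetical.

```shell
cd "$(mktemp -d)"

# Hypothetical negative-strand coverage (chrom, start, end, depth).
printf 'chr2L\t100\t120\t1\nchr2L\t120\t150\t2\n' > neg.bedgraph

# Negate column 4; OFS = tab keeps the rebuilt line tab-separated.
awk 'BEGIN { OFS = "\t" } { $4 = -$4; print }' neg.bedgraph > neg.negated.bedgraph
cat neg.negated.bedgraph
```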
The two files ([SAMPLE.POS.BW] and [SAMPLE.NEG.BW]) can be used as custom tracks in the Genome Browser. The color setting on each track line shows each strand in a different color.
Custom track lines, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)); each track line must stay on a single line:
track name="Positive Strand" color=0,0,255 type=bigWig bigDataUrl=http://myserver.edu/SAMPLE.POS.BW
track name="Negative Strand" color=255,0,0 type=bigWig bigDataUrl=http://myserver.edu/SAMPLE.NEG.BW
4.4 Coverage by Exons
Task
Display a BED (or BAM) file containing blocked intervals[2]:
chr2L  67753  67927 seq1 0 +  67753  67927 255,0,0 2 9,36  0,138
chr2L 107813 108589 seq2 0 + 107813 108589 255,0,0 2 25,2  0,774
chr2L 107813 108589 seq3 0 + 107813 108589 255,0,0 2 25,2  0,774
chr2L 108308 108591 seq4 0 - 108308 108591 255,0,0 2 38,4  0,279
chr2L 113350 113473 seq5 0   113350 113473 255,0,0 2 19,40 0,83
chr2L 118291 118362 seq6 0   118291 118362 255,0,0 2 13,2  0,69
chr2L 119550 119870 seq7 0   119550 119870 255,0,0 2 4,43  0,277
chr2L 119550 119870 seq8 0   119550 119870 255,0,0 2 4,43  0,277
chr2L 120070 120477 seq9 0   120070 120477 255,0,0 2 10,57 0,350
As a custom track in the UCSC Genome Browser, counting only the exonic parts (blocks) of each interval:
[Screenshot: the "Exon Coverage" BigWig track above the "Spliced Reads" track and RefSeq Genes; 1 kb scale.]
Solution
1. Sort the BED file.
2. Use genomeCoverageBed with the -split option to calculate the coverage of the exonic regions (blocks) only.
3. Use bedGraphToBigWig to create a BigWig track.
Commands:
$ sort -k1,1 < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ genomeCoverageBed -bg -split -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] > [SAMPLE.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.BEDGRAPH] [CHROM_SIZE] [SAMPLE.BW]
The [SAMPLE.BW] file is the BigWig file, which can be used as the custom track.
[2] BED files with 12 columns, or SAM/BAM files whose CIGAR strings contain N/D operations (the result of mapping reads across splice junctions).
Custom track line, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)):
track name="Wiggle Track" type=bigWig bigDataUrl=http://myserver.edu/sample.bw
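What the -split option does with a blocked interval can be illustrated with a small awk sketch that expands each BED12 line into one BED6 interval per block (block starts are offsets from chromStart). This only illustrates the idea; genomeCoverageBed does it internally. The input is the first row of the example above; the file names are hypothetical.

```shell
cd "$(mktemp -d)"

# First row of the example above, as a tab-separated BED12 line.
printf 'chr2L\t67753\t67927\tseq1\t0\t+\t67753\t67927\t255,0,0\t2\t9,36\t0,138\n' > spliced.bed

# Column 10 = blockCount, 11 = blockSizes, 12 = blockStarts (relative to column 2).
awk 'BEGIN { OFS = "\t" }
     {
       split($11, size, ",")
       split($12, offset, ",")
       for (i = 1; i <= $10; i++) {
         s = $2 + offset[i]
         print $1, s, s + size[i], $4, $5, $6
       }
     }' spliced.bed > blocks.bed
cat blocks.bed
```

The single interval chr2L:67753-67927 becomes the two blocks chr2L:67753-67762 and chr2L:67891-67927; only these bases contribute to the coverage.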
Part II
Genome Browser Custom Tracks
Once you have a BigBed/BigWig/BAM custom track file, you need to load it into the Genome Browser server (by clicking the "Add Custom Tracks" or "Manage Custom Tracks" button). This part explains how to load track files into the Genome Browser server, and how to set track display options.
5 Uploading tracks using FTP
An FTP (File Transfer Protocol) server is a computer that stores files and makes them accessible from anywhere on the internet[3].
To use BigBed/BigWig/BAM custom tracks with the public UCSC Genome Browser, you’ll have to put the files on a public FTP server, and instruct the UCSC Genome Browser to read the files from the FTP server (unlike our local Genome Browser mirror server, which can read files directly from BlueHelix).
Using an FTP server is the easiest way to make those files public, but an HTTP server can also be used if you know how to upload files to one (this document does not cover HTTP servers).
Getting an FTP server account
• All CSHL members - Request an FTP account from the I.T. department by filling out this form: http://intranet.cshl.edu/it/requests/account_request.html . Put "FTP" in the field "...I would like to access the following server(s)".
• Hannon Lab members - Email [email protected] for an FTP account on ftp://cancan.cshl.edu .
• Other alternatives - Any public HTTP/FTP server will work just fine, if you have access to one.
Uploading a file to an FTP server
If the custom track file (BigWig, BigBed, BAM) is stored on your local computer (Mac/Windows), use one of the following user-friendly programs to upload the file to the FTP server:
• Cyberduck (for Mac OS)
• FileZilla (for Mac, Windows, Linux)
• Apple’s Classic FTP for Mac
• WinSCP (for Windows)
• and many more...
If the custom track file is stored on BlueHelix, or if you prefer the command-line FTP program, use the following session as an example. The user types the text after the $, Name, Password and ftp> prompts. Replace gordon with your FTP username, and dummy.bb with the file name of your custom track.
[3] This is a gross over-simplification, but it’ll do for now.
$ ftp -p ftp2.cshl.edu
Connected to ftp2.cshl.edu.
220 (vsFTPd 2.0.5)
Name (ftp2.cshl.edu:gordon): gordon
331 Please specify the password.
Password: TYPE PASSWORD AND PRESS ENTER
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> bin
200 Switching to Binary mode
ftp> put dummy.bb
local: dummy.bb remote: dummy.bb
227 Entering Passive Mode (143,48,220,121,171,132)
150 Ok to send data.
226 File receive OK.
5937928 bytes sent in 0.56 secs (10444.5 kB/s)
ftp> quit
221 Goodbye.
Loading a Custom Track from an FTP server
A URL (Uniform Resource Locator) is the address of a file on the internet[4].
The syntax of an FTP URL is ftp://USER:PASSWORD@SERVER/FILE. For example, given the following details:
FTP server: ftp2.cshl.edu
FTP Username: gordon
FTP password: 12345678
Custom Track file name: sample.bam
The full URL to access this file will be:
ftp://gordon:12345678@ftp2.cshl.edu/sample.bam
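Building the URL from its parts makes the syntax explicit. The values are the example's placeholders; if a password contained characters such as @, : or /, they would generally have to be percent-encoded (for instance @ as %40) for the URL to parse correctly. The variable names are hypothetical.

```shell
# Values from the example above (all placeholders).
ftp_user=gordon
ftp_password=12345678
ftp_server=ftp2.cshl.edu
track_file=sample.bam

# ftp://USER:PASSWORD@SERVER/FILE
url="ftp://${ftp_user}:${ftp_password}@${ftp_server}/${track_file}"

# Print a complete custom track line; it must stay on one line.
printf 'track type=bam bigDataUrl=%s\n' "$url"
```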
When adding a custom track in the public UCSC Genome Browser (http://genome.ucsc.edu), give the URL of the file in the bigDataUrl setting:
Custom track line, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page (D. melanogaster, Apr. 2006 (BDGP R5/dm3)):
track type=bam bigDataUrl=ftp://gordon:12345678@ftp2.cshl.edu/sample.bam
If all went well, the new custom track will be added when you click "Submit". If there was an error, see Part IV (Troubleshooting).
[4] Again, a gross over-simplification that will do for now. See ?? for a more accurate description.
6 Using CSHL’s local Genome Browser server (http://foxtrot.cshl.edu)
http://foxtrot.cshl.edu is our local mirror of the UCSC Genome Browser.
It contains several common organisms/builds (hg18, hg19, mm9, dm3, panTro2, strPur2) and several other custom builds.
Advantages of using our local server:
1. Faster track uploads (for BED/Wiggle files)
2. Sessions and custom tracks are saved for longer periods
3. BLAT with less stringent matching parameters (suitable for short reads)
4. Can read custom tracks directly from BlueHelix storage (no need to upload files to an HTTP/FTP server). See below for details.
7 Track display options
See this short tutorial: http://tango.cshl.edu/compskills/gb_tutorial7.pdf .
Part III
Technical Details
8 Formatting conventions
Fixed-font sections (such as the one below) depict a unix session, as typed in a terminal. This will usually be on BlueHelix.
• lines starting with ’#’ are comments
• lines starting with ’$’ are unix shell commands. These should be typed by the user.
• other lines are program output, printed on the screen when the user executes the commands.
The following example shows a unix session, where the user runs the ls command (which lists files):
# This is a comment. The next line shows executing the "ls" command
# followed by the output of the "ls" command (the four files).
$ ls
file1
file2
file3
file4
Where input or output files are involved, they will appear in UPPER CASE, surrounded by square brackets. These
should be replaced by real file names when the command is executed by the user.
# The following command copies a file
# The command has no output - nothing is printed after the command is executed.
$ cp [INPUT.TXT] [OUTPUT.TXT]
9 [CHROM_SIZE] file
The programs bedClip, genomeCoverageBed, bedGraphToBigWig and bedToBigBed require a text file containing the names and sizes of all chromosomes (for the organism/build used). The examples in this document use the [CHROM_SIZE] placeholder for this file.
On BlueHelix, files are available for the most common builds:
$ cd /data/hannon/gordon/databases/chrom_sizes
$ ls -l
total 128
-rw-r--r-- 1 gordon hannon 513 Mar 9 19:55 dm3_chromInfo.txt
-rw-r--r-- 1 gordon hannon 2229 Mar 9 19:54 hg18_chromInfo.txt
-rw-r--r-- 1 gordon hannon 3924 Mar 9 19:54 hg19_chromInfo.txt
-rw-r--r-- 1 gordon hannon 1249 Mar 9 19:54 mm9_chromInfo.txt
Each file contains three columns: chromosome, size, file (the file column can be safely ignored):
$ cat dm3_chromInfo.txt
chr2L      23011544  /gbdb/dm3/dm3.2bit
chr2LHet     368872  /gbdb/dm3/dm3.2bit
chr2R      21146708  /gbdb/dm3/dm3.2bit
chr2RHet    3288761  /gbdb/dm3/dm3.2bit
chr3L      24543557  /gbdb/dm3/dm3.2bit
chr3LHet    2555491  /gbdb/dm3/dm3.2bit
chr3R      27905053  /gbdb/dm3/dm3.2bit
chr3RHet    2517507  /gbdb/dm3/dm3.2bit
chr4        1351857  /gbdb/dm3/dm3.2bit
chrU       10049037  /gbdb/dm3/dm3.2bit
chrUextra  29004656  /gbdb/dm3/dm3.2bit
chrX       22422827  /gbdb/dm3/dm3.2bit
chrXHet      204112  /gbdb/dm3/dm3.2bit
chrYHet      347038  /gbdb/dm3/dm3.2bit
chrM          19517  /gbdb/dm3/dm3.2bit
Files for every organism/build available on the UCSC Genome Browser can be downloaded from:
http://hgdownload.cse.ucsc.edu/goldenPath/ORG/database/chromInfo.txt.gz
Example (for hg18):
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz
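The download is gzip-compressed and has the same three columns. A sketch of turning it into a two-column name/size file (dropping the file column, which is not needed): the input here is a stand-in built from three of the dm3 rows above, the output name dm3.chrom.sizes is hypothetical, and the real download command is shown commented out because it needs network access.

```shell
cd "$(mktemp -d)"

# Stand-in for a downloaded chromInfo.txt.gz, built from three dm3 rows above.
printf 'chr2L\t23011544\t/gbdb/dm3/dm3.2bit\nchr4\t1351857\t/gbdb/dm3/dm3.2bit\nchrM\t19517\t/gbdb/dm3/dm3.2bit\n' > chromInfo.txt
gzip chromInfo.txt                     # produces chromInfo.txt.gz

# Real download, for example:
#   wget http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz

# Decompress to standard output and keep only the name and size columns.
gunzip -c chromInfo.txt.gz | cut -f1,2 > dm3.chrom.sizes
cat dm3.chrom.sizes
```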
10 Bluehelix setup
On BlueHelix, the relevant programs are available in:
/data/hannon/gordon/ucsc_genome_browser/bin
A required library (libmysqlclient.so) is available here:
/data/hannon/gordon/usr/lib/mysql/
When using BASH, run the following commands:
$ export PATH=$PATH:/data/hannon/gordon/ucsc_genome_browser/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/hannon/gordon/usr/lib/mysql
When using TCSH, run the following commands:
$ setenv PATH $PATH:/data/hannon/gordon/ucsc_genome_browser/bin
$ setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH:/data/hannon/gordon/usr/lib/mysql
TODO: make a friendly script (set agnostic)
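For BASH (or any POSIX sh), a hedged sketch of the same setup with a hypothetical append_path helper: it adds each directory only if it is missing, so running it twice does not keep growing PATH, and it avoids leaving an empty entry when LD_LIBRARY_PATH was previously unset (an empty entry in a search path is typically interpreted as the current directory). TCSH users still need the setenv commands above.

```shell
# Hypothetical helper: append a directory to a colon-separated variable
# (PATH, LD_LIBRARY_PATH, ...) only if it is not already listed.
append_path() {
  var=$1 dir=$2
  eval "current=\${$var}"                      # current value of the named variable
  case ":$current:" in
    *":$dir:"*) ;;                             # already present: nothing to do
    *) if [ -n "$current" ]; then current="$current:$dir"; else current=$dir; fi
       eval "export $var=\"\$current\"" ;;
  esac
}

append_path PATH /data/hannon/gordon/ucsc_genome_browser/bin
append_path LD_LIBRARY_PATH /data/hannon/gordon/usr/lib/mysql
```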
11 Direct MySQL access
The UCSC Genome Browser allows direct access to the back-end MySQL database containing all the annotation tracks
(see http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29 for more details).
Our local mirror (http://foxtrot.cshl.edu) contains the same annotation tables for several common builds
(mainly: hg18, hg19, mm9, dm3, panTro2). Contact [email protected] to set up direct access to the database server
(this could be faster than connecting to UCSC’s MySQL server).
12 Programs reference
bedClip
bedToBigBed
bedGraphToBigWig
genomeCoverageBed
samtools
gb custom track line
13 Compiling programs from source code
Jim Kent’s Tools
Don’t.
Download the pre-compiled binaries from http://genome-test.cse.ucsc.edu/~kent/exe/.
If you insist on building from source, the source code is on BlueHelix:
/home/hannon/gordon/source/kent_genome_browser_source/kent
And the build instructions here: http://genome.ucsc.edu/admin/jk-install.html.
If you have an I.T.-managed server with CentOS 5.4 and Linux kernel 2.6.18, send me an email and I can send
you the compiled binaries for that platform.
samtools
The source code for samtools v0.1.7a is on BlueHelix:
/home/hannon/gordon/source/samtools-0.1.7a
Or on the official web site: http://samtools.sourceforge.net/
bedtools
The examples in this document require a patched version of Aaron Quinlan’s BEDTools package, available on BlueHelix:
/home/hannon/gordon/source/BEDTools_bedgraph
The official web site: http://code.google.com/p/bedtools/
Future versions (probably 2.5.5) might incorporate these patches.
Part IV
Troubleshooting
SAM file without a header
$ samtools view -S -b dummy.sam
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
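This error means the SAM file has no @SQ header lines naming the reference sequences. One way to add them, sketched below with hypothetical file names, is to generate one @SQ line per chromosome from the [CHROM_SIZE] file and put them in front of the alignments. Alternatively, samtools view accepts a reference list with -t (a tab-delimited file with the reference name in the first column and its length in the second; further columns are ignored), e.g. samtools view -S -b -t [CHROM_SIZE] -o [SAMPLE.BAM] [SAMPLE.SAM].

```shell
cd "$(mktemp -d)"

# Hypothetical two-chromosome [CHROM_SIZE] file (name, size, file), values from the dm3 list.
printf 'chr2L\t23011544\t/gbdb/dm3/dm3.2bit\n'  >  chrom_size.txt
printf 'chr3L\t24543557\t/gbdb/dm3/dm3.2bit\n'  >> chrom_size.txt

# A header-less SAM file with one alignment (SEQ and QUAL given as '*').
printf 'read1\t0\tchr3L\t3390\t255\t44M\t*\t0\t0\t*\t*\n' > noheader.sam

# One @SQ line per chromosome (SN = reference name, LN = length),
# followed by the original alignment lines.
{
  awk 'BEGIN { OFS = "\t" } { print "@SQ", "SN:" $1, "LN:" $2 }' chrom_size.txt
  cat noheader.sam
} > withheader.sam
cat withheader.sam
```

The resulting withheader.sam can then be converted with samtools view -S -b as in section 2.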
Errors with FTP and custom tracks
ftp server response timed out > 1000000 microsec - wrong password
Error Couldn’t find host ccan.cshl.edu. h_errno 1 - bad server name
Error ftp server error on cmd=[SIZE /end221.bb ] response=[550 Could not get file size. ] - wrong file name
Error Missing bigDataUrl setting from track of type=bigBed - track line split over multiple lines (each track line must be a single line).