Short-reads Custom Tracks
Assaf Gordon [email protected]
Hannon Lab, CSHL
July 8, 2010

Contents
Part I Visualization
  1 BED/Interval files
  2 SAM files
  3 PSLX (blat) files
  4 Coverage
    4.1 Coverage of BED files
    4.2 Coverage of SAM/BAM files
    4.3 Coverage by Strand
    4.4 Coverage by Exons
Part II Genome Browser Custom Tracks
  5 Uploading tracks using FTP
  6 Using CSHL's local Genome Browser server (http://foxtrot.cshl.edu)
  7 Track display options
Part III Technical Details
  8 Formatting conventions
  9 [CHROM_SIZE] file
  10 Bluehelix setup
  11 Direct MySQL access
  12 Programs reference
  13 Compiling programs from source code
Part IV Troubleshooting

Introduction
Visualizing a small number of intervals (up to 1,000,000 intervals, or files smaller than 50MB) is easily done by simply uploading the file to the UCSC Genome Browser. Visualizing a large number of reads presents some technical difficulties. This document shows how to visualize large files of millions of short reads (long reads will work just as well).
• Text in fixed font shows unix commands. See section 8 for more details.
• [CHROM_SIZE] is a text file containing the names and sizes of chromosomes. See section 9 for more details.
• All programs mentioned here are available on BlueHelix. See section 10 for more details.

Part I Visualization
The general method is the same for all file formats:
1. Convert your input files into one of several 'common' textual formats (BED, BedGraph, Wiggle, SAM).
2. Convert the BED/BedGraph/Wiggle/SAM files into a binary file (BigBed, BigWig, BAM).
3. Add a custom track in the UCSC Genome Browser, pointing to the binary files:
   • With the public UCSC Genome Browser: upload the binary files to an FTP site, and point the custom track to the correct URL. See section 5 for FTP usage.
   • With the CSHL mirror Genome Browser: put the binary files on BlueHelix, and point the custom track to the correct path. See section 6 for BlueHelix usage.

1 BED/Interval files
Task
Display a BED file:
chr2L 13774500 13774548 seq-1 0 +
chr2L 17984104 17984148 seq-2 0 +
chr2L 13675851 13675900 seq-3 0 +
chr2L 18884003 18884049 seq-4 0 +
chr2L 3358603 3358646 seq-5 0 +
as genomic intervals on the UCSC Genome Browser:
[Screenshot: a 50-base window showing individual reads (seq-457175, seq-375022, ...) of the "dummy.bb" custom track, above the Gap Locations, FlyBase Protein-Coding Genes, FlyBase Noncoding Genes, RefSeq Genes and D. melanogaster mRNAs from GenBank tracks.]
Solution
1. Sort the input file by chromosome name and start position (the input file must be sorted to be converted into a binary bigBed file).
2. Create a binary bigBed file from the sorted BED file.
Commands:
$ sort -k1,1 -k2,2n < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ bedToBigBed [SAMPLE.SORTED.BED] [CHROM_SIZE] [SAMPLE.BB]
The file [SAMPLE.BB] can be uploaded to the UCSC Genome Browser as a custom track: place it on an FTP or web server (see section 5), then paste a track line such as the following into the "Paste URLs or data" box of the "Add Custom Tracks" page (the example uses the D. melanogaster Apr. 2006 (BDGP R5/dm3) assembly):
track name="BigBed Track" type=bigBed bigDataUrl=http://myserver.edu/sample.bb

2 SAM files
Task
Display a SAM file [1]:
ZAPHOD_FC42T13AAXX:9:1:503:868 0 chr3L 3390 255 44M * 0 0 ACATATT...
ZAPHOD_FC42T13AAXX:9:1:877:655 16 chr3L 11651753 255 43M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:839:364 16 chr3L 12316404 255 44M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:125:213 0 chr2L 12315946 255 44M * 0 0 TAATATA...
as a custom track in the UCSC Genome Browser:
[Screenshot: a 500-base window showing the "BAMTrack" custom track above the UCSC Genes and Vertebrate Multiz Alignment & Conservation (44 Species) tracks.]
Solution
1. Convert the SAM file to a BAM file.
2. Sort the BAM file.
3. Create an index (BAI file) to accompany the BAM file.
Commands:
$ samtools view -S -b -o [SAMPLE.BAM] [SAMPLE.SAM]
# The .bam extension will be added automatically to the 'SAMPLE.SORTED' file name.
$ samtools sort [SAMPLE.BAM] [SAMPLE.SORTED]
# A new index file will be created with the same name and a .bai extension.
$ samtools index [SAMPLE.SORTED.BAM]
[1] SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. See http://samtools.sourceforge.net/
The two files [SAMPLE.SORTED.BAM] and [SAMPLE.SORTED.BAM.BAI] can be uploaded to the UCSC Genome Browser as a custom track. Put both files in the same location: the track line points only to the .bam file, and the Genome Browser looks for the index at the same URL with .bai appended:
track name="BAMTrack" type=bam bigDataUrl=http://myserver.edu/sample.sorted.bam

3 PSLX (blat) files
Task
Display a PSL file (output of the BLAT program):
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6678238 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6680033 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6878866 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6879800 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 6880518 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 7817848 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 7818749 ...
50 0 0 0 0 0 0 0 + 4-77 50 0 50 chrU 10049037 7860041 ...
as a custom track in the UCSC Genome Browser:
[Screenshot: a 500-base window showing the "PSLX example" custom track (items 115056-5, 290420-2, 529551-1, ...) above the FlyBase Protein-Coding Genes, FlyBase Noncoding Genes, RefSeq Genes and Repeating Elements by RepeatMasker tracks.]
Solution
1. Convert the PSLX file to a BED file.
2. Sort the BED file by chromosome and start position.
3. Convert the BED file to a BigBed file.
Commands:
$ pslToBed [SAMPLE.PSLX] [SAMPLE.BED]
$ sort -k1,1 -k2,2n < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ bedToBigBed [SAMPLE.SORTED.BED] [CHROM_SIZE] [SAMPLE.BB]
The file [SAMPLE.BB] can be uploaded to the UCSC Genome Browser as a custom track:
track name="BigBed Track" type=bigBed bigDataUrl=http://myserver.edu/sample.bb

4 Coverage
4.1 Coverage of BED files
Task
Display the nucleotide coverage of a BED file:
chr2L 13774500 13774548 seq-1 0 +
chr2L 17984104 17984148 seq-2 0 +
chr2L 13675851 13675900 seq-3 0 +
chr2L 18884003 18884049 seq-4 0 +
chr2L 3358603 3358646 seq-5 0 +
chr2L 3212400 3212446 seq-6 0 +
as a custom Wiggle track in the UCSC Genome Browser:
[Screenshot: a 500-base window around the CG3639 gene showing the "Coverage (BigWig)" custom track above the FlyBase Protein-Coding Genes, FlyBase Noncoding Genes and RefSeq Genes tracks.]
Solution
1. Sort the BED file (unlike for bedToBigBed, sorting by chromosome name is sufficient; there is no need to sort by start position).
2. Use genomeCoverageBed to calculate the coverage at each genomic position.
3. Use bedGraphToBigWig to convert the textual BedGraph file into a BigWig file.
Commands:
$ sort -k1,1 < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ genomeCoverageBed -bg -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] > [SAMPLE.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.BEDGRAPH] [CHROM_SIZE] [SAMPLE.BW]
The file [SAMPLE.BW] is the BigWig file, which can be used as the custom track:
track name="Wiggle Track" type=bigWig bigDataUrl=http://myserver.edu/sample.bw

4.2 Coverage of SAM/BAM files
Task
Display the nucleotide coverage of a SAM (or BAM) file:
ZAPHOD_FC42T13AAXX:9:1:503:868 0 chr3L 3390 255 44M * 0 0 ACATATT...
ZAPHOD_FC42T13AAXX:9:1:877:655 16 chr3L 11651753 255 43M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:839:364 16 chr3L 12316404 255 44M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:935:985 16 chr3L 11651753 255 43M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:125:213 16 chr2L 12315946 255 44M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:953:789 16 chr2L 11651753 255 44M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:31:108 16 chr2L 12315946 255 44M * 0 0 TAATATA...
ZAPHOD_FC42T13AAXX:9:1:454:503 16 chr2L 11651254 255 45M * 0 0 GCTCTCT...
We want to view the coverage of genomic positions as a custom track:
[Screenshot: a 500-base window around the CG3639 gene showing the "Coverage (BigWig)" custom track above the FlyBase Protein-Coding Genes, FlyBase Noncoding Genes and RefSeq Genes tracks.]
Solution
1. Convert the SAM file to a BAM file (if needed).
2. Sort the BAM file.
3. Use genomeCoverageBed to calculate the coverage at each genomic position.
4. Use bedGraphToBigWig to convert the textual BedGraph file into a BigWig file.
Commands:
$ samtools view -S -b -o [SAMPLE.BAM] [SAMPLE.SAM]
$ samtools sort [SAMPLE.BAM] [SAMPLE.SORTED]
$ genomeCoverageBed -bg -ibam [SAMPLE.SORTED.BAM] \
      -g [CHROM_SIZE] > [SAMPLE.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.BEDGRAPH] [CHROM_SIZE] [SAMPLE.BW]
The file [SAMPLE.BW] is the BigWig file, which can be used as the custom track:
track name="Wiggle Track" type=bigWig bigDataUrl=http://myserver.edu/sample.bw

4.3 Coverage by Strand
Task
Display the coverage of intervals in a BED (or BAM) file (first five columns shown; the sixth column holds each read's strand, + or -):
chr2L 13774500 13774548 seq-1 0
chr2L 17984104 17984148 seq-2 0
chr2L 13675851 13675900 seq-3 0
chr2L 18884003 18884049 seq-4 0
chr2L 3358603 3358646 seq-5 0
as two tracks in the UCSC Genome Browser, one for positive-strand reads and one for negative-strand reads:
[Screenshot: a 100-base window showing the "positive_strand" and "negative_strand" custom tracks above RefSeq Genes.]
Solution
1. Sort the BED file (sorting by chromosome name is sufficient).
2. Use genomeCoverageBed with the -strand option to calculate the coverage of each strand.
3. For the negative-strand track, use awk to negate the coverage values, so that they are drawn below the zero line.
4. Use bedGraphToBigWig to create a BigWig track for each strand.
Commands:
$ sort -k1,1 < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ genomeCoverageBed -bg -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] -strand + \
      > [SAMPLE.POS.BEDGRAPH]
# Negate column 4 (the coverage value) and keep the output tab-separated.
$ genomeCoverageBed -bg -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] -strand - | \
      awk 'BEGIN { OFS = "\t" } { $4 = -$4; print }' > [SAMPLE.NEG.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.POS.BEDGRAPH] [CHROM_SIZE] [SAMPLE.POS.BW]
$ bedGraphToBigWig [SAMPLE.NEG.BEDGRAPH] [CHROM_SIZE] [SAMPLE.NEG.BW]
The two files ([SAMPLE.POS.BW] and [SAMPLE.NEG.BW]) can be used as custom tracks in the Genome Browser. A special parameter (color) will show each strand in a different color:
track name="Positive Strand" color=0,0,255 type=bigWig bigDataUrl=http://myserver.edu/SAMPLE.POS.BW
track name="Negative Strand" color=255,0,0 type=bigWig bigDataUrl=http://myserver.edu/SAMPLE.NEG.BW

4.4 Coverage by Exons
Task
Display a BED (or BAM) file containing blocked intervals [2] (strand column not shown):
chr2L 67753 67927 seq1 0 67753 67927 255,0,0 2 9,36 0,138
chr2L 107813 108589 seq2 0 107813 108589 255,0,0 2 25,2 0,774
chr2L 107813 108589 seq3 0 107813 108589 255,0,0 2 25,2 0,774
chr2L 108308 108591 seq4 0 108308 108591 255,0,0 2 38,4 0,279
chr2L 113350 113473 seq5 0 113350 113473 255,0,0 2 19,40 0,83
chr2L 118291 118362 seq6 0 118291 118362 255,0,0 2 13,2 0,69
chr2L 119550 119870 seq7 0 119550 119870 255,0,0 2 4,43 0,277
chr2L 119550 119870 seq8 0 119550 119870 255,0,0 2 4,43 0,277
chr2L 120070 120477 seq9 0 120070 120477 255,0,0 2 10,57 0,350
as a custom track in the UCSC Genome Browser, counting only the exonic parts (blocks) of each interval:
[Screenshot: a 1 kb window showing the "Exon Coverage" custom track above the "Spliced Reads" track and RefSeq Genes.]
[2] BED files with 12 columns, or SAM/BAM files whose CIGAR strings contain N/D, as produced by mapping reads across splice junctions.
Solution
1. Sort the BED file.
2. Use genomeCoverageBed with the -split option to calculate the coverage of the exonic regions (blocks) only.
3. Use bedGraphToBigWig to create a BigWig track.
Commands:
$ sort -k1,1 < [SAMPLE.BED] > [SAMPLE.SORTED.BED]
$ genomeCoverageBed -bg -split -i [SAMPLE.SORTED.BED] -g [CHROM_SIZE] > [SAMPLE.BEDGRAPH]
$ bedGraphToBigWig [SAMPLE.BEDGRAPH] [CHROM_SIZE] [SAMPLE.BW]
The file [SAMPLE.BW] is the BigWig file, which can be used as the custom track:
track name="Wiggle Track" type=bigWig bigDataUrl=http://myserver.edu/sample.bw

Part II Genome Browser Custom Tracks
Once you have a BigBed/BigWig/BAM custom track file, you need to load it into the Genome Browser server (by clicking the "Add Custom Tracks" or "Manage Custom Tracks" buttons). This part explains how to load track files into the Genome Browser server, and how to set track display options.

5 Uploading tracks using FTP
An FTP (File Transfer Protocol) server is a computer that stores files and gives access to them from anywhere on the internet [3]. To use BigBed/BigWig/BAM custom tracks with the public UCSC Genome Browser, you'll have to put the files on a public FTP server, and instruct the UCSC Genome Browser to read the files from that server (unlike our local Genome Browser mirror, which can read files directly from BlueHelix). Using an FTP server is the easiest way to make those files public, but an HTTP server can also be used (if you know how to upload files to one; this document does not deal with HTTP servers).

Getting an FTP server account
• All CSHL members: all CSHL members can request an FTP account from the I.T. department. To request one, fill out this form: http://intranet.cshl.edu/it/requests/account_request.html . Put "FTP" in the field "...I would like to access the following server(s)".
• Hannon Lab members: email [email protected] for an FTP account on ftp://cancan.cshl.edu .
• Other alternatives: any public HTTP/FTP server will work just fine, if you have access to one.

Uploading a file to an FTP server
If the custom track file (BigWig, BigBed, BAM) is stored on your local computer (Mac/Windows), use one of the following friendly programs to upload the file to the FTP server:
• Cyberduck (for Mac OS)
• FileZilla (for Mac, Windows, Linux)
• Apple's Classic FTP for Mac
• WinSCP (for Windows)
• and many more...
If the custom track file is stored on BlueHelix, or if you prefer to use the command-line FTP program, see the following session as an example. You type the command after the $ prompt, your username and password when asked, and the commands after each ftp> prompt. Replace gordon with your FTP username, and dummy.bb with the file name of your custom track.
$ ftp -p ftp2.cshl.edu
Connected to ftp2.cshl.edu.
220 (vsFTPd 2.0.5)
Name (ftp2.cshl.edu:gordon): gordon
331 Please specify the password.
Password: TYPE PASSWORD AND PRESS ENTER
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> bin
200 Switching to Binary mode.
ftp> put dummy.bb
local: dummy.bb remote: dummy.bb
227 Entering Passive Mode (143,48,220,121,171,132)
150 Ok to send data.
226 File receive OK.
5937928 bytes sent in 0.56 secs (10444.5 kB/s)
ftp> quit
221 Goodbye.
[3] This is a gross over-simplification, but it'll do for now.

Loading a Custom Track from an FTP server
A URL (Uniform Resource Locator) is a way to locate files on the internet [4]. The syntax of the URL is ftp://USER:PASSWORD@SERVER/FILE. Assuming the following details:
FTP server: ftp2.cshl.edu
FTP username: gordon
FTP password: 12345678
Custom track file name: sample.bam
the full URL to access this file will be:
ftp://gordon:12345678@ftp2.cshl.edu/sample.bam
When adding a custom track in the public UCSC Genome Browser (http://genome.ucsc.edu), use the URL of the file with the bigDataUrl keyword, pasted into the "Paste URLs or data" box of the "Add Custom Tracks" page:
track type=bam bigDataUrl=ftp://gordon:12345678@ftp2.cshl.edu/sample.bam
If all went well, when you click "Submit" the new custom track will be added. If there was any error
[4] Again, a gross over-simplification that will do for now. See ?? for a more accurate description.
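The URL syntax above can also be assembled by a small shell script, which avoids typing mistakes and keeps the whole track line on a single line (a track line broken over several lines causes the "Missing bigDataUrl setting" error listed in Part IV). This is a minimal sketch using the example account details from this section; the output file name custom_track.txt is hypothetical.

```shell
#!/bin/sh
# Example values from the text above; replace them with your own account details.
FTP_USER="gordon"
FTP_PASS="12345678"
FTP_SERVER="ftp2.cshl.edu"
TRACK_FILE="sample.bam"

# URL syntax from the text: ftp://USER:PASSWORD@SERVER/FILE
URL="ftp://${FTP_USER}:${FTP_PASS}@${FTP_SERVER}/${TRACK_FILE}"

# Build the complete track line as ONE line of text.
TRACK_LINE="track name=\"BAMTrack\" type=bam bigDataUrl=${URL}"

# Save it (hypothetical file name) so it can be pasted into the browser as-is.
printf '%s\n' "$TRACK_LINE" > custom_track.txt
cat custom_track.txt
```

A password containing characters such as @, : or / would have to be percent-encoded before being placed in the URL; the sketch does not do this.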
6 Using CSHL's local Genome Browser server (http://foxtrot.cshl.edu)
http://foxtrot.cshl.edu is our local mirror of the UCSC Genome Browser. It contains several common organisms/builds (hg18, hg19, mm9, dm3, panTro2, strPur2) and several other custom builds.
Advantages of using our local server:
1. Faster track upload (for BED/Wiggle files).
2. Sessions and custom tracks are saved for longer periods.
3. BLAT with less stringent matching parameters (suitable for short reads).
4. It can read custom tracks directly from BlueHelix storage (no need to upload files to an HTTP/FTP server). See below for details.

7 Track display options
See this short tutorial: http://tango.cshl.edu/compskills/gb_tutorial7.pdf .

Part III Technical Details

8 Formatting conventions
Fixed-font sections (like the one below) depict a unix session, as typed on a terminal. This will usually be on BlueHelix.
• Lines starting with '#' are comments.
• Lines starting with '$' are unix shell commands. These should be typed by the user.
• Other lines are program output: they are printed on the screen when the user executes the commands.
The following example shows a unix session, where the user runs the ls command (print file list):
# This is a comment. The next line shows executing the "ls" command,
# followed by the output of the "ls" command (the four files).
$ ls
file1 file2 file3 file4
Where input or output files are involved, they will appear in UPPER CASE, surrounded by square brackets. These should be replaced by real file names when the user executes the command.
# The following command copies a file.
# The command has no output: nothing is printed after the command is executed.
$ cp [INPUT.TXT] [OUTPUT.TXT]

9 [CHROM_SIZE] file
The programs bedClip, genomeCoverageBed, bedGraphToBigWig and bedToBigBed require a text file containing the name and size of each chromosome (for the organism/build used). The examples in this document use the [CHROM_SIZE] placeholder for this file.
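Before running bedToBigBed or bedGraphToBigWig, it can be worth checking that the [CHROM_SIZE] file is well formed. The sketch below is an extra sanity check, not part of the original workflow: the file name dm3.chrom.sizes is hypothetical, and the sizes are copied from the dm3 listing in this section. It writes a small tab-separated file, reports any line that lacks a name or a positive integer size, and looks up the size of one chromosome with awk.

```shell
#!/bin/sh
# Hypothetical file name; sizes copied from the dm3 chromInfo listing.
CHROM_SIZE=dm3.chrom.sizes
printf 'chr2L\t23011544\nchr2R\t21146708\nchr4\t1351857\nchrM\t19517\n' > "$CHROM_SIZE"

# (Optional) keep only name and size from a downloaded UCSC chromInfo file:
#   gunzip -c chromInfo.txt.gz | cut -f1,2 > dm3.chrom.sizes

# Print every line that does not have a name and a positive integer size.
BAD=$(awk -F'\t' 'NF < 2 || $2 !~ /^[0-9]+$/ || $2 + 0 == 0 { print NR ": " $0 }' "$CHROM_SIZE")

if [ -n "$BAD" ]; then
    echo "Malformed lines in $CHROM_SIZE:" >&2
    echo "$BAD" >&2
else
    echo "$CHROM_SIZE: OK"
fi

# Look up the size of a single chromosome, e.g. to check a BED end coordinate.
CHR2L_SIZE=$(awk -F'\t' '$1 == "chr2L" { print $2 }' "$CHROM_SIZE")
echo "chr2L is $CHR2L_SIZE bases long"
```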
On BlueHelix, files are available for the most common builds:
$ cd /data/hannon/gordon/databases/chrom_sizes
$ ls -l
total 128
-rw-r--r-- 1 gordon hannon 513 Mar 9 19:55 dm3_chromInfo.txt
-rw-r--r-- 1 gordon hannon 2229 Mar 9 19:54 hg18_chromInfo.txt
-rw-r--r-- 1 gordon hannon 3924 Mar 9 19:54 hg19_chromInfo.txt
-rw-r--r-- 1 gordon hannon 1249 Mar 9 19:54 mm9_chromInfo.txt
Each file contains three columns: chromosome, size, file (the file column can be safely ignored):
$ cat dm3_chromInfo.txt
chr2L 23011544 /gbdb/dm3/dm3.2bit
chr2LHet 368872 /gbdb/dm3/dm3.2bit
chr2R 21146708 /gbdb/dm3/dm3.2bit
chr2RHet 3288761 /gbdb/dm3/dm3.2bit
chr3L 24543557 /gbdb/dm3/dm3.2bit
chr3LHet 2555491 /gbdb/dm3/dm3.2bit
chr3R 27905053 /gbdb/dm3/dm3.2bit
chr3RHet 2517507 /gbdb/dm3/dm3.2bit
chr4 1351857 /gbdb/dm3/dm3.2bit
chrU 10049037 /gbdb/dm3/dm3.2bit
chrUextra 29004656 /gbdb/dm3/dm3.2bit
chrX 22422827 /gbdb/dm3/dm3.2bit
chrXHet 204112 /gbdb/dm3/dm3.2bit
chrYHet 347038 /gbdb/dm3/dm3.2bit
chrM 19517 /gbdb/dm3/dm3.2bit
Files for every organism/build available on the UCSC Genome Browser can be downloaded from:
http://hgdownload.cse.ucsc.edu/goldenPath/ORG/database/chromInfo.txt.gz
where ORG is the build name. Example (for hg18):
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz

10 Bluehelix setup
On BlueHelix, the relevant programs are available in:
/data/hannon/gordon/ucsc_genome_browser/bin
A required library (libmysqlclient.so) is available here:
/data/hannon/gordon/usr/lib/mysql/
When using BASH, run the following commands:
$ export PATH=$PATH:/data/hannon/gordon/ucsc_genome_browser/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/hannon/gordon/usr/lib/mysql
When using TCSH, run the following commands:
$ setenv PATH $PATH:/data/hannon/gordon/ucsc_genome_browser/bin
$ setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH:/data/hannon/gordon/usr/lib/mysql
TODO: make a friendly script (set agnostic)

11 Direct MySQL access
The UCSC Genome Browser allows direct access to the back-end MySQL database containing all the
annotation tracks (see http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29 for more details). Our local mirror (http://foxtrot.cshl.edu) contains the same annotation tables for several common builds (mainly hg18, hg19, mm9, dm3, panTro2). Contact [email protected] to set up direct access to the database server (this could be faster than connecting to UCSC's MySQL server).

12 Programs reference
• bedClip
• bedToBigBed
• bedGraphToBigWig
• genomeCoverageBed
• samtools
• Genome Browser (gb) custom track line

13 Compiling programs from source code
Jim Kent's tools
Don't. Download the pre-compiled binaries from http://genome-test.cse.ucsc.edu/~kent/exe/. If you insist on building them from source, you'll find the source code on BlueHelix:
/home/hannon/gordon/source/kent_genome_browser_source/kent
and the build instructions here: http://genome.ucsc.edu/admin/jk-install.html.
If you have an I.T.-managed server with CentOS 5.4 and Linux kernel 2.6.18, send me an email and I can send you the compiled binaries for that platform.
samtools
The source code for samtools v0.1.7a is on BlueHelix:
/home/hannon/gordon/source/samtools-0.1.7a
or on the official web site: http://samtools.sourceforge.net/
bedtools
The examples in this document require a patched version of Aaron Quinlan's BEDTools package, available on BlueHelix:
/home/hannon/gordon/source/BEDTools_bedgraph
The official web site: http://code.google.com/p/bedtools/
Future versions (probably 2.5.5) might incorporate these patches.

Part IV Troubleshooting
SAM no header
The SAM file has no header (@SQ lines), so samtools does not know the names and lengths of the chromosomes:
$ samtools view -S -b dummy.sam
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!
Errors with FTP and custom tracks
• "ftp server response timed out > 1000000 microsec": wrong password.
• "Error Couldn't find host ccan.cshl.edu. h_errno 1": bad server name.
• "Error ftp server error on cmd=[SIZE /end221.bb ] response=[550 Could not get file size. ]": wrong file name.
• "Error Missing bigDataUrl setting from track of type=bigBed": the track line was split over multiple lines.
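One possible fix for the "no @SQ lines in the header" error is to prepend one @SQ header line per chromosome, built from the [CHROM_SIZE] file, before converting the SAM file. This is a minimal sketch under that assumption: the file names are hypothetical, the two input files are tiny made-up examples created by the script itself, and only the SN (name) and LN (length) tags are written.

```shell
#!/bin/sh
# Hypothetical file names standing in for [CHROM_SIZE] and [SAMPLE.SAM].
CHROM_SIZE=example.chrom.sizes
SAM_IN=dummy.sam
SAM_OUT=dummy.with_header.sam

# Tiny made-up inputs so the sketch runs on its own.
printf 'chr2L\t23011544\nchr3L\t24543557\n' > "$CHROM_SIZE"
printf 'read1\t0\tchr3L\t3390\t255\t7M\t*\t0\t0\tACATATT\t*\n' > "$SAM_IN"

# Write one "@SQ  SN:<name>  LN:<length>" line per chromosome (tab-separated),
# then append the headerless alignments unchanged.
{
    awk -F'\t' 'BEGIN { OFS = "\t" } NF >= 2 { print "@SQ", "SN:" $1, "LN:" $2 }' "$CHROM_SIZE"
    cat "$SAM_IN"
} > "$SAM_OUT"

cat "$SAM_OUT"
# The fixed file can then be converted as in section 2:
#   samtools view -S -b -o [SAMPLE.BAM] dummy.with_header.sam
```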