bwtool: A tool for bigWig files
Andy Pohl and
Miguel Beato
Centre de Regulació Genòmica (CRG) i UPF,
Dr. Aiguader 88, 08003 Barcelona
Abstract
BigWig files [1] are a compressed, indexed, binary format for genome-wide,
per-base signal data from a variety of experiments, e.g. ChIP-seq read depth
or GC percent. bwtool is a tool designed to read bigWig files rapidly
and efficiently, providing functionality for extracting data and summarizing it in
several ways, globally or at specific regions. Additionally, the tool enables the
conversion of the positions of signal data from one genome assembly to another,
also known as "lifting". We believe bwtool can be very useful for the analyst
who frequently works with bigWig data, which is becoming a standard format for
representing functional signals along genomes.
Examples
Highlighting the aggregate operation, we can produce plot
data for ENCODE [4] bigWigs containing various ChIP-seq
data from MCF7 cells and plot it around GENCODE gene
TSSs (cromatina.crg.cat/bwtool/ex1.html):
Operations
bwtool's functionality is subdivided into subprograms that roughly fall into three
categories: data extraction, analysis, and data modification:
Data Extraction
matrix
Given m regions, makes an m x n sized matrix around defined center
points from the regions.
paste
Aligns multiple bigWigs base by base and outputs them line by line.
window
Outputs data in a sliding window, one slide per line.
extract
Can output data in irregularly-sized intervals.
random
Outputs intervals of a defined size from random loci in the bigWig.
sax
Discretizes bigWig data and outputs it as a mock FASTA file.
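To make the window and matrix operations concrete, here is a minimal Python sketch of the ideas on a plain in-memory list. It is not bwtool's implementation and does not read bigWig files; the names sliding_windows and matrix_around_centers, and the use of None for bases without data, are assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple

# One chromosome's signal, one value per base; None marks a base with no data.
Signal = List[Optional[float]]


def sliding_windows(signal: Signal, size: int, step: int = 1) -> Iterator[Tuple[int, Signal]]:
    """Yield (start, values) for each full window of `size` bases, advancing `step` bases."""
    for start in range(0, len(signal) - size + 1, step):
        yield start, signal[start:start + size]


def matrix_around_centers(signal: Signal, centers: List[int], left: int, right: int) -> List[Signal]:
    """Build an m x n matrix (m = len(centers), n = left + right) of values around each center.

    Positions that fall off either end of the signal are reported as None.
    """
    rows = []
    for center in centers:
        row = []
        for pos in range(center - left, center + right):
            row.append(signal[pos] if 0 <= pos < len(signal) else None)
        rows.append(row)
    return rows
```

Each row of the matrix corresponds to one region, so rows can later be averaged column by column to obtain a profile.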
Another example makes use of the chromgraph operation,
using schnurri ChIP data from the Berkeley Drosophila
Transcription Network Project
(cromatina.crg.cat/bwtool/ex2.html):
Analysis
aggregate
Similar in usage to matrix, but oriented around creating plots of
averaged bigWig profiles around regions of interest.
chromgraph
Makes a file suitable for visualization by UCSC Genome Graphs.
distribution
Counts the occurrences of values in the data.
find
Finds regions of the bigWig based on thresholds or local extrema.
summary
Provides summary statistics for given loci in the bigWig.
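The aggregate and find operations can be illustrated with a short Python sketch working on in-memory lists. This is a conceptual illustration under stated assumptions, not bwtool's code: aggregate_mean and regions_above are hypothetical names, missing data is represented as None, and the threshold comparison (greater than or equal) is an assumption.

```python
from typing import List, Optional, Tuple

Signal = List[Optional[float]]


def aggregate_mean(rows: List[Signal]) -> Signal:
    """Average a regions-by-positions matrix column by column, skipping missing values."""
    width = len(rows[0]) if rows else 0
    means: Signal = []
    for col in range(width):
        values = [row[col] for row in rows if row[col] is not None]
        means.append(sum(values) / len(values) if values else None)
    return means


def regions_above(signal: Signal, threshold: float) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where the signal is >= threshold."""
    regions = []
    start = None
    for i, value in enumerate(signal):
        above = value is not None and value >= threshold
        if above and start is None:
            start = i  # a new region opens here
        elif not above and start is not None:
            regions.append((start, i))  # the open region closes before i
            start = None
    if start is not None:
        regions.append((start, len(signal)))
    return regions
```

The averaged profile returned by aggregate_mean is the kind of data one would pass to a plotting tool to draw a profile around regions of interest.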
Data Modification
fill
Fills missing regions in the bigWig with a desired value.
remove
Creates missing regions in the bigWig based on a given threshold or
a file to mask specific regions.
shift
Shifts data along the chromosomes by a given number of bases in either direction.
lift
Maps bigWig data onto another genomeʼs coordinates using a
special alignment file called a liftOver chain.
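As a rough illustration of fill, remove, and shift, the following Python sketch applies the same ideas to a plain list in which None stands for a missing base. The function names are hypothetical, and the choice to mask values strictly below the threshold is an assumption; bwtool's own options may differ. Lifting is not sketched here because it depends on a liftOver chain file.

```python
from typing import List, Optional

Signal = List[Optional[float]]


def fill(signal: Signal, value: float) -> Signal:
    """Replace every missing base with `value`."""
    return [value if v is None else v for v in signal]


def remove_below(signal: Signal, threshold: float) -> Signal:
    """Turn bases with a value strictly below `threshold` into missing data."""
    return [None if v is not None and v < threshold else v for v in signal]


def shift(signal: Signal, offset: int) -> Signal:
    """Move data `offset` bases (positive = toward higher coordinates).

    Values pushed past either end are dropped; vacated bases become missing.
    """
    shifted: Signal = [None] * len(signal)
    for i, v in enumerate(signal):
        j = i + offset
        if 0 <= j < len(signal):
            shifted[j] = v
    return shifted
```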
Availability
bwtool is open-source software, licensed under the GPL
v3. Version control is hosted on github.com, where
contributions to the software may also be made.
References
[1] Kent, WJ. et al., Bioinformatics, 2010 (26)17: pp. 2204-2207.
[2] Shin, H. et al., Bioinformatics, 2009 (25)19: pp. 2605-2606.
[3] Harrow, J. et al., Genome Research, 2012 (22)9: pp. 1760-1774.
[4] ENCODE Project Consortium, PLoS Biology, 2011 (9)4: p. e1001046.
Performance
Formal benchmarking and comparison to other software are complicated by the dearth of
available programs with bigWig functionality and by the variety of operations bwtool
provides. As an anecdote, however, we ran CEAS [2] using WIG files
generated from a human Pol2 ChIP-seq against 20,318 protein-coding genes from
GENCODE v18 [3]. It took around two days and produced a "metagene" plot
along with several other plots of varying usefulness. bwtool aggregate
was run on the same genes with a bigWig version of the WIG data, and it finished
creating the plot data in under 4 min on the same machine. Granted, bwtool only
calculated the data for a single plot, and it took a few more minutes to make the plot
in R, but we nevertheless tend to save a lot of time in situations like this.
Acknowledgements
Thanks to Daniel Soronellas, João Curado, Alessandra
Breschi, Roderic Guigó, Jakob Skou Pedersen, Brian Raney,
and Jim Kent for testing the program and providing feedback
and advice prior to release.
More Information
http://cromatina.crg.cat/bwtool
If you have questions, you can
find me in attendance or e-mail
[email protected]