Supplemental Information
Supplementary Note
normR allows for normalization of ChIP-seq data genome-wide and in arbitrarily defined windows (even
of varying sizes), e.g. windows around the transcription start site (TSS). Previously, we defined a
binary classifier statistic to compare the performance of ChIP-seq enrichment/difference
calling methods based on a “majority vote defined gold standard” [1]. First, we follow this approach
to compare precision and recall for enrichR, MACS2 [2] (v2.1.0.20150731), DFilter [3] (v1.6),
CisGenome [4], SPP [5] (v1.13), BCP [6] (v1.1) and MUSIC [7]. Second, we compute precision-recall
curves for each tool under its gold standard. Third, we catalogue regions specific to each tool under a
unified gold standard set, i.e. the union of all tools’ gold standards, to assess the validity of
these tool-specific peak calls in terms of fold change and read coverage (power). Fourth, we use
the unified gold standard to study saturation dynamics of enrichment calls in an in silico downsampled
sequencing library. Finally, we apply the same approach to compare diffR with ChIPDiff [8],
histoneHMM [9] and ODIN [10].
Comparison of Peak Callers under a Majority Vote Defined Gold Standard
To assess enrichR’s performance, we downloaded MACS2 [2] (v2.1.0.20150731), DFilter [3] (v1.6),
CisGenome [4], SPP [5] (v1.13), BCP [6] (v1.1) and MUSIC [7]. An FDR threshold of 0.99 was used
where applicable to allow for subsequent filtering of results.
First, we removed duplicated fragments (-F 1024) and kept only reads with a mapping quality of at least
20 (-q 20) with samtools [11] (v0.1.19-44428cd), allowing for a fair comparison with enrichR:
samtools view -b -F 1024 -q 20 in.bam > out.bam
Second, we called peaks for H3K4me3 and H3K36me3. MACS2 was run using the following commands:
macs2 callpeak -t H3K4me3.bam -c Control.bam -f BAMPE -g hs -q 0.99
macs2 callpeak -t H3K36me3.bam -c Control.bam -f BAMPE -g hs -q 0.99
For H3K36me3, we merged the results with “--broad” calls:
macs2 callpeak -t H3K36me3.bam -c Control.bam -f BAMPE -g hs --broad -q 0.99
DFilter was run using suggested configurations
(http://collaborations.gis.astar.edu.sg/~cmb6/kumarv1/dfilter/tutorial.html) with an FDR threshold of 0.99, which allows for
subsequent filtering of results for Precision/Recall analysis:
./run_dfilter.sh -t=H3K4me3.bam -c=Control.bam -f=bam -pe -bs=100 \
-ks=100 -lpval=0.001 -o=H3K4me3_result.bed
./run_dfilter.sh -t=H3K36me3.bam -c=Control.bam -f=bam -pe -bs=100 -ks=20 -lpval=0.001 \
-nonzero -o=H3K36me3_result.bed
Because CisGenome, SPP, BCP, MUSIC and NCIS [12] work on single-end read alignments, we
considered only first reads in a properly mapped pair (-f 66) for a fair comparison to peak callers working
on paired-end data:
samtools view -b -f 66 Input.bam > Input_SE.bam
samtools view -b -f 66 ChIP.bam > ChIP_SE.bam
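The flag arithmetic behind -F 1024 and -f 66 can be checked with a few lines of base R. This is a minimal sketch rather than part of the original analysis: the helper has_all_bits and the example flag values are illustrative assumptions, while the bit meanings follow the SAM specification.

```r
# SAM flag bits used in this note (values from the SAM specification)
PROPER_PAIR <- 2L     # 0x2: each segment properly aligned
FIRST_IN_PAIR <- 64L  # 0x40: first segment in the template
DUPLICATE <- 1024L    # 0x400: PCR or optical duplicate

# Hypothetical helper: TRUE if every bit of 'mask' is set in 'flag'
# (the condition samtools applies for -f)
has_all_bits <- function(flag, mask) bitwAnd(flag, mask) == mask

# -f 66 requires both the proper-pair and the first-in-pair bit
mask_f <- bitwOr(PROPER_PAIR, FIRST_IN_PAIR)

# Illustrative flags: 99 = first read of a proper pair, 147 = second read,
# 1123 = 99 + 1024, i.e. the same kind of first read marked as duplicate
flags <- c(99L, 147L, 1123L)
keep_first <- has_all_bits(flags, mask_f)
is_dup <- bitwAnd(flags, DUPLICATE) != 0L  # reads dropped by -F 1024
print(data.frame(flag = flags, keep_first, is_dup))
```

Under these assumptions only flags 99 and 1123 pass -f 66, and only 1123 would be removed by -F 1024.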
For CisGenome, we generated *.aln files by converting the alignments with bedtools bamtobed [13] and
keeping the chromosome, start and strand columns. We then ran CisGenome’s SeqPeak routine using
default parameters with a P-value cutoff of 0.99 on a generated file list:
bedtools bamtobed -i Input_SE.bam > Input_SE.bed
cut -f 1,2,6 Input_SE.bed > Input_SE.aln
bedtools bamtobed -i ChIP_SE.bam > ChIP_SE.bed
cut -f 1,2,6 ChIP_SE.bed > ChIP_SE.aln
printf "Input_SE.aln\t0\nChIP_SE.aln\t1" > ChIP_filelist.txt \
&& ./seqpeak -i ChIP_filelist.txt -d . -o Result -bar 0 -lpcut 0.99
SPP was run in R using suggested configurations (http://compbio.med.harvard.edu/Supplements/ChIP-seq/tutorial.html)
with a Z-score threshold of 0.5 and after removing chromosomes with no reads mapping to them:
library(spp)
chip.data <- read.bam.tags("ChIP_SE.bam")
input.data <- read.bam.tags("Control_SE.bam")
# keep only chromosomes that carry reads in the ChIP sample
idx.notnull <- !sapply(chip.data$tags, is.null)
chip.data <- lapply(chip.data, "[", idx.notnull)
input.data <- lapply(input.data, "[", idx.notnull)
# estimate binding characteristics from the strand cross-correlation profile
bin.charac <- get.binding.characteristics(
  chip.data,
  srange = c(50, 500),
  bin = 5,
  accept.all.tags = TRUE
)
broad.clusters <- get.broad.enrichment.clusters(
  signal.data = chip.data$tags,
  control.data = input.data$tags,
  window.size = 1e3,
  z.thr = 0.5,
  tag.shift = round(bin.charac$peak$x / 2)
)
# 'file' is the path of the output broadPeak file
write.broadpeak.info(broad.clusters, file)
BCP was run using the following command:
./BCP_v1.1/BCP_HM -1 ChIP_SE.bed -2 Input_SE.bed -3 EnrichmentCalls.bed -p 0.9
For MUSIC, we downloaded mappability files for 50bp reads from
http://archive.gersteinlab.org/proj/MUSIC/multimap_profiles/hg19/hg19_50bp.tar.bz2
and ran the following commands:
samtools view Input_SE.bam | ./MUSIC -preprocess SAM stdin Input/ && \
./MUSIC -sort_reads Input Input/sorted && \
./MUSIC -remove_duplicates Input/sorted 2 Input/dedup
samtools view ChIP_SE.bam | ./MUSIC -preprocess SAM stdin preprocessed && \
./MUSIC -sort_reads preprocessed sorted && \
./MUSIC -remove_duplicates sorted 2 dedup && \
./MUSIC -get_multiscale_punctate_ERs -chip dedup -control Input/dedup \
-mapp ../Mappability_hg19_50bp -l_mapp 50 -q_val 0.99
Third, to compare the peaks called by the above methods with enrichR-enriched regions, we computed the
overlap of reported peaks with 500 bp (1,000 bp) windows in R for H3K4me3 (H3K36me3). A window was
counted as enriched for a tool if a peak at FDR 0.05 (0.10) overlapped it by at least 250 bp:
binsize <- 500; fdr <- 0.05
# assumes genome.gr carries the chromosome lengths as seqlengths
gr <- tileGenome(seqlengths(genome.gr), tilewidth = binsize, cut.last.tile.in.chrom = TRUE)
ov <- matrix(0, nrow = length(gr), ncol = 7)
colnames(ov) <- c("enrichR", "MACS2", "DFilter", "CisGenome", "SPP", "BCP", "MUSIC")
for (method in colnames(ov)) {
  # lqval is compared on the -log10 scale of the corrected P-value
  peaks.sig <- peaks[[method]][which(peaks[[method]]$lqval >= -log10(fdr))]
  ov[countOverlaps(gr, peaks.sig, minoverlap = 250) > 0, method] <- 1
}
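The 250 bp minimum-overlap criterion can be illustrated without Bioconductor. The following base R sketch is an illustration only: the helper overlap_width and the peak coordinates are hypothetical, and intervals are treated as closed (1-based, inclusive) as in GRanges.

```r
# Overlap width (in bp) between closed intervals [ws, we] and [ps, pe]
overlap_width <- function(ws, we, ps, pe) {
  pmax(0, pmin(we, pe) - pmax(ws, ps) + 1)
}

binsize <- 500
win_start <- seq(1, 2001, by = binsize)  # 1, 501, 1001, 1501, 2001
win_end <- win_start + binsize - 1

# One hypothetical peak spanning positions 700 to 1300
peak_start <- 700
peak_end <- 1300

ovl <- overlap_width(win_start, win_end, peak_start, peak_end)
called <- as.integer(ovl >= 250)  # 1 = window counted as enriched
print(cbind(win_start, win_end, ovl, called))
```

Here the peak covers 301 bp of the second window and 300 bp of the third, so both are counted, while the flanking windows are not.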
Fourth, a “tool-specific gold standard” for comparison of tools was defined as follows: a bin is enriched
under the gold standard of a given tool if at least four of the six other methods called this bin
enriched (for every tool except enrichR, these six methods include enrichR):
gs <- lapply(colnames(ov), function(method) {
  which(rowSums(ov[, colnames(ov) != method]) >= 4)
})
names(gs) <- colnames(ov)
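To make the leave-one-out vote concrete, the following sketch applies the same rule to a small toy overlap matrix. The six bins and their calls are invented for illustration and do not correspond to any real data.

```r
# Toy overlap matrix: 6 bins (rows) x 7 tools (columns), 1 = bin called enriched
tools <- c("enrichR", "MACS2", "DFilter", "CisGenome", "SPP", "BCP", "MUSIC")
ov <- matrix(c(
  1, 1, 1, 1, 1, 1, 1,   # bin 1: called by all tools
  1, 1, 1, 1, 0, 0, 0,   # bin 2: called by four tools including enrichR
  1, 0, 0, 0, 0, 0, 0,   # bin 3: enrichR only
  0, 1, 1, 1, 1, 0, 0,   # bin 4: four tools, not enrichR
  0, 0, 0, 1, 1, 1, 0,   # bin 5: three tools
  0, 0, 0, 0, 0, 0, 0    # bin 6: no tool
), ncol = 7, byrow = TRUE, dimnames = list(NULL, tools))

# Gold standard of each tool: bins called by >= 4 of the six other tools
gs <- lapply(tools, function(method) {
  which(rowSums(ov[, colnames(ov) != method, drop = FALSE]) >= 4)
})
names(gs) <- tools
print(gs[c("enrichR", "SPP")])
```

Bin 2 illustrates why the gold standard differs between tools: it enters the SPP gold standard (four other tools agree) but not the enrichR gold standard, because the vote of the evaluated tool itself is excluded.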
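Finally, we computed precision and recall under the tool-specific gold standard for all peaks reported by
a tool:
# F-beta score; fscore() is not defined in this note, the standard definition is assumed here
fscore <- function(precision, recall, beta) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}
# ov: 0/1 vector over all bins for one tool; gs: indices of gold standard bins
getPrecRecall <- function(ov, gs) {
  mp <- which(ov == 1)
  tp <- sum(mp %in% gs)
  fn <- sum(!(gs %in% mp))
  fp <- sum(!(mp %in% gs))
  tn <- length(ov) - tp - fn - fp
  specificity <- tn / (fp + tn)
  precision <- tp / length(mp)
  recall <- tp / (tp + fn)
  f2 <- fscore(precision, recall, 2)
  return(c("precision" = precision,
           "recall" = recall,
           "f2" = f2,
           "specificity" = specificity))
}
# one column of ov per tool, paired with that tool's gold standard
stats <- mapply(getPrecRecall, as.data.frame(ov), gs)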
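A small worked example may help to check the arithmetic of these statistics. The bin indices below are invented, and fbeta is a hypothetical helper implementing the standard F-beta definition, which is an assumption since the note does not spell out the formula.

```r
# F-beta score; beta = 2 weights recall more heavily than precision
fbeta <- function(precision, recall, beta = 2) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

n_bins <- 20
called <- c(1, 2, 3, 4, 5, 6)       # bins reported by the tool
gold <- c(1, 2, 3, 4, 7, 8, 9, 10)  # bins in its gold standard

tp <- length(intersect(called, gold))  # called and in the gold standard
fp <- length(setdiff(called, gold))    # called but not in the gold standard
fn <- length(setdiff(gold, called))    # in the gold standard but missed
tn <- n_bins - tp - fp - fn            # neither called nor in the gold standard

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
specificity <- tn / (tn + fp)
f2 <- fbeta(precision, recall, beta = 2)
print(round(c(precision = precision, recall = recall,
              f2 = f2, specificity = specificity), 3))
```

With 4 true positives, 2 false positives, 4 false negatives and 10 true negatives, precision is 4/6, recall is 4/8, specificity is 10/12 and F2 is 10/19.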
Precision Recall Curves
First, each tool’s peak calls were ordered by increasing multiple-testing corrected P-value, i.e. by
decreasing lqval:
peaks <- lapply(peaks, function(p) p[order(p$lqval, decreasing = TRUE)])
Second, we computed precision and recall under the tool-specific gold standard for the top 100, 200, etc.
peaks:
nmax <- max(colSums(ov))
stats.sub <- lapply(seq(100, nmax, 100), function(n) {
  ov.sub <- lapply(peaks, function(p) {
    if (length(p) < n) {
      NULL  # tool reported fewer than n peaks
    } else {
      ov <- rep(0, length(gr))
      ov[which(countOverlaps(gr, p[1:n], minoverlap = 250) > 0)] <- 1
      ov
    }
  })
  # evaluate each remaining tool against its own gold standard
  keep <- !sapply(ov.sub, is.null)
  mapply(getPrecRecall, ov.sub[keep], gs[names(ov.sub)[keep]])
})
Finally, the obtained precision and recall values were used to plot a precision recall curve. Because some
tools did not report the full spectrum of recall values we calculated a “PartAUC”, i.e. the area under the
curve ranging from the minimum to the maximum recall value. Thus, the “PartAUC” represents a lower
bound of the full AUC.
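The “PartAUC” can be sketched as a trapezoidal integral over the observed recall range. The integration rule is an assumption made here for illustration (the note does not state which rule was used), and part_auc as well as the curve values are hypothetical.

```r
# Trapezoidal area under a precision-recall curve over the observed recall range
part_auc <- function(recall, precision) {
  o <- order(recall)
  r <- recall[o]
  p <- precision[o]
  sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)
}

# Hypothetical curve of a tool whose recall only spans 0.2 to 0.8
recall <- c(0.2, 0.4, 0.6, 0.8)
precision <- c(1.0, 0.9, 0.8, 0.5)
auc <- part_auc(recall, precision)
print(auc)
```

Because precision is non-negative, the area over the unobserved recall ranges (here below 0.2 and above 0.8) can only add to this value, which is why the partial area is a lower bound of the full AUC.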
Catalogue of Tool-specific Regions under the Gold Standard
To study the validity of tool-specific peak calls, we catalogued regions that a tool called enriched but
that are not contained in the “unified gold standard”, i.e. the union of the seven tool-specific gold
standards defined above:
gsUnified <- unique(unlist(gs))
toolSpecCalls <- lapply(colnames(ov), function(method) {
  # bins called by this tool that are absent from the unified gold standard
  setdiff(which(ov[, method] == 1), gsUnified)
})
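The set logic of the unified gold standard can be illustrated on a toy example. The three tools and all bin indices below are invented for illustration.

```r
# Hypothetical tool-specific gold standards (bin indices) for three tools
gs <- list(toolA = c(1, 2, 3), toolB = c(2, 3, 4), toolC = c(3, 5))
# Hypothetical bins called enriched by each tool
calls <- list(toolA = c(1, 2, 6, 7), toolB = c(2, 4), toolC = c(3, 5, 8))

# Unified gold standard: union of all tool-specific gold standards
gsUnified <- sort(unique(unlist(gs)))

# Tool-specific regions: called by the tool but absent from the unified set
toolSpecCalls <- lapply(calls, setdiff, y = gsUnified)
print(gsUnified)
print(toolSpecCalls)
```

In this toy case toolA has two tool-specific bins (6 and 7), toolC has one (8) and toolB has none, mirroring the situation reported for tools without tool-specific regions.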
Saturation Analysis for Enrichment Calls based on a Unified Gold Standard Set
First, we removed duplicated fragments (-F 1024) and kept only reads with a mapping quality of at least
20 (-q 20) as described above. The resulting alignment files were downsampled in silico with
samtools to 5%, 10%, 20%, 30%, 50% and 75% of the original sequencing depth:
for sub in .05 .1 .2 .3 .5 .75; do
  for f in *.bam; do
    of=${f/.bam/.Sub$sub}
    # -s takes SEED.FRACTION, i.e. seed 1 and sampling fraction $sub
    samtools view -u -s 1$sub "$f" | samtools sort -@ 4 - "$of" && samtools index "$of.bam"
  done
done
Second, peak calling and overlapping with genomic windows (Step 3) were done as described above.
Then, we calculated the fraction of the “unified gold standard” recovered by each method:
Stats.ds <- mapply(getPrecRecall, as.data.frame(ov), rep(list(gsUnified), 7))
Comparison of Difference Callers
To assess diffR’s performance, we downloaded ChIPDiff [8], histoneHMM [9] (v1.6) and ODIN [10]
(v0.4).
First, we removed duplicated fragments (-F 1024) and kept only reads with a mapping quality of at least
20 (-q 20), allowing for a fair comparison with diffR:
samtools view -b -F 1024 -q 20 in.bam > out.bam
Second, we called differential peaks. ODIN was run with the following command incorporating condition
specific input alignments and the hs37d5 genome sequence:
rgt-ODIN -m -v --input-1=Input1.bam --input-2=Input2.bam ChIP1.bam ChIP2.bam \
hs37d5.fa hs37d5_chromSizes
For ChIPDiff, which works on single-end read alignments, we considered only first reads in a properly
mapped pair (-f 66) for a fair comparison to peak callers working on paired-end data:
# repeated for ChIP1.bam and ChIP2.bam to obtain ChIP1_SE.bed and ChIP2_SE.bed
samtools view -b -f 66 ChIP.bam > ChIP_SE.bam
bedtools bamtobed -i ChIP_SE.bam > ChIP_SE.bed
./ChIPDiff ChIP1_SE.bed ChIP2_SE.bed hs37d5_chromSizes
For histoneHMM, enriched regions were called prior to differential enrichment detection as suggested in
the tutorial:
binsize=500 #1000
./histoneHMM_call_regions.R -b $binsize -c hs37d5_chromSizes_Autosomes \
-o ChIP1_histoneHMM -t 20 ChIP1.bam
./histoneHMM_call_regions.R -b $binsize -c hs37d5_chromSizes_Autosomes \
-o ChIP2_histoneHMM -t 20 ChIP2.bam
./histoneHMM_call_differential.R --sample1 ChIP1.bam --sample2 ChIP2.bam \
ChIP1.txt ChIP2.txt
Third, to compare the peaks called by the above methods with diffR differential regions, we computed the
overlap of peaks with 500 bp (1,000 bp) windows in R for H3K4me3 (H3K27me3). A window was counted as
differentially enriched if a peak at FDR 0.05 (0.10) overlapped it by at least 250 bp. In contrast to the
enrichR comparison, the matrix contains two columns for each tool because a region can be enriched in
either control or treatment:
binsize <- 500; fdr <- 0.05
# assumes genome.gr carries the chromosome lengths as seqlengths
gr <- tileGenome(seqlengths(genome.gr), tilewidth = binsize, cut.last.tile.in.chrom = TRUE)
ov <- matrix(0, nrow = length(gr), ncol = 8)
colnames(ov) <- c("diffR_ctrl", "diffR_treat", "ODIN_ctrl", "ODIN_treat",
                  "ChIPDiff_ctrl", "ChIPDiff_treat", "histoneHMM_ctrl", "histoneHMM_treat")
for (method in colnames(ov)) {
  peaks.sig <- peaks[[method]][which(peaks[[method]]$lqval >= -log10(fdr))]
  ov[countOverlaps(gr, peaks.sig, minoverlap = 250) > 0, method] <- 1
}
Fourth, a “tool-condition-specific gold standard” for comparison of tools was defined as follows: a bin
is differentially enriched for one condition under the gold standard of a given tool if at least two of
the three other methods called this bin differentially enriched for this condition (for every tool except
diffR, these three methods include diffR):
gs <- lapply(colnames(ov), function(method) {
  # condition suffix of this column, i.e. "ctrl" or "treat"
  cond <- strsplit(method, "_")[[1]][2]
  col.idx <- which(colnames(ov) != method &
                   grepl(paste0("_", cond, "$"), colnames(ov)))
  which(rowSums(ov[, col.idx]) >= 2)
})
names(gs) <- colnames(ov)
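The condition-aware vote can be traced on a toy matrix. The four bins and their calls are invented for illustration; the column naming follows the tool_condition scheme used above.

```r
tools <- c("diffR", "ODIN", "ChIPDiff", "histoneHMM")
cols <- paste(rep(tools, each = 2), c("ctrl", "treat"), sep = "_")

# Toy overlap matrix: 4 bins x 8 tool/condition columns
ov <- matrix(0, nrow = 4, ncol = length(cols), dimnames = list(NULL, cols))
ov[1, c("diffR_ctrl", "ODIN_ctrl", "ChIPDiff_ctrl")] <- 1  # bin 1: ctrl, three tools
ov[2, c("ODIN_treat", "histoneHMM_treat")] <- 1            # bin 2: treat, two tools
ov[3, "diffR_treat"] <- 1                                  # bin 3: diffR only
ov[4, c("diffR_ctrl", "ODIN_treat")] <- 1                  # bin 4: tools disagree on direction

gs <- lapply(cols, function(method) {
  # only columns of the same condition and of other tools may vote
  cond <- strsplit(method, "_")[[1]][2]
  same_cond <- grepl(paste0("_", cond, "$"), colnames(ov))
  others <- colnames(ov) != method & same_cond
  which(rowSums(ov[, others, drop = FALSE]) >= 2)
})
names(gs) <- cols
print(gs[c("diffR_ctrl", "diffR_treat", "ODIN_treat")])
```

Bin 4 shows why the conditions are kept apart: two tools call it, but in opposite directions, so it enters no gold standard.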
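Finally, we computed precision and recall under the tool-condition-specific gold standards for all peaks
reported by a tool:
# F-beta score; fscore() is not defined in this note, the standard definition is assumed here
fscore <- function(precision, recall, beta) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}
# ov: 0/1 vector over all bins for one tool and condition; gs: indices of gold standard bins
getPrecRecall <- function(ov, gs) {
  mp <- which(ov == 1)
  tp <- sum(mp %in% gs)
  fn <- sum(!(gs %in% mp))
  fp <- sum(!(mp %in% gs))
  tn <- length(ov) - tp - fn - fp
  specificity <- tn / (fp + tn)
  precision <- tp / length(mp)
  recall <- tp / (tp + fn)
  f2 <- fscore(precision, recall, 2)
  return(c("precision" = precision,
           "recall" = recall,
           "f2" = f2,
           "specificity" = specificity))
}
# one column of ov per tool and condition, paired with the matching gold standard
stats <- mapply(getPrecRecall, as.data.frame(ov), gs)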
Supplementary Tables
Supplementary Table 1. Overlap of enrichR, MACS2, DFilter, CisGenome, SPP, BCP and MUSIC H3K4me3-enriched
500 bp (H3K36me3-enriched 1,000 bp) regions with GENCODE genes (version 19) and promoters (+/-750
bp TSS).
Supplementary Table 2. Enrichment calling binary classifier statistic based on a 4-tool-consensus vote for tools
enrichR, MACS2, DFilter, CisGenome, SPP, BCP and MUSIC in Primary Human Hepatocytes H3K4me3 and
H3K36me3 ChIP-seq data.
Supplementary Table 3. NCIS calculated normalization factors for H3K4me3 and H3K36me3.
Supplementary Table 4. Overlap of regimeR H3K9me3 and H3K27me3 1,000 bp regime regions with GENCODE
genes (version 19) and promoters (+/-750 bp TSS).
Supplementary Table 5. Performance of diffR with respect to enrichR-compare calls as ground truth on differential
enrichment in HepG2 cells and Primary Human Hepatocytes (PHH) for different settings, i.e. low power regions
removed, diffR-detected CNVs removed or ENCODE-reported CNVs in HepG2 cells removed.
Supplementary Table 6. Difference calling binary classifier statistic based on a 2-tool-consensus vote for diffR,
ChIPDiff, histoneHMM and ODIN for H3K4me3 and H3K27me3 differences between HepG2 and PHH.
Supplementary Table 7. Metrics for tool-specific regions in difference calls between HepG2 and PHH by diffR,
ChIPDiff, histoneHMM and ODIN for H3K4me3 and H3K27me3 with respect to a unified gold standard.
Supplementary Figures
Supplementary Figure 1. Quality and signal-to-noise ratio (SNR) characteristics for ChIP-seq data used.
(A,B) Fraction with respect to bin with highest coverage plotted against 500bp bin ranks created with the
“plotFingerprint”-function of deepTools (v2.3) for Primary Human Hepatocytes (A) and HepG2 cells (B). A perfectly
uniform distribution of reads along the genome (i.e. without enrichments in open chromatin etc.) is represented
as a straight diagonal line. The Input in both cell types is close to uniform, which indicates good quality. In both
cell types, H3K4me3 shows the greatest SNR, which is indicated by a steep rise towards high ranks, i.e. only few
regions harbor H3K4me3 enrichment. H3K36me3 has an SNR substantially less than H3K4me3 but greater than
H3K27me3 and H3K9me3, which indicates more regions being H3K36me3-enriched than H3K4me3-enriched but
fewer than H3K27me3- or H3K9me3-enriched.
Supplementary Figure 2. enrichR calls compare favorably to MACS2, DFilter, CisGenome, SPP, BCP and
MUSIC for high (H3K4me3) and low (H3K36me3) signal-to-noise ratio ChIP-seq data. (A,B) enrichR
H3K4me3- (A) and H3K36me3-enriched (B) regions substantially overlap with peaks called by benchmark methods based
on 500 (A) and 1,000 bp (B) genomic intervals. (C) Regions called exclusively by enrichR are distant to peaks
called by other methods. (D,E) H3K36me3-enriched regions called exclusively by one (“excl.”) or multiple (“incl.”)
methods are in most cases significantly higher DNA-methylated (D) and higher expressed (E) as compared to
background regions. enrichR has the greatest number of tool-exclusive regions (92,508). (F,G) Fold-change (F) and
read coverage (G) in tool-specific (under the gold standard) H3K36me3-enriched regions are significantly greater
(less) than in background (gold standard) regions, except for DFilter with no tool-specific regions and CisGenome
with 5 tool-specific regions. enrichR has the greatest number of tool-specific regions (189,636). (H) enrichR
tool-specific regions are distant to gold-standard regions. See Supplementary Note for details on the binary classifier
statistic and the gold standard. Wilcoxon signed-rank Test: “***” ( P ≤ 0.001), “**” ( P ≤ 0.01), “*” ( P ≤
0.05) and “n.s.” ( P > 0.05).
Supplementary Figure 3. Peak Caller Performance based on the tool-specific gold standard set for enrichR,
MACS2, DFilter, CisGenome, SPP, BCP and MUSIC. Precision-Recall Curves are computed based on reported
peaks in Primary Human Hepatocytes (Hepa2) at an FDR threshold of 0.99 w.r.t. the tool-specific gold standards at an
FDR of 10% as ground truth (see Supplementary Note) for H3K4me3 (A), H3K36me3 (B), H3K27me3 (C) and
H3K9me3 (D). For DFilter, CisGenome and MUSIC, which do not reach Recall = 1, curves are cut by a vertical
line. A “PartAUC” is calculated which represents a lower bound on the full AUC. An AUC of 1 is best and is
represented by a horizontal line at Precision = 1 and a vertical line at Recall = 1. The number of 500 bp (1 kb) bins
called H3K4me3- (H3K27me3-/H3K36me3-/H3K9me3-)enriched at an FDR of 10% is given in the legend. (A) All
methods except SPP have a very good precision at Recall smaller than 0.7 for H3K4me3. The greatest “PartAUC” is
achieved by enrichR. enrichR and BCP report the most enriched bins among all methods. (B) For H3K36me3, enrichR
reports the most enriched bins and has the best “PartAUC”. (C) For H3K27me3, SPP has the best “PartAUC” but reports
only few enriched bins. enrichR calls more enrichment, with its “PartAUC” scoring second. (D) For H3K9me3,
MACS2 reports the most enriched bins. enrichR has the best “PartAUC”.
Supplementary Figure 4. Saturation Analysis based on a unified gold standard set for enrichR, MACS2,
DFilter, CisGenome, SPP, BCP and MUSIC. Peak callers were run on alignment files that were downsampled in
silico with the “samtools view -s” command to 5%, 10%, 20%, 30%, 50% and 75% (Supplementary Note). (A) For
H3K4me3, MACS2, BCP and enrichR capture >90% of the unified gold standard with 50% of the reads from the
original library. (B) For H3K36me3, MACS2, BCP, MUSIC and enrichR recover >90% at 30%. (C) For
H3K27me3, MACS2 and enrichR recover >90% at 50%. (D) For H3K9me3, only MACS2 recovers >90% at 75%.
Supplementary Figure 5. enrichR normalization factor resembles NCIS normalization factor. (A,B) H3K4me3
ChIP counts in 500 bp regions (A) and H3K36me3 ChIP counts in 1,000 bp regions (B) plotted against Input counts
with marked θ* (sequencing depth ratio), enrichR’s θ_B (expected fraction of ChIP reads in background) and
θ_E (expected fraction of ChIP reads in enriched regions) as well as θ_NCIS (normalization factor calculated by [12]).
θ_B is substantially lower than the naïve θ* and follows the trend of θ_NCIS by accounting for the effect of enrichment
on the correct normalization factor. Low power regions excluded by the T filter are marked and an inset
shows only low count regions. (C,D) The genome-wide standardized, regularized enrichment calculated by enrichR
for H3K4me3 (C) and H3K36me3 (D).
Supplementary Figure 6. H3K9me3 and H3K27me3 broad and peak regimes have distinct sequence
characteristics in HepG2 cells. (A) The numbers of SINE/LINEs and Retro-elements/LTRs in H3K9me3 regimes
are significantly greater than in background regions. (B) H3K9me3 and H3K27me3 enrichment overlap
substantially in the broad regime, whereas peaks therein suggest mutual exclusivity. (C) H3K27me3 broad (peak)
regime is significantly enriched in CpGs observed over expected, i.e. CpG odds-ratio, over background (broad)
regions. (D) H3K27me3 (H3K9me3) peak regimes are significantly more (less) conserved than background regions.
Wilcoxon signed-rank Test: “***” ( P ≤ 0.001), “**” ( P ≤ 0.01), “*” ( P ≤ 0.05) and “n.s.” ( P > 0.05).
Supplementary Figure 7. Discrepancies between diffR and enrichR-compare are due to low count
regions and Copy Number Variations specific for HepG2 cells. (A-B) diffR-regions for conditional
differences between HepG2 cells and PHH catalogued as fraction of enrichR-compare classified regions
for H3K4me3 (A) and H3K27me3 (B) reveals a diminished sensitivity of diffR. Compare to Figure 4D,E
for enrichR-compare as fraction of diffR. See also Supplementary Table 5. (C,D) When removing low
power regions, diffR becomes more sensitive w.r.t. enrichR-compare results in H3K4me3 (C) and
H3K27me3 (D) ChIP-seq data. False negatives show very low counts. (E,F) When removing diffR-called
CNVs, diffR becomes more sensitive for H3K4me3 (E) and H3K27me3 (F). (G,H) A similar effect is
observed when removing ENCODE measured CNVs in HepG2 cells. Dashed lines represent expected
fold change and read coverage.
Supplementary Figure 8. Input-seq for HepG2 cells and Primary Human Hepatocytes (PHH) can
be utilized for copy number variation (CNV) detection. Input-seq for HepG2 cells (blue) and PHH
(red) indicate CNV presence in HepG2 cells on Human chromosome 14. The ENCODE consortium [14]
measured HepG2 cells genotypes (HAIB Genotype track) which encompass highly amplified (blue) and
deleted regions (red) that deviate from normal (black) in HepG2 cells. diffR on HepG2 and PHH Input-seq
for 20 and 50 kb bins identified the depicted large amplification in HepG2 along with a short deletion
within this region.
Supplementary Figure 9. Comparison of diffR to ChIPDiff, histoneHMM and ODIN based on a consensus
vote gold standard (GS). (A,B) diffR-, ChIPDiff- and histoneHMM-specific regions have the greatest fold change
over Input with diffR calling most tool-specific regions among these for H3K4me3 (A) and H3K27me3 (B). (C,D)
diffR- and ODIN-specific regions show greatest coverage among all tools for H3K4me3 (C) and H3K27me3 (D).
Taken together with the fold changes, diffR performs best in identification of true differences, i.e. high absolute fold
change, at regions with sufficient power, i.e. read coverage.
References
1. Kinkley S, Helmuth J, Polansky JK, Dunkel I, Gasparoni G, Fröhler S, et al. reChIP-seq reveals widespread
bivalency of H3K4me3 and H3K27me3 in CD4(+) memory T cells. Nat Commun. 2016;7:12514.
2. Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nat Protoc. 2012;7:1728–40.
3. Kumar V, Muratani M, Rayan NA, Kraus P, Lufkin T, Ng HH, et al. Uniform, optimal signal processing of
mapped deep-sequencing data. Nat. Biotechnol. 2013;31:615–22.
4. Ji H, Jiang H, Ma W, Johnson DS, Myers RM, Wong WH. An integrated software system for analyzing ChIP-chip
and ChIP-seq data. Nat. Biotechnol. 2008;26:1293–300.
5. Kharchenko PV, Tolstorukov MY, Park PJ. Design and analysis of ChIP-seq experiments for DNA-binding
proteins. Nat. Biotechnol. 2008;26:1351–9.
6. Xing H, Mo Y, Liao W, Zhang MQ. Genome-wide localization of protein-DNA binding and histone modification
by a Bayesian change-point method with ChIP-seq data. PLoS Comput. Biol. 2012;8:e1002613.
7. Harmanci A, Rozowsky J, Gerstein M. MUSIC: identification of enriched regions in ChIP-Seq experiments using
a mappability-corrected multiscale signal processing framework. Genome Biol. 2014;15:474.
8. Xu H, Wei C-L, Lin F, Sung W-K. An HMM approach to genome-wide identification of differential histone
modification sites from ChIP-seq data. Bioinformatics. 2008;24:2344–9.
9. Heinig M, Colomé-Tatché M, Taudt A, Rintisch C, Schafer S, Pravenec M, et al. histoneHMM: Differential
analysis of histone modifications with broad genomic footprints. BMC Bioinformatics. 2015;16:60.
10. Allhoff M, Seré K, Chauvistré H, Lin Q, Zenke M, Costa IG. Detecting differential peaks in ChIP-seq signals
with ODIN. Bioinformatics. 2014;30:3467–75.
11. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and
SAMtools. Bioinformatics. 2009;25:2078–9.
12. Liang K, Keles S. Normalization of ChIP-seq data with control. BMC Bioinformatics. 2012;13:199.
13. Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics.
2014;47:11.12.1–34.
14. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature.
2012;489:57–74.