Gene Expression Omnibus (GEO)
Big Biological Data Analysis
Seyed Abolfazl Motahari
Department of Computer Engineering, Sharif University of Technology
[Figure: Data. Labels: Sample, RNA/kidney, Feature, PhoneData, time.]
Basic Types
Platforms (GPLxxx)
A Platform record describes the list of elements on the array (e.g., cDNAs, oligonucleotide probesets, ORFs, antibodies) or the list of elements that may be detected and quantified in that experiment (e.g., SAGE tags, peptides).
Samples (GSMxxx)
A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. A Sample entity must reference only one Platform and may be included in multiple Series.
Series (GSExxx)
A Series record defines a set of related Samples considered to be part of a group, how the Samples are related, and if and how they are ordered. A Series provides a focal point and description of the experiment as a whole. Series records may also contain tables describing extracted data, summary conclusions, or analyses.
Datasets (GDSxxx)
GEO DataSets are curated sets of GEO Sample data. A GDS record represents a collection of biologically and statistically comparable GEO Samples and forms the basis of GEO’s suite of data display and analysis tools. Samples within a GDS refer to the same Platform, that is, they share a common set of probe elements. Value measurements for each Sample within a GDS are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the dataset. Information reflecting experimental design is provided through GDS subsets.
Package GEOquery
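The GEO record types described above map onto objects returned by GEOquery. The following is only a sketch: it assumes the GEOquery package is installed and that its bundled example file extdata/GDS507.soft.gz is present, so that nothing has to be downloaded. The accession in the final comment is a placeholder, not a real record.

```r
library(GEOquery)

# A small GDS SOFT file that ships with GEOquery (assumed to be present);
# parsing it does not require a network connection.
soft_file <- system.file("extdata/GDS507.soft.gz", package = "GEOquery")
gds <- getGEO(filename = soft_file)

Meta(gds)$platform   # the single Platform (GPL) shared by all Samples
head(Columns(gds))   # per-Sample information, including the GDS subsets
head(Table(gds))     # value table: one row per probe, one column per Sample

# Fetching a record from GEO over the network looks like this
# (placeholder accession, replace with a real one):
# gse <- getGEO("GSExxx", GSEMatrix = TRUE)
```

Meta, Columns, and Table are the generic accessors for the header, the sample annotation, and the data table of a parsed GEO object.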
Package: Biobase
eSet: a virtual class
ExpressionSet: a data container for
• Expression data from microarray experiments (assayData),
• ‘Meta-data’ describing samples in the experiment (phenoData),
• Annotations and meta-data about the features on the chip or technology used for the experiment (featureData, annotation),
• Information related to the protocol used for processing each sample (protocolData),
• A flexible structure to describe the experiment (experimentData).
ExpressionSet - construction
eset = ExpressionSet(assayData=matrix(runif(1000), nrow=100, ncol=10))
ExpressionSet (storageMode: lockedEnvironment)
assayData: 100 features, 10 samples element names: exprs
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
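The object printed above has no phenoData, so samples cannot yet be selected by condition. A minimal sketch of attaching sample annotation at construction time, assuming the Biobase package is installed; the gene names, sample names, and the "group" column are made up for illustration.

```r
library(Biobase)

set.seed(1)
# 100 features x 10 samples of random "expression" values (illustrative only)
exprs_mat <- matrix(runif(1000), nrow = 100, ncol = 10,
                    dimnames = list(paste0("gene", 1:100),
                                    paste0("sample", 1:10)))

# Sample annotation: row names must match the column names of the matrix
pheno <- data.frame(group = rep(c("control", "treated"), each = 5),
                    row.names = colnames(exprs_mat))

eset <- ExpressionSet(assayData = exprs_mat,
                      phenoData = AnnotatedDataFrame(pheno))

# phenoData columns are reachable with $, which makes subsetting by condition easy
treated <- eset[, eset$group == "treated"]
dim(treated)
```

Keeping the expression matrix and the sample annotation in one object means that subsetting samples keeps both in sync.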
ExpressionSet - working with data
library(Biobase)
# data() loads the object into the workspace; it does not return it
data(sample.ExpressionSet)
eset <- sample.ExpressionSet
# information about assay and sample data
featureNames(eset)[1:10]
sampleNames(eset)[1:5]
experimentData(eset)
# subset: first 10 genes, samples 2, 4, and 10
expressionSet <- eset[1:10, c(2, 4, 10)]
# named features and their expression levels
subset <- eset[c("AFFX-BioC-3_at", "AFFX-BioDn-5_at"), ]
exprs(subset)
# samples with above-average 'score' in phenoData
highScores <- expressionSet$score > mean(expressionSet$score)
expressionSet[, highScores]
# coerce to data.frame or matrix
data_frame <- as(eset, "data.frame")
expressions_matrix <- exprs(eset)
Very Useful Packages
The IRanges package is designed to represent sequences, ranges representing indices along those sequences, and data related to those ranges.
IRanges
GenomicRanges
AnnotationDbi
GenomicFeatures
Genomic Ranges - Classes
GRanges: single interval range features
GRangesList: multiple interval range features
GAlignments: genomic alignments
GRanges: Single Interval Range Features
The GRanges class represents a collection of genomic features that each have a single start and end location on the genome. This includes features such as contiguous binding sites, transcripts, and exons. These objects can be created by using the GRanges constructor function. For example,
> gr <- GRanges(
+     seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
+     ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)),
+     strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
+     score = 1:10,
+     GC = seq(1, 0, length = 10))
> gr
GRanges object with 10 ranges and 2 metadata columns:
    seqnames    ranges strand |     score        GC
       <Rle> <IRanges>  <Rle> | <integer> <numeric>
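The pieces passed to the constructor can be read back with accessor functions. A minimal sketch, assuming the GenomicRanges package is installed; the object is the same gr as in the constructor example, rebuilt here so that the block runs on its own.

```r
library(GenomicRanges)

gr <- GRanges(
    seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
    ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)),
    strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
    score = 1:10,
    GC = seq(1, 0, length = 10))

seqnames(gr)   # chromosome of each range (run-length encoded)
ranges(gr)     # the underlying IRanges: start, end, width, names
strand(gr)     # "+", "-", or "*" (strand not specified)
mcols(gr)      # metadata columns: score and GC
width(gr)      # every range in this example spans 7 positions

# Subsetting works as for a vector; metadata columns travel with the ranges.
gr_chr2 <- gr[seqnames(gr) == "chr2"]
names(gr_chr2)
```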
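Genomic Ranges - Overlaps
To compare the ranges in two objects, the GenomicRanges package provides a family of interval overlap functions. The most general of these functions is findOverlaps, which takes a query and a subject as inputs and returns a Hits object containing the index pairings for the overlapping elements.
> mtch <- findOverlaps(gr, grl)
> as.matrix(mtch)
     queryHits subjectHits
[1,]         2           1
[2,]         3           1
[3,]         4           1
[4,]         5           2
[5,]         6           2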
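As suggested in the sections discussing the nature of the GRanges and GRangesList classes, the index in the above matrix of hits for a GRanges object is a single range, while for a GRangesList object it is the set of ranges that define a feature.
Genomic Ranges - Overlaps
Another function in the overlaps family is countOverlaps, which tabulates the number of overlaps for each element in the query.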
> countOverlaps(gr, grl)
a b c d e f g h i j
0 1 1 1 1 1 0 0 0 0
Genomic Ranges - Overlaps
A third function in this family is subsetByOverlaps, which extracts the elements in the query that overlap at least one element in the subject.
> subsetByOverlaps(gr,grl)
GRanges object with 5 ranges and 2 metadata columns:
    seqnames    ranges strand |     score                GC
       <Rle> <IRanges>  <Rle> | <integer>         <numeric>
  b     chr2   [2,  8]      + |         2 0.888888888888889
  c     chr2   [3,  9]      + |         3 0.777777777777778
  d     chr2   [4, 10]      * |         4 0.666666666666667
  e     chr1   [5, 11]      * |         5 0.555555555555556
  f     chr1   [6, 12]      + |         6 0.444444444444444
  -------
  seqinfo: 3 sequences from an unspecified genome
Genomic Ranges - Overlaps
Finally, you can use the select argument to get the index of the first overlapping element in the subject for each element in the query.
> findOverlaps(gr, grl, select="first")
 [1] NA  1  1  1  2  2 NA NA NA NA
> findOverlaps(grl, gr, select="first")
[1] 2 5
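The overlap calls use a subject grl whose definition does not appear in these slides. The sketch below assumes the GenomicRanges package is installed and builds a two-element GRangesList (elements txA and txB, chosen here as an assumption) that is consistent with the hits printed in this section; it then runs the overlap functions discussed.

```r
library(GenomicRanges)

# Query: the gr object from the constructor example.
gr <- GRanges(
    seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
    ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)),
    strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
    score = 1:10,
    GC = seq(1, 0, length = 10))

# Subject (assumed): two small features sharing gr's sequence information.
gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6), strand = "+",
               score = 5L, GC = 0.45, seqinfo = seqinfo(gr))
gr2 <- GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(7, 13), width = 3),
               strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5),
               seqinfo = seqinfo(gr))
grl <- GRangesList(txA = gr1, txB = gr2)

hits <- findOverlaps(gr, grl)             # Hits: (query, subject) index pairs
hit_matrix <- as.matrix(hits)
per_query <- countOverlaps(gr, grl)       # overlapping subjects per query range
overlapping <- subsetByOverlaps(gr, grl)  # query ranges with at least one hit
first_hit <- findOverlaps(gr, grl, select = "first")  # NA where nothing overlaps
```

Overlaps are strand-aware by default: range a lies on chr1 and touches position 7, but it is on the minus strand while the chr1 range starting at 7 in txB is on the plus strand, so a reports no hit. Ranges with strand "*" match either strand.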
Genomic Alignments
In addition to the GRanges and GRangesList classes, the GenomicAlignments package defines the GAlignments class, which is a more specialized container for storing a set of genomic alignments. The class is intended to support alignments in general, not only those coming from a BAM file. Alignments with gaps in the reference sequence (a.k.a. gapped alignments) are also supported.
Marc Carlson
AnnotationDbi
March 18, 2015
[Figure 1: Annotation Packages: the big picture. Boxes: PLATFORM PKGS, ORG PKGS, HOMOLOGY PKGS, TRANSCRIPT PKGS, and SYSTEM BIOLOGY (GO, KEGG), linked through GENE ID and ONTO ID.]
Bioconductor provides extensive annotation resources. These can be gene centric or genome centric. Annotations can be provided in packages curated by Bioconductor, or obtained from web-based resources. This vignette is primarily concerned with describing the annotation resources that are available as packages. More advanced users who wish to learn how to make new annotation packages should consult the separate vignette on that topic.
AnnotationDbi
Gene centric AnnotationDbi packages include:
• Organism level: e.g. org.Mm.eg.db.
• Platform level: e.g. hgu133plus2.db, hgu133plus2.probes, hgu133plus2.cdf.
• Homology level: e.g. hom.Dm.inp.db.
• System-biology level: GO.db
Genome centric GenomicFeatures packages include:
• Transcriptome level: e.g. TxDb.Hsapiens.UCSC.hg19.knownGene
• Generic genome features: can be generated via GenomicFeatures
One web-based resource accesses BioMart, via the biomaRt package:
• Query the web-based ‘biomart’ resource for genes, sequence, SNPs, etc.
AnnotationDbi - select
The four query methods are named columns, keytypes, keys and select, and they are described in this vignette. They can currently be used with all chip, organism, and TxDb packages along with the popular GO.db package. Use the help system as needed to learn which values to pass in to columns.
library(org.Hs.eg.db)
columns(org.Hs.eg.db)
##  [1] "ENTREZID"    "PFAM"        "IPI"         "PROSITE"      "ACCNUM"
##  [6] "ALIAS"       "CHR"         "CHRLOC"      "CHRLOCEND"    "ENZYME"
## [11] "MAP"         "PATH"        "PMID"        "REFSEQ"       "SYMBOL"
## [16] "UNIGENE"     "ENSEMBL"     "ENSEMBLPROT" "ENSEMBLTRANS" "GENENAME"
## [21] "UNIPROT"     "GO"          "EVIDENCE"    "ONTOLOGY"     "GOALL"
## [26] "EVIDENCEALL" "ONTOLOGYALL" "OMIM"        "UCSCKG"
help("SYMBOL") ## for explanation of these columns and keytypes values
keytypes(org.Hs.eg.db)
##  [1] "ENTREZID"    "PFAM"        "IPI"         "PROSITE"      "ACCNUM"
##  [6] "ALIAS"       "CHR"         "CHRLOC"      "CHRLOCEND"    "ENZYME"
## [11] "MAP"         "PATH"        "PMID"        "REFSEQ"       "SYMBOL"
## [16] "UNIGENE"     "ENSEMBL"     "ENSEMBLPROT" "ENSEMBLTRANS" "GENENAME"
## [21] "UNIPROT"     "GO"          "EVIDENCE"    "ONTOLOGY"     "GOALL"
## [26] "EVIDENCEALL" "ONTOLOGYALL" "OMIM"        "UCSCKG"
Solution:
uniKeys <- head(keys(org.Hs.eg.db, keytype="UNIPROT"))
cols <- c("SYMBOL", "PATH")
select(org.Hs.eg.db, keys=uniKeys, columns=cols, keytype="UNIPROT")
## Warning in .generateExtraRows(tab, keys, jointype): 'select' resulted in 1:many mapping between keys and return rows
##   UNIPROT SYMBOL PATH
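A common use of this interface is translating between identifier types. The sketch below assumes the AnnotationDbi and org.Hs.eg.db packages are installed; the two gene symbols are example keys chosen for illustration. It contrasts select, which may return several rows per key, with mapIds, which returns exactly one value per key.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

symbols <- c("TP53", "BRCA1")   # example keys, chosen for illustration

# select() returns a data.frame with one row per key/value combination.
# The AnnotationDbi:: prefix avoids clashes with other packages' select().
tab <- AnnotationDbi::select(org.Hs.eg.db, keys = symbols,
                             columns = c("ENTREZID", "GENENAME"),
                             keytype = "SYMBOL")

# mapIds() returns one value per key; multiVals controls how 1:many
# mappings are resolved ("first" keeps the first match).
entrez <- mapIds(org.Hs.eg.db, keys = symbols,
                 column = "ENTREZID", keytype = "SYMBOL",
                 multiVals = "first")
```

mapIds is convenient when the result has to line up one-to-one with an existing vector of IDs, for example the feature names of an ExpressionSet.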
Genomic Features
[Figure: diagram relating TxDb and GRanges objects to the GenomicFeatures, AnnotationDbi, and GenomicRanges packages.]
Retrieving data with select is useful, but sometimes it is more convenient to extract the result as GRanges
objects. This is often the case when you are doing counting or specialized overlap operations downstream. For
these use cases there is another family of methods available.
Genomic Features
Perhaps the most common operation for a TxDb object is to retrieve the genomic coordinates or ranges for
exons, transcripts or coding sequences. The functions transcripts, exons, and cds return the coordinate
information as a GRanges object.
Methods to retrieve GRanges
As an example, all transcripts present in a TxDb object can be obtained as follows:
GR <- transcripts(txdb)
GR[1:3]
## GRanges object with 3 ranges and 2 metadata columns:
##       seqnames               ranges strand |     tx_id     tx_name
##          <Rle>            <IRanges>  <Rle> | <integer> <character>
##   [1]    chr15 [20362688, 20364420]      + |     53552  uc001yte.1
##   [2]    chr15 [20487997, 20496811]      + |     53553  uc001ytf.1
##   [3]    chr15 [20723929, 20727150]      + |     53554  uc001ytj.3
##   -------
The transcripts function returns a GRanges class object. You can learn a lot more about the manipulation of these objects by reading the GenomicRanges introductory vignette. The show method for a GRanges object will display the ranges, seqnames (a chromosome or a contig), and strand on the left side and then present related metadata on the right side. At the bottom, the seqlengths display all the possible seqnames along with the length of each sequence.
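The examples in this part use a txdb object whose construction is not shown on the slides. A plausible setup, assuming the GenomicFeatures and TxDb.Hsapiens.UCSC.hg19.knownGene packages are installed and that the analysis is restricted to chromosome 15 (which is what the printed output suggests), could look as follows. Counts may differ from those printed here depending on the annotation package version.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(txdb) <- "chr15"   # assumption: keep only chromosome 15 active

GR <- transcripts(txdb)      # one range per transcript
EX <- exons(txdb)            # one range per exon

# Restricting by strand; the argument is called 'filter' in current
# GenomicFeatures releases (older releases used 'vals').
plus_GR <- transcripts(txdb, filter = list(tx_strand = "+"))

length(GR)
length(EX)
```

Setting seqlevels on the TxDb object restricts every later extraction (transcripts, exons, cds, and the grouping functions) to the chosen chromosome.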
Genomic Features
Methods to retrieve GRanges
In addition, the transcripts function can also be used to retrieve a subset of the transcripts available, such as those on the + strand of chromosome 15.
# the restriction list is passed as 'filter' in current GenomicFeatures releases ('vals' in older ones)
GR <- transcripts(txdb, filter = list(tx_chrom = "chr15", tx_strand = "+"))
length(GR)
## [1] 1732
unique(strand(GR))
## [1] +
## Levels: + - *
The exons and cds functions can also be used in a similar fashion to retrieve genomic coordinates for exons and coding sequences.
Exercise 3
Use exons to retrieve all the exons from chromosome 15. How does the length of this compare to the value returned by transcripts?
Solution:
EX <- exons(txdb)
EX[1:4]
## GRanges object with 4 ranges and 1 metadata column:
##       seqnames               ranges strand |   exon_id
##          <Rle>            <IRanges>  <Rle> | <integer>
##   [1]    chr15 [20362688, 20362858]      + |    192986
##   [2]    chr15 [20362943, 20363123]      + |    192987
##   [3]    chr15 [20364397, 20364420]      + |    192988
##   [4]    chr15 [20487997, 20488227]      + |    192989
##   -------
##   seqinfo: 1 sequence from hg19 genome
length(EX)
## [1] 10771
length(GR)
## [1] 1732
Genomic Features
Methods to retrieve GRanges
3.5 Working with Grouped Features
Often one is interested in how particular genomic features relate to each other, and not just their location. For example, it might be of interest to group transcripts by gene or to group exons by transcript. Such groupings are supported by the transcriptsBy, exonsBy, and cdsBy functions.
The following call can be used to group transcripts by genes:
GRList <- transcriptsBy(txdb, by = "gene")
length(GRList)
## [1] 799
names(GRList)[10:13]
## [1] "100033424" "100033425" "100033427" "100033428"
GRList[11:12]
## GRangesList object of length 2:
## $100033425
## GRanges object with 1 range and 2 metadata columns:
##       seqnames               ranges strand |     tx_id     tx_name
##          <Rle>            <IRanges>  <Rle> | <integer> <character>
##   [1]    chr15 [25324204, 25325381]      + |     53638  uc001yxw.4
##
## $100033427
## GRanges object with 1 range and 2 metadata columns:
##       seqnames               ranges strand | tx_id    tx_name
##   [1]    chr15 [25326433, 25326526]      + | 53640 uc001yxz.3
##
## -------
## seqinfo: 1 sequence from hg19 genome
Even when groups are labeled with synthetic IDs, you can still always retrieve the original IDs.
Exercise 4
Starting with the tx ids that are the names of the GRList object we just made, use select to retrieve the matching transcript names. Remember that the list used a by argument = "tx", so the list is grouped by transcript IDs.
Solution:
GRList <- exonsBy(txdb, by = "tx")
tx_ids <- names(GRList)
head(select(txdb, keys=tx_ids, columns="TXNAME", keytype="TXID"))
##    TXID     TXNAME
## 1 53552 uc001yte.1
## 2 53553 uc001ytf.1
## 3 53554 uc001ytj.3
## 4 53555 uc021sex.1
## 5 53556 uc010tzb.1
## 6 53557 uc021sey.1
Finally, the order of the results in a GRangesList object can vary with the way in which things were grouped. In most cases the grouped elements of the GRangesList object will be listed in the order that they occur along the chromosome. However, when exons or CDS are grouped by transcript, they will instead be ordered according to their position along the transcript itself. This is important because alternative splicing can mean that the order along the transcript can be different from that along the chromosome.
Genomic Features
Making and Utilizing TxDb Objects
Methods to retrieve GRanges
The intronsByTranscript, fiveUTRsByTranscript, and threeUTRsByTranscript functions return a GRangesList object grouped by transcript for introns, 5’ UTRs, and 3’ UTRs, respectively. Below are examples of how you can call these methods.
length(intronsByTranscript(txdb))
## [1] 3337
length(fiveUTRsByTranscript(txdb))
## [1] 1825
length(threeUTRsByTranscript(txdb))
## [1] 1803
Genomic Features
Methods to retrieve the actual sequence
3.7 Getting the actual sequence data
The GenomicFeatures package also provides functions for converting from ranges to actual sequence (when paired with an appropriate BSgenome package).
library(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome
## Loading required package: Biostrings
## Loading required package: XVector
## Loading required package: rtracklayer
tx_seqs1 <- extractTranscriptSeqs(Hsapiens, TxDb.Hsapiens.UCSC.hg19.knownGene)
And, once these sequences have been extracted, you can translate them into proteins with translate:
suppressWarnings(translate(tx_seqs1))
##   A AAStringSet instance of length 3337
##        width seq                                                    names
##    [1]   125 EDQDDEARVQYEGFRPGMYVRVEIENV...QRLLKYTPQHMHCGAAFWA*FSDSCH uc001yte.1
##    [2]   288 RIAS*GRAEFSSAQTSEIQRRRSSVLL...IFLFFESVFYSVYFNYGNNCFFTVTD uc001ytf.1
##    [3]   588 RSGQRLPEQPEAEGGDPGKQRRRAEHR...KVICERDLLENETHLYLCSIKICFSS uc001ytj.3
##    [4]    10 HHLNCRPQTG                                               uc021sex.1
##    [5]     9 STVTLPHSQ                                                uc010tzb.1
##    ...   ... ...
## [3333]    10 QVPMRVQVGQ                                               uc021syy.1
## [3334]   306 MVTEFIFLGLSDSQELQTFLFMLFFVF...TLRNKDMKTAIRRLRKWDAHSSVKF* uc002cdf.1
## [3335]   550 LAVSLFFDLFFLFMCICCLLAQTSRVL...RRQSLTPRRLHPAQLEILY*KHTVGF uc002cds.2
## [3336]   496 LAVSLFFDLFFLFMCICCLLAQTSRVL...EAVTDPETFASCTARDPLLKAHCWFL uc010utv.1
## [3337]   531 LAVSLFFDLFFLFMCICCLLAQTSRVL...RRQSLTPRRLHPAQLEILY*KHTVGF uc010utw.1
Exercise 5
But of course this is not a meaningful translation, because the call to extractTranscriptSeqs will have extracted all the transcribed regions of the genome regardless of whether or not they are translated. Look at the manual page for extractTranscriptSeqs and see how you can use cdsBy to translate only the coding regions.
See also getSeq() in BSgenome.
Genomic Features
CDS
Solution:
cds_seqs <- extractTranscriptSeqs(Hsapiens, cdsBy(txdb, by="tx"))
translate(cds_seqs)
##   A AAStringSet instance of length 1875
##        width seq                                                    names
##    [1]   102 MYVRVEIENVPCEFVQNIDPHYPIILG...EDHNGRQRLLKYTPQHMHCGAAFWA* 53552
##    [2]   435 MEWKLEQSMREQALLKAQLTQLKESLK...QEHPGLGSNCCVPFFCWAWPPRRRR* 53558
##    [3]   317 MKIANNTVVTEFILLGLTQSQDIQLLV...QEVKTSMKRLLSRHVVCQVDFIIRN* 53570
##    [4]   314 METANYTKVTEFVLTGLSQTPEVQLVL...YTLRNKEVKAAMRKLVTKYILCKEK* 53571
##    [5]   317 MKIANNTVVTEFILLGLTQSQDIQLLV...QEVKTSMKRLLSRHVVCQVDFIIRN* 53572
##    ...   ... ...
## [1871]   186 MAGGVLPLRGLRALCRVLLFLSQFCIL...RDHVHCLGRSEFKDICQQNVFLQVY* 56842
## [1872]   258 MYNSKLWEASGHWQHYSENMFTFEIEK...GGKWYPVNFLKKDLWLTLTWITVVH* 56843
## [1873]   803 MAAEALAAEAVASRLERQEEDIRWLWS...ILVTSAIDKLKNLRKTRTLNAEEAF* 56844
## [1874]   306 MVTEFIFLGLSDSQELQTFLFMLFFVF...TLRNKDMKTAIRRLRKWDAHSSVKF* 56885
## [1875]   134 MSESINFSHNLGQLLSPPRCVVMPGMP...QGSCYKGETQESVESRVLPGPRHRH* 56887
Genomic Features
Making TxDb Object
makeTranscriptDb is a low-level constructor for making a TxDb object from user-supplied transcript annotations.
The makeTranscriptDbFromUCSC function allows the user to make a TxDb object from transcript annotations available at the UCSC Genome Browser.
The makeTranscriptDbFromGFF function allows the user to make a TxDb object from transcript annotations available as a GFF3 or GTF file.
The makeTranscriptDbFromBiomart function allows the user to make a TxDb object from transcript annotations available on a BioMart database.
In later GenomicFeatures releases these functions are named makeTxDb, makeTxDbFromUCSC, makeTxDbFromGFF, and makeTxDbFromBiomart.
Genomic Features
Making TxDb Package
A TxDb package is an annotation package containing a TxDb object.
The makeTxDbPackageFromUCSC function allows the user to make a TxDb package from transcript annotations available at the UCSC Genome Browser.
The makeTxDbPackageFromBiomart function allows the user to do the same thing as makeTxDbPackageFromUCSC except that the annotations originate from biomaRt.
Finally, the makeTxDbPackage function allows the user to make a TxDb package directly from a TxDb object.