preface - Oxford University Press

PREFACE
In 2001, in a massive collaborative effort by scientists working in a multitude of disciplines, including biology, biochemistry, chemistry, genetics, engineering, and computer science, one of the greatest feats in the history of science was accomplished: the sequencing of the human genome.
Today, we can imagine a not-too-distant future in
which our personal genomes are entirely known to us.
We will download our genetic data and know from our sequences whether we are susceptible to particular diseases such as diabetes, cancer, and stroke. We’ll modify
our behaviors and mitigate these risks—our lives will
change. For some of us, a poor genetic profile will affect
our outlook on life, or the economics of our lives. How
will medicine adapt to common knowledge of the
genome? We do not quite know yet what this world looks
like, but some of its weightiest questions are already
being asked and debated—and studied by a rapidly expanding field of genomics and bioinformatics research.
These are questions about the modern world, the modern
person, and the future of biological science.
Welcome to the world of bioinformatics.
THE APPROACH
Concepts in Bioinformatics and Genomics takes a conceptual approach to its subject, balancing biology, mathematics, and programming, while highlighting relevant
real-world applications. Topics are developed from the fundamentals up, as in an introductory textbook. This is a comprehensive book for students enrolled in their first course in bioinformatics. A compelling case study of the TP53 gene, a human tumor suppressor with strong clinical applications, runs throughout the book, engaging students with a continuously relevant example.
The textbook thoroughly describes basic principles of
probability as they lead up to the concept of Expect value
(E-value) and its use in sequence alignment programs.
Concepts in Bioinformatics and Genomics also describes,
from a mathematical perspective, the development of the
hidden Markov model and how it can be used to align
sequences in multiple sequence alignment programs.
Finally, it introduces students to programming exercises
directly related to bioinformatics problems. Thought-provoking exercises stretch the students’ imaginations
and learning, giving them a deeper understanding of
software programs, molecular biology, basic probability,
and program-coding methodology underpinning the
discipline. The material covered in this book provides
students with the fundamental tools necessary to analyze biological data.
ORGANIZATION
Introduction to Bioinformatics: Chapters 1–5
CHAPTER 1 is an overview of molecular biology. It will
provide the essential biology vocabulary for understanding bioinformatics. Chapter 2 introduces GenBank, the
database that stores the vast amounts of DNA and RNA
sequence data crucial for bioinformatics research.
CHAPTER 3 discusses molecular evolution, which explains the diversity of sequences and how mutations get
passed to progeny. Chapter 4 delves into the derivation
of amino acid substitution matrices, the basis of sequence comparison programs, which help us connect
molecular evolution to protein structure and function.
Chapter 5 discusses amino acid substitution matrices
and pairwise sequence comparison programs. Here, we
begin to get into the nuts and bolts of algorithms that use
data from evolution and protein domain conservation to
infer whether two genes are homologs.
Biology: Chapters 6–10
CHAPTER 6 further develops the topic of pairwise sequence comparison by describing the Basic Local Alignment Search Tool (BLAST) and discusses multiple
sequence alignment programs with an emphasis on the
first popular program of this class—ClustalW. Chapter 7 is devoted to protein structure prediction programs.
This chapter provides strong foundational knowledge of
protein structures and the Protein Data Bank. Chapter 8
introduces phylogenetics with a discussion of DNA and protein sequence information, and of the construction of phylogenetic trees. Chapter 9 presents genomics analysis
with an emphasis on next-generation sequencing (NGS),
and annotation of bacterial genomes. Chapter 10 is all
about gene expression. Approximately half of this chapter is devoted to methods to measure transcript levels
with an emphasis on microarrays and RNA-seq. The
other half is devoted to proteomics, where we describe
how mass spectrometry is used to identify proteins isolated from 2D gels.
Mathematics: Chapters 11–12
CHAPTER 11 introduces you to probability, a requisite
component of bioinformatics research, with an emphasis
on counting methods, dependence, Bayesian inference,
and random variables. In Chapter 12, the subject of a continuous random variable, introduced in the previous
chapter, will be further developed into a discussion of
the extreme value distribution and its use in analyzing
the significance of an alignment. We conclude the chapter with stochastic processes, specifically Markov chains
and hidden Markov models, as well as a mathematical
derivation of the Jukes-Cantor model.
Programming: Chapters 13–14
CHAPTER 13 focuses on Python, a popular bioinformatics programming language. The Kyte-Doolittle Hydropathy sliding window program (one of the first popular
bioinformatics programs) is used to illustrate Python
fundamentals and to introduce you to the program
design process. Chapter 14 follows this design process and steps you through the development of a pairwise sequence alignment tool.
FOR PROFESSORS
Approach and Rationale
The bioinformatics discipline has matured to the point
where there is general agreement on the software programs and databases that are standards in the field. The
algorithms that form the foundations of these software
programs will not significantly change within the next
three to four years. Similarly, databases that are bulwarks
of the field will not vanish in the foreseeable future. Understanding the principles behind these bioinformatics tools is critical for students pursuing molecular life science or bioinformatics careers.
Flexible Organization
Overall, biology, mathematics, and computer science are
presented in an order that systematically develops a student’s understanding of the area. To highlight relevant
connections between the three, we include cross-references in the main text and in footnotes. Those who
wish to teach the course with the biology-heavy chapters
in the beginning may consider presenting the chapters in
the order listed in the table of contents. In this order, the
biology-heavy chapters (Chapters 1 through 10) are followed by two mathematics-heavy chapters (Chapters 11
and 12) and two computer science-heavy chapters
(Chapters 13 and 14).
If instructors wish to integrate computer programming early into the course, they may want to consider
presenting the chapters in the following order: 1–5, 13,
14, and 6–12. Chapters 1 through 5 provide the biological rationale for pairwise sequence alignment and Chapters 13 and 14 provide the computer programming
background so that students can create their own software tools to align sequences. The programming concepts in Chapters 13 and 14 reinforce the biological
principles covered in Chapters 1 through 5. To provide
students with more time to learn the Python programming basics, instructors may wish to intersperse topics
from Chapters 13 and 14 among topics covered in Chapters 1 through 5. After covering Chapters 1 through 5, 13
and 14, material from the more biology-heavy chapters
(Chapters 6–10) and the mathematics-heavy chapters
(Chapters 11–12) can be covered.
Some bioinformatics and genomics courses are taught
in a format consisting of a lecture section and a separate
computer lab section. If this is the case, the lecture section can focus on Chapters 1 through 12, the lab section
on Chapters 13 and 14. The lab section may allow more
time for students to work through small coding assignments that together provide a foundation for a more extensive programming project (described in Chapter 14)
to be completed by the end of the lab course. Another
way of dividing the material between lecture and lab sections is to focus the lecture on the biology-heavy chapters
(Chapters 1–10) and include Chapters 11–14 in the lab.
If instructors would like to integrate mathematics earlier in the course, they may consider covering Chapters 11 and 12 just prior to Chapter 6. The introductory basic probability segment of Chapter 11, followed by the explicit derivation of the extreme value distribution in Chapter 12, provides a strong foundation for the discussion of E-value, an important component of the BLAST
program discussed in Chapter 6. The segment of Chapter 12 that introduces hidden Markov models will strengthen the students’ understanding of multiple sequence alignment discussed in Chapter 6.

SUGGESTED ALTERNATIVE PRESENTATIONS OF TEXTBOOK
PRESENTATION ORDER OF FIRST FIVE CHAPTERS | ALTERNATE CHAPTER PRESENTATION ORDER | TYPE OF INTEGRATION
1, 2, 3, 4, 5 | 6, 7, 8, 9, 10, 11, 12, 13, 14 | Biology-heavy chapters first with cross-references to mathematics- and computer science-heavy chapters.
1, 2, 3, 4, 5 | 13, 14, 6, 7, 8, 9, 10, 11, 12 | Biology-foundation chapters first with computer science-heavy chapters more integrated.
1, 2, 3, 4, 5 | 6, 7, 8, 9, 10, 11, 12 (lecture); 13, 14 (lab) | Biology-heavy and mathematics-heavy lecture section with a lab focused on computer science.
1, 2, 3, 4, 5 | 6, 7, 8, 9, 10, 11, 12 (lecture); 11, 12, 13, 14 (lab) | Biology-heavy lecture section with a lab focused on mathematics and computer science.
1, 2, 3, 4, 5 | 11, 12, 6, 7, 8, 9, 10, 13, 14 | Biology-foundation chapters first with mathematics-heavy chapters more integrated.
The table above shows our suggestions for alternative sequences of the textbook chapters that can be tailored to your particular needs.
THE FEATURES
Balance of Biology, Mathematics, and Programming
Concepts in Bioinformatics and Genomics strikes a balance
of topics for all students, no matter their background. Biology students will appreciate the reinforcement of the
molecular life science topics and the gradual introduction to basic probability and programming concepts.
The probability and programming chapters use examples from biology to help biology students see the relevance of these
concepts to molecular life science. Mathematics is expertly interwoven with bioinformatics concepts. Students with a background in computer programming will
appreciate the basic biology primer in the first chapter.
For students who already know how to program in another language, this textbook offers the opportunity to
learn the fundamentals of a new language, Python.
Genomics
Genomics is a field that studies the entire sequenced genomes of organisms. Bioinformatics programs and databases are highly applicable to genomics because of the critical need to analyze and store a large amount of sequence data. Without bioinformatics, we cannot fully assess the genomics data we have collected. Chapters that emphasize genomics are Chapter 8 (“Phylogenetics”), Chapter 9 (“Genomics”), and Chapter 10 (“Transcript and Protein Expression Analysis”).
Case studies of TP53, the Tumor Suppressor Gene
The TP53 tumor suppressor is mutated in virtually all
cancer types, and there is wide interest in using this
knowledge to develop better cancer therapies. In Chapter 1, we discuss how p53 was discovered as a protein
bound to a monkey virus oncoprotein, and in the last
chapter, we show students how to create sequence alignment programs that quantify the similarities between
p53 and its paralogs, p63 and p73. By the end of this textbook, students and instructors will have a deep understanding of the molecular biology of this gene and how
bioinformatics can be used to further research progress
in the fight against cancer.
Scientist Spotlight
Scientists who made significant contributions to the bioinformatics field are highlighted in “Scientist Spotlight”
boxed sections. Featured scientists include those who created the first widely applicable amino acid substitution matrices (Margaret Dayhoff), the first global sequence alignment program (Christian Wunsch), the first local sequence alignment program (Michael Waterman), and the first program that successfully predicted protein membrane-spanning regions (Russell Doolittle). These are just a few of the brilliant minds and discoveries highlighted.
A Closer Look
From the TP53 gene to DNA fingerprinting and the Neanderthal genome, this boxed material examines in
detail some of the most important elements of Concepts in Bioinformatics and Genomics. Replete with figures, photographs, and excerpts from published texts,
“A Closer Look” provides the background and clarity
needed to fully grasp the relevance of bioinformatics.
Thought Questions
Interspersed throughout the text, “Thought Questions”
ask the important conceptual questions and prompt students to problem-solve and apply their knowledge on the
fly. These questions provide students opportunities to
self-test and better engage with their reading. Answers
are found at the end of the chapter.
End-of-Chapter Exercises
Additionally, a robust list of end-of-chapter exercises
encourages students to apply their bioinformatics knowledge holistically. Exercises are qualitative and quantitative, specific and comprehensive.
Glossary Terms
Glossary terms are highlighted and defined the first time
they appear in the text. Concise explanations of the
terms are also provided in the glossary section at the end
of the book.
SUPPORT PACKAGE
Oxford University Press offers a comprehensive ancillary package for instructors and students using Concepts
in Bioinformatics and Genomics.
For Students
Companion website (www.oup.com/us/momand):
Resources and links to bioinformatics software, tools,
and databases are available on the companion website.
These are stable resources, such as Dotter, BLAST,
GenBank, and many more, that have matured with the
discipline into the essential tools for the bioinformatician. The companion site also provides downloadable
programming tools that are necessary for students to complete the programming projects and end-of-chapter exercises.
For Instructors
The Ancillary Resource Center (ARC), located at www.oup-arc.com/momand, contains the following teaching tools:
• Digital Image Library includes electronic files in
PowerPoint format of every illustration, photo,
graph, figure caption, and table from the text—both
labeled and unlabeled versions.
• Answers to End-of-Chapter Questions includes detailed solutions to all of the exercises provided at the end of each chapter.
• Editable Lecture Notes in PowerPoint format for
each chapter help make preparing lectures faster and
easier than ever. Each chapter’s presentation includes
a succinct outline of key concepts and incorporates
the graphics from the chapter.