Computational Biology and Bioinformatics:
Applications in Operations Research
Dave Goulet and Allen Holder
March 2, 2015
1 The Interplay Between Operations Research and the Life Sciences
Operations Research (OR) was a young discipline in the 1950s when it had its
first brush with the biological sciences. For instance, initial computer simulations of evolution were conducted by Barricelli in the mid-1950s [6], and
in the 1960s Bremermann described how genetic algorithms could be used to
solve optimization problems [10]. A more modern example of how the biological
sciences have impacted our ability to heuristically search for optimality is that
of ant colony optimization, which was developed in the 1990s [13]. OR’s overarching goal to improve, and if possible optimize, the decision making process
has benefited from mimicking natural processes since nature itself seems to seek
optimality.
While OR has adopted biological principles for its own advancement, the intrinsic disciplinary overlap suggests that the biological sciences might similarly
benefit from OR. Indeed, the innate optimal quality of nature, together with the
fact that OR has built a myriad of mathematical and computational methods
to compute optimal qualities, has promoted the application of OR to problems
in the biological sciences. Biological applications of OR tend to differ from OR
applications in business and industry because the biological entities are not independent agents that can make decisions. Even so, the natural trends of many
biological entities are often toward optimal states that can be appropriately
modeled with traditional OR techniques.
The application of OR to biology blossomed with the advent of the fields
of computational biology and bioinformatics in the 1990s. Numerous biological
investigations were moving from wet lab research to computer models that could
simulate a natural process. The computational setting promised a vast increase
in the speed of our experimental ability, and hence, the number of (simulated)
experiments could be far larger than what would have been possible in a laboratory. Simulated results could then be used to identify which experiments should
be confirmed by more costly wet lab research.
This article samples some of the OR applications in the biological sciences.
The presentation is arranged relative to biological scale, starting with problems
in biochemistry and moving toward legacy studies of entire populations. Other
surveys of OR applications in biology are divided by the type of OR [23]. Each
section below begins with a succinct introduction describing the biological relevance and ends with OR examples. Further details are found in the citations.
2 Proteins
Proteins are the functional workhorses of life. For example, proteins form the
contractile elements in muscle tissue as well as the walls of the ion channels in
the neurons controlling those muscles. Mammary acinar cells create the large
amounts of protein in breast milk while simultaneously responding to protein
signaling molecules in their environment. Proteins on the surfaces of some
viruses act as syringes for injecting DNA into host cells. Other proteins span the
cell walls of bacteria and combine to form molecular motors. Protein tethers and
anchors pull chromosomes apart during cell division, and synthetic proteins can
mark other proteins on the surface of cancerous cells so that cytotoxins, made
of protein, can kill those cancerous cells without harming healthy ones. There
are proteins that allow immune cells to recognize and respond to pathogens,
and there are proteins (prions) that are themselves inanimate pathogens.
2.1 Biological Introduction
Proteins in their simplest form are bonded chains of amino acid molecules.
These polypeptide chains can be composed of tens of thousands of the many
known types of amino acids (20 in humans), allowing staggering combinatorial
complexity. The amino acid sequence of a peptide chain is known as the protein’s
primary structure.
The importance of proteins and their amino acid building blocks is illustrated
by the theory that all life on earth started with the formation of amino acids. In
1953 Miller and Urey [34] placed water, methane, ammonia, and hydrogen into
a vessel and applied heat and electricity. Trace amounts of amino acids were
identified two weeks later. In 1969 the Murchison meteorite struck Australia,
and when examined, it was found to contain amino acids. More recently, amino
acids were discovered in lunar soil from the Apollo missions [18]. It’s tempting
to interpret the genesis and apparent ubiquity of amino acids as the fascinating
origins of life on earth; however, the primary structure only initiates the protein
story.
Amino acids within a single peptide chain can form weak bonds with others
nearby, leading to secondary structures such as alpha helices, beta sheets, and
beta barrels. Secondary structures create relatively rigid and geometrically fixed
regions on an otherwise flexible chain. Because these structures may form in
multiple locations along the chain, combinatorial complexity beyond that of the
primary sequence is possible.
Stronger bonds between distant amino acids can form either spontaneously
or with the assistance of chaperone proteins. These bonds cause proteins to fold
and become compact and globular, with some amino acids being internalized.
This tertiary structure is the basis of protein function. Folding can, in theory, be
modeled by simulating molecular dynamics or quantum mechanics. However,
such simulations are complicated by the action of small molecules and other
proteins, which may interfere or assist with folding. A stable fold is assumed to
represent a locally optimal energetic state, but other local optima may exist, and
a protein may oscillate among multiple folded states. These temporal dynamics
augment the spatial combinatorial complexity.
Quaternary structures are formed by multiple polypeptide chains, further
increasing complexity and potential functionality. An example is that of the
potassium ion channel embedded in a neuron’s cell membrane. The ion channel
is formed by four total protein subunits of two types, each of which can be in
an open or closed state.
2.2 OR Applications and Proteins
Arguably the biggest outstanding problem in computational biology is the simulation of protein folding. The ability to accurately predict tertiary structure
from primary sequence, and hence, the ability to infer functionality from amino
acid chains, would revolutionize much of biology, medicine, and health care.
However, this proverbial “holy grail” of computational biology has eluded our
scientific and computational ability. From an OR perspective protein folding
is a difficult nonlinear, global optimization problem that minimizes free energy.
The free energy model is based on the pioneering work of Christian Anfinsen [4],
which led to the thermodynamic hypothesis stating that a protein’s native state
is uniquely determined by its primary sequence. There are several introductions
to protein folding; see [8] for an example.
Each amino acid is uniquely determined by a side-chain of atoms, and the
torsion angle of the bond that connects a side-chain to the protein is called a
rotamer. Rotamers must be one of a select number of angles due to energy
restrictions. The possible angles at each site are cataloged in a library. Determining side-chain conformations to minimize energy is called the rotamer
assignment problem, which has traditionally been a nonlinear combinatorial
problem [31] (see [11] for related semidefinite optimization).
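The rotamer assignment problem can be illustrated on a toy instance. The following is a minimal sketch, assuming the energy splits into a per-site term plus pairwise interaction terms; that decomposition, the function name assign_rotamers, and every energy value are hypothetical and are not taken from [31] or [11]. It enumerates all assignments, which is only feasible for very small libraries.

```python
from itertools import product

def assign_rotamers(self_energy, pair_energy):
    """Brute-force rotamer assignment on a toy instance.

    self_energy[i][r] is the (hypothetical) energy of rotamer r at site i.
    pair_energy[(i, r, j, s)] is the (hypothetical) interaction energy when
    site i uses rotamer r and site j uses rotamer s, with i < j.
    Returns (minimum total energy, tuple of chosen rotamer indices).
    """
    n_sites = len(self_energy)
    best = None
    # One rotamer index per site, drawn from that site's library.
    for choice in product(*(range(len(lib)) for lib in self_energy)):
        total = sum(self_energy[i][choice[i]] for i in range(n_sites))
        for i in range(n_sites):
            for j in range(i + 1, n_sites):
                total += pair_energy.get((i, choice[i], j, choice[j]), 0)
        if best is None or total < best[0]:
            best = (total, choice)
    return best

# Three sites with libraries of size 2, 2, and 3 (illustrative numbers only).
self_energy = [[2, 1], [1, 3], [2, 2, 1]]
pair_energy = {(0, 1, 1, 0): 4,    # a steric clash between sites 0 and 1
               (1, 1, 2, 2): -3}   # a favorable contact between sites 1 and 2
energy, rotamers = assign_rotamers(self_energy, pair_energy)
```

Choosing the lowest-energy rotamer at each site independently gives the assignment (1, 0, 2) with energy 7, whereas the joint optimum is (1, 1, 2) with energy 2. The interaction terms are what make the problem combinatorial.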
Another problem in which OR can help is the searching, comparing, and
cataloging of proteins whose structures are stored in a database. The protein
data bank, www.rcsb.org, collects and stores protein structures, and as of 2013,
the database is nearing 100,000 structures. Traditional comparisons among proteins were made by aligning amino acid sequences, but the goal of identifying
functional similarity is better addressed by aligning three dimensional folds.
However, conducting all pairwise comparisons requires efficient algorithms and
clever modeling. Several combinatorial approaches in the early 2000s were suggested, see [12, 32]. Such combinatorial approaches required days of computing
to complete the pairwise comparisons of small, 40 protein databases, and their
ability to identify known protein families was imperfect. A few years later
several groups realized how to use dynamic programming to efficiently and accurately align larger databases [9, 30, 40], and databases with hundreds of proteins
can now be compared in seconds with perfect accuracy. The increased efficacy has further
led to stochastic studies [27].
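The dynamic programming idea can be sketched in a few lines. The example below is not the method of [9, 30, 40]; it is a generic global alignment recursion applied to two short sequences of hypothetical per-residue descriptors, with an assumed similarity score of 2 - |x - y| and an assumed gap penalty of -1.

```python
def align_score(a, b, gap=-1):
    """Global alignment score of two descriptor sequences by dynamic programming.

    a, b : sequences of numbers (hypothetical per-residue descriptors).
    gap  : assumed penalty for leaving a residue unmatched.
    The similarity of two residues is assumed to be 2 - |x - y|.
    """
    n, m = len(a), len(b)
    # D[i][j] is the best score for aligning a[:i] with b[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1][j - 1] + (2 - abs(a[i - 1] - b[j - 1]))
            D[i][j] = max(match, D[i - 1][j] + gap, D[i][j - 1] + gap)
    return D[n][m]

score = align_score([1, 2, 3], [1, 3])
```

The table has (n + 1)(m + 1) entries and each is filled in constant time, which is why a pairwise comparison of this kind scales to large databases.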
3 Metabolisms
High throughput biological experiments have produced an immense amount of
information about cellular processes, and whole-cell models based on this data
are becoming possible. The grand goal is to build computational models that
couple gene expression and protein interaction with the metabolism. This three
tiered approach to whole-cell modeling mirrors the central dogma of molecular
biology, which asserts a one-way flow of information from DNA to mRNA to
protein, and while imperfect, the central dogma remains the prevailing framework with which to approach cellular biology. The first tier is a network that
explains the co-expression of the genes encoded by the DNA, and the second tier
is a network that explains pairwise relationships between the resulting proteins.
The last tier is the metabolism, which is the collection of bio-chemical reactions
that supports life. Linking the three networks so that they correctly imitate
the regulatory and reactive mechanisms of a cell is a research question on which
OR can have an impact. Indeed, standard OR tools have already established
themselves substantively in the study of metabolisms as discussed below.
3.1 Biological Introduction
A cell’s metabolism is the net flux of all biochemicals in and out of the cell.
Cells intake carbohydrates, proteins, lipids, and many small molecules and ions.
These species act as building materials and fuel for the cell as it grows, repairs
itself, recycles molecules, replicates its genome, and exports materials to its
environment.
The engines driving these microscopic metabolic factories are the mitochondria. These organelles convert the energy held in the bonds of nutrients into
readily usable energy that is stored in the bonds of the molecule Adenosine
Triphosphate (ATP). ATP is transported intra- and intercellularly, and it fuels
the metabolism by sequentially breaking off phosphate groups to extract energy,
which transforms ATP into diphosphate and monophosphate forms (ADP and
AMP). An aerobic metabolism produces ATP from carbohydrates using available oxygen. Energy production in animal cells is primarily aerobic. Anaerobic
production of ATP is possible, though less efficient. The primary mechanism of
ATP production in yeast is anaerobic, and E. coli cells can survive in either an
aerobic or anaerobic state.
Different cell types emphasize different metabolic processes and products.
Muscle cells create relatively large amounts of ATP to fuel the contraction cycle
in muscle fibers. Brain cells create and recycle surface receptors, ion channels,
and neurotransmitters. Cells of the liver and gall bladder emphasize the production of enzymes used for digestion. Mammary cells secrete milk lipids and
proteins. Amoeba secrete cyclic-AMP in order to communicate with neighboring
amoeba. Penicillium fungi produce antibiotics to fend off bacterial opportunists,
and cells of the hair follicles intake sulfurous compounds to use in cross linking
proteins to produce thin but durable strands of hair.
Although many components of the cellular metabolism are well characterized, the timing and quantification of a metabolism, especially for ensembles of
cells, is poorly understood. A collection of cells can be viewed as a black box.
In some well controlled experiments all inputs to and outputs from this box
can be measured. Broad conclusions can be drawn about the metabolism of a
typical cell, but this averaging to quantify a typical cell is deceptive. Intercellular interactions are homogenized over the ensemble, as though a single cell in
isolation were capable of accomplishing all metabolic feats on its own.
3.2 OR and Metabolic Models
Mathematical programming has proven itself to be a trusted computational
tool in the study of whole cell metabolisms. Metabolic models are constructed
by listing the chemical reactions of the cell to create a stoichiometric matrix,
S. The linear system Sv = 0 holds if the metabolism is assumed to have
evolved to a steady state, where v_j is the flux of reaction j. The resulting
system is underdetermined, and an objective function is introduced to help
identify reasonable metabolic states. Typically the objective is the growth rate,
although others have been suggested and studied [38]. Allowing g(v) to be an
appropriate objective, the flux balance analysis (FBA) model is
max{g(v) : Sv = 0, L ≤ v ≤ U},
where L and U are lower and upper bounds on the fluxes.
A common use of FBA is to predict lethal gene knockouts. Removing a
gene’s expression can remove some of the resulting proteins, which can subsequently halt reactions. In cells such as E. coli the map between gene knockouts
and reactions is well understood, and an FBA model can replicate the gene
knockout by setting the appropriate fluxes to zero. If the optimal value diminishes sufficiently, then the gene knockout is lethal and the gene is said to be
essential. FBA correctly predicts gene essentiality with over 90% accuracy [36].
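A toy instance makes the FBA model and the knockout test concrete. The sketch below assumes a linear objective and a stoichiometric matrix with full row rank, and it solves the linear program by enumerating basic solutions, which is only sensible for a handful of reactions; a production study would use a linear programming solver. The four-reaction network, its bounds, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def solve_square(A, b):
    """Solve A x = b by Gauss-Jordan elimination; return None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def fba(S, c, L, U):
    """Maximize c.v subject to S v = 0 and L <= v <= U.

    Every vertex of the feasible set has its nonbasic fluxes at a bound,
    so all such candidates are enumerated (S is assumed to have full row rank).
    """
    m, n = len(S), len(S[0])
    best_val, best_v = None, None
    for basic in combinations(range(n), m):
        nonbasic = [j for j in range(n) if j not in basic]
        for pick in product((0, 1), repeat=len(nonbasic)):
            v = [None] * n
            for j, at_upper in zip(nonbasic, pick):
                v[j] = Fraction(U[j] if at_upper else L[j])
            A = [[S[i][j] for j in basic] for i in range(m)]
            b = [-sum(S[i][j] * v[j] for j in nonbasic) for i in range(m)]
            x = solve_square(A, b)
            if x is None:
                continue
            for j, val in zip(basic, x):
                v[j] = val
            if all(L[j] <= v[j] <= U[j] for j in range(n)):
                val = sum(cj * vj for cj, vj in zip(c, v))
                if best_val is None or val > best_val:
                    best_val, best_v = val, v
    return best_val, best_v

# Hypothetical network: uptake -> A, two routes A -> B, B -> biomass.
S = [[1, -1, -1, 0],    # metabolite A
     [0, 1, 1, -1]]     # metabolite B
c = [0, 0, 0, 1]        # objective: flux through the biomass reaction
L = [0, 0, 0, 0]
U = [10, 6, 3, 100]
wild_growth, _ = fba(S, c, L, U)

# Knock out the first A -> B route by forcing its flux to zero.
knockout_growth, _ = fba(S, c, L, [10, 0, 3, 100])
```

In this example the wild-type growth is 9 and the knockout growth is 3. Whether a drop of that size counts as lethal depends on a threshold chosen by the modeler.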
FBA has been studied and extended in many ways. A quadratic model
that minimizes metabolic adjustment is a common adaptation to predict gene
essentiality [39]. Extreme pathways have been studied as basic optimal solutions [37], and the central metabolism can be identified if FBA’s innate degeneracy is handled carefully [2]. Robust extensions to accommodate stochastic
modeling parameters have also been investigated [1]. Lastly, recent directives
suggest amalgamations that meld metabolic models with gene-expression to include external factors such as temperature stress [35].
4 Microtubules
Cells require infrastructure to house their metabolic outcomes much like factories require infrastructure to produce their goods. The functional scaffolding
of many cells is formed by microtubules, which are rigid proteinaceous tubes
similar in diameter to carbon nanowires and nanotubes. Microtubules endow
cells with rigidity, transport organelles, provide cellular locomotion, regulate
cell growth and geometry, and separate chromosomes during cell division.
Cells come in numerous shapes, sizes, and forms. For example, a huge ostrich
egg is a single cell, as is the nucleus-free red blood cell. Microtubule interactions are responsible for the cellular structure in each case. The regulation
of microtubule interactions varies depending on cell type, and computational
models that simulate the growth and demise of microtubule structures aid our
understanding of the complex structural dynamics.
4.1 Biological Introduction
At the start of mitotic cell division in animals, a cell’s centrosome is duplicated
and the resulting pair separates. The centrosomes migrate to opposite poles
of the cell. A centrosome is a hub from which microtubule spokes emanate,
and as cell division proceeds, tubules emanating from a centrosome lengthen
and attach to chromosome pairs aligned along the cell’s midline. Because the
centrosomes are anchored to the plasma membrane, the chromosomal microtubules can contract and pull chromosome pairs towards the poles. Once the
chromosomes are symmetrically separated, cell division proceeds, leaving each
daughter cell with a full genetic complement and a single centrosome.
Microtubules apply force by altering their length, which varies according to a
dynamically unstable biochemical process. Tubulin heterodimers bind sequentially to form a hollow tube with 13 dimers per helical revolution. Because the
helix is composed of tubulin heterodimers (a bonded pair of two types of tubulin
monomers) the two exposed ends of the microtubule present different monomers.
This polarization of the microtubule results in different binding affinities for free
tubulin on each end, with the + end binding more readily than the - end. The
unstable nature of microtubules allows rapid changes in length.
Microtubules grow by the addition of tubulin dimers and decay by their removal. Guanosine triphosphate (GTP) biases the reaction governing the addition
of tubulin to the helix, making attachment more probable. The tendency for
lengthening and shortening varies with the amount of GTP. Even if GTP concentrations maintain a microtubule’s length, a dynamic equilibrium is present
in which tubulin dimers are constantly added and removed. In general, microtubules are always undergoing a process called treadmilling, meaning that
tubulin dimers are being added and/or dropped.
Microtubules self-organize into microtubule arrays during plant cell development. These cortical microtubules (CMTs) exhibit polarized ± structure, but
this polarization doesn’t result in - ends gathering at a common center. Instead numerous local interactions between a large population of nearly identical
CMTs influence dynamic organization and assembly. CMTs in plants have been
observed to assemble into astral structures, bundles, and parallel cross-linked
sheets. These dynamic CMT arrays act as scaffolds, directing the placement
of structural fibers necessary to the formation of a new cell wall. Movement
and placement of microtubules is governed by interactions within the array as
well as by treadmilling. Arrays with tubules transverse to the elongation axis
are correlated to continued elongation, while arrays with longitudinal or oblique
orientation correlate to the cessation of elongation.
4.2 OR and Microtubules
A three dimensional discrete event simulation of CMT organization is developed
in [20], see also [19]. This model probabilistically assumes that the + end of
the microtubule either grows, shrinks, or pauses depending on the length of
time that it is in one of these three states. The transition from one state
to another is modeled with an exponential distribution, and once a transition
has been triggered, the + end enters the next state. The - end is modeled
similarly but without the pause state. The rate of growth or decay is sampled
from a normal distribution. All probabilistic parameters are tuned to mimic
experimental observations.
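The + end dynamics just described can be sketched as a small event loop. The sketch below is not the simulation of [20]: the mean dwell times, the speed distributions, the uniform choice of the next state, and the function name are all hypothetical placeholders for the tuned parameters the authors describe.

```python
import random

# Hypothetical parameters, chosen only for illustration.
MEAN_DWELL = {"grow": 2.0, "shrink": 1.0, "pause": 0.5}   # mean time in a state
SPEED = {"grow": (1.0, 0.2),      # (mean, standard deviation) of the rate
         "shrink": (-2.0, 0.4),
         "pause": (0.0, 0.0)}

def simulate_plus_end(t_end, rng, length=5.0, state="grow"):
    """Return a list of (time, next_state, length) events for one + end."""
    t = 0.0
    events = [(t, state, length)]
    while t < t_end:
        # Exponentially distributed time until the next transition.
        dwell = rng.expovariate(1.0 / MEAN_DWELL[state])
        t_next = min(t + dwell, t_end)
        # Growth or decay rate sampled from a normal distribution.
        mean, sd = SPEED[state]
        rate = rng.gauss(mean, sd)
        length = max(0.0, length + rate * (t_next - t))
        t = t_next
        # Assumption: the next state is drawn uniformly from the other two.
        state = rng.choice([s for s in MEAN_DWELL if s != state])
        events.append((t, state, length))
    return events

events = simulate_plus_end(50.0, random.Random(1))
```

Passing a seeded random.Random instance keeps a run reproducible, which helps when parameters are tuned against observations.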
The outcome of a random interaction between two CMTs is decided by the
angle of intersection. If an intersecting CMT forms an angle of less than 40° with
another CMT (called the ‘barrier’ CMT of the interaction), then the intersecting
CMT aligns with the barrier CMT. If the angle is at least 40°, then 30% of the
time the intersecting CMT begins to shorten. The remaining 70% of the time
the intersecting CMT passes through the barrier CMT. Intersecting the cell wall
always forces the microtubule to enter a shortening phase.
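The interaction rules above translate directly into a short decision function. The following is a minimal sketch in which the function name, the argument names, and the outcome labels are our own; only the 40° threshold and the 30%/70% split come from the description above.

```python
import random

def collision_outcome(angle_deg, rng, hits_wall=False):
    """Decide what happens to an intersecting CMT.

    angle_deg : angle between the intersecting CMT and the barrier CMT.
    rng       : a random.Random instance.
    hits_wall : True if the CMT meets the cell wall instead of another CMT.
    """
    if hits_wall:
        return "shrink"            # the wall always triggers shortening
    if angle_deg < 40.0:
        return "align"             # shallow encounters align with the barrier
    # Steep encounters: shorten 30% of the time, otherwise pass through.
    return "shrink" if rng.random() < 0.30 else "cross"

rng = random.Random(0)
outcomes = [collision_outcome(75.0, rng) for _ in range(20000)]
shrink_fraction = outcomes.count("shrink") / len(outcomes)
```

Over many steep encounters the observed shrink fraction should settle near 0.30, which gives a quick sanity check on an implementation.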
Discrete event simulations are commonly used in OR to study stochastic
problems, and it is well known that simple, random decision rules can lead to
complex dynamics. In the case of CMT organization, a well tuned simulation
reproduces the complex observable phenomena found in cells. Moreover, the
model accurately predicts mutant behavior caused by mutations in the MOR1
and FRA2 genes. The trust built by verifying the computational model’s efficacy
against experimental results then adds credence to computational queries about
how a cell’s structure might be affected by altering the interactions.
5 Epidemics
Infectious diseases have plagued humankind throughout history, and modeling
their spread is a stalwart in mathematical and computational biology. Standard
OR techniques have not been commonly used, although (stochastic) differential
equation models and statistical analyses are routine. Indeed, the 2005 review
article [3] states that “No humanitarian emergencies (epidemics, famine, war and
genocide) were addressed in the OR/MS related journals.” However, some OR
work is emerging, and the magnified public safety importance associated with
the spread of disease suggests that OR should be considered as the research
community investigates new models. Below we introduce the biology of the
spread of disease and then mention where OR has made an appearance.
5.1 Biological Introduction
Infectious diseases spread from individual to individual via a transmission vector. Disease vectors include viruses, bacteria, fungi, protozoa, and proteins
known as prions. The spread of disease can occur from one cell to another, as is
the case with replicating viruses in multicellular hosts, or between individuals
across vast geographical scales, as is the case with the Human Immunodeficiency
Virus (HIV) and the influenza pandemic of 1918.
Transmission vectors can occur horizontally among individuals of the same
species, vertically from mother to child during gestation or birth, and from one
individual to another of a different species or subspecies. The Norwalk virus
exemplifies the first as it famously victimized masses of cruise ship patrons,
causing symptoms akin to severe food poisoning [22]. Herpes and HIV are easily
transmitted vertically. The bird and swine strains of influenza transmit back
and forth between humans and other species, mutating each time [33]. The
vector may be detrimental to both species or harmless to one, as is the case
with malarial parasites in mosquito serum.
The initiation of spreading pathogens is termed an outbreak. An epidemic
is characterized by affected individuals increasing in number and broadening in
geographical locale. During a pandemic the extent of infection reaches a global
scale. Entities such as the Centers for Disease Control and Prevention and the World Health
Organization analyze quantitative models to aid in detecting and containing
outbreak events.
The nature of the pathogen affects its mode of transmission within and
between populations. Rhinovirus (the common cold) may spread via handshake,
whereas E. coli bacteria can wait for hosts on a doorknob or a head of lettuce.
A viable fungal spore can blow into an area on the wind or be revived from
fossilized amber after millions of years. Giardia protozoa are commonly drunk
from wells or ponds after being expelled through the gastrointestinal tract of
another species. Prions, an inanimate vector made solely of proteins, with no
genetic material, can be spread among species by consuming the brains of the
infected.
The spread of an infectious disease may have consequences ranging from
undetectable to lethal. Warts, caused by some types of HPV, are inconvenient.
Common colds may seem innocuous, but they are responsible for roughly $40
billion of lost productivity and medical expenses each year [21]. More severe
infections, or benign ones occurring in immunodeficient individuals, can result
in crippling conditions, e.g. polio and tuberculosis, or birth defects, e.g. rubella.
Some pathogens even gain control of the infected host and alter behavior [26].
5.2 OR Applications and Epidemics
The first overlap between traditional OR and the modeling of epidemics seems to
be [28, 29], which uses a model of differential equations to analyze the potential
spread of smallpox. The overlap with OR is the use of a queue to track the
individuals awaiting vaccination as a policy of distribution is employed. From
an operational perspective, the model can be used to assess containment policies
and to guide implementation. While the use of stochastic processes to study
epidemics has a long history, the model in [28] and the expertise of the authors
is a clear bridge between OR and epidemiology. Other works have followed [7,
42, 43]. The review article [43] suggests many research opportunities.
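The interplay between a differential equation model and a vaccination queue can be illustrated with a heavily simplified sketch. It is not the model of [28, 29]: we assume a standard susceptible-infected-removed system integrated by Euler steps, treat all susceptibles as waiting in a single line, and vaccinate at most a fixed number per day. The parameter values and the function name are hypothetical.

```python
def simulate(capacity, days=200, dt=0.1, n=1_000_000, i0=10,
             beta=0.5, gamma=0.2):
    """Euler integration of an SIR model with capacity-limited vaccination.

    capacity : maximum vaccinations per day (the service rate of the queue).
    beta     : assumed transmission rate per day.
    gamma    : assumed removal rate per day.
    Returns (susceptible, infected, removed, cumulative infections).
    """
    s, i, r = float(n - i0), float(i0), 0.0
    cumulative = float(i0)
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / n * dt
        # Serve the queue: never vaccinate more people than are still waiting.
        vaccinated = min(capacity * dt, s - new_infections)
        removed = gamma * i * dt
        s -= new_infections + vaccinated
        i += new_infections - removed
        r += removed + vaccinated
        cumulative += new_infections
    return s, i, r, cumulative

no_vaccination = simulate(capacity=0)
mass_vaccination = simulate(capacity=50_000)
```

Comparing the cumulative infections of the two runs is the kind of operational question such a model answers: how much does a given vaccination capacity reduce the size of the outbreak?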
6 Population Reconstruction
Mapping the generational progressions of species helps us identify the evolutionary aspects that underlie numerous areas of biology. However, our ability
to genetically track whole populations is recent, and in most cases we have
what is essentially a brief snap-shot of a long, long genetic evolution. Inferring previous populations from current genetic information importantly informs
evolutionary processes, and OR models have found success in doing so.
6.1 Biological Introduction
Population genetic studies provide information about migrations, inheritance,
mutation rates, evolution, and speciation. If the DNA sequences of a parent and
its offspring are compared so as to locate small genetic alterations, e.g., single
nucleotide polymorphisms (SNPs or snips), then information about mutation
rates from one generation to the next can be gleaned. These parent/offspring
mutation rates help explain the long term evolution of the species. However,
sequence comparison between family members is only possible if the family
structure is known. In wild populations, observing these familial relationships
is often impossible, and the genetic information needs to be analyzed to infer
family relationships.
The persistence of a population may be influenced by its ability to relocate, which, for example, enables the population to adapt to changing resources
and habitat. Some prehistoric human populations were known to have migrated small distances annually along shorelines to ensure adequate marine food
sources. By contrast, the arctic tern boasts the longest known migratory patterns, flying over two million kilometers during its thirty year life span. Plant
seeds and pollen may also be transported large distances. Air and sea currents
are believed to have relocated many plant species across the Atlantic Ocean.
While the geographical range occupied by a species can sometimes be observed,
the migratory and dispersal mechanisms by which this range is obtained may
remain hidden. Reconstructing familial relationships, possibly many generations back, enables our understanding of how populations evolve. Population
reconstruction has facilitated some seed and pollen dispersal studies. Indeed,
the reconstruction of family relationships in oak trees made clear that offspring
could emerge at surprisingly long distances from the parent [17].
The health of a population is partially determined by its DNA sequence.
Knowing the primary sequence of DNA reveals much of an organism’s ability to
produce proteins. Transcription of particular DNA sequences leads to the expression of particular proteins that enable cells to perform particular functions.
However, knowledge of the genome and proteome doesn’t reveal which proteins
will be expressed in which cells and at what time. Illuminating the roles of RNA
and regulatory proteins has led to a deeper understanding of the specificity and
timing of gene expression.
Further complicating the gene expression mechanism is the epigenome. Portions of DNA bind to histones, molecules that cause DNA to coil and cluster into
untranscribable chromatin. Other molecules can alter the binding properties of
histones, remodeling the chromatin and allowing transcription to occur. The
quantity and variety of histone-regulating molecules is influenced by an organism’s diet, environment, stress levels, and inheritance. Studies of the combined
effect of inheritance and diet have revealed that nutrient intake by parents or
grandparents can influence epigenetic factors in the offspring. This epigenetic
memory is known to have led to the emergence of schizophrenia in offspring
conceived during famine times [25]. Reconstructing populations allows deeper
analysis of the modes of acquisition and inheritance of epigenetic factors.
6.2 OR Applications and Population Reconstruction
Genes are sequences of DNA that code for specific traits, and locations on the
genome that distinguish individuals within a taxon are called single nucleotide
polymorphisms (SNPs). A haplotype is a collection of SNPs, and a genotype is a
coupled pair of haplotypes, one donated by each of the individual’s parents.
The problem of inferring haplotypes from genotypes is called haplotyping. Each
SNP of a haplotype can have one of two states, and if the SNPs of the two sides
of a genotype agree, then the genotype is homozygous at the location of the
SNP. Otherwise the genotype is heterozygous at that location.
In the presence of heterozygous SNPs, the parental donations are unclear,
and the problem of inferring haplotypes requires an inference rule. A classic inference rule is that of pure parsimony, which asks to identify a smallest
collection of haplotypes that can combine to explain the current generation’s
genotypes. Another inference rule is that of perfect phylogeny, which requires
that the selected haplotypes comply with the structure of a tree to ensure that
each haplotype is a copy of one of the parental haplotypes. Phylogenetic design
is itself a problem with substantial overlap with OR [5, 14, 41].
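Pure parsimony can be made concrete on a tiny instance. The sketch below assumes a common encoding in which each genotype site is 0 or 1 when homozygous and 2 when heterozygous; the encoding choice, the function names, and the three example genotypes are our own, and the exhaustive search stands in for the integer programming models used in practice.

```python
from itertools import product

def resolutions(genotype):
    """Yield every unordered haplotype pair that explains a genotype."""
    het = [k for k, g in enumerate(genotype) if g == 2]
    base = [0 if g == 2 else g for g in genotype]
    if not het:
        yield (tuple(base), tuple(base))
        return
    # Fix the first heterozygous site of h1 to 0 so mirrored pairs appear once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, bit in zip(het, (0,) + bits):
            h1[site], h2[site] = bit, 1 - bit
        yield (tuple(h1), tuple(h2))

def pure_parsimony(genotypes):
    """Return (haplotype set, chosen pairs) using the fewest distinct haplotypes."""
    best = None
    for combo in product(*(list(resolutions(g)) for g in genotypes)):
        used = {h for pair in combo for h in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, combo)
    return best

genotypes = [(0, 2), (2, 0), (2, 2)]
haplotypes, pairs = pure_parsimony(genotypes)
```

The third genotype can be explained by either {00, 11} or {01, 10}. Parsimony selects the second pair because both of its haplotypes are already needed by the first two genotypes, so three haplotypes suffice instead of four. A genotype with k heterozygous sites has 2^(k-1) resolutions, which is why exhaustive search fails beyond toy sizes.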
Both inference rules naturally lead to combinatorial optimization problems
that have received attention in the OR community; see [24] for a review. A
related problem is to directly identify siblings and half-siblings from genotypes,
which again naturally leads to combinatorial problems that include Mendelian
laws of inheritance [15]. These combinatorial methods have become accurate
enough to identify flaws in existing datasets [16].
References
[1] E. Almaas, E. Gruber, A. Holder, A. Ko, M. MacGillivray, and M. Sawyer.
Robust analysis of metabolic pathways. submitted, 2014.
[2] Eivind Almaas, Zoltan N Oltvai, and Albert-Laszlo Barabasi. The activity
reaction core and plasticity of metabolic networks. PLoS Comput Biol,
1(7):e68, Dec 2005.
[3] N. Altay and W. Green III. OR/MS research in disaster operations management. European Journal of Operational Research, 175:475–493, 2006.
[4] C. B. Anfinsen. Principles that govern the folding of protein chains. Science,
181(4096), 1973.
[5] Vineet Bafna, Bjarni V Halldorsson, Russell Schwartz, Andrew G Clark,
and Sorin Istrail. Haplotypes and informative snp selection algorithms:
don’t block out information. In Proceedings of the seventh annual international conference on Research in computational molecular biology, pages
19–27. ACM, 2003.
[6] Nils Aall Barricelli. Symbiogenetic evolution processes realized by artificial
methods. Methodos, 9(35-36):143–182, 1957.
[7] S Basu and AP Galvani. The transmission and control of XDR TB in South
Africa: an operations research and mathematical modelling approach. Epidemiology and infection, 136(12):1585–1598, 2008.
[8] Arieh Ben-Naim. The protein folding problem and its solutions, volume 32.
World Scientific Singapore, 2013.
[9] Nicolas Bonnel and Pierre-Francois Marteau. Lna: Fast protein structural comparison using a laplacian characterization of tertiary structure.
IEEE/ACM Transactions on Computational Biology and Bioinformatics
(TCBB), 9(5):1451–1458, 2012.
[10] Hans J Bremermann. Optimization through evolution and recombination.
Self-organizing systems, pages 93–106, 1962.
[11] Forbes Burkowski, Yuen-Lam Cheung, and Henry Wolkowicz. Efficient use
of semidefinite programming for selection of rotamers in protein conformations. INFORMS J. Computing, 26:748–766, 2013.
[12] A. Caprara, R. Carr, S. Istrail, G. Lancia, and B. Walenz. 1001 optimal
pdb structure alignments: Integer programming methods for finding the
maximum contact map overlap. J Comput Biol, 11(1):27–52, 2004.
[13] G. Caro and M. Dorigo. Antnet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9:317–
365, 1998.
[14] Daniele Catanzaro, Ramamoorthi Ravi, and Russell Schwartz. A mixed
integer linear programming model to reconstruct phylogenies from single
nucleotide polymorphism haplotypes under the maximum parsimony criterion. Algorithms for Molecular Biology, 8(1), 2013.
[15] W. Chaovalitwongse, C. Chou, T. Berger-Wolf, B. DasGupta, S. Sheikh,
M. Ashley, and I. Caballero. New optimization model and algorithm for
sibling reconstruction from genetic markers. INFORMS Journal on Computing, 22(2):1–15, 2009.
[16] C. Chou, Z. Liang, W. Chaovalitwongse, T. Berger-Wolf, B. DasGupta,
S. Sheikh, M. Ashley, and I. Caballero. Column-generation framework of
nonlinear similarity model for reconstructing sibling groups. INFORMS
Journal on Computing, 27(1):35–47, 2015.
[17] B. D. Dow and M. V. Ashley. Microsatellite analysis of seed dispersal and
parentage of saplings in bur oak. Molecular Ecology, 5:615–627, October
1996.
[18] J. E. Elsila, M. P. Callahan, D. P. Glavin, J. P. Dworkin, S. K. Noble, and
E. K. Gibson, Jr. Distribution of amino acids in lunar regolith. Conference
Paper JSC-CN-30317, NASA, May 2014.
[19] E. Can Eren, N. Gautam, and R. Dixit. Computer simulation and mathematical models of the noncentrosomal plant cortical microtubule cytoskeleton. Cytoskeleton, 69:144–154, 2012.
[20] E. Can Eren, D. Ram, and N. Gautam. A three-dimensional computer simulation model reveals the mechanisms for self-organization of plant cortical
microtubules into oblique arrays. Molecular Biology, 21:2674–2684, 2010.
[21] A. Fendrick, A. S. Monto, B. Nightengale, and M. Sarnes. The economic burden of non–influenza-related viral respiratory tract infection in the United
States. Archives of Internal Medicine, 163(4):487–494, 2003.
[22] Centers for Disease Control and Prevention. Outbreak updates for international cruise ships. http://www.cdc.gov/nceh/vsp/surv/gilist.htm.
[23] Harvey J Greenberg and Allen G Holder. Computational biology. Encyclopedia of Operations Research and Management Science, pages 225–238,
2013.
[24] D. Gusfield and S. Orzack. Haplotype inference. In S. Aluru, editor, Handbook of Computational Molecular Biology, pages 18–1 – 18–25. CRC Press,
Boca Raton, FL, 2006.
[25] Bastiaan T. Heijmans, Elmar W. Tobi, Aryeh D. Stein, Hein Putter, Gerard J. Blauw, Ezra S. Susser, P. Eline Slagboom, and L. H.
Lumey. Persistent epigenetic differences associated with prenatal exposure
to famine in humans. PNAS, 105(44):17046–17049, October 2008.
[26] D. C. Henne and S. J. Johnson. Zombie fire ant workers: Behavior controlled by decapitating fly parasitoids. Insectes Sociaux, 54(2):150–153,
May 2007.
[27] A. Holder, J. Simon, J. Strauser, J. Taylor, and Y. Shibberu. Dynamic
programming used to align protein structures with a spectrum is robust.
Biology, 2(4):1296–1310, 2013.
[28] E. Kaplan, D. Craft, and L. Wein. Analyzing bioterror response logistics:
the case of smallpox. Mathematical Biosciences, 185:33–72, 2003.
[29] Edward H. Kaplan and Lawrence M. Wein. Decision making for bioterror
preparedness: Examples from smallpox vaccination policy. In Margaret L.
Brandeau, François Sainfort, and William P. Pierskalla, editors, Operations
Research and Health Care, volume 70 of International Series in Operations
Research & Management Science, pages 519–536. Springer US, 2004.
[30] I. Kifer, R. Nussinov, and H. Wolfson. GOSSIP: A method for fast and
accurate global alignment of protein structures. Bioinformatics, 27(7):925–932, 2011.
[31] Carleton L Kingsford, Bernard Chazelle, and Mona Singh. Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics, 21(7):1028–1039, 2005.
[32] P. Di Lena, P. Fariselli, L. Margara, M. Vassura, and R. Casadio. Fast
overlapping of protein contact maps by alignment of eigenvectors. Bioinformatics, 26(18):2250–2258, Sep 2010.
[33] W. Ma, K. M. Lager, A. L. Vincent, B. H. Janke, M. R. Gramer, and
J. A. Richt. The role of swine in the generation of novel influenza viruses.
Zoonoses and Public Health, 56:326–337, August 2009.
[34] Stanley L. Miller. A production of amino acids under possible primitive
earth conditions. Science, 117(3046):528–529, May 1953.
[35] Ali Navid and Eivind Almaas. Genome-level transcription data of Yersinia
pestis analyzed with a new metabolic constraint-based approach. BMC
Systems Biology, 6(1):150, 2012.
[36] Jeffrey D Orth, Tom M Conrad, Jessica Na, Joshua A Lerman, Hojung
Nam, Adam M Feist, and Bernhard O Palsson. A comprehensive genome-scale reconstruction of Escherichia coli metabolism–2011. Mol Syst Biol,
7:535, 2011.
[37] J. Papin, N. Price, and B. Palsson. Extreme pathway lengths and reaction participation in genome-scale metabolic networks. Genome Research,
12:1889–1900, 2002.
[38] Robert Schuetz, Lars Kuepfer, and Uwe Sauer. Systematic evaluation of
objective functions for predicting intracellular fluxes in Escherichia coli. Mol
Syst Biol, 3:119, 2007.
[39] Daniel Segrè, Dennis Vitkup, and George M. Church. Analysis of optimality
in natural and perturbed metabolic networks. Proceedings of the National
Academy of Sciences, 99(23):15112–15117, 11 2002.
[40] Y. Shibberu and A. Holder. A spectral approach to protein structure alignment. IEEE/ACM Trans Comput Biol Bioinform, Feb 2011.
[41] Srinath Sridhar, Fumei Lam, Guy E Blelloch, R Ravi, and Russell Schwartz.
Direct maximum parsimony phylogeny reconstruction from genotype data.
BMC Bioinformatics, 8(1):472, 2007.
[42] Pieter Trapman and Martinus Christoffel Jozef Bootsma. A useful relationship between epidemiology and queueing theory: The distribution of the
number of infectives at the moment of the first detection. Mathematical
Biosciences, 219(1):15–22, 2009.
[43] Jingyu Zhang, Jennifer E Mason, Brian T Denton, and William P Pierskalla. Applications of operations research to the prevention, detection,
and treatment of disease. Encyclopedia of Operations Research and Management, Wiley, 2011.