Comp. Genomics
Recitation 11
Biological networks
Exercise
• A large PPI network G was generated
using high-throughput technologies.
• A smaller network H is known in a
different organism.
• Assume that there exists an efficient
algorithm which determines whether there
is a sub-network of G of size ≥k that is
isomorphic to H
Exercise
• Two graphs (PPI networks) are said to be
isomorphic if there is a bijection f between
their vertex sets such that f(u) is
adjacent to f(v) iff u is adjacent to v
Exercise
• Show that the same algorithm can solve
the following problem in polynomial time:
• CLIQUE: Given a graph G’ and an integer k,
is there a clique of size ≥ k in G’?
Solution
• Given a graph G’ and a number k, we
create another graph H’ of size k in which
there is an edge between every two
vertices
• This takes polynomial time
• We run the original algorithm on (G’,H’)
and return its answer: G’ has a clique of
size ≥ k iff it contains a sub-network
isomorphic to H’
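The reduction above can be sketched in Python. The function `has_isomorphic_subnetwork` is a hypothetical brute-force stand-in for the efficient algorithm the exercise assumes (it runs in exponential time and is here only so the sketch executes); the graph representation as a `(vertices, edges)` pair is also an assumption.

```python
from itertools import combinations, permutations


def complete_graph(k):
    """Build H': k vertices with an edge between every two of them."""
    vertices = list(range(k))
    edges = {frozenset(pair) for pair in combinations(vertices, 2)}
    return vertices, edges


def has_isomorphic_subnetwork(g, h):
    """Hypothetical stand-in for the assumed algorithm (brute force)."""
    g_vertices, g_edges = g
    h_vertices, h_edges = h
    # Try every injective mapping f from H's vertices into G's vertices.
    for image in permutations(g_vertices, len(h_vertices)):
        f = dict(zip(h_vertices, image))
        if all(
            (frozenset((f[u], f[v])) in g_edges) == (frozenset((u, v)) in h_edges)
            for u, v in combinations(h_vertices, 2)
        ):
            return True
    return False


def has_clique(g, k, oracle=has_isomorphic_subnetwork):
    """CLIQUE via the reduction: build H' and ask the oracle about (G', H')."""
    return oracle(g, complete_graph(k))


# Triangle a-b-c with a pendant vertex d attached to c.
g = (["a", "b", "c", "d"],
     {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]})
print(has_clique(g, 3), has_clique(g, 4))
```

Building H’ costs O(k²) edge insertions, so the reduction itself is polynomial; all remaining work is done by the oracle.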
Exercise
• Show that the algorithm from the previous
question can also solve the following problem:
• Input:
• A set of elements X={x1,x2,…,xn},
• A distance function d(xi,xj)=1 if xi and xj are
“close”, 0 otherwise
• Output: Can the set be divided into at most k
clusters such that all the element pairs in every
cluster are close?
Solution
• Build a graph G, edge (xi,xj) means d(xi,xj)=1
• Use the previous algorithm to find a clique of
maximal size (decision problem → optimization
problem)
• Find the clique and remove it from the graph
• Repeat at most k times. If the result is the
empty graph, answer ‘Yes’. Otherwise answer
‘No’.
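The steps above can be sketched as follows. `has_clique` is a hypothetical brute-force stand-in for the decision algorithm from the previous exercise; `find_max_clique` shows one standard way to turn a decision oracle into a search procedure (find the largest k the oracle accepts, then delete every vertex whose removal keeps a clique of that size). The sketch follows the greedy procedure as stated on the slide and does not attempt to show that greedy removal always uses the fewest clusters.

```python
from itertools import combinations


def has_clique(vertices, edges, k):
    """Hypothetical stand-in for the decision oracle (brute force)."""
    return any(
        all(frozenset(pair) in edges for pair in combinations(candidate, 2))
        for candidate in combinations(sorted(vertices), k)
    )


def find_max_clique(vertices, edges):
    """Decision -> optimization -> search, using only has_clique."""
    vertices = set(vertices)
    # Optimization: the largest k for which the oracle answers yes.
    k = max(size for size in range(len(vertices) + 1)
            if has_clique(vertices, edges, size))
    # Search: drop each vertex whose removal still leaves a clique of size k.
    for v in sorted(vertices):
        if has_clique(vertices - {v}, edges, k):
            vertices = vertices - {v}
    return vertices


def greedy_cluster(vertices, edges, k):
    """Remove a maximum clique at most k times; 'Yes' iff nothing is left."""
    remaining = set(vertices)
    for _ in range(k):
        if not remaining:
            break
        remaining -= find_max_clique(remaining, edges)
    return not remaining


# Elements a, b, c are pairwise close; d and e are close to each other only.
elements = ["a", "b", "c", "d", "e"]
close = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]}
print(greedy_cluster(elements, close, 2), greedy_cluster(elements, close, 1))
```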
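Question from the first-sitting exam (Moed A), 5773 (2012-13)
• The color coding algorithm makes it possible
to find paths of length k in a graph.
• For an arbitrary coloring of the graph with k
colors, describe a fixed-parameter algorithm,
as efficient as possible, for computing the
number of colorful paths of length k in the
graph.
• Reminder: a colorful path is a path whose
vertices are all colored with distinct colors.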
Solution to the question from Moed A, 5773
• Let a coloring c be given. For every subset of
colors S and vertex v, define C(v,S) as the
number of paths that visit the colors in S and
end at v.
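Solution to the question from Moed A, 5773 (continued)
• So how do we get the total number of paths in
the graph, given C(v, full set) for every v?
• Go over all the vertices and sum C(v, full set)
over them. Then divide by 2 because every path
is counted twice (once from each endpoint).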
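A minimal Python sketch of this dynamic program follows. It assumes that "length k" means k vertices, that colors are the integers 0..k-1, and that the graph is undirected and given as an adjacency dictionary; the recurrence C(v,S) = Σ over neighbors u of C(u, S minus c(v)) is the usual color-coding recurrence and is written here as an assumption, since the slide text does not spell it out.

```python
from itertools import combinations


def count_colorful_paths(adj, color, k):
    """Count colorful paths on k vertices.

    adj:   dict mapping each vertex to the set of its neighbors (undirected)
    color: dict mapping each vertex to a color in range(k)
    """
    # C[(v, S)] = number of paths ending at v that use exactly the colors in S.
    C = {(v, frozenset([color[v]])): 1 for v in adj}
    for size in range(2, k + 1):
        for subset in combinations(range(k), size):
            S = frozenset(subset)
            for v in adj:
                if color[v] in S:
                    rest = S - {color[v]}
                    # Extend every path that ends at a neighbor and uses `rest`.
                    C[(v, S)] = sum(C.get((u, rest), 0) for u in adj[v])
    full = frozenset(range(k))
    total = sum(C.get((v, full), 0) for v in adj)
    # Each path is found once from each endpoint (a single vertex only once).
    return total // 2 if k > 1 else total


# A triangle whose three vertices have three distinct colors.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
coloring = {"a": 0, "b": 1, "c": 2}
print(count_colorful_paths(triangle, coloring, 3))
```

The table has at most 2^k · |V| entries and each entry sums over the neighbors of one vertex, so the running time is O(2^k · |E|), which is fixed-parameter tractable in k.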
Homework 3, questions 2-3
• The genomes of each two individuals are
identical in 99.9% of the positions.
• The positions in which they vary are called
Single Nucleotide Polymorphisms (SNPs
for short).
• In each SNP, only two nucleotides are
possible, e.g. A or G. We denote these
two options by 0 and 1.
Homework 3, questions 2-3
• For simplicity, we will deal with one diploid
chromosome, which we term the
"genome" (i.e., two sequences over {0,1}.
Each sequence represents the bases in
the SNPs of one copy of the
chromosome).
Homework 3, questions 2-3
• In reading a human genome we can see
the sum of its two copies: for each SNP
we get 0, 1 or 2 according to the bases in
the two copies of that position.
• For 0,0 we see 0; for 1,1 we see 2; for
0,1 or 1,0 we see 1 (the sequencing does
not distinguish which copy the 0 came
from).
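As a small illustration of the observation model, the following sketch computes what the sequencer reports from two copies. The function name `observed_genotype` and the list representation are assumptions made for this example.

```python
def observed_genotype(copy1, copy2):
    """Per-SNP sum of the two copies: 0, 1 or 2 at each position."""
    return [a + b for a, b in zip(copy1, copy2)]


# Positions 3 and 4 both read 1: the copy carrying the 1 is not recoverable.
print(observed_genotype([0, 1, 1, 0], [0, 1, 0, 1]))
```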
Homework 3, question 2
• When sequencing a human genome, in
some cases, there is some uncertainty in
the reads, and we see several options for
a SNP, e.g. {0,1}, {1,2} or {0,1,2}.
Homework 3, question 2
• The input R to our problem is reads from n
different individuals in a single position.
• The possible reads are the subsets
{0},{1},{2},{0,1},{1,2},{0,1,2}.
• We want to learn the probability of having
1 in that position. We denote this
probability by p.
Homework 3, question 2a
• Write a likelihood function for the
observed reads. That is, write a formula
for L(p; R), where R are the n reads.
• Where Pr(0)=?, Pr(1)=?, Pr(2)=?
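One way to write the likelihood, assuming the two copies are independent and each carries 1 with probability p, and assuming an uncertain read R_i means only that the true sum k_i lies in R_i:

```latex
\Pr(0) = (1-p)^2, \qquad \Pr(1) = 2p(1-p), \qquad \Pr(2) = p^2

L(p; R) = \prod_{i=1}^{n} \; \sum_{k \in R_i} \Pr(k)
```

For example, a read {0,1,2} contributes a factor of 1 and therefore carries no information about p.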
Homework 3, question 2b
• Write the Q function for this problem.
Write it explicitly, i.e., don't leave the
expectation sign or an exponential sum of
terms.
• Where P(k | p_t, R) = ?
Homework 3, question 2c
• Give the update rule for p.
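$$\sum_i \Bigl[ \Pr(k_i=0 \mid R_i)\,\log(1-p)^2 + \Pr(k_i=1 \mid R_i)\,\log\bigl(2p(1-p)\bigr) + \Pr(k_i=2 \mid R_i)\,\log p^2 \Bigr]$$
• We get A·log(p)+B·log(1-p) plus a constant, where
$A = \sum_i \bigl[\Pr(k_i=1 \mid R_i) + 2\Pr(k_i=2 \mid R_i)\bigr]$,
$B = \sum_i \bigl[2\Pr(k_i=0 \mid R_i) + \Pr(k_i=1 \mid R_i)\bigr]$
and A+B=2n. The update rule is p = A/(A+B), so:
$$p = \frac{1}{2n} \sum_i \bigl[\Pr(k_i=1 \mid R_i) + 2\Pr(k_i=2 \mid R_i)\bigr]$$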
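The update rule can be sketched in Python as follows. The posterior used here, Pr(k_i=k | R_i) proportional to Pr(k) for k in R_i and zero otherwise, is an assumption about how uncertain reads are interpreted; the function names and the starting value p0 are hypothetical.

```python
def genotype_probs(p):
    """Pr(0), Pr(1), Pr(2) for the sum of two independent copies."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def em_step(p_t, reads):
    """One EM update: p = A / (2n), with A the expected number of 1-copies."""
    prior = genotype_probs(p_t)
    a = 0.0
    for read in reads:
        norm = sum(prior[k] for k in read)
        # Posterior of k given the read, restricted to the values in the read.
        a += sum(k * prior[k] / norm for k in read)
    return a / (2 * len(reads))


def estimate_p(reads, p0=0.5, iterations=50):
    """Iterate the update a fixed number of times from the start value p0."""
    p = p0
    for _ in range(iterations):
        p = em_step(p, reads)
    return p


reads = [{0}, {1}, {2}, {0, 1}, {1, 2}, {0, 1, 2}]
print(estimate_p(reads))
```

A read of {0,1,2} has a posterior equal to the prior, so on its own it leaves p unchanged, which matches the intuition that it carries no information.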
Homework 3, question 3
• For each population (i.e., Africans or
Europeans) there is a different probability
for 0 or 1 in each SNP.
• We can describe an African-American
genome as two sequences of SNPs; each
sequence is composed of segments of a
European and African genome,
interweaved (due to recombination and
inter-population mating).
Homework 3, question 3
• In the European / African segment, SNP
probabilities are according to the
European / African population.
• There is no dependence between
segments.
Homework 3, question 3
• We wish to model such a genome using a
double HMM.
• Instead of one path, there are two
independent paths (with possible
transitions between them).
• The output is the sum of these paths.
Homework 3, question 3a
• Describe how a dHMM models the
sequencing of the African-American
genome. Plot the states and possible
transitions and describe the output.
• Two states for each SNP, transitions
between consecutive states. Emission
probabilities by African/European
populations.
Homework 3, question 3b
• Suppose we are given the sequence of
SNPs along a chromosome (i.e., a
sequence over {0,1,2}) along with the
dHMM model for that chromosome,
including the transition and emission
probabilities.
Homework 3, question 3b
• Describe an algorithm that calculates the
most probable partition of the two
chromosomal copies into European and
African segments.
• C(i, j1, j2) = the likelihood of the most
likely pair of paths ending at states j1, j2 in
SNP i.
Homework 3, question 3b
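• $C(i, j_1, j_2) = \max_{(k_1,k_2)} C(i-1, k_1, k_2)\,\delta_{k_1 j_1}\,\delta_{k_2 j_2}\,\mathrm{emit}(g_i, j_1, j_2)$,
where $\delta_{kj}$ is the transition probability from state k to
state j and $g_i$ is the observed value at SNP i.
• Emit is based on the emission
probabilities of the two states.
• Running time: O(m), where m is the number of
SNPs (each copy has only two possible states per
SNP, so every step examines a constant number
of state pairs).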
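The recurrence can be sketched as a Viterbi-style procedure over pairs of states. Everything about the data layout is an assumption made for illustration: `freq[pop][i]` is the probability of a 1 at SNP i in population `pop`, `trans[k][j]` is the transition probability between populations along one copy, and `init[j]` is a start distribution that the slides do not specify. For long chromosomes one would work with log-probabilities to avoid underflow.

```python
from itertools import product

POPS = ("E", "A")  # European, African


def emit(g, q1, q2):
    """Pr(observed sum g) when the copies carry 1 with probability q1 and q2."""
    return sum(
        (q1 if b1 else 1 - q1) * (q2 if b2 else 1 - q2)
        for b1, b2 in product((0, 1), repeat=2)
        if b1 + b2 == g
    )


def viterbi_dhmm(genotypes, freq, trans, init):
    """Most probable pair of population paths for a sequence over {0,1,2}."""
    pairs = list(product(POPS, repeat=2))
    # C[(j1, j2)]: probability of the best pair of paths ending in (j1, j2).
    C = {(j1, j2): init[j1] * init[j2]
         * emit(genotypes[0], freq[j1][0], freq[j2][0]) for j1, j2 in pairs}
    back = []
    for i in range(1, len(genotypes)):
        new_C, pointers = {}, {}
        for j1, j2 in pairs:
            best = max(pairs,
                       key=lambda k: C[k] * trans[k[0]][j1] * trans[k[1]][j2])
            new_C[(j1, j2)] = (C[best] * trans[best[0]][j1] * trans[best[1]][j2]
                               * emit(genotypes[i], freq[j1][i], freq[j2][i]))
            pointers[(j1, j2)] = best
        C = new_C
        back.append(pointers)
    # Trace back from the best final pair of states.
    state = max(pairs, key=lambda s: C[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


freq = {"E": [0.99] * 4, "A": [0.01] * 4}
trans = {"E": {"E": 0.9, "A": 0.1}, "A": {"E": 0.1, "A": 0.9}}
init = {"E": 0.5, "A": 0.5}
print(viterbi_dhmm([2, 2, 0, 0], freq, trans, init))
```

With four state pairs, each SNP costs 16 transition evaluations, which is the constant hidden in the O(m) bound.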