...Biological background

Shay Sade
Eldad Mataraso
Project advisor: Prof. Yoram Louzoun
Final Project
• About 98% of thymocytes die during the development processes in the thymus by failing either positive selection or negative selection.
Positive selection
• Positive selection "selects for" T-cells capable of interacting with MHC.
• Only those thymocytes that bind the MHC/antigen complex with adequate affinity will receive a vital "survival signal."
• The implication of this binding is that all T cells must be able to recognize self antigens to a certain degree.
• The thymocytes with no affinity for self antigens die by apoptosis and are engulfed by macrophages.
• This process does not remove thymocytes that may cause autoimmunity.
• The potentially autoimmune cells are removed by the process of negative selection.
Negative selection
Negative selection removes thymocytes that are
capable of strongly binding with "self" peptides
presented by MHC.
They are again presented with self-antigen in complex
with MHC molecules on antigen-presenting cells
(APCs) such as dendritic cells and macrophages.
• Thymocytes that interact too strongly with the antigen receive an apoptotic signal that leads to cell death.
The vast majority of all thymocytes end up dying
during this process.
• Our main purpose is to check whether viruses have evolved to remove epitopes in order to avoid cytotoxic T-cell-induced cell destruction.
• We will check this possibility by comparing the similarity of viral epitopes to human epitopes in human and non-human viruses.
Autoimmunity:
The similarity of epitopes
between viruses and their host
may be a reason for some
autoimmune reactions.
Similarity between the amino acid sequences of host proteins and viral antigens may lead the host immune system to attack the organs expressing the self antigens that resemble the viral antigens, thereby inducing autoimmune reactions.
• Escape
One factor that may affect immune recognition and help viruses escape the immune system is similarity to self.
This mechanism can allow viruses to survive undetected by the immune system.
Tolerance can arise because the immune system does not respond to self antigens (except perhaps in some cases of autoimmune disease).
A. Comparison between the following results:
• The data resulting from running the algorithm with epitopes from human-host viruses on the human genome.
• The data resulting from running the algorithm with non-human viral epitopes on the human genome.
B. Comparison between the following results:
• The data resulting from running the algorithm with random viral segments of 8–10 amino acids on the human genome (not done yet).
• The data resulting from running the algorithm with real viral epitopes on the human genome.
Peptibase
The Peptibase server was developed by our lab and is used to predict epitopes within amino acid (AA) sequences.
The analysis performed in Peptibase is conducted on the 31 most frequent HLA alleles, taking into account the HLA allele frequency in the human population.
• Given an AA sequence, Peptibase uses three cut-offs on a 9-mer AA sliding window to predict its epitopes:
  - Cleavage by the proteasome
  - Binding to TAP
  - Binding to MHC-I


For each 9-mer, cleavage, TAP and MHC-I
binding scores are computed.
9-mers passing all three stages are
defined as epitopes.
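The three-stage filter described above can be sketched as follows. This is only an illustration of the control flow: the scoring functions and cut-off values are hypothetical parameters supplied by the caller, and nothing here reflects Peptibase's actual scoring models or thresholds.

```python
from typing import Callable, Dict, List

# A scorer maps a 9-mer to a numeric score (hypothetical interface).
Scorer = Callable[[str], float]


def predict_epitopes(
    sequence: str,
    scorers: Dict[str, Scorer],
    cutoffs: Dict[str, float],
    k: int = 9,
) -> List[str]:
    """Return the k-mers whose score reaches the cut-off at every stage.

    `scorers` and `cutoffs` are keyed by stage name, for example
    "cleavage", "tap" and "mhc" (names assumed for illustration).
    """
    epitopes = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # A window is kept only if it passes all stages.
        if all(scorers[stage](window) >= cutoffs[stage] for stage in scorers):
            epitopes.append(window)
    return epitopes
```

Passing the scorers in as arguments keeps the sketch independent of any particular cleavage, TAP, or MHC-I model.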
The problem of finding k similar characters between given sequences is a known problem in the field of sequence comparison.
A A D D D D S S G G G
A B D C C D S K G H G
6/11
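A minimal sketch of the position-by-position comparison shown above, using the same two example sequences. The function name `count_matches` is introduced here for illustration only.

```python
def count_matches(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


first = "AADDDDSSGGG"
second = "ABDCCDSKGHG"
print(f"{count_matches(first, second)}/{len(first)}")  # prints 6/11
```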
Naïve solution:
Divide the human genome into all possible sequences of 9 amino acids (it can also be divided into sequences of 8 or 10 amino acids).
We wish to find human ninemers that match a given viral ninemer in at least k positions (the virus is also divided into ninemers).
Human genome
HFDDSDSSDFFGHHYDSSDDFFDSSDFDSAAY…
Virus genome
FGFDFGFDFGFGDDDDDHHHHHDDDDDDS…
Naïve solution:
In this run we keep only the sequences that have 5 or more "hits".
Human genome:
AAABBNMGGFDCVGFFDDSXDCCFFCCDDFGG
HFDSSASTSFAGHHYDSSDDFFDSSDFDSAAY…
Human ninemer: SSASTSFAG
Virus ninemers:
SHAAASFAG
SAGAYSOG…
Compare SSASTSFAG with SHAAASFAG: result 6/9 (at least 5 hits), so the pair is saved.
Naïve solution – complexity:
The comparison is run for every viral ninemer.
If the virus contains m ninemers and the human genome contains n ninemers, the complexity of this algorithm is O(m*n); for a single viral ninemer it is O(9n) ~ O(n).
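The naïve all-against-all comparison can be sketched as below. The function names and the default threshold of 5 hits are assumptions for illustration; the nested loop makes the O(m*n) cost visible.

```python
def kmers(sequence: str, k: int = 9):
    """All overlapping k-mers of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def naive_search(human: str, virus: str, k: int = 9, min_hits: int = 5):
    """Compare every viral k-mer with every human k-mer: O(m*n) comparisons."""
    saved = []
    for viral in kmers(virus, k):          # m viral k-mers
        for self_kmer in kmers(human, k):  # n human k-mers
            hits = sum(x == y for x, y in zip(viral, self_kmer))
            if hits >= min_hits:
                saved.append((viral, self_kmer, hits))
    return saved


# Short fragment of the human example sequence, one viral ninemer.
print(naive_search("HFDSSASTSFAGHHY", "SHAAASFAG"))
```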
Our algorithm has two parts
a. construction of a library from the human
genome.
b. search for the nearest ninemer in our library to
the given virus ninemer.
Constructing the library
Run over the human genome with a sliding window, by alleles, and save all of the human ninemers.
Genome
ABCDEFGHIJKLM…
Genome ninemers
ABCDEFGHI
BCDEFGHIJ
CDEFGHIJK
.
.

Each ninemer is split into nine fourmers using a circular (wrap-around) sliding window:
EEERTHFFG
EEER
EERT
ERTH
RTHF
THFF
HFFG
FFGE
FGEE
GEEE
Each fourmer belongs to this given ninemer
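A small sketch of the wrap-around window, reproducing the nine fourmers listed above for EEERTHFFG. The helper name `circular_fourmers` is introduced for illustration.

```python
def circular_fourmers(ninemer: str, width: int = 4):
    """Fourmers starting at each position of the ninemer, wrapping at the end."""
    # Appending the first width-1 letters lets plain slicing wrap around.
    wrapped = ninemer + ninemer[:width - 1]
    return [wrapped[i:i + width] for i in range(len(ninemer))]


for position, fourmer in enumerate(circular_fourmers("EEERTHFFG")):
    print(position, fourmer)
```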

Why do we need fourmers?
• A function F turns each fourmer into a unique number, so that different fourmers receive different numbers.

Every number represents a file name.

All ninemers that share a fourmer (and hence its number) are saved in the file named after that fourmer.
• The 20 amino acids are numbered from 0 to 19.
• The function:
F(A[i..i+3]) = 20^3 * A[i] + 20^2 * A[i+1] + 20 * A[i+2] + A[i+3]
Why 160,000? F takes 20^4 = 160,000 distinct values, one per possible fourmer.
• For example, F('aaac') = 2 (with a = 0, b = 1, c = 2).
• Each ninemer is saved in at most 9 files.
• In total we have 20^4 = 160,000 files.
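The function F is a base-20 encoding of the fourmer, which can be sketched as below. The slides do not state which amino acid gets which number, so the alphabetical ordering of one-letter codes used here is an assumption, and the 20-letter toy alphabet a..t exists only to mirror the lowercase examples in the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # assumed numbering 0..19
TOY_ALPHABET = "abcdefghijklmnopqrst"  # a=0, b=1, c=2, ... for the slide examples


def fourmer_code(fourmer: str, alphabet: str = AMINO_ACIDS) -> int:
    """Base-20 value of a fourmer: 20^3*A[0] + 20^2*A[1] + 20*A[2] + A[3]."""
    code = 0
    for letter in fourmer:
        code = code * len(alphabet) + alphabet.index(letter)
    return code


print(fourmer_code("aaac", TOY_ALPHABET))  # 2
print(fourmer_code("YYYY"))                # 159999, the largest of the 20^4 codes
```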
Each file contains:
• the human index of the ninemer
• the position of the fourmer within the ninemer (0, 1, …, 8)
• For example, the file for the fourmer AGBB looks as follows:
File name: AGBB
AVGAGBBIG
3
AGBBAAGGG
0
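The library layout can be sketched with an in-memory dictionary standing in for the files. The dictionary, the function name `build_library`, and the `(index, position)` record format are assumptions for illustration, not the project's actual storage code.

```python
from collections import defaultdict


def build_library(genome: str, k: int = 9, width: int = 4):
    """Map each fourmer to the (ninemer index, fourmer position) pairs containing it."""
    library = defaultdict(list)  # one key per "file"
    for index in range(len(genome) - k + 1):
        ninemer = genome[index:index + k]
        wrapped = ninemer + ninemer[:width - 1]  # circular window
        for position in range(k):
            library[wrapped[position:position + width]].append((index, position))
    return library


library = build_library("AVGAGBBIG")
print(library["AGBB"])  # [(0, 3)]: ninemer 0 contains AGBB at position 3
```

Each ninemer contributes nine records, so the total number of records is nine times the number of human ninemers.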
Our algorithm has two parts
a. construction of a library from the human
genome.
b. search for the nearest ninemer in our library
to the given virus ninemer.
• Cut the virus ninemer into nine fourmers (as in part a).
• Example:
Virus ninemer: a a a b b c d e g
Fourmer at index 1: a a b b
Search 77 files: 76 neighbor files (3/4 identical) + 1 identical file (4/4).
Which files?
• F(aabb) = 0 + 0 + 20 + 1 = 21
Apply the same formula to the 76 neighbor fourmers:
F(?abb) = … (19 indices)
F(a?bb) = … (19 indices)
F(aa?b) = … (19 indices)
F(aab?) = … (19 indices)
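Enumerating the neighbor fourmers can be sketched as follows: each of the 4 positions is replaced by each of the 19 other letters. The amino-acid alphabet string and the function name are assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def neighbor_fourmers(fourmer: str, alphabet: str = AMINO_ACIDS):
    """All fourmers that differ from the input in exactly one position."""
    neighbors = []
    for i, original in enumerate(fourmer):
        for letter in alphabet:
            if letter != original:
                neighbors.append(fourmer[:i] + letter + fourmer[i + 1:])
    return neighbors


print(len(neighbor_fourmers("AACC")))  # 76 = 4 positions * 19 substitutions
```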
Memory
20^4 = 160,000 files
Each file holds on average 11,300,000 * 9 / 160,000 ≈ 635 ninemers.
Time
Every fourmer has 4 * 19 = 76 neighbors.
Every ninemer has 9 fourmers.
Total: 4 * 19 * 9 + 9 = 693 files.
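Putting the two parts together, a lookup over the 693 buckets might look like the sketch below. It assumes an in-memory dictionary instead of files, an alphabetical amino-acid alphabet, and that a human fourmer only counts when it sits at the same position as the viral fourmer; none of these details are confirmed by the slides. Under that alignment assumption, any pair with 5 or more matching positions has at least one circular window with 3 or more matches, since 5 matches spread over 9 windows of width 4 give an average of 20/9 > 2 per window.

```python
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed amino-acid alphabet
K, W = 9, 4


def circular_fourmers(ninemer):
    wrapped = ninemer + ninemer[:W - 1]
    return [wrapped[i:i + W] for i in range(K)]


def build_library(genome):
    """fourmer -> list of (ninemer index, fourmer position)."""
    library = defaultdict(list)
    for index in range(len(genome) - K + 1):
        for pos, fourmer in enumerate(circular_fourmers(genome[index:index + K])):
            library[fourmer].append((index, pos))
    return library


def candidate_keys(fourmer):
    """The identical fourmer plus its 76 one-substitution neighbors (77 keys)."""
    keys = [fourmer]
    for i in range(W):
        keys += [fourmer[:i] + aa + fourmer[i + 1:] for aa in ALPHABET if aa != fourmer[i]]
    return keys


def search(virus_ninemer, genome, library, min_hits=5):
    """Return {human ninemer index: number of matching positions}."""
    found = {}
    for pos, fourmer in enumerate(circular_fourmers(virus_ninemer)):  # 9 fourmers
        for key in candidate_keys(fourmer):                           # 77 buckets each
            for index, human_pos in library.get(key, ()):
                if human_pos != pos or index in found:
                    continue
                human = genome[index:index + K]
                hits = sum(a == b for a, b in zip(virus_ninemer, human))
                if hits >= min_hits:
                    found[index] = hits
    return found


genome = "HFDSSASTSFAGHHY"
print(search("SHAAASFAG", genome, build_library(genome)))  # {3: 6}
```

Only the buckets are scanned rather than all n human ninemers, which is where the saving over the naïve O(n) scan per viral ninemer comes from.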
[Figure: Average similarity of Human Papilloma viruses vs. average of Non-Human Papilloma viruses. Y-axis: average of similar epitopes in viruses (0–80). X-axis: similarity (5 out of 9 to 9 out of 9). Legend: red = Non-Human viruses, blue = Human viruses.]
P-values
Similarity    P-value
9 out of 9    0.205384
8 out of 9    0.091781
7 out of 9    0.110829
6 out of 9    0.009971
5 out of 9    0.010291
[Figure: Average similarity of Human Hepatitis viruses vs. Non-Human Hepatitis viruses. Y-axis range: 0–70. X-axis: similarity (5 out of 9 to 9 out of 9). Legend: red = Non-Human, blue = Human.]
• It seems that non-human viruses are more similar to self at the 5/9 and 6/9 levels, which are relatively low levels of similarity.
• At 7/9 similarity and above, there is a tendency for higher similarity in human-host viruses.
This can be supported biologically by our assumption that human viruses evolved to remove epitopes in order to avoid cytotoxic T-cell-induced cell destruction.
• The results are clearer, and tend to support our conclusions more, in the Papilloma viruses than in the Hepatitis viruses.
This may be explained by the size of the dataset we worked on for each virus type (larger for Papilloma).
• We are currently waiting for the results for HIV and Herpes viruses, both human and non-human.
• We will then finish gathering results for more viruses, which will hopefully confirm our current conclusions.
• Prof. Yoram Louzoun
• Royi Itzhak and all Yoram’s lab members
• Ariel Azia Amitai