Use ORF Finder to predict the coding region for the gene you obtained in
homework 2.
(a) Which open reading frame is the correct one?
Answer: Frame +3, from position 66 to 2045, length 1980 nt.
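The frame and length in this answer can be checked with simple coordinate arithmetic. The sketch below assumes 1-based inclusive coordinates on the forward strand and assumes that the reported ORF includes its stop codon; `orf_summary` is a hypothetical helper written for this check, not part of ORF Finder.

```python
def orf_summary(start, end):
    """Return (frame, length_nt, residues) for a forward-strand ORF.

    Assumes 1-based inclusive coordinates and that the ORF ends with a
    stop codon, which is not counted as a residue.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    frame = (start - 1) % 3 + 1      # +1, +2 or +3
    residues = length_nt // 3 - 1    # drop the stop codon
    return frame, length_nt, residues


print(orf_summary(66, 2045))
```

Under these assumptions the coordinates 66 to 2045 give frame +3, 1980 nt, and 659 residues, which agrees with answers (a) and (b).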
(b) How many residues does the protein have? Give the sequence in FASTA
format.
Answer: 659 residues.
MLKKIFYGFIVLFLIVVGLLAILIAQVWVSTNKDIAKIKDYRPSVASQILDRKGRLIANIYDKEFRFYAR
FEEIPPRFIESLLAVEDTLFFEHGGINLDAIMRAMIKNAKSGRYTEGGSTITQQLVKNMVLTREKTLTRK
LKEAIISIRIEKVLSKEEILERYLNQTFFGHGYYGVKTASLGYFKKPLDKLTLKEITMLVALPRAPSFYD
PTKNLEFSLSRANDILRRLYSLGWISSNELKGALNEVPIIYNQTSTQNIAPYVVDEVLKQLDQLDGLKTQ
GYTIKLTIDLDYQRLALESLRFGHQKILEKIAKEKPKTNASNEDEDNLNASMIVTDTSTGKILALVGGID
YKKSAFNRATQAKRQFGSAIKPFVYQIAFDNGYSTTSKIPDTARNFENGNYSKNSEQNHAWHPSNYSRKF
LGLVTLQEALSHSLNLATINLSDQLGFEKIYQSLSDMGFKNLPKDLSIVLGSFAISPIEAAEKYSLFSNY
GTMLKPMLIESITDQQNDVKTFTPMETKKITSKEQAFLTLSVLMNAVENGTGSLARIKGLEIAGKTGSSN
NNIDAWFIGFTPTLQSVIWFGRDDNTPIGKGATGGVVSAPVYSYFMRNILAIEPSLKRKFDVPKGLRKEI
VDKIPYYSTPNSITPTPQKTDDGEEPLLF
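A FASTA record normally begins with a '>' header line followed by the wrapped sequence. A minimal sketch of building such a record is given below; the header text `orf_frame3` and the helper name `to_fasta` are placeholders chosen for illustration, and the default width of 70 simply mirrors the line length of the sequence above.

```python
def to_fasta(header, sequence, width=70):
    """Format a sequence as a FASTA record with fixed-width lines."""
    seq = "".join(sequence.split())          # remove any whitespace
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# The last line of the protein above, used as a short demo input.
tail = "VDKIPYYSTPNSITPTPQKTDDGEEPLLF"
print(to_fasta("orf_frame3", tail, width=10), end="")
```

Passing the full 659-residue sequence with the default width would reproduce the nine 70-residue lines and the final 29-residue line shown above, preceded by the header.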
(c) Use BLAST to search the nr database. Set the E-value threshold to
0.0000001 (1e-7) and use the PAM70 matrix.
Answer:
Sequence ID                           Description                         Score (bits)  E value
gi|21262171|dbj|BAB96754.1|           penicillin binding protein [Hel...  1653          0.0
gi|21262169|dbj|BAB96753.1|           penicillin binding protein [Hel...  1636          0.0
gi|21262167|dbj|BAB96752.1|           penicillin binding protein [Hel...  1634          0.0
gi|15645222|ref|NP_207392.1|          penicillin-binding protein 1A ...   1623          0.0
gi|19073477|gb|AAL84835.1|AF479618_1  penicillin-binding pro...           1618          0.0
gi|19073475|gb|AAL84834.1|AF479617_1  penicillin-binding pro...           1613          0.0
gi|13272374|gb|AAK17126.1|AF315503_1  PBP1 [Helicobacter pyl...           1609          0.0
gi|15611611|ref|NP_223262.1|          PENICILLIN-BINDING PROTEIN [He...   1608          0.0
gi|15791870|ref|NP_281693.1|          penicillin-binding protein [Ca...   683           0.0
gi|17232816|ref|NP_489364.1|          penicillin-binding protein [No...   322           3e-86
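To illustrate how the E-value cutoff acts on a hit list, the sketch below applies the 1e-7 threshold to three rows copied from the results above. The tuple layout and the function name `filter_hits` are assumptions made for this example; they are not a BLAST output format.

```python
# (sequence ID, bit score, E value) for three of the hits listed above
hits = [
    ("gi|21262171|dbj|BAB96754.1|", 1653, 0.0),
    ("gi|15791870|ref|NP_281693.1|", 683, 0.0),
    ("gi|17232816|ref|NP_489364.1|", 322, 3e-86),
]


def filter_hits(rows, max_evalue=1e-7):
    """Keep rows whose E value is at or below the cutoff, best score first."""
    kept = [row for row in rows if row[2] <= max_evalue]
    return sorted(kept, key=lambda row: row[1], reverse=True)


for seq_id, bits, evalue in filter_hits(hits):
    print(f"{seq_id}\t{bits}\t{evalue:g}")
```

All listed hits pass the cutoff, since even the weakest one (3e-86) is far below 1e-7.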
(d) Use BLAST 2 Sequences to compare hit number 1 and hit number 10.
Answer:
Sequence 1 lcl|seq_1 Length 659 (1 .. 659)
Sequence 2 lcl|seq_2 Length 643 (1 .. 643)
NOTE: The statistics (bitscore and expect value) is calculated based on the size of nr database
Score = 334 bits (788), Expect = 6e-90
Identities = 216/607 (35%), Positives = 327/607 (53%), Gaps = 84/607 (13%)
Query: 38  IKDYRPSVASQILDRKGRLIANIYDKEFRFYARFEEIPPRFIESLLAVEDTLFFEHGGIN 97
           I+++ P+ ++ I D KGRL+A+I     R       I P    + LA EDT F+ H GI+
Sbjct: 65  IRNFVPAETTYIYDIKGRLLASIHGEVNREVVPLKKISPHLKRAVLASEDTSFYHHHGID 124

Query: 98  LDAIMRAMIKNAKSGRYTEGGSTITQQLVKNMVLTREKTLTRKLKEAIISIRIEKVLSKE 157
              I RA++ N  +G   EGGST+T QLVKN+ L+ E+T TRK+ EA+++IR+E VLSK+
Sbjct: 125 PVGIGRALVVNLEAGEVQEGGSTLTMQLVKNLFLSQERTFTRKIAEAVLAIRLEQVLSKD 184

Query: 158 EILERYLNQTFFGHGYYGVKTASLGYFKKPLDKLTLKEITMLVALPRAPSFYDPTKNLEF 217
           EIL+ YLNQ + G   YGV  A+  YF K    L L E +M+  L  AP  + P  NLE
Sbjct: 185 EILDLYLNQVYWGDNNYGVQMAARYYFNKSAANLNLAESAMMAGLLPAPENFSPFINLEL 244

Query: 218 SLSRANDILRRLYSLGWISSNELKGALNEVPIIYNQTSTQNI------------APYVVD 265
           +  +  ++L R+  L WIS  +           YNQ+  Q I            APY+ +
Sbjct: 245 AKQKQKEVLLRMLELNWISQQD-----------YNQALKQKIQLNNKRTLEGSAAPYITN 293

Query: 266 EVLKQL------DQL--DGLKTQGYTIKLTIDLDYQRLALESLRFGHQKILEKIAKEKPK 317
            V  +L      D L   GL+ Q      TID  +Q +A  + +  HQ++       K
Sbjct: 294 SVAQELVRRFGRDVLLKGGLRVQT-----TIDAQFQMMANKTVKRWHQRL-------KRQ 341

Query: 318 TNASNEDEDNLNASMIVTDTSTGKILALVGGIDYKKSAFNRATQAKRQFGSAIKPFVYQI 377
              +N+       +++  D  T  I ALVGG+D K S FNRATQA+RQ GSA KPFVY
Sbjct: 342 GLRNNQ------IALVAIDPRTHFIKALVGGVDAKTSEFNRATQARRQPGSAFKPFVYYA 395

Query: 378 AFDNG-YSTTSKIPDTARNFENGN--YSKNSEQNHAWHPSNYSRKFLGLVTLQEALSHSL 434
           AF +G ++  + + DT   + +GN  YS          P NY   F+G + +  ALS S
Sbjct: 396 AFASGKFTPNTIVQDTPVRYRDGNGWYS----------PRNYDNSFMGAIPIRTALSLSR 445

Query: 435 NLATINLSDQLGFEKIYQSLSDMGFKNLPKD--LSIVLGSFAISPIEAAEKYSLFSNYGT 492
           N+ +I L    G  ++ ++   +G  + P +   S+ LG+  ++P+E A  Y+  +NYG
Sbjct: 446 NIPAIKLGKAVGLNRVIETSRTLGITS-PMEPVTSLPLGAIGVTPVEMASAYATLANYGW 504

Query: 493 MLKPMLIESITDQQNDVKTFTPMETKKITSKEQAFLTLS---------VLMNAVENGTGS 543
                LI  ++D   +V          I +     L L+         V+ + + NGTG
Sbjct: 505 QSPTTLIMRVSDSNGNV---------LIDNTPKPRLVLNPWASASVIDVMQSVINNGTGR 555

Query: 544 LARIKGLEIAGKTGSSNNNIDAWFIGFTPTLQSVIWFGRDDNTPIGKGATGGVVSAPVYS 603
            A I G   AGKTG++++  D WF+G  P L + IW GRDDN  +G GATGG   AP+
Sbjct: 556 AAAI-GRPAAGKTGTTSSERDVWFVGTVPQLTTAIWVGRDDNKRLGYGATGGGTVAPIWR 614

Query: 604 YFMRNIL 610
            FM N L
Sbjct: 615 DFMQNAL 621
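The summary line for this alignment (identities, positives, gaps) can be recomputed from any aligned pair of rows. The sketch below counts identities and gap columns for the final alignment block and reproduces the reported percentages from the overall counts. Truncating to a whole percent is an assumption inferred from the reported values (216/607 is about 35.6% but is shown as 35%), and `count_columns` is a hypothetical helper.

```python
def count_columns(query, sbjct):
    """Count identical columns and gap columns in two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, sbjct))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return identities, gaps


def percent(count, length):
    """Whole-number percentage, truncated (assumed reporting style)."""
    return count * 100 // length


# Final block of the alignment above (Query 604-610, Sbjct 615-621).
print(count_columns("YFMRNIL", "DFMQNAL"))

# Overall counts from the summary line, alignment length 607.
print(percent(216, 607), percent(327, 607), percent(84, 607))
```

Summing the per-block counts over all eleven blocks is how the totals in the summary line arise.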
CPU time:     0.15 user secs.     0.01 sys. secs     0.16 total secs.

Lambda     K      H
   0.332    0.230    0.987

Gapped
Lambda     K      H
   0.291   0.0910    0.410

Matrix: PAM70
Gap Penalties: Existence: 10, Extension: 1
Number of Hits to DB: 4681
Number of Sequences: 0
Number of extensions: 668
Number of successful extensions: 7
Number of sequences better than 10.0: 1
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1
length of query: 659
length of database: 442,539,632
effective HSP length: 49
effective length of query: 610
effective length of database: 442,539,583
effective search space: 269949145630
effective search space used: 269949145630
T: 9
A: 40
X1: 15 ( 7.2 bits)
X2: 119 (50.0 bits)
X3: 119 (50.0 bits)
S1: 41 (21.7 bits)
S2: 75 (34.9 bits)
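The numbers in this footer are linked by the standard Karlin-Altschul relations: the bit score is S' = (lambda*S - ln K) / ln 2, and the expect value is E = m*n*2^(-S'), where m*n is the effective search space. A rough check is sketched below using the gapped lambda and K, the raw score 788, and the effective lengths reported above. Variable names are illustrative, and small differences from the printed values are expected because the report rounds lambda, K, and the bit score.

```python
import math

raw_score = 788                 # raw score from "334 bits (788)"
lam, k = 0.291, 0.0910          # gapped Lambda and K from the footer
eff_query_len = 610             # effective length of query
eff_db_len = 442_539_583        # effective length of database

# Effective search space is the product of the two effective lengths.
search_space = eff_query_len * eff_db_len

# Bit score and expect value from the Karlin-Altschul formulas.
bit_score = (lam * raw_score - math.log(k)) / math.log(2)
expect = search_space * 2.0 ** (-bit_score)

print(search_space)
print(round(bit_score))
print(f"{expect:.0e}")
```

With these inputs the search space matches the reported 269949145630 exactly, and the bit score and expect value come out close to the reported 334 bits and 6e-90.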
(e) Search for conserved domains of this gene in the Pfam database. How many
domains can you find? (1) Give the name, Pfam number, and consensus
sequence of each conserved domain. (2) Show the domain relatives. (3) How
many similar domain architectures can you find?
Answer: Two conserved domains are found.
(1)
gnl|CDD|1466 pfam00912, Transglycosyl, Transglycosylase. The penicillin-bin... 233 2e-62
gnl|CDD|7821 pfam00905, Transpeptidase, Penicillin binding protein transpep... 122 5e-29
gnl|CDD|1466, pfam00912, Transglycosyl, Transglycosylase. The penicillin-binding
proteins are bifunctional proteins consisting of transglycosylase and transpeptidase in
the N- and C-terminus respectively.
CD-Length = 169 residues, 100.0% aligned
Score = 233 bits (595), Expect = 2e-62

Query: 47  SQILDRKGRLIANIYDKEFRFYARFEEIPPRFIESLLAVEDTLFFEHGGINLDAIMRAMI 106
Sbjct: 1   MKIYDADGELIGEFGEERRR-PVPLNDIPPNLKEALIASEDRRFYEHHGIDPKGIGRAAL 59

Query: 107 KNAKSGRYTEGGSTITQQLVKNMVLTREKTLTRKLKEAIISIRIEKVLSKEEILERYLNQ 166
Sbjct: 60  ANLKSGGVVQGASTITQQLAKNLFLSHERTFTRKANEAWLALQLEQVYSKDEILELYLNK 119

Query: 167 TFFGHGYYGVKTASLGYFKKPLDKLTLKEITMLVALPRAPSFYDPTKNLE 216
Sbjct: 120 IYFGNGVYGIEAAAQYYFGKPAKDLTLAEAALLAGLPKAPSRYNPVRNPE 169

gnl|CDD|7821, pfam00905, Transpeptidase, Penicillin binding protein
transpeptidase domain. The active site serine is conserved in all members of this
family.
CD-Length = 327 residues, 99.7% aligned
Score = 122 bits (306), Expect = 5e-29

Query: 305 QKILEKIAKEKPKTNASNEDEDNLNASMIVTDTSTGKILALVGGIDYKKSAF-------- 356
Sbjct: 1   SKLQKAAERALDKAVAKYKAK---RGAAVVMDPKTGEVLAMASSPSYDPNLFVGGENEPL 57

Query: 357 -NRATQAKRQFGSAIKPFVYQIAFDNGYSTTSKIPDTARNFENGNYSKNSEQNHAWHPSN 415
Sbjct: 58  RNRAVTGVYEPGSTFKPITAAAALENGVIK----PNEVLDDSGGIYQGGG----STIKYD 109

Query: 416 YSRKFLGLVTLQEALSHSLNLATINLSDQLGFEKIYQSLSDMGF---------------- 459
Sbjct: 110 WRRGGHGTITLRQALEKSSNTGFVKLALKLGPDKLRDYLKRFGLGVKTGIDLPGEAAGSL 169

Query: 460 KNLPKDLSIVLGSFAI------SPIEAAEKYSLFSNYGTMLKPMLIESITDQQNDVKTFT 513
Sbjct: 170 PPSNKRLLADTATSAFGQGDTVTPLQMAQAYATIANGGTLVQPHLVKSIVDPNGQIDG-- 227

Query: 514 PMETKKITSKEQAFLTLSVLMNAVENGTGSLARIKGLEIAGKTGSSN---------NNID 564
Sbjct: 228 TPVSKETISKTVSEMLQAGLEGVVGGGTGQTAAVPGYDVAGKTGTAQKAGKGGGYTNTYN 287

Query: 565 AWFIGFTPTLQSVIWFGRDDNTPIGKGATGGVVSAPVYS 603
Sbjct: 288 AWFVGYAPADNPKYAVAVVIDNPQDKGGYGGAVAAPIFK 326
(2)
(3) 173 similar domain architectures
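The "% aligned" figures reported in part (1) follow from the subject (domain model) coordinates of each alignment. The sketch below recomputes them and checks that the two domains occupy separate regions of the query protein. The coordinates are taken from the alignments in part (1); the dictionary layout and the `coverage` helper are assumptions made for illustration.

```python
# Coordinates read from the two CD-search alignments in part (1).
domains = {
    "pfam00912": {"cd_length": 169, "sbjct": (1, 169), "query": (47, 216)},
    "pfam00905": {"cd_length": 327, "sbjct": (1, 326), "query": (305, 603)},
}


def coverage(cd_length, sbjct):
    """Percentage of the domain model covered by the alignment."""
    start, end = sbjct
    return 100.0 * (end - start + 1) / cd_length


for name, d in domains.items():
    pct = coverage(d["cd_length"], d["sbjct"])
    print(name, f"{pct:.1f}% aligned", "query", d["query"])

# The transglycosylase hit ends before the transpeptidase hit begins.
overlap = domains["pfam00912"]["query"][1] >= domains["pfam00905"]["query"][0]
print("overlap:", overlap)
```

The non-overlapping query ranges are consistent with the description quoted in part (1), which places the transglycosylase domain at the N-terminus and the transpeptidase domain at the C-terminus.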