Genomes and environments - eur

Genomes, metagenomes and environments: a perspective
Alessandra Carbone
Laboratoire de Genomique des Microorganismes
UMR7238 CNRS-Université Pierre et Marie Curie
The Sargasso Sea / Santa Cruz whale carcass bone
Different profiles of gene enrichment in environment-specific functions

The Sargasso Sea: a nutrient-poor environment; its genes for ABC-type transporters dedicated to amino-acid transport and metabolism are translationally optimized.
Santa Cruz whale carcass bone: microbes live in an abundant food source. Translational optimization is shown in energy production and conversion genes.
This difference can reflect functional adaptation of microbes to different environmental conditions.

Three main results are shown for microbes:
1. Genome coding (codon bias) can be automatically studied and a set of optimized (most biased) genes can be identified for each genome
2. There exists a genome organization based on codon bias that reflects environmental living conditions
3. Codon biases reflect metabolic processes important for an organism
Are these statements true for microbial communities?

Codon usage and translational optimization
[Figure: transfer RNA and preferred codon]

Codon usage and translational optimization
In E. coli and other organisms that reproduce rapidly, a high tRNA number and high expression are correlated with codon preference (experimentally).
• Codon preference and tRNA: Ikemura, 1985; Bennetzen and Hall, 1982; Bulmer, 1987; Gouy and Gautier, 1982.
• tRNA and elongation rate: Varenne et al., 1984.
• High expression and codon preference: Grantham et al., 1980; Wada et al., 1990; Sharp and Li, 1987; Sharp et al., 1986; Médigue et al., 1991; Shields and Sharp, 1987; Sharp et al., 1988; Stenico et al., 1994.

Towards the identification of a genome codon bias
A gene becomes a point. A genome becomes a space of points (Médigue et al., J. Mol. Biol. 1991).
Codon vector of a gene g: $g = [x_{1,g}, x_{2,g}, \dots, x_{64,g}]$, where $x_{i,g}$ is the relative frequency of codon $i$ in $g$.
Vector normalisation: $(x_{i,g} - \bar{x}_i)/\sigma_i$, where $\bar{x}_i$ is the mean of the frequencies $x_{i,g}$ and $\sigma_i$ is the standard deviation of $x_{i,g}$.
Normalized vectors and PCA are used to "see" in 3D:
• organisms in codon space
• genes and functions
[Figure: genes in codon space for Haemophilus influenzae, Staphylococcus aureus, Bacillus subtilis and Salmonella typhi (Carbone et al., Bioinformatics 2003)]
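The representation of genes as normalised 64-dimensional codon vectors, followed by a PCA projection to 3D, can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline used in the cited papers: the function names (`codon_frequencies`, `normalise`, `pca_project`) and the toy sequences are hypothetical, codons containing ambiguous bases are simply skipped, and PCA is done with a plain SVD.

```python
import numpy as np
from itertools import product

# The 64 codons in a fixed order; the order itself is arbitrary.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}

def codon_frequencies(seq):
    """Relative frequency x_{i,g} of each of the 64 codons in one gene."""
    counts = np.zeros(64)
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:  # assumption: skip codons with ambiguous bases
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts

def normalise(X):
    """Column-wise normalisation (x_{i,g} - mean_i) / sigma_i over all genes."""
    mean = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # codons with no variation stay at 0
    return (X - mean) / sigma

def pca_project(Z, n_components=3):
    """Project the rows of Z onto the first principal components (via SVD)."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T

# Hypothetical toy "genome": five short coding sequences.
toy_genes = [
    "ATGGCTGCTAAAGAAGAATAA",
    "ATGGCCGCCAAGGAGGAGTAA",
    "ATGCTGCTGCGTCGTAAATAA",
    "ATGTTATTAAGAAGAAAGTAA",
    "ATGGCTCTGAAAGAACGTTAA",
]
X = np.array([codon_frequencies(s) for s in toy_genes])  # genes x 64
Z = normalise(X)
coords = pca_project(Z, n_components=3)  # one 3D point per gene
print(coords.shape)
```

Each row of `coords` is one gene seen as a point; plotting the rows for all genes of a genome gives the kind of 3D cloud shown on the slides.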
The gene clouds of Bacillus subtilis and Salmonella typhi have a similar geometry, but translated and rotated.

Functional organization of genes in codon space (Médigue et al., J. Mol. Biol. 1991)
[Figure: E. coli genes in codon space, "the rabbit head"]
• Genes that are constitutively expressed at a high level; most of them are involved in translation, protein folding, transcription, DNA binding.
• Genes that maintain a low or intermediate level of expression, but that at times can be expressed at a very high level.
• Genes made mostly of hydrophobic amino acids, as in membrane proteins.
• Integration host factors, insertion sequences, genes behaving as mutators when inactive, but also genes controlling cell division, outer membrane proteins, catabolic operons.

In E. coli
[Figure: E. coli genes in codon space, with labelled groups: ribosomal proteins; ATP binding proteins; IS proteins; NADH proteins; flagellar biosynthesis proteins; lipoproteins, membrane proteins, transport proteins]
Genes coding for "translation", glycolysis …: they are the most biased and the most expressed in E. coli.
How to define "codon bias" and how to search for highly biased genes in an automatic manner?
We look for a dominant bias (Carbone et al., Bioinformatics 2003).

For a gene g with L codons (L = number of codons in g) and a reference set of genes S, the kth codon of g receives the weight

$$w_k = \frac{\text{frequency of the } k\text{th codon of } g \text{ in } S}{\text{frequency of the dominant synonymous codon in } S}$$

and the index of g is

$$\mathrm{Index}(g) = \Big(\prod_{k=1}^{L} w_k\Big)^{1/L}$$

In the example on the slide, g has 11 codons with weights $w_1, \dots, w_{11}$, so the product runs over k = 1…11.

How to compute S for indexing the genes?
• Take S by choosing randomly 1% of the genes in G
• Compute weights and index values
• Select, as new S, the 1% of genes with the highest index
• Repeat the iteration until convergence

With S computed in this way, the index is the Self-Consistent Codon Index (SCCI).
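The four steps above can be sketched in a few lines of Python. This is only an illustration of the iteration as stated on the slide, not the published SCCI implementation. Several choices are assumptions of this sketch: the standard genetic code is used, stop codons are ignored, a pseudo-count of 0.5 avoids zero weights, and "convergence" is taken to mean that S no longer changes. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered by T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def codons(seq):
    """Split a coding sequence into sense codons (stops/unknowns dropped)."""
    seq = seq.upper()
    out = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [c for c in out if CODE.get(c, "*") != "*"]

def weights(reference):
    """w = frequency of a codon in S / frequency of its dominant synonym in S."""
    counts = Counter(c for gene in reference for c in codons(gene))
    w = {}
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        best = max(counts[c] for c, a in CODE.items() if a == aa)
        # Assumption: pseudo-count 0.5 so that unseen codons keep a weight > 0.
        w[codon] = max(counts[codon], 0.5) / max(best, 0.5)
    return w

def index(gene, w):
    """Geometric mean of the weights of the L codons of the gene."""
    cs = codons(gene)
    if not cs:
        return 0.0
    return math.exp(sum(math.log(w[c]) for c in cs) / len(cs))

def scci(genes, fraction=0.01, max_iter=100, seed=0):
    """Iterate: random S -> weights -> index -> top fraction becomes new S."""
    rng = random.Random(seed)
    size = max(1, round(fraction * len(genes)))
    S = set(rng.sample(range(len(genes)), size))
    for _ in range(max_iter):
        w = weights([genes[i] for i in S])
        scores = [index(g, w) for g in genes]
        ranked = sorted(range(len(genes)), key=lambda i: scores[i], reverse=True)
        new_S = set(ranked[:size])
        if new_S == S:  # assumption: convergence = S is stable
            break
        S = new_S
    w = weights([genes[i] for i in S])
    return w, [index(g, w) for g in genes], S

# Hypothetical toy genome: 100 random genes of 50 sense codons each.
rng = random.Random(1)
sense = [c for c, aa in CODE.items() if aa != "*"]
toy_genes = ["".join(rng.choice(sense) for _ in range(50)) for _ in range(100)]
w, scores, S = scci(toy_genes, fraction=0.05)
print(sorted(S))
```

On random sequences such as `toy_genes` there is no real bias to find, so the selected set is arbitrary. The example only shows the mechanics of the iteration and the shape of its outputs: one weight per codon, one index per gene, and the final set S.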
The algorithm associates to each genome a vector of weights, one for each codon, representing the occurrence of the codon within the most biased set of genes of the genome.

[Figure: most biased genes in E. coli, SCCI (algorithm); E. coli reproduces rapidly]
[Figure: validation for other fast-growing organisms, SCCI (algorithm)]

SCCI: a universal measure for dominant bias
[Figure: strand bias and SCCI, Borrelia burgdorferi; GC3 bias and SCCI, Pseudomonas aeruginosa]

The set of biased genes S
• is unique (for the organisms we checked, ~210)
• exists also for organisms that do not have an evolutionary tendency explained by translational pressure.
What are these genes doing?
• Highly expressed genes (belonging to most species)
• Genes with uncharacterised function
• Genes dependent on specific environmental conditions
• Stress response genes
• Non-orthologous genes
If we look at the tail: SCCI(g) > µ + σ = 0.42

Towards a space of genomes
A genome becomes a point. The vector of weights is taken as a signature for the genome (Carbone et al., Mol. Biol. Evol. 2004).

Bacteria and archaea in codon space
[Figure: genomes in codon space, annotated with optimal growth temperature and AT content ("Don't mind the colors…"). Labelled points: Aeropyrum pernix, Pyrococcus, Methanobacterium thermoautotr., Thermoplasma volcanium, Aquifex aeolicus, Halobacterium sp., Treponema pallidum, S. solfataricus, Helicobacter, Chlorobium tepidum, Mycoplasma, Chlamydiales, Agrobacterium tumefaciens, Vibrio cholerae, Y. pestis, Salmonella, E. coli, Leptospira, Staphylococcus]

Can we exploit the geometry of the space to derive functional characteristics of groups of organisms?
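To make "a genome becomes a point" concrete, the sketch below treats each genome's vector of codon weights as a point and looks for the closest genome to each one. The Euclidean distance is an assumption of this sketch (the slides do not state which metric is used). The genome names and signature values are hypothetical toy data, and the helper names are invented for illustration.

```python
import numpy as np

def pairwise_distances(signatures):
    """Euclidean distance between every pair of genome signatures (rows)."""
    diff = signatures[:, None, :] - signatures[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nearest_neighbours(names, signatures):
    """For each genome, the name of the closest other genome in the space."""
    D = pairwise_distances(signatures)
    np.fill_diagonal(D, np.inf)  # a genome is not its own neighbour
    return {names[i]: names[int(np.argmin(D[i]))] for i in range(len(names))}

# Hypothetical signatures: one weight per codon (64 values in (0, 1]).
rng = np.random.default_rng(0)
base = rng.uniform(0.05, 1.0, size=64)
names = ["genome_A", "genome_B", "genome_C"]
signatures = np.vstack([
    base,                           # genome_A
    np.clip(base + 0.01, 0.0, 1.0), # genome_B: almost the same bias as A
    1.05 - base,                    # genome_C: a very different bias
])

D = pairwise_distances(signatures)
neighbours = nearest_neighbours(names, signatures)
print(neighbours["genome_A"])
```

With real signatures, small distances would be the starting point for asking whether neighbouring genomes share physiology or habitat.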
Phylogenetically related families: γ-proteobacteria
[Figure: γ-proteobacteria in codon space, labelled by order: Vibrionales/Alteromonadales, Xanthomonadales, Enterobacteriales, Pasteurellales. Similar physiology and habitat]

Organisms at small distance: similar physiology and habitat
Environmental clusters on 323 bacteria (Willenbrock et al., Gen. Biol. 2006):
• soil bacteria
• enterics
• symbionts
• spore formers
• small intracellular pathogens
• small extracellular pathogens

A recent study that we have conducted on 855 bacterial genomes coming from different environments (sediment, animal-associated habitat, soil, plant-associated habitat and marine-associated habitat) shows that there are a few preferred classes of biases for each environment, and that different environments display different codon biases.

Can we use this signal to identify the most important metabolic networks in an organism? (Carbone & Madden, J. Mol. Evol. 2005)

Metabolic networks and coding
Pathway Index: $\mathrm{PI}(P) = \operatorname{mean}_{g \in P} \mathrm{SCCI}(g)$
Relative Pathway Index: $\mathrm{RPI}(P) = (\mathrm{PI}(P) - \mu_M)/\sigma_M$

E. coli (EcoCyc network, P. Karp et al.)
• Histidine + purine + pyrimidine biosynthesis
• Non-oxidative branch of the pentose phosphate pathway
• Glycolysis
...and also:
• L-serine degradation (Pizer & Potochny 1964)
• Ammonia assimilation pathway (Reitzer 1986, Helling 1994)
• TCA cycle
• Aerobic respiration

Helicobacter pylori (BioCyc network, P. Karp et al.)
• Thioredoxin (Baker 2001)
• Riboflavin biosynthesis (Worst 1998)
• Glycolysis
Even genomes that do not grow rapidly might have signals of translational bias.

Metabolic pathways essential to Mycobacterium tuberculosis
Essential to M. tuberculosis but not to other bacteria:
• Biotin synthesis (Norman et al. 1994)
• Chorismate biosynthesis (Parish and Stoker 2002)
• Asparagine degradation (Sassetti et al. 2003)
• Pyridoxal 5'-phosphate biosynthesis (Sassetti et al. 2003)
• Valine degradation (Sassetti et al. 2003)
• Leucine biosynthesis (Sassetti et al. 2003)
• ppGpp (Primm et al. 2000)
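The Pathway Index and Relative Pathway Index used to rank pathways can be sketched as follows. PI(P) is the mean SCCI of the genes in pathway P, as defined on the slide. Treating μ_M and σ_M as the mean and standard deviation of PI over all pathways of the network M is an assumption of this sketch, and the gene names, SCCI values and pathway names are hypothetical toy data.

```python
from statistics import mean, pstdev

def pathway_index(pathway_genes, scci):
    """PI(P): mean SCCI over the genes g belonging to pathway P."""
    return mean(scci[g] for g in pathway_genes)

def relative_pathway_indices(pathways, scci):
    """RPI(P) = (PI(P) - mu_M) / sigma_M for every pathway in the network.

    Assumption: mu_M and sigma_M are the mean and standard deviation of
    PI taken over all pathways of the network M.
    """
    pi = {name: pathway_index(genes, scci) for name, genes in pathways.items()}
    mu = mean(pi.values())
    sigma = pstdev(pi.values())
    if sigma == 0:
        return {name: 0.0 for name in pi}
    return {name: (value - mu) / sigma for name, value in pi.items()}

# Hypothetical SCCI values per gene and a toy metabolic network.
scci_values = {"g1": 0.8, "g2": 0.6, "g3": 0.3, "g4": 0.2, "g5": 0.4}
toy_pathways = {
    "pathway_A": ["g1", "g2"],
    "pathway_B": ["g3", "g4"],
    "pathway_C": ["g5"],
}

rpi = relative_pathway_indices(toy_pathways, scci_values)
for name, value in sorted(rpi.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(value, 2))
```

Pathways with a positive RPI are made of genes that are, on average, more biased than the network as a whole; sorting by RPI gives a ranking of the kind shown for E. coli, H. pylori and M. tuberculosis.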
Towards a large scale mapping of metabolic preferences (Carbone & Madden, J. Mol. Evol. 2005)

Is there a signature of a microbial community, similar to the genomic signature seen for single bacterial genomes? (Alaeitabar & Carbone, unpublished, 2011)

Enrichment of functions within highly expressed genes in metagenomes
Roller M et al. Nucl. Acids Res. 2013;nar.gkt673
Conclusions
• These findings suggest that microbial communities are representable by genomic signatures, specific to different communities, as single genomes are.
• This might be true for bacterial communities but also for eukaryotic ones. For the latter, assembly is much harder, and our approach does not require large contigs for the analysis.
• The community-wide "optimization effect" is an important metagenomic feature with predictive power: genes with unknown function that are potentially important for the community can be identified.
• Likely, we will be able to rank metabolic functions and orthologous groups of genes at the system level. Such an effort is likely to be important to understand the adaptation of the entire metagenome to its particular environment. Can we draw a metabolic map for communities?

Analysis at the systemic level should go in parallel with an improvement of gene annotation.
We should work at the annotation level: in a test carried out on 51 bacterial genomes containing 159 930 CDS and 189 726 annotated domains (by Pfam), we found 28 107 new domains (Bernardes et al., submitted, 2013; Ugarte et al., in preparation, 2013).

Postdoctoral fellowship
Application form
… even after InterPro analysis. Table 1 gives an overview of proteins without functional prediction in genomes of different taxonomy. Since annotations are used to identify proteins involved in all kinds of biological processes in an organism, the development of novel computational techniques that lead to accurate protein function predictions is still crucial. The improvement of large-scale annotations will refine our quantitative understanding of the complex molecular machinery of living organisms.

Table 1: Percentage of proteins with unknown function in different genomes
[Table: columns Kingdom, Clade, Species, number of proteins, % of unknown proteins. The numeric values and the row layout are not recoverable from the extraction.
Kingdoms: Eukaryota, Bacteria, Archaea.
Clades: Metazoa, Fungi, Amoebozoa, Diplomonadida, Cryptophyta, Stramenopiles, Parabasalia, Proteobacteria, Elusimicrobia, Firmicutes, Actinobacteria, Cyanobacteria, Euryarchaeota, Korarchaeota, Thaumarchaeota, Crenarchaeota.
Species: Brugia malayi, Caenorhabditis elegans, Drosophila melanogaster, Anopheles gambiae, Ciona intestinalis, Saccharomyces cerevisiae, Entamoeba histolytica, Plasmodium falciparum, Dictyostelium discoideum, Giardia intestinalis, Giardia theta, Bigelowiella natans, Phaeodactylum tricornutum, Trichomonas vaginalis, Salmonella enterica, Helicobacter pylori, Elusimicrobium minutum, Staphylococcus aureus, Enterococcus faecalis, Streptococcus pneumoniae, Mycobacterium tuberculosis, Microcystis aeruginosa, Methanobrevibacter smithii, Methanococcus maripaludis, Halobacterium salinarum, Candidatus Korarchaeum cryptofilum, Cenarchaeum symbiosum A, Pyrobaculum oguniense]

A novel strategy to treat the problem
Here, we want to explore the space of homologous sequences (that is, sequences sharing a common ancestor) and construct
We should work at the annotation level:
Angela Falciatore, Juliana Bernardes, Tina Alaeitabar, Ari Ugarte