CACAO Remote Training

CACAO Training
ASM-JGI 2012
Transferring information to new genomes
[Diagram labels: Lists of genes; Database; Known functions of homologs or subsets; New knowledge]
Curation is rate limiting
[Diagram labels: Literature; Datasets; Biocurators (rate limiting); Database]
CACAO is growing
[Chart: schools, students, and annotations per semester, Spring 2010 through Spring 2012]
CACAO biodiversity
[Chart: annotations, Spring 2012]
CACAO 2
• CACAO changes the job of the professionals
from primary curation to assessment
• Growth in CACAO makes assessment rate
limiting
• Solution: Promote CACAO veterans to help
with assessment
BIOCURATORS
The biocurator training …
What’s in it for you?
– We hope you will
• learn how we think about protein function
• gain skills that will help your future career
• enjoy contributing to a resource used by people all over the world
• have fun!
Annotation
Annotation: a note that is made while
reading any form of text
For scientists,
1. Nucleotide level: Where the genes are in
the genome
2. Protein level: What their functions are
From Wikipedia
Functional Annotation
Annotation: a note that is made while
reading any form of text
Functional Annotation: a note in a
specific format that is made based on
evidence in a peer-reviewed paper about
the attributes of a protein
Functional Annotation
Functional Annotation: a note in a
specific format that is made based on
evidence in a peer-reviewed paper about
the attributes of a protein
• Specific format = GO (Gene Ontology) Annotation
GO (Gene Ontology) Annotations
• 3 aspects (ontologies) for
describing protein attributes:
1. Biological Process
2. Molecular Function
3. Cellular Component
• Controlled vocabulary
– Everyone uses the same terms
– Terms have 7-digit IDs that computers can understand
• Relationships between terms
GO:0005886
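The ID format described above ("GO:" followed by seven digits) can be checked mechanically. The sketch below is only an illustration: the function names `is_go_id` and `extract_go_ids` are made up for this example and are not part of GONUTS or any GO tool. It checks the format only, not whether the term actually exists in the ontology.

```python
import re

# Format check only: the prefix "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


def is_go_id(text: str) -> bool:
    """Return True if text is formatted like a GO term ID (hypothetical helper)."""
    return GO_ID_PATTERN.fullmatch(text.strip()) is not None


def extract_go_ids(text: str) -> list[str]:
    """Return every GO-style ID found in a piece of free text, in order."""
    # Word boundaries stop a longer digit run from matching partially.
    return re.findall(r"\bGO:[0-9]{7}\b", text)


if __name__ == "__main__":
    print(is_go_id("GO:0005886"))
    print(extract_go_ids("GO:0004347 hexokinase activity, GO:0016301 kinase activity"))
```

A well-formed ID can still point at the wrong term, so always confirm the term name in GONUTS, QuickGO, or AmiGO.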
Molecular Function
• activities or “jobs” of a gene product
GO:0004347 hexokinase activity
GO:0016301 kinase activity
From PMID:9341134, rndsystems.com
Biological Process
• a commonly recognized series of events
GO:0009405 pathogenesis
GO:0051301 cell division
GO:0006351 transcription, DNA-dependent
From ridge.icu.ac.jp, edtech.clas.pdx.edu, scielosp.org
Cellular Component
• where a gene product acts
GO:0005739 mitochondrion
GO:0009274 peptidoglycan-based cell wall
GO:0005840 ribosome
From visualphotos.com, epmm.group.shef.ac.uk, http://www.cellsignal.com/products/2415.html
Where can you search for GO terms?
- GONUTS: http://gowiki.tamu.edu
- QuickGO: http://www.ebi.ac.uk/QuickGO
- AmiGO: http://amigo.geneontology.org
What do you actually need once you
have found the correct term?
GO:0004713
Functional Annotation
Functional Annotation: a note in a
specific format that is made based on
evidence in a peer-reviewed paper about
the attributes of a protein
• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
Finding a scientific paper
• Has to be a scientific paper with experimental data
in it. (Anything else is a valid reason to challenge!!)
• No review articles, no books, no textbooks, no
Wikipedia articles, no class notes…
• You will need the paper's PMID (PubMed ID), e.g. 22110029
Functional Annotation
Functional Annotation: a note in a
specific format that is made based on
evidence in a peer-reviewed paper about
the attributes of a protein
• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
• Protein
What can you annotate? Proteins.
• Search PubMed for papers on a specific topic, protein, or GO term
• Search UniProt for something interesting (e.g. allergen) or a
protein of interest (e.g. PcnB)
• Check the references in the paper you are currently reading
No matter what, you will need to find the protein’s accession on UniProt
(http://uniprot.org)
Use that accession to make a page for that protein on GONUTS
(http://gowiki.tamu.edu)
Add your GO annotations to the protein’s page on GONUTS
Why do you need an accession from
UniProt (http://www.uniprot.org)?
1. UniProt is not editable by the community, but GONUTS is.
2. GONUTS can make a page that has the annotations from UniProt for
any protein using its UniProt accession.
3. Correct & complete annotations at the end of the competition will be
submitted back to UniProt.
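Because everything starts from the UniProt accession, it can help to confirm that what you copied is an accession and not a gene or protein name. The sketch below assumes the accession format documented in UniProt's help pages; verify the pattern there before relying on it. The helper name `is_uniprot_accession` is made up for this example, and the check covers the format only, not whether the entry exists.

```python
import re

# Assumed accession format (6 or 10 characters), as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """Return True if text looks like a UniProt accession (hypothetical helper)."""
    candidate = text.strip().upper()
    return UNIPROT_ACCESSION.fullmatch(candidate) is not None


if __name__ == "__main__":
    for candidate in ["P12345", "A0A022YWF9", "PcnB"]:
        print(candidate, is_uniprot_accession(candidate))
```

A protein name such as PcnB fails the check, which is the point: search UniProt by name first, then copy the accession from the entry.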
How do you make a new protein page
in GONUTS?
• GoPageMaker will:
– Check if the page exists in GONUTS & take you there if it does.
– Make a page if it does not exist in GONUTS already & pull all of the
annotations from UniProt into a table that you can edit.
• Make as many protein pages as you would like!
Functional Annotation
Functional Annotation: a note in a
specific format that is made based on
evidence in a peer-reviewed paper about
the attributes of a protein
• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
• Protein
Form for your annotation
(when you edit the table)
4 REQUIRED parts of EVERY GO annotation
GO
Reference
Evidence code
Notes (about evidence)
Summary of Evidence Codes for CACAO
Evidence codes describe the type of
work or analysis done by the authors
• IDA: Inferred from Direct Assay
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• ISO: Inferred from Sequence Orthology
• ISA: Inferred from Sequence Alignment
• ISM: Inferred from Sequence Model
• IGC: Inferred from Genomic Context
If it’s not one of these 7, your annotation is incorrect!!!
http://gowiki.tamu.edu/wiki/index.php/evidence_codes
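The "one of these 7" rule is easy to express as a lookup. The sketch below is an illustration only: `CACAO_EVIDENCE_CODES` and `check_evidence_code` are names invented for this example, not part of GONUTS. The table simply restates the seven codes listed above.

```python
# The seven evidence codes accepted in CACAO, as listed on the slide.
CACAO_EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "ISO": "Inferred from Sequence Orthology",
    "ISA": "Inferred from Sequence Alignment",
    "ISM": "Inferred from Sequence Model",
    "IGC": "Inferred from Genomic Context",
}


def check_evidence_code(code: str) -> str:
    """Return the full name of an accepted code, or raise ValueError."""
    normalized = code.strip().upper()
    if normalized not in CACAO_EVIDENCE_CODES:
        allowed = ", ".join(sorted(CACAO_EVIDENCE_CODES))
        raise ValueError(f"{code!r} is not accepted in CACAO; use one of: {allowed}")
    return CACAO_EVIDENCE_CODES[normalized]


if __name__ == "__main__":
    print(check_evidence_code("IDA"))
```

Passing the lookup only means the code is allowed. Whether it matches the experiment in the paper is still a judgment call, and a valid reason to challenge.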
Functional Annotation
Functional Annotation: a note in a
specific format that is made based on
evidence in a peer-reviewed paper about
the attributes of a protein
• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
• Protein
• Evidence code
4 REQUIRED parts of EVERY GO annotation
GO
Reference
Evidence code
Notes (about evidence)
2 other parts that may rarely be required…
Qualifier
With/From
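One way to picture the four required parts and the two rarely needed ones is as a record with a completeness check. This is a sketch under stated assumptions, not the GONUTS form itself: the class name `GoAnnotation`, its field names, and the "PMID:" reference format are hypothetical. The example values are IDs that appear elsewhere in these slides, paired arbitrarily, so they do not represent a real annotation.

```python
import re
from dataclasses import dataclass

# The seven evidence codes accepted in CACAO.
ALLOWED_EVIDENCE = {"IDA", "IMP", "IGI", "ISO", "ISA", "ISM", "IGC"}


@dataclass
class GoAnnotation:
    """Hypothetical record mirroring the fields of the annotation form."""
    go_id: str            # required, e.g. "GO:0004713"
    reference: str        # required, assumed format "PMID:<digits>"
    evidence_code: str    # required, one of ALLOWED_EVIDENCE
    notes: str            # required, where in the paper the evidence is
    qualifier: str = ""   # rarely required
    with_from: str = ""   # rarely required

    def problems(self) -> list[str]:
        """List what a challenger could flag as missing or malformed."""
        found = []
        if not re.fullmatch(r"GO:[0-9]{7}", self.go_id):
            found.append("GO ID must look like GO:0000000")
        if not re.fullmatch(r"PMID:[0-9]+", self.reference):
            found.append("reference must be a PMID")
        if self.evidence_code not in ALLOWED_EVIDENCE:
            found.append("evidence code is not one of the 7 CACAO codes")
        if not self.notes.strip():
            found.append("notes about the evidence are missing")
        return found


if __name__ == "__main__":
    # Placeholder values only; not a real annotation.
    draft = GoAnnotation("GO:0004713", "PMID:22110029", "IDA", "")
    print(draft.problems())
```

An empty list from `problems()` means the annotation is complete in form. It says nothing about whether the annotation is correct, which is what assessment and challenges are for.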
How is CACAO scored? Rounds
• Points for a complete AND correct annotation (normally 1 week/round, today = 25 mins)
– 4 necessary parts
– May be additional parts
– NOTE: We will take away points if the annotation is not correct when assessed by an experienced CACAO biocurator
• Challenges are used to steal points for incorrect &/or incomplete annotations (normally 1 week/round, today = 20 mins)
– Identify a problem
– Suggest correct alternative
• Refinements can be entered by any team (during any challenge week)
Scoreboard & Challenges
http://gowiki.tamu.edu/wiki/index.php/Category:ASM_JGI_challenge
Team & Individual Pages
Challenges
1. Enter the reason for your
challenge here.
- (i.e. What’s wrong)
2. Provide the fix(es) for it.
Annotation discussion (aka argument)
• UniProt – http://uniprot.org
– Find your protein(s) here (UniProt accession required)
• PubMed – http://pubmed.org
– Find your papers about the protein’s attributes (molecular function,
biological process, cellular component)
• GONUTS – http://gowiki.tamu.edu
– Search for GO terms
– Make page for your protein on GONUTS (using UniProt accession)
– Add your annotation to the protein’s Annotation table during first
(Annotation) week of any round
– Review and challenge competitors’ annotations during the second
(challenge) week of any round
ASM-JGI Competition!
• You now have 25 mins to:
– Use the assigned paper for your group and …
– Find the correct UniProt accession
– Make the page for the protein on GONUTS
– Make at least one annotation
• You will have 20 mins to challenge other
teams’ annotations
– What fields are wrong & why?!