RNAseq Workflows and Standards iPlant/USDA

iPlant/USDA-ARS Big Data workshop on
RNAseq Workflows and Standards
Dec 7-9, 2014 at CSHL
Organizers
Jason Williams, Kapeel Chougule,
Dewayne Shoemaker, Doreen Ware
Goals
• Foster competency in using iPlant, including knowing
the general features of the platforms, how they work
together, how to select and use platforms/tools, and
knowledge of each platform’s expectations and limitations
• Support trainers with domain knowledge at the most
basic level of analysing RNA-Seq data
• Cultivate support and commitment to a sustainable
network that provides access to shared training
materials, help, and recommendations for future
trainings and workshops.
Agenda
• 2 pre-workshop webinars: forming teams and
getting started with RNA-Seq assembly
• At the workshop
– working with data and metadata in iPlant (iDrop,
iCommands, DE, Sharing, Searching)
– working with the Discovery Environment
(apps/workflows/analyses) and Atmosphere
(image launching, visualizing and downstream
analysis)
– an introduction to XSEDE resources and iPlant
support mechanisms
Total participants: 23
Data
Insect
Plant
Parasite
Fish
Animal
Tools Utilized
Data Pre-Processing
FastX suite: trimmer, quality trimming, quality filter
HTProcessPipeline: includes Trimmomatic for quality trimming
Digital Normalization
Trinity Normalization
khmer normalization suite (based on DigiNorm)
de novo Transcriptome Assembly
Trinity
SOAPdenovo-Trans
Assembly Quality
CEGMA
Contig statistics
Conserved Domains
Transdecoder
BLAST
Create a BLAST database
Mapping Reads to de novo Assembly
Bowtie (trimmed reads against the
SOAPdenovo-Trans assembly)
Bowtie Build and Map
SAM to sorted BAM -> indexed BAM
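The mapping steps listed above (build a Bowtie index, map the trimmed reads, convert SAM to a sorted and indexed BAM) can be sketched as a list of command lines. This is a minimal dry-run sketch, not the workshop's actual commands: the file names and the `mapping_commands` helper are hypothetical, the Bowtie call assumes the positional index argument of Bowtie 1, and the `samtools sort -o` form assumes a recent samtools (1.3 or later). Exact flags may differ between tool versions, so the commands are only printed here, not executed.

```python
def mapping_commands(assembly_fasta, reads_fastq, prefix):
    """Return the mapping steps as argument lists (nothing is executed)."""
    index = prefix + "_index"          # hypothetical index base name
    sam = prefix + ".sam"
    sorted_bam = prefix + ".sorted.bam"
    return [
        # 1. Build a Bowtie index from the de novo assembly
        ["bowtie-build", assembly_fasta, index],
        # 2. Map trimmed FASTQ reads (-q) and write SAM output (-S)
        ["bowtie", "-q", "-S", index, reads_fastq, sam],
        # 3. Convert the SAM file to a coordinate-sorted BAM
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 4. Index the sorted BAM (writes a .bai file next to it)
        ["samtools", "index", sorted_bam],
    ]


# Hypothetical input files, for illustration only
commands = mapping_commands("assembly.fa", "trimmed_reads.fq", "sample1")
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the flags have been checked against the installed versions.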
Tools Utilized
khmer genie
RStudio
Trinotate – BLAST, SignalP, TMHMM, HMMER, RNAmmer
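The "Contig statistics" entry listed under Assembly Quality usually covers simple summaries such as contig count, total length, and N50. The slides do not say which statistics were reported, so the following is only an illustrative sketch with made-up contig lengths. N50 is taken here as the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly length.

```python
def n50(lengths):
    """Length of the contig at which the cumulative sum (longest first)
    reaches at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def contig_stats(lengths):
    """Basic summary statistics for a list of contig lengths."""
    return {
        "count": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Made-up contig lengths, for illustration only
example = [100, 200, 300, 400, 500]
print(contig_stats(example))
# {'count': 5, 'total_bp': 1500, 'longest': 500, 'n50': 400}
```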
Survey done after the workshop
Before the workshop how would you rate your level of
bioinformatics skills? (n=20)
[Bar chart; response categories: Beginner, Intermediate, Advanced]
How helpful was it to be working in teams? (n=20)
[Bar chart; response categories: Not helpful at all, Slightly helpful, Neutral/No opinion, Helpful, Very Helpful]
How helpful were the webinars that preceded the workshop?
[Bar chart; response categories: Not helpful at all, Slightly helpful, Neutral/No opinion, Helpful, Very Helpful]
How did the workshop impact on your ability to perform
bioinformatics analyses?
[Bar chart; response categories: Had no impact, Improved my ability slightly, Improved my ability immensely]
How prepared are you to help others use iPlant in the following ways? (n=20)
[Stacked bar chart; response scale: Unprepared, Somewhat prepared, Prepared, Very prepared. Tasks rated:]
• Create Atmosphere image
• Move data into/out of Atmosphere
• Connect to Atmosphere image
• Launch Atmosphere image
• Create workflow in DE
• Create App in DE
• Add/Modify App in DE
• Run an analysis in the Discovery Environment
• Input and manage metadata
• Run analyses in iPlant
• Share data within iPlant
• Upload data (iCommands)
• Upload data (iDrop/DE)
Barriers to using iPlant (Indicate how much you agree with the statement) (n=19)
[Stacked bar chart; response scale: N/A, Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree. Statements rated:]
• I can't get publishable results using iPlant resources
• iPlant support staff are not reliable/quick
• iPlant documentation and manuals are not helpful
• iPlant services are not reliable
• I find it difficult to use iPlant tools
• iPlant tools are not user friendly
Outcomes from workshop
Groups
• Group1 – Tools and workflows: Brenda Oppert, Anna
Bennett, Jamie Strange, Brian Rector, Neil Sanscrainte, and
George Yocum
• Group2 – Integration of new tools: Christopher Childers,
Guangtu Gao, Geoff Waldbieser, Monica Poelchau, Zaid Abdo
• Group3 – Metadata Standards: Michelle Graham, Joe Hull, Pia
Olafson, Lucy Stewart, Judy Chen, Deven See, and Brad
Coates (Lead)
• Group4 – Adoption (training and organizing webinars): Anja
Baldo, Stephen White, Linda Ballard, Brenda Oppert, Kristina
Friesen, Pia Olafson
Group1 – Prioritize tools and workflows
Brenda Oppert, Anna Bennett, Jamie Strange, Brian Rector, Neil Sanscrainte, and George Yocum
1. Upload and assemble an RNA and genomic data set.
2. Process data through annotation and post-assembly quality
control.
3. Downstream: Report to “integration” team.
4. Develop a mechanism for other ARS researchers and
collaborators to suggest improvements.
Group2 – Tool integration: ‘Install me!’
Christopher Childers, Guangtu Gao, Geoff Waldbieser, Monica Poelchau,
Zaid Abdo
1. Communication with the ‘Application’ group: create a
template with required metadata for requested applications
(program name, version, URL, application, justification, test
input files).
2. Develop a workflow for program installation, so that other
developers can be trained easily. This includes pushing the
finished apps to the Tester group, with sufficient
documentation/READMEs for the resulting app.
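The request template in item 1 could be represented as a small record with a completeness check. This is only a sketch under stated assumptions: the field names are taken from the list above, but the `AppRequest` class, the `missing_fields` helper, and all example values are hypothetical and are not part of any iPlant tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AppRequest:
    """Hypothetical record for a requested application (fields from item 1)."""
    program_name: str
    version: str
    url: str
    application: str        # what the program is used for
    justification: str
    test_input_files: list = field(default_factory=list)


def missing_fields(request):
    """Return the names of fields that are empty (blank string or empty list)."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Made-up example request; the URL is left blank on purpose
request = AppRequest(
    program_name="ExampleAssembler",
    version="1.0",
    url="",
    application="de novo transcriptome assembly",
    justification="Requested by the tools and workflows group",
    test_input_files=["reads_subset.fq"],
)
print(missing_fields(request))  # ['url']
```

A check like this would let the integration group reject incomplete requests before any installation work starts.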
Group3 – Metadata Standards Group
Members: Michelle Graham, Joe Hull, Pia Olafson, Lucy Stewart, Judy Chen, Deven See,
and Brad Coates (Lead)
• Emphasis was on following NSF standards and NCBI annotation descriptors.
• Across project areas (insect, plant, animal), collaborate with iPlant and Big Data
centers to implement standard associations with data uploads & DOIs.
• Database integration of metadata, sequence and assembly information into
a searchable database to ease retrieval, find/foster collaborations, and
highlight ARS outputs.
Group4 – Adoption (training and organizing webinars)
Anja Baldo, Stephen White, Linda Ballard, Brenda Oppert, Kristina Friesen, Pia Olafson
1. Identify holes in existing material; differences in standard iPlant vs. USDA practices
2. Webinars could go on the USDA YouTube channel
3. Announce releases of tools, images, workflows, via webinars (other tools that are widely successful)
4. Tie trainings into IDPs
5. Downloadable materials (tutorials, videos, etc.) at ARS website, SharePoint, or iPlant location
6. Assess adoption
i. Track training material downloads
ii. Track iPlant signups
iii. Ask what they hope to get out of the training when they sign up, make it brief
iv. Ask after a few months if expectations were met
v. Track USDA tags in user forum
7. Pre-workshop homework (successful component of current gathering)
i. needs to be clear, easy to complete
ii. Sub-groups of attendees to foster participation in pre-course materials
8. What about locations with poor connections, and other barriers to adoption?