Transport Inference Parser:
Inferring Transport Reactions from
Protein Data for PGDBs
Thomas J Lee, Peter Karp, AIC BRG
Ian Paulsen consulting
Running the Transport Inference Parser
1. Run Pathway Tools.
2. Make the organism of interest the current organism.
3. [Run operon predictor].
4. Select Tools/Pathologic.
5. From Pathologic, select Refine/Transport Inference Parser.
6. If running TIP for the first time on the organism, optionally
provide its aerobicity.
7. Wait and observe progress.
8. When complete, the Probable Transporter Table window
appears.
9. You may now review and modify the inferred transporters.
Task Description
Infer transport reactions from protein data and
construct them in BioCyc KBs for a variety of
organisms, automatically where possible, with
human assistance where necessary.
Scope
• Run for all Tier 3 KBs (~700 KBs)
• To support both automated and user-controlled
operation:
– Distinguish high- and low-confidence inferences
– Automated mode accepts all high-confidence inferences
– Track evidence where possible
– Provide accept/reject/edit options to user
Output
Construct the following for each inferred transported
substrate:
– Transport-Reaction frame of correct subclass
• Assign compartments – use simple assumptions
– Enzymatic-Reaction frame linking protein to
reaction
Construct Protein-Complexes as required
Sequence of operations
1. Find candidate transporter proteins.
2. Filter out candidates.
3. Identify substrate(s).
4. Assign an energy coupling to transporter.
5. Identify compartment of each substrate.
6. Group subunits of transporter complexes.
7. Construct full compartmental reaction from
substrate and coupling.
8. Construct enzymatic reaction linking each reaction
with protein.
1. Find candidate transporter proteins
• Input: all protein frames of organism
• Output: internal data structure (PARTRANS)
• Exclude proteins with long annotations (default: 12
words)
• Tokenize the annotation
• Annotation must contain an indicator. Exs:
“transport”, “export”, “permease”, “channel”
2. Filter candidates
• Exclude if annotation matches a list of regular
expressions of counterindicator phrases and patterns
– Ex: “transport associated domain”
• Exclude if annotation contains counterindicator word
– Exs: “regulator”, “nuclear-export”
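Steps 1 and 2 can be sketched together in Python. This is a minimal illustration under stated assumptions, not the Pathway Tools implementation: the function names, the tokenizer, and exact-token matching are hypothetical, the word lists contain only the examples given on these slides, and "long" is assumed to mean more than 12 tokens.

```python
import re

# Only the examples from the slides; the real lists are presumably longer.
INDICATORS = {"transport", "export", "permease", "channel"}
COUNTER_PATTERNS = [re.compile(r"transport associated domain")]
COUNTER_WORDS = {"regulator", "nuclear-export"}
MAX_WORDS = 12  # stated default; "more than 12 tokens" is an assumption


def tokenize(annotation):
    """Lowercase the annotation and split on whitespace, commas, semicolons."""
    return [t for t in re.split(r"[\s,;]+", annotation.lower()) if t]


def is_candidate(annotation, max_words=MAX_WORDS):
    """Step 1: short enough and contains at least one indicator token."""
    tokens = tokenize(annotation)
    if len(tokens) > max_words:
        return False
    return any(tok in INDICATORS for tok in tokens)


def is_filtered_out(annotation):
    """Step 2: matches a counterindicator pattern or contains such a word."""
    text = annotation.lower()
    if any(p.search(text) for p in COUNTER_PATTERNS):
        return True
    return any(tok in COUNTER_WORDS for tok in tokenize(annotation))


def find_transporters(annotations):
    """Keep annotations that pass step 1 and survive step 2."""
    return [a for a in annotations
            if is_candidate(a) and not is_filtered_out(a)]
```

Calling `find_transporters` on a list of annotation strings returns only those that look like transporters under these simplified rules.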
3. Identify substrate(s)
Search annotation for names of MetaCyc compounds. Details:
Multiple substrates indicate multiple reactions,
symport/antiport pair, or both. Exs:
“cytosine/purines/uracil/thiamine/allantoin permease
family protein”
“magnesium and cobalt transport protein cora, putative”
“sodium:sulfate symporter transmembrane domain protein”
“probable agcs sodium/alanine/glycine symporter”
Exclude non-substrates that look like compounds via an
exception list. Exs: “as” “be” “c” “i”
3. Identify substrate(s) (cont.)
Name canonicalization.
Ex: strip plurals.
Affixed substrates.
Exs: “-transporting” “-specific”
Look up special ionic forms.
Exs: “cuprous” “ferric” “hydrogen”
Resolve multivalent options using aerobicity.
Exs: “FE” “CR” “MN”
Two-word substrates, substrate classes.
Ex: “amino acid”
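Part of the substrate search can be sketched in Python as follows. This is an illustration only: `KNOWN_COMPOUNDS` is a tiny hypothetical stand-in for a MetaCyc compound-name lookup, the function names are invented, and ionic forms and aerobicity-based resolution are not covered.

```python
import re

# Hypothetical stand-in for MetaCyc compound names. "as" (the element
# symbol for arsenic) is included only to exercise the exception list.
KNOWN_COMPOUNDS = {"sodium", "sulfate", "alanine", "glycine", "magnesium",
                   "cobalt", "cytosine", "purine", "uracil", "thiamine",
                   "allantoin", "amino acid", "as"}
EXCEPTIONS = {"as", "be", "c", "i"}       # from the slide
AFFIXES = ("-transporting", "-specific")  # from the slide


def canonicalize(token):
    """Strip known affixes, then a trailing plural 's' if needed."""
    for affix in AFFIXES:
        if token.endswith(affix):
            token = token[: -len(affix)]
    if token not in KNOWN_COMPOUNDS and token.endswith("s"):
        token = token[:-1]
    return token


def find_substrates(annotation):
    """Return compound names found in the annotation, in order of discovery."""
    tokens = [t for t in re.split(r"[\s,/:]+", annotation.lower()) if t]
    found = []
    # Two-word substrates and substrate classes, e.g. "amino acid".
    for first, second in zip(tokens, tokens[1:]):
        pair = f"{first} {canonicalize(second)}"
        if pair in KNOWN_COMPOUNDS and pair not in found:
            found.append(pair)
    # Single-word substrates, skipping words on the exception list.
    for tok in tokens:
        name = canonicalize(tok)
        if name in EXCEPTIONS:
            continue
        if name in KNOWN_COMPOUNDS and name not in found:
            found.append(name)
    return found
```

In this sketch, splitting on "/" and ":" is what lets a single annotation yield several substrates.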
4. Assign an energy coupling.
1. Search annotation for prioritized list of indicators. Exs:
("atp-binding" . ATP)
("mfs" . SECONDARY)
("pts" . PTS)
("phosphotransferase" . PTS)
("carrier" . SECONDARY)
("channel" . CHANNEL)
2. Some substrates imply a coupling. Ex:
protoheme => ATP
Absence of indicator => UNKNOWN
Some more sophisticated techniques were deferred:
• BLAST vs. E. coli
• HMM family identification
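The prioritized lookup for step 4 can be sketched in Python. The indicator pairs and the protoheme rule come from the slide; the function name, whole-token matching, and checking indicators before substrate-implied couplings are assumptions made for illustration.

```python
import re

# Prioritized (indicator, coupling) pairs, in the order shown on the slide.
COUPLING_INDICATORS = [
    ("atp-binding", "ATP"),
    ("mfs", "SECONDARY"),
    ("pts", "PTS"),
    ("phosphotransferase", "PTS"),
    ("carrier", "SECONDARY"),
    ("channel", "CHANNEL"),
]
# Substrates that imply a coupling (only the slide's example).
SUBSTRATE_COUPLINGS = {"protoheme": "ATP"}


def assign_coupling(annotation, substrates=()):
    """Return the first matching coupling, else UNKNOWN."""
    tokens = [t for t in re.split(r"[\s,;/]+", annotation.lower()) if t]
    # 1. Earlier indicators take priority over later ones.
    for indicator, coupling in COUPLING_INDICATORS:
        if indicator in tokens:
            return coupling
    # 2. Fall back to couplings implied by a substrate.
    for substrate in substrates:
        if substrate in SUBSTRATE_COUPLINGS:
            return SUBSTRATE_COUPLINGS[substrate]
    # Absence of any indicator.
    return "UNKNOWN"
```

Matching whole tokens rather than substrings avoids, for example, "pts" firing on an unrelated word that merely contains those letters.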
5. Identify compartment of each substrate.
Use keywords to determine compartment of primary
substrate (Exs: “export”, “antiporter”)
Otherwise assume primary substrate is transported
into cell (periplasm => cytoplasm)
Deferred complex compartment analysis:
• Assume E. coli-like cellular structure
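The keyword rule for step 5 might look like the Python sketch below. The two keywords are the slide's examples; treating both as meaning "primary substrate leaves the cell" is an assumption (for "antiporter" it follows the notional calcium/proton antiporter row shown later in the deck), and the function name and the "c"/"p" compartment codes are hypothetical.

```python
import re

# Keywords assumed to mean the primary substrate moves out of the cell.
OUTWARD_KEYWORDS = {"export", "antiporter"}


def primary_direction(annotation):
    """Return (source, destination) compartments for the primary substrate.

    "c" is cytoplasm and "p" is periplasm, assuming an E. coli-like cell.
    """
    tokens = [t for t in re.split(r"[\s,;/]+", annotation.lower()) if t]
    if any(tok in OUTWARD_KEYWORDS for tok in tokens):
        return ("c", "p")
    # Default: transported into the cell (periplasm => cytoplasm).
    return ("p", "c")
```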
6. Group subunits of transporter complexes.
Many transporters are systems of several proteins.
These are grouped into complexes
Grouping criteria; all must be met:
– Predicted coupling is ATP or PEP
– Predicted substrates are identical
– Genes of proteins have a common operon (NOTE
requirement on operon availability)
Resulting complex is added to KB under Protein-Complexes.
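The three grouping criteria can be sketched in Python as a group-by over predicted transporters. The record layout (dicts with `protein`, `coupling`, `substrates`, `operon` keys), the function name, and the assumption of one operon per gene are all hypothetical.

```python
from collections import defaultdict


def group_subunits(transporters):
    """Group proteins into candidate complexes.

    Each transporter is a dict with keys "protein", "coupling",
    "substrates" (list of names) and "operon" (None if unknown).
    """
    groups = defaultdict(list)
    for t in transporters:
        # Criterion 1: predicted coupling is ATP or PEP.
        if t["coupling"] not in ("ATP", "PEP"):
            continue
        # Criterion 3 needs operon data; skip proteins without it.
        if t["operon"] is None:
            continue
        # Criteria 2 and 3: identical substrates, common operon.
        key = (t["operon"], frozenset(t["substrates"]))
        groups[key].append(t["protein"])
    # A complex needs at least two subunits.
    return [members for members in groups.values() if len(members) > 1]
```

Proteins without operon information are never grouped in this sketch, which mirrors the note that operon predictions must be available.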
7. Construct full compartmental reaction from substrate and coupling.
Determine set of transported substrates for this transporter:
• For SECONDARY coupling:
– Identify auxiliary substrate providing ion gradient (H+, Na+)
– Remove from transported substrate list
– Place on side of reaction indicated by symport/antiport clues
• For other couplings:
– Determined previously in substrate analysis
7. Construct full compartmental reaction from substrate and coupling (cont.).
For each transported substrate of this transporter, either
import a reaction (from E. coli) or create a new one.
1. Search import KB for reaction with matching substrates
(find-rxn-by-substrates)
– Transported substrate added with indicated compartment
– Auxiliary substrates determined by coupling. Exs:
• CHANNEL reactions typically have none
• ATP reactions have ATP/H2O and ADP/phosphate
2. If one reaction is found, import:
(import-reactions trxns src-kb dst-kb …)
3. If multiple reactions found, retain all.
4. Else if reaction is not present in KB, create new rxn
7. Construct full compartmental reaction from substrate and coupling (cont.).
Create new reaction:
• Create reaction frame, subclass determined by coupling:
– (create-instance-w-generated-id rxn-class)
• Add transported and auxiliary substrates to appropriate
sides of reaction
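How a coupling determines the auxiliary substrates and their sides can be sketched in Python by building the reaction equation as a string. This is an illustration, not the Pathway Tools code: the function name and parameters are hypothetical, only the CHANNEL, ATP, and SECONDARY couplings are handled, and identical compounds on one side are not merged.

```python
def build_reaction(substrate, coupling, src="p", dst="c",
                   gradient_ion="H+", antiport=False):
    """Return a reaction equation for one transported substrate.

    src and dst are compartment codes ("p" periplasm, "c" cytoplasm).
    """
    left = [f"{substrate}[{src}]"]
    right = [f"{substrate}[{dst}]"]
    if coupling == "ATP":
        # ATP hydrolysis supplies the auxiliary substrates.
        left = ["H2O", "ATP"] + left
        right = ["ADP", "phosphate"] + right
    elif coupling == "SECONDARY":
        if antiport:
            # The gradient ion moves opposite to the substrate.
            left.append(f"{gradient_ion}[{dst}]")
            right.append(f"{gradient_ion}[{src}]")
        else:
            # Symport: the ion moves with the substrate.
            left.append(f"{gradient_ion}[{src}]")
            right.append(f"{gradient_ion}[{dst}]")
    # CHANNEL and any other coupling: no auxiliary substrates here.
    return " + ".join(left) + " = " + " + ".join(right)
```

For example, `build_reaction("Na+", "CHANNEL")` gives `Na+[p] = Na+[c]`, and a calcium/proton antiporter exporting calcium would be `build_reaction("Ca+2", "SECONDARY", src="c", dst="p", antiport=True)`.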
8. Construct enzymatic reaction linking each reaction with protein.
For each created reaction:
• (add-reactions-to-protein …)
• Added evidence code, history string arguments
• Subordinates new
[(import-reactions) handles import of enzymatic reactions]
GUI Overview
1. Window is titled: Probable Transporter Table for Organism
2. Table of inferred transporters is organized into columns:
– Status
– Gene
– Substrate
– Coupling
– Reaction / Function
3. Each row contains a transport reaction description:
– Multiple reactions per transport protein are possible
– Sort by Gene (the default) to keep them together visually
4. Aggregate pane shows counts by status.
5. Mousing over a reaction shows details in bottom pane.
Notional GUI Example
Status      Gene   Substrate  Coupling   Reaction / Annotation
Unreviewed  T0059  Ca2+       SECONDARY  Ca+2[c] + H+[p] = Ca+2[p] + H+[c]
                                         calcium/proton antiporter
Rejected    T3669  phosphate  ATP        H2O + ATP + phosphate[p] = ADP + 2 phosphate[c]
                                         phosphate transport atp-binding protein
Accepted    T0080  Na+        CHANNEL    Na+[p] = Na+[c]
                                         sodium channel
Reviewing and Editing
• Left-click on a row
– Dialog box appears
• May edit:
– Function (name)
– Energy coupling
• May invoke Reaction Editor on reaction
• May retract reaction
• May update status
Transporter Status
• Accepted:
– Incorporate transporter into PGDB upon save
• Rejected:
– Discard transporter upon save
• Unreviewed:
– Initial value of status
– Change to Accepted to preserve edits
Accept and Reject can be undone
Filtering and Sorting
• Filtering excludes transporters from display:
– Filter low- or high-confidence transporters
– Filter by status
– Filter by number of reactions per substrate
• Sort transporters by:
– Gene
– Energy Coupling
– Substrate number/name
– Status (e.g., Accepted, Rejected)
Group Operations
TIP permits en masse acceptance or rejection
of remaining predictions being shown:
Edit / Accept all Unreviewed
predictions being shown
Edit / Reject all Unreviewed
predictions being shown
Saving Your Work
The TIP has made in-memory modifications
to the KB; nothing has been saved.
Exit / Save saves all predictions & edits.
Exit / Cancel reverts to most recent save.
Multisession Workflow
1. TIP remembers accepted predictions in the KB.
2. TIP remembers rejected transporters in a file
under the organism directory.
3. To continue, re-run TIP and resume session.
4. If you don’t resume (i.e., start from scratch):
– Will not re-predict Accepted transporters
– Will re-predict Rejected transporters