Issue 5
FEATURED ARTICLE
Sally Deeb, Ph.D., Scientific Account Manager, Genedata GmbH, Munich, Germany
Automated Peptide Mapping for Quantitative Comparison of Biotherapeutics
The improved speed of peptide mapping has greatly expanded its utility in industrial settings. Genedata Expressionist® automates the peptide mapping process and standardizes routine high-throughput comparability studies. Workflows compute Critical Quality Attributes (CQAs) and enable better decision-making for process monitoring and stress testing. Complete automation can be combined with user validation of results based on a priori knowledge.
Background
Peptide mapping is an essential analytical technique for characterizing the primary structure of protein-based therapeutics. It is generally used for amino acid sequence confirmation, connectivity assessment, and characterization of post-translational modifications (PTMs). The high sensitivity of peptide mapping to the smallest covalent structural changes of a protein has also enabled its usage as a valuable ‘fingerprint’ for comparative analysis. In bioprocess development, for instance, peptide maps are employed for lot-to-lot identity testing. Likewise, peptide mapping is considered a vital step in comparing an innovator and a biosimilar.
Recent developments in separation techniques, MS instrumentation, and sample preparation procedures have allowed scientists to implement peptide mapping in their routine biotherapeutics characterization pipelines. While generating peptide maps has become easier, comparing them remains a tedious process. In addition, data collected over time shows variability in chromatograms due to shifts in elution times. With the upscaling of these experiments, analysts need to reliably identify and quantify peptides in an automated fashion as well as perform comparability studies to report out-of-tolerance CQAs.
Genedata Expressionist addresses these issues by providing an enterprise software platform for optimizing the analysis of peptide maps, automating the process, and performing subsequent comparative analysis.
Automated Workflow Concept
Genedata Expressionist ensures high quality, efficient and reliable analysis over time, thanks to automated workflows (Figure 1). The basic concept involves two main phases. The first comprises setting up a customized workflow and saving it for later use. This is normally done just once by an expert at the beginning of a project. The second phase is the execution phase, which simply involves loading the raw data and running the saved workflow with a one-click operation. When data processing is complete, Genedata Expressionist provides the user with immediate browsing and downstream analysis capabilities including statistical tests, visual verification of the results, and generation of customized reports.
The process of setting up a workflow starts by connecting
different activities which are suited for the specific application. This is usually followed by optimizing certain parameter
settings that meet the needs of the data type being analyzed.
Settings optimization is particularly important for noise reduction steps where the requirements may differ for different
data types or even the same type of data acquired on different
instruments. Genedata Expressionist offers optimized procedures to obtain good quality results from MS instruments
from different vendors. Following optimization, the customized workflow is saved for future applications. In addition,
users granted manager roles have the option of locking down
parameter settings. This introduces the concept of ‘approved’
workflows which can be shared among lab members. Consequently, ‘approved’ workflows allow standardizing downstream data analysis in labs working in GxP environments. Automation ensures consistency and efficiency in running standardized workflows, especially if comparisons need to be done over time. After running the saved workflow in Genedata Expressionist, it is possible to perform a variety of activities such as comparative analysis. Figure 1 illustrates a typical strategy following execution of a peptide mapping workflow which starts with manually reviewing results, followed by performing statistical tests, validating the corresponding candidates using visualizers, and finally generating a customized report.
Figure 1: Schematic representation of an automated workflow concept
[Panel labels: Workflow Set-up: Create (workflow is built by connecting activities suited for a specific application), Optimize (activity settings are tuned according to data specifics, e.g. noise reduction), Save (workflow is saved for later use; activity settings can be locked). Execution: Run, Review, Analyze, Validate, Report. Plot axes: RT, m/z]
Peptide Mapping Automated Workflow
The detailed components of a peptide mapping workflow are shown in Figure 2A. The peptide mapping activity is the core activity of the workflow where all settings specific to the search can be configured. The workflow also includes several steps of signal pre-treatment prior to the peptide mapping activity. In the case described in Figure 2, importing raw files is followed by data cleaning to eliminate noise characteristic of MS data. Genedata Expressionist offers the flexibility to optimize this step for the specific instruments employed for the analyses.
Following data cleaning is retention time (RT) alignment, a crucial step when comparing different samples. It corrects for drifts in RT which can result from technical variability in the chromatography setup, and produces aligned chromatograms that allow for accurate quantitative comparisons. It is possible to align samples against each other, or all samples against a reference which enables comparisons of data collected over time. After data cleaning and alignment, the objective of the second block of activities shown in Figure 2A is to detect peaks (centers and boundaries) and group isotopic clusters to be submitted to search algorithms.
The peptide mapping activity is specific to the application and this is where all calculations related to peptide identification and quantification are performed. In the general settings tab, tolerances can be configured. It is also possible to specify whether fragmentation spectra are required for identification, or matching by mass only is sufficient. The option of manual review of results can be activated here. If applied, a pop-up window with a list of all peptides identified is triggered before the final execution of the activity. This window allows the user to manually accept or reject peptides based on a priori knowledge or manual inspection of the data (Table 1).
The list of identified glycopeptides includes a candidate where the glycan identified does not fit to the expected pattern of classical glycans on an antibody (G0, G0F, and G1F). Importantly, it has the lowest score compared to the other glycopeptides. In this situation the glycosylation was considered to be a false positive and was rejected from the final results list as shown in Table 1.
Figure 2: A) Peptide mapping workflow; B) Modifications settings tab; C) Downstream data analysis and reporting
Table 1: Manual inspection of the results of the peptide mapping activity. In the review column (outlined in red) a green tick implies accepting the result, whereas a red block mark indicates a rejection.
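As an illustration of the retention time alignment step described in this section, the sketch below maps the retention times of one run onto a reference run by piecewise-linear interpolation between landmark peptides observed in both runs. It is a generic approach and is not claimed to be the algorithm used by Genedata Expressionist; the function name `align_rt` and the landmark values are hypothetical.

```python
from bisect import bisect_right


def align_rt(rt, landmarks):
    """Map a sample retention time onto the reference time axis.

    landmarks: (sample_rt, reference_rt) pairs for peptides seen in both runs.
    Times outside the landmark range are shifted by the offset of the
    nearest landmark (an assumption made for this sketch).
    """
    pts = sorted(landmarks)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    if rt <= xs[0]:
        return rt + (ys[0] - xs[0])
    if rt >= xs[-1]:
        return rt + (ys[-1] - xs[-1])
    i = bisect_right(xs, rt)  # now xs[i - 1] <= rt < xs[i]
    frac = (rt - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


# Hypothetical landmarks: (RT in drifted run, RT in reference run), in minutes.
landmarks = [(10.0, 10.0), (20.0, 19.6), (30.0, 29.0)]
aligned = [align_rt(rt, landmarks) for rt in (5.0, 15.0, 25.0, 35.0)]
```

Aligning every sample against one fixed reference, as in this sketch, is what makes runs acquired months apart comparable on a common time axis.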
Genedata Expressionist provides its own search and scoring algorithm for peptide mapping. The sequence information
(text or file) needs to be added in the sequence tab. Additional
input is required for searching for modifications, glycosylation, and disulfide bonding in the corresponding tabs. In the
modifications tab, for instance, it is possible to limit the search
space by setting restrictions on the number of modifications
allowed per peptide and/or on their positions (Figure 2B). This
helps in reducing the number of false positives when several
modifications are searched simultaneously. When searching
for glycosylated peptides, the glycosylation tab provides the
option of performing a library-based or a customized search
as well as the option to search for partial glycosylation. Disulfide bonding specification options include fixed, scrambled, or
de novo searches.
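To illustrate why restricting the number of modifications per peptide limits the search space, the following sketch enumerates deamidated candidates of a peptide and matches an observed mass within a ppm tolerance. It is a simplified illustration, not the Genedata Expressionist search or scoring algorithm; the function names, the 10 ppm tolerance, and the observed mass are assumptions. Residue masses are standard monoisotopic values.

```python
from itertools import combinations

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
DEAMIDATION = 0.98402  # mass shift per deamidated N or Q, in Da


def peptide_mass(sequence):
    """Monoisotopic mass of the unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def deamidated_candidates(sequence, max_mods):
    """List (sites, mass) for every form with at most max_mods deamidations."""
    sites = [i + 1 for i, aa in enumerate(sequence) if aa in "NQ"]
    base = peptide_mass(sequence)
    candidates = []
    for k in range(min(max_mods, len(sites)) + 1):
        for combo in combinations(sites, k):
            candidates.append((combo, base + k * DEAMIDATION))
    return candidates


def within_ppm(observed, theoretical, tol_ppm):
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# LGEYGFQNALIVR has two candidate sites (Q7, N8).
candidates = deamidated_candidates("LGEYGFQNALIVR", max_mods=1)
observed = 1479.7720  # hypothetical observed neutral mass
matches = [c for c in candidates if within_ppm(observed, c[1], tol_ppm=10.0)]
```

With `max_mods=1` the hypothetical mass matches both singly deamidated forms (Q7 and N8), which have the same mass. This is the situation where fragmentation spectra, rather than mass alone, are needed to localize the modification.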
After optimal customization of the peptide mapping workflow (Figure 2A), it is saved and parameter settings that need to be kept unchanged are locked in an approved workflow. Running approved peptide mapping workflows allows for standardized and efficient comparative studies such as matching a biosimilar to an innovator, assessing batch-to-batch variability, or monitoring manufacturing changes. When new peptide maps need to be analyzed, the saved workflow is simply executed. If manual review of results is activated, then the workflow will not be completed until the peptide list is manually reviewed and the peptides to be listed in the final results are accepted.
Once the results tables are obtained, it is possible to branch out from the saved workflow and perform statistical analysis on the spot (Figure 2C). As shown in the figure, the type of quantitative measure to be used is first specified in the data setup. This is followed by normalizing the data, performing statistical tests such as ANOVA, and finally generating a report that lists, for example, the top 20 significantly changing peptides.
Comparative Analysis of Peptide Maps
Peptide mapping is quite often a comparative procedure. When compared to a reference, peptide maps can be used to detect structural alterations. Identifying significant differences between the peptide map of a reference and that of a sample of interest often requires statistical analysis. In the activities shown in Figure 2C, percent abundance normalization was first employed to allow monitoring of changes in the expression levels of variable modifications (deamidations and oxidations) relative to their unmodified counterpart. The ANOVA test was subsequently performed to detect significantly changed peptides (Table 2). This test is often complemented with an Absent/Present search to identify peptides which exclusively show up in one group of samples. Importantly, all results tables are linked to visualizers which are associated with every activity in the workflow. This provides an excellent platform to visualize significantly changing candidates between samples.
Name                                                               Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
LGEYGFQNALIVR                                                         99.84     99.82     99.69     99.78     93.45     93.50
LGEYGFQNALIVR | Deamidated [N8]                                        0.16      0.18      0.31      0.22      6.55      6.50
LVNELTEFAK                                                            99.87     99.87     99.79     99.83     96.32     96.45
LVNELTEFAK | Deamidated [N3]                                           0.13      0.13      0.21      0.17      3.68      3.55
YNGVFQECCQAEDK | Carboxymethyl [C8 C9]                                99.35     99.39     99.04     99.09     73.24     72.42
YNGVFQECCQAEDK | Carboxymethyl [C8 C9] Deamidated [N2]                 0.65      0.61      0.96      0.91     26.76     27.58
YNGVFQECCQAEDKGACLLPK | Carboxymethyl [C8 C9 C17]                     97.73     97.36     96.81     97.05     62.64     60.54
YNGVFQECCQAEDKGACLLPK | Carboxymethyl [C8 C9 C17] Deamidated [N2]      2.09      2.43      2.88      2.68     32.00     34.10
YNGVFQECCQAEDKGACLLPK | Carboxymethyl [C8 C9 C17] Deamidated [Q6]      0.18      0.21      0.31      0.27      5.37      5.35
Table 2: Abundances (%) of significantly changed peptides. Samples 1 and 2 are the references, samples 3 and 4 are mildly stressed, while samples 5 and 6 are harshly stressed.
The chromatogram view, for instance, is an activity which can provide the classical mirror plots of the chromatograms to be compared (Figure 3). However, these mirror plots might suffer from problems related to co-eluting peptides or sub-optimal chromatographic separation. The 2D visualizers associated with all the activities of the peptide mapping workflow provide a more accurate approach to follow up results and verify statistically significant quantitative changes. Figure 4 illustrates the 2D visualization of two deamidations, one discovered by ANOVA (Figure 4A) to be significantly more highly expressed in the stressed sample, and the second found by the Absent/Present search (Figure 4B) to be exclusively present.
Figure 3: Mirror plot of the total ion chromatograms of two peptide mapping samples
These visualizers verify the results of both tests showing the higher level of expression of the deamidated peptides in the stressed sample. Additionally, validating the sequence identity of these peptides can be done by examining their corresponding fragmentation spectra which are visualized along with annotations in the peptide mapping activity results. Figure 4C shows the fragmentation spectrum of the deamidated peptide in Figure 4B overlaid with the fragmentation spectrum of its unmodified counterpart. Linked 2D visualizers are powerful tools for the validation of statistical analyses.
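The percent abundance normalization and ANOVA steps described in this section can be sketched as follows. This is a minimal illustration in plain Python, not the implementation in Genedata Expressionist. The function names and the raw intensities are hypothetical, while the grouped percentages are taken from Table 2 (reference, mildly stressed, harshly stressed). Only the F statistic is computed here; a p-value would additionally require the F distribution, for example via `scipy.stats.f_oneway`.

```python
def percent_abundance(intensities):
    """Express each form of one peptide as a percentage of all its forms."""
    total = sum(intensities.values())
    return {form: 100.0 * value / total for form, value in intensities.items()}


def anova_f(groups):
    """One-way ANOVA F statistic for groups of replicate values.

    Assumes at least one group has non-zero variance; otherwise the
    within-group term is zero and the ratio is undefined.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical raw intensities for the two forms of one peptide in one sample.
pct = percent_abundance({"unmodified": 9984.0, "deamidated": 16.0})

# Percent abundances from Table 2, grouped as reference / mild / harsh stress.
table2 = {
    "LGEYGFQNALIVR | Deamidated [N8]": [[0.16, 0.18], [0.31, 0.22], [6.55, 6.50]],
    "LVNELTEFAK | Deamidated [N3]": [[0.13, 0.13], [0.21, 0.17], [3.68, 3.55]],
}
f_values = {name: anova_f(groups) for name, groups in table2.items()}

# Rank peptides by F statistic, largest first, as a stand-in for a "top N" report.
ranked = sorted(f_values, key=f_values.get, reverse=True)
```

Normalizing to percent abundance before testing means the comparison reflects the modified fraction of each peptide rather than differences in the total amount loaded.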
Figure 4: 2D visualization of two deamidations, one discovered by ANOVA (A) to be significantly more highly expressed in the stressed sample, and the second found by the Absent/Present search (B) to be exclusively present in the stressed sample. Fragmentation spectrum of the unmodified peptide LVNELTEFAK (blue) overlaid with the fragmentation spectrum of the deamidated form, highlighting a deamidated fragment (C).
[Panel labels: Reference sample, Stressed sample, Deamidation, LVNELTEFAK; axes: RT, m/z]
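An Absent/Present search of the kind mentioned above can be sketched in a few lines. The criterion used here (detected in every sample of one group and in none of the other) is an assumption made for illustration, and the function name, sample names, and detection data are hypothetical. The actual criterion in Genedata Expressionist may differ.

```python
def absent_present(detected, group_a, group_b):
    """Find peptides detected in all samples of one group and none of the other.

    detected: mapping of peptide name to the set of samples where it was found.
    Returns a mapping of peptide name to the group it is exclusive to.
    """
    hits = {}
    for peptide, samples in detected.items():
        in_a = [s in samples for s in group_a]
        in_b = [s in samples for s in group_b]
        if all(in_a) and not any(in_b):
            hits[peptide] = "A"
        elif all(in_b) and not any(in_a):
            hits[peptide] = "B"
    return hits


# Hypothetical detection table: reference samples vs. stressed samples.
reference = ["ref_1", "ref_2"]
stressed = ["stress_1", "stress_2"]
detected = {
    "LVNELTEFAK": {"ref_1", "ref_2", "stress_1", "stress_2"},
    "LVNELTEFAK | Deamidated [N3]": {"stress_1", "stress_2"},
    "PEPTIDEK": {"ref_1", "stress_2"},  # inconsistent detection, not reported
}
exclusive = absent_present(detected, reference, stressed)
```

Requiring detection in every replicate of a group is a conservative choice: it avoids reporting peptides that appear sporadically, at the cost of missing low-abundance species that fall below the detection limit in some replicates.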
Summary
Automated data analysis is key in settings where peptide mapping is a routine procedure. Genedata Expressionist provides a flexible software platform that can be tailored to specific instrumentation and analytical methods. The platform allows running personalized workflows in an automated fashion, offering efficient and standardized data analysis. Complete automation can be combined with user validation of results based on a priori knowledge or manual inspection of the data. From raw data to final reports, Genedata Expressionist offers streamlined, high-quality data analysis with considerable time savings.
Genedata Expressionist® is part of the Genedata portfolio of advanced software solutions that serve the evolving needs of drug discovery, industrial biotechnology, and other life sciences.
Basel | Boston | Munich | San Francisco | Tokyo
www.genedata.com/expressionist | [email protected]
© 2016 Genedata AG. All rights reserved. Genedata Expressionist is a registered trademark of Genedata AG. All other product and service names mentioned are the trademarks of their respective companies.