
When investigating protein expression changes between different experimental conditions, the quality of the input data plays a major part in the robustness of the analysis and in the confidence with which true changes in spot patterns and expression levels can be measured. For this reason Progenesis PG240 provides a range of data quality measures. They fall into two broad categories: INCA-related measures, and measures derived from applying a bootstrapping method to the data.
What is INCA?
The Intelligent Noise Correction Algorithm (INCA) specifically identifies and quantifies noise within a 2D gel image. INCA is highly discriminatory and can distinguish true signal from even high levels of noise. Because noise can vary considerably across an image, it is not enough simply to apply a 'one size fits all' approach to the level of noise; instead, each pixel is statistically assessed so that it can be identified as either true signal or non-specific noise. INCA identifies two types of noise. The first is low-level, Gaussian-distributed sensor noise, generated by any capture device that utilises an electrical current. The second is the more visually obvious random noise, for example speckling from the crystallisation of certain stains, dust particles, the edges of tears and the like. Noise as defined here is not to be confused with the general background intensity of the image caused by the staining technique employed; this is handled in a separate 'background subtraction' operation.
Alternative noise-removal methods, such as median and averaging masks and blurring filters, assume that noise spikes are high-frequency and therefore apply a low-pass filter. This can distort the spots. Even a clean gel processed with a median filter can end up with distorted spots and hence altered results. In contrast, INCA does not affect a noise-free spot; such a spot is not altered in any way.
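To illustrate the point about median filtering, the sketch below applies a plain 3×3 median filter (written with NumPy for illustration; it is not part of Progenesis PG240) to a synthetic, noise-free Gaussian spot. Because the centre pixel of a smooth peak is the largest value in its neighbourhood, the median always pulls the peak down, so a clean spot is altered even though it contained no noise. The spot size, width and height are arbitrary assumptions.

```python
import numpy as np

def median_filter_3x3(image):
    """Apply a 3x3 median filter, repeating edge pixels at the borders."""
    padded = np.pad(image, 1, mode="edge")
    rows, cols = image.shape
    # Stack the nine shifted views that make up each 3x3 neighbourhood,
    # then take the median across them pixel by pixel.
    neighbours = np.stack([
        padded[dr:dr + rows, dc:dc + cols]
        for dr in range(3) for dc in range(3)
    ])
    return np.median(neighbours, axis=0)

# A noise-free synthetic spot: a 2D Gaussian with sigma = 1.5 pixels.
coords = np.arange(15) - 7
xx, yy = np.meshgrid(coords, coords)
spot = 1000.0 * np.exp(-(xx**2 + yy**2) / (2 * 1.5**2))

filtered = median_filter_3x3(spot)
print(f"peak before: {spot.max():.1f}, after: {filtered.max():.1f}")
print(f"volume before: {spot.sum():.1f}, after: {filtered.sum():.1f}")
```

The filtered peak drops to the value of the centre pixel's edge neighbours (about 80% of the original height), even though the input contained no noise at all.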
Spot detection is performed after the image has been INCA-processed, so it uses the corrected image and thus yields better-quality spot detection. Quantitation of spot material, however, is performed on both the original raw image file and the corrected image file, so the software provides INCA-corrected data alongside the uncorrected data. Importantly, it is left to the user to decide whether or not to make use of the corrected data.
Because INCA quantifies the noise component, an additional level of data, namely the noise, is associated with each detected spot. Various fields reporting the noise are available within the software, providing a means to identify (and, if required, remove) noisy spots in the data set. For example, INCA volume / noise expresses the INCA-corrected volume as a ratio to the noise associated with that spot. A good-quality spot should have a large INCA volume / noise value. This measurement is particularly useful when filtering spots on images to remove spots that consist primarily of noise.
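A minimal sketch of filtering on the INCA volume / noise field, assuming the field values have been exported from the data tables. The `Spot` record, the `filter_noisy_spots` helper and the cut-off of 5.0 are hypothetical; the two example values are those quoted for the spots in Figure 1.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    spot_id: str
    inca_volume: float
    inca_volume_over_noise: float  # the "INCA volume / noise" field

def filter_noisy_spots(spots, min_ratio):
    """Keep only spots whose INCA volume / noise ratio reaches min_ratio."""
    return [s for s in spots if s.inca_volume_over_noise >= min_ratio]

spots = [
    Spot("A", 23411269, 62.795),  # Figure 1A: mostly true signal
    Spot("B", 72706, 1.384),      # Figure 1B: large noise component
]

# Hypothetical cut-off; a suitable value depends on the data set.
kept = filter_noisy_spots(spots, min_ratio=5.0)
print([s.spot_id for s in kept])
```

Here only spot A survives the filter.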
This ability to separate the true signal from the noise component of a spot offers a powerful advantage when investigating expression changes. Comparing expression between gels that contain noisy data could lead to inaccurate expression changes being identified. Unless removed, any noise spikes present within protein spots contribute to the overall signal intensity, and hence the volume, of those spots. This is particularly problematic for low-level material. Because INCA removes such noise from the spot material, any expression differences based on INCA measurements will more accurately reflect the true spot data.
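The point about low-level material can be made with simple arithmetic. In this sketch all volumes and the spike size are hypothetical: the same noise spike is added to the control measurement of a low-level spot and of a high-level spot, each of which has a true two-fold increase.

```python
def expression_ratio(control, treated):
    """Fold change of a spot between two conditions (treated / control)."""
    return treated / control

spike = 500.0  # hypothetical noise contribution added to the control spot

for label, control, treated in [("low-level", 1_000.0, 2_000.0),
                                ("high-level", 100_000.0, 200_000.0)]:
    true_ratio = expression_ratio(control, treated)
    noisy_ratio = expression_ratio(control + spike, treated)
    print(f"{label}: true ratio {true_ratio:.3f}, with noise {noisy_ratio:.3f}")
```

The same 500-unit spike shrinks the apparent fold change of the low-level spot from 2.0 to about 1.33, while the high-level spot still reads about 1.99.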
Nonlinear Dynamics Group
[email protected] | www.nonlinear.com
Nonlinear Dynamics Ltd
Cuthbert House | All Saints | Newcastle upon Tyne | NE1 2ET | UK
tel: +44 (0)191 230 2121 | fax: +44 (0)191 230 2131
Nonlinear USA Inc. | toll free: 1-866 GELS USA
4819 Emperor Blvd | Suite 400 | Durham | NC27703 | tel: 919 313 4556 | fax: 919 313 4505
The results of INCA can be visualised within the software in a number of ways. Noise can be viewed directly in the 3D window, as shown in Figure 1.
Figure 1 3D views (panels A, B and C) showing INCA-identified noise in red.
A. The spot in view (highlighted in green) has only a small proportion of noise compared to its overall signal. Spot volume = 23488668, INCA volume = 23411269, INCA volume / noise = 62.795.
B. This spot has a considerable proportion of its volume attributed to non-specific noise. Spot volume = 122528, INCA volume = 72706, INCA volume / noise = 1.384.
C. The same spot as in B, after the noise has been removed.
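Using the volumes quoted in Figure 1, a short calculation shows how much of each spot's uncorrected volume is removed by INCA correction. This is only the difference between the uncorrected and INCA-corrected volumes; the noise value behind the software's INCA volume / noise field is reported separately and is not assumed here to equal this difference. The helper name is illustrative.

```python
def percent_removed(spot_volume, inca_volume):
    """Percentage of the uncorrected spot volume removed by INCA correction."""
    return 100.0 * (spot_volume - inca_volume) / spot_volume

# Values quoted in Figure 1.
print(f"Spot A: {percent_removed(23488668, 23411269):.2f}% removed")
print(f"Spot B: {percent_removed(122528, 72706):.2f}% removed")
```

This gives about 0.33% for spot A and about 40.7% for spot B, consistent with the contrast between panels A and B.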
Alternatively, the INCA-corrected data fields and the noise-related fields can be displayed in the data tables within the Progenesis PG240 software.
Data Quality Measurements
While INCA allows the user to correct the spot data for noise, the Progenesis PG240 software also provides tools to directly assess the quality of the detected spots. These are available through the Statistics Fields in the Measurement and Comparison Tables. Data quality is measured by applying a bootstrap re-sampling method to the spot data.
The available bootstrap statistics fields are bootstrap volume, bootstrap error and bootstrap coefficient of variation (CV).
Figure 2 Bootstrap fields used for data quality measurements. Access these via the field selection tabs of the data tables.
Nonlinear Technical Note – Data Quality
Bootstrapping
Bootstrapping (or re-sampling) is used to calculate confidence limits for a given measurement. If we assume that a given set of measurements is subject to some degree of noise, bootstrapping allows you to quantify the errors and attach a confidence to the measure. The procedure involves taking a random subset of the measured values and deriving some property from them. For example, a study may assume a linear relationship between two variables. In this scenario you would randomly select a subset of values and fit a line, then repeat this process to build a distribution of fitted lines. The result is a most likely fitted line, together with other lines that you can be confident bracket the actual value (usually taken as 3 standard deviations of the fitted-line distribution).
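The line-fitting example can be sketched in Python with NumPy. This version uses the standard bootstrap, resampling the points with replacement in each round rather than drawing a smaller subset; the synthetic data, the number of rounds and the random seeds are arbitrary choices for illustration.

```python
import numpy as np

def bootstrap_line_fit(x, y, n_rounds=1000, seed=0):
    """Bootstrap a straight-line fit.

    Each round resamples the (x, y) points with replacement and fits a
    line; returns the mean and standard deviation of [slope, intercept]
    across all rounds.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    fits = np.empty((n_rounds, 2))
    for i in range(n_rounds):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        fits[i] = np.polyfit(x[idx], y[idx], 1)   # [slope, intercept]
    return fits.mean(axis=0), fits.std(axis=0, ddof=1)

# Synthetic data: a true line y = 2x + 1 plus Gaussian noise.
data_rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + data_rng.normal(0.0, 1.0, size=x.size)

(mean_slope, mean_icpt), (sd_slope, sd_icpt) = bootstrap_line_fit(x, y)
print(f"slope = {mean_slope:.3f}, bracketed by +/- {3 * sd_slope:.3f} (3 SD)")
```

The mean slope plays the role of the most likely fitted line, and the mean plus or minus 3 standard deviations gives the bracketing lines described above.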
The bootstrapping method used in Progenesis PG240 adapts this procedure by choosing a subset of pixels from a spot and fitting a surface through them, from which a spot volume is calculated. Multiple surface fits are generated by repeated rounds of random selection of pixels across the spot surface. The generated surface fits are then combined to give a mean surface fit, whose volume is the bootstrap volume for that spot, and a bootstrap error corresponding to one standard deviation from the mean surface fit. The bootstrap CV is then calculated as:
bootstrap CV = 100 × bootstrap error / bootstrap volume
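The note does not describe the surface model that Progenesis PG240 fits, so the sketch below makes a strong simplifying assumption: every spot is modelled as a Gaussian of known centre and width, and only its height is fitted (by least squares) to each random subset of pixels. The helper names, the subset fraction, the number of rounds and the synthetic speckle noise are all hypothetical. Because volume is linear in the fitted height, the mean of the per-round volumes equals the volume of the mean surface fit.

```python
import numpy as np

def gaussian_template(size=21, sigma=3.0):
    """Unit-height 2D Gaussian centred on a size x size pixel grid."""
    coords = np.arange(size) - (size - 1) / 2
    xx, yy = np.meshgrid(coords, coords)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2))

def bootstrap_spot(image, template, n_rounds=500, frac=0.5, seed=0):
    """Return (bootstrap volume, bootstrap error, bootstrap CV %) for one spot."""
    rng = np.random.default_rng(seed)
    img, tpl = image.ravel(), template.ravel()
    k = int(frac * img.size)
    volumes = np.empty(n_rounds)
    for i in range(n_rounds):
        idx = rng.choice(img.size, size=k, replace=False)  # random pixel subset
        t, v = tpl[idx], img[idx]
        height = np.dot(t, v) / np.dot(t, t)   # least-squares surface height
        volumes[i] = height * tpl.sum()        # volume under the fitted surface
    boot_volume = volumes.mean()
    boot_error = volumes.std(ddof=1)           # one standard deviation of the fits
    boot_cv = 100.0 * boot_error / boot_volume
    return boot_volume, boot_error, boot_cv

template = gaussian_template()
clean = 1000.0 * template

# Hypothetical noisy copy: Gaussian sensor noise plus 20 bright speckles.
rng = np.random.default_rng(1)
speckle = np.zeros(clean.size)
speckle[rng.choice(clean.size, size=20, replace=False)] = 800.0
noisy = clean + rng.normal(0.0, 50.0, size=clean.shape) + speckle.reshape(clean.shape)

for name, image in [("clean", clean), ("noisy", noisy)]:
    vol, err, cv = bootstrap_spot(image, template)
    print(f"{name}: bootstrap volume {vol:.0f}, error {err:.1f}, CV {cv:.3f}")
```

On the clean spot every subset gives the same height, so the bootstrap error and CV are essentially zero; the speckled spot produces a spread of fits and a non-zero CV.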
Figure 3 Generalised curve representing the distribution of surface fits calculated for a spot (axes: frequency against bootstrap volume; labels: mean bootstrap volume, and 3 standard deviations = 3 × bootstrap error).
A relatively noise-free spot will have a bootstrap volume very similar to its INCA-corrected volume (and its uncorrected volume), and a very small bootstrap error, because the surface fits will all cluster tightly around the mean bootstrap volume. Importantly, its bootstrap CV should also be small, as the error will be very small in proportion to the mean bootstrap volume. This is shown in Figure 4.
Figure 4 Plot showing a relatively noise-free spot, where the surface fits are tightly clustered around the mean bootstrap volume and the bootstrap error is quite small (axes: frequency against bootstrap volume; label: 3 standard deviations = 3 × bootstrap error). The spot shown in the 3D view has volume = 5963678, INCA volume = 5921973, bootstrap volume = 5950442, bootstrap CV = 0.225.
A noisy spot, however, will most likely generate a mean bootstrap volume significantly different from its INCA volume, because any surface fit that includes the noisy pixels will give a bootstrap volume significantly different from the INCA volume. As many of the calculated surface fits may include noisy pixels, the mean bootstrap volume will have a larger associated error, reflecting the wider range of fits generated through the inclusion of noisier data. The bootstrap CV will therefore also be larger, as the bootstrap error is greater in proportion to the bootstrap volume. This is illustrated in Figure 5.
Figure 5 Handling of a spot associated with an appreciable level of noise (panels A, B and C).
A. A wider range of surface fits is generated when data from a noisy spot is analysed by bootstrapping, reflected in a larger bootstrap error (axes: frequency against bootstrap volume; label: 3 standard deviations = 3 × bootstrap error).
B. 3D view of a noisy spot. Spot volume = 404807, INCA volume = 275621, bootstrap volume = 142261, bootstrap CV = 147.
C. The same spot as in B, after the noise has been removed.
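The values quoted for the spots in Figures 4 and 5 can be cross-checked with the CV formula. Rearranging bootstrap CV = 100 × bootstrap error / bootstrap volume gives the bootstrap error implied by each quoted CV, and the second helper measures how far each bootstrap volume lies from the INCA volume. Only the quoted values are used; the helper names are illustrative.

```python
def implied_bootstrap_error(boot_volume, boot_cv):
    """Invert bootstrap CV = 100 x error / volume to recover the error."""
    return boot_cv * boot_volume / 100.0

def percent_difference(boot_volume, inca_volume):
    """Distance of the bootstrap volume from the INCA volume, in percent."""
    return 100.0 * abs(boot_volume - inca_volume) / inca_volume

# (INCA volume, bootstrap volume, bootstrap CV) as quoted in Figures 4 and 5.
spots = {
    "Figure 4 spot": (5921973, 5950442, 0.225),
    "Figure 5 spot": (275621, 142261, 147),
}

for name, (inca_vol, boot_vol, cv) in spots.items():
    err = implied_bootstrap_error(boot_vol, cv)
    diff = percent_difference(boot_vol, inca_vol)
    print(f"{name}: bootstrap error ~{err:.0f}, {diff:.1f}% from INCA volume")
```

The relatively noise-free spot has an implied bootstrap error of about 13,388 and a bootstrap volume within 0.5% of its INCA volume. The noisy spot's implied error (about 209,124) exceeds its bootstrap volume, and its bootstrap volume lies roughly 48% below its INCA volume.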
Using Bootstrapping Values to Explore Data Quality
Bootstrap CV is a useful measure for identifying particularly noisy spots. It is especially powerful because it expresses the bootstrap error as a proportion of the bootstrap volume. This makes it a more reliable measure of data quality than the bootstrap error alone, since the significance of a large error depends on how large the measured parameter is.
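A small example with two hypothetical spots shows why CV is the better ranking: the spot with the larger absolute bootstrap error is not the one whose measurement is least reliable.

```python
def bootstrap_cv(volume, error):
    """Bootstrap CV (%) = 100 x bootstrap error / bootstrap volume."""
    return 100.0 * error / volume

# Hypothetical spots: (name, bootstrap volume, bootstrap error).
spots = [
    ("large spot", 10_000_000.0, 50_000.0),  # CV = 0.5%
    ("small spot", 20_000.0, 10_000.0),      # CV = 50%
]

worst_by_error = max(spots, key=lambda s: s[2])[0]
worst_by_cv = max(spots, key=lambda s: bootstrap_cv(s[1], s[2]))[0]
print(f"largest error: {worst_by_error}; largest CV: {worst_by_cv}")
```

Ranking by error flags the large, well-measured spot, whereas ranking by CV correctly flags the small spot whose error is half its volume.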
Progenesis PG240 provides a number of direct methods for assessing data quality using the bootstrapping results.
1. Tables
Display the bootstrap CV field in the Measurements or Comparison table; the table can then be sorted on these values simply by clicking the bootstrap CV column header. Spots with the largest CVs, which therefore have a large bootstrap error compared to their bootstrap volume, will be at the top of the table (see Figure 6).
Figure 6 The spot selected in the table has a large bootstrap CV; inspection of the 3D view confirms that this spot has a large amount of associated noise.
Once the data has been sorted in this way, spots with large bootstrap CVs can, if required, be quickly removed from the analysis by filtering the table in Spot Filtering mode.
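The sort-then-filter workflow can be mimicked on exported table rows. In this sketch the spot numbers, the two extra CV values and the 20% cut-off are hypothetical; the CVs of 0.225 and 147 are those quoted in Figures 4 and 5. The software's own Spot Filtering mode is not reproduced here.

```python
# Hypothetical exported table rows: spot number and bootstrap CV (%).
rows = [
    {"spot": 101, "bootstrap_cv": 0.225},  # CV quoted in Figure 4
    {"spot": 102, "bootstrap_cv": 147.0},  # CV quoted in Figure 5
    {"spot": 103, "bootstrap_cv": 3.8},
    {"spot": 104, "bootstrap_cv": 35.0},
]

# Equivalent of clicking the column header: largest CVs first.
ranked = sorted(rows, key=lambda r: r["bootstrap_cv"], reverse=True)

MAX_CV = 20.0  # hypothetical cut-off; choose one suited to your own data
kept = [r for r in ranked if r["bootstrap_cv"] <= MAX_CV]

print([r["spot"] for r in ranked])  # noisiest spots at the top
print([r["spot"] for r in kept])
```

The noisy spot from Figure 5 heads the sorted list and is removed by the filter, while the quiet spot from Figure 4 is kept.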
2. Histograms
If bootstrap volume is selected for display in the Histogram window and standard deviation is selected as the Error type, the error shown by default is the bootstrap error. If preferred, 3 standard deviations can be displayed instead. If 'Coefficient of Variation' is selected as the Error type, the error shown is the bootstrap CV.
The error bars in the histogram provide a visual means of determining whether spots differ from one another, or whether the absolute difference in the measured parameter is outweighed by the level of noise associated with the spot measurements.
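Reading the histogram error bars amounts to asking whether two intervals overlap. The sketch below encodes that visual check for two hypothetical bootstrap volumes, with the bar half-width set to 1 or 3 standard deviations to mirror the Error type options. Non-overlapping bars are only a rough visual guide, not a formal statistical test.

```python
def error_bars_overlap(vol_a, err_a, vol_b, err_b, n_sd=1.0):
    """True if the +/- n_sd x error bars of two bootstrap volumes overlap.

    The bars [vol_a - n_sd*err_a, vol_a + n_sd*err_a] and
    [vol_b - n_sd*err_b, vol_b + n_sd*err_b] overlap exactly when the
    gap between the volumes is no larger than the sum of the half-widths.
    """
    return abs(vol_a - vol_b) <= n_sd * (err_a + err_b)

# Hypothetical bootstrap volumes and errors for one spot on two gels.
a = (50_000.0, 2_000.0)
b = (60_000.0, 3_000.0)

print("1 SD bars overlap:", error_bars_overlap(*a, *b, n_sd=1.0))
print("3 SD bars overlap:", error_bars_overlap(*a, *b, n_sd=3.0))
```

With 1-standard-deviation bars the two hypothetical measurements look distinct; with 3-standard-deviation bars the bars overlap, so the difference could plausibly be explained by noise.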
Figure 7 Histogram view showing bootstrap volume and
bootstrap error. The size of the error bars suggests that this spot
is not particularly noisy.
If you have questions about a product or need general information, please contact:
biostep GmbH
Meinersdorfer 47a
09387 Jahnsdorf
Germany
phone: +49 3721 3905-0
fax: +49 3721 3905-28
email: [email protected]
web: www.biostep.de