Protocol for error testing in landmark based geometric morphometrics

23/05/2007
D. Adriaens
1. Data acquisition
Define as many potential landmarks as possible, landmarks that may be useful for later analyses.
During this test phase, the usefulness of each landmark will be screened, and the dataset will be
improved by removing uninformative landmarks.
Use some representative specimens of the collection that will be studied, including as wide a
range of variation as possible (so that you can judge whether the landmarks are useful for the
whole range of specimens to be studied).
Once you have potential landmarks, clearly define them: write down a definition of how to
recognise each landmark. This should already reduce the digitisation error in advance.
Select a number of specimens for digitisation error and orientation error testing, relying on the
following criteria:
• Shape variation in the dataset should be represented as much as possible in the
selected specimens for error testing. The more parameters you want to include in the
actual analysis, the more specimens need to be included in the selection for error
testing. Some parameters to take into account are:
i. if you have specimens of different species, include representative specimens of
each species;
ii. if you have specimens of different geographic locations, take one specimen
from each location;
iii. if you have specimens of different size or age, take a number of specimens
that span the range in size or age;
iv. if you have specimens of different sex, include specimens of different sexes.
• Dataset for digitisation error: for each representative group (see previous criteria), do
the following:
i. Position the selected specimen of a group for imaging as well as possible;
ii. Take a picture;
iii. Make a tps-file in tps-util that includes a link to each of those pictures;
iv. Open the tps-file in Notepad, and copy each item (thus “LM=0” followed by the
path of the image) an equal number of times (the number of replicas can
depend on the number of specimens you have in the error testing dataset), so
that you end up with n replicated images for each specimen;
v. Save the tps-file as the digitisation test file.
• Dataset for orientation error: for each representative group (see above), do the
following:
i. Position the selected specimen of a group for imaging as well as possible;
ii. Take a picture;
iii. Take out the specimen, and position it again;
iv. Take a picture, and repeat this procedure for each specimen an equal number
of times (again, the number may depend on the number of groups included),
so that you end up with n replicated orientations for each specimen;
v. Make a tps-file in tps-util, including a link to all images. Save the tps-file as
the orientation test file.
Beware that this procedure must be done for each type of dataset that will be analysed: if you
intend to include datasets of your specimens in dorsal, lateral and ventral view, you need to
construct a digitisation test file and an orientation test file for each of these views.
Digitisation and orientation error may differ depending on the view.
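The manual copying of items in the digitisation test file can also be scripted. The sketch below is one possible way to do it, assuming that every item in the tps-file starts with a line beginning with “LM=” and is followed by its image line; the function name replicate_tps_entries is hypothetical and not part of any tps program.

```python
def replicate_tps_entries(tps_text: str, n_replicas: int) -> str:
    """Repeat every item of an undigitised tps-file n_replicas times.

    Assumption: each item starts with a line beginning with 'LM='; all lines up to
    the next 'LM=' line (e.g. 'IMAGE=...') belong to that item.
    """
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")
    items = []
    for raw_line in tps_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.upper().startswith("LM="):
            items.append([])  # a new item begins here
        if not items:
            raise ValueError("found text before the first 'LM=' line")
        items[-1].append(line)
    replicated = []
    for item in items:
        for _ in range(n_replicas):
            replicated.extend(item)  # replicas of one specimen stay adjacent
    return "\n".join(replicated) + "\n"


# Example with two hypothetical images and three replicas each.
example = "LM=0\nIMAGE=specimen_a.jpg\nLM=0\nIMAGE=specimen_b.jpg\n"
digitisation_test = replicate_tps_entries(example, 3)
```

Keeping the replicas of one specimen adjacent makes it easier to recognise them later by their sequence number. Any other lines that belong to an item are copied along unchanged.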
2. Testing for digitisation error
Open the digitisation test file in tps-dig, and digitise all the landmarks as defined earlier, for all
the images, and save landmark coordinates. Now first screen for digitisation errors that can be
avoided in the future, such as:
• Landmarks that are switched in sequence during digitisation;
• Landmarks that are positioned in the wrong place (and thus do not strictly follow the
definition you formulated at the beginning), or landmarks for which your definition
proves to be insufficient.
Open tps-small, and verify under options whether scale align is set to ‘1’ and projection is set
to ‘orthogonal’ (unless otherwise wanted). Open your digitisation test file and compute the
Procrustes distances and tangent distances (press the ‘compute’ button). Press ‘view plot’, and
under options click ‘distance to reference’ (example on the left). Check the distribution of the
data points (each point is one digitised image) for abnormalities. If a small number of specimens
lie separated from the majority, you may be dealing with avoidable digitisation errors. Check
which specimens show this abnormal distribution (using the label number listed next to the data
points in the graph), and write them down. Also check the slope and regression coefficient of
the regression between the Procrustes distances and tangent distances in the report (File-View
report).
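The idea behind the ‘distance to reference’ screening can be illustrated outside the tps programs. The sketch below is a minimal Procrustes superimposition in NumPy for 2D landmarks, followed by the distance of every configuration to the consensus. It is an illustration under simplifying assumptions (a fixed number of iterations, no reflections, distance expressed as an angle in radians) and is not claimed to reproduce the tps-small output; all function names are hypothetical.

```python
import numpy as np


def center_scale(shape):
    """Centre a (landmarks x 2) configuration and scale it to unit centroid size."""
    shape = np.asarray(shape, dtype=float)
    centred = shape - shape.mean(axis=0)
    return centred / np.linalg.norm(centred)


def rotate_onto(shape, reference):
    """Rotate `shape` (no reflection) to its least-squares best fit on `reference`."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    if np.linalg.det(u @ vt) < 0:  # best fit would be a reflection: flip weakest axis
        u[:, -1] *= -1
    return shape @ (u @ vt)


def generalised_procrustes(shapes, n_iter=10):
    """Return the aligned configurations and their consensus (mean shape)."""
    aligned = np.array([center_scale(s) for s in shapes])
    consensus = aligned[0]
    for _ in range(n_iter):
        aligned = np.array([rotate_onto(s, consensus) for s in aligned])
        consensus = center_scale(aligned.mean(axis=0))
    return aligned, consensus


def distances_to_consensus(shapes):
    """Distance (angle in radians) of every configuration to the consensus."""
    aligned, consensus = generalised_procrustes(shapes)
    cosines = np.clip((aligned * consensus).sum(axis=(1, 2)), -1.0, 1.0)
    return np.arccos(cosines)


def rank_by_distance(shapes):
    """Indices of the configurations, most distant from the consensus first."""
    return list(np.argsort(distances_to_consensus(shapes))[::-1])
```

Configurations at the top of the ranking are the ones worth checking first for switched or misplaced landmarks. Whether they are real errors still has to be verified on the images, as described above.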
The next step is to check which landmarks might be involved in this error. Open the digitisation
test file in tps-relw. Calculate the consensus configuration (press the ‘consensus’ button on the
left), and visualise the consensus (click the ‘consensus’ button on the right). Under options,
click ‘vectors’. Now screen for abnormally long vectors around the consensus landmarks and write
them down. Then open the digitisation test file again in tps-dig, find the specimens you wrote
down, and check the position of the landmarks you wrote down. This should quickly show you the
cause of the digitisation error. Correct these errors and repeat both analyses until you see no
noticeable digitisation errors anymore. If you end up with some landmarks showing visible vectors
(of comparable length per landmark), and others without visible vectors (which means their
digitisation error is almost zero), the former are probably bad landmarks (as they are hard to
digitise in a standardised way). If you find no way to avoid that error, note these as potential
landmarks to remove from the error testing dataset later on (but do not remove them at this
point, as they may prove to be biologically informative after all) (see step 4).
Once those avoidable errors have been removed, click the ‘partial warps’ and ‘relative warps’
buttons on the left in tps-relw, followed by the ‘relative warps’ button on the right. This will
give you a plot of relative warp 1 vs
relative warp 2, with all the specimens and their replicas spread over the biplot (example on the
right). This plot already gives you an idea of the size of the digitisation error (the diameter of
the cluster of replicas per specimen) with respect to the variation between specimens (the
distance between the clusters). If the different specimens are not clearly separated, the set of
landmarks used is insufficient for further analyses (as your error is as large as your biological
variation).
Now screen the plot, looking at the way the replicas are distributed for each specimen. Replicas
of a specimen that show very little digitisation error are evenly spread in a circular pattern
(e.g. example below on the left). Clusters that show an abnormal (and thus probably avoidable)
pattern of digitisation error deviate from this pattern (e.g. example below on the right). By
clicking the [icon] button and moving the red circle that appears at the zero-point of the biplot
to the two extreme ends of this range, you can visualise the pattern underlying this non-random
digitisation error. The grid on the left shows the configuration for specimen 90, whereas the grid
on the right shows the configuration of specimen 83. The difference between the two then shows
the pattern of non-random digitisation.
Do this for all non-random digitisation errors, and identify the cause of each. Then go back to
the digitisation test file in tps-dig and correct those digitisations. Repeat the whole
procedure until you (ideally) end up with a biplot in tps-relw where all clusters have a circular
distribution. The digitisation error that remains is then unavoidable noise, but noise that is
reduced to its minimum.
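The visual check for circular versus elongated clusters can be backed up with a simple number. The sketch below assumes that the relative warp scores (RW1, RW2) of all replicas have been exported to a two-column array together with a specimen label per row. The function names and the use of an eigenvalue ratio as "elongation" are illustrative choices, not part of tps-relw.

```python
import numpy as np


def cluster_elongation(points):
    """Ratio of the longest to the shortest principal axis of a 2D point cloud.

    Values near 1 indicate a roughly circular cluster; larger values indicate a
    stretched (possibly non-random) pattern. Needs at least 3 non-collinear points.
    """
    points = np.asarray(points, dtype=float)
    eigenvalues = np.linalg.eigvalsh(np.cov(points, rowvar=False))  # ascending order
    return float(np.sqrt(eigenvalues[-1] / eigenvalues[0]))


def summarise_clusters(scores, specimen_ids):
    """Centroid, diameter and elongation of the replica cluster of each specimen."""
    scores = np.asarray(scores, dtype=float)
    ids = np.asarray(specimen_ids)
    summary = {}
    for sid in np.unique(ids):
        points = scores[ids == sid]
        gaps = points[:, None, :] - points[None, :, :]  # all pairwise differences
        summary[sid.item()] = {
            "centroid": points.mean(axis=0),
            "diameter": float(np.sqrt((gaps ** 2).sum(axis=-1)).max()),
            "elongation": cluster_elongation(points),
        }
    return summary
```

With only a handful of replicas per specimen the elongation value is noisy, so it is best used to rank clusters for closer inspection and not as a strict threshold. The diameters can be compared with the distances between centroids to judge how well the specimens separate.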
3. Testing for orientation error
The procedure for orientation error testing is analogous to that for digitisation error testing.
Open the orientation test file in tps-dig and digitise all the landmarks in all the images.
Subsequently, screen for all major digitisation errors using tps-small and tps-relw, as described
for the digitisation error, and correct the digitisations until these errors are removed. It is
important to be aware that the orientation test file also contains digitisation errors, just as
the digitisation test file did!
Also repeat the procedure to homogenise the distribution of the clusters of replicas in a
biplot of RW1 vs RW2. Of course, orientation error is now included, so non-random distributions
of clusters of replicas may remain when they are caused by orientation error. If, after having
carefully verified that the non-random clusters are not the result of digitisation error (see
above), the spread of the orientation replicas in a cluster remains substantially large, you may
want to take a new set of images of n orientation replicas of that specimen, after having
verified what explains the non-random pattern (by visualising deformation grids of extreme points
in a cluster, as done for the digitisation test). Then do the analyses again, to see whether the
plots improve.
4. Screening for useful landmarks
Once you have removed avoidable digitisation and orientation error, you may want to increase
the informative nature of your dataset by removing landmarks that are too fuzzy, and thus do
not contribute to the biological variation in shape (they only obscure the informative
landmarks). You can now delete the landmarks that you wrote down after having removed
digitisation errors (see step 2), that is, the ones where equally sized vectors are randomly
spread around the consensus landmark. First remove each landmark separately, and for each
resulting dataset do the relative warp analysis to see whether the variation within each cluster
decreases with respect to the variation between clusters. You can do that for both the
digitisation test file and the orientation test file, depending on whether landmarks may be
problematic at each of those levels. Especially check those clusters that showed a non-random
distribution in both tests.
Once you have done that for each separate landmark, you can test whether removing a
combination of two or more landmarks improves the cluster distribution even further. Once you
find no further improvement, continue with the dataset that retains the largest number of
informative landmarks, and define your final list of landmarks that you will use
for all further analyses.
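The landmark-by-landmark screening can be sketched numerically as well. The example below assumes the digitised replicas are available as one array per specimen with shape (replicas, landmarks, 2). It uses the ratio of mean within-specimen Procrustes distance to the mean distance over the whole test set as a stand-in for the visual comparison of cluster size and cluster separation. The function names and this particular ratio are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def procrustes_distance(a, b):
    """Procrustes distance (radians) between two (landmarks x 2) configurations."""
    a = a - a.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b - b.mean(axis=0)
    b = b / np.linalg.norm(b)
    m = a.T @ b
    s = np.linalg.svd(m, compute_uv=False)
    if np.linalg.det(m) < 0:  # best fit would need a reflection; disallow it
        s[-1] = -s[-1]
    return float(np.arccos(np.clip(s.sum(), -1.0, 1.0)))


def mean_pairwise(configs):
    """Mean Procrustes distance over all pairs of configurations."""
    return float(np.mean([procrustes_distance(a, b) for a, b in combinations(configs, 2)]))


def error_ratio(replicas):
    """Mean within-specimen distance divided by the mean distance over all replicas."""
    within = np.mean([mean_pairwise(reps) for reps in replicas.values()])
    everything = [config for reps in replicas.values() for config in reps]
    return float(within / mean_pairwise(everything))


def leave_one_landmark_out(replicas):
    """Error ratio after dropping each landmark in turn (landmark index -> ratio)."""
    n_landmarks = next(iter(replicas.values())).shape[1]
    return {
        lm: error_ratio({sid: np.delete(reps, lm, axis=1) for sid, reps in replicas.items()})
        for lm in range(n_landmarks)
    }
```

A landmark whose removal lowers the ratio clearly below that of the full set is a candidate for deletion. The same function can be applied again to the reduced dataset to test combinations of landmarks.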
5. Quantifying digitisation and orientation error
It is interesting to have an idea how large your digitisation and orientation error is with respect
to the variation between specimens. The lower the error, the more discriminative power your set
of landmarks will have during further analyses.
One way to measure variation in shape, whether natural or due to error, is to calculate the
Procrustes distances between all specimens (as a measure of biologically relevant shape
variation) and between all replicas in a cluster (as a measure of error).
To quantify the digitisation error, open the digitisation test file in tps-small and press
‘compute’. Open the report (File, View report…), and go to the bottom of the report, where you
get something like the example shown on the right. The values for Min, Max and Mean Procrustes
distances are the relevant data, which you can copy into an Excel sheet. The values listed here
(thus for the whole digitisation test file) are the minimum, maximum and mean shape distance
between all combinations of two specimens, thus including both the natural shape variation and
the shape variation due to digitisation error. Shape distances are given both in Kendall shape
space (Procrustes distance) and in tangent shape space (tangent distance). It is better to use
Procrustes distances for this quantification of error, as they are calculated in the original
shape space.
The next step is to calculate the same values, but now for each cluster of replicas. Open the
digitisation test file in Notepad and make a new file for each cluster of replicas (thus each
file contains the landmark coordinates and image paths for one specimen). You then
calculate the min, max and mean Procrustes distances for each set of replicas per specimen and
copy them into the same Excel sheet. Also include the values calculated for the total dataset
(see previous paragraph). As an overall measure of digitisation error, now extract the minimal
and maximal value observed over all separate specimens, and calculate the average of all mean
values for those specimens (values shown in yellow in table below):
You can then express the amount of digitisation error with respect to the total variation in
shape as a percentage, by calculating the ratio of the mean digitisation error (the average of
the per-specimen means) to the mean of the total dataset. In the table listed here, this would
mean that 4.1% of the observed variation is due to digitisation error. You can also visualise
the mean, as well as the ranges, of digitisation error with respect to total variation in a bar
diagram with error bars (see on the right).
You then do the same procedure for the
orientation test file, which should yield you a
percentage of error that is larger than the one calculated for digitisation error (as now also
orientation error is added). Assuming that the digitisation error in both test files is equally large,
by subtracting the percentage obtained from the digitisation test from that of the orientation
test, you get an estimate of the amount of orientation error only.
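The spreadsheet arithmetic described in this section can be written out as a short script. The sketch below assumes that the min, max and mean Procrustes distances per specimen, and the mean of the total dataset, have already been read from the tps-small reports. All numbers are hypothetical and the function names are illustrative.

```python
def summarise_error(per_specimen, total_mean):
    """Overall error summary from per-specimen (min, max, mean) Procrustes distances.

    per_specimen: {specimen label: (min, max, mean)} for the replicas of each specimen.
    total_mean:   mean Procrustes distance over the whole test file.
    """
    mins = [values[0] for values in per_specimen.values()]
    maxs = [values[1] for values in per_specimen.values()]
    means = [values[2] for values in per_specimen.values()]
    mean_error = sum(means) / len(means)  # average of the per-specimen means
    return {
        "min": min(mins),
        "max": max(maxs),
        "mean": mean_error,
        "percent_of_total": 100.0 * mean_error / total_mean,
    }


def orientation_only_percent(digitisation_percent, orientation_percent):
    """Estimate of orientation error alone, assuming equal digitisation error in both files."""
    return orientation_percent - digitisation_percent


# Hypothetical values for two specimens in each test file.
digitisation = summarise_error(
    {"spec_A": (0.002, 0.010, 0.005), "spec_B": (0.003, 0.012, 0.007)},
    total_mean=0.12,
)
orientation = summarise_error(
    {"spec_A": (0.004, 0.020, 0.011), "spec_B": (0.005, 0.022, 0.013)},
    total_mean=0.15,
)
orientation_alone = orientation_only_percent(
    digitisation["percent_of_total"], orientation["percent_of_total"]
)
```

With these made-up numbers the digitisation test gives about 5%, the orientation test about 8%, and the estimate for orientation error alone is about 3%. A negative difference would suggest that the assumption of equal digitisation error in both test files does not hold.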
The essential conclusion from all this is that the percentage obtained from the orientation test,
thus representing both digitisation and orientation error, should be sufficiently low. The lower, the
more discriminative power you may expect for any further analysis on other specimens, using the
same set of landmarks, the same procedure for digitisation and the same procedure for
orientation.