23/05/2007 D. Adriaens

Protocol for error testing in landmark based geometric morphometrics

1. Data acquisition

Define as many potential landmarks as possible, i.e. landmarks that may be useful for later analyses. During this test phase, the usefulness of each landmark will be screened, and the dataset will be improved by removing uninformative landmarks. Use some representative specimens of the collection that will be studied, including as wide a range of variation as possible (so that you have an idea whether the landmarks are useful for the whole range of specimens to be studied).

Once you have potential landmarks, define them clearly: write down a definition of how to recognise each landmark. This should already reduce the digitisation error in advance.

Select a number of specimens for digitisation error and orientation error testing, relying on the following criteria:

• Shape variation in the dataset should be represented as much as possible in the selected specimens for error testing. The more parameters you want to include in the actual analysis, the more specimens need to be included in the selection for error testing. Some parameters to take into account are:
   i. if you have specimens of different species, include representative specimens of each species;
   ii. if you have specimens from different geographic locations, take one specimen from each location;
   iii. if you have specimens of different size or age, take a number of specimens that span the range in size or age;
   iv. if you have specimens of different sex, include specimens of both sexes.

• Dataset for digitisation error: for each representative group (see previous criteria), do the following:
   i. Position the selected specimen of a group for imaging as well as possible;
   ii. Take a picture;
   iii. Make a tps-file in tps-util that includes a link to each of those pictures;
   iv. Open the tps-file in notepad, and copy each item (thus “LM=0” followed by the path of the image) an equal number of times (the number of replicas can depend on the number of specimens you have in the error testing dataset), so you end up with n replicated images for each specimen;
   v. Save the tps-file as the digitisation test file.

• Dataset for orientation error: for each representative group (see above), do the following:
   i. Position the selected specimen of a group for imaging as well as possible;
   ii. Take a picture;
   iii. Take out the specimen, and position it again;
   iv. Take a picture, and repeat this procedure for each specimen an equal number of times (again, the number may depend on the number of groups included), so you end up with n replicated orientations for each specimen;
   v. Make a tps-file in tps-util, including a link to all images. Save the tps-file as the orientation test file.

Beware that this procedure must be done for each type of dataset that will be analysed. That is, if you intend to include datasets of your specimens in dorsal, lateral and ventral view, you need to construct a digitisation test file and an orientation test file for each of these views. Digitisation and orientation error may differ depending on the view.

2. Testing for digitisation error

Open the digitisation test file in tps-dig, digitise all the landmarks as defined earlier for all the images, and save the landmark coordinates.

First screen for digitisation errors that can be avoided in the future, such as:

• Landmarks that are switched in sequence during digitisation;
• Landmarks that are positioned in the wrong place (thus landmarks that do not strictly follow the definition you formulated in the beginning, or landmarks for which your definition proves to be insufficient).

Open tps-small, and verify under options whether scale align is set to ‘1’ and projection is set to ‘orthogonal’ (unless otherwise wanted).
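As an aside on step iv of the digitisation dataset in section 1: copying each item by hand in notepad could also be scripted. The following is only a minimal sketch, not part of the tps software. It assumes that every item in the tps-file starts with a line beginning with "LM=" (as in the “LM=0” items described above), and the function name `replicate_tps_records` is hypothetical.

```python
def replicate_tps_records(tps_text, n_replicas):
    """Return tps text in which every record is repeated n_replicas times.

    Assumption: each record starts with a line beginning with "LM=".
    """
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")

    records, current = [], []
    for line in tps_text.splitlines():
        # A new "LM=" line closes the previous record (if there is one).
        if line.upper().startswith("LM=") and current:
            records.append(current)
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append(current)

    out = []
    for record in records:
        # Replicas of one specimen are kept next to each other.
        for _ in range(n_replicas):
            out.extend(record)
    return "\n".join(out) + "\n"
```

To use it, read the tps-file made in tps-util into a string, pass it through the function with the desired number of replicas, and write the result to a new file that serves as the digitisation test file.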
Open your digitisation test file and compute the Procrustes distances and tangent distances (press the ‘compute’ button). Press ‘view plot’, and under options click ‘distance to reference’ (example on the left). Check the distribution of the data points (each point is one digitised image) for abnormalities. If a small number of specimens lie separated from the majority, you may be dealing with avoidable digitisation errors. Identify these specimens (using the label number listed next to the data points in the graph) and write them down. Also check the slope and regression coefficient of the regression between the Procrustes distances and tangent distances in the report (File-View report).

The next step is to check which landmarks might be involved in this error. Open the digitisation test file in tps-relw. Calculate the consensus configuration (press the ‘consensus’ button on the left), and visualise the consensus (click the ‘consensus’ button on the right). Under options, click ‘vectors’. Now screen for abnormally long vectors around the consensus landmarks and write down the landmarks concerned.

Now open the digitisation test file again in tps-dig, find the specimens you wrote down, and check the position of the landmarks you wrote down. This should quickly show you the cause of the digitisation error. Correct these errors and do both analyses again until you no longer see noticeable digitisation errors.

If you end up with some landmarks showing visible vectors (of comparable length per landmark), and others without visible vectors (which means digitisation error is almost zero), this probably means that the former are bad landmarks (as they are hard to digitise in a standardised way).
If you find no way to avoid that error, note these as potential landmarks to remove from the error testing dataset later on (see step 4), but do not remove them at this point, as they may prove to be biologically informative after all.

Once those avoidable errors have been removed, click the ‘partial warps’ and ‘relative warps’ buttons on the left in tps-relw, followed by the ‘relative warps’ button on the right. This will give you a plot of relative warp 1 vs relative warp 2, with all the specimens and their replicas spread over the biplot (example on the right). This plot already gives you an idea of the size of the digitisation error (i.e. the diameter of the cluster of replicas per specimen) with respect to the variation between specimens (the distance between the clusters). If you don’t see the different specimens nicely separated, the set of landmarks used is insufficient for further analyses (as your error is as big as your biological variation).

Now screen the plot, looking at the way replicas are distributed for each specimen. Replicas of a specimen that show very little digitisation error are evenly spread in a circular pattern (e.g. example below on the left). Clusters that show an abnormal (and thus probably avoidable) pattern of digitisation error deviate from this pattern (e.g. example below on the right). By clicking the relevant button and moving the red circle that appears at the zero-point of the biplot to the two extreme ends of this range, you can visualise which pattern underlies this non-random digitisation error. The grid on the left shows the configuration for specimen 90, whereas the grid on the right shows the configuration of specimen 83. The difference between both then shows the pattern of non-random digitisation.

Do this for all non-random digitisation errors, and determine the cause of each. Then go back to the digitisation test file in tps-dig and correct those digitisations.
Repeat the whole procedure until you (ideally) end up with a biplot in tps-relw where all clusters have a circular distribution. The digitisation error that remains is then unavoidable noise, but noise that is reduced to its minimum.

3. Testing for orientation error

Orientation error testing follows the same procedure as digitisation error testing. Open the orientation test file in tps-dig and digitise all the landmarks in all the images. Subsequently, screen for all major digitisation errors using tps-small and tps-relw, as described for the digitisation error. Correct the digitisations until these errors are removed. Be aware that the orientation test file also contains digitisation errors, just as the digitisation test file may have!

Also repeat the procedure in order to homogenise the distribution of the clusters of replicas in a biplot of RW1 vs RW2. Of course, orientation error is now included, so non-random distributions of clusters of replicas may remain when they are caused by orientation error. If, after carefully verifying that non-random clusters are not the result of digitisation error (see above), the spread of the orientation replicas in a cluster remains substantially large, you may want to include a new set of images of n orientation replicas of that specimen, after having verified what explains the non-random pattern (by visualising deformation grids of extreme points in a cluster, as done for the digitisation test). Then do the analyses again, to see whether the plots improve.

4. Screening for useful landmarks

Once you have removed avoidable digitisation and orientation error, you may want to increase the informative nature of your dataset by removing landmarks that are too fuzzy, and thus do not contribute to the biological variation in shape (they only obscure informative landmarks).
You can now delete the landmarks that you wrote down after having removed digitisation errors (see step 2), i.e. the ones where equally sized vectors are randomly spread around the consensus landmark. Remove each landmark separately first, and for each resulting dataset do the relative warp analyses to see whether variation within each cluster decreases with respect to the variation between clusters. You can do that for both the digitisation test file and the orientation test file, depending on whether landmarks may be problematic at each of those levels. Especially check those clusters that showed a non-random distribution in both tests.

Once you have done that for each separate landmark, you can test whether removing a combination of two or more landmarks improves the cluster distribution even further. Once you find no further improvement, use the dataset with the largest number of informative landmarks to continue from here, and define your final list of landmarks that you will use for all further analyses.

5. Quantifying digitisation and orientation error

It is useful to have an idea of how large your digitisation and orientation error is with respect to the variation between specimens. The lower the error, the more discriminative power your set of landmarks will have during further analyses. One way to measure variation in shape, whether due to error or natural, is to calculate the Procrustes distances between all specimens (as a measure of biologically relevant shape variation) and between all replicas in a cluster (as a measure of error).

To quantify the digitisation error, open the digitisation test file in tps-small and press ‘compute’. Open the report (File, View report…), and go to the bottom of the report, where you get something like the example shown on the right. The values for Min, Max and Mean Procrustes distances are the relevant data that you can copy into an excel sheet.
The values listed here (thus for the whole digitisation test file) are the minimum, maximum and mean shape distance between all combinations of two specimens, thus including both the natural shape variation and the shape variation due to digitisation error. Shape distances are given here in Kendall shape space (Procrustes distance) and in tangent shape space (tangent distance). It is better to use Procrustes distances for this quantification of error, as they are calculated in the original shape space.

The next step is to calculate the same values, but now for each cluster of replicas. For the digitisation test file, open the file in notepad and make a new file for each cluster of replicas (thus each file contains the landmark coordinates and image paths of all replicas of one specimen). Then calculate the min, max and mean Procrustes distances for each set of replicas per specimen and copy them into the same excel sheet. Also include the values calculated for the total dataset (see previous paragraph). As an overall measure of digitisation error, extract the minimal and maximal value observed across all separate specimens, and calculate the average of all mean values for those specimens (values shown in yellow in the table below).

You can then express the amount of digitisation error with respect to the total variation in shape as a percentage, by calculating the ratio of the mean value for digitisation error to the mean of the total dataset. In the table listed here, this would mean that 4.1% of the observed variation is due to digitisation error. You can also visualise the mean, as well as the ranges, of digitisation error with respect to total variation in a bar diagram with error flags (see on the right).

You then do the same procedure for the orientation test file, which should yield a percentage of error that is larger than the one calculated for digitisation error (as orientation error is now added).
Assuming that the digitisation error in both test files is equally large, subtracting the percentage obtained from the digitisation test from that of the orientation test gives an estimate of the amount of orientation error only.

The essential conclusion from all this is that the percentage obtained from the orientation test, which represents digitisation and orientation error combined, should be sufficiently low. The lower it is, the more discriminative power you may expect for any further analysis on other specimens, using the same set of landmarks, the same procedure for digitisation and the same procedure for orientation.
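The arithmetic of this section can be sketched in a short script, as a cross-check on the values copied into the excel sheet. This is a minimal sketch under stated assumptions, not a reimplementation of tps-small. It assumes 2D landmarks, and it takes the Procrustes distance between two configurations to be the angle between their centred, unit-size configurations after optimal rotation, so values may differ slightly from those in the tps-small report. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def preshape(landmarks):
    """Centre 2D landmarks [(x, y), ...] and scale them to unit centroid size."""
    z = [complex(x, y) for x, y in landmarks]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]


def procrustes_distance(a, b):
    """Angle (radians) between two 2D configurations after optimal rotation."""
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    # The modulus of the complex inner product removes the rotation.
    return math.acos(min(1.0, abs(inner)))


def distance_summary(configs):
    """Min, max and mean Procrustes distance over all pairs of configurations."""
    d = [procrustes_distance(a, b) for a, b in combinations(configs, 2)]
    return min(d), max(d), mean(d)


def error_percentage(clusters):
    """Average within-cluster mean distance as a percentage of the total mean.

    clusters maps a specimen label to the list of its replica configurations
    (at least two replicas per specimen).
    """
    all_configs = [c for replicas in clusters.values() for c in replicas]
    total_mean = distance_summary(all_configs)[2]
    cluster_means = [distance_summary(r)[2] for r in clusters.values()]
    return 100.0 * mean(cluster_means) / total_mean


def orientation_only(pct_orientation_test, pct_digitisation_test):
    """Estimate of orientation error alone, assuming equal digitisation error."""
    return pct_orientation_test - pct_digitisation_test
```

Calling `error_percentage` once on the replicas of the digitisation test file and once on those of the orientation test file gives the two percentages discussed above, and `orientation_only` gives their difference.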