Introduction

PIBWin HELP FILE CONVERTED TO WORD
This tutorial was written by Dr. Trevor Bryant and goes far more in-depth than the Schnf Ashex
Tutorial in terms of PIBWin’s abilities.
PROBABILISTIC IDENTIFICATION OF BACTERIA for Windows (PIBWin) is a Windows version of the
DOS program PIB (also called Bacterial Identifier).
The programme has three major functions:
- the identification of an unknown isolate
- the selection of additional tests to distinguish between possible strains if identification is not achieved
- the storage and retrieval of results
It also has some utility functions for assessing the usefulness of identification matrices and for converting
matrices into different formats.
To achieve this, the programme uses Excel files to store identification matrices and archived results,
although other file formats are supported for backwards compatibility with the DOS version of the
programme.
Up-to-date information on the programme can be found on the PIBWin web site,
www.som.soton.ac.uk/staff/tnb/pib.htm, which can also be accessed from the Help menu.
The programme is designed to use probabilistic identification matrices that have either been published in the
literature or been created by the user. The matrices that are provided with PIB have been taken from the
literature. These matrices have been typed in from the publications describing them, and users should refer to
these publications for full details of the methods used when testing isolates.
Identification Matrix
The identification matrix is displayed when the Matrix tab is selected.
The matrix may be displayed as integer numbers (ranging from 1 to 99) representing the percentage
probability of obtaining a positive result, or as +/v/- depending on the value selected.
The default display is set in the Options window.
The view can be changed by clicking the right mouse button and checking or unchecking Display Matrix as
+/v/- on the pop-up menu.
To view the full name of a test or taxon, move the cursor over the item; a pop-up box will display the name in
full.
Sorting the identification matrix
The matrix can be sorted by double clicking on the name at the top of each column. The first double click
performs an ascending sort (negative results first), successive double clicks perform descending and
ascending sorts.
Note: the underlying identification matrix is not affected by sorting, because the Matrix tab displays only a view of it.
To return to the original order, either click the right mouse button and select Revert to original order, or
select another tab and then return to the Matrix tab.
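The distinction between the sorted view and the underlying matrix can be illustrated with a minimal Python sketch. The data layout (a list of rows keyed by taxon and test name) and the function name sorted_view are assumptions made for illustration; they are not taken from PIBWin itself.

```python
# Hypothetical miniature matrix: one row per taxon, percentage values per test.
matrix = [
    {"taxon": "a", "test1": 1, "test2": 20},
    {"taxon": "b", "test1": 95, "test2": 1},
    {"taxon": "c", "test1": 99, "test2": 10},
]


def sorted_view(rows, column, descending=False):
    """Return a new list ordered by one column; the input list is not modified."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Ascending sort puts the lowest percentages (negative results) first.
view = sorted_view(matrix, "test2")
print([row["taxon"] for row in view])    # order in the sorted view
print([row["taxon"] for row in matrix])  # original order is unchanged
```

Because sorted() builds a new list, discarding the view is all that is needed to return to the original order.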
Results
The Results tab is where the results for an unknown strain are entered.
There are four aspects to the Results screen:
- Details Bar
- Results Grid
- Entering Results
- Buttons
Details Bar
The details bar is where a personal key, the source of the isolate and details about the isolate can be
entered.
Key can be a maximum of 15 characters. A key must be entered if the results are to be saved to an Archive
file for recall at a later time.
Source is a drop-down list box that accepts text of up to 50 characters. To encourage consistent entry of
source text, existing values from the Archive file are displayed in the drop-down list, so
the list will grow in length over time.
Details provides for a maximum of 255 characters.
The Save button is enabled when one result has been entered and there is an entry in the Key box; it is only
shown on the Identification and Additional Tests tabs.
Note: If an isolate is recalled from the Archive file and the key is changed, Save will create a new,
additional record in the Archive file.
Results Grid
Results can be entered in a grid or list format. This is controlled by the status of the Use List Format for
Results check box.
Grid format accommodates a 96-well microtitre plate layout. The full name of each test is
shown in a pop-up box when the cursor is placed over the test name.
List format is a scrolling list.
Entry of Results
Results can be entered using the keyboard or the mouse. There are four possible states for a result:
positive (+), negative (-), indeterminate (?) and not done (missing).
The indeterminate state is for tests that have been carried out but whose result is difficult to
interpret, leaving you undecided. It lets you record that the test has been done, rather than leaving
the result as missing.
Result          Key                       Function Key    Mouse Action
Positive        + or =                    F2              Left click
Negative        - or _                    F3              Right click
Indeterminate   ? or /                    F4
Missing         <space bar> or <Enter>    F5              Repeat click
The programme has been written so that the Shift key does not have to be pressed to obtain the + or ?
symbol, although some keyboard layouts may differ.
To change a result, press the key for the new value.
To remove a result using the mouse, click a second time.
Note: because of the way the mouse works, the first left click sometimes only selects the cell, so an
additional click is needed.
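The keyboard assignments described above amount to a lookup from key to result state. The following Python sketch illustrates that idea only; the key names ("F2", "Enter" and so on), the state symbols and the function apply_key are assumptions for illustration and do not describe PIBWin's internal code.

```python
# Symbols used here for the four result states (an assumption for illustration).
POSITIVE, NEGATIVE, INDETERMINATE, MISSING = "+", "-", "?", ""

# Each state can be reached by two character keys and one function key.
KEY_TO_RESULT = {
    "+": POSITIVE, "=": POSITIVE, "F2": POSITIVE,
    "-": NEGATIVE, "_": NEGATIVE, "F3": NEGATIVE,
    "?": INDETERMINATE, "/": INDETERMINATE, "F4": INDETERMINATE,
    " ": MISSING, "Enter": MISSING, "F5": MISSING,
}


def apply_key(current, key):
    """Return the new state of a result cell; unrecognised keys leave it unchanged."""
    return KEY_TO_RESULT.get(key, current)


state = apply_key(MISSING, "=")   # unshifted '+' key records a positive result
state = apply_key(state, "F3")    # pressing another key replaces the result
print(repr(state))
```

Pairing "=" with "+" and "/" with "?" is what allows the unshifted key to be used on keyboard layouts where those symbols share a key.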
Buttons
Reset
Clears the results of the current isolate and resets them all to missing. The details are left unchanged.
New
Clears the results and the details of the current isolate and resets all results to missing.
Recall
Recalls the results of a previous isolate from an Archive file.
Archived Results
The Archived Results screen displays the details and identification of previously entered isolates. If an
Archive file is not already open, an Open window is displayed when the Recall button is pressed in the Results
window.
To recall the results of a previous isolate, double-click on the row of the isolate.
Sorting the Archived Results
Each column of information can be sorted. Click on a column heading to sort the archived isolates into
ascending order; a second click reverses the sort into descending order.
Searching the Archived Results
The Find button activates a search of the archived results. Searching is case-insensitive and does not
support wildcards or complex search expressions. Once a hit has been obtained, the Find Next button is
enabled to permit further searching.
Searching is performed across all rows and columns, excluding the first column.
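The search behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions: the archive is represented as a list of rows, a hit is taken to be a plain substring match, and the function name find_next is hypothetical.

```python
def find_next(rows, text, start=0):
    """Return (row, column) of the next case-insensitive hit at or after row
    `start`, or None if there is no further hit. Column 0 is never searched."""
    needle = text.lower()
    for r in range(start, len(rows)):
        for c, cell in enumerate(rows[r]):
            if c == 0:
                continue  # the first column is excluded from searching
            if needle in str(cell).lower():
                return r, c
    return None


# Hypothetical archive rows: key, source, details.
archive = [
    ["K1", "Soil", "Streptomyces isolate"],
    ["soil", "Water", "unknown"],
    ["K3", "SOIL sample", "x"],
]

first = find_next(archive, "soil")
second = find_next(archive, "soil", start=first[0] + 1)  # the "Find Next" step
print(first, second)
```

The second row is skipped even though its first column contains the search text, because that column is excluded.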
Technical details
The software supports two types of Archive file: Excel and DOS Archive.
The DOS Archive format is provided for backwards compatibility with the previous DOS version of this software.
Its use is not recommended, because it contains less information about isolates and is less flexible.
The Excel format is recommended.
The Excel Archive file can be opened and manipulated in Microsoft Excel. This enables the data to be used
by other software packages and unwanted isolate information to be deleted. DO NOT CHANGE the order of the
columns in the Archive file. This would make the file unusable with the identification matrix. The software
performs some internal checks to detect discrepancies between the Identification matrix
file and the Archive file, but these are not foolproof. It is a case of user beware. If you wish to
experiment, make sure that you have taken backups of your files before they are modified.
Identification
The identification tab is shown once a test result has been entered in the Results window.
Additional Tests
This tab is available when Identification is not successful and more than one taxon is a possible candidate
for the unknown isolate.
Tests may be chosen in two ways:
- they may be selected so that the most likely taxon can be distinguished from other likely taxa.
- they can be selected to distinguish likely taxa from each other.
Use the radio buttons to choose the method of test selection, then use the spin edit box
to choose the number of taxa to be considered.
Use Select Tests to obtain the list of tests to be used.
Move the cursor over the strains and tests to obtain the name in full in a pop up window.
The Exclude Tests button allows you to specifically omit certain tests before test selection is
carried out.
See Also Test Selection Algorithm
Exclude Tests
The Exclude Tests window is used by the Additional Tests and Select Best Tests for Matrix procedures.
A list of tests in the current matrix is displayed. Those tests that will be omitted from the test selection
procedure are shown with an asterisk * in the Excluded column.
Tests can be included or excluded by clicking on the Excluded column.
Include All Tests is used to include all tests in the Test Selection procedure.
Exclude All Tests is used to exclude all tests from the Test Selection procedure; the tests that are
required can then be selected by clicking in the Excluded column.
Tools
The Tools menu options provide functions for manipulating matrix files and investigating the properties of
an identification matrix.
Convert Matrix
The Identification matrix file can be written in one of three formats:
- Excel [*.xls]
- Comma separated values [*.csv]
- Fixed format [*.mat]
The Excel format is recommended because it contains more information than the other two formats.
The fixed format is for backwards compatibility with the original DOS version of this software and its use
is not recommended.
Convert DOS archive
This allows the Archive file created by the original DOS version of this software to be rewritten in the
Excel archive format. It is strongly recommended that you convert old Archive files.
Note: a new Archive file is created and the original Archive file is left untouched.
Select Best Tests
This allows investigation of the current matrix to determine which are the most important tests in the
matrix. See Select Best Tests for Matrix for further details.
Calculate Matrix ID scores
This allows investigation of the current matrix to determine if there is an overlap between strains in the
matrix. See Matrix ID scores for further details.
Select Best Tests for Matrix
This procedure is called from the Tools menu. It can be used to select the minimum number of tests needed to
distinguish taxa in an identification matrix.
Tests may be chosen in two ways:
- they may be selected so that one taxon can be distinguished from other strains (taxa).
- they can be selected to distinguish all strains (taxa) from each other.
Use Select Tests to obtain the list of tests to be used.
Move the cursor over the strains and tests to obtain the name in full in a pop up window.
The Exclude Tests button allows you to specifically omit certain tests before test selection is
carried out.
See Also Test Selection Algorithm
Matrix ID Scores
The Matrix ID scores procedure is called from the Tools menu. It is used to assess whether the
identification matrix is capable of identifying each taxon (strain) that is contained in it. The procedure
considers each taxon in turn and treats each percentage probability for that taxon as a positive or negative
result, creating a Hypothetical Median Organism (HMO). It then uses this HMO to calculate an
Identification Score using the Willcox probability. If any probabilities of 50 are encountered (missing
data is typically coded as 50), the identification score is calculated in three ways. Tests where a value
of 50 is found for the taxon are:
- excluded
- all treated as positive results
- all treated as negative results
These results are shown as ID Score, Missing Positive and Missing Negative respectively.
If the ID score does not exceed the Identification Threshold, the strain with the second highest
identification score is listed in the Next Strain column.
Ideally the ID Score and Missing Positive and Missing Negative columns should display values of 1.00000.
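The Matrix ID score procedure can be sketched in Python as below. This is a minimal illustration, not PIBWin's implementation: the text does not state the exact rule for turning a percentage into an HMO result, so the sketch assumes values above 50 count as positive and values below 50 as negative. The function names and the small matrix (the three-taxon example from the Theory section, written as percentages) are likewise illustrative.

```python
from math import prod


def hmo(percentages, fifty="exclude"):
    """Build Hypothetical Median Organism results for one taxon.
    Assumed rule: >50 is positive, <50 is negative; 50 follows `fifty`
    ("exclude", "positive" or "negative")."""
    results = []
    for p in percentages:
        if p > 50:
            results.append("+")
        elif p < 50:
            results.append("-")
        elif fifty == "positive":
            results.append("+")
        elif fifty == "negative":
            results.append("-")
        else:
            results.append(None)  # test excluded from the calculation
    return results


def likelihood(percentages, results):
    """Product of the probabilities of the observed results; None is skipped."""
    return prod(
        (p / 100 if r == "+" else 1 - p / 100)
        for p, r in zip(percentages, results)
        if r is not None
    )


def matrix_id_score(matrix, taxon, fifty="exclude"):
    """Normalised (Willcox) score of `taxon` against its own HMO."""
    results = hmo(matrix[taxon], fifty)
    likelihoods = {name: likelihood(row, results) for name, row in matrix.items()}
    return likelihoods[taxon] / sum(likelihoods.values())


matrix = {"a": [1, 20, 99, 90], "b": [95, 1, 99, 1], "c": [99, 10, 85, 99]}
for name in matrix:
    print(name, round(matrix_id_score(matrix, name), 5))
```

Calling matrix_id_score with fifty set to "exclude", "positive" and "negative" corresponds to the ID Score, Missing Positive and Missing Negative columns; the three values only differ for taxa that have entries of 50.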
If identification is not achieved, the most likely taxa are listed in descending order of their identification
scores. The Additional Tests tab is shown when the Identification tab is selected.
Differences between the unknown isolate and the likely taxa are listed in a second grid.
What is displayed is controlled by the threshold values set in Options.
Options
This calls the Options window, which has two tabs: General and Identification.
The Use default values button resets the values on the Identification tab to their defaults.
Open Last Identification Matrix
The current (last) identification matrix used by the programme is automatically opened when PIBWin is
started. The name of the file is displayed when this option is selected. The Open window that is normally
displayed at the start of the programme is not displayed when this option is selected.
Open Last Archive File
The current (last) archive file used by the programme is automatically opened when PIBWin is started.
The name of the file is displayed when this option is selected.
Display Matrix as +/v/-
The identification matrix values can either be displayed as integer numbers (ranging from 1 to 99)
representing the percentage probability of obtaining a positive result, or they can be displayed as
+/v/- depending on the value set for "Tests are displayed as positive if the percentage is equal to or
greater than" on the Identification tab.
Record identification in Output Window
The identification of any unknown isolate, atypical tests and additional tests to separate possible strains
are recorded in an Output window when this option is selected.
Identification achieved when the ID score is greater than or equal to
[default value 0.95]
An unknown is identified when the ID score, also known as the Willcox probability, is equal to or greater
than the specified value. A value within the range 0.00001 to 0.99999 can be entered, though the accepted
range for this value is 0.95 to 0.999, depending on the identification matrix.
and the Modal Likelihood is greater than or equal to
[default value 0.01]
A second criterion, the modal likelihood, is also applied to the identification. This avoids identification
when one taxon gives a high ID score, but also has several test results that differ from the unknown.
A value within the range 0.00001 to 0.99999 can be entered.
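The two identification criteria combine as a simple conjunction. The Python sketch below illustrates this using the default values quoted in this section; the function name is_identified and its parameter names are hypothetical, and the ID score and modal likelihood are assumed to have been calculated already as described in the Theory section.

```python
def is_identified(id_score, modal_likelihood,
                  id_threshold=0.95, modal_threshold=0.01):
    """An unknown is accepted only if both criteria are met."""
    return id_score >= id_threshold and modal_likelihood >= modal_threshold


print(is_identified(0.97, 1.0))    # both criteria met
print(is_identified(0.97, 0.005))  # high ID score, but too many atypical results
print(is_identified(0.55, 1.0))    # ID score below the threshold
```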
List atypical results for taxa with ID scores equal to or greater than
[default value 0.05]
A value within the range 0.00001 to 0.99999 can be entered.
When no identification, list taxa with ID scores equal to or greater than
[default value 0.001]
This controls how many possible taxa are listed when identification is not achieved.
A value within the range 0.00001 to 0.99999 can be entered.
Taxa are distinguished by at least
[default value 2]
If identification is not achieved, further tests may be selected. The minimum number of tests needed to
distinguish pairs of taxa can be varied, though traditionally 2 tests is the norm.
A test separates a pair of taxa if their percentage difference is at least
[default value 70]
A pair of taxa are separated by a test if the absolute difference between their matrix entries is at least
the value specified. This value can range from 51 to 98.
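The separation rule and the "distinguished by at least" rule can be sketched together in Python. The defaults are those quoted above; the function names and the example rows (the Theory section matrix written as percentages) are assumptions for illustration.

```python
def separating_tests(row_a, row_b, min_difference=70):
    """Indices of tests whose matrix entries differ by at least min_difference."""
    return [
        i for i, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) >= min_difference
    ]


def are_distinguished(row_a, row_b, min_tests=2, min_difference=70):
    """True if enough tests separate the two taxa."""
    return len(separating_tests(row_a, row_b, min_difference)) >= min_tests


a = [1, 20, 99, 90]
b = [95, 1, 99, 1]
c = [99, 10, 85, 99]

print(separating_tests(a, b), are_distinguished(a, b))
print(separating_tests(b, c), are_distinguished(b, c))
```

In this example taxa a and b are separated by two tests, whereas b and c differ sufficiently in only one test and so would need additional tests to be distinguished.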
Tests are displayed as positive if the percentage is equal to or greater than
[default value 85]
The Identification matrix values can either be displayed as integer numbers (ranging from 1 to 99)
representing the percentage probability of obtaining a positive result, or they can be displayed
as +/v/- depending on the value selected.
This value can range from 51 to 99. The threshold for negative results is calculated
as 100 minus the chosen value.
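The conversion from percentages to +/v/- can be sketched in Python as follows. The positive rule and the negative threshold (100 minus the chosen value) come from the text; treating a value exactly equal to the negative threshold as negative is an assumption, as is the function name display_symbol.

```python
def display_symbol(percentage, positive_threshold=85):
    """Map a matrix percentage (1 to 99) to '+', 'v' or '-'."""
    negative_threshold = 100 - positive_threshold  # 15 for the default of 85
    if percentage >= positive_threshold:
        return "+"
    if percentage <= negative_threshold:  # assumption: boundary counts as negative
        return "-"
    return "v"  # variable result


print("".join(display_symbol(p) for p in [99, 85, 84, 50, 16, 15, 1]))
```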
Theory
Most computer-assisted identification systems are based on Willcox's implementation of Bayes' theorem:
$$P(t_i \mid R) = \frac{P(R \mid t_i)}{\sum_j P(R \mid t_j)}$$
where $P(t_i \mid R)$ is the probability that an unknown isolate, giving a pattern of test results R, is a
member of taxon (group of bacteria) $t_i$, and $P(R \mid t_i)$ is the probability that the unknown has a
pattern R given that it is a member of taxon $t_i$. Bayes' theorem incorporates prior probabilities; these
are the expected prevalence of strains included in the identification matrix. For bacterial identification
most authors give all taxa an equal chance of being isolated, and therefore the prior probabilities for all
taxa are set to 1.0 and omitted from the equation. The above equation can therefore be re-expressed as:
$$L_i^* = \frac{L_i}{\sum_j L_j}$$
where $L_i$ is the likelihood $P(R \mid t_i)$ and the probabilities $L_i^*$ are now referred to as
Identification Scores, or Willcox Scores. The identification scores for each taxon are normalized values and
$L_i^*$ for all taxa sums to one. Identification of an unknown isolate is achieved when $L_i^*$ for one taxon
exceeds a specified threshold value.
An example is shown below with an identification matrix consisting of three taxa for which we have the
probabilities for four tests.
Identification matrix with results of unknown

                      Tests
Taxa                  1      2      3      4
a                     0.01   0.20   0.99   0.90
b                     0.95   0.01   0.99   0.01
c                     0.99   0.10   0.85   0.99
Results of unknown    +      -      +      missing
An unknown has been isolated whose results for the first three tests are positive, negative and positive
respectively. The likelihoods that the taxa a, b and c will give the pattern of results observed for the
unknown are calculated, for each taxon in turn, by multiplying the probability of obtaining a positive result
for test 1 by the probability of obtaining a negative result for test 2 by the probability of obtaining a
positive result for test 3. Test 4 has not been done and is left out of the calculation.
Calculation of likelihood of unknown

        Tests
Taxa    1          2              3          Likelihood
a       0.01    *  (1-0.20)    *  0.99    =  0.00792
b       0.95    *  (1-0.01)    *  0.99    =  0.93110
c       0.99    *  (1-0.10)    *  0.85    =  0.75735
Sum                                       =  1.69637
The original identification matrix only gives the probabilities for positive results. To obtain the
probability of a negative result, we must subtract the matrix entries for test 2 from 1.
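The likelihood calculation for this example can be reproduced with a short Python sketch. The matrix and the results of the unknown are those of the worked example; the variable and function names are illustrative, and None is used here to mark a test that has not been done.

```python
from math import prod

# Probability of a positive result for tests 1 to 4, per taxon.
matrix = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
unknown = ["+", "-", "+", None]  # test 4 is missing


def likelihood(probabilities, results):
    """Multiply p for each positive result and (1 - p) for each negative one;
    tests without a result are skipped."""
    return prod(
        (p if r == "+" else 1 - p)
        for p, r in zip(probabilities, results)
        if r is not None
    )


likelihoods = {taxon: likelihood(row, unknown) for taxon, row in matrix.items()}
for taxon, value in likelihoods.items():
    print(taxon, value)
print("Sum", sum(likelihoods.values()))
```

The printed values agree with the table to five decimal places, allowing for floating-point rounding.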
The Identification Scores are expressed as normalized likelihoods.

Willcox probabilities (normalised likelihoods)

Taxa                            Identification Score
a       0.00792 / 1.69637    =  0.004669
b       0.93110 / 1.69637    =  0.548877
c       0.75735 / 1.69637    =  0.446455
Sum                          =  1.000000
In this example the unknown is not identified because no single taxon reaches the identification
threshold value. Taxa b and c are still both candidates for the identity of the unknown. Threshold values of
0.999 are typically used, for example with the Enterobacteriaceae, but with other groups of bacteria, such
as the streptomycetes, values as low as 0.95 have been used. In practical terms, a value of 0.999 means that
the taxon with which the unknown identifies will have at least two test differences from all other taxa in the
matrix.
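The normalisation and the threshold decision can be sketched in Python as follows. The likelihoods are the rounded values from the worked example, so the last decimal place of a score may differ slightly from the table; the function names and the use of None for "not identified" are assumptions for illustration.

```python
def willcox_scores(likelihoods):
    """Normalise the likelihoods so that the scores sum to one."""
    total = sum(likelihoods.values())
    return {taxon: value / total for taxon, value in likelihoods.items()}


def identify(likelihoods, threshold=0.999):
    """Return (taxon, scores); taxon is None if no score reaches the threshold."""
    scores = willcox_scores(likelihoods)
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


likelihoods = {"a": 0.00792, "b": 0.93110, "c": 0.75735}
taxon, scores = identify(likelihoods)
print(taxon)  # None: neither b nor c reaches the threshold
for name, score in scores.items():
    print(name, round(score, 4))
```

Even with the lower threshold of 0.95 the unknown in this example would remain unidentified, since the best score is only about 0.55.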
Whatever type of identification system is used, there are four possible outcomes:
- The unknown is identified with the correct taxon.
- The unknown is misidentified, i.e. incorrectly attributed to the wrong taxon.
- The unknown is not identified at all, and correctly so because the taxon to which it belongs is not present in the matrix.
- The unknown is not identified, but should have been identified with a taxon that is present in the matrix.
It is important that any system deals with these possibilities, although the last one is difficult to resolve.
One problem with the identification score is that if an unknown is not represented in the matrix, but one
strain within the matrix is closer to it (in a-space) than all others, the unknown may be identified as this
strain. This is where additional criteria should be used to assist the identification process. These include
listing the differences in test results between the unknown and the strain it has been identified as, as well as
the use of other numeric criteria such as taxonomic distance, the standard error of taxonomic distance
measures or maximum likelihoods. Taxonomic distance is the distance of an unknown from the centroid of
any taxon with which it is being compared; a low score, ideally less than 1.5, indicates relatedness. The
standard error of taxonomic distance assumes that the taxa are in hyperspherical normal clusters. An
acceptable score is less than 2.0 to 3.0, and about half the members of a taxon will have negative scores,
because they are closer to the centroid than average. The maximum, or best, likelihood is the maximum
probability for a taxon calculated using those tests carried out on the unknown. For each test, the
calculation uses whichever is larger: the probability of a positive result or the probability of a negative result.
Maximum possible likelihoods

        Tests
Taxa    1            2              3          Best Likelihood
a       (1-0.01)  *  (1-0.20)    *  0.99    =  0.78408
b       0.95      *  (1-0.01)    *  0.99    =  0.93110
c       0.99      *  (1-0.10)    *  0.85    =  0.75735
This allows for taxa with several entries of 0.50 in a matrix. Some authors calculate the ratio of the
likelihood to the maximum likelihood, termed the modal likelihood fraction, or its inverse, and use it to
decide whether to accept the identification offered by a Willcox score that has exceeded the identification
threshold.

Modal likelihood fraction

Taxa                            Modal likelihood
a       0.00792 / 0.78408    =  0.010101
b       0.93110 / 0.93110    =  1.000000
c       0.75735 / 0.75735    =  1.000000
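The best likelihood and the modal likelihood fraction for the worked example can be reproduced with the Python sketch below. The matrix and the results of the unknown are taken from the example; the function names are illustrative, and None marks a test that has not been done.

```python
from math import prod

matrix = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
unknown = ["+", "-", "+", None]  # test 4 is missing


def likelihood(probabilities, results):
    """Likelihood of the observed results; tests without a result are skipped."""
    return prod(
        (p if r == "+" else 1 - p)
        for p, r in zip(probabilities, results)
        if r is not None
    )


def best_likelihood(probabilities, results):
    """Largest likelihood the taxon could give on the tests that were done."""
    return prod(
        max(p, 1 - p)
        for p, r in zip(probabilities, results)
        if r is not None
    )


def modal_likelihood_fraction(probabilities, results):
    """Ratio of the observed likelihood to the best possible likelihood."""
    return likelihood(probabilities, results) / best_likelihood(probabilities, results)


fractions = {t: modal_likelihood_fraction(row, unknown) for t, row in matrix.items()}
for taxon, value in fractions.items():
    print(taxon, round(value, 6))
```

A matrix entry of 0.50 contributes 0.5 to both the likelihood and the best likelihood, so it cancels in the ratio; this is why the fraction is not penalised for taxa with several entries of 0.50.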