Manual PIBWinhlp

PIBWin HELP FILE CONVERTED TO WORD
This tutorial was written by Dr. Trevor Bryant and goes into far more depth than the Schnf
Ashex Tutorial in terms of PIBWin's abilities.
Introduction
PROBABILISTIC IDENTIFICATION OF BACTERIA for Windows (PIBWin) is a Windows
version of the DOS program PIB (also called Bacterial Identifier).
The programme has three major functions:
the identification of an unknown isolate
the selection of additional tests to distinguish between possible strains if
identification is not achieved
the storage and retrieval of results
It also has some utility functions for assessing the usefulness of identification matrices and for
converting matrices into different formats.
To achieve this, the program uses Excel files to store identification matrices and archived
results, although other file formats are supported to allow backwards compatibility with
the DOS version of the programme.
Up to date information on the programme can be found on the PIBWin web site
www.som.soton.ac.uk/staff/tnb/pib.htm which can also be accessed from the Help menu.
The program is designed to use probabilistic identification matrices that have either been
published in the literature or created by the user. The matrices that are provided with PIB have been
taken from the literature. These matrices have been typed in from the publication describing
them and users should refer to these publications for full details of the methods used when
testing isolates.
Identification Matrix
The identification matrix is displayed when the Matrix tab is selected.
The matrix may be displayed as integer numbers (ranging from 1 to 99) representing the
percentage probability of obtaining a positive result, or as +/v/- depending on the value
selected. This option is set in Options.
The view can be changed by clicking the right mouse button and checking or unchecking
Display Matrix as +/v/- on the pop up menu.
To view the full name of a test or taxon, move the cursor over the item; a pop up box will
display the item in full.
Sorting the identification matrix
The matrix can be sorted by double clicking on the name at the top of each column. The first
double click performs an ascending sort (negative results first), successive double clicks
perform descending and ascending sorts.
Note the underlying identification matrix is not affected by sorting as the Matrix tab displays a
view of it. To return to the original order, either click the right mouse button and select Revert
to original order, or select another tab and then return to the Matrix tab.
Results
The Results tab is where the results for an unknown strain are entered.
There are four aspects to the Results screen
Details Bar
Results Grid
Entering Results
Buttons
Details Bar
The details bar is where a personal key, the source of the isolate and details about the isolate
can be entered.
Key can be a maximum of 15 characters. A key must be entered if the results are to be saved
to an Archive file for recall at a later time.
Source is a drop down list box which allows text up to a maximum of 50 characters to be
entered. To achieve consistent entry of source text, existing values from the Archive file are
displayed in the drop down list, so the list will grow in length over time.
Details provides for a maximum of 255 characters.
The Save button is enabled when one result has been entered and there is an entry in the Key
box; it is only shown on the Identification and Additional Tests tabs.
Note: If an isolate is recalled from the Archive file and the key is changed, Save will create a
new, additional record in the Archive file.
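The limits described for the Details Bar can be sketched as a small validation routine. This is only an illustration of the stated rules, not PIBWin's own code: the class and function names are hypothetical, and "one result has been entered" is read here as "at least one result".

```python
from dataclasses import dataclass

# Field limits as stated in the Details Bar description.
MAX_KEY_LENGTH = 15
MAX_SOURCE_LENGTH = 50
MAX_DETAILS_LENGTH = 255


@dataclass
class IsolateDetails:
    """Hypothetical container for the three Details Bar fields."""
    key: str = ""
    source: str = ""
    details: str = ""

    def problems(self):
        """Return a list of fields that exceed their maximum length."""
        found = []
        if len(self.key) > MAX_KEY_LENGTH:
            found.append("key")
        if len(self.source) > MAX_SOURCE_LENGTH:
            found.append("source")
        if len(self.details) > MAX_DETAILS_LENGTH:
            found.append("details")
        return found


def save_enabled(details, results):
    """Assumed rule: Save needs a non-empty key and at least one entered result.

    `results` is a list where None stands for a test that is not done.
    """
    has_result = any(r is not None for r in results)
    return bool(details.key) and has_result


print(save_enabled(IsolateDetails(key="K001"), ["+", None]))
```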
Results Grid
Results can be entered in a grid or list format. This is controlled by the status of the Use List
Format for Results check box.
Grid format enables a 96 well microtitre plate format to be accommodated. The full name of
each test is shown in a pop up box when the cursor is placed over the test name.
List format is a scrolling list.
Entry of Results
Results can be entered using the keyboard or the mouse. There are 4 possible states for a
result:
positive + , negative -, indeterminate ? and not done.
The indeterminate state is to allow for tests that have been carried out, but the interpretation of
the result is difficult and you are undecided about the result. The indeterminate state allows
you to record that the test has been done, rather than the result is missing.
Result          Key                      Function Key   Mouse Action
Positive        + or =                   F2             Left click
Negative        - or _                   F3             Right click
Indeterminate   ? or /                   F4
Missing         <space bar> or <Enter>   F5             Repeat click
The programme has been written so that the shift character does not have to be pressed to
obtain the + or ? symbol, although some keyboard layouts may differ.
To change a result press the key for the new value.
To remove a result using the mouse, click a second time.
Note: because of the way the mouse works, the first left click sometimes acts as a select
object so an additional click is needed.
Buttons
Reset    Clears the results of the current isolate and resets them all to missing. The details
         are left unchanged.
New      Clears the results and the details of the current isolate and resets them all to missing.
Recall   Recalls the results of a previous isolate from an Archive file.
Archived Results
The Archive Results screen displays details and identification of previously entered isolates. If
an Archive file is not already open then an Open window is displayed when the Recall button
is pressed in the Results window.
To recall the results of a previous isolate, double click on the row of the isolate.
Sorting the Archived Results
Each column of information can be sorted. Click on the column heading to sort the archived
isolates into ascending order, a second click reverses the sort into descending order.
Searching the Archived Results
The Find button activates a search of the archived results. Searching is case insensitive and
does not include wild cards or complex searching. Once a hit has been obtained, the Find
Next button is enabled to permit further searching.
Searching is performed across all rows and columns excluding the first column.
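The search behaviour described above can be sketched as follows. This is an illustrative sketch only: the function name and the sample rows are hypothetical, and plain substring matching is an assumption, since the text only states that the search is case insensitive, has no wild cards, and skips the first column.

```python
def find_hits(rows, text):
    """Return (row, column) positions whose cell contains `text`.

    Matching ignores case and skips the first column of every row.
    Substring matching is an assumption made for this sketch.
    """
    needle = text.casefold()
    hits = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if c == 0:
                continue  # the first column is excluded from searching
            if needle in str(cell).casefold():
                hits.append((r, c))
    return hits


# Hypothetical archive rows: record number, key, source, identification.
ARCHIVE = [
    ["1", "K001", "Blood", "Taxon a"],
    ["2", "K002", "urine", "Taxon b"],
]

print(find_hits(ARCHIVE, "URINE"))
```

Find Next would correspond to stepping through the returned list one hit at a time.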
Technical details
The software can support two types of Archive Files, Excel and DOS Archive.
The DOS Archive format is for backwards compatibility with the previous DOS version of this
software. It is not recommended that this format is used. It contains less information about
isolates and is less flexible. The Excel format is recommended.
The Excel Archive file can be opened and manipulated in Microsoft Excel. This enables the
data to be used by other software packages and unwanted isolate information to be deleted.
DO NOT CHANGE the order of the columns in the Archive file. This would make the file unusable
with the identification matrix. There are some internal checks that the software performs to
detect discrepancies between the Identification matrix file and the Archive file but these are
not foolproof. It is a case of user beware. So if you wish to experiment, make sure that you
have taken backups of your files before they are modified.
Identification
The identification tab is shown once a test result has been entered in the Results window.
Additional Tests
This tab is available when Identification is not successful and more than one taxon is a
possible candidate for the unknown isolate.
Tests may be chosen in two ways:
they may be selected so that the most likely taxon can be distinguished from other
likely taxa.
they can be selected to distinguish likely taxa from each other.
Use the radio buttons to select which method of test selection you wish to choose, then use
the spin edit box to choose the number of taxa to be considered.
Use Select Tests to obtain the list of tests to be used.
Move the cursor over the strains and tests to obtain the name in full in a pop up window.
The Exclude Tests button allows you to specifically omit certain tests before test selection is
carried out.
See Also Test Selection Algorithm
Exclude Tests
The Exclude Tests window is used by the Additional Tests and Select Best Tests for Matrix
procedures.
A list of tests in the current matrix is displayed. Those tests that will be omitted from the test
selection procedure are shown with an asterisk * in the Excluded column.
Tests can be included or excluded by clicking on the Excluded column.
Include All Tests is used to include all tests in the Test Selection procedure.
Exclude All Tests is used to exclude all tests from the Test Selection procedure, then those
tests that are required can be selected by clicking in the Exclude column.
Tools
The Tools menu options provide functions for manipulating matrix files and investigating the
properties of an identification matrix
Convert Matrix
The Identification matrix file can be written in one of three formats:
Excel [*.xls]
Comma separated values [*.csv]
Fixed format [*.mat]
The Excel format is recommended because it contains more information than the other two
formats. The fixed format is for backwards compatibility with the original DOS version of this
software and its use is not recommended.
Convert DOS archive
This allows the Archive file created by the original DOS version of this software to be
rewritten in the Excel archive format. It is strongly recommended that you convert old Archive
files.
Note: a new Archive file is created and the original Archive file is left untouched.
Select Best Tests
This allows investigation of the current matrix to determine which are the most important
tests in the matrix. See Select Best Tests for Matrix for further details.
Calculate Matrix ID scores
This allows investigation of the current matrix to determine if there is an overlap between
strains in the matrix. See Matrix ID scores for further details.
Select Best Tests for Matrix

This procedure is called from the Tools Menu. The procedure can be used to select the
minimum number of tests needed to distinguish taxa in an identification matrix.
Tests may be chosen in two ways:
they may be selected so that one taxon can be distinguished from other strains
(taxa).
they can be selected to distinguish all strains (taxa) from each other.
Use Select Tests to obtain the list of tests to be used.

Move the cursor over the strains and tests to obtain the name in full in a pop up window.
The Exclude Tests button allows you to specifically omit certain tests before test selection is
carried out.
See Also Test Selection Algorithm
Matrix ID Scores
The Matrix ID scores procedure is called from the Tools Menu. It is used to assess whether
the identification matrix is capable of identifying each taxon (strain) that is contained in it. The
procedure considers each taxon in turn and treats each percentage probability for that taxon
as a positive or negative result, creating a Hypothetical Median Organism (HMO). It then uses
this HMO to calculate an Identification Score using the Willcox probability. If any probabilities
of 50 are encountered (typically missing data is coded as 50), the identification score is
calculated in three ways. Tests where a value of 50 is found for the taxon are:
excluded
all treated as positive results
all treated as negative results
These results are shown as ID Score, Missing Positive and Missing Negative.
If the ID score does not exceed the Identification Threshold then the strain with the second
highest identification score is listed in the Next Strain column.
Ideally the ID Score and Missing Positive and Missing Negative columns should display values
of 1.00000.
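The Matrix ID scores procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PIBWin's implementation: percentages above 50 are taken as positive HMO results and percentages below 50 as negative, the function names are hypothetical, and the two-taxon matrix is invented purely for the example.

```python
def hmo(percentages, fifty_as=None):
    """Build the Hypothetical Median Organism for one taxon.

    Assumption: >50 gives a positive result, <50 a negative one.
    Entries of exactly 50 become `fifty_as`: None (excluded),
    True (all positive) or False (all negative).
    """
    return [fifty_as if pct == 50 else pct > 50 for pct in percentages]


def likelihood(percentages, results):
    """Multiply the probabilities of the given results, skipping excluded tests."""
    value = 1.0
    for pct, result in zip(percentages, results):
        if result is None:
            continue
        p = pct / 100.0
        value *= p if result else 1.0 - p
    return value


def id_score(matrix, taxon, results):
    """Willcox probability of `taxon` for the given pattern of results."""
    likelihoods = {name: likelihood(row, results) for name, row in matrix.items()}
    return likelihoods[taxon] / sum(likelihoods.values())


def matrix_id_scores(matrix):
    """Return the three scores for every taxon in the matrix."""
    table = {}
    for taxon, row in matrix.items():
        table[taxon] = {
            "ID Score": id_score(matrix, taxon, hmo(row, None)),
            "Missing Positive": id_score(matrix, taxon, hmo(row, True)),
            "Missing Negative": id_score(matrix, taxon, hmo(row, False)),
        }
    return table


# Invented example matrix (percentage positive for four tests).
EXAMPLE = {"x": [99, 1, 50, 99], "y": [1, 99, 99, 1]}

for name, scores in matrix_id_scores(EXAMPLE).items():
    print(name, {label: round(value, 5) for label, value in scores.items()})
```

For a taxon with no entries of 50, the three scores are identical, which matches the ideal case where all three columns show the same value.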
If identification is not achieved then the most likely taxa are listed in descending order of their
identification scores. The Additional Tests tab is shown when the Identification tab is selected.
Differences between the unknown isolate and the likely taxa are listed in a second grid.
What is displayed is controlled by the threshold values set in Options.
Options
This calls the Options window which has two tabbed Options: General and Identification.
The Use default values button resets the defaults for values on the Identification tab.
Open Last Identification Matrix
The current (last) identification matrix used by the programme is automatically opened when
PIBWin is started. The name of the file is displayed when this option is selected. The Open
window that is normally displayed at the start of the programme is not displayed when this
option is selected.
Open Last Archive File
The current (last) archive file used by the programme is automatically opened when PIBWin is
started. The name of the file is displayed when this option is selected.
Display Matrix as +/v/-
The identification matrix values can either be displayed as integer numbers (ranging from 1 to
99) representing the percentage probability of obtaining a positive result, or they can be
displayed as +/v/- depending on the criterion used for "Tests are displayed as positive if the
percentage is equal to or greater than" on the Identification tabbed option.
Record identification in Output Window
The identification of any unknown isolate, atypical tests and additional tests to separate
possible strains are recorded in an Output window when this option is selected.
Identification achieved when the ID score is greater than or equal to [default value 0.95]
An unknown is identified when the ID score, also known as the Willcox probability, is equal to
or greater than the specified value. A value within the range 0.00001 to 0.99999 can be
entered, though the accepted range for this value is 0.95 to 0.999 depending on the
identification matrix.
and the Modal Likelihood is greater than or equal to
A second criterion, the modal likelihood, is also applied to the identification. This avoids
identification when one taxon gives a high ID score, but also has several test results that
differ from the unknown.
List atypical results for taxa with ID scores equal to or greater than
A value within the range 0.00001 to 0.99999 can be entered.
When no identification, list taxa with ID scores equal to or greater than
This controls how many possible taxa are listed when identification is not achieved.
Taxa are distinguished by at least [default value 2]
If identification is not achieved, further tests may be selected. The minimum number of tests
to distinguish pairs of taxa can be varied, though traditionally 2 tests is the norm.
A test separates a pair of taxa if their percentage difference is at least [default value 70]
A pair of taxa are separated by a test if the absolute difference between their matrix entries
is at least the value specified. This value can range from 51 to 98.
Tests are displayed as positive if the percentage is equal to or greater than [default value 85]
The Identification matrix values can either be displayed as integer numbers (ranging from 1 to
99) representing the percentage probability of obtaining a positive result, or they can be
displayed as +/v/- depending on the value selected. This value can range from 51 to 99.
Negative results are calculated as 100 - the chosen value.
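Two of the thresholds above lend themselves to a short sketch: the +/v/- display rule and the rule for when a test separates a pair of taxa. The code below is an illustration only. The function names are hypothetical, and treating a percentage equal to 100 minus the chosen value as negative (an inclusive comparison) is an assumption, since the text does not say whether that boundary is inclusive.

```python
def display_symbol(pct, positive_threshold=85):
    """Map a matrix percentage to '+', 'v' or '-'.

    Positive if pct >= positive_threshold (default 85, as stated above).
    Negative if pct <= 100 - positive_threshold (inclusive bound is assumed).
    """
    negative_threshold = 100 - positive_threshold
    if pct >= positive_threshold:
        return "+"
    if pct <= negative_threshold:
        return "-"
    return "v"


def separating_tests(row_a, row_b, min_difference=70):
    """Indices of tests whose absolute percentage difference is at least min_difference."""
    return [
        index
        for index, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) >= min_difference
    ]


def are_distinguished(row_a, row_b, min_tests=2, min_difference=70):
    """True if the pair of taxa is separated by at least min_tests tests."""
    return len(separating_tests(row_a, row_b, min_difference)) >= min_tests


# Percentages for two taxa over four tests (values chosen for illustration).
TAXON_A = [1, 20, 99, 90]
TAXON_B = [95, 1, 99, 1]

print([display_symbol(p) for p in TAXON_A])
print(separating_tests(TAXON_A, TAXON_B), are_distinguished(TAXON_A, TAXON_B))
```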
Theory
Most computer assisted identification systems are based on Willcox's implementation of Bayes
theorem:
$$P(t_i \mid R) = \frac{P(R \mid t_i)}{\sum_{j} P(R \mid t_j)}$$
where $P(t_i \mid R)$ is the probability that an unknown isolate, giving a pattern of test
results R, is a member of taxon (group of bacteria) $t_i$ and $P(R \mid t_i)$ is the probability
that the unknown has a pattern R given that it is a member of taxon $t_i$. Bayes theorem
incorporates prior probabilities; these are the expected prevalence of strains included in the
identification matrix. For bacterial identification most authors give all taxa an equal chance of
being isolated and therefore the prior probabilities for all taxa are set to 1.0 and omitted from
the equation. The above equation therefore can be re-expressed as:
$$L_i^{*} = \frac{L_i}{\sum_{j} L_j}$$
where $L_i$ is the likelihood of the observed results for taxon $t_i$ and the probabilities
$L_i^{*}$ are now referred to as Identification Scores, or Willcox Scores. The identification
scores for each taxon are normalized values and $L_i^{*}$ for all taxa sums to one.
Identification of an unknown isolate is achieved when $L_i^{*}$ for one taxon exceeds a
specified threshold value.
An example is shown below with an identification matrix consisting of three taxa for which we
have the probabilities for four tests.
Identification matrix with results of unknown
                     Test 1   Test 2   Test 3   Test 4
Taxon a              0.01     0.20     0.99     0.90
Taxon b              0.95     0.01     0.99     0.01
Taxon c              0.99     0.10     0.85     0.99
Results of unknown   +        -        +        missing
An unknown has been isolated whose results for the first three tests are positive, negative and
positive respectively. The likelihoods that the taxa a, b and c will give the pattern of results
observed for the unknown are calculated by multiplying the probability of obtaining a positive
result for test 1 by the probability of obtaining a negative result for test 2 by the probability of
obtaining a positive result for test 3 for each taxon in turn.
Calculation of likelihood of unknown
Taxa   Test 1   Test 2     Test 3   Likelihood
a      0.01     (1-0.20)   0.99     0.00792
b      0.95     (1-0.01)   0.99     0.93110
c      0.99     (1-0.10)   0.85     0.75735
Sum                                 1.69637
The original identification matrix only gives the probabilities for positive results, in order to use
the probability for a negative result we must subtract the matrix entries for test 2 from 1.
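The likelihood calculation in this example can be reproduced with a short sketch. The function name and the use of None for a missing test are illustrative choices; skipping tests with no result follows the example, where only the first three tests contribute.

```python
# Matrix entries: probability of a POSITIVE result for tests 1 to 4.
MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}

# Results of the unknown: True = positive, False = negative, None = missing.
UNKNOWN = [True, False, True, None]


def likelihood(probs, results):
    """Multiply the probability of each observed result, skipping missing tests."""
    value = 1.0
    for p, result in zip(probs, results):
        if result is None:
            continue
        # A negative result uses 1 - p because the matrix stores P(positive).
        value *= p if result else 1.0 - p
    return value


likelihoods = {taxon: likelihood(probs, UNKNOWN) for taxon, probs in MATRIX.items()}
for taxon, value in likelihoods.items():
    print(taxon, round(value, 5))
print("Sum", round(sum(likelihoods.values()), 5))
```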
The Identification Scores are expressed as normalized likelihoods.

Willcox probabilities (normalised likelihoods)
Taxa   Likelihood / Sum      Identification Score
a      0.00792 / 1.69637     0.004669
b      0.93110 / 1.69637     0.548877
c      0.75735 / 1.69637     0.446455
Sum                          1.000000
In this example the unknown is not identified because a single taxon does not reach the
identification threshold value. Taxa b and c are still both candidates for the identity of the
unknown. Threshold values of 0.999 are typically used, for example with the
Enterobacteriaceae, but with other groups of bacteria, such as the streptomycetes, values as
low as 0.95 have been used. In practical terms, a value of 0.999 means that the taxon which
the unknown identifies with will have at least two test differences from all other taxa in the
matrix.
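The normalisation and threshold test can be sketched as follows, using the likelihoods from the worked example. The function names are hypothetical, and the comparison uses "greater than or equal to" as worded in Options; this is a sketch of the calculation, not PIBWin's code.

```python
def identification_scores(likelihoods):
    """Normalise likelihoods so that the scores for all taxa sum to one."""
    total = sum(likelihoods.values())
    return {taxon: value / total for taxon, value in likelihoods.items()}


def identify(scores, threshold):
    """Return the taxon whose score reaches the threshold, or None."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Likelihoods taken from the worked example.
LIKELIHOODS = {"a": 0.00792, "b": 0.93110, "c": 0.75735}

scores = identification_scores(LIKELIHOODS)
for taxon, score in scores.items():
    print(taxon, round(score, 6))

# Neither commonly used threshold is reached, so no taxon is returned.
print(identify(scores, 0.999), identify(scores, 0.95))
```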
Whatever type of identification system is used, there are four possible outcomes:
The unknown is identified with the correct taxon.
The unknown is misidentified, i.e. incorrectly attributed to wrong taxon.
The unknown is not identified at all, and correctly so because the taxon to which it
belongs is not present in the matrix.
The unknown is not identified, but should have been identified with a taxon that is
present in the matrix.
It is important that any system deals with these possibilities, although the last one is difficult to
resolve. One problem with the identification score is that if an unknown is not represented in
the matrix, but one strain within the matrix is closer to it (in a-space) than all others, the
unknown may be identified as this strain. This is where additional criteria should be used to
assist the identification process. These include listing the differences in test results between
the unknown and the strain it has been identified as, as well as the use of other numeric
criteria such as taxonomic distance, the standard error of taxonomic distance measures or
maximum likelihoods. Taxonomic distance is the distance of an unknown from the centroid of
any taxon with which it is being compared; a low score, ideally less than 1.5, indicates
relatedness. The standard error of taxonomic distance assumes that the taxa are in
hyperspherical normal clusters. An acceptable score is less than 2.0 to 3.0, and about half the
members of a taxon will have negative scores, because they are closer to the centroid than
average. The maximum, or best likelihood, is the maximum probability for a taxon calculated
using those tests carried out on the unknown. The calculation uses the maximum of the
probabilities of a negative and positive result of a test.
Maximum possible likelihoods
Taxa   Test 1     Test 2     Test 3   Best Likelihood
a      (1-0.01)   (1-0.20)   0.99     0.78408
b      0.95       (1-0.01)   0.99     0.93110
c      0.99       (1-0.10)   0.85     0.75735
This allows for taxa with several entries of 0.50 in a matrix. Some authors calculate the
likelihood/maximum likelihood ratio, termed the modal likelihood fraction, or its inverse and
use it to decide whether to accept the identification offered by a Willcox score that has
exceeded the identification threshold.
Modal likelihood fraction
Taxa   Likelihood / Best likelihood   Modal likelihood
a      0.00792 / 0.78408              0.010101
b      0.93110 / 0.93110              1.000000
c      0.75735 / 0.75735              1.000000
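The best likelihood and the modal likelihood fraction from the example can be sketched together. The function names are hypothetical, and missing tests (None) are skipped, as in the worked example where only three tests were carried out.

```python
# Matrix entries: probability of a POSITIVE result for tests 1 to 4.
MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
# Results of the unknown: True = positive, False = negative, None = missing.
UNKNOWN = [True, False, True, None]


def likelihood(probs, results):
    """Likelihood of the observed results, skipping missing tests."""
    value = 1.0
    for p, result in zip(probs, results):
        if result is not None:
            value *= p if result else 1.0 - p
    return value


def best_likelihood(probs, results):
    """Largest likelihood the taxon could give on the tests carried out."""
    value = 1.0
    for p, result in zip(probs, results):
        if result is not None:
            value *= max(p, 1.0 - p)  # the more probable of positive/negative
    return value


def modal_likelihood_fraction(probs, results):
    """Ratio of the observed likelihood to the best possible likelihood."""
    return likelihood(probs, results) / best_likelihood(probs, results)


for taxon, probs in MATRIX.items():
    print(
        taxon,
        round(best_likelihood(probs, UNKNOWN), 5),
        round(modal_likelihood_fraction(probs, UNKNOWN), 6),
    )
```

A fraction close to 1 means the unknown matches the most typical pattern for that taxon on the tests performed; a small fraction, as for taxon a here, signals several atypical results.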
