Probability of Random Correspondence for Forensic Fingerprints

UNIVERSIDAD AUTONOMA DE MADRID
ESCUELA POLITECNICA SUPERIOR
MASTER IN COMPUTER SCIENCE AND TELECOMMUNICATION
ENGINEERING
Probability of Random
Correspondence for Forensic
Fingerprints Based on Statistical
Models of the Feature Space
-MASTER THESIS-
Probabilidad de Coincidencia Aleatoria para Huellas
Dactilares Forenses basada en Modelado Estadistico del
Espacio de Caracteristicas.
-TRABAJO DE FIN DE MASTER-
María Puertas Calvo
September 2011
AUTHOR: María Puertas Calvo
ADVISOR: Daniel Ramos Castro
ATVS Biometric Recognition Group
(http://atvs.ii.uam.es)
Dpto. de Tecnología Electrónica y de las Comunicaciones
Escuela Politécnica Superior
Universidad Autónoma de Madrid
Summary
The fingerprint, as an unmistakably unique and discriminating trait of each individual,
has been the most widely used biometric trait for more than a century to identify
criminals who left their mark at a crime scene. The theory of fingerprint uniqueness,
which states that no two fingerprints are alike, has been universally accepted despite
the difficulty of carrying out meaningful studies to confirm it. In recent years,
however, the appearance of identification errors has led this theory to be questioned in
several areas of the scientific and forensic world.
As a result, different approaches have emerged in the scientific community whose common
goal is to quantify the individuality of fingerprints. The greatest challenge in the
study of fingerprint individuality is the development of statistical models that
accurately describe the variability of fingerprint features. These models can then be
used to calculate the Probability of Random Correspondence (PRC). Given a fingerprint
(of any kind) with a particular pattern of features, the PRC is defined as the
probability of finding, in a given population, another fingerprint that shares a given
number of coincident features with the first one.
This work presents a quantitative measure of the Probability of Random Correspondence
(PRC) between a mark found at a crime scene and a fingerprint taken from a suspect; that
is, the probability that the features of an anonymous mark correspond to those of the
fingerprint of an individual other than the one who left it. The degree of individuality
of a fingerprint is based on the distribution of its characteristic points, called
minutiae. To this end, a statistical model is built that represents the minutiae
distribution of a relevant population. The model is based on mixtures of Gaussian and
Von-Mises distributions, which closely fit the real distributions of minutiae location
and orientation.
As a preliminary step, an algorithm is developed to align fingerprint impressions with
the latent fingermark whose individuality is to be calculated. Once aligned, the
impressions are used to train the statistical model of the minutiae distribution, which
takes into account the position and orientation of the minutiae in the fingerprint. The
PRC of the latent fingermark is obtained from this model. The NIST databases SD14 and
SD27 were used to generate the models and calculate the PRC. These databases are of
forensic origin and contain both fingerprint impressions (27258) and real latent
fingermarks (258). Feature extraction was carried out with a commercial SDK, and a
post-processing algorithm was developed.
Abstract
Fingerprints have been, for more than a century, the most widely used biometric trait to
identify criminals who leave their mark at a crime scene. The uniqueness of fingerprints
has been universally accepted although no significant studies exist to confirm
it. However, fingerprint identification errors have been discovered, leading many
experts in forensic science to question the theory of uniqueness.
Based on these issues, many approaches have emerged from the scientific community to
try to quantify the individuality of fingerprints. The biggest challenge in the
study of fingerprint individuality is to develop statistical models that precisely
describe the variability of fingerprint features. These models can then be used to
calculate the Probability of Random Correspondence (PRC). Given a fingerprint with a
given set of features, the PRC can be defined as the probability of finding another
fingerprint in a population that shares a given number of coincident
features with the first one.
This work presents a quantitative measure to calculate the PRC between a fingermark
found at a crime scene and a fingerprint obtained from a suspect. That is, the
probability that the features of a selected fingerprint match those of another
fingerprint that comes from a different source. The degree of
individuality of a fingerprint, in this work, is based on the distribution of the most
commonly used fingerprint features: the minutiae. To quantify this individuality, a statistical
model is developed that represents the minutiae distribution of a population. This
model is based on Gaussian and Von-Mises distributions, which accurately model the
real location and direction of the minutiae.
As a necessary preliminary step, a minutiae alignment algorithm is developed to
align the fingerprint impressions to the query fingermark. Once they
are aligned, minutiae location and direction are used to train the statistical models of
the minutiae distribution. Finally, the model is used to calculate the PRC of the query
fingermark.
This work has been implemented using the NIST databases SD14 and SD27, which contain
both fingerprint impressions and latent fingermarks from real crime scenes. Feature
extraction was done using a commercial SDK, followed by a post-processing
algorithm that removes spurious minutiae.
Acknowledgements
To Dani, for the huge effort, for his brilliant ideas, and for always understanding all
my questions the first time, and answering them. To Javier Ortega, for his support and
his appreciation of me during these three years at ATVS. To all the other ATVSians, for
your friendship, which is the most valuable thing I have gained in these years: Peter,
Javi, Franco, Rubén, Iñaki, Miriam, Marta, Gal, Julián, Ruifang (Oh my God!) and Ram. To
the Identification Department of the Guardia Civil, for making this work possible. To
my friends, because from far away I realize even more how important you are
to me. To my parents. To my sister. To Scott.
Thank you.
Table of Contents
1. Introduction
1.1. Objectives
1.2. Major contributions
2. Fingerprints in forensics
3. Related works
3.1. Grid models
3.2. Fixed probability models
3.3. Relative measurement models
3.4. Generative models
3.5. Models that include other information
4. Statistical models on minutiae features
4.1. Gaussian distribution
4.2. Von-Mises distribution
4.3. Joint distribution model
5. Probability of Random Correspondence (PRC)
6. Implementation
6.1. NIST databases
6.2. Fingerprint feature extraction
6.3. Post-processing: removing spurious minutiae
6.4. Minutiae alignment
6.5. Training the mixture models
7. Experimental results
8. Conclusions and future work
8.1. Conclusions
8.2. Future work
Bibliography
List of Figures
Figure 1: Example of erroneous fingerprint individualization. The fingermark found in Madrid bombing (left) and Brandon Mayfield fingerprint (right).
Figure 2: Four different Level 1 patterns: arch, right loop, left loop and whorl.
Figure 3: Level 2 features: minutiae
Figure 4: Level 3 features: pores and ridge contours can be appreciated.
Figure 5: A single rolled ten-print impression and a latent fingermark.
Figure 6: A ten-print card
Figure 7: Example of bivariate normal distribution
Figure 8: Von-Mises distributions for different values of the precision.
Figure 9: Three images from NIST SD 14 database
Figure 10: A latent fingermark and its mated impression from NIST SD27 database
Figure 11: Distribution of the number of minutiae for the different sets of fingerprints.
Figure 12: Example of fingerprint minutiae extracted with Verifinger. Many spurious minutiae are extracted by the system.
Figure 13: Skeletonized fingerprint image by Verifinger
Figure 14: Block mapping
Figure 15: Fingerprint mask after opening.
Figure 16: Fingerprint mask after selection of the biggest component
Figure 17: Fingerprint mask after closing. All the minutiae outside the mask are removed.
Figure 18: Example of successfully identified spurious minutiae (red minutiae).
Figure 19: A successful example of alignment between two fingerprint feature sets.
Figure 20: Distribution of the minutiae used to train a model
Figure 21: Mixture of Gaussian distributions that model minutiae locations.
Figure 22: Von-Mises distributions of each mixture in one of the models
List of Tables
Table 1: Some values for the experiments
Table 2: Average nPRC for different tolerance values and 12 coincident minutiae.
Table 3: Average nPRC for different numbers of coincident minutiae w.
1. Introduction
The theory of fingerprint uniqueness has been accepted as a true fact for more than a
century. According to this theory, the pattern in a finger is unique and an identical
pattern cannot be found on a different finger [1]. Even though this theory has never
been proved, the randomness of fingerprint patterns is so high that it is hard to believe
that two fingerprints could have the exact same pattern. However, the pattern in a
finger and a fingerprint are not exactly the same thing. In fact, they are not the same
thing at all.
The fingerprint is the mark that appears when a finger touches a surface. This mark,
which can be made with ink, on a digital sensor or simply with sweat, is a reproduction (or a
stamp) of the finger’s friction ridge pattern. But a reproduction of an object cannot be
taken as the object itself. Unlike a finger, a fingerprint can be of good or bad quality.
The pressure applied by the finger when touching the surface introduces distortion in
the print. Finally, the way the finger touches the surface and the nature of that
surface are determining factors in the size and shape of the fingerprint [2].
The theory of uniqueness has been used equally for both fingers and fingerprints.
However, we just saw that they are different things. To challenge this theory, we
would have to ask ourselves the following questions. In the case of the fingers: Can
two different fingers have the exact same ridge skin pattern? And in the case of the
prints: Can two fingerprints from different fingers be identical?
If we move this theory to a forensic frame, the question is none of the above. In
forensics, the standards set a threshold that states the minimum
degree of similarity necessary to consider that two fingerprints come from the same
source. In many countries this threshold is set at 12 minutiae [3]. Minutiae are the
bifurcations and endings found on the ridge skin of a finger. They are the
features most commonly used to compare fingerprints.
So now, in forensic terms, the question is: Can a fingerprint found at a crime scene
share a given (large) number of features with the fingerprint of a random person
different from the one that generated it?
For many years, all three questions have been answered with the same word: No. The
unproved theory of uniqueness has been taken as an irrefutable truth, with no
distinction made between fingers, fingerprints and forensic fingermarks. However, it is
obvious that these questions are very different and cannot be answered equally.
In the first case, there is no way to know if two fingers can share the exact same
pattern, as it is impossible to compare all the fingers in the world. In the case of
forensic fingerprints, however, this theory has already been challenged several times in
recent years.
The first challenge was the 1999 case of USA vs. Byron Mitchell [4], where the admissibility of
fingerprint evidence was contested on the grounds that the uniqueness of fingerprints had not been
objectively tested and its potential matching error rates were unknown. After this case,
fingerprint-based identification has been challenged in more than 20 cases in the USA
[5] [6].
In all these cases, the Daubert rules were invoked to challenge fingerprint evidence. According
to these rules, for scientific evidence to be accepted in court, it has to fulfill
the following premises [7]:
1. The particular technique has been subject to statistical hypothesis testing
following real casework conditions.
2. The accuracy of the technique is known.
3. Standards controlling the technique’s operation exist and have been
maintained.
4. The technique has been subject to peer review and publication.
5. It has a widespread acceptance within the relevant scientific community.
The main issue with the admissibility of fingerprint evidence is that the
individualization process has not been subjected to the principles of scientific
validation. A paradigm shift is needed in order to adapt forensic recognition to a more
transparent and scientific frame [8]. According to Saks and Koehler, “When factfinders
hear evidence of a forensic match, a proper assessment of the probative value of
that match requires awareness of the chance that a mistake was made”.
Another significant motivation for improving forensic
identification techniques is the discovery of several cases where innocent people were
convicted based on an erroneous forensic identification. One of the most prominent
of these errors was that of Brandon Mayfield, who was wrongly
arrested after the Madrid bombings in 2004 [9]. A latent fingermark found at the crime
scene was matched to Mayfield’s fingerprint using the FBI’s IAFIS and subsequently an
identification decision was reported by different FBI examiners. Some days later, the
Spanish National Police linked this fingermark to someone else, and the FBI acknowledged
its mistake. This is one of the most famous cases, but many other erroneous
identifications have been made in the past [10] [11].
In recent years, many experts in the field have suggested adapting current forensic
evaluation techniques to those used in DNA profiling [9] [10] [11]. One of the great
strengths of DNA typing is that it uses a statistical approach based on population
genetics theory and empirical testing. Experts evaluate matches between suspects and
crime scene DNA evidence in terms of the probability of random matches across
different reference populations (e.g., different ethnicities). These probabilities are
derived from databases that identify the frequency with which various alleles occur at
different locations on the DNA strand. When the value of the evidence is presented
in court, a similarity measure is given along with this probability of random match,
which corresponds to the technique’s error rate [8].
Figure 1: Example of erroneous fingerprint individualization. The fingermark found in Madrid
bombing (left) and Brandon Mayfield fingerprint (right).
In the field of fingerprints, many researchers have started to characterize the
individuality of fingerprints in terms of the probability of a random match [15] [16] [17].
However, most of these approaches try to state a level of individuality of fingerprints
in general. Using databases of good-quality fingerprints, they estimate the
probability that two random fingerprints share a given number of
features. To our knowledge, no previous work has calculated the
probability of a random match for a specific fingerprint found at a crime scene and a
certain population.
In this work, the main purpose is to calculate the Probability of Random
Correspondence of a fingerprint found at a crime scene with respect to a fingerprint
database. With this information, a fingerprint examiner can have a better idea of the
rarity of the fingerprint that he/she is analyzing. Moreover, a well-calculated PRC can
be used as the error rate of the technique. According to the Daubert rules, this error rate
should be reported to the court along with the result of the comparison.
In order to obtain a good measure of the PRC, statistical models are
implemented to capture the variability of fingerprints in the feature space. A realistic
forensic dataset is used to train the models and calculate the PRC. Finally,
the calculated values of the PRC are compared to the empirically obtained values to
verify the validity of the models and the measures.
1.1. Objectives
The main objectives of this work are:
• To generate a statistical model that can reflect the real distribution of fingerprint
features among a large population.
• To develop and test a measure for the specific Probability of Random
Correspondence in a strict forensic framework using the generated models.
• To validate the results by comparing them to empirically obtained measures.
To make these objectives reachable, a series of intermediate objectives need to be achieved:
• To obtain reliable features from a realistic forensic fingerprint database in order to
have results that are as reliable as possible.
• To develop a feature alignment algorithm in order to be able to train the statistical
models.
1.2. Major contributions
The main contribution of this work is that it adapts existing techniques for evaluating the
individuality of fingerprints to a strictly forensic framework. While other authors
have evaluated the probability that two fingerprints are alike, this work focuses on
the probability of finding a false match between a crime scene fingermark and
a random fingerprint from a specific population. To do so, the fingerprint feature space
has been modeled in order to simulate the statistical distribution of minutiae locations
and directions among a large population. These models have been used to calculate the
specific Probability of Random Correspondence for a set of forensic fingermarks.
This measure indicates how rare the query fingerprint is with respect to a
population, which is in fact a measure of how likely it is to make a mistake when
finding a match between the query and a fingerprint from the population.
In order to train the models, the feature space needs to be aligned to the query features.
For this purpose, a minutiae alignment algorithm was developed in this work.
An algorithm to detect spurious minutiae in the fingerprints is also an original
contribution of this work.
2. Fingerprints in forensics
Human fingerprints are formed from the 10th to the 16th week of estimated
gestational age (EGA) and remain permanent throughout the life of an
individual [1]. Each fingerprint has a highly particular configuration of ridges and
valleys that makes it a very discriminating trait for biometric identification.
Three different abstraction levels are used in fingerprint recognition.
• Level 1 is the general pattern of the fingerprint and is normally used for fingerprint
classification.
Figure 2: Four different Level 1 patterns: arch, right loop, left loop and whorl.
• Level 2 features are ridge bifurcations and endings, i.e. minutiae. Minutiae
are the features most commonly used by automatic systems and human examiners to
differentiate among fingerprints.
Figure 3: Level 2 features: minutiae
• Level 3 includes ridge contours, pores and incipient ridges. These features
cannot be seen in poor-quality images. Using them for
individualization requires high-resolution images and good-quality
fingerprints [2].
Figure 4: Level 3 features: pores and ridge contours can be appreciated.
In forensic science, fingerprints can be classified into two large groups: fingerprint
impressions and latent fingermarks [3]. Figure 5 shows an example of each kind.
Figure 5: A single rolled ten-print impression and a latent fingermark.
Fingerprint impressions are acquired from arrested people usually by scanning the
inked impression on paper or directly from the fingers with a live-scan device. From
every finger of the subject, two different impressions are acquired: a rolled impression
and a plain impression. Rolled impressions are obtained by rolling the finger from one
side to the other. Plain impressions are those in which the finger is pressed down but
not rolled. Rolled impressions have a larger surface including as much information as
possible. Plain impressions have fewer minutiae but are less distorted and have clearer
ridges. Fingerprint impressions are stored in ten-print cards or fingerprint records. A
ten-print card contains the 10 rolled and 10 plain impressions of one individual (see
figure 6). The controlled conditions enforced during the acquisition process usually
provide high quality fingerprint impressions that can be easily processed by automatic
minutiae extractors. For this reason, after a fingerprint record is scanned, the feature
templates of the fingerprints are automatically extracted and enrolled into the system’s
database.
The other kind of fingerprint in forensics is known as a latent mark or fingermark.
Although fingermarks can also be visible (e.g. a blood mark), latent fingermarks are the
most common [4]. Latent fingermarks, made of a mix of secretions and contamination,
are usually lifted from surfaces that have been touched by a person. The
techniques used to recover these prints from the original surface range from UV imaging
to more complex chemical processes [17][20].
In forensic science, anonymous marks found at crime scenes have been used as
evidence for more than a century. However, individual identification using
fingermarks is not a trivial problem due to their poor quality. Generally, fingermarks
cover a small area, are distorted and may contain artifacts. All these problems make
it hard for automatic systems to perform well in feature extraction and
matching. For this reason, a lights-out identification process, i.e. one without human
intervention, is not yet possible for latent fingerprints, and a lot of human effort is
required when searching a latent mark with an AFIS [3].
Figure 6: A ten-print card
AFIS (Automated Fingerprint Identification Systems) are computer systems used by law
enforcement agencies in most countries to
help them solve crimes by searching anonymous fingermarks among millions of
criminal records [5]. Like most biometric systems, AFIS perform two main
steps: feature extraction and matching.
The feature extractor is the module in charge of searching for the discriminating
features in the fingerprint image and creating a template. The matcher module
compares fingerprint templates and outputs a similarity measure, called a score, that indicates how
similar two fingerprint templates are. AFIS matchers operate in so-called identification
mode or 1 to N mode [6]. This means that the matcher compares the query against all
the enrolled fingerprint templates stored in the database. The matcher’s output consists
of a list of the fingerprints that achieve the highest scores.
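The 1 to N search described above can be sketched as follows. This is only an illustration under stated assumptions: the template layout, the function names `identify` and `toy_score`, and the default list length are hypothetical and do not describe any real AFIS; a real matcher tolerates displacement, rotation and distortion, which the toy score does not.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical template layout: one (x, y, direction) tuple per minutia.
Template = List[Tuple[float, float, float]]


def identify(query: Template,
             database: Dict[str, Template],
             score_fn: Callable[[Template, Template], float],
             top_k: int = 10) -> List[Tuple[str, float]]:
    """Compare the query against every enrolled template (1 to N mode) and
    return the top_k (record id, score) pairs, highest score first."""
    scores = [(record_id, score_fn(query, template))
              for record_id, template in database.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def toy_score(a: Template, b: Template) -> float:
    """Toy similarity (assumption): count of distinct minutiae that appear
    identically in both templates. Not a realistic matcher."""
    return float(len(set(a) & set(b)))
```

The candidate list returned by `identify` plays the role of the matcher output; any scoring function with the same signature could be passed in place of `toy_score`.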
State-of-the-art AFIS can perform different types of searches [6].
1. Ten-print to ten-print: these searches are those in which a ten-print record is
searched against the ten-print database. The amount of information available in
this kind of search allows it to be lights-out, and human intervention is
usually not needed. Feature extraction, matching and verification are done
automatically by the system.
2. Fingermark to ten-print: this is the most critical function of an AFIS, as it helps
find the author of a crime if their fingerprints were previously registered in the
AFIS. The low quality and reduced area of fingermarks make these searches harder for the
AFIS than impression to impression searches.
3. Fingermark to fingermark: useful to identify two anonymous prints that have been
left by the same person, even when the person has not been identified. This is a
really difficult task for an AFIS, as partial fingermarks do not always have
information from the same part of the finger.
This work focuses on fingermark to ten-print matching, where the feature extractor of
the AFIS is not normally used on the fingermarks. Instead, fingerprint
examiners manually mark the features on the fingermark before sending it to the
matcher. The matcher then compares the manually generated template to the
automatically extracted ten-print templates stored in the database. After the matching
process, a list of 10 to 20 candidates with the highest matching scores is returned, and
human intervention is needed again. The fingerprint examiner then analyzes the high-scoring
prints in the candidate list and compares them manually to the query.
Once a good candidate to match the query fingermark is found, either by using an
AFIS or an impression from a suspect, a decision is made by following the ACE-V
protocol. The ACE-V protocol consists of four steps: Analysis, Comparison,
Evaluation and Verification. In the first three steps, the examiner analyzes the two
fingerprints separately, then compares the features of both fingerprints and
finally evaluates whether they share enough features to come from the same source, stating whether
the amount of information is sufficient to support such a decision. Verification
then involves a different examiner going through the whole
ACE process again [4]. Once this process is complete, the examiners report their decision,
which can be one of three:
• Individualization: both prints come from the same source.
• Exclusion: the fingerprints come from different sources.
• Inconclusive: there is not enough information in the fingerprints to make a
decision.
This decision is then reported to a court, normally with no other data to back it up.
In general, the decision is based only on the examiner’s experience and no error
rates of any kind are provided.
3. Related works
Over the past decades, several attempts have been made to establish the individuality
of fingerprints. According to [7], about 20 different models have been proposed
to establish the probability of two random people sharing a high number of
coincident features in their fingerprints. This probability is known as the Probability of
Random Correspondence (PRC). Most of the proposed methods model minutiae. This is
due to the fact that minutiae features are the most common features used in both
manual and automatic fingerprint recognition. Some methods add other features to
their models, such as ridge information or fingerprint quality, making them more
complex.
3.1. Grid models
Grid models use grids to divide a fingerprint into individual squares. Then the squares
are examined to find the distribution of minutiae and calculate the probability of
occurrence of an individual square. The probability of a particular fingerprint is
calculated as the product of the probability of each square, as the minutiae are
considered to be spatially independent. The Galton model [8] and the Osterburg
model [9] are examples of this very simple methodology.
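As a minimal sketch of the grid computation described above, assuming the per-square probabilities have already been estimated (the function name and any example values are hypothetical and are not taken from [8] or [9]):

```python
import math
from typing import Sequence


def grid_configuration_probability(cell_probs: Sequence[float]) -> float:
    """Probability of a whole fingerprint configuration under the spatial
    independence assumption: the product of the per-square probabilities."""
    if any(not 0.0 <= p <= 1.0 for p in cell_probs):
        raise ValueError("each square probability must lie in [0, 1]")
    return math.prod(cell_probs)
```

Because the product shrinks quickly as squares are added, summing log-probabilities is a common alternative when many squares are involved.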
3.2. Fixed probability models
This family of models assumes a fixed probability $P$ of occurrence of each minutia; minutiae are
also considered locally independent of each other. Therefore, the probability of $N$
minutiae occurring at their respective locations is $P^N$. Variations of the fixed
probability model were proposed by Henry [10], Balthazard [11] and Wentworth
[12].
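To give a sense of scale for the fixed-probability computation, the sketch below raises a single per-minutia probability to the power of the number of minutiae. The function name and the value 0.25 are illustrative assumptions, not parameters attributed to any of the cited models; 12 is the minutiae threshold mentioned in the introduction.

```python
def fixed_probability_prc(p: float, n: int) -> float:
    """Probability of n independent minutiae, each occurring with the same
    fixed probability p, i.e. p raised to the power n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


# Illustrative values only: p = 0.25 is an assumption, n = 12 minutiae.
example = fixed_probability_prc(0.25, 12)  # 2**-24, roughly 6e-8
```

Even with a fairly high per-minutia probability, the result falls rapidly with the number of minutiae, which is why these simple models yield very small probabilities.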
3.3. Relative measurement models
These models measure minutiae features, position and orientation, relative to other
minutiae or relative to the core of the fingerprint. The Trauring model [13] uses a
fingerprint identification system that measures the position of minutiae relative to the
position of three minutiae selected while enrolling a fingerprint. This model is only
valid for identification and requires good quality fingerprints and the use of an
automatic system.
Another model was proposed by Champod and Margot, which considers nine different
types of minutiae and empirically calculates the probability of each of them for each
position and orientation of the fingerprint. Finally, it calculates the PRC by considering
all the parameters (type, location, orientation) independent [14].
Neumann et al. [15] [16] proposed a model to compute likelihood ratios (LR) to assess
the evidential value of a comparison by spatial modeling using radial triangulation of
minutiae. To assess the numerator of the LR, this work develops a probabilistic
distortion model.
3.4. Generative models
Generative models are statistical models that represent the distribution of the features.
In these models, the statistical distributions are learnt through a training process that
uses fingerprint data.
Pankanti’s Model
In 2002 Pankanti et al [17] proposed a stochastic model for minutiae distributions in
order to calculate the PRC using this model. This model considers only two types of
minutiae, namely bifurcations and endings, as they are the most common minutiae
types. The minutiae in the fingerprint are considered to follow a uniform distribution
on the area of the fingerprint. Minutiae location and orientation are assumed to be
independent from each other. Pankanti avoids the problem of minutiae alignment by
only using good quality manually aligned minutiae to train the model.
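Let $X = (x, y)$ denote a generic random minutia location, and $D$ denote its
corresponding direction. A minutia is then defined as the pair $(X, D)$. A match
between two minutiae from the query $Q$ and the template $T$, $(X^Q, D^Q)$ and
$(X^T, D^T)$, happens when the following conditions are fulfilled:
\[ \sqrt{(x^Q - x^T)^2 + (y^Q - y^T)^2} \le r_0 \tag{1} \]
\[ \min\left(|D^Q - D^T|,\; 2\pi - |D^Q - D^T|\right) \le d_0 \tag{2} \]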
where r0 is the tolerance in distance and d0 is the orientation tolerance. Due to the
inherent distortion that fingerprints suffer, these tolerance values are necessary to
account for the intra-variability.
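Using this model, the probability of finding exactly $q$ matching minutiae between the
template and the query can be given by:
\[ p(M, m, n, q) = \sum_{\rho = q}^{\min(m, n)} \frac{\binom{m}{\rho}\binom{M - m}{n - \rho}}{\binom{M}{n}} \binom{\rho}{q}\, l^{q} (1 - l)^{\rho - q} \tag{3} \]
where $m$ (respectively $n$) is the number of minutiae in the query (respectively
template), $M$ is the ratio of the total area of the fingerprint $A$ and the area of tolerance,
and $l$ is the probability that two minutiae that match in position also match in direction:
\[ P\left(\sqrt{(x^Q - x^T)^2 + (y^Q - y^T)^2} \le r_0\right) = \frac{\pi r_0^2}{A} = \frac{1}{M} \tag{4} \]
\[ l = P\left(\min\left(|D^Q - D^T|,\; 2\pi - |D^Q - D^T|\right) \le d_0\right) \tag{5} \]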
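The structure of equation (3) can be sketched in a few lines of Python, assuming it takes the usual form for this model: a hypergeometric term for the number of position matches combined with a binomial term for the direction matches. The function name and the numeric inputs in the example call are hypothetical and only illustrate how the quantities interact.

```python
from math import comb


def pankanti_prc(M: int, m: int, n: int, q: int, l: float) -> float:
    """Probability of exactly q matches under a uniform minutiae model.

    M: number of tolerance cells in the fingerprint (total area / tolerance area)
    m, n: number of minutiae in the query and in the template
    l: probability that two position-matched minutiae also agree in direction
    """
    if M < max(m, n):
        raise ValueError("M must be at least as large as m and n")
    total = 0.0
    for rho in range(q, min(m, n) + 1):
        # Hypergeometric term: exactly rho minutiae match in position.
        position = comb(m, rho) * comb(M - m, n - rho) / comb(M, n)
        # Binomial term: exactly q of those rho also match in direction.
        direction = comb(rho, q) * l ** q * (1.0 - l) ** (rho - q)
        total += position * direction
    return total


# Illustrative inputs only (assumptions, not values reported in the text).
print(pankanti_prc(M=70, m=12, n=12, q=12, l=0.267))
```

Summing the function over all q from 0 to min(m, n) gives 1, which is a convenient sanity check when changing the inputs.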
Zhu and Dass Mixture Models
Zhu et al [18] proposed a mixture model to represent the minutiae variability of a
finger. This model aims to improve Pankanti’s model by modeling minutiae clustering
tendencies and dependence between minutiae location and orientation in different
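regions of the fingerprint. A joint distribution of minutiae pairs $(X, D)$
can be defined as:
\[ f(s, \theta \mid \Theta_G) = \sum_{g=1}^{G} \tau_g \, f_X(s \mid \mu_g, \Sigma_g) \, f_D(\theta \mid \nu_g, \kappa_g, p_g) \tag{6} \]
where $G$ is the total number of mixture components, $f_X(s \mid \mu_g, \Sigma_g)$ is the bivariate
Gaussian probability density function with mean $\mu_g$ and covariance matrix $\Sigma_g$, and
\[ f_D(\theta \mid \nu_g, \kappa_g, p_g) = \begin{cases} p_g \, \nu(\theta) & \text{if } \theta \in [0, \pi) \\ (1 - p_g) \, \nu(\theta - \pi) & \text{if } \theta \in [\pi, 2\pi) \end{cases} \tag{7} \]
where $\nu(\theta)$ is the Von-Mises distribution for the minutiae direction given by:
\[ \nu(\theta) \equiv \nu(\theta \mid \nu_g, \kappa_g) = \frac{1}{\pi I_0(\kappa_g)} \exp\{\kappa_g \cos 2(\theta - \nu_g)\} \tag{8} \]
with $I_0(\kappa_g)$ being the modified Bessel function of the first kind with order 0, defined as
\[ I_0(\kappa_g) = \frac{1}{2\pi} \int_0^{2\pi} \exp\{\kappa_g \cos \varphi\} \, d\varphi \tag{9} \]
Finally, $\tau_g$ is the cluster prior probability for each cluster, and $p_g$ is the mixture
probability for each cluster $g$.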
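In [18] they show that the probability of matching exactly $w$ minutiae pairs between
the query and the template corresponds to the Poisson probability mass function:
\[ p^*(w; Q, T) = \frac{e^{-\lambda(Q, T)} \, \lambda(Q, T)^{w}}{w!} \tag{10} \]
where $\lambda(Q, T)$ is the mean given by
\[ \lambda(Q, T) = m \, n \, p(Q, T) \tag{11} \]
for $m$ and $n$ being the number of minutiae in $Q$ and $T$ respectively, and
\[ p(Q, T) = P\left(\sqrt{(x^Q - x^T)^2 + (y^Q - y^T)^2} \le r_0 \;\text{ and }\; \min\left(|D^Q - D^T|,\; 2\pi - |D^Q - D^T|\right) \le d_0\right) \tag{12} \]
denotes the probability of a match when $(X^Q, D^Q)$ and $(X^T, D^T)$ are random minutiae from
$Q$ and $T$ respectively.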
In [16] a mixture model is calculated for each fingerprint and then the models are
grouped in clusters using an agglomerative clustering procedure. Fingerprint
individuality under certain parameter settings is numerically estimated by performing
fingerprint imposter matching experiments on the synthetic minutiae patterns
generated from the models.
This model also accounts for intra-variability as each model is built by using a master
minutiae set obtained by aligning different impressions from the same fingerprints.
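The Poisson approximation in equations (10) and (11) is simple to evaluate once the single-pair match probability is known. The following is a minimal sketch under that assumption; the function name and the inputs in the example call are hypothetical, and the single-pair probability would in practice come from the fitted mixture models.

```python
from math import exp, factorial


def poisson_match_probability(w: int, m: int, n: int, p_match: float) -> float:
    """Poisson probability of exactly w matches, with mean m * n * p_match.

    m, n: number of minutiae in the query and in the template
    p_match: probability that one random query minutia matches one random
             template minutia (assumed to be given)
    """
    lam = m * n * p_match
    return exp(-lam) * lam ** w / factorial(w)


# Illustrative inputs only (assumptions, not values reported in the text).
print(poisson_match_probability(w=12, m=40, n=40, p_match=0.001))
```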
Su and Srihari Models
Su and Srihari [19] carried out experiments similar to those in [18] but adapting them
to a more forensic framework. First of all, they define the specific probability of
random correspondence, nPRC, which allows calculating the probability of a
determined fingerprint or fingermark to be matched to a random fingerprint among a
population. This is a very useful measure in forensics, as it can provide forensic
examiners and courts with a specific PRC for the fingerprint under consideration.
Besides, Su and Srihari present a registration method for fingerprints based on core
detection. In [17] and [18], the fingerprints had to be manually aligned. However, when
working with fingermarks it is very difficult to determine the upright position,
as usually only a portion of the fingerprint is available.
The model proposed by Su and Srihari is also based on a mixture of Gaussian and
Von-Mises distributions. However, in this case the model is calculated for all the
fingerprints in the database and it accounts for relationships between neighboring
minutiae using Bayesian networks. Results are given in form of nPRC for two latent
fingerprints and different choices of matching degrees.
Chen and Moon Models
Chen and Moon gave two different approaches. In the first one [20] [21] they consider
that minutiae in the fingerprint are randomly distributed following a uniform
distribution and they derive a conservative expression to calculate the PRC.
In a later work [22] they propose a deterministic composite stochastic model for
describing and simulating fingerprint minutiae patterns. This model consists of a pair
potential Markov point process and a thinned process. The Markov point process is
employed to simulate the overdispersion among minutiae. The thinned process
simulates the large scale clustering of minutiae by creating low minutiae density
regions where the probability of the emergence of minutiae is generally lower than in the
remaining parts of fingerprints. The PRC is calculated by matching impostor synthetic
fingerprints generated from the proposed model and looking at the false acceptance
rate (FAR) obtained. The main problem with this work is that it does not include
minutiae direction in the model, which is fundamental in minutiae based fingerprint
matching. Also, the training of the model and the evaluation are performed using only
good quality fingerprint impressions.
3.5. Models that include other information
A very complete model that includes information different than minutiae can be found
in [23]. An individuality model is proposed that incorporates all three levels of
fingerprint features: pattern or class type (Level 1), minutiae and ridges (Level 2), and
pores (Level 3). Correlations among these features and their distributions are also
taken into account in the model. In particular, they create pattern specific models and
for each pattern they model minutiae location and direction along with ridge period
and curvature at each minutia. Experimental results show that the theoretical estimates
of fingerprint individuality using the model consistently follow the empirical values
obtained with a good quality fingerprint database.
In [24] and [25], Su and Srihari include level 1 information in the training of their
models along with minutiae location and direction and distribution of ridges where
minutiae lie. Results in this work report that ridge information leads to a much lower
PRC than when only minutiae are used.
Finally, more state-of-the-art studies in this field can be found in [26], [7] and [27].
4. Statistical models on minutiae features
In this work, the Probability of Random Correspondence is calculated by
fitting a set of fingerprint features to a model built using sets of features that come
from a different source than that of the query set.
The feature space in this case consists of fingerprint minutiae. As explained before,
minutiae are the anomalies that occur on the skin ridges, such as bifurcations, endings,
islands, dots, enclosures or bridges. Bifurcations and endings are extremely frequent,
whereas all the other types are very infrequent. For that reason, most
fingerprint recognition systems only look for ridge bifurcations and endings. In this
work, only these two types of minutiae are used and no difference is made between
them. Each minutia is characterized in terms of two components: its location and its
direction. The minutia location consists of the spatial coordinates $s = (x, y)$ of its
position in the fingerprint image. The direction $\theta \in [0, 2\pi)$ corresponds to the angle
subtended by the minutia measured from the horizontal axis.
In order to have a model that adjusts to reality, many authors [20][28] have studied the
distribution of the minutiae location and direction among large sets of fingerprints and
then, they have tried to find parametric models that explain faithfully how minutiae
behave.
It is now well known that minutiae in a fingerprint tend to group in clusters [18]. Also,
minutiae pairs that are close in space tend to have a more similar direction than
minutiae that are further from each other, i.e. there is interdependence between the
location and direction of the minutiae.
A model that embraces all the previous observations was generated. The main
premises taken into account were the following [18]:
1) The model needs to account for minutiae clustering tendencies.
2) The dependence between minutiae location and direction needs to be modeled.
3) The model needs to be flexible, that is, it has to represent the observed
distribution of minutiae in fingerprint images over different databases.
4) Fingerprint individuality measures have to be easy to obtain from the model.
The proposed model consists of a joint distribution formed by G mixture components. Each
component corresponds to a minutiae cluster. For each cluster, the minutiae location is
modeled with a bivariate normal distribution. The direction is modeled with a Von-Mises distribution.
In this section, the statistical distributions used in this work to model fingerprint
features are explained, as well as how their parameters are calculated. First, a brief
explanation on the Gaussian and the Von-Mises distributions is going to be given.
Then, the combined mixture distribution is presented.
4.1. Gaussian distribution
The distribution of the minutiae location in each cluster is modeled using a bivariate
Gaussian distribution.
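\[ f_X(s \mid \mu_g, \Sigma_g) = \frac{1}{2\pi \sqrt{|\Sigma_g|}} \exp\left\{ -\frac{1}{2} (s - \mu_g)^T \Sigma_g^{-1} (s - \mu_g) \right\} \tag{13} \]
where $s = (x, y)$ is the minutia location, $\mu_g$ is the vector of means and $\Sigma_g$ is the
covariance matrix:
\[ \mu_g = (\mu_{g,x}, \mu_{g,y})^T \tag{14} \]
\[ \Sigma_g = \begin{pmatrix} \sigma_{g,x}^2 & \sigma_{g,xy} \\ \sigma_{g,xy} & \sigma_{g,y}^2 \end{pmatrix} \tag{15} \]
A graphical example of a bivariate normal distribution is shown in figure 7.
Figure 7: Example of bivariate normal distribution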
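The bivariate Gaussian density used for the minutiae location in each cluster can be evaluated directly from the mean vector and the 2x2 covariance matrix. The sketch below is a plain-Python illustration; the function name and argument layout are hypothetical and not part of the original implementation.

```python
import math


def bivariate_gaussian_pdf(s, mu, cov):
    """Density of a bivariate Gaussian at location s = (x, y).

    mu:  mean vector (mu_x, mu_y)
    cov: covariance matrix as ((a, b), (c, d)), expected to be symmetric
    """
    (x, y), (mx, my) = s, mu
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x - mx, y - my
    # Quadratic form (s - mu)^T cov^{-1} (s - mu), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Illustrative call: density at the mean of a standard bivariate Gaussian.
print(bivariate_gaussian_pdf((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```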
4.2. Von-Mises distribution
In probability theory and directional statistics, the Von-Mises distribution is a
continuous probability distribution on the circle. It is a close approximation to the
wrapped normal distribution, which is the circular analogue of the normal distribution
[29].
In this work, the Von-Mises is used to model the minutiae direction in each cluster.
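However, rather than using the normal form of the distribution for the range $[0, 2\pi)$,
we are going to model the minutiae orientation (instead of the direction), which varies
in the range $[0, \pi)$, and then interpolate to the minutiae direction by using the cluster
mixture probability. The cluster mixture probability $p_g$ is the probability that a minutia
in the cluster g has its direction in the range $[0, \pi)$.
So the distribution on the minutiae direction is the following:
\[ f_D(\theta \mid \nu_g, \kappa_g, p_g) = p_g \, \nu(\theta \mid \nu_g, \kappa_g) \, I\{0 \le \theta < \pi\} + (1 - p_g) \, \nu(\theta - \pi \mid \nu_g, \kappa_g) \, I\{\pi \le \theta < 2\pi\} \tag{16} \]
where the function
\[ I\{A\} = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases} \tag{17} \]
and $\nu(\theta \mid \nu_g, \kappa_g)$ is the Von-Mises distribution defined as:
\[ \nu(\theta \mid \nu_g, \kappa_g) = \frac{1}{\pi I_0(\kappa_g)} \exp\{\kappa_g \cos 2(\theta - \nu_g)\} \tag{18} \]
In (18), $\nu_g$ and $\kappa_g$ represent the mean angle and the precision (inverse of the variance) of
the Von-Mises distribution respectively. $I_0(\kappa_g)$ is the modified Bessel function of the
first kind with order 0, defined as:
\[ I_0(\kappa_g) = \frac{1}{2\pi} \int_0^{2\pi} \exp\{\kappa_g \cos \varphi\} \, d\varphi \tag{19} \]
A graphical example of the Von-Mises distribution is shown in figure 8.
Figure 8: Von-Mises distributions for a fixed mean angle and different values of the precision $\kappa$.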
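To make the direction model concrete, the sketch below evaluates an orientation density on $[0, \pi)$ and extends it to directions on $[0, 2\pi)$ with the cluster mixture probability. It assumes the orientation density is normalized as $\exp\{\kappa \cos 2(\theta - \nu)\} / (\pi I_0(\kappa))$, with $I_0$ the standard modified Bessel function, computed here from its power series. All function names are hypothetical.

```python
import math


def bessel_i0(kappa: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind, order 0, via its power
    series sum_j (kappa^2 / 4)^j / (j!)^2 (adequate for moderate kappa)."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (kappa * kappa / 4.0) / ((j + 1) ** 2)
    return total


def orientation_density(theta: float, nu: float, kappa: float) -> float:
    """Von-Mises density for an orientation in [0, pi), assumed normalization."""
    return math.exp(kappa * math.cos(2.0 * (theta - nu))) / (math.pi * bessel_i0(kappa))


def direction_density(theta: float, nu: float, kappa: float, p: float) -> float:
    """Density of a direction in [0, 2*pi): weight p on [0, pi) and
    weight 1 - p on [pi, 2*pi), using the same orientation density."""
    theta = theta % (2.0 * math.pi)
    if theta < math.pi:
        return p * orientation_density(theta, nu, kappa)
    return (1.0 - p) * orientation_density(theta - math.pi, nu, kappa)


# Illustrative parameters only (assumptions, not fitted values).
print(direction_density(0.8, nu=0.7, kappa=2.0, p=0.3))
```

With this normalization the density integrates to p over $[0, \pi)$ and to 1 over the full circle, which can be checked numerically.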
4.3. Joint distribution model
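The proposed distribution model is based on a mixture consisting of G components or
clusters. The minutiae in each cluster are distributed according to the following
expression:
\[ f_g(s, \theta \mid \mu_g, \Sigma_g, \nu_g, \kappa_g, p_g) = f_X(s \mid \mu_g, \Sigma_g) \cdot f_D(\theta \mid \nu_g, \kappa_g, p_g) \tag{20} \]
where $f_X(s \mid \mu_g, \Sigma_g)$ and $f_D(\theta \mid \nu_g, \kappa_g, p_g)$ are defined in equations (13) and (16)
respectively.
Finally, the complete distribution can be expressed as:
\[ f(s, \theta \mid \Theta_G) = \sum_{g=1}^{G} \tau_g \, f_g(s, \theta \mid \mu_g, \Sigma_g, \nu_g, \kappa_g, p_g) \tag{21} \]
where $\tau_g$, $g = 1, \ldots, G$, are the mixture weights, so that:
\[ \sum_{g=1}^{G} \tau_g = 1 \tag{22} \]
and $\Theta_G$ is defined as the set of all the unknown parameters of the distribution.
The model is trained by using the Expectation-Maximization (EM) algorithm [29] and
the optimal number of mixtures G is estimated with the Bayesian Information Criterion
(BIC). More detail on the training of the model for this work is given in the
implementation section.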
5. Probability of Random
Correspondence (PRC)
As measures of individuality, three different probabilities can be defined, namely PRC,
nPRC and specific nPRC.
1. PRC: the probability that two randomly chosen samples from a population
share a determined number of features.
2. nPRC: the probability that among a set of n samples, there is a pair of
samples that share a determined number of features.
3. Specific nPRC: the probability that in a set of n samples, a specific sample x
shares a determined number of features with any other sample in the set.
The main purpose of this work is to calculate the probability that a given fingermark
has a specific number of coincident minutiae with a random fingerprint impression
from a database. This measure then corresponds to the specific nPRC.
To compute the PRCs for minutiae, we first define the correspondence or match
between two minutiae. Two minutiae from the query $q$ and the template $t$,
$(s^q, \theta^q)$ and $(s^t, \theta^t)$ with $s = (x, y)$, are said to match when:
\[ \sqrt{(x^q - x^t)^2 + (y^q - y^t)^2} \le r_0 \tag{23} \]
\[ \min\left(|\theta^q - \theta^t|,\; 2\pi - |\theta^q - \theta^t|\right) \le d_0 \tag{24} \]
where $r_0$ and $d_0$ are the location and angle tolerances respectively, set in order
to account for intravariability between two different impressions of the same fingerprint.
The criteria for the selection of these tolerance values will be discussed later in this
work.
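The match criterion of equations (23) and (24) can be written as a short predicate: a Euclidean distance test on the locations and a wrap-around angular distance test on the directions. This is a minimal sketch; the function names and the tuple layout (x, y, theta) are assumptions made for illustration.

```python
import math


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles given in radians."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def minutiae_match(query, template, r0: float, d0: float) -> bool:
    """True when two minutiae (x, y, theta) lie within the location
    tolerance r0 and the angle tolerance d0 of each other."""
    (xq, yq, tq), (xt, yt, tt) = query, template
    close_in_space = math.hypot(xq - xt, yq - yt) <= r0
    close_in_angle = angular_distance(tq, tt) <= d0
    return close_in_space and close_in_angle


# Illustrative tolerances only (assumptions): 5 pixels and 0.25 radians.
print(minutiae_match((0.0, 0.0, 0.1), (3.0, 4.0, 2.0 * math.pi - 0.1), r0=5.0, d0=0.25))
```

The wrap-around in `angular_distance` matters for directions close to 0 and $2\pi$, which would otherwise be treated as far apart.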
Given a query fingermark q and a fingerprint population of n fingerprint impressions,
the probability that w pairs of minutiae are matched between the query q and a randomly
chosen fingerprint from the n fingerprints can be defined as:
( )∑
( )
(∏
∏
)
(25)
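where m is the average number of minutiae per fingerprint in the population, and
\[ p(s^q, \theta^q \mid \Theta_G) = \int_{|s - s^q| \le r_0,\; |\theta - \theta^q| \le d_0} f(s, \theta \mid \Theta_G) \, ds \, d\theta \tag{26} \]
where $f(s, \theta \mid \Theta_G)$ is the probability distribution in equation (21) evaluated at $(s, \theta)$.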
The tolerance values $(r_0, d_0)$ establish the level of intravariability that is allowed in
order to consider that two minutiae in two different fingerprints match, so they
determine the matching region of a query minutia. In [15] a procedure is adopted to
select the values of the pair $(r_0, d_0)$ so that only a certain small amount of genuine
matching minutiae is rejected. The value of $r_0$ is selected based on the distribution of the
Euclidean distance between the locations of minutiae pairs. The value of $r_0$ is selected
so that only the upper 2.5% of the genuine matching distances are rejected. In the same
fashion, the value of $d_0$ is calculated so that only the upper 2.5% of the genuine angular
distances are rejected. Using this methodology, the values are:
However, other authors use different values. For example, in [33],
and
. In this work, different tolerances are going to be tried so the variability of
the PRC with respect to these tolerance values is observed.
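The percentile rule described above (reject only the upper 2.5% of genuine distances) can be sketched as an order-statistic lookup. The function name is hypothetical and the input data in the example are synthetic; real tolerances would be computed from the distances between genuinely matching minutiae pairs.

```python
import math


def upper_tail_threshold(values, reject_fraction: float = 0.025):
    """Return an observed value such that at most `reject_fraction` of the
    observations lie strictly above it."""
    if not values:
        raise ValueError("at least one observation is required")
    ordered = sorted(values)
    k = math.ceil((1.0 - reject_fraction) * len(ordered)) - 1
    return ordered[min(max(k, 0), len(ordered) - 1)]


# Synthetic stand-ins for genuine matching distances (pixels) and angular
# distances (radians); these numbers are assumptions for illustration.
genuine_distances = [0.1 * i for i in range(1, 201)]
genuine_angles = [0.002 * i for i in range(1, 201)]
r0 = upper_tail_threshold(genuine_distances)
d0 = upper_tail_threshold(genuine_angles)
print(r0, d0)
```

Running the same function with other values of `reject_fraction` gives a simple way to explore how the tolerances, and therefore the PRC, change.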
Another important parameter that needs to be set is w, which corresponds to the
number of matching minutiae between the query and the template. In many countries,
the minimum number of coincident minutiae required to declare a match is set at 12.
However, different values are going to be used to see how the PRC varies
depending on the number of coincident minutiae.
6. Implementation
6.1. NIST databases
The fingerprint data used in this work comes from two different databases. Both
databases are collected by the National Institute of Standards and Technology (NIST)
from real operational forensic fingerprints captured in the United States.
NIST SD 14
This database [29] contains 54,000 rolled fingerprints from 2,700 different individuals.
For each individual, the database contains two acquisitions of each of their ten fingers.
Each fingerprint is stored as an 832x768 8-bit gray scale image. This database contains only
images, i.e. fingerprint features do not come with the database.
These fingerprints were acquired by the FBI in real casework either by rolling the inked
fingers on paper or by using a live-scan device. The fingerprints acquired using a
live-scan device were then printed on paper and scanned again to digitize this database.
Most images in this database contain artifacts that do not belong to the fingerprints,
such as pen annotations and numbers that belong to the fingerprint card. An example
of fingerprints from NIST SD 14 is shown in figure 9.
A subset of this database is going to be used in the experiments. In particular, 1000
impressions from 1000 different fingers will be used. The distribution of the number of
minutiae in this dataset is shown in figure 11. The average number of minutiae is 108.
Figure 9: Three images from NIST SD 14 database
NIST SD 27
The SD 27 database [30] contains 258 images of real forensic fingermarks and their
mated rolled fingerprint impressions. I.e. for each latent fingermark in the database,
there is a rolled fingerprint impression that comes from the same finger. Each
fingermark-impression pair has a different source. All the fingerprints come from real
solved forensic cases in the United States.
The images in this database are 800x768 8-bit gray scale images scanned at 500 dpi. The
fingermarks are classified in three different groups according to their quality level
named good, bad and ugly. There are 88 good, 85 bad and 85 ugly fingermarks. The
average number of minutiae is 20 in the fingermark set and 106 in the impression set. The
distribution of the number of minutiae is shown in figure 11.
Besides the images, each latent-impression pair in this database comes with four sets of
fingerprint features. Two of these sets are the ideal features of the fingermark and the
rolled impression respectively, marked by human FBI examiners. The other two sets of
features are the subsets of the first ones that only have the matching features between
the two prints. In this work, the ideal minutiae features included in this database are
going to be used.
In our experiments, the minutiae from rolled impressions from SD14 (subset of
1000) and SD27 are used to train the statistical models on minutiae features. As SD14
does not come with a feature set, a feature extractor is needed in order to obtain the
minutiae.
The minutiae from the latent fingermarks in SD 27 are going to be used to evaluate the
PRC of the fingermarks.
Figure 10: A latent fingermark and its mated impression from NIST SD 27 database
Figure 11: Distribution of the number of minutiae for the different sets of fingerprints.
6.2. Fingerprint feature extraction
A fingerprint feature extraction algorithm is necessary in order to extract the minutiae
from the fingerprints in the SD 14 database. As this database is really big (27,000
images), the performance of the feature extraction needs to be high so no manual
correction is needed.
For this purpose, a commercial SDK, Neurotechnology Verifinger 6.3, has been used [31].
This product provides a series of feature extraction and matching algorithms that can
be adapted to the user’s needs. Many research works in fingerprint recognition and its
applications use Verifinger in their experiments, as it is considered to be a reliable and
versatile product [2] [3].
In this work, feature extraction software is built using the Verifinger SDK in order to
extract minutiae from the images in the NIST SD 14 database. However, unlike real law
enforcement AFIS, this software is not well prepared to work with forensic
fingerprints. As detailed in the previous section, fingerprint images from the NIST SD 14
database contain several artifacts that do not belong to the fingerprint. Unfortunately,
Verifinger treats these elements as part of the fingerprint and extracts minutiae from
them too. As a result, most fingerprints end up with a significant amount of spurious
minutiae extracted by the system. An example of this is shown in figure 12.
Figure 12: Example of fingerprint minutiae extracted with Verifinger. Many spurious minutiae
are extracted by the system.
6.3. Post-processing: removing spurious minutiae
In the previous section, it was shown that the feature extraction software used to
obtain the minutiae from the NIST SD 14 database performs poorly in regions of the
image that do not correspond to the fingerprint. In these regions, the system identifies
pen strokes and other artifacts as part of the fingerprint and extracts spurious minutiae
out of them.
Ideally, to construct statistical models based on minutiae features, we would need the
minutiae to be as accurate as possible. The more reliable minutiae and the less spurious
minutiae in our database, the more realistic the model would be. The best procedure to
achieve this objective would be to manually mark the minutiae in all the fingerprints,
to make sure that all the minutiae are correct. However, there are two main reasons not
to do this: (1) Manually extracting minutiae from 27,000 prints would be extremely
time-consuming; (2) For practical forensic applications, minutiae in fingerprint
impressions are extracted by AFIS and not by human experts. Therefore,
having an ideal model would not be useful for real applications.
However, there has to be a tradeoff between perfectly marked minutiae and what
was obtained from the Verifinger feature extraction. Based on our experience, we can
assure that real state-of-the-art law enforcement AFIS do not extract as many spurious
minutiae (if any) out of the area of the fingerprint. As one of the objectives of this work
is to have a real application that works with this type of AFIS systems, our data has to
be as realistic as possible, so the spurious minutiae need to be removed.
For that reason, a post-processing algorithm has been developed in order to remove
the spurious minutiae obtained during the feature extraction. This image processing
based algorithm takes advantage of the skeletonized images of the fingerprints, which
were obtained during the feature extraction process with the Verifinger SDK. An
example of a skeletonized fingerprint is shown in figure 13. The following algorithm
was developed to remove the spurious minutiae and it uses morphological and other
image processing techniques:
1. The algorithm starts with the skeletonized image obtained from Verifinger. White
pixels have value 1 and black pixels have value 0.
Figure 13: Skeletonized fingerprint image by Verifinger
2. The image is divided in blocks of 8x8 pixels. Each block is mapped into a pixel
in a binary image as follows: if the block contains at least 1 black pixel, the
mapped pixel is set to zero. Otherwise, it is a background block and the pixel is
set to 1.
Figure 14: Block mapping
3. A morphological opening operation with a square structuring element is
applied to the block map. This operation consists of a morphological erosion
followed by a dilation. The purpose of the erosion is to remove noise (i.e. small
blocks) and separate the different components of the image. Then the dilation is
applied in order to fill small holes in the white structures.
Figure 15: Fingerprint mask after opening.
4. Once the opening is done, we count the connected components in the
resulting image as well as their sizes. All the components except the
one with the biggest area are removed from the map.
Figure 16: Fingerprint mask after selection of the biggest component
5. Another morphological operation is now performed on the map: closing. A
closing consists of a dilation followed by an erosion and is aimed at filling holes
without affecting the edges of the structure. Then an erosion is performed to
remove minutiae from the edges of the fingerprint, as they are usually
unreliable.
Figure 17: Fingerprint mask after closing. All the minutiae outside the mask are removed.
6. Finally the mask is scaled back to the original image size and compared to
the minutiae file. All the minutiae that fall outside of the mask are removed.
In figure 18, the identified spurious minutiae are marked in red and the
reliable minutiae are marked in green. All the spurious minutiae are then
removed from the minutiae file.
Figure 18: Example of successfully identified spurious minutiae (red minutiae).
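The six steps above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the code used in this work: the function names, the 3x3 structuring element, the 4-connectivity, and the convention that the mask is True on fingerprint blocks (blocks containing at least one black pixel) are all assumptions. Cells outside the map are treated as background, so the mask also shrinks at the image border.

```python
from collections import deque


def block_map(skeleton, block=8):
    """Foreground (True) where a block holds at least one black (0) pixel."""
    rows, cols = len(skeleton) // block, len(skeleton[0]) // block
    return [[any(skeleton[r * block + i][c * block + j] == 0
                 for i in range(block) for j in range(block))
             for c in range(cols)] for r in range(rows)]


def _morph(mask, size, combine):
    """Erosion (combine=all) or dilation (combine=any) with a size x size square."""
    h, w, k = len(mask), len(mask[0]), size // 2

    def window(r, c):
        return [0 <= rr < h and 0 <= cc < w and mask[rr][cc]
                for rr in range(r - k, r + k + 1)
                for cc in range(c - k, c + k + 1)]
    return [[combine(window(r, c)) for c in range(w)] for r in range(h)]


def erode(mask, size=3):
    return _morph(mask, size, all)


def dilate(mask, size=3):
    return _morph(mask, size, any)


def largest_component(mask):
    """Keep only the biggest 4-connected foreground component."""
    h, w = len(mask), len(mask[0])
    seen, best = set(), set()
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or (r, c) in seen:
                continue
            comp, queue = {(r, c)}, deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        comp.add((ny, nx))
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    return [[(r, c) in best for c in range(w)] for r in range(h)]


def fingerprint_mask(skeleton, block=8, size=3):
    mask = block_map(skeleton, block)          # step 2: block mapping
    mask = dilate(erode(mask, size), size)     # step 3: opening
    mask = largest_component(mask)             # step 4: biggest component
    mask = erode(dilate(mask, size), size)     # step 5: closing
    return erode(mask, size)                   # step 5: drop unreliable edge blocks


def keep_reliable(minutiae, mask, block=8):
    """Step 6: keep minutiae (x, y, theta) whose block lies inside the mask."""
    h, w = len(mask), len(mask[0])
    kept = []
    for x, y, theta in minutiae:
        r, c = int(y) // block, int(x) // block
        if 0 <= r < h and 0 <= c < w and mask[r][c]:
            kept.append((x, y, theta))
    return kept
```

A typical call would be `keep_reliable(minutiae, fingerprint_mask(skeleton))`, where `skeleton` is a list of rows of 0/1 pixel values and `minutiae` is a list of `(x, y, theta)` tuples in pixel coordinates.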
6.4. Minutiae alignment
A crucial task of this work is the minutiae alignment. Most previous works in this field
only use good quality fingerprints that are easily positioned upright and do not need to
be rotated when compared to each other. However, partial fingermarks often do not
have the core or deltas, so it is very hard to determine the upright position of the
mark. For that reason, AFIS normally rotate the fingermark to find the best
alignment between the fingermark and the ten-print image [6].
Minutiae alignment is important to create the models because we aim to calculate the
specific nPRC for each fingermark in relation to the set of fingerprint impressions. In
real casework, the query would be the one to be aligned to each of the templates in
order to calculate the matching score. However, as we need to model the distribution
of the templates, we are going to align all the templates to the latent. For this task, a
point pattern matching algorithm was adapted. The original algorithm was developed
in [32] and it has been adapted in this work by adding minutiae directions to make the
algorithm more accurate.
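The main purpose of the algorithm is to align two sets of minutiae $A$ and $B$. Each
minutia consists of a location $(x, y)$ and a direction $\theta$. So we have:
\[ A = \{a_i = (x_i^A, y_i^A, \theta_i^A)\}_{i=1}^{m}, \qquad B = \{b_j = (x_j^B, y_j^B, \theta_j^B)\}_{j=1}^{n} \]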
Matching Pairs Support Algorithm
The first part is the matching pairs support finding algorithm and its purpose is to find
a pair of corresponding minutiae in A and B by evaluating all possible pairs in both
fingerprints. It is explained as follows:
Set weight w = 0
For all minutiae a_i in A, i = 1, ..., m
{
  For all minutiae b_j in B, j = 1, ..., n
  {
    Reset the accumulator matrix: V = zeros(3, 20)
    For all minutiae a_k in A, k ≠ i
    {
      Take the vector (a_i, a_k). Calculate the magnitude s1 and the angle φ1.
      For all minutiae b_l in B, l ≠ j
      {
        Take the vector (b_j, b_l). Calculate the magnitude s2 and the angle φ2.
        Calculate the ratio of the magnitudes and the angle between the two vectors:
        S = quantify(s1/s2); Φ = quantify(φ1 − φ2)
        If 0.85 < S < 1.15 (restriction in scale)
          If the directions of the paired minutiae are consistent with the rotation Φ
            V(S, Φ) = V(S, Φ) + 1
      }
    }
    If max(V) > w
    {
      w = max(V)
      (Smax, Φmax) = ind(max(V))
      BestPair(M, N) = (a_i, b_j)
    }
  }
}
The matrix V is an accumulator that stores the matching pair supports from the
comparison. The three possible values for scaling are 0.9, 1 and 1.1. The original
algorithm accounts for a much bigger range of scaling between the two sets of points.
However, fingerprint images are usually scaled to a common size by using the ridge
frequency, so the scaling is less necessary in this case. For the angle, a full $360^\circ$
rotation is allowed in intervals of $18^\circ$ (the 20 columns of V), as normally the upright
position of the fingermarks is unknown.
Using the matching pairs support finding algorithm we obtain a pair of matching
minutiae between A and B. The next step is to maximize the alignment between the
two fingerprints using this matching pair as a reference. To do so, the next registration
algorithm was developed.
Registration Algorithm
The purpose of this algorithm is to find the optimal rotation, translation and scaling
parameters to align B to A. The first step consists of finding pairs of minutiae that
could match between A and B when rotating and scaling the fingerprint, knowing that
BestPair(M, N) is a matching pair and (Smax, Φmax) are the most probable scaling and rotation
parameters. Minutiae directions are also taken into account for this matter.
The pseudo code for the registration algorithm is the following:
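G = zeros(m, n)
For all minutiae a_k in A, k ≠ M
{
  Calculate the magnitude and the angle of the vector (a_M, a_k)
  iq = 0; count = 0
  For all minutiae b_q in B, q ≠ N
  {
    Calculate the magnitude and the angle of the vector (b_N, b_q)
    If the scale, the rotation and the minutiae directions agree with (Smax, Φmax) within tolerance
      count = count + 1; iq = q
  }
  If count == 1
    G(k, iq) = 1
}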
Once the algorithm finishes, the matrix G will have value one for all the matching
minutiae pairs found by the registration algorithm.
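Then we create a vector of matched pairs $\{(a_{k_1}, b_{q_1}), \ldots, (a_{k_L}, b_{q_L})\}$ where $L$ is the total number of
matching pairs. Once a list of matching pairs is obtained, least squares estimation of
registration parameters is used to find the best alignment between the two sets of
points [32]. The following expressions are used to estimate the parameters [32]:
\[
\begin{pmatrix} t_x \\ t_y \\ s\cos\phi \\ s\sin\phi \end{pmatrix}
= \frac{1}{\det}
\begin{pmatrix}
l_B & 0 & -\mu_{x_B} & \mu_{y_B} \\
0 & l_B & -\mu_{y_B} & -\mu_{x_B} \\
-\mu_{x_B} & -\mu_{y_B} & L & 0 \\
\mu_{y_B} & -\mu_{x_B} & 0 & L
\end{pmatrix}
\begin{pmatrix} \mu_{x_A} \\ \mu_{y_A} \\ l_{A+B} \\ l_{A-B} \end{pmatrix}
\tag{27}
\]
Where all sums run over the $L$ matched pairs $(x_i^A, y_i^A)$, $(x_i^B, y_i^B)$:
\[ \mu_{x_A} = \sum_i x_i^A, \quad \mu_{y_A} = \sum_i y_i^A, \quad \mu_{x_B} = \sum_i x_i^B, \quad \mu_{y_B} = \sum_i y_i^B \]
\[ l_{A+B} = \sum_i \left(x_i^A x_i^B + y_i^A y_i^B\right), \quad l_{A-B} = \sum_i \left(x_i^B y_i^A - y_i^B x_i^A\right), \quad l_B = \sum_i \left((x_i^B)^2 + (y_i^B)^2\right) \]
and $\det = L \, l_B - \mu_{x_B}^2 - \mu_{y_B}^2$.
Finally, the minutiae in B are transformed by applying the calculated parameters using
the following expressions:
\[
\begin{pmatrix} x' \\ y' \end{pmatrix}
= \begin{pmatrix} t_x \\ t_y \end{pmatrix}
+ \begin{pmatrix} s\cos\phi & -s\sin\phi \\ s\sin\phi & s\cos\phi \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\tag{28}
\]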
Although this algorithm is to be used with minutiae that are not from the same source,
the best way to test it is to align fingerprints that do come from the same source, as it is
the only way to see if the algorithm finds the best alignment between the two
fingerprints.
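The least squares registration step can be illustrated with a closed-form solver for a similarity transform (translation, rotation and scale) that maps the points of B onto their matched points in A. This is a sketch under that assumption, derived from the normal equations of the squared alignment error; the function names and the pair layout are hypothetical.

```python
import math


def estimate_similarity(pairs):
    """Least squares similarity transform from matched point pairs.

    pairs: list of ((xa, ya), (xb, yb)) with the B point to be mapped onto A.
    Returns (tx, ty, a, b) with a = s*cos(phi) and b = s*sin(phi).
    """
    k = len(pairs)
    mxa = sum(pa[0] for pa, _ in pairs)
    mya = sum(pa[1] for pa, _ in pairs)
    mxb = sum(pb[0] for _, pb in pairs)
    myb = sum(pb[1] for _, pb in pairs)
    l_b = sum(pb[0] ** 2 + pb[1] ** 2 for _, pb in pairs)
    l_plus = sum(pa[0] * pb[0] + pa[1] * pb[1] for pa, pb in pairs)
    l_minus = sum(pb[0] * pa[1] - pb[1] * pa[0] for pa, pb in pairs)
    det = k * l_b - mxb ** 2 - myb ** 2
    if det == 0:
        raise ValueError("at least two distinct points of B are required")
    tx = (l_b * mxa - mxb * l_plus + myb * l_minus) / det
    ty = (l_b * mya - myb * l_plus - mxb * l_minus) / det
    a = (-mxb * mxa - myb * mya + k * l_plus) / det
    b = (myb * mxa - mxb * mya + k * l_minus) / det
    return tx, ty, a, b


def apply_similarity(point, params):
    """Apply the estimated transform to one (x, y) point of B."""
    x, y = point
    tx, ty, a, b = params
    return a * x - b * y + tx, b * x + a * y + ty


# Illustrative check with synthetic points: rotate by 30 degrees, scale by 1.1, shift.
phi, s = math.radians(30.0), 1.1
true = (5.0, -2.0, s * math.cos(phi), s * math.sin(phi))
points_b = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (7.0, 3.0)]
points_a = [apply_similarity(p, true) for p in points_b]
print(estimate_similarity(list(zip(points_a, points_b))))
```

Minutiae directions would be rotated by the same angle, `math.atan2(b, a)`, which is not shown here.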
In figure 19, an example of the result of the alignment algorithm is shown. In blue, the
minutiae from the fingerprint impression are shown in their original size and location.
The minutiae in the mated latent fingermark, shown in red, are rotated and translated
following the parameters calculated with the alignment algorithm. Circles in different
bright colors are the minutiae pairs identified by the algorithm and used to estimate the
least squares parameters.
Figure 19: A successful example of alignment between two fingerprint feature sets.
6.5. Training the mixture models
The following model on minutiae locations and directions needs to be trained:
\[ f(s, \theta \mid \Theta_G) = \sum_{g=1}^{G} \tau_g \, f_X(s \mid \mu_g, \Sigma_g) \, f_D(\theta \mid \nu_g, \kappa_g, p_g) \tag{29} \]
where $f_X(s \mid \mu_g, \Sigma_g)$ and $f_D(\theta \mid \nu_g, \kappa_g, p_g)$ are defined in sections 4.1 and 4.2. To do so, the
EM algorithm is used. This algorithm calculates the unknown
parameters of the model using the minutiae locations and directions from a set of
fingerprint impressions that have previously been aligned with a query fingermark.
The EM algorithm
The EM algorithm is an iterative method for finding the maximum likelihood estimate
of parameters when there are latent random variables. In this case, the latent or hidden
variables are the class labels $c_j$, which can take the values $1, \ldots, G$ for each
minutia $(s_j, \theta_j)$, $j = 1, \ldots, N$.
The algorithm consists of two steps, namely the Expectation (E)-step and the
Maximization (M)-step. In the E-step, the expectation of the logarithm of the complete
likelihood is obtained conditional on the observed data and parameter estimates at the
current iteration. In the M-step, the new parameters are obtained by maximizing the
likelihood [30].
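For our problem, the E and M steps can be combined into one updating equation for
each parameter linking the current estimates to subsequent ones.
The posterior probabilities that the minutia $(s_j, \theta_j)$ belongs to the cluster $g$ are calculated
after each step (n) as follows:
\[ \gamma_{jg}^{(n)} = \frac{\tau_g^{(n)} f_g(s_j, \theta_j \mid \Theta^{(n)})}{\sum_{h=1}^{G} \tau_h^{(n)} f_h(s_j, \theta_j \mid \Theta^{(n)})} \tag{30} \]
where $\Theta^{(n)}$ is the set of all the parameters calculated in the (n) iteration as follows [18]:
The cluster weights $\tau_g$ are calculated as:
\[ \tau_g^{(n+1)} = \frac{1}{N} \sum_{j=1}^{N} \gamma_{jg}^{(n)} \tag{31} \]
The mean $\mu_g$ of the Gaussian distribution:
\[ \mu_g^{(n+1)} = \frac{\sum_{j=1}^{N} \gamma_{jg}^{(n)} s_j}{\sum_{j=1}^{N} \gamma_{jg}^{(n)}} \tag{32} \]
The covariance matrix $\Sigma_g$ of the Gaussian distribution:
\[ \Sigma_g^{(n+1)} = \frac{\sum_{j=1}^{N} \gamma_{jg}^{(n)} (s_j - \mu_g^{(n+1)})(s_j - \mu_g^{(n+1)})^T}{\sum_{j=1}^{N} \gamma_{jg}^{(n)}} \tag{33} \]
The mean value of the Von-Mises distribution $\nu_g$:
\[ \nu_g^{(n+1)} = \frac{1}{2} \operatorname{atan2}\left( \sum_{j=1}^{N} \gamma_{jg}^{(n)} \sin 2\phi_j,\; \sum_{j=1}^{N} \gamma_{jg}^{(n)} \cos 2\phi_j \right) \bmod \pi \tag{34} \]
where:
\[ \phi_j = \begin{cases} \theta_j & \text{if } \theta_j \in [0, \pi) \\ \theta_j - \pi & \text{if } \theta_j \in [\pi, 2\pi) \end{cases} \tag{35} \]
The precision value $\kappa_g$ is found by following the numerical method in [35] using:
\[ \bar{R}_g = \frac{\sum_{j=1}^{N} \gamma_{jg}^{(n)} \cos 2(\phi_j - \nu_g^{(n+1)})}{\sum_{j=1}^{N} \gamma_{jg}^{(n)}} \tag{36} \]
The cluster class label $c_j$ for the observation $(s_j, \theta_j)$ is determined as:
\[ c_j = \arg\max_{g} \gamma_{jg}^{(n)} \tag{37} \]
The estimate of the mixture probabilities $p_g$:
\[ p_g^{(n+1)} = \frac{\sum_{j=1}^{N} I\{c_j = g,\; \theta_j \in [0, \pi)\}}{\sum_{j=1}^{N} I\{c_j = g\}} \tag{38} \]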
The algorithm is initializing by using the k-means algorithm on the minutiae locations.
The posterior probabilities in the first iteration are calculated as the cluster frequencies
determined by the cluster labels given by the k-means algorithm.
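One possible reading of this initialization is sketched below in Python: k-means (Lloyd's algorithm) is run on the locations, each minutia receives a one-hot posterior for its k-means cluster, and the column means of that matrix give the cluster frequencies. The function names, the random seeding and the one-hot interpretation are assumptions of this sketch.

```python
import numpy as np

def kmeans(points, G, n_iter=50, seed=0):
    """Lloyd's algorithm on minutiae locations (N x 2); returns labels, centers."""
    rng = np.random.default_rng(seed)
    # Start from G distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=G, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (N, G).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
            for g in range(G)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def initial_posteriors(labels, G):
    """One-hot posteriors from hard labels; column means are cluster frequencies."""
    gamma = np.zeros((len(labels), G))
    gamma[np.arange(len(labels)), labels] = 1.0
    return gamma
```

With `gamma0 = initial_posteriors(labels, G)`, the initial cluster weights would be `gamma0.mean(axis=0)`, and the remaining parameters could be obtained by applying the update equations once to `gamma0`.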
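The algorithm is run until it converges for different numbers of clusters $G$.
Finally, the Bayesian Information Criterion (BIC) is applied to decide which is the
optimal number of clusters:

$$ BIC(G) = 2 \sum_{j=1}^{N} \log f(s_j, \theta_j \mid \Theta_G^{*}) - |\Theta_G| \log N \qquad (39) $$

where $\Theta_G^{*}$ is the set of parameter estimates at convergence and $|\Theta_G|$ is the number of
unknown parameters in $\Theta_G$. The value of G that maximizes $BIC(G)$ is selected as the
optimal number of mixtures in the model.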
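The selection step can be sketched in Python as follows. The BIC is taken in the form twice the maximized log-likelihood minus the number of unknown parameters times log N, and the G with the largest value is kept. The function names are hypothetical, and the parameter count (eight per cluster plus G - 1 free weights) is an assumption of this sketch rather than a figure stated in the text.

```python
import numpy as np

def n_free_params(G):
    # Assumed count per cluster: 2 (mean) + 3 (symmetric 2x2 covariance)
    # + 3 (Von-Mises mean, precision, mixture probability) = 8,
    # plus G - 1 free cluster weights, since the weights sum to one.
    return 8 * G + (G - 1)

def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' form: 2 * logL - |Theta| * log(N)."""
    return 2.0 * log_likelihood - n_params * np.log(n_obs)

def select_num_clusters(log_likelihoods, n_obs):
    """log_likelihoods maps each candidate G to its maximized log-likelihood."""
    scores = {G: bic(ll, n_free_params(G), n_obs)
              for G, ll in log_likelihoods.items()}
    best_G = max(scores, key=scores.get)
    return best_G, scores

# Hypothetical log-likelihood values, for illustration only.
example = {1: -1000.0, 2: -900.0, 3: -895.0}
best_G, scores = select_num_clusters(example, n_obs=500)
```

In this made-up example the gain in log-likelihood from two to three clusters is too small to offset the penalty for the extra parameters, so the smaller model is preferred.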
Figure 20 shows the distribution of all the minutiae used to train one of the models.
The final cluster of each minutia is represented in a different color. The x and y axes
correspond to the minutiae location, while the vertical axis is the minutiae direction.
The number of mixtures in this case is G=10.
Figure 20: Distribution of the minutiae used to train a model.
Figures 21 and 22 show an example of one of the trained models. Figure 21
shows the mixture of Gaussian distributions that models the minutiae locations. Figure
22 shows the different Von-Mises distributions that model the minutiae direction.
Figure 21: Mixture of Gaussian distributions that model minutiae locations.
Figure 22: Von-Mises distributions of each mixture in one of the models.
7. Experimental results
In this section, results are presented in the form of the specific nPRC for different
numbers of coincident features, tolerance values and query fingermarks.
For each query fingermark in SD27, the subset of 1257 impressions is aligned with the
query using the developed alignment algorithm (note that the mate of the query
is not included, as we are interested in impostor pairs). Once this is done, the aligned
set of minutiae from the impressions is used to train the mixture model explained in
section 4. Then, using the equations in section 5, the query minutiae and the model, the
specific nPRC is calculated.
Table 1 contains additional information that is useful for understanding the
results.
Description                               Value
Number of fingerprints in the database    1258
Mean number of minutiae in database       109
Number of queries                         257
Mean number of minutiae in the query      20
Total number of impostor comparisons      324306
Table 1: Some values for the experiments
In the first experiment, we obtain values of the nPRC for different tolerance values.
In this case, the nPRC is calculated for each of the fingermarks and then the
average value is obtained.
w     Tolerance    Emp
12    20
12    15
12    10
12    6
Table 2: Average nPRC for different tolerance values and 12 coincident minutiae.
As can be seen in table 2, the number of coincident minutiae has been set to 12 while
the tolerance value varies from a very elastic setting to a much more restrictive one.
The results show large changes in the PRC when the tolerance values are modified, in
both the empirical and the statistical approaches.
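To make the role of the tolerance concrete, the Python sketch below uses a common form of minutia matching rule: two minutiae correspond when their locations are within a distance tolerance and their directions are within an angular tolerance. This rule, the names `r0` and `d0`, and the numeric values are assumptions made for illustration; the definition actually used is the one given in section 5.

```python
import math

def minutiae_match(m1, m2, r0, d0):
    """True if two minutiae (x, y, direction in radians) agree within tolerance.

    r0 is the location tolerance and d0 the direction tolerance
    (both names are hypothetical).
    """
    x1, y1, t1 = m1
    x2, y2, t2 = m2
    dist = math.hypot(x1 - x2, y1 - y2)
    # Angular difference measured on the circle, so 0.05 and 2*pi - 0.05 are close.
    dt = abs(t1 - t2) % (2.0 * math.pi)
    dt = min(dt, 2.0 * math.pi - dt)
    return dist <= r0 and dt <= d0

def count_matches(query, template, r0, d0):
    """Number of query minutiae with at least one matching template minutia."""
    return sum(any(minutiae_match(q, t, r0, d0) for t in template) for q in query)

# Illustrative pair of minutiae, 12 units apart with similar directions.
a = (100.0, 100.0, 0.10)
b = (112.0, 100.0, 0.25)
elastic = minutiae_match(a, b, r0=20.0, d0=0.4)      # wide location tolerance
restrictive = minutiae_match(a, b, r0=10.0, d0=0.4)  # narrow location tolerance
```

The same pair counts as a correspondence under the elastic setting and not under the restrictive one, which is why the number of random correspondences, and with it the PRC, reacts strongly to the tolerance.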
Another important parameter is the number of coincident minutiae. In the next
experiment, we calculate the same average nPRC while the number of coincident
minutiae, w, changes. In this case, the tolerance values are fixed.
w     Tolerance    Emp
4     10
6     10
8     10
10    10
12    10
Table 3: Average nPRC for different numbers of coincident minutiae w.
Table 3 shows how the number of coincident minutiae directly affects the nPRC.
It has to be taken into account that defining a match between 2 minutiae depends
entirely on this number and that the probability of random correspondence is not
defined without it.
8. Conclusions and future work
8.1. Conclusions
A statistical model to calculate the rarity of fingerprints is presented. This model is
based on the minutiae locations and directions and accounts for the clustering
tendencies of minutiae within the fingerprint and for the dependence between location
and direction.
A series of preliminary steps was taken in order to train the model correctly.
First, the fingerprint features had to be extracted correctly from the database. To do so,
a post-processing algorithm based on image processing techniques was developed and
successfully tested. This algorithm was necessary because the commercial SDK that
was acquired to perform feature extraction found too many spurious minutiae, which
would have made the model less reliable if not removed.
The next step was to align the fingerprint impressions to the query fingerprint. An
algorithm was developed for this purpose and successfully tested on mated fingerprints.
The algorithm performed accurately in most cases.
Finally, the aligned minutiae were used to train the model and a statistical measure of
the probability of random correspondence was proposed. This measure gives the
probability of finding a random match when searching a fingermark in a
fingerprint database.
The different parameters that affect the Probability of Random Correspondence have
been analyzed and several observations have been made.
8.2. Future work
In the near future, a software prototype will be developed to allow fingerprint
examiners from the Guardia Civil to know the rarity of a fingermark at hand with
respect to a relevant population.
Furthermore, there is a wide range of possibilities to continue this work and improve
the results. First of all, a more exhaustive study of the factors that affect the
PRC needs to be carried out. These factors include the tolerance values, the number of
minutiae and even the variation of parameters in the statistical model.
Also, including other features in the model, such as ridge information and general
pattern information, should improve the reliability of the calculated
probabilities.
Bibliography
[1] D. Maltoni, D. Maio, A.K. Jain, and S. Prabhakar, Handbook of Fingerprint
Recognition. Springer, 2009.
[2] C. Champod, C. Lennard, P. Margot, and M. Stoilovic, Fingerprints and Other Ridge
Skin Impressions. CRC Press, 2004.
[3] D.R. Ashbaugh, Quantitative-Qualitative Friction Ridge Analysis: An Introduction to
Basic and Advanced Ridgeology. Boca Raton, FL: CRC Press, 1999.
[4] "U.S. v. Byron Mitchell, Criminal Action No. 96-407, US District Court for the
Eastern District of Pennsylvania," 1999.
[5] "U.S. v. Llera Plaza, 179 F. Supp 2d 492 (ED Pa 2002)".
[6] S. Cole, "Is Fingerprint Identification Valid? Rhetorics of Reliability in Fingerprint
Proponents' Discourse," Law and Policy, vol. 28, no. 1, pp. 109-135, 2006.
[7] U.S. Supreme Court, Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579,
1993.
[8] M.J. Saks and J.J. Koehler, "The Coming Paradigm Shift in Forensic Identification
Science," Science, vol. 309, no. 5736, pp. 892-895, 2005.
[9] Office of the Inspector General, "A Review of the FBI's Handling of the Brandon
Mayfield Case," U.S. Department of Justice, 2006.
[10] S. Cole, "More Than Zero: Accounting for Error in Latent Fingerprint
Identification," Journal of Criminal Law and Criminology, vol. 95, no. 3, pp. 985-1078, 2005.
[11] B. Scheck, P. Neufeld, and J. Dwyer, Actual Innocence. New York: Doubleday, 2000.
[12] C. Champod and I.W. Evett, "A probabilistic approach to fingerprint evidence," Journal
of Forensic Identification, vol. 51, no. 2, 2001.
[13] C.G.G. Aitken and F. Taroni, Statistics and the Evaluation of Evidence for Forensic
Science. Chichester: John Wiley & Sons, 2004.
[14] D. Ramos, Forensic evaluation of the evidence using automatic speaker recognition
systems. Madrid: PhD. Thesis UAM, 2007.
[15] S. Pankanti, S. Prabhakar, and A.K. Jain, "On the Individuality of Fingerprints,"
IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1010-1025,
2002.
[16] Y. Zhu, S. Dass, and A.K. Jain, "Statistical Models for Assessing the Individuality
of Fingerprints," IEEE Trans. on Information Forensics and Security, vol. 2, pp. 391-401, 2007.
[17] J. Chen and Y. Moon, "A Minutiae-based Fingerprint Individuality Model," in Proc.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis,
2007, pp. 1 - 7.
[18] M Okajima, "Quantitative and Genetic Features of Epidermal Ridge Typica on the
Palm of Twins," Human Heredity, vol. 34, pp. 285-290, 1984.
[19] A.K. Jain, Y. Chen, and M. Demirkus, "Pores and Ridges: High Resolution
Fingerprint Matching Using Level 3 Features," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 29, no. 1, pp. 15-27, 2007.
[20] A.K. Jain and J. Feng, "Latent fingerprint matching," IEEE Trans. on Pattern Analysis
and Machine Intelligence, vol. 33, no. 1, pp. 88-100, 2011.
[21] N. Ratha and R. Bolle, Automatic Fingerprint Recognition Systems. Springer, 2003.
[22] P. Komarinski, Automatic Fingerprint Identification Systems (AFIS). Elsevier, 2005.
[23] S. Srihari and H. Srinivasan, "Individuality of Fingerprints: Comparison of Models
and Measurements," CEDAR Technical Report TR-02-07, 2007.
[24] F. Galton, Finger Prints. London: Macmillan, 1892.
[25] J. Osterburg, "Development of a mathematical formula for the calculation of
fingerprint probabilities based on individual characteristics," Journal of the American
Statistical Association, vol. 72, p. 772, 1977.
[26] E. Henry, Classification and uses of fingerprints. London: Routledge & Sons, 1900.
[27] V. Balthazard, "De l'identification par les empreintes digitales," Comptes Rendus des
Academies des Sciences, vol. 152, p. 1862, 1911.
[28] B. Wentworth and H. Wilder, Personal Identification. Boston: Richard G. Badger,
1918.
[29] M. Trauring, "Automatic comparison of finger-ridge patterns," Nature, vol. 197, p.
938, 1963.
[30] C. Champod and P. Margot, "Computer assisted analysis of minutiae occurrences
on fingerprints," in International Symposium on Fingerprint Detection and
Identification, Jerusalem, 1996, p. 305.
[31] C. Neumann et al., "Computation of likelihood ratios in fingerprint identification
for configurations of any number of minutiae," Journal of Forensic Sciences,
vol. 52, no. 1, pp. 54-64, 2007.
[32] C. Neumann, C. Champod, R. Puch-Solis, et al., "Computation of likelihood
ratios in fingerprint identification for configurations of three minutiae," Journal
of Forensic Sciences, vol. 51, no. 6, pp. 1255-1266, 2006.
[33] C. Su and S. Srihari, "Evaluation of Rarity of Fingerprints in Forensics," in
Proceedings of Neural Information Processing Systems, Vancouver, 2010.
[34] J. Chen and Y. Moon, "A statistical study on the fingerprint minutiae distribution,"
in Proceedings of ICASSP 2006, vol. 2, 2006.
[35] J. Chen and Y. Moon, "The statistical modelling of fingerprint minutiae
distribution with implications for fingerprint individuality studies," in IEEE
Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-7.
[36] Y. Chen and A. K. Jain, "Beyond Minutiae: A Fingerprint Individuality Model with
Pattern, Ridge and Pore Features," in Proc. International Conference on Biometrics
(ICB), 2009.
[37] C. Su and S. Srihari, "Probability of Random Correspondence for fingerprints," in
Proc. International Workshop on Computational Forensics (IWCF 2009), The Hague,
2009, pp. 55-66.
[38] C. Su and S. Srihari, "Generative Models for Fingerprint Individuality Using Ridge
Models," in Proc. International Conference on Pattern Recognition, Tampa, 2008.
[39] D. A. Stoney and J.I. Thornton, "A Critical Analysis of quantitative fingerprint
individuality models," Journal of Forensic Sciences, vol. 31, no. 4, pp. 1187-1216,
1986.
[40] S. Dass, S. Pankanti, S. Prabhakar, and Y. Zhu, "On the individuality of
Fingerprints: Models and Methods," in Encyclopedia of Biometrics. Springer, 2009.
[41] N. Fisher, Statistical Analysis of Circular Data. Cambridge University Press, 1993.
[42] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum-likelihood for incomplete
data via the EM algorithm.," Journal of the Royal Statistical Society. Series B
(Methodological), vol. 39, no. 1, pp. 1-38, 1977.
[43] "NIST Mated Fingerprint Card Pairs 2 (MFCP2)," NIST Special Database 14,
http://www.nist.gov/srd/nistsd14.htm, 2010.
[44] "Fingerprint Minutiae from Latent and Matching Tenprint Images," NIST Special
Database 27, http://www.nist.gov/srd/nistsd27.htm, 2010.
[45] [Online]. http://www.neurotechnology.com
[46] S. Chang, F. Cheng, W. Hsu, and G. Wu, "Fast algorithm for point pattern
matching: invariant to translations, rotations and scale changes," Pattern
Recognition, vol. 30, no. 2, pp. 311-320, 1997.
[47] G.W. Hill, "Evaluation and Inversion of the Ratios of Modified Bessel Functions,
I_1(X)/I_0(x) and I_1.5(x)/I_0.5(x)," ACM Transactions on Mathematical Software,
vol. 7, no. 2, pp. 199-208, 1981.