Signature with Text-Dependent
and Text-Independent Speech for
Robust Identity Verification
B. Ly-Van*, R. Blouet**, S. Renouard**
S. Garcia-Salicetti*, B. Dorizzi*, G. Chollet**
* INT, dept EPH, 9 rue Charles Fourier, 91011 EVRY France;
**ENST, Lab. CNRS-LTCI, 46 rue Barrault, 75634 Paris
Emails: {Bao.Ly_van, Sonia.Salicetti, Bernadette.dorizzi}@int-evry.fr;
{Blouet, Renouard, Chollet}@tsi.enst.fr
Overview
• Introduction: Why Speech and Signature?
• BIOMET database: brief description
– Signature data
– Speech data
• Writer verification
• Speaker verification systems
• Fusion systems
• Results and Conclusions
The BIOMET Database
• 5 modalities: hand shape, fingerprints, online signatures, and talking faces (face and voice)
• 131 people: 50% male, 50% female
• Data from 68 people for fusion
• Time variability: two sessions, 5 months apart
– S. Garcia-Salicetti, C. Beumier, G. Chollet, B. Dorizzi, J. Leroux-Les Jardins, J. Lunter, Y. Ni, D. Petrovska-Delacretaz, "BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities", 4th International Conference on Audio- and Video-Based Biometric Person Authentication, 2003.
Signature capture
• Captured on a digitizer at 200 Hz
  – WACOM Intuos2 A6
• 5 parameters:
  – Coordinates (x, y)
  – Axial pressure
  – Azimuth and Altitude
• 15 genuine signatures per person
• 12 forgeries per person
[Figure: pen orientation angles – Azimuth (0°–359°) and Altitude (0°–90°)]
Signature modeling
• Preprocessing (filtering)
• Feature extraction: 12 parameters
• Signature model: continuous HMM
  – 2 states, 3 Gaussians per state
  – Bagging: 10 models combined into an "aggregated" model (average score)
  – Training: 10 signatures from one session
• Normalized score: |Si(O) - Si*| (see the sketch below)
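A minimal sketch of this bagged-HMM scorer, assuming the hmmlearn library and pre-computed 12-parameter feature sequences per signature; the reference score s_star in |Si(O) - Si*| is assumed here to be the mean score over the enrolment signatures, and all function names are illustrative:

import numpy as np
from hmmlearn.hmm import GMMHMM

def train_bagged_hmms(train_sigs, n_models=10, seed=0):
    """Train n_models 2-state / 3-Gaussian continuous HMMs on bootstrap
    resamples of the enrolment signatures (bagging)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(train_sigs), size=len(train_sigs))
        sigs = [train_sigs[i] for i in idx]
        X = np.vstack(sigs)                 # concatenated feature frames
        lengths = [len(s) for s in sigs]    # one length per signature
        hmm = GMMHMM(n_components=2, n_mix=3, covariance_type="diag", n_iter=20)
        hmm.fit(X, lengths)
        models.append(hmm)
    return models

def aggregated_score(models, sig):
    """Average log-likelihood of one signature over the bagged models."""
    return np.mean([m.score(sig) for m in models])

def normalized_score(models, sig, s_star):
    """|Si(O) - Si*|: distance between the aggregated score of the test
    signature and the client's reference score s_star."""
    return abs(aggregated_score(models, sig) - s_star)

The reference score s_star would typically be obtained by averaging aggregated_score over the 10 training signatures.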
Speech
• Two verification systems:
  – Data: voluntarily degraded
    • Text-dependent: sequences of 4 digits out of the 10 digits (5 templates per speaker)
    • Text-independent: sentences extracted from the original data:
      – client model: trained on digits (15 seconds) and tested on sentences
      – world model: trained on data from the remaining 131 - 68 = 63 people
– Methods:
• Text-dependent: DTW (Dynamic Time Warping)
• Text-independent: GMM (Gaussian Mixture Model)
Text-dependent (DTW)
• DTW computes the spectral distance between a stored template and the test speech sample (see the sketch below)
[Figure: DTW alignment between the template speech signal and the sample speech signal, producing the DTW score]
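A minimal sketch of the DTW distance used by the text-dependent system, assuming the template and the test utterance have already been converted to spectral feature frames (e.g. MFCC vectors); the Euclidean frame cost and the path-length normalisation are illustrative choices:

import numpy as np

def dtw_distance(template, sample):
    """Dynamic Time Warping distance between two feature sequences
    (arrays of shape (n_frames, n_features))."""
    n, m = len(template), len(sample)
    # frame-to-frame Euclidean cost matrix
    cost = np.linalg.norm(template[:, None, :] - sample[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    # normalise by path length so utterances of different durations compare
    return acc[n, m] / (n + m)

# usage: score a test utterance against the 5 enrolment templates and keep
# the best (smallest) distance
# score = min(dtw_distance(t, test_frames) for t in templates)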
Text-independent (GMM)
[Figure: GMM modeling pipeline – front-end features from many speakers train the world GMM model; the target GMM model is obtained by adapting the world model on the client's enrolment data]
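A hedged sketch of this GMM-UBM pipeline, using scikit-learn's GaussianMixture: a world (background) model trained on the non-client speakers, then a target model obtained by mean-only MAP adaptation on the client's digits. The number of mixture components and the relevance factor are assumed values, not taken from the slides:

import numpy as np
from sklearn.mixture import GaussianMixture

def train_world_model(world_features, n_components=256):
    """World (background) GMM trained on pooled front-end features
    from the non-client speakers."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(world_features)
    return ubm

def map_adapt_means(ubm, client_features, relevance=16.0):
    """Mean-only MAP adaptation of the world model towards the client
    data (standard GMM-UBM recipe; the relevance factor is an assumed value)."""
    post = ubm.predict_proba(client_features)          # (T, K) posteriors
    n_k = post.sum(axis=0)                             # soft counts per mixture
    x_k = post.T @ client_features                     # weighted feature sums
    alpha = (n_k / (n_k + relevance))[:, None]
    new_means = alpha * (x_k / np.maximum(n_k, 1e-10)[:, None]) \
                + (1.0 - alpha) * ubm.means_
    target = GaussianMixture(n_components=ubm.n_components,
                             covariance_type="diag")
    # reuse the world weights and covariances, replace only the means
    target.weights_ = ubm.weights_
    target.covariances_ = ubm.covariances_
    target.precisions_cholesky_ = ubm.precisions_cholesky_
    target.means_ = new_means
    return target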
Baseline GMM method
The test utterance x is processed by the front-end and scored against both the hypothesized target GMM model and the world GMM model; the decision score is the log-likelihood ratio:

score(x) = log [ P(x | target model) / P(x | world model) ]
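The corresponding scoring step as a short sketch (frame-averaging of the log-likelihoods is an assumption; the slide only shows the ratio). It expects two fitted scikit-learn GaussianMixture models such as those built above:

def llr_score(target_gmm, world_gmm, x):
    """Average per-frame log-likelihood ratio
    log P(x | target) - log P(x | world) for a test utterance x
    (array of front-end feature frames)."""
    return target_gmm.score(x) - world_gmm.score(x)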
Fusion systems
• Additive Tree Classifier (ATC)
– Boosting techniques on Binary Trees
– CART algorithm
• Support Vector Machine (SVM)
  – Linear kernel (see the sketch after this slide)
• Input:
– Normalized signature score
– Text-dependent LLR score
– Text-independent LLR score
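A hedged sketch of the SVM fusion stage with a linear kernel, assuming scikit-learn; the score vectors below are illustrative placeholders, and in practice the training pairs come from the Fusion Learning Base described later:

import numpy as np
from sklearn.svm import SVC

# Each fusion example is the 3-dimensional score vector
# (signature score, text-dependent score, text-independent score);
# y = 1 for client accesses, 0 for impostor accesses.
X_learn = np.array([[0.1, 2.1, 1.8],      # illustrative values only
                    [0.2, 1.7, 2.4],
                    [0.9, -0.5, -1.2],
                    [1.1, -0.8, -0.6]])
y_learn = np.array([1, 1, 0, 0])

svm = SVC(kernel="linear")
svm.fit(X_learn, y_learn)

# decision_function returns a signed distance to the separating hyperplane;
# accept when it exceeds the operating threshold chosen on the learning base
accept = svm.decision_function([[0.15, 1.9, 2.0]]) > 0.0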
Tree-based Approach for score fusion
• Goal: find an optimal partition R = {Rk}, 1 ≤ k ≤ K, of the score space S = (s1, s2, s3) according to an Information Theory criterion
• A sub-optimal solution, based on CART:
  – Best partition: R* = arg minR C(R)
  – Score estimation based on P(client|Rk) and P(world|Rk) at each node of a given tree
• RealAdaBoost is used to build 50 trees per client and to obtain a robust estimation of P(client|Rk) and P(world|Rk)
Verification based on ATC
• A score vector S = (s1, s2, s3) is presented to the system composed of 50 trees:
  – each tree outputs a score based on the region Rk into which S falls
  – the LLR score is computed from P(client|Rk) and P(world|Rk)
  – the final score is the average of the 50 tree scores (see the sketch below)
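A simplified sketch of the ATC training and verification steps, assuming scikit-learn CART trees (DecisionTreeClassifier) as the region builders and a basic RealAdaBoost re-weighting loop; the tree depth and the weight-update details are assumptions not specified in the slides:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_atc(X, y, n_trees=50, max_depth=3, eps=1e-6):
    """RealAdaBoost over CART trees. X: (n, 3) score vectors
    (signature, TD speech, TI speech); y: +1 client, -1 impostor.
    Each tree partitions the score space into regions Rk (its leaves) and
    estimates P(client|Rk) from the weighted class frequencies."""
    w = np.full(len(y), 1.0 / len(y))
    trees = []
    for _ in range(n_trees):
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(X, y, sample_weight=w)
        # column 1 of predict_proba is P(client|Rk) since classes_ = [-1, +1]
        p = np.clip(tree.predict_proba(X)[:, 1], eps, 1 - eps)
        f = 0.5 * np.log(p / (1.0 - p))   # real-valued weak hypothesis
        w *= np.exp(-y * f)               # emphasise misclassified accesses
        w /= w.sum()
        trees.append(tree)
    return trees

def atc_score(trees, s, eps=1e-6):
    """Average over the trees of log P(client|Rk) / P(world|Rk), where Rk
    is the leaf (region) that the score vector s falls into."""
    s = np.asarray(s, dtype=float).reshape(1, -1)
    llrs = []
    for tree in trees:
        p = np.clip(tree.predict_proba(s)[0, 1], eps, 1 - eps)
        llrs.append(np.log(p / (1.0 - p)))
    return float(np.mean(llrs))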
SVM principles
H
X
y(X)
Class(X)
Ho
Fusion experiments
• The 68-person database is split into 2 equal parts:
  – 34 people: Fusion Learning Base (also used for threshold estimation of the unimodal systems with the min TE criterion; see the sketch below)
  – 34 people: Fusion Test Base (also used to test the unimodal systems)
• Per person:
– 5 genuine bimodal values
– 12 impostor bimodal values
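A small sketch of the min TE threshold selection performed on the Fusion Learning Base; the slides do not spell out the definition of TE, so the overall error rate over all accesses is assumed here:

import numpy as np

def total_error(client_scores, impostor_scores, t):
    """TE at threshold t, assumed here to be
    (false acceptances + false rejections) / total number of accesses."""
    n_fr = np.sum(client_scores < t)       # genuine accesses rejected
    n_fa = np.sum(impostor_scores >= t)    # impostor accesses accepted
    return (n_fa + n_fr) / (len(client_scores) + len(impostor_scores))

def min_te_threshold(client_scores, impostor_scores):
    """Threshold minimising TE; candidates are the observed scores."""
    candidates = np.sort(np.concatenate([client_scores, impostor_scores]))
    tes = [total_error(client_scores, impostor_scores, t) for t in candidates]
    best = int(np.argmin(tes))
    return candidates[best], tes[best]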
Fusion Performances
Condition               Model       TE (%)        FA (%)       FR (%)
Speech without noise    Signature   11.9 [±2.7]   8.9 [±2.9]   20.1 [±6.0]
                        TI Speech    6.3 [±2.0]   2.0 [±1.4]   16.0 [±5.5]
                        TD Speech   10.3 [±2.6]   7.6 [±2.7]   17.0 [±5.7]
                        ATC          2.8 [±1.4]   1.7 [±1.3]    5.2 [±3.3]
                        SVM          2.7 [±1.4]   1.3 [±1.1]    5.9 [±3.6]
Speech, -10 dB noise    TI Speech    8.0 [±2.3]   2.0 [±1.4]   23.2 [±6.4]
                        TD Speech   11.9 [±2.7]   7.8 [±2.7]   22.1 [±6.3]
                        ATC          2.9 [±1.4]   2.5 [±1.6]    3.9 [±2.9]
                        SVM          2.9 [±1.4]   1.9 [±1.4]    5.3 [±3.4]
Speech, 0 dB noise      TI Speech   17.0 [±3.1]   6.0 [±2.4]   45.0 [±7.5]
                        TD Speech   16.5 [±3.1]   6.3 [±2.4]   42.0 [±7.4]
                        ATC          6.7 [±2.1]   4.7 [±2.1]   11.2 [±4.8]
                        SVM          5.8 [±2.0]   2.4 [±1.5]   13.6 [±5.2]
Conclusions
• ATC and SVM give equivalent results:
  – role of Boosting (ATC)
• Fusion improves performance by a factor of 2 relative to the best unimodal system (in clean or noisy environments)
• Other ways of creating noisy environments should be tested (real noise rather than Gaussian white noise)
• Fusion performance should also be studied on the 2 speech verification systems alone, since no noise was introduced in the signature modality