Proceedings of the 2005 IEEE
Engineering in Medicine and Biology 27th Annual Conference
Shanghai, China, September 1-4, 2005
Biometric Statistical Study of One-Lead ECG
Features and Body Mass Index (BMI)
T. W. Shen, W. J. Tompkins
Department of Biomedical Engineering, University of Wisconsin, Madison WI, USA
Abstract— We have studied the electrocardiogram (ECG) as a
potential biometric for human identity verification. This
research investigates the relationship between ECG biometric
features and body mass index (BMI) using correlation analysis
and linear regression methods. Using our ECG database of 168
normal healthy people (113 females and 55 males), we studied
normalized features extracted from a one-lead, resting, palm
ECG. The results showed that normalized ECG biometric
features explain 25.3% of the variability of the BMI. ECG
features of males better correlate with the BMI model than
those of females. Furthermore, we calculated correlation
coefficients and R-square changes to analyze the correlations
between extracted features and the BMI and to indicate the
most significant feature as a predictor of BMI among all ECG
biometric features.
Keywords - Electrocardiogram (ECG), Human identity
verification, Biometrics, Body Mass Index (BMI), ECG features,
Linear regression, Correlation
I. INTRODUCTION
Biometrics use anatomical, physiological or behavioral
characteristics that are significantly different from
person to person and are difficult to forge. Several
biometrics that have been used commercially for human
identity verification are facial geometry, fingerprints, and
voice analysis [1-2]. Electrocardiogram (ECG) analysis is not
only a very useful diagnostic tool for clinical purposes [3], but
has also recently been studied as a potential biometric [4-7]. It is
beneficial that a single-lead ECG is a one-dimensional,
low-frequency, life-essential signal which can be recorded
with three electrodes (two active electrodes and a ground
electrode). Biel et al. [4] showed that it is possible to identify
individuals based on a chest ECG signal. Israel et al. [7]
showed the uniqueness of an individual’s ECG by
investigating temporal features. This analysis achieved a
100% identification rate regardless of the electrode locations
on a 29-person group by combining LDA for discriminant
functions and a voting algorithm on the contingency matrix.
Also, our previous research demonstrated that an ECG-based
biometric system could successfully identify a group of 20
persons from the MIT/BIH database with 100% accuracy [5]
by combining a template matching method with a
decision-based neural network (DBNN). In addition, we
investigated resting palm ECGs from a large, normal,
healthy population for human identification. In
predetermined groups of 10, 20, and 50 persons, we
achieved a 100% identification rate by using prescreening
technology and distance classification. Moreover, the
combined system model was further tested on
predetermined groups of 100 and 168 people, yielding 96% and
95.3% identification rates, respectively [6]. This article
summarizes how the ECG can be used for human
identification on a short-term time scale.
Tsu-Wang (David) Shen was a Ph.D. student at the University of Wisconsin,
Madison, WI 53705 USA. He is now an assistant professor with the
Department of Medical Informatics, Tzu Chi University, Hau-Lien, Taiwan,
R.O.C. (phone: 886-3-856-5301 ext.7379; e-mail: [email protected]).
W. J. Tompkins is with the Department of Biomedical Engineering,
University of Wisconsin, Madison, WI 53705 USA (e-mail:
[email protected]).
0-7803-8740-6/05/$20.00 ©2005 IEEE.
The ECG varies from person to person due to the
differences in position, size, and anatomy of the heart, age,
sex, relative body weight, chest configuration, and various
other factors [8]. These variant factors make a person’s ECG
signal unique. From our experimental data, Fig. 1 shows an
example of two persons with exactly the same age, sex,
weight and height who have completely different ECG
patterns.
Figure 1. Two subjects (called No. 217 and No. 225) have completely
different ECG patterns, even though they share the same gender
(female), age (21 years old), weight (56.7 kg), and height (170 cm). The
units on the x axis are sample data point numbers. The sampling rate of
these ECG signals is 500 sps. The units on the y axis are millivolts.
Thaler [9] described how, with cardiac hypertrophy, ECG waves
can increase in duration and in amplitude at certain parts of
the signal and the electrical axis shifts. However, it is
unclear how factors such as weight, height, and body mass
index (kg/m²) may influence the ECG of presumed healthy
individuals, and which features extracted from the Lead-I ECG
can be related to these factors. This paper explores the
relationships between the Lead-I biometric ECG features and
the BMI.
II. EXPERIMENTAL SETUP
Unlike the MIT/BIH ECG arrhythmia database from
cardiology patients, this research surveyed a normal healthy
population and all ECG signals are presumed normal. We
investigated short-term, resting, lead-I ECG signals recorded
from 168 individuals (113 females and 55 males) to create our
ECG biometric database. The subjects voluntarily reported
their ages, weights, and heights. Ages ranged from 19 to 52
years. Weights and heights ranged from 45 to 118 kg and from
155 to 208 cm, respectively. The interquartile ranges (IQR) of
age, weight, and height are 3 years (Q1: 20, Q3: 23), 13 kg
(Q1: 57, Q3: 70), and 15.24 cm (Q1: 160, Q3: 175), respectively.
Table 1 shows more detailed information about our database.
Table 1. General statistical data on the ECG biometric database
                 Females (mean ± S.D.)   Males (mean ± S.D.)
Age (years)      20.7 ± 1.6              23.2 ± 6.6
Weight (kg)      62.4 ± 8.2              77.1 ± 12.9
Height (cm)      166.9 ± 5.8             179.8 ± 7.9
BMI (kg/m²)      22.41 ± 2.66            23.80 ± 3.30
The subjects’ ECG signals were measured and collected
with an ECG data acquisition unit (BIOPAC Student Lab
PRO system MP30 with software), electrodes (disposable
silver-silver chloride electrodes from BIOPAC Systems,
Inc.), and computers (IBM-compatible PCs). We recorded the
lead I ECG from each subject using two electrodes placed on
the left palm (active and ground) and one electrode on the
right palm as shown in Fig. 2. The subjects sat upright in a
resting position and were asked to relax. Their palms were
open and resting on their legs. We recorded the Lead-I ECG
for 90 s at a sampling rate of 500 sps with an amplifier gain
of 2000. In the preprocessing step, we applied digital filters
to the raw ECG data to reduce interference.
Figure 2. Disposable electrodes attached to a subject’s palms.
III. METHODOLOGY
Our ECG biometric database surveyed a young, normal,
healthy population and compared the results with those from
a clinical database. It is crucial to analyze whether age,
gender, weight, height, and BMI (kg/m²) influence selected
biometric features. In the preprocessing procedure, baseline
wander, dc shift, power-line noise, and high-frequency
interference are removed [5-6]. In general, standard ECG
machines have a bandwidth between 0.05 Hz and 150 Hz.
With this bandwidth, baseline wander, muscle interference
and other noise are so severe for a palm ECG that we
band-limited the ECG to the frequency range between 1 and
50 Hz. We designed our computer software to randomly
select 20 sequential normal heartbeats from each of the 168
individuals in this investigation to form a 3360-beat group as
an original ECG database. Next, the signal averaging method
was applied to each 20-heartbeat group to create 168 median
heartbeats as our database. Then we extracted the 17 features
listed in Table 2 and Fig. 3 from each heartbeat.
Figure 3. Seven features based on QRST points. (Labels in the figure:
points P, Q, R, S, and T; RP, RQ, RS/RS2, RT, ST, and QS amplitudes;
RS and ST slopes; QS and QT durations; QRS triangle area.)
Table 2. Seventeen selected features used for classification
No.  Selected feature       No.  Selected feature    No.  Selected feature
1    RQ amplitude           8    RS amp./TS amp.     15   Angle Q
2    QS duration            9    RS2 amplitude       16   Angle R
3    RS amplitude           10   PQ amplitude        17   Angle S
4    ST amplitude           11   QS amplitude
5    QT duration**          12   RP amplitude
6    RS slope               13   RT amplitude
7    QRS triangular area    14   ST slope
Note: **The definition of QT duration is different from the clinical
definition of QT interval. The QT duration is the time delay between the
Q and T points. The Bazett formula was applied for QT normalization.
After we extracted these features, we divided the database
into a female and a male group. Then, we normalized all
features using (1) so that we could compare features with
different units.
$$\text{Normalized feature} = \frac{\text{feature} - \text{Global min}}{\text{Global max} - \text{Global min}} \qquad (1)$$
where “Global min” and “Global max” represent the
minimum and maximum values of a given feature over all
168 people.
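The normalization in (1) is a min-max scaling of each feature over the whole population. A minimal sketch follows, assuming one list of raw values per feature; the function name and the example values are hypothetical and are not taken from the study's data or software.

```python
def min_max_normalize(values):
    """Scale one feature to [0, 1] using its minimum and maximum over all subjects."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw values of one feature for four subjects (arbitrary units).
raw_feature = [1.0, 2.0, 1.5, 3.0]
print(min_max_normalize(raw_feature))  # [0.0, 0.5, 0.25, 1.0]
```

Because the minimum and maximum are taken over all subjects, features measured in different units (millivolts, seconds, degrees) end up on the same 0-to-1 scale.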
We applied correlation analysis and linear regression
methods [10] to analyze these normalized features with the
BMI by using SPSS 12.
IV. RESULTS AND DISCUSSION
In statistics, the R-squared value is the fraction of the
variance in the data that is explained by a regression. It is
defined as the ratio of the sum of squares explained by a
regression model to the total sum of squares around the
mean. It can be referred to as the proportion of variation
explained by the model. Table 3 shows that our normalized
ECG biometric features explain 25.3% and 6.5% of the
variability of BMI and age, respectively.
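As an illustration of these definitions, the sketch below computes R-squared from observed and fitted values and applies the standard adjusted R-squared correction. The predictor count of 14 is an assumption taken from the full model listed under Table 5; the function names are hypothetical, and the study itself obtained these values from SPSS 12.

```python
import math


def r_squared(observed, fitted):
    """Fraction of variance explained, computed as 1 - SS_residual / SS_total.

    For a least-squares fit with an intercept this equals SS_explained / SS_total.
    """
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# BMI model in Table 3: R Square = .253 with 168 subjects.
# Assumption: 14 predictors, as listed for the full model in Table 5.
r2_bmi = 0.253
print(round(math.sqrt(r2_bmi), 3))                     # multiple R: 0.503
print(round(adjusted_r_squared(r2_bmi, 168, 14), 3))   # adjusted R Square: 0.185
```

Both printed values agree with the R and Adjusted R Square entries reported for the BMI model in Table 3.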
Table 3. Comparison of R-squared values with BMI or age as the dependent
variable (model summaries)
Dependent variable   Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
BMI (kg/m²)          1       .503   .253       .185                2.66366
Age (years)          1       .255   .065       -.020               4.1832
As shown in Table 3, BMI yielded a much larger
R-squared value than age, so it was selected for further
analysis. We calculated correlation coefficients to analyze the
correlation level between normalized features and the BMI as
shown in Table 4.
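A minimal sketch of the Pearson correlation coefficient and its t statistic is given below for readers who want to check individual entries. The function names are hypothetical, and the p-value lookup is omitted to keep the example within the standard library; the study's values came from SPSS 12.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Largest coefficient reported for the pooled data: r = .424 with n = 168 subjects.
print(t_statistic(0.424, 168))
```

For r = .424 and n = 168 the statistic is roughly 6, well above the two-tailed 0.01 critical value of about 2.6 for 166 degrees of freedom, which is consistent with the (**) marking of that entry.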
Table 4. Pearson correlation coefficients between normalized features
and the BMI.
Feature   Correlation   Sig. (2-tailed)     Feature   Correlation   Sig. (2-tailed)
V1        .354(**)      .000                V10       .268(**)      .000
V2        -.017         .830                V11       .105          .175
V3        .413(**)      .000                V12       .347(**)      .000
V4        .237(**)      .002                V13       .288(**)      .000
V5        -.178(*)      .021                V14       .192(*)       .013
V6        .424(**)      .000                V15       .229(**)      .003
V7        .346(**)      .000                V16       .317(**)      .000
V8        -.197(*)      .011                V17       -.111         .152
V9        .419(**)      .000
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
According to Table 4, the largest correlation coefficient
occurs for Feature 6 (V6). However, the R-square changes for
all normalized features must be calculated to determine
whether this variable is a good predictor of the dependent
variable.
Feature 6 is the only significant predictor of the dependent
variable BMI in Table 5 because it is the only feature whose
significance value is less than 0.05. Features 1, 3, and 12
were automatically excluded by SPSS 12.
Table 5. Results of R-squared changes among features - BMI
ANOVA(c)
Source   Sum of Squares   df    Mean Square   F       Sig.      R2 Change
V2       .639             1     .639          .090    .764(a)   .000
V4       .001             1     .001          .000    .991(a)   .000
V5       3.089            1     3.089         .435    .510(a)   .002
V6       31.482           1     31.482        4.437   .037(a)   .022
V7       9.341            1     9.341         1.316   .253(a)   .006
V8       7.921            1     7.921         1.116   .292(a)   .005
V9       6.149            1     6.149         .867    .353(a)   .004
V10      .964             1     .964          .136    .713(a)   .001
V11      3.600            1     3.600         .507    .477(a)   .002
V13      .867             1     .867          .122    .727(a)   .001
V14      10.278           1     10.278        1.449   .231(a)   .007
V15      7.821            1     7.821         1.102   .295(a)   .005
V16      .751             1     .751          .106    .745(a)   .001
V17      .841             1     .841          .119    .731(a)   .001
Reg.     368.423          14    26.316        3.709   .000(b)
Res.     1085.550         153   7.095
Total    1453.973         167
a Tested against the full model.
b Predictors in the Full Model: (Constant), V17, V9, V5, V11, V2, V10,
V8, V15, V16, V14, V4, V6, V7, V13.
c Dependent Variable: BMI
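The entries of Table 5 are linked by simple ratios, which the following sketch reproduces from the reported sums of squares. The function names are hypothetical; only the numbers are taken from the table.

```python
def f_ratio(ss_effect, df_effect, ss_residual, df_residual):
    """F statistic: mean square of the effect divided by the residual mean square."""
    return (ss_effect / df_effect) / (ss_residual / df_residual)


def r2_change(ss_effect, ss_total):
    """Share of the total sum of squares attributable to one effect."""
    return ss_effect / ss_total


# Sums of squares and degrees of freedom as reported in Table 5.
SS_REGRESSION, DF_REGRESSION = 368.423, 14
SS_RESIDUAL, DF_RESIDUAL = 1085.550, 153
SS_TOTAL = 1453.973
SS_V6 = 31.482

print(round(f_ratio(SS_REGRESSION, DF_REGRESSION, SS_RESIDUAL, DF_RESIDUAL), 3))  # 3.709
print(round(r2_change(SS_REGRESSION, SS_TOTAL), 3))                               # 0.253
print(round(f_ratio(SS_V6, 1, SS_RESIDUAL, DF_RESIDUAL), 3))                      # 4.437
print(round(r2_change(SS_V6, SS_TOTAL), 3))                                       # 0.022
```

The full-model ratio recovers the 25.3% of BMI variability explained by the features, and the V6 row recovers its F value of 4.437 and R2 change of .022.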
Gender analysis for BMI
The next step was to investigate whether gender differences
change the proportion of variation explained by the
model. Based on the female data samples, 23.0% of the
variability among the observed values of the female BMI was
explained by the linear relationship between BMI and the
normalized features. In comparison, the male model explained
a markedly larger share, 42.1% of the variation; that is,
57.9% of the variation in male BMI was not explained by this
relationship. Overall, features extracted from male subjects
explain their BMI values better than the same features
extracted from female subjects. Table 6 lists the correlation
coefficients that we analyzed by gender.
Table 6. Correlation coefficients between normalized features and the
BMI, separated by gender.
Feature      Female     Male
Feature 1    0.2678     0.4058
Feature 2    -0.0036    0.0238
Feature 3    0.3120     0.4653
Feature 4    0.1966     0.1727
Feature 5    -0.0443    -0.1826
Feature 6    0.3064     0.4927
Feature 7    0.2868     0.3768
Feature 8    -0.1827    -0.2392
Feature 9    0.3188     0.4777
Feature 10   0.2127     0.2523
Feature 11   0.0780     0.0626
Feature 12   0.2618     0.4082
Feature 13   0.2207     0.3452
Feature 14   0.1384     0.1610
Feature 6 provided the largest correlation coefficient for
the male model. By comparison, Features 3 and 9 showed the
largest correlation coefficients for the female group.
Also, the correlation coefficients were generally lower for the
female group. Features 3, 6, and 9 are all highly correlated
with each other. Fig. 4 shows two scatter plots (normalized
Feature 6 vs. BMI) and linear regression lines for each
gender.
Figure 4. Two scatter plots (normalized Feature 6 vs. BMI) separated by
gender.
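Regression lines of the kind drawn in Fig. 4 can be obtained with an ordinary least-squares fit per group. The sketch below assumes a single predictor per fit; the function name and the data points are hypothetical placeholders, not values from the study.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (normalized feature, BMI) pairs for each group; illustration only.
groups = {
    "female": ([0.2, 0.4, 0.6, 0.8], [21.0, 22.5, 22.0, 24.0]),
    "male": ([0.3, 0.5, 0.7, 0.9], [22.0, 23.5, 25.0, 26.0]),
}
for name, (feature, bmi) in groups.items():
    slope, intercept = fit_line(feature, bmi)
    print(f"{name}: BMI = {slope:.2f} * feature + {intercept:.2f}")
```

With one predictor, the R-squared of each fitted line equals the square of the corresponding correlation coefficient, so a gender-specific coefficient from Table 6 translates directly into the share of BMI variation that a single feature explains in that group.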
V. CONCLUSIONS & FUTURE WORK
This research showed that our normalized ECG biometric
features explain 25.3% of the variability of the BMI. The
ECG features of males explain BMI better than those of
females. R-square changes showed that Feature 6 is the only
significant predictor of overall BMI values among the
ECG features. However, the correlation coefficient
between Feature 6 and the overall BMI is only 0.424. These
results imply that certain ECG biometric features are
somewhat correlated with the BMI. A possible explanation
for this is that the BMI may be an indicator of the abdominal
volume. Hence, BMI changes may be correlated with a shift
of a subject’s electrical heart axis, making
RS-amplitude-related features (such as Features 3, 6, and 9)
correlate more strongly with BMI than the other features. We
will evaluate this assumption in a future long-term ECG
study. Future work will measure the above ECG
biometric features several times from the same individual
as the BMI changes due to significant weight variations.
This will clarify how the BMI can influence certain ECG
biometric features and how those features can be normalized
based on the current BMI.
VI. ACKNOWLEDGMENTS
Special appreciation goes to Dr. Kevin T. Strang and
Andrew J. Lokuta for the full support of this research and for
helping us in the experimental environment. Also, thanks go
to Profs. Ron Serlin and Daniel Bolt who provided many
valuable suggestions.
REFERENCES
[1] R. W. Frischholz and U. Dieckmann, "BioID: A multimodal biometric
identification system," Computer, IEEE, pp. 64-68, Feb. 2000.
[2] S. Pankanti, R. M. Bolle, and A. Jain, "Biometrics: The future of
identification," Computer, IEEE, pp. 46-55, Feb. 2000.
[3] D. Dubin, Rapid Interpretation of EKG's, V ed. Cover Publishing
Company, Tampa, Florida, 2000.
[4] L. Biel, O. Pettersson, L. Philipson, and P. Wide, "ECG analysis: A new
approach in human identification," IEEE Trans. on Instrumentation and
Measurement, vol. 50, no. 3, June 2001.
[5] T. W. Shen, W. J. Tompkins, and Y. H. Hu, "One-lead ECG for identity
verification," 2nd Joint Conf. IEEE Eng. Med. Biol. Soc. & Biomed. Eng.
Soc., pp. 62-63, 2002.
[6] T. W. Shen, "Biometric Identity Verification Based on
Electrocardiogram," PhD thesis, Biomedical Engineering, University of
Wisconsin, Madison, WI, 2005.
[7] S. A. Israel, J. M. Irvine, A. Cheng, M. D. Wiederhold, and B. K.
Wiederhold, "ECG to identify individuals," Pattern Recognition, vol. 38, pp.
133-142, 2005.
[8] B. P. Simon and C. Eswaran, "An ECG classifier designed using modified
decision based neural network," Computers and Biomedical Research, vol. 30,
pp. 257-72, 1997.
[9] M. S. Thaler, The only EKG book you’ll ever need, 4th ed. Lippincott
Williams & Wilkins, Philadelphia, PA, pp. 61-93, 2003.
[10] M. Pagano and K. Gauvreau, Principles of biostatistics, 2nd ed. Duxbury,
Pacific Grove, CA, 2000.