Whose Move is it Anyway? Authenticating Smart Wearable Devices Using Unique Head Movement Patterns

Sugang Li∗, Ashwin Ashok†, Yanyong Zhang∗, Chenren Xu‡, Janne Lindqvist∗, Marco Gruteser∗
∗ WINLAB, Rutgers University, North Brunswick, NJ, USA
† Carnegie Mellon University, Pittsburgh, PA, USA
‡ CECA, Peking University, Beijing, China

Abstract—In this paper, we present the design, implementation and evaluation of a user authentication system, Headbanger, for smart head-worn devices, which monitors the user's unique head-movement patterns in response to an external audio stimulus. Compared with today's solutions, which primarily rely on indirect authentication through the user's smartphone and are therefore cumbersome and susceptible to adversary intrusion, the proposed head-movement based authentication provides an accurate, robust, light-weight and convenient solution. Through extensive experimental evaluation with 95 participants, we show that our mechanism can accurately authenticate users with an average true acceptance rate of 95.57% while keeping the average false acceptance rate at 4.43%. We also show that even simple head-movement patterns are robust against imitation attacks. Finally, we demonstrate that our authentication algorithm is light-weight: the overall processing latency on Google Glass is around 1.9 seconds.

I. INTRODUCTION

Wearable devices are on the way to becoming an integral part of people's daily lives [10], [16], [35]. These devices collect data about the wearers, their surroundings and often even their health. It is thus critical to users' privacy that this data be protected from unauthorized access. Although there has been work [19], [20], [23] on limiting the privacy threats from ubiquitous photography enabled by wearable devices, robust, usable and secure authentication systems for these devices have not emerged. An authentication system for these devices has to strike an appropriate balance with user convenience, especially since users are interacting with an increasing number of wearables.

Authentication Challenge. Today, authentication on most commercially available wearable devices [10], [35] relies on an indirect mechanism, where users log in to their wearables through their phones. This requires the wearable device to be registered and paired to the mobile device, and both devices to be carried by the user, which can be highly inconvenient in practice. The security of this approach is also in question, as it increases the risk that both devices are compromised if either is lost or stolen. Some devices, including Google Glass [16] and Fitbit's health tracker [10], can instead be linked to online accounts for convenience, which, however, does not make them more secure.

Fig. 1. Illustration of Headbanger. The head-worn device authenticates the user based on signatures generated from head-movement patterns. These patterns are created in response to an audio snapshot played on the device.

Indirect authentication remains a dominant paradigm for wearables despite these fundamental shortcomings because such devices are seriously resource-constrained in many aspects: battery power, computational and storage capabilities, and input/output. As a result, typical authentication methods designed for more resource-rich devices cannot be directly applied here; rather, user authentication for wearable devices must operate indirectly through a more capable device.
We, however, take the viewpoint that wearables will become more independent units that have to maintain security guarantees without such paired devices, and we seek to develop suitable direct authentication methods that are both accurate and light-weight.

Before we explore direct authentication methods for wearable devices, let us first consider available solutions for other mobile systems, especially smartphones and tablets. Broadly speaking, the two most commonly used authentication methods on mobile systems are (arguably) password-based methods (with their variants) and biometric-based methods. However, we argue that neither of these two methods is really suitable for wearable devices. Typing passwords or drawing swipe patterns on wearable devices can be rather cumbersome due to their small input/output units, if they have a touch sensor at all. Collecting and recognizing physiological biometrics (such as DNA, fingerprint, hand/finger geometry, iris, odor, palm print, retinal scan, voice) requires specialized sensing hardware and processing resources that add cost, and many of these sensors are even larger than the wearables themselves.

We therefore focus on a third class of direct authentication methods: relying upon the uniqueness of human behavioral characteristics such as walking gait, arm swings, typing patterns, body pulse beats, eye blinks, etc. This way of authenticating users is often referred to as behavior-based authentication, and it has been studied in the context of authenticating smartphones and tablets [6]–[8], [25], [28]–[30], [36]. The main advantage of using behavioral characteristics on mobile devices is that the signatures can be readily generated from the raw data of built-in sensors such as motion sensors, cameras, and microphones. Considering that cameras and microphones, as well as vision and audio processing algorithms, are quite energy-hungry, we focus on behavioral characteristics that can be captured by low-power sensors such as the accelerometer. More specifically, we propose to authenticate users to their wearable devices based on the following behavioral characteristic: our unique body movement patterns and their dependence on external stimuli that wearable devices can generate, such as vibrations and music.

Head-movement based authentication. Body movement patterns have long been used by humans to discriminate between people. By watching how a person walks, dances, or waves their hands, we can often recognize the person from afar. This is because human body movements are usually distinctive and repeatable. Achieving the same through wearables, however, is not straightforward and poses significant research challenges: it is unclear whether these seriously constrained devices are able to capture the differentiating characteristics of movement patterns, process the data, and quantify the uniqueness of each user's behavior. Moreover, each device will have only a limited view of body movements, dependent on its mounting position on the human body. In this paper, we set out to conduct a holistic study of wearable authentication through body movements and to design an accurate, robust and light-weight authentication system. A key distinguishing feature of our work is that we also consider stimuli that wearable devices can provide, particularly stimuli that are difficult to observe even for the closest adversaries.
For example, we can use fast-tempo music played through earbuds to stimulate movements and to make such free-style movements more repeatable. In particular, we have designed, implemented and evaluated Headbanger, an authentication system that authenticates users by sensing the head movements they make while listening to music beats. Although we use Google Glass as a running example, our design can be applied to other head-worn gadgets and to any system that can record head movements through motion sensing. Our choice of head movements is motivated by the fact that head-worn wearables are becoming very common today and such devices are already equipped with motion sensors; examples include personal imaging and heads-up display devices, gaming headsets, and augmented reality devices.

In summary, the key contributions of this paper are:

1) We have designed and implemented a novel user authentication method for wearable devices using head-movement patterns. Our study shows that users' head-movement patterns contain unique signatures that, when captured correctly, can serve as a valid means of authentication. We design a system, Headbanger, that records, processes, and classifies head-movement patterns of users based on the built-in accelerometer sensor readings.

2) Through comprehensive experiments involving 95 participants and spanning different system design parameters, we show that head-movement patterns can generate accurate authentication results. Our approach effectively identifies a wearable device user, with an average false acceptance rate of 4.43% and an average true positive rate of 95.57%. We also show that our authentication method is quite robust: when a user slightly increases her head-movement complexity, it quickly becomes much harder for attackers to imitate the movement pattern.

3) We implement Headbanger on Google Glass and carefully profile the execution time of each software module in the implementation. Our measurements indicate an average processing latency of 1.93 seconds on the Google Glass for the best authentication performance.

II. BACKGROUND

A. Mobile Device Authentication Through Body Movements

A number of body-movement based authentication approaches have been proposed for mobile devices. These systems leverage unique signatures of human behavior that may be subconscious, in response to an external stimulus, or both. For example, it has been shown that gait (e.g. stride length, the amount of arm swing) when the user is walking or running is a reliable identification cue, irrespective of the environment [36]. Okumura et al. [29] have shown that human arm-swing patterns can be used to create signatures that authenticate users to their cell phones. Monrose et al. [28] show that keystroke rhythms, i.e., typing dynamics such as how long a keystroke lasts, how much time elapses between consecutive strokes, and how much pressure is exerted on each key, can be used to authenticate users. Similarly, mouse usage dynamics [25] and touchpad touching dynamics [3], [7] have also been shown to serve as potential authentication cues.

We take the viewpoint that, in comparison to other means of authentication, body-movement based authentication may offer great convenience. With this rationale, we design an authentication system, dubbed Headbanger, for head-worn devices, which monitors the user's unique head-movement patterns in response to an external audio stimulus.

B. Using Head-movement for Authentication

According to Jain et al.
[22], a type of body movement is useful for authentication when it is universal, distinctive, repeatable, and collectible. Sensors for collecting head-movement patterns are available on most of today's head-worn wearable devices, which makes head movements both universal and collectible. In this paper, we show that free-style head movements are also distinctive and repeatable, especially when combined with external stimuli such as music beats.

In Headbanger, music plays a crucial role in stimulating body movements such that the resulting movement pattern is natural to the user (more distinctive) and easier to remember (more repeatable). Zentner and Eerola [38] have shown that most people move their body as a natural response to external rhythmic stimuli such as music; even at a very early age, infants respond to music and their movements speed up as the rhythm speeds up. Most adults naturally perform head or hand movements when listening to a fast-beat audio track [24]. When combined with external rhythmic stimuli, we believe body movements become more distinctive: not only is a person's movement pattern unique, but their response to rhythmic stimuli is also unique. In this way, the resulting authentication system will be more dependable.

Fig. 2. The response time of a head motion is the interval between the motion and the music beat to which the motion responds. From a sequence of head motions, we can obtain the response time sequence.

C. Motivation for Headbanger

Next, we conducted a preliminary experiment to investigate whether head-movement patterns can potentially be used to authenticate users. In this experiment, we collected head-movement accelerometer data from 28 subjects, wherein each subject was asked to perform a simple nodding movement following a short audio track (referred to as the music cue in the rest of the paper). We examined a simple aspect of the head-movement pattern, the response time, which is the time interval between a music beat and the corresponding nodding motion, as shown in Figure 2. The response time indicates how quickly a person responds to music beats.

In this experiment, we collected 20 samples from each subject, hence 20 response-time sequences per subject. We then calculated the average distance scores between these response-time sequences, both for the same subject and across subjects. We considered three types of distance metrics: cosine distance (COS), correlation distance (COR), and dynamic time warping (DTW) distance [2]; readers can find a detailed description of these three metrics in Section III-B. We plot these three types of distance values in Figures 3(a)-(c), respectively. In each plot, we include distance values for all 28 subjects: for each subject, we plot the average distance scores between her and each of the other 27 subjects (referred to as distances with false subjects, shown as blue dots), as well as the average distance scores among her own response times (referred to as distances with the true subject, shown as red squares). All three figures clearly demonstrate that the average distance score between a subject's own samples is much lower than that between different subjects' samples, which further suggests that a subject exhibits repeatable and unique head-nodding characteristics.
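To make the response-time feature concrete, the following Python sketch shows one plausible way to derive a response-time sequence, assuming the beat times of the music cue and the peak times of the nodding motion have already been extracted from the filtered accelerometer signal; the pairing rule and function name are illustrative and not necessarily the exact procedure used in this experiment.

```python
import numpy as np

def response_times(beat_times, motion_peak_times):
    """For each music beat, take the delay to the next head-motion peak.

    Both inputs are 1-D arrays of timestamps in seconds, sorted in
    increasing order; the result is one response-time sequence per trial.
    """
    motion_peak_times = np.asarray(motion_peak_times)
    delays = []
    for beat in beat_times:
        later = motion_peak_times[motion_peak_times >= beat]
        if later.size > 0:
            delays.append(later[0] - beat)
    return np.asarray(delays)

# Toy example: beats every 0.5 s, nods landing roughly 0.2 s after each beat.
beats = np.arange(0.0, 10.0, 0.5)
peaks = np.sort(beats + 0.2 + 0.02 * np.random.randn(beats.size))
print(response_times(beats, peaks))
```

Distances between two such sequences can then be computed with any of the three metrics listed above.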
These observations suggest that even with simple nodding movements, the accelerometer data collected by Google Glass have the potential to be used for accurate and robust authentication. Motivated by this observation, we next devise the Headbanger authentication system.

Fig. 3. (a) Cosine, (b) Correlation and (c) DTW distances computed over the 20 response-time sequences of each subject, with 28 subjects in total. For each subject's column, a red square represents the average distance among that subject's own response times, while a blue dot represents the average distance to other subjects' response times. The red distances are always lower than the blue ones, suggesting that different subjects' head-movement patterns are distinguishable.

III. HEADBANGER SYSTEM DESIGN

Headbanger enables direct authentication of users to their smart-glass devices or smart-glass apps using head movements. We posit that Headbanger will run as a service on the device upon power-up or application start-up, similar to the screen lock on smartphones. The authentication process has two phases: an offline training phase and an online authentication phase. In the training phase, the system collects sensor readings while the real user moves her head with a music cue (following her pre-designed movement pattern), and uses the collected data to build a classifier. In the following discussion, we assume there is only one real user of the device for the sake of simplicity. An extension to support multiple users per device can be realized through minor modifications, namely, by appropriately indexing the users in the trained database. In the online authentication phase, we collect sensor samples during a user's authentication attempt and label the samples using the Headbanger classifier. The user is authenticated upon a successful classification.

Fig. 4. Headbanger system design flow. The online authentication phase of Headbanger consists of the following steps: (1) sensor data collection, in which we collect accelerometer data while the user moves her head in response to an audio track played on the glass; (2) filtering, in which we apply Butterworth filtering to smooth the sensor data for subsequent processing; (3) parameter generation, in which we compute the distances between two accelerometer samples as the classification parameter; and (4) classification, in which we adopt an adaptive thresholding mechanism to classify the user's head movement, whose output is used as the authentication result.

As illustrated in Figure 4, Headbanger involves the following key modules: sensor data collection and filtering, sample distance computing, and classification. We now discuss these design aspects in more detail.
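As a minimal sketch of how these modules fit together in the online phase, the following Python fragment chains filtering, distance computation and thresholded voting. The helper names and the filter order are our own illustrative choices and not the actual app code; only the 10 Hz cut-off is stated in Section III-A below.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth(sample, rate_hz, cutoff_hz=10.0):
    """Low-pass Butterworth filtering of each accelerometer axis.

    `sample` is a 3 x (rate_hz * T) array; the 4th-order filter is an
    assumption, only the 10 Hz cut-off comes from the paper.
    """
    b, a = butter(4, cutoff_hz, btype="low", fs=rate_hz)
    return filtfilt(b, a, sample, axis=1)

def authenticate(raw_sample, rate_hz, representatives, thresholds, distance):
    """Online phase: filter, compute distances to the Top-K representative
    training samples, threshold each distance, and take a majority vote."""
    filtered = smooth(raw_sample, rate_hz)
    votes = [distance(filtered, rep) < thr
             for rep, thr in zip(representatives, thresholds)]
    return sum(votes) > len(votes) / 2
```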
A. Sensor Data Collection and Filtering

The data collection step involves collecting motion sensor data (we mainly focus on the accelerometer in this study) while the user makes head movements in response to a music cue of duration T seconds. The raw accelerometer signals are collected at a sampling rate of r samples/sec. The accelerometer data corresponding to one user is a collection of accelerometer readings on the three axes (x, y, and z) collected over the T-second duration, stored in a matrix of dimensionality 3 × rT. We refer to this 3 × rT matrix as a sample. We keep the duration T on the order of a few seconds, as the frequency of human head movements is, intuitively, typically on the order of a few movements per second. Next, we filter the raw samples to remove noise due to spurious movements such as vibration or shaking. We adopt a low-pass digital Butterworth filter [4] and set a relaxed cut-off frequency of 10 Hz.

B. Sample Distance Computing

In this study, we build a distance-based classifier because its simplicity is well suited to wearable devices. There are various ways of computing the distance between two signals; we consider three popular distance-computing algorithms in this study: cosine (COS) distance, correlation (COR) distance, and dynamic time warping (DTW) distance. Suppose we have two time series Sa = (a1, a2, ..., an) and Sb = (b1, b2, ..., bn). Their COS distance is calculated as (Sa · Sb) / (‖Sa‖ × ‖Sb‖); their COR distance is calculated by dividing their covariance by the product of their standard deviations; and their DTW distance is the distance obtained once the optimal alignment between Sa and Sb has been determined by "warping" them non-linearly [2].

C. Classification

The classification step labels a test sample as "true" or "false" depending on whether its distance to the real user's training samples is below a threshold. Again, we choose this method because it strikes a good balance between simplicity and performance. Next, we explain how we build the classifier and how we conduct online classification; a minimal sketch of the procedure follows the list.

1) Identifying Top-K Training Samples. Given M training samples, we first identify the K samples that are closest to all the training samples. For each training sample, we calculate its average distance to the other M − 1 samples, and then choose the K samples with the lowest average distance values. These K samples are an empirical estimate of the centroid of the sample space, and thus best represent the space spanned by the collection of training samples. We refer to them as the Top-K samples. In our classifier, we focus on the Top-K training samples instead of all the training samples because doing so not only incurs much less computing overhead, but also provides much better robustness against noise in the training data.

2) Establishing the Distance Threshold. Suppose a sample s is one of the Top-K samples. We have its distance scores to the other M − 1 samples in the training set, from which we can calculate the distance mean µs and the distance standard deviation σs. Sample s's distance threshold is then defined as (µs + nσs), where n is referred to as the threshold parameter of our distance-based classifier.

3) Classifying the Test Sample. If we use a training sample s to classify a test sample t, then t is labeled as a true sample if the distance between s and t is less than s's distance threshold (µs + nσs); otherwise, it is labeled as a false sample. The strictness of this classifier is characterized by the value of the threshold parameter n: a large n can increase the false acceptance rate, while a small n can result in a high rejection rate of true samples.

4) Voting. We label the test sample according to all K Top-K samples, and the final classification result is the majority decision among the K individual classification results.
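The following Python sketch illustrates the distance computation and the Top-K threshold classifier described above. The plain O(n·m) DTW, the per-axis summation, and the helper names are illustrative assumptions; in particular, the paper does not specify how the three accelerometer axes are combined, and the actual implementation uses the FastDTW optimization discussed in Section IV-C.

```python
import numpy as np

def dtw_1d(a, b):
    """Classic O(n*m) dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def sample_distance(s, t):
    """DTW distance between two 3 x rT samples, summed over the three axes
    (the axis combination is our assumption)."""
    return sum(dtw_1d(s[k], t[k]) for k in range(3))

def top_k_samples(train, k, dist=sample_distance):
    """Step 1: pick the K training samples closest, on average, to the others."""
    avg = [np.mean([dist(train[i], train[j])
                    for j in range(len(train)) if j != i])
           for i in range(len(train))]
    return [train[i] for i in np.argsort(avg)[:k]]

def threshold_for(s, train, n_param, dist=sample_distance):
    """Step 2: threshold mu_s + n * sigma_s from s's distances to the other samples."""
    d = [dist(s, t) for t in train if t is not s]
    return np.mean(d) + n_param * np.std(d)

def classify(test, reps, thresholds, dist=sample_distance):
    """Steps 3-4: per-representative accept/reject decisions, then majority vote."""
    votes = [dist(rep, test) < thr for rep, thr in zip(reps, thresholds)]
    return sum(votes) > len(votes) / 2
```

In the offline training phase, top_k_samples and threshold_for are run once over the M training samples; only the K representatives and their thresholds need to be stored for online authentication.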
Fig. 5. The impact of the distance computing algorithm (DTW, cosine distance, and correlation distance). In this set of results, we varied the value of n from 0.0 to 10.0 in increments of 0.1, resulting in 100 thresholds in each case, and plotted the TPR (y-axis) against the FAR (x-axis) for each threshold. DTW (EER = 4.43%) delivers much better accuracy than the cosine (EER = 18.72%) and correlation (EER = 24.94%) distances; as a result, we use DTW for distance computing in the remainder of the study.

Among the four steps outlined above, the first two belong to the offline training phase, while the last two belong to the online authentication phase. Finally, if the user's test sample is classified as "true", the user is authenticated to the device; otherwise, the user is rejected.

IV. EVALUATION

We conducted a comprehensive evaluation of Headbanger through laboratory studies with human subjects; our studies were approved by the Institutional Review Board (IRB) of our institution. In the first phase of the evaluation, we collected accelerometer sensor readings from volunteer participants wearing Google Glass and analyzed the traces offline on a PC. The evaluation in this phase is primarily aimed at determining the accuracy and robustness of Headbanger. In the second phase, we implemented a Headbanger app and measured its processing latency. Our measurements suggest that Headbanger is indeed light-weight and can be executed on wearable devices such as Google Glass.

A. Authentication Accuracy of Headbanger

1) Participants: We had a total of 30 volunteer participants for this experiment, 19 males and 11 females. The mean age of the participants was 29.7 years with a standard deviation of 9.81 years. The youngest participant was 23 years old and the eldest was 54.

Fig. 6. The impact of different K values: K = 1 (EER = 4.43%) and K = 3 (EER = 3.93%). In this set of results, we varied the value of n from 0.0 to 10.0 in increments of 0.1, resulting in 100 thresholds in each case, and plotted the TPR (y-axis) against the FAR (x-axis) for each threshold. The voting scheme (K = 3) provides only a minor improvement; as a result, we use K = 1 in the remainder of the study.

2) Procedure: Our first experiment aimed at emulating the typical usage scenario of Headbanger, where a user performs head movements in response to a music cue played on the Google Glass device during a login attempt. In this experiment, all participants were asked to wear a Google Glass device. Participants who normally wore glasses (e.g. corrective eyewear) were asked to remove them before the experiment. The trials were conducted
in an academic environment and overseen by one of our team members. The Google Glass ran our data-collection app, which played a piece of music (the music cue) for a specified duration and recorded the accelerometer sensor readings. The sensor readings were written to a text file stored in the Glass's memory and later transferred to a PC for offline processing through a Python script. The experiment was conducted in a well-lit indoor academic laboratory environment. During a data collection session, participants were allowed to take a break or withdraw from data collection if they felt uncomfortable at any point. Participants could also take a one-minute break after each trial. Each trial lasted for the duration of the music cue played on the Glass, and a total of 40 such trials were conducted for each of the 30 participants. The entire data collection effort spanned 60 days; 15 participants completed their trials in a single sitting of about two hours, while the remaining participants' trials were spread over 3 days on average per subject.

Fig. 7. The impact of training dataset size: M = 10 (EER = 7.60%), M = 20 (EER = 6.22%), and M = 30 (EER = 4.43%) samples. In this set of results, we varied the value of n from 0.0 to 10.0 in increments of 0.1, resulting in 100 thresholds in each case, and plotted the TPR (y-axis) against the FAR (x-axis) for each threshold. Using 30 samples delivers the best performance without adding to the online authentication computing overhead; as a result, we use 30 training samples in the remainder of the study.

Fig. 8. The EER of Headbanger when users choose different music cue lengths (10 sec, 6 sec and 5 sec). The EER is 4.43% with a 10-second music cue, 6.14% with a 6-second cue, and 6.65% with a 5-second cue, which is better than the results of many similar body-movement based authentication systems (such as those in [13], [31]).

3) Metrics: We evaluate the accuracy of Headbanger using metrics that are commonly used to evaluate authentication systems, namely the true positive rate TPR (the percentage of true test samples that are correctly accepted), the false acceptance rate FAR (the percentage of false test samples that are mistakenly accepted), and the true rejection rate TRR (the percentage of true test samples that are mistakenly rejected). These three metrics are, however, largely dependent on the choice of the classification threshold parameter in Headbanger: a strict threshold in the classifier can lead to a high TRR, while an overly relaxed threshold can lead to a high FAR. Hence, in order to report threshold-independent performance, we also consider the equal error rate EER, which is the error percentage at the operating point where FAR = TRR.

4) Tuning Important System Parameters: Figures 5-7 present results on how the system's performance is affected by several important system parameters, namely the choice of the sample distance computing algorithm, the number of best representative training samples used for classification (K), and the total number of training samples (M). Based upon these results, we tune the parameter values to balance the trade-off between authentication accuracy and data collection/computing overhead.
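For reference, the sketch below shows how such a threshold sweep and the resulting EER can be computed offline from the distance scores of genuine and impostor test samples; the function names are illustrative and this is not the authors' evaluation script.

```python
import numpy as np

def sweep_threshold(true_dists, false_dists, mu, sigma):
    """Sweep the threshold parameter n and report (n, TPR, FAR, TRR) per step.

    true_dists / false_dists: distances of genuine / impostor test samples to
    the representative training sample; mu and sigma come from training.
    """
    true_dists = np.asarray(true_dists)
    false_dists = np.asarray(false_dists)
    rows = []
    for n in np.arange(0.0, 10.0, 0.1):          # 100 thresholds, as in the paper
        thr = mu + n * sigma
        tpr = np.mean(true_dists < thr)          # genuine samples accepted
        far = np.mean(false_dists < thr)         # impostor samples accepted
        trr = 1.0 - tpr                          # genuine samples rejected (paper's TRR)
        rows.append((n, tpr, far, trr))
    return rows

def equal_error_rate(rows):
    """EER: the error rate at the threshold where FAR and TRR are closest."""
    n, tpr, far, trr = min(rows, key=lambda r: abs(r[2] - r[3]))
    return (far + trr) / 2.0
```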
Recall that our classifier employs a distance threshold of (µ + nσ), where µ and σ are calculated from the training samples. For each parameter study, we varied the value of n from 0.0 to 10.0 in increments of 0.1, for a total of 100 threshold values. We then plotted the TPR on the y-axis and the FAR on the x-axis for each threshold value; the resulting curves are referred to as Receiver Operating Characteristic (ROC) curves.

Firstly, Figure 5 compares the performance of the three distance computing algorithms, COS (cosine), COR (correlation) and DTW, assuming a single best representative training sample (K = 1), a music cue duration of 10 s, and 30 training samples (M = 30). Among these three algorithms, DTW fares much better than the other two: its EER is 14.29 percentage points lower than that of the COS distance, and 20.51 percentage points lower than that of the COR distance. This is expected because DTW is designed to match the waveforms of two signals [2] and thus outputs a more accurate distance score. As a result, in the remainder of this study, we use DTW for evaluation. Even though DTW incurs more computation than the other two, our Google Glass implementation shows that, through software optimization, the processing latency of DTW can be made very small (see Section IV-C).

Secondly, Figure 6 compares the performance of two K values, K = 1 and K = 3, assuming DTW distance, a music cue duration of 10 s, and 30 training samples. Recall that our classifier compares the test sample against K best representative training samples, generates K independent classification results, and votes among them for the final authentication result. Hence, we expect that considering the top 3 samples will be better than considering only the top 1 sample, as confirmed by the results shown in Figure 6. However, the improvement is marginal: the EER when K = 3 is only 0.5 percentage points lower than the EER when K = 1. On the other hand, K = 3 incurs three times as much computation as K = 1. As a result, in the remainder of this study, we use K = 1 for evaluation.

Thirdly, Figure 7 compares the performance of three training dataset sizes, M = 10, 20, and 30 samples, assuming DTW distance, K = 1, and a music cue duration of 10 s. We observe that the EER when M = 30 is 1.79 percentage points lower than the EER when M = 20, and 3.17 percentage points lower than that when M = 10. We emphasize that the value of M has no impact on the computing overhead in the online authentication phase, because the classifier only compares the test sample against the K representative training samples. As a result, in the remainder of this study, we use M = 30 for evaluation. We do not choose a larger M value because the benefit of a larger M diminishes quickly beyond 30 samples.

5) Authentication Accuracy Results: After tuning the system parameters to balance accuracy and computing overhead, we next calculate the EER of the resulting Headbanger system when users choose different music cue durations, and present the results in Figure 8. Once a user starts the authentication procedure, how long the user must wait before she receives the authentication result is an important quality-of-service metric, which we refer to as authentication latency. Authentication latency consists of two parts: data input latency and data processing latency, wherein the former is the time a user spends listening to the music cue and making head movements, while the latter is the time Headbanger spends computing the authentication result.
Between these two parts, data input latency is by far the bottleneck: data processing latency can easily be reduced (by better algorithms and/or faster hardware) and/or hidden (by pipelining computation with data collection), whereas data input latency is hard to reduce or hide through software or hardware improvements. Recognizing that different users can tolerate different latencies and desire different levels of authentication accuracy, Headbanger allows users to choose the music cue duration (which has the same length as the data input latency). From Figure 8, we observe that the EER of Headbanger is 6.65% with a 5-second music cue and 4.43% with a 10-second music cue. We take the viewpoint that such error rates are sufficient for personal head-worn devices that are not used in hostile environments. Also, a data input latency of 5-10 seconds is comparable to that of similar authentication systems. For example, a gait-based authentication system [13] delivers an EER of 7.3% after observing the user for 20 meters; a pattern-swipe system [7] delivers at best a TPR of 98% and a FAR of 50% with offline processing; and an eye-blinking based authentication system [31] delivers a BAC = (TPR + FAR)/2 of 94.4%, with a processing latency of 32 seconds.

Fig. 9. Pictorial description of the nodding pattern employed by each target. Target (a) only moved his head in the vertical direction, and his nodding immediately follows each music beat (on-beat); target (b) only moved his head in the vertical direction, but there is often a delay between his nodding and the music beat (off-beat); target (c) occasionally combined shaking with nodding, on-beat. As a result, the nodding patterns in (b) and (c) are slightly more complex than that in (a).

B. Authentication Robustness of Headbanger

Having evaluated the authentication accuracy of Headbanger, we next study its robustness. In this study, we focus on imitation attacks. For this purpose, we asked a number of participants (attackers) to imitate the simple nodding patterns of three participants (targets), after watching each target's video for as long as they desired, and calculated their chances of successful imitation, i.e., of being mistakenly accepted by Headbanger. We note that nodding is the simplest head-movement pattern and the easiest to imitate; hence, the results presented here represent a lower bound on the robustness that Headbanger offers. In reality, users are likely to employ more sophisticated head-movement patterns, which will be much harder to imitate.

1) Participants: We had a total of 37 volunteer participants, including 31 males and 6 females. The average age of the participants was 25.6 years with a standard deviation of 6.6 years. The youngest participant was 22 years old and the eldest was 49.

2) Procedure: Our second experiment aimed at emulating a practical imitation attack scenario. In this experiment, three targets recorded videos of themselves nodding with a music cue. Note that the music is usually played via a bone conduction speaker or an earplug, and it is difficult for a camcorder to capture the music in a noisy environment. To address this concern, during recording we set the speaker volume to maximum and conducted the recording in a quiet laboratory environment. We divided the attackers into three groups and asked each group to imitate one target. In each session (consisting of 30 trials), the attacker could watch the video for as long as they wished.
Our system provided feedback after each trial so that the attackers could adjust their nodding pattern if they wanted. After the attacker had completed 30 trials, we ended the session regardless of whether the attacker had succeeded. For each session, we recorded the total number of successes the attacker had, as well as the number of trials before the first successful imitation.

3) Results: Each of the three targets performed simple nodding in this experiment; though simple, their nodding patterns have varying complexity. As shown in Figure 9, target A moved his head vertically, nodding on each music beat, with no noticeable horizontal movement; target B also moved his head only in the vertical direction, but there was always a delay between his nodding and the corresponding music beat; target C combined slight shaking with nodding and closely followed the music beats. Among these three targets, A's pattern is the easiest, while the other two are slightly more complex.

Table I summarizes the results. Here, we use FAR to denote the successful imitation rate, because a successful imitation is counted as a false acceptance in our system. The overall FAR of the experiment is 6.94%, while the individual FARs for the three targets are 15.83%, 2.77% and 2.72%, respectively. Since target A had the easiest nodding pattern, 7 out of 12 attackers succeeded at least once during their 30 trials, while for targets B and C the numbers are 3 out of 13 and 3 out of 12, respectively. These results are very promising: when a user employs a slightly more complex head-movement pattern, it becomes much harder for others to imitate it (the FAR drops from more than 15% to around 2.7%).

TABLE I. The attack results show that as head-movement patterns become more complex, they become much harder to imitate.

Target  | No. of Attackers | No. of Successful Attackers | Avg. No. of Trials Before First Successful Login | FAR (%)
A       | 12               | 7                           | 10.33                                            | 15.83
B       | 13               | 3                           | 14.33                                            | 2.77
C       | 12               | 3                           | 17.67                                            | 2.72
Overall | 38               | 13                          | 13.17                                            | 6.94

Fig. 10. Software modules of the Headbanger implementation.

C. Headbanger Google Glass App Implementation

In the second phase of the evaluation, we implemented Headbanger on Google Glass as an authentication app. Figure 10 shows the main software modules in the app. Upon initiation by the user, the app plays a music cue for a user-specified duration. The user performs head movements in synchrony with the music cue while the app records the accelerometer data in parallel. At the end of the music cue, the app enters the data processing phase, where the sensor readings are fed to Headbanger's software modules for processing. Upon completion of data processing, the app responds with a YES or NO textual output on the Google Glass screen. The training phase is conducted offline on a PC, and we ensure that the training set is readily available during the authentication process.

1) Data Processing Latency: In Table II we report the measured average processing latency of the Headbanger app for music cue durations of 5, 6 and 10 seconds. The processing latency is within 2 seconds for a 10-second data input, and is less than 1 second (0.88 s) for a 5-second input. We also report the latency breakdown across software modules, and find that the DTW computation is the main bottleneck. In our implementation, we used FastDTW [33], which decreases the computational complexity from O(n²) to O(n) without compromising the authentication accuracy.
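As an illustration of this optimization, the snippet below uses the third-party Python fastdtw package (an implementation of Salvador and Chan's algorithm [33]) to compute an approximate DTW distance in linear time; the radius value and the per-axis vector layout are illustrative assumptions, and the on-device implementation is not this Python code.

```python
import numpy as np
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw  # pip install fastdtw

# Two filtered accelerometer samples, assumed shape 3 x rT (x, y, z rows).
s = np.random.randn(3, 500)
t = np.random.randn(3, 500)

# Treat each time step as a 3-D vector; radius bounds how far the approximate
# warp path may deviate from the coarse-resolution estimate.
distance, _path = fastdtw(s.T, t.T, radius=10, dist=euclidean)
print("approximate DTW distance:", distance)
```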
As discussed earlier, the processing latency can be further reduced, and it could be partially hidden if we pipeline data processing with data input. As a result, we believe that our design of Headbanger is indeed light-weight, and that it is realistic to run the Headbanger app on devices with computing capabilities comparable to the Google Glass.

TABLE II. Measured response time of the Headbanger app implementation on Google Glass with different music cue durations and K = 1. The response time reported here is an average over 20 trials.

Music cue duration (s) | Data processing latency (s) | Filtering (%) | DTW (%) | Thresholding (%)
10                     | 1.93                        | 0.49          | 99.50   | 0.01
6                      | 1.15                        | 0.63          | 99.36   | 0.01
5                      | 0.88                        | 0.82          | 99.19   | 0.01

V. RELATED WORK

Several studies have looked at head or eye movements for various purposes, including user authentication. Harwin et al. [17] were the first to use head gestures for human-computer interaction. Westeyn et al. [37] used eye-blinking patterns as a unique feature for authentication; they achieved 82.02% accuracy with 9 participants. Rogers et al. [31] proposed using unconscious blinking and head movement to identify a user from a group of users; in this method, users were asked to view rapidly changing pictures for 34 seconds before they could be identified. Ishimaru et al. [21] come close to our system design; they proposed combining the eye-blinking frequency from the infrared proximity sensor and head-motion patterns from the accelerometer on Google Glass to recognize activities (e.g., reading, talking, watching TV, solving math problems). We note that recognizing activities is usually much easier than recognizing who is performing the activity, which is our objective in this study. There are also a number of physiological activity recognition studies using computer vision methods [18], [26]. While [26] primarily uses computer vision to detect head gestures, BioGlass [18] combines Google Glass's accelerometer, gyroscope, and camera to extract physiological signals of the wearer such as pulse and respiratory rates. Camera processing on wearable devices, especially Google Glass, is compute intensive and has a high energy budget [27].

Accelerometers have long been used to sense, detect and also recognize movements in other parts of the body; for example, gait recognition requires sensing at locations such as the waist [1], pocket [14], arm [15], [29], leg [13] and ankle [12]. These techniques, though well known, may not be suitable for wearable devices due to the complexity (computation and energy) of the machine learning process. Hand gestures and touchscreen dynamics are often coupled for authenticating to a (touchscreen) device (see e.g. [5] for a survey). A number of features [5], [32] (e.g. finger length, hand size, swipe/zoom speed, acceleration, click gap, contact size, pressure) and behavioral features (e.g. touch location, swipe/zoom length, swipe/zoom curvature, time, duration) have been exploited for authentication purposes [5], [9], [11], [34].

VI. CONCLUDING REMARKS AND FUTURE DIRECTION

As wearable devices are increasingly woven into our everyday lives, providing security for the data acquired by or accessed through these devices becomes critically important. In this study, we have developed a user authentication system that uses head-movement patterns for direct authentication to a wearable device.
Compared to existing authentication solutions, the proposed solution delivers accurate authentication, is robust against imitation attacks, incurs low processing delays, and offers great convenience to users. Through an extensive evaluation involving 95 users, we observe that the average true acceptance rate of our approach is 95.57% and the average false acceptance rate is 4.43%. We also observe that even simple head-movement patterns allow fewer than 3% of imitation attacks to succeed. We have also implemented an app on Google Glass and measured an end-to-end processing delay of less than 2 seconds for a 10-second data sample. As a result, we believe it is realistic for the proposed authentication system to be executed on resource-constrained devices such as smart glasses. We further believe the proposed method can help enable wider deployment of wearable devices and apps in our lives. Towards this goal, in our future work we will focus on making head-movement based user authentication more reliable in real-world settings.

VII. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant Number 1228777. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] H. J. Ailisto, M. Lindholm, J. Mantyjarvi, E. Vildjiounaite, and S.-M. Makela. Identifying people from gait pattern with accelerometers. In Defense and Security. International Society for Optics and Photonics, 2005.
[2] D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In ACM KDD AAAI Workshop, 1994.
[3] C. Bo, L. Zhang, X.-Y. Li, Q. Huang, and Y. Wang. SilentSense: silent user identification via touch and movement behavioral biometrics. In ACM MobiCom, 2013.
[4] R. Challis and R. Kitney. The design of digital filters for biomedical signal processing part 3: The design of Butterworth and Chebyshev filters. Journal of Biomedical Engineering, 1983.
[5] G. D. Clark and J. Lindqvist. Engineering gesture-based authentication systems. IEEE Pervasive Computing, 14(1):18–25, 2015.
[6] C. Cornelius, R. Peterson, J. Skinner, R. Halter, and D. Kotz. A wearable system that knows who wears it. In ACM MobiSys, 2014.
[7] A. De Luca, A. Hang, F. Brudy, C. Lindner, and H. Hussmann. Touch me once and I know it's you!: implicit authentication based on touch screen patterns. In ACM CHI, 2012.
[8] A. De Luca and J. Lindqvist. Is secure and usable smartphone authentication asking too much? Computer, 48(5):64–68, May 2015.
[9] T. Feng, J. Yang, Z. Yan, E. M. Tapia, and W. Shi. TIPS: context-aware implicit user identification using touch screen in uncontrolled environments. In ACM HotMobile, 2014.
[10] Fitbit. http://en.wikipedia.org/wiki/Fitbit.
[11] M. Frank, R. Biedert, E.-D. Ma, I. Martinovic, and D. Song. Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication. IEEE Transactions on Information Forensics and Security, 8(1):136–148, 2013.
[12] D. Gafurov, P. Bours, and E. Snekkenes. User authentication based on foot motion. Signal, Image and Video Processing, 2011.
[13] D. Gafurov, K. Helkala, and T. Søndrol. Biometric gait authentication using accelerometer sensor. Journal of Computers, 1(7):51–59, 2006.
[14] D. Gafurov, E. Snekkenes, and P. Bours. Gait authentication and identification using wearable accelerometer sensor. In IEEE AIAT, 2007.
[15] D. Gafurov and E. Snekkenes. Arm swing as a weak biometric for unobtrusive user authentication. In IEEE IIHMSP, 2008.
[16] Google Glass. http://en.wikipedia.org/wiki/Google_Glass.
[17] W. Harwin and R. Jackson. Analysis of intentional head gestures to assist computer access by physically disabled people. Journal of Biomedical Engineering, 12(3):193–198, 1990.
[18] J. Hernandez, Y. Li, J. M. Rehg, and R. W. Picard. BioGlass: Physiological parameter estimation using a head-mounted wearable device. In IEEE MobiHealth, 2014.
[19] R. Hoyle, R. Templeman, D. Anthony, D. Crandall, and A. Kapadia. Sensitive lifelogs: A privacy analysis of photos from wearable cameras. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 1645–1648. ACM, 2015.
[20] R. Hoyle, R. Templeman, S. Armes, D. Anthony, D. Crandall, and A. Kapadia. Privacy behaviors of lifeloggers using wearable cameras. In ACM UbiComp, 2014.
[21] S. Ishimaru, K. Kunze, K. Kise, J. Weppner, A. Dengel, P. Lukowicz, and A. Bulling. In the blink of an eye: combining head motion and eye blink frequency for activity recognition with Google Glass. In ACM AH, 2014.
[22] A. K. Jain, A. Ross, and S. Prabhakar. An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):4–20, 2004.
[23] S. Jana, A. Narayanan, and V. Shmatikov. A Scanner Darkly: Protecting user privacy from perceptual applications. In IEEE Symposium on Security and Privacy, pages 349–363, 2013.
[24] A. R. Jensenius. Action-sound: Developing methods and tools to study music-related body movement. 2007.
[25] Z. Jorgensen and T. Yu. On mouse dynamics as a behavioral biometric for authentication. In ASIACCS, 2011.
[26] R. Kjeldsen. Head gestures for computer control. In IEEE ICCV Workshop, 2001.
[27] R. LiKamWa, Z. Wang, A. Carroll, F. X. Lin, and L. Zhong. Draining our Glass: An energy and heat characterization of Google Glass. In Proceedings of the 5th Asia-Pacific Workshop on Systems, page 10. ACM, 2014.
[28] F. Monrose and A. D. Rubin. Keystroke dynamics as a biometric for authentication. Future Generation Computer Systems, 16(4):351–359, 2000.
[29] F. Okumura, A. Kubota, Y. Hatori, K. Matsuo, M. Hashimoto, and A. Koike. A study on biometric authentication based on arm sweep action with acceleration sensor. In IEEE ISPACS, 2006.
[30] T. Rahman, A. T. Adams, M. Zhang, E. Cherry, B. Zhou, H. Peng, and T. Choudhury. BodyBeat: a mobile system for sensing non-speech body sounds. In ACM MobiSys, 2014.
[31] C. E. Rogers, A. W. Witt, A. D. Solomon, and K. K. Venkatasubramanian. An approach for user identification for head-mounted displays. In Proceedings of the 2015 ACM International Symposium on Wearable Computers, pages 143–146. ACM, 2015.
[32] N. Sae-Bae, K. Ahmed, K. Isbister, and N. Memon. Biometric-rich gestures: a novel approach to authentication on multi-touch devices. In ACM CHI, 2012.
[33] S. Salvador and P. Chan. Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis, 2007.
[34] M. Sherman, G. Clark, Y. Yang, S. Sugrim, A. Modig, J. Lindqvist, A. Oulasvirta, and T. Roos. User-generated free-form gestures for authentication: Security and memorability. In ACM MobiSys, 2014.
[35] Smartwatch. http://en.wikipedia.org/wiki/Smartwatch.
[36] S. V. Stevenage, M. S. Nixon, and K. Vince. Visual analysis of gait as a cue to identity. Applied Cognitive Psychology, 13(6):513–526, 1999.
[37] T. Westeyn and T. Starner. Recognizing song-based blink patterns: Applications for restricted and universal access. In IEEE FGR, 2004.
[38] M. Zentner and T. Eerola. Rhythmic engagement with music in infancy. Proceedings of the National Academy of Sciences, 2010.