University of Colorado, Boulder
CU Scholar
Computer Science Graduate Theses & Dissertations
Computer Science
Spring 1-1-2013
Exploring Low Profile Techniques for Malicious
Code Detection on Smartphones
Bryan Charles Dixon
University of Colorado at Boulder, [email protected]
Follow this and additional works at: http://scholar.colorado.edu/csci_gradetds
Part of the Information Security Commons
Recommended Citation
Dixon, Bryan Charles, "Exploring Low Profile Techniques for Malicious Code Detection on Smartphones" (2013). Computer Science
Graduate Theses & Dissertations. 69.
http://scholar.colorado.edu/csci_gradetds/69
This Dissertation is brought to you for free and open access by Computer Science at CU Scholar. It has been accepted for inclusion in Computer
Science Graduate Theses & Dissertations by an authorized administrator of CU Scholar. For more information, please contact
[email protected].
Exploring Low Profile Techniques for Malicious Code
Detection on Smartphones
by
Bryan Dixon
B.S., North Carolina State University, 2007
M.S., University of Colorado at Boulder, 2012
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science
2013
This thesis entitled:
Exploring Low Profile Techniques for Malicious Code Detection on Smartphones
written by Bryan Dixon
has been approved for the Department of Computer Science
Prof. Shivakant Mishra
Prof. Richard Han
Prof. Qin Lv
Prof. John Black
Prof. Eric Keller
Date
The final copy of this thesis has been examined by the signatories, and we find that both the
content and the form meet acceptable presentation standards of scholarly work in the above
mentioned discipline.
Dixon, Bryan (Ph.D., Computer Science)
Exploring Low Profile Techniques for Malicious Code Detection on Smartphones
Thesis directed by Prof. Shivakant Mishra
In recent years there has been a growing number of viruses, rootkits, and malware designed to
gain access to system resources and information stored on smartphones. Most current approaches
for detecting this malicious code have detrimental impacts on the user in terms of reduced functionality, slower network speeds, or loss of battery life.
This work presents a number of approaches that have a minimal impact on the user but
offer successful detection of potential malicious code on the smartphone. We do this primarily by
focusing on anomalous power use as a method for detecting the presence of malicious code. This
work also introduces ways to fine-tune the process by establishing a normal profile of power usage
for each user, which increases the rate of malware detection.
Dedication
To my parents.
Acknowledgements
This thesis is a culmination of over four years of work, and without the involvement and help
of many people in my professional development as a researcher and my life personally I wouldn’t
be where I am today. First, I want to thank my advisor, Shivakant Mishra, who has not only been
instrumental in encouraging my progress, but helped encourage my pursuit of smartphone security
research on which this work is based. Second, I want to thank my thesis committee, Richard Han,
John Black, Qin (Christine) Lv, and Eric Keller, for their invaluable comments and suggestions.
I’d also like to thank Jacqueline DeBoard for offering guidance throughout this whole process.
I would also like to thank my many colleagues for their support and willingness to act as
sounding boards for ideas, as well as for the advice they offered. I am particularly thankful to Harold
Gonzales, Mike Gartrell, Dirk Grunwald, Doug Sicker, Richard Han, John Black, Yifei Jiang,
Donny Warbritton, Junho Ahn, Andy Sayler, Ning Gao, Kevin Bauer, Ali Alzabarah, Allison
Brown, and Frank Di Natale.
I would like to thank all my friends, family, and individuals who offered to help me with
my thesis project by running my research code on their smartphones. Without them none of this
would be possible. I’d like to offer a special thanks to those who took the extra time and effort to
run the malicious code simulator so I could get results to back up the effectiveness of my proposed
thesis.
Lastly, I would like to thank Jeannette Pepin for her constant encouragement, for her help editing
and reviewing my writing, for helping me run my projects in their buggy development stages, and
for the many hours she spent helping me simulate data.
Contents

Chapter

1 Introduction 1
1.1 Need for Malicious Code Detection . . . 1
1.2 Detection Doesn't Need to Impact the User . . . 1
1.3 Problem Statement . . . 2
1.4 Fundamental Contributions . . . 3
1.5 Dissertation Outline . . . 3

2 Background and Related Works 5
2.1 Background . . . 5
2.1.1 Security Threats . . . 6
2.2 Related Work . . . 10
2.2.1 Propagation and Mitigation . . . 11
2.2.2 Prevention . . . 13
2.2.3 Detection . . . 15

3 Malicious Code Simulators 20
3.1 Motivation for Use of Malware Simulation vs Real Malware . . . 20
3.2 Design and Implementation . . . 20
3.2.1 SMS Spam Trojan . . . 20
3.2.2 Location Tracking Malware . . . 21
3.3 Future of Malware Simulations . . . 22

4 Computer Based Detection Through File Integrity 23
4.1 Motivation for Computer Based Detection . . . 23
4.2 Design and Implementation . . . 23
4.2.1 Keyed Hash Design . . . 24
4.3 Results and Discussion . . . 25
4.3.1 File Change Frequency . . . 26
4.4 Future Applications . . . 27
4.5 Limitations . . . 28

5 General Power Based Detection 29
5.1 Motivation for General Power Based Detection . . . 29
5.2 Design and Implementation . . . 29
5.3 Results and Discussion . . . 30
5.3.1 Statistical Approach with New Data . . . 35
5.4 Future Applications . . . 36

6 Location-Based Power Detection 39
6.1 Motivation for Location Based Power Detection . . . 39
6.2 Design and Implementation . . . 39
6.2.1 Design Choices . . . 39
6.2.2 Deployment and Recruitment . . . 47
6.3 Results and Discussion . . . 47
6.4 Limitations . . . 54
6.5 Future Application . . . 56
6.5.1 Future Improvements . . . 56

7 Time Domain Based Detection 58
7.1 Motivation for Time Domain Based Detection . . . 58
7.2 Results and Discussion . . . 58
7.3 Future Applications . . . 64

8 Conclusions and Future Work 67
8.1 Fundamental Contributions . . . 67
8.1.1 File Integrity to Accelerate Computer Based Scans . . . 67
8.1.2 General Power Based Detection . . . 68
8.1.3 Location Based Power Detection . . . 68
8.1.4 Time Domain Based Power Detection . . . 68
8.2 Future Work . . . 69
8.2.1 Additional Tuning Factors . . . 69
8.2.2 Combining Tuning Factors . . . 69
8.2.3 Integrating With Another System . . . 69
8.2.4 Known Malicious Code . . . 70
8.3 Final Remarks . . . 70

Bibliography 71

Appendix

A Additional Tables of Data 73

B Additional Graphs of Data 87
Tables

Table

2.1 Propagation Vectors Table . . . 5
4.1 File Integrity Hashing Comparisons . . . 25
5.1 General Power Analysis for All Phones . . . 38
6.1 Statistics for the standalone Project . . . 47
6.3 Location Based Effectiveness for Nexus 4 (4) . . . 52
6.5 Location Based Effectiveness for SCH-I535 (2) . . . 57
7.1 Time of Day effectiveness for Nexus 4 (4) . . . 63
7.3 Time of Day effectiveness for SCH-I535 (2) . . . 64
7.5 Hour of Day effectiveness for Nexus 4 (4) . . . 65
7.7 Hour of Day effectiveness for Samsung Galaxy S3 SCH-I535 (2) . . . 66
A.1 Location Based Effectiveness for Nexus 4 (1) . . . 73
A.3 Location Based Effectiveness for Nexus 4 (2) . . . 74
A.5 Location Based Effectiveness for Nexus 4 (3) . . . 75
A.7 Time of Day effectiveness for Nexus 4 (1) . . . 76
A.9 Time of Day effectiveness for Nexus 4 (2) . . . 77
A.11 Time of Day effectiveness for Nexus 4 (3) . . . 77
A.13 Time of Day effectiveness for SAMSUNG-SGH-I747 . . . 78
A.15 Time of Day effectiveness for Galaxy Nexus (1) . . . 79
A.17 Hour of Day effectiveness for Nexus 4 (1) . . . 80
A.19 Hour of Day effectiveness for Nexus 4 (2) . . . 81
A.21 Hour of Day effectiveness for Nexus 4 (3) . . . 82
A.23 Hour of Day effectiveness for Samsung SGH-I727 . . . 83
A.25 Hour of Day effectiveness for Samsung SGH-I747 . . . 84
A.27 Hour of Day effectiveness for Galaxy Nexus (1) . . . 85
A.29 Hour of Day effectiveness for Samsung Galaxy S3 SCH-I535 (1) . . . 86
Figures

Figure

2.1 Visualization of the GPS exploit a root kit could perform . . . 7
2.2 Normal Behavior vs Root Kit Behavior . . . 8
2.3 Visualization of the eavesdrop exploit a root kit could perform . . . 9
2.4 Visualization of Smartphones normal power drain versus all radios in use . . . 10
2.5 Clustering Visualization . . . 11
2.6 Performance of the two mitigation techniques . . . 12
2.7 How a CAPTCHA system could be implemented for use on a smartphone . . . 14
2.8 Visualization of how SmartSiren is implemented . . . 16
2.9 Visualization of how Risk Ranker works . . . 19
5.1 Battery level change for Day 2 . . . 30
5.2 Battery level change for Day 9 . . . 31
5.3 Battery level change for Day 15 . . . 31
5.4 Battery level change for Day 18 . . . 32
5.5 Graph of percent change in Battery graphs over 2000 samples . . . 32
5.6 Capture Percentage Location Tracking General Power . . . 33
5.7 Capture Percentage SMS Spam General Power . . . 34
5.8 False Positive Rates for General Power . . . 34
6.1 Countries Running standalone Project . . . 48
6.2 Average Rate of Power Change at Locations for SGH-727 . . . 50
6.3 Average Rate of Power Change at Locations for SGH-747 . . . 50
7.1 Hour of day based power use for Nexus 4 (4) . . . 59
7.2 Time of day based power use for Nexus 4 (4) . . . 60
7.3 Hour of day based power use for SCH-I535 (2) . . . 61
7.4 Time of day based power use for SCH-I535 (2) . . . 62
B.1 Average Rate of Power Change at Locations for Galaxy Nexus (1) . . . 88
B.2 Average Rate of Power Change at Locations for Nexus 4 (1) . . . 88
B.3 Average Rate of Power Change at Locations for Nexus 4 (2) . . . 89
B.4 Average Rate of Power Change at Locations for Nexus 4 (3) . . . 89
B.5 Average Rate of Power Change at Locations for Nexus 4 (4) . . . 90
B.6 Time of day based power use for SGH-I727 . . . 91
B.7 Hour of day based power use for SGH-I727 . . . 92
B.8 Time of day based power use for SGH-I747 . . . 93
B.9 Hour of day based power use for SGH-I747 . . . 94
B.10 Time of day based power use for Galaxy Nexus (1) . . . 95
B.11 Hour of day based power use for Galaxy Nexus (1) . . . 96
B.12 Time of day based power use for Nexus 4 (1) . . . 97
B.13 Hour of day based power use for Nexus 4 (1) . . . 98
B.14 Time of day based power use for Nexus 4 (2) . . . 99
B.15 Hour of day based power use for Nexus 4 (2) . . . 100
B.16 Time of day based power use for Nexus 4 (3) . . . 101
B.17 Hour of day based power use for Nexus 4 (3) . . . 102
Chapter 1
Introduction
1.1
Need for Malicious Code Detection
Smartphones have become an important part of many people's daily lives, and their reach
continues to grow as more and more people make the move to owning one. This growth makes it
increasingly important that smartphones be secure and safe from malicious entities and code.
The growing amount of malicious code aimed at smartphones presents an increasing risk to
smartphone users, making the detection of such code ever more important. The significant amount
of personal data stored and captured by smartphones means that privacy as well as general security
is a concern. In addition, malicious code can generate increased activity that drains a significant
amount of power from the phone and can cause congestion in the cellular network.
1.2
Detection Doesn’t Need to Impact the User
Given the need to protect smartphones from this malicious code, a variety of research projects
have been proposed or developed to try to combat the threat that malicious code can pose. Some of
these proposals, though, can have as much of a negative impact on a user as the malicious code
itself, through the toll they take on the functionality of the smartphone.
One major limitation for detection software that runs on smartphones is that smartphones
have limited storage, RAM, and processing power in comparison to the hardware capabilities of
modern computers. As such, techniques that effectively detect malicious code on a computer may
not be a feasible solution for a smartphone. In addition, the limitations imposed by the app
development framework can hinder the functionality of detection software on a smartphone. The
key drawback of detection software on a smartphone is that in order to be effective, such a service
or app may need to be perpetually running, thus placing a significant drain on the battery. This
factor would be a huge drawback for users, who would likely opt to risk malicious code over the
loss of battery life. Most conventional anti-virus solutions for smartphones tend to be on-demand,
or on-install scans that will likely limit the effectiveness to combat new unknown threats.
The prevailing assumption is that tools to detect the presence of malicious code must be resource
hungry and must negatively impact the user, either through significant battery drain or by reducing
the functionality of the phone through consumption of available CPU cycles and memory. This
assumption has led to other strategies, discussed in the related work, that impose different negative
impacts on a user in order to fight potential threats. Many of these treat any increase in network
traffic as anomalous and throttle it, even though the increase could come from anything from a
malicious app to a user opting to use Netflix, Pandora, or some other streaming service. [9][10]
1.3
Problem Statement
While it is apparent that the need for a way to detect malicious code exists, most current
approaches tend to lead to negative impacts on the user. To this end, in this dissertation, we seek to
illustrate that malicious code can be detected with a minimal impact on the user and smartphone.
We assert the following thesis:
Lightweight anomaly detection techniques can have a statistically significant capture percentage and a low false positive percentage without many of the drawbacks that
conventional detection techniques cause.
We will focus primarily on how intelligent data collection combined with simple anomaly
detection results in a negligible impact on the user and a statistically significant capture rate.
1.4
Fundamental Contributions
This thesis contributes the following to the field of malicious code detection for smartphones.
• File Integrity Based Detection. (Chapter 4) An early work that leveraged the fact that,
at the time, users typically plugged their phones into a computer at least daily to sync.
It explored how computer based scanning could be made faster and less susceptible to
malicious code escaping detection. [19]
• General Power Based Detection. (Chapter 5) Recognizing that most malicious code
behaviors place a significant drain on a smartphone's battery, we began investigating how
to detect malicious code against a general profile of how a user normally uses their phone.
• Location-Based Power Detection. (Chapter 6) Realizing that we could improve on
general power based detection, we incorporated location data, generating a profile of what
is normal for a user based on where they are located. [20]
• Time Domain Based Detection. (Chapter 7) A final addition explored time domain
analysis, which operates on the premise that a user uses their phone differently at different
times of day.
1.5
Dissertation Outline
The remainder of this dissertation is organized as follows. Chapter 2 provides a survey of
the relevant background and related work from the malicious code detection literature. Chapter 3
describes the malware simulators created to test the effectiveness of the projects proposed in this
dissertation. Chapter 4 details our initial work on improving computer based scanning. Chapter 5
describes how we initially approached general power analysis and then its effectiveness when we
revisited it with the data and approach taken in the standalone prototype used for the thesis
project. Chapter 6 details how the thesis project prototype was designed, the data it produced,
and how effective it was. Chapter 7 shows how effective power usage at different times of day is as
a detection signal. Finally, Chapter 8 summarizes the fundamental contributions of this thesis and
highlights both the conclusions that can be drawn and a variety of future work that could be derived
from it.
Chapter 2
Background and Related Works
2.1
Background
With the uptake of the smartphone market has come an emergence of viruses, malware, and
even root kits designed to gain access to smartphones. These security threats are troublesome due
to the large amount of personal data that smartphone users store in their phones. Additionally,
this malicious code leads both to a dramatic drain on the phone's battery and to an abnormal load
on the cellular network compared to an uninfected phone's expected load.

With phone networks getting even faster and these devices having constant access to the
Internet as long as they have a signal, smartphones become even more susceptible to malicious
attacks and more viable devices to exploit and gain control of. As such, the study of malicious code
on cellphones, as well as how it can be mitigated, stopped, or detected, has been a growing trend of
research. And since these viruses, malware, and root kits can make use of such a varied number of
infection vectors to propagate themselves, they are harder to stop at the network level. Table 2.1
lists a number of existing smartphone viruses and their vectors of propagation. [18]

Table 2.1: Table of propagation vectors and known examples of malicious code that use the vectors
Smartphones offer a unique development platform, in comparison to personal computers, for
software that detects and removes malicious code. A primary hindrance is the limitation of the
hardware: slower processors, limited storage, and limited RAM. These factors limit the effectiveness
of applying modern search algorithms for malicious code on the phone itself. Additionally, a service
running perpetually to look for malicious code would dramatically increase the drain on the phone's
battery, a drawback that would likely lead users to risk malicious code rather than run detection
code on their phone for that reason. Scanning only at install time, which is how most existing
solutions avoid the need to constantly scan, also allows new, unidentified threats to slip past some
users and go undetected unless the user initiates a full scan.

In addition to the difficulty of developing solutions on the phone, due to the physical limitations
of the device and the necessity that the solution not place a significant drain on the battery, there
are limitations on the functionality that smartphone developers allow to run on their phones.
Additionally, the tools to develop kernel level software to combat malicious code are not readily
available for most smartphones, and even the platforms that provide such tools do not document
them as well as their higher level, app level development tools.
2.1.1
Security Threats
A primary security threat is that of root kits on smartphones: as these devices and their
operating systems become increasingly complex and more closely resemble a small personal
computer, they become viable targets for root kits. Root kits, which compromise and gain long
term control over an infected machine, can be the worst kind of malicious code in that they can
expose all of the functionality of the core system to a malicious user. With this level of access
and control, a malicious user could make use of a root kit to eavesdrop on conversations, emails,
messaging services, and more. [26] And because a root kit has low level access and can virtually hide
itself from any detection tool on the device itself, most of the conventional tools for finding malicious
code on a system could be rendered useless. Figure 2.2 shows how a root kit could hide itself from
a detection system. Additionally, Figure 2.1 shows how a root kit could be used to query a
compromised phone for its location without alerting the phone's user that such a query took
place. [15] This is a good real world example of how a root kit could obtain private information
from a user, and it could be a huge issue for personal security: a stalker or assassin could easily use
such an exploit to gain real time information on a target's location.

Figure 2.1: Visualization of the GPS exploit a root kit could perform

Figure 2.2: Normal behavior vs root kit behavior for computer based detection system.
An additional security threat discussed in these investigations would be especially dangerous
in settings where confidential or classified information is discussed during regular meetings: the
phone, which is usually overlooked, could be activated based on an intercepted calendar signal and
then remotely dial the attacker, allowing them to listen in on the conversation. [15] Figure 2.3
illustrates how this could be accomplished. The ability of root kits to hide from most detection
tools, the level of access they can gain on a phone, and the private information that phones are
exposed to together make a strong motivator for strategies to detect this malicious code on a
smartphone. With this in mind, developing tools to determine whether a phone has indeed been
compromised by a root kit is definitely a necessity when considering malicious code detection for
smartphones.

Figure 2.3: Visualization of the eavesdrop exploit a root kit could perform

Figure 2.4: Visualization of Smartphones normal power drain versus all radios in use
In addition to being a huge security issue for users, corporations, and governments, these
root kits also create a huge drain on the battery of smartphones. Figure 2.4 illustrates the battery
life of three different phones under normal use and under heavy resource utilization. As is readily
apparent, the drain grows sharply when the radios are used to constantly send a large number of
messages, such as reports of where the phone is or a relayed conversation from an eavesdropping
function. This noticeable change in battery life is the motivation for the research discussed in this
work focused on using power change data as a mechanism for malicious code detection.
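The core idea can be sketched very simply. The following is a minimal illustration, not the implementation used in this work: it builds a per-user profile of battery drain rates (the drain values are hypothetical), then flags any sample far above the user's normal mean.

```python
import statistics

def build_profile(drain_rates):
    """Build a simple normal-use profile from observed battery drain
    rates (percent per hour). Returns (mean, stdev)."""
    return statistics.mean(drain_rates), statistics.stdev(drain_rates)

def is_anomalous(rate, profile, k=2.0):
    """Flag a drain rate more than k standard deviations above the
    user's normal mean as potentially malicious activity."""
    mean, stdev = profile
    return rate > mean + k * stdev

# Hypothetical drain rates (%/hour) sampled during normal use
baseline = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.3, 4.0]
profile = build_profile(baseline)

print(is_anomalous(4.2, profile))   # typical usage -> False
print(is_anomalous(9.0, profile))   # sustained heavy radio use -> True
```

Heavy radio use, as in the eavesdropping or location-reporting exploits above, pushes the drain rate well past the profile's threshold, which is exactly the signal this work exploits.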
2.2
Related Work
When looking at related work it is easiest to think of the approaches as mitigation, prevention,
or detection. We will discuss the related projects in each of those areas in more detail, along with
some limitations they face.
Figure 2.5: Clustering Visualization
2.2.1
Propagation and Mitigation
Understanding the mechanisms by which malicious code can propagate, and the behavior
that propagation takes, is the starting point for research into large scale mechanisms to contain or
limit propagation during an initial outbreak period. This is analogous to the CDC's mechanisms
for limiting the spread of large-scale viral outbreaks by slowing or stopping the spread through
various containment protocols.

Studies of cellphone virus propagation have led to the understanding that most viruses exhibit
clustering behavior: as can be seen in Figure 2.5, a propagating virus affects devices closer to it
more than those farther away. [16] This makes sense, since the propagation vectors available to
a phone all tend to affect devices that are geographically nearby. The Cabir worm, which affects
Symbian phones, replicates over Bluetooth connections by scanning for vulnerable Bluetooth phones
in its vicinity. And even a virus utilizing the phone's phonebook would show similar clustering,
since people commonly know others who are geographically closer to themselves.
A phone's data usage or message sending volume increases dramatically when a replicating
piece of malicious code resides on it, since the code must send itself to other phones. The observation
that under normal operation this volume is minimal leads to methods for mitigating or slowing the
spread of this malicious code. [30] Two approaches have been implemented to slow propagation
after detecting a change from the expected normal data volume. The first is Williamson's rate limit
algorithm, which operates on the phone and, when an increase in data volume is detected, forces a
global limit on all of the phone's operations. [30] The second is a proactive group behavior
containment (PGBC) algorithm that operates at the messaging server level, or on the cell tower
level. [16] This second method can detect an increase in a phone's data volume on its network and
limit both that phone's data rate and the data rates of the other phones on the network. Figure 2.6
compares these two algorithms against no mitigation at all, and shows that a layer on a cell tower
or in the cell network that detected virus propagation and limited network speed would dramatically
reduce the number of phones that get infected over time.

Figure 2.6: Performance of the two mitigation techniques
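The rate limiting idea can be sketched as follows. This is a toy model in the spirit of Williamson's throttle, not his exact algorithm: outgoing messages beyond a fixed per-interval budget are queued rather than sent, so a worm's burst of sends is stretched out over time.

```python
from collections import deque

class RateLimiter:
    """Toy message rate limiter: allow at most `budget` sends per time
    step and queue the rest. A persistently long queue is itself a
    signal of possible malicious activity."""
    def __init__(self, budget=1):
        self.budget = budget
        self.queue = deque()

    def request_send(self, msg):
        """Accept a send request; it waits its turn in the queue."""
        self.queue.append(msg)

    def tick(self):
        """Advance one time step, releasing at most `budget` messages."""
        released = [self.queue.popleft()
                    for _ in range(min(self.budget, len(self.queue)))]
        return released

limiter = RateLimiter(budget=2)
for m in ["a", "b", "c", "d", "e"]:   # a burst, as a worm might produce
    limiter.request_send(m)

print(limiter.tick())        # ['a', 'b']
print(limiter.tick())        # ['c', 'd']
print(len(limiter.queue))    # 1 message still held back
```

A legitimate user rarely exceeds the budget and never notices the limiter; a worm's burst is delayed, slowing its spread across the network at the cost of the throttling drawbacks discussed next.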
Though limiting the data rate seems to be an effective solution for slowing a virus, it would
slow the cell network and reduce the functionality of the affected phone and the other phones on
the network, behavior that neither users nor cell providers would wish to invite for something that
only limits, rather than prevents, the spread of malicious code. Additionally, these limiting
mechanisms may misidentify normal behavior as malicious and slow regular traffic by reacting to
what merely seems to be abnormal behavior, so that when a user expects high speed on the cell
network they suddenly do not have it. This would be especially noticeable on newer cell networks
running at HSPA+ or 4G LTE (Long Term Evolution) speeds: because these networks offer far
faster speeds, users who started making full use of them could appear to be malicious devices and
suddenly have their speed cut, resulting in very bad feedback for the cellular companies. This is
one reason rate limiting is not a very viable solution.
2.2.2
Prevention
Another approach provides system-level defenses that challenge malicious code with a graphical Turing test via a visual CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). [31] Figure 2.7 shows the steps such a challenge would take in its implementation.

Figure 2.7: How a CAPTCHA system could be implemented for use on a smartphone

The challenge is required before sending or making connections through Bluetooth and messaging (SMS/MMS) services. Since the majority of existing smartphone malware and viruses use these two vectors, this could in theory prevent a compromised phone from using those vectors to exploit another phone. It could also prevent the original exploit if making a Bluetooth connection or processing an MMS payload likewise required the challenge to be met. Since it is nearly impossible for malicious code to guess the correct challenge response, this could greatly reduce the exploitability of smartphones that implement such a solution.
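A minimal sketch of the gating idea follows, with a random string standing in for the visual CAPTCHA; the class and method names are ours and purely illustrative.

```python
import random
import string

class CaptchaGate:
    """Sketch of gating message sends behind a human challenge.
    Rendering the distorted visual CAPTCHA is out of scope; here the
    challenge is just a random string the human must type back."""

    def __init__(self):
        self.pending = {}

    def issue_challenge(self, msg_id):
        token = ''.join(random.choices(string.ascii_uppercase, k=6))
        self.pending[msg_id] = token
        return token  # would be rendered as a distorted image on screen

    def try_send(self, msg_id, response, send_fn):
        # Only a correct response (presumably from a human) releases the send.
        if self.pending.get(msg_id) == response:
            del self.pending[msg_id]
            send_fn()
            return True
        return False

sent = []
gate = CaptchaGate()
token = gate.issue_challenge("sms-1")
blocked = gate.try_send("sms-1", "??????", lambda: sent.append("sms-1"))
released = gate.try_send("sms-1", token, lambda: sent.append("sms-1"))
# Only the correct response releases the queued SMS.
```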
Along similar lines is work to build a phone operating system on trusted code, with everything running as trusted code above a secure Linux kernel. [13] This phone prototype was designed around a base kernel similar to Security Enhanced Linux (SELinux), which has isolated operational domains. By creating distinct operational domains, unique to each application, in which code can operate, it becomes difficult if not nearly impossible for code to gain enough access to exploit the system; and even if the system were exploited, the code would be limited in what it could accomplish compared to a phone without such security protocols in place. This informed our use and investigation of Android for our own approach, as Android uses a similar design structure of trusted code. [1]
A recent development in preventing malicious code is an actual release from the NSA, who developed SELinux, in the form of SE Android, a project to address the lack of file and application domain security in Android. [11]
2.2.3
Detection
So far, the work surveyed has primarily addressed how malicious code propagates once it has exploited a phone, how its spread can be limited or slowed, and what mechanisms can be added to limit its ability to gain access in the first place. None of it has looked at how to detect malicious code, or at the very least notify the user or network that a phone has been compromised. To this end, SmartSiren uses a proxy system for its communications that detects an increase in messaging usage and triggers alerts both to the rest of the phones using the framework and to the possibly infected phone, alerting them to the fact that the phone in question is compromised, or is believed to be infected. [18]

Figure 2.8: Visualization of how SmartSiren is implemented

Figure 2.8 shows how the SmartSiren system is set up. This could be useful because the other phones could avoid communicating with the possibly infected device, better preventing and limiting the spread of malicious code over the cell network. However, since this system still requires a dramatic increase in messages over a period of time, a number of replications of the malicious code could occur before any alert happens. Additionally, if the rate of propagation of the malicious code were limited enough, it would not be distinguishable from regular usage; as users' data usage grows with increases in network speed and the services provided to smartphones, that is becoming increasingly likely.
Two more detection projects are closely related to some of this work in that they also use power as their basis for detection. The first uses fine-grained power signatures to detect the signatures of known malware and viruses. [25] Its drawback compared with the power-based techniques we have investigated is that it requires the malicious code to be known, and it requires a significantly more detailed power signature than can be obtained on most smartphones. Moreover, constantly observing power at such a high level of detail would have significantly more impact on the phone's battery than the periodic or triggered polling used in the standalone prototype discussed later and in our previous work. Another recent research project is VirusMeter. [27] VirusMeter uses real-time power analysis similar to the general power-based research in Chapter 5, with only power changes being investigated as a method of detection. This work came out after we had finished our initial power analysis, but it is interesting for comparison because it uses significantly more machine learning and real-time consumption analysis, whereas we have shown effective detection without so much overhead. Additionally, they measured a 1.5% battery drain when running their system versus a phone not running it; running the data collection basis for our prototype, we measured no change in the battery whether data collection was on the phone or not, indicating that over a whole day its impact was negligible to the point that we could not measure it.
A final project that falls into the detection category is RiskRanker. [22] Figure 2.9 shows the general behavior of RiskRanker, which uses a few analysis mechanisms built on static code analysis to filter out apps that are likely to pose significant risk. The authors investigated the libraries and code execution paths of known malicious code, then evaluated apps against a set of learning algorithms and analysis rules to flag an app as presenting significant risk when it exhibited a few risky behaviors. We think this project offers the most interesting and promising short-term impact, since it could actually be put to use in Google's and Amazon's app stores to filter out apps that pose significant risk to the user. As this would likely happen in the store approval mechanism, not on the user's phone or at the network level, it is another example of work with minimal impact on the user that could be quite successful at keeping malicious code out of the app stores in the first place. And because it uses code access behaviors rather than signatures, it offers a solution that can potentially detect malicious code that has not yet been identified.
Figure 2.9: Visualization of how RiskRanker works
Chapter 3
Malicious Code Simulators
3.1
Motivation for Use of Malware Simulation vs Real Malware
As mentioned in previous sections, most malicious code either has a negative impact on the phone or costs the user money. Additionally, some malicious code will root the user's device and potentially corrupt the system. Since all of this research is done with volunteers on actual user phones, and we do not want to cause these issues on their phones, we opted instead to build simulators that simulate the behaviors of the known malicious code discussed previously, as described by other researchers. [21][15]
3.2
Design and Implementation
We designed two basic simulators, then improved one of them to make it more energy efficient, on the assumption that smart malware developers would do the same to try to defeat our approach.
3.2.1
SMS Spam Trojan
The first kind of malicious code we opted to simulate is an SMS spam Trojan. This is based on a number of known malicious apps and viruses, and SMS spam is a common method by which malicious code propagates itself. This simulator was developed on the premise that the malicious code would want to stay under the Android OS limit of 100 SMS messages per hour, and would spread its communications over the hour to limit its impact on the battery at any given moment.

The simulator uses the Android alarm mechanism to send periodic SMS messages. Additionally, the simulator uses the phone number of the device it is installed on to send the SMS messages back to the user running the simulator, so as not to spam any other individual.
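The pacing logic can be sketched as follows. This is a simplified illustration; the actual simulator uses Android's alarm mechanism rather than a precomputed schedule, and the per-hour count here is an arbitrary example under the cap.

```python
def spam_schedule(per_hour=99, hour_secs=3600):
    """Sketch of the simulator's pacing: stay under Android's
    100-SMS-per-hour cap and spread sends evenly over the hour to
    avoid a burst of battery drain at any one moment."""
    assert per_hour < 100, "must stay under the Android OS limit"
    interval = hour_secs / per_hour
    # Offsets (seconds into the hour) at which to fire each alarm.
    return [round(i * interval, 1) for i in range(per_hour)]

offsets = spam_schedule()
# 99 sends, roughly one every 36 seconds, all within one hour.
```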
3.2.2
Location Tracking Malware
The second kind of malicious code is based on a proposed rootkit that could be installed on a user's phone to give a malicious user real-time access to the user's location. [15]
3.2.2.1
Power Naive Version
The first version we developed was rather naive and did not make full use of some Android OS features that would have allowed the app to use significantly less power. This version constantly polled the user's location via GPS and, when the location moved significantly, sent an SMS message back to the user's phone, again to ensure it did not affect another user or communicate the user's location to any third party. This version of the location tracking malware was used when we initially investigated the potential of general power analysis.
3.2.2.2
Power Aware Version
When we designed the location-based power detection data collector, and eventually the standalone location-based malicious code detector, every design decision aimed to mitigate impact on the user, including limiting the battery drain it caused. As such, we fully investigated better methods for using location data without being as power hungry.
To limit the power impact of gathering location data, instead of constantly polling the OS for the current location, we used Android's intent system to have the OS signal our service when the phone had moved a certain distance. This greatly reduced the CPU use of the location tracking malware, even though we still used fine-grained GPS-based location services and a rather small alert distance, unlike the location-based data collector/thesis project.
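The distance-triggered idea can be illustrated as follows. This sketch approximates in plain Python what the OS-level minimum-distance listener does for us on the phone; the 100-meter threshold is an arbitrary example, not the value our simulator used.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def movement_alerts(fixes, min_distance_m=100.0):
    """Emit a fix only when the phone has moved at least `min_distance_m`
    since the last emitted fix -- the same idea as registering a
    minimum-distance listener with the OS instead of polling the GPS."""
    alerts, last = [], None
    for fix in fixes:
        if last is None or haversine_m(*last, *fix) >= min_distance_m:
            alerts.append(fix)
            last = fix
    return alerts

fixes = [(40.0, -105.0), (40.00001, -105.0), (40.01, -105.0)]
# Only the first fix and the ~1.1 km move trigger an alert;
# the ~1 m jitter in between is ignored, saving work and power.
```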
This improved approach has been shown to use even less power than the SMS spam Trojan simulator, which, as we will show in Chapter 5, itself used far less power than the power-naive version of this malware simulation.
3.3
Future of Malware Simulations
For future work, the malware simulators could be expanded to further investigate the effectiveness of both the current work and any future extensions of it. This could include simulating other known malicious code behaviors or changing the vectors of propagation. Finally, the simulators' power minimization could continue to be improved, to ensure that any given behavior is simulated while making the best use of the Android OS to reduce battery impact; this is the logical approach future malware will likely take to try to defeat power-based anomaly detection techniques, whether proposed in this work or mentioned in Chapter 2.
Chapter 4
Computer Based Detection Through File Integrity
4.1
Motivation for Computer Based Detection
Computer based detection is motivated by the observation that users often connect their phones to their computers to sync them on roughly a daily basis. Our premise was that if we took advantage of this behavior, we could scan from the computer rather than on the phone, letting the computer's superior processing power and wall power enable more sophisticated scanning without impacting the user.
4.2
Design and Implementation
When we began to look into what scanning from a computer would require, we realized it would need access to all the files and, more importantly, would likely copy the files to the computer as part of the scanning process. When we looked into how long this would take, based on only 2 GB of total files (including external storage) for our test phone at the time, we discovered it would take upwards of 2 hours or more to get all the files off the phone in order to scan them. This led to the conclusion that no user would be interested in leaving their phone plugged in that long to look for malicious code as part of their regular sync schedule.
To this end, we decided that if we instead developed a way to determine which files had changed, via some kind of file integrity check, we could reduce the time significantly, and could even prioritize files based on how often they regularly change, so that files that rarely change are scanned first.
To determine the effectiveness of this approach, we built a prototype that checks for changed files via this file integrity mechanism. We used the Android ADB shell access tool to accomplish this for our Android test phone, since ADB provides file access to the phone; it is also how we tested copying all the files off the phone. The built-in shell lacks a lot of functionality compared with a full Linux platform. To get some much-needed tools at the shell level, we used BusyBox, a tool that can be added to rooted Android devices. [3][4] This provided a more complete version of the Linux ls command that could give us details about the files listed: whether they were directories, regular files, executables, etc. The benefit is that we could build a shell script on the computer side that used this version of ls to navigate the file structure and discover all of the files in the system.
Now that we had a way to navigate the file structure from the remote computer interface, we needed a way to determine whether files had changed. This is commonly approached with a hash, so we again turned to the extra hashing tools that BusyBox provides. We then investigated the runtime of hashing all the files on the phone with each of BusyBox's hashing tools. The reason is that the computational complexity of the different hashing mechanisms, cksum, md5sum, and sha1sum, differs, leading to the assumption that a more secure hash would take longer to run. We also developed a database on the computer to store the calculated hashes and file paths, so that we could measure the time to rehash all the files for subsequent file integrity checks. Additionally, we used this database to investigate the frequency at which some files change versus others, which is discussed in Section 4.3.
4.2.1
Keyed Hash Design
As we worked on developing the regular hashing system, it occurred to us that a sophisticated piece of malicious code could potentially circumvent this approach by compromising the hashing executable so that it either ignores the malicious files or returns a correct hash back to the computer, causing any files it corrupted to be ignored. This can be visualized in Figure 2.2. To make it more difficult for such sophisticated code to work, or even to force it to add enough delay that the hashing tool could detect it, we decided to investigate using a keyed hash instead of one of the simple hashing mechanisms.

BusyBox sadly did not provide any keyed hash tools in its package of new shell tools. So, to provide a keyed hash, at least for investigating how much longer it would take to run than the simple hashing mechanisms, we built a shell script implementing the HMAC algorithm that takes a given key and file path as its parameters and produces a keyed hash from those inputs.
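The keyed hash computation can be illustrated with Python's standard hmac module; the actual prototype implemented HMAC as a shell script on top of BusyBox's tools.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, path: str) -> str:
    """HMAC-SHA1 of a file, mirroring the shell-script keyed hash:
    takes a key and a file path and returns the keyed digest.
    Without knowing the key, code on the phone cannot precompute a
    matching digest for a file it has modified."""
    mac = hmac.new(key, digestmod=hashlib.sha1)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            mac.update(chunk)
    return mac.hexdigest()
```

The key stays on the computer side; supplying a fresh key for each scan forces any forgery attempt to recompute digests at scan time, which is the source of the detectable delay discussed in Section 4.5.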
4.3
Results and Discussion
Table 4.1: Average runtimes for the various hashing mechanisms investigated as part of the file integrity based detection method, showing both the first-run (initialization) runtime and the subsequent comparison check runtime.

Algorithm      Initialization   Comparison Check
cksum          377 secs         357 secs
md5            375 secs         384 secs
sha1           366 secs         380 secs
keyed hash     372 secs         378 secs
To get results for comparison, we ran the prototype with each of the hashing mechanisms. We first ran it with a cold database to initialize the system by computing all the hashes and inserting them into the database. We then measured the completion time to recompute all the hashes and compare them against the stored hashes. We did this several times for each hashing mechanism to compute average initialization and comparison check times. Table 4.1 shows the results we observed from the prototype for these cases.
The results are a bit surprising compared to what we expected. We expected initialization to take longer than the comparison check, since initialization requires more database writes, which tend to be slower, but seemingly this was not the case. Another interesting result is that there is no real difference in runtime between the various hashing methods. This is counterintuitive; however, we believe the bottleneck was not the database or the calculation time of any hashing method, but rather the limited USB 2.0 connection found on smartphones today.
The benefit is that we could use the more computationally secure keyed hash without any noticeable delay over the other methods. Additionally, we can determine which files have changed in far less than 2 hours, and as we will discuss further in Section 4.3.1, this also greatly reduces the number of files that would need to be scanned by any computer based scanning tool that employed this mechanism.
4.3.1
File Change Frequency
An additional consideration we investigated was file change frequency. We accomplished this by running the prototype with one of the hashing mechanisms repeatedly over about an hour and looking at which files changed, then continuing to run it every hour a few more times. This gave us a reasonable picture of which files changed constantly and which changed through a day. We also ran it once a day across a few days to get an idea of which files change on more of a daily basis.
These results are affected by many factors and could be quite subjective; however, we wanted to use them to potentially further improve the previous methodology, by observing that some files change constantly, others change less often, and some rarely change if at all. This allows us to prioritize scanning: changed files that rarely change are scanned first, and we then progress through files in order of how often they tend to change. The reasoning is that if a file that has never changed before suddenly changes, malicious code could be behind it, so it should take top priority. This priority queue would add essentially no extra overhead to the prototype, but could benefit users who do not always let the reduced scan complete, since it would ensure the highest-risk files get scanned first.
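The resulting ordering can be sketched as follows; the paths and change counts are illustrative, not from our data.

```python
def scan_order(changed, change_counts):
    """Order changed files so that those that historically change least
    often (and are therefore most suspicious when they do change) are
    scanned first. `change_counts` maps path -> number of times the file
    has changed across past integrity checks."""
    return sorted(changed, key=lambda p: change_counts.get(p, 0))

# Hypothetical history: a system binary that never changes, a database
# that changes occasionally, and a temp file that changes constantly.
counts = {"/system/bin/sh": 0, "/data/tmp/cache": 50, "/data/app.db": 5}
order = scan_order(["/data/tmp/cache", "/system/bin/sh", "/data/app.db"], counts)
# The never-changing system binary is scanned first, the noisy temp file last.
```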
We observed that there were in fact files that changed constantly; looking closer, most of these tended to be temporary files used by the system or other apps. Additionally, some files never changed, even over the week of checking. This indicates that our proposed risk-based priority queue for scanning files could in fact be built and deployed in such a system.
4.4
Future Applications
As may be apparent to any current smartphone owner, the need to plug in your phone, sometimes more than daily, to sync calendars, contacts, etc. is no longer even considered by most smartphone users, as everything has migrated to the cloud. As such, this approach has lost its effectiveness in its current form; however, it could be applied to future applications making use of the cloud.
• Cloud Based Virus Scanner. The mechanism could be used as part of a cloud based virus scanner, using file integrity checks to reduce the number of files the scanner needs to examine and potentially offload to the cloud.

• Local Network Based Virus Scanner. Recent versions of the iPhone and some Android phones can sync with iTunes or similar software while on the same Local Area Network (LAN). A future product could similarly scan a phone for malicious code from the LAN, and this approach could greatly reduce the number of files such a tool would need to scan.
These are a few of the future applications we have thought of, and there are potentially more, as file integrity checking is commonly used for many purposes and will likely continue to be used in the future.
4.5
Limitations
We never addressed how to handle the case where the hashing tool itself is compromised. However, given the consistent completion times of the comparison checks, and the variety of steps malicious code would have to take to return a valid or unchanged hash back to the computer, we would expect to see a significant delay that could be detected, raising a flag alerting the user that malicious code has potentially corrupted the hash mechanism.
Though the keyed hash based detection technique is quite strong, a strong rootkit can evade detection by keeping copies of the unchanged files in addition to the modified files. In such a case, the rootkit can easily compute the correct keyed hash since it has access to the unmodified file. Note that the rootkit can avoid detection of the modified files simply by infecting all system calls to hide the presence of those files or the memory they occupy. It is possible to come up with ad hoc solutions to these problems. For example, to ensure that the amount of free space reported by the operating system is correct, a user may write data into all of the reported free space and then read it back to verify that the writes actually took place. However, such ad hoc solutions are too expensive in terms of time and the potential long-term effects of writing to and deleting from flash storage. Again, the whole motivation of this research is to limit the negative impact on the user of detecting malicious code present on the system.
Another thought is that the time for a system to hide itself might be statistically significant compared with the normal completion times of the hash mechanisms from which the sophisticated malicious code has hidden itself. Through this observation we could at least raise a flag that malicious code has potentially corrupted the user's phone.
Chapter 5
General Power Based Detection
5.1
Motivation for General Power Based Detection
In general, it is extremely difficult to detect the presence of strong rootkits, since the entire system software of the phone is under the rootkit's control. The only way to detect such rootkits is to observe a phone's behavior from an external source and look for abnormalities. For example, applications may run slower when a rootkit is present, or there may not be enough memory left over to store a large file if the rootkit is occupying a significant amount of space.
With this power usage detection technique, we monitor the rate of battery drain to detect abnormal situations. The key idea is that malware or a rootkit will produce noticeable, anomalous power usage compared with regular power usage. The technique first develops a profile of a phone's normal power usage by collecting power data over several weeks, then uses this normal power profile to detect any significant deviations in power consumption. In addition to rootkits, a large number of malicious code behaviors would also consume significant amounts of power and would likely be detectable. This observed behavior of known malicious code, and the power drain such behavior would be expected to cause, is the primary motivation for using power as a mechanism for determining whether malicious code is present on a system.
5.2
Design and Implementation
For general power based detection, we initially just wanted to gather data quickly and observe whether it could be an effective approach, since it would likely be used as part of a larger system.

Figure 5.1: Battery level change for Day 2

To this end, we used an existing app on the Google Play store called Battery Graph. [7][2] Battery Graph can collect the battery level at regular increments ranging from every 1 minute to every 5 minutes. We opted for 5-minute battery level collection, as we wanted to limit the frequency at which we recorded data.
The app allowed exporting data to a file, but only the past week's worth, which made it difficult to get significant numbers of participants to collect data during our general power based detection push. As such, all the data and analysis in this section are for only one phone. We did revisit general power analysis later using data collected by the standalone thesis app, which gathered data from over 100 participants. Once we had gathered a month's worth of data, we combined all the export files into a single file and used that file for all our observational data.
5.3
Results and Discussion
From the collected data, we first looked at what the battery level did through a day to see whether there was anything beneficial we could take away. Figures 5.1, 5.2, 5.3, and 5.4 are a subset of the daily graphs that give a general picture of what the battery graph for each day looks like. The key takeaway is that the days are all different but at the same time have a similar look.

Figure 5.2: Battery level change for Day 9

Figure 5.3: Battery level change for Day 15

Figure 5.4: Battery level change for Day 18

Figure 5.5: Graph of percent change in battery graphs over 2000 samples

Figure 5.6: Success percentage for the time ranges investigated against the power-naive location tracking malicious code simulator
Looking at all of the data, we saw little likelihood that day-to-day comparison of power use graphs could serve as a method for detecting anomalous behavior. Though the general behavior is similar, the graphs varied a lot from day to day, which could be driven by anything from the user waking up at a different time to making more phone calls. We therefore decided that examining the graphs and data in this fashion was not going to be very beneficial.
We decided that the data we really wanted to look at was the change in power between data points. Figure 5.5 graphs this change across 2000 data points. The key observation was that in only 4 occurrences, or 0.2% of the time, did the power drop more than 4 percent in a 5-minute increment. This indicated that the power drop observed would be minimal the majority of the time, and that we could potentially classify these significant spikes as anomalous and as indicating a likelihood of malicious code.
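The per-interval check can be sketched as follows; the battery levels are illustrative, and the 4-percent cutoff matches the observation above.

```python
def anomalous_drops(levels, cutoff=4.0):
    """Flag indices where the battery level fell by more than `cutoff`
    percentage points between consecutive 5-minute samples. Under
    normal use such drops occur only ~0.2% of the time, so they are
    treated as potential signs of malicious activity."""
    flags = []
    for i in range(1, len(levels)):
        drop = levels[i - 1] - levels[i]
        if drop > cutoff:
            flags.append(i)
    return flags

levels = [100, 99, 99, 98, 92, 91, 90]  # one steep 6-point drop
# Only the 98 -> 92 drop (index 4) exceeds the cutoff and is flagged.
```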
We then collected power data for the two malicious code simulators we had at this point: the SMS spam bot and the power-naive location tracker malware. This allowed us to observe how successful such an approach might be at detecting these malicious code simulations. When we first observed the simulation data against the 5-minute increment data, we found that detection was not very successful, although very little normal data was flagged as a false positive at the various rate-of-change cutoffs above which we considered data anomalous. Since our malicious code simulations tended to produce a sustained power drain rather than an instantaneous change, we then investigated the success and false positive percentages at time increments ranging from the original 5 minutes up to 30 minutes.

Figure 5.7: Success percentage for the time ranges investigated for capturing the SMS spam malicious code simulator data

Figure 5.8: Percentage of normal data that would be considered malicious at the different cutoffs for the time ranges investigated
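The windowed variant can be sketched as follows; the numbers are illustrative, and a window of four 5-minute samples corresponds to the 20-minute window discussed below.

```python
def windowed_drops(levels, window_samples=4, cutoff=2.0):
    """Flag sustained drains: take the total battery drop over a sliding
    window (e.g. 4 five-minute samples = 20 minutes) and flag windows
    whose total drop meets `cutoff` percent. A sustained drain too
    gentle to trip a per-sample threshold still accumulates over the
    window."""
    flags = []
    for i in range(len(levels) - window_samples):
        drop = levels[i] - levels[i + window_samples]
        if drop >= cutoff:
            flags.append(i)
    return flags

# A steady ~0.7%-per-sample drain never trips a 4% per-sample cutoff,
# but over a 20-minute window it exceeds the 2% cutoff and is flagged.
levels = [100, 99.3, 98.6, 97.9, 97.2, 96.5]
```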
This new view of the data had two noticeable impacts: as the time window increased, the quantity of malicious code captured increased, but the observed false positive percentage also increased. This indicates that for general power use there is a significant trade-off between success and false positive percentage.
Figure 5.6 shows how successful the different time windows were at capturing the power-naive location tracking malware simulator. This simulator was captured 100% of the time for time windows exceeding 20 minutes, with a drop of 2.0% or more flagged as anomalous. Figure 5.7 shows the same comparison for the SMS spam Trojan simulator, which still had a statistically significant capture percentage, over 50% of the data captured and identified as anomalous, at the 20-minute window and 2.0% cutoff. Finally, the false positive percentages in Figure 5.8 show that a reasonable false positive rate was observed for all time windows at the 2.0%-and-above cutoffs, about 0.6% at the 20-minute window with the 2.0% cutoff.
5.3.1
Statistical Approach with New Data
In the recent deployment of the standalone project that is part of the thesis work described further in Chapter 6, we collected power data when the battery level changed, instead of on a regular schedule, to reduce the impact of the data collection app on the battery drain observed by the user. We got a large number of phones to run this app, with over 100 currently reporting data. Additionally, a number of individuals simulated malicious code, which was also reported back as simulation data. Further details of the design and implementation of this project are discussed later; however, through post-processing we obtained data segments and average rates of change, allowing us to revisit the general power analysis using the statistically driven model from our later project. We could also evaluate the success of general power analysis, using this statistical model, against the new power-aware location tracking simulator, which is more power conservative than our earlier approach.
Table 5.1 contains the resultant data for the phones we could test simulation data against. The interesting observations from Table 5.1 are, first, that the false positive percentage calculated for the data is extremely low for both sigma values, a sigma value being the number of standard deviations beyond the mean above which we consider any rate of change anomalous. Second, for a number of the phones we could detect the SMS spam Trojan; however, we were unable to detect the new power-aware location tracker on any of the phones.
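The sigma-based test can be sketched as follows; the drain rates below are hypothetical, and the actual analysis used rates of change derived from the collected data segments.

```python
import statistics

def sigma_threshold(normal_rates, sigma=2.5):
    """Anomaly cutoff from the normal profile: the mean drain rate plus
    `sigma` standard deviations, as in the Table 5.1 analysis."""
    mu = statistics.mean(normal_rates)
    sd = statistics.pstdev(normal_rates)
    return mu + sigma * sd

def is_anomalous(rate, normal_rates, sigma=2.5):
    """Flag a rate of change that exceeds the sigma threshold."""
    return rate > sigma_threshold(normal_rates, sigma)

# Hypothetical drain rates (percent per hour) from a phone's profile.
normal = [3.0, 4.0, 3.5, 5.0, 4.5, 3.0, 4.0]
# A sustained 12%/hour drain is flagged; a 5%/hour drain is not.
```

Note that a few legitimate high-power events in the profile inflate both the mean and the standard deviation, raising the threshold; this is the effect discussed below that let the power-aware tracker slip under it.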
This indicates two things. First, general power based detection is not well attuned to increases, since it may observe significant legitimate high-power events that drive up the mean and standard deviation for the phone. Second, our new power-aware approach to location tracking yields significant power savings, and it is closer to the approach we would expect future malicious code of this type to take to hide from power based malicious code detection techniques.
The fact that general power based detection is effective to some extent, but not nearly as successful as we had hoped, led to some of the further investigations and the development of the thesis work discussed later. The nice thing about it is that it has a very low false positive percentage and could detect some of the malicious code simulations on a few of the phones.
5.4
Future Applications
This work is the basis and foundation of the location and time domain based work that comes later in this thesis. The general understanding gathered from general power based detection is that we can in fact use power as a method for detecting malicious code on a smartphone; however, we need to either improve the sophistication of the anomaly detection approach or add other variables to the detection, as is the case with the location and time domain based techniques, which group the data according to the different ways users use their phones.
Table 5.1: This table shows the effectiveness of general power analysis through the use of statistical
analysis with sigma values.

Device              Sigma  False Positive %  SMS       Location
Nexus 4 (1)         2.5    0.46              Detected  -
Nexus 4 (1)         3.0    0.46              Detected  -
Nexus 4 (2)         2.5    0.54              Detected  -
Nexus 4 (2)         3.0    0.43              Detected  -
Nexus 4 (3)         2.5    0.56              Detected  -
Nexus 4 (3)         3.0    0.54              Detected  -
Nexus 4 (4)         2.5    0.39              Detected  -
Nexus 4 (4)         3.0    0.31              Detected  -
SAMSUNG-SGH-I727    2.5    0.28              Detected  -
SAMSUNG-SGH-I727    3.0    0.26              Detected  -
SAMSUNG-SGH-I747    2.5    0.23              Detected  -
SAMSUNG-SGH-I747    3.0    0.23              Detected  -
Galaxy Nexus        2.5    0.30              -         -
Galaxy Nexus        3.0    0.24              -         -
SCH-I535 (1)        2.5    0.36              -         -
SCH-I535 (1)        3.0    0.36              -         -
SCH-I535 (2)        2.5    0.65              -         -
SCH-I535 (2)        3.0    0.57              -         -
Chapter 6
Location-Based Power Detection
6.1
Motivation for Location Based Power Detection
Our investigation of the general power based detection technique led to the conclusion that
additional details were needed to more finely tune the power profile. To that end, we considered
what factors changed the way a person might use their phone, such as location. For instance, they
would likely use their phone differently at home than at work, or when driving to work. This is the
motivation behind location based power detection. By developing a normal profile for locations a
user visits, we were able to improve the accuracy of detecting malicious code and continue to have
a low rate of false positives.
6.2
Design and Implementation
We created two different prototypes to investigate location-based power detection: a data
collector, and a standalone app that built on that foundation. For both prototypes, every design
choice prioritized minimal power use and minimal impact to the user while the prototype was
running. This was successful to the point that no measurable impact was caused by the app in
tests.
6.2.1
Design Choices
Some of these design choices apply only to the standalone project, and they are identified as such. Additionally, certain design features are merely implementation details (such as malware simulator integration) and do not directly relate to reducing the app's impact on the phone.
Event Triggering We utilized the Android OS intent system, which allows the service to be signaled
when a specified event occurs, such as the battery status changing or the location changing.
Below are further descriptions of each of these intents, their implementations, and the
additional details each intent provided. Using event triggering also means we do not have to
constantly poll the system for data, and the app can do nothing when no
event has occurred. Additionally, our investigations have shown that we spend an
average of only about 400ms per event to process and record it. This is not a lot of
time, and it means we use a very minimal amount of CPU time to record the data when
an event does occur.
Battery Intent For the battery intent a receiver class was created that would be instantiated any time the battery status changed. This would do different things based on
the classification of the event that changed the battery status.
Plug Status If the phone was plugged in or unplugged, this would be recorded
appropriately, either in a file for the earlier project or in the appropriate database
table for the standalone project. The biggest benefit of the plug status is that
we could drive secondary services and events from it. As will be discussed in a
later design choice, we used these events to start or stop the on-phone processing
of data and its sending to the central server, depending on the plug status.
OS Event If the phone was being shut down or rebooted this would be recorded
correctly as an OS event in the file or database depending on the project. This
event was recorded because we didn’t want to include in our data a segment that
spanned the phone rebooting or being shut down for a period. Accordingly, a
shutdown event was used as a signal to start a new segment.
Battery Changed The most important battery intent was the battery status intent
that indicated the battery level changing. From this intent we parsed out additional
information to get a full picture of the battery's status at each change.
In addition to the battery level, we also recorded the current plug status,
battery status, voltage, and current. Some of this data was used to facilitate
parsing out discharging power segments for statistical calculation; however, much
of the data (such as the voltage and current) was recorded but has not yet
been applied to any of the research conclusions. The reason for not using the
voltage and current is that not all phones provided access to this information,
and this inconsistency meant it could not be applied at scale across all phones.
But as there was no real cost to collecting this data, we opted to collect it for
potential future investigations.
Location Intent Location tracking was not actually done through the intent system; instead, we
made use of Android's LocationListener class, which allows us to set up an action to be taken
when the location has moved beyond a set distance from the previous location. The benefit
of this is that we do not constantly have to poll for the location. We used a
distance of 1500 meters as the event notification distance that would cause the location
service to wake up and make use of the new location.
Coarse Location To further limit the power required to get the location information,
we only made use of the coarse location features of Android that don't utilize GPS
based location information. The benefit of this is that it only relies on WiFi and
cellular network location data, which uses next to no additional power since the
phone is constantly making use of this data anyway. Since we were using a
very large bubble for a location, the location data did not need to be extremely
accurate for the data collection to function correctly.
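The effect of that 1500-meter threshold can be illustrated with a small sketch. This is not the app's code (which relied on Android's LocationListener to deliver the events); it is a Python approximation using the haversine formula, and the coordinates are hypothetical examples.

```python
import math

EVENT_DISTANCE_M = 1500  # notification distance used by the service

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def should_wake(prev, curr):
    """Wake the location service only when the phone has moved beyond
    the event distance from the previously recorded location."""
    return haversine_m(*prev, *curr) > EVENT_DISTANCE_M

boulder = (40.0150, -105.2705)
nearby = (40.0160, -105.2710)   # a short walk away: no event
denver = (39.7392, -104.9903)   # far beyond 1500 m: triggers an event
print(should_wake(boulder, nearby), should_wake(boulder, denver))
```

A bubble this large also explains why coarse, network-derived locations were accurate enough: errors of a few hundred meters rarely change which bubble a reading falls into.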
Data Recording The standalone project improved over the earlier data-collection-only project
by providing a sophisticated method for getting data back from the users who were running it.
The earlier project recorded all the battery and location data to
a single file on the user's SD card. This made it troublesome to process the data
and, more importantly, to determine whether locations had been previously visited. The standalone
project made use of the SQLite database feature available in the Android API to simplify
data manipulation on the phone and make sending the data back easier. This had
a number of benefits:
a number of benefits:
• Made it quick and easy to store new data.
• Made sending data back to the central server simpler (this will be discussed later).
• Allowed tables for individual locations to be created, so data was already partitioned
in a way that reduced the overhead of processing.
• Provided a table of locations visited that allowed easy verification of whether a location
had already been visited, without a lot of additional work.
• Allowed integration of the simulator, as simulation data could easily be recorded in its
own table.
• Used far less space in the SQLite format than as a data file on the SD card (the data
file approach was used for the data-collection-only prototype).
• Worked on all phones; some newer phones don't have external SD cards, and those that
do have varying paths to them, so not using the SQLite database approach would have
greatly reduced the set of phones the standalone project could support.
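A minimal sketch of this storage design is shown below, using Python's sqlite3 module in place of Android's SQLite API. The table and column names are illustrative assumptions, not the project's actual schema.

```python
import sqlite3

# In-memory database stands in for the on-phone SQLite store.
db = sqlite3.connect(":memory:")

# A table of visited locations makes "have we been here?" a single query,
# and per-row flags support the processed/sent bookkeeping described above.
db.executescript("""
CREATE TABLE locations (
    loc_id INTEGER PRIMARY KEY,   -- anonymized identifier (0, 1, 2, ...)
    lat REAL, lon REAL            -- center of the location bubble
);
CREATE TABLE battery_events (
    loc_id INTEGER,               -- which location the reading belongs to
    level INTEGER,                -- battery percentage at the status change
    plugged INTEGER,              -- plug status recorded with each event
    sent INTEGER DEFAULT 0        -- flagged 1 once the central server acks
);
""")

def location_id(lat, lon):
    """Return the existing id for a location, or assign the next one."""
    row = db.execute(
        "SELECT loc_id FROM locations WHERE lat = ? AND lon = ?", (lat, lon)
    ).fetchone()
    if row:
        return row[0]
    new_id = db.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
    db.execute("INSERT INTO locations VALUES (?, ?, ?)", (new_id, lat, lon))
    return new_id

home = location_id(40.0150, -105.2705)
db.execute("INSERT INTO battery_events VALUES (?, 97, 0, 0)", (home,))
print(home, location_id(40.0150, -105.2705))  # same location, same id
```

Keeping the location identifier as a bare integer is also what allows the privacy-preserving reporting described later: the server sees "location 0" rather than coordinates.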
Data Processing The data collected by the phone needed to be processed on the phone in order
to drive the statistics-based detection mechanism used for the standalone project. This
necessitated querying all of the data in every table to ensure it met the requirements to be
considered a viable location, and then processing the data to calculate the statistics. To
reduce the time spent doing this processing, as well as the impact on the user, a number of
design choices were utilized.
Data Check To reduce the overhead of data processing, if the data for a given location
didn't meet the training requirements of at least a month's worth of data and a minimum
amount of time spent at that location during the month, processing immediately
moved on to the next location without doing anything with the data.
Segment Table We created a table to hold segments already parsed and processed from
the per-location tables. Data that was successfully stored in this table was then
flagged as processed in its source table to prevent it from being included in future
processing. The benefit of doing this is that segments are maintained after they are
parsed out of the data, greatly reducing the overhead of processing as the quantity
of data grows. This also allows data that has already been processed to be filtered
out of future queries, reducing the resources needed for processing.
Plug Status As mentioned with the battery intent, we can discover if the battery intent
was caused by an event such as the phone being plugged in or unplugged. The benefit
of having this information, beyond triggering the end or start of a segment in the
processing, is that the processing can be initiated when the phone is plugged in, which
avoids the power drain that might otherwise result from the processing of the data.
We also built in a mechanism to halt the processing without negative impacts on the
integrity of the data if the phone gets unplugged during the process.
Threaded To both reduce the impact on other applications, and allow for the full potential of processing (especially on newer multi-core phones), we made use of Android’s
AsyncTask class, which spawns a new thread for processing. In this case, each new
thread processed a single location. This works as a background task that is designed to
allow other threads to have higher priority, but at the same time allow for potentially
more than one location to be processed at a time. In addition to making the processing
threaded in the background, we also made use of a WakeLock from the system to keep
the CPU fully active even when the screen is off, so that processing can make full use
of the processor during this multi-threaded, plugged-in processing. This
also prevents issues we observed where the processing would halt due to the
CPU going to sleep when the WakeLock was not employed.
Data Sending The key design addition of the newer standalone project over the earlier work
was receiving data back from the phones running the project. As mentioned, the
earlier project stored everything in a data file. This file had to be manually copied from
the SD card and mailed to us by the user in order for us to access the data. The goal of
the standalone project was to make it as non-interactive as possible. As such, we needed a
way to retrieve the data that would be efficient, reliable, and able to distinctly identify
the phone. This led to the following design choices for sending data back
successfully:
HTTP-Based Send Using the HTTP POST mechanism to send the data back row by
row allowed a few things to happen. First, we could get a success response from
the server if the data was successfully received and stored on the central server side,
which made the next design choice effective. Second, we could send one row
of a table at a time and insert one row at a time on the server side, which reduced
the code complexity of sending data. This resulted in significantly more sends than
packaging the data into a larger POST message would have; however, it ensured
we could send a single row if that was all that needed to be sent, rather than
waiting for larger quantities of data to accumulate first. Additionally,
sending single rows avoided the overhead of implementing scalable data packets
on both the phone side and the server side.
Only Send Once By making use of the HTTP POST response from the server, we could
parse out a success or failure for each send. If the send succeeded, we flagged
the row as sent in the database, preventing duplicate sends.
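The interaction of per-row sending with the sent flag can be sketched as follows. This is a simplified Python illustration, with the network call replaced by a stub (the real app issued an HTTP POST per row); the flaky server behavior is invented to show the retry property.

```python
def send_unsent_rows(rows, post):
    """Send each unsent row via `post` (which returns True on a success
    response from the server) and flag it only when the send succeeds,
    so failed rows are retried later and successful rows never re-sent."""
    for row in rows:
        if row["sent"]:
            continue            # only-send-once: skip acknowledged rows
        if post(row["data"]):
            row["sent"] = True  # flag on success; duplicates prevented

# Stub server: accepts every other attempt to mimic a flaky connection.
calls = []
def flaky_post(data):
    calls.append(data)
    return len(calls) % 2 == 1  # odd-numbered attempts succeed

rows = [{"data": i, "sent": False} for i in range(4)]
send_unsent_rows(rows, flaky_post)  # some rows succeed, some fail
send_unsent_rows(rows, flaky_post)  # a retry pass picks up earlier failures
print([r["sent"] for r in rows])
```

Because the flag is set only on an acknowledged send, a dropped connection can at worst delay a row, never lose or duplicate it.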
Threaded We originally learned about AsyncTask because, in more recent
versions of the Android OS, all HTTP communication is required to occur
wrapped in an AsyncTask. The benefit is that the UI won't hang
while HTTP traffic occurs, since the traffic isn't part of the
application or service process. Using AsyncTask, we were able to make the
sending process multi-threaded and make better use of the processor. We additionally
made use of the WakeLock (mentioned for data processing) during sending to allow full
use of the CPU while the screen was off.
Plug Status We would only start sending data while the phone was plugged in. Additionally, a mechanism was built in to stop future sends if the phone became unplugged,
to avoid any unnecessary power drain.
WiFi Status Initially, we required a user's phone to be connected to WiFi to send, in
order to avoid any data charges that could be incurred by the app sending
large quantities of data over the cellular network. We later made this optional,
as some users stated that they never used WiFi because they had unlimited data plans.
When users were on WiFi, we utilized a WiFi lock to ensure
the power to the WiFi antenna wasn't reduced when the phone screen was off. For
newer phones running a more recent version of the Android OS, we enabled the high
performance WiFi lock to allow sends to happen even faster. Since we required the
phone to be plugged in while sending, power use wasn't an issue.
Privacy We wanted to ensure that all data a user transmitted back to us was obscured
such that we could not learn anything personal about the phones running the
standalone project; however, we still needed to be able to distinguish distinct phones.
To this end, we used two identifying values from the device: the serial number and the
SIM serial number (if one existed). We concatenated these into a single string
that was then hashed with the SHA1 hashing algorithm. This provided a unique
identifier for each phone in the data reported back to the central server. Additionally,
we did not want the actual GPS coordinates of the locations a user visited.
Instead, the data was reported back with distinct location identifiers: integer
values representing the locations visited by the user, starting from location 0 and
incremented for each subsequent new location. This provided the segmented
location information without giving away where a user actually visited.
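The identifier construction amounts to the following; this is a Python sketch using hashlib rather than the app's Java code, and the serial values are hypothetical placeholders.

```python
import hashlib

def device_identifier(serial, sim_serial=""):
    """Concatenate the device serial and SIM serial (empty string if no
    SIM) and return the SHA1 hex digest used as the anonymous phone id."""
    return hashlib.sha1((serial + sim_serial).encode("utf-8")).hexdigest()

# Hypothetical serial numbers; real values never leave the phone unhashed.
ident = device_identifier("R32D102XYZ", "8901410321111111111")
print(len(ident))  # a SHA1 hex digest is always 40 characters

# The same phone always maps to the same identifier, so the server can
# distinguish phones without learning the underlying serial numbers.
print(ident == device_identifier("R32D102XYZ", "8901410321111111111"))
```

Hashing the concatenation rather than each value separately means the server cannot correlate a SIM swap across reports, which is acceptable here since the combined identity is what defines a distinct reporting device.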
Notifications The application has a user-set sigma value and a flag the user can enable to
allow the application to display notifications when an anomalous event is detected at the
user-selected sigma value (the sigma value being the number of standard deviations beyond
the mean after which data is considered anomalous). When the user taps a
notification, they are presented with a dialog offering the opportunity to mark the flagged
app as trusted or untrusted. This provides general feedback as to whether the app is
potentially malicious or a false positive, based on the user's feedback. In addition to the
notifications shown to the user, the app records any possible notifications for sigma levels
from 2.0 up to 3.75 in increments of 0.25. This was done to see what notifications would
have been captured even if the user hadn't enabled them, or had set a different sigma
value. These records were also used as the basis for the observed false positive rates
discussed later.
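Recording across the whole sigma range can be sketched as below; this is an illustrative Python reconstruction, and the profile numbers are hypothetical.

```python
SIGMA_LEVELS = [2.0 + 0.25 * i for i in range(8)]  # 2.00, 2.25, ..., 3.75

def crossed_levels(rate, mean, std_dev):
    """Return every recorded sigma level at which this rate of change
    would have produced a notification, whatever the user's setting."""
    return [s for s in SIGMA_LEVELS if rate > mean + s * std_dev]

# Hypothetical profile: mean 0.010 %/s with a 0.002 %/s standard deviation.
levels = crossed_levels(0.0152, 0.010, 0.002)
print(levels)  # the event would have fired at sigma 2.0, 2.25, and 2.5
```

Storing the full list per event is what makes it possible to report, after the fact, what the false positive rate would have been at any threshold in the range.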
Malware Simulator Integration When we began to recruit users to run the malware simulator,
we realized we needed a simple way to collect this data as a simulation, in addition
to making the process simple for the user. We originally wanted to include the simulator
as a menu option in the standalone project; however, attempts at this resulted in conflicts
with an existing service in the standalone project. Instead, we opted for inter-app
communication in Android, accomplished through the Android Interface Definition Language
(AIDL). AIDL provided the ability to craft a shared Android method that both apps could
use. This allowed the standalone project to be signaled when a simulator started
and stopped, in addition to recording which simulator it was, so the data could be stored
as simulation data of the appropriate type.
6.2.2
Deployment and Recruitment
Once the standalone project was built and the basic data collection was working, we
deployed it onto the Google Play store so that beta test users could receive automatic updates.[8]
Once all of the planned features were implemented, we began actively recruiting people through social
media such as Facebook and Google+, and through the XDA Android Developers Forums.[12][5][6]
This has been extremely successful, as will be discussed in Section 6.3, and was the only method of
recruitment utilized.
As part of this process we went through the Institutional Review Board (IRB) for approval,
since the data we were collecting was generated by humans on their smartphones. Though it was a
troublesome process, we were thankfully granted an exemption from the IRB because they determined
that our work did not constitute human subjects testing. This was helped in part by our privacy
protections for data sent back to the central server.
6.3
Results and Discussion
Table 6.1:
This table shows the interesting data points collected as part of the standalone project as of
5/17/2013.

Statistic                              Value
Phone Data Collected                   1382412
Active Phones                          108
Phones with 30 or more days of data    47
Notifications                          0
The most interesting initial result was the substantial quantity of data we had collected;
these values are shown in Table 6.1. Additionally, of the 108 active phones,
Figure 6.1: Figure shows the various countries currently running the standalone project from the
Google Play store statistics.[7]
47 had been reporting data back for more than 30 days, which is the minimum period required
for any location to be considered valid. These phones could begin processing their new data as
potentially anomalous and record notifications. They could also alert the user to anomalous events
and gather user feedback. The most interesting statistic in Table 6.1 is that no notifications
were recorded for any of the phones that met the threshold for beginning to detect malicious code.
The lack of alert notifications for these users is considered a good sign because most of the
individuals running the app were very security-conscious, meaning there was a very low
likelihood that their phones were infected. Thus, we interpreted this to mean that no false
positives were generated. This supports our earlier claim that low profile location based
malicious code detection techniques offer good detection with a low false positive percentage.
The next interesting statistic provided by the Google Play store was the countries of origin
of the users who installed the application. Figure 6.1 shows a visualization and breakdown of
some of the countries in which users have installed and are currently running the standalone
project. These statistics show how far-reaching and varied the users running the standalone
project are.
After getting a general picture of how much data was collected and who was running the
project, we moved on to investigating the thesis that location based power detection is an
effective method for detecting malicious code with a minimal false positive rate. The first
step was verifying that power use for the various users running the application did in fact
vary between locations.
As part of the post-processing, where the power segments were parsed for each location,
the average and standard deviation for each location were recalculated at the central server
in the same way as on the phone. We could then plot the average power use at each
distinct location that met the training data minimums of 30 days and at least 50 data segments.
These were plotted as histograms. Figures 6.2 and 6.3 show the average rate of
power change per second for two of the phones that reported data. Appendix B includes more
such graphs for most of the other phones we have simulation data from. There
Figure 6.2: Figure shows the average rate of change for various locations visited by the phone
SGH-727.
Figure 6.3: Figure shows the average rate of change for various locations visited by the phone
SGH-747.
were a few that were not included, even in the appendix, because they had so many viable
locations that a readable graph was not possible.
These rate-of-change-by-location graphs illustrate that the premise of the location
based power detection approach, that users tend to use their phones differently depending on
where they are, does in fact hold. As Figures 6.2 and 6.3 show, the average rate of change at
each location is different, and in some cases significantly so. These graphs provide strong
visual support that the basis of this approach is viable, since users actually do use their
phones differently depending on where they are.
Table 6.3:
This table shows the effectiveness of location based power detection through the use of statistical
analysis with sigma values for the Nexus 4 (4). It also includes the effective observed false positive
rate from the model running on the phone as a standalone project, and the calculated false positive
% from applying the statistical model to the collected data.

Location  Cutoff  False Positive %  Observed False Positive %  SMS       Location Tracker
0         2.5     0.52              0.00                       Detected  Detected
0         3.0     0.52              0.00                       Detected  Detected
2         2.5     2.38              0.00                       Detected  Detected
2         3.0     2.38              0.00                       Detected  Detected
7         2.5     2.86              0.00                       Detected  Detected
7         3.0     2.86              0.00                       Detected  Detected
9         2.5     2.44              0.00                       Detected  Detected
9         3.0     2.44              0.00                       Detected  Detected
13        2.5     2.44              0.00                       Detected  Detected
13        3.0     2.44              0.00                       Detected  Detected
17        2.5     3.08              0.00                       Detected  Detected
17        3.0     1.54              0.00                       Detected  Detected
19        2.5     1.02              0.00                       Detected  Detected
19        3.0     1.03              0.00                       Detected  Detected
21        2.5     3.64              0.00                       Detected  Detected
21        3.0     3.64              0.00                       Detected  Detected
22        2.5     3.41              0.00                       Detected  Detected
22        3.0     3.41              0.00                       Detected  Detected
27        2.5     1.39              0.00                       Detected  Detected
27        3.0     1.39              0.00                       Detected  Detected
30        2.5     1.67              0.00                       Detected  Detected
30        3.0     1.67              0.00                       Detected  Detected
41        2.5     3.23              0.00                       Detected  Detected
41        3.0     0.00              0.00                       Detected  Detected
61        2.5     3.57              0.00                       Detected  Detected
61        3.0     3.57              0.00                       Detected  Detected
62        2.5     3.51              0.00                       Detected  Detected
62        3.0     2.63              0.00                       Detected  Detected
65        2.5     2.91              0.00                       Detected  Detected
65        3.0     2.91              0.00                       Detected  Detected
80        2.5     2.47              0.00                       Detected  Detected
80        3.0     2.47              0.00                       Detected  Detected
110       2.5     1.61              0.00                       Detected  Detected
110       3.0     1.61              0.00                       Detected  Detected
113       2.5     3.57              0.00                       Detected  Detected
113       3.0     3.57              0.00                       Detected  Detected
We then made use of the simulated malicious code to investigate the effectiveness of
detecting malicious code. This was done by calculating the statistics for every location
and then testing the ability to capture a statistically significant quantity of the
simulated malicious code at different sigma values. This was done for all the sigma values
used in the standalone project, ranging from 2.0 up to 3.75 in increments of 0.25. In the
included tables we report only the sigma values of 2.5 and 3.0, as 2.5 was the default
setting in the standalone app and a sigma of 3.0 is commonly used in the standard anomaly
detection machine learning approach that is effectively what we are doing. Tables 6.3 and
6.5 show the results for the various locations of the two phones with the most data
collected. These offer the best basis for drawing conclusions; however, the conclusions
hold for the other phones we collected simulation data for, which are available in
Appendix A for reference.
Table 6.5 contains the most interesting results, in that its calculated false positive
percentage is 0.0% for nearly all of its locations, in addition to the observed false positive
percentage (the calculated percentage being the fraction of data in the normal profile that
would be deemed anomalous if it occurred again). This phone's results are a clear indication
of how effective location based power detection is at detecting malicious code while
maintaining a minimal false positive percentage. This is also the phone on which the
power-aware location tracking malicious code was detected in every case, as was the SMS
spam bot. The second phone, with data in Table 6.3, reinforces this: it also had an
observed false positive percentage of 0.0% and a very small calculated false positive
percentage for all of its data, and it too was successful in detecting both kinds of
malicious code simulations in all cases.
Not all of the phones detected both kinds of malicious code at all of their locations, as
is the case for some of the phones in Appendix A. These phones could still detect malicious
code at nearly all of their locations, and in one example could detect it at the lower sigma
threshold but not at the 3.0 threshold. These phones still have low false positive rates and
could detect malicious code at the vast majority of their locations. It is likely that these
locations saw significant fluctuations in power use in their training data, with the result
that they did not see significant power drain as anomalous; thus the power-aware location
tracking malicious code managed to evade detection by using a minimal amount of power at
such locations.
To summarize, using this basic statistical model to look for anomalous events, we were
able to detect both kinds of malicious code simulations a statistically significant number of
times at nearly all of the locations that had gathered enough training data. The results also
showed a very low calculated false positive percentage and a 0.0% observed false positive
percentage. With these two factors taken into account, this location based power detection
technique is highly successful and supports our thesis that low profile detection techniques
can succeed with a minimal impact on the user, as this approach had no measurable impact
on the power drain observed in testing. Additionally, all high power processing, data sending,
etc., occurs only when the phone is plugged in.
6.4
Limitations
As the data shows, this location based approach is extremely effective; however, there are
a number of limitations, assumptions, and kinds of malicious code that it is optimized for.
The limitations are as follows:
Advanced Malware Advanced malware or a rootkit has significant access to the underlying
phone system and could make the reported battery data appear normal. This kind of
malicious code could also uninstall our app, and there is currently no defense to prevent
this. Additionally, if the approach proposed in this work is known, malicious code
developers could write their apps to use less power, or to spread their power use out
rather than sustain it, potentially making them more difficult to detect.
Kinds of Malicious Code This approach targets malicious code that poses a sustained and
significant security risk. It is limited in that it needs a significant power drain to
trigger an alert; many of the top apps that could be considered malicious only steal the
phone number, phone ID, and contacts, using a brief burst of power to send this data and
doing nothing afterwards. The approach we discuss would detect apps that track a user's
location, send spam for sustained periods, and other such malicious code.
Location Required Though highly effective, this approach fails to detect at locations that
haven't been visited frequently enough to meet the time or data segment thresholds required
for a location to be used for detection. Quite a few of these locations may never generate
data segments anyway, as they may be along a daily commute where the user moves to a new
location before the battery level drops enough for a data segment, with a computable rate
of change, to exist. A solution could be to apply general power analysis to all data not
associated with a location that has met the training requirements. We tested this approach
with the data collected as part of this project, and using general power analysis in this
way appears to be just as effective as applying general power analysis to the whole dataset,
as discussed previously in this work.
Existing Malware This approach operates under the assumption that no malicious code was
present on the phone during the training period. Some of the known kinds of malicious code
whose behavior our simulators replicate can have a bursty nature; if present during training,
these bursts would increase the overall average of the data. Such bursts might still fall
outside the normal range for the profile and be detected, but contaminated training data
makes detection less likely.
Erratic Data Some phones in our study were not effective for all locations under location
based power detection, and this was also apparent in general and time domain based detection.
Users who regularly use their phones in radically different ways may end up with a normal
profile whose standard deviation is so large that only very high power use would fall outside
it. This style of use is a limitation this work cannot adjust for.
Long Training Period This work currently operates under the assumption that 30 days and a
significant volume of data segments are needed to generate a good normal profile. It may
be possible to rely only on the volume of data and ignore the length of time spent at a
location; locations a user frequents for long durations every day would then acquire valid
location-specific normal profiles much sooner. This would allow the ability to detect
malicious code to improve more quickly than relying on general power based detection while
the 30 days of data requirement is being met. There is no justification for the 30 days
beyond it being an arbitrary value.
6.5
Future Application
We can see this technique being integrated into future projects or current anti-virus products
as a method to determine when more sophisticated techniques should be employed, or when a
classic signature-based anti-virus scan should be scheduled.
6.5.1
Future Improvements
It is possible that further analysis of which apps are running, in addition to the power usage
data, could yield a better approach for identifying the anomalous app than the current
approach, which simply flags the app currently in focus on the assumption that it is the one
most likely to be causing a power spike.

Additionally, the data collected for this project could be taken even further by applying data
clustering algorithms to get an improved picture of which of the collected variables cluster
well and could thus serve as additional factors for anomalous event detection. The best way to
improve this technique would be to add another aspect of user behavior to what makes up the
normal profile for a given location. One consideration would be the approach in Chapter 7,
using time domain analysis to further improve the normal profiles.
Table 6.5: This table shows the effectiveness of location based power detection through the use of statistical analysis with sigma values for the Samsung Galaxy S3 SCH-I535 (2). It also includes the effective observed false positive rate from the model running on the phone as a standalone project and the calculated false positive % when using the statistical model against the collected data.

Location  Cutoff  False Positive %  Observed False Positive %  SMS       Location
13        2.5     0.00              0.00                       Detected  Detected
13        3.0     0.00              0.00                       Detected  Detected
17        2.5     0.00              0.00                       Detected  Detected
17        3.0     0.00              0.00                       Detected  Detected
50        2.5     0.00              0.00                       Detected  Detected
50        3.0     0.00              0.00                       Detected  Detected
77        2.5     0.00              0.00                       Detected  Detected
77        3.0     0.00              0.00                       Detected  Detected
93        2.5     0.00              0.00                       Detected  Detected
93        3.0     0.00              0.00                       Detected  Detected
111       2.5     0.00              0.00                       Detected  Detected
111       3.0     0.00              0.00                       Detected  Detected
117       2.5     0.00              0.00                       Detected  Detected
117       3.0     0.00              0.00                       Detected  Detected
119       2.5     0.00              0.00                       Detected  Detected
119       3.0     0.00              0.00                       Detected  Detected
125       2.5     0.00              0.00                       Detected  Detected
125       3.0     0.00              0.00                       Detected  Detected
136       2.5     9.09              0.00                       Detected  Detected
136       3.0     0.00              0.00                       Detected  Detected
139       2.5     0.00              0.00                       Detected  Detected
139       3.0     0.00              0.00                       Detected  Detected
206       2.5     0.00              0.00                       Detected  Detected
206       3.0     0.00              0.00                       Detected  Detected
211       2.5     0.00              0.00                       Detected  Detected
211       3.0     0.00              0.00                       Detected  Detected
248       2.5     0.00              0.00                       Detected  Detected
248       3.0     0.00              0.00                       Detected  Detected
297       2.5     0.00              0.00                       Detected  Detected
297       3.0     0.00              0.00                       Detected  Detected
Chapter 7
Time Domain Based Detection
7.1 Motivation for Time Domain Based Detection
The motivation for time domain based power detection is the premise that a user uses their phone differently depending on the time of day. This was derived from the earlier premise that a user uses their phone differently based on location; it makes just as much sense that the time of day could play a factor, and potentially even the day of the week. However, we did not investigate the time domain in the context of the actual day of the week.
7.2 Results and Discussion
The data was extracted from the data collected as part of the larger standalone project for the location based power detection approach. Each power segment was extracted along with a Python datetime element representing the midpoint in time between the start and end of the segment, to mark when the segment took place. This allowed rate-of-change segments to be grouped by hour of the day or time of day, both for graphical representation and for effectiveness checks against the simulated malicious code.
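The processing described above can be sketched as follows; the tuple layout and names are assumptions made for illustration, not the project's actual code:

```python
# Represent each power segment by the midpoint between its start and end
# times, then bucket rate-of-change segments by hour of day and by
# 6-hour quarter (night, morning, afternoon, evening).
from collections import defaultdict
from datetime import datetime

def midpoint(start, end):
    return start + (end - start) / 2

def group_by_hour(segments):
    """segments: iterable of (start, end, rate_of_change) tuples."""
    buckets = defaultdict(list)
    for start, end, rate in segments:
        buckets[midpoint(start, end).hour].append(rate)
    return dict(buckets)

def time_of_day(hour):
    """Map an hour to its 6-hour quarter of the day."""
    return ("night", "morning", "afternoon", "evening")[hour // 6]

segments = [
    (datetime(2013, 3, 1, 8, 0), datetime(2013, 3, 1, 8, 50), 1.2),
    (datetime(2013, 3, 1, 8, 55), datetime(2013, 3, 1, 9, 45), 0.9),
]
print(group_by_hour(segments))  # {8: [1.2], 9: [0.9]}
print(time_of_day(8))           # morning
```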
Figures 7.1 and 7.3 present box plots of the rate of change per second observed during each hour of the day. These plots are for the same two phones discussed directly in chapter 6, as they have the most recorded data as well as the largest quantity of simulation data. These hourly graphs illustrate that use through the day varies a bit between hours of the day. Figures 7.2 and 7.4 show the power use in a more grouped format if we broke the hours
Figure 7.1: Figure shows the power distribution based on the hour of day for phone Nexus 4 (4).
Figure 7.2: Figure shows the power distribution based on the time of day for phone Nexus 4 (4).
Figure 7.3: Figure shows the power distribution based on the hour of day for phone Samsung
Galaxy S3 SCH-I535 (2).
Figure 7.4: Figure shows the power distribution based on the time of day for phone Samsung
Galaxy S3 SCH-I535 (2).
of the day into quarters, or every 6 hours. This provides a picture of the power use at night, in the morning, in the afternoon, and in the evening. It also makes clear that there is a slight difference in power use at different times of day; however, this difference is not nearly as distinct as the location based one.
Table 7.1: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the Nexus 4 (4).
Time of Day  Cutoff  False Positive %  SMS       Location
Night        2.5     0.35              Detected  -
Night        3.0     0.35              -         -
Morning      2.5     2.53              Detected  Detected
Morning      3.0     2.07              Detected  Detected
Afternoon    2.5     2.94              Detected  Detected
Afternoon    3.0     2.21              Detected  Detected
Evening      2.5     0.73              Detected  Detected
Evening      3.0     0.66              Detected  Detected
This smaller difference may be a driving factor behind the limited effectiveness of the time domain based detection technique. As can be seen in Table 7.1, detection worked for most times of day except for night, when the phone would likely be plugged in anyway and so likely had limited data, and the false positive percentage was very small, especially compared to a few of the location based technique's false positive rates for some locations on the same phone. While this table shows a potentially successful technique, Table 7.3 shows the same approach applied to another phone, where it could not detect malicious code in any of the cases. There may be driving factors behind this, but the clear indicator is that this approach is not as consistent in its effectiveness as location based power detection was.
This trend of success does not completely carry over to the same analysis done against the hour of the day data grouping. In Table 7.7 we can see that for a few hours of the day we can
Table 7.3: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the Samsung Galaxy S3 SCH-I535 (2).
Time of Day  Cutoff  False Positive %  SMS  Location
Night        2.5     1.96              -    -
Night        3.0     1.96              -    -
Morning      2.5     0.80              -    -
Morning      3.0     0.80              -    -
Afternoon    2.5     1.46              -    -
Afternoon    3.0     0.60              -    -
Evening      2.5     3.25              -    -
Evening      3.0     2.49              -    -
successfully detect malicious code; however, for most hours of the day we still cannot. In Table 7.5 we can see that the approach is still effective for almost all hours of the day, though it is not nearly as effective as location based power detection.
7.3 Future Applications
We see this approach potentially being beneficial as an addition to the location based approach, to further fine-tune the normal profiles and thus further improve effectiveness and decrease false positive percentages. Based on the effectiveness results, the hour of the day approach is more beneficial than the time of day approach.
Table 7.5: Hour of Day effectiveness for Nexus 4 (4).
Time of Day
0
1
2
3
4
5
6
7
8
9
10
11
noon
13
14
15
16
17
18
19
20
21
22
23
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
0.57
0.57
2.63
1.32
5.56
2.78
5.26
5.26
8.33
8.33
2.56
2.56
2.70
2.70
7.14
0.00
2.84
2.84
4.84
4.84
1.41
1.41
2.00
2.00
3.75
3.75
4.84
4.03
2.20
1.65
3.85
2.20
3.47
1.73
3.41
1.70
0.86
0.86
2.66
2.33
1.99
1.99
0.53
0.53
2.74
2.43
3.09
2.41
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Table 7.7: Hour of day effectiveness for Samsung Galaxy S3 SCH-I535 (2).
Time of Day  Cutoff  False Positive %  SMS       Location
0            2.5     3.57              Detected  Detected
0            3.0     3.57              Detected  Detected
1            2.5     0.00              Detected  Detected
1            3.0     0.00              Detected  Detected
2            2.5     0.00              Detected  Detected
2            3.0     0.00              Detected  Detected
3            2.5     0.00              Detected  Detected
3            3.0     0.00              Detected  Detected
4            2.5     0.00              Detected  Detected
4            3.0     0.00              Detected  Detected
5            2.5     0.00              Detected  Detected
5            3.0     0.00              Detected  Detected
6            2.5     0.00              -         -
6            3.0     0.00              -         -
7            2.5     0.00              -         -
7            3.0     0.00              -         -
8            2.5     2.17              -         -
8            3.0     2.17              -         -
9            2.5     1.74              -         -
9            3.0     1.16              -         -
10           2.5     1.09              -         -
10           3.0     1.09              -         -
11           2.5     0.77              -         -
11           3.0     0.77              -         -
noon         2.5     0.64              -         -
noon         3.0     0.64              -         -
13           2.5     0.25              -         -
13           3.0     0.25              -         -
14           2.5     1.79              -         -
14           3.0     0.77              -         -
15           2.5     2.06              -         -
15           3.0     1.61              -         -
16           2.5     3.37              -         -
16           3.0     0.90              -         -
17           2.5     0.86              -         -
17           3.0     0.86              -         -
18           2.5     3.56              -         -
18           3.0     2.80              -         -
19           2.5     4.17              -         -
19           3.0     2.78              -         -
20           2.5     3.13              -         -
20           3.0     2.56              -         -
21           2.5     3.43              -         -
21           3.0     1.87              -         -
22           2.5     3.11              -         -
22           3.0     1.95              -         -
23           2.5     5.36              -         -
23           3.0     2.68              -         -
Chapter 8
Conclusions and Future Work
This thesis seeks to improve malicious code detection for smartphones by proposing detection methods that do not have a negative impact on the user, yet offer a successful approach for detecting the presence of malicious code:

Lightweight anomaly detection techniques can have a statistically significant capture percentage and a low false positive percentage without many of the drawbacks conventional detection techniques cause.

We next explain how we addressed this statement by summarizing this dissertation's fundamental contributions.
8.1 Fundamental Contributions
There were a number of fundamental contributions in this dissertation that can be summarized as follows.
8.1.1 File Integrity to Accelerate Computer Based Scans
In order to harness an observed behavior most users had when using smartphones, we provided a new application of an existing technique and showed how it could be used to significantly reduce the time needed to scan files from a computer when a user synced their phone. Additionally, we showed that a keyed hash could be used for additional security without delaying the overall process. We also discussed our investigation showing that files change with characteristic frequencies, which could be used to prioritize scanning the files that change less often first. And though users no longer sync their phones to computers, we discussed how this could potentially be applied in the future.
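As a hedged sketch of the keyed hash idea (here HMAC-SHA256 from the Python standard library; the actual implementation details of this work are not reproduced):

```python
# Keyed file digests: stored digests cannot be forged without the key,
# and files whose digest is unchanged can be skipped during a scan.
import hashlib
import hmac

def keyed_digest(path, key):
    """Stream a file through HMAC-SHA256 and return its hex digest."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            mac.update(chunk)
    return mac.hexdigest()

def unchanged(path, key, stored_digest):
    """True if the file still matches its previously stored digest."""
    return hmac.compare_digest(keyed_digest(path, key), stored_digest)
```

Files for which unchanged() returns True could be skipped, and the remaining files prioritized by how often they have historically changed.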
8.1.2 General Power Based Detection
Given our observation that users were trending away from syncing with computers, we next looked at the behavior exhibited by most known malicious code and realized it caused a significant drain on a phone's battery. We therefore investigated the effectiveness of using this approach, and then revisited that effectiveness with the statistical model and the less power hungry location tracking malicious code simulator. We were able to show that power based detection was promising but still needed improvement and, most importantly, did not have a negative impact on the users.
8.1.3 Location Based Power Detection
Realizing we needed to improve on general power based detection by adding another variable to further tune the normal profiles and increase accuracy, we considered the behaviors we saw in the way we used smartphones. This led to the realization that we use our smartphones differently depending on where we are. As such, with this approach we investigated normal power profiles at different locations. This was also done as a standalone project and is the primary focus of this thesis. This technique showed great success at detecting malicious code, had an unmeasurable impact on the user in terms of battery drain, resulted in low calculated false positive rates, and exhibited no observed false positives while actually running as a standalone project. This is a significant improvement over general power based detection and, more importantly, supports the thesis of this work: that low profile techniques can be effective at detecting malicious code with a minimal impact on the user and a low false positive rate.
8.1.4 Time Domain Based Power Detection
We decided to take the collected data a step further and investigate whether the time of day a user uses their phone could serve as another tuning variable. The results were mixed but did offer the general understanding that users do in fact use their phones differently depending on the time of day; however, the differences were not as pronounced as the location based power distributions, and we believe this is the cause of the inconsistent effectiveness of this approach.
8.2 Future Work
We will now discuss a few future avenues of work that extend or build upon the fundamental contributions of this thesis.
8.2.1 Additional Tuning Factors
To build on the approaches presented in this thesis, we could continue to find new tuning factors that improve the normal profile based on the way a user uses their phone. To do this we may need to collect more data and apply clustering algorithms to get a picture of which tuning factors have a significant impact.
8.2.2 Combining Tuning Factors
We are not sure what future tuning factors might be useful; however, based on the work presented and discussed in this thesis, we see that overlapping tuning factors into a single approach could be effective. To this end, we could use location based power detection in which each location has time of day power profiles, potentially increasing the accuracy of the normal profiles in each location.
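A minimal sketch of this combination, assuming a profile table keyed by (location, time of day) with a per location fallback; every name and value here is illustrative:

```python
# Normal profiles keyed by (location, time of day); when the finer
# bucket has not been trained yet, fall back to the location-wide one.
profiles = {
    (13, "morning"): (1.0, 0.1),  # (mean, std) for this bucket
    (13, None): (1.1, 0.3),       # location-wide fallback profile
}

def lookup_profile(location, tod):
    return profiles.get((location, tod), profiles[(location, None)])

def flag_anomaly(rate, location, tod, cutoff=2.5):
    mean, std = lookup_profile(location, tod)
    return rate > mean + cutoff * std

print(flag_anomaly(1.4, 13, "morning"))  # outside the tight morning profile
print(flag_anomaly(1.4, 13, "evening"))  # within the looser fallback
```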
8.2.3 Integrating With Another System
So far all the work presented in this thesis, and even the future work, primarily focuses on improving anomaly detection and then raising a red flag when potential malicious code might exist. To really put these approaches in a position to fix, remove, or identify the specific malicious code, they would likely need to be integrated with a larger system that employs more sophisticated methods for removal and identification of specific malicious code. The work presented in this thesis would be a great way to determine when such higher impact systems need to be used, with the red flag triggering their use.
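This integration point can be sketched as a simple callback, where the lightweight detector only raises the red flag and a heavier, more sophisticated system is invoked in response; everything here is hypothetical:

```python
# The low profile detector stays cheap: it only runs the anomaly check
# and hands anything suspicious to a higher impact system via callback.
def monitor(rates, is_anomalous, on_red_flag):
    for rate in rates:
        if is_anomalous(rate):
            on_red_flag(rate)

flagged = []
monitor([0.9, 1.0, 4.2], lambda r: r > 2.0, flagged.append)
print(flagged)  # the heavier system would now scan around this event
```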
8.2.4 Known Malicious Code
To really improve and test the effectiveness of the work presented in this thesis, the true test would be to run it against the known malicious code in the wild whose behaviors the simulators were built to replicate, and additionally against new threats whose behaviors fall into the spectrum this work should be suited to detecting. This would entail obtaining a grant for further research: relying on friends, family, and others to run the data collection and simulations is one thing, but asking them to install actual malicious code that could cost them money, damage their phone, and put their personal information at risk is not, in our opinion, an appropriate way to do research. As such, a grant to obtain test phones and service for a period of time, to collect data and then install actual malicious code on those phones, would be a significant benefit to this work in the future.
8.3 Final Remarks
To conclude, we have presented a variety of methods that aim to detect malicious code on smartphones without having a noticeable impact on the user. By succeeding in detecting the presence of malicious code with no noticeable impact, our hope is that this work could be integrated with other systems such that users will be more likely to install and use malicious code detection on their smartphones, without having to worry that the detection tool drains their battery more than potential malicious code might. This could lead to an improved and safer smartphone environment.
Appendix A
Additional Tables of Data
Table A.1: This table shows the effectiveness of location based power detection through the use of
statistical analysis with sigma values for the Nexus 4 (1). It also includes the effective observed
false positive rate from the model running on the phone as a stand alone project and the calculated
false positive % when using the statistical model against the collected data.
Location
0
2
3
14
16
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
1.40
0.47
4.26
2.13
1.85
1.85
2.04
2.04
9.52
0.00
Observed False Positive %
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Table A.3: This table shows the effectiveness of location based power detection through the use of
statistical analysis with sigma values for the Nexus 4 (2). It also includes the effective observed
false positive rate from the model running on the phone as a stand alone project and the calculated
false positive % when using the statistical model against the collected data.
Location  Cutoff  False Positive %  Observed False Positive %  SMS       Location
1         2.5     1.72              0.00                       Detected  Detected
1         3.0     0.79              0.00                       Detected  Detected
2         2.5     1.77              0.00                       Detected  Detected
2         3.0     1.42              0.00                       Detected  Detected
4         2.5     3.92              0.00                       Detected  Detected
4         3.0     3.92              0.00                       Detected  Detected
7         2.5     2.79              0.00                       Detected  Detected
7         3.0     2.79              0.00                       Detected  Detected
10        2.5     5.88              0.00                       Detected  Detected
10        3.0     3.92              0.00                       Detected  Detected
13        2.5     3.08              0.00                       Detected  Detected
13        3.0     3.08              0.00                       Detected  Detected
Table A.5: This table shows the effectiveness of location based power detection through the use of
statistical analysis with sigma values for the Nexus 4 (3). It also includes the effective observed
false positive rate from the model running on the phone as a stand alone project and the calculated
false positive % when using the statistical model against the collected data.
Location
0
1
3
6
28
31
42
44
45
57
72
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
0.27
0.27
2.44
2.44
2.08
1.56
2.87
2.87
0.39
0.39
2.56
2.56
7.14
7.14
3.33
3.33
3.36
2.52
2.76
2.07
3.03
3.03
Observed False Positive %
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Table A.7: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the Nexus 4 (1).
Time of Day
Night
Morning
Afternoon
Evening
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
0.78
0.78
1.84
1.23
1.17
0.94
2.27
2.27
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Table A.9: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the Nexus 4 (2).
Time of Day
Night
Morning
Afternoon
Evening
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
1.65
1.65
0.47
0.47
2.67
2.16
1.55
1.16
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Table A.11: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the Nexus 4 (3).
Time of Day
Night
Morning
Afternoon
Evening
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
1.14
1.14
1.30
0.78
0.65
0.32
3.50
1.95
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Table A.13: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the SAMSUNG-SGH-I747.
Time of Day  Cutoff  False Positive %  SMS       Location
Night        2.5     0.65              Detected  -
Night        3.0     0.65              Detected  -
Morning      2.5     2.40              Detected  -
Morning      3.0     1.80              Detected  -
Afternoon    2.5     0.41              Detected  -
Afternoon    3.0     0.41              Detected  -
Evening      2.5     2.51              Detected  -
Evening      3.0     2.21              Detected  -
Table A.15: This table shows the effectiveness of time domain based power detection through the
use of statistical analysis with sigma values for the Galaxy Nexus (1).
Time of Day  Cutoff  False Positive %  SMS  Location
Night        2.0     5.56              -    -
Night        3.0     5.56              -    -
Morning      2.0     6.32              -    -
Morning      3.0     0.66              -    -
Afternoon    2.0     1.26              -    -
Afternoon    3.0     0.44              -    -
Evening      2.0     0.33              -    -
Evening      3.0     0.22              -    -
Table A.17: Hour of Day effectiveness for Nexus 4 (1).
Time of Day
0
1
2
3
4
5
6
7
8
9
10
11
noon
13
14
15
16
17
18
19
20
21
22
23
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
1.96
1.96
5.26
5.26
2.33
2.33
5.56
5.56
3.57
3.57
3.85
3.85
5.88
2.94
3.85
3.85
3.23
3.23
5.17
1.72
6.35
3.17
2.30
2.30
1.64
1.64
0.98
0.98
1.47
1.47
2.86
1.43
4.00
4.00
2.82
2.82
1.54
1.54
2.90
2.90
2.17
2.17
1.69
1.69
4.17
2.08
2.44
2.44
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Table A.19: Hour of Day effectiveness for Nexus 4 (2).
Time of Day
0
1
2
3
4
5
6
7
8
9
10
11
noon
13
14
15
16
17
18
19
20
21
22
23
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
2.82
2.82
2.44
2.44
4.76
4.76
4.17
4.17
5.56
5.56
0.00
0.00
4.55
4.55
4.35
4.35
4.00
4.00
2.27
2.27
4.35
2.90
5.75
1.15
2.00
1.00
3.05
3.05
2.55
2.55
2.50
2.50
2.98
1.79
4.55
4.55
3.92
1.96
1.32
1.32
0.82
0.82
3.25
3.25
0.76
0.76
1.12
1.12
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Table A.21: Hour of Day effectiveness for Nexus 4 (3).
Time of Day
0
1
2
3
4
5
6
7
8
9
10
11
noon
13
14
15
16
17
18
19
20
21
22
23
Cutoff
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
2.5
3.0
False Positive %
2.04
2.04
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
11.11
0.00
3.12
3.12
1.96
1.96
6.41
5.13
3.85
1.92
0.90
0.90
0.70
0.70
3.17
3.17
2.22
2.22
3.87
3.31
2.35
2.35
1.17
0.58
2.17
2.17
2.07
2.07
2.74
2.74
3.80
3.80
2.90
2.90
SMS
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Location
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Detected
Table A.23: Hour of Day effectiveness for Samsung SGH-I727.
Time of Day  Cutoff  False Positive %  SMS  Location
0            2.5     2.13              -    Detected
0            3.0     2.13              -    Detected
1            2.5     2.08              -    Detected
1            3.0     2.08              -    Detected
2            2.5     2.50              -    Detected
2            3.0     2.50              -    Detected
3            2.5     4.17              -    Detected
3            3.0     4.17              -    -
4            2.5     5.56              -    -
4            3.0     5.56              -    -
5            2.5     2.38              -    -
5            3.0     2.38              -    -
6            2.5     2.63              -    -
6            3.0     2.63              -    -
7            2.5     4.00              -    -
7            3.0     4.00              -    -
8            2.5     4.44              -    -
8            3.0     2.22              -    -
9            2.5     0.58              -    -
9            3.0     0.58              -    -
10           2.5     1.56              -    -
10           3.0     0.78              -    -
11           2.5     3.41              -    -
11           3.0     1.52              -    -
noon         2.5     1.09              -    -
noon         3.0     1.09              -    -
13           2.5     1.60              -    -
13           3.0     1.28              -    -
14           2.5     1.13              -    -
14           3.0     1.13              -    -
15           2.5     0.36              -    -
15           3.0     0.36              -    -
16           2.5     1.12              -    -
16           3.0     0.75              -    -
17           2.5     0.97              -    -
17           3.0     0.97              -    -
18           2.5     1.17              -    -
18           3.0     0.58              -    -
19           2.5     1.30              -    -
19           3.0     1.30              -    -
20           2.5     1.27              -    -
20           3.0     1.27              -    -
21           2.5     0.51              -    -
21           3.0     0.51              -    -
22           2.5     1.29              -    -
22           3.0     1.29              -    -
23           2.5     0.71              -    -
23           3.0     0.71              -    -
Table A.25: Hour of Day effectiveness for Samsung SGH-I747.
Time of Day  Cutoff  False Positive %  SMS       Location
0            2.5     2.00              Detected  -
0            3.0     2.00              Detected  -
1            2.5     2.70              Detected  -
1            3.0     2.70              Detected  -
2            2.5     5.26              Detected  -
2            3.0     0.00              Detected  -
3            2.5     0.00              Detected  -
3            3.0     0.00              Detected  -
4            2.5     8.33              Detected  -
4            3.0     8.33              Detected  -
5            2.5     7.69              Detected  -
5            3.0     0.00              Detected  -
6            2.5     0.00              Detected  -
6            3.0     0.00              Detected  -
7            2.5     0.00              Detected  -
7            3.0     0.00              Detected  -
8            2.5     5.00              Detected  -
8            3.0     5.00              Detected  -
9            2.5     3.57              Detected  -
9            3.0     3.57              Detected  -
10           2.5     3.45              Detected  -
10           3.0     3.45              Detected  -
11           2.5     3.70              Detected  -
11           3.0     1.23              Detected  -
noon         2.5     1.46              Detected  -
noon         3.0     1.46              Detected  -
13           2.5     2.72              Detected  -
13           3.0     1.36              Detected  -
14           2.5     1.18              Detected  -
14           3.0     1.18              Detected  -
15           2.5     1.66              Detected  -
15           3.0     1.10              Detected  -
16           2.5     0.68              Detected  -
16           3.0     0.68              Detected  -
17           2.5     0.60              Detected  -
17           3.0     0.60              Detected  -
18           2.5     1.95              Detected  -
18           3.0     1.30              Detected  -
19           2.5     1.27              Detected  -
19           3.0     1.27              Detected  -
20           2.5     3.12              Detected  -
20           3.0     1.25              Detected  -
21           2.5     2.34              Detected  -
21           3.0     2.34              Detected  -
22           2.5     1.80              Detected  -
22           3.0     1.80              Detected  -
23           2.5     2.56              Detected  -
23           3.0     1.28              Detected  -
Table A.27: Hour of Day effectiveness for Galaxy Nexus (1).
Time of Day  Cutoff  False Positive %  SMS       Location
0            2.5     0.00              Detected  -
0            3.0     0.00              Detected  -
1            2.5     0.00              Detected  -
1            3.0     0.00              Detected  -
2            2.5     0.00              Detected  -
2            3.0     0.00              Detected  -
3            2.5     0.00              Detected  -
3            3.0     0.00              Detected  -
4            2.5     0.00              -         -
4            3.0     0.00              -         -
5            2.5     0.00              -         -
5            3.0     0.00              -         -
6            2.5     0.00              -         -
6            3.0     0.00              -         -
7            2.5     0.00              -         -
7            3.0     0.00              -         -
8            2.5     2.99              -         -
8            3.0     2.99              -         -
9            2.5     5.83              -         -
9            3.0     2.91              -         -
10           2.5     0.64              -         -
10           3.0     0.64              -         -
11           2.5     6.31              -         -
11           3.0     0.00              -         -
noon         2.5     4.17              -         -
noon         3.0     2.31              -         -
13           2.5     5.24              -         -
13           3.0     4.29              -         -
14           2.5     1.00              -         -
14           3.0     1.00              -         -
15           2.5     1.19              -         -
15           3.0     0.79              -         -
16           2.5     3.66              -         -
16           3.0     0.37              -         -
17           2.5     2.44              -         -
17           3.0     2.44              -         -
18           2.5     6.56              -         -
18           3.0     2.81              -         -
19           2.5     5.10              -         -
19           3.0     4.31              -         -
20           2.5     2.92              -         -
20           3.0     1.30              -         -
21           2.5     0.85              -         -
21           3.0     0.85              -         -
22           2.5     2.00              -         -
22           3.0     2.00              -         -
23           2.5     10.00             -         -
23           3.0     0.00              -         -
Table A.29: Hour of Day effectiveness for Samsung Galaxy S3 SCH-I535 (1).
Time of Day | FP% (Cutoff 2.5) | FP% (Cutoff 3.0) | SMS
0    | 0.90  | 0.90 | Detected
1    | 2.74  | 2.74 | Detected
2    | 1.35  | 1.35 | Detected
3    | 0.00  | 0.00 | Detected
4    | 11.11 | 0.00 | Detected
5    | 0.00  | 0.00 | Detected
6    | 0.00  | 0.00 | Detected
7    | 4.00  | 0.00 | Detected
8    | 1.27  | 1.27 | Detected
9    | 2.02  | 1.01 | Detected
10   | 0.83  | 0.83 | Detected
11   | 2.03  | 2.03 | Detected
noon | 0.62  | 0.62 | Detected
13   | 0.52  | 0.52 | Detected
14   | 0.78  | 0.26 | Detected
15   | 0.60  | 0.30 | Detected
16   | 0.28  | 0.28 | Detected
17   | 2.43  | 2.43 | Detected
18   | 0.48  | 0.48 | Detected
19   | 1.28  | 0.64 | Detected
20   | 0.53  | 0.53 | Detected
21   | 0.33  | 0.33 | Detected
22   | 2.11  | 2.11 | Detected
23   | 0.72  | 0.72 | Detected
Location: -
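The cutoff values of 2.5 and 3.0 in the tables above act as thresholds on a power-deviation score: a reading whose score exceeds the cutoff is flagged as anomalous, and any benign reading flagged this way counts toward the false-positive percentage. As a minimal sketch (assuming, purely for illustration, that the score is a z-score against an hourly power baseline; the function name `false_positive_pct` and the sample values are hypothetical), the per-hour rates could be computed like this:

```python
# Sketch: per-hour false-positive rate for two anomaly-score cutoffs,
# assuming the score is a z-score of observed power draw against that
# hour's baseline mean and standard deviation.
from statistics import mean, stdev

def false_positive_pct(benign_samples, baseline, cutoffs=(2.5, 3.0)):
    """benign_samples: power readings taken while no malware is running.
    baseline: readings used to model normal behavior for this hour."""
    mu, sigma = mean(baseline), stdev(baseline)
    results = {}
    for cutoff in cutoffs:
        # A benign reading whose |z| exceeds the cutoff is a false positive.
        flagged = sum(1 for s in benign_samples
                      if abs(s - mu) / sigma > cutoff)
        results[cutoff] = 100.0 * flagged / len(benign_samples)
    return results

# Synthetic readings (illustrative only); one outlier among six benign samples.
baseline = [300, 305, 298, 310, 302, 295, 308, 301]
benign = [299, 304, 306, 400, 297, 303]
print(false_positive_pct(benign, baseline))
```

A higher cutoff can only flag fewer readings, which matches the tables: the 3.0 column is never larger than the 2.5 column for the same hour.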
Appendix B
Additional Graphs of Data
Figure B.1: Average rate of change for the various locations visited by phone Galaxy Nexus (1).
Figure B.2: Average rate of change for the various locations visited by phone Nexus 4 (1).
Figure B.3: Average rate of change for the various locations visited by phone Nexus 4 (2).
Figure B.4: Average rate of change for the various locations visited by phone Nexus 4 (3).
Figure B.5: Average rate of change for the various locations visited by phone Nexus 4 (4).
Figure B.6: Power distribution based on time of day for phone SGH-I727.
Figure B.7: Power distribution based on hour of day for phone SGH-I727.
Figure B.8: Power distribution based on time of day for phone SGH-I747.
Figure B.9: Power distribution based on hour of day for phone SGH-I747.
Figure B.10: Power distribution based on time of day for phone Galaxy Nexus (1).
Figure B.11: Power distribution based on hour of day for phone Galaxy Nexus (1).
Figure B.12: Power distribution based on time of day for phone Nexus 4 (1).
Figure B.13: Power distribution based on hour of day for phone Nexus 4 (1).
Figure B.14: Power distribution based on time of day for phone Nexus 4 (2).
Figure B.15: Power distribution based on hour of day for phone Nexus 4 (2).
Figure B.16: Power distribution based on time of day for phone Nexus 4 (3).
Figure B.17: Power distribution based on hour of day for phone Nexus 4 (3).