University of Colorado, Boulder
CU Scholar
Computer Science Graduate Theses & Dissertations, Computer Science, Spring 1-1-2013

Exploring Low Profile Techniques for Malicious Code Detection on Smartphones
Bryan Charles Dixon, University of Colorado at Boulder, [email protected]

Follow this and additional works at: http://scholar.colorado.edu/csci_gradetds
Part of the Information Security Commons

Recommended Citation: Dixon, Bryan Charles, "Exploring Low Profile Techniques for Malicious Code Detection on Smartphones" (2013). Computer Science Graduate Theses & Dissertations. 69. http://scholar.colorado.edu/csci_gradetds/69

This Dissertation is brought to you for free and open access by Computer Science at CU Scholar. It has been accepted for inclusion in Computer Science Graduate Theses & Dissertations by an authorized administrator of CU Scholar. For more information, please contact [email protected].

Exploring Low Profile Techniques for Malicious Code Detection on Smartphones

by Bryan Dixon
B.S., North Carolina State University, 2007
M.S., University of Colorado at Boulder, 2012

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Computer Science, 2013

This thesis entitled: Exploring Low Profile Techniques for Malicious Code Detection on Smartphones, written by Bryan Dixon, has been approved for the Department of Computer Science by Shivakant Mishra, Prof. Richard Han, Prof. Qin Lv, Prof. John Black, and Prof. Eric Keller.

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.
Dixon, Bryan (Ph.D., Computer Science)
Exploring Low Profile Techniques for Malicious Code Detection on Smartphones
Thesis directed by Prof. Shivakant Mishra

In recent years there has been a growing number of viruses, rootkits, and malware designed to gain access to system resources and information stored on smartphones. Most current approaches for detecting this malicious code have detrimental impacts on the user in terms of reduced functionality, slower network speeds, or loss of battery life. This work presents a number of approaches that have a minimal impact on the user but offer successful detection of potential malicious code on the smartphone. We do this primarily by focusing on anomalous power use as a method for detecting the presence of malicious code. This work also introduces ways to fine-tune the process by establishing a normal profile of power usage for each user, which increases the rate of malware detection.

Dedication

To my parents.

Acknowledgements

This thesis is the culmination of over four years of work; without the many people who contributed to my professional development as a researcher and supported me personally, I would not be where I am today. First, I want to thank my advisor, Shivakant Mishra, who has not only been instrumental in encouraging my progress, but also encouraged my pursuit of the smartphone security research on which this work is based. Second, I want to thank my thesis committee, Richard Han, John Black, Qin (Christine) Lv, and Eric Keller, for their invaluable comments and suggestions. I'd also like to thank Jacqueline DeBoard for offering guidance throughout this whole process. I would also like to thank my many colleagues for their support, their willingness to act as sounding boards for ideas, and the advice they offered.
I am particularly thankful to Harold Gonzales, Mike Gartrell, Dirk Grunwald, Doug Sicker, Richard Han, John Black, Yifei Jiang, Donny Warbritton, Junho Ahn, Andy Sayler, Ning Gao, Kevin Bauer, Ali Alzabarah, Allison Brown, and Frank Di Natale. I would like to thank all my friends, family, and the individuals who helped with my thesis project by running my research code on their smartphones; without them none of this would be possible. I'd like to offer a special thanks to those who took the extra time and effort to run the malicious code simulator so I could gather results to back up the effectiveness of my proposed thesis. Lastly, I would like to thank Jeannette Pepin for her constant encouragement, for her help editing and reviewing my writing, for helping me run my projects in their buggy development stage, and for the additional time she spent helping me simulate data.

Contents

Chapter 1: Introduction
  1.1 Need for Malicious Code Detection
  1.2 Detection Doesn't Need to Impact the User
  1.3 Problem Statement
  1.4 Fundamental Contributions
  1.5 Dissertation Outline

Chapter 2: Background and Related Works
  2.1 Background
    2.1.1 Security Threats
  2.2 Related Work
    2.2.1 Propagation and Mitigation
    2.2.2 Prevention
    2.2.3 Detection
Chapter 3: Malicious Code Simulators
  3.1 Motivation for Use of Malware Simulation vs Real Malware
  3.2 Design and Implementation
    3.2.1 SMS Spam Trojan
    3.2.2 Location Tracking Malware
  3.3 Future of Malware Simulations

Chapter 4: Computer Based Detection Through File Integrity
  4.1 Motivation for Computer Based Detection
  4.2 Design and Implementation
    4.2.1 Keyed Hash Design
  4.3 Results and Discussion
    4.3.1 File Change Frequency
  4.4 Future Applications
  4.5 Limitations

Chapter 5: General Power Based Detection
  5.1 Motivation for General Power Based Detection
  5.2 Design and Implementation
  5.3 Results and Discussion
    5.3.1 Statistical Approach with New Data
  5.4 Future Applications

Chapter 6: Location-Based Power Detection
  6.1 Motivation for Location Based Power Detection
  6.2 Design and Implementation
    6.2.1 Design Choices
    6.2.2 Deployment and Recruitment
  6.3 Results and Discussion
  6.4 Limitations
  6.5 Future Application
    6.5.1 Future Improvements

Chapter 7: Time Domain Based Detection
  7.1 Motivation for Time Domain Based Detection
  7.2 Results and Discussion
  7.3 Future Applications

Chapter 8: Conclusions and Future Work
  8.1 Fundamental Contributions
    8.1.1 File Integrity to Accelerate Computer Based Scans
    8.1.2 General Power Based Detection
    8.1.3 Location Based Power Detection
    8.1.4 Time Domain Based Power Detection
  8.2 Future Work
    8.2.1 Additional Tuning Factors
    8.2.2 Combining Tuning Factors
    8.2.3 Integrating With Another System
    8.2.4 Known Malicious Code
  8.3 Final Remarks
Bibliography

Appendix
  A: Additional Tables of Data
  B: Additional Graphs of Data

Tables

  2.1 Propagation Vectors Table
  4.1 File Integrity Hashing Comparisons
  5.1 General Power Analysis for All Phones
  6.1 Statistics for the standalone Project
  6.3 Location Based Effectiveness for Nexus 4 (4)
  6.5 Location Based Effectiveness for SCH-I535 (2)
  7.1 Time of Day effectiveness for Nexus 4 (4)
  7.3 Time of Day effectiveness for SCH-I535 (2)
  7.5 Hour of Day effectiveness for Nexus 4 (4)
  7.7 Hour of Day effectiveness for Samsung Galaxy S3 SCH-I535 (2)
  A.1 Location Based Effectiveness for Nexus 4 (1)
  A.3 Location Based Effectiveness for Nexus 4 (2)
  A.5 Location Based Effectiveness for Nexus 4 (3)
  A.7 Time of Day effectiveness for Nexus 4 (1)
  A.9 Time of Day effectiveness for Nexus 4 (2)
  A.11 Time of Day effectiveness for Nexus 4 (3)
  A.13 Time of Day effectiveness for SAMSUNG-SGH-I747
  A.15 Time of Day effectiveness for Galaxy Nexus (1)
  A.17 Hour of Day effectiveness for Nexus 4 (1)
  A.19 Hour of Day effectiveness for Nexus 4 (2)
  A.21 Hour of Day effectiveness for Nexus 4 (3)
  A.23 Hour of Day effectiveness for Samsung SGH-I727
  A.25 Hour of Day effectiveness for Samsung SGH-I747
  A.27 Hour of Day effectiveness for Galaxy Nexus (1)
  A.29 Hour of Day effectiveness for Samsung Galaxy S3 SCH-I535 (1)

Figures

  2.1 Visualization of the GPS exploit a root kit could perform
  2.2 Normal Behavior vs Root Kit Behavior
  2.3 Visualization of the eavesdrop exploit a root kit could perform
  2.4 Visualization of Smartphones normal power drain versus all radios in use
  2.5 Clustering Visualization
  2.6 Performance of the two mitigation techniques
  2.7 How a CAPTCHA system could be implemented for use on a smartphone
  2.8 Visualization of how SmartSiren is implemented
  2.9 Visualization of how Risk Ranker works
  5.1 Battery level change for Day 2
  5.2 Battery level change for Day 9
  5.3 Battery level change for Day 15
  5.4 Battery level change for Day 18
  5.5 Graph of percent change in Battery graphs over 2000 samples
  5.6 Capture Percentage Location Tracking General Power
  5.7 Capture Percentage SMS Spam General Power
  5.8 False Positive Rates for General Power
  6.1 Countries Running standalone Project
  6.2 Average Rate of Power Change at Locations for SGH-727
  6.3 Average Rate of Power Change at Locations for SGH-747
  7.1 Hour of day based power use for Nexus 4 (4)
  7.2 Time of day based power use for Nexus 4 (4)
  7.3 Hour of day based power use for SCH-I535 (2)
  7.4 Time of day based power use for SCH-I535 (2)
  B.1 Average Rate of Power Change at Locations for Galaxy Nexus (1)
  B.2 Average Rate of Power Change at Locations for Nexus 4 (1)
  B.3 Average Rate of Power Change at Locations for Nexus 4 (2)
  B.4 Average Rate of Power Change at Locations for Nexus 4 (3)
  B.5 Average Rate of Power Change at Locations for Nexus 4 (4)
  B.6 Time of day based power use for SGH-I727
  B.7 Hour of day based power use for SGH-I727
  B.8 Time of day based power use for SGH-I747
  B.9 Hour of day based power use for SGH-I747
  B.10 Time of day based power use for Galaxy Nexus (1)
  B.11 Hour of day based power use for Galaxy Nexus (1)
  B.12 Time of day based power use for Nexus 4 (1)
  B.13 Hour of day based power use for Nexus 4 (1)
  B.14 Time of day based power use for Nexus 4 (2)
  B.15 Hour of day based power use for Nexus 4 (2)
  B.16 Time of day based power use for Nexus 4 (3)
  B.17 Hour of day based power use for Nexus 4 (3)

Chapter 1: Introduction

1.1 Need for Malicious Code Detection

Smartphones have become an important part of many people's daily lives, and their reach continues to grow as more and more people adopt them. These devices therefore need to be secure and safe from malicious entities and code. The growing amount of malicious code aimed at smartphones presents an increasing risk to smartphone users, making the detection of such code ever more important. The significant amount of personal data stored and captured by smartphones means that privacy as well as general security is a concern. In addition, malicious code can generate increased activity that drains a significant amount of power from the phone and can cause congestion in the cellular network.

1.2 Detection Doesn't Need to Impact the User

Given the need to protect smartphones from malicious code, a variety of research projects have been proposed or developed to combat the threat it poses. Some of these proposals, however, can have as much if not more of a negative impact on the user through the toll they take on the functionality of the smartphone. One major limitation for detection software that runs on smartphones is that smartphones have limited storage, RAM, and processing power in comparison to the hardware capabilities of modern computers.
As such, techniques that effectively detect malicious code on a computer may not be a feasible solution for a smartphone. In addition, the limitations imposed by the app development framework can hinder the functionality of detection software on a smartphone. The key drawback of detection software on a smartphone is that, in order to be effective, such a service or app may need to be perpetually running, thus placing a significant drain on the battery. This would be a huge drawback for users, who would likely opt to risk malicious code over the loss of battery life. Most conventional anti-virus solutions for smartphones tend to be on-demand or on-install scans, which limits their effectiveness against new, unknown threats. The assumption that tools for detecting malicious code must be resource hungry, either draining the battery significantly or reducing the functionality of the phone by consuming CPU cycles and memory, has led to other strategies, discussed in the related work, that impose different negative impacts on the user in order to fight potential threats. Many of these would treat any increase in network traffic as anomalous and then throttle it, whether that increase came from a malicious app or from a user opting to use Netflix, Pandora, or some other streaming service. [9][10]

1.3 Problem Statement

While it is apparent that the need for a way to detect malicious code exists, most current approaches lead to negative impacts on the user. To this end, in this dissertation, we seek to illustrate that malicious code can be detected with a minimal impact on the user and smartphone. We assert the following thesis: Lightweight anomaly detection techniques can achieve a statistically significant capture percentage and a low false positive percentage without many of the drawbacks conventional detection techniques cause.
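The thesis statement above can be made concrete with a minimal sketch. The data, threshold, and function names below are hypothetical illustrations, not the detector actually built in this work: a battery-drain sample is flagged as anomalous when it exceeds the mean of a baseline profile by k standard deviations, and the capture and false positive counts fall out directly.

```python
import statistics

def build_profile(baseline_drains):
    """Baseline profile: mean and stdev of observed battery drain (%/hour)."""
    return statistics.mean(baseline_drains), statistics.stdev(baseline_drains)

def is_anomalous(drain, profile, k=3.0):
    """Flag a sample whose drain exceeds mean + k*stdev of the profile."""
    mean, stdev = profile
    return drain > mean + k * stdev

# Hypothetical drain samples in %/hour (illustrative only).
clean = [4.1, 3.8, 4.5, 5.0, 4.2, 3.9, 4.7, 4.4]
infected = [9.5, 8.8, 10.2, 4.3]  # malware active during 3 of the 4 samples

profile = build_profile(clean)
captures = sum(is_anomalous(d, profile) for d in infected)
false_pos = sum(is_anomalous(d, profile) for d in clean)
print(f"capture: {captures}/{len(infected)}, false positives: {false_pos}/{len(clean)}")
# prints "capture: 3/4, false positives: 0/8"
```

The key property this sketch illustrates is the trade-off the thesis names: a loose threshold (large k) lowers false positives at the cost of capture percentage, while a tight threshold does the reverse.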
We will focus primarily on how intelligent data collection combined with simple anomaly detection results in a negligible impact on the user and a statistically significant capture rate.

1.4 Fundamental Contributions

This thesis contributes the following to the field of malicious code detection for smartphones.

• File Integrity Based Detection. (Chapter 4) An early work that took advantage of the fact that, at the time, users would typically plug their phones into a computer at least daily to sync. It explored how to improve scanning from a computer so that it would take less time and be less susceptible to malicious code escaping detection.[19]

• General Power Based Detection. (Chapter 5) Recognizing that most malicious code behaviors significantly drain a smartphone's battery, we began our investigation into detecting malicious code against a general normal profile of how a user uses their phone.

• Location-Based Power Detection. (Chapter 6) Realizing that we could continue to improve on general power based detection, we added location data to how we generate what is normal for a user based on where they are located.[20]

• Time Domain Based Detection. (Chapter 7) A final addition was time domain based analysis, which operates on the observation that a user uses their phone differently at different times of day.

1.5 Dissertation Outline

The remainder of this dissertation is organized as follows. Chapter 2 provides a survey of the relevant background and related work from the malicious code detection literature. Chapter 3 describes the malware simulators that were created for testing the effectiveness of the projects proposed in this dissertation. Chapter 4 provides details of our initial work on improving computer based scanning.
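The keyed-hash file integrity idea behind the computer based scanning work (Chapter 4) can be sketched as follows. This is a minimal illustration under assumed details (the file paths, key handling, and helper names are hypothetical, not the dissertation's implementation): keying the hash with a per-device secret means malicious code on the phone cannot precompute digests that would let a modified file pass the computer-side check, and only files whose digests changed need a full scan.

```python
import hashlib
import hmac
import os
import tempfile

def keyed_file_hash(path, key):
    """HMAC-SHA256 over a file's contents, keyed with a per-device secret."""
    digest = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(paths, key, baseline):
    """Compare current keyed hashes to a stored baseline; return paths to rescan."""
    return [p for p in paths if keyed_file_hash(p, key) != baseline.get(p)]

# Demo with temporary files standing in for files synced off the phone.
key = b"per-device-secret"  # hypothetical; would be kept off the device
d = tempfile.mkdtemp()
a, b = os.path.join(d, "a.bin"), os.path.join(d, "b.bin")
for p in (a, b):
    with open(p, "wb") as f:
        f.write(b"original contents")
baseline = {p: keyed_file_hash(p, key) for p in (a, b)}
with open(b, "wb") as f:  # simulate a file modified by malicious code
    f.write(b"tampered contents")
print(changed_files([a, b], key, baseline))  # only the tampered file is reported
```

The speedup comes from rescanning only the reported files rather than the whole filesystem image on every sync.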
Chapter 5 details how we initially approached general power analysis, and then its effectiveness when we revisited it with the data and approach taken in the standalone prototype used for the thesis project. Chapter 6 then details how the thesis project prototype was designed, presents the resulting data, and evaluates how effective it was. Chapter 7 shows how effective power usage at different times of day is. Finally, Chapter 8 summarizes the fundamental contributions of this thesis and highlights both the conclusions that can be drawn and a variety of future work that could be derived from this work.

Chapter 2: Background and Related Works

2.1 Background

With the uptake of the smartphone market there has been an emergence of viruses, malware, and even root kits designed to gain access to smartphones. These security threats are troublesome due to the large amount of personal data that smartphone users store on their phones. Additionally, this malicious code leads both to a dramatic drain on the phone's battery and to an abnormal load on the cellular network compared to the load an uninfected phone would generate. With phone networks getting even faster and these devices having constant access to the Internet whenever they have a signal, smartphones become even more susceptible to malicious attacks and a more viable target to exploit and control. As such, studying malicious code on cellphones, along with how that code could be mitigated, stopped, or detected, has been a growing area of research. The varied number of infection vectors that these viruses, malware, and root kits can use to propagate themselves makes them harder to stop at the network level.

Table 2.1: Table of propagation vectors and known examples of malicious code that use the vectors

Table 2.1 lists a number of existing smartphone viruses and their vectors of propagation.
[18]

Smartphones offer a unique development platform, compared to personal computers, for software that detects and removes malicious code. A primary hindrance is hardware limitations such as slower processors and limited storage and RAM. These factors limit the effectiveness of applying modern malicious-code-scanning algorithms on the phone itself. Additionally, a service running perpetually to look for malicious code would dramatically increase the drain on the phone's battery, a drawback for users, who would probably choose to risk malicious code rather than run detection code that costs them battery life for that reason. Scanning only at install time, which is how most existing solutions have avoided the need to scan constantly, allows new, unidentified threats to slip past for some users and remain undetected unless the user initiates a full scan.

In addition to the difficulty of developing solutions on the phone, due to the physical limitations of the device and the necessity that an on-phone solution not significantly drain the battery, smartphone vendors limit what functionality they allow to run on their phones. The tools to develop kernel-level software to combat malicious code are not readily available for most smartphones, and even the platforms that provide such tools do not document them as well as the higher-level, app-level development tools.

2.1.1 Security Threats

A primary security threat is that of root kits on smartphones: as these devices and their operating systems become increasingly complex and more closely resemble small personal computers, they become viable targets for root kits.
Root kits, which compromise and gain long-term control over an infected machine, can be the worst kind of malicious code in that they can expose all of the functionality of the core system to a malicious user.

Figure 2.1: Visualization of the GPS exploit a root kit could perform

Figure 2.2: Normal behavior vs root kit behavior for computer based detection system

With this level of access and control, a malicious user could use the root kit to eavesdrop on conversations, emails, messaging services, and more.[26] Because a root kit has low-level access and can virtually hide itself from any detection tools on the device itself, most conventional tools for finding malicious code on a system could be rendered useless. Figure 2.2 shows how a root kit could hide itself from a detection system. Additionally, Figure 2.1 shows how a root kit could be used to query a compromised phone for its location without alerting the phone user that such a query took place.[15] This is a good real-world example of how a root kit could obtain private information about a user; this one exploit could be a huge personal security issue, since a stalker or assassin could easily use it to gain real-time information about their target's location. An additional security threat discussed in these investigations, one that would be especially serious in places where confidential or classified information is discussed during regular meetings, is that the phone, which is usually overlooked, could be activated by a captured calendar signal and then remotely dial the attacker, allowing them to listen in on the conversation.[15] The ability of root kits to hide from most detection tools, the level of access they can gain on a phone, and the private information phones are exposed to together make a strong motivation for strategies to detect this malicious code on a smartphone.
Figure 2.3 illustrates how this eavesdropping exploit could be accomplished. With this in mind, developing tools to determine whether a phone has indeed been compromised by a root kit is a necessity when considering malicious code detection for smartphones.

Figure 2.3: Visualization of the eavesdrop exploit a root kit could perform

Figure 2.4: Visualization of Smartphones normal power drain versus all radios in use

In addition to posing a huge security issue for users, corporations, and governments, these root kits also create a huge drain on the battery of smartphones. Figure 2.4 illustrates the battery life of three different phones under normal conditions and under heavy resource utilization. The effect is readily apparent when the radios are used to constantly send a large number of messages, such as reports of the phone's location or a relayed conversation from an eavesdropping function. This noticeable change in battery life is the motivation for the research discussed in this work, which focuses on using power change data as a mechanism for malicious code detection.

2.2 Related Work

When looking at related work it is easiest to group the approaches into mitigation, prevention, and detection. We will discuss the related projects in each of these areas in more detail, as well as some limitations they face.

Figure 2.5: Clustering Visualization

2.2.1 Propagation and Mitigation

Understanding the possible mechanisms by which malicious code propagates, and the behavior such propagation takes, is the starting point for research into large-scale mechanisms to contain or limit the propagation of malicious code during an initial outbreak period. This is similar to CDC mechanisms that limit the spread of large-scale viral outbreaks by slowing or stopping the spread through various containment protocols.
Study of current investigations of cellphone virus propagation has led to the understanding that most viruses exhibit clustering behavior: as can be seen in Figure 2.5, a propagating virus affects the devices closer to it more than those farther away. [16] This makes sense, since the vectors available to a phone mostly affect devices near it geographically. The Cabir worm, which affects Symbian phones, replicates over Bluetooth connections by scanning for vulnerable Bluetooth phones in its vicinity. Even a virus using the phone's phonebook would show similar clustering, since people commonly know others who are geographically close to them. A phone's data usage or message-sending volume will dramatically increase if it hosts a replicating piece of malicious code that must send itself to other phones, and the observation that this volume is minimal under normal operation leads to methods to mitigate or slow the spread of this malicious code. [30]

Figure 2.6: Performance of the two mitigation techniques

There are two approaches to implementing code that slows the propagation of malicious code after detecting a change from the expected normal data volume. The first is Williamson's rate limit algorithm, which operates on the phone and, when an increase in data volume is detected, enforces a global limit on all the phone's operations. [30] The second is a proactive group behavior containment (PGBC) algorithm that operates at the messaging server or cell tower level. [16] This second method can detect an increase in a phone's data volume on its network and limit the data rate of that phone and the other phones on its network.
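The rate-limiting idea behind these mitigation approaches can be sketched with a token bucket, shown below. This is our own minimal illustration, not the published form of Williamson's algorithm or PGBC; sends beyond the allowed rate are refused, which a real system would translate into delaying or queueing the traffic.

```python
class RateLimiter:
    """Token-bucket limiter: allows `rate` sends per second, bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens replenished per second
        self.burst = burst    # maximum bucket size
        self.tokens = burst   # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if a send at time `now` (seconds) is within the limit."""
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a real limiter would delay/queue the send instead

# A worm-like burst: 10 send attempts over one second, with 1 send/sec allowed.
limiter = RateLimiter(rate=1.0, burst=2)
sent = sum(limiter.allow(now=i * 0.1) for i in range(10))
print(sent)  # prints 2: the burst allowance, with too little time to refill
```

The sketch makes the trade-off discussed next concrete: the same cap that slows a worm's burst also caps a legitimate heavy user, since the limiter cannot tell the two apart from volume alone.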
Figure 2.6 compares these two algorithms against doing nothing to mitigate propagation; it shows that a layer on a cell tower or in the cell network that detects virus propagation and limits network speed would dramatically limit the number of phones infected over time. Though limiting the data rate seems to be an effective solution for slowing a virus, it would slow the cell network and reduce the functionality of the phone and other phones on the network, behavior that neither users nor cell providers would want to invite merely to limit, rather than prevent, the spread of malicious code. Additionally, these limiting mechanisms may identify normal behavior as malicious and slow regular traffic by reacting to what merely seems to be abnormal behavior, so that when you expect high speed on the cell network you suddenly do not have it. This would be especially noticeable on the new cell networks running at HSPA+ or 4G LTE (Long Term Evolution) speeds: because these networks offer far faster speeds, users who start making full use of them could appear to be malicious devices and suddenly have their speed cut, which would result in very bad feedback for the cellular companies. This is one reason this is not a very viable solution.

2.2.2 Prevention

Another approach provides system-level defenses that challenge the malicious code with a graphical Turing test via a visual CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart).[31] Figure 2.7 shows the steps such a challenge would need to take in its implementation. The challenge is required in order to send or make connections through Bluetooth and messaging (SMS/MMS) services.

Figure 2.7: How a CAPTCHA system could be implemented for use on a smartphone
Since the majority of existing smartphone malware and viruses utilize these two vectors, this could in theory prevent a compromised phone from using them to exploit another phone. It could also prevent the original exploit, if making a Bluetooth connection or processing an MMS payload likewise required the challenge to be met. Because it is nearly impossible for malicious code to guess the correct challenge response, this could greatly reduce the exploitability of smartphones that implement such a solution.

Along similar lines is work on a phone operating system built entirely from trusted code running above a secure Linux kernel. [13] This phone prototype was designed around a base kernel similar to Security Enhanced Linux (SELinux), which provides isolated operational domains. Creating a distinct operational domain for each application makes it tough, if not nearly impossible, for code to gain enough access to mount an exploit, and even a successful exploit would be limited in what it could accomplish compared to a phone without such protections. Android, which we used and investigated for our own approach, follows a similar trusted-code design. [1] A recent development in preventing malicious code is SE Android, an actual release from the NSA (who developed SELinux) that addresses the lack of file and application domain security in Android. [11]

2.2.3 Detection

The work discussed so far has primarily addressed how malicious code propagates once it has exploited a phone, how its spread could be limited or slowed, and what mechanisms could be added to restrict its ability to gain access in the first place. None of it has looked at how to detect malicious code, or at the very least notify the user or the network that a phone has been compromised.
To this end, SmartSiren utilizes a proxy system for its communications that detects an increase in messaging usage and triggers alerts both to the other phones using the framework and to the possibly infected phone itself, notifying them that the phone in question is, or is believed to be, compromised. [18]

Figure 2.8: Visualization of how SmartSiren is implemented

Figure 2.8 shows how the SmartSiren system is set up. This is useful because the other phones can avoid communicating with the possibly infected device, better preventing and limiting the spread of malicious code over the cell network. However, since the system still requires a dramatic increase in messages over a period of time, the malicious code could replicate a number of times before any alert is raised. Moreover, if the propagation rate of the malicious code were limited so as to be indistinguishable from regular usage, it would evade detection entirely — and as users' data usage grows with increasing network speeds and the services provided to smartphones, that is becoming increasingly likely.

Two more detection projects are closely related to some of this work in that they also use power as the basis for detection. The first uses fine-grained power signatures to detect the signatures of known malware and viruses. [25] Its drawback, compared to the power-based techniques we have investigated, is that it requires the malicious code to be known and requires a significantly more detailed power signature than can be obtained on most smartphones. Furthermore, constantly observing power at such a high level of detail would have significantly more impact on the phone's battery than the periodic or triggered polling used in the standalone prototype discussed later and in previous work. Another recent research project is VirusMeter.
VirusMeter [27] uses real-time power analysis similar to the general power-based research in Chapter 5, with only power changes being investigated as a method of detection. This work appeared after we had finished our initial power analysis, but it is interesting for comparison: it uses significantly more machine learning and real-time consumption analysis, whereas we have shown effective detection without that overhead. Additionally, they measured a 1.5% battery drain from running their system compared to a phone not running it; running the data collection underlying our prototype, we measured no difference in battery level with or without the collection running, indicating that its impact over a whole day was negligible to the point that we could not measure it.

A final project that falls into the detection category is RiskRanker. [22] Figure 2.9 shows the general behavior of RiskRanker, which uses static code analysis to filter out apps that are likely to pose significant risk. The authors investigated the libraries and code execution paths of known malicious code, then checked apps against a set of learned rules and analyses to flag an app as a significant risk when it exhibited certain risky behaviors. We think this project offers the most interesting and promising short-term impact, in that it could actually be put to use in Google's and Amazon's app stores to filter out apps that pose significant risk to the user. Since this would likely happen in the store approval mechanism, not on the user's phone or at the network level, it is another example of an approach with minimal impact on the user and great success at removing malicious code from the app stores at the source.
And because RiskRanker uses code access behaviors rather than signatures, it offers a solution that can potentially detect malicious code that has not yet been identified.

Figure 2.9: Visualization of how RiskRanker works

Chapter 3

Malicious Code Simulators

3.1 Motivation for Use of Malware Simulation vs Real Malware

As mentioned in previous sections, most malicious code either has a negative impact on the phone or costs the user money. Additionally, some malicious code roots the user's device and can corrupt the system on top of those negative impacts. Since all of this research is conducted by volunteers on actual user phones, and we do not want to cause these problems on their phones, we opted instead to build simulators that reproduce the behaviors of known malicious code, as described by other researchers. [21][15]

3.2 Design and Implementation

We designed two basic simulators, and then improved one of them to make it more energy efficient, on the assumption that smart malware developers would do the same to try to defeat our approach.

3.2.1 SMS Spam Trojan

The first kind of malicious code we opted to simulate is an SMS spam trojan. This is based on a number of known malicious apps and viruses, and on a common method by which malicious code propagates itself. The simulator was developed on the premise that the malicious code would want to stay under the Android OS limit of 100 SMS messages per hour, and would spread its communications over the hour to limit its impact on the battery at any given moment. The simulator uses the Android alarm facility to send SMS messages periodically. Additionally, it sends the SMS messages to the phone number of the device it is installed on, so as not to spam any other individual.
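As an illustration of the arithmetic behind this design — not the simulator itself, which is an Android app driven by `AlarmManager` alarms — the following sketch spreads just under 100 messages evenly across an hour:

```python
def sms_schedule(limit_per_hour=100, hour_secs=3600):
    """Return send times (seconds into the hour) that spread
    `limit_per_hour - 1` messages evenly: stay strictly under the
    Android 100-SMS/hour cap while avoiding a burst of battery drain."""
    count = limit_per_hour - 1        # 99 messages, under the cap
    interval = hour_secs / count      # even spacing over the hour
    return [round(i * interval) for i in range(count)]

times = sms_schedule()
print(len(times), times[1] - times[0])  # 99 messages, ~36 s apart
```

The even spacing is what makes this simulator a useful test case for power-based detection: it produces a sustained, moderate drain rather than one large spike.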
3.2.2 Location Tracking Malware

The second kind of malicious code is based on a proposed rootkit that could be installed on a user's phone to give a malicious party real-time access to the user's location. [15]

3.2.2.1 Power Naive Version

Our first implementation was rather naive and did not make full use of Android OS features that would have allowed the app to use significantly less power. This version constantly polled the GPS for the user's location and, whenever the user moved significantly, sent an SMS message back to the user's own phone — again, to ensure it affected no other user and communicated the user's location to no third party. This version of the location tracking malware was used when we initially investigated the potential of general power analysis.

3.2.2.2 Power Aware Version

When we designed the location-based power detection data collector, and eventually the standalone location-based malicious code detector, every design decision aimed to mitigate the impact on the user, including limiting the battery drain caused. We therefore fully investigated better methods of obtaining location data without being power hungry. To limit the power cost of gathering location data, instead of constantly polling the OS for the current location, we used Android's intent system to have the OS signal our service when the phone had moved a certain distance. This greatly reduced the CPU use of the location tracking malware, even though we still used fine-grained GPS-based location services and a rather small alert distance, unlike the location-based data collector and thesis project. As we will show in Chapter 5, this improved approach uses even less power than the SMS spam trojan simulator, which itself used far less power than the power naive version of this malware simulation.
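The power saving comes from inverting control: rather than the app polling for fixes, the OS reports only after a minimum distance has been covered (on Android this corresponds to the `minDistance` argument of `LocationManager.requestLocationUpdates`). A minimal sketch of that trigger logic, with coordinates and thresholds of our own choosing:

```python
import math

def distance_m(a, b):
    """Approximate ground distance in meters between two (lat, lon)
    points, using a local equirectangular approximation."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return math.hypot(x, y) * 6371000  # mean Earth radius in meters

class DistanceTrigger:
    """Fire a callback only when the device has moved at least
    `min_distance` meters since the last reported position."""
    def __init__(self, min_distance, callback):
        self.min_distance = min_distance
        self.callback = callback
        self.last = None

    def on_fix(self, latlon):
        if self.last is None or distance_m(self.last, latlon) >= self.min_distance:
            self.last = latlon
            self.callback(latlon)

reports = []
trig = DistanceTrigger(min_distance=100, callback=reports.append)
trig.on_fix((40.0076, -105.2659))   # first fix is always reported
trig.on_fix((40.0077, -105.2659))   # ~11 m of movement: suppressed
trig.on_fix((40.0090, -105.2659))   # ~156 m from last report: reported
print(len(reports))  # 2
```

In the real malware simulator the filtering happens inside the OS, so the app's code (and its CPU time) is invoked only for the reported fixes, not for every GPS sample.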
3.3 Future of Malware Simulations

In future work, the malware simulators could continue to be expanded to further investigate the effectiveness of both this current work and any future extensions of it. This could include simulating other known malicious code behaviors or changing the vectors of propagation. Finally, the simulators' power use could be minimized further, ensuring that each simulated behavior makes the best possible use of the Android OS to reduce its battery impact; this is the logical approach future malware will likely take to combat power-based anomaly detection techniques, whether those proposed in this work or those mentioned in Chapter 2.

Chapter 4

Computer Based Detection Through File Integrity

4.1 Motivation for Computer Based Detection

Computer based detection is motivated by the observation that users often connect their phones to their computers roughly daily to sync them. Our premise was that by exploiting this behavior of most smartphone users, we could scan from the computer rather than on the phone, letting the computer's superior processing power and wall power enable more sophisticated scanning without impacting the user.

4.2 Design and Implementation

When we began to look into what scanning from a computer would require, we realized it would need access to all the files and, more importantly, would likely copy the files to the computer as part of the scanning process. Based on only 2 GB of total files, including external storage, on our test phone at the time, we discovered that it would in fact take upwards of 2 hours to get all the files off the phone in order to scan them. This led to the conclusion that no user would be interested in leaving their phone plugged in that long to look for malicious code as part of their regular sync schedule.
To this end, we decided that if we instead developed a way to determine which files had changed, via some kind of file integrity check, we could reduce the time significantly, and even prioritize files by how often they regularly change so that rarely-changing files are examined first.

To determine the effectiveness of this approach we built a prototype that checks for changed files via this file integrity mechanism, using the Android ADB shell access tool on our Android test phone. The ADB tool provides file access to the phone, and is also how we tested copying all the files off it. The built-in shell lacks much of the functionality of a full Linux platform, so to get some much-needed shell tools we used BusyBox, a package that can be added to rooted Android devices. [3][4] This provided a more complete version of the Linux ls command, which reports details about the entries listed: whether they are directories, regular files, executables, and so on. The benefit is that we could build a shell script, driven from the computer side, that uses this version of ls to navigate the file structure and discover all of the files in the system.

Having a way to navigate the file structure from the remote computer interface, we next needed a way to determine whether files had changed. This is commonly approached with a hash, so we again turned to the extra features BusyBox provides in the way of hashing tools. We decided to investigate the runtime of hashing all the files on the phone with each of the BusyBox hashing tools — cksum, md5sum, and sha1sum — since their computational complexity differs, leading to the assumption that a more secure hash would likely take longer to run.
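The comparison logic itself — independent of whether the digests come from BusyBox over ADB or elsewhere — reduces to hashing each file and diffing against a stored table. A sketch, with a plain dictionary standing in for our database:

```python
import hashlib
from pathlib import Path

def hash_file(path):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def integrity_check(root, stored):
    """Hash every file under `root`; return (changed, new) path lists
    relative to the `stored` {path: digest} table, updating the table
    in place so the next run compares against the current state."""
    changed, new = [], []
    for p in sorted(Path(root).rglob("*")):
        if not p.is_file():
            continue
        digest = hash_file(p)
        key = str(p)
        if key not in stored:
            new.append(key)
        elif stored[key] != digest:
            changed.append(key)
        stored[key] = digest
    return changed, new
```

Only the files in `changed` and `new` then need to be copied off the phone for full scanning, which is what collapses the 2-hour full transfer to a few minutes.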
We also developed a database on the computer to store the calculated hashes and file paths, so we could measure the time to rehash all the files on subsequent integrity checks. Additionally, we used this database to investigate how frequently some files change relative to others, which is discussed in Section 4.3.

4.2.1 Keyed Hash Design

As we worked on developing the regular hashing system, it occurred to us that a sophisticated piece of malicious code could circumvent this approach simply by compromising the hashing executable, either to ignore its own files or to return a correct hash to the computer, which would cause any files it corrupted to be ignored. This can be visualized in Figure 2.2. To make it more difficult for such sophisticated code to work, or at least to force it to add a delay long enough to be detected, we decided to investigate using a keyed hash in place of the simple hashing mechanisms. Unfortunately, BusyBox does not provide any keyed hash tools in its package of shell tools. So, to provide a keyed hash at least for investigating how much longer it would take than the simple mechanisms, we built a shell script implementing the HMAC algorithm: it takes a key and a file path as parameters and outputs a keyed hash computed from those inputs.

4.3 Results and Discussion

Table 4.1: Average runtimes for the hashing mechanisms investigated as part of the file integrity based detection method, showing both the first-run (initialization) runtime and the subsequent comparison-check runtime.

Algorithm   | Initialization | Comparison Check
cksum       | 377 secs       | 357 secs
md5         | 375 secs       | 384 secs
sha1        | 366 secs       | 380 secs
keyed hash  | 372 secs       | 378 secs

To obtain results for comparison, we ran the prototype with each of the given hashing mechanisms.
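Returning to the keyed hash of Section 4.2.1: the construction our shell script implements is the standard HMAC of RFC 2104. For reference, the same computation in Python (over SHA-1, one of the digests BusyBox provides) is only a few lines:

```python
import hashlib

def hmac_sha1(key: bytes, message: bytes) -> str:
    """HMAC per RFC 2104: H((key ^ opad) || H((key ^ ipad) || msg))."""
    block = 64                             # SHA-1 block size in bytes
    if len(key) > block:
        key = hashlib.sha1(key).digest()   # long keys are hashed first
    key = key.ljust(block, b"\x00")        # then zero-padded to a block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha1(ipad + message).digest()
    return hashlib.sha1(opad + inner).hexdigest()
```

Because the key never leaves the computer until check time, malicious code on the phone cannot precompute the expected digest of a file it has modified; it must either store pristine copies or intercept the check, both of which cost time.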
Additionally, we ran it with a cold database to initialize the system, computing all the hashes and inserting them into the database. We then measured the completion time to compute all the hashes and compare them against these stored hashes. We did this a few times for each of the hashing mechanisms to compute an average initialization time and comparison check time for each. Table 4.1 shows the results we observed from the prototype for these cases.

The results are a bit surprising compared to what we expected. We expected the initialization to take longer than the comparison check, since initialization requires more writes to the database, which tend to be slower; seemingly this was not the case. Another interesting result is that there is no real difference in runtime between the various hashing methods. This is counter-intuitive; however, we believe the runtime bottleneck was neither the database nor the calculation time of any of the hashing methods, but the limited USB 2.0 connection found on the smartphones of the day. The benefit is that we could use the more computationally secure keyed hash without any noticeable delay over the other methods. Additionally, we can determine which files have changed in far less than 2 hours, and as we discuss further in Section 4.3.1, this also greatly reduces the number of files that would need to be scanned by any computer based scanning tool that employed this mechanism.

4.3.1 File Change Frequency

An additional consideration we investigated was file change frequency. We did this by running the prototype with one of the hashing mechanisms, waiting about an hour, and then looking at which files had changed — and then repeating this every hour a few more times. This gave us a reasonable picture of which files change constantly and which change over the course of a day.
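One way to turn such change-frequency data into a scan order is sketched below; the file names and counts are hypothetical. Files with the fewest historical changes come first, so a change in a normally static file is examined immediately:

```python
def update_change_counts(counts, changed_files):
    """After each integrity run, bump a change counter per file."""
    for f in changed_files:
        counts[f] = counts.get(f, 0) + 1
    return counts

def scan_priority(changed_files, counts):
    """Order this run's changed files so that files with the fewest
    historical changes (highest risk if they change now) come first."""
    return sorted(changed_files, key=lambda f: counts.get(f, 0))

# Hypothetical history: a temp file churns constantly, a system
# binary has never changed before.
history = {"tmp/cache.db": 40, "app/data.db": 5, "system/bin/ls": 0}
changed_now = ["tmp/cache.db", "system/bin/ls", "app/data.db"]
print(scan_priority(changed_now, history))
# ['system/bin/ls', 'app/data.db', 'tmp/cache.db']
```

The ordering adds essentially no overhead on top of the integrity check, since the counts are already a by-product of comparing successive runs.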
We also ran it once a day across a few days to get an idea of which files change on a more daily basis. Many subjective factors affect these results; however, we wanted to use them only to support improving the previous methodology, by observing that some files change constantly, others change less often, and some rarely change, if at all. This allows us to prioritize scanning files that rarely change but have just changed, and then to progress through the remaining files in order of how often they tend to change. The thought behind this is that if a file that has never changed before suddenly changes, malicious code could be behind it, and it should take top priority. This priority queue would add essentially no extra overhead to the prototype, but it could benefit users who do not always let the reduced scan complete, since it ensures the highest-risk files are scanned first.

We observed that there were in fact files that changed constantly; looking closer at these files, most tended to be temporary files used by the system or other apps. Additionally, there were some files that never changed, even over the week of checking. This indicates that our proposed approach of using a risk-based priority queue for scanning files could in fact be built and deployed in such a system.

4.4 Future Applications

As will be apparent to any current smartphone owner, the habit of plugging in your phone daily, or more often, to sync calendars, contacts, and so on is no longer even a consideration for most smartphone users, as everything has migrated to the cloud. This approach has therefore lost its effectiveness in its current form; however, it could be applied to future applications that make use of the cloud.

• Cloud Based Virus Scanner.
It could be used as part of a cloud based virus scanner: file integrity mechanisms could reduce the number of files the scanner needs to examine and potentially offload to the cloud.

• Local Network Based Virus Scanner. Recent versions of the iPhone and some Android phones can sync with iTunes or similar software over the same Local Area Network (LAN). A future product could similarly scan a phone for malicious code from the LAN, and this approach could greatly reduce the number of files such a tool would need to scan.

These are the future applications we have thought of, and there are potentially more, as file integrity is commonly used for many purposes and will likely continue to be used in the future.

4.5 Limitations

We never addressed how to handle the case where the hashing tool itself is compromised. However, given the consistent completion times of the comparison checks, and the variety of steps that malicious code would have to take to return a valid or unchanged hash to the computer, we would expect a significant, detectable delay that could raise a flag alerting the user that malicious code has potentially corrupted the hash mechanism. The keyed hash based detection technique is quite strong; however, a strong rootkit can evade detection by keeping copies of the unchanged files in addition to the modified files. In such a case, the rootkit can easily compute the correct keyed hash, since it has access to the unmodified file. Note that the rootkit can also avoid detection of a modified file by infecting all system calls, preventing the system from revealing the modified files or the memory occupied by them. It is possible to come up with some ad hoc solutions to address these problems.
For example, to ensure that the amount of free space reported by the operating system is correct, a user could write data into all of the reported free space and then read it back to verify that the writes actually took place. However, such ad hoc solutions are too expensive in terms of time and the potential long-term effects of writing to and deleting from flash storage — and the whole motivation of this research is to limit the negative impacts on the user while detecting malicious code on the system. Another thought is that the time a system takes to hide itself might be statistically significant compared to the normal completion times of the hash mechanisms it is hiding from; through this observation we could at least raise a flag that malicious code has potentially corrupted the user's phone.

Chapter 5

General Power Based Detection

5.1 Motivation for General Power Based Detection

In general, it is extremely difficult to detect the presence of strong rootkits, since the entire system software of the phone is under the rootkit's control. The only way to detect such rootkits is to observe the phone's behavior from an external source and look for abnormalities. For example, applications may run slower when a rootkit is present, or there may not be enough memory left to store a large file if the rootkit occupies a significant amount of space. With this power usage detection technique we monitor the rate of battery drain to detect abnormal situations. The key idea is that malware or a rootkit will produce noticeable, anomalous power usage in comparison to the regular power usage. The technique first develops a profile of the phone's normal power usage by collecting power usage data over several weeks, and then uses this normal power profile to detect any significant deviations in power consumption.
In addition to rootkits, a large class of malicious code behaviors would also consume significant amounts of power and would likely be detectable. This observed behavior of known malicious code, and the power drain such behavior would be expected to cause, is the primary motivation behind using power as a mechanism for determining whether malicious code is present on a system.

5.2 Design and Implementation

For general power based detection we initially just wanted to gather data quickly and observe whether it could be an effective approach, since it would likely be used as part of a larger system.

Figure 5.1: Battery level change for Day 2

To this end we made use of an existing app on the Google Play store called Battery Graph. [7][2] Battery Graph can collect the battery level at regular increments ranging from every 1 minute to every 5 minutes. We opted for 5-minute battery level collection, as we wanted to limit the frequency at which data was recorded. The app can export data to a file, but only the past week's worth, which made it difficult to get significant numbers of participants to collect data during our general power based detection push. As such, all the data and analysis in this section are for a single phone. We did revisit general power analysis later with data collected from the standalone thesis app, which gathered data from over 100 participants. Once we had gathered a month's worth of data, we combined all the export files into a single file and used that file for all of our observational data.

5.3 Results and Discussion

From the collected data, we first decided to look at what the battery level did over the course of a day, to see whether there was anything beneficial to take away from it.
Figure 5.2: Battery level change for Day 9
Figure 5.3: Battery level change for Day 15
Figure 5.4: Battery level change for Day 18
Figure 5.5: Graph of percent change in battery level over 2000 samples
Figure 5.6: Success percentage for the given time ranges investigated against the power naive location tracking malicious code simulator.

Figures 5.1, 5.2, 5.3, and 5.4 are a subset of the daily graphs and give a general picture of what each day's battery graph looks like. The key takeaway is that they are all different, yet similar in overall shape. Looking at all of the data, we saw no likelihood of using day-to-day comparison of power use graphs as a method for detecting anomalous behavior: although the general behavior is similar, the graphs vary a great deal from day to day, driven by anything from the user waking up at a different time to making more phone calls that day. We therefore decided that looking at the graphs and the data in this fashion was not going to be very beneficial.

Instead, we decided that the data we really wanted was the change between data points, in terms of power drop. Figure 5.5 graphs this change across 2000 data points. The key observation for us was that in only 4 occurrences — 0.2% of the time — did the power drop more than 4 percent within a 5-minute increment. This indicated that the observed power drop would be minimal the majority of the time, and that we could potentially classify the significant spikes as anomalous and likely indicative of malicious code. We then collected power data for the two malicious code simulators we had at this point, the SMS spam trojan and the power naive location tracker, which allowed us to observe how successful such an approach might be at detecting these malicious code simulations.
Figure 5.7: Success percentage for the given time ranges investigated for capturing the SMS spam malicious code simulator data.
Figure 5.8: Percentage of normal data that would be considered malicious at the different cutoffs for the time ranges investigated.

When we first compared the simulation data against the 5-minute increment data, we observed that detection was not very successful, though very little data would be counted as false positives at the various rate-of-change cutoffs above which we consider all data anomalous. Our malicious code simulations tended to produce a sustained power drain rather than an instantaneous change, so we investigated the success and false positive percentages at time increments ranging from the original 5 minutes up to 30 minutes. This new view of the data had two noticeable effects: as the observation window increased, the quantity of malicious code captured increased, but the observed false positive percentage increased as well. For general power use, then, there is a significant trade-off between success and false positive percentage. Figure 5.6 shows how successful the different time windows were at capturing the power naive location tracking malware simulator: it was easily captured 100% of the time for windows exceeding 20 minutes, with a drop of 2.0% or more flagged as anomalous. Figure 5.7 shows the same comparison for the SMS spam trojan simulator, which still had a statistically significant capture percentage at the 20-minute window and 2.0% cutoff, with over 50% of the data captured and identified as anomalous.
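The sliding-window analysis can be sketched as follows; the battery samples here are illustrative, not our measured data:

```python
def flag_windows(levels, window=4, cutoff=2.0):
    """`levels` are battery percentages sampled every 5 minutes.
    With window=4 (a 20-minute span), flag each window whose total
    battery drop exceeds `cutoff` percent as anomalous."""
    flags = []
    for i in range(len(levels) - window):
        drop = levels[i] - levels[i + window]
        flags.append(drop > cutoff)
    return flags

# 5-minute samples: idle drain at first, then a sustained extra drain
levels = [100, 100, 99, 99, 98, 96, 95, 93, 92, 92]
print(flag_windows(levels))
# [False, True, True, True, True, True]
```

Widening `window` captures sustained drains (like the SMS spam trojan's) that never exceed the cutoff in any single 5-minute step, which is exactly the trade-off observed above: more captures, but also more normal windows crossing the cutoff.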
Finally, the false positive capture percentages can be seen in Figure 5.8, which shows a reasonable false positive percentage for all time windows at the above-2.0% cutoff: about 0.6% at the 20-minute window with the 2.0% cutoff.

5.3.1 Statistical Approach with New Data

In the recent deployment of a standalone project, described further in Chapter 6 as part of the thesis work, we collected power data when the battery level changed, instead of on a regular schedule. We did this to reduce the impact the data collection app had on the battery drain observed by the user. We also got a large number of phones to run this app — over 100 phones currently reporting data — and got a number of individuals to run malicious code simulations whose data was likewise reported back. Again, further details of this project's design and implementation are discussed later; however, through post-processing we were able to extract data segments and average rates of change, allowing us to revisit general power analysis using the statistically driven model from our later project. We could also evaluate, under this statistical model, how well general power analysis fares against the new power aware location tracking simulator, which is more power conservative than our earlier approach. Table 5.1 contains the resulting data for the phones against which we could test simulation data. The interesting observations from Table 5.1 are, first, that the calculated false positive percentage is extremely low for both sigma values — a sigma value being the number of standard deviations beyond the mean at which we consider any rate of change anomalous. Second, for a number of the phones we could detect the SMS spam trojan; however, we were unable to detect the new power aware location tracker on any of the phones.
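The sigma test itself is simple: learn the mean and standard deviation of a phone's normal rates of battery change, then flag any rate beyond mean plus sigma standard deviations. A sketch with made-up drain rates:

```python
import statistics

def sigma_threshold(normal_rates, sigma):
    """Anomaly cutoff: mean + sigma standard deviations of the
    phone's normal per-segment battery drain rates (%/hour)."""
    mu = statistics.mean(normal_rates)
    sd = statistics.pstdev(normal_rates)
    return mu + sigma * sd

def is_anomalous(rate, normal_rates, sigma=2.5):
    return rate > sigma_threshold(normal_rates, sigma)

# Illustrative drain rates (%/hour) from a phone's normal segments
normal = [3.1, 2.8, 3.4, 3.0, 2.9, 3.2, 3.3, 2.7]
print(is_anomalous(9.0, normal))   # sustained spam-like drain: True
print(is_anomalous(3.5, normal))   # within normal variation: False
```

This also illustrates the weakness noted below: a few legitimate high-power segments (navigation, gaming) in the normal data inflate both the mean and the standard deviation, pushing the threshold above what an efficient piece of malware, like the power aware location tracker, ever causes.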
These results indicate two things. First, general power based detection is not finely attuned to small increases, since it may observe legitimate high-power events that drive up the mean and standard deviation for the phone. Second, our new power aware approach to location tracking yields significant power savings, and is closer to the approach we would expect future malicious code of this type to take in order to hide from power based detection techniques. The fact that general power based detection is effective to some extent, but not nearly as successful as we had hoped, led to some of the further investigations and to the development of the thesis work discussed later. Its virtue is a very low false positive percentage, along with the ability to detect some of the malicious code simulations on a few of the phones.

5.4 Future Applications

This work is the basis and foundation of the location and time domain based work that comes later in this thesis. The general understanding gained from it is that we can in fact use power as a method for detecting malicious code on a smartphone; however, we need either to improve the sophistication of the anomaly detection approach, or to add other variables to the detection, as the location and time domain based techniques do by grouping the data according to the differing ways users use their phones.

Table 5.1: Effectiveness of general power analysis through the use of statistical analysis with sigma values.
Device              FP % (sigma 2.5)  FP % (sigma 3.0)  SMS       Location
Nexus 4 (1)               0.46              0.46        Detected     -
Nexus 4 (2)               0.54              0.43        Detected     -
Nexus 4 (3)               0.56              0.54        Detected     -
Nexus 4 (4)               0.39              0.31        Detected     -
SAMSUNG-SGH-I727          0.28              0.26        Detected     -
SAMSUNG-SGH-I747          0.23              0.23        Detected     -
Galaxy Nexus              0.30              0.24           -         -
SCH-I535 (1)              0.36              0.36           -         -
SCH-I535 (2)              0.65              0.57           -         -

Chapter 6

Location-Based Power Detection

6.1 Motivation for Location Based Power Detection

Our investigation of the general power based detection technique led to the conclusion that additional details were needed to more finely tune the power profile. To that end, we considered what factors change the way a person might use their phone, such as location. For instance, a user would likely use their phone differently at home than at work, or while driving to work. This is the motivation behind location based power detection. By developing a normal profile for the locations a user visits, we were able to improve the accuracy of detecting malicious code while continuing to have a low rate of false positives.

6.2 Design and Implementation

We created two different prototypes to investigate location-based power detection: a data collector, and a standalone app that built on that foundation. For both prototypes, every design choice prioritized minimal power use and minimal impact on the user while the prototype was running. This was successful to the point that the app caused no measurable impact in tests.

6.2.1 Design Choices

Some of these design choices apply only to the standalone project, and they are identified as such. Additionally, certain design features are merely implementation details (such as malware simulator integration) and do not directly relate to reducing the app's impact on the phone.
Event Triggering We utilized the Android OS intent system, which allows a service to be signaled when a specified event occurs, such as the battery status changing or the location changing. Below are further descriptions of each of these intents, their implementations, and the additional details each intent provided. Using event triggering also means we do not have to constantly poll the system for data, and the app can remain idle when no event has occurred. Additionally, our investigations have shown that we spend an average of only about 400ms per event to process and record it. This is not a lot of time, and the benefit is that we use a very minimal amount of CPU time to record the data when an event does occur.

Battery Intent For the battery intent, a receiver class was created that would be instantiated any time the battery status changed. It would act differently based on the classification of the event that changed the battery status.

Plug Status If the phone was plugged in or unplugged, this was recorded appropriately, either in a file for the earlier project or in the appropriate database table for the standalone project. The biggest benefit of the plug status is that we could drive secondary services and events based on it. As will be discussed in a later design choice, we used these events to either start or stop on-phone processing of data and sending to the central server, depending on the plug status.

OS Event If the phone was being shut down or rebooted, this was recorded as an OS event in the file or database, depending on the project. This event was recorded because we did not want our data to include a segment that spanned the phone rebooting or being shut down for a period. Accordingly, a shutdown event was used as a signal to start a new segment.
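The segmenting behavior described here, where plug and shutdown events bound each discharging segment, can be sketched as follows; the event format is illustrative, not the project's actual schema.

```python
from datetime import datetime

def extract_segments(events):
    """Split a stream of battery events into discharging segments and return
    each segment's average rate of change in percent per second.
    Each event is (iso_timestamp, level_pct, kind), where kind is 'battery',
    'plug', or 'shutdown'. A plug or shutdown event closes the current
    segment, mirroring the design described above."""
    segments = []
    current = []
    for ts, level, kind in events:
        if kind == "battery":
            current.append((datetime.fromisoformat(ts), level))
        else:  # 'plug' or 'shutdown' ends the open segment
            if len(current) >= 2:
                elapsed = (current[-1][0] - current[0][0]).total_seconds()
                drop = current[0][1] - current[-1][1]
                segments.append(drop / elapsed)
            current = []
    if len(current) >= 2:
        elapsed = (current[-1][0] - current[0][0]).total_seconds()
        drop = current[0][1] - current[-1][1]
        segments.append(drop / elapsed)
    return segments

events = [
    ("2013-05-01T09:00:00", 90, "battery"),
    ("2013-05-01T09:30:00", 88, "battery"),
    ("2013-05-01T10:00:00", 86, "battery"),
    ("2013-05-01T10:00:01", 86, "plug"),      # plugged in: segment ends
    ("2013-05-01T12:00:00", 100, "battery"),  # new segment after charging
    ("2013-05-01T13:00:00", 95, "battery"),
]
rates = extract_segments(events)
print([round(r, 6) for r in rates])  # [0.001111, 0.001389]
```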
Battery Changed The most important battery intent was the battery status intent that indicated the battery level changing. From this intent we parsed out additional information to get a full picture of the status of the battery at each status change. In addition to the battery level, we also recorded the current plug status, battery status, voltage, and current. Some of this data was used to facilitate parsing out discharging power segments for statistical calculation; however, much of the data (such as the voltage and current) was recorded but has not yet been applied to any of the research conclusions. The reason for not using the voltage and current is that not all phones provided access to this information, and this inconsistency meant it was not applicable at large scale across all phones. But as there was no real cost to collecting this data, we opted to collect it for potential future investigations.

Location Intent The location was not actually handled through the intent service; instead, we made use of Android's LocationListener class, which allows us to set up an action when the location has moved beyond a set distance from the previous location. The benefit of this is that we do not have to constantly poll for the location. We used a distance of 1500 meters as the event notification distance to cause the location service to wake up and make use of the new location.

Coarse Location To further limit the power required to get the location information, we only made use of the coarse location features of Android that don't utilize GPS-based location information. The benefit of this is that it relies only on WiFi and cellular network location data, which uses next to no additional power since the phone is constantly making use of this data anyway. Since we were using a very large bubble for a location, the location data did not need to be extremely accurate for the data collection to function correctly.
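The 1500-meter location bubble can be sketched as a simple distance check against already-known locations; the helper names and coordinates below are illustrative, not from the project.

```python
import math

EARTH_RADIUS_M = 6371000.0
LOCATION_RADIUS_M = 1500.0  # the "bubble" size used in the project

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in meters (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def match_location(fix, known_locations):
    """Return the index of the first known location whose bubble contains
    this fix, or None if the fix should be recorded as a new location."""
    for idx, loc in enumerate(known_locations):
        if distance_m(fix[0], fix[1], loc[0], loc[1]) <= LOCATION_RADIUS_M:
            return idx
    return None

# Two Boulder-area coordinates roughly 1 km apart share one location bubble.
known = [(40.0076, -105.2659)]
print(match_location((40.0150, -105.2705), known))  # within 1500 m -> 0
print(match_location((40.0760, -105.2659), known))  # far away -> None
```

Because the bubble is large, the coarse (non-GPS) fix accuracy is more than sufficient for this check.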
Data Recording The standalone project improved over the earlier data-collection-only project by providing a sophisticated method for getting data back from the users who were running it. The earlier project simply recorded all the battery and location data to a single file on the user's SD card. This made it troublesome to process the data and, more importantly, to determine whether locations had previously been visited. The standalone project made use of the SQLite database feature available in the Android API to make data manipulation simple on the phone and to make sending the data back easier. This had a number of benefits:
• Made it quick and easy to store new data.
• Made sending data back to the central server simpler (this will be discussed later).
• Allowed tables for individual locations to be created, so the data was already parsed, which reduced the overhead of processing it.
• A table of visited locations allowed easy verification of whether a location had already been visited, without much additional work.
• Allowed integration of the simulator, as simulation data could easily be recorded in its own table.
• Used far less space in the SQLite format than as a data file on the SD card (the data file approach was used for the data-collection-only prototype).
• Worked on all phones, as some newer phones don't have external SD cards, and those that do have varying paths to them. Not using the SQLite database approach would thus have greatly reduced the set of phones the standalone project could work with.

Data Processing The data collected by the phone needed to be processed on the phone in order to drive the statistics-based detection mechanism used for the standalone project. This meant querying all of the data in every table to ensure it met the requirements to be considered a viable location, and then processing the data to calculate the statistics.
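The database layout just described might be sketched as follows; table and column names here are illustrative, not taken from the project.

```python
import sqlite3

# A minimal sketch of the kind of schema described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations_visited (
    location_id INTEGER PRIMARY KEY,   -- anonymized location identifier
    first_seen  TEXT
);
CREATE TABLE battery_events (
    location_id INTEGER,
    timestamp   TEXT,
    level       INTEGER,               -- battery percentage
    plugged     INTEGER,               -- 0 = unplugged, 1 = plugged
    sent        INTEGER DEFAULT 0      -- flagged once uploaded
);
CREATE TABLE simulation_data (
    simulator   TEXT,                  -- e.g. 'sms_spam' or 'location_tracker'
    started     TEXT,
    stopped     TEXT
);
""")

# Checking whether a location was already visited is a single indexed lookup.
conn.execute("INSERT INTO locations_visited VALUES (0, '2013-01-01')")
seen = conn.execute(
    "SELECT 1 FROM locations_visited WHERE location_id = ?", (0,)
).fetchone() is not None
print(seen)  # True
```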
To reduce the time spent on this processing, as well as the impact on the user, a number of design choices were utilized.

Data Check To reduce the overhead of data processing, if the data for a given location didn't meet the training requirements of at least a month's worth of data and a minimum amount of time spent at that location during the month, then processing immediately moved on to the next location without doing anything with that data.

Segment Table We created a table to hold segments already created and processed from the per-location tables. Data successfully stored in this table was then flagged as processed in its source table, to prevent it from being included in future processing. The benefit of this is that segments are retained after they are parsed out of the data, greatly reducing processing overhead as the quantity of data continues to grow. It also allows data that has already been processed to be filtered out of future queries, reducing the resources needed for processing.

Plug Status As mentioned with the battery intent, we can discover whether the battery intent was caused by an event such as the phone being plugged in or unplugged. The benefit of this information, beyond triggering the end or start of a segment in the processing, is that processing can be initiated when the phone is plugged in, which avoids the power drain that processing the data might otherwise cause. We also built in a mechanism to halt the processing, without negative impact on the integrity of the data, if the phone is unplugged during the process.

Threaded To reduce the impact on other applications, and to allow for the full potential of processing (especially on newer multi-core phones), we made use of Android's AsyncTask class, which spawns a new thread for processing. In this case, each new thread processed a single location.
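The data check above might look like the following sketch, using the 30-day and 50-segment training minimums cited elsewhere in this chapter; the function name is illustrative.

```python
from datetime import date, timedelta

MIN_TRAINING_DAYS = 30
MIN_SEGMENTS = 50  # per the 30-day / 50-segment training minimums

def meets_training_requirements(first_seen, last_seen, segment_count):
    """Return True only if this location has enough history to be a viable
    detection location; otherwise processing skips it immediately."""
    return (last_seen - first_seen >= timedelta(days=MIN_TRAINING_DAYS)
            and segment_count >= MIN_SEGMENTS)

print(meets_training_requirements(date(2013, 1, 1), date(2013, 3, 1), 120))  # True
print(meets_training_requirements(date(2013, 4, 25), date(2013, 5, 17), 80))  # False: under 30 days
```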
AsyncTask runs as a background task designed to allow other threads to have higher priority, while at the same time allowing more than one location to be processed at a time. In addition to making the processing threaded in the background, we also made use of a WakeLock from the system to keep the CPU fully active even when the screen is off, so that processing can make full use of the processor during this multi-threaded, plugged-in processing. This also prevents issues we observed where processing would halt due to the CPU going to sleep when the WakeLock was not employed.

Data Sending The key design addition over the earlier work that the newer standalone project provided was receiving data back from the phones running the project. As mentioned, the earlier project stored everything in a data file. This file had to be manually copied from the SD card and mailed to us by the user in order for us to access the data. The goal of the standalone project was to make it as non-interactive as possible. As such, we needed a way to retrieve the data that would be efficient, reliable, and able to distinctly identify the phone. This led to the following design choices for sending data back successfully:

HTTP Based Send Using the HTTP POST mechanism to send the data back row by row allowed a few things to happen. First, we could get a success response from the server if the data was successfully received and stored on the central server side; this allowed the next design choice to be effective. Second, we could send one table row at a time and insert one row at a time on the server side, which reduced the code complexity of sending data.
This resulted in significantly more sends than packaging the data into a larger POST message would have; however, it ensured we could send a single row if that was all that needed to be sent, rather than waiting for larger quantities of data to accumulate first. Additionally, sending single rows avoided the overhead of implementing scalable data packets on both the phone side and the server side.

Only Send Once By making use of the HTTP POST response from the server, we could parse out a success or failure for each send. If the send succeeded, we flagged that row as sent in the database, preventing duplicate sends.

Threaded The original reason we learned about AsyncTask is that in more recent versions of the Android OS, all HTTP communications must occur wrapped in an AsyncTask class. The benefit of this is that the UI won't hang for a user of the application while HTTP traffic occurs, since the traffic isn't part of the application or service process. Making use of AsyncTask, we were able to make the sending process multi-threaded and make better use of the processor. We additionally made use of the WakeLock (mentioned for data processing) for sending, to allow full use of the CPU while the screen was off.

Plug Status We would only start sending data while the phone was plugged in. Additionally, a mechanism was built in to stop future sends if the phone became unplugged, to avoid any unnecessary power drain.

WiFi Status Initially, we required a user's phone to be connected to WiFi to send, in order to avoid any data charges that could potentially be incurred by the app sending out large quantities of data over the cellular network. We later made this optional, as some users stated that they never used their WiFi since they had unlimited data.
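The send-once bookkeeping described above might look like the following sketch, with a stand-in `post` callable in place of the real HTTP POST; table and column names are illustrative.

```python
import sqlite3

def send_unsent_rows(conn, post):
    """Send each unsent row via the supplied `post` callable (which stands in
    for the HTTP POST to the central server and returns True on success).
    A row is flagged as sent only after the server acknowledges it, so failed
    sends are retried later and successful ones are never duplicated."""
    rows = conn.execute(
        "SELECT rowid, location_id, timestamp, level FROM battery_events "
        "WHERE sent = 0 ORDER BY rowid"
    ).fetchall()
    for rowid, *payload in rows:
        if post(payload):  # only flag on a success response
            conn.execute(
                "UPDATE battery_events SET sent = 1 WHERE rowid = ?", (rowid,)
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE battery_events "
             "(location_id INTEGER, timestamp TEXT, level INTEGER, "
             "sent INTEGER DEFAULT 0)")
conn.execute("INSERT INTO battery_events VALUES (0, '2013-05-01T09:00', 90, 0)")
conn.execute("INSERT INTO battery_events VALUES (0, '2013-05-01T09:30', 88, 0)")

# Simulate a server that rejects the second row on the first pass.
responses = iter([True, False])
send_unsent_rows(conn, lambda payload: next(responses))
remaining = conn.execute(
    "SELECT COUNT(*) FROM battery_events WHERE sent = 0").fetchone()[0]
print(remaining)  # 1 row left to retry on the next plugged-in send
```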
When users were using WiFi, we utilized a WiFi lock to ensure that power to the WiFi antenna wasn't reduced while the phone screen was off. For newer phones running a more recent version of the Android OS, we enabled the high-performance WiFi lock to allow sends to happen even faster. Since we required the phone to be plugged in while sending, power use wasn't an issue.

Privacy We wanted to ensure that all data a user transmitted back to us was obscured such that we could not learn anything personal about the phones running the standalone project; however, we still needed to be able to distinguish distinct phones. To this end, we used two identifying values of the device: the serial number and the SIM serial number (if one existed). We concatenated these into a single string that was then hashed via the SHA-1 hashing mechanism. This provided a unique identifier for each phone in the data reported back to the central server. Additionally, we did not want the actual GPS coordinates of the locations a user visited. Instead, the data was reported back as distinct location identifiers: integer values representing the locations visited by the user, starting from location 0 and incremented for each subsequent location the user visited. This provided the segmented location information without giving away where a user actually visited.

Notifications The application has a user-set sigma value and a flag the user can enable to allow the application to raise notifications if an anomalous event is detected at the user-selected sigma value (the sigma value being the number of standard deviations beyond the mean after which data is considered anomalous). When the user clicks on a given notification, they are presented a dialogue with the opportunity to mark the flagged app as trusted or untrusted. This gives general feedback as to whether the app is potentially malicious or a false positive, based on the user's feedback.
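The privacy scheme described above (SHA-1 over the concatenated serials, plus opaque integer location identifiers) can be sketched as follows; the serial values and coordinate bucketing rule are made up for illustration.

```python
import hashlib
from itertools import count

def device_identifier(serial, sim_serial=""):
    """Concatenate the device serial and SIM serial (if any) and hash the
    result with SHA-1, yielding a stable, anonymized phone identifier."""
    return hashlib.sha1((serial + sim_serial).encode()).hexdigest()

class LocationAnonymizer:
    """Map coordinates to opaque integer identifiers, starting at 0."""
    def __init__(self):
        self._ids = {}
        self._next = count()

    def identify(self, lat, lon):
        key = (round(lat, 4), round(lon, 4))  # illustrative bucketing
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]

print(len(device_identifier("R32D102", "8914800000123456789")))  # 40 hex chars
anon = LocationAnonymizer()
print(anon.identify(40.0076, -105.2659))  # 0 (first location seen)
print(anon.identify(39.7392, -104.9903))  # 1
print(anon.identify(40.0076, -105.2659))  # 0 again: same location, same id
```

Only the hash and the integer identifiers leave the phone; the raw serials and coordinates never do.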
In addition to the notifications given to the user, the app records any possible notifications for sigma levels from 2.0 up to 3.75, in increments of 0.25. This was done to see what notifications would have been captured even if the user hadn't enabled them, or had set a lower sigma value. This was also used as the basis for the observed false positive rates discussed later.

Malware Simulator Integration When we began to search for users to run the malware simulator, we realized we needed a simple way to collect this data as a simulation, in addition to making it simple for the user. We originally wanted to include it as a menu option in the standalone project; however, attempts at this resulted in conflicts with an existing service in the standalone project. Instead, we opted for inter-app communication in Android, accomplished through the Android Interface Definition Language (AIDL). AIDL provided the ability to craft a shared Android method that both apps could make use of. This allowed the standalone project to be signaled when a simulator started and stopped, in addition to recording which simulator it was, in order to store the data as simulation data of the appropriate type.

6.2.2 Deployment and Recruitment

Once the standalone project was built and the basic data collection side of things was working, we deployed it to the Google Play store to allow the beta test users to get automatic updates.[8] Once we implemented all of the planned features, we began actively recruiting people through social media such as Facebook and Google+, and through the XDA Android Developers Forums.[12][5][6] This was extremely successful, as will be discussed in Section 6.3, and was the only recruitment method utilized. As part of this process we went through the Institutional Review Board (IRB) for approval, since the data we were collecting was generated by humans on their smartphones.
Though it was a troublesome process, we were thankfully granted an exemption by the IRB, which decided that our work did not constitute human-subjects testing. This was helped in part by our privacy considerations for data sent back to the central server.

6.3 Results and Discussion

Table 6.1: This table shows the interesting data points collected as part of the standalone project as of 5/17/2013.

Statistic                              Value
Phone data collected                 1382412
Active phones                            108
Phones with 30 or more days of data       47
Notifications                              0

The most interesting initial result was the substantial quantity of data we had collected; Table 6.1 shows these values. Additionally, of the 108 phones shown, 47 had been reporting data back for more than 30 days, which is the minimum period required for any location to be considered valid. These phones could begin processing their new data as potentially anomalous and record notifications. They could also alert the user to anomalous events and receive user feedback.

Figure 6.1: Figure shows the various countries currently running the standalone project from the Google Play store statistics.[7]

The most interesting statistic in Table 6.1 is that no notifications were recorded for any of the phones that met the threshold for beginning to detect malicious code. The lack of alert notifications for these users is considered a good sign: most of the individuals running the app were very security-conscious, so there was a very low likelihood that their phones were infected. Thus, we interpreted this to mean that no false positives were generated. This supports our earlier claim that low profile location based malicious code detection techniques offer good detection with a low false positive percentage. The next interesting statistic the Google Play store provided was the countries of origin of the users who installed the application.
In Figure 6.1 you can see a visualization and breakdown of some of the countries in which the standalone project has been installed and is currently running. These statistics show how far-reaching and varied the users running the standalone project are. After getting a general picture of how much data was collected, and who generally was running the project, we moved on to investigating the basis of the motivation and thesis: that location based power detection is an effective method for detecting malicious code with a minimal false positive rate. The first step was verifying that power use for the various users running the application did vary between locations. As part of the post-processing, where the power segments were parsed for each location, the average and standard deviation for each location were recalculated at the central server in the same way as on the phone. We could then plot the average power use at each distinct location that met the training data minimums of 30 days and at least 50 data segments; these were plotted as histograms. In Figures 6.2 and 6.3 we can see the average rate of power change per second graphed for two of the phones that reported data. Appendix B includes similar graphs for most of the other phones we have simulation data from.

Figure 6.2: Figure shows the average rate of change for various locations visited by the phone SGH-I727.

Figure 6.3: Figure shows the average rate of change for various locations visited by the phone SGH-I747.

A few phones were not included, even in the appendix, because they had so many viable locations that they would not fit into a readable graph. These rate-of-change-by-location graphs illustrate that the premise of this location based power detection approach, namely that users tend to use their phones differently depending on where they are, in fact holds true.
As Figures 6.2 and 6.3 show, the average rate of change at each location is different, in some cases significantly so. These graphs provide strong visual support that the basis of this approach is viable, since users actually do use their phones differently depending on where they are.

Table 6.3: This table shows the effectiveness of location based power detection through the use of statistical analysis with sigma values for the Nexus 4 (4). It also includes the effective observed false positive rate from the model running on the phone as a standalone project, and the calculated false positive % when using the statistical model against the collected data.

Location  FP % (sigma 2.5)  FP % (sigma 3.0)  Observed FP %  SMS       Location Sim
0               0.52              0.52             0.00      Detected  Detected
2               2.38              2.38             0.00      Detected  Detected
7               2.86              2.86             0.00      Detected  Detected
9               2.44              2.44             0.00      Detected  Detected
13              2.44              2.44             0.00      Detected  Detected
17              3.08              1.54             0.00      Detected  Detected
19              1.02              1.03             0.00      Detected  Detected
21              3.64              3.64             0.00      Detected  Detected
22              3.41              3.41             0.00      Detected  Detected
27              1.39              1.39             0.00      Detected  Detected
30              1.67              1.67             0.00      Detected  Detected
41              3.23              0.00             0.00      Detected  Detected
61              3.57              3.57             0.00      Detected  Detected
62              3.51              2.63             0.00      Detected  Detected
65              2.91              2.91             0.00      Detected  Detected
80              2.47              2.47             0.00      Detected  Detected
110             1.61              1.61             0.00      Detected  Detected
113             3.57              3.57             0.00      Detected  Detected

We then made use of the simulated
malicious code to thoroughly investigate the effectiveness of detection. This was done by calculating the statistics for every location and then testing the ability to capture a statistically significant quantity of the simulated malicious code at different sigma values. This was done for all the sigma values used in the standalone project, ranging from 2.0 up to 3.75 in increments of 0.25. In the included tables we present only the sigma values 2.5 and 3.0 for discussion, as 2.5 was the default setting in the standalone app and a sigma of 3.0 is commonly used in known anomaly detection machine learning approaches, which is effectively what we are doing. In Tables 6.3 and 6.5 we can see the results for the various locations of the two phones with the most data collected. These offer the best basis for drawing conclusions; however, the conclusions hold for the other phones for which we got simulation data, available in Appendix A for reference. Table 6.5 has the most interesting results, in that its calculated false positive percentage is 0.0% for nearly all of its locations, in addition to the observed false positive percentage (the calculated percentage being the percentage of data in a location's normal profile that would be deemed anomalous if it occurred again). This phone's results are a clear indication of how effective location based power detection is at detecting malicious code while maintaining a minimal false positive percentage. Notably, the power aware location tracking malicious code was detected in every case, as was the SMS spam bot. The second phone, with data in Table 6.3, reinforces this: it also had an observed false positive percentage of 0.0% as well as a very small calculated false positive percentage for all of its data, and it too was successful in all cases of detecting both kinds of malicious code simulations.
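The per-location evaluation amounts to the following sketch, where the calculated false positive percentage is the share of a location's own training data that would itself be flagged at a given cutoff; names and sample values are illustrative, not the project's data.

```python
import statistics

def evaluate_location(profile, simulated, sigma):
    """For one location's normal profile of discharge rates, compute the
    calculated false positive percentage (share of training data that would
    itself be flagged) and whether a simulated malicious rate is detected
    at the given sigma cutoff."""
    mean = statistics.mean(profile)
    stdev = statistics.pstdev(profile)
    threshold = mean + sigma * stdev
    false_positives = sum(1 for r in profile if r > threshold)
    fp_percent = 100.0 * false_positives / len(profile)
    return fp_percent, simulated > threshold

profile = [0.010, 0.011, 0.012, 0.010, 0.009, 0.011, 0.010, 0.012]
fp, detected = evaluate_location(profile, simulated=0.020, sigma=2.5)
print(round(fp, 2), detected)  # 0.0 True
```

Running this for each location at each sigma cutoff yields tables of the shape shown in Tables 6.3 and 6.5.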
Not all of the phones detected both kinds of malicious code at all of their locations, as is the case for some of the phones in Appendix A; these could still detect malicious code at nearly all of their locations, and in one example a phone could detect it at the lower sigma threshold but not at the sigma 3.0 threshold. These phones still had low false positive rates and could detect malicious code at the vast majority of their locations. It is likely that these particular locations saw significant fluctuations in power use in their training data, which kept significant power drain from registering as anomalous; thus the power aware location tracking malicious code managed to evade detection in such locations by using a minimal amount of power. To summarize, using this basic statistical model to look for anomalous events, we were able to detect both kinds of malicious code simulations a statistically significant number of times at nearly all of the locations that had gathered enough training data. The results also showed a very low calculated false positive percentage as well as a 0.0% observed false positive percentage. With these two factors taken into account, this location based power detection technique is highly successful and supports our theory that low profile detection techniques can succeed with a minimal impact on the user, as this approach had an unmeasurable impact on the power drain observed in testing. Additionally, all high-power processing, data sending, etc., occurs only while the phone is plugged in.

6.4 Limitations

This location based approach is extremely effective, as the data shows; however, there are a number of limitations, assumptions, and kinds of malicious code that it is optimized for. The limitations are as follows:

Advanced Malware Advanced malware or a rootkit has significant access to the underlying phone system and could make the reported battery data appear normal.
Another option is that this kind of malicious code could uninstall our app, and there is currently no defense against this. Also, if the approach proposed in this work is known, malicious code developers could write their apps to use less power, or to spread their power use over a sustained period, potentially making them more difficult to detect.

Kinds of Malicious Code The malicious code this approach targets is the kind that poses a sustained and significant security risk. The approach is limited in that it needs a significant power drain to trigger an alert, and many of the top apps that could be considered malicious only steal the phone number, phone ID, and contacts; these would use only a brief burst of power to send this data and then do nothing afterwards. The approach we discuss would detect apps that track a user's location, send spam for sustained periods, and other such malicious code.

Location Required This approach, though highly effective, fails to detect at locations that haven't been visited frequently enough to meet the time or data segment thresholds required for them to be used for detection. Quite a few of these locations may never generate data segments anyway, as they may be along a daily commute where the user moves to a new location before the battery has a chance to drop and produce a data segment from which the rate of change could be calculated. A solution could be to apply general power analysis to all the data that isn't at a location meeting the training requirements. We tested this approach with the data collected as part of this project, and it appears that using general power analysis in this way is just as effective as general power analysis as a whole, as discussed previously in this work.

Existing Malware This approach operated under the assumption that no malicious code was present during the training period or already on the phone to start.
Some of the known kinds of malicious code whose behavior our simulators replicate can occasionally be bursty in nature. If such code were present during training, its bursts would increase the overall average of the data; the bursts might still fall outside the normal range for the profile and be detected, but detection would be less likely.

Erratic Data Some phones in our study were not effective for all locations under location based power detection, and this was also apparent in general and time domain based detection. Users who regularly use their phones in radically different ways may produce a normal profile with such a large standard deviation that very high power use would be required to fall outside it. This style of use is a limitation that this work is not able to adjust for.

Long Training Period Currently this work operates under the assumption that 30 days and a significant volume of data segments are needed to generate a good normal profile. It may be possible to rely only on the volume of data and ignore the length of time spent at a location; locations a user frequents for long durations every day would then become useful locations in significantly less time. This would allow quicker improvement in the ability to detect malicious code than falling back on general power based detection until the 30-day requirement is met. There is no justification for 30 days beyond it being an arbitrary value.

6.5 Future Application

We can see this technique being integrated with future projects or current anti-virus products as a method to determine when more sophisticated techniques should be employed, or when a classic signature based anti-virus scan should be scheduled.
6.5.1 Future Improvements

It is possible that analyzing which apps are running, in addition to the power usage data, could yield a better approach for identifying the anomalous app than the current approach, which simply flags the app currently in focus on the assumption that it is the one most likely to be causing a power spike. Additionally, taking the data collected for this project further and applying data clustering algorithms could give an improved picture of which of the collected variables cluster well and could thus serve as additional factors for anomalous event detection; the best way to improve this technique would likely be to add another aspect of user behavior to what makes up the normal profile for a given location. One consideration would be incorporating the time domain analysis approach of Chapter 7 to further improve the normal profiles.

Table 6.5: This table shows the effectiveness of location based power detection through the use of statistical analysis with sigma values for the Samsung Galaxy S3 SCH-I535 (2). It also includes the effective observed false positive rate from the model running on the phone as a standalone project, and the calculated false positive % when using the statistical model against the collected data.
Location   False Positive % (2.5 / 3.0)   Observed FP % (2.5 / 3.0)   SMS        Location
13         0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
17         0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
50         0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
77         0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
93         0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
111        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
117        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
119        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
125        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
136        9.09 / 0.00                    0.00 / 0.00                 Detected   Detected
139        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
206        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
211        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
248        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected
297        0.00 / 0.00                    0.00 / 0.00                 Detected   Detected

Chapter 7

Time Domain Based Detection

7.1 Motivation for Time Domain Based Detection

The motivation for time domain based power detection is the premise that a user uses their phone differently depending on the time of day. This was derived from the earlier premise that a user uses their phone differently based on where they are located: it makes just as much sense that the time of day could play a factor, and potentially even the day of the week, although we did not investigate the time domain in the context of the actual day of the week.

7.2 Results and Discussion

The data for this analysis was processed out of the data collected as part of the larger standalone project for the location based power detection approach.
Each power segment was extracted and assigned a Python datetime representing the midpoint between the segment's start and end, giving each segment a single timestamp for when it took place. This made it possible to group the rate of change segments by hour of the day or time of day, both for graphical representation and for checking effectiveness against the simulated malicious code. Figures 7.1 and 7.3 present box plots of the rate of change per second observed during each hour of the day. These plots are for the same two phones discussed in detail in chapter 6, as they have the most recorded data as well as the largest quantity of simulation data. The hourly graphs illustrate that use varies somewhat between hours of the day.

Figure 7.1: Figure shows the power distribution based on the hour of day for phone Nexus 4 (4).

Figure 7.2: Figure shows the power distribution based on the time of day for phone Nexus 4 (4).

Figure 7.3: Figure shows the power distribution based on the hour of day for phone Samsung Galaxy S3 SCH-I535 (2).

Figure 7.4: Figure shows the power distribution based on the time of day for phone Samsung Galaxy S3 SCH-I535 (2).

Figures 7.2 and 7.4 show the power use in a more grouped format, breaking the hours of the day into quarters of 6 hours each. This gives a picture of power use at night, in the morning, in the afternoon, and in the evening, and it makes clear that there is a slight difference in power use at different times of day; however, the difference is not nearly as distinct as it is between locations.

Table 7.1: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the Nexus 4 (4).
Time of Day   False Positive % (2.5 / 3.0)   SMS        Location
Night         0.35 / 0.35                    -          -
Morning       2.53 / 2.07                    Detected   Detected
Afternoon     2.94 / 2.21                    Detected   Detected
Evening       0.73 / 0.66                    Detected   Detected

The difference between times of day being less pronounced than the difference between locations is likely a driving factor in this technique's effectiveness. As Table 7.1 shows, detection worked for most times of day except night, when the phone would likely be plugged in and therefore had limited data, and the false positive percentages were very small, especially compared to those of the location based technique for a few locations on the same phone. While this table suggests a potentially successful technique, Table 7.3 shows the same approach applied to another phone, where malicious code could not be detected in any of the cases. There may be underlying factors behind this, but the clear indicator is that this approach is not consistent in its effectiveness in the way location based power detection was.

Table 7.3: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the Samsung Galaxy S3 SCH-I535 (2).

Time of Day   False Positive % (2.5 / 3.0)   SMS   Location
Night         1.96 / 1.96                    -     -
Morning       0.80 / 0.80                    -     -
Afternoon     1.46 / 0.60                    -     -
Evening       3.25 / 2.49                    -     -

This trend does not completely carry over to the same analysis applied to the hour of the day grouping. Table 7.7 shows that for a few hours of the day we can successfully detect malicious code on this phone, though for most hours we still cannot, while Table 7.5 shows that the approach remains effective for almost all hours of the day on the other phone. Even so, it is not nearly as effective as location based power detection.
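The segmentation and grouping described in this section can be sketched as follows; the segment values and helper names are illustrative, not taken from the project's code:

```python
from collections import defaultdict
from datetime import datetime
import statistics

# Hypothetical segments: (start, end, power rate of change per second).
segments = [
    (datetime(2013, 5, 1, 8, 0),   datetime(2013, 5, 1, 8, 30),  0.010),
    (datetime(2013, 5, 1, 9, 10),  datetime(2013, 5, 1, 9, 40),  0.012),
    (datetime(2013, 5, 1, 14, 0),  datetime(2013, 5, 1, 14, 20), 0.011),
    (datetime(2013, 5, 1, 14, 50), datetime(2013, 5, 1, 15, 10), 0.013),
]

def midpoint_hour(start, end):
    """Timestamp a segment by the midpoint between its start and end."""
    return (start + (end - start) / 2).hour

# Group by quarter of the day: 0=night, 1=morning, 2=afternoon, 3=evening.
by_quarter = defaultdict(list)
for start, end, rate in segments:
    by_quarter[midpoint_hour(start, end) // 6].append(rate)

def false_positive_pct(rates, sigma=2.5):
    """Percent of clean segments that the sigma cutoff itself would flag,
    i.e. the calculated false positive % reported in the tables."""
    mean, stdev = statistics.mean(rates), statistics.stdev(rates)
    flagged = sum(1 for r in rates if r > mean + sigma * stdev)
    return 100.0 * flagged / len(rates)
```

Grouping by `.hour` instead of `hour // 6` gives the hour-of-day tables rather than the four-period tables.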
7.3 Future Applications

We see this approach potentially being beneficial as an addition to approaches like the location based one, further fine tuning the normal profiles to improve effectiveness and decrease the false positive percentages. Based on the effectiveness results, it appears more beneficial to use the hour of the day approach over the time of day approach.

Table 7.5: Hour of Day effectiveness for Nexus 4 (4).

Hour   False Positive % (2.5 / 3.0)
0      0.57 / 0.57
1      2.63 / 1.32
2      5.56 / 2.78
3      5.26 / 5.26
4      8.33 / 8.33
5      2.56 / 2.56
6      2.70 / 2.70
7      7.14 / 0.00
8      2.84 / 2.84
9      4.84 / 4.84
10     1.41 / 1.41
11     2.00 / 2.00
noon   3.75 / 3.75
13     4.84 / 4.03
14     2.20 / 1.65
15     3.85 / 2.20
16     3.47 / 1.73
17     3.41 / 1.70
18     0.86 / 0.86
19     2.66 / 2.33
20     1.99 / 1.99
21     0.53 / 0.53
22     2.74 / 2.43
23     3.09 / 2.41
SMS: Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected
Location: Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected

Table 7.7: Hour of day effectiveness for Samsung Galaxy S3 SCH-I535 (2).
Hour   False Positive % (2.5 / 3.0)
0      3.57 / 3.57
1      0.00 / 0.00
2      0.00 / 0.00
3      0.00 / 0.00
4      0.00 / 0.00
5      0.00 / 0.00
6      0.00 / 0.00
7      0.00 / 0.00
8      2.17 / 2.17
9      1.74 / 1.16
10     1.09 / 1.09
11     0.77 / 0.77
noon   0.64 / 0.64
13     0.25 / 0.25
14     1.79 / 0.77
15     2.06 / 1.61
16     3.37 / 0.90
17     0.86 / 0.86
18     3.56 / 2.80
19     4.17 / 2.78
20     3.13 / 2.56
21     3.43 / 1.87
22     3.11 / 1.95
23     5.36 / 2.68
SMS: Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected -
Location: Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected -

Chapter 8

Conclusions and Future Work

This thesis seeks to improve malicious code detection for smartphones by proposing detection methods that do not have a negative impact on the user yet offer a successful approach for detecting the presence of malicious code: lightweight anomaly detection techniques can achieve a statistically significant capture percentage and a low false positive percentage without many of the drawbacks that conventional detection techniques cause. We next explain how we addressed this statement by summarizing this dissertation's fundamental contributions.

8.1 Fundamental Contributions

The fundamental contributions of this dissertation can be summarized as follows.

8.1.1 File Integrity to Accelerate Computer Based Scans

To harness an observed behavior most users exhibited when using smartphones, we provided a new application of an existing technique and showed how it could be used to significantly reduce the time needed to scan files from a computer when a user synced their phone. Additionally, we showed that a keyed hash could be used for additional security without adding any delay to the overall process.
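A keyed hash of the kind just mentioned can be sketched with Python's standard hmac module; the key and file contents below are illustrative, not taken from the project's implementation:

```python
import hashlib
import hmac

def keyed_digest(data: bytes, key: bytes) -> str:
    """HMAC-SHA256 over a file's contents: without the key, malicious
    code that alters the file cannot forge a matching digest."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def unchanged(data: bytes, key: bytes, recorded: str) -> bool:
    """True if the file still matches the digest recorded at the last
    sync; unchanged files can then be skipped, shortening the scan."""
    return hmac.compare_digest(keyed_digest(data, key), recorded)

key = b"per-device secret"                   # hypothetical key
recorded = keyed_digest(b"app binary", key)  # stored at sync time
print(unchanged(b"app binary", key, recorded))
print(unchanged(b"tampered binary", key, recorded))
```

Using `hmac.compare_digest` for the comparison also avoids leaking digest information through timing differences.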
We also discussed our finding that files change with characteristic frequencies, which could be used to prioritize scanning the files that change less often first. And though users no longer sync their phones to computers, we discussed how this could potentially be applied in the future.

8.1.2 General Power Based Detection

Given our observation that users were trending away from syncing with computers, we looked at the behavior exhibited by most known malicious code and realized that it causes a significant drain on a phone's battery. We therefore investigated the effectiveness of using power drain for detection, and then revisited that effectiveness with the statistical model and a less power hungry location tracking malicious code simulator. We were able to show that power based detection was promising but still needed improvement and, most importantly, did not have a negative impact on users.

8.1.3 Location Based Power Detection

Realizing we needed to improve on general power based detection by adding another variable to further tune the normal profiles and increase accuracy, we considered the ways we saw ourselves using smartphones and realized that we used them differently depending on where we were. This approach therefore investigated normal power profiles at different locations. It was also run as a standalone project and is the primary focus of this thesis. The technique shows great success at detecting malicious code, has an unmeasurable impact on the user in terms of battery drain, produced low calculated false positive rates, and exhibited no observed false positives while actually running as a standalone project. This is a significant improvement over general power based detection and, more importantly, supports the thesis of this work: low profile techniques can be effective at detecting malicious code with a minimal impact on the user and a low false positive rate.
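The signal these power based detectors operate on is the battery level's rate of change between successive samples. A minimal sketch with made-up readings follows; the sampling interval and values are illustrative, not the project's actual collection code:

```python
# Hypothetical readings: (seconds since start, battery level in %).
samples = [(0, 95.0), (300, 94.0), (600, 93.0), (900, 90.0)]

def rates_of_change(samples):
    """Per-second drain between consecutive readings; a faster drain
    yields a larger positive rate, which is what gets compared
    against the normal profile."""
    return [(b0 - b1) / (t1 - t0)
            for (t0, b0), (t1, b1) in zip(samples, samples[1:])]

rates = rates_of_change(samples)
# The final interval drains three times faster than the first two,
# the kind of jump that would be tested against the normal profile.
```

Because only the rate is used, the approach works from the coarse battery percentage the platform already reports, with no extra instrumentation to drain power itself.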
8.1.4 Time Domain Based Power Detection

We took the collected data a step further and investigated whether another tuning variable could be the time of day at which a user uses their phone. The results were mixed, but they did offer the general understanding that users do in fact use their phones differently depending on the time of day. However, the differences were not as pronounced as in the location based power distributions, which we believe is the cause of this approach's inconsistent effectiveness.

8.2 Future Work

We now discuss a few avenues of future work to extend or build upon the fundamental contributions of this thesis.

8.2.1 Additional Tuning Factors

To build on the approaches presented in this thesis, we could continue to find new tuning factors that improve the normal profile based on the way a user uses their phone. To do this we may need to collect more data and apply clustering algorithms to get a picture of which tuning factors have a significant impact.

8.2.2 Combining Tuning Factors

We are not sure what future tuning factors might prove useful; however, based on the work presented and discussed in this thesis, overlapping tuning factors into a single approach could be effective. To this end, location based power detection could maintain time of day power profiles within each location to potentially increase the accuracy of the normal profile at each location.

8.2.3 Integrating With Another System

All of the work presented in this thesis, including the future work above, primarily focuses on improving anomaly detection and raising a red flag when potential malicious code might exist. To put these approaches in a position to fix, remove, or identify the specific malicious code, they would likely need to be integrated with a larger project that employs more sophisticated methods for removal and identification of specific malicious code.
The work presented in this thesis would be a good way to determine when such higher impact systems need to be used, with its red flag triggering their use.

8.2.4 Known Malicious Code

The true test of the work presented in this thesis would be to run it against the known malicious code in the wild whose behaviors the simulators were built to replicate, and additionally against newer threats whose behaviors fall into the spectrum this work should be suited to detect. This would entail obtaining a grant for further research: relying on friends, family, and others to run the data collection and simulations is one thing, but asking them to install actual malicious code that could cost them money, damage their phones, and put their personal information at risk is not, in our opinion, an appropriate way to do research. A grant providing test phones and service for a period of time, on which data could be collected and actual malicious code then installed, would be a significant benefit to this work in the future.

8.3 Final Remarks

To conclude, we have presented a variety of methods that aim to detect malicious code on smartphones without having a noticeable impact on the user. By succeeding in detecting the presence of malicious code with no noticeable impact, our hope is that these methods could be integrated with other systems so that users will be more likely to install and use malicious code detection on their smartphones, without having to worry that the detection tool drains their battery more than the potential malicious code might. This could lead to an improved and safer smartphone environment.
Appendix A

Additional Tables of Data

Table A.1: This table shows the effectiveness of location based power detection through the use of statistical analysis with sigma values for the Nexus 4 (1). It also includes the observed false positive rate from the model running on the phone as a standalone project and the calculated false positive % when using the statistical model against the collected data.

Location   False Positive % (2.5 / 3.0)   Observed FP % (2.5 / 3.0)
0          1.40 / 0.47                    0.00 / 0.00
2          4.26 / 2.13                    0.00 / 0.00
3          1.85 / 1.85                    0.00 / 0.00
14         2.04 / 2.04                    0.00 / 0.00
16         9.52 / 0.00                    0.00 / 0.00
SMS: Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected
Location: Detected Detected Detected Detected Detected Detected Detected Detected

Table A.3: This table shows the effectiveness of location based power detection through the use of statistical analysis with sigma values for the Nexus 4 (2). It also includes the observed false positive rate from the model running on the phone as a standalone project and the calculated false positive % when using the statistical model against the collected data.
Location 1 2 4 7 10 13 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 1.72 0.79 1.77 1.42 3.92 3.92 2.79 2.79 5.88 3.92 3.08 3.08 Observed False Positive % 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected 75 Table A.5: This table shows the effectiveness of location based power detection through the use of statistical analysis with sigma values for the Nexus 4 (3). It also includes the effective observed false positive rate from the model running on the phone as a stand alone project and the calculated false positive % when using the statistical model against the collected data. Location 0 1 3 6 28 31 42 44 45 57 72 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 0.27 0.27 2.44 2.44 2.08 1.56 2.87 2.87 0.39 0.39 2.56 2.56 7.14 7.14 3.33 3.33 3.36 2.52 2.76 2.07 3.03 3.03 Observed False Positive % 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected 76 Table A.7: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the Nexus 4 (1). 
Time of Day   False Positive % (2.5 / 3.0)
Night         0.78 / 0.78
Morning       1.84 / 1.23
Afternoon     1.17 / 0.94
Evening       2.27 / 2.27
SMS: Detected Detected Detected Detected Detected Detected
Location: Detected Detected Detected Detected

Table A.9: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the Nexus 4 (2).

Time of Day   False Positive % (2.5 / 3.0)
Night         1.65 / 1.65
Morning       0.47 / 0.47
Afternoon     2.67 / 2.16
Evening       1.55 / 1.16
SMS: Detected Detected Detected Detected Detected Detected Detected Detected
Location: Detected Detected Detected Detected Detected Detected

Table A.11: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the Nexus 4 (3).

Time of Day   False Positive % (2.5 / 3.0)
Night         1.14 / 1.14
Morning       1.30 / 0.78
Afternoon     0.65 / 0.32
Evening       3.50 / 1.95
SMS: Detected Detected Detected Detected Detected Detected Detected Detected
Location: Detected Detected Detected Detected Detected Detected

Table A.13: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the SAMSUNG-SGH-I747.

Time of Day   False Positive % (2.5 / 3.0)
Night         0.65 / 0.65
Morning       2.40 / 1.80
Afternoon     0.41 / 0.41
Evening       2.51 / 2.21
SMS: Detected Detected Detected Detected Detected Detected Detected Detected
Location: -

Table A.15: This table shows the effectiveness of time domain based power detection through the use of statistical analysis with sigma values for the Galaxy Nexus (1).

Time of Day   False Positive % (2.0 / 3.0)
Night         5.56 / 5.56
Morning       6.32 / 0.66
Afternoon     1.26 / 0.44
Evening       0.33 / 0.22
SMS: -
Location: -

Table A.17: Hour of Day effectiveness for Nexus 4 (1).
Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 1.96 1.96 5.26 5.26 2.33 2.33 5.56 5.56 3.57 3.57 3.85 3.85 5.88 2.94 3.85 3.85 3.23 3.23 5.17 1.72 6.35 3.17 2.30 2.30 1.64 1.64 0.98 0.98 1.47 1.47 2.86 1.43 4.00 4.00 2.82 2.82 1.54 1.54 2.90 2.90 2.17 2.17 1.69 1.69 4.17 2.08 2.44 2.44 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected 81 Table A.19: Hour of Day effectiveness for Nexus 4 (2). 
Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 2.82 2.82 2.44 2.44 4.76 4.76 4.17 4.17 5.56 5.56 0.00 0.00 4.55 4.55 4.35 4.35 4.00 4.00 2.27 2.27 4.35 2.90 5.75 1.15 2.00 1.00 3.05 3.05 2.55 2.55 2.50 2.50 2.98 1.79 4.55 4.55 3.92 1.96 1.32 1.32 0.82 0.82 3.25 3.25 0.76 0.76 1.12 1.12 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected 82 Table A.21: Hour of Day effectiveness for Nexus 4 (3). 
Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 2.04 2.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.11 0.00 3.12 3.12 1.96 1.96 6.41 5.13 3.85 1.92 0.90 0.90 0.70 0.70 3.17 3.17 2.22 2.22 3.87 3.31 2.35 2.35 1.17 0.58 2.17 2.17 2.07 2.07 2.74 2.74 3.80 3.80 2.90 2.90 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected 83 Table A.23: Hour of Day effectiveness for Samsung SGH-I727. 
Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 2.13 2.13 2.08 2.08 2.50 2.50 4.17 4.17 5.56 5.56 2.38 2.38 2.63 2.63 4.00 4.00 4.44 2.22 0.58 0.58 1.56 0.78 3.41 1.52 1.09 1.09 1.60 1.28 1.13 1.13 0.36 0.36 1.12 0.75 0.97 0.97 1.17 0.58 1.30 1.30 1.27 1.27 0.51 0.51 1.29 1.29 0.71 0.71 SMS - Location Detected Detected Detected Detected Detected Detected Detected - 84 Table A.25: Hour of Day effectiveness for Samsung SGH-I747. Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 2.00 2.00 2.70 2.70 5.26 0.00 0.00 0.00 8.33 8.33 7.69 0.00 0.00 0.00 0.00 0.00 5.00 5.00 3.57 3.57 3.45 3.45 3.70 1.23 1.46 1.46 2.72 1.36 1.18 1.18 1.66 1.10 0.68 0.68 0.60 0.60 1.95 1.30 1.27 1.27 3.12 1.25 2.34 2.34 1.80 1.80 2.56 1.28 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location - 85 Table A.27: Hour of Day effectiveness for Galaxy Nexus (1). 
Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.99 2.99 5.83 2.91 0.64 0.64 6.31 0.00 4.17 2.31 5.24 4.29 1.00 1.00 1.19 0.79 3.66 0.37 2.44 2.44 6.56 2.81 5.10 4.31 2.92 1.30 0.85 0.85 2.00 2.00 10.00 0.00 SMS Detected Detected Detected Detected Detected Detected Detected Detected - Location - 86 Table A.29: Hour of Day effectiveness for Samsung Galaxy S3 SCH-I535 (1). Time of Day 0 1 2 3 4 5 6 7 8 9 10 11 noon 13 14 15 16 17 18 19 20 21 22 23 Cutoff 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 2.5 3.0 False Positive % 0.90 0.90 2.74 2.74 1.35 1.35 0.00 0.00 11.11 0.00 0.00 0.00 0.00 0.00 4.00 0.00 1.27 1.27 2.02 1.01 0.83 0.83 2.03 2.03 0.62 0.62 0.52 0.52 0.78 0.26 0.60 0.30 0.28 0.28 2.43 2.43 0.48 0.48 1.28 0.64 0.53 0.53 0.33 0.33 2.11 2.11 0.72 0.72 SMS Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Detected Location - Appendix B Additional Graphs of Data 88 Figure B.1: Figure shows the average rate of change for various locations visited by the phone Galaxy Nexus (1). Figure B.2: Figure shows the average rate of change for various locations visited by the phone Nexus 4 (1). 
Figure B.3: Figure shows the average rate of change for various locations visited by the phone Nexus 4 (2).

Figure B.4: Figure shows the average rate of change for various locations visited by the phone Nexus 4 (3).

Figure B.5: Figure shows the average rate of change for various locations visited by the phone Nexus 4 (4).

Figure B.6: Figure shows the power distribution based on the time of day for phone SGH-I727.

Figure B.7: Figure shows the power distribution based on the hour of day for phone SGH-I727.

Figure B.8: Figure shows the power distribution based on the time of day for phone SGH-I747.

Figure B.9: Figure shows the power distribution based on the hour of day for phone SGH-I747.

Figure B.10: Figure shows the power distribution based on the time of day for phone Galaxy Nexus (1).

Figure B.11: Figure shows the power distribution based on the hour of day for phone Galaxy Nexus (1).

Figure B.12: Figure shows the power distribution based on the time of day for phone Nexus 4 (1).

Figure B.13: Figure shows the power distribution based on the hour of day for phone Nexus 4 (1).

Figure B.14: Figure shows the power distribution based on the time of day for phone Nexus 4 (2).

Figure B.15: Figure shows the power distribution based on the hour of day for phone Nexus 4 (2).

Figure B.16: Figure shows the power distribution based on the time of day for phone Nexus 4 (3).

Figure B.17: Figure shows the power distribution based on the hour of day for phone Nexus 4 (3).