Cyber Genome Project Annotated Bibliography
Revision –
March 4, 2010
(Chouchane, Walenstein et al. 2007; Jiang, Wang et al. 2007; Preda, Christodorescu et al.
2007; Crandall, Ensafi et al. 2008; Preda, Christodorescu et al. 2008; Zhou and Inge
2008; Ahmed, Hameed et al. 2009; Carbone, Cui et al. 2009; Chen, Yuan et al. 2009; Hu,
Chiueh et al. 2009; Kang, Yin et al. 2009; Tabish, Shafiq et al. 2009; Xue, Hu et al. 2009;
Lakhotia, Boccardo et al. 2010; Maughan 2010)
Ahmed, F., H. Hameed, et al. (2009). Using spatio-temporal information in API calls
with machine learning algorithms for malware detection. Proceedings of the 2nd ACM
workshop on Security and artificial intelligence. Chicago, Illinois, USA, ACM: 55-62.
Run-time monitoring of program execution behavior is widely used to
discriminate between benign and malicious processes running on an end-host.
Towards this end, most of the existing run-time intrusion or malware detection
techniques utilize information available in Windows Application Programming
Interface (API) call arguments or sequences. In comparison, the key novelty of
our proposed tool is the use of statistical features which are extracted from both
spatial arguments) and temporal (sequences) information available in Windows
API calls. We provide this composite feature set as an input to standard machine
learning algorithms to raise the final alarm. The results of our experiments show
that the concurrent analysis of spatio-temporal features improves the detection
accuracy of all classifiers. We also perform the scalability analysis to identify a
minimal subset of API categories to be monitored whilst maintaining high
detection accuracy.
Carbone, M., W. Cui, et al. (2009). Mapping kernel objects to enable systematic integrity
checking. Proceedings of the 16th ACM conference on Computer and communications
security. Chicago, Illinois, USA, ACM: 555-565.
Dynamic kernel data have become an attractive target for kernel-mode malware.
However, previous solutions for checking kernel integrity either limit themselves
to code and static data or can only inspect a fraction of dynamic data, resulting in
limited protection. Our study shows that previous solutions may reach only 28%
of the dynamic kernel data and thus may fail to identify function pointers
manipulated by many kernel-mode malware.
To enable systematic kernel integrity checking, in this paper we present KOP, a system
that can map dynamic kernel data with nearly complete coverage and nearly
perfect accuracy. Unlike previous approaches, which ignore generic pointers,
unions and dynamic arrays when locating dynamic kernel objects, KOP (1)
applies inter-procedural points-to analysis to compute all possible types for
generic pointers (e.g., void*), (2) uses a pattern matching algorithm to resolve
type ambiguities (e.g., unions), and (3) recognizes dynamic arrays by leveraging
knowledge of kernel memory pool boundaries. We implemented a prototype of
Annotated Bibliography – Page 1
KOP and evaluated it on a Windows Vista SP1 system loaded with 63 kernel
drivers. KOP was able to accurately map 99% of all the dynamic kernel data.
To demonstrate KOP's power, we developed two tools based on it to systematically
identify malicious function pointers and uncover hidden kernel objects. Our tools
correctly identified all malicious function pointers and all hidden objects from
nine real-world kernel-mode malware samples as well as one created by
ourselves, with no false alarms.
Chen, H., L. Yuan, et al. (2009). Control flow obfuscation with information flow
tracking. Proceedings of the 42nd Annual IEEE/ACM International Symposium on
Microarchitecture. New York, New York, ACM: 391-400.
Recent micro-architectural research has proposed various schemes to enhance
processors with additional tags to track various properties of a program. Such a
technique, which is usually referred to as information flow tracking, has been
widely applied to secure software execution (e.g., taint tracking), protect software
privacy and improve performance (e.g., control speculation).
In this paper, we propose a novel use of information flow tracking to obfuscate the whole
control flow of a program with only modest performance degradation, to defeat
malicious code injection, discourage software piracy and impede malware
analysis. Specifically, we exploit two common features in information flow
tracking: the architectural support for automatic propagation of tags and violation
handling of tag misuses. Unlike other schemes that use tags as oracles to catch
attacks (e.g., taint tracking) or speculation failures, we use the tags as flowsensitive predicates to hide normal control flow transfers: the tags are used as
predicates for control flow transfers to the violation handler, where the real
control flow transfer happens.
We have implemented a working prototype based on Itanium processors, by leveraging
the hardware support for control speculation. Experimental results show that
BOSH can obfuscate the whole control flow with only a mean of 26.7% (ranging
from 4% to 59%) overhead on SPECINT2006. The increase in code size and
compilation time is also modest.
Chouchane, M. R., A. Walenstein, et al. (2007). Statistical signatures for fast filtering of
instruction-substituting metamorphic malware. Proceedings of the 2007 ACM workshop
on Recurring malcode. Alexandria, Virginia, USA, ACM: 31-37.
Introducing program variations via metamorphic transformations is one of the
methods used by malware authors in order to help their programs slip past
defenses. A method is presented for rapidly deciding whether or not an input
program is likely to be a variant of a given metamorphic program. The method is
defined for the prominent class of metamorphic engines that work by
probabilistically selecting instruction-substituting program transformations. A
model of the probabilistic engine is used to predictthe expected distribution of
instruction forms for different generations ofvariants. These predicted
distributions form a type of "statistical signature" for the output of the
Annotated Bibliography – Page 2
metamorphic engines. A classifier is defined based on distance between the
observed and the predicted instruction form distributions. A case study using the
W32.Evol virus shows the classifier can distinguish between malicious samples
from multiple generations. The classification method may be useful for practical
malware detection by serving as an inexpensive filter to avoid more in-depth
analyses where they are unnecessary
Crandall, J. R., R. Ensafi, et al. (2008). The ecology of Malware. Proceedings of the 2008
workshop on New security paradigms. Lake Tahoe, California, USA, ACM: 99-106.
The fight against malicious software (or malware, which includes everything from
worms to viruses to botnets) is often viewed as an "arms race." Conventional
wisdom is that we must continually "raise the bar" for the malware creators.
However, the multitude of malware has itself evolved into a complex
environment, and properties not unlike those of ecological systems have begun to
emerge. This may include competition between malware, facilitation, parasitism,
predation, and density-dependent population regulation. Ecological principles will
likely be useful for understanding the effects of these ecological interactions, for
example, carrying capacity, species-time and species-area relationships, the
unified neutral theory of biodiversity, and the theory of island bio-geography. The
emerging malware ecology can be viewed as a critical challenge to all aspects of
malware defense, including collection, triage, analysis, intelligence estimates,
detection, mitigation, and forensics. It can also be viewed as an opportunity.
In this position paper, we argue that taking an ecological approach to malware defense
will suggest new defenses. In particular, we can exploit the fact that interactions
of malware with its environment, and with other malware, are neither fully
predictable nor fully controllable by the malware author--yet the emergent
behavior will follow general ecological principles that can be exploited for
malware defense.
Hu, X., T.-c. Chiueh, et al. (2009). Large-scale malware indexing using function-call
graphs. Proceedings of the 16th ACM conference on Computer and communications
security. Chicago, Illinois, USA, ACM: 611-620.
A major challenge of the anti-virus (AV) industry is how to effectively process
the huge influx of malware samples they receive every day. One possible solution
to this problem is to quickly determine if a new malware sample is similar to any
previously-seen malware program. In this paper, we design, implement and
evaluate a malware database management system called SMIT (Symantec
Malware Indexing Tree) that can efficiently make such determination based on
malware's function-call graphs, which is a structural representation known to be
less susceptible to instruction-level obfuscations commonly employed by malware
writers to evade detection of AV software. Because each malware program is
represented as a graph, the problem of searching for the most similar malware
program in a database to a given malware sample is cast into a nearest-neighbor
search problem in a graph database. To speed up this search, we have developed
an efficient method to compute graph similarity that exploits structural and
instruction-level information in the underlying malware programs, and a multiAnnotated Bibliography – Page 3
resolution indexing scheme that uses a computationally economical feature vector
for early pruning and resorts to a more accurate but computationally more
expensive graph similarity function only when it needs to pinpoint the most
similar neighbors. Results of a comprehensive performance study of the SMIT
prototype using a database of more than 100,000 malware demonstrate the
effective pruning power and scalability of its nearest neighbor search
mechanisms.
Jiang, X., X. Wang, et al. (2007). Stealthy malware detection through vmm-based "outof-the-box" semantic view reconstruction. Proceedings of the 14th ACM conference on
Computer and communications security. Alexandria, Virginia, USA, ACM: 128-138.
An alarming trend in malware attacks is that they are armed with stealthy
techniques to detect, evade, and subvert malware detection facilities of the victim.
On the defensive side, a fundamental limitation of traditional host-based antimalware systems is that they run inside the very hosts they are protecting ("in the
box"), making them vulnerable to counter-detection and subversion by malware.
To address this limitation, recent solutions based on virtual machine (VM)
technologies advocate placing the malware detection facilities outside of the
protected VM ("out of the box"). However, they gain tamper resistance at the cost
of losing the native, semantic view of the host which is enjoyed by the "in the
box" approach, thus leading to a technical challenge known as the semantic gap.
In this paper, we present the design, implementation, and evaluation of VMwatcher - an
"out-of-the-box" approach that overcomes the semantic gap challenge. A new
technique called guest view casting is developed to systematically reconstruct
internal semantic views (e.g., files, processes, and kernel modules) of a VM from
the outside in a non-intrusive manner. Specifically, the new technique casts
semantic definitions of guest OS data structures and functions on virtual machine
monitor (VMM)-level VM states, so that the semantic view can be reconstructed.
With the semantic gap bridged, we identify two unique malware detection
capabilities: (1) view comparison-based malware detection and its demonstration
in rootkit detection and (2) "out-of-the-box" deployment of host-based antimalware software with improved detection accuracy and tamper-resistance. We
have implemented a proof-of-concept prototype on both Linux and Windows
platforms and our experimental results with real-world malware, including elusive
kernel-level rootkits, demonstrate its practicality and effectiveness.
Kang, M. G., H. Yin, et al. (2009). Emulating emulation-resistant malware. Proceedings
of the 1st ACM workshop on Virtual machine security. Chicago, Illinois, USA, ACM:
11-22.
The authors of malware attempt to frustrate reverse engineering and analysis by
creating programs that crash or otherwise behave differently when executed on an
emulated platform than when executed on real hardware. In order to defeat such
techniques and facilitate automatic and semi-automatic dynamic analysis of
malware, we propose an automated technique to dynamically modify the
Annotated Bibliography – Page 4
execution of a whole-system emulator to fool a malware sample's anti-emulation
checks. Our approach uses a scalable trace matching algorithm to locate the point
where emulated execution diverges, and then compares the states of the reference
system and the emulator to create a dynamic state modification that repairs the
difference. We evaluate our technique by building an implementation into an
emulator used for in-depth malware analysis. On case studies that include real
samples of malware collected in the wild and an attack that has not yet been
exploited, our tool automatically ameliorates the malware sample's anti-emulation
checks to enable analysis, and its modifications are robust to system changes.
Lakhotia, A., D. R. Boccardo, et al. (2010). Context-sensitive analysis of obfuscated x86
executables. Proceedings of the 2010 ACM SIGPLAN workshop on Partial evaluation
and program manipulation. Madrid, Spain, ACM: 131-140.
A method for context-sensitive analysis of binaries that may have obfuscated
procedure call and return operations is presented. Such binaries may use operators
to directly manipulate stack instead of using native call and ret instructions to
achieve equivalent behavior. Since definition of context-sensitivity and
algorithms for context-sensitive analysis have thus far been based on the specific
semantics associated to procedure call and return operations, classic
interprocedural analyses cannot be used reliably for analyzing programs in which
these operations cannot be discerned. A new notion of context-sensitivity is
introduced that is based on the state of the stack at any instruction. While changes
in `calling'-context are associated with transfer of control, and hence can be
reasoned in terms of paths in an interprocedural control flow graph (ICFG), the
same is not true of changes in 'stack'-context. An abstract interpretation based
framework is developed to reason about stack-contexts and to derive analogues of
call-strings based methods for the context-sensitive analysis using stack-context.
The method presented is used to create a context-sensitive version of Venable et
al.'s algorithm for detecting obfuscated calls. Experimental results show that the
context-sensitive version of the algorithm generates more precise results and is
also computationally more efficient than its context-insensitive counterpart.
Maughan, D. (2010). "The need for a national cybersecurity research and development
agenda." Commun. ACM 53(2): 29-31.
Government-funded initiatives, in cooperation with private-sector partners in key
technology areas, are fundamental to cybersecurity technical transformation.
Preda, M. D., M. Christodorescu, et al. (2007). A semantics-based approach to malware
detection. Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on
Principles of programming languages. Nice, France, ACM: 377-388.
Malware detection is a crucial aspect of software security. Current malware
detectors work by checking for "signatures," which attempt to capture (syntactic)
characteristics of the machine-level byte sequence of the malware. This reliance
on a syntactic approach makes such detectors vulnerable to code obfuscations,
increasingly used by malware writers, that alter syntactic properties of the
malware byte sequence without significantly affecting their execution
behavior.This paper takes the position that the key to malware identification lies
Annotated Bibliography – Page 5
in their semantics. It proposes a semantics-based framework for reasoning about
malware detectors and proving properties such as soundness and completeness of
these detectors. Our approach uses a trace semantics to characterize the behaviors
of malware as well as the program being checked for infection, and uses abstract
interpretation to "hide" irrelevant aspects of these behaviors. As a concrete
application of our approach, we show that the semantics-aware malware detector
proposed by Christodorescu et al. is complete with respect to a number of
common obfuscations used by malware writers.
Preda, M. D., M. Christodorescu, et al. (2008). "A semantics-based approach to malware
detection." ACM Trans. Program. Lang. Syst. 30(5): 1-54.
Malware detection is a crucial aspect of software security. Current malware
detectors work by checking for signatures, which attempt to capture the syntactic
characteristics of the machine-level byte sequence of the malware. This reliance
on a syntactic approach makes current detectors vulnerable to code obfuscations,
increasingly used by malware writers, that alter the syntactic properties of the
malware byte sequence without significantly affecting their execution behavior.
This paper takes the position that the key to malware identification lies in their semantics.
It proposes a semantics-based framework for reasoning about malware detectors
and proving properties such as soundness and completeness of these detectors.
Our approach uses a trace semantics to characterize the behavior of malware as
well as that of the program being checked for infection, and uses abstract
interpretation to “hide” irrelevant aspects of these behaviors. As a concrete
application of our approach, we show that (1) standard signature matching
detection schemes are generally sound but not complete, (2) the semantics-aware
malware detector proposed by Christodorescu et al. is complete with respect to a
number of common obfuscations used by malware writers and (3) the malware
detection scheme proposed by Kinder et al. and based on standard modelchecking techniques is sound in general and complete on some, but not all,
obfuscations handled by the semantics-aware malware detector.
Tabish, S. M., M. Z. Shafiq, et al. (2009). Malware detection using statistical analysis of
byte-level file content. Proceedings of the ACM SIGKDD Workshop on CyberSecurity
and Intelligence Informatics. Paris, France, ACM: 23-31.
Commercial anti-virus software are unable to provide protection against newly
launched (a.k.a "zero-day") malware. In this paper, we propose a novel malware
detection technique which is based on the analysis of byte-level file content. The
novelty of our approach, compared with existing content based mining schemes,
is that it does not memorize specific byte-sequences or strings appearing in the
actual file content. Our technique is non-signature based and therefore has the
potential to detect previously unknown and zero-day malware. We compute a
wide range of statistical and information-theoretic features in a block-wise
manner to quantify the byte-level file content. We leverage standard data mining
algorithms to classify the file content of every block as normal or potentially
malicious. Finally, we correlate the block-wise classification results of a given file
to categorize it as benign or malware. Since the proposed scheme operates at the
Annotated Bibliography – Page 6
byte-level file content; therefore, it does not require any a priori information about
the filetype. We have tested our proposed technique using a benign dataset
comprising of six different filetypes --- DOC, EXE, JPG, MP3, PDF and ZIP and
a malware dataset comprising of six different malware types --- backdoor, trojan,
virus, worm, constructor and miscellaneous. We also perform a comparison with
existing data mining based malware detection techniques. The results of our
experiments show that the proposed nonsignature based technique surpasses the
existing techniques and achieves more than 90% detection accuracy.
Xue, J., C. Hu, et al. (2009). Metamorphic malware detection technology based on
aggregating emerging patterns. Proceedings of the 2nd International Conference on
Interaction Sciences: Information Technology, Culture and Human. Seoul, Korea, ACM:
1293-1296.
Obfuscating technology is used widely in metamorphic malware and most of
current detection methods fail to completely identify such ever-increasingly
covert metamorphic malware. In this paper, system call sequences in the process
of software execution are researched and metamorphic malware detection method
based on aggregating emerging patterns is proposed. Experimental results show
most metamorphic malware can be detected effectively by this method and it has
higher detection rate and lower false alarm rate when the minimum support and
growth rate thresholds are set reasonably.
Zhou, Y. and W. M. Inge (2008). Malware detection using adaptive data compression.
Proceedings of the 1st ACM workshop on Workshop on AISec. Alexandria, Virginia,
USA, ACM: 53-60.
A popular approach in current commercial anti-malware software detects
malicious programs by searching in the code of programs for scan strings that are
byte sequences indicative of malicious code. The scan strings, also known as the
signatures of existing malware, are extracted by malware analysts from known
malware samples, and stored in a database often referred to as a virus dictionary.
This process often involves a significant amount of human efforts. In addition,
there are two major limitations in this technique. First, not all malicious programs
have bit patterns that are evidence of their malicious nature. Therefore, some
malware is not recorded in the virus dictionary and can not be detected through
signature matching. Second, searching for specific bit patterns will not work on
malware that can take many forms--obfuscated malware. Signature matching has
been shown to be incapable of identifying new malware patterns and fails to
recognize obfuscated malware. This paper presents a malware detection technique
that discovers malware by means of a learning engine trained on a set of malware
instances and a set of benign code instances. The learning engine uses an adaptive
data compression model--prediction by partial matching (PPM)--to build two
compression models, one from the malware instances and the other from the
benign code instances. A code instance is classified, either as "malware" or
"benign", by minimizing its estimated cross entropy. Our preliminary results are
very promising. We achieved about 0.94 true positive rate with as low as 0.016
false positive rate. Our experiments also demonstrate that this technique can
effectively detect unknown and obfuscated malware.
Annotated Bibliography – Page 7
Annotated Bibliography – Page 8
© Copyright 2026 Paperzz