Forensic Analysis of Toolkit-Generated Malicious Programs

Yasmine Kandissounon
TSYS School of Computer Science
Columbus State University
2009 ACM Mid-Southeast Conference
Gatlinburg, Tennessee
November 12-13, 2009
State of the Threat
• (Jan – Jun 2009) Microsoft Security Intelligence Report:
– 115,854,807 infections in first half of 2009
– 94,985,967 infections in second half of 2008
→ An increase of about 22%
• (2008) AVTest Labs
– 15,000 to 20,000 new specimens analyzed each day.
(4 times as many as in 2006, 15 times as many as in
2005)
• (ESET) Malware is written by talented teams of programmers
• Automated Malware Creation:
– W32.Evol, W32.Simile, W32.NGVCK, W32.VCL, etc.
What Does the AV Industry Need?
• Automation
– (Szor 2005) The need for analysis by humans is a major
bottleneck!
• Ability to quickly and accurately detect new malware.
– (Team Cymru, 2008) 1000 new samples submitted, only
37% detected by commercial AV products!
• “Good” generic signatures are badly needed
– (Kaspersky Lab 2008) Windows Explorer was flagged as
malicious
– (AVIEN’s Harley) On average, current detection rates using
generic signatures are no better than 70%-80%
Our Problem: Engine-Generated Malware
[Diagram: a virus sample goes In to the ENGINE, which sends Out Variant1, Variant2, Variant3, ..., Variant n over the network. The malware detector relies on a signature database (virus definitions).]
• Too many signatures challenge the detector
Solution: Use Engine Signature
[Diagram: a virus sample goes In to the ENGINE, which sends Out Variant1, Variant2, Variant3, ..., Variant n over the network (Internet). The malware detector relies on a single engine signature.]
• Use one small piece of info about the engine to detect all
of the variants.
MALWARE GENERATION AS A
HIDDEN MARKOV MODEL
• Opcode sequence of a sample:
MOV JNZ MOV MOV PUSH MOV NOP MOV NOP ADD JMP JMP MOV MOV NOP PUSH
JZ PUSH MOV CALL MOV CALL SUB MOV PUSH MOV CALL POP MOV MOV MOV
• Sequence with the other instructions shown as *:
* MOV MOV PUSH MOV NOP MOV NOP * JMP JMP MOV MOV NOP PUSH
* PUSH MOV CALL MOV CALL * MOV PUSH MOV CALL POP MOV MOV
[Figure: state-transition diagram over the states NOP, MOV, PUSH, CALL, JMP, and *; each edge is labeled with a transition probability (labels range from 0.07 to 0.67).]
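The abstraction step on this slide, where instructions outside a chosen set are replaced by *, can be sketched as follows. This is a minimal illustration, not the authors' tool: the function name `abstract_opcodes` is hypothetical, the opcode list is transcribed from the slide, and the sketch assumes that every instruction outside the n most frequent ones maps to "*".

```python
from collections import Counter

# Opcode sequence transcribed from the slide (mnemonics only, operands dropped).
SLIDE_OPCODES = (
    "MOV JNZ MOV MOV PUSH MOV NOP MOV NOP ADD JMP JMP MOV MOV NOP PUSH "
    "JZ PUSH MOV CALL MOV CALL SUB MOV PUSH MOV CALL POP MOV MOV MOV"
).split()


def abstract_opcodes(opcodes, n):
    """Keep the n most frequent opcodes and replace every other opcode by "*".

    Hypothetical helper. Returns the abstracted sequence and the set of
    opcodes that were kept.
    """
    counts = Counter(opcodes)
    relevant = {op for op, _ in counts.most_common(n)}
    abstracted = [op if op in relevant else "*" for op in opcodes]
    return abstracted, relevant


if __name__ == "__main__":
    abstracted, relevant = abstract_opcodes(SLIDE_OPCODES, 5)
    print(sorted(relevant))
    print(" ".join(abstracted))
```

When several opcodes are tied at the cutoff, `Counter.most_common` keeps the ones seen first, so a real experiment would need an explicit tie-breaking rule.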
Transition Matrix = Engine Signature
(Choice of relevant instructions = 5 most frequent instructions)
        NOP    MOV    PUSH   CALL   JMP    *
NOP     0.00   0.21   0.00   0.00   0.00   0.00
MOV     0.33   0.29   0.60   0.67   0.50   0.67
PUSH    0.33   0.14   0.40   0.00   0.00   0.00
CALL    0.00   0.21   0.00   0.00   0.00   0.00
JMP     0.00   0.00   0.00   0.00   0.50   0.33
*       0.33   0.07   0.00   0.33   0.00   0.00
[A seventh value, 0.40, follows the JMP row in the extracted slide text; its cell cannot be determined.]
• Take only the n most frequent instructions, for some n.
• The transition matrix is (n+1) by (n+1) and represents the engine.
→ Problem: find the smallest n that yields the best accuracy
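A first-order transition matrix over n relevant instructions plus the catch-all state "*" can be computed along these lines. This is a sketch under stated assumptions: the function name `transition_matrix` is hypothetical, the list of relevant instructions is assumed to be chosen beforehand, and rows index the current instruction while columns index the next one. In the matrix on the slide the columns are the ones that sum to about 1, so it appears to use the transposed layout.

```python
def transition_matrix(opcodes, relevant):
    """Estimate a first-order Markov transition matrix from an opcode sequence.

    Hypothetical helper. `relevant` is the list of n instructions kept as
    states; every other opcode is counted under the extra state "*", which
    gives an (n+1) by (n+1) matrix. Entry [i][j] is the fraction of
    transitions leaving state i that go to state j.
    """
    states = list(relevant) + ["*"]
    index = {state: i for i, state in enumerate(states)}
    other = index["*"]
    size = len(states)

    # Count how often each state is immediately followed by each other state.
    counts = [[0] * size for _ in range(size)]
    ids = [index.get(op, other) for op in opcodes]
    for current, following in zip(ids, ids[1:]):
        counts[current][following] += 1

    # Normalize each row; a state with no outgoing transition keeps a zero row.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix


if __name__ == "__main__":
    states, matrix = transition_matrix(
        ["MOV", "PUSH", "MOV", "PUSH", "MOV", "NOP"], ["MOV", "PUSH"]
    )
    for state, row in zip(states, matrix):
        print(state, [round(p, 2) for p in row])
```

Using the same `relevant` list for every sample keeps the matrices the same shape, which is what makes them comparable entry by entry.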
Subjects and Preparation
• 100 malware samples of
W32.Evol and W32.Simile
(Metamorphic viruses)
• 100 malware samples generated
by NGVCK
• 100 malware samples generated
by VCL
– Source: www.vx.netlux.org.
• 100 benign samples
– Source: sourceforge.net ,
download.com, installation of
Windows Vista.
Classification Method
• For each family of samples
– Identify a training subset of size 30
– Compute the transition matrix for each training sample
– Take the average of these matrices.
– This average is the engine signature for the family.
• For each instance not used for training
– Compute the transition matrix of the instance
– Compute the Euclidean distance between the instance’s matrix and
each of the engine signatures generated in the above stage
– The family whose signature is closest to the instance’s
transition matrix is declared to be the instance’s family. If
there are ties, choose one at random.
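The two stages above can be sketched as an average-matrix, nearest-signature classifier. This is an illustrative sketch rather than the code used in the experiments: the function names are hypothetical, matrices are plain nested lists of equal size, and the distance is assumed to be the Euclidean distance over all matrix entries.

```python
import math
import random


def average_matrix(matrices):
    """Entry-wise mean of equally sized matrices (one engine signature)."""
    size = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(size)]
        for i in range(size)
    ]


def euclidean_distance(a, b):
    """Euclidean distance between two matrices, taken over all entries."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def classify(matrix, signatures, rng=random):
    """Return the family whose signature is closest; ties are broken at random.

    `signatures` maps a family name to its averaged transition matrix.
    """
    distances = {
        family: euclidean_distance(matrix, signature)
        for family, signature in signatures.items()
    }
    best = min(distances.values())
    closest = [family for family, d in distances.items() if d == best]
    return rng.choice(closest)


if __name__ == "__main__":
    # Toy 2 by 2 matrices, made up for illustration only.
    signatures = {
        "A": average_matrix([[[1.0, 0.0], [0.0, 1.0]], [[0.8, 0.2], [0.2, 0.8]]]),
        "B": [[0.0, 1.0], [1.0, 0.0]],
    }
    print(classify([[0.9, 0.1], [0.2, 0.8]], signatures))
```

Each signature is a single small matrix, so classifying an instance costs one distance computation per family.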
Average Matrix Classifier
(1st Order Markov Chain)
• Results:
RELEVANT INSTRUCTIONS    MISCLASSIFICATIONS
20                       5.33%
25                       7.33%
10                       8%
15                       11%
Conclusion and Further Work
• Conclusion
– Good Accuracy (8% misclassifications)
– Small Signature (11 by 11 matrix)
– Fast Detection (12 min for 150 tests)
• Further Work
– 2nd order
– Work with more samples
– Work with other families of malware
– Different ways of choosing the relevant instructions
– Try a different distance measure
References
• http://www.microsoft.com/security/portal/Threat/SIR.aspx
• http://www.washingtonpost.com/wp-dyn/content/article/2008/03/19/AR2008031901439.html
• http://packetstormsecurity.org/mag/40hex/40HEX-10/40HEX10.001J
• http://www.research.ibm.com/antivirus/SciPapers/Tesauro/NeuralNets.html. Last retrieved April 12, 2009
• M.R. Chouchane, Approximate Detection of Machine-morphed Malicious Programs. Ph.D. Dissertation (2008)
• Chouchane and Lakhotia, Using Engine Signature to Detect Metamorphic Malware.
WORM 2006
• Ivan Krsul and Eugene H. Spafford, Authorship Analysis:
Identifying the Author of a Program. Computers & Security
(1997)
• Peter Szor, The Art of Computer Virus Research and Defense.
(Chapter 7) 2005
• Wing Wong and Mark Stamp, Hunting for Metamorphic
Engines. J Comput Virol (2006)
• www.vx.netlux.org, last retrieved April 12, 2009