Graph Based Model for Software Tamper Protection

Iterated Transformations and Quantitative Metrics for Software Protection
Mariusz H. Jakubowski
Chit Wei (Nick) Saw
Ramarathnam Venkatesan
Microsoft Research
Redmond, WA (USA)
International Conference on Security and Cryptography
SECRYPT 2009
July 7-10, 2009 – Milan, Italy
Introduction
• Software protection
– Complicate reverse engineering and tampering.
– Enforce execution as intended by developer.
– DRM, licensing, anti-malware, OS security, etc.
• Iterated code transformations
– Multiple, often simple transformations applied repeatedly
– Protection built up via cascading effects
– Emergent program structures and operation
• Goals of our work:
– Develop protection framework based on iterated
transformations.
– Study security of iterated protection via metrics on code.
Overview
Iterated protection and security metrics
• Introduction
• Background
• Iterated protection
• Metrics for security analysis
• Implementation and experiments
• Conclusion
Background: Iterative Systems
• Complex systems
– Agents controlled by iterative evolution rules
• Traffic, crowds, economic markets, ant colonies, …
• Cellular automata (Game of Life)
– Emergent behavior over time
• Cryptography
– Iterated rounds in ciphers, hash functions, etc.
– Better security as number of rounds increases
Iterated Protection
A framework for design, implementation and analysis of
software-protection methods
• Protective operators are applied repeatedly over already
obfuscated code.
• Interaction among operators leads to complex
(emergent) code behavior and structure.
– “Whole greater than the sum of its parts”
– Security bootstrapped even via simple iterated primitives
– Complexity derived from iteration and recombination
• Methodology may offer hope for security analysis.
– Metrics to estimate complexity of breaking
– Foundation for analysis of practical obfuscation methods
– Just one element to aid with comprehensive approaches
An Analogy: Iterated Translation
“Weakly obfuscating” transformations that create complexity when iterated:
• Original:
Code-obfuscation and tamper-resistance technologies aim to protect software against reverse engineering and
hacking. Such methods include code-integrity verification, control-flow obfuscation, data encoding,
anti-debugging, and many others. This technology is useful for applications such as content protection and
Digital Rights Management, where code must execute as intended in order to enforce anti-piracy measures.
Software protection is also helpful against viruses, worms, trojans, rootkits, and malicious code in general.
• English → German → French → Spanish → English → Chinese → English (via Babel Fish):
Causes to change the dark code and the payment technology resistance cares for to protect r3etechnique and the
Zerhacken program computer science opposition. Such method including code data fullness reconsideraci3on,
Steuerung-fliessen changes darkly, compiles the statute book, AntiAuspruefen and other people. This technology
is the useful legal management the protection which likely satisfies for the application and, in the code numeral the
application must, expect there, imposes the antipiraterie set. The computer science program protection is the very
useful opposition virus, the endless screw, Turlogh 伊人, rootkits and code boeswilligen in brief to general.
• English → German → English → German → English → German → English:
If you code Kollusion and resistance technologies supplying, the goal away away of protecting for of of software
from back technology and the Zerhacken. Such methods close completeness of the code test, taxliquid the
Kollusion, the data coding out and anti-examining and that differently the many. This technology is for applications
like content protection and must the right management, which is useful, the code digitally inside accomplish, for
Piraterie mass anti forces there been supposed. Software protection is useful generally also against viruses,
continuous screws, trojans, root installation of sentences and bad-ready code.
Sample Operator: Oblivious Hashing
• Injection of code to perform integrity checks of execution
– Hash updates after state changes (e.g., assignments and branches)
– Periodic checks for hash correctness
• First hashing round verifies execution of target code.
• Each subsequent round verifies execution of all previous
rounds (along with target code).
Original code:

    int x = 123;
    if (GetUserInput() > 10)
    {
        x = x + 1;
    }
    else
    {
        printf("Hello\n");
    }

After one round of OH:

    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (GetUserInput() > 10)
    {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    }
    else
    {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    VERIFY_HASH(hash1);
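The slide names the hash macros but does not define them. As a minimal sketch, assuming a simple multiply-and-add mixing step (with constants borrowed from FNV, not the authors' actual hash), one round of OH over the fragment could look as follows. The macro bodies, the branch-ID values, the `verify_hash` helper, and the `user_input` and `increment` parameters (standing in for `GetUserInput()` and for an attacker patching the constant in `x = x + 1`) are all illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed macro bodies: the slide names these macros but does not define them. */
#define INITIALIZE_HASH(h) ((h) = 2166136261u)
#define UPDATE_HASH(h, v)  ((h) = (uint32_t)((h) * 16777619u + (uint32_t)(v)))

/* Hypothetical branch identifiers. */
enum { BRANCH_ID_1 = 1, BRANCH_ID_2 = 2 };

/*
 * One OH round over the slide's fragment. The final hash is returned so that
 * a caller can compare it with a value recorded from a trusted run, which is
 * the role VERIFY_HASH plays on the slide.
 */
static uint32_t run_with_oh(int user_input, int increment)
{
    uint32_t hash1;
    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (user_input > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + increment;          /* increment == 1 is the untampered code */
        UPDATE_HASH(hash1, x);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    return hash1;
}

/* Stand-in for VERIFY_HASH: nonzero when the observed hash matches. */
static int verify_hash(uint32_t observed, uint32_t expected)
{
    return observed == expected;
}
```

With this sketch, `verify_hash(run_with_oh(20, 2), run_with_oh(20, 1))` fails because the patched increment changes the value folded into the hash, while two untampered runs along the same branch agree.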
Two Iterated Rounds of OH
Second round verifies both the original code and the first round of OH.
Original code:

    int x = 123;
    if (GetUserInput() > 10)
    {
        x = x + 1;
    }
    else
    {
        printf("Hello\n");
    }

After one round of OH:

    INITIALIZE_HASH(hash1);
    int x = 123;
    UPDATE_HASH(hash1, x);
    if (GetUserInput() > 10)
    {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
    }
    else
    {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        printf("Hello\n");
    }
    VERIFY_HASH(hash1);

After two rounds of OH:

    INITIALIZE_HASH(hash1);
    INITIALIZE_HASH(hash2);
    int x = 123;
    UPDATE_HASH(hash1, x);
    UPDATE_HASH(hash2, x);
    UPDATE_HASH(hash2, hash1);
    if (GetUserInput() > 10)
    {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        UPDATE_HASH(hash2, BRANCH_ID_1);
        UPDATE_HASH(hash2, hash1);
        x = x + 1;
        UPDATE_HASH(hash1, x);
        UPDATE_HASH(hash2, x);
        UPDATE_HASH(hash2, hash1);
    }
    else
    {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        UPDATE_HASH(hash2, BRANCH_ID_2);
        UPDATE_HASH(hash2, hash1);
        printf("Hello\n");
    }
    VERIFY_HASH(hash1);
    VERIFY_HASH(hash2);
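To illustrate why the second round protects the first, the sketch below runs the two-round fragment with an optional tamper: when `skip_round1_update` is nonzero, the round-1 update after the assignment is omitted, modeling an attacker who strips that check. The macro bodies (a multiply-and-add step with FNV constants), the branch-ID values, the `oh_hashes` struct, and both parameters are assumptions made for illustration, not the authors' implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed macro bodies: the slide does not define them. */
#define INITIALIZE_HASH(h) ((h) = 2166136261u)
#define UPDATE_HASH(h, v)  ((h) = (uint32_t)((h) * 16777619u + (uint32_t)(v)))

/* Hypothetical branch identifiers. */
enum { BRANCH_ID_1 = 1, BRANCH_ID_2 = 2 };

/* Hypothetical container for the two final hash values. */
struct oh_hashes { uint32_t hash1; uint32_t hash2; };

static struct oh_hashes run_two_rounds(int user_input, int skip_round1_update)
{
    uint32_t hash1, hash2;
    INITIALIZE_HASH(hash1);
    INITIALIZE_HASH(hash2);
    int x = 123;
    UPDATE_HASH(hash1, x);
    UPDATE_HASH(hash2, x);
    UPDATE_HASH(hash2, hash1);          /* round 2 absorbs round 1's state */
    if (user_input > 10) {
        UPDATE_HASH(hash1, BRANCH_ID_1);
        UPDATE_HASH(hash2, BRANCH_ID_1);
        UPDATE_HASH(hash2, hash1);
        x = x + 1;
        if (!skip_round1_update) {      /* the tamper removes this update */
            UPDATE_HASH(hash1, x);
        }
        UPDATE_HASH(hash2, x);
        UPDATE_HASH(hash2, hash1);
    } else {
        UPDATE_HASH(hash1, BRANCH_ID_2);
        UPDATE_HASH(hash2, BRANCH_ID_2);
        UPDATE_HASH(hash2, hash1);
        printf("Hello\n");
    }
    struct oh_hashes result = { hash1, hash2 };
    return result;
}
```

In this sketch, stripping the round-1 update changes `hash1`, and because `hash2` folds in `hash1`, it changes `hash2` as well. An attacker who also patches the expected value of `hash1` would still have to defeat the check on `hash2`.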
Example Protection Operators
Complexity derived from iteration and recombination
• Pointer conversion
– Conversion of variable references to be performed via pointers
– Addition of arbitrary layers of indirection
Original code:

    int x = GetTickCount();
    printf("%d\n", x);

After one round of pointer conversion:

    int * ptr_x_0;
    int x;
    ptr_x_0 = &x;
    unsigned int tmp_151 =
        (* (unsigned int (__stdcall *)())
        &GetTickCount)();
    int tmp_152 = (int) tmp_151;
    *(int *) ptr_x_0 = tmp_152;
    char * tmp_ptr_154 = (char *) "%d\n";
    printf(tmp_ptr_154, * (int *) ptr_x_0);

After two rounds of pointer conversion:

    int * ptr_x_2;
    int ** ptr_ptr_x_0_1;
    int * ptr_x_0;
    int x;
    ptr_ptr_x_0_1 = &ptr_x_0;
    ptr_x_2 = &x;
    *(int **) ptr_ptr_x_0_1 = ptr_x_2;
    unsigned int tmp_151 =
        (* (unsigned int (__stdcall *)())
        &GetTickCount)();
    int tmp_152 = (int) tmp_151;
    int * tmp_ptr_159 = * (int **) ptr_ptr_x_0_1;
    * (int *) tmp_ptr_159 = tmp_152;
    char * tmp_ptr_154 = (char *) "%d\n";
    int * tmp_ptr_160 = * (int **) ptr_ptr_x_0_1;
    printf(tmp_ptr_154, * (int *) tmp_ptr_160);
Example Protection Operators
Complexity derived from iteration and recombination
• Code outlining
– Extraction of code sections into separate functions
– Complementary operation to common code-inlining optimizations
– Potential for creation of arbitrarily structured control-flow graphs
• Superdiversification
– Peephole instruction replacement
– Guided brute-force search for equivalent instruction sequences
– Generation of arbitrarily individualized code
• Dataflow flattening
– Injection of artificial variable dependencies
– Implementation via opaque predicates or “chaff” expressions on
two variables
– Production of flat (complete or nearly complete) dataflow graphs
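Two of these operators can be sketched on a toy function. The example below is a hand-written illustration, not output of the authors' tool: it shows code outlining (each computation moved into its own function) and dataflow flattening (an opaque predicate and a chaff expression that make two otherwise independent variables depend on each other). All function names and the specific predicate are assumptions chosen for brevity.

```c
/* Original: one function holding two independent computations. */
static unsigned area_plus_perimeter(unsigned w, unsigned h)
{
    unsigned area = w * h;
    unsigned perimeter = 2u * (w + h);
    return area + perimeter;
}

/* Code outlining: each computation moves into its own function. */
static unsigned outlined_area(unsigned w, unsigned h) { return w * h; }
static unsigned outlined_perimeter(unsigned w, unsigned h) { return 2u * (w + h); }

static unsigned area_plus_perimeter_outlined(unsigned w, unsigned h)
{
    return outlined_area(w, h) + outlined_perimeter(w, h);
}

/* Dataflow flattening: artificial dependencies tie the two variables together. */
static unsigned area_plus_perimeter_flattened(unsigned w, unsigned h)
{
    unsigned area = w * h;
    unsigned perimeter = 2u * (w + h);

    /* Opaque predicate: n * (n + 1) is always even, so this branch never
       runs, yet perimeter now appears to depend on area. */
    if ((area * (area + 1u)) % 2u != 0u) {
        perimeter += area;
    }

    /* Chaff expression on two variables: (perimeter ^ perimeter) is always 0,
       so area keeps its value but gains a dependency on perimeter. */
    area += (perimeter ^ perimeter);

    return area + perimeter;
}
```

All three variants return the same value for the same inputs. An optimizing compiler would likely fold away predicates this simple, so a practical operator would need harder-to-analyze expressions; the sketch only shows the shape of the transformation.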
Design of Protection Operators
• Arbitrary operators are possible.
– May be designed heuristically to achieve specific
objectives.
– Operation over time may be emergent and thus apparent
only via experimentation.
– Very simple operators in combination may reduce the
need to construct complicated schemes.
• Classic techniques can serve as operators:
– Opaque predicates
– Control-flow flattening
– Data encoding
– Chaff-code injection
– …
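As one example of a classic technique used as an operator, control-flow flattening can be sketched as below. This is a generic textbook-style illustration, assuming a simple summation loop as the target; the function names and state numbering are hypothetical and do not come from the authors' tool.

```c
/* Original: a structured loop. */
static unsigned sum_to(unsigned n)
{
    unsigned sum = 0;
    for (unsigned i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/*
 * Flattened: every basic block becomes a case of one dispatcher switch, and
 * a state variable selects the next block, hiding the original loop shape.
 */
static unsigned sum_to_flattened(unsigned n)
{
    unsigned sum = 0, i = 0;
    int state = 0;
    for (;;) {
        switch (state) {
        case 0:  /* loop initialization */
            sum = 0; i = 1; state = 1;
            break;
        case 1:  /* loop condition */
            state = (i <= n) ? 2 : 3;
            break;
        case 2:  /* loop body and increment */
            sum += i; i++; state = 1;
            break;
        default: /* state 3: exit block */
            return sum;
        }
    }
}
```

Both functions compute the same result, for example 55 for `n = 10`. Applying a second operator, such as opaque predicates on the `state` updates, would be one way to combine operators in the iterated setting.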
Quantitative Security Metrics
• Complex systems do not lend themselves
to modeling of future states.
– “Must be run to see what happens.”
– “Cannot be short-cut.”
• One solution: Analyze security via
complexity metrics computed over
protected code.
Security Evaluation via Metrics
• [Anckaert et al. ’07]: Code-complexity
metrics to evaluate protection
– Instruction count
– Cyclomatic number
• #edges – #nodes + 2
• “Number of decision points”
– Knot count:
• #crossings
• “Unstructuredness”
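Both graph metrics are straightforward to compute once the control flow is available. The sketch below assumes a connected control-flow graph for the cyclomatic number, and for the knot count it assumes one common formulation: two jumps form a knot when their spans in the linear code layout partially overlap. The slide gives only "#crossings", so the `jump` struct and the exact crossing rule are assumptions.

```c
#include <stddef.h>

/* Cyclomatic number of a connected control-flow graph: #edges - #nodes + 2. */
static int cyclomatic_number(int edges, int nodes)
{
    return edges - nodes + 2;
}

/* A control transfer between two positions in the linear code layout. */
struct jump { int from; int to; };

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Counts pairs of jumps whose spans partially overlap (assumed knot rule). */
static int knot_count(const struct jump *jumps, size_t count)
{
    int knots = 0;
    for (size_t a = 0; a < count; a++) {
        for (size_t b = a + 1; b < count; b++) {
            int a_lo = min_int(jumps[a].from, jumps[a].to);
            int a_hi = max_int(jumps[a].from, jumps[a].to);
            int b_lo = min_int(jumps[b].from, jumps[b].to);
            int b_hi = max_int(jumps[b].from, jumps[b].to);
            /* Nested or disjoint spans do not cross; interleaved spans do. */
            if ((a_lo < b_lo && b_lo < a_hi && a_hi < b_hi) ||
                (b_lo < a_lo && a_lo < b_hi && b_hi < a_hi)) {
                knots++;
            }
        }
    }
    return knots;
}
```

For a single if/else, the control-flow graph has 4 nodes (condition, then-block, else-block, join) and 4 edges, giving a cyclomatic number of 2, which matches the "number of decision points" intuition of one decision plus one.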
Security Evaluation via Metrics
• Other metrics
– Variable density (#variables per instruction)
– Operational indirection (fraction of references
performed via pointers)
– …
• Metrics should be chosen to reflect difficulty
of various analysis tasks.
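These two metrics reduce to simple ratios over counts collected from the code. The sketch below assumes a hypothetical `code_stats` record filled in by some analysis pass; the struct, its field names, and the convention of returning 0 for an empty denominator are illustrative assumptions. The `metric_ratio` helper expresses a metric relative to its value on the original code.

```c
/* Hypothetical counts gathered from a piece of code. */
struct code_stats {
    unsigned instructions;
    unsigned variables;
    unsigned references;          /* all variable references */
    unsigned pointer_references;  /* references performed via pointers */
};

/* Variable density: number of variables per instruction. */
static double variable_density(const struct code_stats *s)
{
    return s->instructions ? (double)s->variables / s->instructions : 0.0;
}

/* Operational indirection: fraction of references performed via pointers. */
static double operational_indirection(const struct code_stats *s)
{
    return s->references ? (double)s->pointer_references / s->references : 0.0;
}

/* Metric value on protected code relative to the original code. */
static double metric_ratio(double protected_value, double original_value)
{
    return original_value != 0.0 ? protected_value / original_value : 0.0;
}
```

For example, code with 10 instructions, 5 variables, and 2 of 8 references going through pointers has a variable density of 0.5 and an operational indirection of 0.25.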
Implementation
• Iterated-protection tool
– Compiler plug-in for C/C++ code
– Based on Microsoft Phoenix compiler framework
– Source-to-source transformations
• Simple architecture
– Each protection operator is straightforward to
implement and test.
– Power of tool derives from iteration and
recombination of multiple operators.
Experimental Results
Selected SPEC benchmarks
Tables display values of metrics (ratios) relative to original code.
Conclusion
• Iterated-protection framework
– Iteration and mixing of simple primitives
– Cascading effects and emergent behavior
– Quantitative metrics over code to assess security
• Future directions
– Additional protection operators to achieve given
objectives
– Closer linking of metrics to actual difficulty of
analysis and breaking