Julio

Practical (Introduction to) Reverse
Engineering
Julio Auto
<julio . auto *a* gmail>
Agenda
 Part I - 101
 Why this presentation? (I mean... WHY?!?!)
 A few concepts (Mumble jumble++)
 Demo (Show me the goods)
 Part II - 1337
 Advancing RE (Do your own!)
 Something extra (Finish pretty – er... almost )
 Linkz, lulz, refz, and shoutz
 Q & (maybe) A
Why?
 Suggested by the H2HC crew
 Based on my article ‘Cracking CrackMes’, published earlier this
year while working for my previous employer, Scanit ME
 RE is getting lots of attention, and many people seem
interested in learning it
 Still, it remains largely a black art
Why? (2)
 It seems, then, that moving up from ground zero is the most
problematic step
 This presentation tries to help fix it
 It aims to expose instant useful knowledge
 And pointers to where go digging deeper
 Instead of advanced research _results_, basic _techniques_ and
_processes_
 Obs.: We’ll be targeting the Windows platform most of the
time in this speech
Concepts
 Reverse Engineering is a very self-explicative term
 You take something and, from there, try to learn how (some
aspect of) it was engineered
 It’s also obviously broad
 For example, it’s often used to describe the process through
which you generate a higher-level, architectural view of a piece
of software given its source code
My Own Concept
 Think of the times you asked yourself “why” and “how” and
let it go without an answer...
 ...
 ...
 ...
RE is not letting go
A Few Applications
 Malware Analysis
 Vulnerability Analysis
 Security Assessment of 3rd-party COTS
 Evaluation/Breaking of copy-protection schemes
 Assorted how’s and why’s
Why Still a Black Art?
 Perhaps because people think it’s only good for SW cracking

 Perhaps because DRM has become a nightmare no one is
happy with and related laws everywhere bash reversers too
hard every now and then  (does anybody remember
Dmitry Sklyarov, the DMCA and all that madness?)
 Perhaps because many people still think it should be illegal 
(wtf?!)
How To Learn
 The Crack-Me approach
 The one I illustrate in the paper I mentioned
 Small and targeted challenges with different levels and obstacles
to choose from
 The real life approach
 Choose a real-world problem and attack it
 Tough but rewarding
 We’ll demo a bit of both
Tools of The Trade
 Probably millions of tools that can give you some useful piece
of info about your target
 I’ll try to restrict myself to the most relevant/common, then
 Unfortunately, many of the best tools are commercial
 On the other hand, many of them have free/student/evaluation
versions 
 For the rest... Well, remember “the real life approach”? ;)
Debuggers
 Obvious importance
 Fairly good variety
 It’s nice to play and know your way with all of them
 But mastering them all is quite hard, so you’ll most likely elect your
debugger of choice in little time
 Choose your debugger well!
Debuggers (2)
 WinDbg
 My personal choice of debugger
 Developed by MSFT
 Comes for free in the “Debugging Tools for Windows” package
 Amazingly rich in features
 Extensible with some C++ programming
 Not the easiest or simplest dev environment
 Very rich API, though
 Poor interface
Debuggers (3)
 Visual Studio Debugger
 It’s crap, not suited for reversing
 But it’s pretty and nice for developers :>
 Seriously, don’t try to go very far reversing with it
 It may use up the rest of your sanity
Debuggers (4)
 OllyDbg
 Enjoys quite a lot of popularity in the reversing community
 Nice interface
 In particular, a nice disassembly view
 Comes in a few “tuned” versions, being one of the most
popular...
Debuggers (5)
 Immunity Debugger
 Developed by Immunity Inc. (someone from the dev team in
the audience? )
 Extends OllyDbg with a python interpreter and exposes a
couple of debugging modules for the user to interact with
 Very neat plugin support
 Embeds a command-line with windbg-aliased commands
 Maintains a forum to support developers/users of ImmDbg
plugins
Debuggers (6)
 gdb
 The standard debugger on *NIX systems
 Quite complete debugger
 Not the best thing in the RE world, but overall a good debugger
Disassemblers
 Reading assembly is not the sweetest thing for most people
 The way the code is represented is extremely important and
makes an increasingly great difference in big RCE tasks
 Therefore, being confortable with your disassembler is
essential
Disassemblers (2)
 Pretty much every debugger is capable of disassembling
 Apart of that, there’s lots of other tools that can do it too
 In Linux, objdump is pretty much a standard tool
 However, one particular tool is specially known for its
disassembly features
Disassemblers (3)
 IDA Pro
 Supports many binary formats and architectures
 Displays the code in graphs, which greatly enhance the
visualization
 Block-level CFGs
 Many things can be customized/adjusted
 Graph layout, data types, annotations...
 Quite frankly, it’s in every reverser’s toolkit
 IDA Pro is a commercial tool currently in version 5.3
 But version 4.9 is available in a free edition 
System Monitoring Tools
 All of those from the SysInternals Suite
 Process Explorer
 RegMon
 FileMon
 TCPView
 Etc...
Advanced Tools
 Binary Diff’ers
 BinDiff
 Decompilers
 Hex-Rays
 RE Frameworks
 ERESI ;)
 PaiMei and all the PyThings
Demo
 We’ll try and beat a crack-me challenge
 This crack-me was taken from a real competition
 HITB Dubai 2007 CTF
 Perhaps it can serve as a tip for H2HC’s CTF as well 
RE – Advanced Topics
 Cutting to the chase, advancing RE basically means
automating stuff
 Many of the RE tools are
scriptable/programmable/extensible
 Developing smart ways to deal with repetitive tasks is the
way for more effective analyses
RE – Advanced Topics (2)
 Less often, you might see opportunities to advance RE in
ways not based on automation
 Defeating a new anti-debug trick
 Developing new environments for RE
 Virtualization, Sandboxing...
 Or even radically changing paradigms
 E.g. The graph-based approach to binary navigation
RE – Advanced Topics (3)
 Perhaps the most important lesson here is not to reinvent the
wheel
 Re-use the tools you have!
 You’ll be amazed at how much stuff you can do by “glueing”
pieces together
 Having that said...
 Perhaps the tools you have are not perfect
 Or you might wanna re-do something just for learning
 But be sure to have the right goals in mind!
Teaching By (Bad) Examples
 I wanted to do something really neat to show these concepts
in practice
 Unfortunately, I didn’t manage to finish it in time 
 The thing is currently under test/final touches
 However the idea is so cool and in such a (relatively)
advanced state, that I decided to talk about it anyway
Problem
 Suppose you have ways to reproduce a high-profile, possibly
exploitable bug –Yay!
 BUT....
 The target is closed-source software
 The target is as large and complex as an operating system – and
way less documented
 The input is huge and has a complex, possibly undisclosed
format
 The source of the bug can be anywhere in the input
 From user-input to actual bug/crash, about 3 million
instructions happen
WHAT DO YOU DO????
Introducing LEP
 LEP tries to answer a big question in this problem:
 What exact part of this input is causing the bug?
 If you can answer this question and somehow co-relate this
with the input format, you may gain a great deal of
understanding of the bug
 For this, I have invented a new technique: “Staged Partial
Tracing-Based Backwards Taint Analysis”
 Because not sounding like a Ph.D. is so 2001 :>
 And also because we all just love new terms we can go media-
cuckoo about
Introducing LEP (2)
 One-liner idea: If we know when our input is brought to
memory and know where it’s mapped, we can trace the
program from this point to the crash and then go backwards
analyzing the dataflow to find out where the faulting data
came from
 We do it in two stages, with a component for each: the tracer
and the analyzer
 Simple, huh?
Fundamental Concepts
 When we trace the program, it becomes “linear”, i.e.
control-flow is irrelevant
 Dataflow becomes evident
 Aliasing is not an issue (in essence, it disappears)
 All info we need is available in runtime
 In particular, effective addresses
 If the input is as big as the problem states, it should be no
problem to find it in memory
 We get most of the info we need from the disassembly text
(ASCII)! It’s like hacking with grep again!
LEP Tracer
 A WinDbg extension
 Traces every instruction until the program raises an
exception
 Dumps the following instruction info to a file:
 Mnemonic
 Destination operand
 Source operand
 Dependences of the source op – e.g. mov eax,[ecx+edx*2]
LEP Tracer (2)
 Discards control-flow changing instructions
 Discards in/out instructions (all relevant input should be in
memory already?)
 Discards other groups of instructions that will be supported
as we go
 FPU, MMX, SSE{2,3}, etc...
 This might be one of the reasons it currently doesn’t work
 Tries to parse the right info even when the debugger is too
stupid to work as expected 
 Why not to compute effective addresses in rep’ed instructions?
LEP Analyzer
 Reads the file generated by the tracer and goes bottom-up
investigating the dataflow
 You have to specify the piece of data that causes the last
instruction to fail – usually (always?) a register
 And the memory range(s) where your input was mapped into,
at the time the trace was taken
 Ignores register “slices” for simplicity
 al = ah = ax = eax = rax
LEP Analyzer (2)
 When the source operand of a given instruction is an
immediate/constant, LEP tries it best to evaluate whether it
_transforms_ or _overwrites_ the destination
 If it overwrites, we finish the analysis for this branch
 mov eax, deadf0f0h
 Else if it transforms, we keep looking for another def of the same
destination operand
 inc eax
 This gives a very special meaning for LEP’s existence
 Otherwise, searching for occurences of the faulting data inside the input
could be just as effective
 LEP also tries to identify non-obvious constant overwrites
 xor eax, eax
Engineering Tech-Talk
 LEP was intended to be written entirely in Python
 Didn’t work for performance reasons 
 LEP Tracer is written in C++, since it’s a WinDbg extension
 It makes use of a reference of the x86 instruction set written in
XML
 The XML is mapped to C++ using CodeSynthesis’ XSD XML
Data Binding
 LEP Analyzer was firstly written in Python
 Then I also re-wrote it in C++
 LEP Analyzer’s search algorithm was initially a DFS
 Then I implemented it as a BFS
Demo II
• Placeholder slide :>
LEP Release
 As much as I like to make my software free and wide open, I
have chosen not to release LEP to the public for now
 Instead, I’m willing to share it with whoever contacts me
directly (by e-mail, for example)
 Basically, I just wanna know who is using it and what it’s
gonna be used for
 Makes no difference if you come from Wall Street or from an
underground cracking ghetto – drop me a line
Linkz & Refz
 Cracking CrackMes
 http://www.scanit.net/rd/wp/wp04
 X86 Opcode and Instruction Reference, by MazeGen
 http://ref.x86asm.net/
 CodeSynthesis XSD – XML Data Binding for C++
 http://www.codesynthesis.com/products/xsd/
 Thousands of elite RE projects
 http://www.google.com 
 Seriously though, contact me if you can’t find anything
Greetz & Shoutz
 Filipe Balestra for lending me the bug used in the 2nd demo
 H2HC crew for letting me ruin their conference again
 The ERESI team, with whom I have most of my discussions
about RE, programa analysis, etc
 All of the great people that I know from the security scene
 It’s simply impossible to mention each and everyone of you, but
you know who you are!
Questions?
Practical (Introduction to) Reverse
Engineering
Julio Auto
<julio . auto *a* gmail>