Practical (Introduction to) Reverse
Engineering
Julio Auto
<julio . auto *a* gmail>
Agenda
Part I - 101
Why this presentation? (I mean... WHY?!?!)
A few concepts (Mumbo jumbo++)
Demo (Show me the goods)
Part II - 1337
Advancing RE (Do your own!)
Something extra (Finish pretty – er... almost)
Linkz, lulz, refz, and shoutz
Q & (maybe) A
Why?
Suggested by the H2HC crew
Based on my article ‘Cracking CrackMes’, published earlier this
year while working for my previous employer, Scanit ME
RE is getting lots of attention, and many people seem
interested in learning it
Still, it remains largely a black art
Why? (2)
It seems, then, that moving up from ground zero is the most
problematic step
This presentation tries to help fix it
It aims to present immediately useful knowledge
And pointers to where to go digging deeper
Instead of advanced research _results_, basic _techniques_ and
_processes_
Note: We’ll be targeting the Windows platform most of the
time in this talk
Concepts
Reverse Engineering is a very self-explanatory term
You take something and, from there, try to learn how (some
aspect of) it was engineered
It’s also obviously broad
For example, it’s often used to describe the process through
which you generate a higher-level, architectural view of a piece
of software given its source code
My Own Concept
Think of the times you asked yourself “why” and “how” and
let it go without an answer...
...
...
...
RE is not letting go
A Few Applications
Malware Analysis
Vulnerability Analysis
Security Assessment of 3rd-party COTS
Evaluation/Breaking of copy-protection schemes
Assorted how’s and why’s
Why Still a Black Art?
Perhaps because people think it’s only good for SW cracking
Perhaps because DRM has become a nightmare no one is
happy with and related laws everywhere bash reversers too
hard every now and then (does anybody remember
Dmitry Sklyarov, the DMCA and all that madness?)
Perhaps because many people still think it should be illegal
(wtf?!)
How To Learn
The Crack-Me approach
The one I illustrate in the paper I mentioned
Small and targeted challenges with different levels and obstacles
to choose from
The real life approach
Choose a real-world problem and attack it
Tough but rewarding
We’ll demo a bit of both
Tools of The Trade
Probably millions of tools that can give you some useful piece
of info about your target
I’ll try to restrict myself to the most relevant/common, then
Unfortunately, many of the best tools are commercial
On the other hand, many of them have free/student/evaluation
versions
For the rest... Well, remember “the real life approach”? ;)
Debuggers
Obvious importance
Fairly good variety
It’s nice to play with all of them and know your way around each
But mastering them all is quite hard, so you’ll most likely settle on your
debugger of choice before long
Choose your debugger well!
Debuggers (2)
WinDbg
My personal choice of debugger
Developed by MSFT
Comes for free in the “Debugging Tools for Windows” package
Amazingly rich in features
Extensible with some C++ programming
Not the easiest or simplest dev environment
Very rich API, though
Poor interface
Debuggers (3)
Visual Studio Debugger
It’s crap, not suited for reversing
But it’s pretty and nice for developers :>
Seriously, don’t try to go very far reversing with it
It may use up the rest of your sanity
Debuggers (4)
OllyDbg
Enjoys quite a lot of popularity in the reversing community
Nice interface
In particular, a nice disassembly view
Comes in a few “tuned” versions, one of the most
popular being...
Debuggers (5)
Immunity Debugger
Developed by Immunity Inc. (someone from the dev team in
the audience?)
Extends OllyDbg with a python interpreter and exposes a
couple of debugging modules for the user to interact with
Very neat plugin support
Embeds a command-line with windbg-aliased commands
Maintains a forum to support developers/users of ImmDbg
plugins
Debuggers (6)
gdb
The standard debugger on *NIX systems
Quite complete debugger
Not the best thing in the RE world, but overall a good debugger
Disassemblers
Reading assembly is not the sweetest thing for most people
The way the code is represented is extremely important and
makes an increasingly great difference in big RCE tasks
Therefore, being comfortable with your disassembler is
essential
Disassemblers (2)
Pretty much every debugger is capable of disassembling
Apart from that, there are lots of other tools that can do it too
In Linux, objdump is pretty much a standard tool
However, one particular tool is especially known for its
disassembly features
Disassemblers (3)
IDA Pro
Supports many binary formats and architectures
Displays the code in graphs, which greatly enhances the
visualization
Block-level CFGs
Many things can be customized/adjusted
Graph layout, data types, annotations...
Quite frankly, it’s in every reverser’s toolkit
IDA Pro is a commercial tool currently in version 5.3
But version 4.9 is available in a free edition
System Monitoring Tools
All of those from the SysInternals Suite
Process Explorer
RegMon
FileMon
TCPView
Etc...
Advanced Tools
Binary Diff’ers
BinDiff
Decompilers
Hex-Rays
RE Frameworks
ERESI ;)
PaiMei and all the PyThings
Demo
We’ll try and beat a crack-me challenge
This crack-me was taken from a real competition
HITB Dubai 2007 CTF
Perhaps it can serve as a tip for H2HC’s CTF as well
RE – Advanced Topics
Cutting to the chase, advancing RE basically means
automating stuff
Many of the RE tools are
scriptable/programmable/extensible
Developing smart ways to deal with repetitive tasks is the
way to more effective analyses
RE – Advanced Topics (2)
Less often, you might see opportunities to advance RE in
ways not based on automation
Defeating a new anti-debug trick
Developing new environments for RE
Virtualization, Sandboxing...
Or even radically changing paradigms
E.g. The graph-based approach to binary navigation
RE – Advanced Topics (3)
Perhaps the most important lesson here is not to reinvent the
wheel
Re-use the tools you have!
You’ll be amazed at how much stuff you can do by “glueing”
pieces together
Having said that...
Perhaps the tools you have are not perfect
Or you might wanna re-do something just for learning
But be sure to have the right goals in mind!
Teaching By (Bad) Examples
I wanted to do something really neat to show these concepts
in practice
Unfortunately, I didn’t manage to finish it in time
The thing is currently under test/final touches
However the idea is so cool and in such a (relatively)
advanced state, that I decided to talk about it anyway
Problem
Suppose you have ways to reproduce a high-profile, possibly
exploitable bug – Yay!
BUT....
The target is closed-source software
The target is as large and complex as an operating system – and
way less documented
The input is huge and has a complex, possibly undisclosed
format
The source of the bug can be anywhere in the input
From user input to the actual bug/crash, about 3 million
instructions are executed
WHAT DO YOU DO????
Introducing LEP
LEP tries to answer a big question in this problem:
What exact part of this input is causing the bug?
If you can answer this question and somehow correlate this
with the input format, you may gain a great deal of
understanding of the bug
For this, I have invented a new technique: “Staged Partial
Tracing-Based Backwards Taint Analysis”
Because not sounding like a Ph.D. is so 2001 :>
And also because we all just love new terms we can go
media-cuckoo about
Introducing LEP (2)
One-liner idea: If we know when our input is brought to
memory and know where it’s mapped, we can trace the
program from this point to the crash and then go backwards
analyzing the dataflow to find out where the faulting data
came from
We do it in two stages, with a component for each: the tracer
and the analyzer
Simple, huh?
Fundamental Concepts
When we trace the program, it becomes “linear”, i.e.
control-flow is irrelevant
Dataflow becomes evident
Aliasing is not an issue (in essence, it disappears)
All info we need is available at runtime
In particular, effective addresses
If the input is as big as the problem states, it should be no
problem to find it in memory
We get most of the info we need from the disassembly text
(ASCII)! It’s like hacking with grep again!
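A minimal sketch of the "hacking with grep" idea, in Python: pulling the mnemonic and operands out of one line of disassembly text. The line format and the name `parse_line` are assumptions made for illustration, not LEP's actual parser; real debugger output carries extra columns (address, opcode bytes) that would have to be stripped first.

```python
import re

# Assumed line format: "<mnemonic> <operand>[,<operand>...]",
# e.g. "mov eax,[ecx+edx*2]". This is a simplification of real debugger output.
LINE_RE = re.compile(r"^\s*(?P<mnemonic>[a-z][a-z0-9]*)\s*(?P<operands>.*)$")


def parse_line(text):
    """Return (mnemonic, [operands]) for one disassembly line, or None."""
    match = LINE_RE.match(text.lower())
    if match is None:
        return None
    # Intel-syntax memory operands contain no commas, so a plain split is enough.
    operands = [op.strip() for op in match.group("operands").split(",") if op.strip()]
    return match.group("mnemonic"), operands


if __name__ == "__main__":
    print(parse_line("mov eax,[ecx+edx*2]"))
    print(parse_line("inc eax"))
```

Prefixed forms such as `rep movsd` are not handled by this sketch and would need their own rule.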
LEP Tracer
A WinDbg extension
Traces every instruction until the program raises an
exception
Dumps the following instruction info to a file:
Mnemonic
Destination operand
Source operand
Dependences of the source op – e.g. mov eax,[ecx+edx*2]
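One way to picture the per-instruction record the tracer dumps, sketched in Python. The class `TraceRecord`, the helper `make_record` and the register pattern are hypothetical; the slides only say which four pieces of information are stored, not how. Here the "dependences" of a memory source operand are taken to be the registers used to form its address.

```python
import re
from dataclasses import dataclass, field

# 16/32-bit general-purpose registers only; an assumption that keeps the sketch short.
REGISTER_RE = re.compile(r"\b(e?[abcd]x|e?[sd]i|e?[sb]p|[abcd][lh])\b")


@dataclass
class TraceRecord:
    mnemonic: str
    dest: str
    source: str
    deps: list = field(default_factory=list)


def make_record(mnemonic, dest, source):
    """Build a record; deps lists the registers inside a memory source operand."""
    src = source.lower()
    deps = REGISTER_RE.findall(src) if "[" in src else []
    return TraceRecord(mnemonic.lower(), dest.lower(), src, deps)


if __name__ == "__main__":
    print(make_record("mov", "eax", "[ecx+edx*2]"))
```

For `mov eax,[ecx+edx*2]` the record would carry `ecx` and `edx` as dependences, so the analyzer can follow both the loaded value and the registers that chose the address.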
LEP Tracer (2)
Discards control-flow changing instructions
Discards in/out instructions (all relevant input should be in
memory already?)
Discards other groups of instructions that will be supported
as we go
FPU, MMX, SSE{2,3}, etc...
This might be one of the reasons it currently doesn’t work
Tries to parse the right info even when the debugger is too
stupid to work as expected
Why not compute effective addresses in rep’ed instructions?
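The discard rules can be sketched as a simple mnemonic filter. This is an illustration in Python, not the tracer's real code (which is a C++ WinDbg extension); the sets and the name `is_discarded` are assumptions, and the FPU/MMX/SSE groups are left out.

```python
# Unconditional control-flow changes; conditional jumps are caught by the
# "starts with j" rule below, since every x86 mnemonic starting with 'j' is a jump.
CONTROL_FLOW = {"call", "ret", "retn", "retf", "iret", "loop", "loope", "loopne"}

# Port I/O instructions.
PORT_IO = {"in", "out", "ins", "insb", "insw", "insd",
           "outs", "outsb", "outsw", "outsd"}


def is_discarded(mnemonic):
    """Return True if the tracer sketch would drop this instruction."""
    m = mnemonic.lower()
    return m in CONTROL_FLOW or m in PORT_IO or m.startswith("j")


if __name__ == "__main__":
    for name in ("jne", "call", "in", "inc", "mov"):
        print(name, is_discarded(name))
```

Note that `inc` must survive the filter even though it starts with `in`, which is why the check is a set lookup rather than a prefix match.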
LEP Analyzer
Reads the file generated by the tracer and goes bottom-up
investigating the dataflow
You have to specify the piece of data that causes the last
instruction to fail – usually (always?) a register
And the memory range(s) where your input was mapped
at the time the trace was taken
Ignores register “slices” for simplicity
al = ah = ax = eax = rax
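Ignoring register slices amounts to mapping every sub-register to one canonical name before comparing operands. A small Python sketch of that idea follows; the table and the name `canonical` are illustrative, and only the classic general-purpose registers are covered.

```python
# Build alias tables such as al/ah/ax/eax/rax -> rax.
_FAMILIES = {}
for letter in "abcd":
    _FAMILIES["r" + letter + "x"] = [
        letter + "l", letter + "h", letter + "x",
        "e" + letter + "x", "r" + letter + "x",
    ]
for stem in ("si", "di", "bp", "sp"):
    _FAMILIES["r" + stem] = [stem + "l", stem, "e" + stem, "r" + stem]

ALIAS_TO_CANONICAL = {
    alias: full for full, aliases in _FAMILIES.items() for alias in aliases
}


def canonical(register):
    """Map a register slice to its widest form; unknown names pass through."""
    name = register.lower()
    return ALIAS_TO_CANONICAL.get(name, name)


if __name__ == "__main__":
    print([canonical(r) for r in ("al", "ah", "ax", "eax", "rax")])
```

The price of this shortcut is over-tainting: a write to `al` is treated as a write to all of `rax`, which is imprecise but keeps the analysis simple.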
LEP Analyzer (2)
When the source operand of a given instruction is an
immediate/constant, LEP tries its best to evaluate whether it
_transforms_ or _overwrites_ the destination
If it overwrites, we finish the analysis for this branch
mov eax, deadf0f0h
Else if it transforms, we keep looking for another def of the same
destination operand
inc eax
This gives a very special meaning to LEP’s existence
Otherwise, searching for occurrences of the faulting data inside the input
could be just as effective
LEP also tries to identify non-obvious constant overwrites
xor eax, eax
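The overwrite/transform decision can be sketched as a small classifier. This Python version is a guess at the kind of rules involved, not LEP's actual logic: the function names, the return values and the short list of idioms (`xor r,r`, `sub r,r`, `and r,0`) are assumptions.

```python
import re

HIGH_BYTE_REGS = {"ah", "bh", "ch", "dh"}  # these would otherwise look like hex literals
HEX_SUFFIX = re.compile(r"^[0-9a-f]+h$")   # e.g. deadf0f0h
HEX_PREFIX = re.compile(r"^0x[0-9a-f]+$")  # e.g. 0x10
DECIMAL = re.compile(r"^[0-9]+$")


def parse_immediate(text):
    """Return the integer value of an immediate operand, or None."""
    t = text.strip().lower()
    if t in HIGH_BYTE_REGS:
        return None
    if HEX_SUFFIX.match(t):
        return int(t[:-1], 16)
    if HEX_PREFIX.match(t):
        return int(t, 16)
    if DECIMAL.match(t):
        return int(t)
    return None


def constant_effect(mnemonic, dest, source=None):
    """Return "overwrite", "transform", or None when no constant is involved."""
    m, d = mnemonic.lower(), dest.lower()
    s = source.lower() if source else None
    if s is not None and s == d and m in ("xor", "sub"):
        return "overwrite"      # xor eax,eax always yields 0
    imm = parse_immediate(s) if s is not None else None
    if m == "mov" and imm is not None:
        return "overwrite"      # mov eax,deadf0f0h
    if m == "and" and imm == 0:
        return "overwrite"      # and eax,0
    if s is None and m in ("inc", "dec", "neg", "not"):
        return "transform"      # inc eax keeps depending on the old value
    if imm is not None:
        return "transform"      # add eax,4 and friends
    return None


if __name__ == "__main__":
    print(constant_effect("mov", "eax", "deadf0f0h"))
    print(constant_effect("inc", "eax"))
    print(constant_effect("xor", "eax", "eax"))
```

An "overwrite" result ends the search on that branch, while "transform" tells the analyzer to keep looking for an earlier definition of the same destination.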
Engineering Tech-Talk
LEP was intended to be written entirely in Python
Didn’t work for performance reasons
LEP Tracer is written in C++, since it’s a WinDbg extension
It makes use of a reference of the x86 instruction set written in
XML
The XML is mapped to C++ using CodeSynthesis’ XSD XML
Data Binding
LEP Analyzer was first written in Python
Then I also re-wrote it in C++
LEP Analyzer’s search algorithm was initially a DFS
Then I implemented it as a BFS
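To make the bottom-up search concrete, here is a toy breadth-first version in Python over a heavily simplified trace. Everything here is hypothetical: the `Step` record, the function `find_input_origins` and the convention that an empty source tuple means "overwritten by a constant" are stand-ins for whatever LEP really stores.

```python
from collections import deque, namedtuple

# dest: register name (str) or memory address (int).
# sources: locations the new value depends on; () means a constant overwrite.
Step = namedtuple("Step", "dest sources")


def find_input_origins(trace, faulting_location, input_ranges):
    """Walk the trace backwards and return input addresses feeding the fault."""
    def in_input(loc):
        return isinstance(loc, int) and any(lo <= loc < hi for lo, hi in input_ranges)

    origins, seen = set(), set()
    queue = deque([(faulting_location, len(trace))])
    while queue:
        loc, before = queue.popleft()  # FIFO order makes this a BFS
        if (loc, before) in seen:
            continue
        seen.add((loc, before))
        # Find the most recent definition of loc before index `before`.
        for i in range(before - 1, -1, -1):
            if trace[i].dest == loc:
                queue.extend((src, i) for src in trace[i].sources)
                break
        else:
            # Never defined inside the trace: the value predates it.
            if in_input(loc):
                origins.add(loc)
    return origins


if __name__ == "__main__":
    trace = [
        Step("ecx", (0x1000,)),       # mov ecx,[1000h]  (input byte)
        Step("ebx", ()),              # mov ebx,5
        Step("eax", ("ecx",)),        # mov eax,ecx
        Step("eax", ("eax",)),        # inc eax
        Step("eax", ("eax", "ebx")),  # add eax,ebx
    ]
    print(find_input_origins(trace, "eax", [(0x1000, 0x2000)]))
```

Replacing `popleft()` with `pop()` turns the same loop into a depth-first search, which is the only difference between the two strategies in this sketch.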
Demo II
Placeholder slide :>
LEP Release
As much as I like to make my software free and wide open, I
have chosen not to release LEP to the public for now
Instead, I’m willing to share it with whoever contacts me
directly (by e-mail, for example)
Basically, I just wanna know who is using it and what it’s
gonna be used for
Makes no difference if you come from Wall Street or from an
underground cracking ghetto – drop me a line
Linkz & Refz
Cracking CrackMes
http://www.scanit.net/rd/wp/wp04
X86 Opcode and Instruction Reference, by MazeGen
http://ref.x86asm.net/
CodeSynthesis XSD – XML Data Binding for C++
http://www.codesynthesis.com/products/xsd/
Thousands of elite RE projects
http://www.google.com
Seriously though, contact me if you can’t find anything
Greetz & Shoutz
Filipe Balestra for lending me the bug used in the 2nd demo
H2HC crew for letting me ruin their conference again
The ERESI team, with whom I have most of my discussions
about RE, program analysis, etc.
All of the great people that I know from the security scene
It’s simply impossible to mention each and every one of you, but
you know who you are!
Questions?