Electronic Investigations Workshop

Workshop Description
Ian M. MacDonald , Ph.D.
Associate Professor and Chair
Department of Computer Science
The College of Saint Rose
[email protected]
Disclaimer
• This is an introductory workshop in which we will survey
important topics in the field and explore a few of the
freely available software tools.
▫ The actual tools used in the field are cost-prohibitive
for a workshop (>$1500 per license)
• Realistically speaking, obtaining proficiency with respect
to the workshop goals would take several semesters (if
not years) of study and experience.
▫ The good news is that many of these skills can be self-taught!
▫ This workshop should give you a jump start and/or
help you determine if this is something you would like
to learn more about
• “As computers become more advanced, so do criminal
activities. Therefore, the computer forensics niche is in
constant progression along with the technological
advancements of computers,”
▫ Frederick Gallegos
• “Cyberspace is an indefinite place where individuals
transact and communicate. It is the place between
places”
▫ Bruce Sterling (1994)
• “Electronic evidence is likely to exist for most crimes!”
▫ OAS-REMJA Working Group on Cybercrime (2009)
• In mid-2010, over 1.8 BILLION users were on the
Internet exchanging information
▫ www.internetworldstats.com
“This workshop addresses the issues faced by
investigators in an increasingly high-tech world.
The workshop will focus on the investigative
aspect of electronic evidence; the use of
software to assist in investigations, hardware
and operating system fundamentals, defining
computer and computer-related crime,
document management, searching techniques,
identifying possible fraud, and collecting and
documenting electronic evidence.”
Disclaimer (continued)
• I am not a lawyer, nor am I qualified to lecture
on any of the legal policies and/or procedures
involved in the collection of digital evidence
(such as search warrants, etc.). Therefore, I will
not be covering any such topics
• I am not an auditor, investigator, nor do I
represent any law enforcement agency.
Therefore, I will provide limited coverage with
respect to the “applied” aspect of electronic
evidence collection and cyber-forensics.
Crime Terminology
• Computer Crime
(Callout: these are the SAME)
▫ Any criminal act committed via a computer
• Computer-related Crime
▫ Any criminal act in which a computer is
directly or indirectly involved
• Cybercrime
▫ Any criminal act in which the Internet is
involved (and therefore, multiple computer
systems)
What is computer forensics?
• AKA electronic investigations, system forensics,
digital forensics, electronic discovery, etc.
• Term coined in 1991 at the first training session
held by IACIS (International Association of
Computer Investigative Specialists)
• Refers to the tools and area(s) of expertise
required to effectively collect, investigate and
analyze evidence within the digital realm
Where is digital forensics used?
• Law enforcement agencies
• Government agencies / military
• Law firms, criminal prosecutors
• Academic institutions (research)
• Corporations
• Insurance companies
• Individuals
Concerns
• Computer/Cyber crime often lacks physical
boundaries
▫ “cross borders” without a passport (virtually, of
course)
▫ Difficult to coordinate law enforcement in
multiple countries
▫ Some countries do not cooperate at all!
• Lack of physical evidence
▫ Criminal does not need to acquire expensive
“tools”
Criminal Behavior
Profiles, Statistics and Challenges
of Computer Forensic
Investigations
Perceived Insignificance
• More than 1/3 of polled officers believe that the
investigation of computer crime is not necessary
▫ i.e. “interferes with the ability to focus on traditional
crime”
▫ Many investigators focus exclusively on child
pornography
• Cyber criminals themselves are viewed as
harmless “geeks” or “nerds”
▫ This is seldom the case, however!
Ability to Investigate?
• >34% of agencies surveyed had at least 1
individual who had “received training” in
computer crime investigations
▫ <19% feel this person is competent
▫ 12% feel this person can actually do computer
forensic examinations
• 70% of those “trained” claim the training was
“basic, general, introductory, etc.”
▫ Long story short... Many computer/cyber
criminals are outpacing law enforcement.
▫ Fortunately, many of the criminals are non-specialist users!
Ability to Prosecute?
• Prosecutors lack sufficient knowledge to
prosecute computer crime!
▫ Remember, in order to be prosecuted, you first
have to be caught!
• Often prosecutors place low priority on
computer crime
▫ Violent crimes attract much more attention
The vicinage problem
• The vicinage refers to the location of the physical
act (of crime, presumably)
• Identifying the vicinage is very difficult when the
crime occurs in cyberspace!
• There are no international guidelines for cyber-activity
• Example from the text:
Suppose an individual in Washington, DC uses a server
in Canada to send a threatening e-mail to the
president of the United States. To complicate
matters, let’s assume that this individual utilizes an
anonymizer located in Germany, although the
perpetrator and the victim are in the same area.
▫ What authorities must cooperate with the USA in
order to locate the individual?
Other Complications
• The degree of anonymity is closely related to the
amount of inter-jurisdictional communication
required to locate an individual
• More advanced offenders use:
▫ encryption
▫ steganography
More on these later!
• Legislation has been proposed that would make
encryption keys “discoverable” under court order
▫ How in the world will they pull this one off?
Lack of Reporting
• CSI/FBI Computer Crime and Security Survey
(Goroshko, Ludmila, 2004) report:
• The majority of Fortune 500 companies are
electronically compromised each year
▫ At least $10 Billion in losses per year
• 75% of all businesses have been compromised at
some point
▫ 45% from “insiders”
• Yet, only 17% of such victimizations are
reported to law enforcement!!!!
What does it mean to be anonymous?
• Anonymous e-mail account registration
• Anonymous forum membership
• anonymizer
▫ Sites that mask the IP address of a user
▫ Accomplished through rerouting, deletion/re-encapsulation of packet header information
• re-mailers
▫ A form of an anonymizer that strips source address and
other header information from an e-mail message, then
re-sends with alternate information
▫ Some re-mailers then send these e-mails out to other re-mailers
▫ The forwarding of e-mail messages is often intentionally
delayed by random time intervals
Who is most at risk?
• According to a Department of Justice study:
1. Businesses
2. Individuals
3. Financial Institutions
• Typical criminals:
▫ “Insiders”, i.e. long-term employees (ages 20 – 45)
▫ Most trusted (i.e. most authority/access)
• Motives:
▫ Revenge
▫ Greed
▫ Resentment
What is most investigated?
• A study (Hinduja, 2004) reports that agencies
investigated the following (most common first):
1. harassment / stalking
2. child pornography
3. forgery
4. identity theft
5. e-commerce fraud
6. solicitation of minors
Offender Profiles
• Ages 16 – 57
▫ Most prevalent in upper 30’s to 40’s
• Minimum of high school diploma
▫ Many with college degrees
• Moderate to high technical ability
• Few (or no) prior arrests
• Possession of highly capable computer equipment (and
mass storage)
All of the above describes the majority of all
Internet users!!!
Based on this information, do you think it is possible to
use profiling to identify potential offenders?
Federal Resources
• FBI, Secret Service, CEPTF (Child Exploitation and
Pornography Task Force)
• Very capable of investigating computer crime
• However, physically impossible to help all
state/local agencies
• In order for the feds to get involved, the crime
must:
▫ Threaten public safety
▫ Involve exploitation of children
▫ etc.
Real Case: “Maxus”
• Hacker by the name of “Maxus” gained access to
almost a half-million credit card numbers from
CD Universe.
• Maxus demanded $100,000 blackmail to
prevent releasing the numbers to the public
• CD Universe alerted the FBI
▫ 25,000 credit card numbers were compromised
▫ “Maxus” still at-large
Extent of Victimization Experience by American Corporations
The SCARY Truth!
• 25% detected external system penetration
• 27% detected denial of service
• 79% detected employee abuse of Internet privileges
• 85% detected viruses
• 19% suffered unauthorized use
• 19% reported 10 or more incidents
• 35% reported 2 to 5 incidents
• 64% acknowledging an attack reported web site vandalism
(60% reported denial of service)
• Over 260 million dollars in damages were reported by
those with documentation
▫ Unreported money loss estimated to be much higher!
Real Case: Western Union
• September, 2000
• Western Union is the world’s leading money
transfer agency
• Over 15,000 credit/debit numbers were
obtained by intruders
▫ This caused WU to temporarily shut down its
website!!!
▫ Unknown how much money was lost due to the
downtime.
Hardware Theft
• Generally speaking, computer hardware is seldom
secured
▫ Often available in public areas without any security!
• Small is a big problem
▫ Components (i.e. memory, etc.) are extremely small
and can fit in a pocket, wallet, etc.
• Difficult to trace missing components
• There is a market for this stuff!
▫ Many online auctions selling stolen merchandise
▫ Black market dealers
▫ Market values differ
For example, at one time a $1K CPU in the US was worth
$3K in the UK
Intellectual Property Theft
• AKA software theft
• Real Case: August 2001:
▫ FBI arrested a group of men possessing $10
Million in counterfeit Microsoft software
▫ DVD install discs had mock hologram
• Data Piracy
▫ Also referred to as software piracy
▫ The reproduction, distribution and use of
software without the permission of the
copyright owner.
▫ Very, very difficult to prevent
Software Piracy
• Shareware
▫ Freely distributed software, but different from
“freeware”
▫ Sharing with friends/colleagues is encouraged
▫ Authors ask for a voluntary donation from users
(but it is not required)
Occasionally “registered” versions of the software
may have enhanced functionality to encourage
donations
• WareZ
▫ Commercial programs that have been made
publicly available through “wareZ sites”
▫ Owners/administrators of wareZ sites are highly
elusive, well-educated and therefore, avoid
prosecution
Electronic Evidence
Def’n: data and information of some
investigative value that are stored on or
transmitted by an electronic device (usually in
digital form)
• What does it mean to be in “digital form”?
Electronic Evidence
• Circumstantial
▫ Indirect. Obtained by synthesizing an idea from seemingly
unrelated facts
• Physical
▫ Factual, undeniable evidence.
▫ Interpretation may be prone to error.
• Hearsay
▫ Statements made out of court by someone not giving testimony
(generally not admissible)
• Repeatability and Reproducibility
▫ Repeatability – the ability to get the same results in the same
testing environment
▫ Reproducibility – the ability to get the same test results in a
different testing environment
Chain-of-Custody
(Chain-of-Evidence)
• The route the evidence takes from initial
possession until final disposition.
• Very important with respect to computer
forensics!
▫ Careful record keeping and procedures will help to
ensure a valid chain-of-custody
▫ Failure to do so can get the evidence you have
collected dismissed!
The Four-Step Process
Acquisition → Identification → Evaluation → Presentation
The Four-Step Process
• Acquisition
▫ Gathering evidence/data from a crime either currently in
progress or, more commonly, one that has already occurred.
• Identification
▫ Evidence classified with physical (i.e. a hard drive), and logical
relationships (i.e. the location of evidence on the drive)
• Evaluation
▫ Is the computer evidence valid/relevant?
▫ Quality of evidence, not quantity!
• Presentation
▫ Filter out non-critical evidence and decide on the most profound
exhibits to use for legal proceedings
Discussion Points
• Why are individual victims reluctant to report
computer crime?
• Why are private corporations reluctant to report
computer crime?
• What can be done by a company to help prevent
computer crime (both internal and external)
• Why do you think bulletin boards (and chat
rooms) are favored by some deviant
subcultures?
Computer Terminology
Hardware, Software, File System
Structure
Hardware Basics
• Three basic computer components:
▫ Hardware
▫ Software
▫ Firmware
Data & Storage Basics
• The structure of data is extremely simple:
Binary
▫ Note: If you do not understand the binary number
system, please read up on this (you learned this in
CIS111)
• Bit = Binary Digit (0 or 1)
▫ Maybe 0 = off, 1 = on
▫ Maybe 0 = false, 1 = true
• Byte = 8 bits
Bigger Bytes!
• All larger byte representations are in blocks of size
2^N
▫ Kilobyte = 2^10 bytes (or 1,024 bytes)
▫ Megabyte = 2^20 bytes (or 1,048,576 bytes)
▫ Gigabyte = 2^30 bytes (or 1,073,741,824 bytes)
▫ Terabyte = 2^40 bytes (or 1,099,511,627,776 bytes)
• Acronyms are KB, MB, GB, TB, respectively.
• Roughly speaking, the entire Library of Congress
could fit (uncompressed!) on 10 TB
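The power-of-two sizes above are easy to verify for yourself. A minimal sketch (the `UNITS` table and `bytes_in` helper are names made up for this illustration):

```python
# Binary (power-of-two) unit sizes, as listed on the slide.
# UNITS maps each unit name to the exponent N in 2**N.
UNITS = {"KB": 10, "MB": 20, "GB": 30, "TB": 40}


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one binary unit, e.g. 'KB' -> 1024."""
    return 2 ** UNITS[unit]


for unit, exponent in UNITS.items():
    print(f"1 {unit} = 2^{exponent} = {bytes_in(unit):,} bytes")
```

Each unit is exactly 1,024 times the previous one, which is why the numbers drift further from the "round" decimal values (thousand, million, billion) as the units grow.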
Take the bus!
• Buses
▫ Sets of parallel wires connecting various
components
▫ Parallel wiring allows several bits to be
simultaneously transmitted
• USB (universal serial bus)
▫ Allows quick connection to system bus
Nibbles & Bytes
[Figure: a single box = 1 byte; “Word” and “Double Word” are shown as groups of boxes]
A single character can be represented by a single
byte (spaces and newlines are considered
characters)
Therefore, how many bytes would it take to store
the phrase Forensic Computing? How many
words?
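One way to check your answer to the exercise above, as a minimal sketch. The 2-byte word size is an assumption made for this example only; if your definition of a word is a different number of bytes, pass that value instead. `storage_needed` is a hypothetical helper name.

```python
import math

WORD_SIZE_BYTES = 2  # ASSUMPTION: a 2-byte (16-bit) word; adjust to your definition


def storage_needed(text: str, word_size: int = WORD_SIZE_BYTES) -> tuple[int, int]:
    """Return (bytes, words) needed to store ASCII text at one byte per character."""
    n_bytes = len(text.encode("ascii"))       # spaces count as characters too
    n_words = math.ceil(n_bytes / word_size)  # a partly filled word still takes a whole word
    return n_bytes, n_words


n_bytes, n_words = storage_needed("Forensic Computing")
print(f"{n_bytes} bytes, {n_words} words of {WORD_SIZE_BYTES} bytes each")
```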
Hardware
• Motherboard:
▫ Primary circuit board in which all other
components attach
• PC cards – “expansion” cards. Not so
common anymore (what used to be
optional components are now standard on
the motherboard)...
▫ Connected to PCI (peripheral component
interconnect) express bus
• CPU
▫ Hz = Cycles per second
▫ MHz = 1 Million Hz
▫ GHz = 1 Billion Hz
RAM
• Random Access Memory
• Temporary storage of application(s) and some
data
• Volatile
Hard Drives (HDD) & Mass Storage
Devices
• “permanent” storage solution
• death of the floppy disk!
▫ 1970’s saw 5.25”, then 3.5” in the 1980’s
▫ Many people still have boxes full of floppy disks
containing valuable data
• CD/DVD/Blu-ray RW Drives becoming
standard
• Storage:
▫ CD storage ≈ 650 – 850 MB
▫ DVD storage ≈ 4.7 GB (dual layers ≈ 8.5 GB)
▫ Blu-ray storage ≈ 25 GB (dual layers ≈ 50 GB)
Other Mass Storage Media
• Memory cards (AKA flash memory)
▫ SD
▫ Micro-SD
▫ Flash drives (AKA jump drives, dongles, etc.)
▫ Compact Flash (CF)
▫ XD
Multi-Format Card Readers
• Commonly installed in most desktops (also
becoming common in laptops)
• Necessary for digital camera / video
enthusiasts.
• Necessary for digital forensic lab
Software
• Three major types:
▫ Boot sequence
▫ Operating system (OS)
▫ Application software
Handheld Devices
• Thousands of handheld devices have been
released over the years
• Many of these devices mimic the capabilities of
standard computers
• Frequently, these devices contain crucial digital
evidence
Boot sequence
• Series of steps taken by the computer starting
immediately after it is powered on
• “pulling itself up by its bootstraps”
• Initial boot sequence loads low level
software/data from CMOS
• Once completed, the OS begins to load
Battery
CMOS (Complementary
metal-oxide semiconductor):
small memory chip on the
motherboard
Operating Systems (OS)
• A “layer” of software that provides:
▫ A level of abstraction from the hardware
▫ An interface between the application programs and the
hardware
▫ A method of visually accessing the file system (and other
components)
▫ GUI
• Most popular:
▫ Microsoft Windows (1985)
▫ Mac OS (1980’s) (new systems are Unix-based)
▫ Unix (1969)
▫ Linux (1991) – Unix-like & freely distributed
Elementary Networking
• Routers
▫ Special-purpose computers to handle
connections between 2 or more networks
▫ Kind of like “traffic cops” for packets
• Hubs
▫ Central switching devices
▫ “Dumb routing”
• Packets
▫ Relatively small chunks of data labeled for
delivery somewhere on a network
▫ Consist of control information (header) and data
IP’s / Domains / DNS
• IP Address
▫ 32 bit (4 byte) logical numerical identifier for a
machine (must be globally unique*)
Kind of like a “phone number”
IPv6 uses 128 bit addresses
Ex: 36.231.98.53
• Domain
▫ A group of IP addresses identified by a domain name
(i.e. strose.edu)
• DNS (Domain Name System)
▫ In short, this is a mapping from domain names to IP
addresses
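To see that an IP address really is just a 32-bit (4-byte) number, here is a minimal sketch using Python's standard `ipaddress` module. The `describe` helper is a name made up for this illustration, and `2001:db8::1` is an address from the range reserved for documentation.

```python
import ipaddress


def describe(address: str) -> dict:
    """Summarize an IPv4 or IPv6 address: version, size, and integer value."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,       # 4 or 6
        "bits": ip.max_prefixlen,    # 32 for IPv4, 128 for IPv6
        "bytes": len(ip.packed),     # 4 for IPv4, 16 for IPv6
        "as_integer": int(ip),       # the address as one big number
    }


print(describe("36.231.98.53"))   # the example address from the slide
print(describe("2001:db8::1"))    # an IPv6 documentation address
```

The dotted notation is only a human-friendly way of writing the four bytes; tools and log files sometimes store the integer form instead.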
Internet
• Started: ARPANet (September, 1969)
• network
▫ The interconnection of 2 or more communicating
devices
• internet
▫ The interconnection of 2 or more networks
▫ “The” Internet is the most well-known example
• Note: Internet ≠ WWW
Cookies
• Used by HTTP
• Pieces of information sent from a web server
to a local host machine (web browser)
• Saved on local machine
• Sent back to the server when requested
• Examples:
▫ Login/registration information
▫ Shopping cart contents & info
▫ User preferences for a site
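The round trip described above (server sends, browser saves, browser sends back) can be sketched with Python's standard `http.cookies` module. The cookie name `cart_id`, its value, and both helper functions are made up for this illustration.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value: str) -> dict:
    """Parse the value of a Set-Cookie header into {name: value}."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


def build_cookie_header(cookies: dict) -> str:
    """Build the Cookie header value a browser would send back to the server."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# 1. Server -> browser: "Set-Cookie: cart_id=12345; Path=/"
stored = parse_set_cookie("cart_id=12345; Path=/")
print("Saved on local machine:", stored)

# 2. Browser -> server on a later request: "Cookie: cart_id=12345"
print("Cookie:", build_cookie_header(stored))
```

From an investigative point of view, the "saved on local machine" step is the interesting one: the stored values remain on disk after the browsing session ends.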
WWW
• World Wide Web
▫ The layer of software & applications that “sit” on the
physical Internet.
• Contents:
▫ Web pages / sites
▫ Newsgroups / Bulletin Boards
▫ IRC (Internet relay chat): “chat rooms”
Individuals use “nicknames” that must be unique for each
system.
Therefore, an individual belonging to several chat sites may have
multiple nicknames
Connections to the Internet
• Digital Subscriber Line (DSL)
▫ Various types: ADSL, HDSL, RADSL
▫ Use standard phone line (if communications hardware
is available within your area)
• Cable Modem
▫ Use standard co-ax cable (usually cable TV provider
offers this service)
• Dial-up Modem
▫ Uses phone line (very slow connection speed)
• Satellite
▫ For those in the “boonies”
Microsoft File System
Very Brief Overview
FAT & NTFS
• AKA Windows/DOS
• FAT (file allocation table)
▫ Older MS file system, but frequently used as the
file system in removable media
▫ Locations of data on disk stored in table(s)
▫ History: FAT12, FAT16, then FAT32
Ex: FAT32 = 32 bit addressing (max file size is
therefore 2^32 bytes, or about 4.3 GB)
• NTFS (new technology file system)
▫ Replacement for FAT
• FAT/NTFS determine how/where files can be
“hidden” on the disk
Partitions
• A portion of the HDD separated from others
• Example:
▫ You can install 2 OS’s on the same machine (but
each usually resides on its own partition)
• Partition gaps
▫ Data can be hidden between partitions
▫ A utility can be used to remove references to hidden
locations on the drive
Utilities include Norton Disk Edit, WinHex, Hex
Workshop
Registry
• Hardware / Software configuration database
• Access to registry through REGEDIT application
Windows File System Structure
A Brief Overview
Hard Disks: Sector Format
• A sector is the basic 512 byte unit of storage on a HDD.
• In addition to data, each sector stores some control
information:
▫ ID: sector number that identifies it on disk (and contains status
info)
▫ Synchronization fields: helps guide the read process
▫ Error-Checking Code (ECC): for data integrity
▫ Gaps: Spaces provided to allow enough time for the drive
controller to continue the read process
Clusters
• Smallest logical storage unit on a HDD
• Contiguous “chunks” of space managed by the file
system for efficient storage
• Clusters can range in size from 4 sectors (2,048 bytes)
to 64 sectors (32,768 bytes)
• Example: Consider a 3000 byte (about 2.9 KB) file
▫ If the system uses 2,048 byte clusters, how many clusters
would this file occupy?
▫ How much space is wasted?
• Slack Space
▫ The area between the end of a file and the end of a
cluster.
▫ This unused space is still assigned to the file
[Figure: sector layout showing Control and Data (512 bytes) regions]
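The cluster example and the slack-space definition above come down to simple arithmetic. A minimal sketch, assuming 512-byte sectors as stated earlier (`cluster_usage` is a hypothetical helper name):

```python
import math

SECTOR_SIZE = 512  # bytes per sector, as on the "Sector Format" slide


def cluster_usage(file_size: int, sectors_per_cluster: int = 4) -> tuple[int, int]:
    """Return (clusters occupied, slack bytes) for a file of file_size bytes."""
    cluster_size = sectors_per_cluster * SECTOR_SIZE
    clusters = math.ceil(file_size / cluster_size)  # a partial cluster is still allocated whole
    slack = clusters * cluster_size - file_size     # allocated to the file but unused by it
    return clusters, slack


clusters, slack = cluster_usage(3000, sectors_per_cluster=4)  # 2,048-byte clusters
print(f"3000-byte file: {clusters} clusters, {slack} bytes of slack space")
```

Try the same file with 64 sectors per cluster: larger clusters mean fewer clusters to track, but far more slack space per file (and more room for leftover or hidden data).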
Windows - NTFS
• New Technology File System
• Capable of self-repair and high-performance
• Supports:
▫ Large volume storage
▫ File-level security (encryption, decryption)
▫ Compression
▫ Auditing
• NTFS Master File Table (MFT)
▫ Stores information regarding file attributes
▫ NTFS reserves space for the MFT to maintain it as it expands
• When the number of files on an NTFS volume increases,
the size of the MFT increases
• Utilities that defragment NTFS volumes on Windows
systems cannot move MFT entries
NTFS – Deleting Files
• Files deleted within Windows are moved to the
Recycle Bin
▫ All information about the original file (and
original location) is maintained
• Files deleted from a command prompt are not
moved to the Recycle Bin
▫ However, we can still recover all or part of the file
using Windows forensic tools
NTFS – Data Streams
C:\Java>dir
 Volume in drive C has no label.
 Volume Serial Number is 1E9E-1A20
 Directory of C:\Java
03/02/2010  01:57 PM    <DIR>          .
03/02/2010  01:57 PM    <DIR>          ..
02/06/2010  10:03 PM         8,680,549 drjava.jar
03/02/2010  01:49 PM            40,960 pmdump.exe
               2 File(s)      8,721,509 bytes
               2 Dir(s)  82,108,739,584 bytes free
C:\Java>echo text_message > myfile.txt:stream1
C:\Java>dir
 Volume in drive C has no label.
 Volume Serial Number is 1E9E-1A20
 Directory of C:\Java
03/16/2010  10:41 AM    <DIR>          .
03/16/2010  10:41 AM    <DIR>          ..
02/06/2010  10:03 PM         8,680,549 drjava.jar
03/16/2010  10:41 AM                 0 myfile.txt
03/02/2010  01:49 PM            40,960 pmdump.exe
               3 File(s)      8,721,509 bytes
               2 Dir(s)  82,108,731,392 bytes free
C:\Java>more < myfile.txt:stream1
text_message
What happened?
Free space difference: 8,192 bytes
(myfile.txt is still reported as 0 bytes)
NTFS File Streams
• A file stream (AKA alternative stream) is the ability to hide
data “behind a file”
▫ This can be text, images, etc… or a virus!
• Example:
▫ You could create a simple file 1KB in size, like resume.txt
▫ Then… attach a 3MB executable file to it (hidden, of course)
▫ The file system would report this file as 1KB resume.txt file
• Deleting the file deletes the stream
• As an exercise:
▫ Next time you are on your laptop, try hiding an executable file
behind a simple text file. Look at the file size both at the
command prompt and with the Windows GUI
WinHex
WinHex Screen Shot
• A hexadecimal editor capable of viewing all types
of files, including hidden files, streams and slack
space on virtually any media.
• The tool allows you to examine the contents of
RAM as well.
• The free version has just enough capability for
personal use. A commercial version has many
more features (and fewer annoying limitations)
Plain-text Editor
(NotePad++)
Text Files
• By far, the simplest of all file formats
• Usually contain simple ASCII text. In other words,
the file is simply a set of contiguous bytes (one for
each character)
▫ Special characters, such as line-feed, carriage return,
space, tab, etc. exist in there as well
• Several types of files are plain-text:
▫ HTML files
▫ Source-code files
▫ Log files
▫ Etc.
WinHex
File Size:
120 bytes
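The "one byte per character" idea above is easy to see with a tiny hex viewer. This is a minimal sketch of the kind of view a hex editor gives you, not a reproduction of WinHex itself; `hex_dump` and the sample text are made up for this illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a simple hex-editor style view: offset, hex bytes, printable text."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes (tab, carriage return, line-feed, ...) are shown as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


# Plain ASCII text: one byte per character, including the tab and the CR/LF pair.
sample = "Hello,\tworld!\r\n".encode("ascii")
print(len(sample), "bytes")
print(hex_dump(sample))
```

Look for `09` (tab), `0D` (carriage return) and `0A` (line-feed) in the output: the "invisible" characters take up space in the file just like the visible ones.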
Word Processor and Spreadsheet
Documents
• As we know, these contain much more data than
just the text.
• In fact, the file size can be several thousand
times larger than the equivalent text-only
version!
• Much of the “extra” information can be parsed
using WinHex (or similar tool)
▫ Locate user, group (company), author, revision
history, etc.
File Size:
26,112 bytes!!!
I had to scroll all the way down to here
just to see the document contents
File Size:
9,542 bytes
Notice the lines and lines of data showing up within WinHex for a
simple document!
Author (user)
WOW!!!
Image Files
• Most image files have a header record (at the
very beginning of the file) that indicates that it
is, in fact, an image
▫ JPEG files have “JFIF” within the header
▫ PNG files have “PNG” within the header
• Therefore, opening a suspected image file within
a hex editor can be an easy way to determine the
actual file type.
Note: Time permitting, there will be a more
detailed discussion regarding digital images
Image Files
• JPEG (by far, most popular)
▫ Joint Photographic Experts Group (1986)
• Two compression modes:
▫ Lossless (compression ratio ≈ 3:1)
▫ Lossy (compression ratio ≈ 10:1)
• “Q number” controls the quality
▫ Q = 100 denotes highest quality, minimal
compression
▫ The lower the Q number, the lower the quality,
and hence higher the compression (and
quantization)
JPEG compression process
(Lyu, S., 2010)
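The header check described for image files (looking for “JFIF” or “PNG” at the start of the file) can also be automated. A minimal sketch, assuming you have already read the first few bytes of the suspect file; `sniff_image_type` is a hypothetical helper name, and only these two formats are handled.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes
JPEG_SOI = b"\xff\xd8\xff"            # JPEG "start of image" marker plus next marker byte


def sniff_image_type(header: bytes) -> str:
    """Guess the file type from its first bytes, ignoring the file extension."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SOI):
        # In a JFIF file the text "JFIF" sits at byte offsets 6-9.
        return "jpeg (JFIF)" if header[6:10] == b"JFIF" else "jpeg"
    return "unknown"


# Usage with a real file (the path is up to you):
#   with open(path, "rb") as f:
#       print(sniff_image_type(f.read(16)))
print(sniff_image_type(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))
print(sniff_image_type(b"This is just text"))
```

This is why renaming picture.jpg to notes.txt does not fool an examiner: the extension changes, the header bytes do not.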
Steganography
• Steganography
▫ Practice of embedding hidden messages within a
carrier medium
• Modern steganography works by replacing bits
of useless or unused data in regular computer
files with bits of different, invisible information
• Steganography can also be used to supplement
encryption
Application of Steganography
• Purposes:
▫ Medical records
▫ Workplace communication
▫ Digital music
▫ Terrorism
▫ The movie industry
Classification of Steganography
(cont’d.)
LSB substitutes the rightmost bit
in the binary notation with a bit
from the embedded message.
Detecting Steganography
• Indicators include:
▫ Software clues on the computer
▫ Other program files
▫ Multimedia files
• Detection Techniques
▫ Statistical tests
▫ Stegdetect
▫ Stegbreak
▫ Visible noise
▫ Appended spaces and invisible characters
Tools
• Tools include:
▫ 2Mosaic
▫ FortKnox
▫ BlindSide
▫ S-Tools
▫ StegHide
▫ Snow
▫ Camera/Shy
▫ Steganos
▫ Pretty Good Envelope
▫ Gifshuffle
▫ JPHS
▫ wbStego
▫ OutGuess
▫ Invisible Secrets 4
▫ Masker
▫ Data Stash
▫ Hydan
▫ Cloak
▫ StegaNote
▫ Stegomagic
▫ Hermetic Stego
▫ StegParty
▫ StegoSuite
▫ StegoWatch
▫ StegoAnalyst
▫ StegoBreak
▫ StegSpy
▫ Stego Hunter
▫ WNSTORM
▫ Xidie
▫ CryptArkan
▫ Info Stego
▫ Stealth Files
▫ InPlainView
▫ EzStego
▫ Jpegx
▫ Camouflage
▫ Scramdisk
▫ CryptoBola JPEG
▫ Steganosaurus
▫ ByteShelter I
▫ appendX
▫ Z-File
▫ MandelSteg and GIFExtract
Here’s an example using an online tool “Steganografie”
http://www.kwebbel.net/stega/enindex.php
Cryptography
• Cryptography
▫ Art of writing text or data in a secret code
• Three types of cryptographic schemes used:
▫ Secret-key (or symmetric) cryptography
▫ Public-key (or asymmetric) cryptography
▫ Hash function
• Steganography Versus Cryptography
▫ Steganography replaces bits of unused data from
various media files with other bits that, when
assembled, reveal a hidden message
▫ In cryptography an encrypted message that is
communicated can be detected but cannot be read
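The bit-replacement idea behind steganography (LSB substitution: overwrite the rightmost bit of each carrier byte with one bit of the message) can be sketched in a few lines. This is a toy illustration under simplifying assumptions, not how any of the listed tools work: the carrier is a plain sequence of bytes standing in for pixel values, and the receiver is assumed to know the message length. `embed_lsb` and `extract_lsb` are hypothetical names.

```python
def embed_lsb(carrier: bytes, message: bytes) -> bytes:
    """Hide message in the least significant bit of each carrier byte."""
    # Break the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for message")
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the rightmost bit, then set it to the message bit
    return bytes(out)


def extract_lsb(carrier: bytes, message_length: int) -> bytes:
    """Recover message_length bytes from the least significant bits of carrier."""
    bits = [b & 1 for b in carrier[:message_length * 8]]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


carrier = bytes(range(200, 232))  # 32 made-up "pixel" bytes
stego = embed_lsb(carrier, b"hi!")
print(extract_lsb(stego, 3))
```

Each carrier byte changes by at most 1, which is why the change is invisible to the eye in an image, and also why statistical tests (rather than visual inspection) are needed to detect it.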
Watermarking
• Digital watermarks
▫ Digital stamps embedded into digital signals
• Application of Watermarking
▫ Embedding copyright statements
▫ Monitoring and tracking copyright material
▫ Providing automatic audits of radio transmissions
▫ Supporting data augmentation
▫ Supporting fingerprint applications
• Steganography Versus Watermarking
▫ Main goal of steganography is to protect the data
from detection, while that of watermarking is to
protect data from distortion by others
Now, let’s go through tutorial 1
http://academic2.strose.edu/math_and_science/macdonai/EEWorkshop
Defining & Identifying Fraud
Focus on Identity Theft & Fraud
What is Identity Theft / Fraud
• What is identity?
• Affects over 10 million Americans each year.
• Methods:
▫ Eavesdropping
▫ Postal mail theft
▫ Dumpster diving
▫ Computer theft / hacking
▫ … and more!
• Motives include:
▫ Economic gain
▫ Access to secure information
▫ Revenge (rare)
Implications
• Financial loss (and inconvenience)
• Tarnished reputation
• Unauthorized access
• National security!
▫ Border crossings
▫ Immigration
▫ Airline & public transportation security
Refining Our Definition of Identity
Theft vs. Identity Fraud
• Identity Theft
“the illegal use or transfer of a third party’s personal
identification information with unlawful intent”
• Identity Fraud
“a vast array of illegal activities based on fraudulent
use of identifying information of a real or fictitious
person”
• Main types of identity theft/fraud in the US:
1. Assumption of identity
2. Theft for employment and/or border entry
3. Criminal record identity theft/fraud
4. Virtual identity theft/fraud
5. Credit or financial theft
Assumption of Identity
• Rare, difficult to pull off
• Criminal assumes the identity of the victim
▫ Personal histories
▫ Friendships / relationships
▫ Job
▫ Etc.
Theft for Employment and/or Border
Entry
• Illegal immigration is a serious problem in the
US!
• Real case: INS has seized/intercepted tens of
thousands of fraudulent documents, such as:
▫ Alien registration cards
▫ Visas
▫ Passports
▫ Citizenship documents
▫ Employment eligibility documents
Criminal Record Identity Theft/Fraud
• In this case, an innocent victim may appear to have a
criminal record
• Often goes unnoticed by victim for a long period of time
• The burden is often on the victim:
▫ First, the victim must clear his/her name and prove
innocence
▫ Second, must obtain a court order to expunge the
record(s)
▫ This could all take years to resolve
▫ This typically costs the victim a substantial amount of
money in legal fees
Virtual Identity Theft/Fraud
• A development of a fraudulent virtual
personality
• Easy to construct:
▫ i.e. I could create a virtual identity that I am 6’9”
and bald with a historic college basketball career.
• Often used for online dating, flirtations
▫ Old pictures, fake age, etc.
• May be used as a method for stalking/harassing
or financial fraud
Credit Identity Theft/Fraud
• Most common
• Identity theft/fraud to facilitate the creation
of fraudulent accounts / credit cards
• Does not include stolen credit cards, rather
just someone’s credit
• In 2006, the FTC reported 3.2 million
Americans fell victim to credit identity
theft/fraud.
Reporting
• Again, we are faced with inaccuracies in the
statistics for identity theft/fraud
▫ Delay in reporting/awareness
▫ Private companies want to protect their own
interests
▫ Lack of mandatory reporting to federal agencies
▫ Lack of national measurement standards
• Information sources
▫ Credit reporting agencies
▫ Software companies
▫ Popular and trade media
▫ Government agencies
• Often identity theft/fraud is simply a component
in a larger crime and is therefore not separately
reported
Some Statistics
• 2002 General Accounting Office – first study of ID
theft/fraud
▫ Identity theft/fraud cases are increasing
▫ Most common consumer complaint to FTC
▫ ~5% of those surveyed had been victims of identity
theft/fraud
▫ ~6% of Americans had seen unauthorized purchases on
credit cards
▫ ~13% had discovered the misuse of their personal
information
▫ Of FTC complaints:
42% identity theft used in conjunction with credit card fraud
20% unauthorized telecommunications/utility services
13% bank fraud
9% personal info used for employment purposes
7% fraudulent loans
6% falsifying government documents or fraudulent receipt of
benefits
The cost of Victimization
• Disclaimer: Lack of accurate / mandatory reporting
makes cost estimation difficult
• Cost should be measured by both:
▫ Loss of money (may be recoverable)
▫ Loss of time (not recoverable)
• Average time before awareness of identity theft/fraud
activity is 12 – 14 months (Stuart, et al., 2004)
• Costs to economy exceeded $50 billion in 2007
(America)
▫ Much less in other countries
• Common victim profile:
▫ White male, early 40s, living in metropolitan areas
Methods of ID Theft
• Mail theft
▫ Personal mail contains personal/sensitive info
▫ Often mailboxes are not secure storage for mail!
▫ “popcorning”
Targeting mailboxes with the red flag up (i.e.
outgoing mail)
Often contain credit card payments, information,
etc.
▫ Consider using the post office for sending
sensitive mail
• Insiders
▫ Yes, again!
▫ Can be either intentional or accidental
▫ Example: In 2005 Citigroup reported that UPS
lost the personal financial information of about 4
million customers
Talk about blaming someone else!!!
• Fraudulent or Fictitious Companies
▫ Collect and process information either voluntarily
from naïve customers or without the victims’
knowledge at all
▫ Examples:
Choicepoint - A US company that collects all sorts of
information about people across the country
• Dumpster Diving
▫ Sifting through commercial/residential trash
looking for sensitive documents
▫ Combating this:
Paper shredders
Disk-wiping software
• Physical Computer Theft
▫ Very common
▫ Personal information can be retrieved very quickly
• Bag Operations
▫ Sneaking into a hotel room to obtain information
from computers, paperwork, removable media, etc.
▫ Entire hard drives can be copied onto removable
media in a short period of time.
Danger at the ATM
• Reading / recording information from the
magnetic stripe on an ATM or credit card
▫ This information can then be “re-printed” on a
secondary card and used for fraudulent purchases
or cash withdrawals
• Card skimmers
▫ Mini cameras/copiers mounted near ATM
machines (or near point-of-sale machines)
▫ See images on next slide…
[Figures: card skimming equipment installed over an existing card reader; a PIN camera]
Virtual/Internet Methods
• Many Internet protocols are not secure
▫ SMTP (application-level e-mail protocol)
▫ FTP (application-level file transfer protocol)
▫ HTTP (application-level protocol for transferring web
pages from server to client)
• Phishing
▫ Solicitation of information via e-mail
▫ Redirection to fake websites
▫ Stats
2004-05: 73 million Americans received
at least 50 phishing e-mails
2.4 million reported losing money
• Popular phishing
examples
• Categories of phishing
▫ spoofing
Using company trademarks/logos so that the e-mail
appears to be valid (30% linked to eBay/PayPal!)
▫ pharming
redirects traffic intended for a legitimate site to a
phishing site (accomplished through DNS modification,
virus, etc.)
▫ redirectors
redirect network traffic to undesired sites
often redirects to fraudulent DNS servers (most DNS
look-ups are “good” to delay user detection)
▫ advance-fee fraud (or 419 fraud)
promise of large financial windfall if personal
information is given
▫ phishing Trojans & spyware
executable malicious code typically attached to e-mail,
but can also be executed remotely
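One spoofing signal that is easy to check mechanically is a link whose visible text names one site while its target points somewhere else. The sketch below is a minimal illustration under that assumption, not a detection tool; the function names and the sample addresses are hypothetical.

```python
from urllib.parse import urlparse


def host_of(text):
    """Return the host named in a piece of text, or '' if it is not an address."""
    text = text.strip()
    if not text or " " in text or "." not in text:
        return ""  # e.g. "Click here" is not a web address
    if "://" not in text:
        text = "http://" + text  # urlparse only finds a host after a scheme
    return (urlparse(text).hostname or "").lower()


def link_mismatch(display_text, target_url):
    """True when the visible link text names a different host than the target."""
    shown = host_of(display_text)
    actual = host_of(target_url)
    if not shown or not actual:
        return False  # nothing to compare
    # Treat sub-domains of the displayed host as a match.
    return not (actual == shown or actual.endswith("." + shown))


# Hypothetical example: the e-mail shows a bank's name but links to a bare IP.
print(link_mismatch("www.example-bank.com", "http://192.0.2.10/login"))
```

A mismatch is only a hint that a message deserves a closer look; legitimate mailers sometimes route links through tracking services.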
• Keyloggers
▫ Inexpensive ($20 - $300)
▫ USB loggers now available
▫ devices/software which record keystroke data
▫ can be stored locally then “collected” or transmitted to
a remote location periodically
▫ may also capture screen shots
▫ goal is most often to retrieve usernames/passwords
Protecting Yourself
• Monitor the use of your cards/accounts!!!
• Check with the major credit reporting agencies:
▫ Equifax (www.equifax.com) 1-800-685-1111
▫ Experian (www.experian.com) 1-888-EXPERIAN
▫ Trans Union (www.tuc.com) 1-800-916-8800
• FDIC can answer some FAQs about regulations and
banking practices
▫ Google “Fair and Accurate Credit Transactions Act (FACTA)”
under the FTC / FDIC
• Check-verification services
▫ If you suspect someone is writing checks under your name,
contact the merchant’s check-verification company
Ex: www.checkrite.com, www.crosscheck.com,
www.equifax.com, www.telecheck.com, etc.
Tips to Avoid ID Fraud
• From Newman text:
▫ Be suspicious of contests
▫ Beware of imposters
▫ Beware of suspicious downloads
▫ Do not respond to unsolicited e-mails
▫ Guard personal information
▫ Look for complaints and complaint processes
▫ Pay using the safest method
▫ Remember that easy money does not exist!!!!
▫ Research the dealer or vendor
▫ Resist pressure
▫ Understand the offer!
Contacts
• Government
▫ Consumer.gov – www.consumer.gov
▫ FBI – www.fbi.gov
▫ FDIC – www.fdic.gov
▫ Federal Trade Commission – www.ftc.gov
▫ U.S. Postal Service – www.usps.com/postalinspector
• Nongovernment
▫ Better Business Bureau – www.bbb.org
▫ National Association of Attorneys General – www.naag.org
▫ National Consumers League – www.nclnet.org
▫ National Fraud Information Center – www.fraud.org
Collecting and Documenting
Electronic Evidence
Initial Steps
• If you are the “first responder”
▫ Make every effort to not move the electronic
device(s)
• Wherever and whenever possible obtain a
detailed record using:
▫ Video
▫ Photography
▫ Notes/sketches
Documentation – Initial Steps
• Immediately record:
▫ The type of device
▫ Location
▫ Position of computer(s)/device(s)
▫ List of connected peripherals
▫ Any wireless access points (WAPs) that may be
capable of connecting to other devices
The presence of such a device may indicate the
existence of evidence external to the initial scene
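The items to record can be kept in a simple structured note so that nothing is skipped. The sketch below is one possible layout, assuming free-text fields; the class name, field names and sample values are hypothetical, and any real form should follow your organization's procedures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SceneRecord:
    """One entry per device found at the scene (hypothetical layout)."""
    device_type: str
    location: str
    position: str
    peripherals: list = field(default_factory=list)
    wireless_access_points: list = field(default_factory=list)
    # Timestamp of when the note was taken, in UTC.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def may_have_external_evidence(self):
        """A WAP at the scene may indicate evidence beyond the initial scene."""
        return bool(self.wireless_access_points)


record = SceneRecord(
    device_type="laptop",
    location="second-floor office",
    position="on desk, lid open",
    peripherals=["USB drive", "printer"],
    wireless_access_points=["router on shelf"],
)
print(record.may_have_external_evidence())
```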
Documentation – Initial Steps
• If the monitor is on
▫ Immediately photograph and take notes of the
display contents
• If the monitor is off (i.e. suspended) or if a
screen-saver is presently running
▫ Slightly move the mouse (do not press any keys on
the keyboard)
▫ Photograph/record display contents
Documentation – Label Everything!
• Be sure to label all cables and connections so
that they can be referred to later in your report
Collecting Physical Evidence (cont.)
• When NOT to pull the plug:
▫ If any of the following are actively in use on the
machine:
Chat rooms
Instant message windows
Open documents
Remote data storage connections
Obvious illegal activity
Etc.
• Refer to the handout entitled “Collecting Digital
Evidence Flow Chart”
Volatile vs Non-Volatile Data
• Volatile data
▫ Must be retrieved while computer is still on (and
in a state left by the alleged perpetrator)
▫ Stored in RAM, cache or some temporary file
▫ A shutdown (or re-boot) may destroy or distort the
data
• Non-volatile data
▫ May be retrieved at any time (i.e. off-site or after
collection)
▫ Permanent storage within a system or data file
Collecting Physical Evidence
• Often, the device(s) used to generate the evidence in question must
be confiscated.
• If the computer is OFF:
▫ Carefully package all evidence devices, power cables, etc. AFTER
you have fully photographed and documented the scene.
• If the computer is ON:
▫ Cutting the power (pulling the plug) is usually the safest
option, but proceed according to your company’s policy
▫ If running Windows, pulling the power cord will preserve the last
user’s login information and many other recently performed
actions
▫ Again, this should be done only after thoroughly collecting written
and photographic evidence
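The ON/OFF guidance on this slide, together with the “When NOT to pull the plug” list, amounts to a small decision procedure. The sketch below restates it under that reading; the function name and the wording of the returned steps are hypothetical, and the “Collecting Digital Evidence Flow Chart” handout and your company's policy take precedence.

```python
# Signs of active use, taken from the "When NOT to pull the plug" list.
ACTIVE_USE_SIGNS = frozenset({
    "chat room",
    "instant message window",
    "open document",
    "remote data storage connection",
    "obvious illegal activity",
})


def next_step(powered_on, observed=()):
    """Suggest the next step for a device, given what is seen on screen."""
    seen = {item.lower() for item in observed}
    if not powered_on:
        return "photograph and document, then package devices and cables"
    if seen & ACTIVE_USE_SIGNS:
        return "do not pull the plug; follow the flow chart and company policy"
    return "photograph and document, then remove power per company policy"


print(next_step(True, ["Open document"]))
```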
Example: Evidence to Collect When
Online/Economic Fraud is Expected
• According to the First Responders Guide,
potential electronic evidence may exist in:
• Computers
• Removable media
• Mobile communication devices
• External storage devices
• Online auction sites/account data
• Databases
• PDAs, address books, contact lists
• Any printed e-mail, notes, letters, etc.
• Calendars or journals
• Financial asset records
• Accounting or recordkeeping software
• Photos and/or image files
Quick Note…
• The process of collecting volatile and non-volatile data
from a computer system requires a
rather large set of software tools and a great deal
of experience and study.
• Further complications arise with:
▫ Different operating systems (and versions): Mac,
Linux, Windows, etc.
▫ Multiple software solutions for every possible
need in the evidence collection process
Communicating with IT
Professionals
Tip #1
• If you are investigating a case involving
electronic evidence, be sure to involve qualified
IT professionals as much as possible
▫ You may not be able to retrieve the information
you are looking for without help
Tip #2
• Use e-mail whenever possible to seek the help of
an IT professional
▫ Maintain a record of questions and answers for
future use
▫ Written queries allow you to take the time to
explain your problem clearly
▫ Most IT professionals tend to respond more quickly to
e-mail inquiries (as opposed to voice messages).
Tip #3
• It may not be possible to maintain anonymity
while investigating an individual.
▫ Be sure to use only the IT professionals with the
necessary clearance to assist you.
▫ Keep all communication confidential
Ex: system administrators may be able to access
archived e-mail records (end-users typically delete
e-mails that contain self-incriminating evidence)
▫ IT staff usually have the tools/utilities to assist
you in an investigation
Tip #4
• Keep a record of every action performed by both
you and the IT professional(s).
▫ It is often helpful to request the IT professional
keep their own record (you can never have too
much documentation!)
▫ Be there while the IT professional is assisting in
the case. Don’t hesitate to ask questions!
Discussion Points
• What are some of the things to look for when identifying
fraudulent electronic documents (e-mail, web pages,
correspondence, etc.)?
▫ I’m interested in hearing your ideas specific to your
occupation/needs (hypothetical cases, of course)
• For non-IT professionals, what are some of the challenges
faced when communicating with IT professionals?
▫ i.e. what do you need from your IT staff?
• For IT professionals, what are some of the challenges
faced when communicating with non-IT professionals?
▫ i.e. what do you need from your non-IT staff in order to do
your job?
Keyword v. Advanced Conceptual
Searching
What is a Search Engine?
• There are millions and millions of web
pages with interesting content out there!
• Imagine the number of potential words or
phrases one could search!!!
• A search engine is a web-based
application that specializes in information
retrieval.
How does a Search Engine Work?
• Input:
▫ Keywords & phrases from end user
• Output:
▫ Hyperlinks to web content, usually sorted
by relevance. We call this the search
engine result page (or SERP)
• Mechanism:
▫ Complex internal algorithms
▫ Most search engines are “always working”,
crawling through the web 24-7 in an effort to
better optimize the engine
Search Engine Example - Google
• Google searches for keywords that appear
in web pages.
• Rankings are done based on a number of
factors, including:
▫ The number of times the keyword appears on a
page
• Crawlers (or “spiders”) continuously
traverse link-to-link through the Internet
to build index pages for certain keywords.
Search Engines – Tips & Tricks
• Search for key words, not entire sentences
• Leave out connective words, such as “the”,
“a”, “of”, “on”, etc.
▫ Ex: “fiddler roof”, not “fiddler on the roof”
• Try not to be too specific! Try being
general and digging deeper from the
SERP.
▫ Ex: “Saint Rose computer science”, not
“College Saint Rose computer science
department faculty artificial intelligence
publications recent”
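The keyword index that crawlers build, ranking by how often a keyword appears on a page, and the tip about leaving out connective words can be sketched together in a few lines. This is a toy illustration only: real engines use many more ranking factors, and the page names and texts here are made up.

```python
from collections import Counter, defaultdict

# Connective words to ignore, as in the tips ("the", "a", "of", "on").
STOP_WORDS = {"the", "a", "of", "on"}


def tokenize(text):
    """Lower-case the text and drop connective words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(pages):
    """Map each keyword to {page: number of times it appears on that page}."""
    index = defaultdict(dict)
    for page, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][page] = count
    return index


def search(index, query):
    """Return pages ordered by total keyword count (ties broken by name)."""
    scores = Counter()
    for word in tokenize(query):
        for page, count in index.get(word, {}).items():
            scores[page] += count
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


pages = {  # hypothetical "crawled" pages
    "a.example": "fiddler on the roof fiddler",
    "b.example": "roof repair",
    "c.example": "computer science",
}
index = build_index(pages)
print(search(index, "fiddler on the roof"))
```

Because connective words are dropped before matching, “fiddler roof” and “fiddler on the roof” return the same result list here.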
Search Engines – Tips & Tricks (2)
Keyword Search - Limitations
• Keyword searches work pretty well… however, they do
have some significant limitations
• Fundamental Problem: index term synonymy
▫ That is, not all documents use the same words to refer
to similar concepts
• Polysemy
▫ Many words have multiple meanings
▫ This can create loads of irrelevant query results
▫ For example: “Axe” image results include:
Keyword Search Example
• From (Kiryakov, A., et al., 2007):
▫ Take the following query:
“telecom company” Europe “John Smith”
director
▫ Keyword search would NOT return:
“At its meeting on the 10th of May, the
board of London-based O2 appointed John
Smith as CTO”
Keyword Search – Limitations (cont.)
• Keyword searches are destined for text-based
results
• Images, videos, audio files, etc. must be
explicitly labeled (or “tagged”) in order to show
up in a query result set.
• In short, we need to model “ideas”, not simply
keywords
Thesaurus Expansion
• Easy method for expanding the keyword search
▫ Expands the search using several different keywords
within the same concept
▫ When not available, can be implemented manually
▫ Still may return non-relevant query results
▫ Why not? Well, the search engine would need to
know the following relationships:
1. O2 is a kind of telecom company
2. London is located in Europe
3. CTO is a type of “director”
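To make the three relationships concrete, the sketch below runs the example query against the example sentence twice: once as a plain keyword match and once with the relationships supplied by hand. The `RELATIONS` table and function names are hypothetical; a real concept search would draw these relationships from a maintained lexicon or ontology rather than a hard-coded dictionary.

```python
import re

# Hand-written relationships from the slide: term -> more specific terms.
RELATIONS = {
    "telecom company": {"o2"},
    "europe": {"london"},
    "director": {"cto"},
}

DOC = ("At its meeting on the 10th of May, the board of London-based O2 "
       "appointed John Smith as CTO")
QUERY = ["telecom company", "Europe", "John Smith", "director"]


def mentions(term, text, relations=None):
    """True if the term, or a related term when relations are given, appears."""
    candidates = {term.lower()}
    if relations:
        candidates |= relations.get(term.lower(), set())
    return any(
        re.search(r"\b" + re.escape(c) + r"\b", text.lower()) for c in candidates
    )


def is_hit(terms, text, relations=None):
    """A document is a hit only if every query term is matched."""
    return all(mentions(t, text, relations) for t in terms)


print(is_hit(QUERY, DOC))             # keyword search alone
print(is_hit(QUERY, DOC, RELATIONS))  # with the three relationships
```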
What is Concept-Search?
• A concept-based search takes a query string and
expands it using relevant terms based on a
defined lexicon (the words and expressions of a
language)
• For example (Taylor, C., 2009):
▫ The search phrase: “bank account”
May be easy to modify to return bank, banks,
account, accounts, banking, etc.
▫ A concept search can return related results:
“bank” and “account”, but also “deposit”, “funds”,
“withdrawal”, “transaction”, etc.
Google Concept
• Uses Google’s search engine technology
• Adds a feature called mind-mapping
▫ Think of this as a tree-like structure with the
central topic at the center
▫ Each “branch” leads to a subtopic which can be a:
word
phrase, or
image
• The central topic and all subtopics are treated as
keywords and fed into the Google search engine
▫ All search results are broken down into the central
topic and all subtopics
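The tree-like structure described above can be modelled directly. The sketch below is an assumption about the general idea, not a description of how Google Concept is implemented: it stores text subtopics only (no images), and the `search` argument stands in for whatever search engine is used.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A node in a mind map: a label plus any number of subtopics."""
    label: str
    subtopics: list = field(default_factory=list)


def keywords(topic):
    """List the central topic and every subtopic label, depth first."""
    found = [topic.label]
    for sub in topic.subtopics:
        found.extend(keywords(sub))
    return found


def group_results(topic, search):
    """Run one search per keyword so results stay grouped by topic."""
    return {label: search(label) for label in keywords(topic)}


# Hypothetical mind map with "fraud" as the central topic.
mind_map = Topic("fraud", [
    Topic("identity theft", [Topic("phishing")]),
    Topic("embezzlement"),
])
print(keywords(mind_map))
```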
TheBrain Software
• Tool for generating a mind map
• Watch video here.
Hardware Solutions for Searching
• Several companies provide hardware to better
facilitate and optimize the search process.
• Example: “Search Appliance” by Thunderstone
▫ None of these solutions are cheap…
• Google also sells a hardware search appliance,
i.e. Google Mini
Mind Mapping
• Start with the central idea and place this in the
direct center of your page
• Here is a decent video tutorial to explain the
general concept.
Concept Search Products for Windows
• conceptSearching (conceptsearching.com) has
released a product called conceptClassifier for
Microsoft SharePoint
▫ The product automatically generates metadata
and extracts concepts from content as it is created
Therefore, the solution may need to be installed and
active on a machine before any potential “evidence”
is generated
• not free!
Concept Search Example
• Real Case (Taylor, C., 2009): A NJ Law Journal
reported that a company (not named) was conducting an
internal investigation looking for insiders involved in
embezzlement.
• Keyword searches related to banks, accounts, deposits,
etc. turned up nothing useful
• A concept-based search was then run on clustered and
threaded terms. The result came up with “A large
number of baseball-related discussions between two men
who were not sports fans”
▫ The company matched terms, e-mail dates, etc. with
bank transfers, deposits, etc.
WordNet (Princeton University)
• Contains a large database of cognitive synonym
sets (synsets) that group synonymous or similar
words together as concepts.
• This could be used to expand the keyword
search.
• Access Web-Interface through here:
▫ http://wordnet.princeton.edu/
• Unix-based downloadable product available
▫ Windows version coming soon…
WordNet (cont.)
• Antonymy
▫ E.g. rich and poor are antonyms
▫ Not a relationship between word meanings
i.e. {rise, ascend} and {fall, descend} are conceptual
opposites, but not antonyms
• Hyponymy
▫ A semantic relationship between word meanings
i.e. maple is a hyponym of tree
tree is a hyponym of plant
• Meronymy
▫ The part-whole relation
▫ A y has an x (as a part) or x is a part of y
Now, let’s go through tutorial 2
http://academic2.strose.edu/math_and_science/macdonai/EEWorkshop
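The WordNet relations named here (antonymy, hyponymy, meronymy) can be illustrated with a tiny hand-built table. This is a toy sketch, not the WordNet database or its API; the maple/tree/plant and rich/poor entries come from the slides, while the part-whole entries are made-up examples.

```python
# child -> parent: "maple is a kind of tree", "tree is a kind of plant"
HYPERNYM = {"maple": "tree", "tree": "plant"}
# part -> whole (hypothetical entries for illustration)
PART_OF = {"branch": "tree", "trunk": "tree"}
# antonymy holds between word forms
ANTONYM = {"rich": "poor", "poor": "rich"}


def is_hyponym(word, ancestor):
    """True if `word` is a kind of `ancestor`, following the chain upward."""
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == ancestor:
            return True
    return False


def has_part(whole, part):
    """True if `part` is recorded as a part of `whole`."""
    return PART_OF.get(part) == whole


print(is_hyponym("maple", "plant"))  # follows maple -> tree -> plant
print(has_part("tree", "branch"))
```

Following the hyponym chain is what lets a search for “plant” also surface documents that only mention “maple”.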
Thank you!!!
• Please feel free to contact me at any time if you
have any questions!
• My on-campus phone is (518)454-5163,
however, I am usually much easier to reach via
e-mail ([email protected])
References
• Britz, M., “Computer Forensics and Cyber Crime: An Introduction”, Pearson (2009)
• Newman, R., “Computer Forensics: Evidence Collection and Management”, Auerbach
Publications (2007)
• EC-Council, “Computer Forensics: Investigating Hard Disks, File and Operating Systems”,
Cengage (2010)
• Lyu, S., “Digital Image Forensics”, University at Albany, SUNY (2010)
• EC-Council, “Computer Forensics: Investigating Image and Data Files”, Cengage (2010)
• U.S. Department of Justice / NIJ, “Electronic Crime Scene Investigation: A Guide for First
Responders”, 2nd ed. (2008)
• Kiryakov, A., et al., “Concept Searching”, Information Society public document (2007)
• Woods, D., “Introducing Google Concept”, http://ezinearticles.com/?Introducing-GoogleConcept&id=129641
• Marcella, A. J. & Menendez, D., “Cyber Forensics: A Field Manual for Collecting, Examining, and
Preserving Evidence of Computer Crimes”, Auerbach Publications (2008)
• Taylor, C., “A Quick Look at Concept Search”, NetworkComputing.com,
http://www.networkcomputing.com/e-discovery/a-quick-look-at-concept-search.php (2009)
• Miller, George A., “WordNet - About Us”, WordNet, Princeton University (2009),
http://wordnet.princeton.edu
• Vacca, J. & Rudolph, K., “System Forensics, Investigation, and Response”, Jones & Bartlett
Learning (2010)