The Link Between Pharmaceutical Spam And Illicit Online

THE LINK BETWEEN PHARMACEUTICAL SPAM AND ILLICIT ONLINE
PHARMACIES
by
BRANDI JEFFERSON
DR. ELIZABETH GARDNER, CHAIR
DR. JASON LINVILLE
DARRELL BURKE
A THESIS
Submitted to the graduate faculty of The University of Alabama at Birmingham,
in partial fulfillment of the requirements for the degree of
Master of Science
BIRMINGHAM, ALABAMA
2013
THE LINK BETWEEN PHARMACEUTICAL SPAM AND ILLICIT ONLINE
PHARMACIES
BRANDI JEFFERSON
MASTER OF SCIENCE IN FORENSIC SCIENCE
ABSTRACT
Spam, more formally known as unsolicited commercial e-mail, is a problem for
everyone in the world who uses the internet. Most spam is illegal, but some types of
spam are legal. Spam found in unrequested e-mails, on social networks, and in search engine results can be illegal. Illegal spam masks its original destination
and normally leads a consumer to a harmful site. Pharmaceutical spam leads to an affiliate site that sells prescription drugs without requiring a prescription, along with other questionable products. These drugs range from erectile dysfunction drugs to pain medications.
All of these drugs are sold at very low prices per pill, which makes them more appealing
to consumers. In this research, pharmaceutical spam subjects were manually
analyzed and then grouped into spam campaigns based on common domains. The results
of this project include: the determination of the connections among the spam campaigns,
as well as the connections among the affiliate sites; the determination of the most dominant affiliate sites and spam campaigns that appeared throughout the nine-month research
period; and the determination that only about eight to ten affiliate programs are hiring
affiliates to advertise their affiliate sites using the spam campaigns identified in this research.
Keywords: pharmaceutical spam, prescription drugs, data mine, Viagra, Cialis, Levitra
TABLE OF CONTENTS
Page
ABSTRACT ........................................................................................................................ ii
LIST OF TABLES ...............................................................................................................v
LIST OF FIGURES .......................................................................................................... vii
LIST OF ABBREVIATIONS ............................................................................................ ix
CHAPTER 1
1. INTRODUCTION .....................................................................................................1
Background on Spam ...............................................................................................1
Why Use These Illicit Online Pharmacies? .............................................................5
Use as Directed ........................................................................................................6
Affiliate Programs, Affiliates, and Affiliate Sites ...................................................9
Glavmed/ SpamIt ...................................................................................................10
Microsoft Security Intelligence Reports ................................................................12
Previous Studies .....................................................................................................14
Wei et al. (2007) ........................................................................................14
Zhang et al. (2009) .....................................................................................15
Levchenko et al. (2011) .............................................................................16
Spam Data Mine ....................................................................................................16
CHAPTER 2
2. DATA COLLECTION METHODOLOGY ............................................................18
Virtual Machine .....................................................................................................18
Collecting the Data ................................................................................................19
Method 1: December 2011- February 2012 ...............................................23
Method 2: March 2012-August 2012 .........................................................24
International Characters and Running Queries with PSQL .......................25
Spam Campaigns ...................................................................................................30
CHAPTER 3
3. RESULTS AND DISCUSSION ..............................................................................33
Monthly Reports ....................................................................................................33
December 2011 ..........................................................................................33
January 2012 ..............................................................................................37
February 2012 ............................................................................................40
March 2012 ................................................................................................43
April 2012 ..................................................................................................47
May 2012 ...................................................................................................50
June 2012 ...................................................................................................53
July 2012 ....................................................................................................57
August 2012 ...............................................................................................60
Significant Observations ........................................................................................64
CHAPTER 4
4. CONCLUSION ........................................................................................................71
LIST OF REFERENCES ...................................................................................................75
LIST OF TABLES
Tables
Page
1
Volume 5-13 percentages of spam messages blocked by EHS for the
Pharmacy-Sexual and Pharmacy-Non Sexual categories of spam ........................13
2a
Queries used to generate results and the query’s target .........................................20
3a
Key words used for determining pharma spam subjects in Method 1 ...................21
3b
Key words used for determining pharma spam subjects in Method 2 ...................21
2b
Modified queries used with international characters, an
example of the international characters, and the target of the queries. ..................26
4
Table illustrating when each spam campaign appeared and
disappeared in December .......................................................................................36
5
Table illustrating when each spam campaign appeared and
disappeared in January ...........................................................................................39
6
Table illustrating when each spam campaign appeared and
disappeared in February .........................................................................................42
7
Table illustrating when each spam campaign appeared and
disappeared in March .............................................................................................46
8
Table illustrating when each spam campaign appeared and
disappeared in April ...............................................................................................49
9
Table illustrating when each spam campaign appeared and
disappeared in May ................................................................................................52
10
Table illustrating when each spam campaign appeared and
disappeared in June ................................................................................................56
11
Table illustrating when each spam campaign appeared and
disappeared in July.................................................................................................59
12
Table illustrating when each spam campaign appeared and
disappeared in August ............................................................................................63
13a
Subgroup of domains that appeared with CHCM and CNP spam
campaigns in May, June, and July .........................................................................65
13b
Subgroup of domains that appeared with CHCM and CNP spam
campaigns beginning at the end of July through August .......................................66
LIST OF FIGURES
Figure
Page
1
Example of Canadian Pharmacy template ...............................................................9
2
Results of the spam query ......................................................................................20
3
Results of the subject query ...................................................................................23
4
Results of the modified subject query 1 ......................................................................27
5
Results of the modified subject query 2 .................................................................28
6
Depiction of the actual amount of times the subject appeared ..............................28
7
Results of the modified domain query ...................................................................29
8
Depiction of the actual number of subjects associated with the
domain ".drugsspee.ru" ..........................................................................................29
9
Results of removing the duplicate subjects in excel ..............................................30
10
Number of subjects over 1,000 by day for December ...........................................34
11
Number of spam campaigns in December for each affiliate
site that appeared....................................................................................................34
12
Number of subjects over 1,000 by day for January ...............................................37
13
Number of spam campaigns in January for each affiliate
site that appeared....................................................................................................38
14
Number of subjects over 1,000 by day for February .............................................40
15
Number of spam campaigns in February for each affiliate
site that appeared....................................................................................................41
16
Number of spam campaigns in March for each affiliate
site that appeared....................................................................................................44
17
Number of spam campaigns in April for each affiliate
site that appeared....................................................................................................47
18
Number of spam campaigns in May for each affiliate
site that appeared....................................................................................................51
19
Number of spam campaigns in June for each affiliate
site that appeared....................................................................................................54
20
Number of spam campaigns in July for each affiliate
site that appeared....................................................................................................57
21
Number of spam campaigns in August for each affiliate
site that appeared....................................................................................................60
LIST OF ABBREVIATIONS
CHCM
Canadian Health&Care Mall
CNP
Canadian Neighbor Pharmacy
CP
Canadian Pharmacy
CPE
Canadian Pharmacy/Pharmacy Express
DM
Discount Meds
DP
Direct Pharm
DTP
Direct Pills
ED
erectile dysfunction
EHS
Microsoft Exchange Hosted Services
FDA
Food and Drug Administration
FTC
U.S. Federal Trade Commission
GP
Generic Pills
GRX
Global RX
IP
Internet Protocol
ISP
Internet Service Providers
MCP
My Canadian Pharmacy
MH
Men's Health
MJ
Mister Joy
MRX
Mega RX
OP
Online Pharmacy
PE
Pharmacy Express
PMP
Pharmacy Med Pro
PS
Pills Shop
PSQL
PostgreSQL interactive terminal (psql)
RD
RX Discounts
RXD
RX Deals
RXO
RX Orders
SEO
Search Engine Optimization
SIR
Security Intelligence Report
TDS
Toronto Drug Store
TM
Top Meds
TP
Top Pharm
VM
virtual machine
WHO
World Health Organization
CHAPTER 1
INTRODUCTION
This research focused on the daily analysis of pharma spam collected from the
UAB spam data mine over a period of nine months, December 2011 through August
2012. The initial overall purpose of this research was to collect the daily pharma spam,
analyze the frequency of the daily pharma spam, and identify the most prominent affiliate
sites as well as the most prominent spam campaigns for each affiliate site. During the
course of this project, there was a major shift in the distribution of pharma spam subjects,
which required the development of new analysis methods in the middle of the project.
Because of this shift, the overall goal was also altered to focus on the analysis of the
spam campaign activity in order to identify the most prominent affiliate sites as well as
the most prominent spam campaigns for each affiliate site. This research also found that,
contrary to published reports, many affiliate sites remain active for months rather than days.
Background on Spam
Unsolicited commercial email or spam is not a new problem. The first spam email
was sent in 1975 as an Internet Request for Comments (1). The Request for Comments
(RFC) series is a numbered set of documents describing internet protocols and practices;
many internet standards developed by the Internet Engineering Task Force are published
as RFCs. Spam makes up 45% of all emails sent out each day (2). On average, 14.5 billion
spam messages are sent globally per day. By another estimate, 200 billion spam emails
were sent every day in 2010 (3). Spam can take the form of a document
(e-mail), an automated vote, a user profile, or an annotation (4). In addition to email,
spam can be posted on social network sites and can even come up as a result when
searching on a search engine. The form of the spam depends on the spammer’s
motivation. Motivations include self-promotion, curiosity, advertisement, disruption, or
mocking a competitor (4). No matter the form of the spam, if it catches the receiver’s
attention, the spammer has succeeded.
The two most common types of spam are advertising and adult-related spam (2).
Advertising spam accounts for 36% of all spam messages sent, and adult-related spam
accounts for 31.7%. The goal of both types of spam messages is to entice the receiver to
click on a URL that will open to an online store. Advertising emails promote all types
of products, such as replica products, male enhancements, counterfeit software, or
pharmaceuticals (5). Adult-related emails advertise pornographic items or sexual
enhancement products (6). Both types of emails generally contain redirect links,
meaning the spam message points to a small webpage hosted on a hacked web server.
Once a URL is clicked, the consumer is forwarded to a website containing the advertised
products. These are generally not legitimate sites, but are instead unsafe or illicit
websites designed by the spammer to offer questionable goods. Spammers also tweak
domains, name servers, and web servers in order to forward the consumer to the spam
website. Using these methods, the spammer can circumvent spam filters that check the
destination context (5).
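The redirect chain described above can be modeled as a lookup from each URL to the URL it forwards to. The Python sketch below assumes the redirects have already been recorded (for example, by a researcher following spam links inside an isolated virtual machine) and simply traces a spam link to its final landing page. The function name `resolve_redirects` and all URLs are hypothetical, and real redirect mechanisms (HTTP 3xx responses, meta refresh tags, or JavaScript) are not fetched here.

```python
from typing import Dict, List


def resolve_redirects(start_url: str, redirects: Dict[str, str],
                      max_hops: int = 10) -> List[str]:
    """Return every URL visited from start_url to the final landing page.

    `redirects` maps a URL to the URL it forwards to (hypothetical data
    recorded in advance); a URL with no entry is the final destination.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in redirects:
        nxt = redirects[current]
        if nxt in seen:              # guard against redirect loops
            raise ValueError(f"redirect loop detected at {nxt}")
        if len(chain) > max_hops:    # max_hops hops already followed
            raise ValueError(f"more than {max_hops} redirects")
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain                     # current has no redirect: landing page


# Hypothetical chain: page on a hacked server -> intermediate domain -> affiliate site.
redirects = {
    "http://hacked-blog.example.com/promo.html": "http://cheap-pills.example.net/",
    "http://cheap-pills.example.net/": "http://pharmacy-shop.example.org/",
}

chain = resolve_redirects("http://hacked-blog.example.com/promo.html", redirects)
print(" -> ".join(chain))
```

The `max_hops` limit mirrors the redirect limits that web browsers impose; the value of 10 used here is an arbitrary assumption.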
The reason people are spammed is simple: spamming is cheap, easy, and lucrative. A
spammer can send out 100,000 spam messages for less than $200 and buy a million email
addresses to spam for less than $100 (1). Spamming is easy because the spam is sent
not by humans but by computers that are part of a botnet. A botnet is a group
of infected computers that can be used to spam millions of people with the same message
in a matter of seconds (7). One method used by spammers to infect computers is to send
malware to an unsuspecting computer owner. Malware is short for malicious software
and is software designed with malicious intent (8). A computer may become infected
when the owner receives an email from a known contact, opens it, and clicks on an
included link in the body of the email. Clicking the link results in the computer
surreptitiously downloading an executable file that installs a piece of malware that
takes remote control of the computer, such as a Trojan horse (9). Once the malware is
loaded, it will download all the contacts from the email account on the infected computer
and repeat the cycle of infection and control. Once the botnet is activated, the computers
in the botnet will begin sending out the spam messages, usually without the owner of the
computer even knowing the computer is infected. The spammer controls the botnet
through a central command and control server. Most spam networks consist of several
botnets as well as several command and control servers (7). A botnet can serve two
functions: one is to collect a database of email addresses and the second is to flood those
accounts with spam email.
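Expressed per unit, the upper-bound prices quoted above (1) work out to fractions of a cent, treating the bounds as exact:

```latex
\[
\frac{\$200}{100{,}000\ \text{messages}} = \$0.002\ \text{per message},
\qquad
\frac{\$100}{1{,}000{,}000\ \text{addresses}} = \$0.0001\ \text{per address}.
\]
```

At these prices, a spammer needs only a very small fraction of recipients to respond in order to cover the cost of a mailing.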
To a spammer, the main benefit of using botnets is efficiency, because efficiency
equals profit. More spam being generated means higher potential for sales and/or another
computer being infected. Botnets also have the benefit of obfuscation. Obfuscation in this
context is the ability to hide the spam’s original destination. The spammer is able to alter
the spam emails to look like an authentic email from a known contact, furthering the
potential for a consumer to open the email, click on the embedded link, and become
infected.
Although spam is generally unsolicited, some legal types of spam received each day are
in fact solicited by consumers. When a consumer solicits legal spam, the consumer
knowingly and voluntarily signs up to receive emails from a business and has the option
to opt out whenever desired (10). Most spam, however, is illegal because the sender’s
information is falsified in order to keep the sender anonymous. In the case of illegal
spam, the sender does not have the recipient’s consent to send the unsolicited mail nor
does the recipient request to receive the unsolicited mail (10).
There are three types of web spam: link spam, content spam, and cloaking, with
content spam and cloaking being the most common (11). Link spam consists of links
between pages that exist for malicious rather than legitimate purposes (12). Link spam
targets link-based ranking algorithms (11), which rank a website higher the more often
other highly ranked websites link to it (13). For example, Facebook is a highly ranked
website. When other websites are linked from Facebook’s website, those websites obtain
a higher ranking. The higher the ranking of the website, the more likely the website is
to be in the top results returned from a Google search. If a spammer can get its website
to rank high enough to appear in the top results of a Google search, there is a better
potential for revenue. Content spam maliciously crafts the content of web pages by
inserting keywords that are related to popular query terms rather than to the actual content of the
pages. This type of spam is a manipulative form of Search Engine Optimization (SEO).
When a spammer uses cloaking, the spammer sends different web page content to a
search engine’s crawler than to the human visitors of the website (11). That is, instead of
infecting the consumer through a link in an email that leads to a malicious site, the
spammer alters a particular web page to carry the malicious content. For example, if the
consumer uses a search engine such as Google to search for ‘target’, Google will generate
results related to the word ‘target’. The consumer will find the link that leads to
target.com and click the link, which should open to the official Target web page. The
spammer, however, has used cloaking on this web page, leading the consumer to an
altered version of target.com with malicious content embedded.
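PageRank, the algorithm described by Google's founders in 1998, is the best-known example of the link-based ranking described above: a page scores highly when many pages link to it, and links from highly ranked pages count for more. The Python sketch below is a textbook power-iteration version run on a hypothetical six-page graph; it is not the unpublished ranking any search engine uses today. It illustrates why link spam pays off: a single link from a highly ranked page sharply raises the rank of a page that otherwise has no inbound links.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook power-iteration PageRank over a small link graph.

    `links` maps each page to the pages it links to. A page with no
    outgoing links (a "dangling" page) spreads its rank over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web graph: several blogs link to "popular"; "spam" has no inbound links.
base = {
    "popular": ["news"],
    "news": ["popular"],
    "blog1": ["popular"],
    "blog2": ["popular"],
    "blog3": ["popular"],
    "spam": [],
}
# Same graph, except that "popular" now also links to the spam page.
boosted = dict(base, popular=["news", "spam"])

print(f"spam rank without link: {pagerank(base)['spam']:.3f}")
print(f"spam rank with link:    {pagerank(boosted)['spam']:.3f}")
```

The damping factor of 0.85 is the value commonly quoted for the original PageRank; here it, like the fixed iteration count, is an assumption chosen for illustration.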
There are legal measures as well as simple technical measures used to block spam emails.
Blacklists and keyword-based filters can be used, but these measures have not had much
of an effect (14). One reason these filters do not work as well as hoped is that spammers
quickly learn which words are being filtered and alter those keywords so that their spam
gets through the filters.
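The cat-and-mouse game described above can be illustrated with a toy subject-line filter. In the Python sketch below, the block list and the substitution table are hypothetical examples, not the rules of any real spam filter. A filter that matches whole words is evaded by changing a single character, while a filter that first normalizes the text catches those variants but starts flagging innocent text.

```python
import re
from typing import AbstractSet

# Hypothetical block list; real filters use far larger, frequently updated lists.
BLOCKED_KEYWORDS = frozenset({"viagra", "cialis", "levitra"})

# Illustrative character substitutions spammers use to disguise words.
SUBSTITUTIONS = str.maketrans({"1": "i", "0": "o", "@": "a", "3": "e", "$": "s"})


def naive_filter(subject: str, keywords: AbstractSet[str] = BLOCKED_KEYWORDS) -> bool:
    """Flag a subject line if any blocked keyword appears as a whole word."""
    words = re.findall(r"[a-z]+", subject.lower())
    return any(word in keywords for word in words)


def normalized_filter(subject: str, keywords: AbstractSet[str] = BLOCKED_KEYWORDS) -> bool:
    """Undo simple substitutions and strip non-letters before substring matching."""
    text = subject.lower().translate(SUBSTITUTIONS)
    text = re.sub(r"[^a-z]", "", text)  # removes spaces, dots, digits, symbols
    return any(keyword in text for keyword in keywords)


for subject in ["Cheap Viagra now", "Cheap V1agra now", "C.i.a.l.i.s 80% off", "Ask a specialist"]:
    print(f"{subject!r}: naive={naive_filter(subject)}, normalized={normalized_filter(subject)}")
```

The last subject shows the trade-off: after normalization, "specialist" contains the string "cialis" and is wrongly flagged, which is one reason keyword-based filtering alone has had limited effect.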
Why Use These Illicit Online Pharmacies?
United States citizens spent nearly $2.6 trillion on healthcare in 2011, about ten times
the $256 billion spent in 1980 (15; 16). With rising unemployment and falling
household incomes, many Americans are more concerned about health
spending and affordability (15). Many consumers find it difficult to pay for essential
medications and may choose to leave their prescriptions unfilled (17). When a person
absolutely needs the medicine but cannot afford the medication, one option is to turn to
an illicit online pharmacy offering a cheaper alternative. Illicit pharmacies sell
pharmaceuticals ranging from male enhancement drugs to antibiotics to blood pressure
medications (18). The Center for Medicine in the Public Interest estimated that sales of
counterfeit medicines are growing twice as fast as sales of legitimate medicines (17).
From 2004 to 2010, sales of counterfeit medicines increased 13% annually, compared to
7.5% for legitimate medicines. This increase is likely because the illicit sites offer
significant discounts over what a traditional pharmacy could offer, even with insurance
(17). The ability to purchase medications without needing a prescription or insurance, as
well as the significantly lower costs, offers a huge incentive to consumers.
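The ratios behind these comparisons can be checked directly. The compounding step below assumes, for illustration only, that the quoted annual rates held constant for each of the six years from 2004 to 2010; the cumulative figures are this calculation's, not figures reported in (17).

```latex
\[
\frac{\$2.6\ \text{trillion}}{\$256\ \text{billion}} \approx 10.2,
\qquad
\frac{13\%}{7.5\%} \approx 1.7
\]
\[
(1.13)^{6} \approx 2.08,
\qquad
(1.075)^{6} \approx 1.54
\]
```

Under that assumption, counterfeit medicine sales would have roughly doubled over the period while legitimate sales grew by about half; an annual rate 1.7 times as high is consistent with the "twice as fast" estimate as an approximation.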
Use as Directed
Purchasing prescription drugs online without a prescription is illegal. These drugs
require a prescription because of the dangers that arise when they are taken
irresponsibly. Many of these drugs are also controlled substances, meaning that purchasing
them without an official prescription from a practicing doctor is unlawful (19).
Purchasing those controlled drugs from illicit online pharmacies is a federal crime and
could lead to a prison sentence. Many of these illicit online pharmacies advertise having
an online doctor who can write the prescriptions, but in reality, these online doctors are
not recognized under the law; therefore, the “prescriptions” they write are not
legal (19).
Besides being illegal, purchasing prescription medication without a prescription is
also dangerous. With any medication, there are always side effects
that should be taken into account, as well as the medication’s interactions with other
drugs. Buying prescription medications online may be easier, but it is by no means
safe. It is not known where these medications come from. The medications are not
inspected by regulatory agencies such as the Food and Drug Administration (FDA),
nor are the medications manufactured under safe conditions (20). Medicine purchased
online could contain additives (21), be expired, be the wrong dosage, or lack correct
dosage directions (19).
Another issue with purchasing drugs from these illicit online pharmacies is that the
active ingredient of the original drug may not be present in the drug advertised on the
illicit pharmacy. For example, some illicit pharmacies advertise female sexual enhancers,
Pink Female Viagra (Sildenafil) and Female Cialis (Tadalafil) (22). Both Viagra and
Cialis have been approved only for men, so there is no approved female version of either
drug; these products may not contain any sildenafil or tadalafil at all and could be
dangerous for consumption.
The World Health Organization (WHO) received 46 confidential reports related to
counterfeit drugs from 20 different countries (23). The reports included information about
products without the active ingredient, with incorrect quantities of the active ingredient,
with the wrong ingredients, with correct quantities of the active ingredient but fake
packaging, and with impurities and contaminants, as well as products that were
copies of the original. The largest share of the products, 32.1%, did not contain the active
ingredient. The products that contained wrong ingredients accounted for 21.4%, and the
products that contained incorrect quantities of the active ingredient accounted for 20.2%.
Products that contained the correct active ingredients but fake packaging accounted for
15.6%, and the products that contained high levels of impurities and contaminants
accounted for 8.5%. Only 1% of the products were copies of the original product (23).
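Tallying the WHO shares quoted above is a useful consistency check. The short Python tally below uses only the percentages given in the text (the category labels are paraphrases). The six categories sum to 98.8% rather than 100%, a gap that may reflect rounding or products outside these categories, and 73.7% of the products had no active ingredient, the wrong ingredients, or an incorrect quantity of the active ingredient.

```python
# Shares of counterfeit products by problem category, as quoted from the WHO (23).
# Category labels are paraphrases of the text.
who_categories = {
    "no active ingredient": 32.1,
    "wrong ingredients": 21.4,
    "incorrect quantity of active ingredient": 20.2,
    "correct ingredient, fake packaging": 15.6,
    "high impurities or contaminants": 8.5,
    "copies of the original": 1.0,
}

total = round(sum(who_categories.values()), 1)
unaccounted = round(100.0 - total, 1)
# Products whose active ingredient was missing, wrong, or in the wrong amount.
ingredient_problems = round(
    who_categories["no active ingredient"]
    + who_categories["wrong ingredients"]
    + who_categories["incorrect quantity of active ingredient"],
    1,
)
largest = max(who_categories, key=who_categories.get)

print(f"categories sum to {total}% ({unaccounted}% unaccounted)")
print(f"active-ingredient problems: {ingredient_problems}%")
print(f"largest category: {largest}")
```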
Viagra has gained wide acceptance as a treatment for erectile dysfunction (ED)
since its debut in 1998 (21), but many men are still too embarrassed to
purchase the drug openly. The drug has also become very popular with younger men. On
the black market, Viagra is sold for $25-$30 per pill, but it can be purchased for less than
$1 on illicit online pharmacies (18; 21). Purchasing Viagra online allows the buyer both to
remain anonymous and to save money (21).
Viagra has become a party drug on college campuses, at clubs, and at rave parties
(21). At these parties, it is often taken in a life-threatening combination with street drugs
such as Ecstasy. Viagra is taken to offset the decrease in sexual arousal experienced with
Ecstasy. The ED drugs are also being combined with "poppers," or amyl nitrite. Both of
these drugs dilate blood vessels, so combining them can result in a sudden drop in blood
pressure and cause a heart attack or stroke. For that same reason, Viagra is not prescribed
for patients taking nitrates for certain heart conditions. The combination of Viagra and
nitrates has resulted in several reported deaths (21). Drug task force agents in Athens,
Georgia, have reported that they routinely discover Viagra in the possession of college
males who do not have ED problems and do not possess a valid prescription for the
Viagra (24). The students steal it from their parents, order it online, or buy it from their
friends. While the ED drugs Cialis and Levitra are also drugs of interest in this research,
they have not yet reached Viagra’s level of popularity and are not abused as heavily as
Viagra (25).
Affiliate Programs, Affiliates, and Affiliate Sites
Pharmaceutical spam leads a consumer to a website that sells many different
prescription pharmaceuticals; however, on these sites a prescription is not required. Upon
clicking a link in a spam message, the consumer is led or forwarded to an affiliate site
such as Canadian Pharmacy (CP) (Figure 1).
Figure 1- Example of Canadian Pharmacy template
Affiliate programs are responsible for providing the template and source code for
the affiliate sites (26). The graphics, along with the selection and arrangement of the
products sold, define the template of an affiliate site. Most affiliate programs are free for
affiliates to join. An affiliate program may provide similar templates to all of its
affiliates, or it may provide a different template to each affiliate. Often, multiple
affiliate sites are run by the same affiliate. Affiliates also work for multiple affiliate
programs at once, so the different affiliate sites run by one affiliate may each have been
provided by a different affiliate program.
It is also possible for an affiliate to leave one affiliate program and begin working with
another affiliate program.
The affiliate programs will provide their affiliates with the source code and
template for the affiliate sites, and the affiliates decide upon their own manner of
advertisement. Once the affiliates have the source code and templates for the affiliate
sites, the affiliates register domains for the affiliate sites through a domain registrar such
as Go Daddy. Once the domains are registered, the affiliates can host the affiliate sites on
numerous domains. Once the domains are active, if the affiliates choose spam as their
form of advertisement, the affiliates generate the spam emails containing links to the
affiliate sites and send out the spam via botnets.
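Because an affiliate typically promotes the same affiliate site through many registered domains, spam subjects that point to a shared domain can be treated as part of the same campaign. The Python sketch below shows one way such grouping could be automated: subjects are linked to the domains they advertise, and connected groups are merged with a union-find structure. It is an illustrative assumption, not the procedure used in this research (where subjects were analyzed manually); the function name and the sample subjects and domains are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def group_campaigns(pairs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Group (subject, domain) pairs into campaigns of linked domains.

    Subjects and domains are nodes in a graph; each pair is an edge. Two
    domains fall into the same campaign if a subject appeared with both,
    directly or through a chain of shared subjects (connected components,
    found with union-find).
    """
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for subject, domain in pairs:
        # Prefixes keep a subject and a domain with the same text distinct.
        union("S:" + subject, "D:" + domain.lower())

    campaigns: Dict[str, Set[str]] = {}
    for node in list(parent):
        if node.startswith("D:"):
            campaigns.setdefault(find(node), set()).add(node[2:])
    return sorted(campaigns.values(), key=sorted)


# Hypothetical subjects and the domains their links pointed to.
pairs = [
    ("Cheap meds online", "pills-a.example.com"),
    ("Cheap meds online", "pills-b.example.com"),
    ("Your order is ready", "pills-b.example.com"),
    ("Save 80% today", "shop-c.example.net"),
]
for i, domains in enumerate(group_campaigns(pairs), start=1):
    print(f"campaign {i}: {sorted(domains)}")
```

Here "pills-a" and "pills-b" end up in one campaign because the same subject advertised both, while "shop-c" forms its own campaign.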
The affiliates handle the shopping experience for the customer, but when the
customer is ready to check out, the affiliate program regains control (26). The affiliate
programs are responsible for obtaining the contracts with outside parties for the payment
and fulfillment services (5). Often, the customer is redirected to a payment gateway
site (26). The payment gateway can be obvious to the customer or a proxy mode can be
installed so the customer is not aware of the redirection.
Glavmed/ SpamIt
Glavmed was an affiliate program that promoted illicit online pharmacies (27). The
glavmed.com domain was registered in March 2006 (28). The name SpamIt is sometimes
used to refer to Glavmed, and SpamIt has also been called Glavmed’s sister company. The
spamit.com domain was registered in June 2004 with the same administrative contact name as the
glavmed.com domain (28). The glavmed.com and spamit.com domains differed only in
their postal addresses and phone numbers. SpamIt was a members-only forum of
affiliates who had access to high-end pharmaceuticals, such as Percocet. The high-end
pharmaceuticals are often controlled medications. Only members of SpamIt were allowed
to sell the high-end pharmaceuticals, while nonmembers were allowed to sell only the
other pharmaceuticals offered by Glavmed. Glavmed focused on websites, SEO, and
banner advertising, while SpamIt was more directly tied to the spam emails (28).
Igor A. Gusev was known to be in charge of both Glavmed and SpamIt (29). The
Russian authorities opened a criminal investigation into Gusev, and he was charged with
operating a pharmacy without a license and with failing to register a business. With
charges pending, SpamIt “decided” to close down in September 2010, while Glavmed
continued with business as usual, paying affiliates to promote illicit online pharmacies
(30). Before SpamIt closed, however, three years’ worth of client records from the
Glavmed/SpamIt database were exposed. Eventually, the Russian authorities made efforts
to crack down on spam distributed from within Russia, and Glavmed was also shut down.
Gusev is said to still be at large and is wanted in Russia on those charges (31).
The three years of clientele information that was leaked showed that Glavmed’s
affiliates sold knockoff prescription drugs to more than 800,000 consumers and processed
over $1.8 million, which generated at least $150 million in revenue (30). Following
SpamIt’s closure in October 2010, there was a 40% decrease in spam volume (32). This
did not last, however, as spam emails leading to illicit pharmacies quickly returned to the
previous highs. The Glavmed database contained 705 unique entries. Those entries
included, but were not limited to, drugs, drug tests, wines, and veterinary medicines.
Along with selling prescription drugs without a prescription, Glavmed was also
responsible for selling discontinued drugs long after their discontinuation dates.
Searches of the known Glavmed database found 46 discontinued drugs still actively
being sold after their discontinuation dates.
It is not known exactly when, but at some point toward the end of 2012 or the
beginning of 2013, the Glavmed website came back online, hosted in Dallas, Texas.
There is no evidence of any real activity occurring on the site, but it is interesting that
the site is live again.
Microsoft Security Intelligence Reports
Since January 2006, Microsoft has published a Security Intelligence Report
(SIR) every six months that focuses on security data and trends. This report provides the
public with in-depth perspectives on potentially unwanted software, software
vulnerabilities and exploits, and malicious code threats. One of the topics in these reports
is the different types of spam messages that are blocked by the Microsoft
Exchange Hosted Services (EHS) (33). There are 14 volumes of the SIR to date, but
Microsoft did not begin documenting the percentages for the different types of spam that
were blocked until the fourth volume. In Volume 4, the spam category of interest to
this research was RX/Herbal spam messages. Microsoft recorded the
13
percentage of spam blocked in this category in 2004 and 2007. In 2004 the percentage for
the RX/Herbal category was 10%, and in 2007 it tripled to 31% (34).
Beginning in Volume 5, the spam categories of interest were pharmacy-sexual (Viagra, Cialis, Levitra, etc.) and other pharmacy, or pharmacy-non sexual. Table 1 shows the percentage of blocked spam in each category. A high percentage of the spam blocked in early 2008 fell in the pharmacy-sexual category, but in later volumes this percentage decreased drastically. The percentage of pharmacy-non sexual spam fluctuated throughout the SIRs, but reached its high in Volume 13 (January – June 2012). Microsoft draws no conclusions as to why the pharmacy-sexual category has generally decreased over the years, nor as to why the pharmacy-non sexual category fluctuates as it does.
Table 1
Volumes 5-14: percentages of spam messages blocked by EHS for the pharmacy-sexual and pharmacy-non sexual categories of spam
Volume # | Pharmacy-Sexual | Pharmacy-Non Sexual
5 (Jan. – June 2008) | 30.6% | 20.9%
6 (July – Dec. 2008) | 10.0% | 38.6%
7 (Jan. – June 2009) | 7.8% | 40.5%
8 (July – Dec. 2009) | 6.4% | 31.7%
9 (Jan. – June 2010) | 3.3% | 31.9%
10 (July – Dec. 2010) | 3.3% | 32.4%
11 (Jan. – June 2011) | 3.8% | 28.0%
12 (July – Dec. 2011) | 3.2% | 46.5%
13 (Jan. – June 2012) | 3.4% | 46.7%
14 (July – Dec. 2012) | 1.4% | 43.8%
Previous Studies
Wei et al. (2007)
The current method for investigating spam emails was developed by Wei et al. in 2007 (35). Spam email messages were clustered into groups based on images from the websites linked in the messages. Once the associated messages were clustered, a validation process tested the clusters to decide whether the images within each cluster were actually related. The validation process was performed at one or two levels. The first level of validation was a visual inspection of the clusters. If all of the website images in a cluster were the same, the cluster's integrity was given a high confidence.
If the cluster contained divergent images, the second level of validation was performed. The second level checked the domains in the cluster(s) against WHOIS, a database that contains information on registered domains and IP addresses. WHOIS returned the hosting, registrant, and name server information associated with each domain.
The seven largest of the 42 clusters each contained more than 100 messages. At the first level of validation, five of these seven clusters each contained exactly one image pattern, and those five clusters were given a high level of validity. The other two clusters each contained multiple image patterns and required the second level of validation. For the first of these two clusters, the WHOIS information showed that the cluster contained 100 domains and 22 image patterns, but the domains were hosted on only six IP addresses; two unique sets of owner registration data and three sets of name servers were found. The second cluster contained numerous divergent images, and the second level of validation did not show a high correlation within it. Wei et al. believed that false positives within this cluster caused the divergent images. These validation levels constituted a new approach to analyzing spam emails.
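The two validation levels can be sketched in Python. This is a minimal illustration under stated assumptions, not Wei et al.'s implementation: the `Cluster` record, the WHOIS field names (`ip`, `registrant`, `name_servers`), and the rule that a cluster is "supported" when its domains collapse onto few hosting IPs, registrants, and name servers (here, at most one distinct value per ten domains, `MAX_DISTINCT_RATIO`) are all hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical threshold: at most one distinct WHOIS value per 10 domains.
MAX_DISTINCT_RATIO = 0.1
WHOIS_FIELDS = ("ip", "registrant", "name_servers")


@dataclass
class Cluster:
    """One cluster of spam messages (hypothetical record layout)."""
    image_patterns: set[str]
    # domain -> WHOIS fields for that domain (keys from WHOIS_FIELDS)
    whois: dict[str, dict[str, str]] = field(default_factory=dict)


def whois_consistent(whois: dict[str, dict[str, str]]) -> bool:
    """Level 2: do the domains share a small set of hosts, registrants, name servers?"""
    if not whois:
        return False
    n = len(whois)
    for key in WHOIS_FIELDS:
        distinct = {record[key] for record in whois.values()}
        if len(distinct) / n > MAX_DISTINCT_RATIO:
            return False
    return True


def validate(cluster: Cluster) -> str:
    # Level 1: a single image pattern means the cluster is visually coherent.
    if len(cluster.image_patterns) == 1:
        return "high"
    # Level 2: only clusters with divergent images reach the WHOIS check.
    return "supported" if whois_consistent(cluster.whois) else "low"
```

Under this hypothetical threshold, a cluster shaped like Wei et al.'s first level-2 cluster (100 domains on six IP addresses, two registrant sets, three name-server sets) would be "supported".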
Zhang et al. (2009)
A similar data-mining approach to studying spam emails was taken by Zhang et al. (10) in 2009. In this research, the authors focused on image attachments in spam emails in order to identify phishing groups or spam clusters. The authors analyzed the background textures, foreground text layouts, and foreground picture illustrations, and then extracted visual features from the spam images. Once the features were extracted, visually similar spam images were clustered using an “unsupervised clustering algorithm”. Zhang et al. concluded that this algorithm was very effective for verifying visual similarities between spam images and for providing key information about the spam images and their common source. The algorithm also helped to automate the visual validation of the spam clustering results.
Levchenko et al. (2011)
The data collection methodology in this research was based on a method developed by Levchenko et al. (5) in 2011. Spam emails were collected in a data mine and the URLs in them were investigated. The websites were then clustered by similarities in content, and purchases were made from each cluster. The authors focused on replica goods, counterfeit software, and pharmaceuticals, including herbal goods. Using this method, they found only 13 distinct banks acting as Visa acquirers and 13 different suppliers. Of the 13 banks, a single bank in St. Kitts (West Indies) cleared most of the herbal and replica purchases. The pharmaceutical affiliate programs used a bank in Latvia and a bank in Azerbaijan, and the software purchases were cleared through one bank in Latvia and one bank in Russia. Among the 13 suppliers, all of the pharmaceuticals were shipped from India with one exception, which shipped from the United States (US). The replicas were shipped from China, and the herbal goods were shipped mostly from the US, but also from China and New Zealand. No clear supplier was identified for the counterfeit software. Levchenko et al. concluded that the payment tier is the most valuable asset in the spam ecosystem.
Spam Data Mine
While most people in the world are trying to eliminate spam from their email accounts, the UAB Spam Data Mine is collecting it. Spam is collected from UAB staff and students and entered into the spam data mine. Spam is also collected from non-existent email accounts on expired domain names purchased by Internet Identity¹. These domains have a “catch-all” filter policy, so any email sent to any account at these domains is collected as spam (36). Once in the data mine, the emails are parsed into database fields. The email subject, sender's email address, sender's IP address, sender email ID (email header information), and usernames (7; 37) are stored in one table. Any URLs in the email are stored in a second table, and attachment information about the emails is stored in a third table (37). The data mine also extracts the websites' domain names and IP addresses (7).
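A minimal Python sketch of this parsing step, using only the standard `email`, `re`, and `urllib.parse` modules. The record layout, the simple URL regular expression, and keeping raw `Received` headers (from which a sender IP address could be parsed) are assumptions for illustration; the data mine's actual parser is not described here. Resolving domain names to IP addresses would require a DNS lookup and is left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlsplit

# Simplistic URL pattern (assumption); real spam needs more robust extraction.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def parse_spam(raw: str) -> dict:
    """Split one raw email into header, URL, and attachment records."""
    msg = message_from_string(raw)
    texts, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        filename = part.get_filename()
        if filename:
            attachments.append({"filename": filename,
                                "content_type": part.get_content_type()})
            continue
        payload = part.get_payload(decode=True)
        if payload is not None and part.get_content_maintype() == "text":
            charset = part.get_content_charset() or "utf-8"
            try:
                texts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset label in the header
                texts.append(payload.decode("utf-8", errors="replace"))
    urls = []
    for url in URL_RE.findall("\n".join(texts)):
        parts = urlsplit(url)
        # "machine" = full host name, matching the column used in the queries
        urls.append({"url": url, "machine": parts.hostname, "path": parts.path})
    header = {
        "subject": msg.get("Subject", ""),
        "sender": parseaddr(msg.get("From", ""))[1],
        "received": msg.get_all("Received", []),
    }
    return {"header": header, "urls": urls, "attachments": attachments}
```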
The remainder of this thesis describes the virtual machine (VM) used, the methodologies used throughout this research, how spam campaigns were developed, and the data collection using PostgreSQL (PSQL) queries (Chapter 2); the results of the research and a discussion of those results (Chapter 3); and the conclusions of the research (Chapter 4).
¹ Anti-Phishing Company, Tacoma, Washington
CHAPTER 2
DATA COLLECTION METHODOLOGY
Virtual Machine
A virtual machine (VM) was run using the VMware² Workstation software. This software was used to protect the computer from being infected by malware or other invasive software. The VM is an operating environment that runs as a normal application but is isolated from the host operating system. The computer used in this research runs a Windows operating system, and the VM runs a Linux operating system inside Windows. The VM is built from a base image, and any downloaded software or data are written to temporary files. Each time the VM is shut down, the temporary files are deleted and only the uncontaminated base image is left. This prevented the computer from being infected with viruses during the analysis.
² Palo Alto, CA
Collecting the Data
Two methods were used to collect the data: method 1 was used from December 2011 through February 2012, and method 2 was used from March 2012 through August 2012.
A ‘spam query’ was used to retrieve all of the spam subjects for one day, along with the number of entries in the spam data mine for each subject on that date. An example of the spam query is shown in Table 2a, and the results of the spam query are shown in Figure 2. Using the results of the spam query, the subjects were manually analyzed to determine which could be labeled as pharma spam. Pharma spam subjects were labeled partly by intuition and partly by manually scanning the subjects for key words indicative of pharmaceutical spam (Table 3a, Table 3b). Once the data were collected, they were further analyzed and potential spam campaigns were developed.
Table 2a
Queries used to generate results and the query's target
Spam Query
Target: Total spam subjects for a particular day
Example spam query: Select count(*), subject from spam where receiving_date = '2012-3-18' group by subject order by count desc;
Subject Query
Target: All domains associated with a specific subject
Example subject query: Select count(*), subject, machine, path from spam a, link_url b, spam_link c where a.message_id = b.message_id and b.urlid = c.urlid and receiving_date = '2012-6-19' and subject = 'saleoff:Viaqra ppil - the only secret of perfect' group by subject, machine, path order by count desc;
Domain Query
Target: All subjects associated with a specific domain
Sample of domain query: Select * from spam a, link_url b, spam_link c where a.message_id = b.message_id and b.urlid = c.urlid and receiving_date = '2012-3-18' and machine = 'pillstomp.ru';
Figure 2- Results of the spam query
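The three queries in Table 2a can be tried against a toy database. In this sketch, SQLite (Python's `sqlite3`) stands in for PostgreSQL. The table and column names come from the queries, but the column types, primary keys, and every row are assumptions made up for illustration. The comma joins are written as equivalent explicit `JOIN ... ON` clauses, and the count gets an explicit alias `n` instead of relying on PostgreSQL's default `count` column name.

```python
import sqlite3

# Names follow Table 2a; types and keys are assumptions.
SCHEMA = """
CREATE TABLE spam (message_id INTEGER PRIMARY KEY, subject TEXT, receiving_date TEXT);
CREATE TABLE link_url (message_id INTEGER, urlid INTEGER);
CREATE TABLE spam_link (urlid INTEGER PRIMARY KEY, machine TEXT, path TEXT);
"""

SPAM_QUERY = """
SELECT COUNT(*) AS n, subject FROM spam
WHERE receiving_date = ? GROUP BY subject ORDER BY n DESC"""

SUBJECT_QUERY = """
SELECT COUNT(*) AS n, a.subject, c.machine, c.path
FROM spam a
JOIN link_url b ON a.message_id = b.message_id
JOIN spam_link c ON b.urlid = c.urlid
WHERE a.receiving_date = ? AND a.subject = ?
GROUP BY a.subject, c.machine, c.path ORDER BY n DESC"""

DOMAIN_QUERY = """
SELECT a.message_id, a.subject, c.machine, c.path
FROM spam a
JOIN link_url b ON a.message_id = b.message_id
JOIN spam_link c ON b.urlid = c.urlid
WHERE a.receiving_date = ? AND c.machine = ?
ORDER BY a.message_id"""


def demo_db() -> sqlite3.Connection:
    """Tiny in-memory data mine with made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO spam VALUES (?, ?, ?)", [
        (1, "Cheap Viagra", "2012-03-18"), (2, "Cheap Viagra", "2012-03-18"),
        (3, "Cheap Viagra", "2012-03-18"), (4, "Meeting notes", "2012-03-18"),
        (5, "Cheap Viagra", "2012-03-19"),
    ])
    con.executemany("INSERT INTO spam_link VALUES (?, ?, ?)",
                    [(10, "pillstomp.ru", "/a"), (11, "pillstomp.ru", "/b")])
    con.executemany("INSERT INTO link_url VALUES (?, ?)",
                    [(1, 10), (2, 10), (3, 11), (5, 10)])
    return con


con = demo_db()
print(con.execute(SPAM_QUERY, ("2012-03-18",)).fetchall())
# [(3, 'Cheap Viagra'), (1, 'Meeting notes')]
```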
Table 3a
Key words used for determining pharma spam subjects in Method 1
Method 1 Key Words
Viagra
Cialis
Levitra
Sildenafil
Tadalafil
Vardenafil
Table 3b
Key words used for determining pharma spam subjects in Method 2
Method 2 Key Words
Viagra
Cialis
Levitra
Sildenafil
Tadalafil
Vardenafil
Pharmacy
Drugs
Pills
Store
Rx
Prescription
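The key word screen from Tables 3a and 3b can be approximated in code. This sketch assumes case-insensitive, whole-word matching, which is a hypothetical rule: the subjects in this research were labeled manually, partly by intuition, so the function only illustrates the key word part of that judgment.

```python
from __future__ import annotations

import re

# Key words from Tables 3a and 3b, lower-cased for case-insensitive matching.
METHOD1_KEYWORDS = frozenset({"viagra", "cialis", "levitra",
                              "sildenafil", "tadalafil", "vardenafil"})
METHOD2_KEYWORDS = METHOD1_KEYWORDS | {"pharmacy", "drugs", "pills",
                                       "store", "rx", "prescription"}


def has_keyword(subject: str, keywords: frozenset[str]) -> bool:
    """True if any whole word of the subject is one of the key words."""
    words = re.findall(r"[a-z]+", subject.lower())
    return any(word in keywords for word in words)
```

The misspelled subject from Table 2a, “saleoff:Viaqra ppil - the only secret of perfect”, passes neither key word list, which illustrates why manual review remained necessary.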
At the beginning of the research, method 1 was adequate because there were numerous pharma spam subject entries in the spam data mine each day, and all of the subjects contained a key word from Table 3a, which made the pharma spam subjects easy to detect. January 21st, 2012 marked the first noticeable change in the amount of daily pharma spam in the spam data mine: on that day, a considerable decrease was observed, and the decrease continued through the end of January.
The hypothesis behind this decrease was that the spammers had begun to notice the same IP address visiting their websites numerous times a day, every day. In an attempt to thwart the investigative efforts, the spammers began to obfuscate the pharma spam, that is, to make it less obvious, by using less obvious subject lines that were harder to detect. The obfuscation continued into February, so starting February 27th, all of the subjects that appeared with at least 1,000 entries in the data mine were analyzed, not just those that contained the specified key words. Because the amount of daily pharma spam continued to decrease as February came to an end, method 2 was used to analyze the daily pharma spam beginning in March.
During Method 2, another obfuscation technique was noticed: the spammers appended random international characters to the end of the pharma spam subjects, with a different randomized set each time the subject appeared. For example, a subject that actually appeared with a total of 1,564 entries in the data mine on one day might instead show up as five variants with 385, 294, 375, 256, and 254 entries. This obfuscation gave the illusion that there were fewer pharma spam subject entries in the data mine than there actually were.
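One way to undo this splitting is to strip the random suffix and sum the counts of the variants. The sketch below assumes the suffix is a trailing run of non-ASCII characters (plus whitespace); the example subject and suffixes are made up, and only the five counts come from the example in the text. A legitimate subject that happens to end in non-ASCII text would also be merged, and ASCII suffixes are not handled.

```python
from __future__ import annotations

from collections import Counter


def base_subject(subject: str) -> str:
    """Drop a trailing run of non-ASCII characters and whitespace."""
    end = len(subject)
    while end and (ord(subject[end - 1]) > 127 or subject[end - 1].isspace()):
        end -= 1
    return subject[:end]


def merge_variants(counts: dict[str, int]) -> Counter:
    """Sum daily entry counts of subjects that differ only in their random suffix."""
    merged: Counter = Counter()
    for subject, n in counts.items():
        merged[base_subject(subject)] += n
    return merged


variants = {"Buy Viagra online àé": 385, "Buy Viagra online Ôß": 294,
            "Buy Viagra online ñü": 375, "Buy Viagra online øå": 256,
            "Buy Viagra online çë": 254}
print(merge_variants(variants))  # Counter({'Buy Viagra online': 1564})
```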
Method 1: December 2011- February 2012
The spam query was used to retrieve all of the spam that was collected each day
in the spam data mine. From the results of the spam query, the subjects that appeared
with at least 1,000 entries in the spam data mine per day were manually analyzed. With
the aid of the key words (Table 3a), certain subjects could be labeled as pharma spam.
Once labeled, the pharma spam subjects were further analyzed using a subject query
(Table 2a). The subject query returns the particular machines and/or domains along with
the paths associated with each subject used in the query. The domains were then opened
in the VM in order to determine if the domain led to an affiliate site or not. A sample of
results from the subject query is shown in Figure 3. Each subject, its associated domains,
and the affiliate sites the domains led to were recorded. The amount of pharma spam
observed for each month was recorded as well as the number of spam campaigns that
were associated with each affiliate site. As each distinct affiliate site was identified,
screenshots of the main page, the contacts page and the logos were taken and saved.
Figure 3- Results of the subject query
Method 2: March 2012-August 2012
Method 2 was developed to help counter the spammers' obfuscation attempts. As with method 1, each subject, its associated domains, the affiliate sites the domains led to, and the number of spam campaigns continued to be recorded, and screenshots of the affiliate sites' main pages, contact pages, and logos were still collected.
As in method 1, the spam query was used to retrieve all the spam subjects for one day, and the subject query was used on the labeled pharma spam subjects. The changes in method 2 were the use of additional key words to label spam as pharma spam (Table 3b); the analysis of all of the spam subjects that appeared in the data mine each day, not just those that appeared with at least 1,000 entries; and the use of the domain query.
The domain query was used to collect all of the subjects associated with a particular domain (Table 2a). The domain query takes a domain as input and returns all subjects associated with that domain, as well as the associated IP addresses, sender names, and usernames. The domains were chosen at random from the list of results generated by the subject query. The sender names, usernames, and IP addresses returned by the domain query were discarded because they were most likely associated with infected computers rather than with the affiliate site. Using the domain query helped to retrieve pharma spam subjects that may have been overlooked while manually analyzing the daily spam subjects.
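The expansion step amounts to a one-hop search from labeled subjects to their domains and back to new subjects. In this sketch, `subject_domains` stands for subject query results, `domain_query` is a hypothetical callable returning domain query rows as dictionaries, and `k` and `seed` control the random choice of domains; all of these names are illustrative assumptions.

```python
from __future__ import annotations

import random
from typing import Callable, Iterable


def expand_via_domains(
    labeled: Iterable[str],
    subject_domains: dict[str, set[str]],
    domain_query: Callable[[str], list[dict]],
    k: int,
    seed: int | None = None,
) -> set[str]:
    """Return subjects reachable through k random domains that were not already labeled."""
    labeled = set(labeled)
    # Domains returned by the subject query for the already-labeled subjects
    candidates = sorted({d for s in labeled for d in subject_domains.get(s, ())})
    chosen = random.Random(seed).sample(candidates, min(k, len(candidates)))
    found: set[str] = set()
    for domain in chosen:
        for row in domain_query(domain):
            # Keep only the subject; sender, username, and IP are discarded
            found.add(row["subject"])
    return found - labeled
```

With `k` at least as large as the number of candidate domains, every domain is queried and the result no longer depends on the random seed.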
International Characters and Running Queries with PSQL
While running a query, PSQL returns an error when it encounters international characters such as “Ô”. If a subject being queried contained an international character, the international character was replaced with a percent sign (%), which functions as a wildcard, and the equal sign (=) in the query was replaced with “like”. Table 2b includes a subject that contains an international character in bold and italicized font. In the subject query, the international character(s) are replaced with the percent sign (bold, italicized) and the equal sign is changed to “like” (bold), creating modified subject query 1. Figure 4 shows the query with the equal sign replaced by ‘like’ and the international characters replaced by the percent sign, along with the results of modified subject query 1.
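The substitution can be automated. This sketch assumes that every non-ASCII character should become a wildcard and collapses each run of such characters into one `%`. It also escapes literal `%`, `_`, and backslash so they are not treated as wildcards, which the manual procedure did not do. SQLite is used here for a runnable demonstration; PostgreSQL also accepts `LIKE ... ESCAPE`, but SQLite's `LIKE` is case-insensitive for ASCII letters while PostgreSQL's is case-sensitive. The demo rows are made up.

```python
import sqlite3


def like_pattern(subject: str) -> str:
    """Turn a subject into a LIKE pattern for use with ESCAPE '\\'.

    Literal %, _ and backslash are escaped; each run of non-ASCII characters
    becomes a single % wildcard.
    """
    out = []
    for ch in subject:
        if ord(ch) > 127:
            if not out or out[-1] != "%":
                out.append("%")
        elif ch in "\\%_":
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def count_matches(con: sqlite3.Connection, subject: str) -> int:
    sql = "SELECT COUNT(*) FROM spam WHERE subject LIKE ? ESCAPE '\\'"
    return con.execute(sql, (like_pattern(subject),)).fetchone()[0]


def demo_connection() -> sqlite3.Connection:
    """In-memory table with made-up subjects (illustration only)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spam (subject TEXT)")
    con.executemany("INSERT INTO spam VALUES (?)", [
        ("Buà Ciails and Viarga online!",),
        ("Buý Ciails and Viarga online!",),
        ("Buy Cialis online",),
    ])
    return con


print(like_pattern("Buà Ciails and Viarga online!"))  # Bu% Ciails and Viarga online!
```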
Table 2b
Modified queries used with international characters, an example of the international characters, and the target of the queries.
Modified Subject Query 1
Target: Domains associated with a specific subject containing international characters
Sample subject with international character: Buà Ciails and Viarga online!
Sample of modified query 1: Select count(*), subject, machine, path from spam a, link_url b, spam_link c where a.message_id = b.message_id and b.urlid = c.urlid and receiving_date = '2012-6-19' and subject like 'Bu% Ciails and Viarga online!' group by subject, machine, path order by count desc;
Modified Subject Query 2
Target: All of the exact subjects used in the query, without the additional characters
Sample subject with additional randomized characters: Over 75.000 customers trust us. Buy Cheap Viagra & Cialis Pills. We accept Visa, Mastercard, AmEx & ACH. vixd1
Sample of modified query 2: Select count(*), subject, machine, path from spam a, link_url b, spam_link c where a.message_id = b.message_id and b.urlid = c.urlid and receiving_date = '2012-7-3' and subject like '%Over 75.000 customers trust us. Buy Cheap Viagra & Cialis Pills. We accept Visa, Mastercard, AmEx & ACH.%' group by subject, machine, path order by count desc;
Modified Domain Query
Target: All subjects associated with only the domain name
Sample of domain with machine attached: fui.drugsspee.ru
Sample of modified domain query: Select * from spam a, link_url b, spam_link c where a.message_id = b.message_id and b.urlid = c.urlid and receiving_date = '2012-3-18' and machine like '%.drugsspee.ru%';
Figure 4- Results of the modified subject query 1
The wildcard was also used when a subject contained a randomized set of characters, which were usually attached at the end of the subject. In that case, modified subject query 2 was used. Table 2b includes an example of a subject containing the randomized additional characters (bold and italicized). To get accurate results, percent signs (bold, italicized) are placed at the beginning of the subject and at the end of the subject, in place of the randomized additional characters, and the equal sign is changed to “like” (bold). Modified subject query 2 returns all the subjects that contain exactly the characters between the percent signs, regardless of the additional characters at the end of each subject. An example of modified subject query 2 and its results are shown in Figure 5. The actual number of entries for the sample subject in the data mine on July 5, 2012 was 1,181 (Figure 6).
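When the text between the percent signs contains no other wildcard characters (`%` or `_`), PostgreSQL's `subject LIKE '%core%'` is simply a case-sensitive substring test, so the same totals can be computed in Python. The counts in the usage lines are invented for illustration; the function names are hypothetical.

```python
from __future__ import annotations


def like_contains(subject: str, core: str) -> bool:
    """Same result as PostgreSQL `subject LIKE '%core%'` when core has no % or _."""
    return core in subject


def total_for_core(daily_counts: dict[str, int], core: str) -> tuple[int, int]:
    """Return (number of subject variants, total entries) that contain core."""
    hits = [n for subject, n in daily_counts.items() if like_contains(subject, core)]
    return len(hits), sum(hits)


core = ("Over 75.000 customers trust us. Buy Cheap Viagra & Cialis Pills. "
        "We accept Visa, Mastercard, AmEx & ACH.")
counts = {core + " vixd1": 100, core + " qpzr7": 200, core + " mm2k": 300,
          "Meeting notes": 5}  # made-up daily counts
print(total_for_core(counts, core))  # (3, 600)
```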
Figure 5- Results of the modified subject query 2
Figure 6- Depiction of the actual amount of times the subject appeared
Domain queries also needed to be modified occasionally (Table 2b). The modified domain query is similar to modified subject query 2. The example in Table 2b has a machine (bold, italicized) attached to the domain (underlined). The machine is random and changes with the subject, but the domain remains constant. To get accurate results, the machine name is removed from the query, the domain is enclosed in percent signs (bold, italicized), and the equal sign is changed to “like” (bold). The results of the modified domain query are circled in Figure 7. The modified domain query returned 7,995 subject entries associated with the domain ‘.drugsspee.ru’ on March 18, 2012, enclosed in the box in Figure 8. After all of the duplicate subjects returned by the modified domain query were removed in an Excel spreadsheet, the number of distinct subjects associated with the domain was ten (Figure 9).
Figure 7- Results of the modified domain query
Figure 8- Depiction of the actual number of subjects associated with the domain
".drugsspee.ru"
Figure 9- Results of removing the duplicate subjects in Excel
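The modified domain query and the Excel de-duplication can be combined in a short sketch. It assumes the registered domain is the last two labels of the host name, which fails for suffixes such as `.co.uk`, and it emulates `machine LIKE '%.drugsspee.ru%'` with a substring test; the rows are made up. As with the SQL pattern, a bare `drugsspee.ru` machine without a leading label does not match.

```python
from __future__ import annotations


def domain_suffix(host: str) -> str:
    """'fui.drugsspee.ru' -> '.drugsspee.ru' (assumes a two-label registered domain)."""
    return "." + ".".join(host.split(".")[-2:])


def subjects_for_domain(rows: list[dict], host: str) -> tuple[int, list[str]]:
    """Emulate machine LIKE '%.domain%', then drop duplicate subjects."""
    suffix = domain_suffix(host)
    matching = [r["subject"] for r in rows if suffix in r["machine"]]
    distinct = list(dict.fromkeys(matching))  # de-duplicate, keep first-seen order
    return len(matching), distinct
```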
Spam Campaigns
Beginning in November, the daily spam collected from the UAB spam data mine was closely examined to determine trends among the pharma spam subjects. Once the trends among the subjects were established, similar subjects were put into a group identified as a potential spam campaign.
The association of domains with subjects was the primary criterion for inclusion in a spam campaign. If multiple subjects contained at least one common domain, those subjects were grouped together into a spam campaign. Within a spam campaign, the domains connecting the subjects are not necessarily the same. For example, subject A in a particular spam campaign has domains a, b, and c associated with it; subject B in the same spam campaign has domains b, d, and e; and subject C, also in the same spam campaign, has domains d, f, and g. Each subject shares at least one domain with another subject in the campaign, but no single domain is common to all three: subjects A and C share no domain and are linked only through subject B. This was most often the case when there were many subjects in a spam campaign. For that reason, only one domain connection between subjects needed to be apparent in order to group different subjects in the same spam campaign.
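Because subjects are linked through chains of shared domains, the domain rule is equivalent to finding connected components, which a standard union-find (disjoint-set) structure computes. The sketch below models only the domain criterion; the affiliate-site and subject-structure criteria described next were applied by judgment and are not represented. One consequence of the transitive rule is that a single shared domain can merge two otherwise unrelated groups.

```python
from __future__ import annotations


def group_campaigns(subject_domains: dict[str, set[str]]) -> list[set[str]]:
    """Connected components of subjects, linking two subjects that share a domain."""
    parent = {s: s for s in subject_domains}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner: dict[str, str] = {}  # domain -> first subject seen with it
    for subject, domains in subject_domains.items():
        for domain in domains:
            if domain in owner:
                root_a, root_b = find(owner[domain]), find(subject)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                owner[domain] = subject

    groups: dict[str, set[str]] = {}
    for subject in subject_domains:
        groups.setdefault(find(subject), set()).add(subject)
    return sorted(groups.values(), key=sorted)


# Subjects A, B, and C from the example above, plus an unrelated subject D
example = {"A": {"a", "b", "c"}, "B": {"b", "d", "e"},
           "C": {"d", "f", "g"}, "D": {"x"}}
print(group_campaigns(example))  # A, B, C in one group; D alone
```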
The affiliate sites that the domains opened to were also taken into consideration when grouping subjects into the spam campaigns. If the same domain was associated with different subjects, that domain would always open to the same affiliate site, if it loaded at all. In a few cases, a spam campaign had multiple subjects that shared the same domains, but the different domains within the spam campaign opened to various affiliate sites, not just one. Generally, though, all the domains in each spam campaign opened to the same affiliate site.
As the subjects were grouped into the spam campaigns, trends in subject structure were identified. Many of the spam campaigns contained subjects with similar subject line formats. These trends included common misspellings, capitalization of one letter or of the whole subject line, short and/or long subject lines, and the use of international characters. Once these structural trends were identified, they were also taken into consideration when grouping similar subjects. From there, the spam campaigns could be analyzed for any overlap among them.
Each active spam campaign is designated by the affiliate site abbreviation followed by a dash and a capital letter. For example, a spam campaign for Pharmacy Express (PE) is designated PE-A. Some subjects appeared only once or twice during the research and were not observed consistently enough to be put into spam campaigns. Other subjects were put into spam campaigns that are not discussed here because those spam campaigns did not appear consistently enough throughout the research.
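The naming convention can be expressed as a small labeling routine. The rule used to pick each campaign's affiliate site, a majority vote over the sites its domains opened to, is an assumption based on the observation that a campaign's domains generally opened to the same site; the thesis does not describe an algorithmic rule. The inputs and the `label_campaigns` name are hypothetical, and the sketch supports at most 26 campaigns per site.

```python
from __future__ import annotations

from collections import Counter
from string import ascii_uppercase


def label_campaigns(
    campaigns: list[set[str]],
    domain_site: dict[str, str | None],
    abbreviations: dict[str, str],
) -> list[str | None]:
    """Name campaigns ABBR-A, ABBR-B, ... in order of discovery.

    campaigns: domain sets, one per campaign, in the order they were found.
    domain_site: affiliate site each domain opened to (None if it did not load).
    abbreviations: affiliate site name -> abbreviation, e.g. "Pharmacy Express" -> "PE".
    """
    used: Counter = Counter()  # campaigns already labeled per abbreviation
    labels: list[str | None] = []
    for domains in campaigns:
        sites = Counter(domain_site.get(d) for d in domains if domain_site.get(d))
        if not sites:
            labels.append(None)  # no domain loaded, so no site to name it after
            continue
        abbr = abbreviations[sites.most_common(1)[0][0]]
        labels.append(f"{abbr}-{ascii_uppercase[used[abbr]]}")
        used[abbr] += 1
    return labels
```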
CHAPTER 3
RESULTS AND DISCUSSION
The results of method 1 are presented for December 2011 through February 2012, and the results of method 2 for March 2012 through August 2012. For each month, a table illustrates the appearances and disappearances of each spam campaign active that month. The results from December through February focus on the frequency of the daily pharma spam that was analyzed, as well as the number of spam campaigns active each month. The results from March through August focus solely on the appearances and disappearances of the spam campaigns and any overlaps among them. Significant observations made throughout the research are also discussed.
Monthly Results
December 2011
The pharma spam subjects that appeared with at least 1,000 entries in the spam data mine each day were analyzed. Using method 1, a total of 452 such pharma spam subjects were identified for December (Figure 10). These 452 pharma spam subjects were responsible for 1,965,222 entries in the spam data mine for the month of December. Because many subjects were repeated throughout the month, only 115 of the 452 subjects were distinct. Each day, the subject that appeared in the greatest volume belonged to spam campaign CP-A, which alone accounted for 52% of the pharma spam observed in December.
Figure 10- Number of subjects over 1,000 by day for December
Six affiliate sites had active spam campaigns during December: Canadian Pharmacy (CP), Online Pharmacy (OP), Canadian Health&Care Mall (CHCM), Canadian Neighbor Pharmacy (CNP), My Canadian Pharmacy (MCP), and Pharmacy Express (PE). Based on an analysis of the pharma spam subjects collected in November, 12 spam campaigns were identified for December. Affiliate sites CHCM, OP, and CNP each had three active spam campaigns in December, while MCP, PE, and CP each had only one (Figure 11).
Figure 11- Number of spam campaigns in December for each affiliate site that appeared
On December 10th and 28th, only one pharma spam subject appeared with at least 1,000 entries in the spam data mine (CP-A), and December 29th was the only day on which no pharma spam subject appeared with at least 1,000 entries. Spam campaign CP-A appeared the most this month, while spam campaign PE-A appeared the least. Three subjects that were part of spam campaign CNP-A* appeared within spam campaign MCP-A* on December 20th. The subjects in CNP-A* were associated with MCP-A* because the subjects' corresponding domains opened to both affiliate sites, MCP and CNP, at some point in December.
Spam campaign CNP-C was independent of the other CNP spam campaigns during December, but CNP-A* overlapped with spam campaign CNP-B as well as with spam campaign MCP-A*. The overlap of CNP-A* and MCP-A* was likely due to the subjects common to the two spam campaigns.
There was no overlap among the OP spam campaigns during December; as one OP spam campaign disappeared, another appeared. The activity of all of the active spam campaigns in December is shown in Table 4.
Table 4
Table illustrating when each spam campaign appeared and disappeared in December
(Rows: spam campaigns CHCM-C, CHCM-B, CHCM-A, CNP-C, CNP-B, CNP-A*, MCP-A*, OP-C, OP-B, OP-A, CP-A, PE-A; columns: December 1-31)
January 2012
The most significant observation in January was the dramatic decrease in the number of pharma spam subjects that appeared with at least 1,000 entries in the spam data mine each day. From January 21st to 31st, only one pharma spam subject appeared with at least 1,000 entries: “BUY NOW VIAGRA CIALIS !!!”. A total of 349 pharma spam subjects appeared with at least 1,000 entries in the spam data mine each day in January (Figure 12). These 349 subjects accounted for 2,813,677 entries in the spam data mine, and 96 of them were distinct.
Spam campaign CP-A alone accounted for 62% of the pharma spam observed in January. CP-A was again the primary spam campaign observed, and even with the significant decrease in daily pharma spam subjects, its share of the pharma spam increased by 10 percentage points, from 52% in December to 62% in January. In absolute terms, CP-A's volume grew by roughly 70% from December to January, which drove an overall increase of about 43% in the amount of pharma spam observed (from 1,965,222 to 2,813,677 entries).
Figure 12- Number of subjects over 1,000 by day for January
Only five affiliate sites were observed to have active spam campaigns in January:
CP, OP, CHCM, CNP, and MCP (Figure 13). Affiliate site PE did not have any active
spam campaigns in January. Of the twelve December spam campaigns, only ten remained
active in January, and no new spam campaigns appeared. All three spam campaigns for
affiliate sites CHCM and OP remained active from December to January. Two CNP spam
campaigns remained active from December to January. Affiliate sites CP and MCP each
had their one spam campaign remain active from December to January.
Figure 13- Number of spam campaigns in January for each affiliate site that appeared
Spam campaign CP-A again appeared most often in January, and spam campaigns OP-A and CNP-C appeared least often. Spam campaign CNP-A* was not observed in January because the subjects in this spam campaign were now associated with spam campaign MCP-A*. The CNP spam campaigns did not overlap in time with each other, but did overlap with spam campaign MCP-A*. The only overlap among the OP spam campaigns occurred on January 2nd. On the days in January when spam campaign CHCM-A did not appear, either CHCM-B or CHCM-C appeared. Table 5 illustrates the activity of all the active spam campaigns during January.
Table 5
Table illustrating when each spam campaign appeared and disappeared in January
(Rows: spam campaigns CHCM-C, CHCM-B, CHCM-A, CNP-C, CNP-B, MCP-A*, OP-C, OP-B, OP-A, CP-A; columns: January 1-31)
February 2012
The decrease in daily pharma spam subjects continued from January into February. In February, only 47 subjects appeared with at least 1,000 entries in the spam data mine each day (Figure 14), and only 17 of those 47 were distinct. The total number of entries that the 47 pharma spam subjects accounted for this month, however, was still over a million (1,522,032), about 54% of the January total, or a decrease of about 46% from January to February.
A large percentage of the pharma spam subjects again came from spam campaign CP-A, which accounted for 92% of the pharma spam observed in February, a 30-percentage-point increase from January. Even with CP-A's large share, February had less pharma spam than January, unlike the change from December to January. This is mainly because very few pharma spam subjects were observed each day. The increase in subjects on February 27th is due to the analysis, from that date on, of all the subjects that appeared with at least 1,000 entries in the spam data mine each day, not just those that also contained the key words.
Figure 14- Number of subjects over 1,000 by day for February
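The month-to-month changes for December through February can be re-derived from the entry totals and CP-A shares reported in this chapter. Because the CP-A shares are rounded to whole percents, the CP-A volumes computed here are approximations. The sketch separates relative changes in entries from percentage-point changes in CP-A's share, which are easy to conflate.

```python
# Monthly entry totals and CP-A shares as reported (shares are rounded).
TOTALS = {"Dec": 1_965_222, "Jan": 2_813_677, "Feb": 1_522_032}
CP_A_SHARE = {"Dec": 0.52, "Jan": 0.62, "Feb": 0.92}


def pct_change(old: float, new: float) -> float:
    """Relative change in percent."""
    return (new - old) / old * 100


# Approximate number of entries attributable to CP-A each month
cp_a_entries = {m: TOTALS[m] * CP_A_SHARE[m] for m in TOTALS}

for a, b in (("Dec", "Jan"), ("Jan", "Feb")):
    print(f"{a}->{b}: total {pct_change(TOTALS[a], TOTALS[b]):+.1f}%, "
          f"CP-A volume {pct_change(cp_a_entries[a], cp_a_entries[b]):+.1f}%, "
          f"CP-A share {100 * (CP_A_SHARE[b] - CP_A_SHARE[a]):+.0f} points")
# Dec->Jan: total +43.2%, CP-A volume +70.7%, CP-A share +10 points
# Jan->Feb: total -45.9%, CP-A volume -19.7%, CP-A share +30 points
```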
Of the ten January spam campaigns, only one remained active in February. Three new spam campaigns appeared in February, and one spam campaign from December reappeared. Only three affiliate sites had active spam campaigns this month: CP, OP, and PE (Figure 15). For affiliate site CP, one spam campaign remained active from January to February, and one new spam campaign, CP-B, appeared. Affiliate site PE had one spam campaign from December reappear in February, PE-A, and one new spam campaign appear, PE-B. No spam campaigns remained active for affiliate site OP, but one new spam campaign, OP-D, appeared in February.
Figure 15- Number of spam campaigns in February for each affiliate site that appeared
No pharma spam subjects appeared with at least 1,000 entries in the spam data mine on February 2nd. On most days in February, the only pharma spam subject that appeared with at least 1,000 entries in the spam data mine was “BUY NOW VIAGRA CIALIS”. The new spam campaign CP-B included subjects whose corresponding domains opened to affiliate sites PE, OP, CP, and Toronto Drug Store (TDS). All of the subjects observed on February 27th contained domains that opened to affiliate sites PE, OP, and/or CP, while the subjects observed on February 28th and 29th all contained domains that opened to affiliate sites OP and TDS. Table 6 illustrates the activity of all the spam campaigns active during February.
Table 6
Table illustrating when each spam campaign appeared and disappeared in February
(Rows: spam campaigns PE-A, PE-B, CP-A, CP-B, and OP-D; columns: February 1-29)
March 2012
Due to the dramatic decrease in pharma spam subjects appearing with at least
1,000 entries in the spam data mine each day, which occurred from January 21st, 2012
through February 2012, method 2 was used to analyze the pharma spam from March
through the end of the project.
During March, there were six affiliate sites with active spam campaigns: CP, OP,
CHCM, CNP, MCP, and PE (Figure 16). In addition to their individual spam campaigns,
affiliate sites PE and CP were jointly associated with one spam campaign (CPE). All five
spam campaigns that were active in February remained active in March. Nine spam
campaigns from January reappeared in March and ten new spam campaigns appeared.
The joint spam campaign for CPE, CPE-A, first appeared in March. Two CNP spam
campaigns from January reappeared in March, CNP-B and CNP-C, as well as one MCP
spam campaign, MCP-A*. Four PE spam campaigns were active in March; two remained
active from February to March, and two new spam campaigns appeared: PE-C and PE-D.
There were six active CHCM spam campaigns in March; three spam campaigns from
January reappeared: CHCM-A, CHCM-B, and CHCM-C, and three new spam campaigns
appeared: CHCM-D, CHCM-E, and CHCM-H. Five CP spam campaigns were active in
March; two spam campaigns remained active from February to March, and three new
spam campaigns appeared: CP-C, CP-D, and CP-E. There were also five OP spam campaigns active in March. One spam campaign remained active from February to March (OP-D), three spam campaigns from January reappeared in March (OP-A, OP-B, and OP-C), and one new spam campaign appeared, OP-E. This month, affiliate site CHCM had the most spam campaigns active.
Figure 16- Number of spam campaigns in March for each affiliate site that appeared
In March, spam campaign PE-A appeared most often, while spam campaigns CNP-C (again) and CP-D appeared least often. Only subjects from spam campaigns CP-A, CP-C, and OP-D appeared on March 2nd. The subjects in the new spam campaign CPE-A had corresponding domains that opened to affiliate sites CP and/or PE. In March, affiliate site CP appeared most often within spam campaign CPE-A.
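Statements such as "PE-A appeared most often" can be read as counts of distinct days on which each campaign was observed. The sketch below assumes that reading and uses a hypothetical list of (day, campaign) observations; it returns ties together, since the text names two least-frequent campaigns for March. The function name `most_and_least` is illustrative.

```python
from collections import defaultdict


def most_and_least(observations):
    """Return (most, least) campaigns ranked by number of distinct active days.

    observations: iterable of (day, campaign) pairs; repeated pairs count once.
    Ties are returned together, sorted alphabetically.
    """
    active_days = defaultdict(set)
    for day, campaign in observations:
        active_days[campaign].add(day)
    counts = {c: len(days) for c, days in active_days.items()}
    top, bottom = max(counts.values()), min(counts.values())
    most = sorted(c for c, n in counts.items() if n == top)
    least = sorted(c for c, n in counts.items() if n == bottom)
    return most, least


# Hypothetical observations for illustration only.
obs = [(1, "PE-A"), (2, "PE-A"), (2, "PE-A"), (3, "PE-A"),
       (1, "CNP-C"), (2, "CP-D"), (3, "CP-A"), (4, "CP-A")]
print(most_and_least(obs))  # (['PE-A'], ['CNP-C', 'CP-D'])
```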
Spam campaign CNP-B included three subjects that were previously associated with spam campaign MCP-A*. When those subjects were associated with MCP-A*, however, their corresponding domains never loaded. One subject from spam campaign CNP-C appeared within spam campaign CNP-B on March 3rd and 4th.
On March 3rd, 4th, and 27th, spam campaign CHCM-H contained subjects whose corresponding domains opened to affiliate site MCP. Spam campaigns CHCM-B, CHCM-C, and CHCM-A were observed in succession. Spam campaign CHCM-B appeared first in March, and spam campaign CHCM-C appeared on CHCM-B's last active day. Once spam campaign CHCM-C became inactive, spam campaign CHCM-A appeared. The only overlap among the CHCM spam campaigns was on March 13th.
Two subjects from spam campaign OP-B were observed in combination with spam campaign OP-C in March: they appeared within spam campaign OP-C on March 16th-18th.
In addition to affiliate site CP, both affiliate site OP and affiliate site PE appeared
within spam campaign CP-A this month. On March 2nd and March 14th, affiliate site OP
appeared along with affiliate site CP. Affiliate site PE appeared only on March 31st
within spam campaign CP-A. The subjects in spam campaign CP-B continued to have
corresponding domains that opened to affiliate sites CP, PE, and/or OP, though CP was
still the dominant affiliate site.
Spam campaign CP-E contained some subjects whose corresponding domains
opened to affiliate sites CHCM, PE, and Men’s Health (MH) in addition to affiliate site
CP. Affiliate site PE only appeared on March 17th along with affiliate site CP within
spam campaign CP-E. Affiliate site MH only appeared on March 27th within CP-E.
Affiliate site CHCM appeared within CP-E on March 25th and 28th alone, and again on
March 23rd, 26th, 27th, and 29th along with affiliate site CP. The spam campaign activity
for March is shown in Table 7. Out of all the months analyzed throughout this research,
the most spam campaign activity occurred during the month of March.
Table 7
Table illustrating when each spam campaign appeared and disappeared in March
(Rows: spam campaigns PE-A, PE-B, PE-C, PE-D, CP-A, CP-B, CP-C, CP-D, CP-E, CPE-A, CNP-B, CNP-C, MCP-A*, OP-A, OP-B, OP-C, OP-D, OP-E, CHCM-A, CHCM-B, CHCM-C, CHCM-D, CHCM-E, and CHCM-H; columns: March 1-31)
April 2012
During April, there were seven affiliate sites with active spam campaigns: CP,
OP, CHCM, CNP, MCP, PE, and PMP (Figure 17). Again in April, PE and CP were
jointly associated with one spam campaign (CPE), in addition to their individual spam
campaigns. Of the 22 spam campaigns in March, 15 spam campaigns remained active in
April. Four new spam campaigns appeared.
Two spam campaigns each for OP and PE remained active from March to April.
There was one spam campaign for CNP that remained active from March to April and
one new spam campaign that appeared in April, CNP-D. Five spam campaigns for CP
remained active from March to April, and one new spam campaign appeared in April,
CP-F. There were three CHCM spam campaigns that remained active from March to
April as well as one new spam campaign that appeared in April, CHCM-F. The spam
campaigns for both CPE and affiliate site MCP remained active from March to April.
One new affiliate site appeared in April, Pharmacy Med Pro (PMP), with one active spam
campaign, PMP-A. In April, the affiliate site with the most active spam campaigns was
CP.
Figure 17- Number of spam campaigns in April for each affiliate site that appeared
This month, affiliate site PE appeared within spam campaign CHCM-F on April 5th, 7th, 8th, and 9th. On April 13th and 14th, affiliate site MCP appeared within spam campaign CHCM-H. Spam campaign CNP-D contained two subjects whose corresponding domains once opened to affiliate site CHCM (CHCM-C), but in April were opening to affiliate site CNP. The only overlap between the two CNP spam campaigns occurred on April 4th.
Some of the subjects within spam campaign PE-C contained corresponding domains that opened to affiliate site CP on April 8th-10th. This month, affiliate sites PE and CP appeared an equal number of times within spam campaign CPE-A.
Spam campaign CP-C contained one subject whose corresponding domains opened to both affiliate sites CHCM and CP (April 14th), as well as two subjects whose corresponding domains opened only to affiliate site CHCM (April 5th and April 7th). Affiliate site PE appeared within spam campaign CP-A on April 1st, 2nd, 5th, 26th, and 27th. This month, spam campaign CP-B contained only subjects whose corresponding domains opened to affiliate site PE. In April, only affiliate sites CP and CHCM appeared within spam campaign CP-E. Affiliate site OP appeared within spam campaign CP-F along with affiliate site CP on April 12th, and by itself on April 17th. Table 8 illustrates the activity of all the active spam campaigns during the month of April.
Table 8
Table illustrating when each spam campaign appeared and disappeared in April
(Rows: spam campaigns PE-A, PE-C, CP-A, CP-B, CP-C, CP-D, CP-E, CP-F, CPE-A, CNP-B, CNP-D, MCP-A*, OP-C, OP-E, CHCM-A, CHCM-D, CHCM-F, CHCM-H, and PMP-A; columns: April 1-30. Due to a database error, no data is available for April 25th.)
May 2012
There were six affiliate sites with active spam campaigns in May: CP, OP,
CHCM, CNP, PE, and PMP (Figure 18). As with previous months, PE and CP were
jointly associated with one spam campaign (CPE), in addition to their individual spam
campaigns. Only 12 of the 19 active spam campaigns from April remained active in May.
Four new spam campaigns appeared in May, and one spam campaign from March
reappeared.
The spam campaign for CPE remained active from April to May. The one spam campaign for affiliate site PMP remained active from April to May. One of the two spam campaigns for affiliate site OP remained active from April to May.
for affiliate site PE remained active from April to May, and one spam campaign from
March reappeared in May, PE-D. Three spam campaigns for affiliate site CP remained
active from April to May. One of the spam campaigns for affiliate site CP, however, was
renamed and assigned to affiliate site CHCM, leaving affiliate site CP with only two
active spam campaigns in May. Three CHCM spam campaigns remained active from
April to May, two new spam campaigns appeared in May, CHCM-I and CHCM-J, and
one spam campaign was gained from CP. Two new spam campaigns appeared in May for
affiliate site CNP, CNP-E and CNP-F, and one spam campaign remained active from
April to May. In May, the most spam campaigns active were for affiliate site CHCM.
Figure 18- Number of spam campaigns in May for each affiliate site that appeared
Spam campaigns PE-A and PE-C appeared the most often in May, and spam
campaign CP-C (CHCM-G) and spam campaign CP-F appeared the least often. This
month affiliate site CP appeared most often within spam campaign CPE-A. Unlike in
April, affiliate site CP did not appear within the PE-C spam campaign in May.
In May, a spam campaign was taken from affiliate site CP and assigned to
affiliate site CHCM because spam campaign CP-C no longer contained subjects whose
corresponding domains opened to affiliate site CP, but to affiliate site CHCM instead.
Spam campaign CP-C is now designated as CHCM-G. Spam campaign CHCM-J was
observed on the days in May that no subjects within spam campaign CHCM-I appeared.
The only overlap observed among the CNP spam campaigns this month occurred on May
14th. The spam campaign activity for all of the active spam campaigns in May is shown in
Table 9.
Table 9
Table illustrating when each spam campaign appeared and disappeared in May
(Rows: spam campaigns PE-A, PE-C, PE-D, CP-A, CP-C, CP-F, CPE-A, CNP-D, CNP-E, CNP-F, OP-E, CHCM-D, CHCM-F, CHCM-G, CHCM-H, CHCM-I, CHCM-J, and PMP-A; columns: May 1-31)
June 2012
In June, five previously observed affiliate sites had active spam campaigns (CP, OP, CHCM, CNP, and PE), and two new affiliate sites appeared with active spam campaigns: RX Discounts (RD) and RX Deals (RXD) (Figure 19). As in previous months, PE and CP were jointly associated with one spam campaign (CPE), in addition to their individual spam campaigns. Of the 17 spam campaigns active in May, 14 remained active in June. Five new spam campaigns appeared.
The spam campaign for CPE remained active from May to June. Two CNP spam campaigns remained active from May to June, and one new spam campaign appeared in June, CNP-G. One spam campaign for affiliate site CP remained active from May to June, and one new spam campaign appeared in June, CP-G. In June, another spam campaign for affiliate site CP was renamed, this time assigned to affiliate site OP, leaving affiliate site CP with only one active spam campaign in June. One OP spam campaign remained active from May to June, and two new spam campaigns appeared in June, OP-F and OP-H. No new spam campaigns appeared for affiliate site PE in June, but three spam campaigns remained active from May to June. Six CHCM spam campaigns remained active from May to June. Affiliate site CHCM also lost a spam campaign in June: it was renamed and assigned to the new affiliate site RXD, leaving affiliate site CHCM with five active spam campaigns. Two new affiliate sites appeared in June, RXD and RD, each with one spam campaign, RXD-A and RD-A. As in May, the affiliate site with the most active spam campaigns in June was CHCM.
Figure 19- Number of spam campaigns in June for each affiliate site that appeared
The spam campaigns that appeared most often in June were CP-F (OP-G) and CPE-A. The spam campaign that appeared least often in June was OP-H. This month, a CP spam campaign was renamed and assigned to affiliate site OP because spam campaign CP-F no longer contained subjects whose corresponding domains opened to affiliate site CP, but to affiliate site OP instead. Spam campaign CP-F is now designated as spam campaign OP-G.
A CHCM spam campaign was also renamed and was assigned to affiliate site
RXD because spam campaign CHCM-G included subjects whose corresponding domains
opened to affiliate sites RXD, Generic Pills (GP), and Top Pharm (TP); while only one
subject’s corresponding domains opened to affiliate site CHCM (June 1st). The spam
campaign CHCM-G is now designated as spam campaign RXD-A.
The new spam campaign RD-A was made up of subjects whose corresponding
domains opened to multiple different affiliate sites. In June the affiliate sites associated
with this spam campaign were: RD, Mega RX (MRX), and RX Orders (RXO).
No CNP spam campaigns were active concurrently this month. CNP-E was the first CNP spam campaign to appear in June; once it became inactive, spam campaign CNP-D was active for the rest of the month, except for about a week during which spam campaign CNP-G was active. The activity for the active spam campaigns in June is shown in Table 10.
Table 10
Table illustrating when each spam campaign appeared and disappeared in June
(Rows: spam campaigns PE-A, PE-C, PE-D, CP-F, CP-G, CPE-A, CNP-D, CNP-E, CNP-G, OP-E, OP-F, OP-G, OP-H, CHCM-D, CHCM-F, CHCM-G, RXD-A, CHCM-H, CHCM-I, CHCM-J, and RD-A; columns: June 1-30)
July 2012
There were six affiliate sites with active spam campaigns in July: OP, CHCM,
CNP, PE, RD, and RXD (Figure 20). In addition to their own spam campaigns, again
affiliate sites PE and CP were jointly associated with one spam campaign (CPE). Of the
19 spam campaigns active in June, there were 14 spam campaigns that remained active in
July. Three new spam campaigns appeared.
One spam campaign each for CPE and affiliate sites CNP, RXD, and RD remained active from June to July. Three CHCM spam campaigns remained active from June to July; no new CHCM spam campaigns appeared in July. Three OP spam campaigns remained active from June to July, and one new OP spam campaign appeared in July, OP-I. Only one spam campaign for affiliate site CP remained active from June to July, but it was renamed and assigned to affiliate site PE, leaving affiliate site CP with no active spam campaigns in July. Three PE spam campaigns remained active from June to July, and two new PE spam campaigns appeared in July, PE-E and PE-F. In July, unlike any other month, affiliate site PE had the most active spam campaigns.
Figure 20- Number of spam campaigns in July for each affiliate site that appeared
Spam campaign PE-A appeared most often in July, and spam campaign OP-F appeared least often. The PE spam campaigns accounted for the majority of the spam campaign activity in July. Even though a spam campaign for every active affiliate site appeared every day in July, this month had the least spam campaign activity of all the months analyzed in this research. When spam campaign PE-D appeared in other months, its subjects were written entirely in upper case letters. During July, the same variations of subjects were observed, except that the subjects contained both upper and lower case letters. Spam campaign PE-F included subjects whose corresponding domains opened to affiliate site PE as well as to a foreign online pharmacy, Mister Joy (MJ). The domains opened to MJ for only one subject, and that subject appeared on July 26th and July 27th.
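Because the July PE-D subjects differed from earlier ones only in letter case, comparing subjects case-insensitively keeps such variants in the same spam campaign. A minimal sketch, assuming subjects are compared after case-folding and whitespace normalization; the subject text, the lookup table `KNOWN_SUBJECTS`, and the helper names are hypothetical, since the actual PE-D subjects are not quoted in this section.

```python
import re


def normalize_subject(subject):
    """Collapse runs of whitespace and case-fold, so case variants share one key."""
    return re.sub(r"\s+", " ", subject).strip().casefold()


# Hypothetical lookup built from earlier, all-upper-case observations.
KNOWN_SUBJECTS = {normalize_subject("CHEAP MEDS ONLINE"): "PE-D"}


def match_campaign(subject):
    """Return the campaign for a previously seen subject, or None if unseen."""
    return KNOWN_SUBJECTS.get(normalize_subject(subject))


print(match_campaign("Cheap Meds  Online"))  # PE-D
print(match_campaign("Unrelated subject"))   # None
```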
Spam campaign RXD-A no longer included subjects whose corresponding
domains opened to affiliate site CHCM. This month the subjects’ corresponding domains
opened to three affiliate sites from June: GP, RXD, TP, as well as two new affiliate sites:
Direct Pharm (DP) and Pills Shop (PS). This month, only affiliate site CP appeared
within spam campaign CPE-A. In July, only affiliate site MRX appeared within spam
campaign RD-A.
Affiliate site CP lost another spam campaign in July, this time to affiliate site PE. In spam campaign CP-G, the subjects' corresponding domains were no longer opening to affiliate site CP but to affiliate site PE; thus, spam campaign CP-G is now designated as spam campaign PE-G. Table 11 illustrates the activity of all of the active spam campaigns for the month of July.
Table 11
Table illustrating when each spam campaign appeared and disappeared in July
(Rows: spam campaigns PE-A, PE-C, PE-D, PE-E, PE-F, PE-G, CP-G, CPE-A, CNP-D, OP-E, OP-F, OP-G, OP-I, CHCM-D, CHCM-F, CHCM-H, RXD-A, and RD-A; columns: July 1-31)
August 2012
Seven affiliate sites had active spam campaigns in August: CP, OP, CHCM, CNP, PE, RD, and RXD (Figure 21). Six new spam campaigns appeared in August, and 13 of the 17 spam campaigns from July remained active. One spam campaign from June reappeared in August.
One spam campaign each for affiliate sites RD and RXD remained active from July to August, and one new spam campaign appeared for affiliate site CP, CP-H. Five new OP spam campaigns appeared in August (OP-J, OP-K, OP-L, OP-M, and OP-N), and two OP spam campaigns remained active from July to August. All six PE spam campaigns remained active from July to August; no new PE spam campaigns appeared. Two CHCM spam campaigns remained active from July to August; no new CHCM spam campaigns appeared. One CNP spam campaign from June reappeared in August, CNP-G, and one CNP spam campaign remained active from July to August. In August, unlike any other month, affiliate site OP had the most active spam campaigns.
Figure 21- Number of spam campaigns in August for each affiliate site that appeared
In August, spam campaign RXD-A appeared most often and spam campaign OP-M appeared least often. As in July, the PE spam campaigns accounted for most of the spam campaign activity. Spam campaign activity increased from July to August. Almost every day, a new subject in spam campaign PE-F replaced a previous subject. Spam campaign PE-F started with three different subjects in July and expanded to 31 different subjects in August. Only two of the subjects that appeared in July also appeared in August. During August, MJ appeared within spam campaign PE-F on August 24th and 26th, associated with a different subject than in July. Affiliate site CP appeared once, on August 23rd, within spam campaign PE-F.
The affiliate sites that appeared within spam campaign RD-A in August included
two affiliate sites from June, MRX and RXO, one affiliate site from July, RD, and three
new affiliate sites: Direct Pills (DTP), Discount Meds (DM), and Top Meds (TM).
Affiliate site RD appeared the most when subjects in this spam campaign appeared,
followed by affiliate site DM and affiliate site DTP.
During the month of August, the same two affiliate sites from July appeared with
spam campaign RXD-A: RXD and GP. There was also one new affiliate site that
appeared, Global RX (GRX), and two known affiliate sites that appeared within the spam
campaign, PE and CP. On most days in August, the subjects' corresponding domains opened to affiliate sites RXD, GP, and PE. Affiliate site GRX was only observed on August 23rd-25th. On August 25th, affiliate site GRX was observed along with affiliate sites RXD and PE. Affiliate site CP only appeared on August 30th and 31st. The only days in August on which affiliate site PE did not appear were August 5th, 23rd, and 24th.
Spam campaign OP-L contained one subject, but occasionally a different variation of that subject also appeared in this spam campaign. When the different version appeared, both subjects appeared on the same day. Spam campaign OP-J contained only two subjects, and both subjects had appeared within spam campaign CNP-A* in December as well as within spam campaign MCP-A* in January and March. During August, however, both subjects had corresponding domains that opened to affiliate site OP.
The two CNP spam campaigns alternated appearances, so there was no overlap
between the CNP spam campaigns in August. The spam campaign activity for the active
spam campaigns in August is shown in Table 12.
Table 12
Table illustrating when each spam campaign appeared and disappeared in August
(Rows: spam campaigns PE-A, PE-C, PE-D, PE-E, PE-F, PE-G, CP-H, CNP-D, CNP-G, OP-E, OP-I, OP-J, OP-K, OP-L, OP-M, OP-N, CHCM-D, CHCM-F, RXD-A, and RD-A; columns: August 1-31)
Significant Observations
Affiliate sites CHCM and CNP do not share a template or domains, but the two affiliate sites do share several IP addresses. They also share a pattern among the domains associated with their subjects. Beginning in May, affiliate sites CNP and CHCM both contained subjects with multiple different domains that would appear and disappear. These domains would appear for about a week and then disappear and be replaced by other domains. This continued through August. Spam campaigns CNP-D, CNP-E, CNP-F, CNP-G, CHCM-F, CHCM-I, and CHCM-J all contained subjects whose domains appeared and disappeared (Table 13a and Table 13b). Table 13a contains domains that were observed in May, June, and most of July. Table 13b contains domains that appeared from the end of July into August.
Domains were assigned to the same subgroup because they appeared together with other domains in that subgroup. Not all of the domains in one subgroup appeared at the same time. At most, two or three domains would appear together with one pharma spam subject. When one domain disappeared, either the other two remained on their own, or another domain was added. For example, subject A was associated with domain a, domain b, and domain c on one day, and then the next day only domain b and domain c were associated with subject A. Sometimes domain a would be replaced by domain d, and other times only domain b and domain c would remain associated with subject A. This domain pattern continued through the end of the research, and no overlap among the different subgroups was observed, nor was there any overlap between the CNP and CHCM domains.
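The subgroups in Tables 13a and 13b are, in effect, groups of domains linked by appearing together with the same subject. One way to compute such groups is a union-find (disjoint-set) structure: every time a set of domains is seen with a subject, those domains are merged into one group. This is a sketch of that technique, not necessarily how the subgroups were built for this research; the single-letter domain names follow the subject A example above, and the second subject's domains, e and f, are hypothetical.

```python
class DisjointSet:
    """Minimal union-find over hashable items, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def domain_subgroups(observations):
    """Group domains that ever appeared together with the same subject.

    observations: iterable of sets, each holding the domains seen with one
    subject on one day.
    """
    ds = DisjointSet()
    for domains in observations:
        domains = sorted(domains)
        for domain in domains:
            ds.find(domain)  # register domains that are seen alone
        for other in domains[1:]:
            ds.union(domains[0], other)
    groups = {}
    for domain in list(ds.parent):
        groups.setdefault(ds.find(domain), set()).add(domain)
    return sorted(groups.values(), key=sorted)


# Subject A: day 1 {a, b, c}, day 2 {b, c}, day 3 {b, c, d}; another subject {e, f}.
days = [{"a", "b", "c"}, {"b", "c"}, {"b", "c", "d"}, {"e", "f"}]
print(domain_subgroups(days))
```

Domain d joins a, b, and c because it appeared alongside b and c, even though it never appeared with a. Under this approach, a single domain shared between two subgroups would merge them, which is consistent with the observation that no overlap among the subgroups was found.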
Table 13a
Subgroup of domains that appeared with CHCM and CNP spam campaigns in May,
June, and July
CHCM Domains
jarb.ru
haii.ru
jkvix.ru
ghdsjg.ru
sdfhgh.ru
cotu.ru
qgfdg.ru
jkvix.ru
sdfhgs.ru
sdfjgh.ru
sdfhsj.ru
blkfts.ru
sdhfg.ru
cjzkef.ru
syli.ru
dfnmsd.ru
sugn.ru
dfdonf.ru
tbin.ru
dfnsmf.ru
tamy.ru
dfnsnf.ru
thow.ru
dhfjsf.ru
tiew.ru
fdjss.ru
utay.ru
fndmsf.ru
uped.ru
fkvjgx.ru
whas.ru
fnsmsf.ru
vieh.ru
igjfmd.ru
zaph.ru
vkflrk.ru
zinh.ru
vhdhfd.ru
zirl.ru
xkflwf.ru
ardt.ru
akjkkd.ru
dhhfgh.ru
zjfkqf.ru
nbmvdf.ru
dfnmsd.ru
nvmcn.ru
sitty.ru
ooidkf.ru
skfjjs.ru
pchvif.ru
skirk.ru
qkfgif.ru
vcxjvv.ru
wieu.ru
vhcjv.ru
dhfjsg.ru
dhfgs.ru
dfsmnf.ru
mkfluk.ru
CNP Domains
loug.ru
hfdjd.ru
lupp.ru
jhfdj.ru
maimy.ru
jvhjaq.ru
orpe.ru
kjckv.ru
phad.ru
kjvkxd.ru
phah.ru
ontia.ru
poug.ru
outtp.ru
plew.ru
pitre.ru
pryl.ru
qbncb.ru
puee.ru
rubed.ru
pupt.ru
schid.ru
qeurtu.ru
sdgfg.ru
sdfgsh.ru
sdhff.ru
rawn.ru
slous.ru
sdfhjg.ru
socey.ru
sdhfhf.ru
soind.ru
sdhfjh.ru
surum.ru
bvhuf.ru
vcbxn.ru
cvnmr.ru
vchvvc.ru
dfjdd.ru
vhcjjv.ru
djfhf.ru
vncmv.ru
dfjdd.ru
cvjckv.ru
dfjkd.ru
vkjcvv.ru
hism.ru
fecy.ru
humf.ru
ghfdjg.ru
hyse.ru
hief.ru
jdfuf.ru
julone.ru
setle.ru
kreap.ru
bvhuf.ru
kjffg.ru
frof.ru
jetle.ru
Table 13b
Subgroup of domains that appeared with CHCM and CNP spam campaigns beginning at
the end of July through August
CHCM Domains
jith.ru
herj.ru
bolf.ru
hiek.ru
ethi.ru
husc.ru
eyst.ru
chli.ru
jium.ru
lhad.ru
nuld.ru
moif.ru
nyed.ru
oing.ru
oxie.ru
pahs.ru
peem.ru
plii.ru
powd.ru
rerr.ru
vatz.ru
terct.ru
vaub.ru
verib.ru
waral.ru
wetz.ru
wreb.ru
wrau.ru
yult.ru
adae.ru
CNP Domains
inif.ru
neaf.ru
jkflkf.ru
kkflik.ru
undy.ru
ukfjxf.ru
vasce.ru
ytti.ru
elof.ru
zkflwf.ru
eers.ru
foan.ru
gnag.ru
gnoe.ru
lolm.ru
mcal.ru
muet.ru
olew.ru
othy.ru
ouck.ru
prua.ru
tbing.ru
tetl.ru
tirv.ru
ucle.ru
uncup.ru
uniot.ru
urax.ru
gaig.ru
wrof.ru
Affiliate site MCP could also be included in this connection between affiliate sites
CNP and CHCM. There was overlap among spam campaigns between affiliate sites CNP
and MCP in December and January. Also, the spam campaign for affiliate site MCP often
appeared on the same days as spam campaigns for affiliate sites CNP and CHCM until
affiliate site MCP disappeared in May. Affiliate sites CNP, CHCM, and MCP also share
IP addresses.
The only significant pattern of overlap among spam campaigns was observed between spam campaigns for affiliate sites CNP and CHCM. Beginning in December, spam campaigns CNP-B and CHCM-A appeared on the same days except December 5th, 17th, 21st, and 27th. The spam campaigns also appeared on the same days in January, except for January 8th-10th. The two spam campaigns did not appear in February, and there was no clear pattern of overlap in March. In April, spam campaigns CNP-B and CHCM-A appeared on exactly the same days, April 1st-4th, and then neither spam campaign appeared again for the rest of the research period.
Another overlap between spam campaigns for affiliate sites CNP and CHCM began in April with spam campaigns CNP-D and CHCM-F. In April, both spam campaigns appeared on the same days except April 30th. No clear pattern was observed between the two spam campaigns in May, but in June, spam campaigns CNP-D and CHCM-F appeared on exactly the same days, June 4th-9th and June 16th-30th. Spam campaigns CNP-D and CHCM-F again appeared on the same days in July, except for July 13th. In August, the only days on which spam campaigns CNP-D and CHCM-F did not appear together were August 18th-21st and August 31st. Spam campaigns CNP-G and CHCM-J displayed only one clear pattern of overlap, in June, when they appeared on exactly the same days, June 10th-15th. After June, the two spam campaigns did not appear together in the same month.
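The day-by-day comparisons above reduce to set operations on each campaign's active days: the days on which both campaigns appeared, and the days on which only one did. A minimal sketch, assuming activity is recorded as a set of day-of-month numbers per campaign; `compare_activity` is a hypothetical helper, and the July data below are invented for illustration, because the text states only that July 13th was the exception, not which campaign was missing or whether both were active on every other day.

```python
def compare_activity(days_a, days_b):
    """Compare the active days of two spam campaigns within one month."""
    return {
        "both": sorted(days_a & days_b),
        "only_a": sorted(days_a - days_b),
        "only_b": sorted(days_b - days_a),
    }


# Hypothetical July activity: both campaigns active all month, except that
# one of them (assumed here to be CHCM-F) is missing on July 13th.
cnp_d = set(range(1, 32))
chcm_f = set(range(1, 32)) - {13}
result = compare_activity(cnp_d, chcm_f)
print(result["only_a"], result["only_b"])  # [13] []
print(len(result["both"]))                 # 30
```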
Another interesting finding occurred among spam campaigns and their associated
affiliate sites. The spam campaign CP-C first appeared in March with subjects whose
corresponding domains opened to affiliate site CP. Spam campaign CP-C remained active
through April. Beginning in May, spam campaign CP-C no longer contained subjects
whose corresponding domains were opening to affiliate site CP, but to affiliate site
CHCM instead. The spam campaign CP-C was assigned to affiliate site CHCM and
renamed CHCM-G in May. However, in June the same spam campaign (CHCM-G) no longer contained subjects whose corresponding domains were opening to affiliate site CHCM, but to multiple other affiliate sites: GP, RXD, and TP (DP and PS joined them in July). The spam campaign CHCM-G was then designated as spam campaign RXD-A because, of the affiliate sites associated with this spam campaign in June, RXD appeared most often. This suggested two possible explanations: (1) the same affiliate is working for multiple affiliate programs and using the same spam campaign for all of them, or (2) each of the affiliate sites observed within this spam campaign is simply a different template used by the same affiliate program.
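The reassignment rule described here (a campaign is relabeled after the affiliate site that its subjects' domains open to most often) and the apparent labeling convention (the site's abbreviation plus the first unused letter, as in CHCM-G, PE-G, and RXD-A) can be sketched as follows. Both helpers are hypothetical, the alphabetical tie-breaking rule is an assumption the text does not address, and the observation list is invented for illustration.

```python
import string
from collections import Counter


def dominant_site(site_hits):
    """Return the affiliate site seen most often; ties go to the alphabetically first site."""
    counts = Counter(site_hits)
    best = max(counts.values())
    return min(site for site, n in counts.items() if n == best)


def next_campaign_name(site, existing):
    """Return the first unused label of the form SITE-A, SITE-B, ..."""
    for letter in string.ascii_uppercase:
        name = f"{site}-{letter}"
        if name not in existing:
            return name
    raise ValueError(f"no free campaign label for {site}")


# Hypothetical June observations for the campaign then labeled CHCM-G.
hits = ["RXD", "GP", "RXD", "TP", "RXD", "GP", "CHCM"]
site = dominant_site(hits)
print(site, next_campaign_name(site, existing=set()))  # RXD RXD-A

# CHCM labels other than CHCM-G that appear in this chapter.
chcm_labels = {"CHCM-A", "CHCM-B", "CHCM-C", "CHCM-D", "CHCM-E", "CHCM-F",
               "CHCM-H", "CHCM-I", "CHCM-J"}
print(next_campaign_name("CHCM", chcm_labels))  # CHCM-G
```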
Two other spam campaigns for affiliate site CP were also renamed and assigned
to other affiliate sites, CP-F and CP-G. Spam campaign CP-G was assigned to affiliate
site PE and renamed PE-G. Spam campaign CP-F was assigned to affiliate site OP and
renamed OP-G.
It was also interesting that many of the affiliate sites that appeared in June, July, and August shared similar templates and the same IP addresses with other affiliate sites. The two did not always go together, however: some affiliate sites shared a template but not an IP address, and affiliate sites that shared an IP address or two did not necessarily share a template.
Affiliate sites RXD and GP appeared within spam campaign RXD-A and always appeared together with the shared IP addresses 78.110.164.200 and 78.46.248.33. These two affiliate sites also had nearly identical templates, meaning the source code for their websites was identical except for the name of the affiliate site. Affiliate site TP also appeared in June within spam campaign RXD-A and was associated with the same two IP addresses as affiliate sites RXD and GP. The template for affiliate site TP, however, did not match the template for affiliate sites RXD and GP.
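The text defines "nearly identical templates" as page source that is identical except for the affiliate site's name. A minimal sketch of one way to test this, assuming the raw HTML of each site is available: replace each site's own name with a placeholder and compare the results with the Python standard library's `difflib.SequenceMatcher`. The page snippets and helper names below are hypothetical, and any cut-off for "nearly identical" (for example, a ratio above 0.95) would be an assumption rather than a value from this research.

```python
import difflib
import re


def normalize_template(html, site_name):
    """Replace the site's own name with a placeholder and collapse whitespace."""
    text = re.sub(re.escape(site_name), "SITE_NAME", html, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()


def template_similarity(html_a, name_a, html_b, name_b):
    """Similarity ratio (0.0-1.0) of two pages after removing their site names."""
    a = normalize_template(html_a, name_a)
    b = normalize_template(html_b, name_b)
    return difflib.SequenceMatcher(None, a, b).ratio()


# Hypothetical page sources; real affiliate-site HTML is far longer.
page_rxd = "<html><title>RX Deals</title><h1>Welcome to RX Deals</h1></html>"
page_gp = "<html><title>Generic Pills</title><h1>Welcome to Generic Pills</h1></html>"
page_other = "<html><body><p>Order pills here</p></body></html>"

print(template_similarity(page_rxd, "RX Deals", page_gp, "Generic Pills"))  # 1.0
print(template_similarity(page_rxd, "RX Deals", page_other, "Other") < 1.0)  # True
```

Removing the site name first matters: without it, two pages built from the same template would differ exactly where their names appear, which is the one difference the definition above allows.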
Affiliate sites MRX, RD, and RXO also appeared in June, but within spam campaign RD-A. Affiliate sites MRX and RD were both associated with the IP address 193.16.12.67, but affiliate site RXO was not. The templates for affiliate sites MRX and RD were somewhat similar but still clearly different, and the template for affiliate site RXO was very different from those of both MRX and RD. Interestingly, however, affiliate site RXO shares a nearly identical template with affiliate site TP, although the two do not share an IP address or a spam campaign.
In July, affiliate sites GP, RXD, and TP continued to appear within spam campaign RXD-A with the same two IP addresses, 78.110.164.200 and 78.46.248.33. Two new affiliate sites also appeared within the spam campaign: DP and PS. Affiliate sites GP, RXD, and PS all shared the same IP address, 193.164.128.67. Affiliate sites PS and DP both shared a common IP address, 78.46.248.33, with affiliate sites GP, RXD, and TP. The template for affiliate site PS is nearly identical to the template for affiliate site RD, but the two affiliate sites do not share common IP addresses or a spam campaign. Affiliate site MRX was the only affiliate site that appeared with spam campaign RD-A in July, but affiliate site MRX shared its only associated IP address, 193.164.128.67, with the affiliate sites from spam campaign RXD-A: GP, RXD, and PS.
During August, spam campaign RXD-A appeared with the same affiliate sites, RXD and GP, but also with affiliate sites PE, CP, and a new affiliate site, GRX. Affiliate site PE always appeared with its own IP address, 84.22.127.34, and does not share a template with any other affiliate site. When affiliate site CP appeared, it appeared with the same IP address as affiliate site PE, and it also does not share a template with any other affiliate site. Affiliate site GRX does not share a template with another affiliate site either, but it shared the IP address 193.164.128.67 with affiliate sites RXD and GP, as well as with the affiliate sites from spam campaign RD-A: DTP, RD, MRX, DM, TM, and RXO.
Spam campaign RD-A appeared in August with six affiliate sites:
DTP, RD, MRX, DM, TM, and RXO. All of these affiliate sites shared the IP
address 193.164.128.67, the same IP address associated with spam campaign RXD-A.
Affiliate sites DM and TM also shared two other IP addresses, 81.19.183.149 and
88.86.115.45. Affiliate sites DTP and MRX have nearly identical templates, while
affiliate site DM shares a nearly identical template with affiliate site DP,
which appeared in spam campaign RXD-A in July. Affiliate site TM does not share a
template with any other affiliate site observed during the research.
Spam campaigns RXD-A and RD-A contained two different groups of
subjects, but there were two areas of overlap between the two spam campaigns that
cannot be ignored. The main overlap was the IP addresses shared among the affiliate sites
within each spam campaign: the IP address 193.164.128.67 appeared with almost all of
the affiliate sites associated with both spam campaigns. The second overlap was that spam
campaigns RXD-A and RD-A also shared templates among their affiliate sites. In general,
affiliate sites within the same spam campaign share nearly identical templates, but in this
case the affiliate sites from spam campaign RXD-A also shared nearly identical templates
with the affiliate sites from spam campaign RD-A.
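The shared-IP comparisons in this chapter were made by manual inspection. As an illustration only, the same grouping can be framed as a connected-components problem: each affiliate site is a node, and any two sites observed with the same IP address are joined. The Python sketch below does this with a union-find (disjoint-set) structure. It assumes the observations have already been reduced to (affiliate site, IP address) pairs; the pairs are transcribed by hand from the August paragraphs, and the names `AUGUST_OBSERVATIONS` and `group_by_shared_ip` are hypothetical, not part of the UAB spam data mine. A shared IP address is treated here as a link between sites, not as proof of a common operator, which is why the template comparisons remain necessary.

```python
from collections import defaultdict

# Hypothetical input: (affiliate site, IP address) pairs transcribed by hand
# from the August observations described in this chapter. Illustrative only.
AUGUST_OBSERVATIONS = [
    ("RXD", "193.164.128.67"),
    ("GP", "193.164.128.67"),
    ("GRX", "193.164.128.67"),
    ("DTP", "193.164.128.67"),
    ("RD", "193.164.128.67"),
    ("MRX", "193.164.128.67"),
    ("DM", "193.164.128.67"),
    ("TM", "193.164.128.67"),
    ("RXO", "193.164.128.67"),
    ("DM", "81.19.183.149"),
    ("TM", "81.19.183.149"),
    ("DM", "88.86.115.45"),
    ("TM", "88.86.115.45"),
    ("PE", "84.22.127.34"),
    ("CP", "84.22.127.34"),
]


def group_by_shared_ip(observations):
    """Return sets of affiliate sites linked, directly or transitively,
    through at least one shared IP address (largest group first)."""
    parent = {}

    def find(site):
        # Register unseen sites, then walk to the root with path halving.
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every site to the first site seen on the same IP address.
    first_site_for_ip = {}
    for site, ip in observations:
        find(site)
        if ip in first_site_for_ip:
            union(first_site_for_ip[ip], site)
        else:
            first_site_for_ip[ip] = site

    # Collect sites under their root representative.
    groups = defaultdict(set)
    for site in parent:
        groups[find(site)].add(site)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


if __name__ == "__main__":
    for group in group_by_shared_ip(AUGUST_OBSERVATIONS):
        print(sorted(group))
```

Run as a script, the sketch prints two groups: the nine affiliate sites linked through 193.164.128.67, and the pair CP and PE linked through 84.22.127.34. Union-find is used because it handles transitive links (site A shares an address with B, and B with C) without building an explicit graph.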
CHAPTER 4
CONCLUSION
Using the UAB spam data mine, pharma spam subjects were collected
daily and analyzed. Forty-seven different spam campaigns, advertising 22 different
affiliate sites, were identified from December 2011 through August 2012. During each
month, some subjects contained associated domains that did not always load, either on
the first visit or on later visits after the affiliate site had loaded once. Other domains
never opened to an affiliate site at all. These “dead” domains could account for some of
the gaps in spam campaign activity across the months.
Based on the shared IP addresses, the shared patterns of spam campaign appearances,
and the pattern of domain appearances and disappearances for affiliate sites CNP and
CHCM, the two affiliate sites are connected through a common affiliate. There were
multiple overlaps in the spam campaigns for affiliate sites CNP, CHCM, and MCP, and
the three affiliate sites also share IP addresses, implying that affiliate site MCP is run by
the same affiliate as affiliate sites CNP and CHCM.
The appearance of affiliate sites PE and CP within spam campaign CPE-A
from March through July indicates that the two affiliate sites are connected. Both affiliate
sites also appeared together within other spam campaigns, and they share multiple IP
addresses. The shared IP addresses and the similar spam campaign appearance patterns
for affiliate sites CP and PE indicate that the two affiliate sites are run by the same
affiliate. On any given day during the research, a spam campaign for affiliate site CP,
affiliate site PE, or both appeared. These two affiliate sites appeared in spam campaigns
the most frequently and consistently throughout the research, including during the drastic
change in the number of pharma spam entries appearing each day in the spam data mine,
a change that affected most of the other affiliate sites. Affiliate sites CP and PE were the
dominant affiliate sites observed during the research.
The overlaps in the templates used in spam campaigns RXD-A and RD-A
indicate that the affiliate sites within both spam campaigns use different templates
made by the same affiliate program. The common IP address 193.164.128.67, shared
by the two campaigns, indicates that the same affiliate is hosting all of these affiliate
sites for that affiliate program.
The switch in spam campaigns from CP-C to CHCM-G to RXD-A, as mentioned
earlier, implies that all the affiliate sites that appeared within these three spam
campaigns (CP, CHCM, PE, GP, RXD, TP, PS, DP, DTP, and GRX), though not
otherwise connected (with the exception of CP and PE), are different templates used by
the same affiliate. Even though PE and CP are run by the same affiliate, this does not
mean that the templates for affiliate sites CP, PE, or CHCM came from the same affiliate
program as the affiliate site templates for GP, RXD, TP, PS, DP, DTP, and GRX. There
were also no shared IP addresses among spam campaigns RXD-A, CHCM-G, and CP-C.
This suggests that the affiliate worked for multiple affiliate programs while continuing to
run the same spam campaign. It is also important to note that the affiliate running these
affiliate sites is not necessarily the same affiliate that runs the independent spam
campaigns for affiliate sites CP and PE or for affiliate site CHCM.
Three spam campaigns for affiliate site CP were renamed and assigned to other
affiliate sites. As previously discussed, spam campaign CP-C was assigned to affiliate site
CHCM and renamed CHCM-G. Spam campaign CP-G was assigned to affiliate site PE
and renamed PE-G. This is not unusual, because affiliate sites PE and CP have already
been shown to be connected. However, spam campaign CP-F was assigned to affiliate site
OP and renamed OP-G. This is unusual, because apart from this spam campaign’s switch
in affiliate sites, no other evidence suggests a connection between affiliate sites CP
and OP. The IP addresses used when the spam campaign was CP-F differ from the
IP addresses used after it was switched to OP-G. This implies that the
same affiliate ran spam campaign CP-F for one affiliate program and later began
working for another affiliate program, running the same spam campaign but with a
different template. Again, this does not mean that the affiliates running the individual
spam campaigns for affiliate site CP are the same affiliates running the individual spam
campaigns for affiliate site OP.
The most consistent spam campaigns were those for affiliate sites PE, CP, and
CHCM. Although affiliate sites CP and PE were the most frequently encountered affiliate
sites, spam campaigns for all three affiliate sites remained active throughout the entire
nine months of the project. Forty-seven spam campaigns were identified, which indicates
that at most 47 different affiliates ran these campaigns. Those (at most) 47 affiliates,
however, appear to work for only about eight to ten different affiliate programs, based on
the distinct templates that appeared throughout the research. Multiple spam campaigns
appeared for numerous affiliate sites, but this does not mean that a single affiliate ran all
the spam campaigns for a given affiliate site. Nor do multiple spam campaigns for an
affiliate site mean that each campaign was run by a different affiliate: it is quite possible
that one affiliate ran at least two of the spam campaigns that appeared for each affiliate
site. Further research would be needed to resolve this.
This research was exploratory, so it may not have identified every affiliate site
that was active between December 2011 and August 2012. What this research does reveal
is that the predominant affiliate sites responsible for the most prevalent spam
campaigns are affiliate sites CP, PE, and CHCM. From an investigative standpoint, if
the servers hosting the domains used in these spam campaigns were taken down, the
amount of spam reaching the email inboxes of Americans would decrease substantially.
LIST OF REFERENCES
1. Spam! Lorrie Faith Cranor, Brian A. LaMacchia. 8, 1998, Communications of the ACM, Vol. 41, pp. 74-83.
2. Spam Statistics and Facts. Spam Laws. [Online] 2013. [Cited: 04 04, 2013.]
http://www.spamlaws.com/spam-stats.html.
3. O'Leary, Tom. Spam Statistics: Worst spam offenders, countries, conversion rates.
GroupMail. [Online] 1997-2013. [Cited: 04 04, 2013.] http://group-mail.com/emailmarketing/spam-statistics-worst-spam-offenders-countries-conversion-rates/.
4. Fighting Spam on Social Websites: A Survey of Approaches and Future Challenges.
Paul Heymann, Georgia Koutrika, Hector Garcia-Molina. Stanford : IEEE Computer
Society, 2007, pp. 36-45.
5. K. Levchenko, A. Pitsillidis, N. Chachra, S. Savage, M. Félegyházi, C. Grier et al.
Click Trajectories: End-to-End Analysis of the Spam Value Chain. 2011.
6. Adult content spam. SECURELIST. [Online] 1997-2013. [Cited: 04 04, 2013.]
http://www.securelist.com/en/threats/spam?chapter=89.
7. Soma Halder, Richa Tiwari, Alan Sprague. Identifying Features to Improve Real
Time Clustering and Domain Blacklisting. 2011.
8. Fisher, Tim. Malware. About.com. [Online] [Cited: 04 07, 2013.]
http://pcsupport.about.com/od/termsm/g/malware.htm.
9. Rouse, Margaret. botnet (zombie army). TechTarget. [Online] 02 2012. [Cited: 02 09,
2013.] http://searchsecurity.techtarget.com/definition/botnet.
10. Spam Image Clustering for Identifying Common Sources of Unsolicited Emails.
Chengcui Zhang, Xin Chen, Wei-Bang Chen, Lin Yang, Gary Warner. 3, s.l. : IGI
Global, 2009, International Journal of Digital Crime and Forensics, Vol. 1.
11. Know your Neighbors: Web Spam Detection using the Web Topology. Carlos
Castillo, Debora Donato, Aristides Gionis, Vanessa Murdock, Fabrizio Silvestri.
Amsterdam : s.n., 2007, pp. 423-430.
12. Wikipedia. Spamdexing. Wikipedia. [Online] 03 18, 2013. [Cited: 03 19, 2013.]
http://en.wikipedia.org/wiki/Spamdexing#Link_spam.
13. Link Spam. [Online] 03 19, 2013. [Cited: 03 19, 2013.]
http://www.searchenginepromotionhelp.com/m/articles/promotion-encyclopedia/linkspam.php.
14. Stacking classifiers for anti-spam filtering of e-mail. Georgios Sakkis, Ion
Androutsopoulos, Georgios Paliouras, Vangelis Karkaletsis, Constantine D.
Spyropoulos, Panagiotis Stamatopoulos. Pittsburgh : s.n., 2001, pp. 44-50.
15. Kaiser. U.S. Health Care Costs. kaiseredu.org. [Online] [Cited: 04 04, 2013.]
http://www.kaiseredu.org/issue-modules/us-health-care-costs/background-brief.aspx.
16. Kliff, Sarah. We spend $750 billion on unnecessary health care. Two charts explain
why. The Washington Post. [Online] 09 07, 2012. [Cited: 04 19, 2013.]
http://www.washingtonpost.com/blogs/wonkblog/wp/2012/09/07/we-spend-750-billionon-unnecessary-health-care-two-charts-explain-why/.
17. Dirt Cheap and Without Prescription: How Susceptible are Young US Consumers to
Purchasing Drugs From Rogue Internet Pharmacies? Lana Ivanitskaya, Jodi
Brookins-Fisher, Irene O'Boyle, Danielle Vibbert, Dmitry Erofeev, Lawrence Fulton.
04 26, 2010, Journal of Medical Internet Research.
18. Canadian Pharmacy. [Online] [Cited: 04 11, 2013.]
19. DEA. READ THIS BEFORE PURCHASING PRESCRIPTION DRUGS OVER
THE INTERNET !!! DEA Office of Diversion Control. [Online] [Cited: 03 18, 2013.]
http://www.deadiversion.usdoj.gov/consumer_alert.htm.
20. Pfizer. Counterfeiting & Importation. Pfizer. [Online] [Cited: 03 18, 2013.]
http://www.pfizer.com/products/counterfeit_and_importation/counterfeit_importation.jsp.
21. Peterson, Karen S. Young men add Viagra to their drug arsenal. USA Today. 03 21,
2011.
22. Canadian Neighbor Pharmacy. [Online] [Cited: 04 11, 2013.]
23. World Health Organization. General information on counterfeit medications.
Medicines. [Online] [Cited: 03 18, 2013.] Page 1.
http://www.who.int/medicines/services/counterfeit/overview/en/.
24. Salyer, David. The Dangers of Using and Abusing Viagra. The Body: The Complete
HIV/AIDS Resource. [Online] November/December 2004. [Cited: 08 19, 2012.]
http://www.thebody.com/content/art32246.html.
25. Anonymous, Viagraholics. Frequently Asked Questions. 2006.
26. C. Kanich, N. Weaver, D. McCoy, T. Halvorson, C. Kreibich, S. Savage et al.
Show Me the Money: Characterizing Spam-advertised Revenue.
27. Spam Trackers. SpamIt. Spam Trackers. [Online] 10 01, 2010. [Cited: 01 15, 2013.]
http://spamtrackers.eu/wiki/index.php/Glavmed.
28. —. Glavmed. Spam Trackers. [Online] 11 29, 2010. [Cited: 01 15, 2013.]
http://spamtrackers.eu/wiki/index.php/Glavmed.
29. KRAMER, ANDREW E. E-Mail Spam Falls After Russian Crackdown. The New
York Times. [Online] 10 26, 2010. [Cited: 04 17, 2013.]
http://www.nytimes.com/2010/10/27/business/27spam.html?_r=2&.
30. Krebs, Brian. SpamIt, Glavmed Pharmacy Network Exposed. Krebs on Security.
[Online] 02 11, 2011. [Cited: 01 15, 2013.] krebsonsecurity.com/2011/02/spamitglavmed-pharmacy-networks-exposed/#more-8147.
31. —. Rove Digital Was Core ChronoPay Shareholder. Krebs on Security. [Online] 11
11, 2011. [Cited: 04 17, 2013.] http://krebsonsecurity.com/tag/igor-gusev/.
32. —. Spam Volumes Dip After Spamit.com Closure. Krebs on Security. [Online] 10 10,
2010. [Cited: 01 16, 2013.] http://krebsonsecurity.com/2010/10/spam-volume-dip-afterspamit-com-closure/#more-5593.
33. Microsoft. Microsoft Security Intelligence Report.
34. —. Microsoft Security Intelligence Report: July through December 2007. 2008. p. 68.
35. Chun Wei, Alan Sprague, Gary Warner, Anthony Skjellum. Mining Spam Email
to Identify Common Origins for Forensic Application. 2007.
36. Chun Wei, Alan Sprague, Gary Warner. Clustering Malware-generated Spam
Emails With a Novel Fuzzy String Matching Algorithm. Birmingham : s.n., 2007.
37. —. Detection of Networks Blocks Used by the Storm Worm Botnet. Birmingham : s.n.,
2007. p. 357.
38. Calton Pu, Steve Webb. Observed Trends in Spam Construction Techniques: A Case
Study of Spam Evolution. Atlanta : s.n., 2006.
39. Harris, Tom. How Affiliate Programs Work. How Stuff Works. [Online] 08 11, 2000.
[Cited: 04 11, 2013.] http://money.howstuffworks.com/affiliate-program1.htm.