THE LINK BETWEEN PHARMACEUTICAL SPAM AND ILLICIT ONLINE PHARMACIES

by

BRANDI JEFFERSON

DR. ELIZABETH GARDNER, CHAIR
DR. JASON LINVILLE
DARRELL BURKE

A THESIS

Submitted to the graduate faculty of The University of Alabama at Birmingham, in partial fulfillment of the requirements for the degree of Master of Science

BIRMINGHAM, ALABAMA

2013

THE LINK BETWEEN PHARMACEUTICAL SPAM AND ILLICIT ONLINE PHARMACIES
BRANDI JEFFERSON
MASTER OF SCIENCE IN FORENSIC SCIENCE

ABSTRACT

Spam, more formally known as unsolicited commercial e-mail, is a problem for everyone who uses the internet. Most spam is illegal, but some types of spam are legal. Spam found in unrequested emails, on social networks, or in search engine results can be illegal spam. Illegal spam masks the original destination and normally leads a consumer to a harmful site. Pharmaceutical spam leads to an affiliate site selling prescription drugs without requiring a prescription, along with other questionable products. These drugs range from erectile dysfunction drugs to pain medications, and all of them are sold at very low prices per pill, which makes them more appealing to consumers. In this research, pharmaceutical spam subjects were manually analyzed and then grouped into spam campaigns based on common domains. The results of this project include: the determination of the connections among the spam campaigns, as well as the connections among the affiliate sites; the determination of the most dominant affiliate sites and spam campaigns that appeared throughout the nine-month research period; and the determination that only about eight to ten affiliate programs are hiring affiliates to advertise their affiliate sites using the spam campaigns identified in this research.
Keywords: pharmaceutical spam, prescription drugs, data mine, Viagra, Cialis, Levitra

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER

1. INTRODUCTION
   Background on Spam
   Why Use These Illicit Online Pharmacies?
   Use as Directed
   Affiliate Programs, Affiliates, and Affiliate Sites
   Glavmed/SpamIt
   Microsoft Security Intelligence Reports
   Previous Studies
      Wei et al. (2007)
      Zhang et al. (2009)
      Levchenko et al. (2011)
   Spam Data Mine

2. DATA COLLECTION METHODOLOGY
   Virtual Machine
   Collecting the Data
      Method 1: December 2011-February 2012
      Method 2: March 2012-August 2012
      International Characters and Running Queries with PSQL
   Spam Campaigns

3. RESULTS AND DISCUSSION
   Monthly Reports
      December 2011
      January 2012
      February 2012
      March 2012
      April 2012
      May 2012
      June 2012
      July 2012
      August 2012
   Significant Observations

4. CONCLUSION

LIST OF REFERENCES

LIST OF TABLES

Table
1    Volume 5-14 percentages of spam messages blocked by EHS for the Pharmacy-Sexual and Pharmacy-Non Sexual categories of spam
2a   Queries used to generate results and the query's target
3a   Key words used for determining pharma spam subjects in Method 1
3b   Key words used for determining pharma spam subjects in Method 2
2b   Modified queries used with international characters, an example of the international characters, and the target of the queries
4    Table illustrating when each spam campaign appeared and disappeared in December
5    Table illustrating when each spam campaign appeared and disappeared in January
6    Table illustrating when each spam campaign appeared and disappeared in February
7    Table illustrating when each spam campaign appeared and disappeared in March
8    Table illustrating when each spam campaign appeared and disappeared in April
9    Table illustrating when each spam campaign appeared and disappeared in May
10   Table illustrating when each spam campaign appeared and disappeared in June
11   Table illustrating when each spam campaign appeared and disappeared in July
12   Table illustrating when each spam campaign appeared and disappeared in August
13a  Subgroup of domains that appeared with CHCM and CNP spam campaigns in May, June, and July
13b  Subgroup of domains that appeared with CHCM and CNP spam campaigns beginning at the end of July through August

LIST OF FIGURES

Figure
1    Example of Canadian Pharmacy template
2    Results of the spam query
3    Results of the subject query
4    Results of the modified subject query 1
5    Results of the modified subject query 2
6    Depiction of the actual number of times the subject appeared
7    Results of the modified domain query
8    Depiction of the actual number of subjects associated with the domain ".drugsspee.ru"
9    Results of removing the duplicate subjects in Excel
10   Number of subjects over 1,000 by day for December
11   Number of spam campaigns in December for each affiliate site that appeared
12   Number of subjects over 1,000 by day for January
13   Number of spam campaigns in January for each affiliate site that appeared
14   Number of subjects over 1,000 by day for February
15   Number of spam campaigns in February for each affiliate site that appeared
16   Number of spam campaigns in March for each affiliate site that appeared
17   Number of spam campaigns in April for each affiliate site that appeared
18   Number of spam campaigns in May for each affiliate site that appeared
19   Number of spam campaigns in June for each affiliate site that appeared
20   Number of spam campaigns in July for each affiliate site that appeared
21   Number of spam campaigns in August for each affiliate site that appeared

LIST OF ABBREVIATIONS

CHCM   Canadian Health&Care Mall
CNP    Canadian Neighbor Pharmacy
CP     Canadian Pharmacy
CPE    Canadian Pharmacy/Pharmacy Express
DM     Discount Meds
DP     Direct Pharm
DTP    Direct Pills
ED     erectile dysfunction
EHS    Microsoft Exchange Hosted Services
FDA    Food and Drug Administration
FTC    U.S. Federal Trade Commission
GP     Generic Pills
GRX    Global RX
IP     Internet Protocol
ISP    Internet Service Provider
MCP    My Canadian Pharmacy
MH     Men's Health
MJ     Mister Joy
MRX    Mega RX
OP     Online Pharmacy
PE     Pharmacy Express
PMP    Pharmacy Med Pro
PS     Pills Shop
PSQL   PostgreSQL interactive terminal (psql)
RD     RX Discounts
RXD    RX Deals
RXO    RX Orders
SEO    Search Engine Optimization
SIR    Security Intelligence Report
TDS    Toronto Drug Store
TM     Top Meds
TP     Top Pharm
VM     virtual machine
WHO    World Health Organization

CHAPTER 1

INTRODUCTION

This research focused on the daily analysis of pharma spam collected from the UAB spam data mine over a period of nine months, December 2011 through August 2012. The initial purpose of this research was to collect the daily pharma spam, analyze its frequency, and identify the most prominent affiliate sites as well as the most prominent spam campaigns for each affiliate site. During the course of this project, there was a major shift in the distribution of pharma spam subjects, which required the development of new analysis methods midway through the project. Because of this shift, the overall goal was altered to focus on the analysis of spam campaign activity in order to identify the most prominent affiliate sites as well as the most prominent spam campaigns for each affiliate site. In addition, contrary to published reports, many affiliate sites were found to remain active for months rather than days.

Background on Spam

Unsolicited commercial email, or spam, is not a new problem. The first spam email was sent in 1975 as an Internet Request for Comments (1).
The Request for Comments series is a set of internet standards and related documents published by the Internet Engineering Task Force. Spam makes up 45% of all emails sent each day (2). On average, 14.5 billion spam messages are sent globally per day. In 2010, 200 billion spam emails were sent every day (3). Spam can take the form of a document (e-mail), an automated vote, a user profile, or an annotation (4). In addition to email, spam can be posted on social network sites and can even appear among the results of a search engine query. The form of the spam depends on the spammer's motivation. Motivations include self-promotion, curiosity, advertisement, disruption, or mocking a competitor (4). No matter the form of the spam, if it catches the receiver's attention, the spammer has succeeded.

The two most common types of spam are advertising and adult-related spam (2). Advertising spam accounts for 36% of all spam messages sent, and adult-related spam accounts for 31.7%. The goal of both types of spam messages is to entice the receiver to click on a URL that will open to an online store. Advertising emails promote all types of products, such as replica products, male enhancements, counterfeit software, or pharmaceuticals (5). Adult-related emails advertise pornographic items or sexual enhancement products (6). Both types of emails generally contain redirect links, meaning the URL in the spam message points to a small webpage hosted on a hacked web server. Once the URL is clicked, the consumer is forwarded to a website containing the advertised products. These are generally not legitimate sites, but are instead unsafe or illicit websites designed by the spammer to offer questionable goods. Spammers also tweak domains, name servers, and web servers in order to forward the consumer to the spam website.
Using these methods, the spammer can circumvent spam filters that check the destination context (5).

The reason people are spammed is simple: it is cheap, easy, and lucrative. A spammer can send 100,000 spam messages for less than $200 and buy a million email addresses to spam for less than $100 (1). Spamming is easy because the spam is sent out not by humans but by computers that are part of a botnet. A botnet is a group of infected computers that can be used to spam millions of people with the same message in a matter of seconds (7). One method spammers use to infect computers is to send malware to an unsuspecting computer owner. Malware, short for malicious software, is software designed with malicious intent (8). A computer may become infected when the owner receives an email from a known contact, opens it, and clicks on a link in the body of the email. Clicking the link causes the computer to surreptitiously download an executable file that installs a piece of malware which takes remote control of the computer, i.e., a Trojan horse (9). Once the malware is loaded, it harvests all the contacts from the email account on the infected computer and repeats the cycle of infection and control. Once the botnet is activated, the computers in the botnet begin sending out the spam messages, usually without their owners even knowing the computers are infected. The spammer controls the botnet through a central command and control server. Most spam networks consist of several botnets as well as several command and control servers (7). A botnet can serve two functions: collecting a database of email addresses and flooding those accounts with spam email. To a spammer, the main benefit of using botnets is efficiency, because efficiency equals profit. More spam generated means a higher potential for sales and/or for another computer to become infected. Botnets also have the benefit of obfuscation.
Obfuscation in this context is the ability to hide the spam's original destination. The spammer is able to alter the spam emails to look like authentic emails from a known contact, increasing the chance that a consumer will open the email, click on the embedded link, and become infected.

Although spam is generally unsolicited, there are legal types of spam received each day that consumers have solicited. When a consumer solicits legal spam, the consumer knowingly and voluntarily signs up to receive emails from a business and has the option to opt out whenever desired (10). Mostly, though, spam is illegal because the sender's information is falsified in order to keep the sender anonymous. In the case of illegal spam, the sender does not have the recipient's consent to send the mail, nor has the recipient requested it (10).

There are three types of web spam: link spam, content spam, and cloaking, with content spam and cloaking being the most common (11). Link spam consists of links between pages that are present for malicious rather than legitimate purposes (12). Link spam exploits link-based ranking algorithms (11), which give a website a higher ranking the more often other highly ranked websites link to it (13). For example, Facebook is a highly ranked website; when other websites are advertised on Facebook's website, those websites obtain a higher ranking. The higher the ranking of a website, the more likely it is to appear in the top results returned by a Google search. If a spammer can get a website ranked high enough to appear in the top results of a Google search, there is a better potential for revenue. Content spam maliciously crafts the content of web pages by inserting keywords that are related to popular query terms rather than to the actual content of the pages. This type of spam is also known as Search Engine Optimization (SEO).
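The link-based ranking described above can be illustrated with PageRank, one well-known link-based ranking algorithm. The following is a minimal Python sketch, not the ranking method of Google or any other particular search engine; the seven-page link graph, the page names, the damping factor of 0.85, and the fixed 100 iterations are all hypothetical choices made for illustration. In the example graph, the page "spam" receives a single link from the well-linked page "popular", while "other" receives a single link from "low", a page nothing links to.

```python
# Minimal power-iteration PageRank, used only as an example of a
# link-based ranking algorithm. The example link graph is hypothetical.


def pagerank(links, damping=0.85, iterations=100):
    """Return a score for every page; `links` maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page first receives an equal "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score evenly to every page it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: "popular" is linked from three pages and links to "spam";
# "other" is linked only from "low", which no page links to.
EXAMPLE_LINKS = {
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
    "popular": ["spam"],
    "spam": ["a"],
    "low": ["other"],
    "other": ["a"],
}

if __name__ == "__main__":
    ranks = pagerank(EXAMPLE_LINKS)
    for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
        print(f"{page:8s} {score:.3f}")
```

In this sketch, "spam" ends up ranked well above "other" even though each receives exactly one inbound link, because the score of "spam" is driven almost entirely by the link it receives from the highly ranked "popular" page. This is the property link spam tries to exploit.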
When a spammer uses cloaking, the spammer serves one version of a web page's content to the search engine and a different, altered version to the visitors of the website (11). In other words, instead of infecting the consumer via a link in an email that leads to a malicious site, the spammer alters a particular web page with the malicious content. For example, if a consumer uses a search engine such as Google to search for 'target', Google will generate results related to the word 'target'. The consumer will find the link that leads to target.com and click it, expecting it to open the official Target web page. If the spammer has used cloaking on this web page, however, the consumer is led to an altered version of target.com with malicious content embedded.

There are simple, legal technical measures that can be used to block spam emails. Blacklists and keyword-based filters can be used, but these measures have not had much of an effect (14). One reason these filters do not work as well as hoped is that spammers quickly learn which words are being filtered and simply alter those keywords, ensuring that their particular spam gets through the filters.

Why Use These Illicit Online Pharmacies?

United States citizens spent nearly $2.6 trillion on healthcare in 2011, about ten times the $256 billion spent in 1980 (15; 16). With rising unemployment and decreasing household incomes, many Americans are more concerned about health spending and affordability (15). Many consumers find it difficult to pay for essential medications and may choose to leave their prescriptions unfilled (17). When a person absolutely needs a medicine but cannot afford it, one option is to turn to an illicit online pharmacy offering a cheaper alternative. Illicit pharmacies sell pharmaceuticals ranging from male enhancement drugs to antibiotics to blood pressure medications (18).
The Center for Medicine in the Public Interest estimated that sales of counterfeit medicines are growing twice as fast as sales of legitimate medicines (17). From 2004 to 2010, counterfeit medicine sales increased 13% annually, compared to a 7.5% annual increase for legitimate medicines. This increase is likely due to the fact that the illicit sites offer significant discounts over what a traditional pharmacy could offer, even with insurance (17). The ability to purchase medications without a prescription or insurance, at significantly lower cost, offers a huge incentive to consumers.

Use as Directed

Purchasing prescription drugs online without a prescription is illegal. These drugs require a prescription because of the dangers that arise when they are taken irresponsibly. Many of these prescription drugs are also controlled substances, meaning that purchasing them without an official prescription from a practicing doctor is unlawful (19). Purchasing controlled drugs from illicit online pharmacies is a federal crime and could lead to a prison sentence. Many of these illicit online pharmacies advertise an online doctor who can write prescriptions, but in reality these online doctors are not recognized under the law; therefore, the "prescriptions" they write are not legal (19).

Besides being illegal, purchasing prescription medication without a prescription is also dangerous. With any medication, there are side effects that should be taken into account, as well as the medication's interactions with other drugs. Buying prescription medications online may be easier, but it is by no means safe. It is not known where these medications come from. The medications are not inspected by regulatory agencies such as the Food and Drug Administration (FDA), nor are they manufactured under safe conditions (20).
Medicine purchased online could contain additives (21), be expired, be the wrong dosage, or lack correct dosage directions (19). Another issue with purchasing drugs from these illicit online pharmacies is that the active ingredient of the original drug may not be present in the drug advertised. For example, some illicit pharmacies advertise female sexual enhancers, Pink Female Viagra (Sildenafil) and Female Cialis (Tadalafil) (22). Both Viagra and Cialis have been approved only for men, so the female versions of Viagra and Cialis may not contain any sildenafil or tadalafil at all and could be dangerous to consume.

The World Health Organization (WHO) received 46 confidential reports from 20 different countries related to counterfeit drugs (23). The reports included information about products without the active ingredient, with incorrect quantities of the active ingredient, with wrong ingredients, with correct quantities of the active ingredient but fake packaging, and with impurities and contaminants, as well as products that were copies of the original. The largest share of the products, 32.1%, did not contain the active ingredient. Products that contained wrong ingredients accounted for 21.4%, and products that contained incorrect quantities of the active ingredient accounted for 20.2%. Products that contained the correct active ingredient but fake packaging accounted for 15.6%, and products that contained high levels of impurities and contaminants accounted for 8.5%. Only 1% of the products were copies of the original product (23).

Viagra has gained wide acceptance as a treatment for erectile dysfunction (ED) since its debut in 1998 (21), but many men are still too embarrassed to purchase the drug openly. The drug has also become very popular with younger men. On the black market, Viagra sells for $25-$30 per pill, but it can be purchased for less than $1 on illicit online pharmacies (18; 21).
Purchasing Viagra online allows the buyer both to remain anonymous and to save money (21). Viagra has become a party drug on college campuses, at clubs, and at rave parties (21). At these parties, it is often taken in a life-threatening combination with street drugs such as Ecstasy, to offset the decrease in sexual arousal experienced with Ecstasy. ED drugs are also being combined with "poppers," or amyl nitrite. Both of these drugs dilate blood vessels, which can result in a sudden drop in blood pressure and cause a heart attack or stroke. For that same reason, Viagra is not prescribed to patients taking nitrates for certain heart conditions. The combination of Viagra and nitrates has resulted in several reported deaths (21). Drug task force agents in Athens, Georgia, have reported that they routinely discover Viagra in the possession of college males who neither have ED problems nor possess a valid prescription for it (24). The students steal it from their parents, order it online, or buy it from their friends. While the ED drugs Cialis and Levitra are also drugs of interest in this research, they have not yet reached Viagra's level of popularity and are not abused as heavily as Viagra (25).

Affiliate Programs, Affiliates, and Affiliate Sites

Pharmaceutical spam leads a consumer to a website that sells many different prescription pharmaceuticals without requiring a prescription. Upon clicking a link in a spam message, the consumer is led or forwarded to an affiliate site such as Canadian Pharmacy (CP) (Figure 1).

Figure 1- Example of Canadian Pharmacy template

Affiliate programs are responsible for providing the template and source code for the affiliate sites (26). The graphics, as well as the selection and arrangement of the products sold, define the template of an affiliate site. Most affiliate programs are free for affiliates to join.
An affiliate program may use similar templates for all of its affiliates, or it may provide a different template to each affiliate. Often, multiple affiliate sites are run by the same affiliate. Affiliates also work for multiple affiliate programs at once, so the different affiliate sites run by one affiliate may each have been provided by a different affiliate program. It is also possible for an affiliate to leave one affiliate program and begin working with another.

The affiliate programs provide their affiliates with the source code and templates for the affiliate sites, and the affiliates decide on their own manner of advertisement. Once the affiliates have the source code and templates, they register domains for the affiliate sites through a domain registrar such as Go Daddy. Once the domains are registered, the affiliates can host the affiliate sites on numerous domains. Once the domains are active, if the affiliates choose spam as their form of advertisement, they generate spam emails containing links to the affiliate sites and send out the spam via botnets. The affiliates handle the shopping experience for the customer, but when the customer is ready to check out, the affiliate program regains control (26). The affiliate programs are responsible for obtaining contracts with outside parties for payment and fulfillment services (5). Often, the customer is redirected to a payment gateway site (26). The payment gateway can be obvious to the customer, or a proxy mode can be installed so that the customer is not aware of the redirection.

Glavmed/SpamIt

Glavmed is an affiliate program that promoted illicit online pharmacies (27). The glavmed.com domain was registered in March 2006 (28). The name SpamIt is sometimes used to refer to Glavmed, and SpamIt is also described as the sister company of Glavmed.
The spamit.com domain was registered in June 2004 with the same administrative contact name as the glavmed.com domain (28). The only differences between the glavmed.com and spamit.com registrations were the corresponding postal address and phone number. SpamIt was a forum whose members were affiliates with access to high-end pharmaceuticals, such as Percocet. The high-end pharmaceuticals are often controlled medications. Only members of SpamIt were allowed to sell the high-end pharmaceuticals, while non-members were only allowed to sell the other pharmaceuticals offered by Glavmed. Glavmed focused on websites, SEO, and banner advertising, while SpamIt was more directly tied to spam emails (28).

Igor A. Gusev was known to be in charge of both Glavmed and SpamIt (29). The Russian authorities opened a criminal investigation of Gusev, and he was charged with operating a pharmacy without a license and failing to register a business. With charges pending, SpamIt "decided" to close down in September 2010, while Glavmed continued with business as usual, paying affiliates to promote illicit online pharmacies (30). Before SpamIt could close down, however, three years' worth of clientele from the Glavmed/SpamIt database was exposed. Eventually, the Russian authorities made efforts to crack down on spam distributed from within Russia, and Glavmed was also shut down. Gusev is said to still be at large and is wanted in Russia on those charges (31). The three years of leaked clientele information showed that Glavmed's affiliates sold knockoff prescription drugs to more than 800,000 consumers and processed over $1.8 million, which generated at least $150 million in revenue (30). Following SpamIt's closure in October 2010, there was a 40% decrease in spam volume (32). This did not last, however, as spam emails leading to illicit pharmacies quickly returned to their previous highs. The Glavmed database contained 705 unique entries.
Those entries included, but were not limited to, drugs, drug tests, wines, and veterinary medicines. Along with selling prescription drugs without a prescription, Glavmed was also responsible for selling discontinued drugs far beyond their discontinuation dates. Investigations of the known Glavmed database showed 46 discontinued drugs actively being sold after their discontinuation dates. It is not known exactly when, but at some point around the end of 2012 or the beginning of 2013, the Glavmed website was back up and running, hosted in Dallas, Texas. There is no evidence of any real activity occurring on the site, but it is notable that the site is live again.

Microsoft Security Intelligence Reports

Since January 2006, Microsoft has published a Security Intelligence Report (SIR) every six months that focuses on security data and trends. This report provides the public with in-depth perspectives on potentially unwanted software, software vulnerabilities and exploits, and malicious code threats. One of the topics in these reports is the different types of spam messages blocked by Microsoft Exchange Hosted Services (EHS) (33). There are 14 volumes of the SIR to date, but Microsoft did not begin documenting the percentages for the different types of blocked spam until the fourth volume. In Volume 4, the spam category of interest to this research was RX/Herbal spam. Microsoft recorded the percentage of spam blocked in this category in 2004 and 2007. In 2004 the percentage for the RX/Herbal category was 10%, and in 2007 it tripled to 31% (34). Beginning in Volume 5, the spam categories of interest were pharmacy-sexual (Viagra, Cialis, Levitra, etc.) and other pharmacy, or pharmacy-non sexual. Table 1 shows the percentage of blocked spam in each category.
A high percentage of the spam blocked in early 2008 fell in the pharmacy-sexual category, but this percentage decreased drastically over the following periods. The percentage of blocked pharmacy-non sexual spam fluctuated across the SIRs and peaked in Volume 13 (January – June 2012). Microsoft draws no conclusions about why the pharmacy-sexual category has decreased consistently over the years, nor about why the pharmacy-non sexual category fluctuates as it does.

Table 1
Volume 5-14 percentages of spam messages blocked by EHS for the pharmacy-sexual and pharmacy-non sexual categories of spam

Volume                   Pharmacy-Sexual   Pharmacy-Non Sexual
5 (Jan. – June 2008)     30.6%             20.9%
6 (July – Dec. 2008)     10.0%             38.6%
7 (Jan. – June 2009)     7.8%              40.5%
8 (July – Dec. 2009)     6.4%              31.7%
9 (Jan. – June 2010)     3.3%              31.9%
10 (July – Dec. 2010)    3.3%              32.4%
11 (Jan. – June 2011)    3.8%              28.0%
12 (July – Dec. 2011)    3.2%              46.5%
13 (Jan. – June 2012)    3.4%              46.7%
14 (July – Dec. 2012)    1.4%              43.8%

Previous Studies

Wei et al. (2007)

The current method for investigating spam emails was developed by Wei et al. in 2007 (35). Spam email messages were clustered into groups based on images from the websites linked in the messages. Once the associated messages were clustered, a validation process tested whether the images within each cluster were actually related. Validation was performed at one or two levels. The first level was a visual inspection of the clusters: if all the website images in a cluster were the same, the cluster's integrity was given high confidence. If the cluster contained divergent images, the second level of validation was performed.
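The two-level decision used by Wei et al. can be summarized in a few lines of code. The sketch below is a hypothetical illustration, not the authors' implementation: the `Cluster` record, the tuples standing in for WHOIS results (hosting IP, registrant, name server), and the `ratio` cut-off (distinct IPs, registrants, and name servers must each number at most 10% of the domains) are assumptions introduced here. The usage example mirrors the first cluster reported in the study: 100 domains and 22 image patterns on six IP addresses, with two registrant sets and three name-server sets.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical record for one image-based cluster of spam messages."""
    image_patterns: set[str]
    # domain -> (hosting IP, registrant, name server): stand-in for WHOIS results
    whois: dict[str, tuple[str, str, str]] = field(default_factory=dict)


def validate(cluster: Cluster, ratio: float = 0.1) -> str:
    """Two-level validation in the spirit of Wei et al.; `ratio` is an assumed cut-off."""
    # Level 1: a single image pattern across the cluster -> high confidence.
    if len(cluster.image_patterns) == 1:
        return "high (level 1)"
    # Level 2: images diverge, so fall back to the WHOIS data.
    records = list(cluster.whois.values())
    if not records:
        return "low"
    limit = max(1, int(len(records) * ratio))
    # Count distinct IPs, registrants and name servers (tuple positions 0, 1, 2).
    distinct = [len({rec[i] for rec in records}) for i in range(3)]
    # Many domains sharing a handful of hosts, registrants and name servers
    # suggests a common operator behind the divergent images.
    return "high (level 2)" if all(n <= limit for n in distinct) else "low"


shared = Cluster(
    image_patterns={f"img{i}" for i in range(22)},
    whois={f"d{i}.example": (f"10.0.0.{i % 6}", f"reg{i % 2}", f"ns{i % 3}")
           for i in range(100)},
)
print(validate(shared))  # high (level 2)
```

The level-2 rule is deliberately crude; the study itself judged the WHOIS evidence by inspection rather than by a fixed threshold.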
The second level of validation checked the domains in the cluster(s) against WHOIS, a database of information on registered domains and IP addresses, which returned the hosting, registrant, and name server information associated with the domains. The seven largest of the 42 clusters each contained more than 100 messages. At the first level of validation, five of these seven clusters each contained exactly one image pattern and were given a high level of validity. The remaining two clusters needed the second level of validation, because Wei et al. determined that each contained multiple image patterns. For the first of these, WHOIS showed that the cluster contained 100 domains and 22 image patterns, but the domains were hosted on only six IP addresses, with two unique sets of registrant data and three sets of name servers. The second cluster contained numerous divergent images, and the second level of validation did not show a high correlation within it. Wei et al. believed that false positives within this cluster caused the divergent images. These validation levels offered a new approach to analyzing spam emails.

Zhang et al. (2009)

Zhang et al. (10) took a similar data-mining approach to studying spam emails in 2009. The authors focused on the image attachments in spam emails in order to identify phishing groups or spam clusters. They analyzed the background textures, foreground text layouts, and foreground picture illustrations and then extracted the visual features from the spam images. Once the features were extracted, visually similar spam images were clustered using an “unsupervised clustering algorithm”. Zhang et al.
concluded that the “unsupervised clustering algorithm” was very effective for verifying visual similarities between spam images and for providing key information about the images and their common source. The algorithm also helped to automate the visual validation of the spam clustering results.

Levchenko et al. (2011)

The data collection methodology in this research was based on a method developed by Levchenko et al. (5) in 2011. Spam emails were collected in a data mine and their URLs were investigated. The websites were then clustered by similarity of content, and purchases were made from each cluster. The authors focused on replica goods, counterfeit software, and pharmaceuticals, including herbal goods. They found only 13 distinct banks acting as Visa acquirers and 13 different suppliers. Most of the herbal and replica purchases were cleared through the same bank in St. Kitts (West Indies). The pharmaceutical affiliate programs used a bank in Latvia and a bank in Azerbaijan, and the purchased software was cleared through one bank in Latvia and one bank in Russia. All of the pharmaceuticals were shipped from India except one, which was shipped from the United States (US). The replicas were shipped from China, and the herbal goods were shipped mostly from the US, but also from China and New Zealand. No clear supplier was identified for the counterfeit software. Levchenko et al. concluded that the payment tier is the most valuable asset in the spam ecosystem.

Spam Data Mine

While most people are trying to eliminate spam from their email accounts, the UAB Spam Data Mine is collecting it. Spam is collected from UAB staff and students and entered into the spam data mine. Spam is also collected from expired domain names purchased by Internet Identity¹, on which the email accounts do not exist.
These email accounts have a “catch-all” policy on the filter, so the domains collect everything sent to them, all of which is treated as spam (36). Once in the data mine, the emails are parsed into database fields. The email subject, sender's email address, sender's IP address, sender email ID (email header information), and usernames (7; 37) are stored in one table. Any URLs in the email are stored in a second table, and information about attachments is stored in a third table (37). The data mine also extracts the websites' domain names and IP addresses (7). The remainder of this thesis describes the virtual machine (VM) used, the methodologies used throughout this research, how spam campaigns were developed, and data collection using PostgreSQL (PSQL) (Chapter 2); the results of the research and a discussion of those results (Chapter 3); and the conclusions of the research (Chapter 4).

¹ Anti-Phishing Company, Tacoma, Washington

CHAPTER 2
DATA COLLECTION METHODOLOGY

Virtual Machine

A virtual machine (VM) was run using the VMware² Workstation software to protect the computer from infection by malware or other invasive software. The VM is an operating environment that runs as a normal application and has no contact with the host operating system. The computer used in this research runs a Windows operating system, and the VM runs a Linux operating system inside it. The VM is built on a base image, and any downloaded software or data are written to temporary files. Each time the VM is shut down, the temporary files are deleted and only the uncontaminated base image remains. This prevented the computers from being infected with viruses during the analysis.

² Palo Alto, CA

Collecting the Data

Two methods were used to collect the data: method 1 and method 2.
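The spam, subject, and domain queries listed in Table 2a join three tables of the data mine. To make those joins concrete, the sketch below recreates a toy version of the tables in an in-memory SQLite database (a stand-in for the PostgreSQL data mine) and runs the spam query and the subject query against it. The table and column names follow Table 2a, but the column types, the placement of `machine` and `path` in `spam_link`, and every row are hypothetical; `?` placeholders replace the literal dates and subjects.

```python
import sqlite3

# Toy stand-in for the data mine. Table and column names follow Table 2a; the
# column types and the placement of machine/path in spam_link are assumptions.
SCHEMA = """
CREATE TABLE spam (message_id INTEGER PRIMARY KEY, subject TEXT, receiving_date TEXT);
CREATE TABLE link_url (message_id INTEGER, urlid INTEGER);
CREATE TABLE spam_link (urlid INTEGER PRIMARY KEY, machine TEXT, path TEXT);
"""

# Spam query: how many entries each subject had on one day.
SPAM_QUERY = """
SELECT count(*) AS count, subject FROM spam
WHERE receiving_date = ? GROUP BY subject ORDER BY count DESC
"""

# Subject query: the machines (domains) and paths linked from one subject.
SUBJECT_QUERY = """
SELECT count(*) AS count, subject, machine, path
FROM spam a, link_url b, spam_link c
WHERE a.message_id = b.message_id AND b.urlid = c.urlid
  AND receiving_date = ? AND subject = ?
GROUP BY subject, machine, path ORDER BY count DESC
"""


def run_demo() -> tuple[list, list]:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Hypothetical rows: two messages share a subject and a link; one is unrelated.
    con.executemany("INSERT INTO spam VALUES (?, ?, ?)", [
        (1, "BUY NOW VIAGRA CIALIS !!!", "2012-3-18"),
        (2, "BUY NOW VIAGRA CIALIS !!!", "2012-3-18"),
        (3, "Meeting agenda", "2012-3-18"),
    ])
    con.executemany("INSERT INTO spam_link VALUES (?, ?, ?)",
                    [(10, "pills.example", "/"), (11, "news.example", "/agenda")])
    con.executemany("INSERT INTO link_url VALUES (?, ?)", [(1, 10), (2, 10), (3, 11)])
    daily = con.execute(SPAM_QUERY, ("2012-3-18",)).fetchall()
    links = con.execute(SUBJECT_QUERY, ("2012-3-18", "BUY NOW VIAGRA CIALIS !!!")).fetchall()
    con.close()
    return daily, links


daily, links = run_demo()
print(daily)  # [(2, 'BUY NOW VIAGRA CIALIS !!!'), (1, 'Meeting agenda')]
print(links)  # [(2, 'BUY NOW VIAGRA CIALIS !!!', 'pills.example', '/')]
```

Passing the date and subject as parameters, rather than pasting them into the SQL text, also sidesteps quoting problems with subjects that contain apostrophes.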
Method 1 was used from December 2011 through February 2012, and method 2 from March 2012 through August 2012. A ‘spam query’ was used to retrieve all of the spam subjects for one day, along with the number of entries in the spam data mine for each subject on the queried date. An example of the spam query is shown in Table 2a, and its results are shown in Figure 2. Using the results of the spam query, the subjects were manually analyzed to determine which could be labeled as pharma spam. Subjects were labeled as pharma spam partly by instinct and partly by manually checking them for key words indicative of pharmaceutical spam (Table 3a, Table 3b). Once the data were collected, they were further analyzed and potential spam campaigns were developed.

Table 2a
Queries used to generate results and the query's target

Spam Query
Target: Total spam subjects for a particular day
Example spam query:
    Select count(*), subject from spam
    where receiving_date = '2012-3-18'
    group by subject order by count desc;

Subject Query
Target: All domains associated with a specific subject
Example subject query:
    Select count(*), subject, machine, path
    from spam a, link_url b, spam_link c
    where a.message_id = b.message_id and b.urlid = c.urlid
      and receiving_date = '2012-6-19'
      and subject = 'saleoff:Viaqra ppil - the only secret of perfect'
    group by subject, machine, path order by count desc;

Domain Query
Target: All subjects associated with a specific domain
Sample of domain query:
    Select * from spam a, link_url b, spam_link c
    where a.message_id = b.message_id and b.urlid = c.urlid
      and receiving_date = '2012-3-18' and machine = 'pillstomp.ru';

Figure 2- Results of the spam query

Table 3a
Key words used for determining pharma spam subjects in Method 1
Viagra, Cialis, Levitra, Sildenafil, Tadalafil, Vardenafil

Table 3b
Key words used for determining pharma spam subjects in Method 2
Viagra, Cialis, Levitra, Sildenafil, Tadalafil, Vardenafil, Pharmacy, Drugs, Pills, Store, Rx, Prescription

At the beginning of the research, method 1 was adequate because there were numerous pharma spam subject entries in the spam data mine each day, and all of the subjects contained a key word from Table 3a, making the pharma spam subjects easily detectable. January 21st, 2012 marked the first noticeable change in the amount of daily pharma spam in the spam data mine: on that day, a considerable decrease was observed, and the decrease continued through the end of January.

The hypothesis behind this decrease was that the spammers had begun to notice the same IP address visiting their websites numerous times a day, every day. In an attempt to thwart the investigative efforts, the spammers began to obfuscate the pharma spam by using less obvious subject lines, so the pharma spam subjects were harder to detect. The obfuscation continued into February, so starting February 27th, all of the subjects that appeared with at least 1,000 entries in the data mine were analyzed, not just those that contained the specified key words. Because the amount of daily pharma spam continued to decrease as February came to an end, method 2 was used to analyze the daily pharma spam beginning in March. During method 2, another obfuscation effort was noticed: the spammers appended international characters to the end of the pharma spam subjects.
For example, a subject that actually appeared with a total of 1,564 entries in the data mine on one day could, once random international characters were appended, show up as several separate subjects with 385, 294, 375, 256, and 254 entries. Each time the subject appeared, a different set of randomized international characters was attached to it. This obfuscation gave the illusion that there were fewer pharma spam subject entries in the data mine than there actually were.

Method 1: December 2011 - February 2012

The spam query was used to retrieve all of the spam collected each day in the spam data mine. From its results, the subjects that appeared with at least 1,000 entries per day were manually analyzed, and with the aid of the key words (Table 3a), certain subjects could be labeled as pharma spam. Once labeled, the pharma spam subjects were further analyzed using a subject query (Table 2a), which returns the machines and/or domains, along with the paths, associated with each subject. The domains were then opened in the VM to determine whether they led to an affiliate site. A sample of results from the subject query is shown in Figure 3. Each subject, its associated domains, and the affiliate sites the domains led to were recorded. The amount of pharma spam observed each month was recorded, as well as the number of spam campaigns associated with each affiliate site. As each distinct affiliate site was identified, screenshots of its main page, contacts page, and logos were taken and saved.

Figure 3- Results of the subject query

Method 2: March 2012 - August 2012

Method 2 was developed to help counter the obfuscation attempts of the spammers.
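The method 1 screening step (at least 1,000 entries per day plus a key word from Table 3a) and its weakness against obfuscated subjects can be illustrated with a short filter. This is a simplification: in the research the subjects were labeled manually with the key words as an aid, whereas the sketch automates the check. The whole-word, case-insensitive matching, the `min_entries` parameter, and the daily counts are assumptions; the two spam subjects are taken from Table 2a and the text.

```python
import re

METHOD_1_KEYWORDS = ["Viagra", "Cialis", "Levitra", "Sildenafil", "Tadalafil", "Vardenafil"]
METHOD_2_KEYWORDS = METHOD_1_KEYWORDS + ["Pharmacy", "Drugs", "Pills", "Store", "Rx", "Prescription"]


def screen_subjects(daily_counts: dict[str, int], keywords: list[str],
                    min_entries: int = 1000) -> list[str]:
    """Subjects with at least `min_entries` entries that day that contain a key word."""
    # Whole-word, case-insensitive match on any key word (an assumed rule).
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b",
                         re.IGNORECASE)
    return [subject for subject, count in daily_counts.items()
            if count >= min_entries and pattern.search(subject)]


counts = {  # hypothetical daily entry counts
    "BUY NOW VIAGRA CIALIS !!!": 5200,
    "saleoff:Viaqra ppil - the only secret of perfect": 3100,
    "Cheap Cialis": 400,
    "Meeting agenda": 1500,
}
print(screen_subjects(counts, METHOD_1_KEYWORDS))
# ['BUY NOW VIAGRA CIALIS !!!']
print(screen_subjects(counts, METHOD_2_KEYWORDS, min_entries=1))
# ['BUY NOW VIAGRA CIALIS !!!', 'Cheap Cialis']
```

The misspelled "Viaqra ppil" subject slips past both key word lists, which is why the manual review of all subjects, and later the domain query, remained necessary.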
As with method 1, each subject, its associated domains, the affiliate sites the domains led to, and the number of spam campaigns continued to be recorded, and screenshots of the affiliate sites' main pages, contact pages, and logos were still collected. The spam query was again used to retrieve all the spam subjects for one day, and the subject query was used on the labeled pharma spam subjects. The changes in method 2 were additional key words for labeling spam as pharma spam (Table 3b); analysis of all of the spam subjects that appeared in the data mine each day, not just those with at least 1,000 entries; and use of the domain query (Table 2a). The domain query takes a domain as input and returns all subjects associated with that domain, along with the associated IP addresses, sender names, and usernames. The domains were chosen at random from the results generated by the subject query. The sender names, usernames, and IP addresses returned by the domain query were discarded because they were most likely associated with infected computers rather than with the affiliate site. Using the domain query helped to retrieve pharma spam subjects that may have been overlooked during the manual analysis of the daily spam subjects.

International Characters and Running Queries with PSQL

PSQL returns an error when a query contains international characters such as “Ô”. If a subject being queried contained an international character, the character was replaced with a percent sign (%), which functions as a wildcard, and the equal sign (=) in the query was replaced with “like”. Table 2b includes a subject that contains an international character in bold and italicized font.
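The wildcard substitution can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the PostgreSQL data mine; the one-column `spam` table and the second random suffix ("qz7k") are hypothetical. One caveat when transferring results: SQLite's LIKE ignores the case of ASCII letters by default, whereas PostgreSQL's LIKE is case-sensitive (ILIKE is its case-insensitive form). In both, "_" is also a wildcard that matches exactly one character.

```python
import sqlite3


def like_search(subjects: list[str], pattern: str) -> list[str]:
    """Return the subjects matching a SQL LIKE pattern ('%' = any run of characters)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spam (subject TEXT)")
    con.executemany("INSERT INTO spam (subject) VALUES (?)", [(s,) for s in subjects])
    # Binding the pattern as a parameter avoids quoting problems in the subject text.
    rows = con.execute(
        "SELECT subject FROM spam WHERE subject LIKE ? ORDER BY rowid", (pattern,)
    ).fetchall()
    con.close()
    return [subject for (subject,) in rows]


base = ("Over 75.000 customers trust us. Buy Cheap Viagra & Cialis Pills. "
        "We accept Visa, Mastercard, AmEx & ACH.")
subjects = [
    "Buà Ciails and Viarga online!",
    base + " vixd1",
    base + " qz7k",  # hypothetical second randomized suffix
    "Meeting agenda",
]
# Modified subject query 1: '%' in place of the international character.
print(like_search(subjects, "Bu% Ciails and Viarga online!"))
# ['Buà Ciails and Viarga online!']
# Modified subject query 2: '%' at both ends absorbs the random suffixes.
print(len(like_search(subjects, "%" + base + "%")))  # 2
```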
The international character(s) in the subject query are replaced with the percent sign (bold, italicized) and the equal sign is changed to “like” (bold), creating modified subject query 1. Figure 4 shows the query with the equal sign replaced with ‘like’ and the international characters replaced with the percent sign, along with the results of modified subject query 1.

Table 2b
Modified queries used with international characters, an example of the international characters, and the target of the queries

Modified Subject Query 1
Target: Domains associated with a specific subject containing international characters
Sample subject with international character: Buà Ciails and Viarga online!
Sample of modified query 1:
    Select count(*), subject, machine, path
    from spam a, link_url b, spam_link c
    where a.message_id = b.message_id and b.urlid = c.urlid
      and receiving_date = '2012-6-19'
      and subject like 'Bu% Ciails and Viarga online!'
    group by subject, machine, path order by count desc;

Modified Subject Query 2
Target: All of the exact subjects used in the query without the additional characters
Sample subject with additional randomized characters: Over 75.000 customers trust us. Buy Cheap Viagra & Cialis Pills. We accept Visa, Mastercard, AmEx & ACH. vixd1
Sample of modified query 2:
    Select count(*), subject, machine, path
    from spam a, link_url b, spam_link c
    where a.message_id = b.message_id and b.urlid = c.urlid
      and receiving_date = '2012-7-3'
      and subject like '%Over 75.000 customers trust us. Buy Cheap Viagra & Cialis Pills. We accept Visa, Mastercard, AmEx & ACH.%'
    group by subject, machine, path order by count desc;

Modified Domain Query
Target: All subjects associated with only the domain name
Sample of domain with machine attached: fui.drugsspee.ru
Sample of modified domain query:
    Select * from spam a, link_url b, spam_link c
    where a.message_id = b.message_id and b.urlid = c.urlid
      and receiving_date = '2012-3-18' and machine like '%.drugsspee.ru%';

Figure 4- Results of the modified subject query 1

The wildcard was also used when a subject contained a randomized set of characters, usually attached at the end of the subject. In this case, modified subject query 2 was used. Table 2b includes an example of a subject containing randomized additional characters (bold and italicized). To get accurate results, percent signs (bold, italicized) are placed at the beginning and at the end of the subject, in place of the randomized additional characters, and the equal sign is changed to “like” (bold). Modified subject query 2 returns all subjects that contain exactly the characters between the percent signs, regardless of the additional characters at the end of each subject. An example of modified subject query 2 and its results are shown in Figure 5. The number of entries for the sample subject that actually appeared in the data mine on July 5, 2012 was 1,181 (Figure 6).

Figure 5- Results of the modified subject query 2
Figure 6- Depiction of the actual amount of times the subject appeared

Domain queries also occasionally needed to be modified (Table 2b). The modified domain query is similar to modified subject query 2. The example in Table 2b has a machine (bold, italicized) attached to the domain (underlined). The machine is random and changes with the subject, but the domain remains constant.
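The modified domain query and the subsequent duplicate removal can be sketched the same way. Here an in-memory SQLite table stands in for the joined data mine tables; the flattened `links` table, the machine prefixes other than "fui.", and the subject names are hypothetical. `SELECT DISTINCT` does in SQL what the spreadsheet duplicate-removal step described in the text did.

```python
import sqlite3


def subjects_for_domain(rows: list[tuple[str, str]], domain: str) -> tuple[int, list[str]]:
    """rows are (machine, subject) pairs; returns (total entries, distinct subjects)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE links (machine TEXT, subject TEXT)")
    con.executemany("INSERT INTO links VALUES (?, ?)", rows)
    pattern = f"%{domain}%"  # drop the random machine prefix, keep the domain
    total = con.execute(
        "SELECT count(*) FROM links WHERE machine LIKE ?", (pattern,)
    ).fetchone()[0]
    # SELECT DISTINCT plays the role of the spreadsheet duplicate removal.
    distinct = [subject for (subject,) in con.execute(
        "SELECT DISTINCT subject FROM links WHERE machine LIKE ? ORDER BY subject",
        (pattern,),
    )]
    con.close()
    return total, distinct


rows = [  # hypothetical entries
    ("fui.drugsspee.ru", "Subject A"),
    ("xyz.drugsspee.ru", "Subject A"),
    ("abc.drugsspee.ru", "Subject B"),
    ("mail.example.org", "Subject C"),
]
print(subjects_for_domain(rows, ".drugsspee.ru"))  # (3, ['Subject A', 'Subject B'])
```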
To receive accurate results, the machine name is removed from the query, the domain is enclosed in percent signs (bold, italicized), and the equal sign is changed to “like” (bold). The results of the modified domain query are circled in Figure 7. The actual number of subject entries associated with the domain ‘.drugsspee.ru’ on March 18, 2012 was 7,995, enclosed in the box in Figure 8. After all of the duplicate subjects returned by the modified domain query were removed in an Excel spreadsheet, the number of distinct subjects associated with the domain was only ten (Figure 9).

Figure 7- Results of the modified domain query
Figure 8- Depiction of the actual number of subjects associated with the domain ".drugsspee.ru"
Figure 9- Results of removing the duplicate subjects in Excel

Spam Campaigns

Beginning in November, the daily spam collected from the UAB spam data mine was closely examined to determine trends among the pharma spam subjects. Once trends among the subjects were established, similar subjects were put into a group identified as a potential spam campaign. The association of domains to subjects was the primary criterion for inclusion in a spam campaign: if multiple subjects shared at least one common domain, those subjects were grouped together into a spam campaign. Within a spam campaign, the domains connecting the subjects may not be the same. For example, subject A in a particular spam campaign has domains a, b, and c associated with it; subject B in the same campaign has domains b, d, and e; and subject C, also in the same campaign, has domains d, f, and g. Subjects A and B share domain b, and subjects B and C share domain d, so all three subjects are connected even though no single domain is common to all of them and A and C share no domain directly. This was most often the case when there were many subjects in a spam campaign.
For that reason, only one domain connection among subjects needed to be apparent in order to group different subjects into the same spam campaign. The affiliate sites to which the domains opened were also taken into consideration when grouping subjects into spam campaigns. If the same domain was associated with different subjects, that domain would always open to the same affiliate site, if it loaded at all. In a few cases, a spam campaign has multiple subjects that share the same domains, but the different domains within the campaign open to various affiliate sites, not just one. Generally, though, all the domains in each spam campaign opened to the same affiliate site. As the subjects were grouped into spam campaigns, trends in subject structure were identified: many of the spam campaigns contained subjects with similar subject line formats. These trends included common misspellings, capitalization of one letter or of the whole subject line, short and/or long subject lines, and the use of international characters. Once the trends in the structure of the subject lines were identified, they were also taken into consideration when grouping similar subjects. From there, the spam campaigns could be analyzed for any overlap among them. Each active spam campaign is designated by the affiliate site abbreviation followed by a dash and a capital letter; for example, a spam campaign for Pharmacy Express (PE) is designated PE-A. Some subjects appeared only once or twice during the research and were not observed consistently enough to be put into spam campaigns. Other subjects were put into spam campaigns, but those campaigns are not discussed because they did not appear consistently enough throughout the research.
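The domain rule for forming spam campaigns (subjects belong together if they share a domain, directly or through other subjects) amounts to finding connected components. A minimal union-find sketch follows, assuming a mapping from each subject to the set of domains observed for it. It covers only the domain rule: the research grouped subjects manually and also weighed the affiliate sites and the structure of the subject lines, which this sketch ignores. The example reproduces subjects A, B, and C from the text, plus a hypothetical unrelated subject D.

```python
from collections import defaultdict


def group_campaigns(subject_domains: dict[str, set[str]]) -> list[set[str]]:
    """Group subjects linked, directly or through other subjects, by a shared domain."""
    parent = {subject: subject for subject in subject_domains}

    def find(s: str) -> str:
        # Follow parent links to the group's representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    owner: dict[str, str] = {}  # first subject seen with each domain
    for subject, domains in subject_domains.items():
        for domain in domains:
            if domain in owner:
                parent[find(subject)] = find(owner[domain])  # merge the two groups
            else:
                owner[domain] = subject

    groups: dict[str, set[str]] = defaultdict(set)
    for subject in subject_domains:
        groups[find(subject)].add(subject)
    return list(groups.values())


example = {
    "A": {"a", "b", "c"},
    "B": {"b", "d", "e"},
    "C": {"d", "f", "g"},
    "D": {"x"},  # hypothetical unrelated subject
}
print(sorted(sorted(g) for g in group_campaigns(example)))  # [['A', 'B', 'C'], ['D']]
```

A and C end up in the same group through B, matching the transitive grouping described above.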
CHAPTER 3
RESULTS AND DISCUSSION

The results of method 1 cover December 2011 through February 2012, and the results of method 2 cover March 2012 through August 2012. Each month has a table illustrating the appearances and disappearances of each spam campaign active that month. The results from December through February focus on the frequency of the daily pharma spam analyzed and on the number of spam campaigns active each month. The results from March through August focus solely on the appearances and disappearances of the spam campaigns and any overlaps among them. Significant observations made throughout the research are also discussed.

Monthly Results

December 2011

The pharma spam subjects that appeared with at least 1,000 entries in the spam data mine each day were analyzed. Using method 1, a total of 452 such pharma spam subjects were identified for December (Figure 10). These 452 subjects accounted for 1,965,222 entries in the spam data mine in December. Of the 452 subjects collected, 115 were distinct; many subjects were repeated throughout the month. Each day, the subject that appeared in the greatest volume belonged to the spam campaign CP-A, which alone accounted for 52% of the pharma spam observed in December.

Figure 10- Number of subjects over 1,000 by day for December

Six affiliate sites had active spam campaigns during December: Canadian Pharmacy (CP), Online Pharmacy (OP), Canadian Health&Care Mall (CHCM), Canadian Neighbor Pharmacy (CNP), My Canadian Pharmacy (MCP), and Pharmacy Express (PE). Based on the analysis of the pharma spam subjects collected in November, 12 spam campaigns were identified for December.
Affiliate sites CHCM, OP, and CNP each had three active spam campaigns in December, while MCP, PE, and CP each had only one (Figure 11).

Figure 11- Number of spam campaigns in December for each affiliate site that appeared

On December 10th and 28th, only one pharma spam subject (CP-A) appeared with at least 1,000 entries in the spam data mine, and December 29th was the only day on which no pharma spam subject reached 1,000 entries. Spam campaign CP-A appeared the most this month, while spam campaign PE-A appeared the least. Three subjects that were part of the CNP-A* spam campaign appeared within the MCP-A* spam campaign on December 20th. The subjects in CNP-A* were associated with MCP-A* because the subjects' corresponding domains opened to both affiliate sites MCP and CNP at some point in December. Spam campaign CNP-C was independent of the other CNP spam campaigns during December, but CNP-A* overlapped with spam campaign CNP-B as well as with MCP-A*. The overlap of CNP-A* and MCP-A* was likely due to the subjects common to the two spam campaigns. There was no overlap among the OP spam campaigns during December: as one OP spam campaign disappeared, another appeared. The activity of all the active spam campaigns in December is shown in Table 4.

Table 4
Table illustrating when each spam campaign appeared and disappeared in December
(Rows: spam campaigns CHCM-C, CHCM-B, CHCM-A, CNP-C, CNP-B, CNP-A*, MCP-A*, OP-C, OP-B, OP-A, CP-A, PE-A; columns: December 1-31)

January 2012

The most significant observation in January was the dramatic decrease in pharma spam subjects that appeared with at least 1,000 entries in the spam data mine each day. From January 21st to 31st, only one pharma spam subject appeared with at least 1,000 entries in the spam data mine: “BUY NOW VIAGRA CIALIS !!!”.
A total of 349 pharma spam subjects appeared with at least 1,000 entries in the spam data mine each day in January (Figure 12), accounting for 2,813,677 entries in the spam data mine. Of those 349, 96 were distinct. The spam campaign CP-A was again the primary spam campaign observed, alone accounting for 62% of the pharma spam in January. Even with the significant decrease in daily pharma spam subjects, CP-A's share of the pharma spam rose by 10 percentage points from December to January, and its volume rose by roughly 70%. This increase in CP-A drove the overall rise of about 43% in the amount of pharma spam observed from December to January.

Figure 12- Number of subjects over 1,000 by day for January

Only five affiliate sites had active spam campaigns in January: CP, OP, CHCM, CNP, and MCP (Figure 13). Affiliate site PE did not have any active spam campaigns in January. Of the twelve December spam campaigns, only ten remained active in January, and no new spam campaigns appeared. All three spam campaigns for each of affiliate sites CHCM and OP remained active from December to January, as did two CNP spam campaigns and the single spam campaigns of affiliate sites CP and MCP.

Figure 13- Number of spam campaigns in January for each affiliate site that appeared

Spam campaign CP-A again appeared most often in January, and spam campaigns OP-A and CNP-C appeared the least often. Spam campaign CNP-A* was not observed in January because its subjects were now associated with spam campaign MCP-A*. The CNP spam campaigns did not overlap in time with each other, but did overlap with spam campaign MCP-A*.
The only overlap among the OP spam campaigns occurred on January 2nd. On the days in January that spam campaign CHCM-A did not appear, either CHCM-B or CHCM-C appeared. Table 5 illustrates the activity of all the active spam campaigns during January.

Table 5
Table illustrating when each spam campaign appeared and disappeared in January
(Rows: spam campaigns CHCM-C, CHCM-B, CHCM-A, CNP-C, CNP-B, MCP-A*, OP-C, OP-B, OP-A, CP-A; columns: January 1-31)

February 2012

The decrease in daily pharma spam subjects continued from January into February. In February, only 47 subjects appeared with at least 1,000 entries in the spam data mine each day (Figure 14), and of those 47, only 17 were distinct. The 47 pharma spam subjects nonetheless accounted for over a million entries (1,522,032), about 54% of the January total, or a 46% decrease from January to February. Most of the pharma spam was again courtesy of the CP-A spam campaign, which accounted for 92% of the pharma spam observed in February, up 30 percentage points from January. Even with the large volume of spam accumulated by CP-A, February did not have more pharma spam entries than January, unlike January compared with December, mainly because very few pharma spam subjects were observed each day. The increase in subjects on February 27th is due to the analysis, from that day on, of all the subjects that appeared with at least 1,000 entries in the spam data mine each day, not just those that also contained the key words.

Figure 14- Number of subjects over 1,000 by day for February

Of the ten January spam campaigns, only one remained active in February.
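The month-to-month comparisons for December through February can be recomputed from the monthly entry totals and the CP-A shares given in this chapter. The sketch treats the rounded shares (52%, 62%, 92%) as exact, so the CP-A volumes are estimates. It keeps two quantities apart that are easy to conflate: a change in CP-A's share, measured in percentage points, and a change in CP-A's volume or in the monthly total, measured in percent.

```python
totals = {"December": 1_965_222, "January": 2_813_677, "February": 1_522_032}
cp_a_share = {"December": 0.52, "January": 0.62, "February": 0.92}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100 * (new - old) / old


months = list(totals)
for prev, cur in zip(months, months[1:]):
    total = pct_change(totals[prev], totals[cur])
    # CP-A volume estimated as monthly total x reported (rounded) share.
    cp_a = pct_change(totals[prev] * cp_a_share[prev], totals[cur] * cp_a_share[cur])
    points = 100 * (cp_a_share[cur] - cp_a_share[prev])
    print(f"{prev} -> {cur}: total {total:+.1f}%, "
          f"CP-A volume {cp_a:+.1f}%, CP-A share {points:+.0f} points")
# December -> January: total +43.2%, CP-A volume +70.7%, CP-A share +10 points
# January -> February: total -45.9%, CP-A volume -19.7%, CP-A share +30 points
```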
Three new spam campaigns appeared in February, and one spam campaign from December reappeared. This month, only three affiliate sites had active spam campaigns: CP, OP, and PE (Figure 15). For affiliate site CP, one spam campaign remained active from January to February, and one new spam campaign, CP-B, appeared. For affiliate site PE, one spam campaign from December, PE-A, reappeared in February, and one new spam campaign, PE-B, appeared. No spam campaigns remained active for affiliate site OP, but one new spam campaign, OP-D, appeared in February.

Figure 15- Number of spam campaigns in February for each affiliate site that appeared

No pharma spam subjects appeared with at least 1,000 entries in the spam data mine on February 2nd. For most days in February, the only pharma spam subject that appeared with at least 1,000 entries in the spam data mine was “BUY NOW VIAGRA CIALIS”. The new spam campaign CP-B included subjects whose corresponding domains opened to affiliate sites PE, OP, CP, and Toronto Drug Store (TDS). All of the subjects observed on February 27th contained domains that opened to affiliate sites PE, OP, and/or CP, while the subjects observed on February 28th and 29th all contained domains that opened to affiliate sites OP and TDS. Table 6 illustrates the activity of all the spam campaigns active during February.

Table 6
Table illustrating when each spam campaign appeared and disappeared in February
(Rows: spam campaigns OP-D, CP-B, CP-A, PE-B, PE-A; columns: February 1-29)

March 2012

Because of the dramatic decrease in pharma spam subjects appearing with at least 1,000 entries in the spam data mine each day from January 21st, 2012 through February 2012, method 2 was used to analyze the pharma spam from March through the end of the project.
During March, there were six affiliate sites with active spam campaigns: CP, OP, CHCM, CNP, MCP, and PE (Figure 16). In addition to their individual spam campaigns, affiliate sites PE and CP were jointly associated with one spam campaign (CPE). All five spam campaigns that were active in February remained active in March, nine spam campaigns from January reappeared, and ten new spam campaigns appeared. The joint CPE spam campaign, CPE-A, first appeared in March. Two CNP spam campaigns from January reappeared in March, CNP-B and CNP-C, as well as one MCP spam campaign, MCP-A*. Four PE spam campaigns were active in March: two remained active from February to March, and two new spam campaigns appeared, PE-C and PE-D. There were six active CHCM spam campaigns in March: three spam campaigns from January reappeared, CHCM-A, CHCM-B, and CHCM-C, and three new spam campaigns appeared, CHCM-D, CHCM-E, and CHCM-H. Five CP spam campaigns were active in March: two remained active from February to March, and three new spam campaigns appeared, CP-C, CP-D, and CP-E. There were also five OP spam campaigns active in March: one remained active from February to March, three spam campaigns from January reappeared, OP-A, OP-B, and OP-C, and one new spam campaign appeared, OP-E. This month affiliate site CHCM had the most active spam campaigns.

Figure 16- Number of spam campaigns in March for each affiliate site that appeared (CP, OP, CHCM, CNP, MCP, PE, CPE)

In March, spam campaign PE-A appeared most often, while spam campaign CNP-C (again) and spam campaign CP-D appeared least often. Only subjects from spam campaigns CP-A, CP-C, and OP-D appeared on March 2nd. The subjects in the new spam campaign CPE-A had corresponding domains that opened to affiliate sites CP and/or PE. In March, affiliate site CP appeared most often within spam campaign CPE-A.
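The month-to-month bookkeeping used throughout this chapter (campaigns that remained active, reappeared, or were new) reduces to set operations over each month's roster of campaigns. The minimal sketch below applies that idea to the row labels of Tables 5, 6, and 7. The function name `classify` is hypothetical, and only January is passed as the "earlier" roster because the full December roster is not listed in this section.

```python
from __future__ import annotations

# Campaign rosters taken from the row labels of Tables 5, 6, and 7.
JANUARY = {"CP-A", "OP-A", "OP-B", "OP-C", "MCP-A*", "CNP-B", "CNP-C",
           "CHCM-A", "CHCM-B", "CHCM-C"}
FEBRUARY = {"PE-A", "PE-B", "CP-A", "CP-B", "OP-D"}
MARCH = {"PE-A", "PE-B", "PE-C", "PE-D", "CP-A", "CP-B", "CP-C", "CP-D",
         "CP-E", "CPE-A", "CNP-B", "CNP-C", "MCP-A*", "OP-A", "OP-B",
         "OP-C", "OP-D", "OP-E", "CHCM-A", "CHCM-B", "CHCM-C", "CHCM-D",
         "CHCM-E", "CHCM-H"}


def classify(current: set[str], previous: set[str],
             earlier: set[str]) -> dict[str, set[str]]:
    """Split one month's campaigns relative to the months before it.

    remained:   active in the previous month as well
    reappeared: seen in an earlier month, but not the previous one
    new:        never seen before
    """
    remained = current & previous
    reappeared = (current & earlier) - previous
    new = current - previous - earlier
    return {"remained": remained, "reappeared": reappeared, "new": new}


march = classify(MARCH, previous=FEBRUARY, earlier=JANUARY)
print({k: len(v) for k, v in march.items()})
# {'remained': 5, 'reappeared': 9, 'new': 10}
print(sorted(march["reappeared"] & {"OP-A", "OP-B", "OP-C", "OP-D"}))
# ['OP-A', 'OP-B', 'OP-C']
```

For later months, `earlier` would be the union of every roster before the previous month, and campaigns that were renamed (for example CP-C to CHCM-G) would need to be mapped to a single name before comparison.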
Spam campaign CNP-B included three subjects that were previously associated with spam campaign MCP-A*. While these subjects were associated with spam campaign MCP-A*, however, their corresponding domains never loaded. One subject from spam campaign CNP-C appeared within spam campaign CNP-B on March 3rd and 4th. On March 3rd, 4th, and 27th, spam campaign CHCM-H contained subjects whose corresponding domains opened to affiliate site MCP. Spam campaigns CHCM-C, CHCM-B, and CHCM-A were observed in succession. Spam campaign CHCM-B appeared first in March; on its last active day, spam campaign CHCM-C appeared, and once spam campaign CHCM-C became inactive, spam campaign CHCM-A appeared. The only overlap among the CHCM spam campaigns was on March 13th. Two subjects from spam campaign OP-B were observed in combination with spam campaign OP-C in March; these subjects appeared within spam campaign OP-C on March 16th-18th. In addition to affiliate site CP, both affiliate sites OP and PE appeared within spam campaign CP-A this month. On March 2nd and March 14th, affiliate site OP appeared along with affiliate site CP, and affiliate site PE appeared within spam campaign CP-A only on March 31st. The subjects in spam campaign CP-B continued to have corresponding domains that opened to affiliate sites CP, PE, and/or OP, though CP was still the dominant affiliate site. Spam campaign CP-E contained some subjects whose corresponding domains opened to affiliate sites CHCM, PE, and Men’s Health (MH) in addition to affiliate site CP. Affiliate site PE appeared within spam campaign CP-E only on March 17th, along with affiliate site CP. Affiliate site MH appeared within CP-E only on March 27th. Affiliate site CHCM appeared alone within CP-E on March 25th and 28th, and together with affiliate site CP on March 23rd, 26th, 27th, and 29th. The spam campaign activity for March is shown in Table 7.
Of all the months analyzed in this research, March had the most spam campaign activity.

Table 7- Table illustrating when each spam campaign appeared and disappeared in March (spam campaigns: PE-A, PE-B, PE-C, PE-D, CP-A, CP-B, CP-C, CP-D, CP-E, CPE-A, CNP-B, CNP-C, MCP-A*, OP-A, OP-B, OP-C, OP-D, OP-E, CHCM-A, CHCM-B, CHCM-C, CHCM-D, CHCM-E, CHCM-H; days: March 1-31)

April 2012

During April, there were seven affiliate sites with active spam campaigns: CP, OP, CHCM, CNP, MCP, PE, and PMP (Figure 17). Again in April, PE and CP were jointly associated with one spam campaign (CPE), in addition to their individual spam campaigns. Of the 22 spam campaigns in March, 15 remained active in April, and four new spam campaigns appeared. Two spam campaigns each for OP and PE remained active from March to April. One CNP spam campaign remained active from March to April, and one new spam campaign appeared in April, CNP-D. Five CP spam campaigns remained active from March to April, and one new spam campaign appeared in April, CP-F. Three CHCM spam campaigns remained active from March to April, and one new spam campaign appeared in April, CHCM-F. The spam campaigns for CPE and for affiliate site MCP both remained active from March to April. One new affiliate site appeared in April, Pharmacy Med Pro (PMP), with one active spam campaign, PMP-A. In April, the affiliate site with the most active spam campaigns was CP.

Figure 17- Number of spam campaigns in April for each affiliate site that appeared (CP, OP, CHCM, CNP, MCP, PE, CPE, PMP)

This month, affiliate site PE appeared within spam campaign CHCM-F on April 5th, 7th, 8th, and 9th. On April 13th and 14th, affiliate site MCP appeared within spam campaign CHCM-H.
Spam campaign CNP-D contained two subjects whose corresponding domains once opened to affiliate site CHCM (CHCM-C) but in April opened to affiliate site CNP. The only overlap between the two CNP spam campaigns occurred on April 4th. Some of the subjects within spam campaign PE-C had corresponding domains that opened to affiliate site CP on April 8th-10th. This month, affiliate sites PE and CP appeared an equal number of times within spam campaign CPE-A. Spam campaign CP-C contained one subject whose corresponding domains opened to both affiliate sites CHCM and CP (April 14th), as well as two subjects whose corresponding domains opened only to affiliate site CHCM (April 5th and April 7th). Affiliate site PE appeared within spam campaign CP-A on April 1st, 2nd, 5th, 26th, and 27th. This month, spam campaign CP-B contained only subjects whose corresponding domains opened to affiliate site PE. In April, only affiliate sites CP and CHCM appeared within spam campaign CP-E. Affiliate site OP appeared within spam campaign CP-F along with affiliate site CP on April 12th, and by itself on April 17th. Table 8 illustrates the activity of all the active spam campaigns during the month of April.

Table 8- Table illustrating when each spam campaign appeared and disappeared in April (spam campaigns: PE-A, PE-C, CP-A, CP-B, CP-C, CP-D, CP-E, CP-F, CPE-A, CNP-B, CNP-D, MCP-A*, OP-C, OP-E, CHCM-A, CHCM-D, CHCM-F, CHCM-H, PMP-A; days: April 1-30; due to a database error, no data is available for April 25th)

May 2012

There were six affiliate sites with active spam campaigns in May: CP, OP, CHCM, CNP, PE, and PMP (Figure 18). As in previous months, PE and CP were jointly associated with one spam campaign (CPE), in addition to their individual spam campaigns. Only 12 of the 19 active spam campaigns from April remained active in May.
Four new spam campaigns appeared in May, and one spam campaign from March reappeared. The spam campaign for CPE remained active from April to May, as did the one spam campaign for affiliate site PMP. One of the two OP spam campaigns, OP-E, remained active from April to May. Two spam campaigns for affiliate site PE remained active from April to May, and one spam campaign from March reappeared in May, PE-D. Three spam campaigns for affiliate site CP remained active from April to May. One of them, however, was renamed and assigned to affiliate site CHCM, leaving affiliate site CP with only two active spam campaigns in May. Three CHCM spam campaigns remained active from April to May, two new spam campaigns appeared in May, CHCM-I and CHCM-J, and one spam campaign was gained from CP. For affiliate site CNP, two new spam campaigns appeared in May, CNP-E and CNP-F, and one spam campaign remained active from April to May. In May, affiliate site CHCM had the most active spam campaigns.

Figure 18- Number of spam campaigns in May for each affiliate site that appeared (CP, OP, CHCM, CNP, PE, CPE, PMP)

Spam campaigns PE-A and PE-C appeared most often in May, and spam campaigns CP-C (CHCM-G) and CP-F appeared least often. This month affiliate site CP appeared most often within spam campaign CPE-A. Unlike in April, affiliate site CP did not appear within spam campaign PE-C in May. In May, a spam campaign was taken from affiliate site CP and assigned to affiliate site CHCM because spam campaign CP-C no longer contained subjects whose corresponding domains opened to affiliate site CP, but to affiliate site CHCM instead. Spam campaign CP-C is now designated as CHCM-G. Spam campaign CHCM-J was observed on the days in May when no subjects from spam campaign CHCM-I appeared. The only overlap observed among the CNP spam campaigns this month occurred on May 14th.
The spam campaign activity for all of the active spam campaigns in May is shown in Table 9.

Table 9- Table illustrating when each spam campaign appeared and disappeared in May (spam campaigns: PE-A, PE-C, PE-D, CP-A, CP-C, CP-F, CPE-A, CNP-D, CNP-E, CNP-F, OP-E, CHCM-D, CHCM-F, CHCM-G, CHCM-H, CHCM-I, CHCM-J, PMP-A; days: May 1-31)

June 2012

In June, five previously observed affiliate sites had active spam campaigns, CP, OP, CHCM, CNP, and PE, and two new affiliate sites appeared with active spam campaigns: RX Discounts (RD) and RX Deals (RXD) (Figure 19). As in previous months, PE and CP were jointly associated with one spam campaign (CPE), in addition to their individual spam campaigns. Of the 17 active spam campaigns in May, 14 remained active in June, and five new spam campaigns appeared. The spam campaign for CPE remained active from May to June. Two CNP spam campaigns remained active from May to June, and one new spam campaign appeared in June, CNP-G. One spam campaign for affiliate site CP remained active from May to June, and one new spam campaign appeared in June, CP-G. In June another spam campaign for affiliate site CP was renamed, this time assigned to affiliate site OP, leaving affiliate site CP with only one active spam campaign in June. One OP spam campaign remained active from May to June, and two new spam campaigns appeared in June, OP-F and OP-H. No new spam campaigns appeared for affiliate site PE in June, but three spam campaigns remained active from May to June. Six CHCM spam campaigns remained active from May to June. Affiliate site CHCM also lost a spam campaign in June: the spam campaign was renamed and assigned to the new affiliate site RXD, leaving affiliate site CHCM with five active spam campaigns.
Two new affiliate sites appeared in June, RXD and RD, each with one spam campaign, RXD-A and RD-A. As in May, the affiliate site with the most active spam campaigns in June was CHCM.

Figure 19- Number of spam campaigns in June for each affiliate site that appeared (CP, OP, CHCM, CNP, PE, CPE, RD, RXD)

The spam campaigns that appeared most often in June were CP-F (OP-G) and CPE-A, and the spam campaign that appeared least often was OP-H. This month a CP spam campaign was renamed and assigned to affiliate site OP because spam campaign CP-F no longer contained subjects whose corresponding domains opened to affiliate site CP, but to affiliate site OP instead. Spam campaign CP-F is now designated as spam campaign OP-G. A CHCM spam campaign was also renamed and assigned to affiliate site RXD, because spam campaign CHCM-G included subjects whose corresponding domains opened to affiliate sites RXD, Generic Pills (GP), and Top Pharm (TP), while only one subject’s corresponding domains opened to affiliate site CHCM (June 1st). Spam campaign CHCM-G is now designated as spam campaign RXD-A. The new spam campaign RD-A was made up of subjects whose corresponding domains opened to several different affiliate sites; in June these were RD, Mega RX (MRX), and RX Orders (RXO).

No CNP spam campaigns were active concurrently this month. Spam campaign CNP-E was the first CNP spam campaign to appear in June; once it became inactive, spam campaign CNP-D was active for the rest of the month, except for about a week during which spam campaign CNP-G was active. The activity of the active spam campaigns in June is shown in Table 10.
Table 10- Table illustrating when each spam campaign appeared and disappeared in June (spam campaigns: PE-A, PE-C, PE-D, CP-F, CP-G, CPE-A, CNP-D, CNP-E, CNP-G, OP-E, OP-F, OP-G, OP-H, CHCM-D, CHCM-F, CHCM-G, RXD-A, CHCM-H, CHCM-I, CHCM-J, RD-A; days: June 1-30)

July 2012

There were six affiliate sites with active spam campaigns in July: OP, CHCM, CNP, PE, RD, and RXD (Figure 20). Affiliate sites PE and CP were again jointly associated with one spam campaign (CPE). Of the 19 spam campaigns active in June, 14 remained active in July, and three new spam campaigns appeared. One spam campaign each remained active from June to July for CPE and for affiliate sites CNP, RXD, and RD. Three CHCM spam campaigns remained active from June to July, and no new CHCM spam campaigns appeared. Three OP spam campaigns remained active from June to July, and one new OP spam campaign appeared in July, OP-I. Only one spam campaign for affiliate site CP remained active from June to July, but it was renamed and assigned to affiliate site PE, leaving affiliate site CP with no active spam campaigns of its own in July. Three PE spam campaigns remained active from June to July, and two new PE spam campaigns appeared in July, PE-E and PE-F. Unlike in any other month, affiliate site PE had the most active spam campaigns in July.

Figure 20- Number of spam campaigns in July for each affiliate site that appeared (OP, CHCM, CNP, PE, CPE, RD, RXD)

Spam campaign PE-A appeared most often in July, and spam campaign OP-F appeared least often. The PE spam campaigns accounted for the majority of the spam campaign activity in July. Although a spam campaign for every active affiliate site appeared every day in July, this month had the least spam campaign activity of all the months analyzed in this research.
In other months, the subjects within spam campaign PE-D were written entirely in upper-case letters. During July, the same variations of subjects were observed, except that they contained both upper- and lower-case letters. Spam campaign PE-F included subjects whose corresponding domains opened to affiliate site PE as well as to a foreign online pharmacy, Mister Joy (MJ). The domains opened to MJ for only one subject, which appeared on July 26th and July 27th. Spam campaign RXD-A no longer included subjects whose corresponding domains opened to affiliate site CHCM. This month the subjects’ corresponding domains opened to three affiliate sites from June, GP, RXD, and TP, as well as two new affiliate sites, Direct Pharm (DP) and Pills Shop (PS). This month, only affiliate site CP appeared within spam campaign CPE-A, and only affiliate site MRX appeared within spam campaign RD-A. Affiliate site CP lost another spam campaign in July, this time to affiliate site PE: in spam campaign CP-G, the subjects’ corresponding domains no longer opened to affiliate site CP but to affiliate site PE, so spam campaign CP-G is now designated as spam campaign PE-G. Table 11 illustrates the activity of all of the active spam campaigns for the month of July.

Table 11- Table illustrating when each spam campaign appeared and disappeared in July (spam campaigns: PE-A, PE-C, PE-D, PE-E, PE-F, PE-G, CP-G, CPE-A, CNP-D, OP-E, OP-F, OP-G, OP-I, CHCM-D, CHCM-F, CHCM-H, RXD-A, RD-A; days: July 1-31)

August 2012

Seven affiliate sites had active spam campaigns in August: CP, OP, CHCM, CNP, PE, RD, and RXD (Figure 21). Six new spam campaigns appeared in August, 13 of the 17 spam campaigns from July remained active, and one spam campaign from June reappeared.
One spam campaign each for affiliate sites RD and RXD remained active from July to August, and one new spam campaign appeared for affiliate site CP, CP-H. Five new OP spam campaigns appeared in August, OP-J, OP-K, OP-L, OP-M, and OP-N, and two OP spam campaigns remained active from July to August. All six PE spam campaigns remained active from July to August, and no new PE spam campaigns appeared. Two CHCM spam campaigns remained active from July to August, and no new CHCM spam campaigns appeared. One CNP spam campaign from June reappeared in August, CNP-G, and one CNP spam campaign remained active from July to August. Unlike in any other month, affiliate site OP had the most active spam campaigns in August.

Figure 21- Number of spam campaigns in August for each affiliate site that appeared (CP, OP, CHCM, CNP, PE, RD, RXD)

In August, spam campaign RXD-A appeared most often and spam campaign OP-M appeared least often. As in July, the PE spam campaigns accounted for most of the spam campaign activity. Spam campaign activity increased from July to August. Almost every day, a new subject in spam campaign PE-F replaced a previous one. Spam campaign PE-F started with three different subjects in July and expanded to 31 different subjects in August; only two of the subjects that appeared in July also appeared in August. During August, MJ appeared within spam campaign PE-F on August 24th and 26th, associated with a different subject than in July. Affiliate site CP appeared once within spam campaign PE-F, on August 23rd. The affiliate sites that appeared within spam campaign RD-A in August included three previously observed affiliate sites, RD, MRX, and RXO, and three new affiliate sites: Direct Pills (DTP), Discount Meds (DM), and Top Meds (TM). When subjects in this spam campaign appeared, affiliate site RD appeared most often, followed by affiliate sites DM and DTP. During the month of August, two of the affiliate sites seen in July, RXD and GP, appeared with spam campaign RXD-A.
One new affiliate site, Global RX (GRX), and two known affiliate sites, PE and CP, also appeared within the spam campaign. On most days in August, the subjects’ corresponding domains opened to affiliate sites RXD, GP, and PE. Affiliate site GRX was observed only on August 23rd-25th; on August 25th, it was observed along with affiliate sites RXD and PE. Affiliate site CP appeared only on August 30th and 31st. The only days in August on which affiliate site PE did not appear were August 5th, 23rd, and 24th. Spam campaign OP-L contained one subject, but occasionally a different variation of the subject also appeared in this spam campaign; when it did, both versions appeared on the same day. Spam campaign OP-J contained only two subjects, and both had appeared within spam campaign CNP-A* in December as well as within spam campaign MCP-A* in January and March. During August, however, both subjects had corresponding domains that opened to affiliate site OP. The two CNP spam campaigns alternated appearances, so there was no overlap between the CNP spam campaigns in August. The spam campaign activity for the active spam campaigns in August is shown in Table 12.

Table 12- Table illustrating when each spam campaign appeared and disappeared in August (spam campaigns: PE-A, PE-C, PE-D, PE-E, PE-F, PE-G, CP-H, CNP-D, CNP-G, OP-E, OP-I, OP-J, OP-K, OP-L, OP-M, OP-N, CHCM-D, CHCM-F, RXD-A, RD-A; days: August 1-31)

Significant Observations

Affiliate sites CHCM and CNP do not share a template or domains, but they do share several IP addresses. The two affiliate sites also share a pattern in the domains associated with their subjects. Beginning in May, affiliate sites CNP and CHCM both contained subjects with multiple different domains that would appear and disappear.
These domains would appear for about a week and then disappear and be replaced by other domains, a pattern that continued through August. Spam campaigns CNP-D, CNP-E, CNP-F, CNP-G, CHCM-F, CHCM-I, and CHCM-J all contained subjects whose domains appeared and disappeared (Table 13a and Table 13b). Table 13a contains domains that were observed from May through most of July, and Table 13b contains domains that appeared from the end of July into August. Domains were associated into a subgroup when they appeared together with other domains of that subgroup. Not all of the domains in a subgroup appeared at the same time; at most, two or three domains appeared together with one pharma spam subject. When one domain disappeared, either the other two remained on their own or another domain was added. For example, subject A might be associated with domains a, b, and c on one day and with only domains b and c the next day; sometimes domain a was replaced by domain d, and at other times only domains b and c remained associated with subject A. This domain pattern continued through the end of the research. No overlap among the different subgroups was observed, nor was there any overlap between the CNP and CHCM domains.
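The subgrouping rule described above, in which domains become associated because they appear together with the same subject, amounts to finding connected components in a domain co-occurrence graph. The minimal sketch below uses a union-find structure over hypothetical observations named after the subject A example in the text; the function and class names are illustrative, and this is not presented as the procedure the research used.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def domain_subgroups(observations: Iterable[tuple[int, str, list[str]]]) -> list[list[str]]:
    """Group domains ever seen together with one subject on the same day."""
    uf = UnionFind()
    for _day, _subject, domains in observations:
        for d in domains:
            uf.find(d)  # register every domain, even one seen alone
        for other in domains[1:]:
            uf.union(domains[0], other)
    groups: dict[str, set[str]] = defaultdict(set)
    for d in list(uf.parent):
        groups[uf.find(d)].add(d)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical observations mirroring the subject A example, plus an
# unrelated subject B with its own domains (not data from the research).
obs = [
    (1, "subject A", ["a", "b", "c"]),
    (2, "subject A", ["b", "c"]),
    (3, "subject A", ["b", "c", "d"]),
    (3, "subject B", ["x", "y"]),
]
print(domain_subgroups(obs))  # [['a', 'b', 'c', 'd'], ['x', 'y']]
```

Domain d ends up in the same subgroup as domain a even though the two were never observed together, because both co-occurred with b and c; this transitive linking is what lets a subgroup persist while its individual domains rotate.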
Table 13a- Subgroup of domains that appeared with CHCM and CNP spam campaigns in May, June, and July

CHCM Domains: jarb.ru, haii.ru, jkvix.ru, ghdsjg.ru, sdfhgh.ru, cotu.ru, qgfdg.ru, jkvix.ru, sdfhgs.ru, sdfjgh.ru, sdfhsj.ru, blkfts.ru, sdhfg.ru, cjzkef.ru, syli.ru, dfnmsd.ru, sugn.ru, dfdonf.ru, tbin.ru, dfnsmf.ru, tamy.ru, dfnsnf.ru, thow.ru, dhfjsf.ru, tiew.ru, fdjss.ru, utay.ru, fndmsf.ru, uped.ru, fkvjgx.ru, whas.ru, fnsmsf.ru, vieh.ru, igjfmd.ru, zaph.ru, vkflrk.ru, zinh.ru, vhdhfd.ru, zirl.ru, xkflwf.ru, ardt.ru, akjkkd.ru, dhhfgh.ru, zjfkqf.ru, nbmvdf.ru, dfnmsd.ru, nvmcn.ru, sitty.ru, ooidkf.ru, skfjjs.ru, pchvif.ru, skirk.ru, qkfgif.ru, vcxjvv.ru, wieu.ru, vhcjv.ru, dhfjsg.ru, dhfgs.ru, dfsmnf.ru, mkfluk.ru

CNP Domains: loug.ru, hfdjd.ru, lupp.ru, jhfdj.ru, maimy.ru, jvhjaq.ru, orpe.ru, kjckv.ru, phad.ru, kjvkxd.ru, phah.ru, ontia.ru, poug.ru, outtp.ru, plew.ru, pitre.ru, pryl.ru, qbncb.ru, puee.ru, rubed.ru, pupt.ru, schid.ru, qeurtu.ru, sdgfg.ru, sdfgsh.ru, sdhff.ru, rawn.ru, slous.ru, sdfhjg.ru, socey.ru, sdhfhf.ru, soind.ru, sdhfjh.ru, surum.ru, bvhuf.ru, vcbxn.ru, cvnmr.ru, vchvvc.ru, dfjdd.ru, vhcjjv.ru, djfhf.ru, vncmv.ru, dfjdd.ru, cvjckv.ru, dfjkd.ru, vkjcvv.ru, hism.ru, fecy.ru, humf.ru, ghfdjg.ru, hyse.ru, hief.ru, jdfuf.ru, julone.ru, setle.ru, kreap.ru, bvhuf.ru, kjffg.ru, frof.ru, jetle.ru

Table 13b- Subgroup of domains that appeared with CHCM and CNP spam campaigns beginning at the end of July through August

CHCM Domains: jith.ru, herj.ru, bolf.ru, hiek.ru, ethi.ru, husc.ru, eyst.ru, chli.ru, jium.ru, lhad.ru, nuld.ru, moif.ru, nyed.ru, oing.ru, oxie.ru, pahs.ru, peem.ru, plii.ru, powd.ru, rerr.ru, vatz.ru, terct.ru, vaub.ru, verib.ru, waral.ru, wetz.ru, wreb.ru, wrau.ru, yult.ru, adae.ru

CNP Domains: inif.ru, neaf.ru, jkflkf.ru, kkflik.ru, undy.ru, ukfjxf.ru, vasce.ru, ytti.ru, elof.ru, zkflwf.ru, eers.ru, foan.ru, gnag.ru, gnoe.ru, lolm.ru, mcal.ru, muet.ru, olew.ru, othy.ru, ouck.ru, prua.ru, tbing.ru, tetl.ru, tirv.ru, ucle.ru, uncup.ru, uniot.ru, urax.ru, gaig.ru, wrof.ru

Affiliate site MCP could also be included in this connection between affiliate sites CNP and CHCM.
There was overlap between the spam campaigns of affiliate sites CNP and MCP in December and January. The spam campaign for affiliate site MCP also often appeared on the same days as spam campaigns for affiliate sites CNP and CHCM until affiliate site MCP disappeared in May. Affiliate sites CNP, CHCM, and MCP also share IP addresses.

The only significant pattern of overlap among spam campaigns was observed between spam campaigns for affiliate sites CNP and CHCM. Beginning in December, spam campaigns CNP-B and CHCM-A appeared on the same days except December 5th, 17th, 21st, and 27th. The two spam campaigns also appeared on the same days in January, except January 8th-10th. Neither appeared in February, and there was no clear pattern of overlap in March. In April, spam campaigns CNP-B and CHCM-A appeared on exactly the same days, April 1st-4th, and then neither spam campaign appeared again for the rest of the research. Another overlap between spam campaigns for affiliate sites CNP and CHCM began in April with spam campaigns CNP-D and CHCM-F, which appeared on the same days in April except April 30th. No clear pattern was observed between the two spam campaigns in May, but in June, spam campaigns CNP-D and CHCM-F appeared on exactly the same days, June 4th-9th and June 16th-30th. Spam campaigns CNP-D and CHCM-F again appeared on exactly the same days in July, except July 13th. In August, they appeared on the same days except August 18th-21st and August 31st. Spam campaigns CNP-G and CHCM-J displayed only one clear pattern of overlap, in June, when they appeared on exactly the same days, June 10th-15th. After June, the two spam campaigns did not appear in the same month.

Another interesting finding concerned spam campaigns and their associated affiliate sites.
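Statements such as "appeared on the same days except April 30th" come from comparing the sets of days on which two spam campaigns were active within a month. The minimal sketch below makes that comparison explicit with hypothetical day sets. The function names are illustrative, and the Jaccard ratio is an added summary measure rather than one used in the research.

```python
from __future__ import annotations


def compare_activity(days_x: set[int], days_y: set[int]) -> dict[str, list[int]]:
    """Compare the days two spam campaigns were active in one month."""
    return {
        "both": sorted(days_x & days_y),
        "only_x": sorted(days_x - days_y),
        "only_y": sorted(days_y - days_x),
    }


def same_day_ratio(days_x: set[int], days_y: set[int]) -> float:
    """Jaccard similarity: shared active days / days either was active."""
    union = days_x | days_y
    return len(days_x & days_y) / len(union) if union else 0.0


# Hypothetical activity in a 30-day month (not data from the research).
x = set(range(1, 31))  # active every day
y = set(range(1, 30))  # active every day except the 30th
print(compare_activity(x, y)["only_x"])  # [30]
print(round(same_day_ratio(x, y), 3))    # 0.967
```

The "only" lists correspond directly to the exception days reported in the text, while a ratio near 1.0 flags campaign pairs worth examining for a shared affiliate.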
Spam campaign CP-C first appeared in March with subjects whose corresponding domains opened to affiliate site CP, and it remained active through April. Beginning in May, spam campaign CP-C no longer contained subjects whose corresponding domains opened to affiliate site CP, but to affiliate site CHCM instead, so it was assigned to affiliate site CHCM and renamed CHCM-G. In June, however, the same spam campaign (CHCM-G) no longer contained subjects whose corresponding domains opened to affiliate site CHCM, but to multiple other affiliate sites: GP, RXD, and TP in June, joined by DP and PS in July. Spam campaign CHCM-G was therefore designated as spam campaign RXD-A, because RXD appeared most often among the affiliate sites associated with this spam campaign in June. This suggested two possible conclusions: 1. the same affiliate is working for multiple affiliate programs and using the same spam campaign for all of them, or 2. each of the affiliate sites observed within this spam campaign is simply a different template used by the same affiliate program. Two other spam campaigns for affiliate site CP, CP-F and CP-G, were also renamed and assigned to other affiliate sites: spam campaign CP-G was assigned to affiliate site PE and renamed PE-G, and spam campaign CP-F was assigned to affiliate site OP and renamed OP-G. It was also interesting to find that many of the affiliate sites that appeared in June, July, and August shared similar templates and IP addresses with other affiliate sites. The two did not always go together, however: some affiliate sites shared a template but not an IP address, and affiliate sites that shared one or two IP addresses did not necessarily share a template.
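The renaming rule applied to CP-C, CHCM-G, CP-F, and CP-G assigns a spam campaign to whichever affiliate site its subjects' domains opened to most often. The minimal sketch below expresses that as a majority count. The function names and the observation counts are hypothetical, invented for illustration; only the affiliate site labels come from the text.

```python
from __future__ import annotations

from collections import Counter


def dominant_site(site_observations: list[str]) -> str:
    """Return the affiliate site the campaign's domains opened to most often.

    Ties go to the site seen first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    site, _count = Counter(site_observations).most_common(1)[0]
    return site


def needs_reassignment(campaign: str, site_observations: list[str]) -> bool:
    """True when the site prefix in the campaign name is no longer dominant."""
    current_site = campaign.rsplit("-", 1)[0]  # "CHCM-G" -> "CHCM"
    return dominant_site(site_observations) != current_site


# Hypothetical June observations for CHCM-G: one entry each time a subject's
# domain was seen opening to a site (counts invented for illustration).
june_chcm_g = ["RXD", "GP", "RXD", "TP", "CHCM", "RXD", "GP"]
print(dominant_site(june_chcm_g))                 # RXD
print(needs_reassignment("CHCM-G", june_chcm_g))  # True
```

A majority rule keeps the campaign itself stable (the same subjects and spam campaign history) while its label follows the affiliate site it currently promotes, which is how the CP-C to CHCM-G to RXD-A chain arises.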
Affiliate sites RXD and GP appeared within spam campaign RXD-A, always appeared together, and shared the IP addresses 78.110.164.200 and 78.46.248.33. These two affiliate sites also had nearly identical templates, meaning the source code of their websites was identical except for the name of the affiliate site. Affiliate site TP also appeared in June within spam campaign RXD-A and was associated with the same two IP addresses as affiliate sites RXD and GP; its template, however, did not match the template of affiliate sites RXD and GP. Affiliate sites MRX, RD, and RXO also appeared in June, but within spam campaign RD-A. Affiliate sites MRX and RD were both associated with the IP address 193.16.12.67, but affiliate site RXO was not. The templates for affiliate sites MRX and RD were somewhat similar but still very different, and the template for affiliate site RXO was very different from both. Interestingly, however, affiliate site RXO shares a nearly identical template with affiliate site TP, although the two do not share an IP address or a spam campaign. In July, affiliate sites GP, RXD, and TP continued to appear within spam campaign RXD-A with the same two IP addresses, 78.110.164.200 and 78.46.248.33. Two new affiliate sites, DP and PS, also appeared within the spam campaign. Affiliate sites GP, RXD, and PS all shared the IP address 193.164.128.67, and affiliate sites PS and DP both shared the IP address 78.46.248.33 with affiliate sites GP, RXD, and TP. The template for affiliate site PS is nearly identical to the template for affiliate site RD, but the two affiliate sites do not share IP addresses or a spam campaign.
Affiliate site MRX was the only affiliate site that appeared with spam campaign RD-A in July, but it shared its only associated IP address, 193.164.128.67, with the affiliate sites from spam campaign RXD-A: GP, RXD, and PS. During August, spam campaign RXD-A appeared with the same affiliate sites, RXD and GP, but also with affiliate sites PE and CP and a new affiliate site, GRX. Affiliate site PE always appeared with its own IP address, 84.22.127.34, and does not share a template with any other affiliate site. When affiliate site CP appeared, it had the same IP address as affiliate site PE, and it also does not share a template with any other affiliate site. Affiliate site GRX does not share a template with another affiliate site either, but it shared the IP address 193.164.128.67 with affiliate sites RXD and GP, as well as with the affiliate sites from spam campaign RD-A: DTP, RD, MRX, DM, TM, and RXO. In August, spam campaign RD-A appeared with more than one affiliate site: DTP, RD, MRX, DM, TM, and RXO. Each of those affiliate sites shared the IP address 193.164.128.67, the same IP address associated with spam campaign RXD-A. Affiliate sites DM and TM also shared two separate IP addresses, 81.19.183.149 and 88.86.115.45. Affiliate sites DTP and MRX have nearly identical templates, while affiliate site DM shares a nearly identical template with affiliate site DP, which appeared in spam campaign RXD-A in July. Affiliate site TM does not share a template with any other affiliate site observed in this research. Spam campaigns RXD-A and RD-A contained two different groups of subjects, but there were two areas of overlap between them that cannot be ignored. The main overlap was the shared IP addresses among the affiliate sites within each spam campaign: the IP address 193.164.128.67 appeared with almost all of the affiliate sites associated with both spam campaigns.
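The shared-IP comparison above can be reproduced by inverting a site-to-IP mapping and keeping only the addresses used by more than one affiliate site. The minimal sketch below encodes a subset of the August associations as stated in this paragraph; the function name `shared_ips` is hypothetical, and the template comparison is not modeled.

```python
from __future__ import annotations

from collections import defaultdict

# August associations as reported in the text (affiliate site -> IP addresses).
SITE_IPS: dict[str, set[str]] = {
    "PE": {"84.22.127.34"},
    "CP": {"84.22.127.34"},
    "GRX": {"193.164.128.67"},
    "RXD": {"193.164.128.67"},
    "GP": {"193.164.128.67"},
    "DTP": {"193.164.128.67"},
    "RD": {"193.164.128.67"},
    "MRX": {"193.164.128.67"},
    "RXO": {"193.164.128.67"},
    "DM": {"193.164.128.67", "81.19.183.149", "88.86.115.45"},
    "TM": {"193.164.128.67", "81.19.183.149", "88.86.115.45"},
}


def shared_ips(site_ips: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each IP address to the sorted sites using it; keep shared IPs only."""
    by_ip: dict[str, set[str]] = defaultdict(set)
    for site, ips in site_ips.items():
        for ip in ips:
            by_ip[ip].add(site)
    return {ip: sorted(sites) for ip, sites in sorted(by_ip.items())
            if len(sites) > 1}


for ip, sites in shared_ips(SITE_IPS).items():
    print(ip, sites)
# 193.164.128.67 ['DM', 'DTP', 'GP', 'GRX', 'MRX', 'RD', 'RXD', 'RXO', 'TM']
# 81.19.183.149 ['DM', 'TM']
# 84.22.127.34 ['CP', 'PE']
# 88.86.115.45 ['DM', 'TM']
```

The single large group on 193.164.128.67 spans affiliate sites from both RXD-A and RD-A, while CP and PE form a separate pair, which mirrors the hosting connections drawn in the text.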
Spam campaigns RXD-A and RD-A also shared templates among their affiliate sites. In general, affiliate sites within the same spam campaign share nearly identical templates, but in this case several affiliate sites from spam campaign RXD-A shared nearly identical templates with affiliate sites from spam campaign RD-A (for example, DP with DM, and PS with RD).

CHAPTER 4

CONCLUSION

Using the UAB spam data mine, pharma spam subjects were collected and analyzed daily. From December 2011 through August 2012, 47 different spam campaigns advertising 22 different affiliate sites were identified. During each month, some subjects were associated with domains that did not always load, either on the first visit or on later visits after the affiliate site had loaded once. Other domains never opened to an affiliate site at all. These “dead” domains could account for some of the gaps in spam campaign activity over the months.

Based on the shared IP addresses, the shared patterns of spam campaign appearances, and the pattern of domain appearances and disappearances for affiliate sites CNP and CHCM, the two affiliate sites are connected through a common affiliate. There were multiple overlaps in the spam campaigns for affiliate sites CNP, CHCM, and MCP, and the three affiliate sites also share IP addresses, implying that affiliate site MCP is run by the same affiliate as affiliate sites CNP and CHCM.

The appearance of affiliate sites PE and CP within spam campaign CPE-A from March through July indicates that the two affiliate sites are connected. Both affiliate sites also appeared together within other spam campaigns, and they share multiple IP addresses. The shared IP addresses and the similar appearance patterns of the spam campaigns for affiliate sites CP and PE indicate that the two affiliate sites are run by the same affiliate. On any given day during the research, a spam campaign for affiliate site CP, affiliate site PE, or both appeared.
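The appearance-pattern evidence used for CNP and CHCM, and for CP and PE, could also be quantified. One hypothetical approach, sketched below, treats each affiliate site's activity as the set of days on which its spam campaigns were observed, scores the overlap between two sites with the Jaccard index, and measures the fraction of days on which at least one of them was active. The functions and the observation dates are invented for illustration and are not data from the spam data mine.

```python
from datetime import date, timedelta


def jaccard(a: set, b: set) -> float:
    """Size of (A intersect B) divided by size of (A union B); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def days_covered(start: date, end: date, *site_days: set) -> float:
    """Fraction of calendar days in [start, end] on which at least one site appeared."""
    total_days = (end - start).days + 1
    seen = set().union(*site_days)
    covered = sum(1 for offset in range(total_days) if start + timedelta(days=offset) in seen)
    return covered / total_days


if __name__ == "__main__":
    # Hypothetical observation days for two affiliate sites (invented, not real data).
    site_x = {date(2012, 3, d) for d in (1, 2, 3, 4, 6, 8, 9)}
    site_y = {date(2012, 3, d) for d in (2, 3, 4, 5, 7, 8, 10)}
    print(round(jaccard(site_x, site_y), 2))  # 0.4
    print(days_covered(date(2012, 3, 1), date(2012, 3, 10), site_x, site_y))  # 1.0
```

For the two toy sites, the Jaccard index is 0.4 (4 shared days out of 10 active days), yet together they cover every day in the window, which is the kind of combined daily presence described above for CP and PE.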
Affiliate sites CP and PE appeared in spam campaigns the most frequently and consistently throughout the research, including during the drastic change in the number of pharma spam entries appearing each day in the spam data mine, a change that affected most of the other affiliate sites. Affiliate sites CP and PE were therefore the dominant affiliate sites observed during the research.

The overlaps in the templates used in spam campaigns RXD-A and RD-A indicate that the affiliate sites within both spam campaigns use different templates made by the same affiliate program. The common IP address between the two campaigns, 193.164.128.67, indicates that the same affiliate is hosting all of these affiliate sites for that affiliate program.

As mentioned earlier, the switch in spam campaigns from CP-C to CHCM-G to RXD-A implies that all the affiliate sites that appeared within the three spam campaigns, namely CP, CHCM, PE, GP, RXD, TP, PS, DP, DTP, and GRX, are different templates used by the same affiliate, even though, apart from CP and PE, these affiliate sites are not connected in any other way. Although CP and PE are run by the same affiliate, this does not mean that the templates for affiliate sites CP, PE, or CHCM came from the same affiliate program as the templates for affiliate sites GP, RXD, TP, PS, DP, DTP, and GRX. There were also no shared IP addresses among spam campaigns RXD-A, CHCM-G, and CP-C. This means that the affiliate worked for multiple affiliate programs while continuing to use the same spam campaign. It is also important to note that the affiliate running these affiliate sites is not necessarily the same affiliate that runs the independent spam campaigns for affiliate sites CP and PE or for affiliate site CHCM.

Three spam campaigns for affiliate site CP were renamed and assigned to other affiliate sites. As previously discussed, spam campaign CP-C was assigned to affiliate site CHCM and renamed CHCM-G.
Spam campaign CP-G was assigned to affiliate site PE and renamed PE-G. This is not unusual, because affiliate sites PE and CP had already been shown to be connected. However, spam campaign CP-F was assigned to affiliate site OP and renamed OP-G. This is unusual because, apart from this campaign's switch in affiliate sites, no other evidence has suggested a connection between affiliate sites CP and OP. The IP addresses used when the spam campaign was CP-F differ from those used after it was switched to OP-G. This implies that the same affiliate ran spam campaign CP-F for one affiliate program and later began working for another affiliate program, running the same spam campaign but with a different template. Again, this does not mean that the affiliates running the individual spam campaigns for affiliate site CP are the same affiliates running the individual spam campaigns for affiliate site OP.

The most consistent spam campaigns were those for affiliate sites PE, CP, and CHCM. Although affiliate sites CP and PE were the most frequently encountered affiliate sites, spam campaigns for all three affiliate sites remained constant throughout the entire nine months of the project. The 47 spam campaigns identified indicate that at most 47 different affiliates were running them. Those “47” affiliates, however, work for only about eight to ten different affiliate programs, based on the distinct templates that appeared during the research. Multiple spam campaigns appeared for numerous affiliate sites, but that does not mean that a single affiliate ran all the spam campaigns for each affiliate site. Nor do the multiple spam campaigns for each affiliate site mean that only one affiliate ran each spam campaign. It is quite possible that one affiliate ran two or more of the spam campaigns that appeared for a given affiliate site.
Confirming this would require further research. Because this research was exploratory, it may not have identified every affiliate site that was active between December 2011 and August 2012. What it does reveal is that the predominant affiliate sites responsible for the most prevalent spam campaigns are CP, PE, and CHCM. From an investigative standpoint, if the servers hosting the domains that run these spam campaigns were taken down, the volume of spam reaching Americans' email inboxes would decrease substantially.

LIST OF REFERENCES

1. Spam! Lorrie Faith Cranor, Brian A. LaMacchia. 8, 1998, Vol. 41, pp. 74-83.
2. Spam Statistics and Facts. Spam Laws. [Online] 2013. [Cited: 04 04, 2013.] http://www.spamlaws.com/spam-stats.html.
3. O'Leary, Tom. Spam Statistics: Worst spam offenders, countries, conversion rates. GroupMail. [Online] 1997-2013. [Cited: 04 04, 2013.] http://group-mail.com/emailmarketing/spam-statistics-worst-spam-offenders-countries-conversion-rates/.
4. Fighting Spam on Social Websites: A Survey of Approaches and Future Challenges. Paul Heymann, Georgia Koutrika, Hector Garcia-Molina. Stanford : IEEE Computer Society, 2007, pp. 36-45.
5. K. Levchenko, A. Pitsillidis, N. Chachra, S. Savage, M. Félegyházi, C. Grier et al. Click Trajectories: End-to-End Analysis of the Spam Value Chain. 2011.
6. Adult content spam. SECURELIST. [Online] 1997-2013. [Cited: 04 04, 2013.] http://www.securelist.com/en/threats/spam?chapter=89.
7. Soma Halder, Richa Tiwari, Alan Sprague. Identifying Features to Improve Real Time Clustering and Domain Blacklisting. 2011.
8. Fisher, Tim. Malware. About.com. [Online] [Cited: 04 07, 2013.] http://pcsupport.about.com/od/termsm/g/malware.htm.
9. Rouse, Margaret. botnet (zombie army). TechTarget. [Online] 02 2012. [Cited: 02 09, 2013.] http://searchsecurity.techtarget.com/definition/botnet.
10. Spam Image Clustering for Identifying Common Sources of Unsolicited Emails. Chengcui Zhang, Xin Chen, Wei-Bang Chen, Lin Yang, Gary Warner. 3, s.l. : IGI Global, 2009, International Journal of Digital Crime and Forensics, Vol. 1.
11. Know your Neighbors: Web Spam Detection using the Web Topology. Carlos Castillo, Debora Donato, Aristides Gionis, Vanessa Murdock, Fabrizio Silvestri. Amsterdam : s.n., 2007, pp. 423-430.
12. Wikipedia. Spamdexing. Wikipedia. [Online] 03 18, 2013. [Cited: 03 19, 2013.] http://en.wikipedia.org/wiki/Spamdexing#Link_spam.
13. Link Spam. [Online] 03 19, 2013. [Cited: 03 19, 2013.] http://www.searchenginepromotionhelp.com/m/articles/promotion-encyclopedia/linkspam.php.
14. Stacking classifiers for anti-spam filtering of e-mail. Georgios Sakkis, Ion Androutsopoulos, Georgios Paliouras, Vangelis Karkaletsis, Constantine D. Spyropoulos, Panagiotis Stamatopoulos. Pittsburgh : s.n., 2001, pp. 44-50.
15. Kaiser. U.S. Health Care Costs. kaiseredu.org. [Online] [Cited: 04 04, 2013.] http://www.kaiseredu.org/issue-modules/us-health-care-costs/background-brief.aspx.
16. Kliff, Sarah. We Spend $750 billion on unnecessary health care. Two charts explain why. The Washington Post. [Online] 09 07, 2012. [Cited: 04 19, 2013.] http://www.washingtonpost.com/blogs/wonkblog/wp/2012/09/07/we-spend-750-billion-on-unnecessary-health-care-two-charts-explain-why/.
17. Dirt Cheap and Without Prescription: How Susceptible are Young US Consumers to Purchasing Drugs From Rogue Internet Pharmacies? Lana Ivanitskaya, PhD, Jodi Brookins-Fisher, PhD, Irene O'Boyle, PhD, Danielle Vibbert, PhD Student, Dmitry Erofeev, PhD, and Lawrence Fulton, PhD. 04 26, 2010, Journal of Medical Internet Research.
18. Canadian Pharmacy. [Online] [Cited: 04 11, 2013.]
19. DEA. READ THIS BEFORE PURCHASING PRESCRIPTION DRUGS OVER THE INTERNET !!! DEA Office of Diversion Control. [Online] [Cited: 03 18, 2013.] http://www.deadiversion.usdoj.gov/consumer_alert.htm.
20. Pfizer. Counterfeiting & Importation. Pfizer. [Online] [Cited: 03 18, 2013.] http://www.pfizer.com/products/counterfeit_and_importation/counterfeit_importation.jsp.
21. Peterson, Karen S. Young men add Viagra to their drug arsenal. USA Today. 03 21, 2011.
22. Canadian Neighbor Pharmacy. [Online] [Cited: 04 11, 2013.]
23. World Health Organization. General information on counterfeit medications. Medicines. [Online] [Cited: 03 18, 2013.] Page 1. http://www.who.int/medicines/services/counterfeit/overview/en/.
24. Salyer, David. The Dangers of Using and Abusing Viagra. The Body: The Complete HIV/AIDS Resource. [Online] November/December 2004. [Cited: 08 19, 2012.] http://www.thebody.com/content/art32246.html.
25. Anonymous, Viagraholics. Frequently Asked Questions. 2006.
26. C. Kanich, N. Weaver, D. McCoy, T. Halvorson, C. Kreibich, S. Savage et al. Show Me the Money: Characterizing Spam-advertised Revenue.
27. Spam Trackers. SpamIt. Spam Trackers. [Online] 10 01, 2010. [Cited: 01 15, 2013.] spamtrackers.eu/wiki/index.php/Glavmed.
28. Spam Trackers. Glavmed. Spam Trackers. [Online] 11 29, 2010. [Cited: 01 15, 2013.] http://spamtrackers.eu/wiki/index.php/Glavmed.
29. Kramer, Andrew E. E-Mail Spam Falls After Russian Crackdown. The New York Times. [Online] 10 26, 2010. [Cited: 04 17, 2013.] http://www.nytimes.com/2010/10/27/business/27spam.html?_r=2&.
30. Krebs, Brian. SpamIt, Glavmed Pharmacy Network Exposed. Krebs on Security. [Online] 02 11, 2011. [Cited: 01 15, 2013.] krebsonsecurity.com/2011/02/spamit-glavmed-pharmacy-networks-exposed/#more-8147.
31. Krebs, Brian. Rove Digital Was Core ChronoPay Shareholder. Krebs on Security. [Online] 11 11, 2011. [Cited: 04 17, 2013.] http://krebsonsecurity.com/tag/igor-gusev/.
32. Krebs, Brian. Spam Volumes Dip After Spamit.com Closure. Krebs on Security. [Online] 10 10, 2010. [Cited: 01 16, 2013.] http://krebsonsecurity.com/2010/10/spam-volume-dip-after-spamit-com-closure/#more-5593.
33. Microsoft. Microsoft Security Intelligence Report.
34. Microsoft. Microsoft Security Intelligence Report: July through December 2007. 2008. p. 68.
35. Chun Wei, Alan Sprague, Gary Warner, Anthony Skjellum. Mining Spam Email to Identify Common Origins for Forensic Application. 2007.
36. Chun Wei, Alan Sprague, Gary Warner. Clustering Malware-generated Spam Emails With a Novel Fuzzy String Matching Algorithm. Birmingham : s.n., 2007.
37. Chun Wei, Alan Sprague, Gary Warner. Detection of Networks Blocks Used by the Storm Worm Botnet. Birmingham : s.n., 2007. p. 357.
38. Calton Pu, Steve Webb. Observed Trends in Spam Construction Techniques: A Case Study of Spam Evolution. Atlanta : s.n., 2006.
39. Harris, Tom. How Affiliate Programs Work. How Stuff Works. [Online] 08 11, 2000. [Cited: 04 11, 2013.] http://money.howstuffworks.com/affiliate-program1.htm.