Bachelor Thesis Finance
Stock Spam on US Small Cap Stocks:
The effect on Investor Behavior, Stock Volume and Return.
ANR: 380346
Author: Bjorn Heesakkers
Supervisor: E.S. Pikulina
Study Program: Business Studies
Date of submission: Friday 18th of May 2012
Running head: STOCK SPAM ON US SMALL CAP STOCKS
Chapter 1: Introduction
1.1 Background
1.2 Problem formulation
1.3 Research questions
1.4 Managerial relevance
1.5 Introduction to methodology
1.6 Thesis structure
Chapter 2 Methods
2.1 Keyword analysis
2.2 Event study
Chapter 3 Characteristics of a typical stock spam e-mail
Chapter 4 Results event study
4.1 Stock return
4.2 Volume change
Chapter 5 Conclusions
5.1 Limitations
5.2 Future work
References
Appendix A – Python scripts
Chapter 1: Introduction
1.1 Background
According to Symantec, 68 percent of e-mail in February 2012 was spam¹. Among these are unsolicited e-mails promoting stocks. The stocks that are promoted
often are stocks that are traded on over-the-counter markets such as the OTC Bulletin Board
and the Pink Sheets in the United States. In their study ‘Stock Market Manipulation – Theory
and Evidence’, Aggarwal & Wu (2003) identified 142 cases of stock market manipulation.
Their analysis showed that most manipulation cases happen in relatively inefficient markets
such as the OTC Bulletin Board and the Pink Sheets in the US. These stocks are illiquid
stocks and are, according to the research, easily manipulated (Aggarwal & Wu, 2003, p2).
Frieder and Zittrain (2007) found convincing evidence that stock prices were manipulated
through spam. A stock spammer tries to manipulate a stock's price in order to decrease their own
risk or the risk of a third party that pays the stock spammer. When the price of a promoted
stock rises because of this promotion, the spammer or the third party involved in this scheme
sells their shares into the market. This scheme is known as a pump-and-dump scheme².
A significant amount of prior research has been done on this subject. In the paper ‘On the
effects of stock spam e-mails’ (Hanke & Hauser, 2006) the effect of stock spam e-mails on
excess returns, turnover and intra-day price range is investigated. In their sample Hanke and
Hauser used stocks that are traded on the US Pink Sheets. They used a data set that ranges
from Nov. 2004 to Feb. 2006. They found that stock spam had a significant positive impact
on all of the above variables. Hanke and Hauser also discovered that the positive news in the
spam e-mails has no long-term positive effect on the stock prices. Furthermore, they
discovered that repeated spamming on successive days enlarged the time window where the
spammers can liquidate their positions.
¹ Symantec February 2012 Intelligence Report shows that 68% of e-mail was spam in that period.
(http://www.slideshare.net/symantec/2012-february-symantec-intelligence-report)
² “"Pump and dump" schemes, also known as "hype and dump manipulation," involve the touting of a
company's stock (typically microcap companies) through false and misleading statements to the
marketplace. After pumping the stock, fraudsters make huge profits by selling their cheap stock into
the market.” (http://www.sec.gov/answers/pumpdump.htm)
Böhme and Holz (2006) conducted research on the impact of stock spam on traded volume
and market valuation. In their research ‘The effect of Stock Spam on Financial Markets’
(Böhme & Holz, 2006) they concluded that stock promotion relates to an increase in trading
activity of the promoted stock. Moreover, they found positive cumulative abnormal returns
shortly after the spam messages were sent. The sample consisted of stock spam
messages sent between November 2004 and February 2006. 68% of the stocks were listed on the
US Pink Sheets and the remaining were quoted on the US OTC Bulletin Board.
The aforementioned papers were published between 2003 and 2007. In 1992 Allen and
Gale stated that stock-price manipulation through information was considered to be a thing of
the past with the passing of the Securities Exchange Act of 1934. Subsequent research proved
otherwise. Still, there seems to be some doubt whether stock manipulation
occurs nowadays. Lastly, Bouraoui (2011) researched the volatility of promoted stocks.
He found that stock spam affects stock volatility positively and significantly. His research is
recent and confirms results found by Koski (1998) and Hanke and Hauser (2008),
who also researched the volatility of spammed stocks.
In this thesis I investigate whether these stock spam messages
still affect stock return and volume and, subsequently, whether stock spam affects investor behavior.
By investor behavior I mean whether people seem to be buying the spammed stocks based on
the spam messages. In the conclusion I return to this and consider whether policy makers should
protect investors more. I also discuss what the findings in this thesis mean for investors and
fund managers. Moreover, Böhme and Holz (2006) and Hanke & Hauser (2006) use the
same sample in their research. A different sample might shed new light on the situation.
Furthermore, Frieder and Zittrain’s research is 5 years old. I want to find out whether, after 5 years,
stock spam still works. Because volatility has been covered by recent prior research, it is not taken into account in
this thesis.
1.2 Problem formulation
Does stock spam affect the stock volume and return of a stock and subsequently, investor
behavior?
This problem statement leads to the following research questions. The answers to these
questions are used to determine whether stock spam affects investor behavior, in other words,
whether people are buying the spammed stocks. The characteristics of a typical promoted stock
might shed some light on how spammers influence their readers.
1.3 Research questions
- What are the characteristics of a typical promoted stock?
- Does stock promotion affect stock volume?
- Does stock promotion affect stock returns?
1.4 Managerial relevance
If stock spam still has an effect on the returns and volume of spammed stocks,
in other words, if the buying behavior of investors is affected by these spam e-mails,
this has some implications for investors.
For one, when investors buy spammed stocks that crash later on, these investors lose their
money. On the other side of the coin, market makers that sell or short the stocks at a high
price will have a positive return on their portfolio. For the misinformed investor who believes
that the claims made in stock spam e-mails are real, this poses a threat to their financial
health. Policy makers should adapt their policies to protect investors so that they do not fall
into the trap of buying the stocks promoted in these e-mails. Market makers, on the other hand, can adapt
their strategies to short these artificially overvalued stocks so that they, on the one hand,
make a profit and, on the other hand, dampen the effect of the stock spam e-mails with their
selling pressure, making the market more efficient.
1.5 Introduction to methodology
In this thesis I describe the general characteristics of promoted stocks and describe a typical
stock spam e-mail. The spam e-mails are e-mails that I have collected, with dates ranging
from 22-8-2010 to 1-3-2012. In total this amounts to 8735 e-mails.
Out of these 8735 e-mails, I extracted the stock tickers. I only extracted the stock tickers of
stocks that are traded on the OTCBB and Pink Sheets exchanges. After I collected the tickers,
I searched for the accompanying spam e-mails for each ticker. I counted which tickers are
spammed most often and ran a keyword analysis where I extracted the top 10 keywords that
were used for each ticker. These top 10s will give insight into which words are often used for
stock promotion purposes.
After the characteristics of a typical stock spam e-mail are described, an event study follows
in which the 2 dependent variables are researched: stock return and stock volume. 6 event
windows are used to determine if the stock spam has a significant effect on the 2 dependent
variables and if so, whether this effect is positive or negative.
The software that was used for the analysis is the Python programming language. The scripts
that were written for this research are in Appendix A. The end of day stock prices and
volumes were retrieved via Google Finance or Yahoo Finance, depending on which source
had the longest record of historical prices. After the stock prices and volumes were retrieved,
the returns and volume changes were calculated and added to the files. The prices and
volumes were ordered from oldest to newest before this calculation was done. In the
event study the daily return and daily volume of the promoted stocks that are traded on the
US OTC Bulletin Board and Pink Sheets are used. In the end I draw conclusions about
volume change and stock return based on the sample.
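The calculation of daily returns and volume changes described above can be sketched as follows. This is a minimal illustration and not the script from Appendix A: the function name `add_changes`, the tuple layout of the rows and the use of simple relative changes (rather than, for example, log changes) are assumptions made for this example.

```python
from datetime import date


def add_changes(rows):
    """Compute day-over-day changes for one stock.

    rows: list of (date, close, volume) tuples in any order.
    Returns a list of dicts ordered oldest to newest. The first day has
    no predecessor and is therefore dropped.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # oldest first
    out = []
    for prev, cur in zip(ordered, ordered[1:]):
        _, prev_close, prev_vol = prev
        day, close, vol = cur
        if prev_close == 0 or prev_vol == 0:
            continue  # a relative change is undefined here; skip the day
        out.append({
            "date": day,
            # Assumption: simple relative change, not a log change
            "return": (close - prev_close) / prev_close,
            "volume_change": (vol - prev_vol) / prev_vol,
        })
    return out


# Made-up prices and volumes, deliberately given out of order
sample = [
    (date(2012, 1, 4), 1.10, 1500),
    (date(2012, 1, 3), 1.00, 1000),
    (date(2012, 1, 5), 0.99, 3000),
]
changes = add_changes(sample)
```

Sorting before taking differences matters because data sources may deliver the newest prices first, which would flip the sign of every change.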
1.6 Thesis structure
This first chapter provides the background and an introductory literature review to the topic.
It also gives the managerial relevance and an introduction to the methodology. Chapter 2
describes the methods used in this research in detail. Chapter 3 describes the characteristics
of a typical stock spam e-mail and chapter 4 describes the empirical findings of the event
study. In chapter 5 I describe the conclusions, limitations and future work. At the end of this
thesis the references and the Python scripts that were written for the research are included.
Chapter 2 Methods
2.1 Keyword analysis
For the keyword analysis, a Gmail inbox containing 8735 stock spam e-mails is used. Using a
Python script the e-mails are synchronized to disk, so that they can quickly be accessed. Each
e-mail is assigned a unique identifier, so that each unique e-mail is downloaded only once.
After this is done, the subjects of the e-mails are searched for tickers. A ticker consists
of alphabetic characters and typically has a length of 4 to 5 characters. Also, tickers
are written in uppercase. When a candidate ticker is found, it is matched against a list of all the tickers
that are traded on the US OTCBB and Pink Sheets exchanges. When a match is found, this
means a valid ticker is found. For each ticker extracted from the e-mails,
Gmail is used to search for this ticker in the subject and body of all the e-mails. The e-mails that are returned are the stock spam e-mails that refer to this ticker.
Some manual blacklisting is done to filter tickers such as “ALSO” and “WHEN” which
happen to be words that occur frequently in the English language and incidentally are tickers
on the OTCBB or Pink Sheets exchanges.
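The ticker extraction described above can be sketched as follows. This is an illustrative sketch, not the script from Appendix A: `KNOWN_TICKERS` and `BLACKLIST` are small hypothetical stand-ins for the full list of OTCBB and Pink Sheets tickers and for the manual blacklist, and the function name `extract_tickers` is made up for this example.

```python
import re

# Hypothetical stand-ins for the real exchange list and manual blacklist
KNOWN_TICKERS = {"SNPK", "MSTG", "FROG", "ALSO", "WHEN"}
BLACKLIST = {"ALSO", "WHEN"}

# Uppercase alphabetic tokens of exactly 4 or 5 characters
TICKER_RE = re.compile(r"\b[A-Z]{4,5}\b")


def extract_tickers(subject):
    """Return the valid tickers in an e-mail subject, in order of appearance."""
    found = []
    for candidate in TICKER_RE.findall(subject):
        is_valid = candidate in KNOWN_TICKERS and candidate not in BLACKLIST
        if is_valid and candidate not in found:
            found.append(candidate)
    return found


print(extract_tickers("WHEN will SNPK explode? ALSO watch FROG today"))
# prints ['SNPK', 'FROG']
```

The word boundaries in the pattern prevent longer uppercase words from being cut down to a 4 or 5 letter fragment that happens to be a ticker.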
Using the tickers that are obtained in the above process, the number of e-mails that contain
information about a certain ticker is counted. A list is built with each ticker and its
accompanying spam frequency. The top 10 spammed tickers are extracted. These are used for
further analysis. The e-mails for these top 10 spammed tickers are retrieved.
Empirically I found that text versions of the e-mails often are broken, incomplete or missing.
This is why I try to retrieve the HTML version of each e-mail. The HTML is converted to
text and all alphanumeric words are extracted from the bodies of the e-mails. These are words
containing only characters a to z or 0-9, regardless of their casing. Out of all the words in the
e-mail, each unique word is extracted and the frequency that this word occurs is counted.
After the frequencies for each unique word are found, the frequencies can be sorted from high
to low and the top 10 used keywords for each ticker can be extracted. Articles such as ‘a’ and
‘the’ and pronouns such as ‘they’ are manually blacklisted so that only nouns such as
‘information’ are in the top 10s.
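The word counting described above can be sketched as follows. The sketch assumes that the HTML has already been converted to text; the stop-word list `STOP_WORDS` is a short hypothetical example and not the blacklist that was actually used, and the function name `top_keywords` is made up for this illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short blacklist of articles and pronouns
STOP_WORDS = {"a", "an", "the", "they", "it", "is", "to", "and", "of"}


def top_keywords(bodies, n=10):
    """Count alphanumeric words over all e-mail bodies of one ticker.

    bodies: list of plain-text e-mail bodies.
    Returns the n most frequent (word, count) pairs, highest count first.
    """
    counts = Counter()
    for text in bodies:
        # Lowercasing makes the count independent of casing
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


bodies = [
    "The company SNPK is a gold company.",
    "Click for information: snpk.com",
]
top = top_keywords(bodies, n=2)
```

Because punctuation separates tokens, a URL such as `snpk.com` contributes both the ticker and the token `com`, which matches the appearance of ‘com’ in the keyword lists.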
2.2 Event study
An event study is used in order to determine whether a significant positive or negative effect
on a dependent variable in the stock takes place. This event study has two dependent
variables: stock return and stock volume change.
For this event study an estimation window of 150 consecutive trading days is used. This
estimation window is used to create a model that can predict the returns and volume changes
in the pre-event window, on the event date and the post-event window.
Figure 1 – The setup of the event study
The pre-event window is a means to investigate whether there is information leakage. In this
research, the post-event window is a tool to check what happens to a stock after
stock spam has been sent.
As is illustrated in Figure 1, a buffer of 5 trading days is used between the estimation window
and the pre-event window, to make sure the model is not contaminated with data that can
possibly already include significant effects that take place within the pre-event window.
6 event windows are tested in this event study:
1. [-15, 0]
2. [-10, 0]
3. [0, 10]
4. [-10, 10]
5. [0, 20]
6. [0, 40]
Event windows 1 and 2 give information about information leakage. Windows 3, 5 and 6 are
meant to research what the effects of stock spam are on a stock after the spamming has taken
place, on a short-term and long-term basis. Window 4 tests what the effect on the stock is from 10 days
before the event to 10 days after the event. It checks for information leakage and short-term
effects in one test. Note that because some event windows require more
historical data, the number of stocks tested for those event windows can be smaller.
Determining the event date follows this process for each event window:
1. Look for the oldest spam e-mail that can be found related to the stock ticker.
2. Check whether there is enough historical data to process the event study with this
event window. This means there must be enough data for the estimation window, the
pre-event window and the post-event window.
3. When there is enough data, the event study is executed for this event window and the process
continues with the next ticker. When the data is insufficient, the second oldest spam e-mail is selected and the process continues at step 2.
4. When all e-mails for a ticker have been processed and no proper event dates can be
found, the ticker is skipped.
The above process maximizes the chance to extract at least one testable event date for each of
the tickers that are in the sample.
When the event date fell on a weekend or another non-trading day, the nearest trading day
in the future was selected.
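The selection of an event date can be sketched as follows. This is a minimal sketch under stated assumptions, not the script from Appendix A: the function name `pick_event_index` is hypothetical, and I assume here that the estimation window, the 5-day buffer and the pre-event window lie directly one after the other before the event date.

```python
from bisect import bisect_left
from datetime import date, timedelta

ESTIMATION_DAYS = 150  # length of the estimation window
BUFFER_DAYS = 5        # gap between estimation window and pre-event window


def pick_event_index(email_dates, trading_days, pre, post):
    """Return the index in trading_days of the first usable event date.

    email_dates : dates on which spam for one ticker was received
    trading_days: sorted list of dates for which stock data is available
    pre, post   : event window [-pre, +post] in trading days
    Returns None when no e-mail yields enough data (the ticker is skipped).
    """
    needed_before = ESTIMATION_DAYS + BUFFER_DAYS + pre
    for d in sorted(email_dates):          # oldest e-mail first
        i = bisect_left(trading_days, d)   # d itself, or the next trading day
        if i == len(trading_days):
            continue                       # no trading day on or after d
        if i >= needed_before and i + post < len(trading_days):
            return i
    return None


# Made-up calendar: 300 weekdays starting on Monday 3 January 2011
start = date(2011, 1, 3)
trading_days = [
    start + timedelta(days=k)
    for k in range(420)
    if (start + timedelta(days=k)).weekday() < 5
]
# One e-mail too early for a full estimation window, one on a later Saturday
emails = [start + timedelta(days=5), start + timedelta(days=278)]
event_index = pick_event_index(emails, trading_days, pre=10, post=10)
```

In this made-up example the first e-mail is rejected for lack of history, and the second, received on a Saturday, is moved to the following Monday.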
For the estimation window, the returns and volume changes of the Dow Jones U.S. Small-Cap Index are used. This index is more representative of the sample than, for example, the
S&P 500 index, which represents large capitalization companies.
To adjust for gaps in the historical data of the stocks, I use the historical data of the Dow
Jones U.S. Small-Cap Index to see which days are trading days. Subsequently the historical
data of the stock is filtered using this information, so that there are no gaps in the historical
data of the stock. In other words, only consecutive stock returns (and volume changes) are
left. For each stock return there is a Dow Jones U.S. Small-Cap Index return and for each
stock volume change there is a Dow Jones U.S. Small-Cap Index volume change with
matching dates.
Secondly, each stock's returns are regressed on the Dow Jones U.S. Small-Cap Index returns
so that the accompanying gradient, intercept, r-value, p-value and
standard error are returned. The gradient is beta and the intercept is alpha. Note that the same
is done for the other dependent variable, the volume change, which is regressed on the index volume change.
As mentioned before, the models that follow from the regressions can be used to predict the
stock returns and volume changes:
$$\text{estimated return}(\text{day}_t) = \alpha_1 + \beta_1 \cdot \text{djussc return}(\text{day}_t)$$
and
$$\text{estimated volume change}(\text{day}_t) = \alpha_2 + \beta_2 \cdot \text{djussc volume change}(\text{day}_t)$$
where djussc represents the Dow Jones U.S. Small-Cap Index data.
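The estimation of alpha and beta and the resulting prediction can be sketched as follows. This sketch uses a hand-written ordinary least squares fit in plain Python so that it is self-contained; it does not claim to reproduce the regression routine used in Appendix A. The function names `fit_market_model` and `predict` and the example numbers are made up for this illustration.

```python
def fit_market_model(index_values, stock_values):
    """Ordinary least squares fit of: stock = alpha + beta * index.

    Both lists hold values for the same days of the estimation window.
    Returns (alpha, beta), i.e. the intercept and the gradient.
    """
    n = len(index_values)
    if n != len(stock_values) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(index_values) / n
    mean_y = sum(stock_values) / n
    sxx = sum((x - mean_x) ** 2 for x in index_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_values, stock_values))
    if sxx == 0:
        raise ValueError("index values are constant; beta is undefined")
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def predict(alpha, beta, index_values):
    """Estimated stock values for the days of an event window."""
    return [alpha + beta * x for x in index_values]


# Made-up estimation data in which the stock follows the index exactly
index_returns = [0.01, -0.02, 0.03, 0.00]
stock_returns = [0.001 + 0.5 * x for x in index_returns]
alpha, beta = fit_market_model(index_returns, stock_returns)
estimated = predict(alpha, beta, [0.02, -0.01])
```

The same two functions apply unchanged to volume changes, with the index volume change as the explanatory series.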
These models are used to predict the returns and volume changes within the 6 event windows.
After the predicted returns and volume changes are gathered for the 6 event windows, the
following is done:
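$$\text{abnormal return}(\text{day}_t) = \text{real return}(\text{day}_t) - \text{estimated return}(\text{day}_t)$$
and
$$\text{abnormal volume change}(\text{day}_t) = \text{real volume change}(\text{day}_t) - \text{estimated volume change}(\text{day}_t)$$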
Now, for each event window there are two sets of abnormal values: one contains the
abnormal stock returns and one contains the abnormal volume changes.
These abnormal stock returns and abnormal volume changes can be summed for each set.
This results in the Cumulative Abnormal Returns (CARS) and Cumulative Abnormal
Volume Changes (CAVCS) for each stock for each event window:
$$\text{cumulative abnormal return} = \sum_{t=1}^{n} \text{abnormal return}(\text{day}_t)$$
and
$$\text{cumulative abnormal volume change} = \sum_{t=1}^{n} \text{abnormal volume change}(\text{day}_t)$$
where n is the length of the event window.
For each stock for each event window the following tests are done:
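$$\text{mean} = \frac{\text{cumulative abnormal return stock}}{n}$$
$$t\text{-test} = \frac{\text{mean}}{\sigma(\text{abnormal returns stock}) / \sqrt{n}}$$
and
$$\text{mean} = \frac{\text{cumulative abnormal volume change stock}}{n}$$
$$t\text{-test} = \frac{\text{mean}}{\sigma(\text{abnormal volume changes stock}) / \sqrt{n}}$$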
where n is the length of the event window.
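For a single stock and a single event window, the steps from abnormal values to the t-test value can be sketched as follows. The function name `car_t_statistic` and the numbers are made up for this illustration, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not state which estimator of sigma is used.

```python
import math
import statistics


def car_t_statistic(actual, estimated):
    """Return (cumulative abnormal value, t-test value) for one stock.

    actual, estimated: values per day of the event window, same length.
    Works for returns as well as for volume changes.
    """
    abnormal = [a - e for a, e in zip(actual, estimated)]
    n = len(abnormal)
    if n < 2:
        raise ValueError("need at least 2 days in the event window")
    car = sum(abnormal)                  # cumulative abnormal value
    mean = car / n
    sd = statistics.stdev(abnormal)      # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("abnormal values are constant; t is undefined")
    t_value = mean / (sd / math.sqrt(n))
    return car, t_value


# Made-up 4-day event window
actual = [0.03, 0.01, 0.02, 0.04]
estimated = [0.01, 0.01, 0.01, 0.01]
car, t_value = car_t_statistic(actual, estimated)
```

With these numbers the abnormal returns are 0.02, 0.00, 0.01 and 0.03, which sum to a cumulative abnormal return of 0.06.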
This results in a t-test value for each stock for each dependent variable for each event
window. To clarify, each event window tests a set of stocks. For each stock within this
sample the CARS and the CAVCS are tested with the above tests. For example, when event
window [-10, 0] is tested for 200 stocks, this results in 400 test results: 200 CARS t-test
values and 200 CAVCS t-test values.
Finally, now that the CARS and CAVCS are gathered for each stock, they can be tested as a
group. The significance of the cumulative abnormal returns of all the stocks is
tested with a t-test statistic that checks whether the cumulative abnormal returns of all stocks
are significantly different from 0:
$$\text{mean} = \frac{\sum \text{cumulative abnormal returns}}{n}$$
$$t\text{-test} = \frac{\text{mean}}{\sigma(\text{cumulative abnormal returns}) / \sqrt{n}}$$
Likewise, the cumulative abnormal volume changes are tested to see whether they are significantly
different from 0:
$$\text{mean} = \frac{\sum \text{cumulative abnormal volume changes}}{n}$$
$$t\text{-test} = \frac{\text{mean}}{\sigma(\text{cumulative abnormal volume changes}) / \sqrt{n}}$$
For all of the above t-tests, a result is considered significant at the 95% confidence level.
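The group test and the significance decision can be sketched as follows. Several points are assumptions made for this illustration and are not stated in the text: n is taken to be the number of stocks in the group, the test is taken to be two-sided, the sample standard deviation is used, and the critical value is taken from the normal distribution as an approximation of the Student t distribution, which is reasonable only for large groups such as the roughly 180 to 200 stocks tested here. The function name `group_test` is hypothetical.

```python
import math
import statistics


def group_test(cumulative_values, confidence=0.95):
    """Test whether the mean cumulative abnormal value of a group is nonzero.

    cumulative_values: one CAR (or one CAVC) per stock.
    Returns a dict with the t-test value, the decision and the sign.
    """
    n = len(cumulative_values)           # assumption: n = number of stocks
    if n < 2:
        raise ValueError("need at least 2 stocks")
    mean = sum(cumulative_values) / n
    sd = statistics.stdev(cumulative_values)
    if sd == 0:
        raise ValueError("all values are identical; t is undefined")
    t_value = mean / (sd / math.sqrt(n))
    # Assumption: two-sided test with a normal approximation (about 1.96)
    critical = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "t": t_value,
        "significant": abs(t_value) > critical,
        "sign": "positive" if mean > 0 else "negative",
    }


# Made-up groups of 40 stocks each
clearly_positive = group_test([0.10, 0.12, 0.08, 0.11, 0.09] * 8)
no_effect = group_test([0.1, -0.1] * 20)
```

The returned sign and significance flag correspond to the two kinds of rows that a results table for the group tests needs: whether the effect is significant and whether it is positive or negative.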
Chapter 3 Characteristics of a typical stock spam e-mail
A typical stock spam e-mail has a subject and a body, in which the stock ticker is promoted. In
the sample of 8735 e-mails, 3992 e-mails have the ticker in the subject. This is 45.7%. In the
e-mail the company is introduced and key information about the company is given. The
company is advertised in a persuasive way and at the end of the e-mail there is a disclaimer
in which the promoter mentions that the e-mail is an advertisement and not investment advice.
This is in accordance with SEC regulations. In some cases, there is no disclaimer. These e-mails are
not in line with section 17b of the Securities Act of 1933, which states that promoters should
disclose their financial interest.
To discover what might trigger an investor to buy a promoted stock, a keyword analysis is
done on the 10 most promoted stocks. The measure that is used is the number of e-mails sent
about a ticker. Mentioned below are the top 10 keywords of the top 10 promoted stocks.
Table 1 - Top 10 promoted stocks

Ticker   Frequency (e-mails)   Relative frequency
SNPK     203                   2.32%
MSTG     170                   1.95%
FROG     109                   1.25%
AMPW     99                    1.13%
GSTP     85                    0.97%
ONTC     83                    0.95%
HHWW     82                    0.94%
LSTG     75                    0.86%
GCLL     74                    0.85%
AUMY     74                    0.85%
Table 2 - Top 10 keywords

Rank   SNPK          MSTG        FROG          AMPW       GSTP
1      profiled      mstg        frog          company    com
2      information   gold        not           com        company
3      company       all         information   click      gold
4      you           high        you           info       not
5      companies     has         company       not        gstp
6      may           today       profiled      promoter   our
7      sscg          news        com           ampw       you
8      securities    com         securities    coal       has
9      snpk          investors   companies     about      companies
10     stock         more        investment    detailed   investment

Rank   ONTC          HHWW       LSTG          GCLL             AUMY
1      com           company    lstg          gcll             aumy
2      company       com        com           com              our
3      information   click      our           not              has
4      profiled      info       not           company          you
5      not           about      gold          our              company
6      our           detailed   information   information      we
7      companies     promoter   has           you              gold
8      you           not        company       has              information
9      securities    symbol     website       investment       not
10     may           name       profiled      madpennystocks   companies
As follows from the keyword analysis, the stock ticker is mentioned most often. Furthermore,
the words ‘company’, ‘information’ and ‘com’ are popular. ‘com’ is the remainder of a
URL. This means that the e-mails contain a lot of links to external websites. Also, the word
‘click’ is often used. It seems that the e-mails try to convince the reader to
click on a link. A look at some of the e-mails for the top 10 promoted stocks shows
that these links refer to the website of the stock spammer, or to websites that
contain information about the company such as Yahoo Finance. Moreover, ‘gold’ seems to be
used quite often. The reason is that the companies behind these tickers run gold-related
activities. The recent rise in the gold price is a good argument to convince people to buy a
stock. What is also interesting is that the word ‘investment’ appears at ranks 9 and 10. This
could also convince people to invest in a stock.
Moreover, the words ‘information’ and ‘info’ could lead the receiver to believe that he
is well informed when he reads the e-mail. ‘detailed’ refers to ‘information’. It is also
notable that the word ‘not’ is mentioned often in the e-mails.
A typical stock spam e-mail looks as follows:
Figure 2 – The layout of a typical stock spam e-mail
The disclaimer is printed in small letters, hidden away from the rest of the message. This
disclaimer is there to comply with section 17b of the Securities Act of 1933. The
stock spammer discloses his financial interest to the reader. The disclaimer mentions
that the message is an advertisement and not investment advice. Furthermore, the disclaimer
contains information about the compensation for the advertising campaign and the length of
the advertising campaign. Note that in some e-mails this disclaimer is missing, meaning that
this stock spam is not legal, as it is not in accordance with section 17b of the Securities
Act of 1933. It is interesting to mention that the word ‘not’ is often used in
this disclaimer. It could be that using negatives in the disclaimer is hard to avoid for the
spammer if he wants to comply with regulations. It could also be the case that the spammer
uses negatives to discourage the reader from reading the disclaimer.
Furthermore, the words ‘high’ and ‘investment’ are used in two of the top 10 spammed
stocks. With ‘investment’ the spammer could try to influence the reader to think that this
stock can be seen as an investment. The spammer wants the reader to buy the stock and hold
it as an investment. Moreover, the word ‘high’ can be seen as a positive word that could
trigger the reader to think the stock has high potential or high yield or a high chance of
succeeding.
Also, the words ‘company’ and ‘companies’ occur in almost the entire top 10. This may seem
logical, as the spammers promote companies. It might be the case, however, that spammers
try to convey to the reader that there actually is a company behind the ticker and that it is not just
a company that exists on paper. It could give a real feel to the ticker.
Chapter 4 Results event study
Table 3 shows the results of the event study. The first row lists the 6 event windows used.
The second row gives the number of stocks that were tested for that particular event window. For
example, for event window [0, 40] 178 stocks were tested.
The next block of table 3 contains the results of the t-tests that test the stocks as a group. ‘CARS
significant?’ identifies whether the cumulative abnormal returns of all stocks are significant.
The next row shows whether the cumulative abnormal returns are positive or negative. So,
for event window [0, 40] there is a significant negative effect on the stock returns over a
sample of 178 stocks.
In the following two rows the same is done for the cumulative abnormal volume changes. For
event window [-15, 0], there is significant positive change in volume for a sample of 201
stocks, for example.
In the last block of table 3 are the results for the case where the cumulative abnormal stock
returns and cumulative abnormal volume changes are tested individually for each stock
within the sample. For example, for event window [0, 40] for 1 stock the cumulative
abnormal stock return was significantly positive and for 19 stocks the cumulative abnormal
stock return was significantly negative.
The same is done again for the cumulative abnormal volume changes. For example, for event
window [-15, 0] 44 stocks have a significant negative cumulative abnormal volume change
and 1 stock has a significant positive cumulative abnormal volume change.
Table 3 – Results event study

Event window                  [-15, 0]   [-10, 0]   [-10, 10]   [0, 10]    [0, 20]    [0, 40]
Number of stocks tested (N)   201        201        198         198        185        178

CARS and CAVCS tested for whole group of stocks
CARS significant?             Yes        Yes        No          No         Yes        Yes
CARS positive / negative      Positive   Positive   Negative    Negative   Negative   Negative
CAVCS significant?            Yes        Yes        Yes         No         No         No
CAVCS positive / negative     Positive   Positive   Positive    Positive   Positive   Positive

Individual stock tests
N stocks CARS positive        3          4          1           2          1          1
N stocks CARS negative        14         17         16          18         19         19
N stocks CAVCS positive       1          1          2           0          1          6
N stocks CAVCS negative       44         43         48          64         57         54
As can be seen above, out of 449 tickers that are collected from the e-mails, a maximum of
201 tickers could be tested. The 2 dependent variables stock return and volume change are
discussed below.
4.1 Stock return
The event study yields some interesting results: a significant positive effect on the cumulative
abnormal stock returns takes place in the pre-event window. This suggests that there is
information leakage. Information leakage can mean different things in this context: either the
spammer does not send all e-mails at once to the same group of people, meaning that my
mailbox receives the e-mails relatively late, or the spammer buys the stock himself
before spamming. When looking at the effects on stock return in the last 2 windows, another
interesting effect can be seen: 20 to 40 days after the event date the cumulative abnormal
stock returns are significantly negative. In summary, in the 15 days before the event takes
place there is a significant positive cumulative abnormal stock return in the stocks. 20 to 40
days after the event there is a significant negative cumulative abnormal stock return.
Implications for investors are discussed in the conclusion.
4.2 Volume change
It is interesting that volume change displays the same behavior: the cumulative abnormal
volume changes are significantly positive in the first two event windows. This means that the
number of stocks that are traded in the pre-event window is significantly higher than normal.
For the event windows that cover the post-event trading days, no conclusions can be drawn,
as the test values are not significant.
From a causation versus correlation standpoint it could even mean that
heightened trading activity and significantly higher stock prices cause stock spamming, as the
action precedes the event. From a practical viewpoint, however, this would not make any sense.
Furthermore, when focusing on the results of the individual tests, the data is not
very informative when it comes to the individual CARS tests: the number of stocks that have
significantly positive or negative CARS is very small in comparison to the sample. They are
not representative of the sample. For the CAVCS it seems that a relatively high number of
stocks consistently have a negative cumulative abnormal volume change in all of the event
windows. What is notable is that after the event these numbers are consistently higher than
before the event. This is reflected in the significance of the CAVCS when tested as a
group: after the event the test values are no longer significant.
Chapter 5 Conclusions
The problem statement asks:
“Does stock spam affect the stock volume and return of a stock and subsequently, investor
behavior?”
The following research questions are stated:
- What are the characteristics of a typical promoted stock?
- Does stock promotion affect stock volume?
- Does stock promotion affect stock returns?
As can be seen in the results, stock spammers use certain words to influence the readers of
their e-mails. They try to convince their readers to buy the stock that is promoted. They also
actively discourage the reader from reading the disclaimer by printing the disclaimer in
a very small font size and through the use of negatives. Some e-mails in the sample contain a
disclaimer that informs the reader that the e-mail is an advertisement and some do not. E-mails without a disclaimer do not comply with SEC regulations.
In this research I wanted to see if these e-mails really were successful into convincing these
readers into buying these promoted stocks. In order to do this, the stock returns and stock
volumes around the date on which the spam is sent needed to be researched. This was done
through an intensive event study. In the results of the event study it can be seen that stock
return increases significantly 10 to 15 days before a stock spam e-mail was received.
Moreover, volume increased significantly in these 10 to 15 days. This can mean that people
are buying the stocks based on the e-mails. It can also mean that the stock spammer is the
only one buying. This would not make any sense. It does make sense however, that the
mailbox that is the sample receives the spam relatively late. What is important is that in the
10 days after the spam was received, no significant effect can be detected in the stock returns.
This means that on a short term these stocks stay on the same level after they have increased
significantly in the pre-event trading days. More important, a significant negative stock return
is seen in the 20 days to 40 days after the event. This means that after a few weeks to 1
month, the stocks crash. Unfortunately we cannot draw any conclusions on the volume after
the event date.
Implications for investors and policy implications
For investors this means that at least part of the money they invest in these stocks is lost.
The sample in this research suggests that many people are tricked into buying these
artificially pumped stocks. To prevent this, regulators can consider adopting stricter laws
that either forbid this kind of promotion or prescribe rules that make the disclaimer easier
for investors to see. Regulators should also check the legality of the spam messages more
often, as a disclaimer is not always included in the e-mails. Fortunately, the SEC has
recently started tracking these fraudsters down. [2] Moreover, the SEC suspended trading in
379 dormant shell companies before they could be hijacked by fraudsters and used to harm
investors in pump and dump schemes. [3] It seems that the SEC is realizing that stock spam is
a problem and that it has implications for investors.
5.1 Limitations
In this thesis, I refer to spammed, touted and promoted stocks. These terms are used
interchangeably. Strictly speaking, e-mail is spam when it is unsolicited, meaning that the
recipient did not ask to receive it. An e-mail that promotes a stock can be unsolicited, in
which case it is spam in the legal sense. It can also be solicited, meaning that the
recipient subscribed to a stock picking newsletter on the Internet. The sample contains both
unsolicited and solicited e-mails. I subscribed to many stock picking newsletters in order to
gather the sample, and most of the e-mails in the sample come from these subscriptions. The
e-mails from these subscriptions all carry a legal disclaimer stating that the e-mail is stock
promotion. Unfortunately, it is hard to distinguish the two types of e-mail programmatically.
In this research, they are therefore treated as one sample.
[2] SEC Charges Florida Stock Scheme Mastermind and 10 Cohorts
(http://sec.gov/news/press/2012/2012-82.htm)
[3] SEC Microcap Fraud-Fighting Initiative Expels 379 Dormant Shell Companies to Protect
Investors From Potential Scams (http://www.sec.gov/news/press/2012/2012-91.htm)

Moreover, in historical trading data, trading days may be missing because of trading halts or
simply because the data provider lacks the data. Almost all of the historical trading data
had gaps. I excluded the stocks with too much missing data from the sample; for the remaining
stocks, enough consecutive trading days could be extracted. Also, sub-penny stocks have not
been taken into account in this research. These are stocks that trade below $0.01. Historical
data for sub-penny stocks is sometimes not available at all. Where historical data was
available, the closing prices were rounded to two decimal places, rendering the data useless.
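The idea of extracting "enough consecutive trading days" can be illustrated with a minimal sketch (Python 3, standard library only). It walks over the trading days of the index and keeps the longest stretch on which the stock also has data. The function name and the dates are hypothetical; the script in Appendix A.4 follows a similar approach but is not reproduced here.

```python
def longest_consecutive_run(index_dates, stock_dates):
    """Return the longest run of index trading days that the stock also covers."""
    available = set(stock_dates)
    best, current = [], []
    for day in index_dates:
        if day in available:
            current.append(day)
        else:
            # A missing day ends the current run.
            if len(current) > len(best):
                best = current
            current = []
    # The final run is not followed by a gap, so compare it separately.
    if len(current) > len(best):
        best = current
    return best


# Hypothetical trading calendar and stock data (illustration only).
index_dates = ['2012-01-03', '2012-01-04', '2012-01-05',
               '2012-01-06', '2012-01-09', '2012-01-10']
stock_dates = ['2012-01-03', '2012-01-05', '2012-01-06', '2012-01-09']

run = longest_consecutive_run(index_dates, stock_dates)
```

A stock would then be testable only if this run is long enough to hold the estimation window and the event window.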
Furthermore, the historical trading data might contain incorrect closing prices.
Unfortunately, this is hard to correct for automatically. Stock splits and reverse stock
splits are accounted for. It must be mentioned, however, that the adjusted closing prices in
historical data can contain data points where (reverse) stock splits have not been accounted
for. This could make the results unreliable.
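One possible way to spot such data points, which was not part of this research, is to flag days on which the closing price changes by a factor close to a common split ratio. The sketch below (Python 3) is only a heuristic under stated assumptions: the list of split factors, the tolerance and the function name are all hypothetical, and a flagged day may just as well be a genuine price move that needs manual checking.

```python
# Assumed split factors (2:1, 3:1, ...) and their reverse-split counterparts.
SPLIT_TARGETS = [t for factor in (2.0, 3.0, 4.0, 5.0, 10.0)
                 for t in (factor, 1.0 / factor)]


def flag_possible_splits(closes, tolerance=0.02):
    """Return the indices i where closes[i] / closes[i - 1] lies within
    `tolerance` (relative) of one of the assumed split factors."""
    flagged = []
    for i in range(1, len(closes)):
        prev, cur = closes[i - 1], closes[i]
        if prev <= 0 or cur <= 0:
            continue  # no meaningful ratio without positive prices
        ratio = cur / prev
        if any(abs(ratio - t) <= tolerance * t for t in SPLIT_TARGETS):
            flagged.append(i)
    return flagged


# Hypothetical closing prices: a halving on day 2 and a tenfold jump on day 4.
closes = [1.00, 1.02, 0.51, 0.52, 5.20]
suspicious_days = flag_possible_splits(closes)
```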
Lastly, the estimation window used in the event study is 150 trading days. For reliability,
it would be better to use 200 trading days. The problem, however, is that the number of
testable stocks in the sample would shrink significantly with this window, as not enough
historical trading data is available. Furthermore, the standard errors of the regressions are
relatively high: if the regressions were tested for reliability with a t-test, only 6% would
be reliable. This is a strong indicator that the CAPM does not work well as a model for the
illiquid small cap companies in this sample. In future work, another method or model can be
used to address this.
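As an illustration of what such a reliability check could look like, the following minimal sketch (Python 3, standard library only) fits the regression of stock returns on index returns and computes the standard error and t-statistic of the slope. This is a sketch under assumptions, not the exact test behind the 6% figure: the function name and the return series are hypothetical.

```python
import math


def market_model(index_returns, stock_returns):
    """OLS fit of stock = alpha + beta * index.

    Returns (alpha, beta, se_beta, t_beta), where se_beta is the standard
    error of the slope and t_beta = beta / se_beta.
    """
    n = len(index_returns)
    mean_x = sum(index_returns) / n
    mean_y = sum(stock_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in index_returns)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(index_returns, stock_returns))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residuals = [y - (alpha + beta * x)
                 for x, y in zip(index_returns, stock_returns)]
    # Residual variance with n - 2 degrees of freedom.
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_beta = math.sqrt(s2 / sxx)
    t_beta = beta / se_beta if se_beta > 0 else float('inf')
    return alpha, beta, se_beta, t_beta


# Made-up daily returns: stock = 0.001 + 1.5 * index + small noise.
index_returns = [0.01, -0.02, 0.015, 0.0, -0.005, 0.02]
noise = [0.002, -0.002, -0.001, 0.001, 0.0015, -0.0015]
stock_returns = [0.001 + 1.5 * x + e for x, e in zip(index_returns, noise)]

alpha, beta, se_beta, t_beta = market_model(index_returns, stock_returns)
```

For an illiquid stock whose returns barely move with the index, the slope would be small relative to its standard error and the t-statistic would fall below the critical value.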
5.2 Future work
In future work, it might be interesting to research how the compensation that stock spammers
receive relates to the return on the stock. Does a bigger compensation mean a higher stock
return? When shares in the spammed company are used as a means of compensation, is the
stock return higher or lower?
Another interesting addition to this research would be to use another model for the
estimation window. In this research, a simple linear regression on S&P 500 index returns is
used. There are more advanced models to estimate stock returns, such as the Fama-French
three-factor model. Fama and French (1992) show that size and book-to-market equity help
explain the cross-section of stock returns, and their three-factor model (1993) adds
small-minus-big market capitalization and high-minus-low book-to-market ratio as two
additional factors to the market return. Furthermore, previous research uses a group of
control stocks to test the difference between the returns of a promoted stock and the returns
of the control stocks (Hanke & Hauser, 2006, p. 59). Both could be more reliable than the
CAPM used in this research.
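A minimal sketch of how a three-factor regression could replace the single-index regression is given below (Python 3, standard library only). It estimates the coefficients through the normal equations. All names are hypothetical, the factor series are synthetic and noise-free, and in practice the factor returns would have to be obtained from a data provider.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so that the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def three_factor_fit(stock_excess, market_excess, smb, hml):
    """Return OLS coefficients [alpha, b_market, b_smb, b_hml]."""
    rows = [[1.0, mkt, s, h] for mkt, s, h in zip(market_excess, smb, hml)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, stock_excess)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic factor returns (illustration only, not real factor data).
market = [0.010, -0.020, 0.015, 0.000, -0.005, 0.020]
smb = [0.002, 0.001, -0.003, 0.004, -0.001, 0.000]
hml = [-0.001, 0.003, 0.002, -0.002, 0.001, -0.004]
# Stock built from known loadings so that the fit can be checked.
stock = [0.001 + 1.2 * m + 0.5 * s - 0.3 * h
         for m, s, h in zip(market, smb, hml)]

coefficients = three_factor_fit(stock, market, smb, hml)
```

The expected return in the event window would then be computed from all three factor returns instead of from the index return alone.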
Thirdly, stock spammers or promoters use a name to identify themselves. It might be
interesting to use the spam e-mails to identify whether certain stock spammers are better at
promoting stocks than others and, if so, whether conclusions can be drawn about the number
of people they reach or their ability to write persuasive text.
Lastly, it is possible to gather data about the short float of a stock. The short float, as
used here, indicates how many shares are available for shorting at a certain broker. If this
data can be obtained as a time series, it might be interesting to research whether there is
a strong correlation between the number of shares available for shorting and the returns of
the respective stocks within the event window.
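Such a correlation could be computed as in the following minimal sketch (Python 3, standard library only). The function name and both series are hypothetical; no short availability data was collected for this thesis.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float('nan')  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


# Made-up event-window data: shares available for shorting and daily returns.
shares_available = [1000, 800, 600, 400, 200]
daily_returns = [0.05, 0.04, 0.03, 0.02, 0.01]

correlation = pearson_correlation(shares_available, daily_returns)
```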
References
Aggarwal, R. and Wu, G. (2003) Stock Market Manipulation - Theory and Evidence, Working Paper.
Allen, F. and Gale, D. (1992) Stock-Price Manipulation, Review of Financial Studies, 5, 503-529.
Böhme, R. and Holz, T. (2006) The Effect of Stock Spam on Financial Markets, Working Paper.
Bouraoui, T. (2011) The Impact of Stock Spams on Volatility, Applied Financial Economics, 21(13), 969-977.
Fama, E. F. and French, K. R. (1992) The Cross-Section of Expected Stock Returns, Journal of Finance, 47(2), 427-465.
Fama, E. F. and French, K. R. (1993) Common Risk Factors in the Returns on Stocks and Bonds, Journal of Financial Economics, 33(1), 3-56.
Frieder, L. and Zittrain, J. (2007) Spam Works: Evidence from Stock Touts and Corresponding Market Activity, Berkman Center Research Publication, No. 2006-11.
Hanke, M. and Hauser, F. (2006) On the Effects of Stock Spam E-mails, Journal of Financial Markets, 11, 57-83.
Koski, J. L. (1998) Measurement Effects and the Variance of Returns after Stock Splits and Stock Dividends, The Review of Financial Studies, 11, 143-162.
Appendix A – Python scripts
A.1 Keyword analysis
#!/usr/bin/env python
# coding: utf-8
import imaplib
import base64
import email
from BeautifulSoup import BeautifulSoup
import re
import os
import glob

MSGS_DIR = './messages/'
tickers = open('tickers.txt', 'r').readlines()
pattern_uid = re.compile(r'\d+ \(UID (?P<uid>\d+)\)')


def parse_uid(data):
    # Extract the UID from an IMAP response such as '12 (UID 345)'.
    match = pattern_uid.match(data)
    return match.group('uid')


def extract_words(raw_email):
    # Return the lower-cased words of the HTML body, or of the plain-text
    # body when the message has no HTML part.
    message = email.message_from_string(raw_email)
    maintype = message.get_content_maintype()
    raw_message = ''
    if maintype == 'multipart':
        types = []
        for part in message.get_payload():
            if not part.get_content_subtype() in types:
                types.append(part.get_content_subtype())
        if 'html' in types:
            for part in message.get_payload():
                if part.get_content_subtype() == 'html':
                    raw_message = part.get_payload(None, True)
        else:
            for part in message.get_payload():
                if part.get_content_subtype() == 'plain':
                    raw_message = part.get_payload(None, True)
    else:
        raw_message = message.get_payload()
    # Keep only visible text nodes (skip scripts and style sheets).
    lines = BeautifulSoup(
        raw_message,
        convertEntities=BeautifulSoup.HTML_ENTITIES
    ).findAll(
        text=lambda text:
            text.parent.name != "script" and
            text.parent.name != "style"
    )
    clean_msg = ''
    words = []
    for line in lines:
        if not line.startswith('DOCTYPE'):
            if len(line.strip()) > 0:
                clean_msg += line.strip() + ' '
    if len(clean_msg) > 0:
        words = re.findall(r'[\w\d]+', clean_msg)
        words = [word.lower() for word in words]
    return words


USER = ''
PASSWORD = ''
conn = imaplib.IMAP4_SSL('imap.gmail.com', 993)
conn.login(USER, base64.b64decode(PASSWORD))
conn.select('[Gmail]/All Mail')
# Count the number of e-mails that mention each ticker.
spammed_count = {}
for ticker in tickers:
    ticker = ticker.strip()
    status, subjects = conn.search(None, 'SUBJECT', '"' + str(ticker) + '"')
    subjects = subjects[0].split()
    status, bodies = conn.search(None, 'BODY', '"' + str(ticker) + '"')
    bodies = bodies[0].split()
    matches = list(set(subjects + bodies))
    spammed_count[ticker] = len(matches)

# Select the tickers with the ten highest e-mail counts.
frequencies = spammed_count.values()
frequencies.sort(reverse=True)
top_tickers = {}
for frequency in frequencies[0:10]:
    for key, value in spammed_count.items():
        if value == frequency and key not in top_tickers.keys():
            top_tickers[key] = value

results = open('results.html', 'w+')
for ticker in top_tickers.keys():
    status, subjects = conn.search(None, 'SUBJECT', '"' + str(ticker) + '"')
    subjects = subjects[0].split()
    status, bodies = conn.search(None, 'BODY', '"' + str(ticker) + '"')
    bodies = bodies[0].split()
    matches = list(set(subjects + bodies))
    words = []
    for mail_id in matches:
        status, uid = conn.fetch(mail_id, '(UID)')
        uid = parse_uid(uid[0])
        if os.path.exists(MSGS_DIR + str(uid) + '.txt'):
            words += (
                extract_words(open(MSGS_DIR + str(uid) + '.txt', 'r').read())
            )
    # Group the words of this ticker's e-mails by how often they occur.
    frequencies = {}
    blacklist_words = set([
        'the', 'with', 'a', 'are',
        'that', 'in', 'of', 'to', 'and', 'an', 'at', 'be', 'this', 'for', 'is',
        'or', 'on', 'any', 'at', 'as', 'by', 'it', 'which', 's', 'its', 'from', '0', '1'
    ])
    for unique_word in set(words) - blacklist_words:
        if frequencies.get(words.count(unique_word), None) is None:
            frequencies[words.count(unique_word)] = []
        if unique_word.strip() != '':
            frequencies[words.count(unique_word)].append(unique_word)
    keys = frequencies.keys()
    keys.sort(reverse=True)
    top_ten = []
    top_num = 11
    for number in keys:
        top_ten.append(frequencies[number][0:(min(frequencies[number].__len__(), top_num))])
        top_num -= frequencies[number].__len__()
        if top_num < 0:
            break
    # Write one HTML table per ticker.
    results.write('<table style="border:1px solid black;border-collapse:collapse;margin:0px;padding:0px;"><tr><td style="border-top:1px solid black;">' +
                  str(ticker) + '</td></tr>')
    for row in top_ten:
        if len(row) > 0:
            results.write('<tr><td style="border-top:1px solid black;">' + str(row[0]) + '</td></tr>')
    results.write('</table>')
results.close()
conn.close()
conn.logout()
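The word counting at the heart of this script can also be expressed compactly with `collections.Counter`. The following is a minimal sketch for Python 3 and not the code that produced the thesis results; the function name `top_words` and the sample text are hypothetical, while the stop words are the ones from the blacklist above.

```python
import re
from collections import Counter

# Stop words taken from the blacklist in the script above.
STOP_WORDS = {
    'the', 'with', 'a', 'are', 'that', 'in', 'of', 'to', 'and', 'an', 'at',
    'be', 'this', 'for', 'is', 'or', 'on', 'any', 'as', 'by', 'it', 'which',
    's', 'its', 'from', '0', '1',
}


def top_words(text, n=10):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    words = [word.lower() for word in re.findall(r'\w+', text)]
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical e-mail text (illustration only).
sample_text = "Buy ABCD now! ABCD is the stock to buy. Buy, buy, BUY."
most_frequent = top_words(sample_text, 2)
```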
A.2 Quote downloader
#!/usr/bin/env python
# coding: utf-8
import urllib2

y_response = ''
QUOTES_DIR = './quotes/'
tickers = open('tickers.txt', 'r').readlines()
for ticker in tickers:
    ticker = ticker.strip()
    y_response = ''
    g_response = ''
    print "Trying to download ticker information for " + str(ticker)
    # Try Yahoo Finance first.
    try:
        y_response = urllib2.urlopen(
            'http://ichart.finance.yahoo.com/table.csv' +
            '?s=' + ticker + '&a=1&b=1&c=1900&d=31&e=12&f=2012&g=d&' +
            'ignore=.csv').readlines()
    except Exception:
        pass
    # Then try Google Finance, first as an OTC and then as a Pink Sheets stock.
    try:
        g_response = urllib2.urlopen(
            'http://www.google.com/finance/historical?q=OTC:' + ticker + '&output=csv'
        ).readlines()
    except Exception:
        try:
            g_response = urllib2.urlopen(
                'http://www.google.com/finance/historical?q=PINK:' + ticker +
                '&output=csv'
            ).readlines()
        except Exception:
            pass
    # Keep the response with the most lines of data.
    if max(len(y_response), len(g_response)) > 1:
        print "Data found, flushing to disk.."
        print
        f = open(QUOTES_DIR + ticker + '.txt', 'w+')
        if len(y_response) >= len(g_response):
            f.writelines(y_response)
        else:
            f.writelines(g_response)
        f.close()
A.3 Quote organizer, add returns
#!/usr/bin/env python
# coding: utf-8
import glob
from decimal import Decimal

QUOTES_DIR = './quotes/'
CLEANED_DIR = './cleaned_quotes/'
for file in glob.glob(QUOTES_DIR + '*.txt'):
    quotes = open(file, 'r')
    print file
    lines = quotes.readlines()
    # Determine which field we need to look for
    # for the closing price
    index = None
    if lines[0].split(',').__len__() == 7:
        index = 6
    else:
        index = 4
    # Skip the header and put the lines in chronological order.
    prices = lines[1:]
    prices.reverse()
    prev_close = None
    prev_volume = None
    # Filter out subpennies with no price information
    sub_penny_test = []
    for price in prices:
        sub_penny_test.append(price.split(',')[index])
    if list(set(sub_penny_test)).__len__() == 1:
        continue
    # Create a new ticker file in the cleaned_quotes dir
    new_ticker = open(CLEANED_DIR + file.split('/')[2][0:-4] + '.txt', 'w+')
    for price in prices:
        s_price = price.split(',')[index]
        s_volume = price.split(',')[5]
        if prev_close is None:
            prev_close = s_price
        if prev_volume is None:
            prev_volume = s_volume
        # Volume change relative to the previous day (0 when undefined).
        if int(prev_volume.strip()) == 0:
            v_return = Decimal(0)
        else:
            v_return = (Decimal(s_volume) - Decimal(prev_volume)) / Decimal(prev_volume)
        # Price return relative to the previous day (0 when undefined).
        if prev_close.strip() == '0.00':
            s_return = Decimal(0)
        else:
            s_return = (Decimal(s_price) - Decimal(prev_close)) / Decimal(prev_close)
        new_ticker.write(price.strip() + ',' + str(s_return) + ',' + str(v_return) + "\n")
        prev_close = s_price
        prev_volume = s_volume
    new_ticker.close()
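The sub-penny filter in this script exists because prices rounded to two decimals carry no return information. The following minimal sketch (Python 3, standard library only) illustrates this with made-up prices that are not taken from the sample; the function name `simple_returns` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def simple_returns(prices):
    """Day-over-day simple returns; 0 when the previous price is 0."""
    returns = []
    for prev, cur in zip(prices, prices[1:]):
        returns.append(Decimal(0) if prev == 0 else (cur - prev) / prev)
    return returns


# Hypothetical sub-penny closing prices (illustration only).
true_prices = [Decimal('0.0041'), Decimal('0.0048'), Decimal('0.0036')]
# The same prices as reported with a precision of two decimals.
rounded_prices = [p.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
                  for p in true_prices]

true_returns = simple_returns(true_prices)
rounded_returns = simple_returns(rounded_prices)
```

With the unrounded prices the second return is -25%, whereas every rounded price is 0.00 and every rounded return is therefore zero.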
A.4 Event study for specific event window
#!/usr/bin/env python
# coding: utf-8
import glob
from datetime import datetime, timedelta
from scipy import stats, std
import imaplib
import base64
import email
import math
from time import sleep
# Thesis library
from thesislib import *

# S&P 500 index data with returns and volume returns
sp500 = open('^DJUSS.txt', 'r').readlines()

# Connect to Gmail
conn = imaplib.IMAP4_SSL('imap.gmail.com', 993)
conn.login(USER, base64.b64decode(PASSWORD))
conn.select('[Gmail]/All Mail')

# The output file is named after the event window, e.g. '[-15,0].txt'.
filename = '[-' + str(pre_event_window) + ',' + str(post_event_window) + '].txt'
f = open(filename, 'w+')
f.write('ticker, cumstockreturn_test, cumstockreturns_sig, car_stock, cumstockvol_test, '
        'cumvolreturns_sign, car_vol' + "\n")

# Statistics
total_tested = 0
# The abnormal stock returns for each company
cumulative_abnormal_stock_returns = []
# The abnormal vol returns for each company
cumulative_abnormal_vol_returns = []
sig_return_negative = 0
sig_return_positive = 0
sig_vol_negative = 0
sig_vol_positive = 0
for file in glob.glob(CLEANED_DIR + '*.txt'):
    # Read all the lines into prices
    prices = open(file, 'r').readlines()
    # Determine which date format the file uses.
    if prices[0].split(',').__len__() == 8:
        format = '%d-%b-%y'
    else:
        format = '%Y-%m-%d'
    # Hash with stock returns
    stockreturns = {}
    # Hash with volume returns
    volumereturns = {}
    # List containing all the found datestrings for this stock ticker data file.
    unique_dates = []
    for price in prices:
        # For each unique data line, extract the date string.
        datestring = datetime.strptime(price.split(',')[0], format).date().__str__()
        # Add the datestring to unique_dates
        unique_dates.append(datestring)
        # Add the stock return for this datestring to the stockreturns hash
        stockreturns[datestring] = float(price.split(',')[-2].strip())
        # Add the volume return for this datestring to the volumereturns hash
        volumereturns[datestring] = float(price.split(',')[-1].strip())
    sp500returns = {}
    sp500volumereturns = {}
    found_dates = []
    for line in sp500:
        # For each unique data sp500 line, extract the date string.
        datestring = datetime.strptime(line.split(',')[0], '%Y-%m-%d').date().__str__()
        # Add the stock return for this datestring to the sp500returns hash
        sp500returns[datestring] = float(line.split(',')[-2].strip())
        # Add the volume return for this datestring to the sp500volumereturns hash
        sp500volumereturns[datestring] = float(line.split(',')[-1].strip())
        # If this sp500 datestring is in the stock datestrings list, append it.
        if datestring in unique_dates:
            found_dates.append(datestring)
        else:
            # Append None when this trading day is not found
            found_dates.append(None)
    # This algo puts the longest run of consecutive trading day
    # date strings into dates_buffer_two
    dates_buffer_one = []
    dates_buffer_two = []
    for date in found_dates:
        if date != None:
            dates_buffer_one.append(date)
        else:
            if dates_buffer_one.__len__() > dates_buffer_two.__len__():
                dates_buffer_two = dates_buffer_one
            dates_buffer_one = []
    # The last run is not followed by a missing day, so compare it here.
    if dates_buffer_one.__len__() > dates_buffer_two.__len__():
        dates_buffer_two = dates_buffer_one
    # Retrieve the ticker name
    ticker = file.replace(CLEANED_DIR, '')
    ticker = ticker[0:-4]
    # Search in body and subject for the ticker
    try:
        status, subjects = conn.search(None, 'SUBJECT', '"' + str(ticker) + '"')
        subjects = subjects[0].split()
        status, bodies = conn.search(None, 'BODY', '"' + str(ticker) + '"')
        bodies = bodies[0].split()
    except:
        try:
            print 'connection fail, retry'
            sleep(2)
            status, subjects = conn.search(None, 'SUBJECT', '"' + str(ticker) + '"')
            subjects = subjects[0].split()
            status, bodies = conn.search(None, 'BODY', '"' + str(ticker) + '"')
            bodies = bodies[0].split()
        except:
            print 'connection error search'
            continue
    # Uniquefy the matches
    matches = list(set(subjects + bodies))
    # Sort the matches numerically by message sequence number
    matches.sort(key=int)
    # Let's walk through the e-mails until we have a
    # event date that lies within the historical data we have.
    event_date = None
    for match in matches:
        try:
            status, mail = conn.fetch(match, '(RFC822)')
        except:
            try:
                print 'connection fail, retry'
                sleep(2)
                status, mail = conn.fetch(match, '(RFC822)')
            except:
                print 'connection error fetch'
                continue
        raw_email = mail[0][1]
        message = email.message_from_string(raw_email)
        email_date = email.utils.parsedate(message['Date'])
        from_addr = email.utils.parseaddr(message['From'])[1]
        # Blacklist froms that are not spammers.
        if from_addr in BLACKLIST:
            continue
        event_date = datetime(email_date[0], email_date[1], email_date[2])
        # If the event_date lies out of the historical data range, we can continue to the next mail.
        if (event_date > datetime.strptime(dates_buffer_two[-1], '%Y-%m-%d') or
                event_date < datetime.strptime(dates_buffer_two[0], '%Y-%m-%d')):
            event_date = None
            continue
        # If the event date lies in our historical data range, but we can't find it,
        # it's probably not a trading day. We look forward until we have a trading day.
        elif not event_date.date().__str__() in dates_buffer_two:
            # The event_date is probably a weekend and we need to look forward
            # until we have a trading day.
            test_date = event_date
            while ((not test_date.date().__str__() in dates_buffer_two) or
                    (stockreturns.get(test_date.date().__str__()) == None)):
                test_date += timedelta(days=1)
                if test_date > datetime.strptime(dates_buffer_two[-1], '%Y-%m-%d'):
                    event_date = None
                    break
            if event_date == None:
                continue
            event_date = test_date
            if not dates_buffer_two.index(event_date.date().__str__()) < (estimation_window + 20):
                break
            else:
                event_date = None
                continue
        # When the event_date lies within range and it is a trading day, we have to
        # check if we have enough historical data, otherwise, we continue to the next email.
        elif dates_buffer_two.index(event_date.date().__str__()) < (estimation_window + 20):
            event_date = None
            continue
        # Check if we really have a return for this event_date.
        elif stockreturns.get(event_date.date().__str__()) == None:
            event_date = None
            continue
        # When all is a great success, we break out of the loop and we have our event_date.
        else:
            break
    # Skip this ticker when no event_date can be found for it, or when there
    # is not enough data after the event_date for the post_event_window.
    if event_date is None or (dates_buffer_two.index(dates_buffer_two[-1]) -
                              dates_buffer_two.index(event_date.date().__str__())) <= post_event_window:
        print 'Skipping ', ticker
        continue
    print 'Testing ', ticker
    event_index = dates_buffer_two.index(event_date.date().__str__())
    # The estimation window ends 21 trading days before the event.
    estimation_window_last_index = event_index - 21
    estimation_window_first_index = estimation_window_last_index - (estimation_window - 1)
    # This is a test to check integrity.
    assert (estimation_window_last_index - estimation_window_first_index) == \
        estimation_window - 1

    # STOCK RETURNS
    x = []
    y = []
    for i in range(estimation_window_first_index, estimation_window_last_index + 1):
        y.append(stockreturns[dates_buffer_two[i]])
        x.append(sp500returns[dates_buffer_two[i]])
    assert len(x) == estimation_window
    gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    abnormal_returns = []
    k = 0
    for i in range(event_index - pre_event_window, event_index + post_event_window + 1):
        k += 1
        expected_return = intercept + (gradient * sp500returns[dates_buffer_two[i]])
        abnormal_return = stockreturns[dates_buffer_two[i]] - expected_return
        abnormal_returns.append(abnormal_return)
    # Integrity check for number of days checked
    assert k == (pre_event_window + post_event_window + 1)
    mean = sum(abnormal_returns) / len(abnormal_returns)
    test = mean / (std(abnormal_returns) / math.sqrt(len(abnormal_returns)))
    sig = None
    if test <= -1.96 or test >= 1.96:
        sig = True
    else:
        sig = False
    if sig == True:
        if sum(abnormal_returns) < 0:
            sig_return_negative += 1
        else:
            sig_return_positive += 1
    cumulative_abnormal_stock_returns.append(sum(abnormal_returns))

    # VOLUMES
    abnormal_returns = []
    a = []
    b = []
    for i in range(estimation_window_first_index, estimation_window_last_index + 1):
        b.append(volumereturns[dates_buffer_two[i]])
        a.append(sp500volumereturns[dates_buffer_two[i]])
    vol_gradient, vol_intercept, vol_r_value, vol_p_value, vol_std_err = stats.linregress(a, b)
    k = 0
    for i in range(event_index - pre_event_window, event_index + post_event_window + 1):
        k += 1
        expected_return = vol_intercept + (vol_gradient *
                                           sp500volumereturns[dates_buffer_two[i]])
        abnormal_return = volumereturns[dates_buffer_two[i]] - expected_return
        abnormal_returns.append(abnormal_return)
    # Integrity check for number of days checked
    assert k == (pre_event_window + post_event_window + 1)
    mean = sum(abnormal_returns) / len(abnormal_returns)
    test_vol = mean / (std(abnormal_returns) / math.sqrt(len(abnormal_returns)))
    sig_vol = None
    if test_vol <= -1.96 or test_vol >= 1.96:
        sig_vol = True
    else:
        sig_vol = False
    cumulative_abnormal_vol_returns.append(sum(abnormal_returns))
    if sig_vol == True:
        if sum(abnormal_returns) < 0:
            sig_vol_negative += 1
        else:
            sig_vol_positive += 1
    # Write the results for this ticker, using its own (most recently appended) values.
    f.write(ticker + ', ' + str(test) + ',' + str(sig) + ',' +
            str(cumulative_abnormal_stock_returns[-1]) + ',' + str(test_vol) + ',' +
            str(sig_vol) + ',' + str(cumulative_abnormal_vol_returns[-1]) + "\n")
    f.flush()
    total_tested += 1
print "Total tested ", total_tested

# Group test on the cumulative abnormal stock returns of all tested tickers.
mean = sum(cumulative_abnormal_stock_returns) / len(cumulative_abnormal_stock_returns)
test_cum_return = mean / (std(cumulative_abnormal_stock_returns) /
                          math.sqrt(len(cumulative_abnormal_stock_returns)))
sig = None
if test_cum_return <= -1.96 or test_cum_return >= 1.96:
    sig = True
else:
    sig = False
if sum(cumulative_abnormal_stock_returns) < 0:
    t = 'negative'
else:
    t = 'positive'
f.write("Test cum stock returns for all: " + str(test_cum_return) + ', ' + str(sig) + ', ' + t + "\n")

# Group test on the cumulative abnormal volume returns of all tested tickers.
mean = sum(cumulative_abnormal_vol_returns) / len(cumulative_abnormal_vol_returns)
test_cum_vol = mean / (std(cumulative_abnormal_vol_returns) /
                       math.sqrt(len(cumulative_abnormal_vol_returns)))
sig = None
if test_cum_vol <= -1.96 or test_cum_vol >= 1.96:
    sig = True
else:
    sig = False
if sum(cumulative_abnormal_vol_returns) < 0:
    t = 'negative'
else:
    t = 'positive'
f.write("Test cum vol returns for all: " + str(test_cum_vol) + ', ' + str(sig) + ', ' + t + " \n")
f.write("Total tested: " + str(total_tested) + "\n")
f.write("\n")
f.write("Total sig return negative: " + str(sig_return_negative) + "\n")
f.write("Total sig return positive: " + str(sig_return_positive) + "\n")
f.write("Total sig vol negative: " + str(sig_vol_negative) + "\n")
f.write("Total sig vol positive: " + str(sig_vol_positive) + "\n")
f.close()
A.5 Event study settings file
# Blacklist e-mail addresses that are not spammers/promoters.
BLACKLIST = [
    '[email protected]', '[email protected]',
    '[email protected]', '[email protected]',
    '[email protected]', '[email protected]'
]
# Estimation window for event study
estimation_window = 150
# Pre-event window
pre_event_window = 15
# Post-event window
post_event_window = 0
# Mail connection settings
USER = ''
PASSWORD = ''
# The dir where the quotes with stock and volume returns are stored
CLEANED_DIR = './cleaned_quotes/'
A.6 E-mail to disk sync
#!/usr/bin/env python
# coding: utf-8
import imaplib
import base64
import email
import re
import os

pattern_uid = re.compile(r'\d+ \(UID (?P<uid>\d+)\)')
BLACKLIST = [
    '[email protected]', '[email protected]',
    '[email protected]', '[email protected]',
    '[email protected]'
]


def parse_uid(data):
    # Extract the UID from an IMAP response such as '12 (UID 345)'.
    match = pattern_uid.match(data)
    return match.group('uid')


MSGS_DIR = './messages/'
USER = ''
PASSWORD = ''
conn = imaplib.IMAP4_SSL('imap.gmail.com', 993)
conn.login(USER, base64.b64decode(PASSWORD))
conn.select('[Gmail]/All Mail')
status, data = conn.search(None, 'ALL')
for mail_id in data[0].split():
    status, mail = conn.fetch(mail_id, '(RFC822)')
    raw_email = mail[0][1]
    message = email.message_from_string(raw_email)
    from_addr = email.utils.parseaddr(message['From'])[1]
    if from_addr.strip() not in BLACKLIST:
        status, uid = conn.fetch(mail_id, '(UID)')
        uid = parse_uid(uid[0])
        # Store each message once, in a file named after its UID.
        if not os.path.exists(MSGS_DIR + str(uid) + '.txt'):
            f = open(MSGS_DIR + str(uid) + '.txt', 'w+')
            f.write(raw_email)
            f.close()
            print 'Downloading msg UID ' + str(uid)
        else:
            print 'Skipping msg UID ' + str(uid)
conn.close()
conn.logout()
A.7 Ticker extraction
#!/usr/bin/env python
# coding: utf-8
import glob
import email
from BeautifulSoup import BeautifulSoup
import re

OTC_LIST = 'allotcbb_otherotc.txt'
f = open(OTC_LIST, 'r').readlines()[1:]
valid_tickers = []
for line in f:
    valid_tickers.append(line.split('|')[0])
# Tickers that do not occur in the list, but that are actually traded.
valid_tickers.extend(['SCEY', 'HELI', 'CIST', 'MMOG', 'CLXM'])
MSGS_DIR = './messages/'
tickers = []
for file in glob.glob(MSGS_DIR + '*.txt'):
    raw_email = open(file, 'r').read()
    message = email.message_from_string(raw_email)
    # A message without a subject yields no words.
    words = re.findall(r'[\w\d]+', message['Subject'] or '')
    for word in words:
        # A ticker is an upper-case word of 4 or 5 letters that is in the OTC list.
        if word.isalpha() and word.isupper() and len(word) in [4, 5] and \
                word in valid_tickers:
            tickers.append(word)
tickers = list(set(tickers))
f = open('tickers.txt', 'w+')
for ticker in tickers:
    f.write(ticker + "\n")
f.close()
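The ticker matching rule used in this script can be isolated in a small function, which also makes it easy to try out on a single subject line. This is a minimal sketch for Python 3; the function name, the subject line and the placeholder ticker 'ABCD' are hypothetical, while 'SCEY' and 'HELI' come from the list of manually added tickers above.

```python
import re


def extract_tickers(subject, valid_tickers):
    """Return the upper-case words of 4 or 5 letters that are known tickers."""
    if not subject:
        return []
    found = []
    for word in re.findall(r'\w+', subject):
        if word.isalpha() and word.isupper() and len(word) in (4, 5) \
                and word in valid_tickers:
            found.append(word)
    return found


# Adding several tickers at once needs extend(); append() would add the whole
# list as one nested element, so its members would never match a word.
valid_tickers = ['ABCD']
valid_tickers.extend(['SCEY', 'HELI'])
nested = ['ABCD']
nested.append(['SCEY', 'HELI'])

subject = 'HOT ALERT: SCEY set to soar, HELI next'
found = extract_tickers(subject, valid_tickers)
found_nested = extract_tickers(subject, nested)
```

Note that 'ALERT' satisfies the length and upper-case conditions but is rejected because it is not in the list of valid tickers.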
A.8 Summary statistics for the tickers in the subjects
#!/usr/bin/env python
# coding: utf-8
import glob
import email
import re

MSGS_DIR = './messages/'
OTC_LIST = 'allotcbb_otherotc.txt'
f = open(OTC_LIST, 'r').readlines()[1:]
valid_tickers = []
for line in f:
    valid_tickers.append(line.split('|')[0])
# Tickers that do not occur in the list, but that are actually traded.
valid_tickers.extend(['SCEY', 'HELI', 'CIST', 'MMOG', 'CLXM'])
ticker_count = 0
email_count = 0
for path in glob.glob(MSGS_DIR + '*.txt'):
    raw_email = open(path, 'r').read()
    message = email.message_from_string(raw_email)
    # A message without a subject yields no words.
    words = re.findall(r'[\w\d]+', message['Subject'] or '')
    for word in words:
        if word.isalpha() and word.isupper() and len(word) in [4, 5] and \
                word in valid_tickers:
            ticker_count += 1
    # Count every e-mail, with or without a ticker in the subject.
    email_count += 1
print ticker_count
print email_count