Financial Information Grid - National e

FINGRID
RES-149-25-0028
All Hands Meeting,
31/08-3/09, 2004
Nottingham
Financial Information Grid: an ESRC e-Social Science Pilot
Khurshid Ahmad
Department of Computing, University of Surrey;
Jon Nankervis
Department of Accountancy and Finance, University of Essex
FINGRID Project
• The FINGRID project is a collaboration between econometricians at Essex and computing academics at Surrey, particularly in grid computing and artificial intelligence (plus financial traders).
• The FINGRID project aims to provide a solution to the information management/processing challenge in the social sciences: the analysis and fusion of distributed quantitative and qualitative data and programs.
• FINGRID is the third project at Surrey that deals with qualitative data (news and reports) and quantitative data (time series), after the EU projects ACE (1996-99) and GIDA (2001-03).
All-Hands Meeting, Nottingham, 3 September 2004
Fingrid (RES-149-25-0028)
FINGRID Objectives
• Create a Grid environment, based on the Open Grid Services Architecture, to provide a demonstrable software application for analysing financial information in the form of quantitative and qualitative data.
• Evaluate the benefits of the Grid approach.
FINGRID Reflections
• DAME (York): engine behaviour time series + reports in a controlled language; case-based reasoning;
• Belfast e-Science Centre: Value at Risk computation;
• MYGRID and MIAKT: information extraction + image annotation.
FINGRID Project Team
David Cheng, Research Officer, Text Analysis; (ESRC funded)
Tuğba Taşkaya-Temizel, Tutor, Grid Computing, Grid Architect;
Lee Gillam, Research Officer, Grid Implementation;
Pensiri Manumapousat, Research Student, Text Categorisation;
Saif Ahmad, Research Student, Wavelet Analysis;
Hayssam Trablousi, Research Student, Named Entity Extraction;
Ademola Popoula, Research Student, Fuzzy Logic Analysis;
Gary Dear, Computing Officer, Grid Implementation;
Khurshid Ahmad, Principal Investigator;
Jon Nankervis, Co-Investigator (Essex)
ESRC Funding: Fifty Thousand Pounds Sterling (Gross).
The Problem
Social science research requires the capture and analysis of data that is quantitative (numerical data) and data that is qualitative (opinions expressed in language or other sign systems).
The fusion of multi-modal information is critical to social sciences research.
The Problem – Decision Making
Challenges:
Hypothesis formation and theory development in financial and political economics, both by researchers and by financial traders, now involves the analysis of streaming time-series data and of financial and political news.
The Data:
• Numerical data: time series of the price/value movement of financial instruments; c. 5MB/day per instrument.
• Textual data: text streams of different genres (news items; financial reports; company brochures; government documents); c. 20MB/day.
Streaming Time-serial Data
and News Service
Streaming economic/political news: Reuters; Yahoo; Bloomberg; BBC; Al Jazeera
The Problem – Decision Making
• Financial and political analysis requires data over short time periods (daily) or longer time periods (5-10 years).
• This is a large volume of data that requires instant processing, much like data emerging from particle or gene factories, except that in our case the data comes in two or more modalities.
• Financial/political analysis requires access to data tombs (archives) and data nurseries (streaming news and time series).
The Problem – Decision Making
• Decision making involves dealing with factual news (who, where, what, when) and with news related to ‘market sentiment’.
• Decision making involves dealing with time-ordered data that lacks stochastic stability and shows considerable changes in variance.
Market Sentiment?
In addition to the highly quantitative data related to trading volumes and price movements, financial traders, and increasingly economists, rely on market sentiment.
The behaviour of investors, security analysts and financial/monetary theoreticians is influenced by information other than market data: investor credulity; herding → sentiment analysis.
Market Sentiment? Motivation
Bounded Rationality
Herbert Simon (Nobel Prize in Economics 1978)
Rational Decision Making in Business Organisations:
Mechanisms of bounded rationality: failures of knowing all of the alternatives, uncertainty about relevant exogenous events, and inability to calculate consequences.
Daniel Kahneman (Nobel Prize in Economics 2002)
Maps of bounded rationality: intuitive judgement & choice.
Two generic modes of cognitive function: an intuitive mode, in which decision making is automatic and rapid, and a controlled mode, which is deliberate and slower.
E-Economics? FINGRID?
Computing at the limits of rationality → distributed multi-modal data analysis and fusion
Market Sentiment,
Behavioral Psychology
Investor sentiment appears to have some causal relationship with stock market bubbles:
• 1961: ‘-tronics’ mania
• 1967: franchise and computer ‘crazies’
• 1983: high-tech issues
• 2001: dot.com
Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and the cross-section of stock returns’. Proc. Conf. on Investor Sentiment.
Market Sentiment,
Quantitative Behavioral Psychology
• Investor sentiment can be affected by:
  – Closed-end fund discount (CEFD);
  – Turnover ratio, e.g. on the NYSE (TURN);
  – Number of initial public offerings (NIPO);
  – Average first-day returns on IPOs (RIPO);
  – Equity share (S);
  – Dividend premium (P);
  – Age of the firm, external finance, ‘size’ (log(equity)), …
• A novel composite index:

$$\mathrm{Sentiment}_t = -0.358\,\mathrm{CEFD}_t + 0.402\,\mathrm{TURN}_{t-1} + 0.414\,\mathrm{NIPO}_t + 0.464\,\mathrm{RIPO}_t + 0.371\,S_t - 0.431\,P_{t-1}$$

• A complex regression on large data sets, computed on a monthly basis.
Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and the cross-section of stock returns’. Proc. Conf. on Investor Sentiment.
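Once the proxies are available, evaluating the composite index for a month is a weighted sum. The Java sketch below (Java being the project's development language) applies the coefficients shown on the slide to standardised proxy values. The class and method names, the z-score standardisation step and the sample inputs are assumptions for illustration; estimating the coefficients themselves is not shown.

```java
/**
 * Sketch: evaluates the composite sentiment index for one month from
 * standardised proxy values, using the coefficients shown on the slide.
 * Class and method names are hypothetical.
 */
final class SentimentIndex {

    // Coefficients copied from the composite index on the slide.
    static final double W_CEFD = -0.358;
    static final double W_TURN = 0.402;
    static final double W_NIPO = 0.414;
    static final double W_RIPO = 0.464;
    static final double W_S = 0.371;
    static final double W_P = -0.431;

    /** Assumed preprocessing: standardise a proxy series to zero mean and unit (population) standard deviation. */
    static double[] standardise(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) z[i] = (x[i] - mean) / sd;
        return z;
    }

    /**
     * Index for month t. TURN and P enter with a one-month lag, as in the formula:
     * callers pass turnPrev = TURN(t-1) and pPrev = P(t-1).
     */
    static double index(double cefd, double turnPrev, double nipo,
                        double ripo, double s, double pPrev) {
        return W_CEFD * cefd + W_TURN * turnPrev + W_NIPO * nipo
             + W_RIPO * ripo + W_S * s + W_P * pPrev;
    }

    public static void main(String[] args) {
        // Hypothetical standardised (z-score) proxy values for a single month.
        double sentiment = index(-1.0, 0.5, 1.2, 0.8, 0.3, -0.6);
        System.out.printf("Sentiment = %.4f%n", sentiment);
    }
}
```

With the hypothetical inputs in `main`, the program prints `Sentiment = 1.7969`: a wide closed-end fund discount and a high dividend premium push the index down, so their negative coefficients turn negative z-scores into positive contributions.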
FINGRID Contribution
• Extraction of market sentiment using a ‘local grammar’ of rise/fall and growth/decay, coupled with attributed and un-attributed news (rumours).
• Automatic analysis of terminology and ontology: financial trading has 25 sub-domains.
• An integrated framework for time-series analysis (preprocessing, filtering, trend and seasonality, variance change) using wavelet analysis and fuzzy logic.
• Neural-network-based classifiers for classifying streaming news.
• Implementation of a Grid-based solution and a ‘daily’ market report service.
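As an illustration of wavelet-based time-series analysis, the sketch below applies a multi-level Haar decomposition: each level splits the series into a smoother approximation (trend) and detail coefficients (fluctuations at that scale). The slides do not say which wavelet FINGRID used; the Haar wavelet, the class name and the sample prices are assumptions chosen for brevity, and the series length must be divisible by 2 to the power of the number of levels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a multi-level orthonormal Haar decomposition (hypothetical class name). */
final class HaarDecomposition {

    /** One level: returns {approximation, detail}, each half the input length. */
    static double[][] step(double[] x) {
        if (x.length % 2 != 0) {
            throw new IllegalArgumentException("series length must be even at every level");
        }
        int half = x.length / 2;
        double[] approx = new double[half];
        double[] detail = new double[half];
        double root2 = Math.sqrt(2.0);
        for (int i = 0; i < half; i++) {
            approx[i] = (x[2 * i] + x[2 * i + 1]) / root2; // scaled local sum (smooth part)
            detail[i] = (x[2 * i] - x[2 * i + 1]) / root2; // scaled local difference (fluctuation)
        }
        return new double[][] {approx, detail};
    }

    /**
     * Applies `levels` steps, always re-decomposing the approximation.
     * Returns the detail coefficients from finest to coarsest, followed by the final approximation (trend).
     */
    static List<double[]> decompose(double[] x, int levels) {
        List<double[]> parts = new ArrayList<>();
        double[] current = x;
        for (int level = 1; level <= levels; level++) {
            double[][] ad = step(current);
            parts.add(ad[1]);   // detail at scale 2^level
            current = ad[0];    // smoother series for the next level
        }
        parts.add(current);     // coarsest approximation
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical daily closing prices; length 8 allows up to three levels.
        double[] prices = {100, 102, 101, 105, 107, 106, 110, 112};
        for (double[] part : decompose(prices, 2)) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```

Because the transform is orthonormal, the squared coefficients sum to the squared values of the original series, so the series' energy can be attributed scale by scale; a shift in the size of the detail coefficients over time is one simple indication of a variance change.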
Fusing Qualitative and
Quantitative Data Analysis
• We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating sentiment and instrument time series, both as text and numbers and graphically.
What we need…
A common infrastructure:
• for interoperability and reusability;
• that aggregates distributed resources into a single source of computing power and provides seamless access;
• that allows geographically distributed resources to be shared.
Is Grid Computing the
Solution?
IBM on Financial Grid
Computing:
Grid computing enables the
virtualisation of distributed
computing and data
resources
@ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml
Is Grid Computing the Solution?
GRID
• Resource sharing;
• Collaboration: financial economics, sociology of poverty, policy formation;
• Working with living data:
  – much Grid work relates to data tombs, whereas the social sciences work with data nurseries;
  – living data is unstable and incomplete, and requires at least two interdependent modalities, one compensating for the other;
  – software, including legacy software, is in silos and its operation is based on tradition: packages come with experts!
  – ‘home’ punters: everybody plays the market;
• Speed-up in simulations: a factor of 5 in text analysis; 3-4 in Monte Carlo.
FINGRID Infrastructure in
Surrey
A 24-node data and compute Grid interfaced to a ‘real world’ data
stream (Reuters News and Financial Time series Feed) for
capturing, analysing and fusing quantitative and ‘qualitative’ data.
Reuters Feed: 2 dedicated data lines, a PC and a Sun for feed management, and associated networking
FINGRID Infrastructure:
Reuters Financial Services Streaming Data and
News Service
FINGRID Architecture
A three-tier architecture:
• The first tier enables the client to send a request to one of the services: the Text Processing Service or the Time Series Service;
• The second tier, on the main cluster, manages the execution of parallel tasks, which are distributed to a set of slave machines (nodes);
• The third tier connects the slave machines to the data providers.
FINGRID Architecture
[Figure: FINGRID architecture. Components: client; Text and Time Series Service; Main Cluster; GRID Cluster (Surrey Grid, 24 slaves); streaming textual data; streaming numeric data. Flow: send service request; distribute tasks; receive results; notify user about results.]
• Given an allocated task, the slave machines retrieve the corresponding data from the data providers.
• The main cluster monitors the slave machines until they have completed their tasks, and then combines the interim results.
• The final result is sent back to the client machine.
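The task flow described above (distribute tasks, retrieve data, monitor, combine interim results) follows a master/worker pattern. The sketch below reproduces it on a single machine, with a Java thread pool standing in for the slave nodes; it does not use Globus or the Java CoG Kit, and the word-counting task, the chunking rule and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MasterWorkerSketch {

    /** Hypothetical worker task: processes its chunk of documents and returns an interim result (a word count). */
    static Callable<Integer> wordCountTask(List<String> documents) {
        return () -> {
            int count = 0;
            for (String doc : documents) {
                String trimmed = doc.trim();
                if (!trimmed.isEmpty()) count += trimmed.split("\\s+").length;
            }
            return count;
        };
    }

    /** Master: splits the documents into equal chunks, one per node, then combines the interim results. */
    static int run(List<String> documents, int nodes) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nodes); // threads stand in for slave machines
        try {
            List<Future<Integer>> interim = new ArrayList<>();
            int chunk = (documents.size() + nodes - 1) / nodes;     // ceiling division
            for (int start = 0; start < documents.size(); start += chunk) {
                int end = Math.min(start + chunk, documents.size());
                interim.add(pool.submit(wordCountTask(documents.subList(start, end))));
            }
            int total = 0;
            for (Future<Integer> f : interim) total += f.get();     // waits until each worker has finished
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("shares rose 3 percent", "the dollar fell", "", "bonds were flat");
        System.out.println(run(docs, 2)); // total words across all chunks
    }
}
```

The example prints `10`. Splitting the work into equal-sized chunks is the simplest choice, and it shares the limitation noted in the project's conclusions: the split ignores both node speed and file size.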
FINGRID Technology
• Globus Toolkit 3.0 (based on the Open Grid Services Architecture, OGSA)
• Java CoG Kit (Java Commodity Grid Kit) for resource management
• Languages for development → Java + Reuters SSL Developer’s Kit (Java) for the connection with the Reuters streaming data
• Applications integrated:
  – existing statistical programs in FORTRAN;
  – Matlab, via JMatlink (adapted to the Linux environment) for communication with the Matlab environment
• Other technologies:
  – XML (NewsML) for the news information;
  – CGI for communication between the Java applet and the server side
FINGRID Services
• News Analysis: a service for extracting market sentiment.
• Correlation: correlation of market sentiment with financial time series.
• Bootstrapping: a service for computing standard errors and confidence intervals, and for hypothesis testing, by simulating the time series or the market sentiment series.
FINGRID Service: Market
Sentiment
At one level, market sentiment is often expressed in news reports and editorials. It ranges from views about national economies to imminent take-overs, mergers and acquisitions, and from people leaving or joining an organisation to news about political and economic successes and failures.
Market Sentiment
• Sentiments are expressed using metaphors.
• The so-called animal metaphors, bullish and bearish, refer to the aggressive or recessive (shy) mood of investors, and perhaps of traders.
• Sentiment words are typically used metaphorically and are in general ambiguous: ‘rose’ may be used in different contexts, and indeed as a proper noun.
• The local grammar reduces this ambiguity by constraining how sentiment words may be used.
Market Sentiment
A finite-state automaton (local grammar), learnt by our system from a news corpus to identify ‘sentiments’ unambiguously in free text, was used to extract sentiment information.
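A local grammar of this kind can be pictured as a small finite-state automaton that accepts a movement verb only when a quantity follows it, which rules out readings such as ‘Rose’ as a name. The sketch below hand-codes one such automaton, VERB [by] NUMBER (percent | pct | %). FINGRID's grammar was learnt from a corpus, so the lexicon, states and pattern here are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SentimentLocalGrammar {

    enum Polarity { POSITIVE, NEGATIVE }

    // Hypothetical hand-picked lexicon (the FINGRID grammar was learnt from a corpus).
    private static final Set<String> UP = Set.of("rose", "gained", "climbed", "rallied");
    private static final Set<String> DOWN = Set.of("fell", "lost", "dropped", "slipped");

    // States reachable after reading a movement verb.
    private enum State { AFTER_VERB, AFTER_BY, AFTER_NUMBER }

    /** Lower-cases, separates '%', splits on whitespace/commas/semicolons and strips a trailing '.'. */
    static String[] tokenize(String sentence) {
        String[] raw = sentence.toLowerCase(Locale.ROOT).replace("%", " % ").trim().split("[\\s,;]+");
        for (int i = 0; i < raw.length; i++) {
            if (raw[i].endsWith(".")) raw[i] = raw[i].substring(0, raw[i].length() - 1);
        }
        return raw;
    }

    /** Runs the automaton VERB [by] NUMBER (percent|pct|%) from every movement verb in the sentence. */
    static List<Polarity> extract(String sentence) {
        String[] tokens = tokenize(sentence);
        List<Polarity> found = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            Polarity polarity = UP.contains(tokens[i]) ? Polarity.POSITIVE
                              : DOWN.contains(tokens[i]) ? Polarity.NEGATIVE : null;
            if (polarity == null) continue;              // not a start transition
            State state = State.AFTER_VERB;
            for (int j = i + 1; j < tokens.length; j++) {
                String t = tokens[j];
                if (state == State.AFTER_VERB && t.equals("by")) {
                    state = State.AFTER_BY;
                } else if (state != State.AFTER_NUMBER && t.matches("\\d+(\\.\\d+)?")) {
                    state = State.AFTER_NUMBER;
                } else if (state == State.AFTER_NUMBER
                        && (t.equals("percent") || t.equals("pct") || t.equals("%"))) {
                    found.add(polarity);                 // accepting state reached
                    break;
                } else {
                    break;                               // no transition: reject this candidate
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("Shares in Acme rose 3 percent on Monday."));   // [POSITIVE]
        System.out.println(extract("The pound fell by 0.5% against the dollar.")); // [NEGATIVE]
        System.out.println(extract("Rose said a rose is a rose."));                // []
    }
}
```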
Market Sentiment
A finite-state automaton (local grammar), learnt by our system from a news corpus to identify the names of persons and organisations unambiguously in free text, was used to attribute sentiment information to people and organisations.
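Attribution can be sketched in the same spirit: a few fixed patterns pick out the source of a statement, and a sentence with no source is treated as un-attributed news (a rumour). The patterns below (‘according to NAME’ and ‘NAME said/says/announced’) and the capitalised-word rule for names are hypothetical stand-ins for the learnt named-entity grammar. They miss names introduced with a lower-case article, such as ‘according to the Bank of England’.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SentimentAttribution {

    // A run of capitalised tokens, optionally linked by "of" (e.g. "Bank of England").
    private static final String NAME = "[A-Z][A-Za-z&-]*(?:\\s+(?:of\\s+)?[A-Z][A-Za-z&-]*)*";

    // Two hypothetical attribution patterns.
    private static final Pattern ACCORDING_TO = Pattern.compile("[Aa]ccording to (" + NAME + ")");
    private static final Pattern NAME_SAID = Pattern.compile("(" + NAME + ")\\s+(?:said|says|announced)\\b");

    /** Returns the source a sentence is attributed to, or empty for un-attributed news (e.g. rumours). */
    static Optional<String> source(String sentence) {
        Matcher m = ACCORDING_TO.matcher(sentence);
        if (m.find()) return Optional.of(m.group(1));
        m = NAME_SAID.matcher(sentence);
        if (m.find()) return Optional.of(m.group(1));
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] news = {
            "Shares rose 3 percent, according to Goldman Sachs.",
            "Alan Greenspan said rates would rise.",
            "Traders whispered that the dollar would fall."
        };
        for (String s : news) {
            System.out.println(source(s).orElse("UNATTRIBUTED") + " <- " + s);
        }
    }
}
```

Combined with the sentiment automaton, each extracted sentiment can then be tagged with its source, or flagged as a rumour when no source is found.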
Case Studies & Results
• Text Analysis Service
For the Brown Corpus, the number of words processed per second on a single-CPU system is similar to that reported by Hughes et al.: 7,120 versus 6,670.
Our 2-node grid implementation shows a 98% performance gain, whereas the Hughes et al. implementation (an SMP configuration, equivalent to our 2-node grid) shows a 27% gain.
The relative performance of the word-frequency counting experiment is lower on the RCV1 corpus than on the Brown corpus, because the XML files must be parsed before processing.

Machines | Brown (words/s) | RCV1 (words/s)
1        | 7,120           | -
2        | 14,091          | 5,334
4        | 23,944          | 10,532
8        | 31,453          | 14,590
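The gains quoted above follow from the usual definitions: speed-up is the throughput on n machines divided by the throughput on one, and parallel efficiency is the speed-up divided by n. The helper below (hypothetical names) recomputes them from the Brown corpus figures. RCV1 has no single-machine figure, so its speed-ups can only be measured relative to the 2-machine run.

```java
final class Speedup {

    /** Speed-up S(n) = throughput(n) / throughput(1). */
    static double speedup(double throughputN, double throughput1) {
        return throughputN / throughput1;
    }

    /** Percentage gain over the single-machine run: (S - 1) * 100. */
    static double gainPercent(double throughputN, double throughput1) {
        return (speedup(throughputN, throughput1) - 1.0) * 100.0;
    }

    /** Parallel efficiency E(n) = S(n) / n. */
    static double efficiency(double throughputN, double throughput1, int n) {
        return speedup(throughputN, throughput1) / n;
    }

    public static void main(String[] args) {
        double base = 7120;                       // Brown corpus, 1 machine (words/s)
        int[] machines = {2, 4, 8};
        double[] brown = {14091, 23944, 31453};   // Brown corpus throughput from the results
        for (int i = 0; i < machines.length; i++) {
            System.out.printf("%d machines: speed-up %.2f, gain %.0f%%, efficiency %.2f%n",
                machines[i], speedup(brown[i], base), gainPercent(brown[i], base),
                efficiency(brown[i], base, machines[i]));
        }
    }
}
```

For two machines the speed-up is 1.98, the 98% gain quoted above; at eight machines it is about 4.42, an efficiency of roughly 0.55. For RCV1, going from 2 to 8 machines gives 14,590 / 5,334, a factor of about 2.7.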
Case Studies & Results
• Text Analysis Service
A Java program for sentiment extraction has been developed.
Experiments were conducted on the Reuters RCV1 corpus (2.3GB), with a significant improvement in processing time: from 15.9 hours on a 4-node grid to 13.1 hours on an 8-node grid.
[Figure: Text Analysis. Time required to process a month of news with different configurations; y-axis: time in seconds (0-600); x-axis: number of machines (1, 2, 4, 8).]
FINGRID Service:
Fusing quantitative & qualitative information
• Time-series data related to financial instruments, for example currencies, stocks and derivatives, often exhibit non-stationarity.
• Multiscale analysis and fuzzy logic are increasingly used to extract the long-term trend, seasonal variation and random component of a complex time series.
• The positive and negative sentiments related to a financial instrument may be ordered as a time series.
• This sentiment series is then correlated with the movement of the financial instrument.
• Such correlation can be used for prediction or, better still, for the analysis of (volatile) movements in the market.
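One simple way to build the sentiment series and correlate it with an instrument is sketched below: daily net sentiment is (positive - negative) / (positive + negative) phrase counts, returns are log price changes, and the two are compared with the Pearson correlation coefficient. The net-sentiment formula, the same-day alignment, the class name and the sample data are assumptions; the slides do not specify SATISFI's correlation measure, and a lagged or rank correlation may suit non-stationary data better.

```java
final class SentimentCorrelation {

    /** Net sentiment per period: (pos - neg) / (pos + neg), or 0 when there is no sentiment-bearing news. */
    static double[] netSentiment(int[] positive, int[] negative) {
        double[] s = new double[positive.length];
        for (int t = 0; t < s.length; t++) {
            int total = positive[t] + negative[t];
            s[t] = total == 0 ? 0.0 : (double) (positive[t] - negative[t]) / total;
        }
        return s;
    }

    /** Log returns r_t = ln(p_t / p_{t-1}); the result is one element shorter than the prices. */
    static double[] logReturns(double[] prices) {
        double[] r = new double[prices.length - 1];
        for (int t = 1; t < prices.length; t++) r[t - 1] = Math.log(prices[t] / prices[t - 1]);
        return r;
    }

    /** Pearson correlation of two equal-length series (NaN if either series is constant). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Hypothetical daily counts of positive/negative sentiment phrases and closing prices.
        int[] pos = {5, 2, 7, 1, 4};
        int[] neg = {1, 4, 2, 6, 3};
        double[] close = {100.0, 101.5, 100.8, 102.9, 101.0, 101.6};
        double[] sentiment = netSentiment(pos, neg);   // days 1..5
        double[] returns = logReturns(close);          // returns for days 1..5
        System.out.printf("corr = %.3f%n", pearson(sentiment, returns));
    }
}
```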
FINGRID Service:
Bootstrapping & Large-scale simulations
• The bootstrap method assumes that the observed data are representative of the unknown population.
• Bootstrap procedures are data-based simulation methods that estimate the distribution of estimators by re-sampling the observed data.
• Statistical inferences obtained from distributions of simulated data are reported to be more reliable in finite samples than inferences based on asymptotic theory, which holds only as the sample size becomes infinitely large (MacKinnon 2002).
• Bootstrap tests and Monte Carlo tests are examples of simulation-based tests.
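The resampling idea can be sketched in a few lines of Java. Because financial returns are serially dependent, the sketch uses a moving-block bootstrap, which resamples contiguous blocks rather than single observations (a block length of 1 gives the ordinary bootstrap), and reports a standard error and a percentile confidence interval for the mean. This is not the project's Java-wrapped Fortran code; the block scheme, the statistic, the seed, the class name and the data are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class BlockBootstrap {

    /** Sample mean, used here as the statistic of interest. */
    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    /**
     * Moving-block bootstrap: each pseudo-sample concatenates randomly chosen overlapping blocks
     * of length blockLength (1 = ordinary i.i.d. bootstrap). Returns the statistic for each replication.
     */
    static double[] replicate(double[] data, int blockLength, int replications,
                              ToDoubleFunction<double[]> statistic, long seed) {
        int n = data.length;
        if (blockLength < 1 || blockLength > n) throw new IllegalArgumentException("bad block length");
        Random rng = new Random(seed);
        double[] stats = new double[replications];
        double[] sample = new double[n];   // reused buffer; the statistic must not keep a reference to it
        for (int b = 0; b < replications; b++) {
            int filled = 0;
            while (filled < n) {
                int start = rng.nextInt(n - blockLength + 1);   // any block that fits inside the data
                for (int k = 0; k < blockLength && filled < n; k++) sample[filled++] = data[start + k];
            }
            stats[b] = statistic.applyAsDouble(sample);
        }
        return stats;
    }

    /** Bootstrap standard error: sample standard deviation of the replicated statistics. */
    static double standardError(double[] stats) {
        double m = mean(stats), ss = 0;
        for (double v : stats) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (stats.length - 1));
    }

    /** Percentile confidence interval, e.g. level = 0.95. */
    static double[] percentileInterval(double[] stats, double level) {
        double[] sorted = stats.clone();
        Arrays.sort(sorted);
        double alpha = (1 - level) / 2;
        int lo = (int) Math.floor(alpha * (sorted.length - 1));
        int hi = (int) Math.ceil((1 - alpha) * (sorted.length - 1));
        return new double[] {sorted[lo], sorted[hi]};
    }

    public static void main(String[] args) {
        double[] returns = {0.4, -0.2, 0.1, 0.7, -0.5, 0.3, 0.0, -0.1, 0.6, -0.3};
        double[] stats = replicate(returns, 2, 1000, BlockBootstrap::mean, 42L);
        double[] ci = percentileInterval(stats, 0.95);
        System.out.printf("mean = %.3f, s.e. = %.3f, 95%% CI = [%.3f, %.3f]%n",
            mean(returns), standardError(stats), ci[0], ci[1]);
    }
}
```

Each replication is independent of the others, which is what makes the method easy to spread across grid nodes: each node can run its own share of the replications with a different seed, and the master only needs to pool the resulting statistics.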
Case Studies & Results
• Bootstrapping
Java-wrapped (Fortran) implementations of the bootstrapping algorithm were used. The processing time of the bootstrapping program was measured with different grid node configurations, from two nodes to eight nodes.
When the number of bootstrap replications was set to 1000, 1050 seconds were required on a 2-node grid and 404 seconds on an 8-node grid.
[Figure: Simple Bootstrapping. Time in seconds (0-2500) against number of machines (1, 2, 4, 8); series: Bootstrap rep=500 and Bootstrap rep=1000.]
Conclusion
• We have identified the following problems that may cause performance degradation in a grid environment:
  – The configurations of the machines: during the distribution of tasks we did not consider the configuration of the machines, so faster machines were idling while the rest were still processing.
  – One common data source: network latency arises because many nodes share the same bandwidth to retrieve files.
  – Amdahl’s law: Amdahl’s law applies to our grid, where the fraction f of the code that cannot be parallelised limits the speed-up factor.
  – Program constraints: file size is not considered in the task distribution process.
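Amdahl’s law gives the speed-up on n nodes as S(n) = 1 / (f + (1 - f)/n), so the serial fraction f caps the speed-up at 1/f however many nodes are added. The sketch below evaluates the law and, in the opposite direction, the Karp-Flatt metric, which estimates the f that would explain an observed speed-up. The class and method names are hypothetical; the Brown corpus throughputs come from the text analysis results.

```java
final class Amdahl {

    /** Amdahl's law: speed-up on n nodes when a fraction f of the work cannot be parallelised. */
    static double speedup(double f, int n) {
        if (f < 0.0 || f > 1.0 || n < 1) throw new IllegalArgumentException("need 0 <= f <= 1 and n >= 1");
        return 1.0 / (f + (1.0 - f) / n);
    }

    /**
     * Karp-Flatt metric: the serial fraction that would explain an observed speed-up on n >= 2 nodes.
     * It lumps together every overhead (serial code, shared-bandwidth latency, idle nodes).
     */
    static double impliedSerialFraction(double observedSpeedup, int n) {
        if (n < 2) throw new IllegalArgumentException("need n >= 2");
        return (1.0 / observedSpeedup - 1.0 / n) / (1.0 - 1.0 / n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 4, 8, 24}) {
            System.out.printf("f = 0.10, n = %2d: speed-up %.2f%n", n, speedup(0.10, n));
        }
        // Brown corpus word counting: 31,453 words/s on 8 machines vs 7,120 on one.
        double observed = 31453.0 / 7120.0;
        System.out.printf("implied serial fraction on 8 machines: %.3f%n", impliedSerialFraction(observed, 8));
    }
}
```

With f = 0.10, even all 24 nodes of the Surrey Grid give a speed-up of only about 7.3. For the Brown corpus, the implied serial fraction on 8 machines is about 0.12; this figure absorbs every overhead in the list above, not only non-parallelisable code.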
Conclusions
• The FinGrid project has achieved three major objectives:
  – The project demonstrates how both quantitative and qualitative data from multiple sources can be processed, analysed and fused.
  – It has raised considerable interest in the financial news information market (Ahmad et al. 2004).
  – It contributes to improvements in goods and services: financial software houses and news vendors have shown interest in the project.
• A Master’s level Grid Computing module has been developed based on our experience in FinGrid.
Next Steps
Investigate and evaluate Condor-G, MPICH2
and OGSA-DAI for effective job management,
parallel processing and database
management.
Towards a knowledge grid
PARALLEL and DISTRIBUTED KNOWLEDGE DISCOVERY:
Continual analysis and fusion of both real-time and historical text and numerical data.
KNOWLEDGE GRID SERVICES:
KNOWLEDGE RETRIEVAL: Adapt information extraction methods and systems (e.g. Surrey’s SYSTEM QUIRK) to a Grid architecture for extended semantic analysis.
KNOWLEDGE MODELLING: Representation of non-stationary time series using Wavelet
Analysis, Neural Networks and Fuzzy Logic, such that the system learns from its past
experience.