Distributed Computing

Utilize unused PC resources

Processing


Complex calculations
Load distribution
25% of storage is unused


SANs
100 computers × 80 GB drives = 6 TB unused
Process Sharing Applications



For large-scale computations
Data analysis, data mining, scientific computing
Research Problems








SETI@Home
Folding@Home
distributed.net
Genome@Home
FightAIDS@Home
climate simulation
Economics
medicine
Distributed Computing



P2P is not the same as distributed computing, but the two share
challenges and issues: sharing and taking advantage of resources
available at the endpoints, and harnessing their power for
computationally intensive problems
SETI@home, FightAIDS@home, Genome@home
Grid computing and e-science




Computational grids to solve/simulate real-life problems
E-Science
Commercial applications
United Devices, Entropia, Avaki, etc.
Distributed Computing
A central coordinator schedules tasks on volunteer computers
Master-worker paradigm
Cycle stealing

Figure: a client application exchanges parameters and results with the coordinator; the coordinator sends parameters over the Internet to volunteer PCs; each volunteer PC downloads and executes the application.

Dedicated Applications
SETI@Home, distributed.net, Décrypthon (France)

Production applications
Folding@home, Genome@home, Xpulsar@home, Folderol, Exodus, Peer review

Research Platforms
Javelin, Bayanihan, JET, Charlotte (based on Java)

Commercial Platforms
Entropia, Parabon, United Devices, Platform (AC)
Cycle Sharing Model




Chunks of data are sent to clients in suspend mode
Each client processes its data when the machine is not in use and returns the results to the master
Internet-based (master-slave) computing
Example: SETI@Home analyzes radio-telescope data
Figure: the master sends raw data to four clients (Bob, Ted, Carol, Alice); each client does the data crunching and returns processed data to the master.
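The cycle-sharing exchange in the figure can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not SETI@Home's actual protocol: threads stand in for the volunteer PCs, and `crunch`, `worker`, and `run_master` are hypothetical names introduced here. The "analysis" is a placeholder sum of squares.

```python
import queue
import threading


def crunch(chunk):
    # Placeholder for the real analysis: sum of squares of the samples.
    return sum(x * x for x in chunk)


def worker(name, tasks, results):
    # A volunteer pulls raw chunks until the master has none left,
    # then returns each processed result tagged with the chunk id.
    while True:
        try:
            chunk_id, chunk = tasks.get_nowait()
        except queue.Empty:
            return
        results.put((chunk_id, name, crunch(chunk)))


def run_master(raw_data, chunk_size, worker_names):
    tasks, results = queue.Queue(), queue.Queue()
    # The master splits the raw data into fixed-size chunks.
    for start in range(0, len(raw_data), chunk_size):
        tasks.put((start // chunk_size, raw_data[start:start + chunk_size]))
    threads = [
        threading.Thread(target=worker, args=(name, tasks, results))
        for name in worker_names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect processed data and restore the original chunk order.
    processed = {}
    while not results.empty():
        chunk_id, _name, value = results.get()
        processed[chunk_id] = value
    return [processed[i] for i in sorted(processed)]
```

Because workers pull chunks rather than being assigned them, a slow or busy client simply takes fewer chunks; the master only needs the chunk id to reassemble the results.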
SETI@HOME
• Launched in 1999
• Scientific experiment - uses Internet-connected computers
in the Search for Extraterrestrial Intelligence (SETI)
• Distributes a screen saver–based application to users
• Applies signal analysis algorithms to different data sets to
process radio-telescope data.
• Has more than 3 million users - used over a million years of
CPU time to date
SETI@Home
1. Install the screen saver
2. SETI client (screen saver) starts
3. SETI client gets data from the main server and runs
4. Client sends results back to the server
Figure labels: Main Server, Radio-telescope Data, Client/Server, P2P
Distributed Computing: SETI@home







A Search for Extraterrestrial Intelligence project in which over two million computers download and crunch data gathered by the Arecibo radio telescope in Puerto Rico
The SETI@Home project is widely regarded as the fastest computer in the world
In fact, the project has already performed the single largest cumulative computation to date
From the architecture point of view, SETI@Home is based on the client-server model
The centralised servers hold enormous amounts of data gathered by the Arecibo radio telescope "listening" to the skies
That data needs to be analysed for distinct or unusual radio signals that might suggest extraterrestrial communications
http://setiathome.ssl.berkeley.edu
Processing

Intel’s Netbatch
• 10,000 workstations over 25 locations
• Chip design
• Shortened time for chip development
• Reduced outlay for new mainframes
• $500 million savings
Processing

Amerada Hess


Connects 200 Dell PCs to handle complex seismic data interpretation
Allowed them to replace a pair of IBM supercomputers.
“We’re running seven times the throughput at a fraction of the cost.” (Richard Ross, CIO)
Storage

Intel




Distribution of computer-based training
Prevents large downloads from central servers
Preserves bandwidth
Preserves expensive network storage
P2P Distributed Computing
Allows any node to play different roles (client, server, system
infrastructure)
Figure: PCs connected through a P2P system. A client PC sends a request and receives a result; a server PC accepts the request and provides the result. The P2P system also offers potential communications for parallel applications.
A request may concern computations or data
An accept likewise concerns computation or data
A very simple problem statement, but one that leads to many research issues:
scheduling, security, message passing, data storage
Large scale makes the problems harder: volatility, trust, etc.
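The idea that any node can act as client or server can be sketched as follows. This is an in-process toy under stated assumptions, with no networking, scheduling policy, or security: the `Peer` class and its `connect`, `accept`, and `request` methods are hypothetical names chosen to echo the figure labels.

```python
class Peer:
    """A node that can both issue requests (client) and serve them (server)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []
        self.served = 0  # number of requests this peer has accepted

    def connect(self, other):
        # Links are symmetric: either side may later act as client or server.
        self.neighbours.append(other)
        other.neighbours.append(self)

    def accept(self, func, arg):
        # Server role: run the requested computation and provide the result.
        self.served += 1
        return func(arg)

    def request(self, func, args):
        # Client role: spread the arguments round-robin over the neighbours.
        if not self.neighbours:
            raise RuntimeError("no peers available")
        n = len(self.neighbours)
        return [self.neighbours[i % n].accept(func, a) for i, a in enumerate(args)]
```

Usage: after `a.connect(b)`, a call to `a.request(...)` makes `b` the server, while a later `b.request(...)` reverses the roles. The research issues listed above (scheduling, security, volatility) are exactly what this sketch leaves out.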
“Three Obstacles to Making P2P Distributed Computing Routine”
1) New approaches to problem solving

Data Grids, distributed computing, peer-to-peer,
collaboration grids, …
2) Structuring and writing programs (Programming Problem)

Abstractions, tools
3) Enabling resource sharing across distinct institutions (Systems Problem)

Resource discovery, access, reservation, allocation;
authentication, authorization, policy; communication;
fault detection and notification; …
Credit: Ian Foster
P2P for Distributed Computing or Web Computing





Distributed computing P2P applications are exemplified by the use of
millions of Internet clients to analyze data in the search for extraterrestrial life
(SETI@home, http://setiathome.ssl.berkeley.edu/ ) and by the
newer project examining the folding of proteins (Folding@home,
http://www.stanford.edu/group/pandegroup/Cosm/ ).
These projects build distributed computing solutions for a special class of
applications:
• Those that can be divided into a huge number of essentially independent
computations, with a central server system doling out separate work
chunks to each participating client.
• In the parallel computing community, these problems are called
"pleasingly parallel" or "embarrassingly parallel".
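A small Python sketch of an embarrassingly parallel job: the work is cut into independent chunks, each chunk is processed with no communication between workers, and the partial results are combined at the end. The function names and the prime-counting task are illustrative assumptions, and a thread pool stands in for the participating clients.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    # Ceiling division so that every item lands in some chunk.
    if not items:
        return []
    size = -(-len(items) // n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_primes(chunk):
    # Independent unit of work: needs nothing from any other chunk.
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    return sum(1 for n in chunk if is_prime(n))


def parallel_count(items, n_clients):
    chunks = split_into_chunks(items, n_clients)
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        partials = list(pool.map(count_primes, chunks))
    # The "central server" step: combine the independent partial results.
    return sum(partials)
```

The only coordination is the final sum, which is why such problems tolerate slow links and unreliable clients so well.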
This approach is included in the P2P category because the computing is
peer-based, even though it lacks the "peer-only communication" that
characterizes Gnutella in all its aspects and Napster for information transfer.
SETI@home and Folding@home are elegantly implemented as screen
savers that you download.
P2P space: Distributed Computing

Distributed Collaboration



Use underutilized Internet and/or network resources to improve
computation and data analysis
MetaComputing, CareScience, DataSynapse, Distributed.net,
DistributedScience, Entropia, Parabon, The Open Lab
Distributed Search Engines


Used to easily look up and share files and offer content management
BearShare, Filetopia, Hotline Connect, InfraSearch, Plebio,
Jibe, LimeWire, MusicBrainz.org, NeuroGrid, NextPage, Redfoot,
Opencola, Project Pandango
Entropia Financial Modeling I
Entropia Financial Modeling II



Each basic financial instrument can be calculated independently
Central server interprets the total simulation
Make money, or learn what causes market swings, or ….
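The split between independent per-instrument calculations and a central aggregation step can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the random-walk price model, the function names, and the parameters are not Entropia's method. Each instrument gets its own seeded generator, so its result does not depend on where or in what order it is computed.

```python
import random


def simulate_instrument(seed, start_price, n_steps, volatility):
    # Independent unit of work: one instrument, one private random stream.
    rng = random.Random(seed)
    price = start_price
    for _ in range(n_steps):
        price *= 1.0 + rng.gauss(0.0, volatility)
    return price


def portfolio_value(instruments, n_steps=250):
    # Each call below could run on a different PC; the central server
    # only needs the final prices to interpret the total simulation.
    finals = [
        simulate_instrument(seed, price, n_steps, vol)
        for seed, price, vol in instruments
    ]
    return sum(finals)
```

Usage: `portfolio_value([(1, 100.0, 0.01), (2, 50.0, 0.02)])` simulates two instruments described as `(seed, start_price, volatility)` tuples and sums their final prices.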
Drug Structure Simulations
United Devices also does drug simulation



Parameter study: do billions of simulations, each
with different parameters
Search-engine-like interface to the simulation
Works because each calculation fits in a PC; a detailed
molecular model would usually not
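A parameter study of this kind can be sketched as a grid of parameter combinations dealt out to workers, each evaluating its batch independently. The scoring function below is a made-up stand-in for a real simulation, and `binding_score` and `parameter_study` are hypothetical names; the batches are evaluated in a plain loop where a real system would ship them to volunteer PCs.

```python
from itertools import product


def binding_score(params):
    # Toy stand-in for one simulation run; peaks at angle=30, distance=2.0.
    angle, distance = params
    return -((angle - 30) ** 2) - (distance - 2.0) ** 2


def parameter_study(angles, distances, n_workers):
    # Every combination of parameters is one small, independent calculation.
    grid = list(product(angles, distances))
    # Deal the grid out round-robin, one batch per worker.
    batches = [grid[i::n_workers] for i in range(n_workers)]
    results = []
    for batch in batches:
        results.extend((binding_score(p), p) for p in batch)
    # Central step: pick the best-scoring parameter set.
    return max(results)
```

The sketch only works for the reason the slide gives: each evaluation is small enough to run on one machine with no data from its neighbours.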
Performance of Entropia Network
Figure: six servers
Peer-to-peer (P2P) “illusion” among collaborating clients,
for Napster-like services or collaboration