Resource Management in Volunteer Computing Grids

An analysis of the different approaches to maximizing throughput on a BOINC grid
Presented by Geoffrey Oxholm and Beata Chrulkiewicz
CS-575 Position Paper Presentation Fall 2007
Volunteer Grids
• A Type of Grid Computer
– Decentralized, volunteer nodes
• Supercomputing for free
– 1.1 PetaFLOPS vs. 360 TeraFLOPS
• Unreliable Nodes
– Users can disconnect their computers anytime
– Amount of donated resources is subject to change
– Malicious users can upload falsified results
Image: http://www.di.unipi.it/groups/architetture/images/grid.gif
http://holistic.com.mt/h/?Page=Article&Ref=107
Berkeley Open Infrastructure for Network Computing (BOINC)
• Duplicate work to ensure validity
– R: the “redundancy factor”, the number of nodes that compute each work unit
• Validate computation results. If validation fails, repeat the computation.
– Validation methods:
• Majority voting
– More than R/2 nodes must agree
• M-first voting
– The first result that M nodes agree on is accepted
Image: http://en.wikipedia.org/wiki/Image:BOINC_logo_July_2007.png
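The two validation methods above can be sketched in a few lines. This is a minimal illustration, not BOINC's validator: the function names `majority_vote` and `m_first_vote` are hypothetical, results are assumed to be directly comparable values, and M-first voting is read here as "accept the first value that M nodes have reported".

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional


def majority_vote(results: List[Hashable], r: int) -> Optional[Hashable]:
    """Return the value reported by more than r/2 of the r replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > r / 2 else None


def m_first_vote(results: Iterable[Hashable], m: int) -> Optional[Hashable]:
    """Scan results in arrival order; return the first value reported m times."""
    counts = Counter()
    for value in results:
        counts[value] += 1
        if counts[value] >= m:
            return value
    return None  # no value reached m matching reports; more replicas are needed


if __name__ == "__main__":
    print(majority_vote(["a", "a", "b"], r=3))
    print(m_first_vote(["a", "b", "b", "a"], m=2))
```

A `None` return corresponds to the "validation fails, repeat computation" case on the slide.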
Success and Limitations of BOINC
• With proper configuration, high throughput can be achieved
• Still quite difficult to get volunteers
• Proper configuration is difficult
• Fixed configurations cannot account for constantly changing grid characteristics
Image: http://www.baseacid.com/imagesRR/workBand.jpg
Fix: User Encouragement
Feedback and Reward
• Each node generates statistics
• Teams can be formed
• Sense of pride in commitment
• Encourages users to donate more time and resources
Chart: Team OCUK total credit for Predictor@home. Go team!
Image: http://teamocuk.com/cprojectcred1.php?p=PAH
Fix: Maximizing Configuration Through Usage Simulation
• Enumerate a set of possible configurations
• Test configurations in a fraction of the time
• Avoid disturbing volunteers by simulating
• Zero in on an effective configuration
Image: http://www.cyberroach.com/tron/tron3_circuit.jpg
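The enumerate-and-simulate loop above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the names `simulate` and `best_configuration`, the toy error model (each replica is independently correct with a fixed probability, and wrong results never agree with each other), the fixed budget of node computations, and the rule that at least two replicas must agree before a unit counts as validated.

```python
import random


def simulate(redundancy: int, quorum: int, error_rate: float,
             budget: int = 10_000, seed: int = 0) -> int:
    """Count work units validated within a fixed budget of node computations.

    Toy model (an assumption, not a BOINC model): each replica is correct
    with probability 1 - error_rate, and a unit validates when at least
    `quorum` of its `redundancy` replicas are correct. A failed unit's
    computations are simply lost.
    """
    rng = random.Random(seed)  # seeded so every configuration sees the same grid
    validated = 0
    spent = 0
    while spent + redundancy <= budget:
        spent += redundancy
        correct = sum(rng.random() >= error_rate for _ in range(redundancy))
        if correct >= quorum:
            validated += 1
    return validated


def best_configuration(error_rate: float, max_redundancy: int = 5):
    """Enumerate (redundancy, quorum) pairs and keep the highest throughput."""
    candidates = [(r, q)
                  for r in range(2, max_redundancy + 1)
                  for q in range(2, r + 1)]
    return max(candidates, key=lambda c: simulate(c[0], c[1], error_rate))


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.3):
        print(rate, best_configuration(rate))
```

Because the simulation runs offline, many configurations can be compared without sending a single work unit to a volunteer.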
Fix: Dynamic Redundancy Through Reliability Prediction
• Wait for a minimum number of nodes before assigning work
• Choose nodes that have higher reliability
• Higher reliability means less need for redundancy
• Successful completion yields a higher reliability rating for the node
Image: http://image.compusa.com/prodimages/44/8537c95c-8027-4840-b976-67deb0690e13.gif
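One way to picture the four steps above is the sketch below. It is an illustration under stated assumptions, not the method from the presentation: the `Node` class, the `assign` function, the smoothed success ratio used as the reliability rating, and the stopping rule (stop adding replicas once the estimated chance that at least one replica is correct reaches `target`) are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    successes: int = 0
    attempts: int = 0

    @property
    def reliability(self) -> float:
        # Laplace-smoothed success ratio; a node with no history rates 0.5.
        return (self.successes + 1) / (self.attempts + 2)

    def record(self, success: bool) -> None:
        """Update the rating after a work unit is validated or rejected."""
        self.attempts += 1
        if success:
            self.successes += 1


def assign(pool: List[Node], min_pool: int, target: float) -> Optional[List[Node]]:
    """Pick replicas for one work unit, most reliable nodes first.

    Returns None while fewer than min_pool nodes are waiting. If the target
    cannot be reached, every node in the pool is returned.
    """
    if len(pool) < min_pool:
        return None
    chosen = []
    p_all_wrong = 1.0
    for node in sorted(pool, key=lambda n: n.reliability, reverse=True):
        chosen.append(node)
        p_all_wrong *= 1.0 - node.reliability
        if 1.0 - p_all_wrong >= target:
            break
    return chosen


if __name__ == "__main__":
    pool = [Node("a", 9, 10), Node("b", 1, 10), Node("c")]
    print([n.name for n in assign(pool, min_pool=3, target=0.9)])
```

With reliable nodes in the pool the loop stops early, so the redundancy per work unit shrinks as ratings rise.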
Evaluation
• User Encouragement
– Encourages cheating
– Does nothing to maximize efficient use of resources
• Usage Simulation
– Still requires researchers to configure the system
– Static configuration fails to match dynamic grid
• Reliability Rating
– Subject to further exploitation
– Further minimizes the value of slow nodes, working against incentives
Image: GPL Licensed
Conclusion
• Build on existing methods
– Continue to encourage users
– Create a starting point by using simulation
– Update reliability system to avoid conflict with system of incentives
• Develop new technologies
– Blacklist malicious nodes
– Develop a more comprehensive reliability system that uses past schedules to predict future availability
Image: http://pixels.dessgeega.com/wp-content/uploads/2006/10/organize_big.gif
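The proposal to use past schedules to predict future availability could start from something as simple as the sketch below. It is only one possible reading: the `AvailabilityHistory` class, the weekly (weekday, hour) slots, and the 0.5 default for slots with no history are all assumptions made for illustration.

```python
from collections import defaultdict


class AvailabilityHistory:
    """Per-node record of how often the node was online in each weekly slot."""

    def __init__(self) -> None:
        self._seen = defaultdict(int)    # (weekday, hour) -> observations
        self._online = defaultdict(int)  # (weekday, hour) -> times seen online

    def observe(self, weekday: int, hour: int, online: bool) -> None:
        slot = (weekday, hour)
        self._seen[slot] += 1
        if online:
            self._online[slot] += 1

    def predict(self, weekday: int, hour: int, default: float = 0.5) -> float:
        """Estimate the chance the node is online in the given slot."""
        slot = (weekday, hour)
        if self._seen.get(slot, 0) == 0:
            return default  # no history for this slot yet
        return self._online[slot] / self._seen[slot]


if __name__ == "__main__":
    history = AvailabilityHistory()
    for online in (True, True, False):
        history.observe(weekday=0, hour=9, online=online)
    print(history.predict(weekday=0, hour=9))
```

A scheduler could combine such an estimate with a node's reliability rating before deciding where to send long-running work units.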
Questions?
Image: http://www.grid.phys.uvic.ca/
Geoff Oxholm
Beata Churkiewicz