Scaling The Software Development Process:
Lessons Learned from The Sims Online
Greg Kearney, Larry Mellon, Darrin West
Spring 2003, GDC
1
Talk Overview
• Covers: Software Engineering techniques to
help when projects get big
– Code structure
– Work processes (for programmers)
– Testing
• Does Not Cover:
– Game Design / Content Pipeline
– Operations / Project Management
2
How to Apply It
• We didn’t do all of this right away
• Improve what you can
• Don’t change too much at once
• Prove that it works, and others will take up the cause
• Iterate
3
Match Process to Scale
[Chart: Team Efficiency (y-axis, 0 to positive) vs. Team Size (x-axis). The process for 5 to 15 programmers decays into “Meeting Hell” as the team grows; changing to a new process dips through “Everything’s Broken Hell” before the process for 30 to 50 programmers restores efficiency.]
4
What You Should Leave With
• TSO “Lessons Learned”
– Where we were with our software process
– What we did about it
– How it helped
• Some Rules of Thumb
– General practices that tend to smooth software
development @ scale
– Not a blueprint for MMP development
– Useful “frame of reference”
5
Classes of
Lessons Learned & Rules
• Architecture / Design: Keep it Simple
– Minimizing dependencies, fatal couplings
– Minimizing complexity, brittleness
• Workspace Management: Keep it Clean
– Code and directory structure
– Check in and integration strategies
• Dev. Support Structure: Make it Easy, Prove it
– Testing
– Automation
–All of these had to change as we scaled up.
–They eventually exceeded the team’s ability to cope using existing tools & processes.
6
Non-Geek Analogy
–Sharpen your tools.
–Clean up your mess.
–Measure twice, cut once.
–Stay with your buddy.
Bad flashbacks found at:
http://www.easthamptonhigh.org/cernak/
http://www.hancock.k12.mi.us/high/art/wood/index.html
7
Key Factors Affecting Efficiency
• High “Churn Rate”: large #coders × tightly coupled code = frequent breaks
– Our code had a deep root system
– And we had a forest of changes to make
“Big root ball” found at:
http://www.on.ec.gc.ca/canwarn/norwich/norsummary-e.html
8
Make It Smaller
Evolve
9
Key Factors Affecting Efficiency
• “Key Logs”: some issues were
preventing other issues from even
being worked on
10
Key Factors Affecting Efficiency
• A chain of single points of failure took out the entire team:
Login → Create an avatar → Enter a city → Buy a house → Enter a house → Buy the chair → Sit on a chair
11
So, What Did We Do
That Worked
• Switched to a logical architecture with less
coupling
• Switched to a code structure with fewer
dependencies
• Put in scaffolding to keep everyone working
• Developed sophisticated configuration
management
• Instituted automated testing
• Metrics, Metrics, Metrics
12
So, What Did We Do
That Didn’t?
• Long-range milestone planning
• Network emulator(s)
• Over-engineered a few things (too general)
• Some tasks failed due to:
– Not replanning or reviewing long tasks
– Not breaking up long tasks
• Coding standard changed part way through
• …
13
What we were faced with:
• 750K lines of legacy Windows code
• Port it to Linux
• Change from “multiplayer” to Client/Server
• 18 months
• Developers must remain alive after shipping
• Continuous releases starting at Beta
14
Go To Final
Architecture
ASAP
15
Go to final architecture ASAP
[Diagram: Multiplayer — every client runs its own Sim, kept in lock-step with its peers (“Here be Sync Hell”) — evolves to Client/Server — a single authoritative Sim on the server, with clients sending Request/Command messages (“Nice”, if “Undemocratic”).]
16
Final Architecture ASAP:
“Refactoring”
• Decomposed into multiple DLLs
– Found the Simulator
• Interfaces
• Reference Counting
• Client/Server subclassing
How it helped:
–Reduced coupling. Even reduced compile times!
–Developers in different modules broke each other less often.
–We went everywhere and learned the code base.
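A minimal sketch of the pattern (names hypothetical, not TSO’s actual interfaces): a reference-counted interface, with client- and server-side behavior split into subclasses behind it.

    // Hypothetical sketch: interfaces + reference counting + client/server subclassing.
    class ISimObject {
    public:
        virtual void AddRef() = 0;
        virtual void Release() = 0;        // self-deletes at refcount zero
        virtual void Tick(float dt) = 0;   // shared simulation entry point
    protected:
        virtual ~ISimObject() {}
    };

    class SimObjectBase : public ISimObject {
    public:
        void AddRef() override { ++m_refs; }
        void Release() override { if (--m_refs == 0) delete this; }
    private:
        int m_refs = 1;
    };

    // Same interface, specialized per side; callers never know which they hold.
    class ClientSimObject : public SimObjectBase {
        void Tick(float) override { /* predictive, display-side state */ }
    };
    class ServerSimObject : public SimObjectBase {
        void Tick(float) override { /* authoritative simulation */ }
    };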
17
Final Architecture ASAP:
It Had to Always Run
• But, clients would not behave predictably
• We could not even play test
• Game design was demoralized
• We needed a bridge, now!
18
Final Architecture ASAP:
Incremental Sync
• A quick temporary solution…
– Couldn’t wait for final system to be finished
– High overhead, couldn’t ship it
• We took partial state snapshots on the server and restored them on the client
How it helped:
–Could finally see the game as it would be.
–Allowed parallel game design and coding
–Bought time to lay in the “right” stuff.
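A rough sketch of the mechanism (struct and functions illustrative, not TSO’s): serialize a slice of authoritative server state, ship it down, and clobber the client’s copy with it.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Hypothetical incremental-sync sketch: partial snapshots of the
    // state that tends to drift, captured server-side, restored client-side.
    struct AvatarSyncState {
        float x, y;            // position
        int   motives[8];      // hunger, energy, ...
    };

    std::vector<std::uint8_t> CaptureSnapshot(const AvatarSyncState& s) {
        std::vector<std::uint8_t> blob(sizeof s);
        std::memcpy(blob.data(), &s, sizeof s);          // server -> wire
        return blob;
    }

    void RestoreSnapshot(AvatarSyncState& local,
                         const std::vector<std::uint8_t>& blob) {
        std::memcpy(&local, blob.data(), sizeof local);  // wire -> client
    }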
19
Final Architecture ASAP:
Null View
• Created Null View HouseSim on Windows
– Same interface
– Null (text output) implementation
How it helped
–No #ifdefs!
–Done under Windows, we could test this first step.
–We knew it was working during the port.
–Allowed us to port to Linux only the “needed” parts.
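A sketch of the idea (interface hypothetical): one View interface, with the graphics build and a text-only null build chosen at link time rather than with #ifdefs.

    #include <cstdio>

    // Hypothetical Null View sketch: same interface, text-only implementation.
    class IView {
    public:
        virtual void ShowAvatar(int id, float x, float y) = 0;
        virtual ~IView() {}
    };

    class GuiView : public IView {        // the real renderer (Windows build)
        void ShowAvatar(int, float, float) override { /* draw sprite */ }
    };

    class NullView : public IView {       // headless build: text output only
        void ShowAvatar(int id, float x, float y) override {
            std::printf("avatar %d at (%.1f, %.1f)\n", id, x, y);
        }
    };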
20
Final Architecture ASAP:
More “Bridges”
• HSBs: a proxy on Linux, passing through to a Windows Sim.
How it helped
–Could exercise Linux components before finishing HouseSim port.
–Allowed us to debug server scale, performance and stability issues early.
–Made best use of Windows developers.
–Allowed single platform development. Faster compiles.
• Disabled authentication, etc.
How it helped
–Could keep working even when some of the system wasn’t available.
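A rough sketch of the bridge idea (names and transport hypothetical): the Linux cluster talks to a local stub that satisfies the sim interface by forwarding everything to the real Windows Sim.

    #include <string>

    // Hypothetical pass-through proxy: Linux-side stub, Windows-side sim.
    struct Connection {                       // illustrative transport
        void Send(const std::string& /*msg*/) { /* socket write */ }
    };

    class IHouseSim {
    public:
        virtual void ApplyCommand(const std::string& cmd) = 0;
        virtual ~IHouseSim() {}
    };

    class HouseSimProxy : public IHouseSim {  // runs on Linux
    public:
        explicit HouseSimProxy(Connection& c) : m_conn(c) {}
        void ApplyCommand(const std::string& cmd) override {
            m_conn.Send(cmd);                 // forwarded to the Windows Sim
        }
    private:
        Connection& m_conn;
    };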
21
Mainline *Must* Work!
22
If Mainline Doesn’t Work,
Nobody Works
• The Mainline source control branch *must*
run
• Never go dark: Demo/Play Test every day
• If you hit a bug, do you sync to mainline,
hoping someone else fixed it? Or did you
just add it?
–If mainline breaks for “only” an hour, the project loses a man-week (≈40 blocked developers × 1 hour).
–If each developer breaks the mainline “only” once a month, it is broken every day.
23
Mainline must work:
Sniff Test
• Mainline was breaking for “simple” things.
– Features you “didn’t touch” (and didn’t test).
• Created an auto-test to exercise all core functions.
• Quick to run. Fun to watch. Checked results.
• Mandated that it pass before submitting code
changes.
• Break the build: “feed the pig”.
How it helped
–Very simple test. Amazing difference.
–Sometimes we got lazy and trusted it too much.
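A sketch of such a test (harness hypothetical), walking the same critical path as slide 11 and failing loudly at the first broken step:

    #include <cassert>

    // Hypothetical sniff test: exercise every core function once, in order.
    struct TestClient {                  // stub standing in for the real harness
        bool Login(const char*)      { return true; }
        bool CreateAvatar()          { return true; }
        bool EnterCity(const char*)  { return true; }
        bool BuyHouse()              { return true; }
        bool EnterHouse()            { return true; }
        bool BuyObject(const char*)  { return true; }
        bool UseObject(const char*)  { return true; }
    };

    int main() {
        TestClient c;
        assert(c.Login("sniff_user"));
        assert(c.CreateAvatar());
        assert(c.EnterCity("Alphaville"));
        assert(c.BuyHouse());
        assert(c.EnterHouse());
        assert(c.BuyObject("chair"));
        assert(c.UseObject("chair"));    // sit on it
        return 0;                        // green: OK to submit
    }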
24
Mainline must work:
Stages to “Sandboxing”
1. Got it to build reliably.
2. Instituted Auto-Builds: email all on failure.
3. Used a “Pumpkin” (a token held by whoever may submit) to avoid duplicate merge/test cycles, pulling partial submissions, …
4. Used a Pumpkin Queue when we really got
rolling
How it helped
–Far fewer thumbs twiddled.
–The extra process got on some people’s nerves.
25
Mainline must work:
Sandboxing
5. Finally, went to per-developer branching.
– Develop on your own branch.
– Submit changes to an integration engineer.
– Full smoke test run per submission/feature.
– If it works, it is integrated to mainline in priority order; otherwise it is bounced.
How it helped
–Mainline *always* runs. Pull any time.
–Releases are not delayed by partial features.
–No more code freezes going to release.
26
Support Structure
27
Background: Support Structure
• Team size placed design constraints on
supporting tools
– Automation: big win in big teams
– Churn rate: tool accuracy / support cost
• Types of tools
– Data management: collection / correlation
– Testing: controlled, sync’ed, repeatable inputs
– Baselines: my bug, your bug, or our bug?
28
Overview: Support Structure
• Automated testing: designs to minimize
impact of churn rate
• Automated data collection / correlation
– Distributed system == distributed data
– Dashboard / Esper / MonkeyWatcher
• Use case: load testing
– Controlled (tunable) inputs, observable results
– “Scale&Break”
29
Problem: Testing Accuracy
• Load & Regression: inputs must be
– Accurate
– Repeatable
• Churn rate: logic/data in constant motion
– How to keep the test client accurate?
• Solution: game client becomes test client
– Exact mimicry
– Lower maintenance costs
30
Test Client == Game Client
[Diagram: Test Client and Game Client share everything below the top layer. Test Control (in the Test Client) or the Game GUI (in the Game Client) feeds state & commands to the same Presentation Layer, which drives the Client-Side Game Logic.]
31
Game Client: How Much To Keep?
[Diagram: the Game Client decomposed into View, Presentation Layer, and Logic.]
32
What Level To Test At?
[Diagram: driving the Game Client with raw mouse clicks at the View layer. Regression: too brittle (a pixel shift breaks the script). Load: too bulky.]
33
What Level To Test At?
[Diagram: driving the Game Client with internal events at the Logic layer. Regression: too brittle (churn rate vs. logic & data).]
34
Semantic Abstractions
Basic gameplay changes less frequently
than UI or protocol implementations.
[Diagram: the View is ~¾ of the client; the NullView Client keeps only the Logic and Presentation Layer (~¼). The Presentation Layer exposes semantic primitives: Enter Lot, Buy Lot, Buy Object, Use Object, …]
35
Scriptable User Play Sessions
• Test Scripts: Specific / ordered inputs
– Single user play session
– Multiple user play session
• SimScript
– Collection: Presentation Layer
“primitives”
– Synchronization: wait_until, remote_command
– State probes: arbitrary game state
• Avatar’s body skill, lamp on/off, …
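SimScript’s real syntax isn’t shown in this talk; a C++ sketch of a session built on those primitives (all names hypothetical) might read:

    #include <cassert>
    #include <string>

    struct Session {                     // stub for a script-driven test client
        void Run(const std::string&) {}                      // fire a primitive
        void WaitUntil(const std::string&) {}                // block on game state
        void RemoteCommand(Session&, const std::string&) {}  // drive another client
        std::string Probe(const std::string&) { return "on"; } // read game state
    };

    // Two-avatar play session: enter a lot, buy a chair, verify state.
    void ChairSession(Session& a, Session& b) {
        a.Run("enter_lot 123");
        b.Run("enter_lot 123");
        a.WaitUntil("avatars_in_lot == 2");     // synchronization primitive
        a.Run("buy_object chair");
        a.RemoteCommand(b, "use_object chair");
        assert(a.Probe("lamp_state") == "on");  // arbitrary state probe
    }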
36
Scriptable User Play Sessions
• Scriptable play sessions: big win
– Load: tunable based on actual play
– Regression: walk a set of avatars through various play sessions, validating correctness per step
• Gameplay semantics: very stable
– UI / protocols shifted constantly
– Game play remained (about) the same
37
Automated Test: Team Baselines
• Hourly “critical path” stability
tests
– Sync / clean / build / test
– Validate Mainline / Servers
• Snifftest weather report
– Hourly testing
– Constant reporting
38
How Automated Testing
Helped
• Current, accurate baseline for
developers
• Scale&break found many bugs
• Greatly increased stability
– Code base was “safe”
– Server health was known (and better)
39
Tools & Large Teams
• High tool ROI
– team_size * automation_savings
• Faster triage
– Quickly narrow down problems across any system component
• Monitoring tools became a focal point
• Wiki: central doc repository
40
Monitoring / Diagnostics
“When you can measure what you are speaking about and can express it in numbers, you know something about it. But when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” – Lord Kelvin
• DeMarco: You cannot control what you cannot
measure.
• Maxwell: To measure is to know.
• Pasteur: A science is as mature as its
measurement tools.
41
Dashboard
• System resource & health tool
– CPU / Memory / Disk / …
• Central point to access
– Status
– Test Results
– Errors
– Logs
– Cores
–…
42
Test Central / Monkey Watcher
• Test Central UI
– Control rig for developers & testers
• Monkey Watcher
– Collects & stores (distributed) test results
– Produces summarized reports across tests
– Filters known defects
– Provides baseline of correctness
– Web frontend, unique IDs per test
43
Esper
• In-game profiler for a distributed
system
• Internal probes may be viewed
– Per process / machine / cluster
– Time view or summary view
• Automated data management
– Coders: add one line probe
– Esper: data shows up on web site
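The one-line-probe idea, sketched (macro and plumbing hypothetical):

    #include <cstdio>

    // Hypothetical Esper-style probe: coders add one line; collection,
    // correlation, and the web view are handled elsewhere.
    namespace esper {
        inline void Report(const char* name, double value) {
            std::printf("%s=%g\n", name, value);  // stand-in for the uploader
        }
    }
    #define ESPER_PROBE(name, value) esper::Report(name, value)

    void HouseSimTick(double ms) {
        ESPER_PROBE("housesim.tick_ms", ms);      // the one line a coder adds
    }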
44
Use Case: Scale & Break
• Never too early to begin scaling
– Idle: keep doubling server processes
– Busy: double #users, dataset size
• Fix what broke, start again
• Tune input scripts using Beta data
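The loop itself is simple; a sketch with all knobs hypothetical:

    // Hypothetical scale-&-break driver: double the load until something
    // breaks, fix what broke, start again.
    inline bool ClusterHealthy()      { return true; }  // stub: ask the monitors
    inline void SpawnTestClients(int) {}                // stub: load control rig
    inline void RunPlaySessions()     {}                // scripts tuned from Beta data

    void ScaleAndBreak() {
        for (int users = 100; users <= 64000 && ClusterHealthy(); users *= 2) {
            SpawnTestClients(users);  // busy cluster: double #users, dataset size
            RunPlaySessions();        // idle cluster: double server processes instead
        }
        // on a break: triage with the monitoring tools, fix, rerun from the top
    }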
45
Load Testing: Data Flow
[Diagram: the Load Control Rig drives banks of Test Clients, many per Test Driver CPU, which send game traffic to the Server Cluster. Client metrics, resource metrics, and debugging data flow back from the test drivers; system monitors and internal probes report from the cluster. Everything feeds the Load Testing Team.]
46
Outline: Wrapup
• Wins / Losses
• Rules: Analysis & Discussion
• Recommended reading
• Questions
47
Process: Wins / Losses
• Wins
– Module decomposition
• Logical: client / server architecture
• Physical: code structure
– Scaffolding for parallel development
– Tools to improve workflow
– Automated Regression / Load
48
Process: Wins / Losses
• Losses
– Early lack of tools
– #ifdef as a cross-platform port
– Single points of failure blocked entire
development team
49
Not Done Yet:
More Challenges
• How to ship, and ship, and ship…
• How to balance infrastructure cleanup
against new feature development
• …
50
Rules of Thumb (1)
• KISS: software and processes
• Incremental changes
– <Inhale><Hold It><Exhale>
– <Say>:“Baby-Steps”
• Continual tool/process improvement
51
Rules of Thumb (2)
• Mainline has got to work
• Get something on the ground. Quickly.
52
Rules of Thumb (3)
• Key Logs: break up quickly, ruthlessly
• Scaffolding: keep others working
• Do important things, not urgent things
• Module separation (logically, physically)
• If you can’t measure it, you don’t understand it
53
Final Rule:
“Sharpen The Saw”
• Efficiency impacted by
– Component coupling / team size
– Compile / load / test / analyze cycle
• Tool Justification in large teams
– Large ROI @ large scale
– A 5% gain across 30 programmers ≈ 1.5 extra programmers
– “Fred Brooks”: the 31st programmer…
54
Recommended Reading
• Influences
– Extreme Programming
– Scott Meyers: large-scale software engineering
– Gamma et al: Design Patterns
• Caveat Emptor: slavish following not
encouraged
– Consider “ground conditions” for your project
55
Questions & Answers
56