Of Scripts and Programs Tall tales, Urban Legends and Future

Of Scripts and Programs
Tall tales, Urban Legends and Future Prospects
Jan Vitek
S3lab, Purdue University
based on joint work with
Bard Bloom John Field IBM
Nate Nystrom U. Texas Arlington
Francesco Zappa Nardelli INRIA
Tobias Wrigstad Stockholm University
Brian Burg Nicholas Kidd Sylvain Lebresne Johan Östlund Gregor Richards Purdue
I want it tall
and strong
…and
cheap ...and
fast
Tall Tales
No worries mate. I’ve got just
the right language for the job
At the beginning...
• Our work motivated by Tobias’ involvement with Pluto
(aka in Swedish Premiepensionmyndigheten)
• We have been trying to get code for three years,
but it’s too sensitive to release...
• So, apply salt to the following slides,
second hand material only, based on
discussions with Tobias and slides
by Lemonnier and Lundborg,
errors are mine, all mine.
...there was a script
• A modest Perl program hacked together to perform a simple data migration step
while the grown ups (BIC = Big IT Company) built the real system
• Unfortunately, the real system was late, over budget, and unusable
• So, BIC got fired and the script became the real system
...that grew up to be a program that...
• … manages the retirement savings of 5.5 million users
• … for a value of 23 billion Euros
• … with a team of 30 developers over 7 years
Pluto
320 000 lines of Perl
68 000 lines of SQL
27 000 lines of shell
26 000 lines of HTML
230 database tables
750 G bytes of data
24/7 availability
0 bugs allowed
The Road to Glory
• A number of factors contributed to the success of Pluto
• High productivity of scripting languages
Perl won over Java in all internal evaluations
• Disciplined use of the language
Many features disallowed by standards. Only C-like code.
No floating points. No threads. No OO.
• Fail fast, Abort, Undo
Batch daily runs, undo all changes if an error is detected.
• Contracts
Home-brewed contract notation for Perl, runtime checked
Contracts for Perl
contract(‘do_sell_current_holdings’)
-> in(&is_person, &is_date)
-> out(&is_state)
-> enable;
sub do_sell_current_holdings {
my ($person, $date)
…
if ($operation eq “BUD_”) {
…
…
return $state;
}
Lessons Learned
• The Pluto developers complained about
• Syntax (it’s ugly)
• Typing (it’s weak)
• Speed (it’s slow)
• Support for concurrency and parallelism is lacking
• Lack of encapsulation and modularity
The Questions are thus...
• Can we write dynamic scripts and robust programs in the same language?
• Can we go from scripts to programs and from programs to scripts freely?
• Can we do this without losing either the flexibility of scripting or
the benefits that come with static guarantees?
• Can I have my cake and eat it too?
Related Work
• Lundborg, Lemonnier. PPM or how a system written in Perl can juggle with billions. Freenix 2006
• Lemonnier. Testing Large Software With Perl. Nordic Perl Workshop 2007
• Stephenson. Perl Runs Sweden's Pension System. O’Reilly On Lamp, 2005
Thanks: Tobias Wrigstad
Urban Legends
Understanding the dynamics of dynamic languages
• How dynamic should we, must we, be?
Many anecdotal stories about need and use of dynamic features, but few case studies
Are dynamic features used to make up for missing static features?
Or are the programmers just “programmers”?
• Can we add a static type system to an existing dynamic language?
without having to rewrite all legacy programs and libraries
Methodology
• We selected a very dynamic language, JavaScript, a cross between
Scheme and Self without their elegance but with a large user base
• We instrumented a popular browser (Safari) and collected traces from the
100 most popular websites (Alexa) plus many other traces
• We ran an offline analysis of the traces to gather data
• We analyzed the source code to get static metrics
JavaScript is a Programming Language
• A familiar syntax
function List(v,n) {this.value=v; this.next=n;}
List.prototype.map = function(f){
return new List( f(this.value),
this.next ? this.next.map(f) : null); }
var ls = new List(1, new List(2, new List(3, null)));
var nl = ls.map( function(x){return x*2;} );
JavaScript is a Programming Language
• A familiar syntax
function List(v,n) {this.value=v; this.next=n;}
List.prototype.map = function(f){
return new List( f(this.value),
Constructor this.next ? this.next.map(f) : null); }
field addition
var ls = new List(1, new List(2, new List(3, null)));
var nl = ls.map( function(x){return x*2;} );
JavaScript is a Programming Language
adding a shared
• A familiar syntax
method
function List(v,n) {this.value=v; this.next=n;}
List.prototype.map = function(f){
return new List( f(this.value),
this.next ? this.next.map(f) : null); }
var ls = new List(1, new List(2, new List(3, null)));
higher order function
var nl = ls.map( function(x){return x*2;} );
JavaScript is a Programming Language
• A familiar syntax
function List(v,n) {this.value=v; this.next=n;}
List.prototype.map = function(f){
object construction
return new List(
f(this.value),
this.next ? this.next.map(f) : null); }
var ls = new List(1, new List(2, new List(3, null)));
var nl = ls.map( function(x){return x*2;} );
JavaScript is a Programming Language
• A familiar syntax
function List(v,n) {this.value=v; this.next=n;}
List.prototype.map = function(f){
return new List( f(this.value),
this.next ? this.next.map(f) : null); }
Closure
var ls = new List(1, new List(2, new List(3, null)));
var nl = ls.map( function(x){return x*2;} );
Challenges to static typing
• Lack of type declarations
• Eval and dynamic loading
• Addition/deletion of fields/methods
• Changes in the prototype hierarchy
Corpus
• 100 JavaScript programs were recorded
Focus on the following:
280slides, Bing, Blogger, CNET, Digg, ESPN, Facebook,
Flicker, GMaps, Gmail, Google, ImageShack, LivelyKernel,
Other, Purdue, Twitter, Wikipedia, WordPress, YouTube,
eBay, AppleMe
• The total size of the traces is 6.7 GB
• Size of source in bytes
3000
3000
• Size of average trace in bytes
0
0
280slides
280slides
Bing
Bing
Blogger
Blogger
CNET
CNET
Digg
Digg
ESPN
ESPN
Fbook
Fbook
Flickr
Flickr
GMaps
GMaps
Gmail
Gmail
Google
Google
ImgShack
ImgShack
Lively
Lively
Other
Other
Purdue
Purdue
Twitter
Twitter
Wikip
Wikip
YouTube
YouTube
eBay
eBay
me.com
me.com
Corpus
7000
7000
6000
6000
5000
5000
1 MB
4000
4000
500 KB
2000
2000
1000
1000
5 MB
Instruction Mix
Average
Call
●
●
●
●
●
●
●
Read
●
●
●
●
●
●
Write
●
stores
● stores
● stores
stores_direct
● reads
stores_hashmap
● reads
stores_idx ●
reads
reads_direct●
delete
reads_hashmap
● delete
reads_idx ●
delete
deletes_direct
● define
deletes_hashmap
● create
deletes_idx ●
calls
defines
● throws
creates
● catchs
calls
throws
catchs
WordPress
Image Shack
●
Google
Gmail
GMaps
Flickr
Fbook
ESPN
Digg
CNET
Blogger
Bing
280slides
Average: less than 30 operations per
function.
• Number of instruction executed within each function
me.com
eBay
YouTube
Wikip
Twitter
Purdue
Other
Lively
ImgShack
0
50
100
150
200
Function Size
dom
arrays
Live Data
anonymous
• Objects allocated in the traces
broken down in major
categories
prototypes
dates
regexps
Gmai
ImShk
Cnet
Espn
Blog
Wikp
PU
280
Bing
WorP
errors
instances
functions
eBay
Fbok
Lively
Googl
Me
Digg
Twit
Other
FLikr
Gmap
YTube
Max
Avg
me.com
eBay
YouTube
WordP
Wikip
Twitter
Purdue
Other
Lively
ImgShack
Google
Gmail
GMaps
Flickr
Fbook
ESPN
Digg
CNET
Blogger
Bing
x._proto_ = y
280slides
0
2
• While, average close to 1,
maximum depends on
coding style
4
• Prototype chain length
similar to inheritance
depth metrics
6
8
• Prototype chains allow
sharing behavior in a more
flexible way than
inheritance
10
12
Prototype chain lengths
7000
6000
JSON
eval(“x.f = 3”)
0slides
Bing
Blogger
CNET
Digg
ESPN
Fbook
Flickr
GMaps
Gmail
Google
gShack
Lively
Purdue
Twitter
WordP
ouTube
eBay
me.com
0
1000
2000
3000
• What if most evals were
deserialization of JSON
data?
5000
• Eval can perform arbitrary
damage to data structures
Random code
4000
Oh Eval, my Eval
Variadicity
me.com
eBay
YouTube
WordP
Wikip
Twitter
Purdue
Other
Lively
ImgShack
Google
Gmail
GMaps
Flickr
Fbook
ESPN
Digg
CNET
Blogger
f = function(x,y) {…}
Bing
# too few
# too many
280slides
f(1)
f(1,2,3)
0.0
0.1
0.2
• Missing args have value
UNDEFINED, additional
args accessed by
position
0.3
• Functions need not be
called with the “right”
number of arguments
0.4
0.5
60% of functions
are used
variadically
Dynamic dispatch
10000
1 call site
dispatches to
100K functions
100
• How dynamic is our
corpus?
1
• Dynamic binding is a
hallmark of object-oriented
languages
>100K call sites
monomorphic
1
10
100
1000
10000
#of different functions called from a call site
Dynamic dispatch
~100K bodies
are unique
• Programming style
(or lack thereof) matters
10000
• What if there were many
identical functions?
function List(h,t){
this.head=h;
this.tail=t;
this.c=function(l){…}
}
1
100
1 body shared
50K functions
1
10
100
1000
#of functions sharing the same body
10000
Dynamic dispatch
~100K call sites
monomorphic
10000
• These are not so different
from Java or C#…
1
100
1 call site
dispatches <2K
functions
1
5
10
50
100
500
1000
#of different function body called from a call site
Constructor dynamism
10000
1 constructor
returns
~300 “types”
1
function Person(n,M){
this.name=n;
this.sex=M;
if(M){
this.likes= “guns”
}
}
10
100
• Accordingly a constructor
can return different “types”,
i.e. objects with different
properties
1000
• Constructors are “just”
functions that side-effect
this.
>2K constructors
monomorphic
1
2
5
10
20
50
100
200
#of different “types” returned by a constructor
Addition/Deletion
2.5
• Ignoring constructors,
the average number of
additions per object is
considerable
• Deletion are less frequent
but can’t be ignored
x._proto_.f = F
delete x.y
2.0
1.5
deletes
adds
1.0
0.5
0.0
280
Bing
Blog
CNET
Digg
ESPN
Fbook
Flikr
GMail
Gmap
Googl
IShak
Livly
Other
PU
Twit
Wikip
WordP
eBay
YTube
me.com
all
• JavaScript allows
runtime addition and
deletion of fields and
methods in objects
(including prototypes)
Addition/Deletion
2.5
• Programming style (or
lack thereof) matters.
2.0
• What about objects
constructed by extending
an empty object?
1.5
• Heuristic: construction
ends at first read
1.0
x.head=”Mickey”
x.map=function(x){…}
… = x.f …
0.0
280
Bing
Blog
CNET
Digg
ESPN
Fbook
Flikr
GMail
Gmap
Googl
IShak
Livly
Other
PU
Twit
Wikip
WordP
eBay
YTube
me.com
all
x = {}
0.5
Addition/Deletion
2.5
• Hash tables & arrays are
objects, and vice versa
2.0
• What if most of add/
deletes were hash/index
operation?
1.5
• Heuristic:
syntax of access
operations
1.0
0.5
… x[3]…
0.0
280
Bing
Blog
CNET
Digg
ESPN
Fbook
Flikr
GMail
Gmap
Googl
IShak
Livly
Other
PU
Twit
Wikip
WordP
eBay
YTube
me.com
all
… x[name] …
Conclusions, maybe
• JavaScript programs are indeed dynamic
• All features are used, but not all the time
• Some of the abuse is sloppy programming
• Hope of imposing types on legacy code?
Related Work
• Anderson, Giannini, Drossopoulou. Towards type inference for JavaScript. ECOOP 2005
• Mikkonen, Taivalsaari. Using JavaScript as a Real Programming Language. TR SUN 2007
• Holkner, Harland. Evaluating the dynamic behavior of Python applications. ACSC 2009
• Chugh, Meister, Jhala, Lerner. Staged information flow for JavaScript. PLDI 2009
• Furr, An, Foster, Hicks. Static type inference for Ruby. SAC 2009
• Guha, Krishnamurthi, Jim. Using static analysis for Ajax intrusion detection. WWW 2009
• Jensen, Møller, Thiemann. Type analysis for JavaScript. SAS 2009
Current team: Sylvain Lebresne, Gregor Richards, Brian Burg
Alumni: Johan Ostlund, Tobias Wrigstad
Sponsor: ONR, NSF
Future Prospects
?
Thorn
• The story so far:
• We are looking into how to help scripts grow up to be programs
• Scripts are really dynamic
• Existing languages are ill-suited to non-invasive evolution
Methodology
• Design a new language, benefit from
• the ability to correct unlucky language design decision
• the freedom from legacy code and user base
• Thorn is an experiment in language design
• Thorn lets scripts grow up by
• addition of encapsulation/modularization (classes, modules, components)
• addition of concurrency (components)
• addition of types (like types)
Thorn is a Programming Language
• A familiar syntax
class List(hd,tl) {
def lmap(f) = List( f.apply(head),
if(hd==null) tl.lmap(f); else null;
}
ls = List(1, List(2, List(3, null)));
Growing up modular
Modularity and encapsulation mechanisms
• A script
• A simple script that does some text manipulation
• No encapsulation, no modularity, but does the job
words = "story.txt".file.contents.split("\\W+");
wc = %group(word=w.toLower){ n=%count; | for w<-words };
sorted = %sort["%3d %s".format(n,word)
%> n %< word | for {:word, n:} <- wc];
println( sorted.joined("\n") );
Modularity and encapsulation mechanisms
• A class
• Classes provide basic encapsulation and modularity
• Limited support for access control
class Counter(name) {
words = name.file.contents.split("\\W+");
def count() = %group(word=w.toLower) {
n=%count; | for w <- words };
def sort() = %sort["%3d %s".format(n,w)
%> n %< w | for {:w,n:} <- count()];
}
Modularity and encapsulation mechanisms
• A module
• Better support for encapsulation
• Control over linking
module COUNT {
import own file.*;
import util.Vector;
class Counter(name) {
…
}
}
Modularity and encapsulation mechanisms
• A component
• Stronger encapsulation
component Count {
sync count(name) ...
}
Growing up concurrent
Concurrency features
A
L
• The unit of concurrency is the component
• Components are
s
’
• single-threaded
• fully isolated
• have a mailbox
o
M
e
r
d
t
e
s
l
i
a
n
i
d
r
a
B
O
O
S
P
a
t
k
l
Growing up typed
Static and dynamic type checking
Dynamic type checking is great:
• anything goes, until it doesn't;
• a program can be run even when
crucial pieces are missing
Static type checking is great:
• catches bugs earlier;
• enables faster execution.
Can they co-exist in the same design?
Problem
class Foo{ def bar(x:Int) = x+1; }
a:Foo = Foo();
Idea: let the run-time check
that X is compatible with
type Int.
a.bar(X);
,
# assume no static type information available on X
When should this check be performed?
How long does it take?
Run-time wrappers
class Ordered { def compare(o:Ordered):Int; }
fun sort (x:[Ordered]):[Ordered] = …
a:[Ordered] = sort(X);
•
Checking that X is an array of Ordered is linear time
•
Arrays are mutable, so checking at invocation of sort is not enough.
Idea: add a wrapper around X that checks
that it can respond to methods invoked on it
Compiled code:
a:[Ordered] = sort(#[Ordered]#X)
Cannot optimize code
class Foo{ def bar(x:Int) = x+1; }
a:Foo = Foo();
a.bar(X);
With static types, body of Foo.bar compiled with aggressive optimizations
Any decent compiler would unbox the Int
This is not possible if it is a wrapped object #Int#X
Our design
Our design principles
Permissive:
accept as many programs as possible
Modular:
be as modular as possible
Reward good behavior:
reward programmer with performance or clear correctness guarantees
Our design
Introduce a novel type construct that mediates between static and dynamic.
still flexible
flexible
fast
like type
static
catch errors
dynamic
catch some errors
• For each class name C, add type like
C
• Compiler checks that operations on like
C variables are well-typed if the
referred object had type C
• Does not restrict binding of like
C variables,
checks at run-time that invoked method exists
An example
class Point(var x:Int, var y:Int) {
def getX():Int = x;
def getY():Int = y;
def move(p
) {
x := p.getX(); y := p.getY(); }
}
Requirements:
1. Fields x and y declared Int
2. move accepts any object with getX and getY methods
like Point
class Point(var x:Int, var y:Int) {
def getX():Int = x;
def getY():Int = y;
def move(p:like Point) {
}
x := p.getX(); y := p.getY(); }
1. Flexibility
class Point(var x:Int, var y:Int) {
def getX():Int = x;
def getY():Int = y;
def move(p:like Point) {
x := p.getX(); y := p.getY(); }
}
class Coordinate(x:Int,y:Int) {
def getX():Int = x;
def getY():Int = y;
}
p = Point(0,0);
c = Coordinate(5,6);
p.move(c);
move runs fine if c has getX/getY
2. Checks
class Point(var x:Int, var y:Int) {
def getX():Int = x;
def getY():Int = y;
def move(p:like Point) {
x := p.getX(); y := p.getY();
p.hog;
}
}
Compile-time Error
move is type-checked under assumption that the argument is a Point
3. Return values have the expected type
class Cell(var contents) {
def get() = contents;
def set(c) { contents := c }
}
class IntCell { def get():Int;
q:Cell = Cell(41);
p:like IntCell = Cell(42);
def set(c:Int);
}
q.get():dyn
q.get() + 1;
p.get() + 1;
p.get():Int
optimizations possible
Like types
• A unilateral promise as to how a value will be treated locally
allow most of the regular static checking machinery
allows the flexibility of structural subtyping
• Concrete types can stay concrete
allow reusing names as semantics tags
• Interact nicely with generics
Rewards
running speed relative to Python 2.5.1
Lies, big and small...
3.0
2.5
Typed Thorn
Dynamic Thorn
Python 2.5.1
Ruby 1.8.6
4.87
4.84
11
12
2.0
1.5
1.0
0.5
0.0
1000
1500
spectral-norm
1000
mandelbrot
1500
.
*
*
*
*
fannkuch
Obtained with an older version of Thorn
Benefits due to unboxing of numeric values
Benchmarks are meaningless
Still slower than Java
Related Work
• Findler, Felleisen. Contracts for higher-order functions. 2002
• Bracha. The Strongtalk Type System for Smalltalk. 2003
• Gray, Findler, Flatt. Fine-grained interoperability through mirrors and contracts. 2005
• Siek, Taha. Gradual typing for functional languages. 2006
• Tobin-Hochstadt, Felleisen. Interlanguage migration: From scripts to programs. 2006
• ECMA. Proposed ECMAScript 4th Edition – Language Overview. 2008
• Siek, Garcia, Taha. Exploring the design space of higher-order casts. 2009
• Wadler, Findler. Well-typed programs can’t be blamed. 2009
Current team: Bard Bloom, John Field, Johan Ostlund, Gregor Richards, Brian Burg, Nick Kidd
Alumni: Tobias Wrigstad, Nate Nystrom, Rok Strinsa
Sponsor: IBM, ONR, NSF
Conclusion, actually
• Transforming a one-off script into a program requires language and virtual
machine support
• Language design may have to make certain compromises in order to achieve
performance
• Rewarding programmer effort is essential for adoption
• More information:
THORN tutorial @ OOPSLA
Talk in the main track
Online-demo
http://thorn-lang.org