Semmle QL white paper

Validating API implementations with Semmle™ QL
Using the object-oriented Semmle QL query
language to find web services whose
implementations do not match their RAML-based API specifications.
Technical White Paper
© 2016 Semmle, Inc.
Better Insight. Better Team. Better Software.
Exploring your code with Semmle™ QL
Semmle™ QL is a declarative, object-oriented query language that is used by
software architects to explore their source code. It is ideal for those who want an
unbounded ability to ask questions of their code by interrogating it the way they
would any database.
The syntax of QL is modeled on Java, with a strong influence from other query
languages like SQL. The object-oriented syntax, with support for recursion, allows
you to define queries with sophisticated logic that take into account how data and
logic flow through the code, scoping, typing, and so on. The complexity of that logic
can be hidden from query users and made reusable by storing it in query libraries.
This paper describes how QL can be used to identify mismatches between API
specifications in RAML and their JavaScript implementations.
CONTENTS
OVERVIEW
MOTIVATING EXAMPLE
MODELLING RAML SPECIFICATIONS
MODELLING OSPREY WEBAPPS
CONCLUSION
FOR MORE INFORMATION
OVERVIEW
RAML is a YAML-based specification language for web service APIs that describes the resources
offered by a web service, and the operations that are available on them.
RAML does not by itself provide an implementation of these operations; instead it serves as an
API specification for a separate implementation of the web service in another language such as
Ruby or JavaScript.
In this paper, we will consider the problem of checking that the implemented API conforms to
its specification. We restrict our attention to implementations in JavaScript based on Osprey, a
Node.js framework library developed specifically for implementing RAML-specified web
services. We will develop a query that ensures that all HTTP response codes sent by the
implementation appear in the specification, and highlights violations of this rule.
The query makes use of two libraries, also implemented in QL, that model RAML specifications
and Osprey applications, respectively. Both the libraries and the query will be discussed in some
detail.
MOTIVATING EXAMPLE
As our running example we use robonode, a RAML-specified web service API for Robotis
DARWIN-MINI robots. Being a proof-of-concept application, robonode is quite small and has a
few mismatches between implementation and specification, thus making it an ideal subject for us
to study.
The RAML specification for the API implemented by robonode can be found in
src/assets/raml/api.raml. The bulk of the specification is taken up by the definitions of the
various resources offered by the API. For instance, the specification defines a resource /robots
that lists all robots controlled by the web service. For each individual robot, the resource
/robots/{robotId} provides more information, and an API for submitting commands to the
robot. In particular, /robots/{robotId}/state gives access to information about the current
state of the robot identified by {robotId}.
Note that the specification stipulates that a GET request for a resource of the form
/robots/{robotId}/state, where {robotId} is the identifier of a robot controlled by
robonode, will yield a response with response code 200.
The bulk of the implementation of the robonode web service is contained in src/app.js. We
briefly show and discuss a few important code snippets.
Since robonode is a Node.js application, the app.js module starts with a few require
statements that import other modules on which robonode depends. In particular, it imports the
Osprey module like this:
    var osprey = require('osprey');
The API implementation starts by invoking the method osprey.create to create an API object:
    api = osprey.create('/api', app, {
      ramlFile: path.join(__dirname, '/assets/raml/api.raml'),
      logLevel: 'debug'
    });
This call establishes the connection between implementation and specification by telling Osprey
about the location of the RAML specification. Osprey will parse the specification and create a
stub implementation based on it. The rest of app.js simply fills in these stubs, replacing them
with meaningful implementations.
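As a rough illustration of how the ramlFile value in this call is assembled, the sketch below uses Node's path module. The directory /srv/robonode/src is a hypothetical stand-in for __dirname, and path.posix is used only so that the result does not depend on the platform.

```javascript
// Sketch only: how the ramlFile path in the osprey.create call is assembled.
// '/srv/robonode/src' is a hypothetical stand-in for __dirname in app.js.
const path = require('path');

const appDir = '/srv/robonode/src';

// path.posix.join concatenates the segments and normalizes the result,
// so the leading slash of the second segment does not reset the path.
const ramlFile = path.posix.join(appDir, '/assets/raml/api.raml');

console.log(ramlFile);
```

Under these assumptions, the specification file is looked up inside the directory that contains app.js, which matches the location src/assets/raml/api.raml mentioned earlier.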
In particular, here is the implementation of the GET operation on the resource
/robots/{robotId}/state (written /robots/:robotId/state in Osprey):
    api.get('/robots/:robotId/state', function (req, res) {
      var robotId = req.params.robotId;
      var address = Number(req.param('address'));
      var bytesToRead = Number(req.param('bytesToRead'));
      console.info('Reading state: ' + [ address, bytesToRead ]);
      robot.connect(robotId, function onConnect(err, btSerial) {
        if (err) {
          res.status(500).send({ error: err });
        } else {
          robot.readState(btSerial, address, bytesToRead);
          res.status(202).send('');
        }
      });
    });
It uses Osprey’s get method to associate a handler function with GET requests on the
/robots/{robotId}/state resource. The handler function is passed two arguments, req and
res. The former represents the incoming GET request, exposing its request parameters
through the req.params object. The latter represents the outgoing response object, which
should be filled in by the handler function.
In this case, we can see that the handler sends a 500 response if an error occurs while
communicating with the robot, and a 202 response otherwise. Referring back to our discussion
above, we can see that this does not conform to the specification, which only allows for a 200
response.
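The rule being violated can be stated very compactly. The following JavaScript sketch applies it by hand to the running example; the two data sets are transcribed from the discussion above, and the variable names are chosen purely for illustration.

```javascript
// Sketch only: the conformance rule, applied by hand to the running example.
// Both data sets are transcribed from the discussion of robonode.
const specifiedCodes = new Set([200]); // responses the specification allows for the GET method
const sentCodes = [500, 202];          // status codes the handler passes to res.status

// A status code violates the rule if the specification does not list it.
const unspecified = sentCodes.filter(code => !specifiedCodes.has(code));

console.log(unspecified);
```

Both status codes sent by the handler are reported, since neither appears in the specification.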
We will now show how to write a simple QL query that identifies mismatches of this kind.
MODELLING RAML SPECIFICATIONS
Our first task is to develop a QL library for modelling RAML specifications. Since the JavaScript
analysis already knows how to handle YAML files, we can define our RAML model as a set of
QL classes that refine the corresponding classes for general YAML.
We aim to model three RAML concepts: entire specifications, resources specified in a
specification, and methods specified on a resource. These three concepts will be represented
by three QL classes, RAMLSpec, RAMLResource and RAMLMethod, respectively.
To begin with, we define a QL class representing entire RAML specifications: a RAML
specification is simply a YAML document defined in a file with extension .raml:
    /** A RAML specification. */
    class RAMLSpec extends YAMLDocument, YAMLMapping {
      RAMLSpec() {
        getLocation().getFile().getExtension() = "raml"
      }
    }
Note the use of multiple inheritance to ensure that a RAMLSpec is both a YAMLDocument (that
is, a top-level YAML value, not nested inside another value), and a YAMLMapping (as opposed
to, say, a scalar). The characteristic predicate imposes the further condition that the file in
which the specification object is located has extension .raml.
When writing a QL query, it is a good idea to frequently test the abstractions being defined, so
let us write a quick query to test that our definition of RAMLSpec makes sense:
    import javascript

    class RAMLSpec { /* see above */ }

    from RAMLSpec s
    select s
If we run this query on a snapshot of robonode, it will find a single RAML specification, namely
the one in src/assets/raml/api.raml, as expected.
Next, we define a QL class for dealing with RAML resource specifications. A resource is simply
a YAML mapping that is itself embedded inside another mapping under a key whose name starts
with a slash character:
    /** A RAML resource specification. */
    class RAMLResource extends YAMLMapping {
      RAMLResource() {
        getDocument() instanceof RAMLSpec and
        exists (YAMLMapping m, string name |
          this = m.lookup(name) and
          name.matches("/%")
        )
      }
    }
There are two things we would like to know for a given resource: its path relative to the API
root, and which methods it provides.
For the former, we can define a method RAMLResource.getPath() that recursively computes
the path of a resource: if the resource is at the top level, its path is simply its name; otherwise, it
must be nested inside an outer resource, and its path is the path of the outer resource
concatenated with the resource name.
This translates straightforwardly to QL:
    /** Get the path of this resource relative to the API root. */
    string getPath() {
      exists (RAMLSpec spec |
        this = spec.lookup(result)
      ) or
      exists (RAMLResource that, string p |
        this = that.lookup(p) and
        result = that.getPath() + p
      )
    }
At this point, we again pause to quickly check that our definitions make sense and write a query
to find all resources and list their paths:
    import javascript

    class RAMLSpec { /* see above */ }
    class RAMLResource { /* see above */ }

    from RAMLResource rr
    select rr, rr.getPath()
On robonode, this finds four resources with paths /robots, /robots/{robotId},
/robots/{robotId}/commands and /robots/{robotId}/state, as expected.
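The recursion behind getPath can also be pictured outside QL. The following JavaScript sketch walks a nested object that stands in for a parsed RAML document. The object shape, the title value and the helper name collectResourcePaths are assumptions made for illustration; only the four resource paths are taken from the robonode specification.

```javascript
// Sketch only: a plain-JavaScript analogue of the recursive getPath definition.
// The nested object is a hypothetical stand-in for a parsed RAML document;
// only the four resource paths come from the robonode specification.
const spec = {
  title: 'hypothetical title',
  '/robots': {
    '/{robotId}': {
      '/commands': {},
      '/state': { get: {} }
    }
  }
};

// A key names a resource if it starts with a slash and maps to an object.
// The path of a nested resource is the path of the outer resource plus its key.
function collectResourcePaths(mapping, prefix = '') {
  const paths = [];
  for (const [key, value] of Object.entries(mapping)) {
    if (key.startsWith('/') && typeof value === 'object' && value !== null) {
      const path = prefix + key;
      paths.push(path, ...collectResourcePaths(value, path));
    }
  }
  return paths;
}

console.log(collectResourcePaths(spec));
```

As in the QL version, a top-level resource contributes its own name, and every nested resource extends the path of the resource that encloses it.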
Before we finish our implementation of RAMLResource to connect resources and methods, let
us first define a QL class to represent RAML methods.
A simple definition of a RAML method specification could be as follows: a RAML method is a
YAML value appearing anywhere in a RAML specification under a key that corresponds to an
HTTP verb such as get, put, post or delete.
This is easy to implement in QL:
    RAMLMethod() {
      exists (YAMLMapping obj, string verb |
        this = obj.lookup(verb) and
        (verb = "get" or verb = "put" or
         verb = "post" or verb = "delete")
      )
    }
In fact, since we will need to reuse the concept of HTTP verbs later on, we will introduce a
string-valued predicate httpVerb whose result is any of the HTTP verbs we are interested in:
    string httpVerb() {
      result = "get" or result = "put" or
      result = "post" or result = "delete"
    }
The characteristic predicate of class RAMLMethod then simplifies to:
    RAMLMethod() {
      getDocument() instanceof RAMLSpec and
      exists (YAMLMapping obj |
        this = obj.lookup(httpVerb())
      )
    }
In this paper, we will be mostly interested in the responses that an API method can return, so
let us define a method getResponse that looks up response specifications of API methods by
their numeric response code:
    /** Get the response specification for the given status code. */
    YAMLValue getResponse(int code) {
      exists (YAMLMapping obj, string s |
        obj = this.(YAMLMapping).lookup("responses") and
        result = obj.lookup(s) and
        code = s.toInt()
      )
    }
This simply says that to find the specification for a response code, we obtain the
responses mapping of the method (which means the method must itself be a mapping) and look up
the response code in it.
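The same lookup can be sketched in plain JavaScript. The method object below is a hypothetical stand-in for a parsed RAML method, and the description text is invented; the function name mirrors the QL method only for readability.

```javascript
// Sketch only: a plain-JavaScript analogue of getResponse.
// The method object is a hypothetical stand-in for a parsed RAML method.
const getState = {
  responses: {
    '200': { description: 'hypothetical description' }
  }
};

// Look up the response specification for a numeric status code.
// Keys of the responses mapping are strings, so each key is converted to a
// number before comparing, mirroring `code = s.toInt()` in the QL version.
function getResponse(method, code) {
  const responses = method.responses;
  if (typeof responses !== 'object' || responses === null) {
    return undefined; // no responses mapping, hence no response specification
  }
  for (const [key, value] of Object.entries(responses)) {
    if (Number(key) === code) {
      return value;
    }
  }
  return undefined;
}

console.log(getResponse(getState, 200) !== undefined); // true
console.log(getResponse(getState, 202) !== undefined); // false
```

A result of undefined here plays the role that the absence of any result plays in QL: the method has no specification for that status code.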
We can test our implementation of RAMLMethod with a simple query like this:
    import javascript

    string httpVerb() { /* see above */ }
    class RAMLSpec { /* see above */ }
    class RAMLResource { /* see above */ }
    class RAMLMethod { /* see above */ }

    from RAMLMethod m
    select m
The final bit of modelling is to connect RAML resources with the methods they define, which is
done by the method RAMLResource.getMethod:
    /** Get the method for this resource with the given verb. */
    RAMLMethod getMethod(string verb) {
      verb = httpVerb() and
      result = lookup(verb)
    }
We now assemble all of these definitions into a library RAML.qll that can be imported by
queries or other libraries.
MODELLING OSPREY WEBAPPS
Next, we develop a QL model of Osprey-based Node.js applications. As with our RAML model,
the aim is not to fully model all details of the Osprey framework, but simply to provide a basis
for our analysis.
Recall that ultimately we are interested in calls to the status method on response objects,
where the response object is passed to a handler function for a RAML method. Hence we need
to:
1. Identify imports of Osprey
2. Find calls to osprey.create that initialize a new API implementation
3. Detect API method definitions and the handler methods they install
4. Locate calls to status on the second argument of such handler methods
To begin with, we define a QL class for picking out require statements importing the Osprey
module, and variables in which the returned module object is stored:
    /** An import of the Osprey module. */
    class OspreyImport extends Require {
      OspreyImport() {
        getImportedPath().getValue() = "osprey"
      }
    }

    /** A variable that holds the Osprey module. */
    class Osprey extends Variable {
      Osprey() {
        getAnAssignedValue() instanceof OspreyImport
      }
    }
Identifying API creation sites is now easy: we simply look for calls to a create method where
the receiver is a variable holding the Osprey module object.
    /** A call to `osprey.create`. */
    class OspreyCreateAPICall extends MethodCallExpr {
      OspreyCreateAPICall() {
        getReceiver().(VarAccess).getVariable() instanceof Osprey and
        getMethodName() = "create"
      }
    }
To establish the link between Osprey API implementations and their specifications, we add a
method getSpecFile() to this class, which reads the configuration object passed as the third
argument of the call to create, extracts its ramlFile property and resolves it as a file system
path:
    /** Determine the root folder relative to which the RAML file path is resolved. */
    private Folder getSearchRoot(PathExpr path) {
      // paths starting with a dot are resolved relative to the enclosing directory
      if path.getValue().matches(".%") then
        result = getFile().getParent()
      // all other paths are resolved relative to the file system root
      else
        result.getName() = ""
    }

    /** Get the specification file for the API definition. */
    File getSpecFile() {
      exists (ObjectExpr oe, PathExpr p |
        oe = getArgument(2) and
        p = oe.getProperty("ramlFile").getInit() and
        result = p.resolve(getSearchRoot(p))
      )
    }
After the API implementation is created, it is usually stored in a variable. We introduce a new
QL class OspreyAPI to identify such variables, and add a convenience method getSpecFile()
that simply delegates to the corresponding method on OspreyCreateAPICall:
    /** A variable in which an Osprey API object is stored. */
    class OspreyAPI extends Variable {
      OspreyAPI() {
        getAnAssignedValue() instanceof OspreyCreateAPICall
      }

      File getSpecFile() {
        result = getAnAssignedValue().(OspreyCreateAPICall).getSpecFile()
      }
    }
This allows us to easily find REST method definitions: they are themselves method calls, where
the receiver is an OspreyAPI variable, and the name of the invoked method is an HTTP verb:
    /** An Osprey REST method definition. */
    class OspreyMethodDefinition extends MethodCallExpr {
      OspreyMethodDefinition() {
        exists (OspreyAPI api | getReceiver() = api.getAnAccess()) and
        getMethodName() = httpVerb()
      }

      /** Get the API to which this method belongs. */
      OspreyAPI getAPI() {
        getReceiver() = result.getAnAccess()
      }

      /** Get the verb which this method implements. */
      string getVerb() {
        result = getMethodName()
      }

      /** Get the resource path to which this method belongs. */
      string getResourcePath() {
        result = getArgument(0).(StringLiteral).getValue()
      }
    }
A callback function for a REST method is then simply the second argument to a REST method
definition:
    /** A callback function bound to a REST method. */
    class OspreyMethod extends FunctionExpr {
      OspreyMethod() {
        exists (OspreyMethodDefinition omd | this = omd.getArgument(1))
      }

      OspreyMethodDefinition getDefinition() {
        this = result.getArgument(1)
      }

      string getVerb() {
        result = getDefinition().getVerb()
      }

      string getResourcePath() {
        result = getDefinition().getResourcePath()
      }
    }
The second parameter of a callback function contains the response object to be filled in by the
callback [1]:
    /** A variable that is bound to a response object. */
    class MethodResponse extends Variable {
      MethodResponse() {
        exists (OspreyMethod m, SimpleParameter res |
          res = m.getParameter(1) and
          this = res.getVariable()
        )
      }

      OspreyMethod getMethod() {
        this = result.getParameter(1).(SimpleParameter).getVariable()
      }
    }
After these preparations, we can now write a QL class capturing invocations of the status
method on such response objects:
    /** A call that sets the status on a response object. */
    class MethodResponseSetStatus extends MethodCallExpr {
      MethodResponseSetStatus() {
        exists (MethodResponse mr |
          getReceiver() = mr.getAnAccess() and
          getMethodName() = "status"
        )
      }

      OspreyMethod getMethod() {
        exists (MethodResponse mr |
          getReceiver() = mr.getAnAccess() and
          result = mr.getMethod()
        )
      }

      int getStatusCode() {
        result = getArgument(0).getIntValue()
      }
    }
[1] Note that we need to explicitly ensure that the second parameter is a SimpleParameter, that is, a parameter
that binds a single variable and not an ECMAScript 6-style destructuring parameter.
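The distinction between a simple parameter and a destructuring parameter can be illustrated with two hypothetical handlers. Neither function is taken from robonode, and fakeRes is an invented stand-in for a response object.

```javascript
// Sketch only: two ways a handler could declare its second parameter.
// Neither function is taken from robonode.

// Simple parameter: `res` is a single variable bound to the response object.
function simpleHandler(req, res) {
  return res.status(200);
}

// Destructuring parameter: no variable holds the response object itself;
// only its `status` property is bound to a local name.
function destructuringHandler(req, { status }) {
  return typeof status;
}

// Hypothetical response object whose status method just returns the code.
const fakeRes = { status(code) { return code; } };

console.log(simpleHandler({}, fakeRes));
console.log(destructuringHandler({}, fakeRes));
```

In the second form there is no variable for the response object as a whole, so a model that tracks accesses to such a variable has nothing to follow.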
As before, we assemble all these classes and methods into a reusable library Osprey.qll. It is
always a good idea to write simple test queries to ensure that the definitions make sense. We
have elided examples here for brevity.
Finding missing response specifications
Having implemented QL models for RAML specifications and Osprey implementations, we can
now proceed to implement our main query.
First, we need to match up RAML method specifications and their implementations in Osprey.
This means that:
• The RAML method appears in the specification referred to by the Osprey API
  implementation;
• The specification and the implementation refer to the same resource path; note that
  RAML resource paths may contain variable elements enclosed in curly braces, while
  the corresponding element in Osprey will be preceded by a colon;
• The specification and the implementation refer to the same HTTP verb.
These three requirements translate straightforwardly into a QL predicate, where we use a
simple regular expression to translate from Osprey paths to RAML paths:
    RAMLMethod getSpecification(OspreyMethod om) {
      exists (RAMLResource rr, File f, string rPath |
        rr.getLocation().getFile() = f and
        f = om.getDefinition().getAPI().getSpecFile() and
        rPath = om.getResourcePath() and
        rr.getPath() = rPath.regexpReplaceAll("/:([^/]+)/", "/{$1}/") and
        result = rr.getMethod(om.getVerb())
      )
    }
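The path translation can be tried out in isolation. The JavaScript sketch below uses a hypothetical helper, toRamlPath, which is not part of Osprey or of the QL libraries. It deliberately uses a slightly different pattern from the predicate above: the QL pattern ends with a slash, so as written it would appear not to rewrite a parameter that forms the last segment of a path, whereas the pattern here does not require a trailing slash.

```javascript
// Sketch only: translating Osprey-style paths to RAML-style paths.
// toRamlPath is a hypothetical helper, not part of Osprey or the QL library.
function toRamlPath(ospreyPath) {
  // Replace every "/:name" segment with "/{name}". This variant does not
  // require a slash after the parameter name, so a trailing parameter
  // is rewritten as well.
  return ospreyPath.replace(/\/:([^/]+)/g, '/{$1}');
}

console.log(toRamlPath('/robots/:robotId/state'));
console.log(toRamlPath('/robots/:robotId'));
```

Paths without parameters, such as /robots, are returned unchanged.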
The final query is now simplicity itself, and identifies five unspecified response codes on
robonode:
    from MethodResponseSetStatus mrss, RAMLMethod rm
    where rm = getSpecification(mrss.getMethod()) and
          not exists(rm.getResponse(mrss.getStatusCode()))
    select mrss, "Response " + mrss.getStatusCode() + " is not specified by $@.",
           rm, rm.toString()
CONCLUSION
We have discussed an analysis to find a particular type of mismatch between API specifications
in RAML and their JavaScript implementations. The analysis makes use of two libraries to model
RAML specifications and Osprey-based webapp implementations, respectively. The libraries are
quite concise at 148 lines of QL code, yet they provide an adequate foundation for the analysis
query, which can be expressed in no more than 14 lines.
The development of this analysis and the associated custom QL libraries demonstrates the ease
with which the standard QL libraries included with Semmle can be extended to describe
specific frameworks for a language, and arbitrary other software engineering artifacts such as
API specifications. After defining custom libraries, writing queries in the high-level query
language QL to find common errors is a simple task, which any Semmle user can execute.
FOR MORE INFORMATION
Visit semmle.com to learn more about Semmle engineering analytics, team insight, and code
exploration software.