Estudi de les API RESTFUL
A Degree Thesis
Submitted to the Faculty of the
Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona
Universitat Politècnica de Catalunya
by
Francesc Garcia Peña
In partial fulfilment
of the requirements for the degree in
TELEMATICS ENGINEERING
Advisor: José Luis Muñoz Tapia
Barcelona, July 2015
Abstract
To mitigate the lack of standardization of technologies such as RMI and CORBA, two
technologies with different approaches were developed around the year 2000.
Microsoft patented a protocol named SOAP that standardizes client-server
interactions. On the other side, Dr. Roy Fielding presented his PhD thesis, in which he
defined an architectural style named Representational State Transfer (REST).
This document has three main objectives: first, to study the REST architectural style from
three points of view: theory, practice and implementation; second, to generate
documentation that allows other ETSETB students to learn about REST; and finally, to apply
the learned concepts to Software Defined Networks (SDN) applications and to document
this application.
The study of REST has shown that systems that apply this architectural style acquire
some desirable characteristics. Most of them are not quantifiable, and those that are
quantifiable depend on the implementation chosen by the developer. Even so, REST is
currently the dominant technology in the field of distributed APIs.
Resum
To mitigate the lack of standardization of technologies such as RMI and CORBA, two
technologies with different approaches were developed around the year 2000.
Microsoft patented a protocol named SOAP that standardizes the interactions
between client and server. On the other hand, Dr. Roy Fielding presented his doctoral
thesis, in which he defined an architectural style named Representational State Transfer.
This document has three objectives: first, the study of the REST architectural style from
three points of view: theory, practice and implementation. The second objective is to
generate documentation that allows other ETSETB students to learn about REST.
Finally, the application of the learned concepts to Software Defined Networks
applications and the generation of documentation about this application.
The study of REST shows that systems that apply this architectural style gain certain
positive characteristics. Most of them are not quantifiable, and those that are
quantifiable depend on the implementation made by the programmer. Even so, it is the
dominant technology in the field of distributed APIs.
Resumen
To mitigate the lack of standardization of technologies such as RMI and CORBA, two
technologies with different approaches were developed around the year 2000. Microsoft
patented a protocol named SOAP that standardizes the interactions between client and
server. On the other hand, Dr. Roy Fielding presented his doctoral thesis, in which he
defined an architectural style named Representational State Transfer.
This document has three objectives: first, the study of the REST architectural style from
three points of view: theory, practice and implementation. The second objective is to
generate documentation that allows other ETSETB students to learn about REST.
Finally, the application of the learned concepts to Software Defined Networks
applications and the generation of documentation about this application.
The study of REST shows that systems that apply this architectural style gain certain
positive characteristics. Most of them are not quantifiable, and those that are
quantifiable depend on the implementation made by the programmer. Even so, it is the
dominant technology in the field of distributed APIs.
Acknowledgements
First, I would like to thank my family for their unconditional support, their words of
encouragement when things did not go well, and all the effort they have made so that I
could get this far.
Next, I want to thank my partner, Judit, for always being by my side and for giving me the
encouragement and drive that I sometimes need.
I would also like to thank Carlos for sharing with me the knowledge he had gained
through his own work.
Finally, I want to thank my advisor, José Luis, for offering me this project, for the
freedom he gave me to carry it out, for all the documentation he provided whenever I
needed it, and for everything he did to spare me trips once he learned that I was not
from Barcelona.
Revision history and approval record
Revision   Date         Purpose
0          26/06/2015   Document creation
1          03/07/2015   Document revision
2          05/07/2015   Document revision
3          08/07/2015   Document revision
4          10/07/2015   Document final revision and approval
DOCUMENT DISTRIBUTION LIST
Name                     e-mail
Francesc Garcia Peña     [email protected]
José Luis Muñoz Tapia    [email protected]
Written by:                        Reviewed and approved by:
Date: 26/06/2015                   Date: 10/07/2015
Name: Francesc Garcia Peña         Name: José Luis Muñoz Tapia
Position: Project Author           Position: Project Supervisor
Table of contents
Abstract
Resum
Resumen
Acknowledgements
Revision history and approval record
Table of contents
List of Figures
List of Tables
1. Introduction
1.1. Project work plan
1.1.1. Tasks
1.1.2. Milestones
1.1.3. Gantt diagram
2. State of the art of the technology used or applied in this thesis
2.1. HTTP
2.1.1. cURL
2.1.2. Requests
2.2. XML
2.3. JSON
2.4. DJANGO
2.5. RYU
2.6. LXC
2.7. MININET
2.8. OPENFLOW
3. Methodology / project development
3.1. REST study
3.1.1. Client-Server
3.1.2. Stateless
3.1.3. Cache
3.1.4. Uniform Interface
3.1.5. Layered System
3.1.6. Code on demand
3.1.7. Implementational approach
3.2. Documentation Generation
3.3. REST SDN Application
3.3.1. Define functionalities
3.3.2. Define resources
3.3.3. Define resource representation
3.3.4. HATEOAS
3.3.5. Cache
3.3.6. Implementing
4. Results
5. Budget
6. Conclusions and future development
Bibliography
Glossary
Appendices
7. RESTful APIs book
8. SDN REST API
9. API Demonstration
List of Figures
Gantt Diagram
Finite state machine representation of the SDN API
Virtualization topology
Appendix 1 list of figures:
1.1 Client-Server architecture
2.1 HTTP client/server
2.2 How HTTP cookies work
2.3 HTTP proxies
2.4 How CGIs work in HTTP
2.5 An HTML form viewed from a browser
2.6 2 Multiple Persistent Connections with an HTTP Server
3.1 Finite state machine representation
3.2 Layered system example
3.3 Flight finite state machine representation
4.1 Cache model
6.1 Django's request-response procedure
8.1 App and Controller interconnection
8.2 Monitoring system topology procedure
8.3 Authentication algorithm procedure
8.4 Content Negotiation algorithm procedure
8.5 Topology generated with LXC containers procedure
8.6 Wireshark captures from a POST request
Appendix 3 list of figures:
1 Graphical representation of the statistics retrieved
2 Wireshark capture of a cached response
List of Tables:
1. List of resources
2. Project budget
Appendix 1 list of tables:
2.1 HTTP status codes
2.2 Common MIME types
2.3 Cache-Control header directives
2.4 Commands for WWW
6.1 Django's important field list
6.2 Django's important field options list
6.3 Django's important QuerySet methods
6.4 Django's HttpRequest object's attributes
8.1 List of URIs
1. Introduction
In computer programming, an application programming interface (API) is a set of
routines, protocols, and tools for building software applications. An API expresses a
software component in terms of its operations, inputs, outputs, and underlying types. An
API defines functionalities that are independent of their respective implementations,
which allows definitions and implementations to vary without compromising each other. A
good API makes it easier to develop a program by providing all the building blocks. A
programmer then puts the blocks together.
Some APIs are offered in a distributed context, such as a client-server topology,
where the client and the server communicate through network protocols.
As a first approach to developing APIs that could be used in distributed systems,
many platforms developed their own technologies, such as CORBA, Java Remote
Method Invocation (RMI), DCOM or .NET Remoting.
Several difficulties arose from this. The most important one was the incompatibility
between technologies: the client and the server had to use the same technology.
Furthermore, some vendor implementations/toolkits had trouble talking to each other due
to the lack of standardization.
Web services were created to overcome this lack of standardization. They comprise
a set of technologies standardized mainly by two organisations: the W3C and OASIS.
The W3C defines web services as follows: “A Web service is a software
system designed to support interoperable machine-to-machine interaction over a
network. It has an interface described in a machine-processable format (specifically
WSDL). Other systems interact with the Web service in a manner prescribed by its
description using SOAP messages, typically conveyed using HTTP with an XML
serialization in conjunction with other Web-related standards”.
SOAP (Simple Object Access Protocol) is a protocol used to exchange structured
information. It was developed by Microsoft, which was the main company interested in
the development of web services. Although its name describes it as an 'Object Access'
protocol, this is only a remnant of the earlier technologies used to access remote objects
(DCOM). Today, SOAP is used to invoke services rather than to access objects.
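To make the structure of a SOAP message concrete, the following minimal sketch builds a SOAP 1.1 request with Python's standard `xml.etree.ElementTree`. The envelope namespace is the one defined by SOAP 1.1; the operation `GetPortStats`, its `switchId` parameter and the `http://example.org/monitoring` namespace are hypothetical and only serve as an illustration. In a real exchange this XML would typically travel in the body of an HTTP POST, as the W3C definition describes.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "http://example.org/monitoring"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("m", SERVICE_NS)


def build_soap_request(operation, params):
    """Wrap a service call (operation + parameters) in an Envelope/Body structure."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: ask a monitoring service for the statistics of switch 1.
message = build_soap_request("GetPortStats", {"switchId": 1})
print(message)
```

Note how the service operation is expressed as an element inside the body, which reflects SOAP's service-invocation orientation rather than access to a named resource.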
However, at the same time as web services were being developed, Dr. Roy Fielding
wrote his dissertation, Architectural Styles and the Design of Network-based Software
Architectures, in which he defined a new architectural style called REST (Representational
State Transfer).
In this thesis I will analyse the REST architectural style in depth from three points of
view:
• The theoretical one: I will study Dr. Fielding's dissertation.
• The practical one: I will study how to apply the REST style to API design.
• The implementational one: I will study technologies that can be used to design
APIs that follow the REST style (from protocols and standards to programming
frameworks and packages).
Once I have studied these three points of view, I will develop a REST API for an
SDN application, using Ryu as the framework to create the application.
I will study the Ryu framework from the point of view of REST API development,
and I will develop a simple API that offers monitoring functionalities such as traffic
statistics retrieval.
Finally, as an outcome of this study, a book intended for educational use at ETSETB
will be written. Its content will take as practical an approach to the subjects as
possible, and it will be filled with examples that help the reader understand the REST
architectural style and its application to Software Defined Networks. It is this project's
main result and can be found in the appendices on section ????.
Carrying out this project has required many skills, such as the ability to learn complex
concepts and tools autonomously (HTTP, JSON, XML, Django, Ryu, LaTeX), the ability to
express the concepts learnt in a schematic and clear way, and the creativity and
knowledge needed to design examples that demonstrate the main characteristics of the
studied technologies.
This whole project has been developed using open source tools on a laptop computer
running Linux Mint 17.1 “Rebecca”. It could have been carried out on any mid-range
computer with 4 GB of RAM or more.
1.1. Project work plan
1.1.1. Tasks:
Major constituent: Familiarize with technologies
WP ref: WP1
Short description: Installation of all the software needed (Python, LaTeX editor, Apache
Subversion, etc.) and getting used to working with it.
Planned start date: 23/02/2015
Planned end date: 08/03/2015
Internal task T1: Software installation
Internal task T2: Develop tests to get used to the software
Major constituent: Python review
WP ref: WP2
Short description: Become familiar with the syntax and particularities of the Python
language.
Planned start date: 09/03/2015
Planned end date: 15/03/2015
Internal task T1: Study Python
Internal task T2: Develop some test projects
Major constituent: Django review
WP ref: WP3
Short description: Become familiar with the syntax and particularities of the Django
framework.
Planned start date: 16/03/2015
Planned end date: 22/03/2015
Internal task T1: Study Django's particularities
Internal task T2: Develop some test projects
Major constituent: Study RESTful APIs
WP ref: WP4
Short description: In-depth study of RESTful APIs.
Planned start date: 23/03/2015
Planned end date: 26/04/2015
Major constituent: Compare other API architectures
WP ref: WP5
Short description: Compare REST with other viable API architectures.
Planned start date: 27/04/2015
Planned end date: 10/05/2015
Internal task T1: Study other API architectures (SOAP, XML-RPC)
Internal task T2: Compare other API architectures with REST
Major constituent: Examples development
WP ref: WP6
Short description: Generate useful examples to be included in the documentation.
Planned start date: 16/03/2015
Planned end date: 31/05/2015
Internal task T1: Study Django's particularities
Internal task T2: Develop some test projects
Major constituent: Generate documentation
WP ref: WP7
Short description: Generate all the documentation about RESTful APIs and Django.
Planned start date: 06/04/2015
Planned end date: 21/06/2015
Major constituent: Develop a RESTful API
WP ref: WP8
Short description: Implement a REST API that offers monitoring functionalities for an
SDN application.
Planned start date: 11/05/2015
Planned end date: 21/06/2015
Major constituent: Generate documentation about the SDN REST API
WP ref: WP9
Short description: Generate the documentation necessary for a complete understanding
of the principles and technologies used in the creation of the SDN REST API.
Planned start date: 11/05/2015
Planned end date: 21/06/2015
1.1.2. Milestones
WP#    Task#   Short title                           Milestone / deliverable   Date (week)
3              Python, Django                                                  4
4, 5           Restful study                                                   11
6, 7           Generate Documentation and examples                             17
8, 9           Restful API                                                     17
1.1.3. Gantt diagram
Figure 1: Gantt Diagram (tasks: Familiarize with technologies; Python review; Django
review; Study restful API; Compare w/ other API; Examples developing; Generate
documentation; Developing a Restful API; Generate documentation API. Time axis: weeks
1 to 19, from February to July.)
2. State of the art of the technology used or applied in this thesis
This project is part of a larger project whose objective is to study Software Defined
Networks (SDN). Since SDN is a vast subject that a single student could not study in depth
in the time available for this project, the project supervisor narrowed it down to several
independent topics that can be studied individually.
This project focuses on the study of the APIs that allow communication between the
Application layer and the Control layer. More concretely, it focuses on REST APIs, which
are very popular at the moment: many important companies, such as Google and Twitter,
use them.
The REST architectural style does not specify on top of which protocols applications
should be built, which format the communication messages should follow or which
technologies should be used on the server side. In practice, however, applications
that follow the REST constraints are usually built on top of HTTP and use one of the
common Internet formats: XML or JSON. There is much more diversity in server-side
technologies; in this thesis the Django framework has been used to implement the server.
To create the SDN API, the communication protocol used between the switches and the
controller will be OpenFlow 1.3, and the framework used to develop the application will
be Ryu.
Finally, in this project two different virtualization technologies have been used: Linux
Containers and Mininet.
Since the scope of this project is to study these technologies, apply them and generate
documentation on how to use them correctly, this section only gives a brief explanation
of each technology used.
2.1. HTTP
HTTP (Hypertext Transfer Protocol) is an application protocol that runs on top of TCP.
HTTP/1.0 was specified in 1996 (RFC 1945). After that, a new version (HTTP/1.1) was
specified in 1999 (RFC 2616), and it has remained the most widely deployed version since
then, receiving some minor updates that add functionality. In June 2014 its specification
was substantially revised, although the protocol version remained 1.1. It is now defined
in multiple RFCs:
RFC7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
RFC7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
RFC7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
RFC7233: Hypertext Transfer Protocol (HTTP/1.1): Range Requests
RFC7234: Hypertext Transfer Protocol (HTTP/1.1): Caching
RFC7235: Hypertext Transfer Protocol (HTTP/1.1): Authentication
HTTP defines standardized semantics at the application level.
It defines different methods, each meant to express a different kind of action. RFC 7231
defines eight of them: GET, POST, PUT, DELETE, OPTIONS, HEAD, TRACE and CONNECT.
It also defines multiple headers, which add information about the data that a message
carries. For example, you can specify the format of the information contained in a
message with the Content-Type header.
Finally, it defines status codes, which indicate standard responses from the server, such
as 200 (OK) or 404 (Not Found).
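The following sketch shows these three elements (methods, headers and status codes) in action. It uses only Python's standard library (`http.server` and `http.client`) rather than a real API; the `/switches/1` resource and its JSON content are hypothetical. The handler only implements GET, so the standard library itself answers the DELETE request with 501 (Not Implemented).

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical resources served by this sketch.
RESOURCES = {"/switches/1": {"id": 1, "ports": 4}}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        resource = RESOURCES.get(self.path)
        if resource is None:
            # Standard status code for an unknown resource.
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = json.dumps(resource).encode("utf-8")
        self.send_response(200)
        # Content-Type tells the client the format of the message body.
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def request(port, method, path):
    """Send one request and return (status code, Content-Type header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, path)
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Type"), resp.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

ok = request(port, "GET", "/switches/1")
missing = request(port, "GET", "/switches/99")
unsupported = request(port, "DELETE", "/switches/1")  # no do_DELETE defined

server.shutdown()
server.server_close()
print(ok[0], ok[1], missing[0], unsupported[0])
```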
Another key aspect of HTTP is that it is a stateless protocol: the response to a request
does not depend on previous requests.
HTTP will be the protocol used throughout this thesis because its characteristics match
the REST constraints very well.
2.1.1. cURL
curl is an open source command line tool and library for transferring data with URL
syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP,
LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMTP, SMTPS, Telnet and
TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form
based upload, proxies, HTTP/2, cookies, user+password authentication (Basic, Plain,
Digest, CRAM-MD5, NTLM, Negotiate and Kerberos), file transfer resume, proxy
tunnelling and more.
2.1.2. Requests
Requests is a Python HTTP library that will be used in this project to send HTTP
requests to a server. It is very simple to use, which fits this document's didactic
approach well.
2.2. XML
XML (eXtensible Markup Language) defines a set of rules for encoding documents in
a readable form. An XML document is a "text" file, i.e. a string of characters encoded in
UTF-8 or in an ISO standard encoding such as ISO-8859-1 (Latin-1). The characters that
make up an XML document are divided into markup and content. All strings that
constitute markup either begin with the character "<" and end with ">", or begin with the
character "&" and end with ";". Strings that are not markup are content. In particular,
a tag is a markup construct that begins with "<" and ends with ">". Tags come in
three flavors:
start-tags, for example <section>
end-tags, for example </section>
empty-element tags, for example <line-break />
XML is widely used on the web, and it will be used to format the data in HTTP messages.
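As a small illustration of the markup/content distinction, the following sketch parses a hypothetical document with Python's standard `xml.etree.ElementTree`. The document contains a start-tag/end-tag pair, an entity reference (`&amp;`, markup that begins with "&" and ends with ";") and an empty-element tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a start-tag/end-tag pair, an entity reference
# and an empty-element tag.
document = """<?xml version="1.0" encoding="UTF-8"?>
<section title="Port statistics">
  <port id="1">in &amp; out: 1500 packets</port>
  <line-break />
</section>"""

# Parse from bytes so that the encoding declaration is honoured.
root = ET.fromstring(document.encode("utf-8"))
port = root.find("port")
empty = root.find("line-break")

print(root.tag, root.attrib["title"])  # element delimited by <section> ... </section>
print(port.text)                       # content, with &amp; replaced by "&"
print(empty.text, len(empty))          # empty element: no content and no children
```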
2.3. JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for
humans to read and write. It is easy for machines to parse and generate. It is based on a
subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition, December
1999. JSON is a text format that is completely language independent but uses
conventions that are familiar to programmers of the C-family of languages, including C,
C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON
an ideal data-interchange language.
JSON is widely used to format the data of HTTP messages in APIs because it is lightweight
and easy to parse.
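A minimal sketch of how a JSON representation can be produced and consumed with Python's standard `json` module. The port statistics shown are hypothetical sample values, not data from the project's API.

```python
import json

# Hypothetical representation of a switch's port statistics.
stats = {
    "switch": 1,
    "ports": [{"port_no": 1, "rx_packets": 1500, "tx_packets": 980}],
    "active": True,
}

# Serialize: this text is what would travel in the HTTP message body,
# typically labelled with "Content-Type: application/json".
body = json.dumps(stats)
print(body)

# Parse it back: objects become dicts, arrays become lists, true/false become booleans.
parsed = json.loads(body)
print(parsed["ports"][0]["rx_packets"])
```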
2.4. DJANGO
Django is a high-level Python web framework built to make the task of developing web
applications much easier. Its power comes from its ability to separate application
development from low-level hassles such as database connection handling. Another
important aspect of Django is its modularity: a project is a set of assembled
applications.
Even though Django is a web framework and is not specifically meant to build APIs, it has
many useful properties:
Django models use the DAO design pattern. This pattern allows the developer to avoid
coupling the designed application to the database connection.
Django middleware can be useful to design multi-layered systems. Because of the order
in which the middleware components are executed, they behave as if each of them
were a distinct layer of the system. This allows the developer to design the whole system
before deploying it, avoiding the need to use multiple machines or virtualization tools.
Django pattern-based URIs. The Django framework accepts not only exact URIs but also
pattern-based URIs, which eases the task of recognising URIs that vary depending on the
requested resource.
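The last two properties can be illustrated without Django itself. The following is a pure-Python sketch, not Django's actual API (in Django, URL patterns and middleware are configured through the project's URL configuration and settings): regular expressions with named groups play the role of pattern-based URIs, and each middleware function wraps the next one, so that a request traverses the layers in order. The views, the URIs and the `token` check are hypothetical.

```python
import re


# Hypothetical views: they receive the request and the values captured from the URI.
def switch_detail(request, switch_id):
    return {"status": 200, "body": f"switch {switch_id}"}


def port_stats(request, switch_id, port_no):
    return {"status": 200, "body": f"switch {switch_id}, port {port_no}"}


# Pattern-based URIs: named groups capture the variable parts of the path.
URL_PATTERNS = [
    (re.compile(r"^/switches/(?P<switch_id>\d+)/$"), switch_detail),
    (re.compile(r"^/switches/(?P<switch_id>\d+)/ports/(?P<port_no>\d+)/$"), port_stats),
]


def dispatch(request):
    """Innermost layer: call the view of the first pattern that matches."""
    for pattern, view in URL_PATTERNS:
        match = pattern.match(request["path"])
        if match:
            return view(request, **match.groupdict())
    return {"status": 404, "body": "not found"}


# Each middleware wraps the next layer and only interacts with it,
# so the chain behaves like a layered system.
def logging_layer(next_layer):
    def handle(request):
        request.setdefault("trace", []).append("logging")
        return next_layer(request)
    return handle


def auth_layer(next_layer):
    def handle(request):
        request.setdefault("trace", []).append("auth")
        if request.get("token") != "secret":  # hypothetical credential check
            return {"status": 401, "body": "unauthorized"}
        return next_layer(request)
    return handle


# The outermost wrapper runs first: logging -> auth -> dispatch.
app = logging_layer(auth_layer(dispatch))

print(app({"path": "/switches/3/ports/2/", "token": "secret"}))
print(app({"path": "/switches/3/", "token": "wrong"}))
print(app({"path": "/flows/", "token": "secret"}))
```

Because each layer only knows the callable it wraps, a layer can be added, removed or reordered without touching the others.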
2.5. RYU
Ryu is a component-based framework for Software-Defined Networking applications. It
provides software components with well-defined APIs that make it easy to create network
management and control applications. Ryu supports various protocols for managing
network devices, such as OpenFlow, NETCONF and OF-config.
Ryu supports OpenFlow 1.0, 1.2, 1.3 and 1.4. It is written entirely in Python, and all of its
code is freely available under the Apache 2.0 license.
2.6. LXC
Several functionalities added to the Linux kernel have made it very easy to isolate
Linux processes into their own small environments. Isolation tools make it possible to
build containers, which are a lightweight virtualization technology. While hardware
virtualization or para-virtualization provides virtual machines, containers are an
operating-system-level virtualization method for running multiple isolated Linux systems
(containers) on a single host.
With containers, a single Linux kernel is shared between the host and the containers.
As a result, containers can achieve higher densities of isolated environments than
virtual machines.
2.7. MININET
Mininet is a network emulator. It runs a collection of end-hosts, switches, routers, and
links on a single Linux kernel. It uses lightweight virtualization to make a single system
look like a complete network, running the same kernel, system, and user code.
2.8. OPENFLOW
OpenFlow is the first standard communications interface defined between the control and
forwarding layers of an SDN architecture. OpenFlow allows direct access to and
manipulation of the forwarding plane of network devices such as switches and routers,
both physical and virtual (hypervisor-based).
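To give an idea of what this interface looks like on the wire, the following sketch packs and unpacks the fixed 8-byte header that every OpenFlow message starts with (version, type, length and transaction id, in network byte order) using Python's `struct` module. Version `0x04` corresponds to OpenFlow 1.3, and types 0 and 2 are HELLO and ECHO_REQUEST. Message bodies beyond the header are not modelled, and the `ping` payload is an arbitrary example.

```python
import struct

# Common OpenFlow header: version (uint8), type (uint8), length (uint16) and
# transaction id xid (uint32), in network byte order ("!").
OFP_HEADER = struct.Struct("!BBHI")
OFP_VERSION_1_3 = 0x04
OFPT_HELLO = 0
OFPT_ECHO_REQUEST = 2


def pack_message(msg_type, xid, payload=b""):
    """Prefix a payload with the header; length counts header plus payload."""
    length = OFP_HEADER.size + len(payload)
    return OFP_HEADER.pack(OFP_VERSION_1_3, msg_type, length, xid) + payload


def unpack_header(data):
    """Decode the first 8 bytes of an OpenFlow message."""
    version, msg_type, length, xid = OFP_HEADER.unpack_from(data)
    return {"version": version, "type": msg_type, "length": length, "xid": xid}


hello = pack_message(OFPT_HELLO, xid=1)
echo = pack_message(OFPT_ECHO_REQUEST, xid=2, payload=b"ping")
print(hello.hex())
print(unpack_header(echo))
```

In practice a framework such as Ryu handles this encoding for the application; the sketch only shows what the controller and the switches exchange underneath.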
3. Methodology / project development
As stated before, this project has been divided into two phases with two different
methodologies.
In the first phase, the main objective was to study the REST architectural style and to
generate documentation that eases the task of future students who want to learn about
it, taking into account the difficulties found along the way.
3.1. REST study
This first part had three perspectives: theoretical, practical and implementational. In the
first one, the task was to understand every detail of the REST architectural style. The
main source of information was Dr. Fielding's dissertation, but it is a difficult read: it
defines the style in a very abstract way and does not explain how to apply its concepts
to actual technologies (or at least to those available when it was published).
To solve this problem, I started gathering information about REST from other sources,
such as books and online articles. Those books and articles focus on a single REST
concept: resources and their representations. It is the most important concept, since it
is what defines the style's identity, but it is not the only one, and the other REST
properties are not explained in many publications.
Once I had a clear idea of the main concepts required by REST applications, the next
step was to learn how to apply these concepts to the development of an API.
Finally, the last part was to find tools and protocols that could be used to create APIs
that follow the REST constraints.
The following subsections list the constraints defined in Dr. Fielding's dissertation
together with a practical approach to each of them, naming (if any) the technologies that
can be used to apply them.
3.1.1. Client-Server
This constrains requires separation of concerns between the client and the server. An
intermediate block that separates interfaces from data.
The most suitable technology to apply this constraint is URIs. They identify resources, but
those resources do not need to be actual files or directories on the server: it is the server's
task to interpret each URI and determine which resource it points to.
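As an illustration of how URIs decouple identifiers from storage, the following minimal sketch (standard library only, not Django) maps URI patterns to handler functions that compute resources on demand. The patterns and the handlers `switch_list` and `port_stats` are hypothetical examples loosely inspired by the SDN API described later. None of them corresponds to a file on disk.

```python
import re

# Hypothetical handlers: each computes a resource on demand;
# none of them reads a file named after the URI.
def switch_list():
    return {"switches": ["0000000000000001", "0000000000000002"]}

def port_stats(dpid, port):
    return {"dpid": int(dpid), "port_no": int(port), "rx_packets": 0}

# Ordered table of (compiled URI pattern, handler).
ROUTES = [
    (re.compile(r"^/switches/$"), switch_list),
    (re.compile(r"^/statistics/(?P<dpid>\d+)/(?P<port>\d+)/$"), port_stats),
]

def resolve(path):
    """Return the representation for `path`, or None if no resource matches."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            # Named groups become keyword arguments, as in Django's URL resolver.
            return handler(**match.groupdict())
    return None

if __name__ == "__main__":
    print(resolve("/statistics/1/2/"))
    print(resolve("/unknown/"))
```

The identifier, and not the storage layout, decides which code runs. This means the server can later change how a resource is produced without changing its URI.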
3.1.2. Stateless
The REST architectural style is defined on top of stateless communications.
To fulfil this requirement, two conditions must be met: first, the communication between
the client and the server must be stateless; second, the API built on top of it must be
stateless too.
HTTP, the application protocol on which we will build our applications, is stateless by
nature. The second condition has to be kept in mind when designing APIs, since no
technology or tool enforces it: every request must carry all the information needed to
process it.
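The following minimal sketch shows what a stateless handler can look like. It assumes a hypothetical in-memory `FLOWS` data set and `USERS` credential store, and models requests as plain dictionaries. Everything the server needs, namely the credentials and the requested page, travels inside each request, so no per-client session is kept between calls.

```python
import base64

# Hypothetical resource data shared by all clients (not per-client state).
FLOWS = [f"flow-{i}" for i in range(10)]
# Hypothetical credential store.
USERS = {"alice": "secret"}

def authenticate(headers):
    """Check HTTP Basic credentials sent with *this* request."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Basic "):
        return False
    user, _, password = base64.b64decode(auth[6:]).decode().partition(":")
    return USERS.get(user) == password

def get_flows(request):
    """Stateless handler: the response depends only on the request contents."""
    if not authenticate(request["headers"]):
        return 401, None
    # The page travels in the query string instead of a server-side cursor.
    page = int(request["query"].get("page", "0"))
    size = 4
    return 200, FLOWS[page * size:(page + 1) * size]

if __name__ == "__main__":
    token = base64.b64encode(b"alice:secret").decode()
    req = {"headers": {"Authorization": "Basic " + token}, "query": {"page": "1"}}
    print(get_flows(req))
```

Sending the same request twice always yields the same answer, which is what allows any server replica (or cache) to process it.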
3.1.3. Cache
Responses to requests must be labelled as cacheable or non-cacheable.
This constraint raises two questions: how should responses be labelled, and which ones
are cacheable and which ones are not?
The first question is easy to answer: with HTTP headers. A whole RFC is dedicated to
HTTP/1.1 caching (RFC 7234 [16]).
Regarding the second one, as a first approach, resources with a high variation rate should
not be cached, while those with a low variation rate should be.
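The following sketch shows how both answers translate into HTTP headers (RFC 7234). It labels a response as cacheable for a given number of seconds with `Cache-Control: max-age`, or as non-cacheable with `Cache-Control: no-store`. The function name and the example values are assumptions for illustration.

```python
def cache_headers(max_age_seconds):
    """Return HTTP headers labelling a response as cacheable or not.

    max_age_seconds > 0  -> cacheable for that many seconds (low variation rate)
    max_age_seconds <= 0 -> must not be stored by any cache (high variation rate)
    """
    if max_age_seconds > 0:
        return {"Cache-Control": f"max-age={max_age_seconds}"}
    return {"Cache-Control": "no-store"}

if __name__ == "__main__":
    print(cache_headers(300))  # e.g. a rarely changing entry point
    print(cache_headers(0))    # e.g. data that changes on every request
```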
For a more elaborate answer, look for a compromise between the probability that the data
is no longer valid, which depends on the time since it was generated, and the improvement
obtained by caching a given resource. When designing your API, you should also keep in
mind that a client may end up working with outdated data.
3.1.4. Uniform Interface
This is the most iconic and most discussed part of the REST architectural style. It states
that REST APIs have to apply the software engineering principle of generality to the
component interface, meaning that everything in REST (any resource) is accessed
through the same interface. It defines four sub-constraints:
• Identification of resources: every resource must have a unique identifier.
• Resource representation: a resource itself is never transferred to the client, only a
representation of it.
• Self-descriptive messages: messages must contain all the information the recipient needs
to process them.
• Hypermedia as the engine of application state (HATEOAS): the server guides the client
through the application states by sending links in the form of URIs.
The first sub-constraint is easy to fulfil: resources only need unique identifiers, and the URI
built for each resource has to contain its identifier so that the server can tell different
resources apart.
The second one is not as easy to understand, but it is easy to apply. From the client side, it
must be impossible to tell a resource backed by a file from one that returns the result of
executing an algorithm. The server has to define representations for every resource, and
those representations are what it transmits to the client. The representation metadata is
carried in the HTTP headers of the responses, but it always refers to the data carried, not
to the way the data was generated.
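The following standard-library sketch exposes one resource through two representations, selected from the `Accept` request header. The resource (`get_switch`) and the supported media types are hypothetical. A real server would also honour quality values (`q=`) in `Accept`, which this sketch ignores by simply taking media types in header order. The client receives the same kind of representation whether the data came from a file or was computed.

```python
import json
import xml.etree.ElementTree as ET

def get_switch():
    # Hypothetical resource state; it could come from a file, a database,
    # or a computation, and the client cannot tell the difference.
    return {"dpid": "0000000000000001", "ports": 2}

def to_json(data):
    return json.dumps(data)

def to_xml(data):
    root = ET.Element("switch")
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Media type -> serializer for the representations this server offers.
RENDERERS = {"application/json": to_json, "application/xml": to_xml}

def represent(accept_header):
    """Pick a representation from the Accept header (q-values ignored)."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in RENDERERS:
            return 200, media_type, RENDERERS[media_type](get_switch())
        if media_type == "*/*":
            return 200, "application/json", to_json(get_switch())
    return 406, None, None  # 406 Not Acceptable

if __name__ == "__main__":
    print(represent("application/xml"))
    print(represent("text/csv"))
```

The returned media type is what a server would place in the `Content-Type` response header. That header is the metadata describing the representation, not how it was generated.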
To generate self-descriptive messages we rely, once again, on HTTP headers, which add
the metadata the server needs to understand each request. Moreover, since
communication is stateless, the meaning of a message cannot vary because of previous
interactions between the client and the server.
Finally, HATEOAS is the most frequently forgotten requirement on this list. Since no state is
stored, the server has to tell the client where it can go next inside the API. You can look at
your API as if it were a finite state machine: the API's bookmark (its entry point) is the
initial state, and you can guide clients through the application by providing links to the
next possible resources in the application flow, depending on the last resource they
accessed.
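The following minimal sketch illustrates the finite-state-machine view using a small hypothetical API. Each state (resource) is returned together with the links to the next possible states. A client that only knows the bookmark URI can therefore discover the rest by following links. The URIs and the `links` field name are illustrative, not part of any standard.

```python
# Hypothetical application flow: resource URI -> URIs reachable from it.
TRANSITIONS = {
    "/": ["/topology/", "/flows/"],
    "/topology/": ["/topology/switches/", "/topology/links/"],
    "/topology/switches/": ["/"],
    "/topology/links/": ["/"],
    "/flows/": ["/"],
}

def get(uri):
    """Return a representation embedding the links to the next states."""
    return {"self": uri, "links": TRANSITIONS[uri]}

def crawl(entry_point="/"):
    """Discover every state starting only from the bookmark, by following links."""
    seen, pending = [], [entry_point]
    while pending:
        uri = pending.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        pending.extend(get(uri)["links"])
    return seen

if __name__ == "__main__":
    print(crawl())
```

The client never builds URIs itself, so the server can reorganise them as long as the links it sends stay consistent.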
3.1.5. Layered System
A basic configuration with a single central node acting as the server is not scalable.
Instead, the system has to be organized in multiple layers to achieve scalability, where
each component (for example a proxy, a gateway or a cache) only sees the layer it
interacts with.
To illustrate how a layered system is built, the last part of the project (the link with SDN) is
deployed as a multi-layered system.
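The essence of the layered constraint is that each component only sees the layer it talks to. The following minimal sketch assumes that requests and responses are plain dictionaries, and all class names are hypothetical. An intermediary exposes exactly the same `handle(request)` interface as the origin server, so the client code is identical whichever one it is given.

```python
class OriginServer:
    """Innermost layer: produces the actual representation."""
    def handle(self, request):
        return {"status": 200, "headers": {}, "body": f"resource {request['path']}"}

class Intermediary:
    """A layer that forwards to the next one and annotates the response.

    It offers the same interface as OriginServer, so the layer above
    cannot tell whether it is talking to the origin or to an intermediary.
    """
    def __init__(self, name, next_layer):
        self.name = name
        self.next_layer = next_layer

    def handle(self, request):
        response = self.next_layer.handle(request)
        # Record the hop, similar in spirit to the HTTP Via header.
        via = response["headers"].get("Via")
        response["headers"]["Via"] = f"{via}, {self.name}" if via else self.name
        return response

def client(server, path):
    # The client only depends on the handle() interface.
    return server.handle({"path": path})

if __name__ == "__main__":
    direct = client(OriginServer(), "/switches/")
    layered = client(Intermediary("gateway", Intermediary("cache", OriginServer())), "/switches/")
    print(direct["body"] == layered["body"], layered["headers"]["Via"])
```

Because layers are interchangeable, intermediaries can be added or removed to scale the system without modifying the client.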
3.1.6. Code on demand
This is an optional constraint stating that the server can extend the client's functionality
by sending scripts that are executed on the client side.
Since it is optional, and because of the risks of executing code received from a remote
location (a man-in-the-middle attack, for example), I preferred not to detail this constraint. I
only listed some widely used scripting languages and stated that they can be potentially
dangerous.
3.2. Theoretical benefits of using REST
Studying the REST architectural style has shown that systems that apply it gain
portability, scalability, reliability, efficiency, improved user-perceived performance, visibility
of interactions and simplicity of the system architecture.
However, these benefits are hard to quantify: some of them have no measurable
magnitude, and those that can be measured depend highly on the implementation done
by the developer.
3.2.1. Implementation approach
Once the REST constraints were clear, the next step was to study how to apply them to
APIs built with the Django framework, using HTTP as the application protocol.
The implementation challenges posed by the constraints listed above are basically three:
• How to correctly manage the HTTP headers, both those included in requests and those
included in responses. Django defines its own request and response objects, which
encapsulate a complete HTTP message: its content, its headers and other HTTP fields
such as the method and the status code.
• How to correctly address resources. The Django framework offers URL resolvers: URIs
are parsed and, depending on the result, requests are dispatched to different views, which
are callable objects that return an arbitrary response, not linked to the request's URI unless
you want it to be.
• How to simulate a layered system with Django. Django contains a built-in cache
middleware that implements an application-level cache. Given their execution order,
middleware components act like a layered system.
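The following standard-library sketch shows why the execution order makes middleware behave like a layered system. It imitates the shape of Django's middleware: each layer wraps a `get_response` callable, the style introduced in Django 1.10. The Django 1.8 version used in this project relied on `process_request`/`process_response` methods instead. The sketch does not import Django, and the `Layer` class, `build_chain` helper and `LOG` list are hypothetical. Each layer sees the request on the way in and the response on the way out, in reverse order.

```python
LOG = []

class Layer:
    """Middleware-like layer: wraps the next callable in the chain."""
    def __init__(self, name, get_response):
        self.name = name
        self.get_response = get_response

    def __call__(self, request):
        LOG.append(f"{self.name}: request")   # on the way in
        response = self.get_response(request)
        LOG.append(f"{self.name}: response")  # on the way out
        return response

def view(request):
    LOG.append("view")
    return {"status": 200, "body": "ok"}

def build_chain(names, innermost):
    """Wrap `innermost` so that names[0] is the outermost layer,
    mirroring the top-to-bottom order of a middleware list."""
    handler = innermost
    for name in reversed(names):
        handler = Layer(name, handler)
    return handler

if __name__ == "__main__":
    app = build_chain(["auth", "cache"], view)
    app({"path": "/"})
    print(LOG)
```

A cache layer placed near the top of the list can answer before the inner layers and the view run. This is how Django's cache middleware short-circuits the rest of the chain.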
3.3. Documentation generation
Once the REST constraints were clear from the three points of view, I started writing the
documentation, which was written in LaTeX.
LaTeX is a document preparation system that takes plain-text input written in its own
markup language and renders it as high-quality typeset output. It is widely used for writing
publications and scientific documents.
To communicate with my advisor and discuss the documentation efficiently, we used
Subversion repositories.
Subversion is an open source version control system that combines well with LaTeX:
LaTeX source files are plain text, and Subversion tracks changes line by line instead of
updating whole documents. The combination of both technologies allows several people
to make small changes to the same document simultaneously.
The documentation generated about REST covers four topics:
• A chapter that fully develops the constraints explained above, with numerous examples,
written in an easy and intuitive way, together with a simple methodology that can be
followed when designing a REST API.
• A chapter where REST is compared to web services.
• A whole chapter describing how to use the Django framework to build APIs that follow the
REST architectural style.
• Finally, two practice chapters: one in which the student practices API design, and another
dedicated to the use of Django as an implementation tool. Both chapters include the
solved exercises.
3.4. REST SDN application
In this second part of the project, the main idea is to apply the knowledge acquired in the
first part to an SDN environment. More precisely, I developed a REST API using the Ryu
SDN framework as controller, which communicates through the OpenFlow 1.3 protocol
with Open vSwitch switches created with the Mininet virtualization tool, following the
design methodology explained in the REST documentation:
3.4.1. Define functionalities
• The API has to be able to show the network topology.
• The API must show the network performance statistics.
• The API must show the routes installed in the switches and allow adding and deleting
them.
3.4.2. Define resources
• Bookmark
• Topology Bookmark
• Switch List
• Switch's Flow List
• Link List
• Individual Flow
• Flow List
• Port statistics
Table 1: List of resources
3.4.3. Define resource representation
Switch List representation:
[
  {
    "dpid": "0000000000000001",
    "ports": [
      {
        "dpid": "0000000000000001",
        "hw_addr": "16:7a:df:c9:02:e5",
        "name": "s1-eth1",
        "port_no": "00000001",
        "statslink": "/statistics/1/1/"
      },
      {
        "dpid": "0000000000000001",
        "hw_addr": "ba:98:17:d4:27:f4",
        "name": "s1-eth2",
        "port_no": "00000002",
        "statslink": "/statistics/1/2/"
      }
    ]
  }
]
The other representations can be found in annex 3.
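The `statslink` fields in the Switch List representation are hypermedia links derived from the switch and port identifiers. The following minimal sketch shows how such a representation could be assembled from raw port data. It assumes the link format `/statistics/<dpid>/<port>/` seen in the example, with both identifiers converted from zero-padded hexadecimal strings to integers. The helper names and the input structure are hypothetical and are not the project's actual code, which is in annex 2.

```python
import json

def port_representation(dpid, port_no, hw_addr, name):
    """Build one port entry, adding a link to its statistics resource."""
    return {
        "dpid": dpid,
        "hw_addr": hw_addr,
        "name": name,
        "port_no": port_no,
        # dpid and port_no are zero-padded hexadecimal strings.
        "statslink": f"/statistics/{int(dpid, 16)}/{int(port_no, 16)}/",
    }

def switch_list_representation(switches):
    """switches: hypothetical mapping dpid -> list of (port_no, hw_addr, name)."""
    return [
        {"dpid": dpid,
         "ports": [port_representation(dpid, *port) for port in ports]}
        for dpid, ports in switches.items()
    ]

if __name__ == "__main__":
    data = {"0000000000000001": [("00000001", "16:7a:df:c9:02:e5", "s1-eth1"),
                                 ("00000002", "ba:98:17:d4:27:f4", "s1-eth2")]}
    print(json.dumps(switch_list_representation(data), indent=2))
```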
3.4.4. HATEOAS
Figure 2: Finite state machine representation of the REST SDN API
3.4.5. Cache
For this application, I separated the resources into four categories: low variation rate,
medium variation rate, high variation rate and statistics.
Two resources have a low variation rate: the bookmark and the topology bookmark. These
resources have a validity time of 5 minutes.
Three resources have a medium variation rate: switches, links and switch's links. They
have a validity time of 1 minute.
The three resources related to flows (high variation rate) are updated every 30 seconds.
Finally, the statistics data changes exactly every three seconds, which means that the
validity time of this resource is 3 seconds.
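The validity times above can be captured in a single table that the cache layer consults. The following sketch uses the resource names of Table 1 as keys and expresses each validity time as a `Cache-Control: max-age` value in seconds. The mapping and the `cache_control` helper are a hypothetical illustration, not the project's code. Only resources listed in Table 1 are included.

```python
# Validity time (seconds) per resource, following the categories above.
VALIDITY = {
    "Bookmark": 300,            # low variation rate: 5 minutes
    "Topology Bookmark": 300,
    "Switch List": 60,          # medium variation rate: 1 minute
    "Link List": 60,
    "Switch's Flow List": 30,   # flow resources: 30 seconds
    "Individual Flow": 30,
    "Flow List": 30,
    "Port statistics": 3,       # statistics: 3 seconds
}

def cache_control(resource):
    """Return the Cache-Control header value advertised for a resource."""
    return f"max-age={VALIDITY[resource]}"

if __name__ == "__main__":
    for name in VALIDITY:
        print(f"{name}: {cache_control(name)}")
```

Keeping the times in one place makes it easy to tune them later without touching the views.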
3.4.6. Implementing
This whole application works in a virtualized environment with three main virtual
machines: a client, an intermediary server and a Ryu server. The client communicates with
the Django intermediary server, and the intermediary communicates with the Ryu server:
Figure 3: Virtualization topology
This scheme reduces the load on the Ryu server, since the Django server performs many
tasks that would otherwise have to be done by the Ryu server.
The Django server receives each request and checks the authentication credentials and
the content negotiation headers. It also checks whether the request URI corresponds to
one of the resources and whether the data sent by the client matches one of the formats
defined in 3.4.3. Finally, it also acts as a cache: when a request is processed and a 2XX
response is returned to the client, the server stores the response in memory. If a new
request for the same resource arrives while the stored data is still valid, the stored
response is returned; otherwise, the Django server connects to the Ryu server and
retrieves fresh information.
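The caching behaviour described above can be sketched as follows. This is a simplified, standard-library illustration, not the project's Django code. `fetch_from_ryu` stands for the hypothetical call that contacts the Ryu server, and responses are modelled as `(status, body)` tuples. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time

class ResponseCache:
    """Stores 2XX responses per URI and reuses them while still valid."""

    def __init__(self, fetch, validity, clock=time.monotonic):
        self.fetch = fetch          # callable: uri -> (status, body)
        self.validity = validity    # callable: uri -> seconds of validity
        self.clock = clock
        self.entries = {}           # uri -> (expiry_time, response)

    def get(self, uri):
        entry = self.entries.get(uri)
        if entry and self.clock() < entry[0]:
            return entry[1]                      # still valid: serve from memory
        status, body = self.fetch(uri)           # otherwise ask the core server
        if 200 <= status < 300:                  # only successful responses are stored
            self.entries[uri] = (self.clock() + self.validity(uri), (status, body))
        return status, body

if __name__ == "__main__":
    calls = []
    def fetch_from_ryu(uri):                     # hypothetical backend call
        calls.append(uri)
        return 200, f"data for {uri}"

    now = [0.0]
    cache = ResponseCache(fetch_from_ryu, lambda uri: 3, clock=lambda: now[0])
    cache.get("/statistics/1/1/")
    cache.get("/statistics/1/1/")   # served from the cache
    now[0] = 5.0
    cache.get("/statistics/1/1/")   # expired: fetched again
    print(len(calls))
```

Error responses are never stored, so a temporary failure of the Ryu server is not served repeatedly from memory.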
The Ryu virtual machine has two tasks: generating the virtual network to be monitored, and
executing a controller application.
The controller application is split into two parts: one is designed to send and receive the
OpenFlow messages, and the other handles the requests received from the Django server.
The application code can be found in annex 2, and the procedures applied to create the
network with virtualization technologies can be found in annex 1 (section 8.4.3), together
with the LxC configuration files and the software installed on each machine so that the
whole system can work.
A deeper explanation of the SDN API implementation can be found in annex 1, and an
example of use of the API can be found in annex 3.
4. Results
The main result of this project is the generated documentation, which includes an
introduction, the core explanation of the REST architectural style, practices on REST API
development, a comparison between REST and other technologies applied on the web, a
complete review of the Django framework (the parts involved in API implementation),
practices on the Django framework and, finally, a review of the Ryu framework
functionalities needed to develop REST APIs. All of it is accompanied by representative
examples that help the student understand the topic better.
The study of REST has shown that its application provides both measurable and
non-measurable benefits over other technologies (see annex 1). Since REST is an
architectural style that the developer uses as a guide, the performance improvements
obtained by applying it depend on the developer's implementation.
Another result is the application of Django to the development of REST APIs. Django has
qualities that are desirable in a back-end server technology, such as separation of
concerns through the MVC pattern, DAO models and middleware, the capacity to adapt to
changes through URL resolvers, a large community that has developed many pluggable
components, and several sources of information about the framework.
When Django is used as an intermediary server, even if it keeps the qualities detailed
above, it is not an optimal tool. Ideally, an intermediary server would receive requests,
perform some tasks on them and then forward them to the core server, and do the same
with the responses. Since Django cannot forward requests and responses, the
intermediary has to receive each request and then establish a new connection to the core
server. This blocks the client connection until the core server has processed the new
request and generated a new response.
Finally, regarding the application of REST APIs to SDN, it is safe to say that REST APIs
have qualities that fit perfectly in the SDN context: they are scalable, prepared for change,
robust against partial failures and versatile regarding data formats and encodings. Again,
the performance benefits of applying REST APIs instead of other types of technologies will
depend on the developer's implementation.
5. Budget
This project has been carried out using open source tools on a laptop with an estimated
cost of 900€. A Microsoft Office license, needed to write this document, adds 269€ to the
total amount. Because of the speed at which technology evolves nowadays, an
amortization period of 3 years is considered before the hardware and software become
obsolete, with a residual value of 100€ for the computer.
Four months of full-time work by a junior engineer are considered, with a salary of
22000 €/year, i.e. 1833,33 €/month after taxes.
Concept                 Price/month   Price/project
Laptop                  22,20€        88,80€
Microsoft Office 2013   --            269€
Salary                  1833,33€      7333,33€
Total Price                           7691,13€
Table 2: Project Budget
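The figures in Table 2 can be reproduced as follows. The calculation assumes straight-line amortization of the computer: the purchase cost minus the residual value, spread evenly over the 36 months of the amortization period. The exact monthly amortization is 22,22€, which the table rounds to 22,20€.

```latex
\begin{aligned}
\text{Computer} &: \frac{900\,\text{€} - 100\,\text{€}}{3 \cdot 12\ \text{months}} = \frac{800\,\text{€}}{36\ \text{months}} \approx 22{,}22\ \text{€/month}, \quad 4 \cdot 22{,}20\,\text{€} = 88{,}80\,\text{€}\\
\text{Salary} &: \frac{22000\,\text{€}}{12\ \text{months}} \approx 1833{,}33\ \text{€/month}, \quad 22000\,\text{€} \cdot \tfrac{4}{12} \approx 7333{,}33\,\text{€}\\
\text{Total} &: 88{,}80\,\text{€} + 269\,\text{€} + 7333{,}33\,\text{€} = 7691{,}13\,\text{€}
\end{aligned}
```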
6. Conclusions and future development
REST is a widely accepted architectural style that many large companies have adopted
(Twitter, LinkedIn, Facebook, Amazon Product Advertising, etc.). It is widely used in APIs
that have to be consumed massively, due to the freedom it brings.
On the other hand, SOAP is much more widely used in enterprise environments. SOAP
covers some aspects that REST does not, such as web services policies, which, while
important for inter-enterprise services, are not required for other kinds of applications.
Before building an API, developers have to decide which technology better suits their
needs.
The SDN REST API is still in an initial phase. This document has a demonstrative purpose,
and there are still many details to describe and implement.
For instance, the API developed runs on top of a dummy application that is only used to
receive statistics messages and to send flow modifications and statistics requests. Ideally,
even if an SDN REST API contains a monitoring part, it should also be used to control the
controller's behaviour.
Bibliography:
[1] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software
Architectures. PhD thesis, University of California, Irvine, 2000.
[2] Paul Sobocinski. Hypermedia apis: The benefits of hateoas, 2014. [Online;
http://www.programmableweb.com/news/hypermedia-apis-benefits-hateoas/howto/2014/02/27].
[3] Roy T. Fielding. Rest apis must be hypertext-driven, 2008. [Online;
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven].
[4] Leonard Richardson and Sam Ruby. RESTful Web Services. O’Reilly Media, 2007.
[5] Joshua Thijssen. The restful cookbook. [Online; http://restcookbook.com/].
[6] Jim Webber, Savas Parastatidis, and Ian Robinson. How to get a cup of coffee, 2008.
[Online; http://www.infoq.com/articles/webber-rest-workflow].
[7] Draft - make readable uris, 2004. [Online; http://www.w3.org/QA/2004/08/readableuri].
[8] Mike Amundsen. Roy fielding on versioning, hypermedia, and rest, 2014. [Online;
http://www.infoq.com/articles/roy-fielding-on-versioning].
[9] T. Berners-Lee, L. Masinter, and M. McCahill. Uniform Resource Locators (URL). RFC
1738 (Proposed Standard), December 1994. Obsoleted by RFCs 4248, 4266, updated by
RFCs 1808, 2368, 2396, 3986, 6196, 6270.
[10] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616 (Draft Standard), June 1999.
Obsoleted by RFCs 7230, 7231, 7232, 7233, 7234, 7235, updated by RFCs 2817, 5785,
6266, 6585.
[11] A. Barth. HTTP State Management Mechanism. RFC 6265 (Proposed Standard),
April 2011.
[12] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Message
Syntax and Routing. RFC 7230 (Proposed Standard), June 2014.
[13] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Semantics and
Content. RFC 7231 (Proposed Standard), June 2014.
[14] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Conditional
Requests. RFC 7232 (Proposed Standard), June 2014.
[15] R. Fielding, Y. Lafon, and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1):
Range Requests. RFC 7233 (Proposed Standard), June 2014.
[16] R. Fielding, M. Nottingham, and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1):
Caching. RFC 7234 (Proposed Standard), June 2014.
[17] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Authentication.
RFC 7235 (Proposed Standard), June 2014.
[18] Introducing json. [Online; http://www.json.org].
[19] Use http basic authentification to login into django, 2014. [Online;
http://ponytech.net/blog/use-http-basic-authentification-login].
[20] Adrian Holovaty and Jacob Kaplan-Moss. The Definitive Guide to Django: Web
Development Done Right. Apress, 2006.
[21] Django documentation. [Online; https://docs.djangoproject.com/en/1.8/].
[22] Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana.
Web services description language (wsdl) 1.1, 2001. [Online; http://www.w3.org/TR/wsdl].
[23] Nilo Mitra and Yves Lafon. Soap version 1.2 part 0: Primer (second edition), 2007.
[Online; www.w3.org/TR/soap12-part0/].
[24] Martin Gudgin, Marc Hadley, Noah Mendelsohn, Jean-Jacques Moreau, Henrik
Frystyk Nielsen, Anish Karmarkar, and Yves Lafon. Soap version 1.2 part 1: Messaging
framework (second edition), 2007. [Online; www.w3.org/TR/soap12-part1/].
[25] Hugo Haas and Allen Brown. Web services glossary, 2004. [Online;
http://www.w3.org/TR/ws-gloss/].
[26] Don Box. A brief history of soap, April 2001. [Online;
http://www.xml.com/pub/a/ws/2001/04/04/soap.html].
[27] Some thoughts for the enterprise embracing web apis, 2012. [Online;
http://apievangelist.com/2012/12/09/some-thoughts-for-the-enterprise-embracing-webapis/].
[28] Douglas C. Schmidt. Overview of remote procedure calls (rpc). [Online;
http://www.cs.wustl.edu/~schmidt/PDF/rpc4.pdf].
[29] From edi to xml and uddi: A brief history of web services, 2001. [Online;
http://www.informationweek.com/from-edi-to-xml-and-uddi-a-brief-history-of-webservices/d/d-id/1012008].
[30] R. Srinivasan. RPC: Remote Procedure Call Protocol Specification Version 2. RFC
1831 (Proposed Standard), August 1995. Obsoleted by RFC 5531.
[31] Topology discovery with ryu, 2014. [Online; http://sdn-lab.com/2014/12/31/topologydiscovery-with-ryu/].
[32] Setting up openvswitch 2.0 + mininet 2.1+ ubuntu 13.04, 2013. [Online; http://sdnlab.com/2013/11/14/setting-up-openvswitch-2-0-mininet-2-1/].
[33] Robert Daigneau. Service design patterns : fundamental design solutions for
SOAP/WSDL and restful Web services. Addison-Wesley, 2012.
[34] Ryu development team. Ryubook 1.0, 2014. [Online; http://osrg.github.io/ryubook/en/html/index.html].
[35] Hao He. What is service-oriented architecture, september 2003. [Online;
http://www.xml.com/lpt/a/1292].
[36] Dave Marshall. Remote procedure calls (rpc), March 1999. [Online;
http://www.cs.cf.ac.uk/Dave/C/node33.html].
Glossary
API: Application Programming Interface
CoD: Code on Demand
CORBA: Common Object Request Broker Architecture
DAO: Data Access Object
DCOM: Distributed Component Object Model
HATEOAS: Hypermedia As The Engine Of Application State
HTTP: Hypertext Transfer Protocol
JSON: JavaScript Object Notation
LxC: Linux Containers
MVC: Model View Controller
OF: OpenFlow
OVS: Open vSwitch
REST: Representational State Transfer
RMI: Remote Method Invocation
RPC: Remote Procedure Call
SDN: Software Defined Networks
SOA: Service Oriented Architecture
SOAP: Simple Object Access Protocol
SVN: Subversion
UI: User Interface
URI: Uniform Resource Identifier
URL: Uniform Resource Locator
W3C: World Wide Web Consortium
XML: eXtensible Markup Language
Appendices:
7. RESTful APIs book
8. SDN REST API
9. API Demonstration
REST API Book
Contents
1 Introduction to APIs
1.1 Introduction
1.2 Distributed APIs
1.3 Web Services
1.3.1 Service Oriented Architecture
1.4 REST
1.5 When to use web technologies?
1.6 What will you find in this book?
2 Web Technologies
2.1 History
2.2 HTML Documents
2.3 HTTP Motivation
2.4 URL/URI
2.5 HTTP 1.0
2.5.1 HTTP Requests
2.5.2 Headers
2.5.3 HTTP Responses
2.6 Cookies
2.7 HTTP Proxies
2.8 Dynamic Web
2.8.1 Introduction
2.8.2 CGIs
2.8.3 HTML Forms
2.9 HTTP 1.1
2.9.1 Introduction
2.9.2 Headers
2.9.3 Chunked Data
2.9.4 Persistent Connections
2.9.5 Continue
2.9.6 Caching
2.9.7 HTTP 1.1 Methods
2.9.8 HTTP 1.1 Status codes
2.9.9 HTTP 1.1 Representation Headers
2.9.10 HTTP 1.1 Content-negotiation headers
2.9.11 HTTP 1.1 Cache headers
2.9.12 HTTP 1.1 Conditional headers
2.9.13 HTTP 1.1 Authentication headers
2.10 Practical HTTP with apache
2.10.1 Introduction
2.10.2 Virtual Hosts (sites)
2.10.3 CGIs
2.10.4 Modules
2.11 Commands summary
2.12 XML
2.12.1 Introduction
2.12.2 XML Comments
2.12.3 Escaping
2.12.4 Well-formed XML
2.12.5 Valid XML
2.13 JSON
3 Restful Architectural Style
3.1 REST motivation
3.2 REST Constraints
3.2.1 Client-server
3.2.2 Stateless
3.2.3 Cache
3.2.4 Uniform interface
3.2.5 Layered System
3.2.6 Code on demand
3.3 How to design your APIs
3.3.1 Define functionalities
3.3.2 Define your resources
3.3.3 Define resource representation
3.3.4 HATEOAS
3.3.5 Cache
3.3.6 Implement your API
4 REST Practices
4.1 Interface exercises
4.2 Cache exercises
5 Other API architectures
5.1 RPC APIs
5.2 Message based APIs
5.3 Comparison
6 Django Development
6.1 Introduction to DJANGO
6.2 Starting a new project
6.3 Project structure
6.4 Models
6.4.1 Model relationships
6.4.2 Managers and QuerySets
6.5 Views
6.5.1 Function views
6.5.2 Class-Based views
6.6 URI patterns
6.7 Formatting the output
6.8 Middleware
6.9 Deploy the project
6.10 Cache in Django
6.11 Example 1: File distribution API
6.11.1 First iteration
6.11.2 Second Iteration
7 Django Practices
8 REST Applied to an SDN Application
8.1 Introduction
8.2 Ryu Introduction
8.3 Ryu Features
8.3.1 Message Reply Handlers
8.3.2 OpenFlow protocol messages
8.3.3 HTTP Request Handlers
8.3.4 Link REST Controllers with Ryu applications
8.4 Monitoring Application
8.4.1 RYU implementation
8.4.2 Django Implementation
8.4.3 Topology configuration
Chapter 1
Introduction to APIs
1.1 Introduction
In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for
building software applications. An API expresses a software component in terms of its operations, inputs, outputs,
and underlying types. An API defines functionalities that are independent of their respective implementations, which
allows definitions and implementations to vary without compromising each other. A good API makes it easier to
develop a program by providing all the building blocks. A programmer then puts the blocks together. 1
Basically, an API is a black box that accepts some specified inputs, performs certain operations and returns some outputs (if any). APIs can be found in many places, for instance:
• Linux kernel interfaces. They allow user-space programs to access the system resources and services of the Linux kernel² through system calls (syscalls).
• 3D computer graphics APIs such as DirectX and OpenGL.
• Distributed APIs.
This document will focus on distributed APIs.
1.2 Distributed APIs
Distributed systems are defined by everyone in their own words, so in this document, when we talk about a distributed system, we refer to the client-server architecture (Figure 1.1). The purpose of a distributed API is to allow a client to execute a given call on the server.
As a first approach, many platforms developed their own technologies, such as CORBA, Java Remote Method Invocation (RMI), DCOM or .NET Remoting. Several difficulties appeared; the most important one was the incompatibility between technologies: the client and the server had to use the same technology. Furthermore, some vendor implementations and toolkits had trouble communicating with each other due to the lack of standardization.
To overcome these incompatibilities, developers started to use web services. The main advantage of web services is the level of standardization they achieve.
1.3 Web Services
Web services are applications built on a set of technologies designed to operate over the Web. Web services involve many protocols and technologies, such as XML, RPC, SOAP, WSDL, WS-Security, etc., to achieve a high level of standardization. Most web services use HTTP as their transport protocol.
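To make the idea of "executing a call on the server from the client" concrete, here is a minimal sketch using Python's standard xmlrpc modules. XML-RPC is a simple remote procedure call protocol that serializes calls as XML and carries them over HTTP; it is used here only as an illustration, and the operation name add and the local address are assumptions. Note that the client invokes an operation (a service), which is the style that service-oriented web services follow.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Operation exposed by the (hypothetical) service."""
    return a + b


# Server side: bind to a free local port and register the operation.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy serializes the call as XML and sends it with HTTP POST.
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    result = proxy.add(2, 3)
print(result)  # 5

server.shutdown()
server.server_close()
```

The client only needs the URL and the name of the operation; the server could be written in any language with an XML-RPC implementation.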
1 http://en.wikipedia.org/wiki/Application_programming_interface
2 http://www.linux.it/~rubini/docs/ksys/ksys.html
[Figure: several clients send requests to a server / service provider and receive responses]
Figure 1.1: Client - Server architecture.
"A Web service is a software system designed to support interoperable machine-to-machine interaction over a
network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact
with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP
with an XML serialization in conjunction with other Web-related standards. " []
The two main organizations behind web services standardization are OASIS and the W3C.
Typically, web services have followed the Service Oriented Architecture (SOA).
1.3.1 Service Oriented Architecture
In SOA, the server defines services, which can be seen as functionalities. The basic idea is to break large applications down into small, simple units: the services. This way, when you need a new functionality, you can reuse existing small services (even from outside your domain) and combine them to achieve the desired functionality. Services should therefore be highly reusable and multi-purpose.
In SOA, the interface is a fundamental part: if the interface between two applications does not work, the system does not work. That is why standard interfaces are so important in web services.
In SOA, it is important to separate the functionality offered from its implementation. "This is like going to a restaurant: you tell your waiter what you would like to order and your preferences but you don’t tell their cook how to cook your dish step by step." [35]
Note that SOA is not restricted to web services.
1.4 REST
In 2000, Dr. Roy Fielding published his PhD thesis, Architectural Styles and the Design of Network-based Software Architectures [1]. In it, he took a completely different approach to web architecture and laid the foundations of a new architectural style: Representational State Transfer (REST). REST has some similarities with SOA, but it differs in a core aspect: its principal elements are resources instead of services.
Although REST does not mandate any underlying protocol, it is mostly used on top of HTTP, which fits the architectural constraints of REST very well.
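The difference between calling operations and manipulating resources can be sketched in a few lines of Python. In this illustration (an assumption for teaching purposes, not a prescribed design), every resource is named by a URI such as the hypothetical /notes/1, and a client uses the same small set of HTTP methods on any resource instead of calling operation-specific functions such as getNote or deleteNote. The function handle and the in-memory store resources are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: URI -> current state of the resource.
resources: Dict[str, dict] = {"/notes/1": {"text": "Buy milk"}}


def handle(method: str, uri: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Apply a uniform HTTP method to the resource identified by `uri`.

    Returns an (HTTP status code, JSON representation) pair.
    """
    if method == "GET":
        if uri not in resources:
            return 404, ""
        return 200, json.dumps(resources[uri])
    if method == "PUT":
        # PUT replaces the state of the resource, creating it if necessary.
        created = uri not in resources
        resources[uri] = json.loads(body or "{}")
        return (201 if created else 200), ""
    if method == "DELETE":
        if resources.pop(uri, None) is None:
            return 404, ""
        return 204, ""
    # Other methods (e.g. POST) are left out of this sketch.
    return 405, ""


print(handle("PUT", "/notes/2", '{"text": "Call Bob"}'))  # (201, '')
print(handle("GET", "/notes/2"))  # (200, '{"text": "Call Bob"}')
print(handle("DELETE", "/notes/2"))  # (204, '')
print(handle("GET", "/notes/2"))  # (404, '')
```

Because the interface is uniform, a generic client, cache or proxy can tell what a request does (for instance, that GET does not modify anything) without knowing the application.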
1.5 When to use web technologies?
As we said before, web technologies solve one of the biggest problems of distributed APIs: if the clients that access the server are diverse, the web is the easiest way to implement an API (by web technologies and the web we mean any application or protocol built on top of HTTP).
On the other hand, web technologies have some drawbacks:
• In every request, the client must serialize the data into a stream of bytes and transmit it, and the server must perform the reverse operation: when it receives a stream of bytes, it must deserialize it into an understandable data format and structure. The same happens for each response (if the response contains more than a status code). These processes are expensive.
• HTTP adds headers to every message, and their size may be significant if an application requires high throughput or low response time.
• HTTP is by definition ’stateless’, which means that if we need to develop a ’stateful’ application we will have to build a state mechanism.
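The first two drawbacks can be measured for a single small message. The sketch below serializes a payload, wraps it in a minimal HTTP/1.1 request, compares header bytes with payload bytes, and then performs the reverse step that the server would do. The endpoint /sensors/1/readings, the host api.example.com and the payload are hypothetical.

```python
import json

payload = {"temperature": 21.5}

# Client: serialize the data structure into a stream of bytes.
body = json.dumps(payload).encode("utf-8")

# Client: wrap it in a minimal HTTP/1.1 request (CR+LF line endings).
head = (
    "POST /sensors/1/readings HTTP/1.1\r\n"
    "Host: api.example.com\r\n"
    "Content-Type: application/json\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii")
message = head + body

# Server: split the headers from the body and deserialize the body.
received_head, _, received_body = message.partition(b"\r\n\r\n")
decoded = json.loads(received_body.decode("utf-8"))

print(f"payload bytes: {len(body)}, header bytes: {len(head)}")
print(decoded == payload)  # True
```

Even this minimal request carries several times more header bytes than payload bytes; real clients usually add further headers such as User-Agent or Accept.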
1.6 What will you find in this book?
In this book, you’ll find a very practical introduction to the design of APIs, more specifically APIs that follow the REST architectural style.
First of all, in chapter 2 you’ll find a review of the web technologies that we’ll use in the process of creating APIs (HTTP, XML and JSON).
Then, in chapter 3 you’ll find the basic principles that define the REST architecture, a practical approach to developing REST APIs and examples to fully understand the nature of REST.
After that, in chapter 5 we’ll review other architectures and popular protocols that can be used to develop APIs.
Once the REST concept is clear, in chapter 6 you’ll learn how to use the popular Web framework ’Django’, a very powerful tool that can be used to implement APIs.
Finally, in chapter 8 you’ll see how to implement a REST API, applying all the concepts learned in the previous chapters to an SDN project.
To consolidate the knowledge acquired in chapters 3 and 6, you’ll find some proposed exercises in chapters 4 and 7.
Chapter 2
Web Technologies
2.1 History
Tim Berners-Lee is credited with having created the initial World Wide Web (WWW) during 1985-1991, while he was a researcher at the European high-energy particle physics laboratory CERN (Conseil Européen pour la Recherche Nucléaire). In this context, a multi-platform tool was needed to enable document sharing between physicists and other researchers in the high-energy physics community. Tim Berners-Lee wrote a proposal to enable such collaboration. Four basic technologies were part of his proposal:
• HTML (HyperText Markup Language): a language to write documents.
• HTTP (HyperText Transfer Protocol): a protocol to transmit resources (like HTML documents).
• A WEB server: software that serves resources such as HTML documents.
• A WEB browser: software that acts as a client, sending requests and processing responses for resources available on a WEB server (such as HTML documents).
2.2 HTML Documents
HTML (HyperText Markup Language), as its name states, is not a programming language like C or Java but a markup language. In plain English, this means that HTML is a language for describing how content (text, images, etc.) should be displayed. With HTML, we can create HTML documents to be displayed in a browser. HTML documents are just text files, so you can edit them with any text editor; there are also “HTML editors” specially designed for writing HTML. Analyzing HTML documents is a good way of learning HTML. Let’s take a look at a simple HTML document (see Code 2.1).
<html>
  <head>
    <title>Hello World</title>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body>
    Hello <b>World</b>!!!!!!!
  </body>
</html>
Code 2.1: Simple HTML document
As you can observe, the HTML document is just text. However, some of the text is markup, which has a special meaning in HTML. Text enclosed between the characters “<” and “>” is markup, and these pieces of markup are called “HTML tags”. HTML tags tell the browser to do something special. In our example, “<b>World</b>” tells the browser to use a boldface font. Some HTML elements have an opening tag and a closing tag, written as <tag> ... </tag>, as in the case of the boldface tag. Other elements, however, consist of a single tag. The HTML document is delimited by <html> and </html>. In addition, the HTML document is divided into two parts:
• <head>. This part is optional. When <head> exists, it can contain several tags such as <title>, <meta>, etc. For example, the <title> tag specifies the title that the browser displays in its window. With the <meta> tag we can define the character set (charset):
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
• <body>. The body contains the content of the document: all text, images, etc. are placed between <body> and </body>.
We can also use tags to create hyperlinks to other resources (such as other HTML documents). This is a fundamental feature of HTML. The hyperlink tag is <a> ... </a>. For an example, look at Code 2.2:
<html>
  <head>
    <title>Hello World</title>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
  </head>
  <body>
    Hello <b>World</b>!!!!!!!
    Go to <a href="docs/otherdoc.html">another document</a>
  </body>
</html>
Code 2.2: Simple HTML document with a hyperlink.
In the previous example, we link our HTML document to another HTML document located in a directory called “docs”, using a relative path. Relative paths are interpreted taking the location of the current HTML document as reference. Notice that the hyperlink target is specified in the “href” attribute inside the opening tag. The <img> tag is used to display an image; its src attribute provides the path to the image. Example:
<img src="pictures/image.gif">
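Because tags and attributes follow a well-defined syntax, a program can find the hyperlinks and images of a document automatically. The following minimal sketch uses Python's standard html.parser module; the class name LinkCollector is our own, and the input is a document similar to Code 2.2 with the image above added.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of <a> tags and the src of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive in lower case; attrs is a list of (name, value) pairs.
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(("a", attrs["href"]))
        elif tag == "img" and "src" in attrs:
            self.links.append(("img", attrs["src"]))


doc = """<html><body>Hello <b>World</b>!!!!!!!
Go to <a href="docs/otherdoc.html">another document</a>
<img src="pictures/image.gif"></body></html>"""

collector = LinkCollector()
collector.feed(doc)
collector.close()
print(collector.links)  # [('a', 'docs/otherdoc.html'), ('img', 'pictures/image.gif')]
```

A browser does essentially this before requesting each linked image from the server.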
Blank spaces and new lines are called white space. You can add as much white space as you like to make your HTML file easier to read, but browsers display consecutive white space as a single space. To create a paragraph, you have to use the tags <p> ... </p>. For paragraphs, the browser adjusts the text lines to the window width. If you really want to force a line break, you have to use the <br> tag. HTML has many tags, but a few of them are enough to get an idea of how HTML works. Some more useful tags are:
• <i> </i> Sets text in italics.
• <tt> </tt> Sets text in teletype.
• <h1> </h1> Sets text as a level-1 heading. Lower levels of importance (smaller sizes) are available in descending order: <h2> </h2> . . . <h6> </h6>
• <hr> Prints a horizontal line.
• <center> </center> Centers text and images.
• <blockquote> </blockquote> Indents text.
• <pre> </pre> Pre-formatted text, i.e. spaces and line breaks between these tags are maintained.
• <!-- text comments... --> Comments in the HTML file.
2.3 HTTP Motivation
Initially, HTTP (Hypertext Transfer Protocol) arose from the need to create hyperlinks in HTML documents to resources that are not on the same host. HTTP is a text protocol based on a client/server model that can be used over a TCP/IP network to deliver virtually any resource of the World Wide Web (WWW). For now, we will consider that a resource is just an HTML document. An HTTP server or WEB server is a network daemon that uses by default the well-known TCP port 80. HTTP clients, generically called WEB browsers (e.g. Firefox or Lynx), send HTTP requests to HTTP servers asking for a resource, and the server responds with the requested resource (see Figure 2.1).
[Figure: a browser opens a TCP connection to port 80 of an HTTP server, sends “GET doc.html” using the HTTP protocol and receives doc.html]
Figure 2.1: HTTP client/server.
2.4 URL/URI
The first issue to implement HTTP is to define how to identify resources. The identifiers used in HTTP were initially defined by Tim Berners-Lee in 1991. They were called URLs (Uniform Resource Locators) and they were first used to allow authors of HTML documents to establish hyperlinks in the WWW. A URL is just a text string with a standard format that allows you to name a resource based on its location on the WWW. In 1994, the URL concept was incorporated into a more general concept called URI (Uniform Resource Identifier). URI is the standard name for resource identifiers on the Internet, but the term URL is still widely used. The simplest URL/URI format is as follows:
protocol://hostname/directory/resource
However, other information can also be present in the URL:
protocol://username:password@hostname:port/directory/resource
The detailed specification for URL/URIs is in RFC 1738[9]. Some examples are:
• http://www.example.com/pictures/upc.jpg
• http://www.example.com
• http://192.168.0.5
• http://www.example.com:8080/cgi-bin/time.sh
• http://user:password@www.example.com/
• ftp://debian.org
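The components of such URLs can be extracted programmatically. The sketch below uses Python's urllib.parse.urlsplit; the function describe and the DEFAULT_PORTS table are our own assumptions, and the table only covers the two schemes used above (80 for HTTP, as explained later, and 21 for FTP).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "ftp": 21}


def describe(url: str) -> dict:
    """Split a URL into the parts of protocol://username:password@hostname:port/path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        # Fall back to the scheme's well-known port when none is given.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # An empty path means the root of the server.
        "path": parts.path or "/",
    }


print(describe("http://www.example.com:8080/cgi-bin/time.sh"))
print(describe("ftp://debian.org"))
```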
If the URL does not specify any resource (filename), the server usually assumes that the client is asking for a file called index.html or index.htm. As its name suggests, this file contains the Web site index.
We can use absolute or relative paths in HTTP hyperlinks. In an HTTP server, absolute paths are interpreted with respect to a directory called DocumentRoot. This parameter is defined in the configuration file of the HTTP server. For example, a typical DocumentRoot when using Linux is /var/www. In this case, the URL http://www.example.com/images/upc1.gif refers to a file called upc1.gif that is stored in the HTTP server in the directory /var/www/images. The following HTML file serves as an example of how to use absolute and relative paths:
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <p>Hello <b>World</b>!!!!!!!</p>
    <p>Go to <a href="docs/otherdoc.html">another document</a></p>
    <p>You can visit the UPC home page at <a href="http://www.upc.edu">UPC home</a>.</p>
    <img src="/images/upc1.gif">
    <img src="/images/upc2.gif">
    <img src="http://www.example.com/images/upc1.gif">
  </body>
</html>
Code 2.3: Simple HTML Document with an Absolute Path and External Hyperlinks.
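How a browser resolves the links of Code 2.3 and how a server could map them to files can be sketched as follows. Assumptions: the document is served as http://www.example.com/index.html and the server uses DocumentRoot /var/www. urljoin comes from Python's standard library, while to_filesystem is a hypothetical helper, not the algorithm of any particular server.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

BASE_URL = "http://www.example.com/index.html"  # assumed URL of Code 2.3
DOCUMENT_ROOT = "/var/www"                      # typical Linux DocumentRoot


def resolve(link: str) -> str:
    """Browser side: turn a relative or absolute link into a full URL."""
    return urljoin(BASE_URL, link)


def to_filesystem(url: str) -> str:
    """Server side: map the URL path onto a file below DOCUMENT_ROOT.

    normpath collapses '..' segments, so the result cannot escape the root.
    (A real server would also check that the URL's host is one it serves.)
    """
    path = posixpath.normpath("/" + urlsplit(url).path.lstrip("/"))
    return DOCUMENT_ROOT + path


for link in ["docs/otherdoc.html", "/images/upc1.gif", "http://www.upc.edu"]:
    print(link, "->", resolve(link))

print(to_filesystem(resolve("/images/upc1.gif")))  # /var/www/images/upc1.gif
```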
2.5 HTTP 1.0
HTTP is a text protocol that, like many other TCP/IP applications, uses the client-server model; its default port is 80. Other TCP ports can be used, but then the client must know the port and include it in the URL. The HTTP client opens a TCP connection and sends an HTTP request to an HTTP server. If everything is correct, the server returns an HTTP response that contains the requested resource. After delivering the response, the HTTP server closes the TCP connection. HTTP is a stateless protocol, which means that HTTP does not maintain state information between different requests.
2.5.1 HTTP Requests
In an HTTP request, the first line is the only mandatory one; it contains the “request method”, the path to the resource and the HTTP version. It is followed by a blank line (CR+LF). The minimal request in HTTP 1.0 is something like the following:
GET / HTTP/1.0
[blank line]
GET is the most commonly used request method and it means “give me this resource”. After the GET keyword we find a “/”, the root path, which means that we are requesting the index file of the WEB server. Finally, the request ends with an empty line (CR+LF), shown as [blank line]. Another example is:
GET /images/upc1.gif HTTP/1.0
[blank line]
In this case, the client is requesting a file called upc1.gif that is stored in the HTTP server in the directory images
(relative to the server’s DocumentRoot).
2.5.2 Headers
Requests (and also responses) can have header lines. Headers are text lines that provide additional information or functionality in requests/responses. The format is ”Header-Name: value1, value2”, ending with CR+LF. The header name is not case-sensitive. There can be any number of spaces or tabs between the colon and the value. Header lines starting with a space or a tab are actually part of the previous header line (used for readability); this line folding was later deprecated by RFC 7230 [12]. The following headers are equivalent:
Header1: some-long-value-1a, some-long-value-1b

Header1:    some-long-value-1a,
            some-long-value-1b
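The rules above (case-insensitive names, optional white space after the colon, continuation lines) can be captured in a short parser. This is a simplified sketch, not a complete implementation of the specification; the function name parse_headers is our own, and joining repeated headers with a comma is an extra assumption that is only valid for headers whose value is a comma-separated list.

```python
from typing import Dict


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse header lines (CR+LF separated) up to the first empty line."""
    headers: Dict[str, str] = {}
    last = None
    for line in raw.split("\r\n"):
        if not line:
            break  # the empty line ends the header section
        if line[0] in " \t" and last is not None:
            # Continuation (folded) line: append it to the previous header's value.
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()  # header names are case-insensitive
        value = value.strip()        # ignore spaces/tabs around the value
        if last in headers:
            headers[last] += ", " + value  # repeated header -> comma-separated list
        else:
            headers[last] = value
    return headers


single = "Header1: some-long-value-1a, some-long-value-1b\r\n\r\n"
folded = "Header1:    some-long-value-1a,\r\n            some-long-value-1b\r\n\r\n"
print(parse_headers(single) == parse_headers(folded))  # True
```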
HTTP 1.0 defines 16 headers, though none is required. Typical headers included in the requests are:
• From: gives the email address of the user who makes the request.
• User-Agent: name of the browser and OS.
For example, a request with headers could be the following:
GET /path/file.html HTTP/1.0
From: user@example.net
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0
[blank line]
Headers can help to debug problems in web sites, but they also reveal information about the user. Thus, notice that there is a trade-off between the information provided for debugging and user privacy.
2.5.3 HTTP Responses
HTTP responses are also composed of text lines. The first text line of an HTTP response is the status. Typical status
lines are:
HTTP/1.0 200 OK
HTTP/1.0 404 Not Found
The first digit identifies the general category of the status:
• 1xx indicates an informational message only.
• 2xx indicates success of some kind.
• 3xx redirects the client to another URL.
• 4xx indicates an error on the client side.
• 5xx indicates an error on the server side.
Examples:
• 301 Moved Permanently.
• 302 Moved Temporarily.
• 303 See Other (HTTP 1.1 only). The response to the request can be found at another URL, given by the Location header of the response, which the client should retrieve with a GET request.
• 500 Internal Server Error.
A response can also have headers. The headers usually included by servers in responses are:
• Server: analogous to the User-Agent header (it identifies the server software).
• Date: date and time at which the response was generated.
• Last-Modified: date of the last modification of the resource being returned. This header is used for caching (explained later).
After the headers comes a blank line (CR+LF) and then, if the resource was available on the server, the response body containing the requested resource. In general, if an HTTP message includes a body, there are at least two additional header lines describing the body’s content: “Content-Type” and “Content-Length”:
• Content-Type: MIME-type of the object.
• Content-Length: number of bytes of the object.
For example, to retrieve the file http://www.example.com/path/file.html using HTTP 1.0, the first step is to open a
TCP connection with the server www.example.com using the HTTP default TCP port 80. Then, through this connection the client could send an HTTP 1.0 request like the following:
GET /path/file.html HTTP/1.0
From: user@example.net
[blank line]
Through the same socket (connection), the server could respond with something like the following:
HTTP/1.0 200 OK
Date: Mon, 21 Oct 2013 22:29:59 GMT
Content-Type: text/html
Content-Length: 50
[blank line]
<html>
<body>
<h1>It works!</h1>
</body>
</html>
After receiving the response, in the basic implementation of HTTP 1.0, the client closes the TCP socket.
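The whole HTTP 1.0 exchange can be reproduced with a plain TCP socket. In the sketch below, Python's http.server module plays the role of www.example.com on a local, randomly chosen port (an assumption so that the example runs without network access); the handler name DemoHandler and the page content are our own. The client writes the request of the previous example and reads until the connection is closed, which is how an HTTP 1.0 client can tell that the response is complete.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    # protocol_version defaults to "HTTP/1.0": one request per connection.
    def do_GET(self):
        body = b"<html><body><h1>It works!</h1></body></html>"
        self.send_response(200)  # also adds the Server and Date headers
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

request = b"GET /path/file.html HTTP/1.0\r\nFrom: user@example.net\r\n\r\n"

with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(request)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: the server has closed the connection
            break
        chunks.append(data)

server.shutdown()
server.server_close()

response = b"".join(chunks)
head, _, body = response.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
version, code, reason = status_line.split(" ", 2)
print(status_line)  # HTTP/1.0 200 OK
print(header_lines)
print(body.decode())
```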
2.6 Cookies
As previously mentioned, HTTP is a stateless protocol, which means that HTTP does not maintain state information between different requests. A cookie is a small piece of text sent by an HTTP server and stored by the browser on the client side. Cookies are sometimes also called footprints. The browser returns cookies unchanged to the server. Cookies add state (memory of previous events) to otherwise stateless HTTP transactions. Without cookies, each retrieval of a web page or of a component of a web page is an isolated event, mostly unrelated to all other views of the pages of the same site. The most common uses of cookies are:
• User control. For example, after a user enters his username and password, the server can store a session identifier in a cookie, so there is no need to enter them again in a later visit to the web server.
• Getting information about users’ browsing habits.
The HTTP server sends Set-Cookie header lines when it wants the browser to store cookies. Set-Cookie is a directive for the browser to store the cookie and send it back to the server in future requests (subject to the expiration time and other cookie attributes).
For example, the browser requests the resource http://www.example.org/doc.html (see Figure 2.2).
[Figure: the browser sends “GET /doc.html HTTP/1.0”; the HTTP server answers “HTTP/1.0 200 OK” with “Content-type: text/html” and “Set-Cookie: EXSID=ABCKKO…em_vYg; Expires=Wed, 27 Feb 2019 10:10:10 GMT”, followed by the content of the page; in the next “GET /doc.html HTTP/1.0” the browser includes “Cookie: EXSID=ABCKKO…em_vYg”]
Figure 2.2: How HTTP cookies work.
The client sends a regular request, and the server asks the client to store the cookie. Then, the client sends the cookie in subsequent requests. It is worth mentioning that cookies have more attributes (like Path and Domain) that help the browser decide whether to send them or not. Finally, as you may imagine, cookies can cause privacy problems.
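The exchange of Figure 2.2 can be mimicked with Python's standard http.cookies module. In this sketch the value abc123 is a hypothetical session identifier. SimpleCookie only builds and parses the header text; it does not implement a browser's storage, expiration or domain rules. Notice that only name=value pairs travel back in the Cookie header, never the attributes.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie header.
issued = SimpleCookie()
issued["EXSID"] = "abc123"  # hypothetical session identifier
issued["EXSID"]["expires"] = "Wed, 27 Feb 2019 10:10:10 GMT"
issued["EXSID"]["path"] = "/"
set_cookie = "Set-Cookie: " + issued["EXSID"].OutputString()

# Browser side: parse the header value and remember the cookie.
stored = SimpleCookie()
stored.load(set_cookie.split(":", 1)[1].strip())

# Next request: only name=value pairs are sent back to the server.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in stored.items()
)

print(set_cookie)
print(cookie_header)  # Cookie: EXSID=abc123
```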
2.7 HTTP Proxies
An HTTP proxy is a program that acts as an intermediary between a browser and a Web server. HTTP Proxies are
typically used for security (a single point of control) or efficiency (caching).
[Figure: (a) Transparent: the browser sends GET over a TCP connection towards the HTTP server, and a transparent HTTP proxy server intercepts it on the way. (b) Non-transparent: the browser sends GET to the HTTP proxy server, which opens a new TCP connection to the HTTP server.]
Figure 2.3: HTTP Proxies.
From the point of view of users, there are two basic types of proxies:
• Transparent (Figure 2.3a). A transparent proxy intercepts normal communication at the network layer without requiring any special client configuration. Clients need not be aware of the existence of the proxy.
• Non-transparent (Figure 2.3b). A non-transparent proxy receives requests from clients and sends requests to servers. Responses travel back through the proxy as well. Therefore, a proxy has the functions of both a client and a server. A non-transparent proxy can use another transparent or non-transparent proxy to reach the final server. Clients send their requests to the proxy instead of the real server specified in the URL (the proxy IP address and port are configured in the browser). HTTP requests sent through a non-transparent proxy must include the full URL of the resource (not only the path). In this way, the proxy knows to which server it must send the HTTP request. For example:
GET http://www.somehost.com/path/file.html HTTP/1.0
[blank line]
Finally, it is worth mentioning that there are open source HTTP proxy implementations, such as the widely used Squid.
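With a non-transparent proxy, the client only changes where it connects and what it writes in the request line (with a transparent proxy it changes nothing). The helper below is a hypothetical sketch of that decision; the proxy address proxy.example.com:3128 is an assumption (3128 is Squid's default port).

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def plan_request(url: str, proxy: Optional[Tuple[str, int]] = None):
    """Return the (host, port) to connect to and the HTTP 1.0 request to send."""
    parts = urlsplit(url)
    if proxy is not None:
        # Through a proxy: connect to the proxy and send the full URL,
        # so that the proxy knows which server to forward the request to.
        connect_to = proxy
        target = url
    else:
        # Direct: connect to the origin server and send only the path.
        connect_to = (parts.hostname, parts.port or 80)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    request = f"GET {target} HTTP/1.0\r\n\r\n"
    return connect_to, request


url = "http://www.somehost.com/path/file.html"
print(plan_request(url))
print(plan_request(url, proxy=("proxy.example.com", 3128)))
```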
2.8 Dynamic Web
2.8.1 Introduction
In today’s Web, content is not only static: documents are generated on the fly by servers using information provided by clients. As a result, the WWW is not just a huge database of documents or content but a platform to implement services and applications. Common applications of the dynamic Web are search engines, remote access to corporate applications and databases, etc.
2.8.2 CGIs
There are several ways of implementing the dynamic Web. In this document, we only deal with CGIs because they are easy to understand and they were the first method used for this purpose. CGI (Common Gateway Interface) is a standard procedure through which HTTP servers can use external applications to generate content dynamically (see Figure 2.4).
When we use a CGI, the URL identifies:
• An executable program (which is also called “the CGI”).
• The parameters with which the CGI has to be executed.
[Figure: the browser sends “GET /cgi-bin/cgi HTTP/1.0”; the HTTP server runs the program cgi and answers “HTTP/1.0 200 OK”, “Content-type: text/html”, ... followed by the content of the page generated by the CGI]
Figure 2.4: How CGIs work in HTTP.
The first issue to take into account is how a web server knows that it has to execute a program instead of sending a resource. A usual solution is to store all the CGIs in a special directory, typically called /cgi-bin/. In this way, if a client asks for www.example.com/cgi-bin/program, the server knows that it must execute program instead of sending it. The second issue is how to send the parameters to the program. When using GET, the parameters are encoded in the URL: they are appended after a “?” character and separated by the character “&”. Example:
http://www.example.com/cgi-bin/program?param1=value1&param2=value2...
Note. Spaces are translated into the character ”+”, and any character can also be sent in the format %HH, where HH is the hexadecimal value of the character's byte (for example, %26 for “&”).
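Python's urllib.parse module implements this encoding, so it can be used to check how a query string is built and decoded. The parameter names and values below are hypothetical. Non-ASCII characters are first encoded as UTF-8 bytes (urlencode's default), and each byte becomes a %HH escape.

```python
from urllib.parse import parse_qs, urlencode

params = {"name": "John Doe", "city": "Barcelona & co", "word": "més"}

# Client side: build the query string (spaces -> '+', reserved characters -> %HH).
query = urlencode(params)
url = "http://www.example.com/cgi-bin/program?" + query
print(url)

# Server side: decode the query string back into parameters.
decoded = parse_qs(query)
print(decoded)
```

Here “&” inside a value becomes %26, so it cannot be confused with the separator between parameters, and “é” becomes the two bytes %C3%A9.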
Finally, before executing the CGI, the Web server establishes a special context for the program using environment
variables. These variables are:
CONTENT_LENGTH, CONTENT_TYPE, REMOTE_HOST, REMOTE_USER, REQUEST_METHOD,
SERVER_NAME, QUERY_STRING, GATEWAY_INTERFACE, HTTP_*
For GET requests, the QUERY_STRING variable takes the value of the parameters, as shown in the URL. In this
manner, the CGI can get the parameters that the client has specified. Regarding the response, the CGI writes it to the
standard output (STDOUT). Then, the server reads this answer and sends it to the client through the socket. Depending
on the type of web server, the CGI application can act in two ways:
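• NPH Server (No Parse Header). The CGI application must write the complete response, including the HTTP status line and headers.
• PH Server (Parse Headers). The CGI application writes the response without the HTTP headers; instead, it passes information to the server on how to form them (CGI header fields such as Content-Type, followed by a blank line and the body).
In practice, the parsed-header mode is the most common one, and NPH programs are the exception (by convention, their names start with “nph-”).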
Finally, we would like to remark that CGIs are not the most efficient solution, because a new process is created for each request. Today there are more efficient or flexible solutions to create dynamic websites, such as JavaScript, Python, PHP, Java servlets, etc.
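A CGI program can be written in any language that can read environment variables and write to standard output. The following minimal Python sketch runs in parsed-header mode: it reads QUERY_STRING, writes a Content-Type header field, a blank line and an HTML body, and leaves it to the server to build the status line and the remaining HTTP headers. The function name cgi_response is our own, the parameter name a matches the form shown later in Code 2.4, and the script location (e.g. /cgi-bin/process) is an assumption.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build a parsed-header CGI response from the CGI environment variables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("a", ["stranger"])[0]
    # Escape the parameter so that user input cannot inject HTML.
    body = f"<html><body>Hello {html.escape(name)}!</body></html>"
    # CGI header fields, then an empty line, then the document body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The WEB server sets QUERY_STRING and reads whatever we write to STDOUT.
    sys.stdout.write(cgi_response(os.environ))
```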
2.8.3 HTML Forms
An HTML form allows a client to send parameters to a WEB server. The tag to declare a form is <FORM>. Different
elements can be inserted into the form: text input elements, codes, images, files, checkboxes, etc. These elements are
inserted in the form using the <INPUT> tag. All the items of the form have a “type” attribute and they might have a
”name” attribute. There are two special elements: RESET, which clears the form to its original state and SUBMIT,
which presents a button to send the form. Example:
<html>
  <head>
    <title>Website title</title>
  </head>
  <body>
    Form to select parameters to send to the server.
    <form ACTION="/cgi-bin/process" METHOD="GET">
      Enter a name: <INPUT NAME="a" TYPE="text"><br>
      Enter a password: <INPUT TYPE="password" NAME="b" MAXLENGTH="8"><br>
      Checkbox: <INPUT TYPE="checkbox" NAME="c"><br>
      <INPUT TYPE="reset"> <INPUT TYPE="submit">
      <br>
    </form>
  </body>
</html>
Code 2.4: HTML document with a form.
Figure 2.5: An HTML form viewed from a browser.
Figure 2.5 shows the form viewed in a WEB browser. When the form is sent (by pressing the submit button), the client generates an HTTP request using the method (GET or POST) given in the METHOD attribute to execute the script or application indicated in the ACTION attribute.
As already discussed, GET requests do not have a body: the parameters for the execution of the application are encoded in the URL, while POST requests have a body with the parameters. Using the Content-Type header, the POST request defines how the parameter values have been encoded:
• application/x-www-form-urlencoded. This is the default encoding type. It uses the same encoding as GET query strings, but the encoded parameters are sent in the body of the request. It is not suitable for sending files.
• multipart/form-data. Separates parameters with a mark (boundary). You can also include the contents of files (e.g. a binary file) in the request.
Example:
POST /cgi-bin/program HTTP/1.0
From: user@example.net
Content-Type: application/x-www-form-urlencoded
Content-Length: 27
[blank line]
param1=value1&param2=value2
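The Content-Length of 27 in this example is exactly the number of bytes of the encoded body. The sketch below builds the same request and shows how a CGI program would read the body: by CGI convention, the server passes the body on the program's standard input and its size in the CONTENT_LENGTH variable. An in-memory stream stands in for standard input here, which is an assumption made so that the example is self-contained.

```python
import io
from urllib.parse import parse_qs, urlencode

# Client side: encode the parameters as the body of the request.
body = urlencode({"param1": "value1", "param2": "value2"}).encode("ascii")
request = (
    "POST /cgi-bin/program HTTP/1.0\r\n"
    "From: user@example.net\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii") + body
print(len(body))  # 27

# CGI side: read exactly CONTENT_LENGTH bytes from standard input.
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
stdin = io.BytesIO(body)  # stands in for sys.stdin.buffer
raw = stdin.read(int(environ["CONTENT_LENGTH"]))
params = parse_qs(raw.decode("ascii"))
print(params)
```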
The question is: should we use GET or POST? Each method has advantages and drawbacks. When using GET, the parameters for the server are encoded in the URL. This can be considered a security vulnerability because these parameters are visible, for instance in the browser history and in server logs. Another drawback of GET is that it does not allow sending binary files in the body of the request. However, GET is useful because a request can be stored (e.g. bookmarked) as its URL, which contains all the parameters of the query. GET also allows using the back button to return to previous results.
On the other hand, with POST the parameters for the server are sent in the body of the request, so they are not visible in the browser as a query string. In general, GET is appropriate for safe, idempotent operations (which do not change the state of the server, so repeating them has no additional effect). POST means ”carry out” an action with a ”side effect” or a change of state (non-idempotent operations).
2.9 HTTP 1.1
2.9.1 Introduction
HTTP 1.1 defines 46 headers, and one of them, “Host”, is mandatory in requests. HTTP 1.1 was defined to address new needs and to overcome the shortcomings of HTTP 1.0. In general terms, HTTP 1.1 is a superset of HTTP 1.0. Its improvements include:
• Host header. Provides efficient use of IP addresses. Now, multiple domains can be served from a single IP
address.
• Chunked encoding. Allows a faster response for dynamically generated pages. Pages are divided and sent in
chunks (fragments). In this way, a response can be sent before its total content or length is known.
• Persistent connections. A TCP connection is not opened/closed for each request. By allowing multiple HTTP
transactions in one TCP connection we can reduce the total transmission delay.
• Caching. The protocol provides headers to implement caching. This allows a faster response and bandwidth
savings.
HTTP 1.1 requires changes in both client and server. Next, we describe each of the previous features in more detail.
HTTP 1.1 was originally defined in RFC 2616 [10]. In June 2014 it was revised, and it is now defined by several RFCs instead of a single one. The ones that carry the most important information are:
• RFC7230[12]: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
• RFC7231[13]: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
• RFC7232[14]: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
• RFC7233[15]: Hypertext Transfer Protocol (HTTP/1.1): Range Requests
• RFC7234[16]: Hypertext Transfer Protocol (HTTP/1.1): Caching
• RFC7235[17]: Hypertext Transfer Protocol (HTTP/1.1): Authentication
2.9.2 Headers
This document does not contain all the headers available in the HTTP 1.1 specification. Instead, the most used ones are listed and explained. For an exhaustive and precise understanding of the headers you should look into RFC7231 [13].
HTTP headers have a specific format: the header name, followed by a colon (:) and the value of the header. Header names are case-insensitive, but by convention they are written following the camel case 1 style with a hyphen between words (Content-Type, If-Match, etc.).
Example:
Allow: GET, PUT
1 http://en.wikipedia.org/wiki/CamelCase
There is only one header that MUST be included in every request: Host.
Since HTTP 1.1, web servers can be multi-domain. For example, we can have the domains "www.example.com" and "www.example.net" on the same server. Thus, the IP address of the server is not enough to figure out which domain has to be served. An analogy is a situation in which several people share a phone: when we call, we have to ask who is speaking and possibly ask for the correct person. So, in HTTP 1.1, each request must specify the hostname (and optionally the port). A minimal HTTP request for version 1.1 could be the following:
GET / HTTP/1.1
Host: www.example.com:80
[blank line]
The Host header contains the domain name or IP address of the web server, optionally followed by the port number (":80"). In this case, specifying the port is not necessary because 80 is the default port for HTTP.
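As an illustration, the following C sketch writes such a minimal request into a buffer, appending the port to the Host header only when it is not the default one. The function name format_get_request is hypothetical; lines are terminated with CR+LF, as HTTP requires, and the final empty line ends the headers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write a minimal HTTP/1.1 GET request into `buf`.
 * The port is only appended to the Host header when it differs from 80,
 * the default port for HTTP. Returns what snprintf returns: the length of
 * the full request (a value >= buflen means the output was truncated). */
int format_get_request(char *buf, size_t buflen,
                       const char *path, const char *host, int port)
{
    if (port == 80)
        return snprintf(buf, buflen,
                        "GET %s HTTP/1.1\r\n"
                        "Host: %s\r\n"
                        "\r\n",                  /* empty line ends the headers */
                        path, host);
    return snprintf(buf, buflen,
                    "GET %s HTTP/1.1\r\n"
                    "Host: %s:%d\r\n"
                    "\r\n",
                    path, host, port);
}
```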
Regarding HTTP proxies and the Host header, the destination for a request can appear in the URL (as an absolute
URI) as well as in the Host header. So, it is important for proxies to behave correctly when both appear. In short, the
host and port in an absolute URI always override the Host header. For example:
GET http://example.net/foo HTTP/1.1
Host: www.example.com:8000
Here, the server that will be used is example.net on port 80 (the default for HTTP), and the Host header is ignored.
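The rule above can be sketched in C as follows. This is a simplified illustration under stated assumptions: the function name effective_host is hypothetical, only the lowercase "http://" scheme is recognised, and user information inside the URI is not handled.

```c
#include <stddef.h>
#include <string.h>

/* Decide which host[:port] a request is addressed to.
 * If the request-target is an absolute URI ("http://host[:port]/path"),
 * its authority overrides the Host header; otherwise the Host header is used.
 * Copies the result into `out`; returns 0 on success, -1 if it does not fit. */
int effective_host(const char *target, const char *host_header,
                   char *out, size_t outlen)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);
    const char *start = host_header;
    size_t len = strlen(host_header);

    if (strncmp(target, prefix, plen) == 0) {
        start = target + plen;             /* authority begins after the scheme */
        len = strcspn(start, "/?#");       /* and ends where path/query begins */
    }
    if (len + 1 > outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

With the request above, effective_host("http://example.net/foo", "www.example.com:8000", ...) yields "example.net", while an origin-form target such as "/foo" yields the value of the Host header.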
2.9.3 Chunked Data
This mechanism allows a server to start sending a response before knowing the complete content, that is to say, before knowing the total length of the content. The idea is to divide the response into small pieces called "chunks" and send these chunks one after another. Responses divided in chunks are identified by the header "Transfer-Encoding: chunked". All HTTP 1.1 clients must be able to correctly process responses divided in chunks. The body of a chunked message contains several fragments (chunks) followed by a line with "0" (zero), optionally followed by trailer headers (footers) and a final empty line. Each chunk consists of two parts: (1) a line with the size of the chunk in hexadecimal followed by CR+LF and (2) the data followed by CR+LF. Example without chunks:
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 42
[blank line]
abcdefghijklmnopqrstuvwxyz1234567890abcdef
The same example with chunked data:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
[blank line]
1a
abcdefghijklmnopqrstuvwxyz
10
1234567890abcdef
0
[blank line]
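To make the format concrete, here is a minimal C sketch of a decoder for a chunked body that is already completely in memory. A real client would read the chunks incrementally from the socket. The function name decode_chunked is hypothetical. Chunk extensions (";name=value" after the size) are skipped, and trailer headers are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a complete chunked body of `inlen` bytes into `out`.
 * Returns the number of decoded bytes, or -1 if the input is malformed,
 * truncated, or does not fit in `out`. */
long decode_chunked(const char *in, size_t inlen, char *out, size_t outlen)
{
    size_t pos = 0, written = 0;

    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* (1) chunk size in hexadecimal, terminated by CRLF */
        while (pos < inlen && (v = hex_value(in[pos])) >= 0) {
            if (size > (SIZE_MAX - 15) / 16)
                return -1;                       /* size would overflow */
            size = size * 16 + (size_t)v;
            pos++;
            digits++;
        }
        if (digits == 0)
            return -1;
        while (pos < inlen && in[pos] != '\r')   /* skip chunk extensions */
            pos++;
        if (pos + 1 >= inlen || in[pos + 1] != '\n')
            return -1;
        pos += 2;

        if (size == 0)                           /* last chunk: body complete */
            return (long)written;

        /* (2) chunk data, followed by CRLF */
        if (size > inlen - pos || inlen - pos - size < 2 || size > outlen - written)
            return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;
        if (in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

Applied to the body of the chunked example (0x1a = 26 bytes plus 0x10 = 16 bytes), it returns 42 and reproduces the body of the non-chunked response.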
2.9.4 Persistent Connections
In HTTP 1.0, TCP connections are closed after each request/response by default. As we know, opening/closing TCP connections requires a substantial amount of CPU time, bandwidth, and memory. In practice, most web pages consist of several files (linked HTML documents, images, etc.) that are located on the same server. Consecutive requests (and their associated responses) can be transmitted more efficiently by allowing multiple requests/responses to be sent over a single connection. This mechanism is called "persistent connections".
In HTTP 1.1, persistent connections are used by default, so nothing special is needed to use them: the client opens a connection, sends multiple requests one after another and reads the corresponding responses in order. The client can include the header "Connection: close" in a request; then, the server has to close the connection after the reply. This should only be used if the client is unable to process persistent connections or if it is known that the request will be the last one.
On the other hand, if a response contains the header "Connection: close", the client cannot send more requests through that connection and it must close the connection after the response is received. A server may close the connection before sending all the answers. In this case, the client is responsible for keeping track of the answered requests and resending the unanswered ones if necessary. An HTTP 1.1 client can also send multiple requests through a single connection without waiting for the responses (pipelining).
For its part, an HTTP 1.1 server must queue the requests that it cannot process yet, and it must send the responses in the same order in which it received the requests. If a request includes the header "Connection: close", the server must interpret it as the last request and close the connection after sending the corresponding response. The server also closes idle connections (after a period of time, typically 10 seconds). Some servers do not support persistent connections in order to save resources (minimizing the number of concurrent open sockets). A server that does not want to use persistent connections can include the header "Connection: close" in each response.
Finally, it is worth mentioning that clients (browsers) typically open several simultaneous persistent TCP connections with each server. In the example of Figure 2.6, the browser uses 2 persistent connections with the HTTP server.
Figure 2.6: Multiple persistent connections with an HTTP server. (Diagram: the browser requests index.html over connection 1, and then requests the images it references, images/img1.gif to images/img5.gif, over connections 1 and 2.)
For example, in a real browser like Firefox the default is up to 6 persistent connections per server. This value can be configured with the parameter network.http.max-persistent-connections-per-server; to change it, type about:config in the URL bar of Firefox. Multiple connections help to increase performance, since several connections can provide more throughput than just one. In addition, the client can send its requests in parallel through the different connections.
2.9.5 Continue
The "continue" mechanism allows a client to determine whether the server is willing to accept a request based only on the message headers. This is useful if a client has to send a request with a big body (e.g. a big file): it avoids wasting time and resources sending the body if the server is going to reject the message anyway (independently of its body). The client includes the header "Expect: 100-continue" and waits before sending the body. If the server is going to process the request, it responds with an interim 100 (Continue) status and the client then sends the body; otherwise, the server can respond directly with a final status code (for example, 413 Payload Too Large). A client should not send the Expect header if it is not going to send any body in its request.
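The client-side decision can be sketched in C as follows: after sending the headers, the client reads the first status line and only transmits the body if the code is 100. The function names parse_status_code and should_send_body are hypothetical. A real client also needs a timeout, because RFC 7231 allows it to send the body anyway if no response arrives. That part is not shown here.

```c
#include <ctype.h>
#include <string.h>

/* Extract the three-digit status code from a status line such as
 * "HTTP/1.1 100 Continue". Returns -1 if the line is malformed. */
int parse_status_code(const char *status_line)
{
    const char *sp = strchr(status_line, ' ');   /* space after HTTP-version */

    if (sp == NULL || strncmp(status_line, "HTTP/", 5) != 0)
        return -1;
    sp++;
    if (!isdigit((unsigned char)sp[0]) || !isdigit((unsigned char)sp[1]) ||
        !isdigit((unsigned char)sp[2]))
        return -1;
    return (sp[0] - '0') * 100 + (sp[1] - '0') * 10 + (sp[2] - '0');
}

/* After sending the headers with "Expect: 100-continue", the body is only
 * transmitted if the server answered 100 (Continue). Any final status code
 * (e.g. 413 or 417) means that the body must not be sent. */
int should_send_body(const char *status_line)
{
    return parse_status_code(status_line) == 100;
}
```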
2.9.6 Caching
HTTP defines two different kinds of headers to support caching. From the client's and the server's point of view, it defines the "Date" header and the conditional headers. The Date header indicates the reference time at which a response was created. The conditional headers indicate some conditions for the server to determine whether it should process a request or not. The most typical one is "If-Modified-Since", which specifies a date: a server will only send the requested resource if it has changed since that date.
Finally, an ETag can be added. This is an identifier assigned by a web server to a specific version of a resource.
Example: A client sends a request for index.html to the server and the server responds with the following message:
HTTP/1.1 200 OK
Content-Type: text/html
Date: Mon, 04 May 2015 20:45:02 GMT
[blank line]
<html>
<body>
Hello World!
</body>
</html>
[blank line]
Some time later the client wants to check the index.html page again, so it sends this HTTP request:
GET /index.html HTTP/1.1
Host: www.mysite.com
If-Modified-Since: Mon, 04 May 2015 20:45:02 GMT
[blank line]
If the page has changed since that date, the server sends back a normal response. But if the page has not changed since then, the server only sends a response with a 304 (Not Modified) status code and no body, reducing the server's processing time and the network load.
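A minimal C sketch of the server-side decision is shown below. It parses the date in the If-Modified-Since header and compares it with the last modification time of the resource, both expressed as seconds since the Unix epoch. The names parse_imf_fixdate and if_modified_since_status are hypothetical. Only the preferred date format ("Mon, 04 May 2015 20:45:02 GMT") is handled, and the day/month/year conversion uses the well-known "days from civil" arithmetic instead of non-standard functions such as timegm. An unparsable date is ignored, as RFC 7232 requires.

```c
#include <stdio.h>
#include <string.h>

/* Days since 1970-01-01 for a Gregorian date (m = 1..12). */
static long days_from_civil(long y, int m, int d)
{
    y -= m <= 2;                                   /* years start in March */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                                   /* [0, 399] */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* [0, 365] */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Parse "Mon, 04 May 2015 20:45:02 GMT" into seconds since the epoch,
 * or return -1 if the string is not in this format. */
long long parse_imf_fixdate(const char *s)
{
    static const char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
    char wday[4], mon[4];
    int d, y, hh, mm, ss, m;

    if (sscanf(s, "%3s, %d %3s %d %d:%d:%d GMT",
               wday, &d, mon, &y, &hh, &mm, &ss) != 7)
        return -1;
    for (m = 0; m < 12 && strcmp(mon, months[m]) != 0; m++)
        ;
    if (m == 12)
        return -1;
    return (long long)days_from_civil(y, m + 1, d) * 86400
           + hh * 3600 + mm * 60 + ss;
}

/* 304 (Not Modified) if the resource has not changed since the date in the
 * If-Modified-Since header, otherwise 200 (OK) with the full resource. */
int if_modified_since_status(long long last_modified, const char *ims_header)
{
    long long since = parse_imf_fixdate(ims_header);

    if (since < 0)                 /* invalid date: the header is ignored */
        return 200;
    return last_modified <= since ? 304 : 200;
}
```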
HTTP 1.1 also defines a set of headers for intermediary systems. One of the most important ones is Cache-Control (see Section for further information on caching).
Example: A client requests a resource from a server. It sends the following message:
GET /page.html HTTP/1.1
Host: www.mysite.com
Cache-Control: max-age=120
[blank line]
The max-age directive in Cache-Control indicates that the client only accepts a response whose age is not greater than 120 seconds (two minutes), for example a response that has been stored by an intermediary cache for less than that time.
The server could respond with something like this:
HTTP/1.1 200 OK
Content-Type: text/html
Date: Mon, 04 May 2015 20:45:02 GMT
Cache-Control: no-cache
[blank line]
<html>
<body>
Bye World!
</body>
</html>
[blank line]
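The no-cache directive in the Cache-Control header of a response indicates that caching intermediaries may store the response, but they MUST NOT use it to satisfy a later request without first revalidating it with the origin server. To forbid storing the message's payload at all, the no-store directive is used instead.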
You can find a list of the most important cache-related headers in section 2.9.11.
2.9.7 HTTP 1.1 Methods
We have already seen some HTTP methods (GET and POST). In this section we review all the available methods and explain what each of them is used for.
• OPTIONS: ”This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.“ [13].
The OPTIONS method should return an "Allow" header listing all the methods available for a certain resource. The response is not required to contain a body, but if it does, the body should contain information about the communication options. The body structure is not standardized, so it depends on the developer's implementation. OPTIONS responses are not cacheable.
• GET: This method is the most used one on the web. It allows a client to obtain a resource. When a server receives a GET request it should not perform ANY modification of its resources. For this reason it is considered a "safe" method: you can use it as many times as you want and it is not going to change anything on the server. GET responses are cacheable.
• HEAD: HEAD is also a "safe" method. It is used in the same terms as GET, but the response to a HEAD request doesn't contain a body; it only contains the headers that would be sent with GET. HEAD can be used for checking link validity, accessibility and recent modifications while reducing network load. HEAD responses are cacheable.
• POST: In the RFC2616 [10] definition of the POST method it is stated that: "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line.". In plain language, this means adding some data to an existing resource. For example, if you have a resource that represents your cars and you buy a new car, you should POST it to the list.
However, the POST method has often been misused because of the HTML specification, which allows only two methods in its forms: GET and POST. This means that everything that requires more than retrieving information is achieved with POST. For example, if a server has a web interface and you want to delete a resource from the server, the developer has to implement a form that uses POST (or GET) and points to a certain URI, and make the server interpret that request as a DELETE request on the resource.
POST responses are not cacheable by default.
• PUT: PUT is used to create new resources or to replace existing ones. The body of the PUT message must be considered as the new state of the resource, which is identified by the request URI. If the request URI points to an existing resource, the message should be considered as a modification of that resource, which is updated with the enclosed representation. PUT responses are not cacheable.
• DELETE: It’s used to remove a resource identified by the request URI. DELETE responses are not cacheable.
• TRACE: It is used as a diagnostic tool. When TRACE is invoked, the request is sent to the server and all the intermediaries process it like any other request, but the final server generates a response whose body contains the received request exactly as it arrived. It is used to learn how the intermediaries modify the requests.
The request mustn't contain a body, since the request itself will be sent in the body of the response. TRACE responses are not cacheable.
• CONNECT: It is used to establish a tunnel between the client and the server through an intermediary. When a client establishes a secure communication with the server (for example, with TLS), the messages sent from one to the other are encrypted. When an intermediary receives a CONNECT request, from then on it should relay all the content between the client and the server without trying to perform any change in the messages. It is normally used together with the Proxy-Authorization header.
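The properties mentioned in this list can be summarised in a small lookup table. The following C sketch is only an illustration: the struct and function names are hypothetical, and the values follow RFC 7231 (section 4.2). "Cacheable" here means cacheable by default. POST responses can only be cached when they carry explicit freshness information.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct method_info {
    const char *name;
    bool safe;        /* does not request any change on the server */
    bool idempotent;  /* repeating the request has the same effect as sending it once */
    bool cacheable;   /* responses may be stored by caches by default */
};

static const struct method_info methods[] = {
    { "GET",     true,  true,  true  },
    { "HEAD",    true,  true,  true  },
    { "OPTIONS", true,  true,  false },
    { "TRACE",   true,  true,  false },
    { "POST",    false, false, false },
    { "PUT",     false, true,  false },
    { "DELETE",  false, true,  false },
    { "CONNECT", false, false, false },
};

/* Returns the entry for `name`, or NULL if the method is unknown.
 * HTTP method names are case-sensitive, so "get" is not "GET". */
const struct method_info *find_method(const char *name)
{
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].name, name) == 0)
            return &methods[i];
    return NULL;
}
```

Note that PUT and DELETE are idempotent but not safe. Sending the same PUT twice leaves the resource in the same state, but the first request does change it.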
2.9.8 HTTP 1.1 Status codes
You can find the complete list of the available codes in Table 2.1. Since most of them are self-explanatory, there won't be further explanation; however, if you want to look into the details behind each of them, you can look at section 6 of RFC7231 [13].
CODE | DESCRIPTION
100 | Continue
101 | Switching Protocols
200 | OK
201 | Created
202 | Accepted
203 | Non-Authoritative Information
204 | No Content
205 | Reset Content
206 | Partial Content
300 | Multiple Choices
301 | Moved Permanently
302 | Found
303 | See Other
304 | Not Modified
305 | Use Proxy
307 | Temporary Redirect
400 | Bad Request
401 | Unauthorized
402 | Payment Required
403 | Forbidden
404 | Not Found
405 | Method Not Allowed
406 | Not Acceptable
407 | Proxy Authentication Required
408 | Request Timeout
409 | Conflict
410 | Gone
411 | Length Required
412 | Precondition Failed
413 | Payload Too Large
414 | URI Too Long
415 | Unsupported Media Type
416 | Range Not Satisfiable
417 | Expectation Failed
426 | Upgrade Required
500 | Internal Server Error
501 | Not Implemented
502 | Bad Gateway
503 | Service Unavailable
504 | Gateway Timeout
505 | HTTP Version Not Supported
Table 2.1: HTTP Status Codes.
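The codes in Table 2.1 are grouped into classes by their first digit (RFC 7231, section 6). A client that does not recognise a specific code can still handle it according to its class. The following C sketch shows this idea. The function name status_class is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the class of an HTTP status code, given by its first digit,
 * or NULL for codes outside the 100..599 range. */
const char *status_class(int code)
{
    switch (code / 100) {
    case 1: return "Informational";   /* e.g. 100 Continue */
    case 2: return "Successful";      /* e.g. 200 OK */
    case 3: return "Redirection";     /* e.g. 301 Moved Permanently */
    case 4: return "Client Error";    /* e.g. 404 Not Found */
    case 5: return "Server Error";    /* e.g. 503 Service Unavailable */
    default: return NULL;
    }
}
```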
2.9.9 HTTP 1.1 Representation Headers
Representation headers add information about the payload content. They define the type of data it contains, its encoding, its language and its location.
Content-Type: the media type of the payload. Its values are MIME types (Table 2.2).
Example:
Content-Type: text/html
Content-Encoding: Indicates what codings have been applied to the content data. Content-Encoding is
primarily used to allow a representation’s data to be compressed without losing the identity of its underlying media
type.
Media type | Specific values
Application | application/json, application/soap+xml, application/javascript, application/xhtml+xml, application/pdf, application/xml, application/postscript, application/zip
Audio | audio/basic, audio/mpeg, audio/mp4, audio/vnd.wave
Image | image/gif, image/bmp, image/jpeg, image/svg+xml, image/png, image/tiff
Text | text/css, text/plain, text/html, text/xml
Video | video/avi, video/mp4, video/mpeg, video/x-flv
Table 2.2: Common mime types.
Example:
Content-Encoding: gzip, deflate
In this example, two encodings were applied: 'gzip' and 'deflate'. The encodings are always listed in the order in which they were applied, and therefore the response data must be decoded in reverse order. In the example, 'gzip' was applied first and 'deflate' after it, which means the client has to apply 'deflate' decoding first and then 'gzip'.
The 'identity' value indicates that the data was not encoded. It is mainly used in Accept-Encoding: when there is no encoding, the Content-Encoding header is simply omitted.
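The reverse-order rule can be illustrated with a short C sketch that splits a Content-Encoding value into its codings and returns them in the order in which they must be decoded. The function name decoding_order and the fixed limits (8 codings of at most 15 characters) are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CODINGS 8

/* Split a Content-Encoding value such as "gzip, deflate" into its codings
 * and store them in `order` in DECODING order, i.e. the reverse of the order
 * in which they were applied. Returns the number of codings, or -1 if there
 * are too many or a name is longer than 15 characters. */
int decoding_order(const char *value, char order[][16], int max)
{
    char tmp[MAX_CODINGS][16];
    int n = 0;
    const char *p = value;

    while (*p != '\0') {
        size_t len;

        p += strspn(p, " \t,");                 /* skip separators */
        if (*p == '\0')
            break;
        len = strcspn(p, " \t,");               /* length of this coding */
        if (n == MAX_CODINGS || n == max || len >= sizeof tmp[0])
            return -1;
        memcpy(tmp[n], p, len);
        tmp[n][len] = '\0';
        n++;
        p += len;
    }
    for (int i = 0; i < n; i++)                 /* reverse the list */
        strcpy(order[i], tmp[n - 1 - i]);
    return n;
}
```

For "gzip, deflate" the result is {"deflate", "gzip"}: the client first undoes 'deflate' and then 'gzip'.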
Content-Language: It defines the language in which the content is written.
Example:
Content-Language: en
Content-Location: It contains a URI that identifies a resource corresponding to the representation enclosed in the message. In the response to a PUT or POST request, it indicates that the payload is a representation of the resource identified by that URI, for example the resource that has just been created. It is not used for redirections: in 301 or 307 responses, the URI where the resource was moved to is given in the Location header.
Example: If we use POST and we get the response below, it means that the response contains a representation of the resource (e.g. the one that has just been created), which is accessible through the URL specified.
HTTP/1.1 200 OK
Content-Location: http://mysite.org/myresource/
Example: This response means that the resource has been moved permanently and that the client should make a new request to the URI specified.
HTTP/1.1 301 Moved Permanently
Location: http://mysite.org/myresource/
2.9.10 HTTP 1.1 Content-negotiation headers
There are three content-negotiation strategies: server-driven, agent-driven and transparent negotiation.
In server-driven negotiation, the server decides which of the available representations it serves. To guide this choice, the client can add headers to its request that list the types it accepts and its preferences among them.
Accept header: The Accept header allows the client to list all the formats it accepts for a response. Each entry has two important fields: a media range and a quality factor ('q'). The media range and the 'q' factor are separated by a semicolon, and one media range is separated from the next by a comma. Let's see it with an example:
Accept: text/plain, application/pdf;q=0.8, application/json;q=0.3, text/*
As you can see, we specified four media ranges, and only the second and the third ones have a q parameter. 'q' is a quality value: the bigger it is, the more desirable the type is for the client. If q is not present, it takes 1 as its default value. An asterisk means "anything"; if two types have the same 'q' value but one of them contains an asterisk, the one without the asterisk is preferred.
In this example the order of preference would be:
1. text/plain
2. text/*
3. application/pdf
4. application/json
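The preference order above can be computed mechanically. The following C sketch parses an Accept value and sorts the entries by 'q' (highest first), then by specificity (fewer asterisks first), then by their position in the header. It is a simplified illustration: the struct and function names are hypothetical, only the 'q' parameter is interpreted, and the value is assumed to be well formed.

```c
#include <stdlib.h>
#include <string.h>

struct media_pref {
    char type[64];   /* media range, possibly containing '*' wildcards */
    double q;        /* quality value, 1.0 when absent */
    int wildcards;   /* number of '*' characters: fewer means more specific */
    int index;       /* position in the header, keeps the order deterministic */
};

static int count_stars(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == '*')
            n++;
    return n;
}

static int cmp_pref(const void *a, const void *b)
{
    const struct media_pref *x = a, *y = b;

    if (x->q != y->q)                      /* higher quality first */
        return x->q > y->q ? -1 : 1;
    if (x->wildcards != y->wildcards)      /* then more specific ranges */
        return x->wildcards - y->wildcards;
    return x->index - y->index;            /* then original order */
}

/* Parse an Accept value into `prefs`, sorted from most to least preferred.
 * Returns the number of entries, or -1 if the value does not fit. */
int parse_accept(const char *value, struct media_pref prefs[], int max)
{
    int n = 0;
    const char *p = value;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        char item[128];
        char *semi, *start;
        double q = 1.0;

        if (len >= sizeof item)
            return -1;
        memcpy(item, p, len);
        item[len] = '\0';
        p = end ? end + 1 : p + len;       /* advance to the next entry */

        /* look for a "q=" parameter among the ';'-separated parameters */
        for (semi = strchr(item, ';'); semi != NULL; semi = strchr(semi + 1, ';')) {
            const char *param = semi + 1 + strspn(semi + 1, " \t");
            if (param[0] == 'q' && param[1] == '=')
                q = atof(param + 2);
        }
        semi = strchr(item, ';');
        if (semi != NULL)
            *semi = '\0';                  /* keep only the media range */

        start = item + strspn(item, " \t");          /* trim whitespace */
        start[strcspn(start, " \t")] = '\0';
        if (*start == '\0')
            continue;                      /* empty entry, e.g. a trailing comma */
        if (n == max || strlen(start) >= sizeof prefs[n].type)
            return -1;
        strcpy(prefs[n].type, start);
        prefs[n].q = q;
        prefs[n].wildcards = count_stars(start);
        prefs[n].index = n;
        n++;
    }
    qsort(prefs, (size_t)n, sizeof prefs[0], cmp_pref);
    return n;
}
```

With the header of the example, the sorted entries are text/plain, text/*, application/pdf and application/json, which is the order given above.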
In Table 2.2 you can see a list of the most used MIME types. You can find a more extensive list at the URL given in footnote 2.
There are three more 'Accept' headers: Accept-Charset, Accept-Encoding and Accept-Language. They have the same syntax as 'Accept', but they apply to charsets, content encodings and languages respectively. For example:
GET /misc/myresource HTTP/1.1
Accept: text/plain, application/pdf;q=0.8, application/json;q=0.3, text/*
Accept-Charset: utf-8;q=1, iso-8859-1;q=0.5
Accept-Encoding: gzip;q=1, identity;q=0.5, *;q=0.5
Accept-Language: en;q=1, es;q=0.8, ca;q=0.7
On the other hand, in agent-driven negotiation, the client decides which representation it wants. First, the client performs a request in order to learn all the available representations, and after that it performs a second request pointing to the specific representation. You can look at it as if a separate resource were created for each representation:
http://myrestaurantsite.org/Restaurants/{RestaurantName}/Menu
http://myrestaurantsite.org/Restaurants/{RestaurantName}/Menu/png/
http://myrestaurantsite.org/Restaurants/{RestaurantName}/Menu/pdf/
http://myrestaurantsite.org/Restaurants/{RestaurantName}/Menu/plain/
http://myrestaurantsite.org/Restaurants/{RestaurantName}/Menu/xml/
The resource '/Menu' returns links to all the other representations, and the client then performs a new request to get the representation it prefers.
The last strategy (transparent negotiation) is a combination of the previous ones. The client uses server-driven negotiation, but instead of the server handling it, an intermediary redirects the client to the correct representation. This implies that the intermediary must know all the representations that the server has for every resource. So, from the point of view of the client it is a server-driven negotiation, but from the point of view of the server, which has to serve all the representations to the proxy, it is an agent-driven negotiation in which the agent is the proxy.
2.9.11 HTTP 1.1 Cache headers
Cache headers contain directives that determine whether a response can be cached, whether the client wants a cached response or not, and so on.
Cache-Control: It contains a list of directives that indicate to the cache agents what to do with the request or the response that they are processing. The cache agents MUST follow these directives, and intermediaries must forward the header to the next hops. In Table 2.3 you can find a list of the available directives with a brief description (for further information see sections 5.2.1 and 5.2.2 of RFC7234 [16]).
Example: The following header indicates that the client wants a response that comes from a cache, and only if its age is not greater than two minutes.
Cache-Control: only-if-cached, max-age=120
2 http://www.sitepoint.com/web-foundations/mime-types-complete-list/
Directive name | Value | Description | Validity
max-age | seconds | In a request, it indicates the maximum age (time since it was generated) of a response that the client is willing to accept. | REQUEST
max-stale | seconds | In a request, it indicates the maximum time that a response may have exceeded its validity time and still be accepted by the client. | REQUEST
min-fresh | seconds | In a request, it indicates that the client wants a response that will still be valid for at least 'min-fresh' seconds. | REQUEST
only-if-cached | nothing | It indicates that the client only wants a stored response. | REQUEST
must-revalidate | nothing | Once the response becomes stale, caches MUST NOT use it without first revalidating it with the origin server. | RESPONSE
public | nothing | It indicates that the cache agents may store the response, even if it would normally be non-cacheable. | RESPONSE
private | nothing | It indicates that the response must not be stored by shared caches. | RESPONSE
proxy-revalidate | nothing | It works as must-revalidate, but it does not apply to private caches. | RESPONSE
max-age | seconds | It specifies the number of seconds during which the response will be valid. | RESPONSE
s-maxage | seconds | In shared caches, it overrides the max-age directive. | RESPONSE
Extensions | optional | The directives can be extended with other private directives. | REQUEST AND RESPONSE
no-cache | nothing | In a request, it indicates that a stored response MUST NOT be used without successful revalidation with the origin server. In a response, it indicates that caches MUST NOT reuse it for later requests without successful revalidation. | REQUEST AND RESPONSE
no-store | nothing | It indicates that a cache MUST NOT store any part of the request or of the response containing it. | REQUEST AND RESPONSE
no-transform | nothing | It means that an intermediary MUST NOT transform the payload in any way. | REQUEST AND RESPONSE
Table 2.3: Cache-Control header directives.
Date: It specifies the HTTP date corresponding to the time at which the message was originated. All the responses (including errors) except the continue answers (status 100) should include the "Date" header. Example:
Date: Tue, 01 Jul 2014 12:13:14 GMT
Unfortunately, due to earlier versions of HTTP, the date value can be in any of three possible formats:
Date: Mon, 27 Apr 2009 23:59:59 GMT
Date: Monday, 27-Apr-09 23:59:59 GMT
Date: Mon Apr 27 23:59:59 2009
HTTP 1.1 recipients must accept all three formats, but senders only generate the first one.
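A server written in C can generate dates in the first (preferred) format with the standard gmtime and strftime functions, as in the following sketch. The function name format_http_date is hypothetical. The %a and %b conversions produce English day and month names only while the program uses the default "C" locale, which is an assumption of this example.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format `t` as "Mon, 27 Apr 2009 23:59:59 GMT" (the preferred HTTP date
 * format). gmtime() converts the timestamp to UTC. Returns the number of
 * characters written, or 0 on failure. */
size_t format_http_date(time_t t, char *buf, size_t buflen)
{
    struct tm *utc = gmtime(&t);

    if (utc == NULL)
        return 0;
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", utc);
}
```

For instance, the timestamp 1240876799 (seconds since the Unix epoch) is formatted as "Mon, 27 Apr 2009 23:59:59 GMT", the date used in the examples above.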
Age: It indicates the number of seconds that have passed since the response was generated (or revalidated) by the origin server. It is used in responses.
Example: This header means that the response was generated 25 seconds ago and it has been stored since then.
Age: 25
Expires: It indicates that the response is valid until the HTTP date specified:
Expires: Wed, 01 Jul 2015 16:00:00 GMT
Warning: It contains additional information that is not reflected in the status code. It contains a numerical code, the agent that added the warning ("-" if it is unknown), a brief description and, optionally, an HTTP date that must match the Date header. You can find the list of codes in section 5.5 of RFC7234 [16].
Example:
Date: Wed, 01 Jul 2015 16:00:00 GMT
Warning: 110 - "Response is Stale" "Wed, 01 Jul 2015 16:00:00 GMT"
2.9.12 HTTP 1.1 Conditional headers
These time stamps use Greenwich Mean Time (GMT).
There are two headers, "If-Modified-Since" and "If-Unmodified-Since", that can be included in HTTP requests.
• The If-Modified-Since header means "send the response if it has changed since that date".
• The If-Unmodified-Since header means "send the response if it has not changed since that date".
Clients are not required to use them, but HTTP 1.1 servers are expected to consider these headers and proceed as follows:
• If we use If-Modified-Since in the request and the data of the response has not been changed, the server must
send "304 Not Modified".
• If we use the header If-Unmodified-Since, and the data of the response has been modified, the server must send
"412 Precondition Failed".
The most commonly used is the If-Modified-Since header. The If-Unmodified-Since header has some less common uses. For example, it can be used when you request a resource that depends on other resources, and someone may change the original resource in the meantime, which might lead to inconsistencies. In this case, we can use the If-Unmodified-Since header, and the HTTP server will answer with 412 (Precondition Failed) instead of processing the request if the resource has been changed.
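The two date-based preconditions can be evaluated together. The following C sketch follows the evaluation order of RFC 7232 (section 6) but deliberately ignores the ETag-based headers (If-Match, If-None-Match). The struct and function names are hypothetical. Dates are assumed to be already parsed into seconds since the Unix epoch, and a negative value marks an absent header.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of a request; negative dates mean "header absent". */
struct conditional_request {
    const char *method;
    long long if_modified_since;
    long long if_unmodified_since;
};

/* Evaluate the date preconditions for a resource last changed at
 * `last_modified`. Returns 412, 304, or 0 meaning "process normally". */
int evaluate_preconditions(const struct conditional_request *req,
                           long long last_modified)
{
    bool get_or_head = strcmp(req->method, "GET") == 0 ||
                       strcmp(req->method, "HEAD") == 0;

    /* If-Unmodified-Since: refuse the request if the resource has changed */
    if (req->if_unmodified_since >= 0 && last_modified > req->if_unmodified_since)
        return 412;

    /* If-Modified-Since only applies to GET and HEAD */
    if (get_or_head && req->if_modified_since >= 0 &&
        last_modified <= req->if_modified_since)
        return 304;

    return 0;
}
```

For example, a PUT carrying If-Unmodified-Since receives 412 if another client modified the resource after that date, so the second update does not silently overwrite the first one.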
2.9.13 HTTP 1.1 Authentication headers
Authentication headers allow the user to send their credentials to the server. There are some standard authentication schemes defined for HTTP 1.1, but the most used one is 'Basic' (explained below).
WWW-Authenticate: It is a response header sent by the server to tell the client which authentication schemes are allowed for the requested resource. It must always be included when the server returns a 401 (Unauthorized) status code. It can also contain other parameters, such as the 'realm' identification. Realms are virtual collections of resources that share the same authentication permissions.
Example:
WWW-Authenticate: Basic
Authorization: It contains the authentication credentials of a user.
In the Basic authentication scheme, the client takes the user name and the password, builds a 'user:password' string and encodes it using base64.
For example, user-ID "Aladdin" and password "open sesame" would be encoded as:
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
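The following C sketch builds this header, including a small standard base64 encoder. The function names are hypothetical, and the credentials are assumed to fit in 255 bytes. Base64 is only an encoding, not encryption: anyone who captures the header can decode the password, so Basic authentication should only be used over HTTPS.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Standard base64 encoding of `len` bytes; `out` must hold at least
 * 4 * ((len + 2) / 3) + 1 characters. */
static void base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    for (i = 0; i + 2 < len; i += 3) {          /* full 3-byte groups */
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    if (i < len) {                              /* 1 or 2 bytes left: pad with '=' */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}

/* Build "Authorization: Basic base64(user:password)" into `out`.
 * Returns 0 on success, -1 if the credentials or the header do not fit. */
int basic_auth_header(const char *user, const char *password,
                      char *out, size_t outlen)
{
    char creds[256];
    char encoded[4 * ((sizeof creds + 2) / 3) + 1];
    int n = snprintf(creds, sizeof creds, "%s:%s", user, password);

    if (n < 0 || (size_t)n >= sizeof creds)
        return -1;
    base64_encode((const unsigned char *)creds, (size_t)n, encoded);
    n = snprintf(out, outlen, "Authorization: Basic %s", encoded);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```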
Proxy-Authenticate: It is sent by a proxy (in a 407 Proxy Authentication Required response) to tell the client which authentication schemes it allows. They are the same kind of schemes used in WWW-Authenticate.
Proxy-Authorization: It is the proxy equivalent of the 'Authorization' header. It is sent by the client to the proxy.
2.10 Practical HTTP with Apache
2.10.1 Introduction
The Apache HTTP Server, commonly referred to as Apache, is a web server software notable for playing a key role in the initial growth of the WWW, and it is still widely deployed today. In our case, we are going to use its second version: the apache2 daemon. One of the main advantages of apache2 is its modular architecture: you can add or remove functionality as dictated by your requirements.
Debian-based distros store the Apache 2.0 configuration files in the directory /etc/apache2. The main configuration file, /etc/apache2/apache2.conf, is used to load other configuration files. One of these other configuration files is ports.conf, which contains the Listen directives telling apache2 which IP addresses and ports it should listen on.
As usual, if you change the configuration of the daemon you have to stop and start it to apply the changes. Like most network daemons, apache2 can be started and stopped under Debian Linux using a script in the directory /etc/init.d. In particular, to stop apache2 type:
# /etc/init.d/apache2 stop
To start the daemon type:
# /etc/init.d/apache2 start
2.10.2 Virtual Hosts (sites)
A virtual host is just a web site served by the HTTP server. Each virtual host or site has its own configuration file that
contains all the directives that pertain only to that site (a sample configuration file is shown in Code 2.5).
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot /var/www
    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>
    <Directory /var/www/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
    ScriptAlias /cgi-bin/ /var/www/cgi-bin/
    <Directory "/var/www/cgi-bin">
        AllowOverride None
        Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
        Order allow,deny
        Allow from all
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/error.log
    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/access.log combined
    Alias /doc/ "/usr/share/doc/"
    <Directory "/usr/share/doc/">
        Options Indexes MultiViews FollowSymLinks
        AllowOverride None
        Order deny,allow
        Deny from all
        Allow from 127.0.0.0/255.0.0.0 ::1/128
    </Directory>
</VirtualHost>
Code 2.5: Sample Apache 2.0 configuration file for a virtual host
In apache2, the configuration files of the virtual hosts are in the directory /etc/apache2/sites-available. To activate a site (virtual host), you can use the a2ensite command:
# a2ensite default
# /etc/init.d/apache2 restart
There is a corresponding a2dissite command for disabling a site:
# a2dissite default
# /etc/init.d/apache2 restart
Typically, if you only run one web site on your server, apache2 uses the default virtual host. The configuration of the default site is in the file /etc/apache2/sites-available/default. After you enable this site, if you look at /etc/apache2/sites-enabled/, you will find a symbolic link called 000-default. Using the configuration of the default site as a template, you can easily create other virtual hosts.
2.10.3 CGIs
Next, we discuss how CGIs work with apache2. The Common Gateway Interface (CGI) defines a way for a web server to interact with external
content-generating programs, which are often referred to as CGI programs or CGI scripts. This is one of the simplest
ways of creating dynamic content on your web site. In apache2, for your CGI programs to work properly, Apache must be
configured to permit CGI execution. In Code 2.5 you can observe the following configuration line:
ScriptAlias /cgi-bin/ /var/www/cgi-bin/
This tells Apache that any request for a resource beginning with /cgi-bin/ should be served from the directory
/var/www/cgi-bin/ and should be treated as a CGI program. For example, if the URL http://localhost/cgi-bin/datecgi.sh
is requested, Apache will attempt to execute the file /var/www/cgi-bin/datecgi.sh and return its output. Of course, the
file has to exist, be executable and produce a correct CGI output (a header block such as Content-type, a blank line and
then the body, e.g. an HTML document), or apache2 will return an error message. You can use Code 2.6 for datecgi.sh.
#!/bin/sh
echo "Content-type: text/html"
echo
echo "<html><body>"
echo -n "The current date is "
date
echo "</body></html>"
Code 2.6: Simple CGI script with Bash
In Code 2.7 you have another example of a CGI (in C) that multiplies two numbers.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char *data;
    long x, y;
    data = getenv("QUERY_STRING");
    printf("Content-type: text/html\n\n");
    printf("<html><body>\n");
    printf("<h1>MULTIPLICATION</h1>\n<hr>\n");
    if (data == NULL)
        printf("<P>ERROR: No query string received</P>");
    else if (sscanf(data, "x=%ld&y=%ld", &x, &y) != 2)
        printf("<P>ERROR: Invalid Arguments</P>");
    else
        printf("<P>The product of x=%ld and y=%ld is z=%ld</P>", x, y, x * y);
    printf("</body></html>\n");
    return 0;
}
Code 2.7: Simple CGI in C
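For comparison, the same multiplication CGI can be sketched in Python. This is only an illustration, not part of the original setup: it assumes a Python 3 interpreter is installed and that the file (hypothetically named multiply.py) is placed in /var/www/cgi-bin/ and made executable. Instead of sscanf it uses the standard urllib.parse.parse_qs function, so, unlike Code 2.7, the x and y parameters may appear in any order.

```python
#!/usr/bin/env python3
import os
from urllib.parse import parse_qs

def render(query_string):
    """Build the CGI output: header block, blank line, then the HTML body."""
    lines = ["Content-type: text/html", "", "<html><body>",
             "<h1>MULTIPLICATION</h1>", "<hr>"]
    if query_string is None:
        lines.append("<P>ERROR: No query string received</P>")
    else:
        params = parse_qs(query_string)  # e.g. "x=3&y=4" -> {"x": ["3"], "y": ["4"]}
        try:
            x = int(params["x"][0])
            y = int(params["y"][0])
        except (KeyError, ValueError):
            lines.append("<P>ERROR: Invalid Arguments</P>")
        else:
            lines.append(f"<P>The product of x={x} and y={y} is z={x * y}</P>")
    lines.append("</body></html>")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Apache passes the query string in the QUERY_STRING environment variable.
    print(render(os.environ.get("QUERY_STRING")), end="")
```

Requesting http://localhost/cgi-bin/multiply.py?x=3&y=4 would then return a page containing "z=12".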
2.10.4
Modules
To manage modules, Debian-based distros use two directories: /etc/apache2/mods-enabled and /etc/apache2/mods-available. To activate a module, use the a2enmod command:
# a2enmod userdir
# /etc/init.d/apache2 restart
The previous a2enmod command creates symbolic links in the mods-enabled directory. Likewise, to disable the “userdir”
module you can type:
# a2dismod userdir
# /etc/init.d/apache2 restart
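The effect of a2enmod and a2dismod on the two directories can be sketched in Python. The helpers enable_module and disable_module are hypothetical and only mimic the symbolic-link behaviour described above. The real tools do more (for example, they take care of dependencies between modules), so use them in practice. The sketch assumes that each module is described by a NAME.load file and, optionally, a NAME.conf file in mods-available.

```python
import os
from pathlib import Path

def enable_module(name, apache_dir="/etc/apache2"):
    """Link NAME.load / NAME.conf from mods-available into mods-enabled."""
    available = Path(apache_dir, "mods-available")
    enabled = Path(apache_dir, "mods-enabled")
    created = []
    for suffix in (".load", ".conf"):
        source = available / (name + suffix)
        link = enabled / (name + suffix)
        if source.exists() and not os.path.lexists(link):
            # Relative target, e.g. ../mods-available/userdir.load
            os.symlink(os.path.join("..", "mods-available", source.name), link)
            created.append(link.name)
    return created

def disable_module(name, apache_dir="/etc/apache2"):
    """Remove the links; the files in mods-available are left untouched."""
    enabled = Path(apache_dir, "mods-enabled")
    removed = []
    for suffix in (".load", ".conf"):
        link = enabled / (name + suffix)
        if link.is_symlink():
            link.unlink()
            removed.append(link.name)
    return removed
```

Calling enable_module("userdir") as root and then restarting apache2 would roughly correspond to the a2enmod commands above.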
The “userdir” module is quite useful because it gives users a default place to set up their own WEB pages. As
a system user, you just create a subdirectory called “public_html” in your home directory and place your files and
HTML documents there. You can test this module locally with a browser using the following URL:
http://localhost/~username/
Replace “username” with your user name. Of course, you can use an IP address instead of “localhost” to connect
remotely to your personal site. If an “access denied” error appears in your browser, this might be because apache2 runs
as the system user www-data and your “public_html” directory is not readable by www-data. In general, all the
resources on the server must be readable by the user www-data, and the directories that contain them must also be searchable (they need the execute permission).
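A quick way to diagnose the “access denied” case is to check the permission bits along the path. The Python sketch below rests on two assumptions: www-data is neither the owner of your files nor in their group, so only the “others” bits matter, and no ACLs are in use. The function name blocking_components is hypothetical.

```python
import os
import stat

def blocking_components(path):
    """Return the parts of path that a user relying on the 'others'
    permission bits (e.g. www-data) cannot read or traverse."""
    path = os.path.abspath(path)
    problems = []
    mode = os.stat(path).st_mode
    # The target must be readable; a directory must also be searchable.
    needed = stat.S_IROTH | (stat.S_IXOTH if stat.S_ISDIR(mode) else 0)
    if (mode & needed) != needed:
        problems.append(path)
    # Every ancestor directory needs the execute (search) bit to be traversed.
    parent = os.path.dirname(path)
    while True:
        if not (os.stat(parent).st_mode & stat.S_IXOTH):
            problems.append(parent)
        if os.path.dirname(parent) == parent:  # reached "/"
            break
        parent = os.path.dirname(parent)
    return problems
```

For example, blocking_components(os.path.expanduser("~/public_html/index.html")) returning an empty list suggests that the permission bits are not the cause. Otherwise, commands such as chmod o+x ~ and chmod -R o+rX ~/public_html usually fix the listed components.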
2.11
Commands summary
Table 2.4 summarizes the commands used within this section.
Table 2.4: Commands for WWW.
firefox      A WEB browser.
apache2      An HTTP server.
a2enmod      Enable an Apache 2.0 module.
a2dismod     Disable an Apache 2.0 module.
service      Start, stop, restart, etc. services (daemons).
a2ensite     Enable an Apache 2.0 WEB site (Virtual Host).
a2dissite    Disable an Apache 2.0 WEB site (Virtual Host).
2.12
XML
2.12.1
Introduction
XML (eXtensible Markup Language) defines a set of rules for encoding documents in a readable form. An XML
document is a “text” file, i.e. a string of characters encoded with UTF-8 or with an ISO standard like ISO-8859-1 (Latin-1).
The characters which make up an XML document are divided into markup and content. All strings which constitute
markup either begin with the character "<" and end with a ">", or begin with the character "&" and end with a ";".
Strings which are not markup are content. In particular, a tag is a markup construct that begins with "<" and ends with
">". Tags come in three flavors:
• start-tags, for example <section>
• end-tags, for example </section>
• empty-element tags, for example <line-break />
Another special component in an XML file is the element. An element is a logical document component that either
begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters
between the start-tag and the end-tag, if any, are the element’s content. The element content may also contain markup,
including other elements, which are called child elements. An example of an element is <Greeting>Hello,
world.</Greeting>. A more elaborate example is the following:
<person>
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>
Finally, an attribute is a markup construct consisting of a name="value" pair that exists within a
start-tag or empty-element tag. For example, the above person record can be modified using attributes to add the age
and the gender of the person:
<person age="17" gender="male">
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>
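To see how a program perceives these components, the following sketch parses the person record with Python's standard xml.etree.ElementTree module. It reads the element name, the attributes, the child elements and the content of a nested element. Note that attribute values are always returned as strings: age is "17", not the number 17.

```python
import xml.etree.ElementTree as ET

PERSON = """<person age="17" gender="male">
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>"""

root = ET.fromstring(PERSON)
print(root.tag)                       # element name: person
print(root.attrib)                    # attributes, as a dict of strings
print([child.tag for child in root])  # child elements: ['nif', 'name']
print(root.find("name/first").text)   # content of a nested element: Peter
```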
2.12.2
XML Comments
You can use comments to leave a note or to temporarily edit out a portion of XML code. Although XML is supposed
to be self-describing data, you may still come across some instances where an XML comment is necessary.
XML comments have exactly the same syntax as HTML comments: they start with "<!--" and end with "-->". Below
is an example of a comment used to leave a note to yourself or to anyone who may be viewing your XML.
<person age="17" gender="male">
  <!-- Peter is a really nice person -->
  <nif>46117234</nif>
  <name>
    <first>Peter</first>
    <last>Scott</last>
  </name>
</person>
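Comments are notes for humans, not data, and XML processors usually treat them that way. The sketch below shows this with Python's xml.etree.ElementTree (Python 3.8 or later is assumed). The default parser discards the comment, while a TreeBuilder created with insert_comments=True keeps it as a special node whose tag is ET.Comment.

```python
import xml.etree.ElementTree as ET

DOC = """<person age="17" gender="male">
  <!-- Peter is a really nice person -->
  <nif>46117234</nif>
</person>"""

# Default behaviour: the comment is dropped while parsing.
default_root = ET.fromstring(DOC)
print([child.tag for child in default_root])  # ['nif']

# Opt-in behaviour: keep comments as nodes in the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
commented_root = ET.fromstring(DOC, parser=parser)
comments = [node.text for node in commented_root if node.tag is ET.Comment]
print(comments)  # [' Peter is a really nice person ']
```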
2.12.3
Escaping
XML uses several characters in special ways as part of its markup, in particular the less-than symbol (<), the greater-than symbol (>), the double quotation mark ("), the apostrophe (') and the ampersand (&). But what if you need to
use these characters in your content, and you don’t want them to be treated as part of the markup by XML processors?
For this purpose, XML provides escape facilities for including characters that are problematic to include directly.
These references to problematic characters, called “entities”, begin with an ampersand (&) and end with a
semicolon (;). There are five predefined entities in XML:
• &amp; refers to an ampersand (&)
• &lt; refers to a less-than symbol (<)
• &gt; refers to a greater-than symbol (>)
• &apos; refers to an apostrophe symbol (')
• &quot; refers to a quotation mark (")
For example, suppose that our XML file should contain the following text line:
<command>echo "1" >/proc/sys/net/ipv4/ip_forward</command>
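Strictly speaking, XML parsers accept the previous line, because the XML specification only forbids a literal "<" or "&"
in content (a literal ">" is only forbidden in the sequence "]]>"). However, escaping the greater-than character is a common
and safe practice that avoids confusing XML processors, so we write:
<command>echo "1" &gt;/proc/sys/net/ipv4/ip_forward</command>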
In the same way, the quotation mark (") might be problematic if you need to use it inside an attribute value. In this case,
you have to escape this symbol. Notice, however, that escaping the quotation mark is not necessary in our previous
example, since the quotation mark appears inside the content of the element (and not in the value of an attribute).
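In practice, escaping is usually left to a library rather than done by hand. The sketch below uses two standard Python tools. xml.sax.saxutils.escape replaces &, < and >, and accepts extra entities such as &quot; when the text will go inside an attribute. ElementTree's serializer applies the appropriate escaping to element content and attribute values by itself. The note attribute is only an illustrative addition.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

command = 'echo "1" >/proc/sys/net/ipv4/ip_forward'

# Escaping by hand: &, < and > are replaced; extra entities can be passed.
print(escape(command))                   # echo "1" &gt;/proc/sys/net/ipv4/ip_forward
print(escape(command, {'"': "&quot;"}))  # form suitable for an attribute value

# Letting the serializer do it: quotes are kept in content, escaped in attributes.
element = ET.Element("command", note='uses "echo"')
element.text = command
print(ET.tostring(element, encoding="unicode"))
# <command note="uses &quot;echo&quot;">echo "1" &gt;/proc/sys/net/ipv4/ip_forward</command>
```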
2.12.4
Well-formed XML
A “well-formed” XML document is a text document that satisfies the list of syntax rules provided in the XML specification. The list of syntax rules is fairly lengthy but some key rules are the following:
• The document contains only properly encoded legal Unicode characters.
• None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
• The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and
none overlapping.
• The element tags are case-sensitive; the beginning and end tags must match exactly.
• Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~ nor a space character, and
cannot start with - (dash), . (point), or a numeric digit.
• There must be a single "root" element that contains all the other elements.
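Any non-validating XML parser can check these rules. The following sketch wraps Python's xml.etree.ElementTree in a small hypothetical helper, is_well_formed. The helper only checks well-formedness, not validity, and the listed examples each break one of the rules above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if a non-validating parser accepts text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

examples = {
    "<a><b>ok</b></a>": True,
    "<a><b>overlap</a></b>": False,  # tags overlap
    "<Name>Peter</name>": False,     # case mismatch between start and end tag
    "<a>x</a><a>y</a>": False,       # two root elements
    "<a>fish & chips</a>": False,    # bare ampersand in content
    "<1st>no</1st>": False,          # tag name starts with a digit
}
for text, expected in examples.items():
    print(f"{text!r:28} well-formed={is_well_formed(text)}")
```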
2.12.5
Valid XML
In addition to being well-formed, an XML document can also be “valid”. This means that all the elements and attributes
used in the XML document belong to the set defined in the language specification and are used correctly. For
example, if we define a language specification for a person registry, we can define the elements person, nif, name, first and
last. We can also define the person attributes age and gender, and the type of values for each of the attributes (e.g. the age
attribute is an integer number and the gender attribute has a value inside the set {male, female}). We might also define the
order in which elements can appear and the nesting rules.
To address these issues, XML defines a special file called a Document Type Definition (DTD). A
DTD file defines an XML language specification, including all its elements, attributes and grammatical rules. Finally,
XML processors use the DTD file to check whether an XML document is “valid”. (A DTD can restrict an attribute to an enumerated set such as {male, female}, but not to integer values; that requires a richer schema language such as XML Schema.)
In Code 2.8 we show the beginning of the DTD file of the VNUML language.
<!-- VNUML DTD version 1.8 -->
<!ELEMENT vnuml (global,net*,vm*,host?)>
<!ELEMENT global (version,simulation_name,ssh_version?,ssh_key*,automac?,netconfig?,vm_mgmt?,
tun_device?,vm_defaults?)>
<!ELEMENT vm_defaults (filesystem?,mem?,kernel?,shell?,basedir?,
mng_if?,console*,xterm?,route*,forwarding?,user*,filetree*)>
...
Code 2.8: Beginning of the VNUML DTD file.
In the previous DTD file we can see several quantifiers. A quantifier in a DTD file is a single character that
immediately follows the item to which it applies and restricts the number of successive occurrences of that
item at the specified position in the content of the element. The quantifier may be one of the following:
• + for specifying that there must be one or more occurrences of the item. The effective content of each occurrence
may be different.
• * for specifying that any number (zero or more) of occurrences are allowed. The item is optional and the
effective content of each occurrence may be different.
• ? for specifying that there must not be more than one occurrence. The item is optional.
• If there is no quantifier, the specified item must occur exactly one time at the specified position in the content of
the element.
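The meaning of the quantifiers can be illustrated by translating a content model into a regular expression over the names of an element's children. The Python sketch below is only an illustration. content_model_to_regex and children_match are hypothetical helpers that handle only comma-separated sequences such as (global,net*,vm*,host?). They ignore choices (|), nested groups and #PCDATA, so a real validating parser is still needed to check documents against a DTD.

```python
import re
from typing import Sequence

def content_model_to_regex(model):
    """Translate a sequence model like "(global,net*,vm*,host?)" into a regex.
    Each child name is matched followed by a space, so names cannot merge."""
    parts = []
    for item in model.strip().strip("()").split(","):
        item = item.strip()
        quantifier = item[-1] if item[-1] in "?*+" else ""  # "" means exactly once
        name = item[:-1] if quantifier else item
        parts.append("(?:" + re.escape(name) + " )" + quantifier)
    return re.compile("".join(parts))

def children_match(model, children: Sequence[str]) -> bool:
    """True if the ordered list of child element names satisfies the model."""
    sequence = "".join(child + " " for child in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

vnuml = "(global,net*,vm*,host?)"
print(children_match(vnuml, ["global", "net", "net", "vm"]))  # True
print(children_match(vnuml, ["net", "vm"]))                   # False: global is mandatory
print(children_match(vnuml, ["global", "host", "host"]))      # False: host? allows one at most
```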
2.13
JSON
JSON, or JavaScript Object Notation, is a data-interchange format. Its power is its simplicity. The data is encapsulated
in pairs: a string (a name) and a value, which can be a string, a number, an object, an array, a boolean (true or false) or null.
The name and the value are separated by a colon ’:’. Strings are always double-quoted.
Example:
"mystring" : "HelloWorld"
"mynumber" : 123
Objects are elements that encapsulate zero or more data pairs. They are delimited by ’{’ and ’}’, and the data pairs
are separated from one another with a comma ’,’, but the last element is never followed by a comma.
Example:
{
  "string1" : "Hello",
  "string2" : "World",
  "number1" : 1
}
Arrays are lists of values. Arrays are delimited by ’[’ and ’]’, the values are also separated with commas ’,’ and,
as in objects, the last element is never followed by a comma. The values inside arrays can be strings, numbers, booleans,
null, objects or other arrays, but they cannot be name/value pairs.
Example:
[
  "Alice",
  "Bob",
  {
    "name": "Carla"
  }
]
You can form nested structures by combining objects and arrays.
Example:
[
  {
    "string1" : "HeollWlrod",
    "spelling-checked" : false
  },
  {
    "string1" : "HelloWorld",
    "spelling-checked" : true
  }
]
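Most languages map such nested structures directly onto their own data types. As an illustration, Python's standard json module turns the array above into a list of dictionaries, and JSON's false and true become Python's False and True.

```python
import json

text = """[
  {"string1": "HeollWlrod", "spelling-checked": false},
  {"string1": "HelloWorld", "spelling-checked": true}
]"""

items = json.loads(text)  # JSON array of objects -> Python list of dicts
checked = [item["string1"] for item in items if item["spelling-checked"]]
print(checked)                       # ['HelloWorld']
print(items[0]["spelling-checked"])  # False
```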
You can define an object or array as a value in a data pair:
"message" : {
  "from" : "Bob",
  "to" : [
    "Alice",
    "Carla"
  ],
  "body" : "Hello Alice! Hello Carla!"
}
Note: JSON messages are normally written over multiple lines, with indentation that clarifies their structure, but
this is not mandatory; the following message is equally valid:
"message":{"from":"Bob","to":["Alice","Carla"],"body":"Hello Alice! Hello Carla!"}
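Note: a name/value pair on its own is not a valid JSON document, because the top level of a JSON text must be a value (such as an object or an array), so validators
report errors for messages like the previous example. To solve it, you can simply wrap the pair inside an object (wrapping it in an array does not work, since arrays cannot contain name/value pairs):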
{
  "message" : {
    "from" : "Bob",
    "to" : [
      "Alice",
      "Carla"
    ],
    "body" : "Hello Alice! Hello Carla!"
  }
}
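A strict parser makes the difference easy to see. The sketch uses Python's json module and a small hypothetical helper, parses. The lone pair is rejected and the version wrapped in an object is accepted. Wrapping it in an array is rejected too, because, as explained earlier, arrays cannot contain name/value pairs.

```python
import json

def parses(text):
    """Return True if text is a valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

bare_pair = '"message": {"from": "Bob", "to": ["Alice", "Carla"], "body": "Hello Alice! Hello Carla!"}'
wrapped_in_object = "{" + bare_pair + "}"
wrapped_in_array = "[" + bare_pair + "]"

print(parses(bare_pair))          # False
print(parses(wrapped_in_object))  # True
print(parses(wrapped_in_array))   # False
print(json.loads(wrapped_in_object)["message"]["to"])  # ['Alice', 'Carla']
```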
JSON is named after JavaScript because it uses the same syntax to encapsulate data: you can look at JSON objects
as JavaScript objects (used as dictionaries) and at JSON arrays as JavaScript arrays. Something similar happens in Python, where objects correspond
to dictionaries and arrays to lists; the main differences are that Python booleans are capitalized (True, False) and that null corresponds to None.
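The mapping also works in the other direction. In this sketch, Python's json.dumps writes True as true and None as null. The compact and the indented outputs contain the same data and differ only in whitespace, as the earlier note about indentation says. The dictionary contents are illustrative.

```python
import json

message = {"from": "Bob", "to": ["Alice", "Carla"], "delivered": True, "reply_to": None}

compact = json.dumps(message, separators=(",", ":"))  # no optional whitespace
pretty = json.dumps(message, indent=2)                # one item per line
print(compact)
# {"from":"Bob","to":["Alice","Carla"],"delivered":true,"reply_to":null}
print(json.loads(compact) == json.loads(pretty) == message)  # True
```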
Chapter 3
Restful Architectural Style
Representational State Transfer (REST from now on) is not a protocol or a standard. REST is an architectural style
for distributed hypermedia systems defined by Dr. Roy Fielding in his PhD dissertation: Architectural Styles and the
Design of Network-based Software Architectures [1].
3.1
REST motivation
REST was designed in order to improve the modern web architecture and help solve some existing problems. Here
are some of the desired requirements:
• A system that provides a universally consistent interface to structured information, available on as many platforms as possible.
• Simplicity: all of the protocols are defined as text.
• Extensibility: A system must be prepared for change.
• Systems must be designed for large-grain data transfer.
• The architecture must minimize network interactions.
• Architectural elements must be able to continue operating when they are subjected to an unanticipated load
or when given malformed or maliciously constructed data. (Architectural elements are all the elements
participating in the connection, so the client is included in this requirement.)
• The system must provide a safe set of operations with well-defined semantics.
• A system must be prepared for gradual and fragmented change. Old and new implementations co-exist.
• The architecture must be designed to ease the deployment of architectural elements.
When REST was designed, the web architecture was already widely deployed and it had significant limitations in
its support for extensibility, shared caching and intermediaries.
The big problem was how to create a new architectural style that added the desired requirements listed above without
producing a major change to the properties that had allowed the web to grow exponentially.
The solution that Dr. Fielding found was to take the deployed web architecture, study the constraints that are responsible
for its properties and add a new set of constraints to create a modern web architecture. To learn more about the
justification and thought process behind these ideas, you can read the fourth chapter of Dr. Fielding’s dissertation.
3.2
REST Constraints
In this section you can find the constraints (or rules) added by Dr. Fielding and a brief explanation of each of them. Dr.
Fielding defines these constraints in a very general way so that the result can be applied to multiple systems and protocols.
This document’s scope, however, is to describe REST’s usual deployment: over HTTP. In each section, after the description
of the constraint, you’ll find a description of how you can apply this rule to the development of APIs over HTTP
under the title In practice.
3.2.1
Client-server
The first constraint applies the main principle of the client-server architectural style: separation of concerns. Separating user interface concerns from data storage concerns improves the portability of the user interface across multiple
platforms and improves scalability. It also allows the components to evolve independently.
In practice:
URIs are what make the separation between user interface and data storage possible. A URI is just a text string with a
standard format that allows you to name a resource based on its location on the web.
protocol://hostname/directory/resource
Other information can also be present in the URL:
protocol://username:password@hostname:port/directory/resource
URLs are specified in detail in RFC 1738 [9].
Some examples are:
• http://www.example.com/pictures/upc.jpg
• http://www.example.com
• http://192.168.0.5
• http://www.example.com:8080/cgi-bin/time.sh
• http://user:password@www.example.com/
• ftp://debian.org
Dr. Fielding doesn’t specify any rule for defining identifiers in his dissertation. However, some rules can be
deduced from other constraints. For example, every resource has one or more representations, but they shouldn’t be part
of the resource identifier: if you want to GET an image, traditionally its identifier will look like ’/img/myimg.png’, but
a better way to GET the image would be to request ’/img/myimg’ and send an ’Accept’ header in the request
specifying the desired media type.
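The server side of this negotiation can be sketched in Python. The negotiate function below is a hypothetical and simplified illustration. It honours the q (quality) parameters of the Accept header and wildcards such as image/*, but it ignores the rule that more specific media ranges take precedence over less specific ones. Do not take it as a complete implementation of HTTP content negotiation.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_range:
            ranges.append((media_range, q))
    # sorted() is stable, so ranges with equal q keep the client's order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)

def matches(media_range, media_type):
    """True if media_type (e.g. image/png) is inside media_range (e.g. image/*)."""
    if media_range == "*/*":
        return True
    main, _, sub = media_range.partition("/")
    t_main, _, t_sub = media_type.partition("/")
    return main == t_main and sub in ("*", t_sub)

def negotiate(accept_header, available):
    """Pick a representation the client accepts, or None (406 Not Acceptable)."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None
```

For a GET on /img/myimg with Accept: image/webp, image/png;q=0.8, a server holding PNG and JPEG versions would pick image/png.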
Apart from this constraint, there is a general trend, promoted by the W3C, to use readable URIs with a hierarchical
structure, which can be helpful [7].
Example: a service that provides information about restaurants. Every restaurant will be identified as a resource.
You can also have a resource that lists them (or a subset of them), a different resource that lists the ones near a
certain zip code and, finally, a resource that represents the menu of a restaurant (the definition of a resource is
explained in section 3.2.4).
http://myrestaurantsite.org/Restaurants/
http://myrestaurantsite.org/Restaurants/{Restaurant Name}
http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu
http://myrestaurantsite.org/Restaurants/near/{ZIPCODE}
As you can see, these URIs are self-explanatory. Keep in mind that this only helps to understand the API structure
and (maybe) to parse the URIs, but it is not a REST requirement (and it has nothing to do with the ’visibility’ property).
Actually, following REST constraints, clients should not use fixed or pattern-generated URIs but retrieve the URIs from
the server (see section 3.2.4).
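Readable URI templates are still convenient on the server side, where they are mapped to the code that handles each resource. The routing table below is a hypothetical Python sketch for the restaurant URIs. Each template becomes a regular expression whose named groups capture the variable parts. Clients would still obtain these URIs from links sent by the server rather than build them.

```python
import re

# Hypothetical server-side routing table for the restaurant resources.
ROUTES = [
    ("restaurant_list", re.compile(r"/Restaurants/")),
    ("restaurants_near", re.compile(r"/Restaurants/near/(?P<zipcode>[^/]+)")),
    ("restaurant_menu", re.compile(r"/Restaurants/(?P<name>[^/]+)/Menu")),
    ("restaurant", re.compile(r"/Restaurants/(?P<name>[^/]+)")),
]

def route(path):
    """Return (resource kind, URI parameters) for a request path."""
    for kind, pattern in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return kind, match.groupdict()
    return None, {}  # no resource: the server would answer 404 Not Found

print(route("/Restaurants/near/08034"))  # ('restaurants_near', {'zipcode': '08034'})
```

Spaces in a restaurant name would travel percent-encoded (e.g. Can%20Roca). A real router would also have to decide what happens to a restaurant literally named "near".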
3.2.2
Stateless
In the client-server architecture, a server serves many clients. This can be a problem if the server needs to store data
about each client’s state: as the number of clients grows, the resources needed also grow and the response time
is affected. To solve this, all communications must be stateless. This implies that each request must contain all
the information necessary for the server to process it and that the session information must be stored on the client.
Stateless systems have better scalability, visibility and reliability: scalability because the server needs fewer resources for each client, visibility because each request can be monitored independently (it contains all the information
required) and reliability because it is easier to recover from partial failures.
On the other hand, a stateless server can decrease network performance, because repetitive data is sent in a series
of requests.
In practice:
The stateless constraint has two points of view: the protocol side and the server side. First of all, HTTP is a stateless protocol,
so the protocol side is covered, but you must keep in mind that you can’t use cookies to keep sessions. Cookies were not defined in the
original HTTP standard; they were defined some time later to fill the need of some applications to store sessions
[11], and they are now widely used.
On the server side, no particular technology makes a server stateless; it depends on the implementation each
developer does. The key aspect of this constraint is that it only applies to the server, so the way to implement a
stateful application is to transfer the state of the application to the client.
Example:
Imagine you want to develop an API for an online shop. A client might want to buy multiple products, and many
online shops implement a virtual cart, where the application stores the items that you have already chosen. In a REST API, the
shop would not keep track of the items that the client has visited or chosen. Instead, the client should keep that
record and, at check-out (however it is implemented), send the list of items that it wants to buy.
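A minimal Python sketch of such a stateless check-out follows. The /checkout endpoint, the catalog and the field names are hypothetical. The point is that the handler gets everything it needs from the request body and keeps no per-client cart between requests. Prices are stored as integer cents to avoid floating-point rounding.

```python
import json

# Hypothetical catalog: resource state owned by the server, in cents.
# It is not session state about any particular client.
CATALOG = {"book-42": 1250, "pen-7": 120}

def checkout(request_body):
    """Handle POST /checkout. The request carries the whole cart, so the
    server needs no memory of earlier requests from this client."""
    try:
        cart = json.loads(request_body)["items"]
        total = sum(CATALOG[line["product"]] * line["quantity"] for line in cart)
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "malformed cart"}
    return 201, {"total_cents": total}

print(checkout('{"items": [{"product": "book-42", "quantity": 2},'
               ' {"product": "pen-7", "quantity": 3}]}'))  # (201, {'total_cents': 2860})
```

Because no cart lives on the server, any replica behind a load balancer could handle the request, which is the scalability benefit described above.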
3.2.3
Cache
Caching is a method to improve network performance. By labeling data as cacheable or non-cacheable, we allow
the client (or another network element) to reuse the information retrieved from the server in previous requests. Of
course, since this can cause reliability problems if the cached data differs significantly from the data stored on the
server, the decision of whether some data is cacheable or not is crucial.
By implementing a cache-capable system the efficiency, scalability and the user-perceived performance are improved.
In practice:
From the point of view of messages, HTTP has headers that define the caching strategy that a client or a server must
follow regarding that specific message. You can find those headers detailed in section 2.9.2.
The cache system can be applied not only to the clients but also to intermediary nodes on the network. The
problem is that intermediary nodes can only cache non-encrypted connections, since they cannot read encrypted messages. Therefore, if an application needs to be encrypted,
the cache-capable nodes have to be located behind the security layer, i.e. after the point where the traffic is decrypted.
From the network point of view there are many available proxy server technologies that implement caching.
Important: the decisions taken in the resource definition may limit the ability of some systems to apply
the defined cache rules properly.
For example, some resources return different information depending on the query string included on their URI. The
most typical examples are search resources:
http://mysearchsite.com/search?q=news&order_by=date
Some cache intermediaries may consider this kind of URI non-cacheable regardless of the cache headers included
in the response.
Squid, for example, has handled caching of URIs with query strings since version 2.7.
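How an intermediary can cache such URIs is sketched below as a toy cache in Python. The class is hypothetical and deliberately small. It keys entries by the full URI, query string included, so the same search with a different order_by parameter is a different entry. It only looks at the Cache-Control directives max-age, no-store, no-cache and private, and ignores everything else in HTTP caching (validators, Vary, Expires, etc.).

```python
import time

class SharedCache:
    """Toy intermediary cache keyed by the full URI, query string included."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # uri -> (expiry time, body)

    @staticmethod
    def max_age(cache_control):
        """Seconds the response may be reused, or None if it must not be stored."""
        directives = [d.strip().lower() for d in cache_control.split(",")]
        # A shared cache must not store private responses; this toy cache never
        # revalidates, so it does not store no-cache responses either.
        if any(d in ("no-store", "no-cache", "private") for d in directives):
            return None
        for d in directives:
            if d.startswith("max-age="):
                try:
                    return int(d[len("max-age="):])
                except ValueError:
                    return None
        return None

    def store(self, uri, cache_control, body):
        age = self.max_age(cache_control)
        if age:
            self.entries[uri] = (self.clock() + age, body)

    def lookup(self, uri):
        """Return a fresh cached body, or None if the request must go upstream."""
        entry = self.entries.get(uri)
        if entry is None:
            return None
        expires_at, body = entry
        if self.clock() >= expires_at:
            del self.entries[uri]
            return None
        return body
```

Passing a custom clock makes it easy to test expiry without waiting. In production the default time.monotonic would be used.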
3.2.4
Uniform interface
This constraint is probably the most important one: it is the one that defines the very nature of REST.
REST requires applying the software engineering principle of ’generality’ to the component interface. This allows
the implementations to be decoupled from the services they provide.
REST defines four interface constraints:
• Identification of resources
• Manipulation of resources through representations
• Self-descriptive messages
• Hypermedia as the engine of application state
Identification of resources: REST uses resource identifiers to identify its resources. REST requires the author to
choose the resource identifier that best fits the nature of the concept being identified. The author is also responsible
for maintaining the semantic validity of the mapping over time.
Resource representation: REST resources are not transferred. Instead, a representation of the resource is transferred. A representation is a sequence of bytes containing the information exchanged, plus metadata that describes
those bytes. For example, if your resource is an email stored in a database, you could represent it as three strings
named ’from’, ’to’ and ’message’ in JSON format, but you would never send the whole table row
that contains it. Representations must match one of an evolving set of standard data types.
Self-descriptive messages: REST messages must use standard methods and media types to indicate their semantics.
Combined with the ’stateless’ constraint, this results in self-descriptive messages.
Hypermedia as the engine of application state: since the ’stateless’ constraint prevents servers from storing
application state, this state must be kept on the client side. This is achieved by having the server send the client the
set of choices available from its current state. As Dr. Fielding argues in his blog:
"A REST application should be entered with no prior knowledge beyond the initial URI (bookmark)
and set of standardized media types that are appropriate for the intended audience (i.e., expected to be
understood by any client that might use the API). From that point on, all application state transitions must
be driven by client selection of server-provided choices that are present in the received representations or
implied by the user’s manipulation of those representations."
Defining a uniform interface improves the visibility of interactions and simplifies the overall system architecture.
On the other hand, a uniform interface may degrade efficiency, since information is transferred in a standardized
form rather than in one specific to each application’s needs.
In practice:
Everything is a resource in REST APIs: files, raw information, database query results, data computed by an algorithm,
etc. are the same from the point of view of the client. It is the developer’s task to define which resources are important
from the client’s perspective and how to treat those diverse elements as resources.
From the perspective of class-based programming languages, resources could be seen as
classes: they have attributes, which are used to represent the resource, and methods, which
allow us to interact with it.
Example: If we develop an address book the resources can be: a concrete person, a telephone number, an address,
lists of people, etc. A concrete person will contain a telephone number, an address (or more), a name, etc. A list of
people will contain multiple person resources.
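Following the class analogy, a person resource of the address book could be sketched in Python as below. The class, its methods and the URI scheme (/people/..., /phones/..., /addresses/...) are hypothetical. The attributes are what gets represented, the get and put methods play the role of the corresponding HTTP methods, and related resources appear as URIs instead of being embedded.

```python
from dataclasses import dataclass, field

@dataclass
class PersonResource:
    """Hypothetical 'person' resource: attributes plus methods, like a class."""
    person_id: int
    first_name: str
    last_name: str
    phone_id: int
    address_ids: list = field(default_factory=list)

    @property
    def uri(self):
        return f"/people/{self.person_id}"

    def get(self):
        """GET: build a representation; linked resources appear as URIs."""
        return {
            "first_name": self.first_name,
            "last_name": self.last_name,
            "telephone_number": f"/phones/{self.phone_id}",
            "addresses": [f"/addresses/{a}" for a in self.address_ids],
        }

    def put(self, representation):
        """PUT: replace the editable attributes with the received representation."""
        self.first_name = representation["first_name"]
        self.last_name = representation["last_name"]

peter = PersonResource(7, "Peter", "Scott", phone_id=3, address_ids=[1, 2])
print(peter.uri, peter.get())
```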
We are not interested in how these resources are obtained. They might be stored in databases or in separate
files, but from the client’s perspective it doesn’t matter.
Sometimes it’s hard to transform some services or functionalities into resources. For example, continuing with the
online shop example:
Imagine that you want your API to allow a user to buy a list of things, pay for them and keep track of them until
the shipping company takes charge of them.
You could, for example, define a resource that represents a list of orders, a resource that holds the payment
information of an order and a resource that represents the shipment information of a paid order.
The client would look for products in the API and build a list of the ones that the user wants to buy. Once
the list is finished, the client would POST the list to the list-of-orders resource. If everything goes well (the list is well
formed, there is enough stock for all the products, etc.), the API should return a 201 (Created) status code and a Location
header that points to the payment information resource. The client would GET the payment information and
pay the bill by sending a PUT request with the bank account information, so that the store can collect the money. Once the
payment is done, another 201 status code is returned, together with a Location header pointing to a new resource that
represents the shipment information. The order is also removed from the orders list.
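The flow above can be modelled with a toy, in-memory Python class. Everything in it is hypothetical: the class name, the URIs, the use of 400 for errors and the field names. It only shows how each step answers with a status code and a Location header that tells the client where to go next. The GET on the payment resource is left out for brevity.

```python
import itertools

class ShopAPI:
    """Toy model of the order -> payment -> shipment flow."""

    def __init__(self, stock):
        self.stock = dict(stock)  # product -> units available
        self.orders = {}          # order id -> list of products
        self.ids = itertools.count(1)

    def post_order(self, products):
        """POST on the list of orders."""
        if not products or any(products.count(p) > self.stock.get(p, 0)
                               for p in set(products)):
            return 400, {}
        order_id = next(self.ids)
        self.orders[order_id] = list(products)
        # 201 Created, pointing the client to the payment resource.
        return 201, {"Location": f"/orders/{order_id}/payment"}

    def put_payment(self, order_id, bank_account):
        """PUT on the payment resource of an order."""
        if order_id not in self.orders or not bank_account:
            return 400, {}
        for product in self.orders.pop(order_id):  # the order leaves the list
            self.stock[product] -= 1
        return 201, {"Location": f"/shipments/{order_id}"}

api = ShopAPI({"book": 2, "pen": 1})
print(api.post_order(["book", "pen"]))  # (201, {'Location': '/orders/1/payment'})
```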
Resource representation:
As stated before, resources are never transferred; instead, a representation of them is sent. The
developer must decide and document which formats the representations have. As Dr. Fielding stated in his blog:
“A REST API should spend almost all of its descriptive effort in defining the media type(s) used
for representing resources and driving application state, or in defining extended relation names and/or
hypertext-enabled mark-up for existing standard media types”
Example: In the previous address book we could define a person representation in JSON format like this:
[
  {
    "person":{
      "first_name":"STRING",
      "last_name":"STRING",
      "telephone_number":"URI pointing to telephone resource",
      "addresses":[
        {
          "address_name":"STRING",
          "address":"URI pointing to address 1"
        },
        {
          "address_name":"STRING",
          "address":"URI pointing to address n"
        }
      ]
    }
  }
]
A good practice regarding resource representations is to define custom Content-Type values instead of using the
generic ones; this eases the task of parsing the payload of your messages. The vendor tree (media types of the form application/vnd.*) allows you to define your own non-standardized formats.
If you define it like this:
application/vnd.company.category+format
You’re not only defining the format but also the kind of data it contains and its structure.
Example: for the previous example you could have defined its Content-Type as usual:
Content-Type: application/json
But a much better option would have been:
Content-Type: application/vnd.yourcompanyname.person+json
If you want to standardize it and make it public so that it is accessible all around the world, you can use IANA’s
media type registration service (see footnote 1).
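A client or server can use such a vendor media type to decide how to parse a payload and what kind of data it holds. In the vendor tree the name follows "vnd." with a dot, and the "+json" suffix names the underlying syntax, as in application/vnd.yourcompanyname.person+json. The helpers parse_vendor_type and decode below are a hypothetical Python sketch that handles only this simple form.

```python
import json

def parse_vendor_type(content_type):
    """Split 'application/vnd.acme.person+json; charset=utf-8' into
    ('acme', 'person', 'json'). Returns None for non-vendor media types."""
    media_type = content_type.split(";")[0].strip().lower()
    top, _, subtype = media_type.partition("/")
    if top != "application" or not subtype.startswith("vnd."):
        return None
    name, _, suffix = subtype[len("vnd."):].partition("+")
    vendor, _, kind = name.partition(".")
    return vendor, kind, suffix or None

def decode(content_type, body):
    """Use the +suffix to pick a parser and the kind to know what arrived."""
    parsed = parse_vendor_type(content_type)
    if parsed is None or parsed[2] != "json":
        raise ValueError("unsupported media type: " + content_type)
    _, kind, _ = parsed
    return kind, json.loads(body)

print(decode("application/vnd.yourcompanyname.person+json; charset=utf-8",
             '{"first_name": "Peter"}'))  # ('person', {'first_name': 'Peter'})
```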
Self-descriptive messages
Self-descriptive messages imply two conditions: the use of well-known semantics and the need to send, along with the
request, the metadata that describes the message.
1 http://www.iana.org/cgi-bin/mediatypes.pl
The semantics in HTTP/1.1 are defined by HTTP method names and status codes. You can find the available
HTTP methods in section 2.9.7 and learn about status codes in section 2.9.8.
HTTP/1.1 also defines the headers that carry the metadata a message needs to be self-descriptive. There are
three types of headers: request-only, response-only and request-and-response. You have a list of them in section
2.9.2.
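The following Python sketch builds, without sending, a request whose meaning is carried entirely by standard parts: the method, the URI and an Accept header. It uses urllib.request.Request from the standard library, and the URI comes from the restaurant example. It also prints the standard reason phrases of a few status codes via http.HTTPStatus, which any client can interpret in the same way.

```python
from http import HTTPStatus
from urllib.request import Request

# Build (but do not send) a self-descriptive request.
request = Request(
    "http://myrestaurantsite.org/Restaurants/",
    method="GET",
    headers={"Accept": "application/xml"},
)
print(request.get_method())          # GET
print(request.get_header("Accept"))  # application/xml

# Status codes have standard meanings shared by every HTTP client.
for code in (200, 201, 404):
    print(code, HTTPStatus(code).phrase)  # OK, Created, Not Found
```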
Hypermedia as the engine of application state
From Wiktionary:
Noun
hypermedia (uncountable)
(computing) The use of text, data, graphics, audio and video as elements
of an extended hypertext system in which all elements are linked so that
the user can move among them at will.
Hypermedia is a superset of hypertext. Hypermedia refers to any kind of data that may be transferred and that
contains interconnections between resources.
Since we are using URIs as resource identifiers, the interconnections between resources, and therefore the
application state transitions, are driven through URIs.
Example: Remember the previous restaurant example. We can access its bookmark in ’/’ and we may get:
<link href="/Restaurants/" ref="List" />
Then, if we perform a GET request on ’/Restaurants/’, we could get:
<Restaurant>
  <Name>My Restaurant</Name>
  <Location>123 Fake street</Location>
  ...
  <link href="/Restaurants/168498/" ref="Restaurant" />
</Restaurant>
After that we could even follow the link provided and we might get:
<RestaurantDetail>
  <Menu>
    <link href="/Restaurants/168498/Menu" ref="Menu" />
  </Menu>
  <Reserve>
    <link href="/Restaurants/168498/" ref="Reserve" />
  </Reserve>
</RestaurantDetail>
Or if the restaurant doesn’t allow on-line reservations:
<RestaurantDetail>
  <Menu>
    <link href="/Restaurants/168498/Menu" ref="Menu" />
  </Menu>
</RestaurantDetail>
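On the client side, "following the server-provided choices" can be as simple as collecting the links present in each representation. The Python sketch below parses both RestaurantDetail variants with xml.etree.ElementTree. It uses the ref attribute exactly as in the examples (the conventional attribute name in HTML and Atom is rel). The helper available_transitions is hypothetical. The client does not build the reservation URI itself: it only offers reservation when the server sent that link.

```python
import xml.etree.ElementTree as ET

WITH_RESERVE = """<RestaurantDetail>
  <Menu><link href="/Restaurants/168498/Menu" ref="Menu" /></Menu>
  <Reserve><link href="/Restaurants/168498/" ref="Reserve" /></Reserve>
</RestaurantDetail>"""

WITHOUT_RESERVE = """<RestaurantDetail>
  <Menu><link href="/Restaurants/168498/Menu" ref="Menu" /></Menu>
</RestaurantDetail>"""

def available_transitions(representation):
    """Map each link's ref to its href: the client's possible next steps."""
    root = ET.fromstring(representation)
    return {link.get("ref"): link.get("href") for link in root.iter("link")}

print(available_transitions(WITH_RESERVE))
print("Reserve" in available_transitions(WITHOUT_RESERVE))  # False: not offered
```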
You can look at your API as if it were a finite state machine. When a client accesses the API’s bookmark, it is in the
initial state, and you can guide it through the application by providing links to the next possible resources in the
application flow, depending on the last resource that it accessed.
For example, in figure 3.1 you can see a representation of a finite state machine that describes the restaurant API.
The nodes are not resources but actions performed on those resources (note how easily these actions map onto
HTTP verbs).
Figure 3.1: Finite state machine representation. (States: Bookmark, Restaurant List Shown, Restaurant Details Shown, Menu Shown, Restaurant Added, Menu Modified. Transitions: Show list, Show details, Show menu, Add restaurant, Modify menu, Add a menu.)
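The state machine can be written down as a transition table. The sketch below is illustrative only: the exact edges of figure 3.1 are an assumption reconstructed from its labels, and the state and action names are hypothetical. The actions available in a state are exactly what the server would offer as links in that state's representation.

```python
from typing import Dict, List

# Illustrative transition table for the restaurant API (edges assumed):
# state -> {action offered as a link: next state}
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "Bookmark": {"Show list": "Restaurant List Shown",
                 "Add restaurant": "Restaurant Added"},
    "Restaurant List Shown": {"Show details": "Restaurant Details Shown",
                              "Add restaurant": "Restaurant Added"},
    "Restaurant Added": {"Show list": "Restaurant List Shown"},
    "Restaurant Details Shown": {"Show menu": "Menu Shown",
                                 "Add a menu": "Menu Modified",
                                 "Show list": "Restaurant List Shown"},
    "Menu Shown": {"Modify menu": "Menu Modified",
                   "Show list": "Restaurant List Shown"},
    "Menu Modified": {"Show menu": "Menu Shown"},
}


def links_for(state: str) -> List[str]:
    """Actions the server would offer as links in this state."""
    return sorted(TRANSITIONS[state])


def follow(state: str, action: str) -> str:
    """Move to the next state, refusing actions that were not offered."""
    if action not in TRANSITIONS[state]:
        raise ValueError(f"'{action}' is not available from '{state}'")
    return TRANSITIONS[state][action]
```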
3.2.5
Layered System
The most basic server-side architecture is a central node that responds to all the incoming requests. This solution is obviously not optimal. REST proposes layered systems, which add hierarchical layers in which no component can see beyond the layer it immediately interacts with.
A layered system offers scalability improvements, since it makes load balancing possible. It also allows multiple strategically placed caches to be deployed in order to boost performance. The main disadvantage is that layers add overhead and latency to the processing of data.
In practice:
Let's take a look at a possible server-side arrangement.
Figure 3.2: Layered system example. (Zone 1, Zone 2 and Zone 3, with the core server at the center.)
As you can see, there are three zones:
• Zone 1: Authentication zone. It handles authentication and, since it is the edge zone, it also decrypts the incoming requests.
• Zone 2: Proxy zone. It caches data: if a request can be answered with previously stored data, it is answered in this layer.
• Zone 3: Core zone. It manages the remaining requests.
When defining layered systems, the important rule to remember is to delegate as many tasks as possible to the outer layers.
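The three zones can be sketched as nested handlers in which each layer only knows the next one inward, which is the "cannot see beyond the immediate layer" rule. This is a toy model under stated assumptions: requests are plain dictionaries, authentication is a simple token lookup, decryption is omitted, and the cache never expires.

```python
from typing import Callable, Dict, List, Set

Request = Dict[str, str]
Handler = Callable[[Request], str]


def make_core_layer(log: List[str]) -> Handler:
    """Zone 3: builds every response from scratch (the expensive path)."""
    def handle(request: Request) -> str:
        log.append(request["path"])  # record the work that reaches the core
        return f"representation of {request['path']}"
    return handle


def make_cache_layer(inner: Handler) -> Handler:
    """Zone 2: answers from stored responses; only misses go inward."""
    store: Dict[str, str] = {}

    def handle(request: Request) -> str:
        path = request["path"]
        if path not in store:
            store[path] = inner(request)
        return store[path]
    return handle


def make_auth_layer(inner: Handler, valid_tokens: Set[str]) -> Handler:
    """Zone 1: the edge rejects unauthenticated requests (decryption omitted)."""
    def handle(request: Request) -> str:
        if request.get("token") not in valid_tokens:
            return "401 Unauthorized"
        return inner(request)
    return handle
```

Each factory only receives the next inner layer, so the edge never references the core directly, and both rejected and cached requests are resolved before reaching it.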
3.2.6
Code on demand
Code on demand is an optional constraint within REST. REST allows the client to download and execute code in the form of applets or scripts in order to extend its functionality.
This constraint improves system extensibility but at the same time reduces visibility, which is why it is optional.
In practice:
As seen before, code on demand may reduce server load and improve network performance, but you have to be careful. If the connection is compromised by a Man-In-The-Middle attack2, the attacker could send malicious code to the client. That is why the code should be distributed in interpretable rather than compiled form, so that it is possible to determine whether it is safe. Some commonly used scripting languages3 are JavaScript and Python.
3.3
How to design your APIs
In this section you will find a methodology that can be helpful when designing an API. The ideas developed in the next sections are illustrated with a common example: a flight booking API.
3.3.1
Define functionalities
The design of an API must start with a top-down thinking process. You need to begin by listing the functionalities your API should offer and, as much as possible, forget about how they will be implemented. This should be done from the point of view of the client: What does the client need? What would be useful for a client?
Avoid at all costs thinking about how you would implement the functionalities or how they map to resources: if you cannot stay away from these ideas, you might fail to satisfy the Uniform Interface constraint.
Example:
In this flight booking API we define only one functionality to keep it simple: accessing a list of flights. The client sets a day, an origin and a destination, and the API returns the list.
3.3.2
Define your resources
Once you have decided which functionalities your API must offer, the next step is to map each one of them to one or more resources. It is important to keep in mind the fourth constraint, specifically its first two conditions: you must define resources that are identifiable and have a representation.
Example:
You could create a resource that parses parts of the URI to query a database, such as:
http://api.myflightcompany.com/flight/{from}/{to}/{yyyy}/{mm}/{dd}/
2 http://en.wikipedia.org/wiki/Man-in-the-middle_attack
3 http://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Scripting_languages
However, with this approach you would have to communicate this URI structure to the client and make it construct the URIs itself, instead of the server sending them.
A better solution is to create a query resource, which accepts a standard data type containing the date, the origin city and the destination city. When the client performs a POST request on this resource, the server processes the information, queries the different companies the API works with and generates a new resource containing a list of flights. The server then communicates the new resource's URI to the client (for example, in the Location header of a 201 Created response).
This strategy allows the server to implement caching at the application level: if a client POSTs a query and a new resource is generated, this resource may have a validity time and can be reused to answer identical queries from other clients that arrive within that period.
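A minimal sketch of the query resource with application-level caching, assuming an in-memory store, a hypothetical result URI scheme `/flights/results/{n}/`, and a caller-supplied `search` function standing in for the queries to the airline companies. A new query receives `201 Created` with a `Location` header; an identical query within the validity time is redirected to the existing resource with `303 See Other`, which HTTP allows when the result of a POST is equivalent to an existing resource.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple

QueryKey = Tuple[str, str, str]  # (origin, destination, date)


class QueryResource:
    """Hypothetical query resource: POSTs create list-of-flights resources.

    A created resource is reused for identical queries while it is still
    valid, which is the application-level cache described in the text.
    """

    def __init__(self, search: Callable[[QueryKey], List[str]],
                 validity_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._search = search            # stands in for querying the airlines
        self._validity_s = validity_s
        self._clock = clock
        self._ids = itertools.count(1)
        self._by_query: Dict[QueryKey, Tuple[str, float]] = {}  # -> (uri, created)
        self.resources: Dict[str, List[str]] = {}               # uri -> flights

    def post(self, query: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
        """Handle a POST to the query resource; returns (status, headers)."""
        key = (query["from"], query["to"], query["date"])
        now = self._clock()
        previous = self._by_query.get(key)
        if previous is not None and now - previous[1] < self._validity_s:
            # Equivalent to an existing resource: 303 See Other points to it.
            return 303, {"Location": previous[0]}
        uri = f"/flights/results/{next(self._ids)}/"
        self.resources[uri] = self._search(key)
        self._by_query[key] = (uri, now)
        return 201, {"Location": uri}  # 201 Created + Location of the new resource
```

The clock is injectable so the validity window can be tested without waiting; expired result resources are simply left in place here, whereas a real server would eventually delete them.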
3.3.3
Define resource representation
You need to define which data is exchanged in every interaction with the server. This must be communicated to the client so that it can be prepared for it.
When designing the data structure, keep in mind that, although REST APIs are built to avoid coupling between the client and the server and to increase the independence between them, a change in the data representation is one aspect that REST APIs are not prepared to deal with, and it may cause major changes in the client.
Example:
The data structure in JSON for a POST request to the query resource could be:
{
"query": {
"from": "string",
"to": "string",
"date": "yyyy-mm-dd",
"company": "string"
}
}
where "company" may also be null.
Or it could be completely different. You could have defined the 'from' and 'to' fields as integers that identify the cities. You would then have to create resources that represent the list of cities and relate each city name to its integer identifier.
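Because the representation is a contract with the client, the server should reject bodies that do not follow it. A minimal validation sketch for the JSON structure above, assuming that "company" is optional and that any malformed body is reported as a `ValueError` (which a real handler would turn into a `400 Bad Request`); `parse_query` is a hypothetical name.

```python
import json
from datetime import date, datetime
from typing import Optional


def parse_query(body: str) -> dict:
    """Validate a POSTed query body against the representation above."""
    data = json.loads(body)  # json.JSONDecodeError is a subclass of ValueError
    query = data.get("query") if isinstance(data, dict) else None
    if not isinstance(query, dict):
        raise ValueError("missing 'query' object")
    for field in ("from", "to", "date"):
        if not isinstance(query.get(field), str):
            raise ValueError(f"'{field}' must be a string")
    # strptime raises ValueError if the date does not follow yyyy-mm-dd.
    day: date = datetime.strptime(query["date"], "%Y-%m-%d").date()
    company: Optional[str] = query.get("company")
    if company is not None and not isinstance(company, str):
        raise ValueError("'company' must be a string or null")
    return {"from": query["from"], "to": query["to"],
            "date": day, "company": company}
```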
3.3.4
HATEOAS
After the representation data has been decided, you should elaborate the HATEOAS flow chart. This means creating different states depending on the last action the client has performed. In each state, the client is offered a list of valid links to the resources it can access next to follow the application flow.
If the format you have chosen does not define a standard type for links, you will have to create your own. For every link, you will have to define what kind of resource it points to.
Example:
In this example there are only three possible states: Bookmark, Query posted and Flight list shown.
The application flow should be like the one shown in figure 3.3.
Figure 3.3: Flight finite state machine representation. (States: Bookmark, Query posted, Flight list shown. Transitions: Post a query, Good query, Show the flight list, Post another query, Post a new query.)
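JSON has no standard link type, so this API has to define its own. One possible custom link object, in this hypothetical sketch, records the relation, the target URI and the media type of the target; the function returns the links offered in each state of figure 3.3. The URIs, relation names and media types are all assumptions.

```python
import json
from typing import Dict, List

# Hypothetical vendor media types for the flight API.
QUERY_TYPE = "application/vnd.example.query+json"
FLIGHT_LIST_TYPE = "application/vnd.example.flightlist+json"


def link(rel: str, href: str, media_type: str) -> Dict[str, str]:
    """Custom link object: relation name, target URI and target media type."""
    return {"rel": rel, "href": href, "type": media_type}


def links_for_state(state: str, result_uri: str = "") -> List[Dict[str, str]]:
    """Links the server offers once the client has reached `state`."""
    post_query = link("post-query", "/query/", QUERY_TYPE)
    if state in ("Bookmark", "Flight list shown"):
        # From the bookmark, or after seeing results, the client may post a query.
        return [post_query]
    if state == "Query posted":
        # A good query created a new resource: point the client to it,
        # while still allowing it to post a new query.
        return [link("flight-list", result_uri, FLIGHT_LIST_TYPE), post_query]
    raise ValueError(f"unknown state: {state}")


def flight_list_representation(flights: List[str]) -> str:
    """Representation returned in the 'Flight list shown' state."""
    return json.dumps({"flights": flights,
                       "links": links_for_state("Flight list shown")})
```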
3.3.5
Cache
The next step is to define which information can be cached and which cannot.
The decision about how long a resource can be cached must be based on how much the resource varies over time and on the risk for the client of working with invalid data.
You also need to address other cache-related questions, such as whether the data is private. If it is, it must not be stored in shared (public) caches.
Example:
The query resource responds only to POST requests. Since POST is not a cacheable method, the query resource cannot be cached. However, the resources created when a query request arrives can be cached, and the validity time of the data will depend on the data they carry. For example, if they contain the number of free seats remaining on the plane, the validity time will be short; otherwise it can be longer.
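This decision can be expressed as response headers. A minimal sketch using only the Python standard library, where the two validity times (60 s and 3600 s) are arbitrary assumptions, not values from the text; the `private` flag marks per-user data so that shared caches do not store it.

```python
from email.utils import formatdate
from typing import Dict

# Assumed validity times in seconds; tune them to how fast the data changes.
SHORT_VALIDITY_S = 60      # results that include free-seat counts
LONG_VALIDITY_S = 3600     # results without seat availability


def flight_list_cache_headers(includes_free_seats: bool, now: float,
                              private: bool = False) -> Dict[str, str]:
    """Cache headers for a generated list-of-flights resource.

    `now` is a POSIX timestamp; `private` marks per-user data that shared
    (public) caches must not store.
    """
    max_age = SHORT_VALIDITY_S if includes_free_seats else LONG_VALIDITY_S
    scope = "private" if private else "public"
    return {
        "Cache-Control": f"{scope}, max-age={max_age}",
        "Date": formatdate(now, usegmt=True),
        # Expires repeats the max-age information for HTTP/1.0 caches.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```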
3.3.6
Implement your API
Once you have fully designed your API's functionalities, the resources that will offer them, the structure of the representation data and the list of cacheable and non-cacheable resources, it is time to implement your API.
Keep in mind the part of the Uniform Interface constraint that requires self-descriptive messages: add as much metadata as possible by using the HTTP headers correctly.
Also, remember to include in the responses the links that drive the application state.
In chapter 6 you will find a concise explanation of the Django framework, with documentation about how to implement REST APIs with it and examples to clarify the ideas.
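As a framework-neutral illustration (plain WSGI, not the Django approach of chapter 6), the sketch below serves a hypothetical bookmark and uses headers to make the message self-descriptive: a vendor media type in `Content-Type`, an explicit `Content-Length`, a `Cache-Control` policy, and an `Allow` header when the method is not supported. The URIs, the media type and the one-day cache lifetime are assumptions.

```python
import json

# Hypothetical bookmark of the flight API; the URIs are assumptions.
BOOKMARK = {"links": [{"rel": "post-query", "href": "/query/"}]}
BOOKMARK_TYPE = "application/vnd.example.bookmark+json"


def app(environ, start_response):
    """Minimal WSGI application exposing only the bookmark at '/'."""
    if environ.get("PATH_INFO", "/") != "/":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    if environ.get("REQUEST_METHOD") != "GET":
        # A 405 response must list the methods the resource supports.
        start_response("405 Method Not Allowed",
                       [("Allow", "GET"), ("Content-Type", "text/plain")])
        return [b"Method Not Allowed"]
    body = json.dumps(BOOKMARK).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", BOOKMARK_TYPE),             # tells the client how to parse it
        ("Content-Length", str(len(body))),
        ("Cache-Control", "public, max-age=86400"),  # the bookmark rarely changes
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the application on port 8000.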
Chapter 4
REST Practices
4.1
Interface exercises
Exercise 1: Consider a fully developed newspaper article API. All the resources are given.
Resource List:
1. Bookmark: A list of URIs pointing to the existing resources
2. Article: It represents an article. It contains information about the newspaper that published the article, its author, and the article headline and body.
3. Author: It represents an author. It contains contact information about the author, the newspaper that the author
writes for, etc.
4. List of articles: An author’s list of publications.
5. Newspaper: It represents a publishing authority. It contains information about the authority (Name, location,
list of authors, brief description, etc.)
6. List of newspapers: A list containing all the newspapers.
7. Search by date: It represents a list of articles written on a date sent in the body payload.
8. Search by newspaper: It represents a list of articles published by the newspaper whose representation is sent
in the body payload.
9. Most recent articles: Lists the most recent articles.
Exercise 1.1: Write a URI for each of the previous resources.
Exercise 1.2: For each resource, list all the HTTP methods that you would develop and describe what they would do.
Exercise 1.3: Describe the application’s flow diagram for a client that wants to read the articles for a certain author.
Exercise 1.4: For each resource, think of at least one resource representation and specify it (remember that with every representation you must send the hypermedia necessary for the client to be able to follow the application flow). Also define a private MIME type for each resource representation.
For example, the bookmark could return a JSON representation like this:
{
"articlelistlink": "URI to `List of Articles` resource",
"newspaperlistlink": "URI to `List of Newspapers` resource",
"searchbydatelink": "URI to `Search by Date` resource",
"searchbynewspaperlink": "URI to `Search by Newspaper` resource",
"recentarticleslink": "URI to `Most recent articles` resource"
}
Content-Type: application/vnd.example.bookmark+json
Exercise 1.5: The 'Search by date' and 'Search by newspaper' resources are very similar. How could you make them point to the same resource (a search engine) using query parameters in the URI?
Exercise 2: In this exercise we will practice the topics explained in section 3.2.4. We will start from a defined list of resources for a concrete API and work from there. The API is invented and manages an automated house. The house can contain sensors and actuators: for example, temperature and light sensors, and heaters and LEDs as actuators.
API resources:
• Bookmark: Contains a list of links to the existing resources.
• List of sensors: A list of all the active sensors showing their id number and the URI to access their
information.
• List of actuators: A list of all the active actuators showing their id number and the URI to access their
information.
• Sensor: It represents a sensor. It contains information about what the sensor is capable of reading (light, sound, etc.), the link to a resource that returns its current reading, its refresh rate and an identification number.
• Sensor value: A resource that represents the current reading of the sensor.
• Actuator: It represents an actuator. It contains a list of the actions the actuator is capable of performing (start/stop, for example) and an identification number.
• List of Rules: Lists all the defined rules of the system.
• List of active rules: Lists all the rules whose condition is true at the moment of the request.
• Rule: It is the link between sensors and actuators. A rule specifies two arrays: one for conditions and one for actuations. The conditions array contains a list of conditions, each of which defines the sensor it applies to, the comparison operator (bigger than, smaller than, equal, etc.) and the reference value. The actuations array contains a list of objects, each of which contains the actuator and the action. Rules can be accessed through their id number.
Exercise 2.1: Write a URI for each of the previous resources.
Exercise 2.2: For each resource, list all the HTTP methods that you would develop and describe what they would be used for.
Exercise 2.3: Describe the application flow diagram for a client that wants to add a new rule to the system.
Exercise 2.4: For each resource, think of at least one resource representation and specify it (remember that with every representation you must send the hypermedia necessary for the client to be able to follow the application flow). Also define a private MIME type for each resource representation.
Exercise 2.5: What cache related property could be related to the ’refresh rate’ parameter existing in a sensor?
4.2
Cache exercises
Exercise 1 (Theoretical): Suppose there is a cache-ready system. It takes a request and, if there has been a previous request in the last T seconds, it returns the stored response; otherwise, it recalculates the response. The time spent generating a response from the stored data is 'x' seconds and the time spent generating a new response from the resource is 'y' seconds. Suppose also that incoming requests follow a Poisson distribution with arrival rate λ (req/s).
HINT: Consider a fixed span of time (duration T) and assume that the cached response expired just before this span, as shown in figure 4.1. The arrows represent requests. The red one represents a request answered with a non-cached response and the black ones represent requests answered with cached responses.
Figure 4.1: Cache model. (Requests along the time axis t (s); the validity time of a cached response has duration T (s).)
Exercise 1.1: Calculate the new average service time ($T_s' = 1/\mu$, in seconds).
Exercise 1.2: Calculate the relative improvement as $\frac{T_s - T_s'}{T_s}$, knowing that the old service time was $T_s = y$.
Exercise 1.3: Check that, assuming x = 1, y = 5, λ = 1 and T = 10, the new service time is 1.4 s and the relative improvement is 72%.
Exercise 2: Consider a fully developed shop API. All the resources are given.
Resource List:
1. Article: It represents an article, it contains its price, its description and the number of products in stock.
2. List of Categories: List of all the categories in the store.
3. List of Articles: List of all the articles existing in a certain category.
4. Shop Information: Plain information about the store such as legal name, physical shop location, etc.
5. My previous buys: Lists the last 5 purchases you have made in the shop.
6. Recommended for me: Lists 5 articles that are recommended for the user accessing the API based on
previous buys and a random factor.
7. Sales: List of articles on sale.
In this example, the user-based resources ('My previous buys' and 'Recommended for me') return different data depending on the authentication header found. For a non-authenticated client, they return a 401 (Unauthorized).
Exercise 2.1: From the point of view of the client, which resources have a lower variation rate and therefore should be cached?
Exercise 2.2: From the point of view of an intermediary cache which resources should be stored? Remember that an
intermediary cache may handle requests from multiple users.
Exercise 2.3: For each resource listed before list all the headers that you’d add to its requests and its responses based
on the responses of the previous exercises.
Solution exercise 1:
1.1:
Bookmark: '/'
Article: '/article/{id}'
Author: '/author/{id}'
List of articles: '/author/{id}/articles/'
Newspaper: '/newspapers/{id}'
List of newspapers: '/newspapers/'
Search by date: '/search/by_date/{dd-mm-yyyy}'
Search by newspaper: '/search/by_newspaper/{newspaper_id}'
Most recent articles: '/recent/'
1.2
Bookmark: GET
Article: GET, DELETE, POST
Author: GET
List of articles: GET, POST
Newspaper: GET
List of newspapers: GET
Search by date: GET
Search by newspaper: GET
Most recent articles: GET
1. The GET methods are always used to retrieve information.
2. DELETE on Article, should be used to delete it ONLY if the client making the request is authenticated and has
the right permissions.
3. POST on Article should modify its content ONLY if the client making the request is authenticated and has the
right permissions.
4. POST on List of articles should be used to create a new article ONLY if the client making the request is authenticated and has the right permissions.
1.3
1- GET the List of Newspapers
2- GET the newspaper
3- GET the list of authors
4- GET the list of articles
5- GET the articles.
1.4
1- Bookmark:
{
"articlelistlink": "URI to `List of Articles` resource",
"newspaperlistlink": "URI to `List of Newspapers` resource",
"searchbydatelink": "URI to `Search by Date` resource",
"searchbynewspaperlink": "URI to `Search by Newspaper` resource",
"recentarticlelink": "URI to `Most recent articles` resource"
}
Content-Type: application/vnd.example.bookmark+json
2- Article:
{
"newspaper": newspaper_id,
"author": author_id,
"article": {
"headline": "String",
"body": "String"
}
}
Content-Type: application/vnd.example.article+json
3- Author:
{
"information": {
"name": "String",
"city": "String",
"about_me": "String",
...
},
"newspaper": newspaper_id,
"articleslink": "URI to this author's list of articles"
}
Content-Type: application/vnd.example.author+json
4- List of articles:
[
{
"headline": "String",
"articlelink": "URI to the article"
},
...
]
Content-Type: application/vnd.example.articlelist+json
5- Newspaper:
{
"information": {
"name": "String",
"location": "String"
},
"authors": [
{
"authorname": "String",
"authorlink": "URI to the author resource"
},
...
]
}
Content-Type: application/vnd.example.newspaper+json
6- List of newspapers:
[
{
"name": "String",
"newspaperlink": "URI pointing to the newspaper"
},
...
]
Content-Type: application/vnd.example.newspaperlist+json
7- Search by date:
[
{
"headline": "String",
"articlelink": "URI to the article"
},
...
]
Content-Type: application/vnd.example.articlelist+json
8- Search by newspaper:
[
{
"headline": "String",
"articlelink": "URI to the article"
},
...
]
Content-Type: application/vnd.example.articlelist+json
9- Most recent articles:
[
{
"headline": "String",
"articlelink": "URI to the article"
},
...
]
Content-Type: application/vnd.example.articlelist+json
Solution exercise 2:
2.1:
Bookmark: '/'
List of sensors: '/sensors/'
List of actuators: '/actuators/'
Sensor: '/sensors/{sensor_id}'
Sensor value: '/sensors/{sensor_id}/value'
Actuator: '/actuators/{actuator_id}'
List of rules: '/rules/'
List of active rules: '/rules/active'
Rule: '/rules/{rule_id}'
2.2:
1- Bookmark: GET - Retrieve the links to the resources.
2- List of sensors: GET - Retrieve the list of sensors.
3- List of actuators: GET - Retrieve the list of actuators.
4- Sensor: GET - Retrieve a sensor's information.
   POST - Change the refresh_rate value.
5- Sensor value: GET - Retrieve the current value of a sensor.
6- Actuator: GET - Retrieve the information about an actuator.
7- List of rules: GET - Retrieve the list of all the existing rules.
   POST - Add a new rule.
8- List of active rules: GET - Retrieve the list of active rules.
9- Rule: GET - Retrieve a rule.
2.3:
1- GET Bookmark. Store the URIs pointing to the resources.
2- GET List of sensors. Store their ids.
3- GET Sensors until finding the desired one.
4- GET List of actuators. Store their ids.
5- GET Actuator until finding the desired one.
6- POST a new rule to the List of rules.
2.4: JSON Representation
1- Bookmark:
{
"sensorlistlink": "URI to `List of Sensors` resource",
"actuatorlistlink": "URI to `List of Actuators` resource",
"ruleslistlink": "URI to `List of Rules` resource",
"activeruleslistlink": "URI to `List of active Rules` resource"
}
Content-Type: application/vnd.myexample.bookmark+json
2- List of sensors:
[
{
"sensor_id": id_number,
"sensorlink": "URI to the sensor with `id_number` identification"
},
...
]
Content-Type: application/vnd.myexample.sensorlist+json
3- List of actuators:
[
{
"actuator_id": id_number,
"actuatorlink": "URI to the actuator with `id_number` identification"
},
...
]
Content-Type: application/vnd.myexample.actuatorlist+json
4- Sensor:
{
"sensor_id": id_number,
"magnitude": "String with the magnitude read by the sensor",
"units": "String indicating the units the sensor uses",
"refresh_rate": number representing the interval in seconds between readings,
"valuelink": "URI pointing to the current value of the sensor"
}
Content-Type: application/vnd.myexample.sensor+json
5- Sensor value:
{
"value": number
}
Content-Type: application/json
6- Actuator:
{
"actuator_id": id_number,
"actions": [
"One of the possible actions",
...
]
}
7- List of rules:
[
{Rule object},
...
]
Content-Type: application/vnd.myexample.rulelist+json
8- List of active rules:
[
{Rule object},
...
]
Content-Type: application/vnd.myexample.rulelist+json
9- Rule:
{
"rule_id": id_number,
"conditions": [
{
"sensor": id_number,
"operand": "String representing an operand",
"value": number representing the reference value
},
...
],
"actuations": [
{
"actuator": id_number,
"action": "String representing an action"
},
...
],
"active": true/false
}
Content-Type: application/vnd.myexample.rule+json
2.5
The refresh_rate corresponds directly to the validity time of the data. You could specify the 'Date' header together with the 'max-age' Cache-Control directive, or simply 'max-age' on its own, since the exact time the value was generated is not very important given its changing nature.
Solution exercise 1:
1.1:
Since a cache system has two different service times, the average service time is the expectation of the service time:
$$E[T_s] = \sum_{i=1}^{\infty} x_i \, p_i \qquad (4.1)$$
For our particular case:
$$E[T_s] = x \, p_{cached} + y \, p_{nocached} \qquad (4.2)$$
To find the probabilities of a response being cached or not, we use frequency analysis:
$$f_i = \frac{n_i}{N} \qquad (4.3)$$
In our case of study, N is the average number of arrivals in an interval of duration T:
$$N = \lambda T \qquad (4.4)$$
In an interval T there is 1 non-cached response and N - 1 cached responses, therefore:
$$p_{cached} = f_{cached} = \frac{n_{cached}}{N} = \frac{N-1}{N}, \qquad p_{nocached} = f_{nocached} = \frac{n_{nocached}}{N} = \frac{1}{N} \qquad (4.5)$$
Applying equation 4.2:
$$T_s' = E[T_s] = x \, \frac{N-1}{N} + y \, \frac{1}{N} \qquad (4.6)$$
1.2
Directly applying the given equation with $T_s = y$ and the result of 4.6, we obtain:
$$\frac{T_s - T_s'}{T_s} = \frac{y - \frac{y + x(N-1)}{N}}{y} \qquad (4.7)$$
Rearranging the equation we get:
$$\frac{T_s - T_s'}{T_s} = 1 - \frac{x}{y} \, \frac{N-1}{N} - \frac{1}{N} \qquad (4.8)$$
which is valid for $x/y < 1$.
1.3
Simply applying the values to the equations (with $N = \lambda T = 10$) we get:
$$T_s' = 1 \cdot \frac{10-1}{10} + 5 \cdot \frac{1}{10} = 1.4 \text{ s} \qquad (4.9)$$
$$\frac{T_s - T_s'}{T_s} = 1 - \frac{1}{5} \cdot \frac{10-1}{10} - \frac{1}{10} = 0.72 = 72\% \qquad (4.10)$$
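The closed-form results can also be checked numerically. A small sketch that evaluates equations (4.6) and (4.8) directly, assuming, as the model does, that N = λT is at least 1; the function names are hypothetical.

```python
def new_service_time(x: float, y: float, lam: float, T: float) -> float:
    """T_s' from equation (4.6), with N = lambda * T (equation 4.4)."""
    N = lam * T
    if N < 1:
        raise ValueError("the model assumes at least one request per interval")
    return x * (N - 1) / N + y / N


def relative_improvement(x: float, y: float, lam: float, T: float) -> float:
    """(T_s - T_s') / T_s with T_s = y, i.e. equation (4.8)."""
    N = lam * T
    return 1 - (x / y) * (N - 1) / N - 1 / N
```

With x = 1, y = 5, λ = 1 and T = 10 the two functions return 1.4 and 0.72, up to floating-point rounding, matching exercise 1.3.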
Solution exercise 2:
2.1
Article: High variation due to the products-in-stock variable -> No cache
List of categories: Very low variation -> Cache
List of articles: Low variation -> Cache
Shop information: Very low variation -> Cache
My previous buys: Low variation -> Cache
Recommended for me: High variation -> No cache
Sales: Medium variation -> Cache with low validity time
2.2
All the resources that are the same for every customer and do not have high variation rates:
1. List of categories
2. List of articles
3. Shop information
4. Sales
2.3
Article
Request:
Cache-Control: no-cache (only if the information needed is critical)
If-Modified-Since: -HTTP DATE-
Response:
Cache-Control: public
Date: -HTTP DATE-
Expires: -HTTP DATE close to the current date-
List of categories, List of articles, Shop information and Sales:
Request:
If-Modified-Since: -HTTP DATE-
Response:
Cache-Control: public
Date: -HTTP DATE-
Expires: -HTTP DATE fairly far from the current date-
My previous buys:
Ideally, you would store this list as long as possible, but right after buying an article the list should be updated.
Request:
Cache-Control: no-cache
Response:
Cache-Control: private
Date: -HTTP DATE-
Expires: -HTTP DATE fairly far from the current date-
Recommended for me:
Request:
Cache-Control: no-cache
Response:
Cache-Control: private
Date: -HTTP DATE-
Expires: -HTTP DATE neither very close to nor very far from the current date-
Chapter 5
Other API architectures
Now that we have analyzed in depth how to design REST APIs, let's look at other existing styles and protocols.
5.1
RPC APIs
Remote Procedure Call (RPC) was the first model that was created for developing distributed APIs. It is based on
message exchanges and follows the service oriented architecture.
The client has to generate a message that exactly identifies a procedure and also contains all the parameters the procedure needs to work. Procedures can be subroutines, functions, methods, services, system calls or any executable object.
When the server receives the message, it inspects the procedure identifier and calls the indicated procedure, mapping the message parameters to the procedure arguments.
The main problem with the RPC model is that applications are highly coupled. One application is highly coupled to another when the first one depends strongly on the second one. In other words, if the procedure changes, the application has to change too. For example, if a procedure suddenly changes its output or the arguments it takes, the applications that call it have to change immediately.
RPC is meant to work with any transport protocol. However, the application must be aware of which protocol is being used, because some adjustments may be required. For example, if the protocol used is not reliable (UDP), the application must implement its own time-out, retransmission and duplicate detection policies.
There are different standards for RPC. The most popular ones are Sun Microsystems' ONC-RPC, specified in RFC1831[30], and the Open Software Foundation's DCE/RPC, specified in DCE 1.1 C706 1. They define authentication protocols, data formats, data structures, etc.
Nowadays pure RPC is rarely used. Instead, some 'flavored' versions of RPC are used, such as XML-RPC or JSON-RPC.
XML-RPC is a protocol based on RPC developed by Dave Winer in 1998. It applies the basics of RPC but uses XML to structure the parameters used as input and output, allowing more complex structures such as arrays. It always uses HTTP POST requests. JSON-RPC is basically the same, but uses JSON instead of XML.
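The difference between the two RPC flavors is essentially the serialization. A sketch that builds both request bodies for the same call, using the standard `xmlrpc.client.dumps` for XML-RPC and a hand-built JSON-RPC 2.0 object (2.0 is a later revision of JSON-RPC). The procedure name `GetPlaneTicket` and its parameters are illustrative assumptions; in both cases the body would be sent with an HTTP POST.

```python
import json
import xmlrpc.client
from typing import Tuple


def build_rpc_bodies(method: str, params: tuple) -> Tuple[str, str]:
    """Serialize the same procedure call as XML-RPC and as JSON-RPC 2.0."""
    # XML-RPC: a <methodCall> element with <methodName> and typed <params>.
    xml_body = xmlrpc.client.dumps(params, methodname=method)
    # JSON-RPC 2.0: method name, positional params and a request id.
    json_body = json.dumps({"jsonrpc": "2.0", "method": method,
                            "params": list(params), "id": 1})
    return xml_body, json_body
```

In both bodies the client names the exact procedure to run, which is the coupling the text describes.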
5.2
Message based APIs
Message-based APIs evolved from RPC by adding a new level of abstraction. They were designed to avoid the tight coupling of RPC. Instead of the client naming a procedure, as in RPC, in message-based APIs it is the server that decides which procedure to execute depending on the message received from the client. The message is sent to a designated URI; its body contains the needed data, and it may contain headers that make the message self-descriptive.
1 http://pubs.opengroup.org/onlinepubs/9629399/
Message-based APIs usually use standardized message formats like SOAP, along with other standard specifications defined by bodies such as the W3C. For example, they usually use WSDL to define and describe services, and they may use the WS-Policy and WS-Security specifications to define authentication and security schemes, etc.
SOAP is a message-based protocol developed by Microsoft, IBM, DevelopMentor and UserLand. It evolved from
Dave Winer’s XML-RPC protocol, but it is much more complex.
SOAP defines a message construct that structures information in an XML format with three blocks:
• An Envelope, which contains the SOAP message.
• A Header, which contains control information such as authentication credentials.
• A Body, which contains the data.
Example:
<?xml version="1.0" ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<p:oneBlock xmlns:p="http://example.com"
env:role="http://example.com/Log">
...
...
</p:oneBlock>
<q:anotherBlock xmlns:q="http://example.com"
env:role="http://www.w3.org/2003/05/soap-envelope/role/next">
...
...
</q:anotherBlock>
<r:aThirdBlock xmlns:r="http://example.com">
...
...
</r:aThirdBlock>
</env:Header>
<env:Body >
...
...
</env:Body>
</env:Envelope>
This SOAP message contains an envelope, which contains a header with three blocks of control information and a body.
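The envelope structure can also be produced programmatically. A minimal sketch with Python's `xml.etree.ElementTree`, using the SOAP 1.2 envelope namespace from the example above; the header blocks and the body payload are supplied by the caller, and the function names are hypothetical. Since the SOAP Header is optional, it is omitted when there are no header blocks.

```python
import xml.etree.ElementTree as ET
from typing import List

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 namespace
ET.register_namespace("env", SOAP_ENV)  # serialize as env:Envelope, env:Body...


def qname(local: str) -> str:
    """Qualified name in the SOAP envelope namespace (ElementTree syntax)."""
    return f"{{{SOAP_ENV}}}{local}"


def build_envelope(header_blocks: List[ET.Element],
                   body_payload: ET.Element) -> str:
    """Wrap optional header blocks and a body payload in a SOAP envelope."""
    envelope = ET.Element(qname("Envelope"))
    if header_blocks:  # the Header is optional in SOAP
        header = ET.SubElement(envelope, qname("Header"))
        header.extend(header_blocks)
    body = ET.SubElement(envelope, qname("Body"))
    body.append(body_payload)
    return ET.tostring(envelope, encoding="unicode")
```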
SOAP messages may carry information, tasks to execute and events. When the server receives a message, it determines the exact procedure to call. Clients may send three message types: Command messages to execute a task on the server, Event messages to notify an event and Document messages to exchange documents. There is one more type of message, called Fault, which the server returns as an error message.
Even though SOAP can be used on top of any transport protocol, most of the time it is used over HTTP, and it always uses the GET and POST methods. SOAP defines some functionalities that are already implemented by the HTTP protocol, such as Fault messages, which HTTP already covers with status codes.
An important technology involved in SOAP architectures is the Web Services Description Language (WSDL). WSDL documents are written in XML. They describe web services by listing the operations that a client can call (and the endpoint where to call them, usually a URI) and by defining the structure of the messages sent and received.
Example:
<operation name="GetLastTradePrice">
<soap:operation soapAction="http://example.com/GetLastTradePrice"/>
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
</operation>
In this example you can see how an operation (service) is defined. The 'soap:operation' element specifies the soapAction URI that identifies the operation as an action (in SOAP 1.1 this value is sent in the SOAPAction HTTP header). The input and output elements specify, through use="literal", that both the request body and the response body are literal XML as described by the schema, rather than SOAP-encoded data.
5.3
Comparison
There are two basic differences between web services and REST APIs.
Web services are always built using SOAP messages described by WSDL documents, so their data format is always XML, while REST APIs can use any data format and can even serve the same resource in several formats simultaneously.
The fact that web services require SOAP adds one more point of divergence: SOAP defines multiple interfaces to deal with different kinds of interaction with the server (SOAP messages may carry information, tasks to execute and events). REST, on the other hand, requires a single uniform interface.
In short, SOAP is a structured, well-defined protocol, while REST is a flexible architectural style that leaves developers free to build their APIs in their own way.
SOAP used to be the default option in enterprise environments, but over time many enterprises have changed their policies and adopted REST. Amazon Web Services stated that "(they) are still seeing an 80% REST / 20% SOAP usage pattern"[27].
This is largely because REST implementations are simpler, easier to understand and easier to work with.
There is one area that web services cover and REST does not: aspects of the service beyond its interface. REST only deals with the interface, while web services define a family of additional protocols, known as the WS-* protocols, to cover aspects such as legal terms, terms of use, etc.
Let's see the practical differences between the three types of API that we have studied with a simple example:
Imagine that you create three APIs: one following the RPC model, another following a message-based model and the last one a REST API. They have the same functionality: they access a database row that represents a plane ticket from an airline.
Using the RPC API you might have to perform a call similar to the following:
URI: api.myairline.com/
Procedure: QueryDatabase(query)
Params: SELECT * FROM plane_tickets WHERE name='MyName' AND date='21/07/2015'
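In this case, the server calls the specified procedure (it could also have been identified by a procedure identifier instead of the function name). Note that the client needs to know implementation details such as the name of the table and its columns.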
On the other hand, in a message based API, the client might call something similar to:
URI: api.myairline.com/GetPlaneTicket
Params: name='MyName',date='21/07/2015'
In this example, the server takes the parameters and generates the query that will be sent to the database. The client does not need to know the implementation details, but it still needs to know that it has to send a message that executes a task and that the response will be a message carrying data.
Finally, in a REST API the ticket would be represented as a resource identified by a URI. To reach the ticket's URI, the client could access other resources that contain links to the desired resource:
GET: api.myairline.com/tickets/user/
GET: api.myairline.com/tickets/user/7657/
In this example, the first URI is used to access a user's tickets, and it returns a list of tickets. Each entry in the list contains some information (the date, for example) and a link to the corresponding ticket resource. Finally, the client parses the response, selects the ticket that it wants to retrieve and sends a GET request to the URI linked to it.
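The following sketch imitates that client behaviour without a network: a dictionary plays the role of the server, mapping URIs to JSON representations. The response layout (a 'tickets' list whose entries carry a 'date' and an 'href' link, and the 'seat' field of each ticket) is an assumption for illustration; a real client would issue HTTP GET requests instead of dictionary lookups.

```python
import json

# Hypothetical server state: URI -> JSON representation returned by a GET
FAKE_SERVER = {
    "api.myairline.com/tickets/user/": json.dumps({
        "tickets": [
            {"date": "14/07/2015", "href": "api.myairline.com/tickets/user/7601/"},
            {"date": "21/07/2015", "href": "api.myairline.com/tickets/user/7657/"},
        ]
    }),
    "api.myairline.com/tickets/user/7601/": json.dumps(
        {"name": "MyName", "date": "14/07/2015", "seat": "3C"}
    ),
    "api.myairline.com/tickets/user/7657/": json.dumps(
        {"name": "MyName", "date": "21/07/2015", "seat": "12A"}
    ),
}


def get(uri):
    """Stand-in for an HTTP GET: return the decoded representation."""
    return json.loads(FAKE_SERVER[uri])


def find_ticket(entry_uri, date):
    """Follow the links in the ticket list to the ticket for `date`."""
    listing = get(entry_uri)
    for item in listing["tickets"]:
        if item["date"] == date:
            # The client never builds the ticket URI itself; it follows the link
            return get(item["href"])
    return None


ticket = find_ticket("api.myairline.com/tickets/user/", "21/07/2015")
```

Only the entry URI is hard-coded in the client; every other URI comes from the representations returned by the server, so the server can change its URI layout without breaking the client.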
Chapter 6
Django Development
6.1
Introduction to DJANGO
Django is a high-level Python web framework built to make developing web applications much easier. Its power comes from separating application development from low-level hassles such as database connections. Another important aspect of Django is the modularity it brings: a project is a set of assembled applications.
[Figure: HTTP request -> middleware -> urls.py (URI mapped to a view) -> views.py (reads the model, renders a template) -> response content -> middleware -> HTTP response]
Figure 6.1: Django’s request-response procedure.
Django is based on the MVC pattern1. The most important parts of a Django project are Models, Views and Templates. While Django models correspond directly to MVC models, Django views and templates are quite different from their MVC counterparts.
1 http://en.wikipedia.org/wiki/Model-view-controller
As figure 6.1 shows, when Django receives a request it first passes it through the middleware, which performs repetitive tasks such as checking authentication. Then Django takes the URL of the request and maps it to a single view using the patterns stored in the file urls.py. Finally, the view is called: it accesses the model to collect the data needed to build the response and generates the response using a template.
Since the views decide what the response will look like, there is a tendency to believe that they correspond to the controller in an MVC pattern, while the controller is actually Django itself: the framework receives the request, hands it to the appropriate modules, decides which view is called and calls it, and sends the response back. Models map directly to what the MVC pattern calls a model: they store the application's data. Finally, MVC's view is split into two parts in Django: views decide WHAT data from the model is going to be shown, and templates decide HOW that data is shown (templates are actually optional, and sometimes the view also performs the template's function).
6.2
Starting a new project
This section assumes that the reader has already installed Python and Django.
It is advisable to use a tool such as virtualenv to avoid dependency and version problems. To learn how to use it, visit https://virtualenv.pypa.io/en/latest/userguide.html.
Starting a new project in Django is very simple: you only need the django-admin tool, which is installed together with the Django framework.
$ django-admin.py startproject testproject
After this, the new project has been created. To create a new application for the project, we use the manage.py script located in testproject/manage.py. Since startapp creates the application in the current directory, run it from inside the project directory:
$ cd testproject
$ python3 manage.py startapp testapp
6.3
Project structure
This section lists all the files that Django created in section 6.2 and explains their purpose.
testproject/
    manage.py
    testapp/
        admin.py
        __init__.py
        migrations/
            __init__.py
        models.py
        tests.py
        views.py
    testproject/
        __init__.py
        __pycache__/
            __init__.cpython-34.pyc
            settings.cpython-34.pyc
        settings.py
        urls.py
        wsgi.py
As you can see, Django has created a 'testproject' directory, a 'testapp' directory and a 'manage.py' script inside the outer 'testproject' directory. The inner 'testproject' directory, of which there is only one per project, contains the project configuration scripts. The most important ones for now are 'urls.py' and 'settings.py'. As stated in the introduction, the urls script maps URLs to views, using regular expressions. The settings script contains the database connection configuration, the list of applications used in the project, the list of middleware used in the project, etc. Finally, 'wsgi.py' is the entry point for the 'Web Server Gateway Interface', which will be detailed in section 6.9.
On the other hand, 'testapp' is a directory created to accommodate a single application. A project is usually formed by multiple applications, each one stored in a different directory. An application directory contains two of the three scripts mentioned in the introduction: models and views. Templates are usually stored in a new directory inside the app directory, but they can be stored anywhere. There is also a 'tests' script, used to run tests in an automated way, and an 'admin' script, which belongs to an administration application that is installed by default but will not be used in this document.
6.4
Models
Models in Django provide high-level functionality and allow developers to avoid low-level tasks such as database management, SQL querying, etc.
To create a model, you only need to create a class that extends 'models.Model' and write the model's attributes as class attributes.
For example:
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=30)
CharField represents a string (it requires a max_length argument). Django offers a set of pre-built fields that are automatically managed by the framework; you can see the most important ones in table 6.1 and the whole list in the documentation2.
Name                    Stored data
BigIntegerField         A 64-bit integer
BooleanField            True or False
CharField               A small or medium-sized string
DateField               A date, represented by a Python datetime.date instance
EmailField              A string that is checked to be a valid email address
FileField               A file uploaded to the server
FloatField              A floating-point number
IntegerField            A 32-bit integer
GenericIPAddressField   An IPv4 or IPv6 address
TextField               A large string
URLField                A CharField for a URL
Table 6.1: Django’s important field list.
Fields in Django accept some options. For example, if you want to allow a field to be empty, you can use the option 'Field.blank=True'. A field accepts more than one option. Some options are common to all fields, and some fields have specific options: for example, 'Field.blank' is available to all field types, while GenericIPAddressField accepts the option 'GenericIPAddressField.protocol', which lets you decide whether to accept IPv4 addresses, IPv6 addresses or both. Table 6.2 lists the important field options, and you can find the full list in the documentation3.
For example:
NIF = models.CharField(max_length=9, unique=True, primary_key=True)
This creates a string field of at most 9 characters whose value must be unique across the table and which is used as the primary key identifying each row (primary_key=True already implies unique=True).
2 https://docs.djangoproject.com/en/1.8/ref/models/fields/#field-types
3 https://docs.djangoproject.com/en/1.8/ref/models/fields/#field-options
Option          Value                                     Description
null            True or False                             When True, Django stores empty values as NULL in the database.
blank           True or False                             When True, the field is allowed to be blank. It is validation-related, while null is database-related.
choices         Iterable of pairs: ([A, B], [C, D], ...)  The possible choices for the field. The first element of each pair (A and C) is the value stored in the database; the second (B and D) is its human-readable equivalent.
default         A value or a callable object              The default value, used when a model instance is created and no value is provided for the field.
error_messages  A dictionary keyed by error code          Overrides the default messages that the field may raise.
primary_key     True or False                             When True, the field is the model's primary key. If no field in a model has primary_key=True, Django creates an AutoField to hold the primary key.
unique          True or False                             Forces the field's value to be unique in the table. If there is a conflict, django.db.IntegrityError is raised.
Table 6.2: Django’s important field options list.
6.4.1
Model relationships
Django accepts relationships between models. There are three basic relationships allowed: one to one, many to one
and many to many.
One-to-one relationships are defined with a 'OneToOneField', which takes a model class as its argument. It may be useful, for example, to record who a person is married to.
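from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=30)
    DNI = models.CharField(max_length=9)
    # 'self' refers to the model being defined; null=True lets the link be empty
    isMarriedTo = models.OneToOneField('self', blank=True, null=True,
                                       on_delete=models.SET_NULL)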
A one-to-one relationship can be changed as if it were a normal field. The link is stored only in the model that declares the field, but Django also adds a reverse accessor to the related model (here, 'person'), so the relationship can be followed from both sides. Remember: if you retrieve a model from the database and change some of its attributes, you must call its 'save()' method or the changes will not be applied.
Many-to-one relationships are defined with a 'ForeignKey' field, which also takes a model class as its argument. It could be used, for example, to record who owns each house.
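from django.db import models

class House(models.Model):
    address = models.CharField(max_length=50)
    # CASCADE: deleting a Person also deletes their houses
    landlord = models.ForeignKey(Person, on_delete=models.CASCADE)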
When you define a many-to-one relationship, Django adds a new attribute to the 'one' side (in this case 'Person'), named after the 'many' class in lowercase followed by '_set'. In this example, Django adds an attribute called 'house_set' to 'Person'. You can use it to add new 'House' objects to a 'Person' with the method 'add()', which takes a 'House' object as argument. You can also detach an object from the set without deleting it from the database with 'remove(object)', and detach all the objects in the set with 'clear()'; for a ForeignKey these two methods only exist when the field has null=True, because the detached objects are left without a landlord. Finally, the method 'create()' applied to the set creates a new object and adds it to the set automatically. Remember again that if you modify some objects you need to call 'save()' or the changes will be lost. If you delete the 'one' object from the database (in this case the Person), the 'many' objects that belong to it (in this case the Houses) are deleted as well, which is the on_delete=CASCADE behaviour.
Many-to-many relationships are implemented with the ManyToManyField field. They are useful, for example, for students and subjects (one student attends many subjects and each subject has many enrolled students).
from django.db import models

class Subject(models.Model):
    name = models.CharField(max_length=50)
    ECTS = models.IntegerField()
    Lab = models.BooleanField(default=False)

class Student(models.Model):
    name = models.CharField(max_length=50)
    age = models.IntegerField()
    subjects = models.ManyToManyField(Subject)
In many-to-many relationships each side can reach the other. In this case, you can access the subjects a student is taking through the 'subjects' attribute, and the students enrolled in a subject through the 'student_set' attribute of 'Subject', just as in many-to-one relationships. The same methods (add, remove, clear, etc.) are also available. If you need to store data about the relationship itself, you can use an intermediate model4.
6.4.2
Managers and QuerySets
The models we have seen so far only define a structure; they do not represent any data in the database. To actually manage data you need a Manager.
Every model gets a Manager from models.Model. Managers are responsible for the communication with the database (they are Data Access Objects5): they perform the queries and return the retrieved data (in QuerySets). The default manager is named 'objects'. Managers offer two basic methods. The first one is 'all()', which returns every object of the model stored in the database. The second one is 'get_queryset()', which is equivalent to 'all()'. The trick is that you can create your own manager by extending models.Manager and overriding 'get_queryset()' to reduce the number of results (that is, to filter them) and make it more specific.
QuerySets are collections of objects retrieved from the database. They are obtained from managers or from other QuerySets: several methods that return QuerySets can be called on a Manager or on another QuerySet, so calls can be chained. The most important methods are listed in table 6.3, and you can find the complete list in Django's documentation6.
6.5
Views
As stated before, a view is a callable object that takes an 'HttpRequest' and returns an 'HttpResponse'. There are two basic ways of developing views: the first one is to write a function that renders the view; the second one is to use a class-based view. Class-based views let developers structure views better and reuse code.
6.5.1
Function views
When developing function views you’ll have to handle Requests and Responses.
4 https://docs.djangoproject.com/en/1.8/topics/db/models/#intermediary-manytomany
5 http://en.wikipedia.org/wiki/Data_access_object
6 https://docs.djangoproject.com/en/1.8/ref/models/querysets/#methods-that-return-new-querysets
Method                                      Use
filter(field1='value1', field2='value2')    Returns a QuerySet with only the results that match the condition(s) passed.
exclude(field1='value1', field2='value2')   The opposite of filter: returns the results that do not match the condition(s) passed.
order_by('field1', 'field2')                Orders the QuerySet by the fields specified. For descending order, write '-' before the field: ('-field1').
reverse()                                   Reverses the current order of the QuerySet.
Table 6.3: Django’s important QuerySet methods.
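To illustrate how these methods compose, the framework-free sketch below imitates them on a plain list of dictionaries. It is a toy model, not Django's implementation: a real QuerySet is lazy and translates the whole chain into a single SQL query, whereas this hypothetical ToyQuerySet class filters and sorts in memory.

```python
class ToyQuerySet:
    """In-memory imitation of a few QuerySet methods (illustration only)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, **conditions):
        # Keep rows where every field equals the requested value
        return ToyQuerySet(r for r in self._rows
                           if all(r.get(k) == v for k, v in conditions.items()))

    def exclude(self, **conditions):
        # Drop rows that match all the conditions (the opposite of filter)
        return ToyQuerySet(r for r in self._rows
                           if not all(r.get(k) == v for k, v in conditions.items()))

    def order_by(self, *fields):
        rows = list(self._rows)
        # Sort by the last field first; the sort is stable, so earlier fields win
        for field in reversed(fields):
            name = field.lstrip('-')
            rows.sort(key=lambda r: r[name], reverse=field.startswith('-'))
        return ToyQuerySet(rows)

    def reverse(self):
        return ToyQuerySet(reversed(self._rows))

    def __iter__(self):
        return iter(self._rows)


people = [
    {'name': 'Ana', 'age': 30, 'city': 'BCN'},
    {'name': 'Joan', 'age': 25, 'city': 'BCN'},
    {'name': 'Marc', 'age': 30, 'city': 'GIR'},
]
# Every method returns a new object, so calls can be chained
names = [p['name'] for p in ToyQuerySet(people).filter(city='BCN').order_by('-age')]
```

Because each method returns a new object instead of modifying the original, a base query can be reused and refined in several directions, which is the same design Django's QuerySets follow.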
HttpRequest7 objects contain all the information from the request. They have some attributes and some methods. The methods are not very useful for our purposes and are omitted in this document, but you can read about them in Django's documentation8. Attributes, on the other hand, will be very useful: table 6.4 lists the most important ones, and the rest are described in Django's documentation9.
Attribute   Description
body        The raw data received from the client, as a byte string.
method      A string with the HTTP method used in the request ('GET', 'PUT', ...).
GET         A dictionary-like object with the variable-value pairs sent in the URL's query string.
POST        A dictionary-like object with the variable-value pairs sent as form data in the body of a POST request.
FILES       A dictionary-like object with the files uploaded to the server through a form (if any).
META        A dictionary with all the HTTP headers sent (plus other server variables).
Table 6.4: Django’s HttpRequest object’s attributes.
HttpResponse objects are generated by the view and contain the body of the response and the desired HTTP headers. The content and a few header-related values (such as the content type and the status code) can be passed to the HttpResponse constructor; any other header is added afterwards using dictionary-style access:
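from django.http import HttpResponse

# Content type passed to the constructor, other headers set afterwards
response = HttpResponse(body, content_type="text/plain")
response['Date'] = 'Mon, 4 May 2015 15:05:30 GMT'

# Equivalent: every header set after construction
response = HttpResponse(body)
response['Content-Type'] = 'text/plain'
response['Date'] = 'Mon, 4 May 2015 15:05:30 GMT'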
7 https://docs.djangoproject.com/en/1.8/ref/request-response/#httprequest-objects
8 https://docs.djangoproject.com/en/1.8/ref/request-response/#methods
9 https://docs.djangoproject.com/en/1.8/ref/request-response/#attributes
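NOTICE: The constructor only accepts a few predefined lower-case keyword arguments (content, content_type, status, reason and charset); any other header must be set through the response's dictionary-like interface, using the exact header name defined by the HTTP protocol (for example 'Content-Type', not 'Content_Type').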
Example 1: let's develop a simple view that takes a parameter named 'name' from the request's query string and returns an HTML string.
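from django.http import HttpResponse
from django.utils.html import escape

def salute(request):
    # Read 'name' from the query string (e.g. /salute/?name=Ana)
    name = request.GET.get('name', '')
    response = '<html>'
    response += ' <body>'
    # Escape user input so it cannot inject HTML into the page
    response += ' Hello ' + escape(name)
    response += ' </body>'
    response += '</html>'
    return HttpResponse(response)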
When we only need to return a particular status code, we can use one of the HttpResponse subclasses10 (inside a view):
from django.http import HttpResponseNotFound

return HttpResponseNotFound("The resource you are looking for is not valid")
6.5.2
Class-Based views
Django uses callable objects as views, but if the called function belongs to a class you gain some advantages, such as code reuse. For example, you could write a single helper method that handles OPTIONS requests for all your views.
Django defines a Base View that you can extend to inherit the following functionalities:
• Validates arguments passed into the view configuration
• Prevents using arguments named after HTTP methods
• Collects arguments passed in the URL configuration
• Keeps request information in a convenient place for methods to access
• Verifies that a requested HTTP method is supported by the view
• Automatically handles OPTIONS requests
• Dispatches to view methods based on the requested HTTP method
It has an attribute, 'http_method_names', that lists all the HTTP methods the view will handle.
Django also has some simple built-in views that can be very useful, such as ListView and DetailView. ListView is designed to take a list of objects from the model and display them by rendering a template. DetailView is designed to take a single object and generate an output with it through a template. To use them, you only need to create a class that extends one of these classes. We will not look at them in depth because they are not useful when developing an API. Instead, the following example extends the base View class and answers each HTTP method with a different message:
from django.views.generic import View
from django.http import HttpResponse

class MyView(View):
    http_method_names = ['get', 'post', 'put', 'delete', 'options']

    def get(self, request, *args, **kwargs):
        return HttpResponse("This request was a get request")

    def post(self, request, *args, **kwargs):
        return HttpResponse("This one was a post")

    def put(self, request, *args, **kwargs):
        return HttpResponse("This other one was a put")

    def delete(self, request, *args, **kwargs):
        return HttpResponse("The last one was a delete")

10 https://docs.djangoproject.com/en/1.8/ref/request-response/#httpresponse-subclasses
Developing a function-based view with the same functionality requires messier code:
from django.http import HttpResponse, HttpResponseNotAllowed

def my_view(request, *args, **kwargs):
    if request.method == 'GET':
        return HttpResponse("This request was a get request")
    elif request.method == 'POST':
        return HttpResponse("This one was a post")
    elif request.method == 'PUT':
        return HttpResponse("This other one was a put")
    elif request.method == 'DELETE':
        return HttpResponse("The last one was a delete")
    elif request.method == 'OPTIONS':
        # The allowed methods are advertised in the Allow header
        response = HttpResponse()
        response['Allow'] = 'GET, POST, PUT, DELETE, OPTIONS'
        return response
    # Any other method gets a 405 Method Not Allowed response
    return HttpResponseNotAllowed(['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS'])
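The advantage of the class-based version comes from how the base View dispatches requests: it lowercases the HTTP method, checks it against http_method_names and looks up a method with that name on the instance. The framework-free sketch below imitates that mechanism; MiniView, TicketView and the (status, body) tuples are hypothetical simplifications, not Django's API.

```python
class MiniView:
    """Simplified imitation of method-based dispatch (not Django's View)."""

    http_method_names = ['get', 'post', 'put', 'delete', 'options']

    def dispatch(self, method):
        name = method.lower()
        # Look up a handler named after the HTTP method, if it is allowed
        if name in self.http_method_names and hasattr(self, name):
            return getattr(self, name)()
        return (405, 'Method Not Allowed')

    def allowed_methods(self):
        return [m.upper() for m in self.http_method_names if hasattr(self, m)]

    def options(self):
        # OPTIONS is answered generically from the handlers that exist
        return (200, 'Allow: ' + ', '.join(self.allowed_methods()))


class TicketView(MiniView):
    def get(self):
        return (200, 'This request was a get request')

    def delete(self):
        return (200, 'The last one was a delete')


view = TicketView()
```

Adding support for a new method only requires defining a new handler, and the generic OPTIONS answer updates automatically, whereas the if/elif version has to keep its Allow header in sync by hand.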
6.6
URI patterns
The file 'urls.py' contains a list of url objects, which must be called urlpatterns. The url() function takes five arguments:
• Pattern: A string that represents a Python regular expression.
• View: The callable object that will be called if the URL matches the pattern.
• kwargs: (OPTIONAL, DEFAULT=None) A dictionary with extra arguments that will be passed to the view.
• Name: (OPTIONAL, DEFAULT=None) A string that gives the pattern a name.
• Prefix: (OPTIONAL, DEFAULT='') It is not needed at all: it was deprecated in Django 1.8 and removed in Django 1.10, so it should always be left empty.
Examples:
from django.conf.urls import url
from . import views

urlpatterns = [
    url(r'^index/$', views.index, name='index'),
    url(r'^users/?$', views.user_list, name='user-list'),
    url(r'^users/(?P<pk>\d+)/?$', views.user, name='user-detail'),
]
In the first pattern we can see a '^' character. It matches the start of the string (notice that Django removes the first part of the URL: for example, in the URL 'www.test.com/index/' Django trims 'www.test.com/', so the string to be matched is 'index/'). We also see a '$' character, which matches the end of the string. All the other characters match themselves, so this pattern catches exactly the string 'index/'. If we removed the '^' character, it would match any string that ends with 'index/': for example, 'test-index/' would be a match. On the other hand, if we removed the '$' symbol, it would match any string that starts with 'index/': for example, 'index/test' would match. Finally, if we removed both '^' and '$', it would match any string that contains the substring 'index/': for example, 'this is an index/ test' would also match.
In the second pattern we can see a '?' character after the slash. It makes the preceding slash optional: the pattern matches 'users' followed by zero or one slash, so both 'users' and 'users/' match.
In the third pattern we can see a \d+ sequence. It matches a decimal digit (\d) repeated one or more times. We can also see the structure (?P<name>pattern), which is a named group. If the string matches and the view is called, the view receives a keyword argument called 'name' whose value is the part of the URL that matched the inner pattern. In this example, 'users/1/' matches, and the view is called with a keyword argument 'pk' whose value is the string '1'.
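These rules can be checked with Python's re module directly. The sketch below is a simplified, framework-free imitation of how the URL dispatcher tries each pattern in order with re.search and turns named groups into keyword arguments; the resolve() helper is hypothetical and not Django's own resolver.

```python
import re

# The three patterns from the urls.py example above, with their names
PATTERNS = [
    (r'^index/$', 'index'),
    (r'^users/?$', 'user-list'),
    (r'^users/(?P<pk>\d+)/?$', 'user-detail'),
]


def resolve(path):
    """Return (name, kwargs) for the first pattern matching `path`.

    `path` is the URL without the domain and leading slash, e.g. 'users/1/'.
    """
    for regex, name in PATTERNS:
        match = re.search(regex, path)
        if match:
            # Named groups become keyword arguments for the view
            return name, match.groupdict()
    return None, {}
```

For example, resolve('users/1/') returns ('user-detail', {'pk': '1'}): the captured value is always a string, so a view that needs a number must convert it itself.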
If you want to learn more about regular expressions in Python, see the Python documentation11. There is also a handy online testing tool12.
6.7
Formatting the output
Generally, when creating an API, what we really need is to exchange structured text. We have seen that receiving and sending plain text (or HTML, since we treat it as text) is very easy, but in most applications this is not very useful, since HTML is hard to parse. The most widespread text-based formats are JSON and XML. Django provides 'serializers' that convert a QuerySet into JSON, XML or YAML.
To serialize objects, you call the 'serializers.serialize()' function. It takes a string naming the format, a QuerySet (or any iterable of model instances) and some optional options, and returns a string in the requested format.
To deserialize the data, on the other hand, we will not use Django's own function ('serializers.deserialize'). Instead we will use Python's json module.
Example: Serialize a list of cars stored in the database:
from django.core import serializers
from carlist.models import Car

car_list = Car.objects.all()
serialized_data = serializers.serialize('json', car_list)
Result:
[
    {
        "model": "carlist.car",
        "fields": {"model": "ModelA", "brand": "BrandA", "price": 20000},
        "pk": 1
    },
    {
        "model": "carlist.car",
        "fields": {"model": "ModelB", "brand": "BrandB", "price": 24000},
        "pk": 2
    },
    {
        "model": "carlist.car",
        "fields": {"model": "ModelC", "brand": "BrandC", "price": 18000},
        "pk": 3
    }
]
The output shows that the three items stored in the database have been exported into the string. For each object, it also contains its id number ('pk') and the model to which it belongs ('model'). These fields are not strictly necessary, but Django exports them because its own deserializer uses them to store the objects in the database directly13.
11 https://docs.python.org/2/howto/regex.html
12 http://www.pyregex.com/
13 https://docs.djangoproject.com/en/1.8/topics/serialization/#deserializing-data
Python's json module provides a function called 'loads' that takes a JSON string and returns the equivalent Python objects: JSON arrays become lists and JSON objects become dictionaries. In the previous example, the top-level array of three items becomes a list with three elements. Each element is a JSON object with three keys (pk, model and fields), so it becomes a dictionary. Finally, 'fields' is itself an object with three keys (model, brand and price), so it becomes a nested dictionary.
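The mapping described above can be checked in an interpreter. The sketch below applies Python's json module to an abridged copy of the serializer output (two of the three cars), so it runs without Django.

```python
import json

# An abridged copy of the serializer output shown above
serialized_data = '''[
  {"model": "carlist.car", "fields": {"model": "ModelA", "brand": "BrandA", "price": 20000}, "pk": 1},
  {"model": "carlist.car", "fields": {"model": "ModelB", "brand": "BrandB", "price": 24000}, "pk": 2}
]'''

data = json.loads(serialized_data)
# The JSON array becomes a list and each JSON object becomes a dict
first = data[0]
first_brand = first['fields']['brand']
total_price = sum(item['fields']['price'] for item in data)
```

Here first_brand is 'BrandA' and total_price is 44000; the model attributes are always reached through the nested 'fields' dictionary.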
Example 2: Deserialize the car list:
import json

from carlist.models import Car

def deserialize(serialized_data):
    deserialized_data = json.loads(serialized_data)
    for car in deserialized_data:
        # Each item is a dict; the model attributes live under 'fields'
        Car.objects.create(
            model=car['fields']['model'],
            brand=car['fields']['brand'],
            price=car['fields']['price'],
        )
6.8
Middleware
Middleware is a very helpful tool in Django that lets you automate some routines. Middleware components are classes whose methods are called to inspect or modify requests and responses. The middleware configuration is stored in the 'settings.py' file, in a tuple called 'MIDDLEWARE_CLASSES'.
What is it useful for? For repetitive tasks that must be performed on every request or response. You could do the same in the views, but middleware moves these tasks into the background so that you can forget about them. For example, one of the most widely used middleware components is Django's authentication middleware, which manages the authentication credentials. When using it, the only thing you must do is add decorators to indicate which of your views require authentication.
In figure 6.1 the middleware has been simplified. There are actually five types of middleware methods: two of them are called before the view and the other three after it.
Before calling the view (in calling order):
1. process_request: It is called before the URI has been parsed, so the view to use has not been decided yet. It takes just one argument: request (since these are methods, remember to add 'self' when writing your own).
2. process_view: It is called once Django has decided which view must be used. It takes four arguments: request, view_func, view_args and view_kwargs. view_func is the function object that will be called (not a string with the function's name). view_args and view_kwargs are the arguments that will be passed to the view. (Remember to add 'self' here too.)
Both process_request and process_view can return either an HttpResponse object or None. If they return an HttpResponse, the rest of the request phase and the view are skipped, and that response goes straight to the response phase and then to the client. If they return None, processing continues.
After the view has been processed (in calling order):
1. process_exception: It is called only if the view raises an exception. It takes two arguments: request and exception. It can return either an HttpResponse or None. If it returns an HttpResponse, that response goes through the response middleware and is sent to the client. If it returns None, the next middleware's process_exception is tried and, if none of them returns a response, Django's default exception handling takes over, most likely returning a 500 (Internal Server Error) status code.
2. process_template_response: It is called after the view has been processed, but only if the response is a TemplateResponse (or any response with a render() method) rather than a plain HttpResponse. It takes two arguments, request and response, and must return a response object that implements a render() method. It will not be used in this document.
3. process_response: It is called right before the response is sent to the client. It takes two arguments: request and response. It must always return an HttpResponse.
Middleware can be stacked, meaning that you can use more than one middleware component in the same project. In that case the methods are called in the order listed above, and, before the view, the components are called in the order in which they were defined in the 'MIDDLEWARE_CLASSES' tuple. After the view is called, however, the components are called in reverse order.
For example: If your middleware configuration looks like this:
MIDDLEWARE_CLASSES = (
    'mymiddleware1',
    'mymiddleware2',
)
The order of execution would be:
process_request            mymiddleware1
process_request            mymiddleware2
process_view               mymiddleware1
process_view               mymiddleware2
VIEW PROCESSING
process_template_response  mymiddleware2
process_template_response  mymiddleware1
process_response           mymiddleware2
process_response           mymiddleware1
Be careful: if a method returns a response before the view is called, the remaining request-phase methods are skipped and processing jumps to the response phase, which still runs for every component. For example, if mymiddleware1's process_view returns a response, mymiddleware2's process_view is not called, but mymiddleware2's response methods (process_template_response, if applicable, and process_response) still are. This means that, when writing response middleware, you cannot rely on actions that your request middleware was supposed to perform.
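The ordering rules, including the short-circuit case, can be summarised with a small simulation. This is a framework-free sketch of the MIDDLEWARE_CLASSES behaviour described above, not Django's handler code: the hooks take no request argument, process_exception and process_template_response are left out, and the Recorder class is hypothetical.

```python
class Recorder:
    """Hypothetical middleware that optionally blocks in process_view."""

    def __init__(self, name, short_circuit=False):
        self.name = name
        self.short_circuit = short_circuit

    def process_request(self):
        return None

    def process_view(self):
        # Returning a value here stops the chain before the view
        return 'blocked by ' + self.name if self.short_circuit else None

    def process_response(self, response):
        return response


def run_chain(middleware, view):
    """Imitate the call order for one request; return (response, log)."""
    log = []
    response = None
    for phase in ('process_request', 'process_view'):
        for mw in middleware:  # request phase: definition order
            log.append((phase, mw.name))
            response = getattr(mw, phase)()
            if response is not None:
                break  # remaining request-phase hooks are skipped
        if response is not None:
            break
    if response is None:
        log.append(('view', None))
        response = view()
    for mw in reversed(middleware):  # response phase: reverse order, for all
        log.append(('process_response', mw.name))
        response = mw.process_response(response)
    return response, log


normal, normal_log = run_chain([Recorder('mymiddleware1'), Recorder('mymiddleware2')],
                               lambda: 'view result')
blocked, blocked_log = run_chain([Recorder('mymiddleware1', short_circuit=True),
                                  Recorder('mymiddleware2')],
                                 lambda: 'view result')
```

In the second run, mymiddleware2's process_view and the view never appear in the log, yet mymiddleware2's process_response still runs, which is exactly the situation the warning above describes.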
In a REST API development, middleware can be applied to develop and test layered systems. It allows you to
develop some components separately without having to use virtualization schemes.
6.9
Deploy the project
Django uses WSGI, the Web Server Gateway Interface: a specification of how web servers and Python applications should communicate. During development, Django provides a very basic server that you can start with the manage.py script located in the project's directory (python manage.py runserver), but once the project is finished you should deploy it on a production-grade server.
There are many servers that support WSGI. You can find a list of them and more detailed information in14.
6.10
Cache in Django
It is possible to add caching to a Django application. To do so, we need a running memcached daemon. Memcached stores data in memory in key-value format and is accessed through a TCP socket.
To install memcached you can run:
user~$ sudo apt-get install memcached
user~$ sudo service memcached start
Memcached is designed for distributed architectures, meaning that the same cache may be used by many servers, which can share the stored data if desired.
You’ll also need a python library to communicate with the daemon. The most usual ones are python-memcached
and pylibmc.
Important: python-memcached requires a version higher than 2.0, but at the time of writing it is not compatible with Python 3.
To install python-memcached or pylibmc you can use the pip tool:
14 http://wsgi.readthedocs.org/en/latest/
user~$ sudo pip install python-memcached
and
user~$ sudo pip install pylibmc
Once you have the daemon running and the library installed, you have to configure Django. Open the settings.py script and add the following lines:
CACHES = {
    'default': {
        # MemcachedCache uses python-memcached; use PyLibMCCache with pylibmc
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
These lines only tell Django where the cache service is; on their own they do not make Django cache any view.
There are two tools for caching:
1. Django Middleware: Django has some built-in middleware that handles the caching.
2. cache_page: a decorator that defines for how long (in seconds) the result of a view can be stored.
You can use both of them at the same time.
How to use middleware:
Add the middleware classes to the MIDDLEWARE_CLASSES tuple in settings.py:
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',
    # ... the rest of your middleware classes ...
    'django.middleware.cache.FetchFromCacheMiddleware',
)
Important: the order matters. UpdateCacheMiddleware must be the first class in the tuple and FetchFromCacheMiddleware the last one; otherwise the cache will not work correctly.
You can also set three variables in the settings file (Django provides defaults for all of them):
CACHE_MIDDLEWARE_ALIAS = 'default'
CACHE_MIDDLEWARE_SECONDS = 999
CACHE_MIDDLEWARE_KEY_PREFIX = ''
The alias selects which entry of the CACHES setting the middleware uses (here, the ’default’ memcached configuration defined above). The seconds value indicates how long, in seconds, a cached page remains valid. The key prefix is a string added to the cache keys to avoid collisions when the same cache is shared by several sites or applications; in this document the cache is not shared, so you can leave it empty.
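How a key prefix keeps two sites’ entries apart in a shared cache can be sketched with a plain dictionary standing in for memcached. The prefix:version:key layout is the one used by Django’s low-level default key function; the cache middleware builds longer keys that also embed its prefix, so treat this as a simplified model. The dictionary and the helper names are illustrative assumptions, not Django code.

```python
# A dict stands in for the shared memcached daemon (assumption for illustration).
shared_cache = {}


def make_key(key, key_prefix='', version=1):
    """Build a key in the prefix:version:key form of Django's default key function."""
    return '%s:%s:%s' % (key_prefix, version, key)


def cache_set(key, value, key_prefix=''):
    shared_cache[make_key(key, key_prefix)] = value


def cache_get(key, key_prefix=''):
    return shared_cache.get(make_key(key, key_prefix))


# Two sites cache the same page path in the same memcached instance.
cache_set('/files/', 'site A listing', key_prefix='siteA')
cache_set('/files/', 'site B listing', key_prefix='siteB')
```

Without distinct prefixes the second call would overwrite the first site’s entry, which is exactly the collision the setting avoids.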
There are two ways of using the cache_page decorator:
The first one is to use it in the urls.py script with the format cache_page(max seconds)(view to call):
from django.views.decorators.cache import cache_page

urlpatterns = patterns('',
    url(r'^cachetest/$',
        cache_page(60*5)(CacheTesterView.as_view()),
        name='cache-test'),
)
Defining the seconds in a 60*X format where X represents minutes is a common practice to add readability to the
scripts. If you use the result directly (in this case 300) it is also correct and will run just fine.
The other way of using the cache_page decorator is right before the view function that you want to cache:
from django.views.decorators.cache import cache_page

@cache_page(60*5)
def mycachedview(request):
    ...
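If you’re using class-based views you should apply the decorator in the urls.py script, wrapping the result of as_view(). Only responses to GET and HEAD requests are stored in the cache, so the other methods are not affected. Also, at the time of writing this document, applying cache_page directly to a method of a class-based view raised an undocumented exception: function decorators expect the request as their first argument, so on methods they have to be wrapped with django.utils.decorators.method_decorator.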
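The idea behind cache_page can be approximated in plain Python: keep each response for a fixed number of seconds, keyed by the request path, and only for GET and HEAD requests. This is a simplified sketch under those assumptions, not Django’s implementation (which stores responses in the configured cache backend and builds keys from the full URL and headers); simple_cache_page, FakeRequest and the injectable clock are hypothetical names.

```python
import time
from functools import wraps


def simple_cache_page(timeout, clock=time.monotonic):
    """Cache a view's result per path for `timeout` seconds (GET/HEAD only)."""
    def decorator(view):
        store = {}  # path -> (expiry time, response), kept in this process only

        @wraps(view)
        def wrapper(request, *args, **kwargs):
            if request.method not in ('GET', 'HEAD'):
                return view(request, *args, **kwargs)  # never cached
            now = clock()
            entry = store.get(request.path)
            if entry is not None and entry[0] > now:
                return entry[1]  # still fresh: the view is not executed
            response = view(request, *args, **kwargs)
            store[request.path] = (now + timeout, response)
            return response
        return wrapper
    return decorator


class FakeRequest:
    """Minimal stand-in for django.http.HttpRequest (only method and path)."""
    def __init__(self, method, path):
        self.method = method
        self.path = path
```

Used as @simple_cache_page(60*5) above a view function, the first GET runs the view and the following GETs to the same path during the next 300 seconds reuse its result.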
6.11 Example 1: File distribution API
In this example we develop an API that allows us to access some files remotely. Since this document aims to teach people with little knowledge of Django, we will develop this example in an iterative and incremental way.
6.11.1 First iteration
In this iteration we will focus on creating a simple model that roughly maps a file, plus a function-based view and a class-based view that allow us to retrieve information from the server (a list of files and the details of a single file).
First of all we will start by creating our project:
$ django-admin.py startproject file_distribution_api
Once our project has been created we will create its first app. This app will manage the files. To create it we will type:
$ python3 manage.py startapp files
To create the ’File’ model we have to edit ’files/models.py’:
from django.db import models
from django.core.urlresolvers import reverse

# Create your models here.

class File(models.Model):

    file_name = models.CharField(max_length=255)
    file_type = models.CharField(max_length=5)
    file_location = models.CharField(max_length=255)

    def __str__(self):
        return ' '.join([self.file_name, self.file_type, self.file_location])

    def get_absolute_url(self):
        return "/files/%i" % self.id
NOTICEABLE PARTS:
***********************
• Every model class inherits from a base class ’models.Model’.
• The attributes contained in this class will be mapped into the database, and each one belongs to one of the field types offered by Django 15 . The most important ones are listed in Table 6.1.
• The method __str__() overrides a method inherited from models.Model; it defines how an object is shown as a string. Models automatically get many methods 16 , and some of them are worth overriding yourself; __str__() is one of them.
15 https://docs.djangoproject.com/en/1.8/ref/models/fields/#field-types
16 https://docs.djangoproject.com/en/1.8/ref/models/instances/#model-instance-methods
• The method get_absolute_url() is used to calculate the URL of an object. In this particular case, every file object will be accessible from ’/files/id-of-the-file’ (for example: ’/files/128’). There are more sophisticated ways to do this (for example, the reverse() function imported at the top of models.py), but we will not need them; you can find more information about this in Django’s documentation 17 .
***********************
Now our model is created, but the application ’files’ doesn’t belong to the project yet. To include it we have to add it to the INSTALLED_APPS tuple in settings.py. It should look something like this:
# Application definition

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'files',
)
After creating the model you have to update the database using the manage.py script. It will create all the new
tables that will contain the fields needed and everything necessary for it to work.
$ python3 manage.py syncdb
Operations to perform:
  Apply all migrations: contenttypes, sessions, auth, admin
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying sessions.0001_initial... OK
The first time you run syncdb, the script asks you to create a superuser. We will use it later in this example, so create it.
Note that syncdb is deprecated since Django 1.7 in favour of migrate, and it may not export your model to the database if your app does not have any migration files yet (notice that ’files’ is not listed in the output above). In that case you simply have to run:
$ python3 manage.py makemigrations
Migrations for 'files':
  0001_initial.py:
    - Create model File
$ python3 manage.py migrate
Operations to perform:
  Apply all migrations: contenttypes, admin, auth, files, sessions
Running migrations:
  Applying files.0001_initial... OK
Now that we have our model well defined it is time to create a view for it. As we saw in section 6.5 there are two
ways of developing views. In this example we’ll show a possible implementation for both of them.
Class-based:
from django.shortcuts import render
from django.http import HttpResponse, Http404
from django.views.generic import ListView
from django.views.generic import DetailView

from files.models import File

# Create your views here.
class ListFileView(ListView):
    model = File
    template_name = 'file_list'


class FileView(DetailView):
    model = File
    template_name = 'file'
17 https://docs.djangoproject.com/en/1.8/ref/urlresolvers/
NOTICEABLE PARTS:
***********************
• The class ListFileView inherits from ListView, a generic class-based view that lists a set of objects using a given template. As we saw, there are many generic class-based views 18 .
• The class FileView inherits from DetailView, which we use to display a single object.
***********************
Templates:
file_list:
<h1>Files</h1>

<ul>
{% for file in object_list %}
    <li class="file">
    <a href="{{ file.get_absolute_url }}">{{ file }}</a></li>
{% endfor %}
</ul>

file:
<h1> {{ file }} </h1>

<p> File: {{ file.file_name }} </p>
NOTICEABLE PARTS:
***********************
• HTML code can be written directly into the template.
• To interact with the model fields and its methods you have to use double braces {{·}}
• In order to use control structures such as ’for’ or ’if’ you have to wrap them with {% ·%}.
***********************
Function-based:
from django.http import HttpResponse, Http404
from files.models import File


def SingleFile(request, pk):
    try:
        file = File.objects.get(id=pk)
        html = "<html><body><p>%s</p></body></html>" % file.file_name
        return HttpResponse(html)
    except File.DoesNotExist:
        raise Http404


def FileList(request):
    file_list = File.objects.all()
    html = "<html><body><h1>Files</h1><ul>"
    for file in file_list:
        html += "<li>"
        html += "<a href=\"" + file.get_absolute_url() + "\">" + file.file_name + "</a>"
        html += "</li>"
    html += "</ul>"
    html += "</body></html>"
    return HttpResponse(html)
NOTICEABLE PARTS:
***********************
18 https://docs.djangoproject.com/en/1.8/ref/class-based-views/
• SingleFile has two arguments:
– request is an object that represents the HTTP request received by Django. It contains the HTTP method sent by the client, attributes set by middleware classes, etc. 19
– pk is a string that represents the file’s id; it is passed by the URL dispatcher 20 , as explained in the urls.py section below.
• File.objects is a manager that Django adds to every class that inherits from models.Model. It is responsible for retrieving instances from the database. For now we only need 3 of its methods:
– File.objects.all(): retrieves all the objects in the database.
– File.objects.filter(field_name=value): returns the set (a QuerySet) of all objects that match the condition.
– File.objects.get(field_name=value): returns a single matching object. It is used to retrieve objects by a unique attribute. ATTENTION: if no object matches, this method raises a File.DoesNotExist exception, and if more than one object matches it raises File.MultipleObjectsReturned.
• A view is responsible for returning an HttpResponse object. HttpResponse’s constructor takes a string with the whole content, plus optional arguments such as content_type and status.
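The behaviour of all(), filter() and get() described above can be mimicked with a small in-memory class. This is a toy sketch for intuition only: ToyManager, FileRecord and the two exception classes are hypothetical names, and a real Django manager builds lazy QuerySets that are translated into SQL instead of scanning a Python list.

```python
from collections import namedtuple


class DoesNotExist(Exception):
    """Raised by get() when nothing matches (like File.DoesNotExist)."""


class MultipleObjectsReturned(Exception):
    """Raised by get() when more than one object matches."""


class ToyManager:
    def __init__(self, objects):
        self._objects = list(objects)

    def all(self):
        return list(self._objects)

    def filter(self, **conditions):
        # Keep the objects whose attributes equal every given value.
        return [obj for obj in self._objects
                if all(getattr(obj, field) == value
                       for field, value in conditions.items())]

    def get(self, **conditions):
        matches = self.filter(**conditions)
        if not matches:
            raise DoesNotExist(conditions)
        if len(matches) > 1:
            raise MultipleObjectsReturned(conditions)
        return matches[0]


FileRecord = namedtuple('FileRecord', ['id', 'file_name', 'file_type'])

files = ToyManager([FileRecord(1, 'notes', 'txt'),
                    FileRecord(2, 'photo', 'png'),
                    FileRecord(3, 'todo', 'txt')])
```

Here files.get(id=2) returns the ’photo’ record, while files.get(file_type='txt') raises MultipleObjectsReturned because two records match.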
urls.py:
from django.conf.urls import patterns, include, url
from django.contrib import admin
import files.views

urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'^files/$',
        files.views.ListFileView.as_view(),
        name='file-list'),
    url(r'^files/(?P<pk>\d+)/?$',
        files.views.FileView.as_view(),
        name='file-view'),
)
As we saw in section 6.6, the file urls.py is the link between a URI pattern and the view Django will call. This file maps 3 types of addresses:
1. Any URL that starts with ’admin/’ is managed by the admin application.
2. The URL ’files/’ is used as the entry point to the list of files.
3. Any URL of the form ’files/<number>’ (with or without a trailing slash) calls the detail view of a single file. With the expression ’(?P<name>pattern)’ the URL dispatcher passes the matched group to the view as a keyword argument called name (here, pk).
Remember that any URL that doesn’t match one of these patterns causes a 404 error. Also, if a URL matches the third rule but the file’s id is not registered in the database, Django also returns a 404 response.
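How the named group in the third pattern turns part of the URL into the pk keyword argument can be checked with Python’s re module directly. Django matches these patterns against the path without its leading slash; the helper captured_kwargs is an illustrative assumption, not the URL dispatcher itself.

```python
import re

# Same regular expression as the third pattern in urls.py.
FILE_DETAIL = re.compile(r'^files/(?P<pk>\d+)/?$')


def captured_kwargs(path):
    """Return the keyword arguments a view would receive, or None if no match."""
    match = FILE_DETAIL.match(path.lstrip('/'))
    return match.groupdict() if match else None
```

Note that the captured value is always a string (’128’, not 128), which is why views receive pk as a string.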
6.11.2 Second iteration
In this second iteration we will make a more complex model and develop views that let you add and delete files.
Let’s take a look at the new model in ’files/models.py’:
from django.db import models
from django.core.urlresolvers import reverse
from django.contrib.auth.models import User
# Create your models here.

class File(models.Model):
    filetype_list = (('txt', 'text'),
                     ('jpeg', 'image'),
                     ('png', 'image'),
                     ('gif', 'image'),)

    name = models.CharField(max_length=255)
    file_type = models.CharField(choices=filetype_list, max_length=4)
    location = models.URLField(unique=True, primary_key=True)
    owner = models.ForeignKey(User)

    def __str__(self):
        # owner is a User object, not a string, so it is left out of the join
        return ' '.join([self.name,
                         self.file_type,
                         self.location])

    def get_absolute_url(self):
        # location is the primary key, so this model has no 'id' field
        return "/files/%s" % self.location
19 https://docs.djangoproject.com/en/1.8/ref/request-response/#httprequest-objects
20 https://docs.djangoproject.com/en/1.8/topics/http/urls/
Changes:
• ’file_type’ now has a choices option: when the model is validated (for example by a form or by full_clean()), only the first element of one of the pairs in ’filetype_list’ is accepted.
• ’location’ is now a ’URLField’ and is the primary key of the file table.
• The ’owner’ field has been added. It represents a many-to-one relationship between files and users (each file has one owner, and each user can own many files).
We will now test the one-to-many relationship between users and files:
$ python3 manage.py shell
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from django.contrib.auth.models import User
>>> from files.models import File
>>> u = User.objects.create_user('user1',
... '[email protected]',
... 'user1password')
>>> u
<User: user1>
>>> u.save()
>>> File.objects.all()
[]
>>> file1 = File.objects.create(name='file1', file_type='txt',
... location='/test/file1', owner=u)
>>> u.file_set.all()
[<File: file1 txt /test/file1>]
>>> file1.owner
<User: user1>
>>> file2 = File.objects.create(name='file2', file_type='txt',
... location='/test/file2', owner=u)
>>> file2.owner
<User: user1>
>>> u.file_set.all()
[<File: file1 txt /test/file1>, <File: file2 txt /test/file2>]
>>> User.objects.get(username='user1').delete()
>>> User.objects.all()
[<User: admin>]
>>> File.objects.all()
[]
You can verify the model relationship properties developed in section 6.4.1 with the django shell.
The views in ’files/views.py’ now look like this:
from django.shortcuts import render
from django.views.generic.base import View
from files.models import File
from django.http import HttpResponse, HttpResponseNotFound
from django.core import serializers
from django.core.exceptions import ObjectDoesNotExist
from django.contrib.auth.models import User
from django.views.decorators.csrf import csrf_exempt
from django.utils.decorators import method_decorator

import json

# Create your views here.
class FileView(View):

    http_method_names = ['get', 'delete', 'options']

    def get(self, request, *args, **kwargs):
        file_location = kwargs['pk']
        try:
            file = File.objects.get(location=file_location)
        except ObjectDoesNotExist:
            return HttpResponseNotFound()
        serialized_file = serializers.serialize('json', [file])
        return HttpResponse(serialized_file)

    def delete(self, request, *args, **kwargs):
        file_location = kwargs['pk']
        try:
            file = File.objects.get(location=file_location)
        except ObjectDoesNotExist:
            return HttpResponseNotFound()
        file.delete()
        return HttpResponse(status=200)

    @method_decorator(csrf_exempt)
    def dispatch(self, *args, **kwargs):
        return super(FileView, self).dispatch(*args, **kwargs)


class FileListView(View):

    http_method_names = ['get', 'put', 'options']

    def put(self, request, *args, **kwargs):
        unicode_data = request.body.decode('utf-8')
        data = json.loads(unicode_data)
        try:
            File.objects.get(location=data[0]['pk'])
            # A file with this location already exists: refuse to overwrite it
            return HttpResponse(status=403)
        except ObjectDoesNotExist:
            owner = User.objects.get(id=data[0]['fields']['owner'])
            File.objects.create(location=data[0]['pk'],
                                name=data[0]['fields']['name'],
                                file_type=data[0]['fields']['file_type'],
                                owner=owner)
            return HttpResponse(status=200)

    def get(self, request, *args, **kwargs):
        output = serializers.serialize('json', File.objects.all())
        return HttpResponse(output)

    @method_decorator(csrf_exempt)
    def dispatch(self, *args, **kwargs):
        return super(FileListView, self).dispatch(*args, **kwargs)
Changes:
• The classes no longer extend ’ListView’ and ’DetailView’. Instead they extend the base class ’View’.
• Since we are building an API, the responses are now serialized data in ’json’ format.
• More methods are allowed now: you can use the methods listed in the attribute ’http_method_names’: GET, DELETE and OPTIONS for ’FileView’ and GET, PUT and OPTIONS for ’FileListView’. Each method name matches the action that the server performs when it receives a request with that HTTP method.
• The OPTIONS method is handled by the parent class (View).
• There is a method decorator before the dispatch method. By default Django installs middleware that handles sessions and some security checks, including CSRF protection. We are not trying to secure our API yet, but we can’t disable this middleware because it is required in order to use some applications. Django offers the function decorator ’csrf_exempt’ for this case, but since we use class-based views and function decorators can’t be applied directly to methods, we wrap it with ’method_decorator’ on dispatch, which applies the desired rule (in our case ’csrf_exempt’) to all the methods of the class. The CSRF check applies to the unsafe methods: POST, PUT, PATCH and DELETE.
• Django’s serializer stores the primary key under a ’pk’ key. In this example the primary key is ’location’, so the put method reads the location from ’pk’ instead of ’fields’.
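The way a class-based view turns the HTTP method into a method call, and answers OPTIONS by itself, can be sketched in plain Python. This is a simplified model of the behaviour described above, not Django’s View class; ToyView, ToyFileView, ToyResponse and the dictionaries used as requests are hypothetical names chosen for illustration.

```python
class ToyResponse:
    def __init__(self, status=200, headers=None, body=''):
        self.status = status
        self.headers = headers or {}
        self.body = body


class ToyView:
    http_method_names = ['get', 'put', 'delete', 'options']

    def allowed_methods(self):
        # A method is allowed if it is listed AND the class implements it.
        return [m.upper() for m in self.http_method_names if hasattr(self, m)]

    def dispatch(self, request):
        name = request['method'].lower()
        if name in self.http_method_names and hasattr(self, name):
            return getattr(self, name)(request)
        # Unknown or disallowed method: 405 plus the list of allowed ones.
        return ToyResponse(405, {'Allow': ', '.join(self.allowed_methods())})

    def options(self, request):
        return ToyResponse(200, {'Allow': ', '.join(self.allowed_methods())})


class ToyFileView(ToyView):
    http_method_names = ['get', 'delete', 'options']

    def get(self, request):
        return ToyResponse(200, body='file details')

    def delete(self, request):
        return ToyResponse(200)
```

With this model, a PUT sent to ToyFileView is rejected with a 405 status because ’put’ is not in its http_method_names, while OPTIONS needs no code in the subclass.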
from django.conf.urls import patterns, include, url
from django.contrib import admin
from files.views import FileView, FileListView

urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'^files/$', FileListView.as_view(), name='file-list'),
    url(r'^files/(?P<pk>..+)/?$',
        FileView.as_view(),
        name='file_view'),
)
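The structure of this file has not changed; it now imports the new views, and the detail pattern captures the file’s location (a string) instead of a numeric id.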
To use this API a browser is no longer enough, because you need other HTTP verbs such as PUT and DELETE. There are many options available:
• cURL is a command line tool and library for transferring data with URL syntax, supporting many transfer
protocols 21 .
• HTTPie is a command line HTTP client 22 .
• Requests is an Apache2 Licensed HTTP library, written in Python 23 .
• The ’django.test’ module contains ’Client’, an object capable of performing any HTTP call against your project 24 .
There is enough documentation about all of them in their official sites, so this document won’t focus on explaining
their use.
Examples of use:
$ http GET http://127.0.0.1:8000/files/ > result
[
    {
        "pk": "file1",
        "fields":
        {
            "file_type": "txt",
            "owner": 1,
            "name": "file1"
        },
        "model": "files.file"
    },
    {
        "pk": "file2",
        "fields":
        {
            "file_type": "txt",
            "owner": 1,
            "name": "file2"
        },
        "model": "files.file"
    },
    {
        "pk": "file3",
        "fields":
        {
            "file_type": "txt",
            "owner": 1,
            "name": "file3"
        },
        "model": "files.file"
    }
]
21 http://curl.haxx.se/
22 https://github.com/jakubroztocil/httpie#main-features
23 http://docs.python-requests.org/en/latest/
24 https://docs.djangoproject.com/en/1.8/topics/testing/tools/#the-test-client
As you can see the response is a list of ’file’ objects with the format explained in 6.7.
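A client, like the put method above, only needs the ’pk’ and ’fields’ entries of each object. A minimal sketch of reading this format with Python’s json module; the function name files_from_serialized and the sample string (shortened from the output above) are illustrative.

```python
import json

response_body = '''[
  {"pk": "file1", "fields": {"file_type": "txt", "owner": 1, "name": "file1"},
   "model": "files.file"},
  {"pk": "file2", "fields": {"file_type": "txt", "owner": 1, "name": "file2"},
   "model": "files.file"}
]'''


def files_from_serialized(text):
    """Flatten Django's serialized list into dicts holding pk plus the fields."""
    result = []
    for obj in json.loads(text):
        record = dict(obj['fields'])  # copy name, owner and file_type
        record['pk'] = obj['pk']      # the primary key lives outside 'fields'
        result.append(record)
    return result
```

Note that ’owner’ stays a number: the serializer writes the primary key of the related User, not the user object itself.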
$ http DELETE http://127.0.0.1:8000/files/file1
The view responsible for deleting files doesn’t return any message, but you can tell that the operation worked from the status code. You can also perform a GET request on ’/files/’ again and compare the response with the previous one.
$ http GET http://127.0.0.1:8000/files/ > result
[
    {
        "fields":
        {
            "name": "file2",
            "owner": 1,
            "file_type": "txt"
        },
        "pk": "file2",
        "model": "files.file"
    },
    {
        "fields":
        {
            "name": "file3",
            "owner": 1,
            "file_type": "txt"
        },
        "pk": "file3",
        "model": "files.file"
    }
]
Finally, to send data from the terminal you can create an auxiliary file and redirect http’s standard input from it.
$ http PUT http://127.0.0.1:8000/files/ < file.json > result
Where file.json contains:
[
    {
        "pk": "new_file1",
        "fields":
        {
            "file_type": "txt",
            "owner": 1,
            "name": "new_file1"
        },
        "model": "files.file"
    }
]
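The same PUT can also be built from a Python script with the standard library’s urllib.request (not one of the tools listed above). A minimal sketch, assuming the development server at 127.0.0.1:8000 used in these examples; build_put_request is a hypothetical helper, and the request is only prepared here, since it is sent only when urlopen is called.

```python
import json
import urllib.request

# Same payload as file.json above.
new_file = [{
    "pk": "new_file1",
    "fields": {"file_type": "txt", "owner": 1, "name": "new_file1"},
    "model": "files.file",
}]


def build_put_request(url, payload):
    """Prepare (but do not send) a PUT request carrying a JSON body."""
    data = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        url, data=data, method='PUT',
        headers={'Content-Type': 'application/json'})


request = build_put_request('http://127.0.0.1:8000/files/', new_file)
# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

If the server answers with an error status such as 403, urlopen raises urllib.error.HTTPError instead of returning a response.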
To check the results we will send a GET request again:
$ http GET http://127.0.0.1:8000/files/ > result
[
    {
        "fields":
        {
            "name": "file1",
            "owner": 1,
            "file_type": "txt"
        },
        "pk": "file1",
        "model": "files.file"
    },
    {
        "fields":
        {
            "name": "file2",
            "owner": 1,
            "file_type": "txt"
        },
        "pk": "file2",
        "model": "files.file"
    },
    {
        "fields":
        {
            "name": "new_file1",
            "owner": 1,
            "file_type": "txt"
        },
        "pk": "new_file1",
        "model": "files.file"
    }
]
Chapter 7
Django Practices
To practice Django we will implement the project described in the first exercise of the REST practices.
Model practices
Resource List:
1. Bookmark: A list of URIs pointing to the existing resources
2. Article: It represents an article. It contains information about the newspaper which published the article, its author, and the article headline and body.
3. Author: It represents an author. It contains contact information about the author, the newspaper that the author
writes for, etc.
4. List of articles: An author’s list of publications.
5. Newspaper: It represents a publishing authority. It contains information about the authority (Name, location,
list of authors, brief description, etc.)
6. List of newspapers: A list containing all the newspapers.
7. Search by date: It represents a list of articles written on a date sent in the body payload.
8. Search by newspaper: It represents a list of articles published by the newspaper whose representation is sent
in the body payload.
9. Most recent news: Lists the most recent news.
Exercise 1: Creating the model.
Exercise 1.1: Start a new project named ’Exercise1’. Inside this project create a new app called ’ArticleReader’.
Exercise 1.2: Write models for Articles, Authors and Newspapers. You have to decide which field types should be used to model each of these resources’ attributes. You must also decide which field options should be applied to each field. Once you’ve finished, register the app in the ’INSTALLED_APPS’ tuple inside settings.py and use the manage script to update the database structure.
Exercise 1.3: Using the shell, create two different newspapers and four authors: two who write for one newspaper and two who write for the other. Finally, create one article for each author.
Exercise 1.4: Using the shell, retrieve a newspaper from the newspapers list, save it in a variable ’n’ and try to show its content (just type ’n’ and hit enter). What is its content? Go to your model files and add __str__() methods to every class. They should return a string representing the content of the model. Start the shell again and try to show a newspaper variable again. What happens? Test the article and author objects too.
Exercise 1.5: Use Django’s default serializer to serialize a list containing all the authors created in 1.3 in JSON format. Show the output on screen. An external tool that formats JSON may help you read it. What is the ’pk’ field? Why do you think that ’newspaper’ is shown as a number?
Exercise 2: Since Django’s serializer won’t fit our needs, let’s build our own serializer. Create a new file in the ArticleReader directory, name it ’serializers.py’ and create a new class called ArticleSerializer inside it.
Exercise 2.1: Create a method inside ArticleSerializer that takes an argument ’article’ and returns a dict that
contains three keywords: author, headline and body. ’author’ value must be a string containing the author’s first,
middle and last name separated by spaces. ’headline’ and ’body’ will be the same ’headline’ and ’body’ defined in the
article class.
Exercise 2.2: Create a new method called serialize that takes one argument called article and that, using the
method created in 2.1, returns a json formatted string. You can use python’s default json module to do so:
Example:
import json

mydict = {'hello':'world'}

serialized_dict = json.dumps(mydict)
Result: serialized_dict is a string containing the JSON object {"hello": "world"}
Exercise 2.3: Test your code with the shell.
Exercise 2.4: Create a new method called serialize_many that takes one argument named list which contains an
array with ’article’ objects. The result must be a string in json format containing the articles’ array. You should use
the method developed in 2.1. Why do you think you can’t use the method developed in 2.2?
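The difference between dumping each item separately and dumping a list of dictionaries once can be checked in the shell. A minimal sketch; the article dictionary and its headline and body values are hypothetical sample data, shaped like the dictionary asked for in 2.1.

```python
import json

# Hypothetical article dict, in the shape produced by the method from 2.1.
article = {'author': 'Homer Jay Simpson',
           'headline': 'Sample headline',
           'body': 'Sample body text.'}

# Dumping each item first gives a JSON array of *strings* (double encoding).
double_encoded = json.dumps([json.dumps(article), json.dumps(article)])

# Building the list of dicts and dumping it once gives an array of objects.
single_encoded = json.dumps([article, article])
```

A client that parses double_encoded gets strings that must be parsed a second time, while single_encoded yields the article objects directly.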
Exercise 2.5: Test the results with the shell.
Exercise 2.6: Modify the methods developed in this exercise so the serialized data matches the resource representations created in the REST chapter practices. You should write two different auxiliary methods, one to use inside the ’serialize’ method and the other inside ’serialize_many’.
Exercise 2.7: Develop a NewspaperSerializer and an AuthorSerializer
View practices
Exercise 1: In this first exercise we will implement the GET views of the ArticleReader application. To check these exercises you’ll only need a web browser.
Exercise 1.1: In the script ’views.py’ add a new class named NewspaperListView that extends View. Remember to import it from django.views.generic.base. Define a get method inside it and make it return a 200 status code without any body. Add a new URL to ’urls.py’ with the pattern r’^newspaper/$’ that calls NewspaperListView when accessed. To test it, start a test server, open a web browser and try to access ’127.0.0.1:8000/newspaper/’. If everything is correct a blank page will be shown; otherwise an error page will tell you what went wrong.
$ python3 manage.py runserver
Exercise 1.2 Using the serializer that you implemented in the second exercise of the model practices, try to generate an HttpResponse that sends the complete list of newspapers. Remember that you can access the models from the views just like you do in the shell. Refresh the page; if you did it correctly, a JSON string will be shown in the browser.
Exercise 1.3 Write a new class ’AuthorView’ and define a get method. It should return the result of the AuthorSerializer.serialize() method. Also add a URL to urls.py to make the view accessible. To get the identification number of the author you can use the regular expressions explained in section 6.6 and read it with ’kwargs[’pk’]’. Use the browser to test it.
Exercise 1.4 Write a new class ’ArticleView’ like you did with the Authors in the previous exercise. Add it to
the urls.py script.
Exercise 1.5 Write a new class ’NewspaperView’ that returns a json serialized Newspaper whose id is indicated
on the url. Add the url to the urls script.
Exercise 1.6 Write a ’SearchView’ class that returns a json serialized list of articles from a newspaper whose id
is specified on the url. Add the url to the urls script.
Exercise 1.7 Implement the ’Search by date’ resource. You can reuse the SearchView.
Exercise 1.8 Try to GET the URL ’localhost:8000/newspaper/98/’ with the browser. What happens? Try to fix it. What status code should be returned? HINT: if a get doesn’t find any object, or finds more than one, it raises an exception. If you did it well the browser should show a blank page, because it doesn’t show status codes, but the server that you started writes a log line for each request that you’ve performed, and there you should see a 404 response in yellow.
Exercise 1.9 Fix the other queries that could raise an exception.
Middleware and parsing practices
Exercise 1: Continuing with the previous app, we’re going to learn how to write middleware and how to parse information that comes inside the body of a request. The middleware that we’re going to develop will take a request, evaluate whether it is going to be processed by the ArticleListView and check whether it contains a valid article. In this section we will use POST requests, so we need a more powerful tool than a browser: we will use cURL. Below you have a list of commands to use cURL:
Basic usage:
$ curl -X GET http://127.0.0.1:8000/ -d 'yourbody'
-X is used to specify the method. By default curl sends a GET request, or a POST request if data is given with -d. The methods are always written in capital letters.
-d is used to send data. If the data is a string you can write it on the command line as in the example. If you want to read the data from a file, put ’@’ before the file name (to send binary data unmodified, use --data-binary instead of -d).
$ curl -X GET http://127.0.0.1:8000/ -d @datafile
To specify a header you have to use -H:
$ curl http://127.0.0.1:8000/ -d 'yourbody' -H 'Content-type: application/json'
If you want to see the whole communication between the client and the server you can use -v.
$ curl http://127.0.0.1:8000/newspaper/ -v
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> GET /newspaper/ HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 127.0.0.1:8000
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Wed, 24 Jun 2015 12:16:16 GMT
< Server: WSGIServer/0.2 CPython/3.4.0
< X-Frame-Options: SAMEORIGIN
< Content-Type: text/html; charset=utf-8
<
* Closing connection 0
[{"newspaperlink": "newspapers/1", "name": "newspaper1"},
{"newspaperlink": "newspapers/2", "name": "newspaper2"}]
Exercise 1.1: Create a new file inside ArticleReader/ and name it middleware.py. Inside it create a new class called
ParsingMiddleware.
Exercise 1.2: First of all we must decide which middleware method must be implemented. If you can’t decide, take a look at section 6.8. Create the method inside ParsingMiddleware and make it return ’None’. Add your middleware as ’ArticleReader.middleware.ParsingMiddleware’ in the middleware tuple inside ’settings.py’, as the first element of the tuple*. Try to make a GET request to the server. If the server doesn’t throw an exception, you’ve correctly added the middleware.
*We’re going to perform POST calls later, and by default there is a middleware installed (the CSRF protection middleware) that would catch and answer the request before our middleware is processed.
Exercise 1.3: We should be able to know which view is being called. process_view receives view_func among its parameters, but it is a function object; look in the Python documentation for how to get its name. To check that it works, print the name with the print function; it will be shown in the server log. Keep returning ’None’ so the server doesn’t raise an exception. Test it with GET requests.
Exercise 1.4: Let’s add some functionality to process_view: if the view function is ’ArticleListView’, use Python’s default json library to parse the request body. Notice that request.body is a byte string, so you’ll have to decode it before parsing it; you can use request.body.decode(’utf-8’). Python’s json library is enough for our purposes. You can use it like this:
try:
    parsed_data = json.loads(yourjsondata)
except ValueError:
    # the body is not valid JSON
    return HttpResponse(status=??)
Use print to check the functionality. Look into the status codes’ list to decide which status code you have to use.
Exercise 1.5: The middleware should only try to parse the data inside the requests’ bodies if the method used is a
POST, which will be used to add new articles to an author’s list of articles. Modify the method so it returns None if
the request.method is any other than POST, regardless of what’s inside the body.
Exercise 1.6: Create an auxiliary method that:
1. Checks if the parsed data is a dictionary.
2. Checks if the number of keys in the dictionary matches the number of keys in an article object.
3. Checks if the names of the keys match the names of the keys of an article object.
4. Checks the previous conditions for each article’s sub-structure
You can do it manually, checking all the fields of an article, or define a function that, given a dummy structure, checks whether an object matches that structure.
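The second approach, checking an object against a dummy structure, could look like the following sketch. ARTICLE_TEMPLATE is a hypothetical article representation (author with three name parts, headline, body); replace it with the representation you designed in the REST practices. Comparing key sets covers both the number and the names of the keys (checks 2 and 3 above), and nested dictionaries in the template are checked recursively (check 4).

```python
def matches_structure(data, template):
    """Return True if `data` has exactly the keys of `template`, recursively.

    Nested dicts in the template are checked the same way; other template
    values are ignored (only the key layout is compared).
    """
    if not isinstance(data, dict):
        return False
    if set(data) != set(template):
        return False
    for key, sub_template in template.items():
        if isinstance(sub_template, dict) and not matches_structure(data[key], sub_template):
            return False
    return True


# Hypothetical article layout: adapt it to your own representation.
ARTICLE_TEMPLATE = {
    'author': {'firstname': '', 'middlename': '', 'lastname': ''},
    'headline': '',
    'body': '',
}
```

The middleware would then call matches_structure(parsed_data, ARTICLE_TEMPLATE) and decide what to return from the result.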
Exercise 1.7 Use the auxiliary method that you just created to check whether the received data is an article. If it is an article, the middleware should return None; if the data is not valid, it should return a 400 status code.
MODELS
1.1:
user~/yourdir$ django-admin startproject Exercise1
user~/yourdir$ cd Exercise1
user~/yourdir/Exercise1$ python3 manage.py startapp ArticleReader
1.2:
from django.db import models


class Newspaper(models.Model):
    name = models.CharField(max_length=20, unique=True)
    location = models.CharField(max_length=50)
    description = models.TextField()


class Author(models.Model):
    fistname = models.CharField(max_length=20)
    middlename = models.CharField(max_length=20)
    lastname = models.CharField(max_length=20)
    # on_delete is optional in Django 1.8 and mandatory from Django 2.0 on
    newspaper = models.ForeignKey(Newspaper, on_delete=models.CASCADE)


class Article(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    headline = models.CharField(max_length=140)
    body = models.TextField()

Register the application in the project’s settings.py before creating the migrations:
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'ArticleReader'
)

user~/yourdir/Exercise1$ python3 manage.py makemigrations
user~/yourdir/Exercise1$ python3 manage.py migrate
1.3:
user~/yourdir/Exercise1$ python3 manage.py shell
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from ArticleReader.models import Newspaper, Author, Article
>>> n = Newspaper.objects.create(name="newspaper1", location="123, fake street", description="First newspaper test")
>>> n
<Newspaper: Newspaper object>
>>> n2 = Newspaper.objects.create(name="newspaper2", location="4, Privet Drive", description="Second newspaper test")
>>> n2
<Newspaper: Newspaper object>
>>> a1 = Author.objects.create(fistname="Homer", middlename="Jay", lastname="Simpson", newspaper=n)
>>> a2 = Author.objects.create(fistname="Marge", middlename=".", lastname="Simpson", newspaper=n)
>>> a3 = Author.objects.create(fistname="Bart", middlename=".", lastname="Simpson", newspaper=n2)
>>> a4 = Author.objects.create(fistname="Lisa", middlename=".", lastname="Simpson", newspaper=n2)
>>> ar1 = Article.objects.create(author=a1, headline="Ouch!", body="No matter how good you are at something, there's always about a million people better than you.")
>>> ar2 = Article.objects.create(author=a2, headline="Hrmmm...", body="I'm going into the dining room to have a conversation. If you want to join me, fine. (goes into the dining room and imitates a second voice) Hello Marge, how's the family? (in regular voice) I don't want to talk about it! Mind your own business!")
>>> ar3 = Article.objects.create(author=a3, headline="I didn't do it!", body="You got the brains and talent to go as far as you want and when you do I'll be right there to borrow money.")
>>> ar4 = Article.objects.create(author=a4, headline="BAAAAART!!!", body="I had a cat named Snowball. She died! She died! Mom said she was sleeping. She lied! She lied! Why oh why is my cat dead? Couldn't that Chrysler hit me instead? I had a hamster named Snuffy. He died\u2014")
1.4:
user~/yourdir/Exercise1$ python manage.py shell
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from ArticleReader.models import Newspaper
>>> n = Newspaper.objects.get(name="newspaper1")
>>> n
<Newspaper: Newspaper object>
from django.db import models


class Newspaper(models.Model):
    name = models.CharField(max_length=20, unique=True)
    location = models.CharField(max_length=50)
    description = models.TextField()

    def __str__(self):
        return " ".join([self.name,
                         self.location,
                         self.description])


class Author(models.Model):
    fistname = models.CharField(max_length=20)
    middlename = models.CharField(max_length=20)
    lastname = models.CharField(max_length=20)
    newspaper = models.ForeignKey(Newspaper, on_delete=models.CASCADE)

    def __str__(self):
        return " ".join([self.fistname,
                         self.middlename,
                         self.lastname,
                         self.newspaper.name])


class Article(models.Model):
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    headline = models.CharField(max_length=140)
    body = models.TextField()

    def __str__(self):
        return " ".join([self.headline,
                         self.body,
                         self.author.fistname,
                         self.author.newspaper.name])
user~/yourdir/Exercise1$ python manage.py shell
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from ArticleReader.models import Newspaper
>>> n = Newspaper.objects.get(name="newspaper1")
>>> n
<Newspaper: newspaper1 123, fake street First newspaper test>
>>> from ArticleReader.models import Author, Article
>>> au = Author.objects.get(fistname="Homer")
>>> au
<Author: Homer Jay Simpson newspaper1>
>>> ar = Article.objects.get(author=au)
>>> ar
<Article: Ouch! No matter how good you are at something, there's always about a million people better than you. Homer newspaper1>
1.5
>>> from django.core import serializers
>>> from ArticleReader.models import Author
>>> author_list = Author.objects.all()
>>> serialized_authors = serializers.serialize('json', author_list)
>>> serialized_authors
u'[
  {
    "fields": {
      "middlename": "Jay",
      "lastname": "Simpson",
      "newspaper": 1,
      "fistname": "Homer"
    },
    "model": "ArticleReader.author",
    "pk": 1
  },
  {
    "fields": {
      "middlename": ".",
      "lastname": "Simpson",
      "newspaper": 1,
      "fistname": "Marge"
    },
    "model": "ArticleReader.author",
    "pk": 2
  },
  {
    "fields": {
      "middlename": ".",
      "lastname": "Simpson",
      "newspaper": 2,
      "fistname": "Bart"
    },
    "model": "ArticleReader.author",
    "pk": 3
  },
  {
    "fields": {
      "middlename": ".",
      "lastname": "Simpson",
      "newspaper": 2,
      "fistname": "Lisa"
    },
    "model": "ArticleReader.author",
    "pk": 4
  }
]'
Note: the output has been formatted for readability.
2.1
def getdict(self, article):
    # Build a plain dict; serialization happens in serialize()
    data = {'author': " ".join([article.author.fistname,
                                article.author.middlename,
                                article.author.lastname]),
            'headline': article.headline,
            'body': article.body}
    return data
2.2
def serialize(self, article):
    return json.dumps(self.getdict(article))
2.3
>>> from ArticleReader.models import Article
>>> from ArticleReader.serializers import ArticleSerializer
>>> a = ArticleSerializer()
>>> article = Article.objects.all()[0]
>>> serialized_article = a.serialize(article)
>>> serialized_article
{
  "headline": "Ouch!",
  "body": "No matter how good you are at something, there's always about a million people better than you.",
  "author": "Homer Jay Simpson"
}
2.4
def serialize_many(self, articles):
    array = []
    for article in articles:
        array.append(self.getdict(article))
    return json.dumps(array)
You can’t reuse serialize() here because it returns a string. To use it, you would have to parse each string back into a dict, collect the dicts in a list and then serialize the list again. The auxiliary getdict() method avoids that round trip.
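The difference can be seen without Django. This sketch is illustrative only. It uses types.SimpleNamespace objects as hypothetical stand-ins for Article instances, and the helper get_dict mirrors getdict() from exercise 2.1.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-ins for Article model instances (not Django objects)
articles = [
    SimpleNamespace(headline="Ouch!", body="First body"),
    SimpleNamespace(headline="Hrmmm...", body="Second body"),
]


def get_dict(article):
    # Same idea as getdict(): build a plain dict, do not serialize yet
    return {"headline": article.headline, "body": article.body}


def serialize(article):
    return json.dumps(get_dict(article))


# Reusing serialize() encodes every article twice: the outer list
# ends up holding JSON strings instead of JSON objects.
double_encoded = json.dumps([serialize(a) for a in articles])

# Collecting dicts first and serializing once gives a list of objects.
single_encoded = json.dumps([get_dict(a) for a in articles])

print(double_encoded)
print(single_encoded)
```

A client that parses double_encoded gets a list of strings and would need a second json.loads on every element. That is the reason serialize_many() works on dicts.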
2.5
>>> from ArticleReader.models import Article
>>> from ArticleReader.serializers import ArticleSerializer
>>> a = Article.objects.all()
>>> s = ArticleSerializer()
>>> serialized_list = s.serialize_many(a)
>>> serialized_list
'[
  {
    "headline": "Ouch!",
    "body": "No matter how good you are at something, there's always about a million people better than you.",
    "author": "Homer Jay Simpson"
  },
  {
    "headline": "Hrmmm...",
    "body": "I'm going into the dining room to have a conversation. If you want to join me, fine. (goes into the dining room and imitates a second voice) Hello Marge, how's the family? (in regular voice) I don't want to talk about it! Mind your own business!",
    "author": "Marge . Simpson"
  },
  {
    "headline": "I didn't do it!",
    "body": "You got the brains and talent to go as far as you want and when you do I'll be right there to borrow money.",
    "author": "Bart . Simpson"
  },
  {
    "headline": "BAAAAART!!!",
    "body": "I had a cat named Snowball. She died! She died! Mom said she was sleeping. She lied! She lied! Why oh why is my cat dead? Couldn't that Chrysler hit me instead? I had a hamster named Snuffy. He died\\u2014",
    "author": "Lisa . Simpson"
  }
]'
2.6
import json


class ArticleSerializer:
    def getdict(self, article):
        # Full representation of one article
        aux = {
            'headline': article.headline,
            'body': article.body
        }
        article = {
            'newspaper': article.author.newspaper.id,
            'author': article.author.id,
            'article': aux
        }
        return article

    def getlistdict(self, article):
        # Short representation used in lists: headline plus a link
        article = {
            'headline': article.headline,
            'articlelink': '/article/%s' % article.id
        }
        return article

    def serialize(self, article):
        return json.dumps(self.getdict(article))

    def serialize_many(self, articles):
        array = []
        for article in articles:
            array.append(self.getlistdict(article))
        return json.dumps(array)
>>> from ArticleReader.serializers import ArticleSerializer
>>> from ArticleReader.models import Article
>>> s = ArticleSerializer()
>>> article_list = Article.objects.all()
>>> string = s.serialize(article_list[0])
>>> string
'{
  "newspaper": 1,
  "article": {
    "headline": "Ouch!",
    "body": "No matter how good you are at something, there's always about a million people better than you."
  },
  "author": 1
}'
>>> from ArticleReader.serializers import ArticleSerializer
>>> from ArticleReader.models import Article
>>> s = ArticleSerializer()
>>> article_list = Article.objects.all()
>>> string = s.serialize_many(article_list)
>>> string
[
  {
    "headline": "Ouch!",
    "articlelink": "/article/1"
  },
  {
    "headline": "Hrmmm...",
    "articlelink": "/article/2"
  },
  {
    "headline": "I didn't do it!",
    "articlelink": "/article/3"
  },
  {
    "headline": "BAAAAART!!!",
    "articlelink": "/article/4"
  }
]
2.7
import json


class NewspaperSerializer:
    def getdict(self, newspaper):
        author_list = newspaper.author_set.all()
        aux = []
        for author in author_list:
            # Lowercase path: URL patterns are case-sensitive ('^author/...')
            aux.append({'authorname': author.fistname,
                        'authorlink': '/author/%s' % author.id})
        mynewspaper = {
            'name': newspaper.name,
            'location': newspaper.location,
            'authors': aux
        }
        return mynewspaper

    def getlistdict(self, newspaper):
        # Absolute path matching the '^newspaper/(?P<pk>\d+)/$' pattern
        newspaper = {
            'name': newspaper.name,
            'newspaperlink': '/newspaper/%s' % newspaper.id
        }
        return newspaper

    def serialize_many(self, newspapers):
        array = []
        for newspaper in newspapers:
            array.append(self.getlistdict(newspaper))
        return json.dumps(array)

    def serialize(self, newspaper):
        return json.dumps(self.getdict(newspaper))
class AuthorSerializer:
    def getdict(self, author):
        aux = {
            'firstname': author.fistname,
            'middlename': author.middlename,
            'lastname': author.lastname}
        myauthor = {
            'information': aux,
            'newspaper': author.newspaper.id,
            'articleslistlink': '/author/%s/articles/' % author.id
        }
        return myauthor

    def serialize(self, author):
        return json.dumps(self.getdict(author))
VIEWS
1.1
from django.http import HttpResponse
from django.views.generic import View


class NewspaperListView(View):
    def get(self, request, *args, **kwargs):
        return HttpResponse(status=200)
1.2
url(r'^newspaper/$',
    NewspaperListView.as_view(),
    name='newspaper-list')

class NewspaperListView(View):
    def get(self, request, *args, **kwargs):
        newspaper_list = Newspaper.objects.all()
        serializer = NewspaperSerializer()
        data = serializer.serialize_many(newspaper_list)
        return HttpResponse(data)
1.3
url(r'^author/(?P<pk>\d+)/$',
    AuthorView.as_view(),
    name='author-detail')

class AuthorView(View):
    def get(self, request, *args, **kwargs):
        author_id = int(kwargs['pk'])
        author = Author.objects.get(id=author_id)
        serializer = AuthorSerializer()
        data = serializer.serialize(author)
        return HttpResponse(data)
1.4
url(r'^article/(?P<pk>\d+)/$',
    ArticleView.as_view(),
    name='article-detail')

class ArticleView(View):
    def get(self, request, *args, **kwargs):
        article_id = int(kwargs['pk'])
        article = Article.objects.get(id=article_id)
        serializer = ArticleSerializer()
        data = serializer.serialize(article)
        return HttpResponse(data)
1.5
url(r'^newspaper/(?P<pk>\d+)/$',
    NewspaperView.as_view(),
    name='newspaper-detail')

class NewspaperView(View):
    def get(self, request, *args, **kwargs):
        newspaper_id = int(kwargs['pk'])
        newspaper = Newspaper.objects.get(id=newspaper_id)
        s = NewspaperSerializer()
        data = s.serialize(newspaper)
        return HttpResponse(data)
1.6
url(r'^search/by_newspaper/(?P<pk>\d+)/$',
    SearchView.as_view(), {'filter': 'newspaper'},
    name='search-newspaper'),

class SearchView(View):
    def get(self, request, *args, **kwargs):
        if kwargs['filter'] == 'newspaper':
            aux = []
            newspaper = Newspaper.objects.get(id=int(kwargs['pk']))
            # Collect the articles of every author of the newspaper
            for author in newspaper.author_set.all():
                for article in author.article_set.all():
                    aux.append(article)
            s = ArticleSerializer()
            data = s.serialize_many(aux)
            return HttpResponse(data)
1.7
url(
    r'^search/by_date/(?P<day>\d\d)-(?P<month>\d\d)-(?P<year>\d\d\d\d)/$',
    SearchView.as_view(),
    {'filter': 'date'},
    name='search-date'),

class SearchView(View):
    def get(self, request, *args, **kwargs):
        if kwargs['filter'] == 'newspaper':
            aux = []
            newspaper = Newspaper.objects.get(id=int(kwargs['pk']))
            for author in newspaper.author_set.all():
                for article in author.article_set.all():
                    aux.append(article)
            s = ArticleSerializer()
            data = s.serialize_many(aux)
            return HttpResponse(data)
        elif kwargs['filter'] == 'date':
            day = int(kwargs['day'])
            month = int(kwargs['month'])
            year = int(kwargs['year'])
            newspaper_list = Newspaper.objects.all()
            aux = []
            for newspaper in newspaper_list:
                for author in newspaper.author_set.all():
                    for article in author.article_set.filter(
                            date__day=day,
                            date__month=month,
                            date__year=year):
                        aux.append(article)
            s = ArticleSerializer()
            data = s.serialize_many(aux)
            return HttpResponse(data)
1.8
Look for every .get() query, wrap it in a try...except block and, when the object does not exist, return HttpResponse(status=404). For example:
class NewspaperView(View):
    def get(self, request, *args, **kwargs):
        newspaper_id = int(kwargs['pk'])
        try:
            newspaper = Newspaper.objects.get(id=newspaper_id)
        except Newspaper.DoesNotExist:
            # No newspaper with that id: the resource does not exist
            return HttpResponse(status=404)
        s = NewspaperSerializer()
        data = s.serialize(newspaper)
        return HttpResponse(data)
MIDDLEWARE
1.2
MIDDLEWARE_CLASSES = (
    'ArticleReader.middleware.ParsingMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)
The method to implement is ’process_view(self, request, view_func, view_args, view_kwargs)’.
1.3
def process_view(self, request, view_func, view_args, view_kwargs):
    print(view_func.__name__)
    if view_func.__name__ == 'ArticleListView':
        return None
    return None
1.4
def process_view(self, request, view_func, view_args, view_kwargs):
    print(view_func.__name__)
    if view_func.__name__ == 'ArticleListView':
        try:
            body = request.body.decode('utf-8')
            data = json.loads(body)
        except ValueError:
            # Body is not valid UTF-8 encoded JSON
            return HttpResponse(status=400)
        print('Data: %s' % data)
    return None
1.5
def process_view(self, request, view_func, view_args, view_kwargs):
    print(view_func.__name__)
    if view_func.__name__ == 'ArticleListView':
        # Only POST requests (new articles) carry a body to parse
        if request.method == 'POST':
            try:
                body = request.body.decode('utf-8')
                data = json.loads(body)
            except ValueError:
                return HttpResponse(status=400)
            print('Data: %s' % data)
    return None
1.6
article_is_valid is the manual version and is much harder to maintain. are_same_type is more maintainable, because you only need to change the dummy object to check a different structure.
def article_is_valid(article):
    # Manual check of the article structure, field by field
    if (type(article) is dict and len(article) == 3
            and 'newspaper' in article
            and 'author' in article
            and 'article' in article):
        if (type(article['newspaper']) is int
                and type(article['author']) is int
                and type(article['article']) is dict):
            inner = article['article']
            if len(inner) == 2 and 'headline' in inner and 'body' in inner:
                if (type(inner['headline']) is str
                        and type(inner['body']) is str):
                    return True
    return False


def are_same_type(obj, dummie):
    # The object and the dummy must have exactly the same type
    if type(obj) is not type(dummie):
        return False
    if type(dummie) is dict:
        # Same key names (and therefore the same number of keys)
        if set(obj) != set(dummie):
            return False
        # Check every value recursively, not only the first nested one
        return all(are_same_type(obj[key], dummie[key]) for key in dummie)
    if type(dummie) is list:
        # Same length and matching elements, position by position
        if len(obj) != len(dummie):
            return False
        return all(are_same_type(o, d) for o, d in zip(obj, dummie))
    return True
1.7
import json

from django.http import HttpResponse


class ParsingMiddleware(object):
    # Dummy article: only its structure and value types are compared
    article_dummie = {'newspaper': 1,
                      'author': 1,
                      'article': {
                          'headline': 'asdf',
                          'body': 'asdf'}
                      }

    def process_view(self, request, view_func, view_args, view_kwargs):
        print(view_func.__name__)
        if view_func.__name__ == 'ArticleListView':
            if request.method == 'POST':
                try:
                    body = request.body.decode('utf-8')
                    print(body)
                    data = json.loads(body)
                except ValueError:
                    # Body is not valid UTF-8 encoded JSON
                    return HttpResponse(status=400)
                print('Data: %s' % data)
                # are_same_type() is the helper from exercise 1.6, in this module
                if are_same_type(data, self.article_dummie):
                    return None
                else:
                    return HttpResponse(status=400)
        return None
Chapter 8
REST Applied to an SDN Application
8.1
Introduction
The main principle behind Software-Defined Networking (SDN) is the physical separation of the network control
plane from the forwarding plane, where a single control plane controls several devices.
Software-Defined Networking is an emerging architecture that is dynamic, manageable, cost-effective, and adaptable, making it ideal for the high-bandwidth, dynamic nature of today’s applications. This architecture decouples
the network control and forwarding functions enabling the network control to become directly programmable and the
underlying infrastructure to be abstracted for applications and network services. The SDN architecture is:
• Directly programmable: Network control is directly programmable because it is decoupled from forwarding
functions.
• Agile: Abstracting control from forwarding lets administrators dynamically adjust network-wide traffic flow to
meet changing needs.
• Centrally managed: Network intelligence is (logically) centralized in software-based SDN controllers that
maintain a global view of the network, which appears to applications and policy engines as a single, logical
switch.
• Programmatically configured: SDN lets network managers configure, manage, secure, and optimize network
resources very quickly via dynamic, automated SDN programs, which they can write themselves because the
programs do not depend on proprietary software.
• Open standards-based and vendor-neutral: When implemented through open standards, SDN simplifies network design and operation because instructions are provided by SDN controllers instead of multiple, vendor-specific devices and protocols.
8.2
Ryu Introduction
Ryu is a component-based framework for Software-Defined Networking applications. It provides software components with well-defined APIs that make it easy to create network management and control applications. Ryu supports various protocols for managing network devices, such as OpenFlow, Netconf and OF-config.
Ryu supports OpenFlow 1.0, 1.2, 1.3 and 1.4. It is written entirely in Python and all of its code is freely available under the Apache 2.0 license. It is the tool chosen to develop the control plane in the experiments described in this document. All of the scenarios analyzed rely on OpenFlow 1.3, Open vSwitch, Ryu and Mininet.
8.3
Ryu Features
Ryu defines two important base classes that you’ll need to extend to create your applications and controllers: RyuApp and ControllerBase.
The first one is located in the app_manager package (app_manager.RyuApp). It is used to receive messages originating from the switches, and any OpenFlow message can be received. To receive a message, create a method that processes it and decorate that method with ’@set_ev_cls’ (see section 8.3.1). When a message is received, RyuApp looks for decorated methods in the extended class; if it finds a method with the right decorator, it calls it. Otherwise the message is not processed. RyuApp contains a variable called OFP_VERSIONS: a list of all the OpenFlow versions that the app accepts. If a switch does not operate in one of the versions listed in OFP_VERSIONS, that switch is ignored.
ControllerBase is the base class to build your APIs. It handles the incoming HTTP connections through a WSGI interface. To respond to a certain HTTP request, create a method that processes the request and returns the response, and decorate it with ’@route’ (see section 8.3.3). When a request arrives, the server looks in the class extended from ControllerBase for a method decorated with a matching route. If the method is found, the request is handed to it and the response it returns is sent back to the client. Otherwise, the request is discarded and the client receives a response with a 404 status code.
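The lookup-by-decorator pattern can be shown with a toy model in plain Python. This is not Ryu’s implementation. The names on_event, ToyApp and dispatch are hypothetical, and events are plain strings instead of OpenFlow event classes.

```python
def on_event(event_name):
    """Hypothetical decorator: tag a method with the event it handles."""
    def decorator(method):
        method.handled_event = event_name
        return method
    return decorator


class ToyApp:
    @on_event("PortStatsReply")
    def port_stats(self, ev):
        return "port stats from %s" % ev

    @on_event("StateChange")
    def state_change(self, ev):
        return "state change on %s" % ev


def dispatch(app, event_name, ev):
    """Call the method tagged for event_name, or return None if there is none."""
    for attr_name in dir(app):
        method = getattr(app, attr_name)
        if callable(method) and getattr(method, "handled_event", None) == event_name:
            return method(ev)
    return None  # no handler: the event is not processed


app = ToyApp()
print(dispatch(app, "StateChange", "switch 1"))      # handled by state_change
print(dispatch(app, "FlowStatsReply", "switch 1"))   # no handler
```

The decorator only attaches metadata to the method. The framework reads that metadata later to choose which method to call, in the same way an unmatched event or URL finds no handler.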
8.3.1
Message Reply Handlers
To process the messages that arrive from the switches you can use a decorator called set_ev_cls, found in the package ryu.controller.handler. This decorator accepts two arguments. The first one is the type of event that the method will handle; these events correspond to OpenFlow messages. In the examples of this chapter we will only use three of them:
• EventOFPStateChange: It is received when a switch’s state changes.
• EventOFPFlowStatsReply: It is received after an OFPFlowStatsRequest is sent. It contains information about the switch’s flows.
• EventOFPPortStatsReply: It is received after an OFPPortStatsRequest is sent. It contains statistics collected from the switch’s ports.
The second argument is the state (or list of states) of the connection between the switch and the controller in which the method should be called. There are four possible states:
• HANDSHAKE_DISPATCHER: The switch and the controller exchange HELLO messages.
• CONFIG_DISPATCHER: The OpenFlow version has been negotiated and the controller requests the switch’s features.
• MAIN_DISPATCHER: Normal state: the switch forwards packets and communicates with the controller when necessary.
• DEAD_DISPATCHER: The switch is disconnected from the controller.
The decorated method receives one positional argument, the event, which is explained in section 8.3.2.
Example:
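from ryu.controller import ofp_event
from ryu.controller.handler import set_ev_cls, MAIN_DISPATCHER, DEAD_DISPATCHER

# Methods of a class that extends app_manager.RyuApp
@set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
def add_switch(self, ev):
    print('New switch is up')

@set_ev_cls(ofp_event.EventOFPStateChange, DEAD_DISPATCHER)
def delete_switch(self, ev):
    print('Switch went down')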
These methods receive the EventOFPStateChange message when the switch is ready to work and when the switch is disconnected from the controller. They can be used, for example, to keep track of the working switches.
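The bookkeeping itself is ordinary Python. This plain-Python sketch uses a hypothetical SwitchRegistry class. Datapath ids, datapath objects and the two state constants are local stand-ins, not Ryu objects. In a real app, the two handlers above would call something like on_state_change with ev.datapath.

```python
# Local stand-ins for the dispatcher constants (not imported from Ryu)
MAIN_DISPATCHER = "main"
DEAD_DISPATCHER = "dead"


class SwitchRegistry:
    """Hypothetical helper that keeps track of the switches that are up."""

    def __init__(self):
        self.datapaths = {}  # datapath id -> datapath object

    def on_state_change(self, datapath_id, datapath, state):
        if state == MAIN_DISPATCHER:
            # Switch is ready: remember it so messages can be sent later
            self.datapaths[datapath_id] = datapath
        elif state == DEAD_DISPATCHER:
            # Switch disconnected: forget it if it was known
            self.datapaths.pop(datapath_id, None)

    def active_ids(self):
        return sorted(self.datapaths)


registry = SwitchRegistry()
registry.on_state_change(1, "dp-1", MAIN_DISPATCHER)
registry.on_state_change(2, "dp-2", MAIN_DISPATCHER)
registry.on_state_change(1, "dp-1", DEAD_DISPATCHER)
print(registry.active_ids())
```

Keeping the datapath object, not only its id, matters because section 8.3.2 needs it to send messages back to the switch.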
8.3.2
OpenFlow protocol messages
When a message is received, it is encapsulated in an event object (ev). The event contains the message object, ev.msg, which in turn contains the switch’s datapath object (ev.msg.datapath). The datapath object contains the ofproto object and the parser object (ofproto_parser), which do not depend on the message.
The message also carries the information specific to its type (for example, the port statistics). While the first part (the switch information) always has the same structure, the structure of the part that carries the message information depends on the type of the message received.
Example:
OFPPortStatsReply contains an attribute called ’body’ inside msg, which holds a list of OFPPortStats entries. Each entry describes one port, with information such as the number of packets received and sent.
@set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
def _port_stats_reply_handler(self, ev):
    datapath = ev.msg.datapath
    datapath_id = datapath.id
    body = ev.msg.body
    for stat in body:
        self.logger.info('Datapath: %s, port: %s, received packets: %s'
                         % (datapath_id, stat.port_no, stat.rx_packets))
This simple function logs the statistics received by the controller, specifying the datapath and the port number.
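To serve resources such as the port statistics listed in section 8.4, the handler could store each reply instead of only logging it. This is a minimal sketch. It uses namedtuple stand-ins that carry some of the OFPPortStats field names (port_no, rx_packets, tx_packets, rx_bytes, tx_bytes). The function name store_port_stats and the dictionary layout are hypothetical.

```python
from collections import namedtuple

# Stand-in for Ryu's OFPPortStats entries (only a few of its fields)
PortStats = namedtuple("PortStats", "port_no rx_packets tx_packets rx_bytes tx_bytes")


def store_port_stats(stats_db, datapath_id, body):
    """Save the latest counters of every port, keyed by datapath id and port."""
    ports = stats_db.setdefault(datapath_id, {})
    for stat in body:
        ports[stat.port_no] = {
            "rx_packets": stat.rx_packets,
            "tx_packets": stat.tx_packets,
            "rx_bytes": stat.rx_bytes,
            "tx_bytes": stat.tx_bytes,
        }
    return stats_db


stats_db = {}
reply_body = [PortStats(1, 10, 12, 1500, 1800), PortStats(2, 3, 0, 400, 0)]
store_port_stats(stats_db, 1, reply_body)
print(stats_db[1][2])
```

With this layout, a request for one port’s statistics would read stats_db[datapath_id][port], and a request for a whole switch would read stats_db[datapath_id].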
To send messages to the switches you need the switch’s datapath object, which also gives access to the parser object. You should store the datapath when the switch changes to the MAIN_DISPATCHER state.
The parser object provides the OFPMatch class, which takes named arguments and builds a match object. It also provides classes to build instructions and actions. We’ll only use OFPInstructionActions and OFPActionOutput, but you can find the full list in Ryu’s documentation. Finally, we’ll also use the parser object to gather the match, the instructions, the datapath, etc. into a single message with OFPFlowMod. It takes named arguments; the most common are match, instructions, priority and datapath.
The datapath object contains a method named send_msg, which takes an OFPFlowMod as argument and sends it to the switch.
Example:
def add_flow(self, datapath):
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser
    match = parser.OFPMatch(in_port=1)
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                         [parser.OFPActionOutput(2)])]
    mod = parser.OFPFlowMod(datapath=datapath, priority=1,
                            match=match, instructions=inst)
    datapath.send_msg(mod)
    self.logger.info("New Flow Added")
This simple method takes a datapath as its argument and sends it a message that installs a new flow in the switch. The flow indicates that a packet received on port 1 must be forwarded to port 2.
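The effect of the installed flow can be pictured with a toy flow table in plain Python. This is only a model of OpenFlow matching, not Ryu or Open vSwitch code. Each hypothetical entry holds a priority, a match dictionary and an output port, and the highest-priority entry whose fields all match is selected.

```python
def lookup(flow_table, packet):
    """Return the output port of the best matching flow, or None (table miss)."""
    best = None
    for flow in flow_table:
        matches = all(packet.get(field) == value for field, value in flow["match"].items())
        if matches and (best is None or flow["priority"] > best["priority"]):
            best = flow
    return best["out_port"] if best else None


# Equivalent of add_flow(): in_port=1 -> output to port 2, priority 1
flow_table = [{"priority": 1, "match": {"in_port": 1}, "out_port": 2}]

print(lookup(flow_table, {"in_port": 1}))  # forwarded
print(lookup(flow_table, {"in_port": 3}))  # table miss
```

The priority argument of OFPFlowMod plays the role of the priority field here when several flows match the same packet.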
8.3.3
HTTP Request Handlers
A class that extends ControllerBase can handle the HTTP requests that arrive at the server. Requests are filtered by their URL and their method, and Ryu decides which method is called. To indicate which URL and HTTP methods a class method handles, it must be decorated with the route decorator.
route accepts two positional arguments and two keyword arguments:
1. Request name: an identifier string that names the resource. It has no further implications.
2. URL: a string defining the URL to match. It does not include the domain or the first part of the URI (protocol://ip_address:port). To define variable URL parts, use braces ’{’ and ’}’. Inside the braces write a representative name of your choice; it only identifies the substring. Inside the method you can access the substring wrapped in braces through kwargs[’name’].
3. methods=[]: a list with the HTTP method(s) that this handler listens to.
4. requirements={}: a dictionary whose keys are the names defined in the URL (inside the braces) and whose values are patterns. It forces the substring identified by each key to match the pattern in the corresponding value.
Example:
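from ryu.app.wsgi import ControllerBase, route
from webob import Response

# Defined at module level: 'self' does not exist yet when the decorator runs
simple_digit_pattern = r'\d'


class MyTestController(ControllerBase):  # hypothetical controller name
    @route('My_test_example',
           '/resource/{resource_identifier}',
           methods=['GET', 'POST', 'DELETE'],
           requirements={'resource_identifier': simple_digit_pattern})
    def my_test_example(self, req, **kwargs):  # hypothetical handler name
        return Response(text=kwargs['resource_identifier'])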
Methods decorated with route take the request as their first argument (after self) plus keyword arguments (**kwargs), and they must return a Response object from the ’webob’ package.
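Ryu relies on the Routes package for this matching. The following is only a simplified model of how a template such as ’/resource/{resource_identifier}’, a list of methods and a requirements dictionary can select a request and fill kwargs. The function names and the default pattern [^/]+ are assumptions, not Routes’ actual code.

```python
import re


def compile_template(template, requirements=None):
    """Turn '/resource/{name}' into a regex with one named group per variable."""
    requirements = requirements or {}
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", template):
        name = m.group(1)
        regex += re.escape(template[pos:m.start()])
        # Use the requirement pattern if given, otherwise any text without '/'
        regex += "(?P<%s>%s)" % (name, requirements.get(name, "[^/]+"))
        pos = m.end()
    regex += re.escape(template[pos:])
    return re.compile("^" + regex + "$")


def match(template, path, method, methods, requirements=None):
    """Return the kwargs dict if path and method match, otherwise None."""
    if method not in methods:
        return None
    m = compile_template(template, requirements).match(path)
    return m.groupdict() if m else None


rules = {"resource_identifier": r"\d"}
template = "/resource/{resource_identifier}"
print(match(template, "/resource/7", "GET", ["GET", "POST", "DELETE"], rules))
print(match(template, "/resource/ab", "GET", ["GET"], rules))
```

A request that fails either check gets no handler, which corresponds to the 404 response described in section 8.3.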
8.3.4
Link REST Controllers with Ryu applications
The REST interface of a Ryu application is defined outside the application’s class: you have to create a controller class that handles requests and responses. To link a controller with an application, use the WSGI object that Ryu creates and passes in the keyword arguments; it is accessible through the key ’wsgi’. This object allows you to register controllers for your API. A registered controller receives the incoming requests and holds an instance of the application (see Figure 8.1), through which it can call the application’s methods.
[Diagram: the RYU APP registers the CONTROLLER through WSGI; incoming Requests reach the CONTROLLER through WSGI; the CONTROLLER holds an App Instance.]
Figure 8.1: App and Controller interconnection
Example:
from ryu.app.wsgi import ControllerBase, WSGIApplication, route
from ryu.base import app_manager
from webob import Response


class TestApp(app_manager.RyuApp):
    # Ask Ryu to create a WSGIApplication and pass it as kwargs['wsgi']
    _CONTEXTS = {'wsgi': WSGIApplication}

    def __init__(self, *args, **kwargs):
        super(TestApp, self).__init__(*args, **kwargs)
        wsgi = kwargs['wsgi']
        # Register the controller and hand it a reference to this app
        wsgi.register(TestController, {'TestApp': self})

    def PrintHelloWorld(self):
        print("Hello World")
        return "Hello World"


class TestController(ControllerBase):
    def __init__(self, req, link, data, **config):
        super(TestController, self).__init__(req, link, data, **config)
        self.testapp = data['TestApp']

    @route('Hello World', '/', methods=['GET'])
    def Hello_world(self, req, **kwargs):
        return Response(text=self.testapp.PrintHelloWorld())
8.4 Monitoring Application
In this section we will show how a Ryu application can offer a REST interface for clients to interact with the network.
First of all, we need to define the list of resources that we want to offer. In this case the list will be:
1. Bookmark : Application's entry point
2. Topology : Topology bookmark. Lists all the topology resources available.
3. Switch List : Represents a list of the active switches.
4. Link List : Represents a list of the interconnections between all switches.
5. Switch’s link list : Represents a list of the existing links between the specified switch and the other switches.
6. List of flows: Complete list of flows in the network.
7. A switch’s list of flows: List of flows defined for a switch.
8. Statistics of a port: Represents the packet and byte load for a certain port of a certain switch.
9. Statistics of a switch: Lists the statistics of every port in a switch.
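Table 8.1 lists the URIs defined for this application.
+--------------------+---------------------------+
| Resource           | URI                       |
+--------------------+---------------------------+
| Bookmark           | '/'                       |
| Topology           | 'topology/'               |
| Switch List        | 'topology/switches/'      |
| Link List          | 'topology/links/'         |
| Switch's link list | 'topology/links/<id>/'    |
| Flows list         | 'flows/'                  |
| Switch's flow list | 'flows/<id>/'             |
| Port Statistics    | 'statistics/<id>/<port>/' |
| Switch Statistics  | 'statistics/<id>'         |
+--------------------+---------------------------+
Table 8.1: List of URIs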
8.4.1 Ryu implementation
In this section we will explain how to create a simple Ryu application and a controller that are able to serve the
resources listed above.
Application
First of all, we will look at the Ryu application. It will have three functionalities:
1. Store a record of active switches.
2. Send messages to add and delete flows.
3. Send and receive status messages.
Switches
To send messages to a switch you need its datapath object. This object contains the switch's identifier, the OpenFlow
protocol it is using (ofproto) and a parser (ofproto_parser) whose classes are used to build the messages for the
switch. To store the datapaths we will create a dictionary that keeps each object under its datapath id. We will obtain
the datapath object when Ryu raises an EventOFPStateChange event indicating that the switch has entered the
MAIN_DISPATCHER state (active).
Also, when a switch changes its state to DEAD_DISPATCHER (it is no longer active), we will remove it from the
dictionary.
To 'catch' EventOFPStateChange events, as with any other message, we add a decorator to the function that handles
them. In the decorator we specify the event (ofp_event.EventOFPStateChange) and the switch's state:
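from ryu.controller import ofp_event
from ryu.controller.handler import set_ev_cls, MAIN_DISPATCHER, DEAD_DISPATCHER

@set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
def add_switch(self, ev):
    try:
        datapath = ev.datapath
        d_id = datapath.id
    except AttributeError:
        # Without a datapath there is nothing to register
        self.logger.error("State change event without a valid datapath")
        return
    self.switches[d_id] = datapath
    self.logger.info("Switch %s UP", d_id)


@set_ev_cls(ofp_event.EventOFPStateChange, DEAD_DISPATCHER)
def delete_switch(self, ev):
    datapath = ev.datapath
    d_id = datapath.id
    # DEAD_DISPATCHER events may arrive more than once for the same switch
    if d_id in self.switches:
        self.switches.pop(d_id)
        self.logger.info("Switch %s DOWN", d_id)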
self.switches is the dictionary where the datapaths are stored, indexed by their id.
DEAD_DISPATCHER events sometimes arrive more than once, so you have to check whether the switch has already been
removed.
The logger is only a debugging tool and does not affect the application's behaviour. Even so, it is highly recommended
because it keeps a record of the events.
Although this lets us keep track of the active switches, the topology resource that lists switches will not use this
information. Instead it will use Ryu's topology API, which gives much more information about the switches (number of
ports, etc.).
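The bookkeeping above can be modelled without Ryu. The sketch below uses a hypothetical SwitchRegistry class and plain strings in place of datapath objects. It shows why a removal that tolerates missing ids (here dict.pop with a default) makes duplicate DEAD_DISPATCHER notifications harmless.

```python
class SwitchRegistry(object):
    """Plain-Python model of the self.switches dictionary (hypothetical helper)."""

    def __init__(self):
        self.switches = {}

    def switch_up(self, dpid, datapath):
        # MAIN_DISPATCHER: store (or refresh) the datapath under its id
        self.switches[dpid] = datapath

    def switch_down(self, dpid):
        # DEAD_DISPATCHER may be delivered twice; pop() with a default
        # ignores ids that were already removed and reports what happened
        return self.switches.pop(dpid, None) is not None

    def active(self):
        return sorted(self.switches)


registry = SwitchRegistry()
registry.switch_up(1, "datapath-1")
registry.switch_up(2, "datapath-2")
registry.switch_down(1)
registry.switch_down(1)  # duplicate notification: no error
print(registry.active())
```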
Flows
In this app we implement two methods regarding flows: one to add them and another one to delete them. These
methods are called by the controller when the corresponding request arrives (we will see this later).
Adding a flow:
def add_flow(self, d_id, priority, conditions, out_port, buffer_id=None):
    datapath = self.switches[int(d_id)]
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser
    # Unpack the conditions dictionary into OFPMatch keyword arguments
    match = parser.OFPMatch(**conditions)
    if str(out_port) == "BROADCAST":
        actions = [parser.OFPActionOutput(ofproto.OFPP_FLOOD)]
    else:
        actions = [parser.OFPActionOutput(int(out_port))]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
    self.logger.info("datapath: %s, conditions = %s, output = %s", d_id, conditions, out_port)
    # The cookie (self.count) identifies this flow later on
    mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
                            match=match, instructions=inst, cookie=self.count)
    self.count += 1
    datapath.send_msg(mod)
    self.logger.info("New Flow Added")
First of all, you need the datapath stored in self.switches and the objects it contains: ofproto and
ofproto_parser.
After that we generate a match from the conditions received in the request, using the parser's OFPMatch class.
To generate the match we use the double-star notation, which simplifies the code. OFPMatch requires keyword
arguments:
match = parser.OFPMatch(in_port=2)
If you receive a dictionary with all the conditions, you would need a call that specifies every match condition the
dictionary contains. Since you do not know in advance which ones are defined, you would have to write something
like:
match = parser.OFPMatch(in_port=conditions['in_port'],in_phy_port=conditions['in_phy_port'],...)
To solve this we use the double-star operator.
In a function call, the double star unpacks a dictionary into keyword arguments: each key becomes an argument name
and each value is passed as that argument's value. For example, a conditions dictionary could be
{'in_port':2,'ipv4_src':"192.168.1.1"}. Using the double star as in the code gives the same result as calling:
match = parser.OFPMatch(in_port=2,ipv4_src="192.168.1.1")
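You can check this equivalence without Ryu. In the sketch below, fake_ofp_match is a hypothetical stand-in for parser.OFPMatch that simply returns the keyword arguments it receives.

```python
def fake_ofp_match(**kwargs):
    """Stand-in for parser.OFPMatch: returns the keyword arguments it was given."""
    return kwargs


conditions = {'in_port': 2, 'ipv4_src': "192.168.1.1"}

# Both calls are equivalent: ** unpacks the dictionary into keyword arguments
explicit = fake_ofp_match(in_port=2, ipv4_src="192.168.1.1")
unpacked = fake_ofp_match(**conditions)
print(explicit == unpacked)  # True
```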
We can use the conditions without checking their validity because this has been done at a previous stage.
After the match is generated, we build the set of instructions. In this app we only accept two actions: sending a
packet to a certain port or broadcasting it. Instructions are more problematic than matches. A match only needs
keyword arguments, while every action in an instruction set must be built with a class from the parser. This is why
this app only accepts OFPActionOutput actions.
To send a packet to a certain port you generate an OFPActionOutput with the port number, or with OFPP_FLOOD to
broadcast it.
We construct the message with OFPFlowMod, specifying the datapath, the match conditions, the instructions to perform,
the priority and a cookie.
Cookies are opaque numbers attached to a flow that the controller can use freely. We use them to identify every flow
we insert in a switch. The value of the cookie comes from a simple integer counter (self.count) that we increment
every time we use it.
Finally we send the OFPFlowMod message using send_msg.
Note: in this app all the flows are installed in table 0 of the switch.
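The flow-statistics handler described below skips every flow whose cookie is 0, so the counter used for cookies should never produce 0. The initial value of self.count is not shown here. The following minimal sketch assumes a hypothetical CookieCounter helper in place of the plain integer.

```python
import itertools


class CookieCounter(object):
    """Hands out increasing flow cookies, starting at 1 (hypothetical helper)."""

    def __init__(self, start=1):
        if start < 1:
            # Cookie 0 is skipped by the flow-statistics handler
            raise ValueError("cookies must start at 1 or above")
        self._next = itertools.count(start)

    def next_cookie(self):
        return next(self._next)


counter = CookieCounter()
print([counter.next_cookie() for _ in range(3)])
```

Inside add_flow, cookie=self.cookies.next_cookie() would then replace the two lines that read and increment self.count. The attribute self.cookies is also an assumption.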
Deleting a flow:
def remove_table_flows(self, dpid, flow_id):
    """Create an OFPFlowMod message that removes a stored flow from its table."""
    datapath = self.switches[int(dpid)]
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser
    # Rebuild the flow from the information stored in flow_list
    flow = self.flow_list[int(flow_id)]
    match = parser.OFPMatch(**flow['match'])
    out_port = flow['actions'][0]['OFPActionOutput']
    table_id = flow['table_id']
    if str(out_port) == "BROADCAST":
        actions = [parser.OFPActionOutput(ofproto.OFPP_FLOOD)]
    else:
        actions = [parser.OFPActionOutput(int(out_port))]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
    flow_mod = parser.OFPFlowMod(datapath, cookie=0, cookie_mask=0, table_id=table_id,
                                 command=ofproto.OFPFC_DELETE, idle_timeout=0,
                                 hard_timeout=0, priority=1,
                                 buffer_id=ofproto.OFP_NO_BUFFER,
                                 out_port=ofproto.OFPP_ANY, out_group=ofproto.OFPG_ANY,
                                 flags=0, match=match, instructions=inst)
    datapath.send_msg(flow_mod)
This method works like add_flow. First you get the datapath from the stored dictionary. Then you build the match and
instruction objects with the information stored in flow_list. Finally you construct an OFPFlowMod message, this
time with the OFPFC_DELETE command.
In this method we do not use the generated cookie, because a flow cannot be identified by its cookie alone: multiple
flows can share the same cookie. Instead we retrieve the flow information stored in self.flow_list (datapath, match,
instructions and table) to identify the flow.
Status Messages
This app uses two types of OpenFlow statistics requests: OFPPortStatsRequest and OFPFlowStatsRequest, with their
corresponding replies OFPPortStatsReply and OFPFlowStatsReply. A port statistics request asks a switch for
information about every one of its ports. The reply contains traffic counters such as the bytes sent and received on
each port. The flow statistics reply contains a list of all the flows installed in the switch.
In this app we will implement a traffic monitor. To do so, we will periodically send OFPPortStatsRequest messages to
every switch and store this information in a database. We will also track the flows installed in the switches by
periodically sending OFPFlowStatsRequest messages to every switch.
First of all, let's see how to create independent threads that run in parallel and keep sending the request messages
to the switches.
self.port_thread = hub.spawn(self._port_monitor)
self.flow_thread = hub.spawn(self._flow_monitor)
The ryu.lib.hub module provides spawn(), which starts an independent lightweight thread that executes a given
method. We create two different threads so that ports and flows can be polled at different rates
(self.port_refresh_rate and self.flow_refresh_rate).
from ryu.lib import hub

def _port_monitor(self):
    # Poll the port counters of every known switch, then wait
    while True:
        # Copy the values: switches may be added or removed while we iterate
        for dp in list(self.switches.values()):
            self._request_port_stats(dp)
        hub.sleep(self.port_refresh_rate)

def _flow_monitor(self):
    # Poll the flow tables of every known switch, then wait
    while True:
        for dp in list(self.switches.values()):
            self._request_flow_stats(dp)
        hub.sleep(self.flow_refresh_rate)

def _request_port_stats(self, datapath):
    ofproto = datapath.ofproto
    parser = datapath.ofproto_parser
    # OFPP_ANY: ask for the statistics of all ports
    req = parser.OFPPortStatsRequest(datapath, 0, ofproto.OFPP_ANY)
    datapath.send_msg(req)

def _request_flow_stats(self, datapath):
    parser = datapath.ofproto_parser
    req = parser.OFPFlowStatsRequest(datapath)
    datapath.send_msg(req)
_port_monitor and _flow_monitor run for as long as the app is running. For every switch in the switch dictionary,
_port_monitor calls _request_port_stats, which sends an OFPPortStatsRequest message. In the same way,
_flow_monitor calls _request_flow_stats, which sends an OFPFlowStatsRequest message. Each loop then waits for its
refresh interval before the next round.
To catch the reply messages we will use the set_ev_cls decorator.
Catching Flows:
import time

@set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
def _flow_stats_reply_handler(self, ev):
    body = ev.msg
    datapath = ev.msg.datapath.id
    data_json = body.to_jsondict()
    # Forget the flows stored for this switch: the reply describes its current state
    self.flow_list = {key: value for key, value in self.flow_list.items()
                      if value['dpid'] != datapath}
    self.flow_time[int(datapath)] = time.time()
    for flow in data_json['OFPFlowStatsReply']['body']:
        cookie = flow['OFPFlowStats']['cookie']
        # Cookie 0 marks flows that were not installed through add_flow
        if cookie != 0 and cookie not in self.flow_list:
            # Turn oxm_fields into a {field: value} dictionary
            match_list = flow['OFPFlowStats']['match']['OFPMatch']['oxm_fields']
            condition_list = {}
            for condition in match_list:
                condition_list[condition['OXMTlv']['field']] = condition['OXMTlv']['value']
            actionlist = flow['OFPFlowStats']['instructions'][0]['OFPInstructionActions']['actions']
            for action in actionlist:
                for actionname in action:
                    if actionname == 'OFPActionOutput':
                        # Keep only the output port
                        action[actionname] = action[actionname]['port']
                    else:
                        # Drop encoding details (not every action has all of them)
                        action[actionname].pop('max_len', None)
                        action[actionname].pop('len', None)
                        action[actionname].pop('type', None)
            table_id = flow['OFPFlowStats']['table_id']
            flow = {'match': condition_list, 'actions': actionlist,
                    'table_id': table_id, 'dpid': datapath}
            self.flow_list[cookie] = flow
To fully understand this method you have to know the structure of OFPFlowStatsReply. These messages contain a list of
flow entries (OFPFlowStats), which to_jsondict() converts into a JSON-like dictionary. The most important fields of a
flow are match, table_id and the actions inside its instructions. In the first part of the code, the message is
transformed into this dictionary. Then, for every flow listed in it, the relevant information is stored and everything
else is discarded.
From every match we keep only the oxm_fields, which contain the list of conditions that must be fulfilled. We do not
keep the whole oxm_fields structure either. oxm_fields is a list of objects that contain two values, 'field' and
'value'. Instead of this structure we build a new dictionary whose keys are the 'field' entries of oxm_fields and
whose values are the corresponding 'value' entries.
The same happens with every action. We remove parameters such as len and type, and for the most common action
(OFPActionOutput) we simplify the format to make it easier to read.
The flows in flow_list are indexed by the cookie explained in add_flow. When we receive an OFPFlowStatsReply, we
delete all the entries of the dictionary whose datapath id equals the datapath id in the message. This drops flows that
may have been removed from the switch. After that we put every incoming flow into flow_list. If a flow's cookie is 0
we do not store it. Such flows are generated by the Ryu app itself and are used, for example, to forward messages from
the switches to the controller.
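The same reduction can be written as a side-effect-free function that works on one entry of the reply's body. The sketch assumes an input shaped like the dictionary the handler indexes; real replies carry more fields, which are ignored here. The name simplify_flow is hypothetical. Unlike the handler, it builds new dictionaries instead of modifying the reply in place.

```python
def simplify_flow(flow_stats, dpid):
    """Reduce one entry of to_jsondict()['OFPFlowStatsReply']['body'] to the
    {'match', 'actions', 'table_id', 'dpid'} record kept in flow_list."""
    stats = flow_stats['OFPFlowStats']
    # oxm_fields: [{'OXMTlv': {'field': f, 'value': v, ...}}, ...] -> {f: v}
    match = {}
    for tlv in stats['match']['OFPMatch']['oxm_fields']:
        match[tlv['OXMTlv']['field']] = tlv['OXMTlv']['value']
    actions = []
    instructions = stats.get('instructions') or []
    if instructions:
        for action in instructions[0]['OFPInstructionActions']['actions']:
            simplified = {}
            for name, fields in action.items():
                if name == 'OFPActionOutput':
                    # Keep only the output port number
                    simplified[name] = fields['port']
                else:
                    # Keep the meaningful fields, drop the encoding details
                    simplified[name] = {k: v for k, v in fields.items()
                                        if k not in ('max_len', 'len', 'type')}
            actions.append(simplified)
    return {'match': match, 'actions': actions,
            'table_id': stats['table_id'], 'dpid': dpid}
```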
Catching Ports:
@set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
def _port_stats_reply_handler(self, ev):
    dp_id = str(ev.msg.datapath.id)
    # First reply from this switch: create an empty dictionary for its ports
    switch_reads = self.previous_read.setdefault(dp_id, {})
    for stat in ev.msg.body:
        port = str(stat.port_no)
        if port in switch_reads:
            old = switch_reads[port]
            # Store the traffic seen since the previous reply
            self.db.record_stats(ev.msg.datapath.id, stat.port_no,
                                 stat.rx_packets - old['rx_packets'],
                                 stat.rx_bytes - old['rx_bytes'],
                                 stat.rx_errors - old['rx_errors'],
                                 stat.tx_packets - old['tx_packets'],
                                 stat.tx_bytes - old['tx_bytes'],
                                 stat.tx_errors - old['tx_errors'])
        # The current counters become the previous read for the next reply
        switch_reads[port] = {'rx_packets': stat.rx_packets,
                              'rx_bytes': stat.rx_bytes,
                              'rx_errors': stat.rx_errors,
                              'tx_packets': stat.tx_packets,
                              'tx_bytes': stat.tx_bytes,
                              'tx_errors': stat.tx_errors}
OFPPortStatsReply contains every port's traffic counters (packets, bytes and errors received and transmitted). These
are cumulative values, which makes them unsuitable for a performance analysis as they are. To get the traffic per
interval, we take the values received and subtract the previously received values from them. This way, if you send an
OFPPortStatsRequest message every second, you obtain the throughput per second in bytes, packets and errors.
To store the previous read we create a dictionary indexed by the datapath id whose values are also dictionaries,
indexed by the port number, whose values are objects that contain the counter values.
Example:
{
    '1': {
        '1': {
            "rx_packets": X,
            "rx_bytes": Y,
            ...
        },
        '2': {
            "rx_packets": 2X,
            "rx_bytes": 2Y,
            ...
        }
    }
}
In the previous example there is a switch with datapath.id = 1 that has two ports, '1' and '2', together with the
values last read from those ports.
Once we have computed the traffic since the previous read, we store it in a database using an auxiliary class
(self.db), which is explained later.
If this is the first OFPPortStatsReply received from a switch, we create an empty dictionary for it.
Finally, we update previous_read with the current values, which become the previous read for the next reply.
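If the polling interval is not exactly one second, dividing the difference by the interval still gives per-second rates. The sketch below assumes two readings shaped like the dictionaries stored in previous_read. port_rates is a hypothetical helper. Treating a decrease as a counter reset is an extra assumption that the handler above does not make.

```python
COUNTERS = ('rx_packets', 'rx_bytes', 'rx_errors',
            'tx_packets', 'tx_bytes', 'tx_errors')


def port_rates(previous, current, interval_s):
    """Per-second rates from two cumulative port-counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for name in COUNTERS:
        delta = current[name] - previous[name]
        if delta < 0:
            # Assumption: the counter was reset (e.g. switch reboot),
            # so the current value is the traffic since the reset
            delta = current[name]
        rates[name] = delta / float(interval_s)
    return rates


previous = {'rx_packets': 10, 'rx_bytes': 1000, 'rx_errors': 0,
            'tx_packets': 5, 'tx_bytes': 500, 'tx_errors': 0}
current = {'rx_packets': 60, 'rx_bytes': 6000, 'rx_errors': 0,
           'tx_packets': 55, 'tx_bytes': 3000, 'tx_errors': 0}
print(port_rates(previous, current, 5))
```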
Controller
In the controller we will create a method for every resource and we will decorate it with the @route decorator.
Topology
import json
from ryu.topology.api import get_all_switch, get_all_link, get_link

@route('sw_list', '/topology/switches/', methods=['GET'])
def get_sw_list(self, req, **kwargs):
    switch_list = get_all_switch(self.myapp)
    body = json.dumps([switch.to_dict() for switch in switch_list])
    return Response(body=body)

@route('all_links', '/topology/links/', methods=['GET'])
def get_all_links(self, req, **kwargs):
    links = get_all_link(self.myapp)
    body = json.dumps([link.to_dict() for link in links])
    return Response(body=body)

@route('sw_link_list', '/topology/links/{dpid}', methods=['GET'],
       requirements={'dpid': r'\d+'})
def get_link_list(self, req, **kwargs):
    # Unknown switch: 404 Not Found
    if int(kwargs['dpid']) not in self.myapp.switches:
        return Response(body=None, status_code=404)
    links = get_link(self.myapp, int(kwargs['dpid']))
    body = json.dumps([link.to_dict() for link in links])
    return Response(body=body)
As we said before, we will not send the app's dictionary of switches as a response. Instead we use the
ryu.topology.api module, from which we use three functions:
1. get_all_switch(app), which returns a list of switches with relevant information about them, such as their list
of ports, dpid, etc.
2. get_all_link(app), which returns a list of links with information about which switches they connect.
3. get_link(app, dpid), which returns all the links of a given switch.
Flows
@route('flow_table', '/flows/', methods=['GET'])
def get_flow_list(self, req, **kwargs):
    body = self.myapp.serialize_flow_list()
    return Response(body=body)

@route('sw_flow_table', '/flows/{dpid}/', methods=['GET'], requirements={'dpid': r'\d+'})
def get_sw_flow_list(self, req, **kwargs):
    if int(kwargs['dpid']) not in self.myapp.switches:
        return Response(body=None, status_code=404)
    body = self.myapp.serialize_flow_list(kwargs['dpid'])
    return Response(body=body)

@route('flow', '/flows/{dpid}/{flow}/', methods=['GET'],
       requirements={'dpid': r'\d+', 'flow': r'\d+'})
def get_single_flow(self, req, **kwargs):
    if (int(kwargs['dpid']) not in self.myapp.switches
            or int(kwargs['flow']) not in self.myapp.flow_list):
        return Response(body=None, status_code=404)
    body = self.myapp.serialize_flow_list(dpid=kwargs['dpid'], flow=int(kwargs['flow']))
    return Response(body=body)

@route('add_flow', '/flows/{dpid}/', methods=['POST'], requirements={'dpid': r'\d+'})
def put_flow_into_list(self, req, **kwargs):
    if int(kwargs['dpid']) not in self.myapp.switches:
        return Response(body=None, status_code=404)
    try:
        # Parse the body as JSON; eval() would execute arbitrary client input
        data = json.loads(req.body)
    except ValueError:
        return Response(status_code=400)
    for flow in data:
        result = self.myapp.add_flow(kwargs['dpid'], int(flow['priority']),
                                     flow['conditions'], flow['out_port'])
        if result == False:
            return Response(body=None, status_code=404)
    return Response(status_code=200)

@route('delete_flow', '/flows/{dpid}/{flow}', methods=['DELETE'],
       requirements={'dpid': r'\d+', 'flow': r'\d+'})
def delete_flow(self, req, **kwargs):
    if (int(kwargs['dpid']) not in self.myapp.switches
            or int(kwargs['flow']) not in self.myapp.flow_list):
        return Response(body=None, status_code=404)
    self.myapp.remove_table_flows(kwargs['dpid'], kwargs['flow'])
    return Response(status_code=200)
As you can see, we use two variable parts of the URLs to identify the resources: 'dpid' and 'flow'. The first
identifies the datapath id of a switch and the second the flow identifier (the cookie).
Five methods are defined to work with flows: get_flow_list(), get_sw_flow_list(), get_single_flow(),
put_flow_into_list() and delete_flow().
The get_flow_list method serializes the information contained in flow_list.
get_sw_flow_list serializes only the flows installed on one switch.
get_single_flow returns a single flow from flow_list.
The put_flow_into_list method uses the application's add_flow method to send an OFPFlowMod message to a switch
with the parameters specified in the request body.
Finally, the delete_flow method uses the application's remove_table_flows method.
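From a client's point of view, put_flow_into_list expects a body containing a list of objects, each with priority, conditions and out_port (a port number or "BROADCAST"). The Python 3 sketch below only builds the requests and does not send them. The address 127.0.0.1:8080 is an assumption (8080 is ryu-manager's usual default WSGI port), and the helper names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8080"  # assumption: local ryu-manager, default WSGI port


def build_add_flow_request(dpid, flows):
    """POST /flows/{dpid}/ with a JSON list of {priority, conditions, out_port}."""
    body = json.dumps(flows).encode("utf-8")
    return Request("%s/flows/%d/" % (BASE_URL, dpid), data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def build_delete_flow_request(dpid, cookie):
    """DELETE /flows/{dpid}/{flow}, where flow is the cookie used as identifier."""
    return Request("%s/flows/%d/%d" % (BASE_URL, dpid, cookie), method="DELETE")


add_req = build_add_flow_request(1, [{"priority": 10,
                                      "conditions": {"in_port": 1},
                                      "out_port": 2}])
print(add_req.get_method(), add_req.full_url)
# urllib.request.urlopen(add_req) would send it to a running controller
```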
Statistics
@route('get_statistics', '/statistics/{dpid}/{port}', methods=['GET'],
       requirements={'dpid': r'\d+', 'port': r'\d+'})
def get_statistics(self, req, **kwargs):
    if int(kwargs['dpid']) not in self.myapp.switches:
        return Response(body=None, status_code=404)
    # Last 10 rows recorded for this switch port
    stats = self.myapp.db.get_statistics(kwargs['dpid'], kwargs['port'], 10)
    if not stats:
        return Response(body=None, status_code=404)
    return Response(body=stats)
This method queries the database and retrieves the last ten results.
Database Connection
The app uses a database to store and retrieve port statistics. The connection is made through the MySQLdb package to a
MySQL database.
Sending statistics:
import MySQLdb

def record_flow_entries(self, dpid, match, instructions, table_id):
    try:
        # Let MySQLdb escape the values instead of formatting them into the query
        self.c.execute("INSERT INTO flow_stats (dpid, flow_match, instructions, table_id) "
                       "VALUES (%s, %s, %s, %s)",
                       (dpid, str(match), str(instructions), table_id))
        self.db.commit()
    except MySQLdb.Error as e:
        print("Could not record flow entry: %s" % e)

def record_stats(self, dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors):
    # Table names cannot be query parameters; int() guarantees they are numeric
    table = "switch_%d_port_%d" % (int(dpid), int(port))
    insert = ("INSERT INTO " + table +
              " (rx_pkts, rx_bytes, rx_error, tx_pkts, tx_bytes, tx_error)"
              " VALUES (%s, %s, %s, %s, %s, %s)")
    values = (rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors)
    try:
        self.c.execute(insert, values)
    except MySQLdb.ProgrammingError:
        # The table does not exist yet (e.g. a new switch): create it from base_table
        self.c.execute("CREATE TABLE " + table + " LIKE base_table")
        self.c.execute(insert, values)
    self.db.commit()
record_stats inserts a new row into a table whose name follows the format
switch_{datapath_id}_port_{port_number}.
If the table does not exist (for example, because a switch has just been added), it is created.
The tables have the following structure:
+----------+------------+------+-----+-------------------+-----------------------------+
| Field    | Type       | Null | Key | Default           | Extra                       |
+----------+------------+------+-----+-------------------+-----------------------------+
| rx_pkts  | bigint(20) | YES  |     | NULL              |                             |
| rx_bytes | bigint(20) | YES  |     | NULL              |                             |
| rx_error | bigint(20) | YES  |     | NULL              |                             |
| tx_pkts  | bigint(20) | YES  |     | NULL              |                             |
| tx_bytes | bigint(20) | YES  |     | NULL              |                             |
| tx_error | bigint(20) | YES  |     | NULL              |                             |
| time     | timestamp  | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+----------+------------+------+-----+-------------------+-----------------------------+
To create the base table you can use the following MySQL statement:
CREATE TABLE `base_table` (
  `rx_pkts` bigint(20) DEFAULT NULL,
  `rx_bytes` bigint(20) DEFAULT NULL,
  `rx_error` bigint(20) DEFAULT NULL,
  `tx_pkts` bigint(20) DEFAULT NULL,
  `tx_bytes` bigint(20) DEFAULT NULL,
  `tx_error` bigint(20) DEFAULT NULL,
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
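You can try the create-on-first-insert pattern of record_stats without a MySQL server. The sketch below swaps in Python's built-in sqlite3 module. SQLite uses ? placeholders and has no CREATE TABLE ... LIKE, so this version issues CREATE TABLE IF NOT EXISTS with the columns of base_table instead of catching an error. The function signature is hypothetical.

```python
import sqlite3

COLUMNS = ("rx_pkts", "rx_bytes", "rx_error", "tx_pkts", "tx_bytes", "tx_error")


def record_stats(conn, dpid, port, values):
    """Insert one row into switch_<dpid>_port_<port>, creating the table on first use."""
    # Identifiers cannot be bound as parameters, so force them to integers
    table = "switch_%d_port_%d" % (int(dpid), int(port))
    conn.execute("CREATE TABLE IF NOT EXISTS %s (%s, time TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
                 % (table, ", ".join("%s INTEGER" % c for c in COLUMNS)))
    # The counter values are passed as parameters, so they are escaped by sqlite3
    conn.execute("INSERT INTO %s (%s) VALUES (%s)"
                 % (table, ", ".join(COLUMNS), ", ".join("?" * len(COLUMNS))),
                 tuple(values))
    conn.commit()
    return table


conn = sqlite3.connect(":memory:")
record_stats(conn, 1, 2, (10, 1000, 0, 5, 500, 0))
```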
8.4.2 Django Implementation
We will use an intermediary server in order to reduce the server load. This intermediary server is composed of
different modules that implement cache, content-negotiation and authentication functions. The intermediary receives a
request, processes it, decides what should be done with it and, if required, connects to the server to retrieve the
necessary information. It is the entry point of the system, and the user does not notice its existence.
Figure 8.2: Monitoring system topology procedure.
Authentication
The first component of the intermediary system is an authentication subsystem. It uses HTTP headers to receive
credentials from the clients, with the Basic authentication scheme. The resources are separated into four realms:
1. A realm to access the API bookmarks.
2. A realm to access the statistics.
3. A realm to access the installed flows.
4. A realm to access the network topology.
The first one does not require any credentials, but the others do.
To implement this subsystem we will build a simple piece of middleware that decodes the Authorization
header and decides if a request can get the resource it demands.
Authentication Algorithm:
Figure 8.3: Authentication algorithm procedure. (Flowchart: if the view does not belong to a realm, the request follows the middleware chain. Otherwise, a request without credentials gets a WWW-Authenticate challenge, a request with valid credentials follows the middleware chain, and a request with invalid credentials gets 401: Unauthorized.)
For the credentials to be valid, they must use the Basic scheme, the result of decoding them must follow a
'user:pwd' pattern, and they must be authorized credentials for the realm to which the view belongs.
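The three validity conditions can be written as a small function that does not depend on any framework. This is a Python 3 sketch, not the middleware shown below. check_basic_auth is a hypothetical name, and it returns False for malformed headers instead of letting an exception escape. credentialslist has the same shape as the dictionary described below: realm, then user, then password.

```python
import base64
import binascii


def check_basic_auth(header, realm, credentialslist):
    """Return True if an Authorization header carries valid Basic credentials for realm."""
    if not header:
        return False
    parts = header.split(" ", 1)
    # Condition 1: the Basic scheme
    if len(parts) != 2 or parts[0].lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(parts[1].strip()).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    # Condition 2: a 'user:pwd' pattern
    if ":" not in decoded:
        return False
    username, password = decoded.split(":", 1)
    # Condition 3: credentials authorized for this realm
    return credentialslist.get(realm, {}).get(username) == password
```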
Code:
 1  from serverconnector.authconfig import authconfig, credentialslist
 2  from django.http import HttpResponse
 3  class HttpAuthMiddleware(object):
 4      def process_view(self, request, view_func, view_args, view_kwargs):
 5          realm = authconfig[view_func.__name__]  # None: the view needs no credentials
 6          if realm is None:
 7              return None
 8          else:
 9              if 'HTTP_AUTHORIZATION' in request.META:
10                  [auth, credentials] = request.META['HTTP_AUTHORIZATION'].split(' ', 1)
11                  if auth.lower() == 'basic':
12                      auth = credentials.strip().decode('base64')  # Python 2 idiom
13                      username, password = auth.split(':', 1)
14                      if username in credentialslist[realm]:
15                          if credentialslist[realm][username] == password:
16                              request.META['HTTP_AUTHORIZATION'] = True  # cache workaround
17                              return None
18              response = HttpResponse(status=401)
19              response['WWW-Authenticate'] = "Basic realm=\"%s\"" % (realm)
20              return response
authconfig and credentialslist are two dictionaries. The first is indexed by view name and contains the realm to
which each view belongs. The second is indexed by realm name, and its values are dictionaries of user credentials:
the key is the user name and the value is the password. Both variables are hard-coded in an external file.
authconfig = {
    'Bookmark': None,
    'TopologyBookmark': None,
    'SwitchListView': 'Topo',
    'LinkListView': 'Topo',
    'FullFlowView': 'Flow',
    'SwFlowView': 'Flow',
    'SingleFlowView': 'Flow',
    'StatsView': 'Stats',
}

credentialslist = {
    'Flow': {'root': 'root', 'flowuser': 'flowpwd'},
    'Topo': {'root': 'root', 'topouser': 'topopwd'},
    'Stats': {'root': 'root', 'statsuser': 'statspwd'},
}
Reasons not to use Django's default authentication module:
• Although Django has an authentication application with a user model, it works with sessions and cookies and
therefore contradicts one of the REST constraints (statelessness). Making it stateless would require a workaround.
• In this application it is not necessary to manage users, only to keep a list of valid credentials.
• Built like this, the authentication subsystem is a pluggable mechanism that does not require any modification of
the views. With Django's authentication model, we would need to define the permission requirements in each view.
Since we use realms and each view can belong to a different realm, we use the process_view method: the middleware
must run after the view has been selected.
Finally, as you can see in line 16, if the credentials are valid and the view is going to be processed, the request's
Authorization header is overwritten and its value is set to True (a Boolean). This workaround is needed for the cache
middleware, explained later, to work correctly.
Content-Negotiation
Once the user is authenticated, the server decides, in a server-driven negotiation style, whether it can serve the
request in the format the client requires. The server keeps a list of acceptable formats for each view.
The server reads the Accept header, with all the allowed types and their quality factors, and looks for the best one
(see section 2.9.10) that the chosen view accepts. All the views are generated in a standard format. After the view
has been generated, the server decides whether it has to transform the content's format.
Content-Negotiation Algorithm:
Figure 8.4: Content Negotiation algorithm procedure. (Flowchart: if the request has no Accept header, the content type is set to the view's default format. Otherwise the header is parsed and the best matching format is looked up. A parsing error, or no matching format, leads to 406: Not Acceptable. Otherwise the content type is set to the best matching format.)
When a response with status code 406 is sent, its body contains the list of available formats. Since HTTP does not
define a standard format for this list, it is sent as JSON.
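The negotiation rules above can be condensed into one function that does not depend on any framework. This is a simplified sketch rather than the middleware shown next. negotiate is a hypothetical name, and only the q parameter is interpreted. types and default have the shape of the per-view dictionaries described later in this section. A return value of None stands for 406 Not Acceptable.

```python
def negotiate(accept_header, types, default):
    """Return the full media type to serve, or None (406 Not Acceptable)."""
    if not accept_header:
        # No Accept header: serve the view's default format
        return default['*/*']
    best, best_q, best_is_wildcard = None, 0.0, True
    for item in accept_header.replace(" ", "").split(","):
        media, _, params = item.partition(";")
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                return None  # unparsable quality factor
        if not 0.0 < q <= 1.0 or "/" not in media:
            continue
        general, specific = media.split("/", 1)
        if general == "*" and specific == "*":
            candidate = default["*/*"]
        elif general in types and specific == "*":
            candidate = "%s/%s" % (general, default[general])
        elif general in types and specific in types[general]:
            candidate = media
        else:
            continue  # type not offered by this view
        wildcard = specific == "*"
        # Higher q wins; on a tie an explicit type beats a wildcard
        if q > best_q or (q == best_q and best_is_wildcard and not wildcard):
            best, best_q, best_is_wildcard = candidate, q, wildcard
    return best


types = {'application': ['json', 'yaml'], 'text': ['xml', 'html']}
default = {'*/*': 'application/json', 'application': 'yaml', 'text': 'html'}
print(negotiate("application/json;q=0.3, text/xml;q=0.5, text/plain;q=0.8", types, default))
```

In the example, text/plain has the highest quality factor. The view does not offer it, so text/xml (q=0.5) is chosen.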
Code:
 1  class ContentNegotiationMiddleware(object):
 2      # Needs: import json, yaml; from django.http import HttpResponse; get_class (below)
 3      def process_view(self, request, view_func, view_args, view_kwargs):
 4          if 'HTTP_ACCEPT' in request.META:
 5              aux = request.META['HTTP_ACCEPT'].replace(' ', '')
 6              allowed_types = aux.split(',')
 7              best_fit_type = {'general': '*', 'specific': '*', 'q': 0, 'default': True}
 8              func_class = get_class(view_func.__module__, view_func.__name__)
 9              availiable_types = func_class.types
10              default = func_class.default
11              try:
12                  for elem in allowed_types:
13                      chunks = elem.split(';', 1)
14                      if len(chunks) == 1:
15                          q = 1
16                      elif len(chunks) == 2:
17                          q = chunks[1].replace("q=", "")
18                      else:
19                          return HttpResponse(json.dumps(availiable_types), status=406)
20                      if float(q) > 0.0 and float(q) <= 1.0:
21                          [general, specific] = chunks[0].split('/', 1)
22                          if general in availiable_types:
23                              if specific in availiable_types[general]:
24                                  if float(q) > float(best_fit_type['q']):
25                                      best_fit_type = {'general': general, 'specific': specific, 'q': q, 'default': False}
26                                  elif float(q) == float(best_fit_type['q']) and best_fit_type['specific'] == '*':
27                                      best_fit_type = {'general': general, 'specific': specific, 'q': q, 'default': False}
28                              elif specific == '*':
29                                  if float(q) > float(best_fit_type['q']):
30                                      best_fit_type = {'general': general, 'specific': specific, 'q': q, 'default': False}
31                          elif general == '*' and specific == '*' and float(q) > float(best_fit_type['q']):
32                              best_fit_type = {'general': general, 'specific': specific, 'q': q, 'default': False}
33                  if best_fit_type['default'] == True:
34                      return HttpResponse(json.dumps(availiable_types), status=406)
35                  else:
36                      if best_fit_type['general'] == '*':
37                          request.META['HTTP_ACCEPT'] = "%s" % default['*/*']
38                      elif best_fit_type['specific'] == '*':
39                          request.META['HTTP_ACCEPT'] = "%s/%s" % (best_fit_type['general'], default[best_fit_type['general']])
40                      else:
41                          request.META['HTTP_ACCEPT'] = "%s/%s" % (best_fit_type['general'], best_fit_type['specific'])
42                      return None
43              except Exception:
44                  return HttpResponse(json.dumps(availiable_types), status=406)
45          else:
46              request.META['HTTP_ACCEPT'] = "%s" % get_class(view_func.__module__, view_func.__name__).default['*/*']
47              return None
48
49      def process_response(self, request, response):
50          last = request.META.get('HTTP_ACCEPT', '')
51          last = last[-4:]
52          if response.status_code == 200 and response.content and last == 'yaml':
53              aux = yaml.safe_load(response.content.decode('utf-8'))
54              if isinstance(aux, list):
55                  response.content = yaml.dump_all(aux)
56              else:
57                  response.content = yaml.dump(aux)
58          response['Content-Type'] = request.META.get('HTTP_ACCEPT', response['Content-Type'])
59          return response
The first method (process_view) is responsible for the content negotiation.
availiable_types and default are dictionaries, defined in every view, that list all the formats that can be served
and the defaults used when the Accept header contains '*/*' or 'something/*'. They are defined inside the view's class
with the names 'types' and 'default'.
availiable_types is indexed by the general type (for example text, application, audio, etc.), and its values are lists
of strings defining specific types (for example, for text they can be html, plain, xml, etc.).
default also has general types as keys, but its values are strings that define a single specific type: the default
used when a 'general/*' Accept header is received. It also contains a key '*/*' whose value is the default full type
for '*/*' requests.
Example:
1
2
types = {
'application':[
'json',
'yaml',
3
4
5
6
7
8
9
10
11
12
13
14
15
'text':
]
[
'xml',
'html',
]
}
default = {
'*/*':'application/json',
'application':'yaml',
'text':'html',
}
This view accepts ’application/json’, ’application/yaml’, ’text/xml’ and ’text/html’. If the best qualified content type is one of them, it is served. If the best qualified content type is ’application/*’, the server sends the content in ’application/yaml’ format. If it is ’text/*’, the server renders the response as ’text/html’. Finally, if the best qualified format is ’*/*’, the server sends the response in ’application/json’.
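The selection rule described above can be sketched in plain Python, independent of Django. This is a simplified model for illustration, not the middleware's actual code. The function name negotiate is hypothetical. The sketch assumes that default maps each general type to a complete media type, as SwFlowView does, and it returns None where the middleware answers with a 406 status code.

```python
def negotiate(accept, types, default):
    """Pick the media type to serve, or return None (the 406 case).

    types   maps a general type to the specific types the view serves.
    default maps '*/*' and each general type to a complete media type.
    """
    if not accept:
        return default['*/*']            # no Accept header: serve the default
    best = None                          # (q, specificity, media type)
    for option in accept.split(','):
        chunks = [chunk.strip() for chunk in option.split(';')]
        media = chunks[0]
        q = 1.0                          # q is 1 when the parameter is absent
        for param in chunks[1:]:
            if param.startswith('q='):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if not (0.0 < q <= 1.0) or '/' not in media:
            continue                     # q=0 means "not acceptable"
        general, specific = media.split('/', 1)
        if general == '*' and specific == '*':
            candidate, specificity = default['*/*'], 0
        elif general in types and specific == '*':
            candidate, specificity = default[general], 1
        elif general in types and specific in types[general]:
            candidate, specificity = media, 2
        else:
            continue                     # a type this view cannot serve
        # Higher q wins; on a tie, a concrete type beats a wildcard
        if best is None or (q, specificity) > best[:2]:
            best = (q, specificity, candidate)
    return None if best is None else best[2]


types = {'application': ['json', 'yaml'], 'text': ['xml', 'html']}
default = {'*/*': 'application/json',
           'application': 'application/yaml',
           'text': 'text/html'}
print(negotiate('application/json;q=0.3, text/xml;q=0.5, text/plain;q=0.8',
                types, default))  # text/xml
```

Returning a complete media type even for wildcard requests is what lets a single value replace the whole Accept header once negotiation is over.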
As you can see in lines 8-10 of process_view, types and default are read from the view’s class with an auxiliary function called get_class:
import importlib


def get_class(module_name, cls_name):
    # Import the module that contains the view
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        raise ImportError('Invalid class path: {}'.format(module_name))
    # Look up the view class inside that module
    try:
        cls = getattr(module, cls_name)
    except AttributeError:
        raise ImportError('Invalid class name: {}'.format(cls_name))
    else:
        return cls
When the best fitting format is found, its q factor and the other Accept values are removed. This makes it easier to select the right format when the view is processed.
For example (using the allowed and default content types from the last example):
Accept: application/json;q=0.3, text/xml;q=0.5, text/plain;q=0.8
would be transformed into:
Accept: text/xml
text/plain has the highest q factor, but this view cannot serve it, so text/xml (q=0.5) is chosen over application/json (q=0.3).
Once the content type has been fixed and the view has been executed, the second method (process_response) is called. It knows the format that the view produces, so it transforms the content, when necessary, into the type the client requested.
In this application the accepted types can only be JSON or YAML, and the view always produces JSON. That is why the code above only decides between YAML and JSON, based on the last four characters of the Accept header as rewritten by process_view. If the last four characters are ’yaml’, the message is transformed; if they are ’json’, it is left untouched.
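The decision taken in process_response can be isolated in a tiny function. This is a sketch for illustration; the name output_format is hypothetical. It uses str.endswith, which expresses the same check as comparing the last four characters.

```python
def output_format(negotiated_type):
    """Return the serialization that process_response should apply."""
    # Same test as negotiated_type[-4:] == 'yaml' in process_response
    if negotiated_type.endswith('yaml'):
        return 'yaml'    # the view's JSON must be re-serialized as YAML
    return 'json'        # the view already produced JSON: leave it as is


for media_type in ('application/vnd+SDN.swflowlist+json',
                   'application/vnd+SDN.swflowlist+yaml'):
    print(media_type, '->', output_format(media_type))
```

The decision depends only on the suffix. A view can therefore declare any new media type ending in ’yaml’ without changes to the middleware.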
Cache
The cache is implemented with Django’s cache middleware, as detailed in section 6.10, using memcached and pylibmc.
In this case, however, we also use one more decorator for the views:
@vary_on_headers('header1', 'header2', ...)
This decorator indicates that a cached result is used only if the incoming request contains exactly the same values in the specified headers. The authentication and content-negotiation middlewares respond to invalid requests with error status codes. However, the cache middleware does not take those status codes into account: if the resource is cached, the cached copy is sent.
For example, suppose a request with valid authentication credentials arrives and its response is cached. Then a request with invalid credentials arrives, and the authentication middleware produces a 401 status code. The cache middleware has the resource stored, so it ignores that status code and returns the stored content with a 200 status code.
The same happens with content negotiation, meaning that a user could receive data in a format that they did not request.
The decorator solves these problems but causes another one. This application does not generate different content for different users, so every request with valid credentials could be answered with the same cached values. With the decorator, however, the cache keeps a separate copy for each user, even though the stored content is identical. That is why, when a user has valid credentials to access a view, the authentication middleware replaces the value of the authorization header with ’True’ (Boolean). As a result, all authorized requests share the same cache entry.
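A simplified model shows why varying on headers multiplies cache entries and how normalizing the credential header collapses them. This is not Django's actual key-generation scheme. The cache_key function, the request_headers helper and the credential strings are hypothetical. The header names are the ones used in this project's URL configuration.

```python
import hashlib


def cache_key(path, headers, vary_on):
    """Toy cache key: the path plus the value of every varying header."""
    parts = [path] + ['%s=%s' % (name, headers.get(name, '')) for name in vary_on]
    return hashlib.sha256('|'.join(parts).encode('utf-8')).hexdigest()


VARY = ('Accept', 'WWW-Authorization')


def request_headers(credentials):
    # Hypothetical headers of a user asking for a resource in JSON
    return {'Accept': 'application/json', 'WWW-Authorization': credentials}


user_a = request_headers('Basic <credentials of user A>')
user_b = request_headers('Basic <credentials of user B>')
# Raw credentials: one cache entry per user, although the content is the same
print(cache_key('/flows/1/', user_a, VARY) == cache_key('/flows/1/', user_b, VARY))  # False

# After the authentication middleware replaces valid credentials with 'True'
user_a['WWW-Authorization'] = user_b['WWW-Authorization'] = 'True'
print(cache_key('/flows/1/', user_a, VARY) == cache_key('/flows/1/', user_b, VARY))  # True
```

The Accept header still separates the entries, so a JSON response is never served to a client that asked for YAML.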
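The maximum time a resource stays in the cache differs between resources. There are four validity times, from slowly changing resources (5 minutes) to very rapidly changing ones (1 second):
• 5 minutes for the bookmarks.
• 1 minute for the switch and link lists.
• 30 seconds for the flows.
• 1 second for the statistics.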
Code:
from django.conf.urls import patterns, url
from django.views.decorators.cache import cache_page
from django.views.decorators.vary import vary_on_headers

# Bookmark, TopologyBookmark, SwitchListView, ... are the project's views
urlpatterns = patterns('',
    # Bookmarks: 5 minutes
    url(r'^$', cache_page(60*5)(vary_on_headers('Accept', 'WWW-Authorization')(
        Bookmark.as_view())), name='bookmark'),
    url(r'^topology/$', cache_page(60*5)(vary_on_headers('Accept', 'WWW-Authorization')(
        TopologyBookmark.as_view())), name='topo-book'),
    # Switches and links: 1 minute
    url(r'^topology/switches/$', cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(
        SwitchListView.as_view())), name='switch-list'),
    url(r'^topology/links/$', cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(
        LinkListView.as_view())), name='ful-link-list'),
    url(r'^topology/links/(?P<pk>\d+)/$', cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(
        LinkListView.as_view())), name='link-list'),
    # Flows: 30 seconds
    url(r'^flows/$', cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(
        FullFlowView.as_view())), name='full-flow-list'),
    url(r'^flows/(?P<dpid>\d+)/$', cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(
        SwFlowView.as_view())), name='sw-flow-list'),
    url(r'^flows/(?P<dpid>\d+)/(?P<flow>\d+)/$', cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(
        SingleFlowView.as_view())), name='flow-view'),
    # Port statistics: 1 second
    url(r'^statistics/(?P<dpid>\d+)/(?P<port>\d+)/', cache_page(1)(vary_on_headers('Accept', 'WWW-Authorization')(
        StatsView.as_view())), name='statistics'),
)
Views
In this project, the main function of the views is to establish a new connection to the Ryu server. Each view relays the user’s request to the server and the server’s response back to the user.
We will write a view for each resource. In the URL configuration we use the same locators as in the Ryu server.
The view takes the locator received in the request (without the scheme or the domain part) and creates a new URI pointing to the resource in the Ryu server. When the Ryu server returns a valid response, the view adds the links that drive the application flow to its body and returns it to the client. If the Ryu server responds with any status code other than 200, the view generates a response containing only that status code.
To make the new requests we use the ’requests’ library, available on pip. It creates the HTTP connections that relay the received requests to the Ryu server.
As explained before, the view also contains the content-negotiation information for the resource. The content types are custom content types that distinguish the kind of information each resource contains.
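The relaying logic of a view can be modelled without Django or a network. This is a minimal sketch under stated assumptions. The fetch parameter is a hypothetical stand-in for requests.get that returns a status code and a body. BASEDIR is assumed to be http://10.0.1.2:8080, the Ryu address that appears in the captures. The standard json module replaces PyYAML so the sketch has no third-party dependencies. The function returns a (status, body) pair instead of an HttpResponse.

```python
import json

BASEDIR = 'http://10.0.1.2:8080'   # assumed Ryu REST address


def forward_get(path, fetch, decorate):
    """Relay a GET to the Ryu server and return (status, body) for the client.

    fetch(uri) stands in for requests.get and returns (status_code, text);
    decorate(data) adds hypermedia links to the parsed data in place.
    """
    uri = BASEDIR + path               # same locator, different host
    try:
        status, text = fetch(uri)
        if status != 200:
            return status, ''          # relay only the status code
        data = json.loads(text)
        decorate(data)
        return 200, json.dumps(data)
    except Exception:
        return 500, ''                 # connection or parsing failure


def fake_fetch(uri):
    # Hypothetical upstream answer, used instead of a real Ryu server
    return 200, '{"7": {"dpid": 1}}'


def add_link(data):
    # Hypothetical decoration step: link flow 7 to its detail resource
    data['7']['detailflowlink'] = '/flows/1/7/'


print(forward_get('/flows/1/', fake_fetch, add_link))
```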
Code:
import json

import requests
import yaml
from django.core.urlresolvers import reverse
from django.http import HttpResponse
from django.views.generic import View

# basedir (the base URL of the Ryu REST server) is defined elsewhere in this module.


class SwFlowView(View):
    http_method_names = ['get', 'post', 'options']

    # Content types served by this resource, read by the content-negotiation middleware
    types = {
        'application': [
            'vnd+SDN.swflowlist+json',
            'vnd+SDN.swflowlist+yaml',
        ]
    }
    default = {
        '*/*': 'application/vnd+SDN.swflowlist+json',
        'application': 'application/vnd+SDN.swflowlist+json',
    }

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                # Parse with the YAML parser (JSON is valid YAML) instead of eval()
                flow_list = yaml.safe_load(r.text)
                for flow in flow_list:
                    if flow != 'age':
                        # Link every flow to its detailed resource
                        flow_list[flow]['detailflowlink'] = reverse(
                            'flow-view',
                            kwargs={'dpid': flow_list[flow]['dpid'], 'flow': flow})
                return HttpResponse(json.dumps(flow_list))
            else:
                # Relay only the status code of the Ryu server
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)

    # OpenFlow match fields accepted in the 'conditions' dictionary
    valid_keys = ['in_port', 'in_phy_port', 'metadata', 'eth_dst', 'eth_src', 'eth_type',
                  'vlan_vid', 'ip_dscp', 'ip_ecn', 'ip_proto',
                  'ipv4_src', 'ipv4_dst', 'tcp_src', 'tcp_dst', 'udp_src', 'udp_dst',
                  'sctp_src', 'sctp_dst', 'icmpv4_type', 'icmpv4_code',
                  'arp_op', 'arp_spa', 'arp_tpa', 'arp_sha', 'arp_tha', 'ipv6_src', 'ipv6_dst',
                  'ipv6_flabel', 'icmpv6_type', 'icmpv6_code',
                  'ipv6_nd_target', 'ipv6_nd_sll', 'ipv6_nd_tll', 'mpls_label', 'mpls_tc',
                  'mpls_bos', 'pbb_isid', 'tunnel_id', 'ipv6_exthdr']

    def isflow(self, flow):
        # A flow has exactly the keys d_id, priority, conditions and out_port
        if not isinstance(flow, dict):
            return False
        keys = flow.keys()
        if len(keys) == 4:
            if ('d_id' in keys and 'priority' in keys and 'conditions' in keys
                    and 'out_port' in keys and isinstance(flow['conditions'], dict)):
                for key in flow['conditions']:
                    if key not in self.valid_keys:
                        return False
                return True
        return False

    def post(self, request, *args, **kwargs):
        if request.META.get('CONTENT_TYPE') == 'application/vnd+SDN.flowlist+json':
            try:
                data = request.body.decode('utf-8')
                flow_list = yaml.safe_load(data)
            except Exception:
                return HttpResponse(status=500)
            if not isinstance(flow_list, list):
                return HttpResponse(status=400)
            for flow in flow_list:
                if not self.isflow(flow):
                    return HttpResponse(status=400)
            try:
                r = requests.post(basedir + request.path, data=data)
                return HttpResponse(status=r.status_code)
            except Exception:
                # The Ryu server could not be reached
                return HttpResponse(status=500)
        return HttpResponse(status=400)
This view corresponds to the switch’s flow list resource. It supports two methods: GET and POST.
The GET method is very simple. It retrieves a list of flows from the Ryu server in JSON format and parses it into a dictionary. It then adds to every flow a link to the flow’s detailed resource, using the reverse function.
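The link-adding loop can be sketched as a standalone function. The response shape (a dictionary of flows keyed by flow id, plus an ’age’ entry) is inferred from the view's code. flow_url is a hypothetical stand-in for the reverse('flow-view', ...) call that hard-codes the flow-view URL pattern.

```python
def flow_url(dpid, flow):
    # Hypothetical stand-in for reverse('flow-view', kwargs={...}); it follows
    # the URL pattern ^flows/(?P<dpid>\d+)/(?P<flow>\d+)/$
    return '/flows/%s/%s/' % (dpid, flow)


def add_detail_links(flow_list, url_for=flow_url):
    """Add a 'detailflowlink' to every flow, skipping the 'age' entry."""
    for flow_id, flow in flow_list.items():
        if flow_id == 'age':
            continue
        flow['detailflowlink'] = url_for(flow['dpid'], flow_id)
    return flow_list


flows = {'age': 3, '7': {'dpid': 1, 'priority': 1}}
print(add_detail_links(flows)['7']['detailflowlink'])  # /flows/1/7/
```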
The reverse function avoids hard-coded URLs. It takes the name of a view and the parameters of that view’s URL, and it generates the URL. This allows you to change your URLs freely without having to update every place where they are used.
Example:
If you have defined this URL pattern:
url(r'^flows/(?P<dpid>\d+)/$', SwFlowView.as_view(), name='sw-flow-list')
this call:
reverse('sw-flow-list', kwargs={'dpid': 5})
will generate the string:
/flows/5/
If any exception is raised during the execution of the view, the server returns a 500 status code. This can happen while connecting to the Ryu server or while parsing the data received from it.
Although the response is in JSON format, we parse it with the YAML parser. YAML is, for practical purposes, a superset of JSON, so this causes no problem. The reason for this choice is formatting: data parsed with the JSON parser and then formatted as YAML contains unnecessary encoding markup. Under Python 2, for example, the output includes tags such as !!python/unicode, because json.loads returns unicode strings. Data parsed with the YAML parser can be formatted as JSON, or again as YAML, without generating this markup.
The POST method takes the URL from the request, builds the new URI, parses the information received from the client and checks whether it is valid. If it is, the view connects to the Ryu server and relays the request. Finally, it returns only a status code, because the responses to POST requests do not contain any data.
To validate the data received from the client, the view first checks the Content-Type header. Next, the server tries to parse the body of the request. It then evaluates the parsed data by checking the key names, the number of keys and the keys contained in the ’conditions’ dictionary. If any exception is raised in this process, the view returns a 500 status code. If no exception is raised but the parsed data is not valid (or the Content-Type is wrong), the server returns a 400 status code.
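The validation sequence can be reproduced with the standard json module in place of PyYAML, a substitution made so the sketch has no third-party dependencies. The status codes follow the description above: 400 for a wrong Content-Type or invalid data, 500 when parsing fails, and None when the request may be forwarded. check_post and is_flow are hypothetical names. VALID_KEYS is abridged; the view uses its full list of OpenFlow match fields.

```python
import json

# Abridged; the view's valid_keys lists every match field it accepts
VALID_KEYS = {'in_port', 'eth_src', 'eth_dst', 'eth_type', 'ipv4_src', 'ipv4_dst'}
REQUIRED = {'d_id', 'priority', 'conditions', 'out_port'}
FLOW_LIST_TYPE = 'application/vnd+SDN.flowlist+json'


def is_flow(flow):
    """A flow has exactly the four required keys and known match conditions."""
    return (isinstance(flow, dict)
            and set(flow) == REQUIRED
            and isinstance(flow['conditions'], dict)
            and all(key in VALID_KEYS for key in flow['conditions']))


def check_post(content_type, body):
    """Return an error status code, or None if the body can be forwarded."""
    if content_type != FLOW_LIST_TYPE:
        return 400
    try:
        flow_list = json.loads(body)
    except ValueError:
        return 500                      # mirrors the view: parse errors give 500
    if not isinstance(flow_list, list) or not all(is_flow(f) for f in flow_list):
        return 400
    return None


body = '[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2}]'
print(check_post(FLOW_LIST_TYPE, body))  # None
```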
The rest of the views can be found in the annex.
8.4.3 Topology configuration
To create the topology for this system we use the following virtualization tools:
• The Ryu server, the Django intermediary server and the client are created as LXC containers.
• The network controlled by the Ryu server is simulated with a virtual network generated with Mininet.
First of all, we create three containers with the lxc-create tool:
$ sudo lxc-create -t ubuntu -n client
$ sudo lxc-create -t ubuntu -n django
$ sudo lxc-create -t ubuntu -n ryu
After that, we’re going to edit the configuration files:
Client configuration in /var/lib/lxc/client/config:
# Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)
# Common configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
# Container specific configuration
lxc.rootfs = /var/lib/lxc/client/rootfs
lxc.mount = /var/lib/lxc/client/fstab
lxc.utsname = client
lxc.arch = amd64
lxc.aa_profile = unconfined
# Network configuration: physical host link
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.hwaddr = 00:16:3e:2a:0d:3f
# Network configuration: SUBNET 10.0.0.0/24
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth1
lxc.network.ipv4 = 10.0.0.1/24
lxc.network.veth.pair = vethc
lxc.network.hwaddr = 00:16:3e:ff:4d:8e
Django configuration in /var/lib/lxc/django/config:
# Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)
# Common configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
# Container specific configuration
lxc.rootfs = /var/lib/lxc/django/rootfs
lxc.mount = /var/lib/lxc/django/fstab
lxc.utsname = django
lxc.arch = amd64
lxc.aa_profile = unconfined
# Network configuration: physical host link
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.hwaddr = 00:16:3e:bb:29:aa
# Network configuration: SUBNET 10.0.0.0/24
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth1
lxc.network.ipv4 = 10.0.0.2/24
lxc.network.veth.pair = vethd1
lxc.network.hwaddr = 00:16:3e:52:04:4e
# Network configuration: SUBNET 10.0.1.0/24
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br1
lxc.network.ipv4 = 10.0.1.1/24
lxc.network.name = eth2
lxc.network.veth.pair = vethd2
Ryu configuration in /var/lib/lxc/ryu/config:
# Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)
# Common configuration
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
# Container specific configuration
lxc.rootfs = /var/lib/lxc/ryu/rootfs
lxc.mount = /var/lib/lxc/ryu/fstab
lxc.utsname = ryu
lxc.arch = amd64
lxc.aa_profile = unconfined
# Network configuration: physical host link
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
lxc.network.hwaddr = 00:16:3e:82:07:5
# Network configuration: SUBNET 10.0.1.0/24
lxc.network.type = veth
lxc.network.link = br1
lxc.network.flags = up
lxc.network.ipv4 = 10.0.1.2/24
lxc.network.name = eth1
lxc.network.veth.pair = vethr1
These files will set the following network configuration:
[Figure 8.5: Topology generated with the LXC containers procedure. The client (eth1, 10.0.0.1/24) and Django (eth1, 10.0.0.2/24) containers share br0. Django (eth2, 10.0.1.1/24) and Ryu (eth1, 10.0.1.2/24) share br1. The eth0 interface of each container is attached to the physical host’s lxcbr0 bridge (10.0.3.1/24).]
Notes:
• The part of the configuration file that corresponds to the connection with the physical host should not be changed. Depending on the changes, the containers may take a long time to start (about 5 minutes), and some features, such as the DNS service, may not work correctly.
• Device names look different from the point of view of the physical host. Inside a container, the names are those shown in figure 8.5. On the physical host, however, ifconfig shows the names set with the lxc.network.veth.pair parameter in the configuration files (vethr1, vethd2, vethd1, etc.). It is important to define them. Otherwise, each name is a fixed part (veth) followed by a random string (for example vethXHD92Q), which makes the network hard to manage when you need to change it.
• LXC creates only the lxcbr0 bridge. The other bridges have to be created manually (with brctl, for example). If they do not exist when a container starts, LXC throws an error and the container does not start. However, they do not need to be configured: LXC adds the interfaces to the bridges and brings them up.
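A missing bridge prevents a container from starting, so a small pre-flight check can help. This sketch relies on the Linux sysfs convention that a bridge interface exposes a bridge directory under /sys/class/net/<name>/. The function missing_bridges, and the idea of running it before lxc-start, are assumptions for illustration and not part of the original procedure.

```python
import os


def missing_bridges(names, sysfs_net='/sys/class/net'):
    """Return the names in `names` that are not existing bridges on this host."""
    return [name for name in names
            if not os.path.isdir(os.path.join(sysfs_net, name, 'bridge'))]


if __name__ == '__main__':
    missing = missing_bridges(['br0', 'br1'])
    if missing:
        print('Create these bridges first (e.g. with brctl addbr):', ', '.join(missing))
    else:
        print('All bridges present')
```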
Once the containers are correctly configured and the network works, the next step is to install all the packages required to run the project.
Client:
Start the client container by typing:
$ sudo lxc-start -n client
Once the container has loaded, log in with the default credentials (user: ubuntu, password: ubuntu).
On the client we only need curl.
$ sudo apt-get install curl
Django
Start the container and log in as in the Client container.
In this container we need to install the Django framework, the memcached service and some Python packages. To make everything easier we also install pip.
$ sudo apt-get update
$ sudo apt-get install python-pip
$ sudo pip install django
$ sudo pip install pyyaml
$ sudo pip install requests
$ sudo apt-get install memcached
$ sudo apt-get install libmemcached-dev
$ sudo apt-get install gcc
$ sudo apt-get install python-dev
$ sudo pip install python-memcached
$ sudo pip install pylibmc
Ryu
Finally, in the Ryu container we have to install Mininet, the Ryu framework, Open vSwitch, MySQL and some Python packages. We also install Git to clone some repositories.
Mininet:
$ sudo apt-get install git
$ git clone https://github.com/mininet/mininet
$ mininet/util/install.sh -a
OpenvSwitch:
$ sudo apt-get install wget
$ wget http://openvswitch.org/releases/openvswitch-2.0.1.tar.gz
$ tar zxvf openvswitch-2.0.1.tar.gz
$ cd openvswitch-2.0.1
$ ./boot.sh
$ sudo ./configure
$ sudo make && sudo make install
$ sudo mkdir -p /usr/local/etc/openvswitch
$ sudo ovsdb-tool create /usr/local/etc/openvswitch/conf.db vswitchd/vswitch.ovsschema
$ sudo ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
--remote=db:Open_vSwitch,Open_vSwitch,manager_options \
--private-key=db:Open_vSwitch,SSL,private_key \
--certificate=db:Open_vSwitch,SSL,certificate \
--pidfile --detach --log-file
$ sudo ovs-vsctl --no-wait init
$ sudo ovs-vswitchd --pidfile --detach
Every time you reboot the system you’ll have to execute the following commands:
$ sudo ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
--remote=db:Open_vSwitch,Open_vSwitch,manager_options \
--private-key=db:Open_vSwitch,SSL,private_key \
--certificate=db:Open_vSwitch,SSL,certificate \
--pidfile --detach --log-file
$ sudo ovs-vsctl --no-wait init
$ sudo ovs-vswitchd --pidfile --detach
Ryu framework:
$ sudo apt-get install python-pip
$ sudo apt-get install python-dev
$ git clone https://github.com/osrg/ryu.git
$ cd ryu; sudo python ./setup.py install
$ sudo apt-get install python-eventlet
$ sudo apt-get install python-routes
$ sudo apt-get install python-paramiko
At this point you can test whether everything has been installed correctly.
Open a new terminal and execute:
$ sudo lxc-console -n ryu
This turns the terminal of the physical host into a terminal of the ryu container.
On the first terminal execute:
$ sudo mn --topo single,2 --controller remote --mac --switch ovsk
On the second terminal execute, from the ryu directory:
~/ryu/$ PYTHONPATH=. ./bin/ryu-manager --observe-links ryu/app/simple_switch_13.py
This launches a Ryu application that makes the virtual switch behave like an ordinary layer-2 switch. Back on the first terminal, inside the Mininet environment, run:
mininet> h1 ping -c 1 h2
If the ping is answered, everything is working correctly.
MySQL:
$ sudo apt-get install mysql-server
$ sudo apt-get install python-mysqldb
$ mysql -u <username> -p
You should use the user that you created during the ’mysql-server’ installation.
mysql> CREATE DATABASE SDN;
mysql> USE SDN;
mysql> CREATE TABLE `base_table` (
    `rx_pkts` bigint(20) DEFAULT NULL,
    `rx_bytes` bigint(20) DEFAULT NULL,
    `rx_error` bigint(20) DEFAULT NULL,
    `tx_pkts` bigint(20) DEFAULT NULL,
    `tx_bytes` bigint(20) DEFAULT NULL,
    `tx_error` bigint(20) DEFAULT NULL,
    `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
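To experiment with this table without a MySQL server, the schema can be approximated in SQLite through Python's standard sqlite3 module. This is a stand-in, not the configuration used in the project. SQLite has no bigint(20) display width and no ON UPDATE CURRENT_TIMESTAMP clause, so only the insertion default for the time column is reproduced. The counter values inserted are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE base_table (
        rx_pkts  INTEGER DEFAULT NULL,
        rx_bytes INTEGER DEFAULT NULL,
        rx_error INTEGER DEFAULT NULL,
        tx_pkts  INTEGER DEFAULT NULL,
        tx_bytes INTEGER DEFAULT NULL,
        tx_error INTEGER DEFAULT NULL,
        "time"   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    )""")
# Insert one sample of port counters; "time" is filled in automatically
conn.execute('INSERT INTO base_table (rx_pkts, rx_bytes, rx_error, tx_pkts, tx_bytes, tx_error)'
             ' VALUES (?, ?, ?, ?, ?, ?)', (10, 980, 0, 12, 1176, 0))
row = conn.execute('SELECT rx_pkts, tx_pkts, "time" FROM base_table').fetchone()
print(row)
```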
Complete Test:
We have finally set up all the virtual machines with the correct configuration and the packages that the whole application needs.
To test that it works, we start the three machines. We start the Ryu server and Mininet in the Ryu container, and the Django server in the Django container. Finally, we capture the traffic on br0 and br1 with Wireshark and use curl to make a call to the system.
First of all, create the bridges br0 and br1:
$ sudo brctl addbr br0
$ sudo brctl addbr br1
Then start the three virtual machines:
$ sudo lxc-start -n client
$ sudo lxc-start -n django
$ sudo lxc-start -n ryu
Open a new terminal and start another ryu console:
$ sudo lxc-console -n ryu
On the first ryu console, start Open vSwitch and then start Mininet with the following configuration:
$ sudo ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
--remote=db:Open_vSwitch,Open_vSwitch,manager_options \
--private-key=db:Open_vSwitch,SSL,private_key \
--certificate=db:Open_vSwitch,SSL,certificate \
--pidfile --detach --log-file
$ sudo ovs-vsctl --no-wait init
$ sudo ovs-vswitchd --pidfile --detach
$ sudo mn --topo single,2 --mac --switch ovsk --controller remote
On the second ryu console, configure the bridge to use OpenFlow 1.3 and start the Ryu server:
$ sudo ovs-vsctl set Bridge s1 protocols=OpenFlow13
$ cd ryu/
$ PYTHONPATH=. ./bin/ryu-manager --observe-links ryu/app/myapp.py
On the django console, start the Django server with the manage script:
$ sudo ./manage.py runserver 0.0.0.0:80
Start two Wireshark instances: set one to capture on br0 and the other to capture on br1.
Finally, on the client, use curl to make a POST call to the Django server that adds two flows. One redirects the packets from port 1 to port 2; the other redirects the packets from port 2 to port 1.
$ curl -X POST 10.0.0.2:80/flows/1/ -H 'Content-Type: application/vnd+SDN.flowlist+json' -d '[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]' -u root:root
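For reference, the same request can be built with Python's standard urllib.request module. This sketch only constructs the request and does not send it. The URL, credentials and Content-Type are taken from the curl command and the captures. The body is serialized without spaces, which reproduces the 129-byte Content-Length seen in the br0 capture.

```python
import base64
import json
import urllib.request

flows = [
    {"d_id": 1, "priority": 1, "conditions": {"in_port": 1}, "out_port": 2},
    {"d_id": 1, "priority": 1, "conditions": {"in_port": 2}, "out_port": 1},
]
# Compact separators give the same bytes as the body captured on br0
body = json.dumps(flows, separators=(',', ':')).encode('utf-8')
credentials = base64.b64encode(b'root:root').decode('ascii')

request = urllib.request.Request(
    'http://10.0.0.2:80/flows/1/',
    data=body,
    method='POST',
    headers={
        'Content-Type': 'application/vnd+SDN.flowlist+json',
        'Authorization': 'Basic ' + credentials,
    },
)
print(len(body), request.get_header('Authorization'))  # 129 Basic cm9vdDpyb290
# urllib.request.urlopen(request) would send it and return the response
```

The Authorization value matches the one in the capture, which also shows that Basic credentials are only base64-encoded, not encrypted.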
On the client side we only receive a 200 status code. The Wireshark captures (Figure 8.6) show that each bridge carried a different TCP connection.
TCP stream on br0:
POST /flows/1/ HTTP/1.1
Authorization: Basic cm9vdDpyb290
User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Host: 10.0.0.2:8000
Accept: */*
Content-Type: application/vnd+SDN.flowlist+json
Content-Length: 129

[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]
HTTP/1.0 200 OK
Date: Sun, 05 Jul 2015 23:16:56 GMT
Server: WSGIServer/0.1 Python/2.7.3
Vary: Accept, WWW-Authorization
Content-Type: application/vnd+SDN.swflowlist+json
TCP stream on br1:
POST /flows/1/ HTTP/1.1
Host: 10.0.1.2:8080
Content-Length: 129
User-Agent: python-requests/2.7.0 CPython/2.7.3 Linux/3.13.0-37-generic
Connection: keep-alive
Accept: */*
Accept-Encoding: gzip, deflate

[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Sun, 05 Jul 2015 23:16:56 GMT
Connection: keep-alive
(a) Capture on br0
(b) Capture on br1
Figure 8.6: Wireshark captures from a POST request
Finally, on the first ryu console, you can test whether the flows were correctly added with:
mininet> h1 ping h2
Bibliography
[1] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis,
University of California, Irvine, 2000.
[2] Paul Sobocinski. Hypermedia APIs: The benefits of HATEOAS, 2014. [Online; http://www.programmableweb.com/news/hypermedia-apis-benefits-hateoas/how-to/2014/02/27].
[3] Roy T. Fielding. REST APIs must be hypertext-driven, 2008. [Online; http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven].
[4] Leonard Richardson and Sam Ruby. RESTful Web Services. O’Reilly Media, 2007.
[5] Joshua Thijssen. The RESTful cookbook. [Online; http://restcookbook.com/].
[6] Jim Webber, Savas Parastatidis, and Ian Robinson. How to get a cup of coffee, 2008. [Online; http://www.infoq.com/articles/webber-rest-workflow].
[7] Draft - make readable URIs, 2004. [Online; http://www.w3.org/QA/2004/08/readable-uri].
[8] Mike Amundsen. Roy Fielding on versioning, hypermedia, and REST, 2014. [Online; http://www.infoq.com/articles/roy-fielding-on-versioning].
[9] T. Berners-Lee, L. Masinter, and M. McCahill. Uniform Resource Locators (URL). RFC 1738 (Proposed
Standard), December 1994. Obsoleted by RFCs 4248, 4266, updated by RFCs 1808, 2368, 2396, 3986, 6196,
6270.
[10] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer
Protocol – HTTP/1.1. RFC 2616 (Draft Standard), June 1999. Obsoleted by RFCs 7230, 7231, 7232, 7233,
7234, 7235, updated by RFCs 2817, 5785, 6266, 6585.
[11] A. Barth. HTTP State Management Mechanism. RFC 6265 (Proposed Standard), April 2011.
[12] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. RFC 7230
(Proposed Standard), June 2014.
[13] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. RFC 7231
(Proposed Standard), June 2014.
[14] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests. RFC 7232 (Proposed Standard), June 2014.
[15] R. Fielding, Y. Lafon, and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Range Requests. RFC 7233
(Proposed Standard), June 2014.
[16] R. Fielding, M. Nottingham, and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Caching. RFC 7234
(Proposed Standard), June 2014.
[17] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Authentication. RFC 7235 (Proposed
Standard), June 2014.
[18] Introducing JSON. [Online; http://www.json.org].
[19] Use HTTP basic authentification to login into Django, 2014. [Online; http://ponytech.net/blog/use-http-basicauthentification-login].
[20] Adrian Holovaty and Jacob Kaplan-Moss. The Definitive Guide to Django: Web Development Done Right. Apress, 2006.
[21] Django documentation. [Online; https://docs.djangoproject.com/en/1.8/].
[22] Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web services description language (WSDL) 1.1, 2001. [Online; http://www.w3.org/TR/wsdl].
[23] Nilo Mitra and Yves Lafon. SOAP version 1.2 part 0: Primer (second edition), 2007. [Online; www.w3.org/TR/soap12-part0/].
[24] Martin Gudgin, Marc Hadley, Noah Mendelsohn, Jean-Jacques Moreau, Henrik Frystyk Nielsen, Anish Karmarkar, and Yves Lafon. SOAP version 1.2 part 1: Messaging framework (second edition), 2007. [Online; www.w3.org/TR/soap12-part1/].
[25] Hugo Haas and Allen Brown. Web services glossary, 2004. [Online; http://www.w3.org/TR/ws-gloss/].
[26] Don Box. A brief history of SOAP, April 2001. [Online; http://www.xml.com/pub/a/ws/2001/04/04/soap.html].
[27] Some thoughts for the enterprise embracing web APIs, 2012. [Online; http://apievangelist.com/2012/12/09/somethoughts-for-the-enterprise-embracing-web-apis/].
[28] Douglas C. Schmidt. Overview of remote procedure calls (RPC). [Online; http://www.cs.wustl.edu/~schmidt/PDF/rpc4.pdf].
[29] From EDI to XML and UDDI: A brief history of web services, 2001. [Online; http://www.informationweek.com/from-edi-to-xml-and-uddi-a-brief-history-of-web-services/d/d-id/1012008].
[30] R. Srinivasan. RPC: Remote Procedure Call Protocol Specification Version 2. RFC 1831 (Proposed Standard),
August 1995. Obsoleted by RFC 5531.
[31] Topology discovery with Ryu, 2014. [Online; http://sdn-lab.com/2014/12/31/topology-discovery-with-ryu/].
[32] Setting up OpenvSwitch 2.0 + Mininet 2.1 + Ubuntu 13.04, 2013. [Online; http://sdn-lab.com/2013/11/14/settingup-openvswitch-2-0-mininet-2-1/].
[33] Robert Daigneau. Service design patterns: fundamental design solutions for SOAP/WSDL and RESTful Web services. Addison-Wesley, 2012.
[34] Ryu development team. Ryubook 1.0, 2014. [Online; http://osrg.github.io/ryu-book/en/html/index.html].
[35] Hao He. What is service-oriented architecture, September 2003. [Online; http://www.xml.com/lpt/a/1292].
[36] Dave Marshall. Remote procedure calls (RPC), March 1999. [Online; http://www.cs.cf.ac.uk/Dave/C/node33.html].
REST API for SDN Code
Django
Settings
"""
Django settings for SDN project.

For more information on this file, see
https://docs.djangoproject.com/en/1.7/topics/settings/

For the full list of settings and their values, see
https://docs.djangoproject.com/en/1.7/ref/settings/
"""

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
import os
BASE_DIR = os.path.dirname(os.path.dirname(__file__))

# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/1.7/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'xhm-zr6&p_i%9kb5#=y)n1p6#p%5d!jx5tq#i-^l^lfzbx!_b5'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = False
TEMPLATE_DEBUG = False
ALLOWED_HOSTS = []

# Application definition
INSTALLED_APPS = (
    #'django.contrib.admin',
    #'django.contrib.auth',
    #'django.contrib.contenttypes',
    #'django.contrib.sessions',
    #'django.contrib.messages',
    #'django.contrib.staticfiles',
    #'serverconnector',
)

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
}

MIDDLEWARE_CLASSES = (
    #'django.contrib.sessions.middleware.SessionMiddleware',
    #'django.middleware.common.CommonMiddleware',
    #'django.middleware.csrf.CsrfViewMiddleware',
    #'django.contrib.auth.middleware.AuthenticationMiddleware',
    #'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    #'django.contrib.messages.middleware.MessageMiddleware',
    #'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'serverconnector.middleware.HttpAuthMiddleware',
    'serverconnector.middleware.ConentNegotiationMiddleware',
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
)

CACHE_MIDDLEWARE_ALIAS = "default"
CACHE_MIDDLEWARE_SECONDS = 5*60
CACHE_MIDDLEWARE_KEY_PREFIX = ""

ROOT_URLCONF = 'SDN.urls'
WSGI_APPLICATION = 'SDN.wsgi.application'

# Database
# https://docs.djangoproject.com/en/1.7/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}

# Internationalization
# https://docs.djangoproject.com/en/1.7/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.7/howto/static-files/
STATIC_URL = '/static/'
URL patterns
from django.conf.urls import patterns, url
from django.views.decorators.cache import cache_page
from django.views.decorators.vary import vary_on_headers
from serverconnector.views import TopologyBookmark, FullFlowView, SwFlowView, \
    SingleFlowView, StatsView, Bookmark, SwitchListView, LinkListView

# Each resource is cached for a short time (cache_page, in seconds), and the cache
# varies on the listed request headers. Basic credentials are sent in the standard
# 'Authorization' request header; 'WWW-Authorization' is not a standard HTTP header.
urlpatterns = patterns('',
    url(r'^$',
        cache_page(60*5)(vary_on_headers('Accept', 'WWW-Authorization')(Bookmark.as_view())),
        name='bookmark'),
    url(r'^topology/$',
        cache_page(60*5)(vary_on_headers('Accept', 'WWW-Authorization')(TopologyBookmark.as_view())),
        name='topo-book'),
    url(r'^topology/switches/$',
        cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(SwitchListView.as_view())),
        name='switch-list'),
    url(r'^topology/links/$',
        cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(LinkListView.as_view())),
        name='full-link-list'),
    url(r'^topology/links/(?P<pk>\d+)/$',
        cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(LinkListView.as_view())),
        name='link-list'),
    url(r'^flows/$',
        cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(FullFlowView.as_view())),
        name='full-flow-list'),
    url(r'^flows/(?P<dpid>\d+)/$',
        cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(SwFlowView.as_view())),
        name='sw-flow-list'),
    url(r'^flows/(?P<dpid>\d+)/(?P<flow>\d+)/$',
        cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(SingleFlowView.as_view())),
        name='flow-view'),
    url(r'^statistics/(?P<dpid>\d+)/(?P<port>\d+)/',
        cache_page(1)(vary_on_headers('Accept', 'WWW-Authorization')(StatsView.as_view())),
        name='statistics'),
)
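Because every route is wrapped in cache_page and vary_on_headers, a cached representation should only be reused for requests that send the same values for the varying headers. The sketch below is a simplified model of that idea, not Django's actual cache-key format. The helper cache_key is hypothetical: it combines the path with the values of the headers named in Vary. As a result, a JSON request and a YAML request for /topology/ land in different cache entries. It varies on Authorization, the request header that actually carries Basic credentials.

```python
import hashlib


def cache_key(path, headers, vary=("Accept", "Authorization")):
    """Build a cache key from the request path and the headers named in Vary.

    Header names are matched case-insensitively, as in HTTP; a missing
    header contributes an empty value to the key.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [path] + [lowered.get(name.lower(), "") for name in vary]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


json_key = cache_key("/topology/", {"Accept": "application/json"})
yaml_key = cache_key("/topology/", {"Accept": "application/yaml"})
print(json_key != yaml_key)  # True
```

If the Vary list names a header that clients never send, every client shares one cache entry for that dimension. This is why the varying header has to match the one that carries the credentials.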
Views
from django.http import HttpResponse
from django.views.generic.base import View
from django.core.urlresolvers import reverse
import requests, json

# Ryu REST API: every view forwards the request to basedir + request.path
basedir = 'http://10.0.1.2:8080'

bookmark_obj = {
    'topologylink': 'topology/',
    'fullflowlink': 'flows/',
}

topology_bookmark_obj = {
    'Swithceslink': 'topology/switches/',
    'Linklink': 'topology/links/',
}


class Bookmark(View):
    # Media types the resource can produce (read by ConentNegotiationMiddleware)
    types = {'application': ['json', 'yaml']}
    # Media type served when the client sends a wildcard such as */*
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        return HttpResponse(json.dumps(bookmark_obj))


class TopologyBookmark(View):
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        # Views always produce JSON; the middleware converts it to YAML when requested
        return HttpResponse(json.dumps(topology_bookmark_obj))


class SwitchListView(View):
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                sw_list = json.loads(r.text)
                for sw in sw_list:
                    # Ryu reports dpid and port_no as zero-padded hexadecimal strings
                    dpid = int(sw['dpid'], 16)
                    # Hypermedia links to the related resources
                    sw['linkslink'] = str(reverse('link-list', kwargs={'pk': dpid}))
                    sw['flowlistlink'] = str(reverse('sw-flow-list', kwargs={'dpid': dpid}))
                    for port in sw['ports']:
                        port['statslink'] = str(reverse('statistics', kwargs={
                            'dpid': dpid, 'port': int(port['port_no'], 16)}))
                return HttpResponse(json.dumps(sw_list))
            else:
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)


class LinkListView(View):
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200 and r.text:
                link_list = json.loads(r.text)
                return HttpResponse(json.dumps(link_list))
            else:
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)


class FullFlowView(View):
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow_list = json.loads(r.text)
                new_flow_list = []
                for flow in flow_list:
                    new_flow = {'flow_id': flow,
                                'detailflowlink': reverse('flow-view', kwargs={
                                    'dpid': flow['dpid'], 'flow': flow['id']})}
                    new_flow_list.append(new_flow)
                return HttpResponse(json.dumps(new_flow_list))
            else:
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)


class SwFlowView(View):
    http_method_names = ['get', 'post', 'options']
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow_list = json.loads(r.text)
                for flow in flow_list:
                    # 'age' is the time since the last flow stats reply, not a flow
                    if flow != 'age':
                        flow_list[flow]['detailflowlink'] = reverse('flow-view', kwargs={
                            'dpid': flow_list[flow]['dpid'], 'flow': flow})
                return HttpResponse(json.dumps(flow_list))
            else:
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)

    # OpenFlow 1.3 match fields accepted in 'conditions'
    valid_keys = ['in_port', 'in_phy_port', 'metadata', 'eth_dst', 'eth_src', 'eth_type',
                  'vlan_vid', 'ip_dscp', 'ip_ecn', 'ip_proto', 'ipv4_src', 'ipv4_dst',
                  'tcp_src', 'tcp_dst', 'udp_src', 'udp_dst', 'sctp_src', 'sctp_dst',
                  'icmpv4_type', 'icmpv4_code', 'arp_op', 'arp_spa', 'arp_tpa', 'arp_sha',
                  'arp_tha', 'ipv6_src', 'ipv6_dst', 'ipv6_flabel', 'icmpv6_type',
                  'icmpv6_code', 'ipv6_nd_target', 'ipv6_nd_sll', 'ipv6_nd_tll',
                  'mpls_label', 'mpls_tc', 'mpls_bos', 'pbb_isid', 'tunnel_id', 'ipv6_exthdr']

    def isflow(self, flow):
        """Check that a flow has exactly the expected keys and valid match fields."""
        if not isinstance(flow, dict):
            return False
        if set(flow.keys()) != {'d_id', 'priority', 'conditions', 'out_port'}:
            return False
        if not isinstance(flow['conditions'], dict):
            return False
        return all(key in self.valid_keys for key in flow['conditions'])

    def post(self, request, *args, **kwargs):
        # Ignore parameters such as "; charset=utf-8" when checking the media type
        content_type = request.META.get('CONTENT_TYPE', '').split(';')[0].strip()
        if content_type != 'application/json':
            return HttpResponse(status=400)
        try:
            flow_list = json.loads(request.body.decode('utf-8'))
        except ValueError:
            return HttpResponse(status=400)  # malformed JSON is a client error
        if not isinstance(flow_list, list):
            return HttpResponse(status=400)
        for flow in flow_list:
            if not self.isflow(flow):
                return HttpResponse(status=400)
        try:
            r = requests.post(basedir + request.path, data=request.body.decode('utf-8'))
            return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)


class SingleFlowView(View):
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow = json.loads(r.text)
                return HttpResponse(json.dumps(flow))
            else:
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)

    def delete(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.delete(uri)
            return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)


class StatsView(View):
    types = {'application': ['json', 'yaml']}
    default = {'*/*': 'application/json', 'application': 'application/json'}

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                stats = json.loads(r.text)
                return HttpResponse(json.dumps(stats))
            else:
                return HttpResponse(status=r.status_code)
        except Exception:
            return HttpResponse(status=500)
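SwitchListView makes the raw Ryu switch list hypermedia-driven. It attaches linkslink and flowlistlink to each switch, and statslink to each of its ports. The following framework-free sketch reproduces that step. It assumes the URL layout from the URL patterns, written out as hypothetical format strings instead of Django's reverse(). Ryu reports dpid and port_no as zero-padded hexadecimal strings, so the sketch parses them in base 16.

```python
import json

# Hypothetical URL templates mirroring the routes 'link-list', 'sw-flow-list' and 'statistics'
LINKS_URL = "/topology/links/{dpid}/"
FLOWS_URL = "/flows/{dpid}/"
STATS_URL = "/statistics/{dpid}/{port}/"


def add_switch_links(switches):
    """Return a copy of the switch list with navigation links added."""
    result = []
    for sw in switches:
        dpid = int(sw["dpid"], 16)  # "0000000000000001" -> 1, "000000000000000a" -> 10
        new_sw = dict(sw)
        new_sw["linkslink"] = LINKS_URL.format(dpid=dpid)
        new_sw["flowlistlink"] = FLOWS_URL.format(dpid=dpid)
        new_sw["ports"] = [
            dict(port, statslink=STATS_URL.format(dpid=dpid, port=int(port["port_no"], 16)))
            for port in sw.get("ports", [])
        ]
        result.append(new_sw)
    return result


raw = json.loads('[{"dpid": "0000000000000001", "ports": '
                 '[{"port_no": "00000002", "name": "s1-eth2"}]}]')
print(add_switch_links(raw)[0]["ports"][0]["statslink"])  # /statistics/1/2/
```

With these links in place, a client that only knows the switch list can reach the links, flows and statistics of every switch without building URLs itself.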
Middleware
import base64
import json
import yaml
from django.http import HttpResponse
from serverconnector.authconfig import authconfig, credentialslist
from myutils import get_class


class HttpAuthMiddleware(object):
    """HTTP Basic authentication with one realm per view (see authconfig)."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        realm = authconfig[view_func.__name__]
        if realm is None:
            return None  # public resource
        header = request.META.get('HTTP_AUTHORIZATION', '')
        auth, _, credentials = header.partition(' ')
        if auth.lower() == 'basic':
            try:
                decoded = base64.b64decode(credentials.strip()).decode('utf-8')
            except (TypeError, ValueError):
                decoded = ''  # malformed credentials are rejected below
            username, _, password = decoded.partition(':')
            if username in credentialslist[realm] and credentialslist[realm][username] == password:
                # Overwrite the raw credentials in request.META with a flag
                request.META['HTTP_AUTHORIZATION'] = True
                return None
        response = HttpResponse(status=401)
        response['WWW-Authenticate'] = 'Basic realm="%s"' % realm
        return response


class ConentNegotiationMiddleware(object):
    """Chooses the response media type from the Accept header (406 if none fits)."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        func_class = get_class(view_func.__module__, view_func.__name__)
        availiable_types = func_class.types
        default = func_class.default
        if 'HTTP_ACCEPT' not in request.META:
            request.META['HTTP_ACCEPT'] = default['*/*']
            return None
        allowed_types = request.META['HTTP_ACCEPT'].replace(' ', '').split(',')
        best_fit_type = {'general': '*', 'specific': '*', 'q': 0.0, 'default': True}
        try:
            for elem in allowed_types:
                chunks = elem.split(';', 1)
                q = float(chunks[1].replace('q=', '')) if len(chunks) == 2 else 1.0
                if not 0.0 < q <= 1.0:
                    continue
                general, specific = chunks[0].split('/', 1)
                candidate = {'general': general, 'specific': specific, 'q': q, 'default': False}
                if general in availiable_types:
                    if specific in availiable_types[general]:
                        # A higher q wins; on a tie, a concrete type beats a wildcard
                        if q > best_fit_type['q'] or (q == best_fit_type['q']
                                                      and best_fit_type['specific'] == '*'):
                            best_fit_type = candidate
                    elif specific == '*' and q > best_fit_type['q']:
                        best_fit_type = candidate
                elif general == '*' and specific == '*' and q > best_fit_type['q']:
                    best_fit_type = candidate
        except ValueError:
            # Unparseable q-value or media range
            return HttpResponse(json.dumps(availiable_types), status=406)
        if best_fit_type['default']:
            # Nothing acceptable: answer 406 and list the available types
            return HttpResponse(json.dumps(availiable_types), status=406)
        if best_fit_type['general'] == '*':
            request.META['HTTP_ACCEPT'] = default['*/*']
        elif best_fit_type['specific'] == '*':
            request.META['HTTP_ACCEPT'] = default[best_fit_type['general']]
        else:
            request.META['HTTP_ACCEPT'] = '%s/%s' % (best_fit_type['general'],
                                                     best_fit_type['specific'])
        return None

    def process_response(self, request, response):
        # If process_view did not run (e.g. HttpAuthMiddleware answered 401 first),
        # META still holds the client's raw Accept value
        accept = request.META.get('HTTP_ACCEPT', 'application/json')
        if response.status_code == 200 and response.content and accept.endswith('yaml'):
            # Views produce JSON; re-serialise it as YAML
            aux = json.loads(response.content.decode('utf-8'))
            if isinstance(aux, list):
                response.content = yaml.dump_all(aux)
            else:
                response.content = yaml.dump(aux)
        response['Content-Type'] = accept
        return response
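ConentNegotiationMiddleware selects, among the media types a view declares in types, the one the client accepts with the highest q-value. Wildcards are resolved through the view's default mapping. The sketch below restates that selection rule as a standalone function. Its assumptions: types and default are shaped like the views' class attributes, and a return value of None stands for 406 Not Acceptable. The name negotiate is hypothetical.

```python
def negotiate(accept, types, default):
    """Return the media type to serve, or None when nothing is acceptable (406).

    accept  -- raw Accept header, e.g. "application/yaml;q=0.9, */*;q=0.1"
    types   -- supported types, e.g. {'application': ['json', 'yaml']}
    default -- fallbacks for wildcards, e.g. {'*/*': 'application/json', ...}
    """
    best, best_q, best_is_wildcard = None, 0.0, True
    for item in accept.replace(" ", "").split(","):
        media, *params = item.split(";")
        try:
            q = next((float(p[2:]) for p in params if p.startswith("q=")), 1.0)
        except ValueError:
            continue  # ignore entries with a malformed q-value
        if not 0.0 < q <= 1.0 or "/" not in media:
            continue
        general, specific = media.split("/", 1)
        if general == "*" and specific == "*":
            candidate, wildcard = default["*/*"], True
        elif general in types and specific == "*":
            candidate, wildcard = default[general], True
        elif general in types and specific in types[general]:
            candidate, wildcard = media, False
        else:
            continue  # not a type this resource can produce
        # A higher q wins; on a tie a concrete type beats a wildcard
        if q > best_q or (q == best_q and best_is_wildcard and not wildcard):
            best, best_q, best_is_wildcard = candidate, q, wildcard
    return best
```

For example, negotiate("application/yaml, */*;q=0.1", {'application': ['json', 'yaml']}, {'*/*': 'application/json', 'application': 'application/json'}) selects application/yaml. A lone "text/html" yields None, which the middleware would turn into a 406 response listing the available types.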
Authentication config
# Realm required by each view (None: no authentication)
authconfig = {
    'Bookmark': None,
    'TopologyBookmark': None,
    'SwitchListView': 'Topo',
    'LinkListView': 'Topo',
    'FullFlowView': 'Flow',
    'SwFlowView': 'Flow',
    'SingleFlowView': 'Flow',
    'StatsView': 'Stats',
}

# Users and passwords accepted in each realm (demonstration values, stored in plain text)
credentialslist = {
    'Flow': {'root': 'root', 'flowuser': 'flowpwd'},
    'Topo': {'root': 'root', 'topouser': 'topopwd'},
    'Stats': {'root': 'root', 'statsuser': 'statspwd'},
}
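With this configuration, HttpAuthMiddleware looks up the realm of the requested view. It then compares the decoded Basic credentials with that realm's entry in credentialslist. The sketch below isolates the check as a pure function. It assumes the Authorization header value is passed in as a string and uses a trimmed copy of the two dictionaries; the name check_basic_auth is hypothetical. It also compares passwords with hmac.compare_digest, a common hardening step that the code above does not use.

```python
import base64
import binascii
import hmac

# Trimmed copies of the configuration above
authconfig = {'Bookmark': None, 'SwitchListView': 'Topo'}
credentialslist = {'Topo': {'root': 'root', 'topouser': 'topopwd'}}


def check_basic_auth(view_name, header):
    """Return None if access is granted, otherwise the WWW-Authenticate value for a 401."""
    realm = authconfig[view_name]
    if realm is None:
        return None  # public resource
    challenge = 'Basic realm="%s"' % realm
    scheme, _, encoded = (header or "").partition(" ")
    if scheme.lower() != "basic":
        return challenge
    try:
        decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return challenge
    username, _, password = decoded.partition(":")
    expected = credentialslist[realm].get(username)
    if expected is not None and hmac.compare_digest(expected.encode("utf-8"),
                                                    password.encode("utf-8")):
        return None
    return challenge
```

In the demonstration, curl's -u root:root option produces the header value Basic cm9vdDpyb290, which this check accepts for the Topo realm.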
Utils
import importlib


def get_class(module_name, cls_name):
    """Return the class cls_name defined in module_name (used to read a view's types)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        raise ImportError('Invalid class path: {}'.format(module_name))
    try:
        return getattr(module, cls_name)
    except AttributeError:
        raise ImportError('Invalid class name: {}'.format(cls_name))
Ryu
import json
import time

import MySQLdb
from webob import Response
from ryu.app.wsgi import ControllerBase, WSGIApplication, route
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import DEAD_DISPATCHER, MAIN_DISPATCHER, set_ev_cls
from ryu.lib import hub
from ryu.ofproto import ofproto_v1_3
from ryu.topology.api import get_link, get_all_switch, get_all_link

myapp_instance_name = 'MyApp'


class MyApp(app_manager.RyuApp):
    _CONTEXTS = {
        'wsgi': WSGIApplication
    }
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(MyApp, self).__init__(*args, **kwargs)
        wsgi = kwargs['wsgi']
        wsgi.register(MyAppRestController, {myapp_instance_name: self})
        self.flow_list = {}           # cookie -> compact flow description
        self.switches = {}            # dpid -> datapath
        self.monitored_switches = {}
        self.db = SQLDB()
        self.previous_read = {}       # last port counters, per dpid and port
        self.flow_refresh_rate = 3    # seconds between flow stats requests
        self.port_refresh_rate = 3    # seconds between port stats requests
        self.flow_time = {}           # dpid -> time of the last flow stats reply
        self.count = 1                # next cookie, used as the flow id
        # Start the polling threads only once the state they use exists
        self.port_thread = hub.spawn(self._port_monitor)
        self.flow_thread = hub.spawn(self._flow_monitor)

    @set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
    def add_switch(self, ev):
        datapath = ev.datapath
        self.switches[datapath.id] = datapath
        self.logger.info("Switch %s UP", datapath.id)

    @set_ev_cls(ofp_event.EventOFPStateChange, DEAD_DISPATCHER)
    def delete_switch(self, ev):
        d_id = ev.datapath.id
        if d_id in self.switches:
            self.switches.pop(d_id)
            self.logger.info("Switch %s DOWN", d_id)

    def add_flow(self, d_id, priority, conditions, out_port, buffer_id=None):
        datapath = self.switches[int(d_id)]
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        match = parser.OFPMatch(**conditions)
        if str(out_port) == "BROADCAST":
            actions = [parser.OFPActionOutput(ofproto.OFPP_FLOOD)]
        else:
            actions = [parser.OFPActionOutput(int(out_port))]
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
        self.logger.info("datapath: %s, conditions = %s, output = %s",
                         d_id, conditions, out_port)
        # The cookie identifies the flow; it is the flow id exposed by the REST API
        mod = parser.OFPFlowMod(datapath=datapath, priority=priority, match=match,
                                instructions=inst, cookie=self.count)
        self.count += 1
        datapath.send_msg(mod)
        self.logger.info("New Flow Added")

    def remove_table_flows(self, dpid, flow_id):
        """Send an OFPFC_DELETE flow mod that removes the flow with this cookie."""
        datapath = self.switches[int(dpid)]
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        flow = self.flow_list[int(flow_id)]
        match = parser.OFPMatch(**flow['match'])
        # Filtering on the full cookie restricts the deletion to this entry;
        # without it, every entry matching these conditions would be removed
        flow_mod = parser.OFPFlowMod(datapath=datapath,
                                     cookie=int(flow_id),
                                     cookie_mask=0xffffffffffffffff,
                                     table_id=flow['table_id'],
                                     command=ofproto.OFPFC_DELETE,
                                     out_port=ofproto.OFPP_ANY,
                                     out_group=ofproto.OFPG_ANY,
                                     match=match)
        datapath.send_msg(flow_mod)

    def _port_monitor(self):
        while True:
            # Iterate over a copy: switches may connect or leave while requests are sent
            for dp in list(self.switches.values()):
                self._request_port_stats(dp)
            hub.sleep(self.port_refresh_rate)

    def _flow_monitor(self):
        while True:
            for dp in list(self.switches.values()):
                self._request_flow_stats(dp)
            hub.sleep(self.flow_refresh_rate)

    def _request_port_stats(self, datapath):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        req = parser.OFPPortStatsRequest(datapath, 0, ofproto.OFPP_ANY)
        datapath.send_msg(req)

    def _request_flow_stats(self, datapath):
        parser = datapath.ofproto_parser
        req = parser.OFPFlowStatsRequest(datapath)
        datapath.send_msg(req)

    def serialize_flow_list(self, dpid=None, flow=None):
        if dpid is not None:
            if flow is not None:
                aux = self.flow_list[flow]
            else:
                aux = {key: value for key, value in self.flow_list.items()
                       if value['dpid'] == int(dpid)}
                # Seconds since the last flow stats reply from this switch
                aux['age'] = time.time() - self.flow_time[int(dpid)]
        else:
            aux = [{'id': key, 'dpid': value['dpid']} for key, value in self.flow_list.items()]
        return json.dumps(aux, ensure_ascii=False)

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def _flow_stats_reply_handler(self, ev):
        datapath = ev.msg.datapath.id
        data_json = ev.msg.to_jsondict()
        # Replace the cached flows of this switch with the ones in the reply
        self.flow_list = {key: value for key, value in self.flow_list.items()
                          if value['dpid'] != datapath}
        self.flow_time[int(datapath)] = time.time()
        for entry in data_json['OFPFlowStatsReply']['body']:
            stats = entry['OFPFlowStats']
            cookie = stats['cookie']
            # Cookie 0 is skipped: flows added through this API get cookies from 1 on
            if cookie != 0 and cookie not in self.flow_list:
                condition_list = {}
                for condition in stats['match']['OFPMatch']['oxm_fields']:
                    condition_list[condition['OXMTlv']['field']] = condition['OXMTlv']['value']
                actionlist = stats['instructions'][0]['OFPInstructionActions']['actions']
                for action in actionlist:
                    for actionname in action:
                        if actionname == 'OFPActionOutput':
                            action[actionname] = action[actionname]['port']
                        else:
                            action[actionname].pop('max_len', None)
                            action[actionname].pop('len', None)
                            action[actionname].pop('type', None)
                self.flow_list[cookie] = {'match': condition_list,
                                          'actions': actionlist,
                                          'table_id': stats['table_id'],
                                          'dpid': datapath}

    @set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
    def _port_stats_reply_handler(self, ev):
        dp_id = str(ev.msg.datapath.id)
        ports = self.previous_read.setdefault(dp_id, {})
        for stat in ev.msg.body:
            port = str(stat.port_no)
            old = ports.get(port)
            if old is not None:
                # Store the traffic of the last interval, not the cumulative counters
                self.db.record_stats(ev.msg.datapath.id, stat.port_no,
                                     stat.rx_packets - old['rx_packets'],
                                     stat.rx_bytes - old['rx_bytes'],
                                     stat.rx_errors - old['rx_errors'],
                                     stat.tx_packets - old['tx_packets'],
                                     stat.tx_bytes - old['tx_bytes'],
                                     stat.tx_errors - old['tx_errors'])
            ports[port] = {'rx_packets': stat.rx_packets, 'rx_bytes': stat.rx_bytes,
                           'rx_errors': stat.rx_errors, 'tx_packets': stat.tx_packets,
                           'tx_bytes': stat.tx_bytes, 'tx_errors': stat.tx_errors}


class MyAppRestController(ControllerBase):

    def __init__(self, req, link, data, **config):
        super(MyAppRestController, self).__init__(req, link, data, **config)
        self.myapp = data[myapp_instance_name]

    def _flow_exists(self, dpid, flow):
        """True if the switch is connected and the flow belongs to it."""
        flow_list = self.myapp.flow_list
        return (int(dpid) in self.myapp.switches and int(flow) in flow_list
                and flow_list[int(flow)]['dpid'] == int(dpid))

    @route('sw_list', '/topology/switches/', methods=['GET'])
    def get_sw_list(self, req, **kwargs):
        switch_list = get_all_switch(self.myapp)
        body = json.dumps([switch.to_dict() for switch in switch_list])
        return Response(body=body)

    @route('all_links', '/topology/links/', methods=['GET'])
    def get_all_links(self, req, **kwargs):
        links = get_all_link(self.myapp)
        body = json.dumps([link.to_dict() for link in links])
        return Response(body=body)

    @route('sw_link_list', '/topology/links/{dpid}/', methods=['GET'],
           requirements={'dpid': r'\d+'})
    def get_link_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(status=404)
        links = get_link(self.myapp, int(kwargs['dpid']))
        body = json.dumps([link.to_dict() for link in links])
        return Response(body=body)

    @route('flow_table', '/flows/', methods=['GET'])
    def get_flow_list(self, req, **kwargs):
        return Response(body=self.myapp.serialize_flow_list())

    @route('sw_flow_table', '/flows/{dpid}/', methods=['GET'],
           requirements={'dpid': r'\d+'})
    def get_sw_flow_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(status=404)
        body = self.myapp.serialize_flow_list(kwargs['dpid'])
        return Response(body=body)

    @route('flow', '/flows/{dpid}/{flow}/', methods=['GET'],
           requirements={'dpid': r'\d+', 'flow': r'\d+'})
    def get_single_flow(self, req, **kwargs):
        if not self._flow_exists(kwargs['dpid'], kwargs['flow']):
            return Response(status=404)
        body = self.myapp.serialize_flow_list(dpid=kwargs['dpid'], flow=int(kwargs['flow']))
        return Response(body=body)

    @route('add_flow', '/flows/{dpid}/', methods=['POST'],
           requirements={'dpid': r'\d+'})
    def put_flow_into_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(status=404)
        try:
            # Parse the body as JSON; data received over the network must never be eval()'d
            data = json.loads(req.body)
            for flow in data:
                self.myapp.add_flow(kwargs['dpid'], int(flow['priority']),
                                    flow['conditions'], flow['out_port'])
        except (ValueError, KeyError, TypeError):
            return Response(status=400)
        return Response(status=200)

    @route('delete_flow', '/flows/{dpid}/{flow}/', methods=['DELETE'],
           requirements={'dpid': r'\d+', 'flow': r'\d+'})
    def delete_flow(self, req, **kwargs):
        if not self._flow_exists(kwargs['dpid'], kwargs['flow']):
            return Response(status=404)
        self.myapp.remove_table_flows(kwargs['dpid'], kwargs['flow'])
        return Response(status=200)

    @route('get_statistics', '/statistics/{dpid}/{port}/', methods=['GET'],
           requirements={'dpid': r'\d+', 'port': r'\d+'})
    def get_statistics(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(status=404)
        stats = self.myapp.db.get_statistics(kwargs['dpid'], kwargs['port'], 10)
        if stats is False:
            # The query failed, e.g. no readings have been recorded for this port yet
            return Response(status=404)
        return Response(body=stats)


class SQLDB(object):
    """Stores per-port traffic counters in MySQL, one table per switch port."""

    def __init__(self):
        self.db = MySQLdb.connect(host='127.0.0.1', user='root', passwd='root', db='SDN')
        self.c = self.db.cursor()

    def get_statistics(self, dpid, port, limit):
        # Table names cannot be query parameters; int() guarantees they are numeric
        action = ("SELECT rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error,time "
                  "FROM switch_%d_port_%d ORDER BY time DESC LIMIT %d"
                  % (int(dpid), int(port), int(limit)))
        try:
            self.c.execute(action)
        except MySQLdb.Error:
            return False
        jsonlist = []
        for (rx_pkts, rx_bytes, rx_error, tx_pkts, tx_bytes, tx_error, date) in self.c:
            jsonlist.append({'rx_pkts': rx_pkts, 'rx_bytes': rx_bytes, 'rx_error': rx_error,
                             'tx_pkts': tx_pkts, 'tx_bytes': tx_bytes, 'tx_error': tx_error,
                             'date': str(date)})
        return json.dumps(jsonlist)

    def record_flow_entries(self, dpid, match, instructions, table_id):
        try:
            # Values are passed as query parameters, so MySQLdb escapes them
            self.c.execute("INSERT INTO flow_stats (dpid,flow_match,instructions,table_id) "
                           "VALUES (%s,%s,%s,%s)", (dpid, match, instructions, table_id))
            self.db.commit()
        except MySQLdb.Error:
            pass

    def record_stats(self, dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors):
        insert = ("INSERT INTO switch_%d_port_%d "
                  "(rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error) "
                  "VALUES (%%s,%%s,%%s,%%s,%%s,%%s)" % (int(dpid), int(port)))
        values = (rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors)
        try:
            self.c.execute(insert, values)
        except MySQLdb.Error:
            # First reading for this port: create its table from the base_table template
            self.c.execute("CREATE TABLE switch_%d_port_%d LIKE base_table"
                           % (int(dpid), int(port)))
            self.c.execute(insert, values)
        self.db.commit()
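_port_stats_reply_handler does not store the cumulative OpenFlow port counters. It stores the difference between two consecutive readings, so each database row describes one polling interval (three seconds with port_refresh_rate = 3). The first reading of a port only initialises the state. A small standalone sketch of that bookkeeping follows. It assumes readings arrive as dictionaries with the same counter names, and PortDeltaTracker is a hypothetical name.

```python
COUNTERS = ("rx_packets", "rx_bytes", "rx_errors", "tx_packets", "tx_bytes", "tx_errors")


class PortDeltaTracker(object):
    """Keeps the previous reading per (dpid, port) and returns per-interval deltas."""

    def __init__(self):
        self.previous = {}

    def update(self, dpid, port_no, reading):
        key = (dpid, port_no)
        old = self.previous.get(key)
        # Always remember the latest cumulative values for the next interval
        self.previous[key] = {name: reading[name] for name in COUNTERS}
        if old is None:
            return None  # first reading: nothing to compare with yet
        return {name: reading[name] - old[name] for name in COUNTERS}
```

Like the handler above, the sketch does not account for counters that reset (for example after a switch restart). In that case the delta would come out negative.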
Application demonstration of use
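For this demonstration, the Mininet virtual network contains two switches running OpenFlow 1.3, connected to each other by a link. Each switch is also attached to a host (h1 and h2) used to run tests. The following commands start the network and set both Open vSwitch bridges to OpenFlow 1.3: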
$ sudo mn --topo linear,2 --mac --switch ovsk --controller remote
$ sudo ovs-vsctl set Bridge s1 protocols=OpenFlow13
$ sudo ovs-vsctl set Bridge s2 protocols=OpenFlow13
Topology
Client:
curl -X GET http://10.0.0.2/
Response:
{"fullflowlink": "flows/", "topologylink": "topology/"}
Client:
curl -X GET http://10.0.0.2/topology/
Response:
{"Linklink": "topology/links/", "Swithceslink": "topology/switches/"}
Client:
curl -X GET http://10.0.0.2/topology/switches/ -v
Response:
* About to connect() to 10.0.0.2 port 80 (#0)
*   Trying 10.0.0.2... connected
> GET /topology/switches/ HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: 10.0.0.2
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 401 UNAUTHORIZED
< Date: Fri, 10 Jul 2015 08:02:36 GMT
< Server: WSGIServer/0.1 Python/2.7.3
< Content-Type: */*
< WWW-Authenticate: Basic realm="Topo"
<
* Closing connection #0
Client:
curl -X GET http://10.0.0.2/topology/switches/ -u root:root
Response:
[{"flowlistlink": "/flows/1/", "ports": [{"hw_addr": "0a:b8:aa:8b:e2:4f", "statslink": "/statistics/1/1/", "name": "s1-eth1", "port_no": "00000001", "dpid": "0000000000000001"}, {"hw_addr": "02:b1:b2:3b:9a:03", "statslink": "/statistics/1/2/", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}], "linkslink": "/topology/links/1/", "dpid": "0000000000000001"},
 {"flowlistlink": "/flows/2/", "ports": [{"hw_addr": "92:0d:30:2c:40:94", "statslink": "/statistics/2/1/", "name": "s2-eth1", "port_no": "00000001", "dpid": "0000000000000002"}, {"hw_addr": "26:83:73:b5:6a:2a", "statslink": "/statistics/2/2/", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}], "linkslink": "/topology/links/2/", "dpid": "0000000000000002"},
 {"flowlistlink": "/flows/3/", "ports": [], "linkslink": "/topology/links/3/", "dpid": "0000000000000003"}]
Client:
curl -X GET http://10.0.0.2/topology/links/ -u root:root
Response:
[{"src": {"hw_addr": "26:83:73:b5:6a:2a", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}, "dst": {"hw_addr": "02:b1:b2:3b:9a:03", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}},
 {"src": {"hw_addr": "02:b1:b2:3b:9a:03", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}, "dst": {"hw_addr": "26:83:73:b5:6a:2a", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}}]
Client:
curl -X GET http://10.0.0.2/topology/links/1/ -u root:root
Response:
[{"src": {"hw_addr": "02:b1:b2:3b:9a:03", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}, "dst": {"hw_addr": "26:83:73:b5:6a:2a", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}}]
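The topology exchanges show the hypermedia style of the API. A client only needs the root URL and can then follow the link fields it receives: topologylink, then Swithceslink, then each switch's linkslink. The sketch below walks that chain. It assumes a caller-supplied fetch(url) function that returns the decoded JSON body, and follow_links is a hypothetical helper. Every link value in the demonstration resolves correctly against the root URL: bookmark links such as topology/switches/ are relative, and generated links such as /topology/links/1/ are absolute. For that reason the sketch joins each link with the root using urljoin.

```python
from urllib.parse import urljoin

ROOT = "http://10.0.0.2/"


def follow_links(fetch, root=ROOT):
    """Walk root -> topology -> switches -> per-switch links; return {dpid: links}."""
    bookmark = fetch(root)
    topology = fetch(urljoin(root, bookmark["topologylink"]))
    switches = fetch(urljoin(root, topology["Swithceslink"]))
    result = {}
    for sw in switches:
        result[sw["dpid"]] = fetch(urljoin(root, sw["linkslink"]))
    return result
```

Against the running server, fetch could be written as lambda url: requests.get(url, auth=("root", "root")).json(). The same credentials are used in the curl examples.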
Flows
Client:
curl -X GET http://10.0.0.2/flows/ -u root:root
Response:
[]
Client:
curl -X POST http://10.0.0.2/flows/1/ -d @add_flows -u root:root -H "Content-type: application/json" -v
add_flows:
[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]
Response:
* About to connect() to 10.0.0.2 port 80 (#0)
* Trying 10.0.0.2... connected
* Server auth using Basic with user 'root'
> POST /flows/1/ HTTP/1.1
> Authorization: Basic cm9vdDpyb290
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: 10.0.0.2
> Accept: */*
> Content-type: application/json
> Content-Length: 129
>
* upload completely sent off: 129 out of 129 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Fri, 10 Jul 2015 08:16:28 GMT
< Server: WSGIServer/0.1 Python/2.7.3
< Vary: Accept, WWW-Authorization
< Content-Type: application/json
<
* Closing connection #0
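The same request can be reproduced without curl. The sketch below builds, but does not send, the POST /flows/1/ request with Python's `urllib.request`; the helper name `build_add_flows_request` is hypothetical, while the URL layout and JSON fields are taken from the transcript. It also shows that the `Authorization` value in the trace is simply the Base64 encoding of `root:root`, so Basic authentication offers no confidentiality unless the connection is protected by TLS.

```python
import base64
import json
import urllib.request

# Hypothetical helper: builds (but does not send) a POST /flows/<dpid>/ request
# equivalent to the curl call above.
def build_add_flows_request(base_url, dpid, flows, user, password):
    # Compact separators reproduce the 129-byte body that curl sent.
    body = json.dumps(flows, separators=(",", ":")).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/flows/{dpid}/",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )

# Same content as the add_flows file.
flows = [
    {"d_id": 1, "priority": 1, "conditions": {"in_port": 1}, "out_port": 2},
    {"d_id": 1, "priority": 1, "conditions": {"in_port": 2}, "out_port": 1},
]
req = build_add_flows_request("http://10.0.0.2", 1, flows, "root", "root")
print(req.get_method(), req.full_url, len(req.data))
print(req.get_header("Authorization"))
# Sending it would be: urllib.request.urlopen(req)
```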
Client:
curl -X POST http://10.0.0.2/flows/2/ -d @add_flows -u root:root -H "Content-type: application/json" -v
Response:
* About to connect() to 10.0.0.2 port 80 (#0)
* Trying 10.0.0.2... connected
* Server auth using Basic with user 'root'
> POST /flows/2/ HTTP/1.1
> Authorization: Basic cm9vdDpyb290
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: 10.0.0.2
> Accept: */*
> Content-type: application/json
> Content-Length: 129
>
* upload completely sent off: 129 out of 129 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Fri, 10 Jul 2015 08:18:25 GMT
< Server: WSGIServer/0.1 Python/2.7.3
< Vary: Accept, WWW-Authorization
< Content-Type: application/json
<
* Closing connection #0
Client:
curl -X GET http://10.0.0.2/flows/1/ -u root:root
Response:
{"1": {"detailflowlink": "/flows/1/1/", "table_id": 0, "actions": [{"OFPActionOutput": 2}], "match": {"in_port": 1}, "dpid": 1}, "age": 2.7073638439178467, "2": {"detailflowlink": "/flows/1/2/", "table_id": 0, "actions": [{"OFPActionOutput": 1}], "match": {"in_port": 2}, "dpid": 1}}
Client:
curl -X GET http://10.0.0.2/flows/2/ -u root:root
Response:
{"age": 0.3208048343658447, "3": {"detailflowlink": "/flows/2/3/", "table_id": 0, "actions": [{"OFPActionOutput": 2}], "match": {"in_port": 1}, "dpid": 2}, "4": {"detailflowlink": "/flows/2/4/", "table_id": 0, "actions": [{"OFPActionOutput": 1}], "match": {"in_port": 2}, "dpid": 2}}
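In the flow-table responses above, numeric keys are flow IDs while `age` is extra metadata sitting in the same object, so a client has to tell them apart. A minimal Python sketch, assuming exactly the response shape shown here; the helper name `parse_flow_table` and the output field names are hypothetical.

```python
import json

# Hypothetical helper: extracts the flows from a GET /flows/<dpid>/ body.
# Numeric keys are flow IDs; any other key (such as "age") is skipped.
def parse_flow_table(body):
    doc = json.loads(body)
    flows = {}
    for key, value in doc.items():
        if not key.isdigit():
            continue  # metadata, e.g. "age"
        out_ports = [a["OFPActionOutput"] for a in value["actions"] if "OFPActionOutput" in a]
        flows[int(key)] = {
            "in_port": value["match"].get("in_port"),
            "out_ports": out_ports,
            "uri": value["detailflowlink"],
        }
    return flows

# Body of GET /flows/2/ as captured above.
body = (
    '{"age": 0.3208048343658447, "3": {"detailflowlink": "/flows/2/3/", "table_id": 0, '
    '"actions": [{"OFPActionOutput": 2}], "match": {"in_port": 1}, "dpid": 2}, '
    '"4": {"detailflowlink": "/flows/2/4/", "table_id": 0, '
    '"actions": [{"OFPActionOutput": 1}], "match": {"in_port": 2}, "dpid": 2}}'
)
table = parse_flow_table(body)
for flow_id, flow in sorted(table.items()):
    print(flow_id, flow)
```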
On mininet:
mininet> h1 ping -c 1 h2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=0.509 ms
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.509/0.509/0.509/0.000 ms
Client:
curl -X DELETE http://10.0.0.2/flows/2/3/ -u root:root
Response: None
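Instead of hard-coding `/flows/2/3/`, a client can follow the `detailflowlink` that GET /flows/2/ advertises for each flow, which is the hypermedia style REST encourages. A minimal Python sketch, assuming the response shape shown above; the helper name `delete_request_for` is hypothetical, and the Basic credentials used with curl (`-u root:root`) would still have to be added as an `Authorization` header before sending.

```python
import urllib.request
from urllib.parse import urljoin

# Hypothetical helper: builds a DELETE request for the flow whose match has the
# given in_port, using the URI the server itself advertised (detailflowlink).
def delete_request_for(base_url, flow_table, in_port):
    for key, flow in flow_table.items():
        if key.isdigit() and flow["match"].get("in_port") == in_port:
            return urllib.request.Request(urljoin(base_url, flow["detailflowlink"]), method="DELETE")
    raise LookupError(f"no flow matches in_port={in_port}")

# Subset of the decoded GET /flows/2/ response.
flow_table = {
    "age": 0.3208048343658447,
    "3": {"detailflowlink": "/flows/2/3/", "match": {"in_port": 1}},
    "4": {"detailflowlink": "/flows/2/4/", "match": {"in_port": 2}},
}
req = delete_request_for("http://10.0.0.2/", flow_table, in_port=1)
print(req.get_method(), req.full_url)
```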
On mininet:
mininet> h1 ping -c 1 h2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=0.509 ms
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.509/0.509/0.509/0.000 ms
Client:
curl -X POST http://10.0.0.2/flows/2/ -d @add_single -u root:root -H "Content-type: application/json"
add_single:
[{"d_id":2,"priority":1,"conditions":{"in_port":1},"out_port":2}]
Response: None
Statistics
For this section the value of the port refresh rate has been set to 1 per second so it is
Client:
curl -X GET http://10.0.0.2/statistics/1/1/ -d @add_single -u root:root -H "Content-type: application/json" &&
sleep 10 &&
curl -X GET http://10.0.0.2/statistics/1/1/ -d @add_single -u root:root -H "Content-type: application/json"
Right after executing this command, on mininet:
mininet> iperf
Response:
[{"tx_bytes": 51, "date": "2015-07-10 10:58:02", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:01", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:00", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:59", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:58", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:57", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:56", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:55", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:54", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:53", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0}]
[{"tx_bytes": 51, "date": "2015-07-10 10:58:12", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:11", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:10", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 451161, "date": "2015-07-10 10:58:09", "rx_bytes": -3995999330, "rx_pkts": 6835, "tx_error": 0, "tx_pkts": 6836, "rx_error": 0},
 {"tx_bytes": 3448023, "date": "2015-07-10 10:58:08", "rx_bytes": 2286001006, "rx_pkts": 52243, "tx_error": 0, "tx_pkts": 52243, "rx_error": 0},
 {"tx_bytes": 3937281, "date": "2015-07-10 10:58:07", "rx_bytes": -1529700220, "rx_pkts": 63174, "tx_error": 0, "tx_pkts": 59656, "rx_error": 0},
 {"tx_bytes": 3546759, "date": "2015-07-10 10:58:06", "rx_bytes": 2383469390, "rx_pkts": 54471, "tx_error": 0, "tx_pkts": 53739, "rx_error": 0},
 {"tx_bytes": 3646281, "date": "2015-07-10 10:58:05", "rx_bytes": -1819921886, "rx_pkts": 56845, "tx_error": 0, "tx_pkts": 55180, "rx_error": 0},
 {"tx_bytes": 3224035, "date": "2015-07-10 10:58:04", "rx_bytes": 2176716210, "rx_pkts": 49701, "tx_error": 0, "tx_pkts": 48849, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:03", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0}]
Graphical representation:
[Figure: "Transmitted bytes on iperf test"; y-axis: Bytes; x-axis: Seconds]
Graphical representation of the retrieved statistics
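The series behind such a plot can be rebuilt from the two statistics responses. A minimal Python sketch, assuming each sample covers one second (the refresh rate was set to one per second) and that `date` identifies a sample; the helper names are hypothetical and the sample bodies keep only the `tx_bytes` and `date` fields of four of the captured samples.

```python
import json

# Hypothetical helper: merges several /statistics/<dpid>/<port>/ bodies, drops
# duplicate samples (same "date") and returns tx_bytes in chronological order.
def tx_series(*bodies):
    samples = {}
    for body in bodies:
        for sample in json.loads(body):
            samples[sample["date"]] = sample
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain strings.
    return [samples[d]["tx_bytes"] for d in sorted(samples)]

def peak_mbit_per_s(series):
    # Assumes each value is the byte count of a one-second interval.
    return max(series) * 8 / 1e6

first = ('[{"tx_bytes": 51, "date": "2015-07-10 10:58:02"}, '
         '{"tx_bytes": 51, "date": "2015-07-10 10:58:01"}]')
second = ('[{"tx_bytes": 3448023, "date": "2015-07-10 10:58:08"}, '
          '{"tx_bytes": 3937281, "date": "2015-07-10 10:58:07"}]')
series = tx_series(first, second)
print(series)
print(round(peak_mbit_per_s(series), 2))
```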
Cache
With one Wireshark instance capturing on br0 and another capturing on br1, both filtering HTTP traffic:
Client:
curl -X GET 10.0.0.2/topology/switches/ -u root:root && sleep 10 &&
curl -X GET 10.0.0.2/topology/switches/ -u root:root && sleep 60 &&
curl -X GET 10.0.0.2/topology/switches/ -u root:root
The result is shown in Figure 2.
Wireshark capture of a cached response
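The three requests separated by 10 s and 60 s can be replayed against a minimal freshness-based (TTL) cache to see why some requests never reach the origin server. This is an illustrative sketch only: the class `TTLCache`, the fake origin and the 30 s lifetime are hypothetical, since the actual cache and its configuration are not described in this section.

```python
import time

# Minimal sketch of a freshness-based response cache.
class TTLCache:
    def __init__(self, fetch, max_age, clock=time.monotonic):
        self.fetch = fetch        # function that contacts the origin server
        self.max_age = max_age    # seconds a stored response stays fresh
        self.clock = clock
        self.store = {}           # url -> (stored_at, body)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1], "HIT"    # answered without contacting the origin
        body = self.fetch(url)
        self.store[url] = (now, body)
        return body, "MISS"

# Replay the curl sequence (t = 0 s, 10 s, 70 s) with a fake clock and origin.
origin_calls = []

def fake_origin(url):
    origin_calls.append(url)
    return "[...switch list...]"

times = iter([0, 10, 70])
cache = TTLCache(fake_origin, max_age=30, clock=lambda: next(times))  # 30 s is hypothetical
results = [cache.get("/topology/switches/")[1] for _ in range(3)]
print(results)  # ['MISS', 'HIT', 'MISS']
```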
Content Negotiation
curl -X GET 10.0.0.2/topology/switches/ -u root:root -H "Accept: application/yaml;q=0.5, application/*;q=0.8" &&
curl -X GET 10.0.0.2/topology/switches/ -u root:root -H "Accept: */*;q=0.5, application/yaml;q=0.8"
Result:
[{"flowlistlink": "/flows/1/", "ports": [{"hw_addr": "0a:b8:aa:8b:e2:4f", "statslink": "/statistics/1/1/", "name": "s1-eth1", "port_no": "00000001", "dpid": "0000000000000001"}, {"hw_addr": "02:b1:b2:3b:9a:03", "statslink": "/statistics/1/2/", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}], "linkslink": "/topology/links/1/", "dpid": "0000000000000001"},
 {"flowlistlink": "/flows/2/", "ports": [{"hw_addr": "92:0d:30:2c:40:94", "statslink": "/statistics/2/1/", "name": "s2-eth1", "port_no": "00000001", "dpid": "0000000000000002"}, {"hw_addr": "26:83:73:b5:6a:2a", "statslink": "/statistics/2/2/", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}], "linkslink": "/topology/links/2/", "dpid": "0000000000000002"}]
- dpid: '0000000000000001'
  flowlistlink: /flows/1/
  linkslink: /topology/links/1/
  ports:
  - {dpid: '0000000000000001', hw_addr: '0a:b8:aa:8b:e2:4f', name: s1-eth1, port_no: '00000001',
    statslink: /statistics/1/1/}
  - {dpid: '0000000000000001', hw_addr: '02:b1:b2:3b:9a:03', name: s1-eth2, port_no: '00000002',
    statslink: /statistics/1/2/}
- dpid: '0000000000000002'
  flowlistlink: /flows/2/
  linkslink: /topology/links/2/
  ports:
  - {dpid: '0000000000000002', hw_addr: '92:0d:30:2c:40:94', name: s2-eth1, port_no: '00000001',
    statslink: /statistics/2/1/}
  - {dpid: '0000000000000002', hw_addr: '26:83:73:b5:6a:2a', name: s2-eth2, port_no: '00000002',
    statslink: /statistics/2/2/}
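The two results follow from the quality values. In the first request `application/*` (q=0.8) outranks `application/yaml` (q=0.5), so JSON is returned; in the second, `application/yaml` (q=0.8) outranks `*/*` (q=0.5). A minimal Python sketch of that selection, in the spirit of RFC 7231 section 5.3.2, where the most specific matching media range sets each offered type's q-value. It is not the negotiation code of the framework used in this project, and all names are hypothetical.

```python
# Minimal sketch of Accept-header negotiation with quality values.
def parse_accept(header):
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((media, q))
    return ranges

def specificity(media_range, offered):
    # 2 = exact match, 1 = type/*, 0 = */*, None = no match
    rtype, _, rsub = media_range.partition("/")
    otype, _, _ = offered.partition("/")
    if media_range == offered:
        return 2
    if rtype == otype and rsub == "*":
        return 1
    if rtype == "*" and rsub == "*":
        return 0
    return None

def negotiate(header, offered):
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media in offered:  # offered types in server preference order
        matches = []
        for media_range, q in ranges:
            s = specificity(media_range, media)
            if s is not None:
                matches.append((s, q))
        if not matches:
            continue
        q = max(matches)[1]  # q-value of the most specific matching range
        if q > best_q:       # q=0 means "not acceptable"; ties keep server order
            best, best_q = media, q
    return best              # None would correspond to 406 Not Acceptable

offered = ["application/json", "application/yaml"]
print(negotiate("application/yaml;q=0.5, application/*;q=0.8", offered))
print(negotiate("*/*;q=0.5, application/yaml;q=0.8", offered))
```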