Estudi de les API RESTFUL

A Degree Thesis Submitted to the Faculty of the Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona, Universitat Politècnica de Catalunya
by Francesc Garcia Peña

In partial fulfilment of the requirements for the degree in TELEMATICS ENGINEERING

Advisor: José Luis Muñoz Tapia

Barcelona, July 2015

Abstract

To mitigate the lack of standardization of technologies such as RMI and CORBA, two technologies with different approaches were developed around the year 2000. Microsoft patented a protocol named SOAP that standardizes client-server interactions. On the other side, Dr. Roy Fielding presented his PhD thesis, where he defined an architectural style named Representational State Transfer.

This document has three main objectives: first, to study the REST architectural style from three points of view: theory, practice and implementation. Second, to generate documentation that allows other ETSETB students to learn about REST. Finally, to apply the learned concepts to Software Defined Networks applications and to generate documentation about it.

The study of REST has shown that systems that apply this architectural style acquire some desired characteristics. Most of them are not quantifiable, and those that are quantifiable depend on the implementation that the developer has written. Even so, it is the dominant technology in the distributed API field at the moment.

Resum

To mitigate the lack of standardization of technologies such as RMI and CORBA, two technologies with different approaches were developed around the year 2000. Microsoft patented a protocol named SOAP that standardizes the interactions between client and server. On the other hand, Dr. Roy Fielding presented his doctoral thesis, where he defined an architectural style named Representational State Transfer.

This document has three objectives. The first is the study of the REST architectural style from three points of view: theory, practice and implementation. The second is to generate documentation that allows other ETSETB students to learn about REST. The last is the application of the learned concepts to Software Defined Networks applications and the generation of documentation about that application.

The study of REST shows that systems that apply this architectural style gain certain positive characteristics. Most of them are not quantifiable, and the others, which are quantifiable, depend on the implementation that the programmer writes. Even so, it is the dominant technology in the field of distributed APIs.

Resumen

To mitigate the lack of standardization of technologies such as RMI and CORBA, two technologies with different approaches were developed around the year 2000. Microsoft patented a protocol named SOAP that standardizes the interactions between client and server. On the other hand, Dr. Roy Fielding presented his doctoral thesis, where he defined an architectural style named Representational State Transfer.

This document has three objectives. The first is the study of the REST architectural style from three points of view: theory, practice and implementation. The second is to generate documentation that allows other ETSETB students to learn about REST. The last is the application of the learned concepts to Software Defined Networks applications and the generation of documentation about that application.

The study of REST shows that systems that apply this architectural style gain certain positive characteristics. Most of them are not quantifiable, and the others, which are quantifiable, depend on the implementation that the programmer writes. Even so, it is the dominant technology in the field of distributed APIs.

Acknowledgements

To begin with, I would like to thank my family for their unconditional support, for their words of encouragement when things did not go well and for all the effort they have made so that I could get this far.
Next, I want to thank my partner, Judit, for always being by my side and for giving me the encouragement and the push that I sometimes need.

I would also like to thank Carlos for sharing with me the knowledge he had acquired through his work.

Finally, I want to thank my advisor, José Luis, for offering me this project, for the freedom he gave me to carry it out, for all the documentation he provided whenever I needed it and for everything he did to spare me trips once he learned that I was not from Barcelona.

Revision history and approval record

Revision    Date          Purpose
0           26/06/2015    Document creation
1           03/07/2015    Document revision
2           05/07/2015    Document revision
3           08/07/2015    Document revision
4           10/07/2015    Document final revision and approval

DOCUMENT DISTRIBUTION LIST

Name                       e-mail
Francesc Garcia Peña       [email protected]
José Luis Muñoz Tapia      [email protected]

                Written by:               Reviewed and approved by:
Date            26/06/2015                10/07/2015
Name            Francesc Garcia Peña      José Luis Muñoz Tapia
Position        Project Author            Project Supervisor

Table of contents

Abstract
Resum
Resumen
Acknowledgements
Revision history and approval record
Table of contents
List of Figures
List of Tables
1. Introduction
    1.1. Project work plan
        1.1.1. Tasks
        1.1.2. Milestones
        1.1.3. Gantt diagram
2. State of the art of the technology used or applied in this thesis
    2.1. HTTP
        2.1.1. cURL
        2.1.2. Requests
    2.2. XML
    2.3. JSON
    2.4. DJANGO
    2.5. RYU
    2.6. LXC
    2.7. MININET
    2.8. OPENFLOW
3. Methodology / project development
    3.1. Rest study
        3.1.1. Client-Server
        3.1.2. Stateless
        3.1.3. Cache
        3.1.4. Uniform Interface
        3.1.5. Layered System
        3.1.6. Code on demand
        3.1.7. Implementational approach
    3.2. Documentation Generation
    3.3. Rest SDN Application
        3.3.1. Define Functionalities
        3.3.2. Define resources
        3.3.3. Define Resource representation
        3.3.4. HATEOAS
        3.3.5. Cache
        3.3.6. Implementing
4. Results
5. Budget
6. Conclusions and future development
Bibliography
Glossary
Appendices
    7. RESTful APIs book
    8. SDN REST API
    9. API Demonstration

List of Figures

Gantt diagram
Finite state machine representation of the SDN API
Virtualization topology

Appendix 1 list of figures:
1.1 Client-Server architecture
2.1 HTTP client/server
2.2 How HTTP cookies work
2.3 HTTP Proxies
2.4 How CGIs work in HTTP
2.5 An HTML form viewed from a browser
2.6 Multiple Persistent Connections with an HTTP Server
3.1 Finite state machine representation
3.2 Layered system example
3.3 Flight finite state machine representation
4.1 Cache model
6.1 Django's request-response procedure
8.1 App and Controller interconnection
8.2 Monitoring system topology procedure
8.3 Authentication algorithm procedure
8.4 Content Negotiation algorithm procedure
8.5 Topology generated with lxc containers procedure
8.6 Wireshark captures from a POST request

Appendix 3 list of figures:
1 Graphical representation of the statistics retrieved
2 Wireshark capture of a cached response

List of Tables:

1 List of resources
2 Project Budget

Annex 1 list of tables:
2.1 HTTP Status Codes
2.2 Common mime types
2.3 Cache-Control header directives
2.4 Commands for WWW
6.1 Django's important field list
6.2 Django's important field options list
6.3 Django's important QuerySet methods
6.4 Django's HttpRequest object's attributes
8.1 List of URIs

1. Introduction

In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications. An API expresses a software component in terms of its operations, inputs, outputs, and underlying types. An API defines functionalities that are independent of their respective implementations, which allows definitions and implementations to vary without compromising each other. A good API makes it easier to develop a program by providing all the building blocks, which a programmer then puts together.

Some APIs are offered in a distributed system context such as a client-server topology, where the client and the server communicate through networking protocols. As a first approach to developing APIs usable in a distributed system scenario, many platforms developed their own technologies, such as CORBA, Java Remote Method Invocation (RMI), DCOM, or .NET Remoting. Some difficulties appeared here; the most important one was the incompatibility between technologies: the client and the server had to use the same technology. Furthermore, some vendor implementations and toolkits had trouble talking to each other due to the lack of standardization.

Web services were created to address this lack of standardization. They comprise a set of technologies standardized mainly by two organisations: W3C and OASIS. The W3C defines web services as follows: “A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL).
Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards”.

SOAP (Simple Object Access Protocol) is a protocol used to exchange structured information. It was developed by Microsoft, which was the main company interested in web services development. Although its name says that it is an 'Object Access' protocol, this is only a reminiscence of the earlier technologies that were used to access objects (DCOM). At present, SOAP is used to consume services rather than to access objects.

However, at the same time that web services were being developed, Dr. Roy Fielding wrote his thesis Architectural Styles and the Design of Network-based Software Architectures, where he defines a new architectural style called REST (Representational State Transfer).

In this thesis I will analyse the REST architectural style in depth from three points of view:
• The theoretical one: I will study Dr. Fielding's dissertation.
• The practical one: I will study how to apply the REST style to API design.
• The implementational one: I will study technologies that can be used in the design of APIs that follow the REST style (from protocols and standards to programming frameworks and packages).

Once I have studied these three points of view, I will develop a REST API for an SDN application, using Ryu as the framework to create the application. I will study the Ryu framework from the point of view of REST API development, and I will develop a simple API that offers monitoring functionalities such as traffic statistics retrieval.

Finally, as a derivative of this study, a book oriented towards educational purposes for ETSETB will be created. The content of this book will take as practical an approach to the subjects as possible.
It will be filled with examples to give a better understanding of the REST architectural style and its application to Software Defined Networks. It is this project's main result and can be found in the appendices on section ????.

Carrying out this project has required many skills, such as the ability to learn complex concepts and tools autonomously (HTTP, JSON, XML, Django, Ryu, LaTeX), the ability to express learnt concepts in a schematic and clear way, and the creativity and knowledge to design examples that demonstrate the main characteristics of the studied technologies.

This whole project has been developed using open source tools on a laptop computer running Linux Mint 17.1 “Rebecca”. It could have been carried out on any mid-range computer with 4 GB of RAM or more.

1.1. Project work plan

1.1.1. Tasks:

WP1: Familiarize with technologies
    Short description: Installation of all the software needed (Python, LaTeX editor, Apache Subversion, etc.) and getting used to working with it.
    Planned start date: 23/02/2015
    Planned end date: 08/03/2015
    Internal task T1: Software installation
    Internal task T2: Develop tests to get used to the software

WP2: Python Review
    Short description: Familiarize with the syntax and particularities of the Python language.
    Planned start date: 09/03/2015
    Planned end date: 15/03/2015
    Internal task T1: Study Python
    Internal task T2: Develop some test projects

WP3: Django Review
    Short description: Familiarize with the syntax and particularities of the Django framework.
    Planned start date: 16/03/2015
    Planned end date: 22/03/2015
    Internal task T1: Study Django's particularities
    Internal task T2: Develop some test projects

WP4: Study RESTful API
    Short description: In-depth study of RESTful APIs.
    Planned start date: 23/03/2015
    Planned end date: 26/04/2015

WP5: Compare other API architectures
    Short description: Compare RESTful with other viable API architectures.
    Planned start date: 27/04/2015
    Planned end date: 10/05/2015
    Internal task T1: Study other API architectures (SOAP, XML-RPC)
    Internal task T2: Compare other API architectures with RESTful

WP6: Examples Developing
    Short description: Generate useful examples that will be included in the documentation.
    Planned start date: 16/03/2015
    Planned end date: 31/05/2015
    Internal task T1: Study Django's particularities
    Internal task T2: Develop some test projects

WP7: Generate Documentation
    Short description: Generate all the documentation about RESTful APIs and Django.
    Planned start date: 6/04/2015
    Planned end date: 21/06/2015

WP8: Develop a RESTful API
    Short description: Implement a REST API that offers monitoring functionalities for an SDN application.
    Planned start date: 11/05/2015
    Planned end date: 21/06/2015

WP9: Generate documentation about the SDN REST API
    Short description: Generate the documentation necessary for a complete understanding of the principles and technologies used in the creation of the SDN REST API.
    Planned start date: 11/05/2015
    Planned end date: 21/06/2015

1.1.2. Milestones

WP#     Short title                            Date (week)
3       Python, Django                         4
4, 5    RESTful study                          11
6, 7    Generate documentation and examples    17
8, 9    RESTful API                            17

1.1.3. Gantt diagram

Figure 1: Gantt diagram. Tasks: Familiarize with technologies, Python review, Django review, Study RESTful API, Compare with other API architectures, Examples developing, Generate documentation, Developing a RESTful API, Generate documentation about the API. Time axis: weeks 1 to 19, from February to July.

2. State of the art of the technology used or applied in this thesis:

This project starts as a part of a bigger project whose objective is studying SDN networks. Bearing in mind that SDN is a vast subject and that it would be impossible for a single student to study it in depth in the time given for this project, the supervisor of the project narrowed it down to some independent themes that can be studied individually. This project, as stated before, focuses on the study of APIs that allow communication between the Application layer and the Control layer. More concretely, it focuses on REST APIs, since they are very popular at the moment (many important enterprises such as Google or Twitter use them).

The REST architectural style does not specify on top of which protocols applications should be built, which format the communication messages should follow or which technologies should be used on the server side. In practice, however, applications that follow the REST constraints are built on top of HTTP and use one of the common internet formats: XML or JSON. There is much more diversity in server-side technologies. In this thesis the Django framework has been used to implement the server.

To create the SDN API, the communication protocol used between the switches and the controller will be OpenFlow 1.3 and the framework used to develop the application will be Ryu.
Finally, two different virtualization technologies have been used in this project: Linux Containers and Mininet.

Since the scope of this project is to study these technologies, apply them and generate documentation on how to use them correctly, this section only gives a brief explanation of the technologies used.

2.1. HTTP

HTTP (Hypertext Transfer Protocol) is an application protocol that runs on top of TCP. Its first version (HTTP/1.0) was specified in 1996 (RFC 1945). After that, a new version (HTTP/1.1) was released in 1999 (RFC 2616) and it has remained the standard version until now. It has received some minor updates to add more functionalities. The protocol received a big update in June 2014 but it still remains on version 1.1. It is now defined in multiple RFCs:

RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
RFC 7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
RFC 7233: Hypertext Transfer Protocol (HTTP/1.1): Range Requests
RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching
RFC 7235: Hypertext Transfer Protocol (HTTP/1.1): Authentication

HTTP defines standardized semantics at the application level. It defines different methods, which are meant to express different kinds of actions. There are 8 of them: GET, POST, PUT, DELETE, OPTIONS, HEAD, TRACE and CONNECT. It also defines multiple headers, which add information about the data that a message carries. For example, the format of the information contained in a message can be specified with the Content-Type header. Finally, it defines status codes, which indicate standard responses from the server such as “Everything went ok” (200) or “Not found” (404).

Another key aspect of HTTP is that it is a stateless protocol: the response to a message does not depend on previous requests.
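The three elements described above (methods, headers and status codes) and the stateless behaviour can be observed with a minimal sketch that uses only the Python standard library. It is an illustration under stated assumptions, not part of the thesis code: the handler name DemoHandler, the /hello path and the helper functions are hypothetical, and the server is started on an ephemeral local port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy handler that knows a single, hypothetical path: /hello."""

    def do_GET(self):
        if self.path == "/hello":
            body = b"hello"
            self.send_response(200)  # status code: everything went ok
        else:
            body = b"not found"
            self.send_response(404)  # status code: not found
        # Headers describe the data carried by the message.
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch(port, path):
    """Send one GET request and return (status, Content-Type, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read()
    finally:
        conn.close()


def run_demo():
    """Start the toy server, issue three requests and return the results."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return [fetch(port, "/hello"), fetch(port, "/hello"), fetch(port, "/missing")]
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() returns one tuple per request. The two requests for /hello produce the same response because the server keeps no memory of earlier requests, while the request for an unknown path is answered with the 404 status code.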
HTTP will be the protocol used throughout this thesis because its characteristics match the REST constraints perfectly.

2.1.1. cURL

curl is an open source command line tool and library for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, HTTP/2, cookies, user+password authentication (Basic, Plain, Digest, CRAM-MD5, NTLM, Negotiate and Kerberos), file transfer resume, proxy tunnelling and more.

2.1.2. Requests

Requests is a Python HTTP library that will be used in this project to perform HTTP requests against a server. It is very simple to use, which fits this document's didactic approach well.

2.2. XML

XML (eXtensible Markup Language) defines a set of rules for encoding documents in a readable form. An XML document is a "text" file, i.e. a string of characters encoded with UTF-8 or with an ISO standard like ISO-8859-1 (Latin1).

The characters which make up an XML document are divided into markup and content. All strings which constitute markup either begin with the character "<" and end with a ">", or begin with the character "&" and end with a ";". Strings which are not markup are content.

In particular, a tag is a markup construct that begins with "<" and ends with ">". Tags come in three flavors:
start-tags, for example <section>
end-tags, for example </section>
empty-element tags, for example <line-break />

XML is widely used on the web and it will be used to format the data in HTTP messages.

2.3. JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition, December 1999.
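To make the two formats concrete, the following sketch serializes the same record as JSON and as XML using only the Python standard library. The record, its field names and the root tag "stats" are hypothetical values chosen for illustration; they are not taken from the API developed in this thesis.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for illustration.
flow = {"switch": "s1", "port": 2, "rx_packets": 120}


def to_json(record):
    """Serialize a flat dictionary as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_xml(record, root_tag="stats"):
    """Serialize a flat dictionary as XML: one start-tag/end-tag pair per field."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)  # the text between the tags is the content
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML produced by to_xml back into a dictionary of strings."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}
```

A round trip through JSON returns the original values with their types, whereas the values parsed back from this XML encoding are all strings: XML content is plain text, so numeric types have to be restored by the application or described by a schema.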
JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

JSON is widely used to format data in HTTP messages in APIs because it is easy to parse, even in the cases where it is not easy to read.

2.4. DJANGO

Django is a high-level Python Web framework built to make the task of developing web applications much easier. Its power comes from the ability to separate application development from low-level hassles such as the database connection. Another important aspect of Django is the modularity it brings: a project is a set of assembled applications.

Even though Django is a web framework and is not meant to be used to build APIs, it has many useful properties:
• Django models use the DAO design pattern. This pattern allows the developer to avoid coupling between the application and the database connection.
• Django middleware can be useful to design multi-layered systems. Because of the order in which the middleware components are executed, they behave as if each of them were a distinct layer of the system. This allows the developer to design the whole system before deploying it, avoiding the need to use multiple machines or virtualization tools.
• Django pattern-based URIs. The Django framework accepts not only exact URIs but also pattern-based URIs. This eases the task of recognising URIs that vary depending on the resource requested.

2.5. RYU

Ryu is a component-based framework for Software-Defined Networking applications. It provides software components with a well-defined API that make it easy to create network management and control applications. Ryu supports various protocols for managing network devices, such as OpenFlow, Netconf, OF-config, etc. Ryu supports OpenFlow 1.0, 1.2, 1.3 and 1.4.
It is fully developed in Python and all of the code is freely available under the Apache 2.0 license.

2.6. LXC

With several functionalities added to the Linux kernel, it has become very easy to isolate Linux processes into their own little environments. Isolation tools make it possible to build containers, which are a lightweight virtualization technology. While hardware virtualization and para-virtualization provide virtual machines, containers are an operating system-level virtualization method for running multiple isolated Linux systems (containers) on a single host. With containers, a single Linux kernel is shared between the host and the containers. Containers can achieve higher densities of isolated environments than virtual machines.

2.7. MININET

Mininet is a network emulator. It runs a collection of end-hosts, switches, routers, and links on a single Linux kernel. It uses lightweight virtualization to make a single system look like a complete network, running the same kernel, system, and user code.

2.8. OPENFLOW

OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based).

3. Methodology / project development:

This project, as stated before, has been divided into two phases with two different methodologies. In the first part of the project, the main objective was to study the REST architectural style and to generate documentation that eases the task for future students who want to learn about it, given the difficulties found along the way.

3.1. Rest study

This first part had three views: theoretical, practical and implementational.

In the theoretical part, the task was to understand every detail of the REST architectural style. The main source of information was Dr. Fielding's dissertation, but it was a rough read. It defines the style in a very abstract way and it does not explain how to apply its concepts to actual technologies (or at least to those that existed when the thesis was published). To solve this problem I started gathering information about REST from other sources such as books and online articles. Those books and articles focus on a single REST concept: resources and their representations. It is the most important one, the one that gives REST its identity, but it is not the only one. The other properties of REST were not explained in many publications.

Once I was clear about the main concepts required by REST applications, the next step was to learn how to apply these concepts to the development of an API.

Finally, the last part was to find tools and protocols that could be used in the creation of APIs that follow the REST constraints.

In the next part I will list the constraints defined in Dr. Fielding's dissertation together with a practical approach to them, naming (if any) the technologies that can be used to apply those constraints.

3.1.1. Client-Server

This constraint requires separation of concerns between the client and the server: an intermediate block that separates interfaces from data. The best technology to apply it is the URI. URIs identify resources, which do not need to be files or directories on the server. It is the server's task to interpret each URI and determine the resource it points to.

3.1.2. Stateless

The REST architectural style is defined on top of stateless communications. To meet this requirement, two conditions must be fulfilled: the first is that the communication between the client and the server must be stateless, and the second is that the API itself must be stateless too. The application protocol on which we will build our applications (HTTP) is stateless by nature.
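The second condition, a stateless API, can be illustrated with a short Python sketch. The item list, the parameter names page and size, and the handler name list_items are hypothetical; the point is only that every request carries all the information needed to answer it, so the server does not have to remember which page a client read last.

```python
# Hypothetical data set served by the sketch.
ITEMS = ["flow-%d" % i for i in range(1, 8)]


def list_items(request):
    """Stateless handler: the result depends only on the request itself.

    `request` is a plain dictionary standing in for the query parameters
    of an HTTP request (an assumption made to keep the sketch small).
    """
    page = int(request.get("page", 1))
    size = int(request.get("size", 3))
    start = (page - 1) * size
    response = {"items": ITEMS[start:start + size]}
    if start + size < len(ITEMS):
        # The server tells the client how to ask for the following page
        # instead of storing a per-client cursor.
        response["next"] = {"page": page + 1, "size": size}
    return response
```

Because no per-client state is kept, list_items({"page": 2, "size": 3}) returns the same three items no matter which requests came before it or which server instance handles it.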
The second requirement has to be kept in mind when designing APIs, but there is no technology or tool behind it.

3.1.3. Cache
The responses to requests must be labelled as cacheable or non-cacheable. Behind this constraint there are two questions: how to label the responses, and which ones are cacheable and which ones are not? The first question is easy to answer: with HTTP headers. There is a whole RFC dedicated to HTTP/1.1 cache handling (RFC 7234). As for the second one, a first approach is to say that responses with a high variation rate should not be cached and those with a low variation rate should be cached. For a more elaborate answer, you can look for a compromise between the probability that the data is no longer valid, given the time since it was generated, and the improvement that you gain by caching a given resource. When designing your API you should also keep in mind that a client may end up working with erroneous (outdated) data.

3.1.4. Uniform Interface
This is the most iconic and reviewed part of the REST architectural style. It states that REST APIs have to apply the principle of 'generality to the component interface', meaning that everything in REST (any resource) has to be accessed through the same interface. It defines four sub-constraints:
• Identification of resources: every resource must have a unique identifier.
• Resource representation: a resource is never transferred to the client, only a representation of it.
• Self-descriptive messages: messages must contain all the relevant information for a server to process them.
• Hypermedia as the engine of application state: the server guides the client through the application state by sending 'paths' in the form of URIs.

The first sub-constraint is easy to accomplish: you only need to identify resources with unique identifiers. The URIs built for a resource have to contain that identifier so that the server can differentiate between resources.
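
As a minimal sketch of resource identification, the snippet below builds and resolves URIs that embed the identifiers of a resource. It uses only the standard `re` module rather than a real URL resolver. The `/statistics/<switch>/<port>/` shape mirrors the statistics links that appear in this document's representations. The `/switches/<dpid>/` template and the names `ROUTES`, `build_statistics_uri` and `resolve` are assumptions made for illustration.

```python
import re

# Hypothetical URI templates. Only "/statistics/<switch>/<port>/" mirrors a
# link shown in this document; "/switches/<dpid>/" is an assumed example.
ROUTES = [
    ("port_statistics",
     re.compile(r"^/statistics/(?P<dpid>\d+)/(?P<port_no>\d+)/$")),
    ("switch", re.compile(r"^/switches/(?P<dpid>\d+)/$")),
]


def build_statistics_uri(dpid, port_no):
    """Build the URI of one port-statistics resource from its identifiers."""
    return "/statistics/{}/{}/".format(dpid, port_no)


def resolve(uri):
    """Return (resource_name, identifiers) for a URI, or None if unknown."""
    for name, pattern in ROUTES:
        match = pattern.match(uri)
        if match:
            identifiers = {key: int(value)
                           for key, value in match.groupdict().items()}
            return name, identifiers
    return None
```

Because the identifiers are part of the URI, the server can tell two port-statistics resources apart without keeping any per-client state.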
The second one is not that easy to understand, but it is easy to apply. From the client side it must be impossible to tell a resource that returns a file from one that returns the result of executing an algorithm. The server has to define a representation for every resource, and that representation is what it transmits to the client. The metadata of the representation is carried in the HTTP headers of the response, and it always refers to the data carried, not to the way the data was generated.

To generate self-descriptive messages we have to use HTTP headers once more. HTTP headers add the metadata that the server needs to understand every request. Also, since the connections are stateless, the meaning of a message cannot vary because of previous interactions between the client and the server.

Finally, the HATEOAS requirement is the most frequently forgotten one on this list. Since there is no stored state, the server has to tell the client where it can go inside the API. You can look at your API as if it were a finite state machine. A client that accesses the API's bookmark is in the initial state, and you can guide it through the application by providing links to the next possible resources in the application flow, depending on the last resource that it accessed.

3.1.5. Layered System
The basic configuration of a central node acting as a server is not scalable. Instead, multiple layers have to be deployed to achieve scalability. To exemplify how a layered system is built, the last part of the project (the link with SDN) is developed by deploying a multi-layered system.

3.1.6. Code on demand
It is an optional constraint stating that the server can extend the client's functionality by sending scripts that are executed on the client side.
Since it is an optional constraint, and because of the risks of executing code received from a remote location (a man-in-the-middle attack, for example), I have preferred not to detail it. I have only listed some widely used scripting languages and stated that they can be potentially dangerous.

3.2. Theoretical benefits from using REST
Studying the REST architectural style has shown that systems that apply it gain portability, scalability, visibility of interactions, reliability, efficiency, better user-perceived performance and simplicity of the system architecture. However, it is hard to quantify these properties. Some of them do not have a measurable magnitude, and others that can be measured depend highly on the implementation that the developer has done.

3.2.1. Implementational approach
Once the REST constraints were clear, the next step was to study how to apply them to APIs built with the Django framework, using HTTP as the application protocol. There are basically three implementation challenges:
• How to correctly manage the HTTP headers, both those included in the requests and those included in the responses. Django defines its own request and response objects. They encapsulate a complete HTTP message: its content, its headers and other HTTP fields such as the HTTP method and the status code.
• How to address the resources correctly. The Django framework offers URL resolvers. In Django, URIs are parsed and, depending on the result of the parsing, the requests are sent to different 'views', which are callable objects that return an arbitrary response that is not linked to the request's URI unless you want it to be.
• How to simulate a layered system with Django. Django contains a built-in cache middleware that implements an application-level data cache. Given their execution order, middleware components act like a layered system.

3.3. Documentation Generation
Once the REST constraints were clear from the three points of view, I started generating the documentation. It was written in LaTeX. LaTeX is a document preparation system that takes plain text written in its own markup language as input and renders it into high-quality typeset text. It is widely used for writing publications and scientific documents. To communicate with my advisor and discuss the documentation efficiently, we used Subversion repositories. Subversion (svn) is an open source version control system that combines well with LaTeX because LaTeX source files are text based and svn does not update whole documents, only the lines affected by the changes. The combination of both technologies allows several people to make small changes to a document simultaneously.

The documentation generated about REST covers four topics:
• A chapter that fully develops the constraints explained above, using numerous examples and written in an easy and intuitive way, together with a simple methodology that can be followed in the design process of a REST API.
• A chapter where REST is compared to web services.
• A whole chapter that describes how to use the Django framework to build APIs that follow the REST architectural style.
• Finally, two practice chapters: one designed for the student to practise API design and another dedicated to the use of Django as the implementation tool. Both chapters include the solutions.

3.4. REST SDN Application
In this second part of the project, the main idea is to apply the knowledge acquired in the first one to an SDN environment. More precisely, I have developed a REST API using the SDN framework Ryu as the controller, working with the OpenFlow 1.3 protocol to communicate with switches that are created with the Mininet virtualization tool and use Open vSwitch. The development follows the design methodology explained in the REST documentation:

3.4.1. Define Functionalities:
• The API has to be able to show the network topology.
• The API must show the network performance statistics.
• The API must show the routes installed in the switches and allow adding and deleting them.

3.4.2. Define resources:
• Bookmark
• Topology Bookmark
• Switch List
• Switch's Flow List
• Link List
• Individual Flow
• Flow List
• Port statistics
Table 1: List of resources

3.4.3. Define Resource representation:
Switch List representation:

    [
      {
        "dpid": "0000000000000001",
        "ports": [
          {
            "dpid": "0000000000000001",
            "hw_addr": "16:7a:df:c9:02:e5",
            "name": "s1-eth1",
            "port_no": "00000001",
            "statslink": "/statistics/1/1/"
          },
          {
            "dpid": "0000000000000001",
            "hw_addr": "ba:98:17:d4:27:f4",
            "name": "s1-eth2",
            "port_no": "00000002",
            "statslink": "/statistics/1/2/"
          }
        ]
      }
    ]

You can find the other representations in annex 3.

3.4.4. HATEOAS
Figure 2: Finite State Machine representation of the REST SDN API

3.4.5. Cache
For this application I have separated the resources into four categories: low variation rate, medium variation rate, high variation rate and statistics. There are two resources with a low variation rate: the bookmark and the topology bookmark. These resources have a validity time of 5 minutes. Three resources have a medium variation rate: switches, links and switch's links. They must be updated every minute. The three resources related to flows are updated every 30 seconds. Finally, the statistics data varies exactly every three seconds, which means that the validity time of this resource is 3 seconds.

3.4.6. Implementing
This whole application works in a virtualised environment with three main virtual machines: a client, an intermediary server and a Ryu server.
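
As a side note on the validity times of section 3.4.5, the following sketch shows one way they could be expressed in code. It is not the code of the application described here: the category keys, `cache_control` and `ValidityCache` are hypothetical names, and the store is a plain in-memory dictionary. The only HTTP element it relies on is the standard `max-age` directive of the `Cache-Control` header (RFC 7234).

```python
import time

# Validity times in seconds, taken from section 3.4.5. The category keys
# are hypothetical names chosen for this sketch.
VALIDITY_SECONDS = {
    "low_variation": 300,    # bookmark, topology bookmark
    "medium_variation": 60,  # switches, links, switch's links
    "high_variation": 30,    # flow-related resources
    "statistics": 3,         # port statistics
}


def cache_control(category):
    """Return a Cache-Control header value for a resource category."""
    return "max-age={}".format(VALIDITY_SECONDS[category])


class ValidityCache:
    """In-memory store that drops a response once its validity time passes."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behaviour can be tested
        self._entries = {}    # uri -> (expiry_time, response)

    def put(self, uri, response, category):
        """Store a response together with the moment it stops being valid."""
        expiry = self._clock() + VALIDITY_SECONDS[category]
        self._entries[uri] = (expiry, response)

    def get(self, uri):
        """Return the stored response, or None if missing or expired."""
        entry = self._entries.get(uri)
        if entry is None:
            return None
        expiry, response = entry
        if self._clock() >= expiry:
            del self._entries[uri]
            return None
        return response
```

A caller would typically try `get` first and only contact the backend when it returns `None`, storing the fresh response with `put`.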
The objective is that the client communicates with the Django intermediary server and the intermediary communicates with the Ryu server:

Figure 3: Virtualization topology

This scheme lowers the load on the Ryu server, since the Django server can perform many tasks that would otherwise have to be done in the Ryu server. The Django server receives the request and checks the authentication credentials and the content negotiation headers. It also checks whether the request URI corresponds to one of the resources and whether the data that the client sends matches one of the formats defined in 3.3.3. Finally, it also acts as a cache: when a request is processed and a 2XX code is returned to the client, the server stores the response in memory. If a new request for the same resource arrives while the stored data is still valid, it returns the stored response. Otherwise, it connects to the Ryu server and retrieves new information.

The Ryu virtual machine has two tasks: the first is to generate the virtual network that is going to be monitored and the second is to run a controller application. The controller application is split into two parts: one is designed to send and receive the OpenFlow messages and the other to handle the requests received from the Django server.

The application code can be found in annex 2, and the procedures applied to create the network with virtualization technologies can be found in annex 1 (section 8.4.3), together with the LXC configuration files and the software installed on each machine so that the whole system can work. A deeper explanation of the SDN API implementation can be found in annex 1 and an example of use of the API can be found in annex 3.

4. Results
The main result of this project is the documentation generated, which includes an introduction, the core explanation of the REST architectural style, practices on REST API development, a comparison between REST and other technologies applied on the web, a review of the Django framework (the parts involved in the API implementation), practices on the Django framework and, finally, a review of the functionalities of the Ryu framework that are needed to develop REST APIs. All of it is filled with representative examples to help the student understand the topic better.

The study of REST has shown that its application provides some measurable and some non-measurable benefits compared with other technologies (see Annex 1). Since it is an architectural style that the developer uses as a guide, the performance improvements of applying REST depend on the implementation that the developer does.

Another result is the application of Django to the development of REST APIs. Django has qualities that are desirable in a back-end server technology, such as separation of concerns through the MVC pattern, DAO models and middleware, the capacity to adapt to changes through URL resolvers, a big community that has developed many pluggable components and several sources of information about the framework.

When Django is used as an intermediary server, even though it keeps the qualities detailed above, it is not an optimal tool. Ideally an intermediary server would receive requests, perform some tasks on them and then forward them to the core server, and the same would happen with the responses. Since Django cannot forward requests and responses, it has to receive each request and then establish a new connection to the core server. This causes the server to block the client connection until the core server has processed the new request and generated a new response.

Finally, regarding the application of REST APIs to SDN, it is safe to say that REST APIs have qualities that fit the SDN context well.
REST APIs are scalable, prepared for change, robust against partial failures and versatile regarding data formats and encodings. Again, the performance benefits of applying REST APIs instead of other types of technologies will depend on the implementation that the developer does.

5. Budget
This project has been carried out using open source tools on a laptop with an estimated cost of 900€, and a Microsoft Office license has been needed to write this document, adding 269€ to the total amount. Because of the speed at which technology evolves nowadays, an amortization period of 3 years is considered before the hardware and software used become obsolete. We will consider a residual value of 100€ for the computer. Four months of full-time work by a junior engineer are considered, with a wage of 22000 €/year, that is, 1833,33 €/month after taxes.

Concept                  Price/month   Price/project
Desktop Computer         22,20€        88,80€
Microsoft Office 2013    --            269€
Salary                   1833,33€      7333,33€
Total Price                            7691,13€
Table 2: Project Budget

6. Conclusions and future development:
REST is a widely accepted architectural style that many big enterprises have adopted (Twitter, LinkedIn, Facebook, Amazon Product Advertising, etc.). It is widely used in APIs that have to be consumed massively because of the freedom that it brings. On the other hand, in enterprise environments SOAP is much more widely used. SOAP covers some aspects that REST does not, such as web services policies, which, while important for inter-enterprise services, are not required for other kinds of applications. Before building an API, the developer will have to decide which technology is better suited to their needs.

Regarding the SDN REST API, it is in an initial phase. This document has a demonstrative intention and there are still many details to describe and implement.
For instance, the API developed runs on top of a dummy application that is only used to receive statistics messages and to send flow modifications and statistics requests; ideally, even if an SDN REST API contains a monitoring part, it should be used to control the controller's behaviour.

Bibliography:
[1] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine, 2000.
[2] Paul Sobocinski. Hypermedia apis: The benefits of hateoas, 2014. [Online; http://www.programmableweb.com/news/hypermedia-apis-benefits-hateoas/howto/2014/02/27].
[3] Roy T. Fielding. Rest apis must be hypertext-driven, 2008. [Online; http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven].
[4] Leonard Richardson and Sam Ruby. RESTful Web Services. O'Reilly Media, 2007.
[5] Joshua Thijssen. The restful cookbook. [Online; http://restcookbook.com/].
[6] Jim Webber, Savas Parastatidis, and Ian Robinson. How to get a cup of coffee, 2008. [Online; http://www.infoq.com/articles/webber-rest-workflow].
[7] Draft - make readable uris, 2004. [Online; http://www.w3.org/QA/2004/08/readableuri].
[8] Mike Amundsen. Roy fielding on versioning, hypermedia, and rest, 2014. [Online; http://www.infoq.com/articles/roy-fielding-on-versioning].
[9] T. Berners-Lee, L. Masinter, and M. McCahill. Uniform Resource Locators (URL). RFC 1738 (Proposed Standard), December 1994. Obsoleted by RFCs 4248, 4266, updated by RFCs 1808, 2368, 2396, 3986, 6196, 6270.
[10] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616 (Draft Standard), June 1999. Obsoleted by RFCs 7230, 7231, 7232, 7233, 7234, 7235, updated by RFCs 2817, 5785, 6266, 6585.
[11] A. Barth. HTTP State Management Mechanism. RFC 6265 (Proposed Standard), April 2011.
[12] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. RFC 7230 (Proposed Standard), June 2014.
[13] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. RFC 7231 (Proposed Standard), June 2014.
[14] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests. RFC 7232 (Proposed Standard), June 2014.
[15] R. Fielding, Y. Lafon, and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Range Requests. RFC 7233 (Proposed Standard), June 2014.
[16] R. Fielding, M. Nottingham, and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Caching. RFC 7234 (Proposed Standard), June 2014.
[17] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Authentication. RFC 7235 (Proposed Standard), June 2014.
[18] Introducing json. [Online; http://www.json.org].
[19] Use http basic authentification to login into django, 2014. [Online; http://ponytech.net/blog/use-http-basic-authentification-login].
[20] Jacob K. Moss and Adrian Holovaty. The Definitive Guide to Django: Web Development Done Right. Apress, 2006.
[21] Django documentation. [Online; https://docs.djangoproject.com/en/1.8/].
[22] Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web services description language (wsdl) 1.1, 2001. [Online; http://www.w3.org/TR/wsdl].
[23] Nilo Mitra and Yves Lafon. Soap version 1.2 part 0: Primer (second edition), 2007. [Online; www.w3.org/TR/soap12-part0/].
[24] Martin Gudgin, Marc Hadley, Noah Mendelsohn, Jean-Jacques Moreau, Henrik Frystyk Nielsen, Anish Karmarkar, and Yves Lafon. Soap version 1.2 part 1: Messaging framework (second edition), 2007. [Online; www.w3.org/TR/soap12-part1/].
[25] Hugo Haas and Allen Brown. Web services glossary, April 2004. [Online; http://www.w3.org/TR/ws-gloss/].
[26] Don Box. A brief history of soap, 2001. [Online; http://www.xml.com/pub/a/ws/2001/04/04/soap.html].
[27] Some thoughts for the enterprise embracing web apis, 2012. [Online; http://apievangelist.com/2012/12/09/some-thoughts-for-the-enterprise-embracing-webapis/].
[28] Douglas C. Schmidt. Overview of remote procedure calls (rpc). [Online; http://www.cs.wustl.edu/~schmidt/PDF/rpc4.pdf].
[29] From edi to xml and uddi: A brief history of web services, 2001. [Online; http://www.informationweek.com/from-edi-to-xml-and-uddi-a-brief-history-of-webservices/d/d-id/1012008].
[30] R. Srinivasan. RPC: Remote Procedure Call Protocol Specification Version 2. RFC 1831 (Proposed Standard), August 1995. Obsoleted by RFC 5531.
[31] Topology discovery with ryu, 2014. [Online; http://sdn-lab.com/2014/12/31/topologydiscovery-with-ryu/].
[32] Setting up openvswitch 2.0 + mininet 2.1 + ubuntu 13.04, 2013. [Online; http://sdnlab.com/2013/11/14/setting-up-openvswitch-2-0-mininet-2-1/].
[33] Robert Daigneau. Service design patterns: fundamental design solutions for SOAP/WSDL and restful Web services. Addison-Wesley, 2012.
[34] Ryu development team. Ryubook 1.0, 2014. [Online; http://osrg.github.io/ryubook/en/html/index.html].
[35] Hao He. What is service-oriented architecture, September 2003. [Online; http://www.xml.com/lpt/a/1292].
[36] Dave Marshall. Remote procedure calls (rpc), March 1999. [Online; http://www.cs.cf.ac.uk/Dave/C/node33.html].

Glossary
API: Application Programming Interface
CoD: Code on Demand
CORBA: Common Object Request Broker Architecture
DAO: Data Access Object
DCOM: Distributed Component Object Model
HATEOAS: Hypermedia As The Engine Of Application State
HTTP: Hypertext Transfer Protocol
JSON: JavaScript Object Notation
LxC: Linux Containers
MVC: Model View Controller
OF: OpenFlow
OVS: Open vSwitch
REST: Representational State Transfer
RMI: Remote Method Invocation
RPC: Remote Procedure Call
SDN: Software Defined Networks
SOA: Service Oriented Architecture
SOAP: Simple Object Access Protocol
SVN: Subversion
UI: User Interface
URI: Uniform Resource Identifier
URL: Uniform Resource Locator
W3C: World Wide Web Consortium
XML: eXtensible Markup Language

Appendices:
7. RESTful APIs book
8. SDN REST API
9. API Demonstration

REST API Book

Contents

1 Introduction to APIs
  1.1 Introduction
  1.2 Distributed APIs
  1.3 Web Services
    1.3.1 Service Oriented Architecture
  1.4 REST
  1.5 When to use web technologies?
  1.6 What will you find in this book?

2 Web Technologies
  2.1 History
  2.2 HTML Documents
  2.3 HTTP Motivation
  2.4 URL/URI
  2.5 HTTP 1.0
    2.5.1 HTTP Requests
    2.5.2 Headers
    2.5.3 HTTP Responses
  2.6 Cookies
  2.7 HTTP Proxies
  2.8 Dynamic Web
    2.8.1 Introduction
    2.8.2 CGIs
    2.8.3 HTML Forms
  2.9 HTTP 1.1
    2.9.1 Introduction
    2.9.2 Headers
    2.9.3 Chunked Data
    2.9.4 Persistent Connections
    2.9.5 Continue
    2.9.6 Caching
    2.9.7 HTTP 1.1 Methods
    2.9.8 HTTP 1.1 Status codes
    2.9.9 HTTP 1.1 Representation Headers
    2.9.10 HTTP 1.1 Content-negotiation headers
    2.9.11 HTTP 1.1 Cache headers
    2.9.12 HTTP 1.1 Conditional headers
    2.9.13 HTTP 1.1 Authentication headers
  2.10 Practical HTTP with apache
    2.10.1 Introduction
    2.10.2 Virtual Hosts (sites)
    2.10.3 CGIs
    2.10.4 Modules
  2.11 Commands summary
  2.12 XML
    2.12.1 Introduction
    2.12.2 XML Comments
    2.12.3 Escaping
    2.12.4 Well-formed XML
    2.12.5 Valid XML
  2.13 JSON

3 Restful Architectural Style
  3.1 REST motivation
  3.2 REST Constraints
    3.2.1 Client-server
    3.2.2 Stateless
    3.2.3 Cache
    3.2.4 Uniform interface
    3.2.5 Layered System
    3.2.6 Code on demand
  3.3 How to design your APIs
    3.3.1 Define functionalities
    3.3.2 Define your resources
    3.3.3 Define resource representation
    3.3.4 HATEOAS
    3.3.5 Cache
    3.3.6 Implement your API

4 REST Practices
  4.1 Interface exercises
  4.2 Cache exercises

5 Other API architectures
  5.1 RPC APIs
  5.2 Message based APIs
  5.3 Comparison

6 Django Development
  6.1 Introduction to DJANGO
  6.2 Starting a new project
  6.3 Project structure
  6.4 Models
    6.4.1 Model relationships
    6.4.2 Managers and QuerySets
  6.5 Views
    6.5.1 Function views
    6.5.2 Class-Based views
  6.6 URI patterns
  6.7 Formatting the output
  6.8 Middleware
  6.9 Deploy the project
  6.10 Cache in Django
  6.11 Example 1: File distribution API
    6.11.1 First iteration
    6.11.2 Second Iteration

7 Django Practices

8 REST Applied to a SDN Application
  8.1 Introduction
  8.2 Ryu Introduction
  8.3 Ryu Features
    8.3.1 Message Reply Handlers
    8.3.2 OpenFlow protocol messages
    8.3.3 HTTP Request Handlers
    8.3.4 Link REST Controllers with Ryu applications
  8.4 Monitoring Application
    8.4.1 RYU implementation
    8.4.2 Django Implementation
    8.4.3 Topology configuration

Chapter 1
Introduction to APIs

1.1 Introduction
In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications. An API expresses a software component in terms of its operations, inputs, outputs, and underlying types.
An API defines functionalities that are independent of their respective implementations, which allows definitions and implementations to vary without compromising each other. A good API makes it easier to develop a program by providing all the building blocks. A programmer then puts the blocks together.¹

Basically, an API is a black box that takes some specified inputs, performs certain operations and returns some outputs, if any. We can find APIs in many places, for instance:

• Linux kernel interfaces. They allow user space programs to access system resources and services of the Linux kernel² via syscalls.
• 3D computer graphics APIs such as DirectX and OpenGL.
• Distributed APIs.

This document will focus on distributed APIs.

1.2 Distributed APIs

Distributed systems are defined in many different ways, so throughout this document a distributed system means one that follows the client-server architecture (Fig 1.1). The purpose of a distributed API is to let the client execute a given call on the server.

As a first approach, many platforms developed their own technologies, such as CORBA, Java Remote Method Invocation (RMI), DCOM, or .NET Remoting. Some difficulties appeared here. The most important one was the incompatibility between technologies: the client and the server had to use the same technology. Furthermore, some vendor implementations and toolkits had trouble talking to each other due to the lack of standardization.

To get rid of these incompatibilities, developers started to use web services. The main advantage of web services is the level of standardization achieved with them.

1.3 Web Services

Web services are applications built on a set of technologies that operate over the web. Web services involve many protocols and technologies, such as XML, RPC, SOAP, WSDL, WS-SECURITY, etc., to achieve a high level of standardization. Most web services use HTTP as a transport protocol.
¹ http://en.wikipedia.org/wiki/Application_programming_interface
² http://www.linux.it/~rubini/docs/ksys/ksys.html

Figure 1.1: Client - Server architecture (clients send requests to the server / service provider, which returns responses).

"A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards." []

The two main organizations behind the standardization of web services are OASIS and W3C. Typically, web services have followed the Service Oriented Architecture (SOA).

1.3.1 Service Oriented Architecture

In SOA, the server defines services, which can be seen as functionalities. The basic idea is to break down big applications into simple little units, which are the services. This way, when you need a new functionality you can take existing little services (even from outside your domain) and chain them together to achieve the desired functionality. The services should be highly reusable and multi-purpose.

In SOA, the interface is a fundamental part. If the interface between two applications doesn't work, the system doesn't work. That's why standard interfaces are so important in web services.

In SOA, it is also important to separate the functionality offered from its implementation. "This is like going to a restaurant: you tell your waiter what you would like to order and your preferences but you don't tell their cook how to cook your dish step by step." [35]

Notice that SOA is not restricted at all to web services.

1.4 REST

In the year 2000, Dr. Roy Fielding published his PhD thesis: Architectural Styles and the Design of Network-based Software Architectures [1].
In his thesis, he took a completely different approach to web architecture and defined the bases of a new architectural style: Representational State Transfer. It may have some similarities with SOA, but it differs in a core aspect: the principal elements are resources instead of services. Even though REST does not define any underlying protocol, it is mostly used on top of HTTP, which fits perfectly with the architectural constraints of REST.

1.5 When to use web technologies?

As we said before, web technologies solve one of the biggest problems of distributed APIs: when the clients that access the server are diverse, the web is the easiest way to implement an API (here, "web technologies" and "the web" mean any application or protocol built on top of HTTP). On the other hand, web technologies have some drawbacks:

• In every request, the client must serialize the data as a stream of bytes and transmit it, and the server must perform the reverse operation: when it receives a stream of data, it must deserialize it into an understandable data format and structure. The same happens for each response (if the response is something more than a status code). This process is expensive.
• The HTTP protocol adds headers whose overhead may be significant if an application requires high throughput or low response time.
• HTTP is by definition "stateless", which means that if we need to develop a "stateful" application we will have to build a state mechanism.

1.6 What will you find in this book?

In this book, you'll find a very practical introduction to the design of APIs, more specifically APIs that follow the REST architectural style. First of all, in chapter 2 you'll find a review of the web technologies that we'll use in the process of creating APIs (HTTP, XML and JSON).
After that, in chapter 3 you'll find the basic principles that define the REST architecture, a practical approach to developing REST APIs and examples to fully understand the nature of REST. Next, in chapter 5 we'll review other architectures and popular protocols that can be used to develop APIs. Once the REST concept is clear, in chapter 6 you'll learn how to use the popular web framework Django, a very powerful tool that can be used to implement APIs. Finally, in chapter 8 you'll see how to implement a REST API, applying all the concepts learned in the previous chapters to an SDN project. To consolidate the knowledge acquired in chapters 3 and 6, you'll find some proposed exercises in chapters 4 and 7.

Chapter 2 Web Technologies

2.1 History

Tim Berners-Lee is credited with having created the initial World Wide Web (WWW) during 1989-1991, while he was a researcher at CERN (Conseil Européen pour la Recherche Nucléaire), the European laboratory for high-energy particle physics. In this context, a multi-platform tool was needed to share documents between physicists and other researchers in the high-energy physics community. Tim Berners-Lee wrote a proposal for a solution that enabled such collaboration. Four basic technologies were part of his proposal:

• HTML (HyperText Markup Language): a language to write documents.
• HTTP (HyperText Transfer Protocol): a protocol to transmit resources (like HTML documents).
• A WEB server: software that serves resources like HTML documents.
• A WEB browser: software that acts as a client, sending requests and processing responses for resources available on a WEB server (like HTML documents).

2.2 HTML Documents

HTML (HyperText Markup Language), as its name states, is not a programming language like C or Java but a markup language. In plain English, this means that HTML is a language for describing how content (text, images, etc.) should be displayed.
With the HTML language, we can create HTML documents to be displayed in a browser. HTML documents are just text files, so you can edit them with any text editor. There are also "HTML editors", specially designed for writing HTML. Analyzing HTML documents is a good way of learning HTML. Let's take a look at a simple HTML document (see Code 2.1).

    <html>
    <head>
    <title>Hello World</title>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    </head>
    <body>
    Hello <b>World</b>!!!!!!!
    </body>
    </html>

Code 2.1: Simple HTML document

As you can observe, the HTML document is just text. However, some of the text is markup, which means that it has a special meaning in HTML. Text enclosed between the characters "<" and ">" is markup, and these pieces of markup are called "HTML tags". HTML tags tell the browser to do something special. In our example, "<b>World</b>" tells the browser to use the boldface font. As you can see, some HTML tags have an opening tag and a closing tag. This is written as <tag> ... </tag>, as in the case of the boldface tag. Other tags, however, consist of a single tag.

The HTML document is delimited by <html> and </html>. In addition, the HTML document is divided in two parts:

• <head>. This part is optional. When <head> exists, it can contain several tags such as <title>, <meta>, etc. For example, the <title> tag specifies the title that must be displayed in the browser's window. With the <meta> tag we can define the charset:

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

• <body>. The body is where the content of the HTML document is specified. All text, images, etc. are contained between <body> and </body>.

On the other hand, we can also use tags to create hyperlinks to other resources (like other HTML documents).
This is a fundamental feature in HTML. The hyperlink tag is <a> ... </a>. To see an example, look at Code 2.2:

    <html>
    <head>
    <title>Hello World</title>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    </head>
    <body>
    Hello <b>World</b>!!!!!!!
    Go to <a href="docs/otherdoc.html">another document</a>
    </body>
    </html>

Code 2.2: Simple HTML document with a hyperlink.

In the previous example, we link our HTML document with another HTML document that is located in a relative directory called "docs". Relative paths are described taking the location of the current HTML document as reference. Notice that the target of the hyperlink is specified in the "href" attribute inside the opening tag.

The <img> tag is used to display an image. The src attribute provides the path to the image. Example:

    <img src="pictures/image.gif">

On the other hand, blank spaces and new lines are called "whitespace". You can add as much whitespace as you like to make your HTML file easier to read, but browsers display consecutive whitespace characters as a single space. If you need to create a paragraph, you have to use the tags <p> ... </p>. For paragraphs, the browser adjusts the text lines based on the window width. If you really want to force a new line, you have to use the <br> tag.

HTML has many tags, but a few of them are enough to get an idea of how HTML works. Some more useful tags are:

• <i> </i> Sets text in italics.
• <tt> </tt> Sets text in teletype.
• <h1> </h1> Sets text in type "header 1". You can use header levels in descending order of importance (size): <h2> </h2> ... <h6> </h6>
• <hr> Prints a horizontal line.
• <center> </center> Centers text and images.
• <blockquote> </blockquote> Indents text.
• <pre> </pre> Pre-formatted text, i.e. spaces and line breaks between these tags are maintained.
• <!-- text comments... --> Comments in the HTML file.

2.3 HTTP Motivation

Initially, HTTP (Hypertext Transfer Protocol) arose from the need to create hyperlinks in HTML documents to resources that are not on the same host. HTTP is a text protocol based on a client/server model that can be used over a TCP/IP network to deliver virtually any resource of the World Wide Web (WWW). For now, we will consider that a resource is just an HTML document.

An HTTP server or WEB server is a network daemon that uses by default the well-known TCP port 80. HTTP clients, generically called WEB browsers (e.g. Firefox or Lynx), send HTTP requests to HTTP servers asking for a resource, and the server responds with the requested resource (see Figure 2.1).

Figure 2.1: HTTP client/server (the browser opens a connection to TCP port 80 of the HTTP server, sends "GET doc.html" and receives doc.html).

2.4 URL/URI

The first issue when implementing HTTP is to define how to identify resources. The identifiers used in HTTP were initially defined by Tim Berners-Lee in 1991. They were called URLs (Uniform Resource Locators) and they were first used to allow authors of HTML documents to establish hyperlinks in the WWW. A URL is just a text string with a standard format that allows you to name a resource based on its location on the WWW. In 1994, the URL concept was incorporated into a more general concept called URI (Uniform Resource Identifier). URI is the standard name for resource identifiers on the Internet, but the term URL is still widely used. The simplest URL/URI format is as follows:

    protocol://hostname/directory/resource

Other information can also be present in the URL:

    protocol://username:password@hostname:port/directory/resource

The detailed specification for URL/URIs is in RFC 1738[9].
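The components of the extended URL format above can be pulled apart with Python's standard `urllib.parse` module. The sketch below is only an illustration: the URL, user name and password are made-up values, not ones taken from this document.

```python
from urllib.parse import urlsplit

# Hypothetical URL that follows the extended format
# protocol://username:password@hostname:port/directory/resource
url = "http://alice:secret@www.example.com:8080/directory/resource"

parts = urlsplit(url)
components = {
    "protocol": parts.scheme,
    "username": parts.username,
    "password": parts.password,
    "hostname": parts.hostname,
    "port": parts.port,        # None when the URL does not include a port
    "path": parts.path,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

When the port is omitted, `parts.port` is `None` and the client is expected to fall back to the default port of the protocol (80 for HTTP).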
Some examples of URLs are:

• http://www.example.com/pictures/upc.jpg
• http://www.example.com
• http://192.168.0.5
• http://www.example:8080/cgi-bin/time.sh
• http://user:[email protected]/
• ftp://debian.org

If the URL does not specify any resource (filename), it is assumed that the client is asking for a file called index.html or index.htm. As its name suggests, this file contains an HTML document with the web site index.

On the other hand, we can use absolute or relative paths in HTTP hyperlinks. In an HTTP server, absolute paths are relative to a directory called DocumentRoot. This parameter is defined in the configuration file of the HTTP server. For example, a typical DocumentRoot on Linux is /var/www. In this case, the URL http://www.example.com/images/upc1.gif refers to a file called upc1.gif that is stored on the HTTP server in the directory /var/www/images. The following HTML file serves as an example of how to use absolute and relative paths:

    <html>
    <head>
    <title>Hello World</title>
    </head>
    <body>
    <p>Hello <b>World</b>!!!!!!!</p>
    <p>Go to <a href="docs/otherdoc.html">another document</a></p>
    <p>You can visit the UPC home page at <a href="http://www.upc.edu">UPC home</a>.</p>
    <img src="/images/upc1.gif">
    <img src="/images/upc2.gif">
    <img src="http://www.example.com/images/upc1.gif">
    </body>
    </html>

Code 2.3: Simple HTML Document with an Absolute Path and External Hyperlinks.

2.5 HTTP 1.0

HTTP is a text protocol that uses the client-server model, like many other TCP/IP applications, with 80 as its default port. Another TCP port can be used, but the client must know this port and include it in the URL. The HTTP client opens a TCP connection and sends an HTTP request to an HTTP server.
If everything is correct, the server returns an HTTP response that contains the requested resource. After delivering the response, the HTTP server closes the TCP connection. HTTP is a stateless protocol, which means that HTTP does not maintain state information between different requests.

2.5.1 HTTP Requests

In an HTTP request, the first line is the only mandatory one, and it contains the "request method", the path to the resource and the HTTP version. The request ends with a blank line (CR+LF). The minimal request in HTTP 1.0 is something like the following:

    GET / HTTP/1.0
    [blank line]

GET is the most commonly used request method and it means "give me this resource". After the GET keyword we find a "/". This means that the resource that we are requesting is the index file of the WEB server. Finally, the request line ends with a CR+LF and is followed by the blank line. Another example is:

    GET /images/upc1.gif HTTP/1.0
    [blank line]

In this case, the client is requesting a file called upc1.gif that is stored on the HTTP server in the directory images (relative to the server's DocumentRoot).

2.5.2 Headers

Requests (and also responses) can have header lines. Headers are text lines that provide additional information or functionality in requests/responses. The format is "Header-Name: value1, value2", ending with CR+LF. The header name is not case-sensitive. There can be any number of spaces or tabs between the ":" and the value. Header lines starting with a space or tab are actually part of the previous header line (this is used for readability). The following headers are equivalent:

    Header1: some-long-value-1a, some-long-value-1b

    Header1: some-long-value-1a,
             some-long-value-1b

HTTP 1.0 defines 16 headers, though none is required. Typical headers included in requests are:

• From: gives the email address of the user who makes the request.
• User-Agent: name of the browser and OS.
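The header rules of Section 2.5.2 (case-insensitive names, optional whitespace after the colon, continuation lines) can be sketched in a few lines of Python. This is a minimal illustration and not a complete HTTP parser; the function name `parse_headers` is hypothetical.

```python
def parse_headers(lines):
    """Parse raw header lines into a dict keyed by lower-cased header name."""
    headers = {}
    last_name = None
    for line in lines:
        if line[:1] in (" ", "\t") and last_name is not None:
            # A line starting with space or tab continues the previous header.
            headers[last_name] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last_name = name.strip().lower()      # header names are case-insensitive
        headers[last_name] = value.strip()    # whitespace around the value is dropped
    return headers


single = parse_headers(["Header1: some-long-value-1a, some-long-value-1b"])
folded = parse_headers([
    "HEADER1:    some-long-value-1a,",
    "            some-long-value-1b",
])

print(single)
print(single == folded)
```

Both inputs yield the same dictionary, which is what "equivalent headers" means in practice for a receiver.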
For example, a request with headers could be the following:

    GET /path/file.html HTTP/1.0
    From: user@example.net
    User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0
    [blank line]

Headers can help to diagnose problems in web sites, but they also reveal information about the user. Thus, there is a trade-off between the information provided for debugging and user privacy.

2.5.3 HTTP Responses

HTTP responses are also composed of text lines. The first text line of an HTTP response is the status line. Typical status lines are:

    HTTP/1.0 200 OK
    HTTP/1.0 404 Not Found

The first digit of the status code identifies the general category of the status:

• 1xx indicates an informational message only.
• 2xx indicates success of some kind.
• 3xx redirects the client to another URL.
• 4xx indicates an error on the client side.
• 5xx indicates an error on the server side.

Examples:

• 301 Moved Permanently.
• 302 Moved Temporarily.
• 303 See Other (HTTP 1.1 only; it means that the resource has been moved to another URL, given by the Location header in the response).
• 500 Server Error.

On the other hand, a response can also have headers. The headers usually included in responses by servers are:

• Server: analogous to User-Agent (it identifies the server software).
• Date: current date.
• Last-Modified: date of the last modification of the resource being returned. This header is used for caching (explained later).

After the headers, if the resource was available on the server, we find a CR+LF and then the response body containing the requested resource. In general, if an HTTP message includes a body, there are at least two additional header lines to describe the body's content. These header lines are "Content-Type" and "Content-Length":

• Content-Type: MIME type of the object.
• Content-Length: number of bytes of the object.
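To make the structure of a response concrete, the following Python sketch splits a raw response into status line, headers and body, classifies the status code by its first digit, and uses Content-Length to delimit the body. It is a simplified illustration under stated assumptions: the function name `parse_response`, the category labels and the sample page are hypothetical, and continuation header lines are not handled.

```python
# Status categories, keyed by the first digit of the status code.
CATEGORIES = {
    "1": "informational",
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}


def parse_response(raw):
    """Split a raw HTTP response (bytes) into its main parts."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line separates headers and body
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "content-length" in headers:
        body = body[: int(headers["content-length"])]
    return {
        "version": version,
        "code": int(code),
        "reason": reason,
        "category": CATEGORIES.get(code[0], "unknown"),
        "headers": headers,
        "body": body,
    }


page = b"<html><body><h1>It works!</h1></body></html>"
raw = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(page)}\r\n"
    "\r\n"
).encode("ascii") + page

response = parse_response(raw)
print(response["code"], response["category"], response["headers"]["content-type"])
```

Computing Content-Length from the actual body, as done here with `len(page)`, avoids a mismatch between the declared and the real number of bytes.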
For example, to retrieve the file http://www.example.com/path/file.html using HTTP 1.0, the first step is to open a TCP connection with the server www.example.com using the default HTTP TCP port, 80. Then, through this connection, the client could send an HTTP 1.0 request like the following:

    GET /path/file.html HTTP/1.0
    From: user@example.net
    [blank line]

Through the same socket (connection), the server could respond with something like the following:

    HTTP/1.0 200 OK
    Date: Mon, 21 Oct 2013 22:29:59 GMT
    Content-Type: text/html
    Content-Length: 50
    [blank line]
    <html>
    <body>
    <h1>It works!</h1>
    </body>
    </html>

After receiving the response, in the basic implementation of HTTP 1.0, the client closes the TCP socket.

2.6 Cookies

As previously mentioned, HTTP is a stateless protocol, which means that HTTP does not maintain state information between different requests. A cookie is a piece of information (UTF-8 text) sent by an HTTP server and stored by the browser in the client's filesystem. Sometimes cookies are also called footprints. The browser returns cookies unchanged to the server. Cookies introduce state (memory of previous events) into otherwise stateless HTTP transactions. Without cookies, each retrieval of a web page or component of a web page is an isolated event, mostly unrelated to all other views of the pages of the same site. The most common uses of cookies are:

• User control. For example, when a user enters a username and password, a cookie can store this information so there is no need to enter them again in a later visit to the web server.
• Getting information about the user's browsing habits.

The HTTP server sends lines with the Set-Cookie header if the server wishes the browser to store these cookies.
Set-Cookie is a directive for the browser to store the cookie and send it back in future requests to the server (subject to the expiration time or other cookie attributes). For example, the browser requests the resource http://www.example.org/doc.html (see Figure 2.2).

Figure 2.2: How HTTP cookies work (the response to the first "GET /doc.html HTTP/1.0" carries "Set-Cookie: EXSID=ABCKKO…em_vYg; Expires=Wed, 27 Feb 2019 10:10:10 GMT"; the next request from the browser carries the cookie back in a "Cookie" header).

The client sends a regular request, and the server asks the client to store the cookie. Then, the client sends the cookie in subsequent requests. It is worth mentioning that the cookie has more fields (like path and domain) that help to decide when to send it. Finally, as you may imagine, cookies can cause privacy problems.

2.7 HTTP Proxies

An HTTP proxy is a program that acts as an intermediary between a browser and a Web server. HTTP proxies are typically used for security (a single point of control) or efficiency (caching).

Figure 2.3: HTTP Proxies. (a) Transparent. (b) Non-transparent.

From the point of view of users, there are two basic types of proxies:

• Transparent (Figure 2.3a). A transparent proxy intercepts normal communication at the network layer without requiring any special client configuration. Clients need not be aware of the existence of the proxy.
• Non-transparent (Figure 2.3b). A proxy that is not transparent receives requests from clients and sends requests to servers over a new TCP connection. The responses travel back through the proxy as well. Therefore, a proxy acts as both a client and a server. A non-transparent proxy can use another transparent or non-transparent proxy to reach the final server.
Clients send their requests to the proxy instead of the real server specified in the URL (the proxy IP address and port are configured in the browser). HTTP requests using a non-transparent proxy must include the full URL of the resource (not only the relative path). In this way, the proxy knows to which server it must send the HTTP request. For example:

    GET http://www.somehost.com/path/file.html HTTP/1.0
    [blank line]

Finally, it is worth mentioning that there are open source HTTP proxy implementations like Squid (which is widely used).

2.8 Dynamic Web

2.8.1 Introduction

In today's Web, content is not static: documents are generated on the fly by servers with information provided by clients. As a result, the WWW is not just a huge database of documents or content but a platform to implement services and applications. Common applications of the dynamic web are search engines, remote access to corporate applications and databases, etc.

2.8.2 CGIs

There are several ways of implementing the dynamic Web. In this document, we only deal with CGIs because they are easy to understand and they were the first method used for this purpose. CGI (Common Gateway Interface) is a standard procedure through which HTTP servers can use external applications to dynamically generate content (see Figure 2.4). When we use a CGI, the URL identifies:

• An executable program (which is also called "the CGI").
• The parameters with which the CGI has to be executed.

Figure 2.4: How CGIs work in HTTP (the browser sends "GET /cgi-bin/cgi HTTP/1.0"; the HTTP server runs the CGI and returns "HTTP/1.0 200 OK" with the content of the page generated by the CGI).

The first issue to take into account is how a web server knows that it has to execute a program instead of sending a resource. A usual solution is to store all the CGIs in a special directory, typically called /cgi-bin/.
In this way, if a client asks for www.example.com/cgi-bin/program, the server knows that it must execute the program instead of sending it. The second issue is how to send the parameters to the program. When using GET, the parameters are encoded in the URL. These parameters are added to the URL after a "?" character and separated by the "&" character. Example:

    http://www.example.com/cgi-bin/program?param1=value1&param2=value2...

Note. Spaces are encoded using the "+" character, and other characters can be sent in the format %NN, where NN is the hexadecimal value of the character code.

Finally, before executing the CGI, the Web server establishes a special context for the program using environment variables. These variables are:

    CONTENT_LENGTH, CONTENT_TYPE, REMOTE_HOST, REMOTE_USER, REQUEST_METHOD, SERVER_NAME, QUERY_STRING, GATEWAY_INTERFACE, HTTP_*

For GET requests, the QUERY_STRING variable takes the value of the parameters, as shown in the URL. In this manner, the CGI can get the parameters that the client has specified.

Regarding the response, the CGI writes it to the standard output (STDOUT). Then, the server reads this output and sends it to the client through the socket. Depending on the type of web server, the CGI application can act in two ways:

• NPH Server (No Parse Header). The CGI application must write the complete response, including the HTTP headers.
• PH Server (Parse Headers). The CGI application must write a response without HTTP headers and it must pass information to the server on how to form the headers.

Typically, web servers are NPH.

Finally, we would like to remark that CGIs are not the most efficient solution because a process is created per request. Today there are more efficient or flexible solutions to create dynamic websites, such as JavaScript, Python, PHP, Java servlets, etc.

2.8.3 HTML Forms

An HTML form allows a client to send parameters to a WEB server. The tag to declare a form is <FORM>.
Different elements can be inserted into the form: text input elements, codes, images, files, checkboxes, etc. These elements are inserted in the form using the <INPUT> tag. All the items of the form have a "type" attribute and they may have a "name" attribute. There are two special elements: RESET, which restores the form to its original state, and SUBMIT, which presents a button to send the form. Example:

    <html>
    <head>
    <title>Website title</title>
    </head>
    <body>
    Form to select parameters to send to the server.
    <form ACTION="/cgi-bin/process" METHOD="GET">

    Enter a name: <INPUT NAME="a" TYPE="text"> <br>

    Enter a password: <INPUT TYPE="password" NAME="b" MAXLENGTH="8"> <br>

    Checkbox: <INPUT TYPE="checkbox" NAME="c"> <br>

    <INPUT TYPE="reset"> <INPUT TYPE="submit"> <br>
    </form>
    </body>
    </html>

Code 2.4: HTML document with a form.

Figure 2.5: An HTML form viewed from a browser.

Figure 2.5 shows the form viewed in a WEB browser. When the form is sent (by pressing the submit button), the client generates an HTTP request using the method (GET or POST) given in the METHOD attribute to execute the script or application indicated in the ACTION attribute. As already discussed, GET requests do not have a body, so the parameters for the execution of the application are encoded in the URL, while POST requests carry the parameters in the body. Using the Content-Type header, the POST request defines how the parameters and values have been encoded:

• application/x-www-form-urlencoded. This is the default encoding type. It is similar to the encoding used by GET. It is not suited to sending binary content such as files.
• multipart/form-data. Separates parameters with a mark (boundary). You can also include binary content (e.g. a file) in the request.
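The application/x-www-form-urlencoded encoding can be reproduced with Python's standard `urllib.parse` module. The sketch below is illustrative only; the parameter names and values are placeholders chosen for the example.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder form fields, encoded the way a browser encodes a form.
params = {"param1": "value1", "param2": "value2"}
body = urlencode(params)

print(body)                        # the encoded parameters, joined with "&"
print(len(body.encode("ascii")))   # value a client would put in Content-Length

# Spaces become "+" and reserved characters become %NN (hexadecimal).
tricky = urlencode({"name": "John Smith", "q": "a&b"})
print(tricky)

# The server side performs the reverse operation.
decoded = parse_qs(body)
print(decoded)
```

With GET, the same encoded string is appended to the URL after "?"; with POST, it is sent as the request body.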
Example:

    POST /cgi-bin/program HTTP/1.0
    From: user@example.net
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 27
    [blank line]
    param1=value1&param2=value2

The question is: GET or POST? Each method has advantages and drawbacks. When using GET, the parameters for the server are encoded in the URL. This can be considered a security vulnerability because these parameters can be read by anyone who sees the URL. Another drawback of GET is that it does not allow sending binary files in the body of the request. However, GET is useful to perform requests and store the results together with the associated URL (which contains all the parameters of the query). GET also allows the use of the back button to return to previous results. On the other hand, with POST the parameters for the server are sent in the body of the request, so they are not visible in the browser as a query string. In general, GET is useful for idempotent operations (which always give the same result). POST means "carry out" an action with a "side effect" or a change of state (non-idempotent operations).

2.9 HTTP 1.1

2.9.1 Introduction

HTTP 1.1 defines 46 headers, and one of them, "Host", is mandatory in requests. HTTP 1.1 was defined to meet new needs and to overcome the shortcomings of HTTP 1.0. In general terms, HTTP 1.1 is a superset of HTTP 1.0. The improvements include:

• Host header. Provides efficient use of IP addresses. Now, multiple domains can be served from a single IP address.
• Chunked encoding. Allows a faster response for dynamically generated pages. Pages are divided and sent in chunks (fragments). In this way, a response can start to be sent before its total content or length is known.
• Persistent connections. A TCP connection is not opened and closed for each request. By allowing multiple HTTP transactions over one TCP connection we can reduce the total transmission delay.
• Caching. The protocol provides headers to implement caching. This allows faster responses and bandwidth savings.

HTTP 1.1 requires changes in both client and server. Next, we describe each of the previous features in more detail.

HTTP 1.1 was originally defined in RFC2616[10]. In June 2014 it underwent a major revision and is now defined not by a single RFC but by several of them. The ones that carry the most important information are:

RFC7230[12]: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
RFC7231[13]: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
RFC7232[14]: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
RFC7233[15]: Hypertext Transfer Protocol (HTTP/1.1): Range Requests
RFC7234[16]: Hypertext Transfer Protocol (HTTP/1.1): Caching
RFC7235[17]: Hypertext Transfer Protocol (HTTP/1.1): Authentication

2.9.2 Headers

This document does not cover all the headers available in the HTTP 1.1 specification. Instead, the most used ones are listed and explained. For an exhaustive and precise understanding of the headers you should look into RFC7231 [13].

HTTP headers have a specific format. Header names are not case-sensitive, but by convention they are written in a camel case¹ style with a hyphen between words (Content-Type, If-Match, etc.), followed by a colon (:) and the value of the header. Example:

    Allow: GET, PUT

¹ http://en.wikipedia.org/wiki/CamelCase

There is only one header that MUST be included in every request: Host.

Host

From HTTP 1.1, web servers can be multi-domain. For example, we can have the domains "www.example.com" and "www.example.net" on the same server. Thus, the IP address of the server is not enough to figure out which domain is to be served. An analogy is a situation in which several people share a phone: when we call, we have to ask who is speaking and possibly ask for the correct person. So, in HTTP 1.1, each request must specify the hostname (and optionally the port).
A minimal HTTP request for version 1.1 could be the following:

    GET / HTTP/1.1
    Host: www.example.com:80
    [blank line]

The Host header contains the domain name or IP address of the web server, optionally followed by the port number (":80"). In this case, specifying the port is not necessary because 80 is the default port for HTTP.

Regarding HTTP proxies and the Host header, the destination for a request can appear in the URL (as an absolute URI) as well as in the Host header, so it is important for proxies to behave correctly when both appear. In short, the host and port in an absolute URI always override the Host header. For example:

    GET http://example.net/foo HTTP/1.1
    Host: www.example.com:8000

Here, the server that will be used is example.net and the port is 80 (the default for HTTP).

2.9.3 Chunked Data

This mechanism allows a server to start sending a response before knowing the complete content, that is to say, before knowing the total length of the content. The idea is to divide the response into small pieces called "chunks" and send these chunks one after another. Responses divided into chunks are identified by the header "Transfer-Encoding: chunked". All HTTP 1.1 clients must be able to correctly process responses divided into chunks.

The body of a chunked message contains several fragments (chunks) followed by a line with "0" (zero), optionally followed by footers (trailer headers). Each chunk consists of two parts: (1) a line with the size of the chunk in hexadecimal + CR+LF and (2) the data + CR+LF.

Example without chunks:

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Content-Length: 42
    [blank line]
    abcdefghijklmnopqrstuvwxyz1234567890abcdef

The same example with chunked data:

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Transfer-Encoding: chunked
    [blank line]
    1a
    abcdefghijklmnopqrstuvwxyz
    10
    1234567890abcdef
    0
    [blank line]

2.9.4 Persistent Connections

In HTTP 1.0, TCP connections are closed after each request/response by default. Opening and closing TCP connections requires a substantial amount of CPU time, bandwidth and memory. In practice, most web pages consist of several files (linked HTML documents, images, etc.) that are located on the same server. Consecutive requests (and their associated responses) can be transmitted more efficiently by allowing multiple requests/responses to be sent over a single connection. This mechanism is called "persistent connections".

In HTTP 1.1, persistent connections are used by default, and nothing special is needed to use them. The client simply opens a connection, sends multiple requests one after another and then reads the corresponding responses in order.

The client can include the header "Connection: close". Then, the server has to close the connection after the reply. This should only be used if the client is unable to process persistent connections or if it is known that the request will be the last. On the other hand, if a response contains the header "Connection: close", the client cannot send more requests through that connection and it must close the connection after the response is received.

A server may close the connection before sending all the answers. In this case, the client is responsible for tracking which requests were answered and resending the unanswered ones if necessary.

An HTTP 1.1 client can also send multiple requests through a single connection without having received any response (pipelining). On its side, an HTTP 1.1 server must queue the requests it cannot process yet, and it must send the responses in the same order as it received the requests.
If a request includes the header "Connection: close", the server must interpret that the request is the last one and it must close the connection after sending the corresponding response. The server also closes idle connections (after a period of time, typically 10 seconds).

Some servers do not support persistent connections in order to save resources (minimizing the number of concurrent open sockets). If the server does not want to use a persistent connection, it can include the header "Connection: close" in each response.

Finally, it is worth mentioning that clients (browsers) typically open several simultaneous persistent TCP connections with each server. In the example of Figure 2.6, the browser uses 2 persistent connections with the HTTP server.

[Figure 2.6: Multiple Persistent Connections with an HTTP Server. The browser requests index.html over connection 1 and then requests the images referenced by the page (images/img1.gif to images/img5.gif) over connections 1 and 2.]

For example, in a real browser like Firefox the default is to have up to 6 persistent connections per server. This can be configured with the parameter network.http.max-persistent-connections-per-server. To configure this parameter, type about:config in the URL bar of Firefox. Enabling multiple connections helps to increase performance, since we can obtain more throughput from several connections than from just one. In addition, the client can send its requests in parallel through the different connections.

2.9.5 Continue

The "continue" mechanism allows a client to determine whether the server is willing to accept a request based on the message headers. This is useful if a client has to send a request with a big body (e.g. a big file).
This mechanism avoids wasting time and resources if the server is going to reject the message (independently of its body). Clients include the header "Expect: 100-continue". Then, if the server is going to process the request, it must respond with a 100 (Continue) status. A client should not send the Expect header if it is not going to send a body in its request.

2.9.6 Caching

HTTP defines two different kinds of headers to achieve a caching system. From the client and server's point of view it defines the "Date" header and conditional headers. The Date header is used to know the reference time at which a response was created. The conditional headers indicate conditions that the server uses to determine whether it should process a request or not. The most typical one is "If-Modified-Since", which specifies a date. A server will only send the full response to a request including this header if the requested information has changed since the specified date. Finally, an ETag can be added. This is an identifier assigned by a web server to a specific version of a resource.

Example: A client sends a request for index.html to the server and the server responds with the following message:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Date: Mon, 04 May 2015 20:45:02 GMT
    [blank line]
    <html>
    <body>
    Hello World!
    </body>
    </html>
    [blank line]

Some time later the client wants to check the index.html page again, and it sends this HTTP request:

    GET /index.html HTTP/1.1
    Host: www.mysite.com
    If-Modified-Since: Mon, 04 May 2015 20:45:02 GMT
    [blank line]
    [blank line]

If the page has changed since that date, the server will send back a normal response. But if the page has not changed since then, the server will only send a response with a 304 code (Not Modified), reducing the server's execution time and the network load.

HTTP 1.1 defines a set of headers for intermediary systems.
One of the most important is Cache-Control (see the section on cache headers for further information on caching).

Example: A client requests a resource from a server. It sends the following message:

    GET /page.html HTTP/1.1
    Host: www.mysite.com
    Cache-Control: max-age=120
    [blank line]
    [blank line]

The max-age parameter in Cache-Control indicates that the client wants a response that has been stored in some intermediary server for less than two minutes. The server could respond with something like this:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Date: Mon, 04 May 2015 20:45:02 GMT
    Cache-Control: no-cache
    [blank line]
    <html>
    <body>
    Bye World!
    </body>
    </html>
    [blank line]

The no-cache parameter in Cache-Control tells the caching intermediaries that they must not reuse this response for a later request without first revalidating it with the server (the directive that forbids storing the payload altogether is no-store). You can find a list of the most important cache related headers in section 2.9.11.

2.9.7 HTTP 1.1 Methods

We have already seen some HTTP methods (GET and POST). In this section we review all the available methods and explain what each of them is used for.

• OPTIONS: "This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval." [13]. The OPTIONS method should return an "Allow" header listing all the methods available for a certain resource. The response is not required to contain a body, but if it does, the body should contain information about the communication options. The body structure is not standardized, so it depends on the developer's implementation. OPTIONS responses are not cacheable.
• GET: This is the most used method on the web. It allows a client to obtain a resource. When a server receives a GET request it should not perform ANY modification in the resources.
For this reason it is considered a "safe" method: you can use it as many times as you want and it is not going to change anything on the server. GET responses are cacheable.
• HEAD: HEAD is another "safe" method. It is used in the same way as GET, but the response to a HEAD message does not contain a body. Instead, it only contains the headers that would be sent with GET. HEAD can be used to check link validity, accessibility and recent modification while reducing network load. HEAD responses are cacheable.
• POST: The original HTTP 1.1 specification, RFC2616 [10], states in its definition of the POST method that: "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line.". In plain language, this means adding some data to an existing resource. For example: if you have a resource that represents your cars and you buy a new car, you should POST it to the list. However, the POST method has often been misused because of the HTML specification, which allows only two methods in its forms: GET and POST. This means that everything that requires more than retrieving information is achieved with POST. For example, if a server has a web interface and you want to delete a resource from the server, the developer has to implement a form that uses POST or GET and points to a certain URI, and make the server treat that request as a DELETE request on the resource. POST responses are not cacheable by default.
• PUT: PUT is used to create new resources or to replace existing ones. The body of the PUT message must be considered as the new resource, which is identified by the request URI. If the request URI points to an existing resource, the message should be considered a modification of that resource, which should be updated. PUT responses are not cacheable.
• DELETE: It is used to remove a resource identified by the request URI.
DELETE responses are not cacheable.
• TRACE: It is used as a diagnostic tool. When TRACE is invoked, the request is sent to the server. All the intermediaries process it like any other request, but the final server generates a response that contains the received request, unchanged, inside the response's body. It is used to learn how the intermediaries modify the requests. The request must not contain a body. TRACE responses are not cacheable.
• CONNECT: It is used to establish a tunnel between the client and the server. When a client establishes a secure communication with the server (for example with TLS), the messages sent from one to the other are encrypted. When an intermediary receives a CONNECT request, it should, from that moment, relay all the content between the client and the server without trying to perform any change in the messages. It is normally used with the Proxy-Authorization header.

2.9.8 HTTP 1.1 Status codes

You can find the complete list of the available codes in Table 2.1. Since most of them are self-explanatory there is no further explanation here; if you want to look into the details behind each of them, see section 6 of RFC7231 [13].
    CODE  DESCRIPTION                      CODE  DESCRIPTION
    100   Continue                         405   Method Not Allowed
    101   Switching Protocols              406   Not Acceptable
    200   OK                               407   Proxy Authentication Required
    201   Created                          408   Request Timeout
    202   Accepted                         409   Conflict
    203   Non-Authoritative Information    410   Gone
    204   No Content                       411   Length Required
    205   Reset Content                    412   Precondition Failed
    206   Partial Content                  413   Payload Too Large
    300   Multiple Choices                 414   URI Too Long
    301   Moved Permanently                415   Unsupported Media Type
    302   Found                            416   Range Not Satisfiable
    303   See Other                        417   Expectation Failed
    304   Not Modified                     426   Upgrade Required
    305   Use Proxy                        500   Internal Server Error
    307   Temporary Redirect               501   Not Implemented
    400   Bad Request                      502   Bad Gateway
    401   Unauthorized                     503   Service Unavailable
    402   Payment Required                 504   Gateway Timeout
    403   Forbidden                        505   HTTP Version Not Supported
    404   Not Found

Table 2.1: HTTP Status Codes.

2.9.9 HTTP 1.1 Representation Headers

Representation headers add information about the payload content. They define the type of data it contains, its encoding, its language and its location.

Content-Type: The media type of the payload. Its values are MIME types (Table 2.2). Example:

    Content-Type: text/html

Content-Encoding: Indicates what codings have been applied to the content data. Content-Encoding is primarily used to allow a representation's data to be compressed without losing the identity of its underlying media type.

    Media type    Specific value
    Application   application/json
                  application/soap+xml
                  application/javascript
                  application/xhtml+xml
                  application/pdf
                  application/xml
                  application/postscript
                  application/zip
    Audio         audio/basic
                  audio/mpeg
                  audio/mp4
                  audio/vnd.wave
    Image         image/gif
                  image/bmp
                  image/jpeg
                  image/svg+xml
                  image/png
                  image/tiff
    Text          text/css
                  text/plain
                  text/html
                  text/xml
    Video         video/avi
                  video/mp4
                  video/mpeg
                  video/x-flv

Table 2.2: Common mime types.

Example:

    Content-Encoding: gzip, deflate

In this example, two encodings were applied: "gzip" and "deflate".
The encodings are always listed in the order in which they were applied, and therefore the response data should be decoded in reverse order. In the example, "gzip" was applied first and "deflate" after that, which means the client has to apply "deflate" decoding first and then "gzip". The value "identity" is used to indicate that the data was not encoded (the header is not required if there is no encoding).

Content-Language: It defines the language in which the content is written. Example:

    Content-Language: en

Content-Location: It contains a URI that points to a resource. If it is used in the responses to PUT or POST methods, it contains the URI that points to the resource that has been created. If it is used alongside the 301 or 307 status codes, it contains the URI to which the resource was moved (note that in practice redirections are signalled with the Location header).

Example: If we use POST and we get the response below, it means that the resource was created and that it is accessible through the specified URL.

    HTTP/1.1 200 OK
    Content-Location: http://mysite.org/myresource/

Example: This response means that the client should make a new request to the specified URI.

    HTTP/1.1 301 Moved Permanently
    Content-Location: http://mysite.org/myresource/

2.9.10 HTTP 1.1 Content-negotiation headers

There are three content-negotiation strategies: server-driven, agent-driven and transparent negotiation.

In server-driven negotiation, the server decides which representation it serves. The client can, however, add headers to its request that list the set of types it accepts.

Accept header: The Accept header allows the client to list all the formats it wants for a response. It has two important fields: the media-range and a quality factor. The media-range and the "q" factor are separated by a semicolon, and one media type is separated from another with a comma.
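The Accept syntax just described (comma-separated media ranges, each with an optional ";q=" quality factor) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any particular server: the function name parse_accept is hypothetical, parameters other than "q" are ignored, and a real implementation would follow the full grammar of RFC7231.

```python
def parse_accept(header):
    """Parse an Accept header value into (media_range, q) pairs, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0]
        if not media_range:
            continue  # skip empty items such as a trailing comma
        q = 1.0  # a missing q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption of this sketch: a malformed q means "not wanted"
        # Fewer wildcards means a more specific range, which wins a tie on q.
        wildcards = media_range.count("*")
        entries.append((media_range, q, wildcards, position))
    # Sort by descending q, then by specificity, then by original position.
    entries.sort(key=lambda e: (-e[1], e[2], e[3]))
    return [(media_range, q) for media_range, q, _, _ in entries]


if __name__ == "__main__":
    for media_range, q in parse_accept("text/html;q=0.7, image/*, image/png"):
        print(media_range, q)
```

A server could walk the returned list in order and serve the first media range for which it has a representation.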
Let's see it with an example:

    Accept: text/plain, application/pdf;q=0.8, application/json;q=0.3, text/*

As you can see, we specified four media types, and only the second and the third have a q parameter. "q" is a quality value: the bigger it is, the more desirable the type is for the client. If q is not present, it takes 1 as its default value. An asterisk means anything; if two types have the same "q" value but one of them contains an asterisk, the one without the asterisk is preferred. In this example the order of preference would be:

    1. text/plain
    2. text/*
    3. application/pdf
    4. application/json

In Table 2.2 you can see a list of the most used mime types. You can find a more extensive list at http://www.sitepoint.com/web-foundations/mime-types-complete-list/.

There are three more "Accept" headers: Accept-Charset, Accept-Encoding and Accept-Language. They have the same syntax as "Accept" but they apply to charsets, encodings and languages. For example:

    GET /misc/myresource HTTP/1.1
    Accept: text/plain, application/pdf;q=0.8, application/json;q=0.3, text/*
    Accept-Charset: utf-8;q=1, iso-8859-1;q=0.5
    Accept-Encoding: gzip;q=1, identity;q=0.5, *;q=0.5
    Accept-Language: en;q=1, es;q=0.8, ca;q=0.7

On the other hand, in agent-driven negotiation, the client decides which representation it wants. First, the client performs a request in order to learn all the available representations, and after that it performs a second request pointing to the specific representation. You can look at it as if a separate resource were created for each representation:

    http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu
    http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu/png/
    http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu/pdf/
    http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu/plain/
    http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu/xml/

The resource "/Menu" returns links to all the other representations, and after that the client performs a new request to get the one it wants.

The last strategy (transparent negotiation) is a combination of the previous ones. The client uses server-driven negotiation, but instead of the server handling it, an intermediary redirects the client to the correct representation. This implies that the intermediary must know all the representations that the server has for every resource. So, from the point of view of the client it is a server-driven negotiation, but from the point of view of the server, which has to serve all the representations to the proxy, it is an agent-driven negotiation in which the agent is the proxy.

2.9.11 HTTP 1.1 Cache headers

Cache headers contain directives that determine whether a response can be cached, whether the client wants a cached response, and so on.

Cache-Control: It contains a list of directives that tell the cache agents what to do with the request or the response that they are processing. The cache agents MUST follow these directives, and the header must be forwarded to further layers. In Table 2.3 you can find a list of the available directives with a brief description (for further information see section 5.2.2 of RFC7234 [16]).

Example: The following header indicates that the client requires a resource that comes from a cache agent but that has not been stored for more than two minutes.

    Cache-Control: only-if-cached, max-age=120

    Directive name    Value     Validity              Description
    max-age           seconds   REQUEST               The maximum time that a response may have been stored since it was generated.
    max-stale         seconds   REQUEST               The maximum time by which a response may have exceeded its validity time.
    min-fresh         seconds   REQUEST               The client wants a response that remains valid for at least "min-fresh" seconds.
    only-if-cached    none      REQUEST               The client wants a stored response.
    must-revalidate   none      RESPONSE              Caches MUST NOT use a response that is no longer fresh without first revalidating it with the server.
    public            none      RESPONSE              Cache agents may store the response, even if it would normally be non-cacheable.
    private           none      RESPONSE              The response should not be stored by shared caches.
    proxy-revalidate  none      RESPONSE              It works like must-revalidate but it does not apply to private caches.
    max-age           seconds   RESPONSE              The number of seconds during which the response will be valid.
    s-maxage          seconds   RESPONSE              In shared caches it overrides the max-age directive.
    Extensions        optional  REQUEST AND RESPONSE  The directives can be extended with other private directives.
    no-cache          none      REQUEST AND RESPONSE  In a request, the request MUST NOT be answered with a stored response without revalidation. In a response, the stored response MUST NOT be reused without revalidation.
    no-store          none      REQUEST AND RESPONSE  A cache MUST NOT store any part of the request or of the response.
    no-transform      none      REQUEST AND RESPONSE  An intermediary MUST NOT transform the payload in any way.

Table 2.3: Cache-Control header directives.

Date: It specifies the HTTP date corresponding to the time at which the message was originated. All responses (including errors), except the continue answers (status 100), should include the "Date" header.
Example:

    Date: Tue, 01 Jul 2014 12:13:14 GMT

Unfortunately, due to earlier versions of HTTP, the date value can be in any of three possible formats:

    Date: Mon, 27 Apr 2009 23:59:59 GMT
    Date: Monday, 27-Apr-09 23:59:59 GMT
    Date: Mon Apr 27 23:59:59 2009

Although recipients must accept all three date formats, HTTP 1.1 senders only generate the first one.

Age: It indicates the number of seconds that have passed since the response was generated. It is used in responses.

Example: This header means that the response was generated 25 seconds ago and that it has been stored since then.

    Age: 25

Expires: It indicates that the response is valid until the specified HTTP date:

    Expires: Wed, 01 Jul 2015 16:00:00 GMT

Warning: It contains additional information that is not reflected in the status code. It contains a numerical code, a brief description and an HTTP date that must match the Date header. You can find the list of codes in section 5.5 of RFC7234 [16]. Example:

    Date: Wed, 01 Jul 2015 16:00:00 GMT
    Warning: 110 - "Response is Stale" "Wed, 01 Jul 2015 16:00:00 GMT"

2.9.12 HTTP 1.1 Conditional headers

There are two headers called "If-Modified-Since" and "If-Unmodified-Since" that can be included in HTTP requests. Their time stamps use Greenwich Mean Time (GMT).

• The If-Modified-Since header means "send the response if it has changed since that date".
• The If-Unmodified-Since header means "send the response if it has not changed since that date".

Clients are not required to use them, but HTTP 1.1 servers are expected to honour these headers and proceed as follows:

• If we use If-Modified-Since in the request and the data of the response has not changed, the server must send "304 Not Modified".
• If we use If-Unmodified-Since in the request and the data of the response has been modified, the server must send "412 Precondition Failed".
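The server-side rules above can be sketched as a small Python function. This is a simplified illustration rather than a complete implementation of RFC7232: the function name evaluate_conditional and the plain dictionary of headers are assumptions made for the example, ETag-based conditions are left out, and the resource's modification time is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def evaluate_conditional(headers, last_modified):
    """Return the status code a server would send for the given conditional headers.

    headers       -- dict with the request headers (hypothetical representation)
    last_modified -- timezone-aware datetime of the resource's last change
    """
    # HTTP dates have a resolution of one second.
    last_modified = last_modified.replace(microsecond=0)

    if_unmodified = headers.get("If-Unmodified-Since")
    if if_unmodified is not None:
        if last_modified > parsedate_to_datetime(if_unmodified):
            return 412  # Precondition Failed: the resource changed after that date

    if_modified = headers.get("If-Modified-Since")
    if if_modified is not None:
        if last_modified <= parsedate_to_datetime(if_modified):
            return 304  # Not Modified: the client's copy is still current

    return 200  # no condition prevents a normal response


if __name__ == "__main__":
    changed_at = datetime(2015, 5, 4, 20, 45, 2, tzinfo=timezone.utc)
    request = {"If-Modified-Since": "Mon, 04 May 2015 20:45:02 GMT"}
    print(evaluate_conditional(request, changed_at))
```

With the request of Section 2.9.6 and a page last changed at exactly that date, the function reports that a 304 response would be sent.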
The most commonly used is the If-Modified-Since header. If-Unmodified-Since has some less common uses. As an example, it can be used when you request a resource that depends on other resources, and a change to the original resource in the meantime might lead to inconsistencies. In this case, we can use the If-Unmodified-Since header and the HTTP server will tell us whether the resource has been changed.

2.9.13 HTTP 1.1 Authentication headers

Authentication headers allow the user to send credentials to the server. There are some standard authentication schemes defined for HTTP 1.1, but the most used one is "basic" (explained below).

WWW-Authenticate: It is a response header sent by the server to tell the client which authentication schemes are allowed for the requested resource. It should always be included when the server returns a 401 (Unauthorized) status code. It can also contain other parameters, such as the "realm" identification. Realms are virtual collections of resources that share the same authentication permissions. Example:

    WWW-Authenticate: Basic

Authorization: It contains the authentication credentials for a user. In the basic authentication scheme, the client must take the user and the password and construct a "user:password" string. Then it has to encode it using base64 encoding. For example, user-ID "Aladdin" and password "open sesame" would be encoded as:

    Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

Proxy-Authenticate: It is the proxy equivalent of WWW-Authenticate. An intermediary proxy sends it to the client to indicate which authentication schemes it allows. They are the same kind of schemes used in WWW-Authenticate.

Proxy-Authorization: It is the proxy equivalent of the "Authorization" header. It is sent by the client to the proxy.
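As a small illustration of the basic scheme described above, the following Python sketch builds the value of the Authorization header from a user and a password. The function name basic_authorization is hypothetical, and UTF-8 is assumed for the character encoding of the credentials. Keep in mind that base64 is an encoding, not encryption, so anyone who observes the header can recover the password.

```python
import base64


def basic_authorization(user, password):
    """Build the value of an Authorization header for the basic scheme."""
    credentials = f"{user}:{password}".encode("utf-8")  # assumption: UTF-8 credentials
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    print("Authorization:", basic_authorization("Aladdin", "open sesame"))
```

The result matches the "Aladdin" / "open sesame" example given for the Authorization header.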
2.10 Practical HTTP with apache

2.10.1 Introduction

The Apache HTTP Server, commonly referred to as Apache, is a web server notable for playing a key role in the initial growth of the WWW. Today it is still widely deployed. In our case, we are going to use its second version: the apache2 daemon. One of the main advantages of apache2 is its modular architecture: you can add or remove functionality as dictated by your requirements.

Debian-based distros store the Apache 2.0 configuration files in the directory /etc/apache2. The main configuration file, apache2.conf, is used to load other configuration files. One of these other configuration files is ports.conf, which contains the Listen directives that tell apache2 which IP addresses and ports it should listen on.

As usual, if you change the configuration of the daemon you have to stop and start it to apply the changes. Like most network daemons, apache2 can be started and stopped under Debian Linux using a script in the directory /etc/init.d. In particular, to stop apache2 type:

    # /etc/init.d/apache2 stop

To start the daemon type:

    # /etc/init.d/apache2 start

2.10.2 Virtual Hosts (sites)

A virtual host is just a web site served by the HTTP server. Each virtual host or site has its own configuration file that contains all the directives that pertain only to that site (a sample configuration file is shown in Code 2.5).
    <VirtualHost *:80>
        ServerAdmin webmaster@localhost

        DocumentRoot /var/www
        <Directory />
            Options FollowSymLinks
            AllowOverride None
        </Directory>
        <Directory /var/www/>
            Options Indexes FollowSymLinks MultiViews
            AllowOverride None
            Order allow,deny
            Allow from all
        </Directory>

        ScriptAlias /cgi-bin/ /var/www/cgi-bin/
        <Directory "/var/www/cgi-bin">
            AllowOverride None
            Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
            Order allow,deny
            Allow from all
        </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn

        CustomLog ${APACHE_LOG_DIR}/access.log combined

        Alias /doc/ "/usr/share/doc/"
        <Directory "/usr/share/doc/">
            Options Indexes MultiViews FollowSymLinks
            AllowOverride None
            Order deny,allow
            Deny from all
            Allow from 127.0.0.0/255.0.0.0 ::1/128
        </Directory>
    </VirtualHost>

Code 2.5: Sample Apache 2.0 configuration file for a virtualhost

In apache2, the configuration files of the virtual hosts are in the directory /etc/apache2/sites-available. To activate a site (virtual host), you can use the a2ensite command:

    # a2ensite default
    # /etc/init.d/apache2 restart

There is a corresponding a2dissite command for disabling a site:

    # a2dissite default
    # /etc/init.d/apache2 restart

Typically, if you only run one web site on your server, apache2 uses the default virtual host. The configuration of the default site is in the file /etc/apache2/sites-available/default.
After you enable this site, if you look at /etc/apache2/sites-enabled/, you will find a symbolic link called 000-default. Using the configuration of the default site as a model, you can easily create other virtual hosts.

2.10.3 CGIs

Next, we discuss how CGIs work with apache2. CGI defines a way for a web server to interact with external content-generating programs, which are often referred to as CGI programs or CGI scripts. This is one of the simplest ways of creating dynamic content on your web site. In the case of apache2, in order to get your CGI programs to work properly, Apache must be configured to permit CGI execution. In Code 2.5 you can observe the following configuration line:

    ScriptAlias /cgi-bin/ /var/www/cgi-bin/

This tells Apache that any request for a resource beginning with /cgi-bin/ should be served from the directory /var/www/cgi-bin/ and should be treated as a CGI program. For example, if the URL http://localhost/cgi-bin/datecgi.sh is requested, Apache will attempt to execute the file /var/www/cgi-bin/datecgi.sh and return its output. Of course, the file has to exist, be executable and produce correct output (e.g. an HTML document), or apache2 will return an error message. You can use Code 2.6 for datecgi.sh.

    #!/bin/sh
    echo "Content-type: text/html"
    echo
    echo "<html> <body>"
    echo -n "The current date is "
    date
    echo "</body> </html>"

Code 2.6: Simple CGI script with Bash

In Code 2.7 you have another example of a CGI (in C) that multiplies two numbers.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *data;
        long x, y;

        /* The web server passes the query string of the URL in this variable. */
        data = getenv("QUERY_STRING");
        /* A CGI response starts with the headers, followed by an empty line. */
        printf("Content-type: text/html\n\n");
        printf("<html><body>\n");
        printf("<h1>MULTIPLICATION</h1>\n<hr>\n");
        if (data == NULL)
            printf("<P>ERROR: No query string received</P>");
        else if (sscanf(data, "x=%ld&y=%ld", &x, &y) != 2)
            printf("<P>ERROR: Invalid Arguments</P>");
        else
            printf("<P>The product of x=%ld and y=%ld is z=%ld</P>", x, y, x * y);
        printf("</body></html>\n");
        return 0;
    }

Code 2.7: Simple CGI in C

2.10.4 Modules

To manage modules, Debian-based distros use two directories: /etc/apache2/mods-enabled and /etc/apache2/mods-available. To activate a module, use the a2enmod command:

    # a2enmod userdir
    # /etc/init.d/apache2 restart

The previous a2enmod command creates symbolic links in the mods-enabled directory. Likewise, to disable the "userdir" module you can type:

    # a2dismod userdir
    # /etc/init.d/apache2 restart

The "userdir" module is quite useful because it gives users a default place to set up their own web pages. As a system user, you just create a subdirectory called "public_html" in your home directory and place your files and HTML documents there. You can test this module locally with a browser using the following URL:

    http://localhost/~username/

Here "username" is your user name, and of course you can use an IP address instead of "localhost" to connect to your personal site remotely. If an "access denied" message appears in your browser, this might be because apache2 runs as the system user www-data and your "public_html" directory is not readable by www-data. In general, all the resources on the server must be readable by the user www-data.

2.11 Commands summary

Table 2.4 summarizes the commands used within this section.
Table 2.4: Commands for WWW.

    Command     Description
    firefox     A WEB browser.
    apache2     An HTTP server.
    a2enmod     Enable an Apache 2.0 module.
    a2dismod    Disable an Apache 2.0 module.
    service     Start, stop, restart, etc. services (daemons).
    a2ensite    Enable an Apache 2.0 WEB site (Virtual Host).
    a2dissite   Disable an Apache 2.0 WEB site (Virtual Host).

2.12 XML

2.12.1 Introduction

XML (eXtensible Markup Language) defines a set of rules for encoding documents in a readable form. An XML document is a "text" file, i.e. a string of characters encoded with UTF-8 or with an ISO standard like ISO-8859-1 (Latin1).

The characters which make up an XML document are divided into markup and content. All strings which constitute markup either begin with the character "<" and end with a ">", or begin with the character "&" and end with a ";". Strings which are not markup are content.

In particular, a tag is a markup construct that begins with "<" and ends with ">". Tags come in three flavors:

• start-tags, for example <section>
• end-tags, for example </section>
• empty-element tags, for example <line-break />

Another special component in an XML file is the element. An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start-tag and the end-tag, if any, are the element's content. The element content may also contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. A more elaborate example is the following:

    <person>
      <nif>46117234</nif>
      <name>
        <first>Peter</first>
        <last>Scott</last>
      </name>
    </person>

Finally, the attribute of an element is a markup construct consisting of a name="value" pair that exists within a start-tag or empty-element tag.
For example, the above person record can be modified using attributes to add the age and the gender of the person:

    <person age="17" gender="male">
      <nif>46117234</nif>
      <name>
        <first>Peter</first>
        <last>Scott</last>
      </name>
    </person>

2.12.2 XML Comments

You can use comments to leave a note or to temporarily disable a portion of XML code. Although XML is supposed to be self-describing data, you may still come across cases where an XML comment is necessary. XML comments have exactly the same syntax as HTML comments: they start with "<!--" and end with "-->". Below is an example of a comment used to leave a note to yourself or to someone else who may read your XML.

    <person age="17" gender="male">
      <!-- Peter is a really nice person -->
      <nif>46117234</nif>
      <name>
        <first>Peter</first>
        <last>Scott</last>
      </name>
    </person>

2.12.3 Escaping

XML uses several characters in special ways as part of its markup, in particular the less-than symbol (<), the greater-than symbol (>), the double quotation mark ("), the apostrophe ('), and the ampersand (&). But what if you need to use these characters in your content, and you don't want XML processors to treat them as part of the markup? For this purpose, XML provides escape facilities for including characters which are problematic to include directly. These escape facilities, which reference problematic characters or "entities", are written with an ampersand (&) and a semicolon (;).
There are five predefined entities in XML:

• &amp; refers to an ampersand (&)
• &lt; refers to a less-than symbol (<)
• &gt; refers to a greater-than symbol (>)
• &apos; refers to an apostrophe (')
• &quot; refers to a quotation mark (")

For example, suppose that our XML file should contain the following text line:

    <command>echo "1" > /proc/sys/net/ipv4/ip_forward</command>

Strictly speaking, the XML specification only forbids a literal "<" or "&" in element content, but a bare ">" can easily be mistaken for the end of a tag. To avoid any confusion with the greater-than character, it is safer to write:

    <command>echo "1" &gt; /proc/sys/net/ipv4/ip_forward</command>

In the same way, the quotation mark (") is problematic if you need to use it inside an attribute value delimited by quotation marks. In this case, you have to escape it as &quot;. Notice, however, that escaping the quotation mark is not necessary in our previous example, since the quotation mark appears inside the content of the element (and not in the value of an attribute).

2.12.4 Well-formed XML

A "well-formed" XML document is a text document that satisfies the list of syntax rules provided in the XML specification. The list of syntax rules is fairly lengthy, but some key rules are the following:

• The document contains only properly encoded legal Unicode characters.
• None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
• The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.
• The element tags are case-sensitive; the beginning and end tags must match exactly.
• Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~ nor a space character, and cannot start with - (dash), . (point), or a numeric digit.
• There must be a single "root" element that contains all the other elements.

2.12.5 Valid XML

In addition to being well-formed, an XML document may also be required to be "valid".
This means that all the elements and attributes used in the XML document must be in the set defined in the language specification and must be used correctly. For example, if we define a language specification for a person registry, we can define the elements: person, nif, name, first, last. We can also define the person attributes age and gender, and the type of value for each attribute (e.g. the age attribute is an integer number and the gender attribute takes a value from the set {male, female}). We might also define the order in which elements can appear and the nesting rules.

To address all these issues, XML defines a special file called a "Document Type Definition" (DTD) file. A DTD file defines an XML specification language, including all the elements, attributes and grammatical rules. The DTD file is then used by XML processors to check whether an XML document is "valid". In Code 2.8 we show the beginning of the DTD file of the VNUML language.

    <!-- VNUML DTD version 1.8 -->
    <!ELEMENT vnuml (global,net*,vm*,host?)>
    <!ELEMENT global (version,simulation_name,ssh_version?,ssh_key*,automac?,netconfig?,vm_mgmt?,
                      tun_device?,vm_defaults?)>
    <!ELEMENT vm_defaults (filesystem?,mem?,kernel?,shell?,basedir?,
                           mng_if?,console*,xterm?,route*,forwarding?,user*,filetree*)>
    ...

Code 2.8: Beginning of the VNUML DTD file.

In the previous DTD file we can see several quantifiers. A quantifier in a DTD file is a single character that immediately follows the item to which it applies, and restricts the number of successive occurrences of that item at the specified position in the content of the element. The quantifier may be:

• + for specifying that there must be one or more occurrences of the item. The effective content of each occurrence may be different.
• * for specifying that any number (zero or more) of occurrences are allowed. The item is optional and the effective content of each occurrence may be different.
• ? for specifying that there must not be more than one occurrence. The item is optional.
• If there is no quantifier, the specified item must occur exactly once at the specified position in the content of the element.

2.13 JSON

JSON (JavaScript Object Notation) is a data-interchange format. Its power is its simplicity. Data is expressed as name/value pairs: a name (a string) and a value, which can be a string, a number, an object, an array, a boolean or null. The name and the value are separated by a colon ':'. Strings are always double-quoted. Example:

    "mystring" : "HelloWorld"

    "mynumber" : 123

Objects are elements that encapsulate one or more name/value pairs. They are delimited by '{' and '}', and the pairs are separated from one another with a comma ',', but the last pair is never followed by a comma. Example:

    {
      "string1" : "Hello",
      "string2" : "World",
      "number1" : 1
    }

Arrays are lists of elements. Arrays are delimited by '[' and ']', their elements are also separated with commas ',' and, as with objects, the last element is never followed by a comma. The elements of an array can be strings, numbers, booleans, objects or other arrays, but not bare name/value pairs. Example:

    [
      "Alice",
      "Bob",
      {
        "name": "Carla"
      }
    ]

You can form nested structures by combining objects and arrays. Example:

    [
      {
        "string1" : "HeollWlrod",
        "spelling-checked" : false
      },
      {
        "string1" : "HelloWorld",
        "spelling-checked" : true
      }
    ]

You can define an object or array as the value of a pair:

    "message" : {
      "from" : "Bob",
      "to" : [
        "Alice",
        "Carla"
      ],
      "body" : "Hello Alice! Hello Carla!"
    }

Note: JSON messages are normally written on multiple lines, with indentation to clarify their structure, but this is not mandatory. The following message is equivalent:

    "message":{"from":"Bob","to":["Alice","Carla"],"body":"Hello Alice! Hello Carla!"}

Note: A name/value pair on its own is not a complete JSON document, so most validators report an error for a message like the previous example. To solve this you can simply wrap it inside an object or an array:

    {"message" : {
        "from" : "Bob",
        "to" : [
          "Alice",
          "Carla"
        ],
        "body" : "Hello Alice! Hello Carla!"
      }
    }

JSON is named after JavaScript because it uses the same syntax to encapsulate data: you can look at JSON objects as JavaScript dictionaries and JSON arrays as JavaScript arrays. The same happens with Python, except for the boolean values, which in Python are capitalized (True and False).

Chapter 3

Restful Architectural Style

Representational State Transfer (REST from now on) is not a protocol or a standard. REST is an architectural style for distributed hypermedia systems defined by Dr. Roy Fielding in his PhD dissertation: Architectural Styles and the Design of Network-based Software Architectures [1].

3.1 REST motivation

REST was designed to improve the modern web architecture and help solve some existing problems. Here are some of the desired requirements:

• A system that provides a universally consistent interface to structured information, available on as many platforms as possible.
• Simplicity: all of the protocols are defined as text.
• Extensibility: a system must be prepared for change.
• Systems must be designed for large-grain data transfer.
• The architecture must minimize network interactions.
• The architectural elements must be able to continue operating when they are subjected to an unanticipated load or when given malformed or maliciously constructed data.
(Architectural elements refer to all the elements participating in the connection, so the client is included in this requirement.)
• To have a safe set of operations with well-defined semantics.
• A system must be prepared for gradual and fragmented change. Old and new implementations co-exist.
• The architecture must be designed to ease the deployment of architectural elements.

When REST was designed, the web architecture was already widely deployed and had significant limitations in its support for extensibility, shared caching and intermediaries. The big problem was how to create a new architectural style that added the requirements listed above without causing a major change to the properties that had allowed the web to grow exponentially.

The solution that Dr. Fielding found was to take the deployed web architecture, study the constraints responsible for its properties and add a new set of constraints to create a modern web architecture. To learn more about the justification and thought process behind these ideas you can read the fourth chapter of Dr. Fielding's dissertation.

3.2 REST Constraints

In this section you can find the constraints (or rules) added by Dr. Fielding and a brief explanation of each of them. Dr. Fielding defines these constraints in a very general way so that the result can be applied to multiple systems and protocols. The scope of this document, however, is to describe REST's usual deployment: over HTTP. In each section, after the description of the constraint, you will find under the title In practice a description of how you can apply the rule to the development of APIs over HTTP.

3.2.1 Client-server

The first constraint is to apply the main principle of the client-server architectural style: separation of concerns. Separating user interface concerns from data storage concerns makes the user interface portable across multiple platforms and improves scalability. It also allows the components to evolve independently.
In practice: URIs are responsible for the separation between user interface and data storage. A URI is just a text string with a standard format that allows you to name a resource based on its location on the web.

    protocol://hostname/directory/resource

Other information can also be present in the URL:

    protocol://username:password@hostname:port/directory/resource

URLs are specified in detail in RFC 1738 [9]. Some examples are:

• http://www.example.com/pictures/upc.jpg
• http://www.example.com
• http://192.168.0.5
• http://www.example:8080/cgi-bin/time.sh
• http://user:[email protected]/
• ftp://debian.org

Dr. Fielding doesn't specify any rule for defining identifiers in his dissertation. However, some rules can be deduced from other constraints. For example, every resource has one or more representations, but the representation shouldn't be part of the resource identifier: if you want to GET an image, its identifier will traditionally look like '/img/myimg.png', but a better way to GET the image is to request '/img/myimage' and send an 'Accept' header in the request specifying the desired media type.

Apart from this constraint, there is a general trend, promoted by the W3C, towards readable URIs with a hierarchical structure, which can be helpful [7].

Example: A service that provides information about restaurants. Every restaurant is identified as a resource. You can also have a resource that lists them (or a subset of them), a different resource that lists the ones near a certain zip code, and finally a resource that represents the menu of a restaurant (the definition of a resource is explained in section 3.2.4).

    http://myrestaurantsite.org/Restaurants/
    http://myrestaurantsite.org/Restaurants/{Restaurant Name}
    http://myrestaurantsite.org/Restaurants/{Restaurant Name}/Menu
    http://myrestaurantsite.org/Restaurants/near/{ZIPCODE}

As you can see, these URIs are self-explanatory and don't require further explanation. Keep in mind that this only helps with understanding the API structure and (maybe) with URI parsing; it is not a REST requirement (and has nothing to do with the 'visibility' property). In fact, following the REST constraints, clients should not use fixed or pattern-generated URIs but should retrieve the URIs from the server (see section 3.2.4).

3.2.2 Stateless

In the client-server architecture a server serves many clients. This can be a problem if the server needs to store data about each client's state, because as the number of clients grows the resources required also grow and the response time is affected. To solve this, all communications must be stateless. This implies that each request must contain all the information the server needs to process it, and that the session information must be stored on the client.

Stateless systems have better scalability, visibility and reliability: scalability because the server needs fewer resources for each client, visibility because each request can be monitored independently (it contains all the information required), and reliability because it is easier to recover from partial failures.

On the other hand, having a stateless server can decrease network performance because of the repetitive data sent in a series of requests.

In practice: The stateless constraint has two sides: the protocol side and the server side.

First of all, HTTP is a stateless protocol, so the protocol side is covered, provided that you do not use cookies to keep session state on the server. Cookies are not defined in the original HTTP standard. They were defined some time later to meet the need of some applications to store sessions [11], but they are widely used.
On the server side, statelessness does not depend on any particular technology; it depends on each developer's implementation. The key aspect of this constraint is that it applies only to the server, so the way to implement a stateful application is to transfer the application state to the client.

Example: Imagine you want to develop an API for an online shop. A client might want to buy multiple products, and many online shops implement a virtual cart where the application stores the items that you have already chosen. In a REST API, the shop would not keep track of the items that the client has visited or chosen. Instead, the client should keep that record and, at check-out (however it is implemented), send the list of items that it wants to buy.

3.2.3 Cache

Caching is a method to improve network performance. By labeling data as cacheable or non-cacheable we allow the client (or another network element) to reuse information retrieved from the server in previous requests. Of course, this can cause reliability problems if the cached data differs significantly from the data currently on the server, so the decision of whether some data is cacheable or not is crucial.

Implementing a cache-capable system improves efficiency, scalability and user-perceived performance.

In practice: From the point of view of messages, HTTP has headers that define the caching strategy that a client or a server must follow for that specific message. You can find those headers detailed in section 2.9.2.

Caching can be applied not only at the clients but also at intermediary nodes on the network. The problem is that an intermediary can only cache traffic that it can read, so shared caches work only for non-encrypted connections. Therefore, if an application needs encryption, the cache-capable nodes have to be located behind the security layer.

From the network point of view, there are many available proxy server technologies that implement caching.
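As a rough illustration of the message side, the following sketch shows how a client or an intermediary might decide whether a stored response can be reused, looking only at the Cache-Control header. It is a simplification under stated assumptions: the function names parse_cache_control and may_reuse are hypothetical, only the max-age, no-store, no-cache and private directives are considered, and a real cache also handles validators, s-maxage, Expires and heuristic freshness.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name] = arg.strip('"') if arg else None
    return directives


def may_reuse(cache_control, age_seconds, shared_cache=False):
    """Return True if a stored response may be served without contacting the server.

    Simplified rules (assumptions of this sketch):
    - no-store: never reuse; no-cache: reuse would need revalidation, so refuse here.
    - private: a shared (intermediary) cache must not reuse the response.
    - otherwise the response is fresh while its age is below max-age.
    """
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # no explicit lifetime: be conservative
    return age_seconds < int(max_age)
```

For instance, may_reuse("private, max-age=60", 30) is true for the client's own cache, while the same call with shared_cache=True is false, which mirrors the distinction between client caches and intermediary caches made above.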
Important: The decisions taken when defining resources may affect the ability of some systems to apply the defined cache rules properly. For example, some resources return different information depending on the query string included in their URI. The most typical examples are search resources:

    http://mysearchsite.com/search?q=news&order_by=date

Some cache intermediaries may consider this kind of URI non-cacheable regardless of the cache headers included in the response. Squid, for example, handles query string caching since version 2.7.

3.2.4 Uniform interface

This constraint is probably the most important one. It is the one that defines the very nature of REST. REST requires applying the software engineering principle of 'generality to the component interface'. This allows implementations to be decoupled from the services they provide.

REST defines four interface constraints:

• Identification of resources
• Manipulation of resources through representations
• Self-descriptive messages
• Hypermedia as the engine of application state

Identification of resources: REST uses resource identifiers to map its resources. REST requires the author to choose the resource identifier that best fits the nature of the concept being identified. The author is also responsible for maintaining the semantic validity of the mapping over time.

Resource representation: REST resources are not transferred. Instead, a representation of the resource is transferred. A representation is a sequence of bytes containing the information exchanged, plus additional metadata that describes the sequence. For example, if your resource is an email stored in a database, you could represent it as three strings named 'from', 'to' and 'message' in JSON format, but you would never send the whole row of the database table that contains it. Representations must match one of an evolving set of standard data types.
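The email example can be sketched in a few lines of Python. This is only an illustration: the column names (sender, recipient, body, spam_score) and the function name email_representation are assumptions made for the sketch, not part of any real schema.

```python
import json


def email_representation(row):
    """Build the JSON representation of an email from a database row.

    Only the fields that make sense to the client are exposed; internal
    columns (identifiers, flags, scores...) never leave the server.
    """
    representation = {
        "from": row["sender"],
        "to": row["recipient"],
        "message": row["body"],
    }
    return json.dumps(representation)


# Hypothetical row, as it could be returned by a database query.
row = {
    "id": 42,
    "sender": "bob@example.com",
    "recipient": "alice@example.com",
    "body": "Hello Alice!",
    "spam_score": 0.1,
}
print(email_representation(row))
```

The resource (the stored email) and its representation (the JSON text) are kept apart, so the table layout can change without affecting clients as long as the representation stays the same.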
Self-descriptive messages: REST messages must use standard methods and media types to indicate semantics. Combined with the previous 'stateless' constraint, the result is self-descriptive messages.

Hypermedia as the engine of application state: Since the 'stateless' constraint prevents servers from storing application state, it must be kept on the client side. This can be achieved by making the server send the client the set of choices available in its current state. Dr. Fielding argues in his blog:

"A REST application should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API). From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations or implied by the user's manipulation of those representations."

Defining a uniform interface improves the visibility of interactions and simplifies the overall system architecture. On the other hand, a uniform interface may degrade efficiency, since some applications could work optimally under different conditions.

In practice: Everything is a resource in REST APIs: files, raw information, database query results, data produced by an algorithm, etc. are all the same from the point of view of the client. It is the developer's task to define which resources are important from the client's perspective and how to treat those diverse elements as resources.

If we look at resources from the perspective of class-based programming languages, resources can be seen as classes. They have attributes, which are used to represent the resource, and methods, which allow us to interact with it.
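To make the class analogy concrete, here is a minimal sketch in Python. The class name PersonResource, its attributes and the status codes it returns are illustrative assumptions; no web framework is involved, and the methods simply mirror the HTTP verbs that a real server would dispatch to.

```python
import json


class PersonResource:
    """Hypothetical resource: attributes feed the representation,
    methods correspond to the HTTP verbs the resource accepts."""

    def __init__(self, uri, first_name, last_name):
        self.uri = uri
        self.first_name = first_name
        self.last_name = last_name

    def get(self):
        """GET: return a status code and a JSON representation."""
        body = json.dumps({
            "first_name": self.first_name,
            "last_name": self.last_name,
            "self": self.uri,
        })
        return 200, body

    def put(self, representation):
        """PUT: replace the state using the representation sent by the client."""
        data = json.loads(representation)
        self.first_name = data["first_name"]
        self.last_name = data["last_name"]
        return 204, ""


person = PersonResource("/people/1", "Peter", "Scott")
status, body = person.get()
```

The client never sees the object itself, only the representation returned by get, which is what keeps the storage details hidden.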
Example: If we develop an address book, the resources can be: a specific person, a telephone number, an address, lists of people, etc. A specific person will contain a telephone number, one or more addresses, a name, etc. A list of people will contain multiple person resources. We are not interested in how these resources are obtained. They might be stored in databases or in separate files, but from the client's perspective it doesn't matter.

Sometimes it is hard to turn services or functionalities into resources. For example, continuing with the online shop example: imagine that you want your API to allow a user to buy a list of things, pay for them and keep track of them until the shipping company takes over. You could, for example, define a resource that represents a list of orders, a resource that defines the payment information for an order and a resource that represents the shipment information for a paid order.

The client would look for products in the API and build a list of the ones that the user wants to buy. Once the list is finished, the client would POST it to the list-of-orders resource. If everything went well (the list is well formed, there is enough stock for all the products, etc.) the API should return a 201 status code (Created) and a Location header that points to the payment information resource. The client would GET the payment information and pay the bill by sending a PUT request with the bank account information so that the store can collect the money. Once the payment is done, another 201 status code is returned, with a Location header pointing to a new resource that represents the shipment information. The order is also removed from the orders list.

Resource representation: As stated before, resources are never transferred; instead, a representation of them is sent to work with. The developer must decide and document which formats the representations have. As Dr. Fielding stated in his blog:

"A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types"

Example: In the previous address book we could define a person representation in JSON format like this:

    [
      {
        "person":{
          "first_name":"STRING",
          "last_name":"STRING",
          "telephone_number":"URI pointing to telephone resource",
          "addresses":[
            {
              "address_name":"STRING",
              "address":"URI pointing to address 1"
            },
            {
              "address_name":"STRING",
              "address":"URI pointing to address n"
            }
          ]
        }
      }
    ]

A good practice regarding resource representations is to define custom Content-Type values for your representations instead of using the generic ones; this eases the task of parsing the payload of your messages. The vendor tree of the application MIME type (the 'vnd.' prefix) allows you to define your own non-standardized formats. If you define it like this:

    application/vnd.company.category+format

you are defining not only the format but also the kind of data it contains and its structure.

Example: For the previous example you could have defined its Content-Type in the usual way:

    Content-Type: application/json

But a much better option would have been:

    Content-Type: application/vnd.yourcompanyname.person+json

If you want to standardize it and make it public so it is accessible all around the world, you can use IANA's service (http://www.iana.org/cgi-bin/mediatypes.pl).

Self-descriptive messages

Self-descriptive messages imply two conditions: the use of well-known semantics, and the need to send not only a request but also metadata that describes the message.

The semantics in HTTP/1.1 are defined by HTTP method names and status codes.
You can look up the methods available in HTTP in section 2.9.7 and learn about status codes in section 2.9.8.

HTTP/1.1 also defines headers that contain the metadata needed for a message to be self-descriptive. There are three types of headers: Request-only, Response-only and Request-and-Response. You have a list of them in section 2.9.2.

Hypermedia as the engine of application state

From Wiktionary:

    Noun

    hypermedia (uncountable)

    (computing) The use of text, data, graphics, audio and video as elements of an
    extended hypertext system in which all elements are linked so that the user can
    move among them at will.

Hypermedia is a superset of hypertext. Hypermedia refers to any kind of data that may be transferred and that contains interconnections between resources.

Since in this example we are using URIs as resource identifiers, the interconnections between resources, and the driving of the application state, will go through URIs.

Example: Remember the previous restaurant example. We can access its bookmark at '/' and we may get:

    <link href="/Restaurants/" ref="List" />

Then, if we perform a GET on 'Restaurants', we could get:

    <Restaurant>
      <Name>My Restaurant</Name>
      <Location>123 Fake street</Location>
      ...
      <link href="/Restaurants/168498/" ref="Restaurant" />
    </Restaurant>

After that we could follow the link provided and we might get:

    <RestaurantDetail>
      <Menu>
        <link href="/Restaurants/168498/Menu" ref="Menu" />
      </Menu>
      <Reserve>
        <link href="/Restaurants/168498/" ref="Reserve" />
      </Reserve>
    </RestaurantDetail>

Or, if the restaurant doesn't allow on-line reservations:

    <RestaurantDetail>
      <Menu>
        <link href="/Restaurants/168498/Menu" ref="Menu" />
      </Menu>
    </RestaurantDetail>

You can think of your API as a finite state machine.
When a client accesses the API's bookmark it is in the initial state, and you can guide it through the application by providing links to the next possible resources in the application flow, depending on the last resource that it accessed. For example, in figure 3.1 you can see a representation of a finite state machine that describes the restaurant API. The nodes are not resources but actions performed on those resources (you can see how easy it is to relate them to HTTP verbs).

[Figure 3.1: Finite state machine representation. States: Bookmark, Restaurant List Shown, Restaurant Details Shown, Menu Shown, Restaurant Added, Menu Modified. Transitions: Show list, Show details, Show menu, Add restaurant, Add a menu, Modify menu.]

3.2.5 Layered System

The most basic server-side architecture is a central node that responds to all the incoming requests. This solution is obviously not optimal. REST proposes layered systems, which add hierarchical layers where no component can see beyond the layer it directly interacts with.

A layered system offers scalability improvements, since it opens the possibility of load balancing. It also makes it possible to place multiple caches at strategic points in order to boost performance.

The main disadvantage is that layers add overhead and latency to the processing of data.

In practice: Let's take a look at a possible server-side arrangement.

[Figure 3.2: Layered system example, showing Zone 1, Zone 2 and Zone 3 (core server).]

As you can see, there are three zones:

• Zone 1: Authentication zone. It handles authentication and, since it is the edge zone, it also decrypts the incoming requests.
• Zone 2: Proxy zone. It caches data. If a request can be answered with previously stored data, it is answered in this layer.
• Zone 3: Core zone. It handles the remaining requests.

When defining layered systems, the important rule to remember is to move as many tasks as possible to the outer layers.
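The three zones can be imitated with plain Python functions, where each layer only knows the layer immediately behind it. This is a toy sketch under stated assumptions: the function names, the token check and the cache keyed only by path are all hypothetical simplifications; a real deployment would use separate network nodes and would honour the HTTP cache headers.

```python
def core_server(request):
    """Zone 3: the core produces the actual response."""
    return "200 OK: content of " + request["path"]


def caching_layer(next_layer):
    """Zone 2: answer from previously stored data when possible."""
    store = {}

    def handle(request):
        path = request["path"]
        if path not in store:          # cache miss: ask the inner layer
            store[path] = next_layer(request)
        return store[path]

    return handle


def authentication_layer(next_layer, valid_tokens):
    """Zone 1: reject unauthenticated requests at the edge."""

    def handle(request):
        if request.get("token") not in valid_tokens:
            return "401 Unauthorized"
        return next_layer(request)

    return handle


# Compose the layers from the outside in; each one sees only its neighbour.
api = authentication_layer(caching_layer(core_server), valid_tokens={"secret"})
response = api({"path": "/Restaurants/", "token": "secret"})
```

Because rejected and cached requests never reach core_server, the outer layers absorb work that the core would otherwise have to do, which is the rule stated above.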
3.2.6 Code on demand Code on demand is an optional constrain within REST. REST allows the client to download and execute code in the form of applets or scripts in order to extend it’s functionality. This constrain improves system extensibility but at the same time it reduces visibility, and that’s the reason behind it’s optional condition. In practice: As seen before, code on demand may reduce server load and improve network performance but you have to be careful. If the connection is attacked with a Man-In-The-Middle attack2 the attacker could send malicious code to the client. That’s why the code should be interpretable and not compiled in order to be able to determine whether the code is safe or not. Some common used scripting languages3 are JavaScript or Python. 3.3 How to design your APIs In this section you’ll find a methodology that can be helpful when applied to the API design. The ideas developed in the next sections will be exemplified using a common example: A flight booking API. 3.3.1 Define functionalities The design of an API must be started with a top down thinking process. You need to start by listing what functionalities should your API offer and as much as possible forget about the implementation of this functionalities. It should be done from the point of view of the client: What does the client need? What would be useful for a client? You should avoid at all cost to think about how would you implement the functionalities or how this functionalities match to resources because if you can’t stay away from this ideas you might fail to accomplish the Uniform Interface constrain. Example: In this flight booking API we’ll define only one functionality to keep it simple: to access a list of flights. You have to set a day, an origin and a destination and the API shall return the list. 3.3.2 Define your resources Once you’ve thought which functionalities your API must offer, the next part is to map each on of them into one or many resources. 
It's important to keep in mind the fourth constraint, specifically its first two conditions: you must define resources that are identifiable and have a representation.

Example: You could create a resource that parses parts of the URI to query a database, such as:

    http://api.myflightcompany.com/flight/{from}/{to}/{yyyy}/{mm}/{dd}/

But then you would have to communicate this structure to the client and make it construct the URIs itself, instead of the server sending them.

Footnote 2: http://en.wikipedia.org/wiki/Man-in-the-middle_attack
Footnote 3: http://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Scripting_languages

A better solution is to create a query resource, which takes a standard data type that contains the date, the origin city and the destination city. When the client performs a POST request on this resource, the server processes the information, queries the different companies that the API works with and generates a new resource that contains a list of flights. The server then communicates the new resource's URI to the client.

This strategy allows the server to implement caching at the application level. If a client POSTs a query and a new resource is generated, this resource may have a validity time and can be reused to answer other client requests that arrive within a certain period of time.

3.3.3 Define resource representation

You basically need to define which data is exchanged in every interaction with the server. This has to be communicated to the client so that it can be prepared for it. When designing the data structure, keep in mind that although REST APIs are built to avoid coupling between client and server and to increase their independence, a change in the data representation is one thing that REST APIs are not prepared to deal with, and it may force major changes in the client.
Example: The data structure in JSON for a POST request to the query resource could be:

    {
        "query": {
            "from": "string",
            "to": "string",
            "date": "yyyy-mm-dd",
            "company": "string" or null
        }
    }

Or it could be completely different. You could have defined the 'from' and 'to' fields as integers that represent a city. In that case you should also create a resource that represents the list of cities and relates each city name to its integer identifier.

3.3.4 HATEOAS

After the representation data has been decided, you should draw up the HATEOAS flow chart. This means creating different states depending on the last action that the client has performed. The client is then offered a list of valid links to the resources that it can access next to follow the application flow. If the format that you've chosen does not define a standard type for links, you'll have to create your own. For every link you'll have to define what kind of resource it points to.

Example: In this example there are only three possible states: Bookmark, Query posted and Flight list shown. The application flow should be like the one shown in figure 3.3.

[Figure 3.3: Flight finite state machine representation. States: Bookmark, Query posted, Flight list shown. Transition labels: Post a query, Post a new query, Good query, Show the flight list, Post another query.]

3.3.5 Cache

The next step is to define which information can be cached and which cannot. The decision on how long a resource may be cached must be based on how much the resource varies over time and on the risk to the client of working with stale data. You'll also need to settle other cache-related questions, such as whether the data is private or not. If it's private, it can't be stored in public caches.

Example: The query resource responds only to POST requests. Since POST is not a cacheable method, the query resource cannot be cached.
However, the resources created when a query request arrives can be cached, and the validity time of the data will depend on the data they carry. For example, if they contain the number of free seats remaining on the plane, the validity time will be short; if they don't, it will be longer.

3.3.6 Implement your API

Once you've fully designed your API functionalities, the resources that will offer those functionalities, the structure of the representation data, and the list of cacheable and non-cacheable resources, it's time to implement your API. Keep in mind the part of the Uniform Interface constraint that requires self-descriptive messages: add as much metadata as possible by using the HTTP headers correctly. Also remember to include in the responses the links that drive the application state.

In chapter 6 you'll find a concise explanation of the Django framework, documentation about how to implement REST APIs with it, and examples to clarify ideas.

Chapter 4
REST Practices

4.1 Interface exercises

Exercise 1: Suppose a given fully developed newspaper article API. All the resources are given.

Resource List:
1. Bookmark: A list of URIs pointing to the existing resources.
2. Article: It represents an article. It contains information about the newspaper which published the article, its author, and the article headline and body.
3. Author: It represents an author. It contains contact information about the author, the newspaper that the author writes for, etc.
4. List of articles: An author's list of publications.
5. Newspaper: It represents a publishing authority. It contains information about the authority (name, location, list of authors, brief description, etc.).
6. List of newspapers: A list containing all the newspapers.
7. Search by date: It represents a list of articles written on a date sent in the body payload.
8. Search by newspaper: It represents a list of articles published by the newspaper whose representation is sent in the body payload.
9. Most recent articles: Lists the most recent articles.

Exercise 1.1: Write a URI for each of the previous resources.

Exercise 1.2: For each resource, list all the HTTP methods that you'd implement and describe what they would do.

Exercise 1.3: Describe the application's flow diagram for a client that wants to read the articles of a certain author.

Exercise 1.4: For each resource, think of at least one resource representation and specify it (remember that with every representation you must send the hypermedia that the client needs in order to follow the application flow). Also define a private MIME type for each resource representation. For example, the bookmark could return a JSON representation that looks like this:

    {
        "articlelistlink": "URI to `List of Articles` resource",
        "newspaperlistlink": "URI to `List of Newspapers` resource",
        "searchbydatelink": "URI to `Search by Date` resource",
        "searchbynewspaperlink": "URI to `Search by Newspaper` resource",
        "recentarticleslink": "URI to `Most recent news` resource"
    }
    Content-Type: application/vnd+example.bookmark+json

Exercise 1.5: You can see how the 'Search by date' and the 'Search by newspaper' resources are very similar. How could you make them point to the same resource (a search engine) using query parameters in the URI?

Exercise 2: In this exercise we will practice the topics explained in section 3.2.4. We will start from a defined list of resources for a concrete API and work from there. The API is invented and is meant to manage an automated house. The house can contain sensors and actuators; for example, you can have temperature and light sensors, and heaters and LEDs as actuators.

API resources:
• Bookmark: Contains a list of links to the existing resources.
• List of sensors: A list of all the active sensors showing their id number and the URI to access their information.
• List of actuators: A list of all the active actuators showing their id number and the URI to access their information.
• Sensor: It represents a sensor. It contains information about what the sensor is capable of reading (light, sound, etc.), the link to a resource that returns its current reading, its refresh rate and an identification number.
• Sensor value: A resource that represents the current reading of the sensor.
• Actuator: It represents an actuator. It contains a list of actions the actuator is capable of performing (start/stop, for example) and an identification number.
• List of rules: Lists all the defined rules of the system.
• List of active rules: Lists all the rules whose condition is true at the moment of the request.
• Rule: It is the link between sensors and actuators. A rule specifies two arrays: one for conditions and one for actuations. The conditions array contains a list of conditions, each of which defines the sensor to which it applies, the operand (bigger than, smaller than, equal, etc.) and the reference value. The actuations array contains a list of objects, each of which contains the actuator and the action. Rules can be accessed through their id number.

Exercise 2.1: Write a URI for each of the previous resources.

Exercise 2.2: For each resource, list all the HTTP methods that you'd implement and describe what they would be used for.

Exercise 2.3: Describe the application flow diagram for a client that wants to add a new rule to the system.

Exercise 2.4: For each resource, think of at least one resource representation and specify it (remember that with every representation you must send the hypermedia that the client needs in order to follow the application flow). Also define a private MIME type for each resource representation.
Exercise 2.5: Which cache-related property could be tied to the 'refresh rate' parameter of a sensor?

4.2 Cache exercises

Exercise 1 (Theoretical): Suppose there is a cache-ready system. It takes a request and, if there has been a previous request in the last T seconds, it returns the stored response; otherwise, it recalculates the response. The time spent generating a response from the stored data is x seconds and the time spent generating a new response from the resource is y seconds. Suppose also that the incoming requests follow a Poisson distribution with arrival rate λ (req/s).

HINT: Consider a fixed span of time (of duration T) and assume that the cached response expired just before this span, as you can see in figure 4.1. The arrows represent requests. The red one represents a request answered with a non-cached response and the black ones represent requests answered with cached responses.

[Figure 4.1: Cache model. Horizontal axis: t (s); the marked interval of length T (s) is the validity time.]

Exercise 1.1: Calculate the new average service time ($T_s' = 1/\mu$, in seconds).

Exercise 1.2: Calculate the relative improvement as $\frac{T_s - T_s'}{T_s}$, knowing that the old service time was $T_s = y$.

Exercise 1.3: Check that, supposing x = 1, y = 5, λ = 1 and T = 10, the new service time is 1.4 s and the relative improvement is 72%.

Exercise 2: Suppose a given fully developed shop API. All the resources are given.

Resource List:
1. Article: It represents an article. It contains its price, its description and the number of products in stock.
2. List of Categories: List of all the categories in the store.
3. List of Articles: List of all the articles existing in a certain category.
4. Shop Information: Plain information about the store such as legal name, physical shop location, etc.
5. My previous buys: Lists the last 5 purchases you've made in the shop.
6. Recommended for me: Lists 5 articles that are recommended for the user accessing the API, based on previous purchases and a random factor.
7. Sales: List of articles on sale.
In this example, the user-based resources ('My previous buys' and 'Recommended for me') return different data depending on the authentication header found. For a non-authenticated client, they return a 401 (Unauthorized).

Exercise 2.1: From the point of view of the client, which resources have the lowest variation rate and therefore should be cached?

Exercise 2.2: From the point of view of an intermediary cache, which resources should be stored? Remember that an intermediary cache may handle requests from multiple users.

Exercise 2.3: For each resource listed before, list all the headers that you'd add to its requests and its responses, based on the answers to the previous exercises.

Solution exercise 1:

1.1:

    Bookmark: '/'
    Article: '/article/{id}'
    Author: '/author/{id}'
    List of articles: '/author/{id}/articles/'
    Newspaper: '/newspapers/{id}'
    List of newspapers: '/newspaper/'
    Search by date: '/search/by_date/{dd-mm-yyyy}'
    Search by newspaper: '/search/by_newspaper/{newspaper_id}'
    Most recent articles: '/recent/'

1.2:

    Bookmark: GET
    Article: GET, DELETE, POST
    Author: GET
    List of articles: GET, POST
    Newspaper: GET
    List of newspapers: GET
    Search by date: GET
    Search by newspaper: GET
    Most recent articles: GET

1. The GET methods are always used to retrieve information.
2. DELETE on Article should be used to delete it ONLY if the client making the request is authenticated and has the right permissions.
3. POST on Article should modify its content ONLY if the client making the request is authenticated and has the right permissions.
4. POST on List of articles should be used to create a new article ONLY if the client making the request is authenticated and has the right permissions.
1.3:

    1- GET the List of Newspapers
    2- GET the newspaper
    3- GET the list of authors
    4- GET the list of articles
    5- GET the articles.

1.4:

1- Bookmark:

    {
        "articlelistlink": "URI to `List of Articles` resource",
        "newspaperlistlink": "URI to `List of Newspapers` resource",
        "searchbydatelink": "URI to `Search by Date` resource",
        "searchbynewspaperlink": "URI to `Search by Newspaper` resource",
        "recentarticlelink": "URI to `Most recent articles` resource"
    }
    Content-Type: application/vnd+example.bookmark+json

2- Article:

    {
        "newspaper": newspaper_id,
        "author": author_id,
        "article": {
            "headline": "String",
            "body": "String"
        }
    }
    Content-Type: application/vnd+example.article+json

3- Author:

    {
        "information": {
            "name": "String",
            "city": "String",
            "about_me": "String",
            ...
        },
        "newspaper": newspaper_id,
        "articleslink": "URI to this author's list of articles"
    }
    Content-Type: application/vnd+example.author+json

4- List of articles:

    [
        {
            "headline": "String",
            "articlelink": "URI to the article"
        },
        ...
    ]
    Content-Type: application/vnd+example.articlelist+json

5- Newspaper:

    {
        "information": {
            "name": "string",
            "location": "string"
        },
        "authors": [
            {
                "authorname": "String",
                "authorlink": "URI to the author resource"
            },
            ...
        ]
    }
    Content-Type: application/vnd+example.newspaper+json

6- List of newspapers:

    [
        {
            "name": "String",
            "newspaperlink": "URI pointing to the newspaper"
        },
        ...
    ]
    Content-Type: application/vnd+example.newspaperlist+json

7- Search by date:

    [
        {
            "headline": "String",
            "articlelink": "URI to the article"
        },
        ...
    ]
    Content-Type: application/vnd+example.articlelist+json

8- Search by newspaper:

    [
        {
            "headline": "String",
            "articlelink": "URI to the article"
        },
        ...
    ]
    Content-Type: application/vnd+example.articlelist+json

9- Most recent articles:

    [
        {
            "headline": "String",
            "articlelink": "URI to the article"
        },
        ...
    ]
    Content-Type: application/vnd+example.articlelist+json

Solution exercise 2:

2.1:

    Bookmark: '/'
    List of sensors: '/sensors/'
    List of actuators: '/actuators/'
    Sensor: '/sensors/{sensor_id}'
    Sensor value: '/sensors/{sensor_id}/value'
    Actuator: '/actuator/{actuator_id}'
    List of rules: '/rules/'
    List of active rules: '/rules/active'
    Rule: '/rules/{rule_id}'

2.2:

    1- Bookmark: GET - Retrieve the links to the resources.
    2- List of sensors: GET - Retrieve the list of sensors.
    3- List of actuators: GET - Retrieve the list of actuators.
    4- Sensor: GET - Retrieve a sensor's information.
               POST - Change the refresh_rate value.
    5- Sensor value: GET - Retrieve the current value of a sensor.
    6- Actuator: GET - Retrieve the information about an actuator.
    7- List of rules: GET - Retrieve the list of all the existing rules.
                      POST - Add a new rule.
    8- List of active rules: GET - Retrieve the list of active rules.
    9- Rule: GET - Retrieve a rule.

2.3:

    1- GET Bookmark. Store the URIs pointing to the resources.
    2- GET List of sensors. Store their ids.
    3- GET Sensors until finding the desired one.
    4- GET List of actuators. Store their ids.
    5- GET Actuator until finding the desired one.
    6- POST a new rule to the list of rules.
2.4: JSON Representation

1- Bookmark:

    {
        "sensorlistlink": "URI to `List of Sensors` resource",
        "actuatorlistlink": "URI to `List of Actuators` resource",
        "ruleslistlink": "URI to `List of Rules` resource",
        "activeruleslistlink": "URI to `List of active Rules` resource"
    }
    application/vnd+myexample.bookmark+json

2- List of sensors:

    [
        {
            "sensor_id": id_number,
            "sensorlink": "URI to the sensor with `id_number` identification"
        },
        ...
    ]
    application/vnd+myexample.sensorlist+json

3- List of actuators:

    [
        {
            "actuator_id": id_number,
            "actuatorlink": "URI to the actuator with `id_number` identification"
        },
        ...
    ]
    application/vnd+myexample.actuatorlist+json

4- Sensor:

    {
        "sensor_id": id_number,
        "magnitude": "String with the magnitude read by the sensor",
        "units": "String to know what units the sensor uses",
        "refresh_rate": number representing the interval in seconds between reads,
        "valuelink": "URI pointing to the current value of the sensor"
    }
    application/vnd+myexample.sensor+json

5- Sensor value:

    { "value": number }
    application/json

6- Actuator:

    {
        "actuator_id": id_number,
        "actions": [
            { "action": "One of the possible actions" },
            ...
        ]
    }

7- List of rules:

    [
        {Rule object},
        ...
    ]
    application/vnd+myexample.rulelist+json

8- List of active rules:

    [
        { "rule": {Rule object} },
        ...
    ]
    application/vnd+myexample.rulelist+json

9- Rule:

    {
        "rule_id": id_number,
        "conditions": [
            {
                "sensor": id_number,
                "operand": "string representing an operand",
                "value": number representing the reference value
            }
        ],
        "actuations": [
            {
                "actuator": id_number,
                "action": "String representing an action"
            }
        ],
        "active": true/false
    }
    application/vnd+myexample.rule+json

2.5:

The refresh_rate maps directly to the validity time of the data.
You could specify the 'Date' header alongside the 'max-age' Cache-Control directive, or simply the max-age, since the time at which the value was generated is not very important given its changing nature.

Solution 1.1:

Since a cache system has two different service times, to find the average service time we have to compute the expectation of the service time:

$$E[T_s] = \sum_{i=1}^{\infty} x_i \, p_i \tag{4.1}$$

For our particular case:

$$E[T_s] = x \cdot p_{cached} + y \cdot p_{nocached} \tag{4.2}$$

To find the probabilities of a response being cached or not we use frequency analysis:

$$f_i = \frac{n_i}{N} \tag{4.3}$$

In our case, N is the average number of arrivals in an interval of length T:

$$N = \lambda \times T \tag{4.4}$$

In an interval of length T there will be 1 non-cached response and N − 1 cached responses, therefore:

$$p_{cached} = f_{cached} = \frac{n_{cached}}{N} = \frac{N-1}{N}; \qquad p_{nocached} = f_{nocached} = \frac{n_{nocached}}{N} = \frac{1}{N} \tag{4.5}$$

Applying equation 4.2:

$$T_s' = E[T_s] = x \, \frac{N-1}{N} + y \, \frac{1}{N} \tag{4.6}$$

1.2:

Directly applying the given expression with $T_s = y$ and the result of 4.6 we obtain:

$$\frac{T_s - T_s'}{T_s} = \frac{y - \frac{y + x[N-1]}{N}}{y} \tag{4.7}$$

Rearranging the equation we get:

$$\frac{T_s - T_s'}{T_s} = 1 - \frac{1}{N} - \frac{x}{y} \, \frac{N-1}{N} \tag{4.8}$$

which is valid for $\frac{x}{y} < 1$.

1.3:

Simply substituting the values (N = λ × T = 10) into the equations we get:

$$T_s' = 1 \cdot \frac{10-1}{10} + 5 \cdot \frac{1}{10} = 1.4\ \text{s} \tag{4.9}$$

$$\frac{T_s - T_s'}{T_s} = 1 - \frac{1}{10} - \frac{1}{5} \cdot \frac{10-1}{10} = 0.72 = 72\% \tag{4.10}$$

Solution exercise 2:

2.1:

    Article: High variation due to the products-in-stock variable -> No Cache
    List of categories: Very low variation -> Cache
    List of articles: Low variation -> Cache
    Shop Information: Very low variation -> Cache
    My previous buys: Low variation -> Cache
    Recommended for me: High variation -> No Cache
    Sales: Medium variation -> Cache with low validity time

2.2:

All the resources that are not different for each customer and don't have high variation rates:
1. List of categories
2. List of articles
3. Shop information
4. Sales

2.3:

Article:

    Request:
    Cache-Control: no-cache (only if the information needed is critical)
    If-Modified-Since: -HTTP DATE-

    Response:
    Cache-Control: public
    Date: -HTTP DATE-
    Expires: -HTTP DATE close to the current date-

List of categories, List of articles, Shop information and Sales:

    Request:
    If-Modified-Since: -HTTP DATE-

    Response:
    Cache-Control: public
    Date: -HTTP DATE-
    Expires: -HTTP DATE fairly far from the current date-

My previous buys: Ideally, you'd store this list as long as possible, but right after buying an article the list should be updated.

    Request:
    Cache-Control: no-cache

    Response:
    Cache-Control: private
    Date: -HTTP DATE-
    Expires: -HTTP DATE fairly far from the current date-

Recommended for me:

    Request:
    Cache-Control: no-cache

    Response:
    Cache-Control: private
    Date: -HTTP DATE-
    Expires: -HTTP DATE neither very close to nor very far from the current date-

Chapter 5
Other API architectures

Now that we've analyzed in depth how to design REST APIs, let's look at other existing styles and protocols.

5.1 RPC APIs

Remote Procedure Call (RPC) was the first model created for developing distributed APIs. It is based on message exchanges and follows the service-oriented architecture. The client has to generate a message that identifies a procedure exactly and contains all the parameters that the procedure needs in order to work. Procedures can be subroutines, functions, methods, services, system calls or any executable object.
When the server receives the message, it inspects the procedure identifier and calls the indicated procedure, mapping the message parameters onto the procedure arguments.

The main problem with the RPC model is that the applications are highly coupled. One application is highly coupled to another when the first one depends strongly on the second one. In other words, if the procedure changes, the application has to change too. For example, if a procedure suddenly changes its output or the arguments it takes, the applications that call this service have to change immediately.

RPC is meant to work with any transport protocol. However, the application must be aware of which protocol is being used, because some adjustments may be required. For example, if the protocol used is not reliable (UDP), the application must implement its own time-out, retransmission and duplicate detection policies.

There are different standards for RPC. The most popular ones are Sun Microsystems' ONC-RPC, specified in RFC1831[30], and the Open Software Foundation's DCE/RPC, specified in DCE 1.1 C706 (see footnote 1). They define authentication protocols, data formats, data structures, etc.

Nowadays pure RPC is not used. Instead, 'flavored' versions of RPC such as XML-RPC or JSON-RPC are used. XML-RPC is a protocol based on RPC developed by Dave Winer in 1998. It applies the basics of RPC but uses XML to structure the parameters used as input and output, allowing more complex structures such as arrays. It always uses HTTP POST messages. JSON-RPC is basically the same but uses JSON instead of XML.

5.2 Message based APIs

Message based APIs evolved from RPC by adding a new level of abstraction. They were designed to avoid the tight coupling of RPC. Instead of the client naming a procedure as in RPC, in message based APIs it is the server that decides which procedure it has to execute, depending on the message received from the client.
The message is sent to a designated URI; its body contains the needed data, and it may contain headers that make the message self-descriptive.

Usually message based APIs use standardized message formats like SOAP, together with other standard specifications defined by bodies such as the W3C and OASIS. For example, they usually use WSDL to define and describe services, and they may use the WS-Policy and WS-Security specifications to define the authentication and security schemes, etc.

Footnote 1: http://pubs.opengroup.org/onlinepubs/9629399/

SOAP is a message-based protocol developed by Microsoft, IBM, DevelopMentor and UserLand. It evolved from Dave Winer's XML-RPC protocol, but it is much more complex. SOAP defines a message construct that structures information in an XML format and defines three blocks:
• An Envelope, which contains a SOAP message.
• A Header, which contains control information such as authentication credentials, etc.
• A Body, which contains data.

Example:

    <?xml version="1.0" ?>
    <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
      <env:Header>
        <p:oneBlock xmlns:p="http://example.com"
                    env:role="http://example.com/Log">
          ...
          ...
        </p:oneBlock>
        <q:anotherBlock xmlns:q="http://example.com"
                        env:role="http://www.w3.org/2003/05/soap-envelope/role/next">
          ...
          ...
        </q:anotherBlock>
        <r:aThirdBlock xmlns:r="http://example.com">
          ...
          ...
        </r:aThirdBlock>
      </env:Header>
      <env:Body >
        ...
        ...
      </env:Body>
    </env:Envelope>

This SOAP message contains an envelope, which in turn contains a header with three blocks of control information and a body.

SOAP messages may carry information, tasks to execute and events. When the message is received by the server, the server determines the exact procedure to call. Clients may send three message types: Command messages to execute a task on the server, Event messages to notify an event and Document messages to exchange documents.
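As a rough illustration of how a server might read such a message and decide which procedure to call, the sketch below parses a small SOAP 1.2 envelope with Python's standard XML library. The application namespace, the `credentials` header block and the `GetPlaneTicket` body element are hypothetical names chosen for this example; real SOAP stacks are generated from a WSDL and do far more validation.

```python
from xml.etree import ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP = "http://example.com"                       # hypothetical application namespace

message = f"""<?xml version="1.0"?>
<env:Envelope xmlns:env="{ENV}">
  <env:Header>
    <p:credentials xmlns:p="{APP}">token-123</p:credentials>
  </env:Header>
  <env:Body>
    <p:GetPlaneTicket xmlns:p="{APP}">
      <p:name>MyName</p:name>
    </p:GetPlaneTicket>
  </env:Body>
</env:Envelope>"""

ns = {"env": ENV, "p": APP}
root = ET.fromstring(message)

# Header: control information, here an (assumed) authentication block.
header = root.find("env:Header", ns)
credentials = header.find("p:credentials", ns).text

# Body: the first child element tells the server which task is requested.
body = root.find("env:Body", ns)
operation = body[0]
operation_name = operation.tag.split("}")[1]  # strip the "{namespace}" prefix
name = operation.find("p:name", ns).text
```

The client only describes what it wants in the body; mapping `operation_name` to an actual procedure stays on the server side.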
There is one more message type, called Fault, which the server returns as an error message.

Even though SOAP can be used on top of any transport protocol, most of the time it is used over HTTP, where it only uses the GET and POST methods. SOAP defines some functionalities that are already implemented by the HTTP protocol, such as Fault messages, which HTTP already covers with status codes.

An important technology involved in SOAP architectures is the Web Services Description Language (WSDL). WSDL documents are written in XML. They describe web services by listing the operations that a client can call (and the endpoint where to call them, usually a URI) and by defining the structure of the messages sent and received.

Example:

    <operation name="GetLastTradePrice">
      <soap:operation soapAction="http://example.com/GetLastTradePrice"/>
      <input>
        <soap:body use="literal"/>
      </input>
      <output>
        <soap:body use="literal"/>
      </output>
    </operation>

In this example you can see how an operation (service) is defined. The 'soap:operation' element defines a URI where the operation can be called and indicates that it is an action. The input and output elements state that the request should contain a literal and that the response will contain a literal too.

5.3 Comparison

There are two basic differences between web services and REST APIs. Web services are always built using SOAP and WSDL as data formats, while REST APIs can use any kind of data format and are actually prepared to work with more than one data type simultaneously. The fact that web services require SOAP implementations adds a second point of divergence: SOAP defines multiple interfaces to deal with different kinds of interaction with the server (SOAP messages may carry information, tasks to execute and events). REST, on the other hand, requires a single uniform interface.
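The first difference, that a REST API can offer the same resource in more than one data format, can be illustrated with a minimal content-negotiation sketch. The resource fields, the renderer table and the function names are assumptions made for this example, and the matching is deliberately simplified: it takes the first supported media type in the Accept value and ignores quality (q) weights.

```python
import json
from xml.etree import ElementTree as ET

# Hypothetical resource state; the field names are illustrative only.
TICKET = {"id": "7657", "name": "MyName", "date": "2015-07-21"}


def to_json(data: dict) -> str:
    return json.dumps(data, sort_keys=True)


def to_xml(data: dict) -> str:
    root = ET.Element("ticket")
    for key in sorted(data):
        ET.SubElement(root, key).text = data[key]
    return ET.tostring(root, encoding="unicode")


# One resource, several representations.
RENDERERS = {"application/json": to_json, "application/xml": to_xml}


def represent(data: dict, accept: str):
    """Return (status, content_type, body) for an Accept header value."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()  # drop parameters such as q=0.8
        if media_type in RENDERERS:
            return 200, media_type, RENDERERS[media_type](data)
    return 406, "text/plain", "Not Acceptable"


status, content_type, body = represent(TICKET, "application/xml, application/json;q=0.8")
```

The resource and its URI stay the same; only the representation changes according to what the client says it can handle.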
Summing up, SOAP is a structured, well-defined protocol, while REST is a free and flexible style that gives developers the freedom to build their API in their own way. SOAP used to be the default option in enterprise environments, but over time many enterprises have changed their policies and adopted REST. Amazon Web Services states that "(they) are still seeing an 80% REST / 20% SOAP usage pattern"[27]. This is because REST implementations are simpler and easier to understand and to work with.

There is one area that web services cover and REST does not: the aspects of a service that lie beyond its interface. REST only talks about the interface, while web services define multiple protocols to cover aspects such as legal protocols, terms-of-use protocols, etc. They are known as the WS-* protocols.

Let's see the practical differences between the three types of API that we've studied in a simple practical case. Imagine that you create three APIs: one following the RPC model, another following a message based model and the last one a REST API. They have the same functionality: they access a database row that represents a plane ticket from an airline.

Using the RPC API you might have to perform a call similar to the following:

    URI: api.myairline.com/
    Procedure: QueryDatabase(query)
    Params: SELECT * FROM plane_tickets where name='MyName' and date='21/07/2015'

In this case the server calls the specified procedure (it could have been specified by a procedure identifier instead of the function name).

On the other hand, in a message based API, the client might call something similar to:

    URI: api.myairline.com/GetPlaneTicket
    Params: name='MyName',date='21/07/2015'

In this example, the server takes the parameters and generates the query that will be sent to the database.
In this case, the client does not need to know the implementation details, but it does need to know that it has to send a message that executes a task and that the response will be a message carrying data.

Finally, in a REST API, the ticket would be modelled as a resource and identified by a URI. To get to the ticket URI, the client could access other resources that contain links to the desired resource:

    GET: api.myairline.com/tickets/user/
    GET: api.myairline.com/tickets/user/7657/

In this example, the first URI is used to access a user's tickets and it returns a list of tickets. The tickets received contain some information (the date, for example) and a link to each ticket resource. Finally the client parses the response, selects the ticket that it wants to retrieve and sends a GET to the URI linked to it.

Chapter 6
Django Development

6.1 Introduction to DJANGO

Django is a high-level Python Web framework built to make the task of developing web applications much easier. Its power comes from the ability to separate application development from low-level hassles such as database connections. Another important aspect of Django is the modularity that it brings: a project is a set of assembled applications.

[Figure 6.1: Django's request-response procedure. Components shown: MIDDLEWARE, URIS.PY, VIEWS.PY, MODEL, TEMPLATE. Flow: HTTP request, URI mapped to a view, model access, response content, HTTP response.]

Django is based on an MVC pattern (see footnote 1). The most important parts of a Django project are Models, Views and Templates. While Django models are equivalent to MVC pattern models, Views and Templates are quite different.

Footnote 1: http://en.wikipedia.org/wiki/Model-view-controller

As we see in figure 6.1, when Django receives a request it first goes through the middleware, which performs repetitive tasks such as the authentication check.
After that, Django checks the URL from the request and maps it to a single view using the patterns stored in the file urls.py. Then the view is called; it accesses the model to collect the data needed to build a response and generates it using a template.

Since the view is the element that decides what the response is going to look like, there is a tendency to believe that it matches the controller in an MVC pattern, while the controller is actually Django itself. The framework is the one that receives the request, hands it to the modules and gets the response back, decides which view is called and calls it, etc. Models map directly to what the MVC pattern understands as a model: they store the data of an application. Finally, the MVC view is split into two parts in Django: the views decide WHAT data from the model is going to be shown and the templates decide HOW that data is shown (templates are actually optional, and sometimes the view also performs the template function).

6.2 Starting a new project

In this section it is assumed that the reader has already installed Python and Django. It may be worthwhile to use a tool such as virtualenv to avoid dependency and version problems. To learn how to use it, visit https://virtualenv.pypa.io/en/latest/userguide.html.

Starting a new project in Django is very simple: you only need the django-admin tool that was installed along with the Django framework.

    $ django-admin.py startproject testproject

After this, the new project has been created. To create a new application for our project we have to use the manage.py script located in testproject/manage.py. The application directory is created in the current working directory, so the command is run from inside the project directory:

    $ cd testproject
    $ python3 manage.py startapp testapp

6.3 Project structure

In this section we will list all the files that have been created by Django in section 6.2 and explain what they are for.
    testproject/
        manage.py
        testapp
            admin.py
            __init__.py
            migrations
                __init__.py
            models.py
            tests.py
            views.py
        testproject
            __init__.py
            __pycache__
                __init__.cpython-34.pyc
                settings.cpython-34.pyc
            settings.py
            urls.py
            wsgi.py

As you can see, Django has created a 'testproject' directory, a 'testapp' directory and a 'manage.py' script. The inner 'testproject' directory is unique to the project: it contains the project configuration scripts. The most important ones for now are 'urls' and 'settings'. As stated in the introduction, the urls script is used to map URLs to views, which is done using regular expressions. The settings script contains the database connection configuration, the list of applications used in this project, the list of middleware used in this project, etc. Finally, 'wsgi.py' is a script that uses the 'Web Server Gateway Interface', which will be detailed in section 6.9.

On the other hand, 'testapp' is a directory created to accommodate a single application. A project is usually formed by multiple applications and each one is stored in a different directory. An application directory contains two of the three scripts mentioned in the introduction: models and views. The templates are usually stored in a new directory inside the app directory, but they can be stored anywhere. There is also a 'tests' script that is used to run tests in an automated way and an 'admin' script, which is part of an administration application that is installed by default but will not be used in this document.

6.4 Models

Models in Django provide high-level functionality and allow developers to avoid dealing with low-level tasks such as database management, SQL querying, etc. In order to create a model you only need to create a class that extends 'models.Model' and write your model attributes as class attributes. For example:

    from django.db import models

    class Person(models.Model):
        # CharField requires a max_length; 30 is an arbitrary choice here
        name = models.CharField(max_length=30)

CharField represents a string.
Django offers a set of pre-built fields that are automatically managed by the framework. You can see the most important ones in table 6.1 and the whole list in footnote 2.

    Name                     Stored data
    BigIntegerField          A 64-bit integer
    BooleanField             True or False
    CharField                A small or medium-sized string
    DateField                A date represented by a Python datetime.date instance
    EmailField               A string that is checked to be a valid email address
    FileField                A file uploaded to the server
    FloatField               A floating-point number
    IntegerField             A 32-bit integer
    GenericIPAddressField    An IPv4/IPv6 address
    TextField                A large string
    URLField                 A CharField for a URL

Table 6.1: Django's important field list.

Fields in Django accept options. For example, if you want to allow a field to be empty you can use the option 'blank=True'. A field accepts more than one option. Some options are common to all fields and some fields have specific options: for example, 'blank' is available for all field types, while GenericIPAddressField accepts a 'protocol' option that lets you decide whether you accept IPv4 addresses, IPv6 addresses or both. There is a list of important field options in table 6.2 and you can find the full list in footnote 3. For example:

    NIF = models.CharField(max_length=9, unique=True, primary_key=True)

This code creates a field that holds strings, whose value has to be unique across the table and which is used as the key to refer to the table row.

Footnote 2: https://docs.djangoproject.com/en/1.8/ref/models/fields/#field-types
Footnote 3: https://docs.djangoproject.com/en/1.8/ref/models/fields/#field-options

    Option           Value                          Description
    null             True or False                  When True, Django will store empty values as NULL in the database.
    blank            True or False                  When True, the field is allowed to be blank. It is validation-related, while null is database-related.
    choices          A list or a tuple, both        The inner tuples are the possible choices for the field. The first element of each
                     consisting of iterables of     inner tuple (A and C) is the value that will be stored in the database, and the
                     exactly two items              second element (B and D) is its human-readable equivalent.
                     ([A,B], [C,D], ...)
    default          A value or a callable object   The default value is used when model instances are created and a value isn't provided for the field.
    error_messages   A dictionary with keys         Overrides the default messages that the field may raise.
                     matching the error messages
                     you want to override
    primary_key      True or False                  If none of the fields in a model has the option primary_key=True, Django will create an AutoField to hold a primary key.
    unique           True or False                  Forces the field to be unique in the table. If there is a conflict, a django.db.IntegrityError is raised.

Table 6.2: Django's important field options list.

6.4.1 Model relationships

Django accepts relationships between models. There are three basic relationships allowed: one to one, many to one and many to many.

One to one relationships are defined with a 'OneToOneField' that takes a model class as argument. It may be useful, for example, to define a marriage status.

    class Person(models.Model):
        name = models.CharField(max_length=30)
        DNI = models.CharField(max_length=9)
        # 'self' refers to the Person class itself, which is not yet defined
        # at this point; null=True lets the database store "not married"
        isMarriedTo = models.OneToOneField('self', blank=True, null=True)

One to one relationships can be changed as if they were normal fields. If changes occur in one model, the related model will reflect them too. Remember: if you retrieve a model from the database and you change some of its attributes, you have to call the method 'save()' or the changes won't be applied.

Many to one relationships are defined by a 'ForeignKey' field. The ForeignKey takes a model class as argument. It could be used, for example, to define property ownership.
    class House(models.Model):
        address = models.CharField(max_length=50)
        landlord = models.ForeignKey(Person)

When you define a many to one relationship, Django creates a new attribute in the 'one' element (in this case the 'Person'): it is the name of the 'many' class followed by '_set'. In this example, Django will add an attribute called 'house_set' to 'Person'. You can use this new attribute to add new 'Houses' to a 'Person' with the method 'add()', which takes a 'House' object as argument. You can also take an object out of a given set (without deleting it from the database) with the method 'remove(object)'. To take all the objects out of a set you can use 'clear()'. For a ForeignKey, 'remove()' and 'clear()' are only available when the field is declared with null=True. Finally, the method 'create()' applied to the set will create a new object and add it automatically to the set. Remember again that if you modify some objects you need to use the 'save' method or the changes will be lost. If you remove the 'one' object from the database (in this case the Person), the 'many' objects that belong to it (in this case the Houses) will be removed too.

Many to many relationships are implemented by the ManyToManyField field. It can be useful, for example, for students and subjects (one student attends many subjects and each subject has many students enrolled).

    class Student(models.Model):
        name = models.CharField(max_length=30)
        age = models.IntegerField()
        # 'Subject' is given as a string because the class is defined below
        subjects = models.ManyToManyField('Subject')

    class Subject(models.Model):
        name = models.CharField(max_length=30)
        ECTS = models.IntegerField()
        Lab = models.BooleanField()

In many to many relationships you can reach each side from the other. In this case, you can access the subjects a student is taking using the 'subjects' attribute, and you can access the students enrolled in a subject through the 'student_set' attribute of 'Subject', just like in many to one relationships. You can also use the same methods (add, remove, clear, etc.). If you need to add data to the relationship itself you can use an intermediate model (footnote 4).
6.4.2 Managers and QuerySets

The models that we've seen until now are only a structure; they don't represent any data in any database. To actually manage some data you need a Manager. Managers are objects that every model inherits from models.Model. They are responsible for the communication with the database (they are Data Access Objects, footnote 5). They perform the queries and they store the data retrieved (in QuerySets). The default manager is named 'objects'.

Managers offer two basic methods. The first one is 'all()', which returns every object in the database. The second one is 'get_queryset()', which is equivalent to the 'all()' method. The trick here is that you can create your own manager, extend it from models.Manager, and override the 'get_queryset()' method to filter the results and make it more specific.

QuerySets are sets of elements. They are retrieved from managers or from other QuerySets. Some methods that return QuerySets can be called on Managers or on other QuerySets. The most important ones are listed in table 6.3 and you can find the complete list in Django's documentation (footnote 6).

    Method                                       Use
    filter(field1='value1', field2='value2')     Returns a QuerySet in which all the results match the condition(s) passed.
    exclude(field1='value1', field2='value2')    The opposite of filter: it returns all the results that don't match the condition(s) passed.
    order_by('field1', 'field2')                 Orders the QuerySet by the fields specified. For descending order, write '-' before the field: ('-field1').
    reverse()                                    Reverses the current order of the QuerySet.

Table 6.3: Django's important QuerySet methods.

Footnote 4: https://docs.djangoproject.com/en/1.8/topics/db/models/#intermediary-manytomany
Footnote 5: http://en.wikipedia.org/wiki/Data_access_object
Footnote 6: https://docs.djangoproject.com/en/1.8/ref/models/querysets/#methods-that-return-new-querysets

6.5 Views

As stated before, a view is a callable object that takes an 'HttpRequest' and returns an 'HttpResponse'. There are two basic ways of developing views: the first one is to write a function that renders the view; the second one is to use a class-based view. Class-based views allow the developer to structure the views better and to reuse code.

6.5.1 Function views

When developing function views you have to handle requests and responses.

HttpRequests (footnote 7) are objects that contain all the information from the request. They have some attributes and some methods. The methods are not very useful for our purposes and we will omit them in this document, but if you want to learn about them you can check Django's documentation (footnote 8). The attributes, on the other hand, will be very useful. In table 6.4 you can find the most important ones; the less important ones are described in Django's documentation (footnote 9).

    Attribute    Description
    body         The raw data received from the client.
    method       A string that represents the HTTP method used in the request (GET, PUT...).
    GET          A dictionary containing all the variable-value pairs sent from the client (if the method used is GET).
    POST         A dictionary containing all the variable-value pairs sent from the client (if the method used is POST).
    FILES        The files uploaded to the server through a form (if any).
    META         A dictionary that stores all the HTTP headers sent.

Table 6.4: Django's HttpRequest object's attributes.

HttpResponses are the objects that will be generated; they contain the body of the response and all the desired HTTP headers.
The content and the content type of the response can be passed to the HttpResponse constructor, and headers can be added later through the response's dictionary interface:

    response = HttpResponse(body, content_type="text/plain")
    response['Date'] = 'Mon, 4 May 2015 15:05:30 GMT'

    response = HttpResponse(body)
    response['Content-Type'] = 'text/plain'
    response['Date'] = 'Mon, 4 May 2015 15:05:30 GMT'

Footnote 7: https://docs.djangoproject.com/en/1.8/ref/request-response/#httprequest-objects
Footnote 8: https://docs.djangoproject.com/en/1.8/ref/request-response/#methods
Footnote 9: https://docs.djangoproject.com/en/1.8/ref/request-response/#attributes

NOTICE: The constructor takes the content type through the lower-case keyword argument 'content_type', but when you access the response's dictionary you have to use the header name as specified in the HTTP protocol (for example 'Content-Type', with a hyphen).

Example 1: let's develop a simple view that takes a field named 'name' from the request's query string and returns an HTML string.

    from django.http import HttpResponse

    def salute(request):
        # Build the HTML document piece by piece
        response = '<html>'
        response += ' <body>'
        response += ' Hello '
        response += request.GET['name']  # 'name' comes from the query string
        response += ' </body>'
        response += '</html>'
        return HttpResponse(response)

When we only need to return a status code, we can use one of the HttpResponse subclasses (footnote 10):

    # inside a view function
    return HttpResponseNotFound("The resource you are looking for is not valid")

6.5.2 Class-based views

Django uses callable objects as views, but if you make the called function belong to a class you gain some advantages, such as code reuse. For example, you could write an auxiliary function to handle OPTIONS requests.
Django defines a base View that you can extend to inherit the following functionality:

• Validates the arguments passed into the view configuration
• Prevents using arguments named after HTTP methods
• Collects the arguments passed in the URL configuration
• Keeps the request information in a convenient place for methods to access
• Verifies that a requested HTTP method is supported by the view
• Automatically handles OPTIONS requests
• Dispatches to view methods based on the requested HTTP method

It has an attribute, 'http_method_names', that lists all the methods that the view will handle.

Django also has some simple built-in views, such as ListView and DetailView. ListView has been designed to take a list of objects from the model and display them by rendering a template. DetailView has been designed to take a single object and generate an output with it through a template. To use them you only need to create a class that extends one of these classes. We will not take a deeper look into these classes because they are not useful in the process of developing an API.
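The way a base view routes a request to the method named after the HTTP verb can be illustrated without Django. The following is a minimal sketch under stated assumptions, not Django's actual implementation: the class names `MiniView` and `TicketView`, the `dispatch(method)` signature and the use of plain `(status, headers, body)` tuples as responses are all invented for this illustration.

```python
class MiniView:
    """Toy stand-in for a class-based view (not Django's View class)."""

    # Lower-case names of the HTTP methods this view is willing to handle
    http_method_names = ['get', 'post', 'options']

    def dispatch(self, method):
        """Route an HTTP method name to the handler method of the same name."""
        name = method.lower()
        handler = getattr(self, name, None) if name in self.http_method_names else None
        if handler is None:
            # Unsupported method: answer 405 and advertise what is allowed
            return (405, {'Allow': self._allowed()}, '')
        return handler()

    def _allowed(self):
        # Only methods that are both listed and actually implemented count
        return ', '.join(m.upper() for m in self.http_method_names if hasattr(self, m))

    def options(self):
        # Handled by the base class, so subclasses do not need to write it
        return (200, {'Allow': self._allowed()}, '')


class TicketView(MiniView):
    def get(self):
        return (200, {}, 'This request was a get request')


view = TicketView()
print(view.dispatch('GET'))
print(view.dispatch('OPTIONS'))
print(view.dispatch('POST'))
```

In this sketch the subclass only writes `get`; OPTIONS is answered by the base class, and POST is rejected with a 405 status because no `post` method exists.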
An example of a view that extends the base View:

    from django.views.generic import View
    from django.http import HttpResponse

    class MyView(View):
        # HTTP methods that this view will handle
        http_method_names = ['get', 'post', 'put', 'delete', 'options']

        def get(self, request, *args, **kwargs):
            return HttpResponse("This request was a get request")

        def post(self, request, *args, **kwargs):
            return HttpResponse("This one was a post")

        def put(self, request, *args, **kwargs):
            return HttpResponse("This other one was a put")

        def delete(self, request, *args, **kwargs):
            return HttpResponse("The last one was a delete")

Footnote 10: https://docs.djangoproject.com/en/1.8/ref/request-response/#httpresponse-subclasses

Developing a function-based view with the same functionality would require messier code:

    from django.http import HttpResponse, HttpResponseNotAllowed

    def MyView(request, *args, **kwargs):
        if request.method == 'GET':
            return HttpResponse("This request was a get request")
        elif request.method == 'POST':
            return HttpResponse("This one was a post")
        elif request.method == 'PUT':
            return HttpResponse("This other one was a put")
        elif request.method == 'DELETE':
            return HttpResponse("The last one was a delete")
        elif request.method == 'OPTIONS':
            # Advertise the supported methods in the Allow header
            response = HttpResponse()
            response['Allow'] = 'GET, POST, PUT, DELETE'
            return response
        # A view must always return a response, also for unsupported methods
        return HttpResponseNotAllowed(['GET', 'POST', 'PUT', 'DELETE'])

6.6 URI patterns

The file 'urls.py' contains a list of url objects. The list must be called urlpatterns. The url constructor takes 5 arguments:

• Pattern: A string that represents a Python regular expression.
• View: The callable object that will be called if the URL matches the pattern.
• kwargs: (OPTIONAL, DEFAULT=NONE) A dictionary with arguments that will be passed to the view.
• Name: (OPTIONAL, DEFAULT=NONE) A string that gives a name to the pattern.
• Prefix: (OPTIONAL, DEFAULT=NONE) It is not necessary at all and it is scheduled for removal in a later version of Django, so it will always be left empty.
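Since the Pattern argument is an ordinary Python regular expression, its behaviour can be explored with the standard `re` module before wiring it into a URL configuration. The sketch below assumes, as a simplification, that the pattern is searched in the request path once the host and the leading slash have been stripped; the helper name `matches` and the sample paths are illustrative only.

```python
import re

def matches(pattern, path):
    """Return the named groups if `pattern` is found in `path`, else None."""
    found = re.search(pattern, path)
    return found.groupdict() if found else None

# Anchored at both ends: only the exact string matches
print(matches(r'^index/$', 'index/'))
print(matches(r'^index/$', 'test-index/'))

# Without '^' the pattern may match at the end of a longer string
print(matches(r'index/$', 'test-index/'))

# A named group captures part of the path under the given name
print(matches(r'^users/(?P<pk>\d+)/?$', 'users/42/'))
print(matches(r'^users/(?P<pk>\d+)/?$', 'users/abc/'))
```

A successful match without named groups yields an empty dictionary, while a failed match yields None, so the two cases stay distinguishable.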
Examples:

    from django.conf.urls import url
    # 'views' is the application's views module, assumed to be imported here

    urlpatterns = [
        url(r'^index/$', views.index, name='index'),
        url(r'^users/?$', views.user_list, name='user-list'),
        url(r'^users/(?P<pk>\d+)/?$', views.user, name='user-detail'),
    ]

In the first one we can see a '^' character. It matches the start of the string (notice that Django removes the first part of a URL: for example, in the URL 'www.test.com/index/' Django will trim 'www.test.com/' and the string to be matched will be 'index/'). We also see a '$' character, which matches the end of the string. All the other characters match themselves. This pattern will match exactly the string 'index/'. If we removed the '^' character it would match any string that ends with 'index/': for example, 'test-index/' would be a match. On the other hand, if we removed the '$' symbol it would match any string that starts with 'index/': for example, 'index/test' would match. Finally, if we removed both '^' and '$' it would match any string that contains the substring 'index/': for example, 'this is a index/ test' would also match.

In the second one we can see a '?' character after the slash. This means that the pattern matches whether there is zero or one slash at that position.

In the third one we can see a \d+ pattern. It matches a decimal digit (\d) repeated one or more times. We can also see the structure (?P<name>pattern). This is a named group. If the string matches and a view is called, the kwargs argument will contain an entry named 'name' whose value is the part of the URL that matches the inner pattern. In this example, 'users/1/' will match and the view will receive an argument named 'pk' with value '1'.

If you want to learn more about regular expressions in Python you can look into footnote 11. You can also find a great testing tool in footnote 12.

6.7 Formatting the output

Generally, when creating an API what we really need is to exchange text information.
We've seen that receiving and sending plain text (or HTML, since we treat it as text) is very easy, but in most applications it is not going to be very useful, since HTML is hard to parse. The most widespread text-based formats are JSON and XML.

Django provides 'serializers' that convert a QuerySet into JSON, XML and YAML. In order to serialize objects you have to call the 'serializers.serialize()' function. It takes a string that represents the format, the objects to serialize and some options, and returns a string in the requested format. On the other hand, to deserialize the data we will not use the default Django function ('serializers.deserialize'). Instead we will use Python's json module.

Example: Serialize a list of cars stored in the database:

    from django.core import serializers
    from carlist.models import Car

    car_list = Car.objects.all()
    serialized_data = serializers.serialize('json', car_list)

Result:

    [
        {
            "model": "carlist.car",
            "fields": {"model": "ModelA", "brand": "BrandA", "price": 20000},
            "pk": 1
        },
        {
            "model": "carlist.car",
            "fields": {"model": "ModelB", "brand": "BrandB", "price": 24000},
            "pk": 2
        },
        {
            "model": "carlist.car",
            "fields": {"model": "ModelC", "brand": "BrandC", "price": 18000},
            "pk": 3
        }
    ]

We can see in the output that the three items stored in the database have been exported into the string. The output contains the id number that represents each object and the model to which each object belongs. These fields are not strictly necessary, but Django exports them because, if you use its deserializer, you can store the objects in the database directly (footnote 13).

Python has a module called json with a function called 'loads' that takes a string and returns a variable that maps the JSON fields of the string to Python objects. For example, the previous output is a list of three items, so it will be modelled as a list. Each item contains 3 entries (pk, model and fields).
All of them are name:value pairs, so each item will be modelled as a dictionary. Finally, 'fields' contains 3 variables (model, brand and price), also as name:value pairs, so it will be modelled as a dictionary too.

Footnote 11: https://docs.python.org/2/howto/regex.html
Footnote 12: http://www.pyregex.com/
Footnote 13: https://docs.djangoproject.com/en/1.8/topics/serialization/#deserializing-data

Example 2: Deserialize the car list:

    from carlist.models import Car
    import json

    def deserialize(serialized_data):
        # json.loads turns the JSON string into a list of dictionaries
        deserialized_data = json.loads(serialized_data)
        for car in deserialized_data:
            Car.objects.create(
                model=car['fields']['model'],
                brand=car['fields']['brand'],
                price=car['fields']['price'],
            )

6.8 Middleware

Middleware is a very helpful tool in Django that allows you to automate some routines. Middleware components are simple classes whose methods allow you to modify requests and responses. The middleware configuration is stored in the 'settings.py' file, in a tuple called 'MIDDLEWARE_CLASSES'.

What is it useful for? For repetitive tasks that you must perform on every request or response. You can actually do the same in the views, but by using middleware you move the task into the background and you can forget about it. For example, the most used middleware is Django's authentication middleware, which is responsible for managing authentication credentials. When using this middleware, the only thing you must do is add some decorators to indicate which of your views require authentication.

In figure 6.1 the middleware has been simplified. There are actually five types of middleware functions: two of them are called before calling the view and the other three are called after it.

Before calling the view (in calling order):

1. process_request: It is called before the URI has been parsed. The view to use has not been decided yet. It takes just one argument: request (remember to add 'self' when creating yours).
2. process_view: It is called when it has already been decided which view must be used. It takes four arguments: request, view_func, view_args and view_kwargs. view_func is the function object that will be called (not a string with the function's name). view_args and view_kwargs are the same parameters that will be passed to the view (remember to add 'self' too).

Both process_request and process_view can return either an HttpResponse object or None. If they return an HttpResponse, the request-response chain is cut and the generated object is sent directly to the client. If they return None, the process continues.

After the view has been processed (in calling order):

1. process_exception: It is called only if a view raises an exception. It takes two arguments: request and exception. It can return either an HttpResponse or None. If it returns an HttpResponse, the request-response flow continues normally, calling the next middleware piece; if it returns None, Django's default exception handler acts, most likely returning a 500 (internal server error) status code.
2. process_template_response: It is called after the view has been processed. It takes two arguments: a request and a response, the response being a TemplateResponse instead of an HttpResponse. It should return a response object that implements a render method. In this document it will not be used.
3. process_response: It is called right before the response is sent to the client. It takes two arguments: request and response. It should always return an HttpResponse.

Middleware can be stacked, meaning that you can use more than one middleware component in the same project. If you have more than one middleware component, the functions are called in the order listed above, and the middleware components are called in the order in which they were defined in the 'MIDDLEWARE_CLASSES' tuple. However, after the view is called, the middleware components are called in reverse order.
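The calling order can be made concrete with a small simulation. This is a sketch of the behaviour described in this section, not Django's request handler: the `Recorder` class, the `handle` function and the `log` list are hypothetical names, `process_view` takes fewer arguments than the real hook, and process_exception and process_template_response are left out for brevity.

```python
log = []

class Recorder:
    """Toy middleware component that records which hooks are called."""

    def __init__(self, name, short_circuit=False):
        self.name = name
        self.short_circuit = short_circuit  # if True, process_view answers itself

    def process_request(self, request):
        log.append('process_request ' + self.name)
        return None

    def process_view(self, request):
        log.append('process_view ' + self.name)
        return 'response from ' + self.name if self.short_circuit else None

    def process_response(self, request, response):
        log.append('process_response ' + self.name)
        return response


def handle(request, stack, view):
    """Run the request through the stack, then the view, then back out."""
    response = None
    for component in stack:                 # request phase: declared order
        response = component.process_request(request)
        if response is not None:
            break
    if response is None:
        for component in stack:             # view phase: declared order
            response = component.process_view(request)
            if response is not None:
                break
    if response is None:
        log.append('VIEW PROCESSING')
        response = view(request)
    for component in reversed(stack):       # response phase: reverse order
        response = component.process_response(request, response)
    return response


stack = [Recorder('mymiddleware1'), Recorder('mymiddleware2')]
handle('GET /', stack, lambda request: 'response from view')
print(log)
```

Building the first component with `short_circuit=True` makes the log skip process_view of the second component and the view itself, while both process_response hooks still run in reverse order.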
For example, if your middleware configuration looks like this:

    MIDDLEWARE_CLASSES = (
        'mymiddleware1',
        'mymiddleware2',
    )

The order of execution would be:

    process_request mymiddleware1
    process_request mymiddleware2
    process_view mymiddleware1
    process_view mymiddleware2
    VIEW PROCESSING
    process_template_response mymiddleware2
    process_template_response mymiddleware1
    process_response mymiddleware2
    process_response mymiddleware1

Be careful: if a middleware function returns a response before the view is called, the remaining request and view middleware functions will not be executed, and processing jumps to the response middleware that runs after the view. For example, if process_view of mymiddleware1 returns a response, process_view of mymiddleware2 will not be called, but the response-phase functions of mymiddleware2 (such as process_template_response) will be. This means that, when you are developing the response middleware, you cannot rely on actions that are supposed to have been done in your request middleware.

In REST API development, middleware can be applied to develop and test layered systems. It allows you to develop some components separately without having to use virtualization schemes.

6.9 Deploy the project

Django uses WSGI, which stands for Web Server Gateway Interface. It is a specification of how servers and Python applications should communicate. During the development phase, Django provides a very basic server that you can run with the manage.py script located in the project's directory, but once you have finished your project you should use another server. There are many servers that support WSGI. You can find a list of them and more detailed information in footnote 14.

6.10 Cache in Django

It is possible to implement a caching application with Django. To do so, we need a running memcached daemon. The memcached daemon stores data in memory in a key-value format. It can be accessed through a TCP socket.
To install memcached you can run:

    user~$ sudo apt-get install memcached
    user~$ sudo service memcached start

Memcached is prepared to be used in a distributed architecture, meaning that the same cache may be used by many servers, sharing the stored data between servers if desired.

You will also need a Python library to communicate with the daemon. The most usual ones are python-memcached and pylibmc. Important: python-memcached requires a version higher than 2.0 but, at the time of writing this document, it is not compatible with Python 3. To install python-memcached and pylibmc you can use the pip tool:

    user~$ sudo pip install python-memcached

and

    user~$ sudo pip install pylibmc

Footnote 14: http://wsgi.readthedocs.org/en/latest/

Once you have both the daemon running and the library installed, you have to configure Django. Open the settings.py script and add the following lines:

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
        }
    }

By adding these lines Django only stores the cache service information, but does not use it yet. There are two tools for caching:

1. Django middleware: Django has some built-in middleware that handles the caching.
2. cache_page: A decorator that defines how long the result of a view can be stored.

You can use both of them at the same time.

How to use the middleware: add the middleware classes to the MIDDLEWARE_CLASSES stack in settings.py:

    MIDDLEWARE_CLASSES = (
        'django.middleware.cache.UpdateCacheMiddleware',
        'django.middleware.cache.FetchFromCacheMiddleware',
    )

Important: You must add them in this order or there will be conflicts.
You also have to add three variables to the settings file:

    CACHE_MIDDLEWARE_ALIAS = "default"
    CACHE_MIDDLEWARE_SECONDS = 999
    CACHE_MIDDLEWARE_KEY_PREFIX = ""

The alias is the name of the entry in the CACHES setting that the middleware will use for storage ('default' in the configuration above). The seconds value indicates how long, in seconds, the stored data stays valid. The key prefix is a string used to avoid key collisions when the same cache is shared by multiple servers or sites; in this document it will not be used, so you can leave it empty.

There are two ways of using the cache_page decorator. The first one is to use it in the urls.py script with the format cache_page(max seconds)(view to call):

    from django.conf.urls import url
    from django.views.decorators.cache import cache_page
    # CacheTesterView is assumed to be imported from the application's views

    urlpatterns = [
        url(r'^cachetest/$',
            cache_page(60*5)(CacheTesterView.as_view()),
            name='cache-test'),
    ]

Defining the seconds in a 60*X format, where X represents minutes, is a common practice that makes the scripts more readable. Using the result directly (in this case 300) is also correct and will run just fine.

The other way of using the cache_page decorator is to place it right before the view function that you want to cache:

    from django.views.decorators.cache import cache_page

    @cache_page(60*5)
    def mycachedview(request):
        ...

If you are using class-based views you should use the decorator in the urls script, because it will be applied to the GET, OPTIONS and HEAD methods, and you won't need it on the other ones. Also, at the time of writing this document, the decorator applied to a method of a class-based view raises a non-documented exception.

6.11 Example 1: File distribution API

In this example we will develop an API that allows us to access some files remotely. Since this document aims to instruct people with little knowledge of Django, we will develop this example in an iterative and incremental way.
6.11.1 First iteration

In this iteration we will focus on creating a simple model that roughly maps a file, plus a function-based view and a class-based view that retrieve information from the server (a list of files and the detail of a single file). First of all we will start by creating our project:

    $ django-admin.py startproject file_distribution_api

Once the project has been created we will create its first app. With this app we will manage a single file. To do so we will type:

    $ python3 manage.py startapp files

To create the 'File' model we have to edit 'files/models.py':

    from django.db import models
    from django.core.urlresolvers import reverse

    # Create your models here.

    class File(models.Model):

        file_name = models.CharField(max_length=255,)
        file_type = models.CharField(max_length=5,)
        file_location = models.CharField(max_length=255,)

        def __str__(self):
            return ' '.join([self.file_name, self.file_type, self.file_location])

        def get_absolute_url(self):
            return "/files/%i" % self.id

NOTICEABLE PARTS:
***********************
• Every model class inherits from the base class 'models.Model'.
• The attributes contained in this class will be mapped into the database and will belong to one of the field types offered by django [15]. The most important ones are listed in table 6.1.
• The function __str__() overrides a model method. Many methods are automatically given to the model [16], but some of them are worth overriding yourself. __str__() is one of them.
• The method get_absolute_url() is used to calculate the URL for an object. In this particular case, every file object will be accessible from '/files/id-of-the-file' (for example: '/files/128'). There are more sophisticated ways to do this task but we will not need them; you can find more information about this in Django's documentation [17].
***********************

[15] https://docs.djangoproject.com/en/1.8/ref/models/fields/#field-types
[16] https://docs.djangoproject.com/en/1.8/ref/models/instances/#model-instance-methods

Now our model is created, but the application 'files' doesn't belong to the project yet. To include it we have to add it to the INSTALLED_APPS tuple in the file settings.py. It should look something like this:

    # Application definition

    INSTALLED_APPS = (
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'files',
    )

After creating the model you have to update the database using the manage.py script. It will create all the new tables, with the fields needed and everything necessary for the model to work.

    $ python3 manage.py syncdb
    Operations to perform:
      Apply all migrations: contenttypes, sessions, auth, admin
    Running migrations:
      Applying contenttypes.0001_initial... OK
      Applying auth.0001_initial... OK
      Applying admin.0001_initial... OK
      Applying sessions.0001_initial... OK

By default, the first time that you use the syncdb command the script will ask you to create a superuser. We will use it in this example, so create it.

It is possible that executing the syncdb command does not actually export your model to the database. In that case you simply have to run:

    $ python3 manage.py makemigrations
    Migrations for 'files':
      0001_initial.py:
        - Create model File
    $ python3 manage.py migrate
    Operations to perform:
      Apply all migrations: contenttypes, admin, auth, files, sessions
    Running migrations:
      Applying files.0001_initial... OK

Now that we have our model well defined it is time to create a view for it. As we saw in section 6.5 there are two ways of developing views. In this example we'll show a possible implementation for both of them.
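Before looking at the real views, the difference between the two styles can be illustrated without Django. The following is only a toy sketch: a "response" is a plain (status code, body) tuple, and the names file_list_function_view, ToyView and FileListToyView are hypothetical. It assumes that a class-based view selects its handler from the HTTP method name, which is the behaviour this chapter relies on later through 'http_method_names'.

```python
# Toy stand-ins: a "response" is just a (status_code, body) tuple.

def file_list_function_view(method, files):
    """Function-based style: the function itself checks the HTTP method."""
    if method != "GET":
        return 405, ""
    return 200, ", ".join(files)


class ToyView:
    """Class-based style: one method per HTTP verb, selected by dispatch()."""

    http_method_names = ["get", "delete", "options"]

    def dispatch(self, method, *args, **kwargs):
        name = method.lower()
        # Only look for a handler if the verb is in the allowed list.
        handler = getattr(self, name, None) if name in self.http_method_names else None
        if handler is None:
            return 405, ""  # method not allowed
        return handler(*args, **kwargs)


class FileListToyView(ToyView):
    def get(self, files):
        return 200, ", ".join(files)


files = ["a.txt", "b.png"]
function_result = file_list_function_view("GET", files)
class_result = FileListToyView().dispatch("GET", files)
rejected = FileListToyView().dispatch("DELETE", files)  # no delete() defined
print(function_result, class_result, rejected)
```

Both styles produce the same response for a GET; the class-based one simply moves the method check into a shared dispatch step, which is why adding a new verb later only requires adding a new method.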
Class-based:

    from django.shortcuts import render
    from django.http import HttpResponse, Http404
    from django.views.generic import ListView
    from django.views.generic import DetailView

    from files.models import File

    # Create your views here.
    class ListFileView(ListView):
        model = File
        template_name = 'file_list'

    class FileView(DetailView):
        model = File
        template_name = 'file'

[17] https://docs.djangoproject.com/en/1.8/ref/urlresolvers/

NOTICEABLE PARTS:
***********************
• The class ListFileView inherits from ListView, a generic class-based view that is used to list a set of objects following a certain template. As we saw, there are many of them [18].
• The class FileView inherits from DetailView, which we will use when we want to expose a single element.
***********************

Templates:

    <h1>Files</h1>

    <ul>
    {% for file in object_list %}
        <li class="file">
            <a href="{{ file.get_absolute_url }}">{{file}}</a></li>
    {% endfor %}
    </ul>

    <h1> {{ file }} </h1>

    <p> File: {{ file.file_name}} </p>

NOTICEABLE PARTS:
***********************
• HTML code can be written directly into the template.
• To interact with the model fields and its methods you have to use double braces {{ · }}.
• In order to use control structures such as 'for' or 'if' you have to wrap them with {% · %}.
***********************

Function-based:

    from django.http import HttpResponse, Http404
    from files.models import File

    def SingleFile(request, pk):
        try:
            file = File.objects.get(id=pk)
            html = "<html><body><p>%s</p></body></html>" % file.file_name
            return HttpResponse(html)
        except File.DoesNotExist:
            raise Http404

    def FileList(request):
        file_list = File.objects.all()
        html = "<html><body><h1>Files</h1><ul>"
        for file in file_list:
            html += "<li>"
            html += "<a href=\""+file.get_absolute_url()+"\">"+file.file_name+"</a>"
            html += "</li>"
        html += "</ul>"
        html += "</body></html>"
        return HttpResponse(html)

[18] https://docs.djangoproject.com/en/1.8/ref/class-based-views/

NOTICEABLE PARTS:
***********************
• SingleFile has two arguments:
  – request is an object that represents the HTTP request received by django. It contains the HTTP method sent by the client, other attributes set by middleware classes, etc. [19]
  – pk is a string that represents the file's id. It is sent by the url dispatcher [20] and will be explained with the urls.py file.
• File.objects is an attribute inherited from models.Model. It is responsible for retrieving the instances from the database. At the moment, we will only need 3 methods:
  – File.objects.all(): retrieves all the objects in the database.
  – File.objects.filter(field_name=value): returns the set of all objects that match the condition.
  – File.objects.get(field_name=value): returns a single matching object. It is used to retrieve objects by a unique attribute. ATTENTION: if there isn't any object that matches, this method will raise a File.DoesNotExist exception.
• A view is responsible for returning an HttpResponse object. HttpResponse's constructor takes a string with all the content, plus other optional values such as the content_type header.
***********************
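The role of the 'pk' argument and of the DoesNotExist exception described above can be sketched without Django. The example below is only an illustration under stated assumptions: it uses Python's re module with the same kind of named group that a URL pattern uses, and the FILES dictionary, the DoesNotExist class and the functions get_file and single_file are hypothetical stand-ins for the database, the model exception and the view.

```python
import re

# A URL pattern for a single file: the named group 'pk' captures the id.
FILE_URL = re.compile(r"^files/(?P<pk>\d+)/?$")

# Hypothetical in-memory "table": id -> file name.
FILES = {1: "notes.txt", 128: "photo.png"}


class DoesNotExist(Exception):
    """Toy counterpart of File.DoesNotExist."""


def get_file(pk):
    """Return the single matching file name or raise DoesNotExist."""
    try:
        return FILES[int(pk)]
    except KeyError:
        raise DoesNotExist(pk)


def single_file(path):
    """Resolve the path, then behave like the SingleFile view."""
    match = FILE_URL.match(path)
    if match is None:
        return 404, "no URL pattern matched"
    kwargs = match.groupdict()  # captured values are always strings
    try:
        name = get_file(**kwargs)
    except DoesNotExist:
        return 404, "no such file"
    return 200, "<html><body><p>%s</p></body></html>" % name


print(FILE_URL.match("files/128").groupdict())
print(single_file("files/128"))
print(single_file("files/7/"))
print(single_file("files/abc"))
```

Note that the captured value is the string '128', not the integer 128, which is why the sketch converts it before the lookup. The two 404 cases are different: in the first the path matches no pattern at all, in the second the pattern matches but no object has that id.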
URLS.py

    from django.conf.urls import patterns, include, url
    from django.contrib import admin
    import files.views

    urlpatterns = patterns('',
        url(r'^admin/', include(admin.site.urls)),
        url(r'^files/$',
            files.views.ListFileView.as_view(),
            name='file-list'),
        url(r'^files/(?P<pk>\d+)/?$',
            files.views.FileView.as_view(),
            name='file-view'),
    )

As we saw in section 6.6, the file urls.py is the link between a URI pattern and the view django will call. This file maps 3 types of addresses:

1. Any url that starts with 'admin/' will be managed by the admin application.
2. The url 'files/' will be used as the entry point to the list of files.
3. Any url of the form 'files/number' will call the detailed view of a single file. Using the expression '(?P<name>pattern)' the url dispatcher passes the captured group to the called view as a keyword argument with that name.

Remember that any url that doesn't match one of these patterns will cause a 404 error message. Also, if a url matches the 3rd rule but the file's id is not registered in the database, django will also return a 404 response.

6.11.2 Second Iteration

In this second iteration we will make a more complex model and we will develop views so you can add and delete files. Let's take a look at the new model in 'files/models.py':

    from django.db import models
    from django.core.urlresolvers import reverse
    from django.contrib.auth.models import User
    # Create your models here.

    class File(models.Model):

        filetype_list = (['txt', 'text'],
                         ['jpeg', 'image'],
                         ['png', 'image'],
                         ['gif', 'image'],)

        name = models.CharField(max_length=255)
        file_type = models.CharField(choices=filetype_list, max_length=4)
        location = models.URLField(unique=True, primary_key=True)
        owner = models.ForeignKey(User)

        def __str__(self):
            # The owner is left out: it is a User object, and str.join()
            # only accepts strings.
            return ' '.join([self.name,
                             self.file_type,
                             self.location])

        def get_absolute_url(self):
            # 'location' is the primary key now, so there is no 'id' field.
            return "/files/%s" % self.pk

[19] https://docs.djangoproject.com/en/1.8/ref/request-response/#httprequest-objects
[20] https://docs.djangoproject.com/en/1.8/topics/http/urls/

Changes:

• 'file_type' now has a choices option. It will only allow one of the strings stored in the tuple 'filetype_list'.
• 'location' is now a 'URLField' and will be the primary key of the file table.
• The 'owner' field has been added. It represents a many-to-one relationship between files and users.

We will now test the one-to-many relationship between users and files:

    $ python3 manage.py shell
    Python 3.4.0 (default, Apr 11 2014, 13:05:11)
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from django.contrib.auth.models import User
    >>> from files.models import File
    >>> u = User.objects.create_user('user1', '[email protected]', 'user1password')
    >>> u
    <User: user1>
    >>> u.save()
    >>> File.objects.all()
    []
    >>> file1 = File.objects.create(name='file1', file_type='txt',
    ...     location='/test/file1', owner=u)
    >>> u.file_set.all()
    [<File: file1 txt /test/file1>]
    >>> file1.owner
    <User: user1>
    >>> file2 = File.objects.create(name='file2', file_type='txt',
    ...     location='/test/file2', owner=u)
    >>> file2.owner
    <User: user1>
    >>> u.file_set.all()
    [<File: file1 txt /test/file1>, <File: file2 txt /test/file2>]
    >>> User.objects.get(username='user1').delete()
    >>> User.objects.all()
    [<User: admin>]
    >>> File.objects.all()
    []

In this way you can verify with the django shell the model relationship properties developed in section 6.4.1: deleting the user also deletes the files that belong to it.

The views in 'files/views.py' change as follows:

    from django.shortcuts import render
    from django.views.generic.base import View
    from files.models import File
    from django.http import HttpResponse, HttpResponseNotFound
    from django.core import serializers
    from django.core.exceptions import ObjectDoesNotExist
    from django.contrib.auth.models import User
    from django.views.decorators.csrf import csrf_exempt
    from django.utils.decorators import method_decorator

    import json

    # Create your views here.
    class FileView(View):

        http_method_names = ['get', 'delete', 'options']

        def get(self, request, *args, **kwargs):
            file_location = kwargs['pk']
            try:
                file = File.objects.get(location=file_location)
            except ObjectDoesNotExist:
                return HttpResponseNotFound()
            serialized_file = serializers.serialize('json', [file])
            return HttpResponse(serialized_file)

        def delete(self, request, *args, **kwargs):
            file_location = kwargs['pk']
            try:
                file = File.objects.get(location=file_location)
            except ObjectDoesNotExist:
                return HttpResponseNotFound()
            file.delete()
            return HttpResponse(status=200)

        @method_decorator(csrf_exempt)
        def dispatch(self, *args, **kwargs):
            return super(FileView, self).dispatch(*args, **kwargs)

    class FileListView(View):

        http_method_names = ['get', 'put', 'options']

        def put(self, request, *args, **kwargs):
            unicode_data = request.body.decode('utf-8')
            data = json.loads(unicode_data)
            try:
                file = File.objects.get(location=data[0]['pk'])
                return HttpResponse(status=403)
            except ObjectDoesNotExist:
                if True:
                    owner = User.objects.get(id=data[0]['fields']['owner'])
                    file = File.objects.create(location=data[0]['pk'],
                                               name=data[0]['fields']['name'],
                                               file_type=data[0]['fields']['file_type'],
                                               owner=owner)
                    return HttpResponse(status=200)
                else:
                    return HttpResponse("Forbidden", status=403)

        def get(self, request, *args, **kwargs):
            output = serializers.serialize('json', File.objects.all())
            return HttpResponse(output)

        @method_decorator(csrf_exempt)
        def dispatch(self, *args, **kwargs):
            return super(FileListView, self).dispatch(*args, **kwargs)

Changes:

• The classes don't extend 'ListView' and 'DetailView' anymore. Instead they extend the base view 'View'.
• Since we are building an API, the responses are now serialized data in 'json' format.
• There are more methods allowed now. You can use the methods listed in the attribute 'http_method_names': GET, DELETE and OPTIONS for 'FileView', and GET, PUT and OPTIONS for 'FileListView'. Each method name represents the action that the server will perform when it receives that request.
• The OPTIONS method is handled by the parent class (View).
• There is a method decorator before the dispatch function. By default Django has some middleware installed that handles sessions and some security features, such as CSRF protection. For now we are not trying to secure our API, but we can't disable this middleware because it is required in order to use some applications. To solve this django offers a function decorator ('csrf_exempt'), but since we have class-based views and function decorators can't be applied directly to methods, we need another decorator ('method_decorator') before dispatch that applies the desired rule (in our case 'csrf_exempt') to all the methods in the class. The methods that are sensitive to the 'csrf' token are PUT, DELETE and POST.
• Django serializes the primary key of an object under a 'pk' tag.
  In this example the primary key is 'location', so in the put method the location is retrieved from 'pk' instead of 'fields'.

The urls.py script:

    from django.conf.urls import patterns, include, url
    from django.contrib import admin
    from files.views import FileView, FileListView

    urlpatterns = patterns('',
        url(r'^admin/', include(admin.site.urls)),
        url(r'^files/$',
            FileListView.as_view(),
            name='file-list'),
        url(r'^files/(?P<pk>..+)/?$',
            FileView.as_view(),
            name='file_view')
    )

This file has barely changed: the views are now imported by name, and the 'pk' group accepts any string instead of only digits, because the primary key is no longer a number.

You can't use a browser to work with this API anymore, because you need other http verbs such as PUT and DELETE. There are many options available:

• cURL is a command line tool and library for transferring data with URL syntax, supporting many transfer protocols [21].
• HTTPie is a command line HTTP client [22].
• Requests is an Apache2 licensed HTTP library, written in Python [23].
• The 'django.test' module contains 'Client', an object capable of performing any http call [24].

[21] http://curl.haxx.se/
[22] https://github.com/jakubroztocil/httpie#main-features
[23] http://docs.python-requests.org/en/latest/
[24] https://docs.djangoproject.com/en/1.8/topics/testing/tools/#the-test-client

There is enough documentation about all of them on their official sites, so this document won't focus on explaining their use.

Examples of use:

    $ http GET http://127.0.0.1:8000/files/ > result

    [
        {
            "pk": "file1",
            "fields": {
                "file_type": "txt",
                "owner": 1,
                "name": "file1"
            },
            "model": "files.file"
        },
        {
            "pk": "file2",
            "fields": {
                "file_type": "txt",
                "owner": 1,
                "name": "file2"
            },
            "model": "files.file"
        },
        {
            "pk": "file3",
            "fields": {
                "file_type": "txt",
                "owner": 1,
                "name": "file3"
            },
            "model": "files.file"
        }
    ]

As you can see, the response is a list of 'file' objects with the format explained in 6.7.
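The structure of this list is also what the put view shown earlier expects to receive: the primary key travels under 'pk' and the remaining attributes under 'fields'. The following is a minimal sketch of that parsing step using only Python's json module. The variable raw_body stands in for request.body, and the helper extract_file_attributes is a hypothetical name that does not appear in the views.

```python
import json

# Stand-in for request.body: the raw bytes a client would send in a PUT.
raw_body = json.dumps([
    {
        "pk": "new_file1",
        "model": "files.file",
        "fields": {"name": "new_file1", "file_type": "txt", "owner": 1},
    }
]).encode("utf-8")


def extract_file_attributes(body):
    """Decode the bytes, parse the JSON list and flatten its first element."""
    data = json.loads(body.decode("utf-8"))
    first = data[0]
    attributes = dict(first["fields"])
    # The primary key ('location' in this model) is found under 'pk'.
    attributes["location"] = first["pk"]
    return attributes


attrs = extract_file_attributes(raw_body)
print(sorted(attrs.items()))
```

The body has to be decoded before parsing because it arrives as bytes, and the first element is indexed explicitly because the serialized format is always a list, even when it describes a single object.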
    $ http DELETE http://127.0.0.1:8000/files/file1

The view responsible for deleting files doesn't return any message, but you can tell that the operation worked from the status code. Also, you can perform a GET request on '/files/' again and compare the response with the old one.

    $ http GET http://127.0.0.1:8000/files/ > result

    [
        {
            "fields": {
                "name": "file2",
                "owner": 1,
                "file_type": "txt"
            },
            "pk": "file2",
            "model": "files.file"
        },
        {
            "fields": {
                "name": "file3",
                "owner": 1,
                "file_type": "txt"
            },
            "pk": "file3",
            "model": "files.file"
        }
    ]

Finally, to send data from the terminal you can create an auxiliary file and redirect the standard input of http to the file.

    $ http PUT http://127.0.0.1:8000/files/ < file.json > result

Where file.json contains:

    [
        {
            "pk": "new_file1",
            "fields": {
                "file_type": "txt",
                "owner": 1,
                "name": "new_file1"
            },
            "model": "files.file"
        }
    ]

To check the results we will send a GET request again:

    $ http GET http://127.0.0.1:8000/files/ > result

    [
        {
            "fields": {
                "name": "file1",
                "owner": 1,
                "file_type": "txt"
            },
            "pk": "file1",
            "model": "files.file"
        },
        {
            "fields": {
                "name": "file2",
                "owner": 1,
                "file_type": "txt"
            },
            "pk": "file2",
            "model": "files.file"
        },
        {
            "fields": {
                "name": "new_file1",
                "owner": 1,
                "file_type": "txt"
            },
            "pk": "new_file1",
            "model": "files.file"
        }
    ]

Chapter 7

Django Practices

To practice django we will implement the project described in the first exercise of the rest practices.

Model practices

Resource List:

1. Bookmark: A list of URIs pointing to the existing resources.
2. Article: It represents an article. It contains information about the newspaper which published the article, its author, and the article headline and body.
3. Author: It represents an author. It contains contact information about the author, the newspaper that the author writes for, etc.
4. List of articles: An author's list of publications.
5. Newspaper: It represents a publishing authority. It contains information about the authority (name, location, list of authors, brief description, etc.).
6. List of newspapers: A list containing all the newspapers.
7. Search by date: It represents a list of articles written on a date sent in the body payload.
8. Search by newspaper: It represents a list of articles published by the newspaper whose representation is sent in the body payload.
9. Most recent news: Lists the most recent news.

Exercise 1: Creating the model.

Exercise 1.1: Start a new project named 'Exercise1'. Inside this project create a new app called 'ArticleReader'.

Exercise 1.2: Write models for Articles, Authors and Newspapers. You have to decide which field types should be used to model each of these resources' attributes. You must also decide which field options should be applied to each field. Once you've finished, register the app in the 'INSTALLED_APPS' tuple inside settings.py and use the manage script to update the database structure.

Exercise 1.3: Using the shell, create two different newspapers and four authors, two of them writing for one newspaper and the other two for the other. Finally, create one article for each author.

Exercise 1.4: Using the shell, retrieve a newspaper from the newspapers list, save it in a variable 'n' and try to show its content (just type 'n' and hit enter). What is its content? Go to your model files and add __str__() methods to every class. Each one should return a string representing the content of the model. Start the shell again and try to show a newspaper variable again. What happens? Test the article and author objects too.

Exercise 1.5: Use django's default serializer to serialize, in JSON format, a list containing all the authors created in 1.3. Show the output on screen. You should use an external tool to understand the JSON format better. What is the 'pk' field?
Why do you think that 'newspaper' is shown as a number?

Exercise 2: Since django's serializer won't fit our needs, let's build our own serializer. Create a new file in the ArticleReader directory and name it 'serializers.py'. Inside it create a new class called ArticleSerializer, which will hold the methods of the following exercises.

Exercise 2.1: Create a method inside ArticleSerializer that takes an argument 'article' and returns a dict that contains three keys: author, headline and body. The 'author' value must be a string containing the author's first, middle and last name separated by spaces. 'headline' and 'body' will be the same 'headline' and 'body' defined in the article class.

Exercise 2.2: Create a new method called serialize that takes one argument called article and that, using the method created in 2.1, returns a json formatted string. You can use python's default json module to do so. Example:

    import json

    mydict = {'hello': 'world'}

    serialized_dict = json.dumps(mydict)

Result: a string containing the json object: {"hello": "world"}

Exercise 2.3: Test your code with the shell.

Exercise 2.4: Create a new method called serialize_many that takes one argument named list, which contains an array of 'article' objects. The result must be a string in json format containing the array of articles. You should use the method developed in 2.1. Why do you think you can't use the method developed in 2.2?

Exercise 2.5: Test the results with the shell.

Exercise 2.6: Modify the methods developed in this exercise so the serialized data matches the resource representations created in the rest chapter practices. You should write two different auxiliary methods, one to use inside the 'serialize' method and the other inside 'serialize_many'.

Exercise 2.7: Develop a NewspaperSerializer and an AuthorSerializer.

View practices

Exercise 1: In this first exercise we will implement the GET views of the ArticleReader application. To check these exercises you'll only need a web browser.
Exercise 1.1: In the script 'views.py' add a new class named NewspaperListView that extends View. Remember to import View from django.views.generic.base. Define a get method inside it and make it return a 200 status code without any body. Add a new url to 'urls.py' with the pattern r'^newspaper/$' that calls NewspaperListView when accessed. To test if it worked start a new test server, open a web browser and try to access '127.0.0.1:8000/newspaper/'. If you've done it well a blank page will be shown, otherwise an error page will tell you what you did wrong.

    python3 manage.py runserver

Exercise 1.2: Using the serializer that you implemented in the second exercise of the model practices, generate an HttpResponse and send the complete list of newspapers. Remember that you can access the model from the views just like you do in the shell. Refresh the page; if you did it correctly a JSON string will be shown in the browser.

Exercise 1.3: Write a new class 'AuthorView' and define a get method. It should return the result of using the AuthorSerializer.serialize() method. Also add a url to urls.py to make the view accessible. To get the identification number of the author you can use the regular expressions explained in section 6.6 and catch it with kwargs['pk']. Use the browser to test it.

Exercise 1.4: Write a new class 'ArticleView' like you did with the authors in the previous exercise. Add it to the urls.py script.

Exercise 1.5: Write a new class 'NewspaperView' that returns a json serialized Newspaper whose id is indicated in the url. Add the url to the urls script.

Exercise 1.6: Write a 'SearchView' class that returns a json serialized list of articles from a newspaper whose id is specified in the url. Add the url to the urls script.

Exercise 1.7: Implement the 'Search by date' resource. You can reuse the SearchView.

Exercise 1.8: Try to GET the url 'localhost:8000/newspaper/98/' with the browser. What happens? Try to fix it.
What status code should be returned? HINT: if a get doesn't return any object, or if it returns more than one, it raises an exception. If you did it well the browser should show a white page, because it doesn't show status codes, but if you look at the server that you started, it should have written a log entry for each request that you've performed, and you should see a 404 response in yellow.

Exercise 1.9: Fix the other queries that could raise an exception.

Middleware and parsing practices

Exercise 1: Continuing with the previous app, we're going to learn how to write middleware and how to parse information that comes inside the body of a request. The middleware that we're going to develop will take a request, evaluate whether it is going to be processed by the ArticleListView and check if it contains a valid article. In this section we will use POST requests, so we will need a more powerful tool than a browser: we will use cURL. Below you have a list of commands to use cURL.

Basic usage:

    $ curl -X GET http://127.0.0.1:8000/ -d 'yourbody'

-X is used to specify the method. By default, if no -X argument is given, the request will be a GET (or a POST when -d is used). The methods are always written in capital letters.

-d is used to send data. If the data is a string you can send it from the command line as in the example. If you want to send binary data, or you want to send the string from a file, you have to use '@' before the file name.

    $ curl -X GET http://127.0.0.1:8000/ -d @datafile

To specify a header you have to use -H.

    $ curl http://127.0.0.1:8000/ -d 'yourbody' -H 'Content-type: application/json'

If you want to see the whole communication between the client and the server you can use -v.

    $ curl http://127.0.0.1:8000/newspaper/ -v
    * Hostname was NOT found in DNS cache
    *   Trying 127.0.0.1...
    * Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
    > GET /newspaper/ HTTP/1.1
    > User-Agent: curl/7.35.0
    > Host: 127.0.0.1:8000
    > Accept: */*
    >
    * HTTP 1.0, assume close after body
    < HTTP/1.0 200 OK
    < Date: Wed, 24 Jun 2015 12:16:16 GMT
    < Server: WSGIServer/0.2 CPython/3.4.0
    < X-Frame-Options: SAMEORIGIN
    < Content-Type: text/html; charset=utf-8
    <
    * Closing connection 0
    [{"newspaperlink": "newspapers/1", "name": "newspaper1"}, {"newspaperlink": "newspapers/2", "name": "newspaper2"}]

Exercise 1.1: Create a new file inside ArticleReader/ and name it middleware.py. Inside it create a new class called ParsingMiddleware.

Exercise 1.2: First of all we must decide which middleware method must be implemented. If you can't decide take a look at section 6.8. Create the method inside ParsingMiddleware and make it return 'None'. Add your middleware as 'ArticleReader.middleware.ParsingMiddleware' in the middleware tuple inside 'settings.py', as the first element of the tuple*. Try to make a GET request to the server. If the server doesn't throw an exception, you've correctly added the middleware.

*We're going to perform POST calls later, and by default there is a middleware installed that would catch the request and respond to it before our middleware is processed.

Exercise 1.3: We should be able to know which view is being called. In the parameters of process_view, view_func is passed, but it is a function object. You'll have to look in the python documentation for how to get its name. To check if it's working use the print function with the name of the function, and it will be shown in the server log. Keep returning 'None' so the server doesn't raise an exception. Test it with GET requests.

Exercise 1.4: Let's add some functionality to process_view: if the view function is 'ArticleListView', use python's default json library to parse the request body. Notice that request.body is defined as a byte stream, so you'll have to decode it before parsing it.
You can use request.body.decode('utf-8'). To parse the data python has a json library that will be enough for our purposes. You can use it like this:

    try:
        parsed_data = json.loads(yourjsondata)
    except ValueError:
        return HttpResponse(status=??)

Use print to check the functionality. Look into the list of status codes to decide which status code you have to use.

Exercise 1.5: The middleware should only try to parse the data inside the body of a request if the method used is POST, which will be used to add new articles to an author's list of articles. Modify the method so it returns None if request.method is anything other than POST, regardless of what's inside the body.

Exercise 1.6: Create an auxiliary method that:

1. Checks if the parsed data is a dictionary.
2. Checks if the number of keys in the dictionary matches the number of keys in an article object.
3. Checks if the names of the keys match the names of the keys of an article object.
4. Checks the previous conditions for each of the article's sub-structures.

You can do it manually, checking all the fields of an article, or define a function that, given a dummy structure, checks if an object matches that structure.

Exercise 1.7: Use the auxiliary method that you just created to check if the received data is an article. If it is an article, the middleware should return None, but if it's not valid data you should return a 400 status code.

MODELS

1.1:

    user~/yourdir$ django-admin startproject Exercise1
    user~/yourdir$ cd Exercise1
    user~/yourdir/Exercise1$ python3 manage.py startapp ArticleReader

1.2:

    from django.db import models

    class Newspaper(models.Model):

        name = models.CharField(max_length=20, unique=True)
        location = models.CharField(max_length=50)
        description = models.TextField()

    class Author(models.Model):

        fistname = models.CharField(max_length=20)
        middlename = models.CharField(max_length=20)
        lastname = models.CharField(max_length=20)
        newspaper = models.ForeignKey(Newspaper)

    class Article(models.Model):

        author = models.ForeignKey(Author)
        headline = models.CharField(max_length=140)
        body = models.TextField()

    INSTALLED_APPS = (
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'ArticleReader'
    )

    user~/yourdir/Exercise1$ python3 manage.py makemigrations
    user~/yourdir/Exercise1$ python3 manage.py migrate

1.3:

    user~/yourdir/Exercise1$ python3 manage.py shell
    Python 3.4.0 (default, Apr 11 2014, 13:05:11)
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from ArticleReader.models import Newspaper, Author, Article
    >>> n = Newspaper.objects.create(name="newspaper1", location="123, fake street", description="First newspaper test")
    >>> n
    <Newspaper: Newspaper object>
    >>> n2 = Newspaper.objects.create(name="newspaper2", location="4, Privet Drive", description="Second newspaper test")
    >>> n2
    <Newspaper: Newspaper object>
    >>> a1 = Author.objects.create(fistname="Homer", middlename="Jay", lastname="Simpson", newspaper=n)
    >>> a2 = Author.objects.create(fistname="Marge", middlename=".", lastname="Simpson", newspaper=n)
    >>> a3 = Author.objects.create(fistname="Bart", middlename=".", lastname="Simpson", newspaper=n2)
    >>> a4 = Author.objects.create(fistname="Lisa", middlename=".", lastname="Simpson", newspaper=n2)
    >>> ar1 = Article.objects.create(author=a1, headline="Ouch!", body="No matter how good you are at something, there's always about a million people better than you.")
    >>> ar2 = Article.objects.create(author=a2, headline="Hrmmm...", body="I'm going into the dining room to have a conversation. If you want to join me, fine. (goes into the dining room and imitates a second voice) Hello Marge, how's the family? (in regular voice) I don't want to talk about it! Mind your own business!")
    >>> ar3 = Article.objects.create(author=a3, headline="I didn't do it!", body="You got the brains and talent to go as far as you want and when you do I'll be right there to borrow money.")
    >>> ar4 = Article.objects.create(author=a4, headline="BAAAAART!!!", body="I had a cat named Snowball. She died! She died! Mom said she was sleeping. She lied! She lied! Why oh why is my cat dead? Couldn't that Chrysler hit me instead? I had a hamster named Snuffy. He died")

1.4:

    user~/yourdir/Exercise1$ python manage.py shell
    Python 2.7.6 (default, Mar 22 2014, 22:59:56)
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from ArticleReader.models import Newspaper
    >>> n = Newspaper.objects.get(name="newspaper1")
    >>> n
    <Newspaper: Newspaper object>

    from django.db import models

    class Newspaper(models.Model):

        name = models.CharField(max_length=20, unique=True)
        location = models.CharField(max_length=50)
        description = models.TextField()

        def __str__(self):
            return " ".join([self.name,
                             self.location,
                             self.description])

    class Author(models.Model):

        fistname = models.CharField(max_length=20)
        middlename = models.CharField(max_length=20)
        lastname = models.CharField(max_length=20)
        newspaper = models.ForeignKey(Newspaper)

        def __str__(self):
            return " ".join([self.fistname,
                             self.middlename,
                             self.lastname,
                             self.newspaper.name])

    class Article(models.Model):

        author = models.ForeignKey(Author)
        headline = models.CharField(max_length=140)
        body = models.TextField()

        def __str__(self):
            return " ".join([self.headline,
                             self.body,
                             self.author.fistname,
                             self.author.newspaper.name])

    user~/yourdir/Exercise1$ python manage.py shell
    Python 2.7.6 (default, Mar 22 2014, 22:59:56)
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from ArticleReader.models import Newspaper
    >>> n = Newspaper.objects.get(name="newspaper1")
    >>> n
    <Newspaper: newspaper1 123, fake street First newspaper test>
    >>> from ArticleReader.models import Author, Article
    >>> au = Author.objects.get(fistname="Homer")
    >>> au
    <Author: Homer Jay Simpson newspaper1>
    >>> ar = Article.objects.get(author=au)
    >>> ar
    <Article: Ouch! No matter how good you are at something, there's always about a million people better than you. Homer newspaper1>

1.5:

    >>> from django.core import serializers
    >>> from ArticleReader.models import Author
    >>> author_list = Author.objects.all()
    >>> serialized_authors = serializers.serialize('json', author_list)
    >>> serialized_authors
    u'[
        {
            "fields": {
                "middlename": "Jay",
                "lastname": "Simpson",
                "newspaper": 1,
                "fistname": "Homer"
            },
            "model": "ArticleReader.author",
            "pk": 1
        },
        {
            "fields": {
                "middlename": ".",
                "lastname": "Simpson",
                "newspaper": 1,
                "fistname": "Marge"
            },
            "model": "ArticleReader.author",
            "pk": 2
        },
        {
            "fields": {
                "middlename": ".",
                "lastname": "Simpson",
                "newspaper": 2,
                "fistname": "Bart"
            },
            "model": "ArticleReader.author",
            "pk": 3
        },
        {
            "fields": {
                "middlename": ".",
                "lastname": "Simpson",
                "newspaper": 2,
                "fistname": "Lisa"
            },
            "model": "ArticleReader.author",
            "pk": 4
        }
    ]'

Note: the output has been formatted to make it easier to read.

2.1:

    def getdict(self, article):
        # Build the plain dict representation of one article.
        return {'author': " ".join([article.author.fistname,
                                    article.author.middlename,
                                    article.author.lastname]),
                'headline': article.headline,
                'body': article.body}

2.2:

    def serialize(self, article):
        return json.dumps(self.getdict(article))

2.3:

    >>> from ArticleReader.models import Article
    >>> from ArticleReader.serializers import ArticleSerializer
    >>> a = ArticleSerializer()
    >>> article = Article.objects.all()[0]
    >>> serialized_article = a.serialize(article)
    >>> serialized_article
    {
        "headline": "Ouch!",
        "body": "No matter how good you are at something, there's always about a million people better than you.",
        "author": "Homer Jay Simpson"
    }

2.4:

    def serialize_many(self, list):
        array = []
        for article in list:
            array.append(self.getdict(article))
        return json.dumps(array)

You can't use serialize() because it returns a string. To use it you would have to parse each string to get a dict object again, join all the objects in an array and then serialize the array. That is why you should use the auxiliary method instead.

2.5:

    >>> from ArticleReader.models import Article
    >>> from ArticleReader.serializers import ArticleSerializer
    >>> a = Article.objects.all()
    >>> s = ArticleSerializer()
    >>> serialized_list = s.
serialize_many ( a ) serialized_list { " h e a d l i n e " : " Ouch ! " , " body " : "No m a t t e r how good you a r e a t s o m e t h i n g , t h e r e ' s a l w a y s a b o u t a m i l l i o n p e o p l e b e t t e r t h a n you . " , " a u t h o r " : " Homer J a y Simpson " }, { " h e a d l i n e " : "Hrmmm . . . " , 87 " body " : " I 'm g o i n g i n t o t h e d i n i n g room t o h a v e a c o n v e r s a t i o n . I f you want t o j o i n me , f i n e . ( g o e s i n t o t h e d i n i n g room and i m i t a t e s a s e c o n d v o i c e ) H e l l o Marge , how ' s t h e f a m i l y ? ( i n r e g u l a r v o i c e ) I don ' t want t o t a l k a b o u t i t ! Mind y o u r own b u s i n e s s ! " , " a u t h o r " : " Marge . Simpson " }, { " h e a d l i n e " : " I d i d n ' t do i t ! " , " body " : " You g o t t h e b r a i n s and t a l e n t t o go a s f a r a s you want and when you do I ' l l be r i g h t t h e r e t o b o r r o w money . " , " a u t h o r " : " B a r t . Simpson " }, { " h e a d l i n e " : "BAAAAART ! ! ! " , " body " : " I had a c a t named S n o w b a l l . She d i e d ! She d i e d ! Mom s a i d s h e was s l e e p i n g . She l i e d ! She l i e d ! Why oh why i s my c a t d e a d ? Couldn ' t t h a t C h r y s l e r h i t me i n s t e a d ? I had a h a m s t e r named S n u f f y . He d i e d \ \ u2014 " , " a u t h o r " : " L i s a . 
Simpson " } 16 17 18 19 20 21 22 23 24 25 26 27 28 29 ]' 2.6 1 2 3 4 5 6 7 8 9 10 11 12 class ArticleSerializer: def getdict(self,article): aux = { 'headline':article.headline, 'body': article.body } article = { 'newspaper':article.author.newspaper.id, 'author': article.author.id, 'article': aux } return article 13 14 15 16 17 18 def getlistdict(self,article): article = { 'headline':article.headline, 'articlelink':'/article/%s' % article.id } 19 20 return article 21 22 23 def serialize(self, article): return json.dumps(self.getdict(article)) 24 25 26 27 28 29 30 1 2 3 4 5 6 7 8 9 10 11 12 def serialize_many(self, list): array=[] for article in list: array.append(self.getlistdict(article)) return json.dumps(array) >>> from A r t i c l e R e a d e r . s e r i a l i z e r s i m p o r t A r t i c l e S e r i a l i z e r >>> from A r t i c l e R e a d e r . m o d e l s i m p o r t A r t i c l e >>> s = A r t i c l e S e r i a l i z e r ( ) >>> a r t i c l e _ l i s t = A r t i c l e . o b j e c t s . a l l ( ) >>> s t r i n g = s . s e r i a l i z e ( a r t i c l e _ l i s t [ 0 ] ) >>> s t r i n g '{ " newspaper " : 1 , " article ": { " h e a d l i n e " : " Ouch ! " , " body " : "No m a t t e r how good you a r e a t s o m e t h i n g , t h e r e ' s always about a m i l l i o n people b e t t e r 88 t h a n you . " 13 }, " autho r " : 1 }' 14 15 16 1 2 3 4 5 6 7 >>> >>> >>> >>> >>> >>> [ from A r t i c l e R e a d e r . s e r i a l i z e r s i m p o r t A r t i c l e S e r i a l i z e r from A r t i c l e R e a d e r . m o d e l s i m p o r t A r t i c l e s = ArticleSerializer () a r t i c l e _ l i s t = Article . objects . all () string = s . serialize_many ( a r t i c l e _ l i s t ) string { 8 " h e a d l i n e " : " Ouch ! " , " a r t i c l e l i n k " : " / a r t i c l e /1 " 9 10 }, { 11 12 " h e a d l i n e " : "Hrmmm . . . " , " a r t i c l e l i n k " : " / a r t i c l e /2 " 13 14 }, { 15 16 " h e a d l i n e " : " I d i d n ' t do i t ! 
" , " a r t i c l e l i n k " : " / a r t i c l e /3 " 17 18 }, { 19 20 " h e a d l i n e " : "BAAAAART ! ! ! " , " a r t i c l e l i n k " : " / a r t i c l e /4 " 21 22 } 23 24 ] 2.7 1 class NewspaperSerializer: 2 3 4 5 6 7 8 9 10 11 12 13 14 def getdict(self,newspaper): author_list = newspaper.author_set.all() aux = [] for author in author_list: aux.append({'authorname':author.fistname, 'authorlink':'/Author/%s' % author.id}) mynewspaper = { 'name':newspaper.name, 'location': newspaper.location, 'authors':aux } return mynewspaper 15 16 17 18 19 20 21 def getlistdict(self,newspaper): newspaper = { 'name':newspaper.name, 'newspaperlink':'newspapers/%s' % newspaper.id } return newspaper 22 23 24 25 26 27 def serialize_many(self,list): array = [] for newspaper in list: array.append(self.getlistdict(newspaper)) return json.dumps(array) 28 29 30 1 def serialize(self,newspaper): return json.dumps(self.getdict(newspaper)) class AuthorSerializer: 89 2 3 4 5 6 7 8 9 10 11 def getdict(self,author): aux = { 'firstname': author.fistname, 'middlename': author.middlename, 'lastname': author.lastname} myauthor = { 'information': aux, 'newspaper': author.newspaper.id, 'articleslistlink':'/author/%s/articles/' % author.id } return myauthor 12 13 14 def serialize(self,author): return json.dumps(self.getdict(author)) VIEWS 1.1 1 class NewspaperListView(View): 2 3 4 def get(self,request,*args,**kwargs): return HttpResponse(status=200) 1.2 1 2 3 1 url(r'^newspaper/$', NewspaperListView.as_view(), name='newspaper-list') class NewspaperListView(View): 2 3 4 5 6 7 def get(self,request,*args,**kwargs): newspaper_list = Newspaper.objects.all() serializer = NewspaperSerializer() data = serializer.serialize_many(newspaper_list) return HttpResponse(data) 1.3 1 2 3 1 url(r'^author/(?P<pk>\d+)/$', AuthorView.as_view(), name='author-detail' ) class AuthorView(View): 2 3 4 5 6 7 8 def get(self,request,*args,**kwargs): author_id = int(kwargs['pk']) author = Author.objects.get(id=author_id) 
            serializer = AuthorSerializer()
            data = serializer.serialize(author)
            return HttpResponse(data)

1.4

    url(r'^article/(?P<pk>\d+)/$', ArticleView.as_view(), name='article-detail')

    class ArticleView(View):

        def get(self, request, *args, **kwargs):
            article_id = int(kwargs['pk'])
            article = Article.objects.get(id=article_id)
            serializer = ArticleSerializer()
            data = serializer.serialize(article)
            return HttpResponse(data)

1.5

    url(r'^newspaper/(?P<pk>\d+)/$', NewspaperView.as_view(), name='newspaper-detail')

    class NewspaperView(View):

        def get(self, request, *args, **kwargs):
            newspaper_id = int(kwargs['pk'])
            newspaper = Newspaper.objects.get(id=newspaper_id)
            s = NewspaperSerializer()
            data = s.serialize(newspaper)
            return HttpResponse(data)

1.6

    url(r'^search/by_newspaper/(?P<pk>\d+)/$', SearchView.as_view(),
        {'filter': 'newspaper'}, name='search-newspaper'),

    class SearchView(View):

        def get(self, request, *args, **kwargs):
            if kwargs['filter'] == 'newspaper':
                aux = []
                newspaper = Newspaper.objects.get(id=int(kwargs['pk']))
                for author in newspaper.author_set.all():
                    for article in author.article_set.all():
                        aux.append(article)
                s = ArticleSerializer()
                data = s.serialize_many(aux)
                return HttpResponse(data)

1.7

    url(r'^search/by_date/(?P<day>\d\d)-(?P<month>\d\d)-(?P<year>\d\d\d\d)/$',
        SearchView.as_view(), {'filter': 'date'}, name='search-date'),

    class SearchView(View):

        def get(self, request, *args, **kwargs):
            if kwargs['filter'] == 'newspaper':
                aux = []
                newspaper = Newspaper.objects.get(id=int(kwargs['pk']))
                for author in newspaper.author_set.all():
                    for article in author.article_set.all():
                        aux.append(article)
                s = ArticleSerializer()
                data = s.serialize_many(aux)
                return HttpResponse(data)
            elif kwargs['filter'] == 'date':
                day = int(kwargs['day'])
                month = int(kwargs['month'])
                year = int(kwargs['year'])
                newspaper_list = Newspaper.objects.all()
                aux = []
                for newspaper in newspaper_list:
                    for author in newspaper.author_set.all():
                        for article in author.article_set.filter(
                                date__day=day, date__month=month, date__year=year):
                            aux.append(article)
                s = ArticleSerializer()
                data = s.serialize_many(aux)
                return HttpResponse(data)

1.8

Look for every .get() query, wrap it in a try...except structure and return an HttpResponse(status=404) in the exception handler.

    class NewspaperView(View):

        def get(self, request, *args, **kwargs):
            newspaper_id = int(kwargs['pk'])
            try:
                newspaper = Newspaper.objects.get(id=newspaper_id)
            except Newspaper.DoesNotExist:
                # No newspaper with that id: answer 404 instead of failing.
                return HttpResponse(status=404)
            s = NewspaperSerializer()
            data = s.serialize(newspaper)
            return HttpResponse(data)

MIDDLEWARE

1.2

    MIDDLEWARE_CLASSES = (
        'ArticleReader.middleware.ParsingMiddleware',
        'django.contrib.sessions.middleware.SessionMiddleware',
        'django.middleware.common.CommonMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',
        'django.contrib.auth.middleware.AuthenticationMiddleware',
        'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
        'django.contrib.messages.middleware.MessageMiddleware',
        'django.middleware.clickjacking.XFrameOptionsMiddleware',
    )

The method must be 'process_view(self, request, view_func, view_args, view_kwargs)'.

1.3

    def process_view(self, request, view_func, view_args, view_kwargs):
        print(view_func.__name__)
        if view_func.__name__ == 'ArticleListView':
            return None
        return None

1.4

    def process_view(self, request, view_func, view_args, view_kwargs):
        print(view_func.__name__)
        if view_func.__name__ == 'ArticleListView':
            body = request.body.decode('utf-8')
            data = json.loads(body)
            print('Data: %s' % data)
        return None

1.5

    def process_view(self, request, view_func, view_args, view_kwargs):
        print(view_func.__name__)
        if view_func.__name__ == 'ArticleListView':
            if request.method == 'POST':
                if request.POST != "":
                    body = request.body.decode('utf-8')
                    data = json.loads(body)
                    print('Data: %s' % data)
        return None

1.6

article_is_valid is the manual form and is much harder to maintain. are_same_type is much more maintainable, because you only need to change the dummy object to check a different structure.

    def article_is_valid(article):
        if type(article) is dict:
            if len(article) == 3:
                if 'newspaper' in article:
                    if 'author' in article:
                        if 'article' in article:
                            if (type(article['newspaper']) is int
                                    and type(article['author']) is int
                                    and type(article['article']) is dict):
                                if len(article['article'].keys()) == 2:
                                    if ('headline' in article['article']
                                            and 'body' in article['article']):
                                        if (type(article['article']['headline']) is str
                                                and type(article['article']['body']) is str):
                                            return True
        return False

    def are_same_type(obj, dummie):
        # The object must have the same type as the reference object.
        if type(obj) is not type(dummie):
            return False
        if type(dummie) is dict:
            # Same keys, and every value must match the reference recursively.
            if set(obj) != set(dummie):
                return False
            return all(are_same_type(obj[key], dummie[key]) for key in dummie)
        if type(dummie) is list:
            # Same length, and every element must match position by position.
            if len(obj) != len(dummie):
                return False
            return all(are_same_type(o, d) for o, d in zip(obj, dummie))
        return True

1.7

    import json
    from django.http import HttpResponse

    class ParsingMiddleware():
        # Reference object that describes the structure of a valid article.
        article_dummie = {'newspaper': 1,
                          'author': 1,
                          'article': {
                              'headline': 'asdf',
                              'body': 'asdf'}
                          }

        def process_view(self, request, view_func, view_args, view_kwargs):
            print(view_func.__name__)
            if view_func.__name__ == 'ArticleListView':
                if request.method == 'POST':
                    if request.POST != "":
                        body = request.body.decode('utf-8')
                        print(body)
                        data = json.loads(body)
                        print('Data: %s' % data)
                        if are_same_type(data, self.article_dummie):
                            return None
                        else:
                            return HttpResponse(status=400)
            return None

Chapter 8

REST Applied to an SDN Application

8.1 Introduction

The main principle behind Software-Defined Networking (SDN) is the physical separation of the network control plane from the forwarding plane, where a
single control plane controls several devices. Software-Defined Networking is an emerging architecture that is dynamic, manageable, cost-effective, and adaptable, which makes it well suited to the high-bandwidth, dynamic nature of today's applications. This architecture decouples the network control and forwarding functions, so that network control becomes directly programmable and the underlying infrastructure is abstracted from applications and network services.

The SDN architecture is:

• Directly programmable: Network control is directly programmable because it is decoupled from forwarding functions.
• Agile: Abstracting control from forwarding lets administrators dynamically adjust network-wide traffic flow to meet changing needs.
• Centrally managed: Network intelligence is (logically) centralized in software-based SDN controllers that maintain a global view of the network, which appears to applications and policy engines as a single, logical switch.
• Programmatically configured: SDN lets network managers configure, manage, secure, and optimize network resources very quickly via dynamic, automated SDN programs, which they can write themselves because the programs do not depend on proprietary software.
• Open standards-based and vendor-neutral: When implemented through open standards, SDN simplifies network design and operation because instructions are provided by SDN controllers instead of multiple, vendor-specific devices and protocols.

8.2 Ryu Introduction

Ryu is a component-based framework for Software-Defined Networking applications. It provides software components with well-defined APIs that make it easy to create network management and control applications. Ryu supports various protocols for managing network devices, such as OpenFlow, Netconf and OF-config. It supports OpenFlow 1.0, 1.2, 1.3 and 1.4. It is developed entirely in Python and all of the code is freely available under the Apache 2.0 license.
It is the tool chosen to develop the control plane in the experiments presented in this document. All of the scenarios analyzed rely on OpenFlow 1.3, Open vSwitch, Ryu and Mininet.

8.3 Ryu Features

Ryu defines two important base classes that you need to extend to create your applications and controllers: RyuApp and ControllerBase.

The first one is located in the app_manager package (app_manager.RyuApp). It is used to receive messages that originate in the switches, and it can receive any OpenFlow message. To receive a message you have to write a method that processes it and decorate that method with the '@set_ev_cls' decorator (see section 8.3.1). When a message is received, RyuApp looks in the extending class for a method with the matching decorator and calls it. If there is none, the message is not processed.

RyuApp contains a variable called OFP_VERSIONS. It is a list of all the OpenFlow versions that the app accepts. If a switch does not operate in one of the versions listed in OFP_VERSIONS, that switch is ignored.

ControllerBase is the base class used to build your APIs. It handles the incoming HTTP connections through a WSGI interface. To respond to a certain HTTP request you have to write a method that processes the request and returns the response, and decorate it with the '@route' decorator (see section 8.3.3). When a request arrives, the server looks in the class that extends ControllerBase for a method with the matching decorator. If the method is found, the request is handed to it and the response it returns is sent back to the client. Otherwise the request is discarded and the client receives a response with a 404 status code.

8.3.1 Message Reply Handlers

To process the packets that arrive from the switches you can use a decorator found in the package ryu.controller.handler called set_ev_cls.
This decorator accepts two arguments. The first one is the type of message that the function will handle. These are OpenFlow messages. In this example we will only use three of them:

• EventOFPStateChange: It is received when a switch's state changes.
• EventOFPFlowStatsReply: It is received after an OFPFlowStatsRequest message is sent. It contains information about the switch's flows.
• EventOFPPortStatsReply: It is received after an OFPPortStatsRequest message is sent. It contains statistics collected from the switch's ports.

The second argument is the switch's state when it generated the message. There are four possible states:

• HANDSHAKE_DISPATCHER: The switch is up and looks for controllers.
• CONFIG_DISPATCHER: Exchange of features, such as the OpenFlow versions available.
• MAIN_DISPATCHER: Normal state. The switch forwards packets and communicates with the controller when necessary.
• DEAD_DISPATCHER: The switch is disconnected from the controller.

When a message is processed, the method receives one positional argument, the event, which is explained in section 8.3.2.

Example:

    @set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
    def add_switch(self, ev):
        print('New switch is up')

    @set_ev_cls(ofp_event.EventOFPStateChange, DEAD_DISPATCHER)
    def delete_switch(self, ev):
        print('Switch went down')

These methods receive the EventOFPStateChange message when the switch is ready to work and when the switch is disconnected from the controller. They can be used, for example, to keep track of the working switches.

8.3.2 OpenFlow protocol messages

When a message is received, it is encapsulated in an event object. The event contains the msg object. The msg object contains the switch's datapath object, and the datapath object in turn contains the ofproto object and the parser object, which are independent of the message. The msg object also carries the specific information of the message (for example, the port statistics).
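The nesting described above can be mimicked with plain Python objects. The following is only a sketch: make_port_stats_event and summarize are hypothetical helpers, and SimpleNamespace stands in for Ryu's real event, message and datapath classes. Only the attribute names (msg, datapath, id, ofproto, ofproto_parser, body, port_no, rx_packets) follow the text and the examples of this chapter.

```python
from types import SimpleNamespace


def make_port_stats_event(dpid, stats):
    """Build a stand-in event with the nesting event -> msg -> datapath.

    Hypothetical helper: a real event is created by Ryu, not by the app.
    """
    datapath = SimpleNamespace(
        id=dpid,
        ofproto="placeholder for the ofproto object",
        ofproto_parser="placeholder for the parser object",
    )
    # Message-specific part: for a port statistics reply, a list of entries.
    body = [SimpleNamespace(**entry) for entry in stats]
    msg = SimpleNamespace(datapath=datapath, body=body)
    return SimpleNamespace(msg=msg)


def summarize(ev):
    """Read the fixed part (datapath) and the message-specific part (body)."""
    datapath = ev.msg.datapath
    return [(datapath.id, stat.port_no, stat.rx_packets) for stat in ev.msg.body]


event = make_port_stats_event(1, [{"port_no": 1, "rx_packets": 10},
                                  {"port_no": 2, "rx_packets": 0}])
print(summarize(event))
```

The part reached through ev.msg.datapath looks the same for every message, while the content of ev.msg.body depends on the message type.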
While the first part (which contains the switch information) always has the same structure, the structure of the part that carries the message information depends on the type of the message received.

Example: OFPPortStatsReply contains an object called 'body' inside msg that holds a list of OFPPortStats. Each OFPPortStats entry contains information such as the packets received and the packets sent.

    @set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
    def _port_stats_reply_handler(self, ev):
        datapath = ev.msg.datapath
        datapath_id = datapath.id

        body = ev.msg.body

        for stat in body:
            self.logger.info('Datapath: %s, port: %s, received packets: %s'
                             % (datapath.id, stat.port_no, stat.rx_packets))

This simple function prints the statistics received in the controller, specifying the datapath and the port number.

To send messages to the switches you need the datapath and the parser object contained in the received messages. You should store them when a switch changes to the MAIN_DISPATCHER state. The parser object contains an OFPMatch method that takes named arguments and generates a match object. It also contains methods to generate instructions. We will only use OFPInstructionActions and OFPActionOutput, but you can find the full list of methods in Ryu's documentation. Finally, we will also use the parser object to combine the match, the instructions, the datapath, etc. in a single object using OFPFlowMod. It takes named arguments. The most common are: match, instructions, priority and datapath. The datapath object contains a method named send_msg, which takes an OFPFlowMod as its argument and sends it to the switch.
Example:

    def add_flow(self, datapath):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        match = parser.OFPMatch(in_port=1)
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                             [parser.OFPActionOutput(2)])]
        mod = parser.OFPFlowMod(datapath=datapath, priority=1,
                                match=match, instructions=inst)
        datapath.send_msg(mod)
        self.logger.info("New Flow Added")

This simple method takes a datapath as its first positional argument and generates a message that installs a new flow into the switch. The flow indicates that a packet received on port 1 must be forwarded to port 2.

8.3.3 HTTP Request Handlers

A class that extends ControllerBase is able to handle the HTTP requests that arrive at the server. The requests are filtered by their URL and their HTTP method, and Ryu decides which class method is called. To indicate which URL and HTTP methods are matched, the class' methods must be decorated with the route decorator. Route accepts two positional arguments and two keyword arguments:

1. Request name: It is only an identifier string that names the resource. It has no further implications.
2. URL: A string that defines the URL to match. It contains neither the domain nor the first part of the URI (protocol://ip_address:port). To define variable parts of the URL you can use braces, '{' and '}'. Inside the braces you have to specify a representative name. It does not matter which name you choose, because it is only used to identify the substring it captures. You can access the captured substring inside the method through kwargs['name'].
3. methods=[]: A list of the HTTP methods that the decorated method responds to.
4. requirements={}: A dictionary whose keys are the names defined in the URL (inside the braces) and whose values are patterns. It forces the substring identified by the key to match the pattern contained in the value.
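To make the role of the braces and of the requirements dictionary more concrete, the matching can be imitated with the standard re module. This is a minimal sketch and not the way Ryu implements routing internally: compile_route and match_route are hypothetical helpers, and the default pattern '[^/]+' for a variable without a requirement is an assumption made for this illustration.

```python
import re


def compile_route(template, requirements=None):
    """Turn a template such as '/resource/{name}' into a compiled regex.

    Hypothetical helper for illustration only.
    """
    requirements = requirements or {}
    pattern = ''
    position = 0
    for found in re.finditer(r'\{(\w+)\}', template):
        # Literal text between two variables is matched as-is.
        pattern += re.escape(template[position:found.start()])
        name = found.group(1)
        # A requirement restricts what the variable part may contain.
        sub_pattern = requirements.get(name, r'[^/]+')
        pattern += '(?P<%s>%s)' % (name, sub_pattern)
        position = found.end()
    pattern += re.escape(template[position:])
    return re.compile('^' + pattern + '$')


def match_route(compiled, path):
    """Return the captured variables as a dict, or None if there is no match."""
    found = compiled.match(path)
    return found.groupdict() if found else None


route_regex = compile_route('/resource/{resource_identifier}',
                            {'resource_identifier': r'\d'})
print(match_route(route_regex, '/resource/5'))
print(match_route(route_regex, '/resource/abc'))
```

The dictionary returned for a matching path plays the role of the kwargs that a decorated method receives, while a path that violates the requirement is not matched at all.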
Example:

    simple_digit_pattern = r'\d'

    @route('My_test_example', '/resource/{resource_identifier}',
           methods=['GET', 'POST', 'DELETE'],
           requirements={'resource_identifier': simple_digit_pattern})

The methods decorated with route should take two arguments: a request argument and a keyword argument. They have to return a Response object from the 'webob' package.

8.3.4 Link REST Controllers with Ryu applications

The REST linkage of a Ryu application with a web interface is made outside the app's class. You have to create a controller class that handles requests and responses. To link a controller with an application you can use a WSGI object, which is created by Ryu and passed to the application in its keyword arguments. It is accessible through the key 'wsgi'. This object allows you to register controllers for your API. The registered controller receives the incoming requests and holds an instance of the application (see Figure 8.1). The controller can call the application's methods through this instance.
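The data flow of this registration can be followed without Ryu in a small simulation. Everything in this sketch is hypothetical: FakeWSGI only imitates the idea of a register call that stores a controller class together with a data dictionary, and SketchApp and SketchController are stand-ins for a RyuApp subclass and a ControllerBase subclass.

```python
class FakeWSGI:
    """Hypothetical stand-in for the WSGI object that Ryu hands to the app."""

    def __init__(self):
        self.registered = []

    def register(self, controller_cls, data):
        # Remember which controller class serves requests, and with which data.
        self.registered.append((controller_cls, data))

    def handle(self, method_name):
        # For an incoming request, build a controller and call its handler.
        controller_cls, data = self.registered[0]
        controller = controller_cls(data)
        return getattr(controller, method_name)()


class SketchController:
    """Stand-in for a controller: it receives the app instance through data."""

    def __init__(self, data):
        self.app = data['SketchApp']

    def hello_world(self):
        # The controller reaches the application through the stored instance.
        return self.app.hello()


class SketchApp:
    """Stand-in for the application: it registers its controller and itself."""

    def __init__(self, wsgi):
        wsgi.register(SketchController, {'SketchApp': self})

    def hello(self):
        return "Hello World"


wsgi = FakeWSGI()
app = SketchApp(wsgi)
print(wsgi.handle('hello_world'))
```

The point of the pattern is that the application never calls the controller. It only hands over a reference to itself at registration time, and the controller uses that reference for every request.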
[Figure 8.1: App and Controller interconnection. The Ryu app registers the controller through the WSGI object; the controller receives the requests and holds an instance of the app.]

Example:

    from ryu.base import app_manager
    from ryu.app.wsgi import ControllerBase, WSGIApplication, route
    from webob import Response

    class TestApp(app_manager.RyuApp):

        _CONTEXTS = {'wsgi': WSGIApplication}

        def __init__(self, *args, **kwargs):
            super(TestApp, self).__init__(*args, **kwargs)
            wsgi = kwargs['wsgi']
            # Register the controller and give it a reference to this app.
            wsgi.register(TestController, {'TestApp': self})

        def PrintHelloWorld(self):
            print("Hello World")

    class TestController(ControllerBase):

        def __init__(self, req, link, data, **config):
            super(TestController, self).__init__(req, link, data, **config)
            self.testapp = data['TestApp']

        @route('Hello World', '/', methods=['GET'])
        def Hello_world(self, req, **kwargs):
            return Response(body=self.testapp.PrintHelloWorld())

8.4 Monitoring Application

In this section we show how a Ryu application can offer a REST interface that lets clients interact with the network. First of all, we need to create the list of resources that we want to offer. In this case the list is:

1. Bookmark: Application's entry point.
2. Topology: Topology bookmark. Lists all the topology resources available.
3. Switch List: Represents a list of the active switches.
4. Link List: Represents a list of the interconnections between all switches.
5. Switch's link list: Represents a list of the existing links between the specified switch and the other switches.
6. List of flows: Complete list of flows in the network.
7. A switch's list of flows: List of flows defined for a switch.
8. Statistics of a port: Represents the packet and byte load for a certain port of a certain switch.
9. Statistics of a switch: Lists the statistics of every port in a switch.

In table 8.1 you can see the list of URIs defined for this application.
    Resource               URI
    Bookmark               '/'
    Topology               'topology/'
    Switch List            'topology/switches/'
    Link List              'topology/links/'
    Switch's link list     'topology/links/<id>/'
    Flows list             'flows/'
    Switch's flow list     'flows/<id>/'
    Port Statistics        'statistics/<id>/<port>/'
    Switch Statistics      'statistics/<id>'

Table 8.1: List of URIs

8.4.1 Ryu implementation

In this section we explain how to create a simple Ryu application and a controller that are able to serve the resources listed above.

Application

First of all we will have a look at the Ryu application. It has three functions:

1. Store a record of active switches.
2. Send messages to add and delete flows.
3. Send and receive status messages.

Switches

To send messages to a switch you need its datapath object, which contains the identifier of the switch, the OpenFlow protocol it is using and a parser that is used to prepare the messages for the switch. To store the datapath objects we create a dictionary that holds them under the datapath's id number as the key.

We catch the datapath object when a switch sends an EventOFPStateChange message that indicates that the switch is in the MAIN_DISPATCHER state (active). Also, when a switch changes its state to DEAD_DISPATCHER (it is no longer active) we remove the switch from the dictionary.

To catch the EventOFPStateChange messages, as with any other message, we add a decorator to the function designed to handle it.
In the decorator we specify the message (ofp_event.EventOFPStateChange) and the switch's state:

    @set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
    def add_switch(self, ev):
        try:
            datapath = ev.datapath
            d_id = datapath.id
        except AttributeError:
            print("Error Occurred")
            return
        self.switches[d_id] = datapath
        self.logger.info("Switch %s UP", d_id)

    @set_ev_cls(ofp_event.EventOFPStateChange, DEAD_DISPATCHER)
    def delete_switch(self, ev):
        datapath = ev.datapath
        d_id = datapath.id
        if d_id in self.switches:
            self.switches.pop(d_id)
            self.logger.info("Switch %s DOWN", d_id)

self.switches is the dictionary where the datapaths are stored, indexed by their id. Sometimes DEAD_DISPATCHER messages arrive more than once, so you have to check whether the switch has already been deleted. The logger is just a debugging tool. It does not affect the application, but using it is highly recommended in order to keep a record of the events.

Although this lets us keep track of the active switches, we will not use this information to implement the topology resource that lists the switches. Instead we will use the information provided by Ryu's topology API, which gives much more information about the switches (number of ports, etc.).

Flows

In this app we implement two methods regarding flows: one for adding them and another one for deleting them. These methods are called by the controller when the right request arrives (we will see that later).

Adding a flow:

    def add_flow(self, d_id, priority, conditions, out_port, buffer_id=None):
        datapath = self.switches[int(d_id)]
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        match = parser.OFPMatch(**conditions)
        if str(out_port) == "BROADCAST":
            inst = [parser.OFPInstructionActions(
                ofproto.OFPIT_APPLY_ACTIONS,
                [parser.OFPActionOutput(ofproto.OFPP_FLOOD)])]
        else:
            inst = [parser.OFPInstructionActions(
                ofproto.OFPIT_APPLY_ACTIONS,
                [parser.OFPActionOutput(int(out_port))])]
        self.logger.info("datapath: %s, conditions = %s, output = %s"
                         % (d_id, conditions, out_port))
        mod = parser.OFPFlowMod(datapath=datapath, priority=priority,
                                match=match, instructions=inst,
                                cookie=self.count)
        self.count += 1
        datapath.send_msg(mod)
        self.logger.info("New Flow Added")

As you can see, first of all you need the datapath stored in self.switches and the objects it contains: ofproto and ofproto_parser. After that we generate a match from the conditions received in the request, using the OFPMatch method of the parser. To generate the match we use the double star notation, which simplifies the code. The OFPMatch method requires named arguments:

    match = parser.OFPMatch(in_port=2)

If you receive a dictionary with all the conditions, you would need to make a call that specifies every match condition the dictionary contains. Since you do not know which ones are defined, you would have to write something like:

    match = parser.OFPMatch(in_port=conditions['in_port'], in_phy_port=conditions['in_phy_port'], ...)

To solve this we use the double star symbol. The double star unpacks a dict into keyword arguments: each key becomes the name of an argument and each value is passed unchanged. For example, the conditions could be {'in_port': 2, 'ipv4_src': "192.168.1.1"}. If you use the double star as in the code, the result is the same as calling:

    match = parser.OFPMatch(in_port=2, ipv4_src="192.168.1.1")

We can use the conditions without checking their validity because that has been done in a previous stage.

After the match is generated, it is time for the set of instructions. In this app we only accept two actions: send a packet to a certain port or broadcast it. Instructions are more problematic than matches because a match only needs named arguments, while every action in an instruction set must be constructed with the parser. This is why in this app we only accept OFPActionOutput instructions.
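The keyword unpacking used to build the match can be tried without Ryu. In this minimal sketch fake_match is a hypothetical stand-in for parser.OFPMatch that simply returns the named arguments it receives, which is enough to check that both ways of calling it are equivalent.

```python
def fake_match(**fields):
    """Hypothetical stand-in for parser.OFPMatch: returns its named arguments."""
    return dict(fields)


# Conditions as they could arrive in the body of a request.
conditions = {'in_port': 2, 'ipv4_src': "192.168.1.1"}

# The double star turns every key of the dict into a named argument...
unpacked = fake_match(**conditions)

# ...so the call above is equivalent to writing the arguments by hand.
explicit = fake_match(in_port=2, ipv4_src="192.168.1.1")

print(unpacked == explicit)
```

Because the keys become argument names, a key that the called function does not accept would raise an error at call time, which is one reason to validate the conditions beforehand.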
To send a packet to a certain port you need to generate an OFPActionOutput specifying the port number, or OFPP_FLOOD to broadcast it. We construct the message with OFPFlowMod, specifying the datapath, the match conditions, the instructions to perform, the priority and a cookie. Cookies are numbers tied to a flow that the developer can use freely for their own purposes. We will use the cookie to identify every flow we insert in a switch, setting its value from a simple integer variable that we increment every time we use it. Finally we send the OFPFlowMod message using send_msg.

Note: In this app all the flows are installed in table 0 of the switch.

Deleting a flow:

    def remove_table_flows(self, dpid, flow_id):
        """Create OFP flow mod message to remove flows from table."""
        datapath = self.switches[int(dpid)]
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        flow = self.flow_list[int(flow_id)]
        match = datapath.ofproto_parser.OFPMatch(**flow['match'])
        out_port = flow['actions'][0]['OFPActionOutput']
        table_id = flow['table_id']
        if str(out_port) == "BROADCAST":
            inst = [parser.OFPInstructionActions(
                ofproto.OFPIT_APPLY_ACTIONS,
                [parser.OFPActionOutput(ofproto.OFPP_FLOOD)])]
        else:
            inst = [parser.OFPInstructionActions(
                ofproto.OFPIT_APPLY_ACTIONS,
                [parser.OFPActionOutput(int(out_port))])]
        flow_mod = datapath.ofproto_parser.OFPFlowMod(
            datapath, 0, 0, table_id, ofproto.OFPFC_DELETE, 0, 0, 1,
            ofproto.OFPCML_NO_BUFFER, ofproto.OFPP_ANY, ofproto.OFPG_ANY,
            0, match, inst)
        datapath.send_msg(flow_mod)

This method works just like the add_flow method. First you get the datapath from the stored list, then you build the match and instruction set objects with the information stored in flow_list, and finally you construct an OFPFlowMod message, this time with the OFPFC_DELETE command.
In this method we do not use the generated cookie, because a cookie does not uniquely identify a flow: multiple flows can have the same cookie. Instead we retrieve the flow information stored in self.flow_list (datapath, match, instructions and table) to identify the flow.

Status Messages

This app uses two types of OpenFlow statistics messages: OFPPortStatsRequest and OFPFlowStatsRequest (with their corresponding OFPPortStatsReply and OFPFlowStatsReply). Port requests are sent to a switch to ask for information about every port it has; the reply contains traffic information such as the bytes sent and received on each port. The flow reply contains a list of all the flows that have been installed in a switch.

In this app we will implement a traffic monitor. To do so we will periodically send OFPPortStatsRequest messages to every switch and store the replies in a database. We will also track the flows installed in the switches by periodically sending OFPFlowStatsRequest messages to every switch.

First of all, let's see how to create independent threads that work in parallel and keep sending the request messages to the switches.

    self.port_thread = hub.spawn(self._port_monitor)
    self.flow_thread = hub.spawn(self._flow_monitor)

Inside ryu.lib there is a hub module that spawns independent threads, each of which executes a given method. We will create two different threads. We'll see why later.
    def _port_monitor(self):
        while True:
            for dp in self.switches.values():
                self._request_port_stats(dp)
            hub.sleep(self.port_refresh_rate)

    def _flow_monitor(self):
        while True:
            for dp in self.switches.values():
                self._request_flow_stats(dp)
            hub.sleep(self.flow_refresh_rate)

    def _request_port_stats(self, datapath):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        req = parser.OFPPortStatsRequest(datapath, 0, ofproto.OFPP_ANY)
        datapath.send_msg(req)

    def _request_flow_stats(self, datapath):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        req = parser.OFPFlowStatsRequest(datapath)
        datapath.send_msg(req)

_port_monitor and _flow_monitor run for as long as the app is working. For every switch in the switch list, _port_monitor calls a function that sends an OFPPortStatsRequest message and _flow_monitor calls a function that sends an OFPFlowStatsRequest message. After that, each monitor waits for its own refresh interval until the next round. To catch the reply messages we will use the set_ev_cls decorator.
Catching Flows:

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def _flow_stats_reply_handler(self, ev):
        body = ev.msg
        datapath = ev.msg.datapath.id
        data_json = body.to_jsondict()
        # Forget the flows previously known for this switch; the reply is authoritative.
        self.flow_list = {key: value for key, value in self.flow_list.items()
                          if value['dpid'] != datapath}
        self.flow_time[int(datapath)] = time.time()
        for flow in data_json['OFPFlowStatsReply']['body']:
            cookie = flow['OFPFlowStats']['cookie']
            if cookie != 0 and cookie not in self.flow_list:
                match_list = flow['OFPFlowStats']['match']['OFPMatch']['oxm_fields']
                # Flatten the list of OXM TLVs into a {field: value} dictionary.
                condition_list = {}
                for condition in match_list:
                    condition_list[condition['OXMTlv']['field']] = condition['OXMTlv']['value']
                actionlist = flow['OFPFlowStats']['instructions'][0]['OFPInstructionActions']['actions']
                for action in actionlist:
                    for actionname in action:
                        if actionname == 'OFPActionOutput':
                            action[actionname] = action[actionname]['port']
                        else:
                            action[actionname].pop('max_len')
                            action[actionname].pop('len')
                            action[actionname].pop('type')
                table_id = flow['OFPFlowStats']['table_id']
                flow = {'match': condition_list, 'actions': actionlist,
                        'table_id': table_id, 'dpid': datapath}
                self.flow_list[cookie] = flow

To fully understand this method you have to know the OFPFlowStatsReply structure. Converted with to_jsondict, this kind of message yields a JSON-like representation of a list of flows. The most important fields in a flow are the match, the table_id and the list of actions.

In the first part of the code, the message is transformed into a JSON-like object. Then, for every flow listed in the object, the relevant information about the flow is stored and everything else is discarded. From every match we keep only oxm_fields, which contains the list of conditions that must be fulfilled. We do not keep the whole oxm_fields structure either. oxm_fields is a list of objects which contain two values: 'field' and 'value'. Instead of using this structure we build a new dictionary whose keys are the 'field' entries from oxm_fields and whose values are the corresponding 'value' entries. The same happens with every action: we remove some parameters such as len and type, and for the most usual action (OFPActionOutput) we change the format to one that is easier to read.

The flows are stored in flow_list indexed by the cookie that we explained in add_flow. When we receive an OFPFlowStatsReply we delete all the entries in the dictionary whose datapath id is the same as the one received in the message, so that flows that have been removed from the switch disappear from the list. After that we put every incoming flow into flow_list. If the flow's cookie is 0 we do not store it. Flows of this kind are generated by the Ryu app and are used, for example, to forward messages from the switches to the controller.

Catching Ports:

    @set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
    def _port_stats_reply_handler(self, ev):
        body = ev.msg.body
        for stat in body:
            dp_id = ev.msg.datapath.id.__str__()
            if dp_id in self.previous_read:
                if str(stat.port_no) in self.previous_read[str(dp_id)]:
                    old = self.previous_read[str(dp_id)][str(stat.port_no)]
                    # Store the increase of each counter since the previous reading.
                    self.db.record_stats(ev.msg.datapath.id, stat.port_no,
                                         stat.rx_packets - old['rx_packets'],
                                         stat.rx_bytes - old['rx_bytes'],
                                         stat.rx_errors - old['rx_errors'],
                                         stat.tx_packets - old['tx_packets'],
                                         stat.tx_bytes - old['tx_bytes'],
                                         stat.tx_errors - old['tx_errors'])
                    old = None
                self.previous_read[str(dp_id)][str(stat.port_no)] = {}
            else:
                self.previous_read[str(dp_id)] = {}
                self.previous_read[str(dp_id)][str(stat.port_no)] = {}
            # The current counters become the previous reading for the next reply.
            self.previous_read[str(dp_id)][str(stat.port_no)]['rx_packets'] = stat.rx_packets
            self.previous_read[str(dp_id)][str(stat.port_no)]['rx_bytes'] = stat.rx_bytes
            self.previous_read[str(dp_id)][str(stat.port_no)]['rx_errors'] = stat.rx_errors
            self.previous_read[str(dp_id)][str(stat.port_no)]['tx_packets'] = stat.tx_packets
            self.previous_read[str(dp_id)][str(stat.port_no)]['tx_bytes'] = stat.tx_bytes
            self.previous_read[str(dp_id)][str(stat.port_no)]['tx_errors'] = stat.tx_errors

OFPPortStatsReply contains the traffic counters of every port (packets, bytes and errors sent and received). These counters are cumulative, which means that they are not directly suited for a performance analysis. To obtain the traffic of each interval we take the values received and subtract the previous reading from them. This way, if you send an OFPPortStatsRequest message every second, you get the throughput per second in bytes, packets and errors.

To store the previous reading we create a dictionary indexed by the datapath's id. Its values are also dictionaries, indexed by the port's id, whose values are objects that contain the counters. Example:

    {
        '1': {
            '1': {
                "rx_packets": X,
                "rx_bytes": Y,
                ...
            },
            '2': {
                "rx_packets": 2X,
                "rx_bytes": 2Y,
                ...
            }
        }
    }

In the previous example we see a switch with datapath.id = 1 that has two ports, '1' and '2', and the values obtained for those ports. Once we have the traffic of the interval we store it in a database using an auxiliary class (self.db) which will be explained later. If it is the first time that we receive an OFPPortStatsReply from that switch we create an empty dictionary. Finally, we update previous_read with the values just received.

Controller

In the controller we will create a method for every resource and decorate it with the @route decorator.
Topology

    @route('sw_list', '/topology/switches/', methods=['GET'])
    def get_sw_list(self, req, **kwargs):
        switch_list = get_all_switch(self.myapp)
        body = json.dumps([switch.to_dict() for switch in switch_list])
        return Response(body=body)

    @route('all_links', '/topology/links/', methods=['GET'])
    def get_all_links(self, req, **kwargs):
        links = get_all_link(self.myapp)
        body = json.dumps([link.to_dict() for link in links])
        return Response(body=body)

    @route('sw_link_list', '/topology/links/{dpid}', methods=['GET'])
    def get_link_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        links = get_link(self.myapp, int(kwargs['dpid']))
        body = json.dumps([link.to_dict() for link in links])
        return Response(body=body)

As we said before, we will not send the app's own list of switches as a response. Instead we will use the ryu.topology.api package, which contains four methods. We will use three of them:

1. get_all_switch(App), which returns a list of switches with relevant information about them such as a list of ports, dpid, etc.
2. get_all_link(App), which returns a list of links with information about which switches they connect.
3. get_link(App, datapath_id), which returns all the links that a switch has.
Flows

    @route('flow_table', '/flows/', methods=['GET'])
    def get_flow_list(self, req, **kwargs):
        body = self.myapp.serialize_flow_list()
        return Response(body=body)

    @route('sw_flow_table', '/flows/{dpid}/', methods=['GET'],
           requirements={'dpid': '\d+'})
    def get_sw_flow_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        body = self.myapp.serialize_flow_list(kwargs['dpid'])
        return Response(body=body)

    @route('flow', '/flows/{dpid}/{flow}/', methods=['GET'],
           requirements={'dpid': '\d+', 'flow': '\d+'})
    def get_single_flow(self, req, **kwargs):
        if (int(kwargs['dpid']) not in self.myapp.switches
                or int(kwargs['flow']) not in self.myapp.flow_list):
            return Response(body=None, status_code=404)
        body = self.myapp.serialize_flow_list(dpid=kwargs['dpid'], flow=int(kwargs['flow']))
        return Response(body=body)

    @route('add_flow', '/flows/{dpid}/', methods=['POST'],
           requirements={'dpid': '\d+'})
    def put_flow_into_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        try:
            # Note: eval executes whatever the body contains, so it should only
            # be used with trusted input; a JSON parser is the safer choice.
            data = eval(req.body)
        except:
            return Response(status_code=400)
        for flow in data:
            result = self.myapp.add_flow(kwargs['dpid'], int(flow['priority']),
                                         flow['conditions'], flow['out_port'])
            if result == False:
                return Response(body=None, status_code=404)
        return Response(status_code=200)

    @route('delete_flow', '/flows/{dpid}/{flow}', methods=['DELETE'],
           requirements={'dpid': '\d+', 'flow': '\d+'})
    def delete_flow(self, req, **kwargs):
        if (int(kwargs['dpid']) not in self.myapp.switches
                or int(kwargs['flow']) not in self.myapp.flow_list):
            return Response(body=None, status_code=404)
        self.myapp.remove_table_flows(kwargs['dpid'], kwargs['flow'])
        return Response(status_code=200)

As you can see, we use two variable pieces of the URLs to identify the resources: 'dpid' and 'flow'. The first one identifies the datapath id of a switch and the second one is the flow identifier (the cookie).

There are five methods to work with flows: get_flow_list, get_sw_flow_list, get_single_flow, put_flow_into_list and delete_flow. The get_flow_list method serializes the information contained in flow_list. get_sw_flow_list serializes only the flows that are installed on one switch. get_single_flow returns only one flow from flow_list. The put_flow_into_list method uses add_flow from the application to send an OFPFlowMod message to a switch with the parameters specified in the body. Finally, the delete_flow method uses the remove_table_flows method from the application.

Statistics

    @route('get_statistics', '/statistics/{dpid}/{port}', methods=['GET'],
           requirements={'dpid': '\d+', 'port': '\d+'})
    def get_statistics(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        stats = self.myapp.db.get_statistics(kwargs['dpid'], kwargs['port'], 10)
        if not stats:
            return Response(body=None, status_code=404)
        return Response(body=stats)

This method queries the database and retrieves the last ten results.

Database Connection

The app uses a database to store and retrieve port statistics. The connection is made with the MySQLdb package and a MySQL database.
Sending statistics:

    def record_flow_entries(self, dpid, match, instructions, table_id):
        try:
            self.c.execute("INSERT INTO flow_stats (dpid,flow_match,instructions,table_id) "
                           "VALUES ('%s','%s','%s','%s')" % (dpid, match, instructions, table_id))
        except Exception as e:
            # Errors are silently ignored here.
            pass

    def record_stats(self, dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors):
        try:
            self.c.execute('INSERT INTO switch_%s_port_%s '
                           '(rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error) '
                           'VALUES (%s,%s,%s,%s,%s,%s)'
                           % (dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors))
            self.db.commit()
        except:
            # The insert fails when the table does not exist yet: create it and retry.
            self.c.execute("CREATE TABLE switch_%s_port_%s LIKE base_table" % (dpid, port))
            self.db.commit()
            self.c.execute('INSERT INTO switch_%s_port_%s '
                           '(rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error) '
                           'VALUES (%s,%s,%s,%s,%s,%s)'
                           % (dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors))
            self.db.commit()

The record_stats method inserts a new row in a table whose name follows the format switch_{datapath_id}_port_{port_number}. If the table doesn't exist (a switch has just been added) it is created.
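The queries above are assembled with string formatting. As a hedged alternative, the sketch below passes the values as query parameters and only formats the table name after checking that both identifiers are integers. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server; the helper table_name and the column types are assumptions made for this illustration. With MySQLdb the placeholder would be %s instead of ?, with the values still passed as a second argument to execute.

```python
import sqlite3


def table_name(dpid, port):
    """Build switch_{dpid}_port_{port}; int() rejects anything that is not a number."""
    dpid, port = int(dpid), int(port)
    if dpid < 0 or port < 0:
        raise ValueError("identifiers must be non-negative")
    return "switch_%d_port_%d" % (dpid, port)


def record_stats(conn, dpid, port, counters):
    """Insert one row of counters, creating the per-port table when needed."""
    table = table_name(dpid, port)
    # Table names cannot be bound as parameters, hence the validated formatting.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s ("
        "rx_pkts INTEGER, rx_bytes INTEGER, rx_error INTEGER, "
        "tx_pkts INTEGER, tx_bytes INTEGER, tx_error INTEGER)" % table)
    # Values are bound by the driver instead of being pasted into the SQL text.
    conn.execute("INSERT INTO %s VALUES (?, ?, ?, ?, ?, ?)" % table, counters)
    conn.commit()


conn = sqlite3.connect(":memory:")
record_stats(conn, 1, 2, (10, 1500, 0, 8, 1200, 0))
rows = conn.execute("SELECT rx_pkts, tx_bytes FROM switch_1_port_2").fetchall()
print(rows)
```

Creating the table with IF NOT EXISTS also avoids relying on a failed insert to detect a new switch.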
The base format of the tables is:

    +----------+------------+------+-----+-------------------+-----------------------------+
    | Field    | Type       | Null | Key | Default           | Extra                       |
    +----------+------------+------+-----+-------------------+-----------------------------+
    | rx_pkts  | bigint(20) | YES  |     | NULL              |                             |
    | rx_bytes | bigint(20) | YES  |     | NULL              |                             |
    | rx_error | bigint(20) | YES  |     | NULL              |                             |
    | tx_pkts  | bigint(20) | YES  |     | NULL              |                             |
    | tx_bytes | bigint(20) | YES  |     | NULL              |                             |
    | tx_error | bigint(20) | YES  |     | NULL              |                             |
    | time     | timestamp  | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    +----------+------------+------+-----+-------------------+-----------------------------+

To generate it you can use the MySQL query:

    CREATE TABLE `base_table` (
      `rx_pkts` bigint(20) DEFAULT NULL,
      `rx_bytes` bigint(20) DEFAULT NULL,
      `rx_error` bigint(20) DEFAULT NULL,
      `tx_pkts` bigint(20) DEFAULT NULL,
      `tx_bytes` bigint(20) DEFAULT NULL,
      `tx_error` bigint(20) DEFAULT NULL,
      `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    );

8.4.2 Django Implementation

We will use an intermediary server in order to reduce the load on the Ryu server. This intermediary server will be composed of different modules that implement cache, content-negotiation and authentication functions. The intermediary will receive a request, process it, decide what should be done with it and, if required, connect to the server to retrieve the necessary information. It will be the entry point of the system and the user will not notice its existence.

Figure 8.2: Monitoring system topology procedure.

Authentication

The first entry point of the intermediary system will be an authentication subsystem.
This authentication will use HTTP headers to receive credentials from the clients, following the Basic authentication scheme. The resources will be separated into four realms:

1. A realm to access the API bookmarks.
2. A realm to access the statistics.
3. A realm to access the installed flows.
4. A realm to access the network topology.

The first one will not require any credentials but the rest will. To implement this subsystem we will build a simple piece of middleware that decodes the Authorization header and decides whether a request can get the resource it demands.

Authentication Algorithm:

1. Entry point: if the view does not belong to a realm, follow the middleware chain.
2. If the view belongs to a realm and the request contains no credentials, send WWW-Authenticate.
3. If the request contains valid credentials, follow the middleware chain; otherwise respond with 401: Unauthorized.

Figure 8.3: Authentication algorithm procedure.

For the credentials to be valid they must use the Basic scheme, the result of decoding them must follow a 'user:pwd' pattern, and they must be authorized credentials for the realm to which the view belongs.

Code:

    from django.http import HttpResponse
    from serverconnector.authconfig import authconfig, credentialslist

    class HttpAuthMiddleware():
        def process_view(self, request, view_func, view_args, view_kwargs):
            realm = authconfig[view_func.__name__]
            if realm == None:
                return None
            else:
                if 'HTTP_AUTHORIZATION' in request.META:
                    [auth, credentials] = request.META['HTTP_AUTHORIZATION'].split(' ', 1)
                    if auth.lower() == 'basic':
                        # Python 2 idiom for base64 decoding.
                        auth = credentials.strip().decode('base64')
                        username, password = auth.split(':', 1)
                        if username in credentialslist[realm]:
                            if credentialslist[realm][username] == password:
                                # Normalized so that the cache treats all valid users alike.
                                request.META['HTTP_AUTHORIZATION'] = True
                                return None
                response = HttpResponse(status=401)
                response['WWW-Authenticate'] = "Basic realm=\"%s\"" % (realm)
                return response

authconfig and credentialslist are two dictionaries. The first one is indexed by view names and contains the name of the realm to which each view belongs. The second one is indexed by realm names and its values are dictionaries that contain user credentials: the key is the user name and the value is the password. Both variables have been hard-coded in an external file.

    authconfig = {
        'Bookmark': None,
        'TopologyBookmark': None,
        'SwitchListView': 'Topo',
        'LinkListView': 'Topo',
        'FullFlowView': 'Flow',
        'SwFlowView': 'Flow',
        'SingleFlowView': 'Flow',
        'StatsView': 'Stats',
    }

    credentialslist = {
        'Flow': {'root': 'root', 'flowuser': 'flowpwd'},
        'Topo': {'root': 'root', 'topouser': 'topopwd'},
        'Stats': {'root': 'root', 'statsuser': 'statspwd'},
    }

Reasons not to use Django's default authentication module:

• Although Django has an authentication application which contains a user model, it works with sessions and cookies and therefore contradicts one of the REST constraints (statelessness). Making it stateless would require a workaround.
• In this application it is not necessary to manage users, only to have a list of valid credentials.
• Building the authentication subsystem like this gives us a pluggable mechanism that doesn't require any modification of the views, whereas Django's authentication model requires the permission requirements to be defined on the view.

Since we are applying realms and every view can belong to a different realm, we use the process_view method, because this middleware has to be applied after the view has been selected.

Finally, when the credentials are valid and the view is going to be processed, the middleware overwrites the request's Authorization header (request.META['HTTP_AUTHORIZATION']) with the Boolean value True. It is a workaround that makes the cache middleware behave correctly, as will be explained later.

Content-Negotiation

Once the user is authenticated, the server decides whether it is able to serve the request in the format that the client requires, in a server-driven negotiation style. The server will have a list of acceptable formats for each view.
The server will look for an Accept header, read all the allowed types and their quality factors, and pick the best one (see section 2.9.10) among those accepted by the chosen view. All the views will be served in a standard format and, after the view has been generated, the server will decide whether it has to transform the content's format or not.

Content-Negotiation Algorithm:

1. Entry point: if the request does not contain an Accept header, set the content header to the view's default format.
2. If it does, parse the header; on a parsing error respond with 406: Not Acceptable.
3. Look for the best matching format. If none is found respond with 406: Not Acceptable; otherwise set the content header to the best matching format.

Figure 8.4: Content Negotiation algorithm procedure.

When a message with a 406 status code is sent, the server includes the list of available formats in its body. Since HTTP does not define a standard format for this list, it is sent in JSON format.

Code:

    import json
    import yaml
    from django.http import HttpResponse

    class ConentNegotiationMiddleware():

        def process_view(self, request, view_func, view_args, view_kwargs):
            # Looked up before the branch so that 'default' is also defined
            # when the request carries no Accept header.
            func_class = get_class(view_func.__module__, view_func.__name__)
            availiable_types = func_class.types
            default = func_class.default
            if 'HTTP_ACCEPT' in request.META:
                aux = request.META['HTTP_ACCEPT'].replace(' ', '')
                allowed_types = aux.split(',')
                best_fit_type = {'general': '*', 'specific': '*', 'q': 0, 'default': True}
                try:
                    for elem in allowed_types:
                        chunks = elem.split(';', 1)
                        if len(chunks) == 1:
                            q = 1
                        elif len(chunks) == 2:
                            q = chunks[1].replace("q=", "")
                        else:
                            return HttpResponse(json.dumps(availiable_types), status=406)
                        if float(q) > 0.0 and float(q) <= 1.0:
                            [general, specific] = chunks[0].split('/', 1)
                            if general in availiable_types:
                                if specific in availiable_types[general]:
                                    if float(q) > float(best_fit_type['q']):
                                        best_fit_type = {'general': general, 'specific': specific,
                                                         'q': q, 'default': False}
                                    elif (float(q) == float(best_fit_type['q'])
                                          and best_fit_type['specific'] == '*'):
                                        best_fit_type = {'general': general, 'specific': specific,
                                                         'q': q, 'default': False}
                                elif specific == '*':
                                    if float(q) > float(best_fit_type['q']):
                                        best_fit_type = {'general': general, 'specific': specific,
                                                         'q': q, 'default': False}
                            elif general == '*' and specific == '*':
                                best_fit_type = {'general': general, 'specific': specific,
                                                 'q': q, 'default': False}
                    if best_fit_type['default'] == True:
                        return HttpResponse(json.dumps(availiable_types), status=406)
                    else:
                        if best_fit_type['general'] == '*':
                            request.META['HTTP_ACCEPT'] = "%s" % default['*/*']
                        elif best_fit_type['specific'] == '*':
                            request.META['HTTP_ACCEPT'] = "%s" % default[best_fit_type['general']]
                        else:
                            request.META['HTTP_ACCEPT'] = "%s/%s" % (best_fit_type['general'],
                                                                     best_fit_type['specific'])
                        return None
                except:
                    return HttpResponse(json.dumps(availiable_types), status=406)
            else:
                request.META['HTTP_ACCEPT'] = "%s" % default['*/*']
                return None

        def process_response(self, request, response):
            last = request.META['HTTP_ACCEPT']
            last = last[-4:]
            if response.status_code == 200 and response.content != None and last == 'yaml':
                aux = yaml.load(response.content.decode('utf-8'))
                if type(aux) == list:
                    response.content = yaml.dump_all(aux)
                else:
                    response.content = yaml.dump(aux)
            response['Content-Type'] = request.META['HTTP_ACCEPT']
            return response

The first method (process_view) is responsible for the content negotiation. availiable_types and default are dictionaries, defined in every view, that list all the formats that can be served and the default ones to use when a header in '*/*' or 'something/*' form is received. They are defined inside the view's class with the names 'types' and 'default'. availiable_types is indexed by the general type (for example text, application, audio, etc.) and its values are lists of strings defining specific types (for example, for text they can be html, plain, xml, etc.).
default also has general types as keys, but each of its values is a single string naming one media type: the format that will be used when a 'general/*' Accept header is received. It also contains a key '*/*', which holds the default format for '*/*' requests. Example:

    types = {
        'application': [
            'json',
            'yaml',
        ],
        'text': [
            'xml',
            'html',
        ]
    }
    default = {
        '*/*': 'application/json',
        'application': 'application/yaml',
        'text': 'text/html',
    }

This view accepts 'application/json', 'application/yaml', 'text/xml' and 'text/html'. If the best qualified content type is one of them, it will be served. However, if the best qualified content type is 'application/*' the server will send the content in 'application/yaml' format, and if it is 'text/*' the server will render the response in 'text/html'. Finally, if the best qualified format is '*/*' the server will send the response in 'application/json'.

To get types and default from the view's class, process_view uses an auxiliary function called get_class:

    from django.utils import importlib

    def get_class(module_name, cls_name):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            raise ImportError('Invalid class path: {}'.format(module_name))
        try:
            cls = getattr(module, cls_name)
        except AttributeError:
            raise ImportError('Invalid class name: {}'.format(cls_name))
        else:
            return cls

When the best fitting format is found, its q factor and the other Accept values are removed in order to ease the task of selecting the right format when the view is processed. For example (using the allowed and default content types from the last example):

    Accept: application/json;q=0.3, text/xml;q=0.5, text/plain;q=0.8

Would be transformed into:

    Accept: text/xml

Once the content type has been fixed and the view has been executed, the second method (process_response) is called.
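The selection performed by process_view can also be sketched outside Django. The function below is a simplified illustration rather than the middleware itself: best_match, TYPES and DEFAULT are names chosen for this example, ties are broken in favour of the more specific entry, and a malformed q value raises ValueError, which a caller could map to a 406 response.

```python
# Formats offered by a hypothetical view, in the same shape as 'types' and 'default'.
TYPES = {"application": ["json", "yaml"], "text": ["xml", "html"]}
DEFAULT = {"*/*": "application/json",
           "application": "application/yaml",
           "text": "text/html"}


def best_match(accept, types, default):
    """Return the media type to serve, or None when nothing is acceptable."""
    best = None  # (q, specificity, media type)
    for item in accept.replace(" ", "").split(","):
        media, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            if param.startswith("q="):
                q = float(param[2:])
        if not (0.0 < q <= 1.0) or "/" not in media:
            continue
        general, specific = media.split("/", 1)
        if general == "*" and specific == "*":
            candidate = (q, 0, default["*/*"])
        elif general in types and specific == "*":
            candidate = (q, 1, default[general])
        elif general in types and specific in types[general]:
            candidate = (q, 2, media)
        else:
            continue  # the view cannot serve this type
        # A higher q wins; with equal q the more specific entry wins.
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    return best[2] if best else None


chosen = best_match("application/json;q=0.3, text/xml;q=0.5, text/plain;q=0.8",
                    TYPES, DEFAULT)
print(chosen)
```

With the header from the example above the sketch selects text/xml, because text/plain is not offered by the view even though it has the highest q factor.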
Knowing the default format that the view serves, process_response transforms the content, when necessary, to the type that the client requested. In this application the accepted types can only be json or yaml, and the format served by the view is json. That is why the code above only decides between yaml and json, depending on the last four characters of the negotiated content type. If they are 'yaml' it transforms the message, but if they are 'json' it does nothing.

Cache

Cache will be implemented with Django's default middleware, just as detailed in section 6.10, using memcached and pylibmc. In this case, however, we will use one more decorator for the views:

    @vary_on_headers('header1', 'header2', ...)

This decorator indicates that a result stored in the cache will be used only if the incoming request contains exactly the same values in the specified headers.

Even though the authentication and content-negotiation middleware respond to invalid requests with error status codes, the process_response method of the cache middleware does not take into account the status code present in the response: if the resource is cached, the middleware sends it. For example, when a request with valid authentication credentials arrives, the response is cached. If a request with invalid credentials arrives afterwards, the authentication middleware returns a 401 status code, but the cache middleware has the resource stored. It ignores the status code and returns a message with the stored content and a new status code (200). The same happens with the Content-Type header, meaning that a user could receive data in a format that they did not request.
The decorator solves this problem but causes another one. Since this application does not generate different content for different users, all the requests with valid credentials could be answered with cached values; however, because we are using the decorator, the middleware distinguishes between different users even if the stored content is the same. That is why, in the authentication middleware, when the user has valid credentials to access a view the server switches the Authorization header value to True (Boolean).

The maximum time for a resource to be stored will be different for each one. There will be four different validity times, from slowly varying resources (5 minutes) to very rapidly varying ones (1 second).

Code:

    from django.conf.urls import patterns, url
    from django.views.decorators.cache import cache_page
    from django.views.decorators.vary import vary_on_headers

    # The responses vary on 'Authorization', the request header that the
    # authentication middleware normalizes (the original listing named it
    # 'WWW-Authorization').
    urlpatterns = patterns('',
        url(r'^$',
            cache_page(60*5)(vary_on_headers('Accept', 'Authorization')(Bookmark.as_view())),
            name='bookmark'),
        url(r'^topology/$',
            cache_page(60*5)(vary_on_headers('Accept', 'Authorization')(TopologyBookmark.as_view())),
            name='topo-book'),
        url(r'^topology/switches/$',
            cache_page(60*1)(vary_on_headers('Accept', 'Authorization')(SwitchListView.as_view())),
            name='switch-list'),
        url(r'^topology/links/$',
            cache_page(60*1)(vary_on_headers('Accept', 'Authorization')(LinkListView.as_view())),
            name='ful-link-list'),
        url(r'^topology/links/(?P<pk>\d+)/$',
            cache_page(60*1)(vary_on_headers('Accept', 'Authorization')(LinkListView.as_view())),
            name='link-list'),
        url(r'^flows/$',
            cache_page(30)(vary_on_headers('Accept', 'Authorization')(FullFlowView.as_view())),
            name='full-flow-list'),
        url(r'^flows/(?P<dpid>\d+)/$',
            cache_page(30)(vary_on_headers('Accept', 'Authorization')(SwFlowView.as_view())),
            name='sw-flow-list'),
        url(r'^flows/(?P<dpid>\d+)/(?P<flow>\d+)/$',
            cache_page(30)(vary_on_headers('Accept', 'Authorization')(SingleFlowView.as_view())),
            name='flow-view'),
        url(r'^statistics/(?P<dpid>\d+)/(?P<port>\d+)/',
            cache_page(1)(vary_on_headers('Accept', 'Authorization')(StatsView.as_view())),
            name='statistics'),
    )

Views

In this project, the main function of the views is to establish a new connection to the Ryu server and relay the request from the user to the server and the response from the server to the user. We will write a view for each resource. In the URL configuration we will use the same locators that we used in the Ryu server. The view takes the locator received in the request (without the scheme and the domain part) and creates a new URI pointing to the resource in the Ryu server.

When a response is received from the Ryu server, if it is a valid response, the intermediary modifies its body to add the links needed to follow the application flow and returns it to the client. If the Ryu server responds with any status code other than 200, the intermediary generates a response with nothing but that status code.

To make the new requests we will use the 'requests' library, available on pip. It creates the HTTP connections used to relay the received requests to the Ryu server.

The view, as explained before, contains the content-negotiation information for the resource. The content types are custom content types, used to differentiate between the kinds of information that each resource contains.
Code:

    class SwFlowView(View):
        http_method_names = ['get', 'post', 'options']

        # Content-negotiation information for this resource
        types = {
            'application': [
                'vnd+SDN.swflowlist+json',
                'vnd+SDN.swflowlist+yaml',
            ]
        }
        default = {
            '*/*': 'application/vnd+SDN.swflowlist+json',
            'application': 'application/vnd+SDN.swflowlist+json',
        }

        def get(self, request, *args, **kwargs):
            uri = basedir + request.path
            try:
                r = requests.get(uri)
                if r.status_code == 200:
                    # The JSON body is parsed with the YAML parser (explained below)
                    flow_list = yaml.safe_load(r.text)
                    for flow in flow_list:
                        if flow != 'age':
                            flow_list[flow]['detailflowlink'] = reverse(
                                'flow-view',
                                kwargs={'dpid': flow_list[flow]['dpid'], 'flow': flow})
                    return HttpResponse(json.dumps(flow_list))
                else:
                    return HttpResponse(status=r.status_code)
            except Exception:
                return HttpResponse(status=500)

        # Match fields accepted inside the 'conditions' dictionary of a flow
        valid_keys = ['in_port', 'in_phy_port', 'metadata', 'eth_dst', 'eth_src', 'eth_type',
                      'vlan_vid', 'ip_dscp', 'ip_ecn', 'ip_proto', 'ipv4_src', 'ipv4_dst',
                      'tcp_src', 'tcp_dst', 'udp_src', 'udp_dst', 'sctp_src', 'sctp_dst',
                      'icmpv4_type', 'icmpv4_code', 'arp_op', 'arp_spa', 'arp_tpa', 'arp_sha',
                      'arp_tha', 'ipv6_src', 'ipv6_dst', 'ipv6_flabel', 'icmpv6_type',
                      'icmpv6_code', 'ipv6_nd_target', 'ipv6_nd_sll', 'ipv6_nd_tll',
                      'mpls_label', 'mpls_tc', 'mpls_bos', 'pbb_isid', 'tunnel_id',
                      'ipv6_exthdr']

        def isflow(self, flow):
            keys = flow.keys()
            if len(keys) == 4:
                if ('d_id' in keys and 'priority' in keys
                        and 'conditions' in keys and 'out_port' in keys):
                    for key in flow['conditions']:
                        if key not in self.valid_keys:
                            return False
                    return True
            return False

        def post(self, request, *args, **kwargs):
            if request.META.get('CONTENT_TYPE') == 'application/vnd+SDN.flowlist+json':
                try:
                    data = request.body.decode('utf-8')
                    flow_list = yaml.safe_load(data)
                except Exception:
                    return HttpResponse(status=500)
                for flow in flow_list:
                    if not self.isflow(flow):
                        return HttpResponse(status=400)
                try:
                    r = requests.post(basedir + request.path, data=data)
                    return HttpResponse(status=r.status_code)
                except Exception:
                    # Failure to reach the Ryu server (500, as in the annex listing)
                    return HttpResponse(status=500)
            return HttpResponse(status=400)

This view corresponds to the switch's flow list resource. Two methods are available for this resource: GET and POST.

The GET method is very simple: it retrieves a list of flows from the server in JSON format, parses it into a dictionary and adds to every flow a link to the flow's detailed resource using the reverse function. The reverse function avoids hard-coded URLs: it takes the name of a view and the parameters that the view expects and generates the URL. This allows you to change your URLs freely without having to worry about all the places where a URL is used. Example: if you have defined this URL pattern:

    url(r'^flows/(?P<dpid>\d+)/$', SwFlowView.as_view(), name='sw-flow-list')

this call:

    reverse('sw-flow-list', kwargs={'dpid': 5})

generates the string:

    /flows/5/

If any exception is raised during the execution of the view, the server returns a 500 status code. This can happen while connecting to the Ryu server or while parsing the data received from it.

Even though the response received is in JSON format, we parse it with the YAML parser. Since YAML is, for practical purposes, a superset of JSON, this causes no problem. The reason is that if you parse the data with the JSON parser and then format it as YAML, the output contains encoding markup that is not required (in Python 2, for example, the unicode strings produced by the JSON parser are dumped with !!python/unicode tags). Data parsed with the YAML parser can be formatted as JSON or again as YAML without generating this markup.

The POST method takes the path from the request, builds the new URI, parses the data received from the client and checks that it is valid. If it is, the view opens a connection to the Ryu server and retransmits the request.
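The parsing and validation stage of the POST method can be sketched as a standalone function. This is a minimal Python 3 illustration under stated assumptions: `is_flow` and `check_flow_list` are hypothetical names, `VALID_CONDITIONS` is only a subset of the view's `valid_keys`, and the standard `json` module is used here instead of the YAML parser of the project.

```python
import json

# Assumption: a small subset of the match fields listed in valid_keys.
VALID_CONDITIONS = {"in_port", "eth_dst", "eth_src", "eth_type", "ipv4_src", "ipv4_dst"}
REQUIRED_KEYS = {"d_id", "priority", "conditions", "out_port"}
FLOW_LIST_TYPE = "application/vnd+SDN.flowlist+json"


def is_flow(flow):
    """A flow must have exactly the four required keys and only known conditions."""
    if not isinstance(flow, dict) or set(flow) != REQUIRED_KEYS:
        return False
    conditions = flow["conditions"]
    if not isinstance(conditions, dict):
        return False
    return all(key in VALID_CONDITIONS for key in conditions)


def check_flow_list(content_type, body):
    """Return the error status the validation would produce, or None if the data is valid."""
    if content_type != FLOW_LIST_TYPE:
        return 400                      # wrong Content-Type
    try:
        flows = json.loads(body)
    except ValueError:
        return 500                      # mirrors the view: a parsing exception yields 500
    if not isinstance(flows, list) or not all(is_flow(f) for f in flows):
        return 400                      # parsed correctly, but not a valid flow list
    return None
```

Only when `check_flow_list` returns `None` would the request be retransmitted to the Ryu server.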
Finally, it returns only a status code, because the responses to the POST method do not contain any data.

To check the validity of the data received from the client, the first parameter evaluated is the Content-Type header. Next, the server tries to parse the data contained in the body of the request and evaluates the parsed data by checking the key names, the number of keys and the keys contained in the 'conditions' dictionary. If an exception is raised in this process, the view returns a 500 status code; if no exception is raised but the parsed data is not valid, it returns a 400 status code.

The rest of the views can be found in the annex.

8.4.3 Topology configuration

To create the topology for this system we will use the following kinds of virtualization:

• The Ryu server, the Django intermediary server and the client will be created with LXC.
• To simulate the network controlled by the Ryu server we will use a virtual network generated with Mininet.

First of all, we create three containers with the lxc-create tool:

    $ sudo lxc-create -t ubuntu -n client
    $ sudo lxc-create -t ubuntu -n django
    $ sudo lxc-create -t ubuntu -n ryu

After that, we edit the configuration files.

Client configuration in /var/lib/lxc/client/config:

    # Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
    # Parameters passed to the template:
    # For additional config options, please look at lxc.container.conf(5)

    # Common configuration
    lxc.include = /usr/share/lxc/config/ubuntu.common.conf

    # Container specific configuration
    lxc.rootfs = /var/lib/lxc/client/rootfs
    lxc.mount = /var/lib/lxc/client/fstab
    lxc.utsname = client
    lxc.arch = amd64
    lxc.aa_profile = unconfined

    # Network configuration: physical host link
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = lxcbr0
    lxc.network.hwaddr = 00:16:3e:2a:0d:3f

    # Network configuration: SUBNET 10.0.0.0/24
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = br0
    lxc.network.name = eth1
    lxc.network.ipv4 = 10.0.0.1/24
    lxc.network.veth.pair = vethc
    lxc.network.hwaddr = 00:16:3e:ff:4d:8e

Django configuration in /var/lib/lxc/django/config:

    # Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
    # Parameters passed to the template:
    # For additional config options, please look at lxc.container.conf(5)

    # Common configuration
    lxc.include = /usr/share/lxc/config/ubuntu.common.conf

    # Container specific configuration
    lxc.rootfs = /var/lib/lxc/django/rootfs
    lxc.mount = /var/lib/lxc/django/fstab
    lxc.utsname = django
    lxc.arch = amd64
    lxc.aa_profile = unconfined

    # Network configuration: physical host link
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = lxcbr0
    lxc.network.hwaddr = 00:16:3e:bb:29:aa

    # Network configuration: SUBNET 10.0.0.0/24
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = br0
    lxc.network.name = eth1
    lxc.network.ipv4 = 10.0.0.2/24
    lxc.network.veth.pair = vethd1
    lxc.network.hwaddr = 00:16:3e:52:04:4e

    # Network configuration: SUBNET 10.0.1.0/24
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = br1
    lxc.network.ipv4 = 10.0.1.1/24
    lxc.network.name = eth2
    lxc.network.veth.pair = vethd2

Ryu configuration in /var/lib/lxc/ryu/config:

    # Template used to create this container: /usr/share/lxc/templates/lxc-ubuntu
    # Parameters passed to the template:
    # For additional config options, please look at lxc.container.conf(5)

    # Common configuration
    lxc.include = /usr/share/lxc/config/ubuntu.common.conf

    # Container specific configuration
    lxc.rootfs = /var/lib/lxc/ryu/rootfs
    lxc.mount = /var/lib/lxc/ryu/fstab
    lxc.utsname = ryu
    lxc.arch = amd64
    lxc.aa_profile = unconfined

    # Network configuration: physical host link
    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = lxcbr0
    lxc.network.hwaddr = 00:16:3e:82:07:5

    # Network configuration: SUBNET 10.0.1.0/24
    lxc.network.type = veth
    lxc.network.link = br1
    lxc.network.flags = up
    lxc.network.ipv4 = 10.0.1.2/24
    lxc.network.name = eth1
    lxc.network.veth.pair = vethr1

These files set the following network configuration:

[Figure 8.5: Topology generated with the LXC containers procedure. The client (eth1, 10.0.0.1/24) and Django (eth1, 10.0.0.2/24) share bridge br0; Django (eth2, 10.0.1.1/24) and Ryu (eth1, 10.0.1.2/24) share bridge br1; each container's eth0 (10.0.3.117/24, 10.0.3.99/24 and 10.0.3.51/24) connects to lxcbr0 (10.0.3.1/24) on the physical host.]

Notes:

• The part of the config file that describes the connection with the physical host should not be changed. Depending on the changes, the containers may take a long time to load (about 5 minutes), and some functionality, such as the DNS service, may not work correctly.
• The device names look different from the point of view of the physical host. Inside a container the names are those shown in Figure 8.5, but if you call ifconfig on the physical host you will see the names defined in the configuration files with the parameter lxc.network.veth.pair (vethr1, vethd2, vethd1, etc.). It is important to define them: otherwise each name is built from a static part (veth) with a random string appended (for example, vethXHD92Q), which becomes chaotic if you need to apply changes to the network.
• LXC creates only the lxcbr0 bridge. The other bridges have to be created manually (with brctl, for example). If they do not exist when a container is started, LXC throws an error and the container does not start. However, it is not necessary to configure them: LXC adds the interfaces to the bridges and brings them up.

Once the containers have been configured and the network works correctly, the next step is to install all the packages required to execute the project.

Client

Start the client container by typing:

    $ sudo lxc-start -n client

Once the container has loaded, log in with the default credentials (user: ubuntu, password: ubuntu). On the client we only need curl:

    $ sudo apt-get install curl

Django

Start the container and log in just as in the client container.
In this container we need to install the Django framework, the memcached service and some Python packages. To make everything easier we also install pip.

    $ sudo apt-get update
    $ sudo apt-get install python-pip
    $ sudo pip install django
    $ sudo pip install pyyaml
    $ sudo pip install requests
    $ sudo apt-get install memcached
    $ sudo apt-get install libmemcached-dev
    $ sudo apt-get install gcc
    $ sudo apt-get install python-dev
    $ sudo pip install python-memcached
    $ sudo pip install pylibmc

Ryu

Finally, in the Ryu container we have to install Mininet, the Ryu framework, Open vSwitch, MySQL and some Python packages. We also install Git to clone some repositories.

Mininet:

    $ sudo apt-get install git
    $ git clone git://github.com/mininet/mininet
    $ mininet/util/install.sh -a

Open vSwitch:

    $ sudo apt-get install wget
    $ wget http://openvswitch.org/releases/openvswitch-2.0.1.tar.gz
    $ tar zxvf openvswitch-2.0.1.tar.gz
    $ cd openvswitch-2.0.1
    $ ./boot.sh
    $ sudo ./configure
    $ sudo make && sudo make install
    $ sudo mkdir -p /usr/local/etc/openvswitch
    $ sudo ovsdb-tool create /usr/local/etc/openvswitch/conf.db vswitchd/vswitch.ovsschema
    $ sudo ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --private-key=db:Open_vSwitch,SSL,private_key \
        --certificate=db:Open_vSwitch,SSL,certificate \
        --pidfile --detach --log-file
    $ sudo ovs-vsctl --no-wait init
    $ sudo ovs-vswitchd --pidfile --detach

Every time you reboot the system you will have to execute the following commands:

    $ sudo ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --private-key=db:Open_vSwitch,SSL,private_key \
        --certificate=db:Open_vSwitch,SSL,certificate \
        --pidfile --detach --log-file
    $ sudo ovs-vsctl --no-wait init
    $ sudo ovs-vswitchd --pidfile --detach

Ryu framework:

    $ sudo apt-get install python-pip
    $ sudo apt-get install python-dev
    $ git clone git://github.com/osrg/ryu.git
    $ cd ryu; sudo python ./setup.py install
    $ sudo apt-get install python-eventlet
    $ sudo apt-get install python-routes
    $ sudo apt-get install python-paramiko

At this point you can test whether everything has been installed correctly. Open a new terminal and execute:

    $ sudo lxc-console -n ryu

This attaches the terminal of the physical host to a console of the ryu container. On the first terminal execute:

    $ sudo mn --topo single,2 --controller remote --mac --switch ovsk

On the second terminal execute, from the ryu directory:

    ~/ryu/$ PYTHONPATH=. ./bin/ryu-manager --observe-links ryu/app/simple_switch_13.py

This launches a Ryu application that makes the virtual switch behave like an ordinary layer 2 switch. Back on the first terminal, inside the Mininet environment, call:

    mininet> h1 ping -c 1 h2

If the ping is answered, everything is working fine.

MySQL:

    $ sudo apt-get install mysql-server
    $ sudo apt-get install python-mysqldb

    $ mysql -u <username> -p

You should use the user that you created during the 'mysql-server' installation.

    mysql> CREATE DATABASE SDN;
    mysql> USE SDN;
    mysql> CREATE TABLE `base_table` (
             `rx_pkts` bigint(20) DEFAULT NULL,
             `rx_bytes` bigint(20) DEFAULT NULL,
             `rx_error` bigint(20) DEFAULT NULL,
             `tx_pkts` bigint(20) DEFAULT NULL,
             `tx_bytes` bigint(20) DEFAULT NULL,
             `tx_error` bigint(20) DEFAULT NULL,
             `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);

Complete Test

We have finally set up all the containers with the correct configuration and with the packages the whole application needs. To test that it works, we will start the three containers, start the Ryu server and Mininet in the Ryu container, and start the Django server in the Django container. Finally, we will capture the traffic on br0 and br1 with Wireshark and use curl to execute a call to the system.
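As a rough illustration of what the final client call carries, the sketch below builds (but does not send) an equivalent request in Python 3. The URL, the root:root credentials, the content type and the flow list are the ones used in this test; `basic_auth` and `build_flow_request` are hypothetical helper names, and the standard library is used here rather than curl.

```python
import base64
import json
import urllib.request

# Two flows for switch 1: port 1 -> port 2 and port 2 -> port 1.
flows = [
    {"d_id": 1, "priority": 1, "conditions": {"in_port": 1}, "out_port": 2},
    {"d_id": 1, "priority": 1, "conditions": {"in_port": 2}, "out_port": 1},
]


def basic_auth(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_flow_request(url, flow_list, username, password):
    """Prepare the POST request; nothing is sent until urlopen() is called."""
    body = json.dumps(flow_list, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd+SDN.flowlist+json",
            "Authorization": basic_auth(username, password),
        },
    )


request = build_flow_request("http://10.0.0.2:80/flows/1/", flows, "root", "root")
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

With these values, `basic_auth("root", "root")` yields `Basic cm9vdDpyb290` and the compact JSON body is 129 bytes long, which agrees with the Authorization and Content-Length headers in the capture on br0 later in this section.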
First of all, add br0 and br1:

    $ sudo brctl addbr br0
    $ sudo brctl addbr br1

Then start the three containers:

    $ sudo lxc-start -n client
    $ sudo lxc-start -n django
    $ sudo lxc-start -n ryu

Open a new terminal and start another ryu console:

    $ sudo lxc-console -n ryu

On the first ryu console, configure Open vSwitch and start Mininet with the following configuration:

    $ sudo ovsdb-server -v --remote=punix:/usr/local/var/run/openvswitch/db.sock \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --private-key=db:Open_vSwitch,SSL,private_key \
        --certificate=db:Open_vSwitch,SSL,certificate \
        --pidfile --detach --log-file
    $ sudo ovs-vsctl --no-wait init
    $ sudo ovs-vswitchd --pidfile --detach

    $ sudo mn --topo single,2 --mac --switch ovsk --controller remote

On the second ryu console, configure the bridge to use OpenFlow13 and start the Ryu server:

    $ sudo ovs-vsctl set Bridge s1 protocols=OpenFlow13
    $ cd ryu/
    $ PYTHONPATH=. ./bin/ryu-manager --observe-links ryu/app/myapp.py

On the django console, start the Django server with the manage script:

    $ sudo ./manage.py runserver 0.0.0.0:80

Start two Wireshark instances, one capturing on br0 and the other on br1. Finally, on the client, use curl to make a POST call to the Django server that adds two flows: one that redirects the packets from port 1 to port 2 and another that redirects the packets from port 2 to port 1.

    $ curl -X POST 10.0.0.2:80/flows/1/ \
        -H 'Content-Type: application/vnd+SDN.flowlist+json' \
        -d '[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]' \
        -u root:root

On the client side we only receive a 200 status code. If we look at the Wireshark captures (Figure 8.6) we can see that each bridge carried a different TCP connection.

TCP stream on br0:

    POST /flows/1/ HTTP/1.1
    Authorization: Basic cm9vdDpyb290
    User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
    Host: 10.0.0.2:8000
    Accept: */*
    Content-Type: application/vnd+SDN.flowlist+json
    Content-Length: 129

    [{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]
    HTTP/1.0 200 OK
    Date: Sun, 05 Jul 2015 23:16:56 GMT
    Server: WSGIServer/0.1 Python/2.7.3
    Vary: Accept, WWW-Authorization
    Content-Type: application/vnd+SDN.swflowlist+json

TCP stream on br1:

    POST /flows/1/ HTTP/1.1
    Host: 10.0.1.2:8080
    Content-Length: 129
    User-Agent: python-requests/2.7.0 CPython/2.7.3 Linux/3.13.0-37-generic
    Connection: keep-alive
    Accept: */*
    Accept-Encoding: gzip, deflate

    [{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    Content-Length: 0
    Date: Sun, 05 Jul 2015 23:16:56 GMT
    Connection: keep-alive

[Figure 8.6: Wireshark captures from a POST request. (a) Capture on br0. (b) Capture on br1.]

Finally, on the first ryu console, to test whether the flows were correctly added, you can try:

    mininet> h1 ping h2

REST API for SDN Code

Django Settings

    """
    Django settings for SDN project.

    For more information on this file, see
    https://docs.djangoproject.com/en/1.7/topics/settings/

    For the full list of settings and their values, see
    https://docs.djangoproject.com/en/1.7/ref/settings/
    """

    # Build paths inside the project like this: os.path.join(BASE_DIR, ...)
    import os
    BASE_DIR = os.path.dirname(os.path.dirname(__file__))

    # Quick-start development settings - unsuitable for production
    # See https://docs.djangoproject.com/en/1.7/howto/deployment/checklist/

    # SECURITY WARNING: keep the secret key used in production secret!
    SECRET_KEY = 'xhm-zr6&p_i%9kb5#=y)n1p6#p%5d!jx5tq#i-^l^lfzbx!_b5'

    # SECURITY WARNING: don't run with debug turned on in production!
    DEBUG = False

    TEMPLATE_DEBUG = False

    ALLOWED_HOSTS = []

    # Application definition
    INSTALLED_APPS = (
        #'django.contrib.admin',
        #'django.contrib.auth',
        #'django.contrib.contenttypes',
        #'django.contrib.sessions',
        #'django.contrib.messages',
        #'django.contrib.staticfiles',
        #'serverconnector',
    )

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
        },
    }

    MIDDLEWARE_CLASSES = (
        #'django.contrib.sessions.middleware.SessionMiddleware',
        #'django.middleware.common.CommonMiddleware',
        #'django.middleware.csrf.CsrfViewMiddleware',
        #'django.contrib.auth.middleware.AuthenticationMiddleware',
        #'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
        #'django.contrib.messages.middleware.MessageMiddleware',
        #'django.middleware.clickjacking.XFrameOptionsMiddleware',
        'serverconnector.middleware.HttpAuthMiddleware',
        'serverconnector.middleware.ConentNegotiationMiddleware',
        'django.middleware.cache.UpdateCacheMiddleware',
        'django.middleware.cache.FetchFromCacheMiddleware',
    )

    CACHE_MIDDLEWARE_ALIAS = "default"
    CACHE_MIDDLEWARE_SECONDS = 5*60
    CACHE_MIDDLEWARE_KEY_PREFIX = ""

    ROOT_URLCONF = 'SDN.urls'

    WSGI_APPLICATION = 'SDN.wsgi.application'

    # Database
    # https://docs.djangoproject.com/en/1.7/ref/settings/#databases
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        }
    }

    # Internationalization
    # https://docs.djangoproject.com/en/1.7/topics/i18n/
    LANGUAGE_CODE = 'en-us'
    TIME_ZONE = 'UTC'
    USE_I18N = True
    USE_L10N = True
    USE_TZ = True

    # Static files (CSS, JavaScript, Images)
    # https://docs.djangoproject.com/en/1.7/howto/static-files/
    STATIC_URL = '/static/'

URL patterns

    from django.conf.urls import patterns, include, url
    from django.contrib import admin
    from serverconnector.views import (TopologyBookmark, FullFlowView, SwFlowView,
                                       SingleFlowView, StatsView, Bookmark,
                                       SwitchListView, LinkListView)
    from django.views.decorators.csrf import csrf_exempt
    from django.views.decorators.cache import cache_page
    from django.views.decorators.vary import vary_on_headers

    urlpatterns = patterns('',
        url(r'^$',
            cache_page(60*5)(vary_on_headers('Accept', 'WWW-Authorization')(Bookmark.as_view())),
            name='bookmark'),
        url(r'^topology/$',
            cache_page(60*5)(vary_on_headers('Accept', 'WWW-Authorization')(TopologyBookmark.as_view())),
            name='topo-book'),
        url(r'^topology/switches/$',
            cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(SwitchListView.as_view())),
            name='switch-list'),
        url(r'^topology/links/$',
            cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(LinkListView.as_view())),
            name='ful-link-list'),
        url(r'^topology/links/(?P<pk>\d+)/$',
            cache_page(60*1)(vary_on_headers('Accept', 'WWW-Authorization')(LinkListView.as_view())),
            name='link-list'),
        url(r'^flows/$',
            cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(FullFlowView.as_view())),
            name='full-flow-list'),
        url(r'^flows/(?P<dpid>\d+)/$',
            cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(SwFlowView.as_view())),
            name='sw-flow-list'),
        url(r'^flows/(?P<dpid>\d+)/(?P<flow>\d+)/$',
            cache_page(30)(vary_on_headers('Accept', 'WWW-Authorization')(SingleFlowView.as_view())),
            name='flow-view'),
        url(r'^statistics/(?P<dpid>\d+)/(?P<port>\d+)/',
            cache_page(1)(vary_on_headers('Accept', 'WWW-Authorization')(StatsView.as_view())),
            name='statistics'),
    )

Views

    from django.shortcuts import render
    from django.http import HttpResponse
    from django.views.generic.base import View
    import requests, sys, json, yaml
    from django.utils.decorators import method_decorator
    from django.views.decorators.csrf import csrf_exempt
    from django.contrib.auth.decorators import login_required
    from django.core.urlresolvers import reverse
    from django.views.decorators.vary import vary_on_headers

    # Base address of the Ryu server
    basedir = 'http://10.0.1.2:8080'

    bookmark_obj = {
        'topologylink': 'topology/',
        'fullflowlink': 'flows/',
    }

    topology_bookmark_obj = {
        'Swithceslink': 'topology/switches/',
        'Linklink': 'topology/links/',
    }

    class Bookmark(View):
        types = {
            'application': [
                'json',
                'yaml'
            ]
        }
        default = {
            '*/*': 'application/json',
            'application': 'application/json',
        }

        def get(self, request, *args, **kwargs):
            return HttpResponse(json.dumps(bookmark_obj))

    class TopologyBookmark(View):
        types = {
            'application': [
                'json',
                'yaml'
            ]
        }
        default = {
            '*/*': 'application/json',
            'application': 'application/json',
        }

        def get(self, request, *args, **kwargs):
            if request.META.get('HTTP_ACCEPT') == 'application/vnd+SDN.bookmark+json':
                return HttpResponse(json.dumps(topology_bookmark_obj))
            return HttpResponse(yaml.dump(topology_bookmark_obj))

    class SwitchListView(View):
        types = {
            'application': [
                'json',
                'yaml',
            ]
        }
        default = {
            '*/*': 'application/json',
            'application': 'application/json',
        }

        def get(self, request, *args, **kwargs):
            uri = basedir + request.path
            try:
                r = requests.get(uri)
                if r.status_code == 200:
                    sw_list = yaml.safe_load(r.text)
                    for sw in sw_list:
                        dpid = int(sw['dpid'])
                        sw['linkslink'] = str(reverse('link-list', kwargs={'pk': dpid}))
                        sw['flowlistlink'] = str(reverse('sw-flow-list', kwargs={'dpid': dpid}))
                        for port in sw['ports']:
                            port['statslink'] = str(reverse(
                                'statistics',
                                kwargs={'dpid': dpid, 'port': int(port['port_no'])}))
                    return HttpResponse(json.dumps(sw_list))
                else:
                    return HttpResponse(status=r.status_code)
            except Exception:
                return HttpResponse(status=500)

    class LinkListView(View):
        types = {
            'application': [
                'json',
                'yaml',
            ]
        }
        default = {
        '*/*': 'application/json',
        'application': 'application/json',
    }

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200 and r.text != None:
                link_list = yaml.load(r.text)
                return HttpResponse(json.dumps(link_list))
            else:
                return HttpResponse(status=r.status_code)
        except:
            return HttpResponse(status=500)


class FullFlowView(View):
    types = {
        'application': [
            'json',
            'yaml',
        ]
    }
    default = {
        '*/*': 'application/json',
        'application': 'application/json',
    }

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        r = requests.get(uri)
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow_list = yaml.load(r.text)
                new_flow_list = []
                for flow in flow_list:
                    new_flow = {'flow_id': flow,
                                'detailflowlink': reverse('flow-view',
                                                          kwargs={'dpid': flow['dpid'],
                                                                  'flow': flow['id']})}
                    new_flow_list.append(new_flow)
                return HttpResponse(json.dumps(new_flow_list))
            else:
                return HttpResponse(status=r.status_code)
        except:
            return HttpResponse(status=500)


class SwFlowView(View):
    http_method_names = ['get', 'post', 'options']

    types = {
        'application': [
            'json',
            'yaml',
        ]
    }
    default = {
        '*/*': 'application/json',
        'application': 'application/json',
    }

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow_list = yaml.load(r.text)
                for flow in flow_list:
                    if flow != 'age':
                        flow_list[flow]['detailflowlink'] = reverse(
                            'flow-view',
                            kwargs={'dpid': flow_list[flow]['dpid'], 'flow': flow})
                return HttpResponse(json.dumps(flow_list))
            else:
                return HttpResponse(status=r.status_code)
        except:
            return HttpResponse(status=500)

    valid_keys = ['in_port', 'in_phy_port', 'metadata', 'eth_dst', 'eth_src', 'eth_type',
                  'vlan_vid', 'ip_dscp', 'ip_ecn', 'ip_proto', 'ipv4_src', 'ipv4_dst',
                  'tcp_src', 'tcp_dst', 'udp_src', 'udp_dst', 'sctp_src', 'sctp_dst',
                  'icmpv4_type', 'icmpv4_code', 'arp_op', 'arp_spa', 'arp_tpa', 'arp_sha',
                  'arp_tha', 'ipv6_src', 'ipv6_dst', 'ipv6_flabel', 'icmpv6_type',
                  'icmpv6_code', 'ipv6_nd_target', 'ipv6_nd_sll', 'ipv6_nd_tll',
                  'mpls_label', 'mpls_tc', 'mpls_bos', 'pbb_isid', 'tunnel_id',
                  'ipv6_exthdr']

    def isflow(self, flow):
        keys = flow.keys()
        if len(keys) == 4:
            if 'd_id' in keys and 'priority' in keys and 'conditions' in keys and 'out_port' in keys:
                for key in flow['conditions']:
                    if key not in self.valid_keys:
                        return False
                return True
        return False

    def post(self, request, *args, **kwargs):
        if request.META['CONTENT_TYPE'] == 'application/json':
            print "not here"
            try:
                print "lol1"
                data = request.body.decode('utf-8')
                flow_list = yaml.load(data)
            except:
                print "lol2"
                return HttpResponse(status=500)
            print(flow_list)
            for flow in flow_list:
                print(flow)
                if not self.isflow(flow):
                    print "data"
                    return HttpResponse(status=400)
            try:
                r = requests.post(basedir + request.path,
                                  data=request.body.decode('utf-8'))
                return HttpResponse(status=r.status_code)
            except:
                print "server"
                return HttpResponse(status=500)
        print "yeshere"
        return HttpResponse(status=400)


class SingleFlowView(View):
    types = {
        'application': [
            'json',
            'yaml',
        ]
    }
    default = {
        '*/*': 'application/json',
        'application': 'application/json'
    }

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow = yaml.load(r.text)
                return HttpResponse(json.dumps(flow))
            else:
                return HttpResponse(status=r.status_code)
        except:
            return HttpResponse(status=500)

    def delete(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.delete(uri)
            return HttpResponse(status=r.status_code)
        except:
            return HttpResponse(status=500)


class StatsView(View):
    types = {
        'application': [
            'json',
            'yaml',
        ]
    }
    default = {
        '*/*': 'application/json',
        'application': 'application/json',
    }

    def get(self, request, *args, **kwargs):
        uri = basedir + request.path
        try:
            r = requests.get(uri)
            if r.status_code == 200:
                flow = yaml.load(r.text)
                return HttpResponse(json.dumps(flow))
            else:
                return HttpResponse(status=r.status_code)
        except:
            return HttpResponse(status=500)

Middleware

from serverconnector.authconfig import authconfig, credentialslist
from myutils import get_class
from django.http import HttpResponse
import json, yaml


class HttpAuthMiddleware():
    def process_view(self, request, view_func, view_args, view_kwargs):
        realm = authconfig[view_func.__name__]
        if realm == None:
            return None
        else:
            if request.META.has_key('HTTP_AUTHORIZATION'):
                [auth, credentials] = request.META['HTTP_AUTHORIZATION'].split(' ', 1)
                if auth.lower() == 'basic':
                    auth = credentials.strip().decode('base64')
                    username, password = auth.split(':', 1)
                    if username in credentialslist[realm]:
                        if credentialslist[realm][username] == password:
                            request.META['HTTP_AUTHORIZATION'] = True
                            return None
            response = HttpResponse(status=401)
            response['WWW-Authenticate'] = "Basic realm=\"%s\"" % (realm)
            return response


class ConentNegotiationMiddleware():
    def process_view(self, request, view_func, view_args, view_kwargs):
        if request.META.has_key('HTTP_ACCEPT'):
            aux = request.META['HTTP_ACCEPT'].replace(' ', '')
            allowed_types = aux.split(',')
            best_fit_type = {'general': '*', 'specific': '*', 'q': 0, 'default': True}
            func_class = get_class(view_func.__module__, view_func.__name__)
            availiable_types = func_class.types
            default = func_class.default
            try:
                for elem in allowed_types:
                    chunks = elem.split(';', 1)
                    if len(chunks) == 1:
                        q = 1
                    elif len(chunks) == 2:
                        q = chunks[1].replace("q=", "")
                    else:
                        request.META['MYTEST'] = "hello"
                        return HttpResponse(json.dumps(availiable_types), status=406)
                    if float(q) > 0.0 and float(q) <= 1.0:
                        [general, specific] = chunks[0].split('/', 1)
                        if general in availiable_types:
                            if specific in availiable_types[general]:
                                if float(q) > float(best_fit_type['q']):
                                    best_fit_type = {'general': general, 'specific': specific,
                                                     'q': q, 'default': False}
                                elif float(q) == float(best_fit_type['q']) and best_fit_type['specific'] == '*':
                                    best_fit_type = {'general': general, 'specific': specific,
                                                     'q': q, 'default': False}
                            elif specific == '*':
                                if float(q) > float(best_fit_type['q']):
                                    best_fit_type = {'general': general, 'specific': specific,
                                                     'q': q, 'default': False}
                        elif general == '*' and specific == '*':
                            best_fit_type = {'general': general, 'specific': specific,
                                             'q': q, 'default': False}
                if best_fit_type['default'] == True:
                    return HttpResponse(json.dumps(availiable_types), status=406)
                else:
                    if best_fit_type['general'] == '*':
                        request.META['HTTP_ACCEPT'] = "%s" % default['*/*']
                    elif best_fit_type['specific'] == '*':
                        request.META['HTTP_ACCEPT'] = "%s" % default[best_fit_type['general']]
                    else:
                        request.META['HTTP_ACCEPT'] = "%s/%s" % (best_fit_type['general'],
                                                                 best_fit_type['specific'])
                    return None
            except:
                return HttpResponse(json.dumps(availiable_types), status=406)
        else:
            request.META['HTTP_ACCEPT'] = "%s" % default['*/*']
            return None

    def process_response(self, request, response):
        last = request.META['HTTP_ACCEPT']
        last = last[-4:]
        if response.status_code == 200 and response.content != None and last == 'yaml':
            aux = yaml.load(response.content.decode('utf-8'))
            if type(aux) == list:
                response.content = yaml.dump_all(aux)
            else:
                response.content = yaml.dump(aux)
        response['Content-Type'] = request.META['HTTP_ACCEPT']
        return response

Authentication config

authconfig = {
    'Bookmark': None,
    'TopologyBookmark': None,
    'SwitchListView': 'Topo',
    'LinkListView': 'Topo',
    'FullFlowView': 'Flow',
    'SwFlowView': 'Flow',
    'SingleFlowView': 'Flow',
    'StatsView': 'Stats',
}

credentialslist = {
    'Flow': {'root': 'root', 'flowuser': 'flowpwd'},
    'Topo': {'root': 'root', 'topouser': 'topopwd'},
    'Stats': {'root': 'root', 'statsuser': 'statspwd'},
}

Utils

from django.utils import importlib


def get_class(module_name, cls_name):
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        raise ImportError('Invalid class path: {}'.format(module_name))
    try:
        cls = getattr(module, cls_name)
    except AttributeError:
        raise ImportError('Invalid class name: {}'.format(cls_name))
    else:
        return cls

Ryu

import json, sys, MySQLdb, time
from wsgiref.handlers import format_date_time
from operator import attrgetter
from webob import Response
from ryu.app.wsgi import ControllerBase, WSGIApplication, route
from ryu.base import app_manager
from ryu.lib import dpid as dpid_lib
from ryu.topology.api import get_switch, get_link, get_all_switch, get_all_link
from ryu.controller.handler import set_ev_cls
from ryu.controller import ofp_event
from ryu.controller.handler import DEAD_DISPATCHER, MAIN_DISPATCHER
from ryu.ofproto import ofproto_v1_3
from ryu.lib import hub

myapp_instance_name = 'MyApp'


class MyApp(app_manager.RyuApp):
    _CONTEXTS = {
        'wsgi': WSGIApplication
    }
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(MyApp, self).__init__(*args, **kwargs)
        wsgi = kwargs['wsgi']
        wsgi.register(MyAppRestController, {myapp_instance_name: self})
        self.port_thread = hub.spawn(self._port_monitor)
        self.flow_thread = hub.spawn(self._flow_monitor)
        self.flow_list = {}
        self.switches = {}
        self.monitored_switches = {}
        self.db = SQLDB()
        self.previous_read = {}
        self.flow_refresh_rate = 3
        self.port_refresh_rate = 3
        self.flow_time = {}
        self.count = 1

    @set_ev_cls(ofp_event.EventOFPStateChange, MAIN_DISPATCHER)
    def add_switch(self, ev):
        try:
            datapath = ev.datapath
            d_id = datapath.id
        except:
            print("Error Occurred")
        self.switches[d_id] = datapath
        self.logger.info("Switch %s UP", d_id)

    @set_ev_cls(ofp_event.EventOFPStateChange, DEAD_DISPATCHER)
    def delete_switch(self, ev):
        datapath = ev.datapath
        d_id = datapath.id
        if d_id in self.switches:
            self.switches.pop(d_id)
            self.logger.info("Switch %s DOWN", d_id)

    def add_flow(self, d_id, priority, conditions, out_port, buffer_id=None):
        datapath = self.switches[int(d_id)]
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        match = parser.OFPMatch(**conditions)
        if str(out_port) == "BROADCAST":
            inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                                 [parser.OFPActionOutput(ofproto.OFPP_FLOOD)])]
        else:
            inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                                 [parser.OFPActionOutput(int(out_port))])]
        self.logger.info("datapath: %s,conditions = %s, output = %s" % (d_id, conditions, out_port))
        mod = parser.OFPFlowMod(datapath=datapath, priority=priority, match=match,
                                instructions=inst, cookie=self.count)
        self.count += 1
        datapath.send_msg(mod)
        self.logger.info("New Flow Added")

    def remove_table_flows(self, dpid, flow_id):
        """Create OFP flow mod message to remove flows from table."""
        datapath = self.switches[int(dpid)]
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        flow = self.flow_list[int(flow_id)]
        match = datapath.ofproto_parser.OFPMatch(**flow['match'])
        out_port = flow['actions'][0]['OFPActionOutput']
        table_id = flow['table_id']
        if str(out_port) == "BROADCAST":
            inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                                 [parser.OFPActionOutput(ofproto.OFPP_FLOOD)])]
        else:
            inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                                 [parser.OFPActionOutput(int(out_port))])]
        flow_mod = datapath.ofproto_parser.OFPFlowMod(datapath, 0, 0, table_id,
                                                      ofproto.OFPFC_DELETE, 0, 0, 1,
                                                      ofproto.OFPCML_NO_BUFFER,
                                                      ofproto.OFPP_ANY, ofproto.OFPG_ANY,
                                                      0, match, inst)
        datapath.send_msg(flow_mod)

    def _port_monitor(self):
        while True:
            for dp in self.switches.values():
                self._request_port_stats(dp)
            hub.sleep(self.port_refresh_rate)

    def _flow_monitor(self):
        while True:
            for dp in self.switches.values():
                self._request_flow_stats(dp)
            hub.sleep(self.flow_refresh_rate)

    def _request_port_stats(self, datapath):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        req = parser.OFPPortStatsRequest(datapath, 0, ofproto.OFPP_ANY)
        datapath.send_msg(req)

    def _request_flow_stats(self, datapath):
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        req = parser.OFPFlowStatsRequest(datapath)
        datapath.send_msg(req)

    def serialize_flow_list(self, dpid=None, flow=None):
        if dpid != None:
            if flow != None:
                aux = self.flow_list[flow]
            else:
                aux = {key: value for key, value in self.flow_list.items()
                       if value['dpid'] == int(dpid)}
                aux['age'] = time.time() - self.flow_time[int(dpid)]
        else:
            aux = []
            for flow in self.flow_list:
                aux.append({'id': flow, 'dpid': self.flow_list[flow]['dpid']})
        return json.dumps(aux, ensure_ascii=False, encoding='utf8')

    @set_ev_cls(ofp_event.EventOFPFlowStatsReply, MAIN_DISPATCHER)
    def _flow_stats_reply_handler(self, ev):
        body = ev.msg
        datapath = ev.msg.datapath.id
        data_json = body.to_jsondict()
        self.flow_list = {key: value for key, value in self.flow_list.items()
                          if value['dpid'] != datapath}
        self.flow_time[int(datapath)] = time.time()
        for flow in data_json['OFPFlowStatsReply']['body']:
            cookie = flow['OFPFlowStats']['cookie']
            if cookie != 0 and cookie not in self.flow_list:
                flow['OFPFlowStats']['match']['OFPMatch']['oxm_fields'][0]['OXMTlv'].pop('mask')
                match_list = flow['OFPFlowStats']['match']['OFPMatch']['oxm_fields']
                condition_list = {}
                for condition in match_list:
                    condition_list[condition['OXMTlv']['field']] = condition['OXMTlv']['value']
                actionlist = flow['OFPFlowStats']['instructions'][0]['OFPInstructionActions']['actions']
                for action in actionlist:
                    for actionname in action:
                        if actionname == 'OFPActionOutput':
                            action[actionname] = action[actionname]['port']
                        else:
                            action[actionname].pop('max_len')
                            action[actionname].pop('len')
                            action[actionname].pop('type')
                table_id = flow['OFPFlowStats']['table_id']
                flow = {'match': condition_list, 'actions': actionlist,
                        'table_id': table_id, 'dpid': datapath}
                self.flow_list[cookie] = flow

    @set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
    def _port_stats_reply_handler(self, ev):
        body = ev.msg.body
        for stat in body:
            dp_id = ev.msg.datapath.id.__str__()
            if dp_id in self.previous_read:
                if str(stat.port_no) in self.previous_read[str(dp_id)]:
                    old = self.previous_read[str(dp_id)][str(stat.port_no)]
                    self.db.record_stats(ev.msg.datapath.id, stat.port_no,
                                         stat.rx_packets - old['rx_packets'],
                                         stat.rx_bytes - old['rx_bytes'],
                                         stat.rx_errors - old['rx_errors'],
                                         stat.tx_packets - old['tx_packets'],
                                         stat.tx_bytes - old['tx_bytes'],
                                         stat.tx_errors - old['tx_errors'])
                    old = None
                self.previous_read[str(dp_id)][str(stat.port_no)] = {}
            else:
                self.previous_read[str(dp_id)] = {}
                self.previous_read[str(dp_id)][str(stat.port_no)] = {}
            self.previous_read[str(dp_id)][str(stat.port_no)]['rx_packets'] = stat.rx_packets
            self.previous_read[str(dp_id)][str(stat.port_no)]['rx_bytes'] = stat.rx_bytes
            self.previous_read[str(dp_id)][str(stat.port_no)]['rx_errors'] = stat.rx_errors
            self.previous_read[str(dp_id)][str(stat.port_no)]['tx_packets'] = stat.tx_packets
            self.previous_read[str(dp_id)][str(stat.port_no)]['tx_bytes'] = stat.tx_bytes
            self.previous_read[str(dp_id)][str(stat.port_no)]['tx_errors'] = stat.tx_errors


class MyAppRestController(ControllerBase):

    def __init__(self, req, link, data, **config):
        super(MyAppRestController, self).__init__(req, link, data, **config)
        self.myapp = data[myapp_instance_name]

    @route('sw_list', '/topology/switches/', methods=['GET'])
    def get_sw_list(self, req, **kwargs):
        switch_list = get_all_switch(self.myapp)
        body = json.dumps([switch.to_dict() for switch in switch_list])
        return Response(body=body)

    @route('all_links', '/topology/links/', methods=['GET'])
    def get_all_links(self, req, **kwargs):
        links = get_all_link(self.myapp)
        response = json.dumps([link.to_dict() for link in links])
        return Response(body=response)

    @route('sw_link_list', '/topology/links/{dpid}/', methods=['GET'])
    def get_link_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        links = get_link(self.myapp, int(kwargs['dpid']))
        body = json.dumps([link.to_dict() for link in links])
        return Response(body=body)

    @route('flow_table', '/flows/', methods=['GET'])
    def get_flow_list(self, req, **kwargs):
        body = self.myapp.serialize_flow_list()
        return Response(body=body)

    @route('sw_flow_table', '/flows/{dpid}/', methods=['GET'],
           requirements={'dpid': '\d+'})
    def get_sw_flow_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        body = self.myapp.serialize_flow_list(kwargs['dpid'])
        return Response(body=body)

    @route('flow', '/flows/{dpid}/{flow}/', methods=['GET'],
           requirements={'dpid': '\d+', 'flow': '\d+'})
    def get_single_flow(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches or int(kwargs['flow']) not in self.myapp.flow_list:
            return Response(body=None, status_code=404)
        body = self.myapp.serialize_flow_list(dpid=kwargs['dpid'], flow=int(kwargs['flow']))
        return Response(body=body)

    @route('add_flow', '/flows/{dpid}/', methods=['POST'],
           requirements={'dpid': '\d+'})
    def put_flow_into_list(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        try:
            data = eval(req.body)
        except:
            return Response(status_code=400)
        for flow in data:
            result = self.myapp.add_flow(kwargs['dpid'], int(flow['priority']),
                                         flow['conditions'], flow['out_port'])
            if result == False:
                return Response(body=None, status_code=404)
        return Response(status_code=200)

    @route('delete_flow', '/flows/{dpid}/{flow}/', methods=['DELETE'],
           requirements={'dpid': '\d+', 'flow': '\d+'})
    def delete_flow(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches or int(kwargs['flow']) not in self.myapp.flow_list:
            return Response(body=None, status_code=404)
        self.myapp.remove_table_flows(kwargs['dpid'], kwargs['flow'])
        return Response(status_code=200)

    @route('get_statistics', '/statistics/{dpid}/{port}/', methods=['GET'],
           requirements =
           {'dpid': '\d+', 'port': '\d+'})
    def get_statistics(self, req, **kwargs):
        if int(kwargs['dpid']) not in self.myapp.switches:
            return Response(body=None, status_code=404)
        stats = self.myapp.db.get_statistics(kwargs['dpid'], kwargs['port'], 10)
        if stats == "":
            return Response(body=None, status_code=404)
        return Response(body=stats)


class SQLDB():

    def __init__(self):
        self.db = MySQLdb.connect(host='127.0.0.1', user='root', passwd='root', db='SDN')
        self.c = self.db.cursor()

    def get_statistics(self, dpid, port, limit):
        action = "SELECT rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error,time FROM switch_%s_port_%s ORDER BY time DESC LIMIT %s" % (dpid, port, limit)
        try:
            self.c.execute(action)
        except:
            return False
        jsonlist = []
        for (rx_pkts, rx_bytes, rx_error, tx_pkts, tx_bytes, tx_error, date) in self.c:
            resultdict = {}
            resultdict['rx_pkts'] = rx_pkts
            resultdict['rx_bytes'] = rx_bytes
            resultdict['rx_error'] = rx_error
            resultdict['tx_pkts'] = tx_pkts
            resultdict['tx_bytes'] = tx_bytes
            resultdict['tx_error'] = tx_error
            resultdict['date'] = str(date)
            jsonlist.append(resultdict)
        results = json.dumps(jsonlist)
        return results

    def record_flow_entries(self, dpid, match, instructions, table_id):
        try:
            self.c.execute("INSERT INTO flow_stats (dpid,flow_match,instructions,table_id) VALUES ('%s','%s','%s','%s')" % (dpid, match, instructions, table_id))
        except Exception as e:
            pass

    def record_stats(self, dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors):
        try:
            self.c.execute('INSERT INTO switch_%s_port_%s (rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error) VALUES (%s,%s,%s,%s,%s,%s)' % (dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors))
            self.db.commit()
        except:
            self.c.execute("CREATE TABLE switch_%s_port_%s LIKE base_table" % (dpid, port))
            self.db.commit()
            self.c.execute('INSERT INTO switch_%s_port_%s (rx_pkts,rx_bytes,rx_error,tx_pkts,tx_bytes,tx_error) VALUES (%s,%s,%s,%s,%s,%s)' % (dpid, port, rx_pkt, rx_b, rx_e, tx_pkt, tx_b, tx_errors))
            self.db.commit()

Application demonstration of use

For this demonstration, the Mininet virtualized network contains two switches using OpenFlow 1.3. The switches are linked to each other, and each one is connected to a host (h1, h2) used to perform the tests.

$ sudo mn --topo linear,2 --mac --switch ovsk --controller remote
$ sudo ovs-vsctl set Bridge s1 protocols=OpenFlow13
$ sudo ovs-vsctl set Bridge s2 protocols=OpenFlow13

Topology

Client:
curl -X GET http://10.0.0.2/
Response:
{"fullflowlink": "flows/", "topologylink": "topology/"}

Client:
curl -X GET http://10.0.0.2/topology/
Response:
{"Linklink": "topology/links/", "Swithceslink": "topology/switches/"}

Client:
curl -X GET http://10.0.0.2/topology/switches/ -v
Response:
* About to connect() to 10.0.0.2 port 80 (#0)
*   Trying 10.0.0.2... connected
> GET /topology/switches/ HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: 10.0.0.2
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 401 UNAUTHORIZED
< Date: Fri, 10 Jul 2015 08:02:36 GMT
< Server: WSGIServer/0.1 Python/2.7.3
< Content-Type: */*
< WWW-Authenticate: Basic realm="Topo"
<
* Closing connection #0

Client:
curl -X GET http://10.0.0.2/topology/switches/ -u root:root
Response:
[{"flowlistlink": "/flows/1/", "ports": [{"hw_addr": "0a:b8:aa:8b:e2:4f", "statslink": "/statistics/1/1/", "name": "s1-eth1", "port_no": "00000001", "dpid": "0000000000000001"}, {"hw_addr": "02:b1:b2:3b:9a:03", "statslink": "/statistics/1/2/", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}], "linkslink": "/topology/links/1/", "dpid": "0000000000000001"},
 {"flowlistlink": "/flows/2/", "ports": [{"hw_addr": "92:0d:30:2c:40:94", "statslink": "/statistics/2/1/", "name": "s2-eth1", "port_no": "00000001", "dpid": "0000000000000002"}, {"hw_addr": "26:83:73:b5:6a:2a", "statslink": "/statistics/2/2/", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}], "linkslink": "/topology/links/2/", "dpid": "0000000000000002"},
 {"flowlistlink": "/flows/3/", "ports": [], "linkslink": "/topology/links/3/", "dpid": "0000000000000003"}]

Client:
curl -X GET http://10.0.0.2/topology/links/ -u root:root
Response:
[{"src": {"hw_addr": "26:83:73:b5:6a:2a", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}, "dst": {"hw_addr": "02:b1:b2:3b:9a:03", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}},
 {"src": {"hw_addr": "02:b1:b2:3b:9a:03", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}, "dst": {"hw_addr": "26:83:73:b5:6a:2a", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}}]

Client:
curl -X GET http://10.0.0.2/topology/links/1/ -u root:root
Response:
[{"src": {"hw_addr": "02:b1:b2:3b:9a:03", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}, "dst": {"hw_addr": "26:83:73:b5:6a:2a", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}}]

Flows

Client:
curl -X GET http://10.0.0.2/flows/ -u root:root
Response:
[]

Client:
curl -X POST http://10.0.0.2/flows/1/ -d @add_flows -u root:root -H "Content-type: application/json" -v
add_flows:
[{"d_id":1,"priority":1,"conditions":{"in_port":1},"out_port":2},{"d_id":1,"priority":1,"conditions":{"in_port":2},"out_port":1}]
Response:
* About to connect() to 10.0.0.2 port 80 (#0)
*   Trying 10.0.0.2... connected
* Server auth using Basic with user 'root'
> POST /flows/1/ HTTP/1.1
> Authorization: Basic cm9vdDpyb290
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: 10.0.0.2
> Accept: */*
> Content-type: application/json
> Content-Length: 129
>
* upload completely sent off: 129 out of 129 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Fri, 10 Jul 2015 08:16:28 GMT
< Server: WSGIServer/0.1 Python/2.7.3
< Vary: Accept, WWW-Authorization
< Content-Type: application/json
<
* Closing connection #0

Client:
curl -X POST http://10.0.0.2/flows/2/ -d @add_flows -u root:root -H "Content-type: application/json" -v
Response:
* About to connect() to 10.0.0.2 port 80 (#0)
*   Trying 10.0.0.2... connected
* Server auth using Basic with user 'root'
> POST /flows/2/ HTTP/1.1
> Authorization: Basic cm9vdDpyb290
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: 10.0.0.2
> Accept: */*
> Content-type: application/json
> Content-Length: 129
>
* upload completely sent off: 129 out of 129 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Fri, 10 Jul 2015 08:18:25 GMT
< Server: WSGIServer/0.1 Python/2.7.3
< Vary: Accept, WWW-Authorization
< Content-Type: application/json
<
* Closing connection #0

Client:
curl -X GET http://10.0.0.2/flows/1/ -u root:root
Response:
{"1": {"detailflowlink": "/flows/1/1/", "table_id": 0, "actions": [{"OFPActionOutput": 2}], "match": {"in_port": 1}, "dpid": 1}, "age": 2.7073638439178467, "2": {"detailflowlink": "/flows/1/2/", "table_id": 0, "actions": [{"OFPActionOutput": 1}], "match": {"in_port": 2}, "dpid": 1}}

Client:
curl -X GET http://10.0.0.2/flows/2/ -u root:root
Response:
{"age": 0.3208048343658447, "3": {"detailflowlink": "/flows/2/3/", "table_id": 0, "actions": [{"OFPActionOutput": 2}], "match": {"in_port": 1}, "dpid": 2}, "4": {"detailflowlink": "/flows/2/4/", "table_id": 0, "actions": [{"OFPActionOutput": 1}], "match": {"in_port": 2}, "dpid": 2}}

On mininet:
mininet> h1 ping -c 1 h2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=0.509 ms
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.509/0.509/0.509/0.000 ms

Client:
curl -X DELETE http://10.0.0.2/flows/2/3/ -u root:root
Response: None

On mininet:
mininet> h1 ping -c 1 h2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_req=1 ttl=64 time=0.509 ms
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.509/0.509/0.509/0.000 ms

Client:
curl -X POST http://10.0.0.2/flows/2/ -d @add_single -u root:root -H "Content-type: application/json"
add_single:
[{"d_id":2,"priority":1,"conditions":{"in_port":1},"out_port":2}]
Response: None

Statistics

For this section the value of the port refresh rate has been set to 1 per second so it is

Client:
curl -X GET http://10.0.0.2/statistics/1/1/ -d @add_single -u root:root -H "Content-type: application/json" && sleep 10 && curl -X GET http://10.0.0.2/statistics/1/1/ -d @add_single -u root:root -H "Content-type: application/json"

Right after executing this command, on mininet:
mininet> iperf

Response:
[{"tx_bytes": 51, "date": "2015-07-10 10:58:02", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:01", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:00", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:59", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:58", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:57", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:56", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:55", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:54", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:57:53", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0}]

[{"tx_bytes": 51, "date": "2015-07-10 10:58:12", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:11", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:10", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0},
 {"tx_bytes": 451161, "date": "2015-07-10 10:58:09", "rx_bytes": -3995999330, "rx_pkts": 6835, "tx_error": 0, "tx_pkts": 6836, "rx_error": 0},
 {"tx_bytes": 3448023, "date": "2015-07-10 10:58:08", "rx_bytes": 2286001006, "rx_pkts": 52243, "tx_error": 0, "tx_pkts": 52243, "rx_error": 0},
 {"tx_bytes": 3937281, "date": "2015-07-10 10:58:07", "rx_bytes": -1529700220, "rx_pkts": 63174, "tx_error": 0, "tx_pkts": 59656, "rx_error": 0},
 {"tx_bytes": 3546759, "date": "2015-07-10 10:58:06", "rx_bytes": 2383469390, "rx_pkts": 54471, "tx_error": 0, "tx_pkts": 53739, "rx_error": 0},
 {"tx_bytes": 3646281, "date": "2015-07-10 10:58:05", "rx_bytes": -1819921886, "rx_pkts": 56845, "tx_error": 0, "tx_pkts": 55180, "rx_error": 0},
 {"tx_bytes": 3224035, "date": "2015-07-10 10:58:04", "rx_bytes": 2176716210, "rx_pkts": 49701, "tx_error": 0, "tx_pkts": 48849, "rx_error": 0},
 {"tx_bytes": 51, "date": "2015-07-10 10:58:03", "rx_bytes": 0, "rx_pkts": 0, "tx_error": 0, "tx_pkts": 1, "rx_error": 0}]

Graphical representation:
[Figure: "Transmitted bytes on iperf test". Vertical axis: Bytes (0 to 4500000); horizontal axis: Seconds (1 to 20).]
Graphical representation of the statistics retrieved

Cache

With one Wireshark instance capturing on br0 and another capturing on br1, both filtering HTTP:

Client:
curl -X GET 10.0.0.2/topology/switches/ -u root:root && sleep 10 && curl -X GET 10.0.0.2/topology/switches/ -u root:root && sleep 60 && curl -X GET 10.0.0.2/topology/switches/ -u root:root

The result is shown in figure 2.
[Figure: Wireshark capture of a cached response]

Content Negotiation

curl -X GET 10.0.0.2/topology/switches/ -u root:root -H "Accept: application/yaml;q=0.5, application/*;q=0.8" && curl -X GET 10.0.0.2/topology/switches/ -u root:root -H "Accept: */*;q=0.5, application/yaml;q=0.8"

Result:
[{"flowlistlink": "/flows/1/", "ports": [{"hw_addr": "0a:b8:aa:8b:e2:4f", "statslink": "/statistics/1/1/", "name": "s1-eth1", "port_no": "00000001", "dpid": "0000000000000001"}, {"hw_addr": "02:b1:b2:3b:9a:03", "statslink": "/statistics/1/2/", "name": "s1-eth2", "port_no": "00000002", "dpid": "0000000000000001"}], "linkslink": "/topology/links/1/", "dpid": "0000000000000001"},
 {"flowlistlink": "/flows/2/", "ports": [{"hw_addr": "92:0d:30:2c:40:94", "statslink": "/statistics/2/1/", "name": "s2-eth1", "port_no": "00000001", "dpid": "0000000000000002"}, {"hw_addr": "26:83:73:b5:6a:2a", "statslink": "/statistics/2/2/", "name": "s2-eth2", "port_no": "00000002", "dpid": "0000000000000002"}], "linkslink": "/topology/links/2/", "dpid": "0000000000000002"}]

dpid: '0000000000000001'
flowlistlink: /flows/1/
linkslink: /topology/links/1/
ports:
- {dpid: '0000000000000001', hw_addr: '0a:b8:aa:8b:e2:4f', name: s1-eth1, port_no: '00000001',
  statslink: /statistics/1/1/}
- {dpid: '0000000000000001', hw_addr: '02:b1:b2:3b:9a:03', name: s1-eth2, port_no: '00000002',
  statslink: /statistics/1/2/}
---
dpid: '0000000000000002'
flowlistlink: /flows/2/
linkslink: /topology/links/2/
ports:
- {dpid: '0000000000000002', hw_addr: '92:0d:30:2c:40:94', name: s2-eth1, port_no: '00000001',
  statslink: /statistics/2/1/}
- {dpid: '0000000000000002', hw_addr: '26:83:73:b5:6a:2a', name: s2-eth2, port_no: '00000002',
  statslink: /statistics/2/2/}
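
The two Accept headers used in the Content Negotiation test can be traced with a small stand-alone sketch of q-value selection. This is not the middleware listed earlier: it is a simplified Python 3 illustration, and the names SUPPORTED, DEFAULTS and pick_media_type are hypothetical. The rule that ties are broken in favour of the more specific media range is also an assumption made for this sketch.

```python
# Hypothetical, simplified sketch of Accept-header negotiation.
# SUPPORTED and DEFAULTS mirror the "types" and "default" class attributes
# of the views, but the names and the tie-breaking rule are assumptions.
SUPPORTED = {"application": ["json", "yaml"]}
DEFAULTS = {"*/*": "application/json", "application": "application/json"}


def pick_media_type(accept_header):
    """Return the media type to serve, or None when nothing is acceptable."""
    best = None  # ((q, specificity), media_type)
    for item in accept_header.replace(" ", "").split(","):
        if not item:
            continue
        media_range, _, params = item.partition(";")
        q = 1.0  # a media range without an explicit q has quality 1
        for param in params.split(";"):
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    return None  # malformed q value: treat as not acceptable
        if not 0.0 < q <= 1.0:
            continue  # q=0 or out-of-range values are skipped
        general, _, specific = media_range.partition("/")
        if general == "*" and specific == "*":
            candidate, specificity = DEFAULTS["*/*"], 0
        elif general in SUPPORTED and specific == "*":
            candidate, specificity = DEFAULTS[general], 1
        elif general in SUPPORTED and specific in SUPPORTED[general]:
            candidate, specificity = "%s/%s" % (general, specific), 2
        else:
            continue  # unsupported media range
        key = (q, specificity)
        if best is None or key > best[0]:
            best = (key, candidate)
    return best[1] if best else None


# The two requests from the demonstration:
print(pick_media_type("application/yaml;q=0.5, application/*;q=0.8"))
print(pick_media_type("*/*;q=0.5, application/yaml;q=0.8"))
```

Under these assumptions the first header selects application/json (the wildcard application/* has the higher q and falls back to the default for its family) and the second selects application/yaml, which agrees with the JSON-then-YAML result shown above. A return value of None corresponds to the case where the server would answer 406.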