SERVIZI MULTIMEDIALI PER L'INTERAZIONE
Web Basics
Antonio Corradi, Luca Foschini
Academic year 2016/2017
Master's Degree (Laurea Magistrale) in Advanced Design
UNIVERSITY OF BOLOGNA, ENGINEERING SCHOOL
Alma Mater Studiorum Università di Bologna

Stats and facts of Web
The World Wide Web (WWW) was created in 1989 by Tim Berners-Lee while he was working at CERN in Geneva
The basic idea behind the design of the Web was to provide tools suitable for sharing:
• static documents (not changing, prepared in advance, persistent, and well known)
• in hypertext format
• available on the Internet
The Web was intended to replace and subsume the older ways of sharing documents, based on traditional Internet protocols such as FTP and Gopher (based on text only)

Hypertext
A hypertext is a set of documents related to each other via user-defined links (hyperlinks, or simply links)
• It can be described as a net (better, as a graph) where documents are nodes
• By following a link we can pass from one document (or a part of it) to any other document in the global graph of all documents
• The main property of a hypertext is its new reading strategy: users do not follow a linear sequence (as in a book), and any document can be the next one in a specific user navigation
• The hypertext makes possible an unlimited number of different paths of reading, consultation, and navigation
The Web was intended to connect not only text, but also multimedia content (with no limitation: images, sounds, video streaming, …)
We call those heterogeneous information graphs Hypermedia

WWW as a global hypertext resource
The main idea of Berners-Lee was to put together the hypertext concept and the Internet
• The World Wide Web (WWW) is a hypertext distributed over the global network
• Documents, namely Web pages, can reside on servers scattered around the world (World Wide) to build a unique, globally interconnected virtual net (the Web)
• pages generally consist of several resources: text, static images, dynamic images, streams, …
• resources within a page can also reside at several different locations on different servers
• Any document allows “jumping” to another one, independently of its location
• Those jumps enable Web navigation (or surfing)

Web basic items
The implementation of a global hypertext needs three basic concepts:
1. A mechanism to locate any document of interest
2. A protocol to access the resources that compose the document and to transfer their content to the requestor
3. A language to describe the hypertext documents (needed to compose the Web pages)
and two types of “physical” support:
• Servers capable of providing all the resources that compose the documents and of making their contents available
• Clients capable of requesting and displaying the documents of interest and of supporting the navigation from one document to others

Web Model: Scheme - Architecture
[Figure labels: Client machine (Browser), Server machine (Web Server), URL, HTML Document, Resources (HTML documents)]

Web Model: basic items
• The Web is based on a Client/Server model
• Clients, called Web Browsers
  • use the HTTP protocol to reach servers
  • use URLs to identify resources
  • ask servers for Web pages and render the page information
• Servers, called Web Servers
  • are always waiting for connections and requests from new clients
  • use the HTTP protocol to interact with clients
  • reply to clients with the Web page contents that have been specifically requested

A well-known distributed case
Our first case is the access to a Web page
• The system is distributed, or simply located, over the global network
• A user surfs Web pages that are stored at several different servers (transparently with respect to their location)
User vision / technical vision (Architecture?)
[Figure: user and client/server interaction over the Internet. Labels: client node (form, user input, output, HTTP client, local system, TCP/IP; local vision); network; server node (HTTP server, CGI, processing, support applications, external applications, server system, TCP/IP; remote nodes); request and reply messages]

Web formula
Put as an advertising slogan, the initial Web vision can be summed up in the “formula” below:
WWW = URL + HTTP + HTML
• Uniform Resource Locator: to address resources available on servers
• HyperText Transfer Protocol: to allow the transfer of resource content
• HyperText Markup Language: to allow the representation of hypertext documents

First Historical Home page
• The home page is typically the first access page in a specified system, e.g., the default page in either a Web server or a browser
• It may contain links to other referred pages
[Figure: the first Web home page]

BASICS: UNIFORM RESOURCE LOCATOR
WWW = URL + HTTP + HTML
The first term of our “Web formula” introduces some core problems:
• How to identify the server capable of giving us back a document that is part of the hypertext (either one page or a resource within one page)?
• How to identify the resource (part of the hypertext) we are interested in?
• Which mechanisms to use in accessing the resource?
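Each of these three questions corresponds to a separate part of a Web address. As a minimal sketch, assuming Python and its standard urllib.parse module (neither is part of these slides), an address can be split into the access mechanism, the server, and the resource on that server. The helper name describe and the dictionary keys are hypothetical, chosen only for illustration; the address is one that appears as an example in this lecture.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split a URL into the parts that answer the three questions (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "mechanism": parts.scheme,  # which mechanism to use to access the resource
        "server": parts.hostname,   # which server can give the document back
        "port": parts.port,         # None when the URL gives no explicit port
        "resource": parts.path,     # which resource on that server
    }


info = describe("http://lia.disi.unibo.it:8080/Courses/index.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

The point of the sketch is only that a single string can carry all three answers at once; the exact syntax of that string is protocol-dependent.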
The unifying answer to the questions above is a naming scheme for any Web resource, called URL

URL: UNIFORM RESOURCE LOCATOR
Uniform Resource Locator (URL): it identifies a resource via its primary access mechanism (connected to its Internet location)
The term resource identifies any entity that has a precise identity (so it can be made available separately or as a part of a document): a (text) document, an image, a service, a video, a collection of other resources, …
The URL can also carry information about the contents of the resource, together with information on how to get and access it
http://middleware.unibo.it/
http://www.lia.disi.unibo.it/Courses/WebTech/
While there are many other naming conventions (and available name systems), the URL system is the most used and also the most significant

UNIFORM RESOURCE LOCATOR
URLs can also give information on how to access the identified resource
A URL also specifies the protocol requested for the transfer of the resource itself
• Typically the URL name begins with the protocol to be used for the transfer
• The remaining part is protocol-dependent
• The most common form (HTTP-like scheme) has the syntax below (optional items within [ ]; on the original slide, compulsory items are in red and optional ones in violet)
<protocol>://[<username>:<password>@]<host>[:<port>][/<path>][?<query>][#<fragment>]
• The above URL form covers many other protocols in very common use: HTTP, HTTPS, FTP, WAP, …
• E-mail not allowed (asynchronous: it may take too long)

URL components in HTTP-like formats
• <protocol>: it defines the protocol to be used to access the server (HTTP, HTTPS, FTP, MMS, …)
• <host>: server address for the resource; either a domain name or an IP address
• <path>: the path (pathname) in the file system or directory of the server that identifies and names the resource. If missing, a default page is intended (i.e., the home page)
• <port>: the transport-layer port to be used (TCP protocol for HTTP).
The default port for HTTP, used when none is specified, is 80
• <username>:<password>: optional credentials for user authentication
• <query>: character string that embeds additional information (command parameters) for the server. Usually in the format <parameter>=<value>: parameter1=value&parameter2=value2…

Example of a URL in HTTP format
http://lia.disi.unibo.it:8080/Courses/index.html
• http: communication protocol with the server (the http protocol is the default one for the Web)
• lia.disi.unibo.it: domain name of the server that stores the Web page
• 8080: server port
• /Courses/index.html: file path in the file system of the server

HyperText Transfer Protocol - HTTP
WWW = URL + HTTP + HTML
HTTP is the acronym of HyperText Transfer Protocol, the protocol used for the interaction between Web servers and clients, i.e., to transfer the requests from clients to servers and the replies from servers to clients
HTTP is an Internet protocol that specifies the message format and the details of the whole Web interaction (operation requests and message replies)
• HTTP is a client/server protocol where clients are allowed to ask for different operations and servers are expected to wait for operation requests and to answer them (clients have the initiative and servers answer with a result)

HyperText Transfer Protocol - HTTP
HTTP -> a client/server protocol
HTTP is an Internet protocol that drives any communication in Web scenarios
HTTP is client/server in the sense that the client asks for an operation and waits until a reply is given back to it, and the server gives the answer
In other words, it is a Request/Reply protocol: any operation is organized as one request and one corresponding reply, a two-step protocol for the client
The server, of course, should honor any request by taking it into account and executing a corresponding operation to produce the awaited reply

HTTP Messages: example of a request
The protocol employs messages with an ASCII format (readable text)
that are very verbose, long, and complex
An HTTP request example
GET /somedir/page.html HTTP/1.1
Host: www.unibo.it
Connection: close
User-Agent: Mozilla/4.0
Accept: text/html,image/gif,image/jpeg
Accept-Language: it

• Request line: the command (GET, POST, …), the requested resource, and the protocol version
• Header lines: the lines that follow the request line
• Body: empty in this example
• Connection: close asks to close the connection at request completion (non-persistent connection)

A more complex example of a request
GET /search?q=Introduction+to+XML+and+Web+Technologies HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: da,en-us;q=0.8,en;q=0.5,sw;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.google.com/

• Connection: keep-alive asks not to close the connection at request completion (persistent connection)
• Keep-Alive: 300 gives the interval for which the connection is kept open (persistent connection time)

Protocol commands: REQUEST / REPLY
Several messages can be exchanged between client and server
It is a client-initiative protocol, so it consists of a client request and a server reply
• REQUEST commands
  • GET and POST to send requests for pages
  • PUT and DELETE to change or delete pages
  • HEAD, OPTIONS, TRACE for management
  • with several different attributes to better specify ‘how to’ perform operations
• REPLY and attributes
  • Unique format for the result in the reply
  • with several different items in the reply: one is the HTML of the requested page

Request Commands: GET
• GET
The most frequently used command (operation): it is the one activated by clicking a hypertext link in an HTML document, or by typing a URL in the address bar at the top of a browser window
It allows both:
• requesting a resource from a server
• passing parameters to better specify the operation (the <query> part of the URL)
It has:
• a maximum length per URL; that
limitation allows only for a limited number of parameters

Request Commands: POST
• POST
It is the full message to request a resource
Differently from GET, the details about the operation and its computation are not within the URL; they are contained in a different message part: the message body
• There is no maximum length limit on the parameters in a request
POST is typically used to ask the server for more complex operations (see the following), such as submitting the information for personalized contents (for instance, an HTML form for a CGI application)
This transmission of information does not necessarily require the presence/creation of a resource on the server

Request Commands: PUT and DELETE
• PUT
Requests that the server store a resource at the specified URL
• The PUT method transmits information from the client to the server
• Differently from POST, it implies the creation of a resource (or its replacement if it already exists)
• The PUT argument is the resource that one intends to obtain with a subsequent GET on the same name
• DELETE
Requests that the server delete the resource at the specified URL
Both commands are typically disabled on public servers

Request Commands: HEAD, OPTIONS & TRACE
• HEAD: similar to the GET method, but the server answers only with the response headers, without sending the body
  • Used for URL verification
  • Validity: the resource exists and is not empty
  • Accessibility: no authentication requested
• OPTIONS: to request information about the options available in the communication
• TRACE: to invoke a remote loop-back of the request message at the application level
  • The client is allowed to see what the server has received: typically used by diagnostic processes and in the testing phases of Web architectures

Reply Format
HTTP/1.1 200 OK
Connection: close
Date: Thu, 06 Aug 1998 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 …...
Content-Length: 6821
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>...</html>

• Status line: protocol, status code, status phrase
• Header part
• Message body: in this case, the requested HTML page
HTTP 1.0: the server closes the connection at request completion
HTTP 1.1: the server keeps the connection open, and closes it only in the case of the clause “Connection: close”

Status Codes
The status code is a three-digit code that gives information about the processing of the request: the first digit is the class of the answer and the other two are more answer-specific
There are 5 classes:
• 1xx: Informational. A temporary answer to the request, while execution goes on (not defined in HTTP 1.0; introduced with HTTP 1.1)
• 2xx: Successful. The server received, correctly understood, and accepted the request
• 3xx: Redirection. The server received and correctly understood the request, but other client actions are needed before the requested operation can be completed
• 4xx: Client error. The requested operation cannot be honored because of problems in the request (syntax error or unauthorized request)
• 5xx: Server error. The requested operation may be a correct one, but the server cannot satisfy it, possibly because of an internal problem (in the server or in other applications, such as CGI)

Examples of Status Codes
• 100 Continue (sent if the client has not sent the body yet)
• 200 OK (successful GET)
• 201 Created (successful PUT)
• 301 Moved permanently (the URL is no longer valid; the new location is given in the Location header)
• 400 Bad request (syntax error in request)
• 401 Unauthorized (missing or invalid authentication credentials)
• 403 Forbidden (the server understood the request but refuses to authorize it)
• 404 Not found (no resource at the requested URL)
• 500 Internal server error (typically an invalid answer, e.g.
from a CGI)
• 501 Not implemented (method unknown to the server)

HyperText Markup Language - HTML
WWW = URL + HTTP + HTML
HTML is the acronym of HyperText Markup Language, the language used to describe the Web pages that are the nodes of the hypertext (made of HTML texts and hyperlinks)
HTML is a special (text-based) language to code and qualify text based on markers (markup language)
A language to code text is a formal tool to specify how to represent a document, stored as text on a digital support, in such a way that it can be dealt with by machines and stored and processed as text

Markup Languages
A markup language consists of:
• A set of instructions called tags or markups (markers) to represent the properties of the text document
• a syntax to give rules about the markup process
• a semantics to define the application domain and give suggestions about the markup process
The markers are inserted directly into the text they refer to
• A tag is expressed as a sequence of characters, delimited by special characters that mark it and distinguish it from normal text: <tag>
• Tags <a href="remoteURL"> are <p> mixed in <hr> the text </p>

Markup Languages: Descriptive or Declarative
The most used markup languages are descriptive and text-based, in the sense that they give information about (they declare) the text content by adding recognized text
In particular, they can describe the editorial structure, typically made of components (content objects) organized in a hierarchy via tags
• Header, introduction, body, appendices, …
• Chapters, subchapters, acts, scenes, poems, …
• Titles, epigraphs, abstracts, …
• Paragraphs, verses, words, dictionary entries, …
• Emphasis, citations, …

HTML
HTML is modeled after a richer and more expressive language, SGML (Standard Generalized Markup Language)
• HTML makes it possible to put together documents with a simple structure that contain text, images, interactive objects, and hypertext
connections to other documents
• Apart from content description, HTML also associates a graphical meaning with the items it defines
• It gives instructions on how to graphically render the defined items
That double meaning (and the mixed objectives) can introduce complexity and create problems

Tag
The HTML tags are the way to define the markup of HTML items
• Tags are preceded and followed respectively by the two characters “<” and “>” (angle brackets)
• Some tags come in pairs; for instance <p> and </p>, respectively called start tag and end tag
• The text between start tag and end tag is called the item content or tag content
• An HTML document contains text and tags, and it is composed of text delimited by tags:
<p>Text of a paragraph</p>
• <p>: start tag
• Text of a paragraph: item content (tag content)
• </p>: end tag
• the whole, from start tag to end tag: the item

Links
In HTML, the tags that express links to other documents, called references, are also mixed within the text
• The tag is <a>; its href attribute indicates the URL of the precise document you refer to; typically you express the link part (the URL) and the text associated with it, to be shown in the Web page
<a href="http://java.sun.com/products/jdk"> Java Development Kit (JDK) </a>
• In general the form is <a href="some URL"> any text </a>
• The URL is accepted in any form
• The text is not constrained in any form

HTML SOURCE
<head>
<title> Getting Started</title>
</head>
<body>
<h1> Getting Started <img src=../images/Start.gif height=40 width=40 align=top> </h1>
<p>
<h3><em> by Kathy Walrath and Mary Campione</em></h3>
<p>
The lessons in this trail show you the simplest possible Java programs and tell you how to compile and run them. They then go on to explain the programs, giving you the background knowledge you need to understand how they work.
<p>
...........
<p align=center>
<center>
<applet code=Animator.class codebase="../example" width=55 height=68>
<param name=endimage value=10>
<param name=pauses value="2500|100">
</applet>
</center>
</p>
<hr><strong> Before you go on:</strong> If you don't own a Java development environment, you might want to download the
<a href="http://java.sun.com/products/jdk"> Java Development Kit (JDK)</a>
The JDK provides a compiler you can use to compile all kinds of Java programs. It also provides an interpreter you can use to run Java applications. To run Java applets, you can
.............

Basic Structure of an HTML document
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Hello document</title>
</head>
<body>
Hello World!
</body>
</html>

Header
The header contains the essential information about the page that is not displayed by the browser (service information), and it is identified by the tag <head>
It may contain several items, such as:
• <title>: page title (shown in the upper part of the browser window)
• <meta>: metadata information for external applications (e.g., search engines) or for the browser (national language, character encodings for non-Latin alphabets, …)
• <base>: defines the base URL against which the relative references in links are resolved
• <link>: the connections toward external files: CSS, scripts, icons shown in the address bar of the browser
• <script>: executable code potentially used by the document
• <style>: information on the style sheets to use (local CSS)

Body
The document body contains the part of the document typically shown by browsers
The tag <body> identifies and contains the whole document, with several attributes, among which:
• background=uri: defines the URI of an image to be used as the background for the visualization
• text=color: defines the color of the text
• bgcolor=color: as an alternative to background, defines the background color of the page
• lang=language: defines the page language, e.g., lang="it"
• …

Item types in the body
• Titles: headings in a hierarchy
• Text structures: paragraphs, indentation, etc.
• Text aspect and font: bold, italic, etc.
• Items and lists: numbered, bulleted
• Tables
• Forms: buttons, checkboxes and radio buttons, jump menus, input fields, etc.
• Hypertext links and anchors
• Images and other multimedia contents (audio, video, animations, etc.)
• Interactive contents: scripts, external applications, …

Body example
<body>
<h1>Title</h1>
<p>This is a complete paragraph of a document.</p>
<p>Another paragraph<br>with a line break</p>
<hr>
<p>Example of a bulleted list, the shopping list:</p>
<ul>
<li>Bread</li>
<li>Milk</li>
<li>Ham</li>
<li>Cheese</li>
</ul>
</body>
[Figure: visualization of the body example in a browser]

Physical Tags
• <tt>...</tt>: monospaced (fixed-width) text, e.g., <tt>monospaced text</tt>
• <i>...</i>: italic, e.g., <i>italic text</i>
• <b>...</b>: bold, e.g., <b>bold text</b>
• <u>...</u>: underlined (deprecated), e.g., <u>underlined text</u>
• <s>...</s>: strike-out text, e.g., <s>stroke</s>

Browser market Usage
[Figure: browser market usage]

Market Share 2015 Browser
[Figure: browser market share, 2015]

CLIENT/SERVER actions to ACCESS a PAGE
[Figure: the steps of a page access]
• The user requests a document (http://www.foo.it)
• The browser sends an HTTP request to the HTTP server
• The HTTP server identifies the file in its file system
• The HTTP server sends the HTTP response (index.htm, fig.gif, style.css)
• The browser receives the files and displays them to the user

HTTP request / response format at a glance
Request: server address, request method, options, path + filename (in general, content)
Request method (with parameters):
• GET: access to one URL
• POST: new content appended to the request (widely used)
• HEAD: request of the header only (for caching control)
• PUT: new page publication
• DELETE: Web page removal
Reply: status code, resource information, page content
• status code: either success or failure (e.g., file not found)
• resource information: some resource-specific data
• content: more information (HTML content)

Servizi Multimediali per l’Interazione
Staff
Antonio Corradi - Distributed Systems &
Middleware
E-mail: [email protected]
Ph: 051 20 93083
Luca Foschini - Mobile distributed, Social Systems
Research assistant at DISI
E-mail: [email protected]
Ph: 051 20 93541