
SERVIZI MULTIMEDIALI PER L'INTERAZIONE
Web Basics
Antonio Corradi
Luca Foschini
Academic year 2016/2017
Master's degree in
Advanced Design
UNIVERSITY OF BOLOGNA ENGINEERING SCHOOL
Alma Mater Studiorum Università di Bologna
Stats and facts of Web
The World Wide Web (WWW) was
created in 1989 by Tim
Berners-Lee while working at CERN
in Geneva
The basic idea behind the design of the Web was
to provide tools suitable for sharing:
• static documents (unchanging, prepared in advance, persistent,
and well known)
• in hypertext format
• available on the Internet
The Web was intended to replace and subsume the older
ways of sharing documents, based on traditional
Internet protocols such as FTP and
Gopher (based on text only)
Hypertext
A hypertext is a set of documents
related to each other via user-defined
links (hyperlinks, or simply links)
• It can be described as a net (better, as a
graph) where documents are nodes
• By navigating a link we can pass from one
document (or a part of it) to any other document
that is part of the global graph of all documents
• The main property of a hypertext is the new strategy of reading,
where users do not follow a linear sequence (as in a book): any
document can be the next in a specific user navigation
• The hypertext makes possible an unlimited number of different
paths of reading, consultation, and navigation
The Web was intended to connect not only text, but also
multimedia content (with no limitation: images, sounds, video
streaming, …)
We call those heterogeneous information graphs Hypermedia
WWW as a global hypertext resource
The main idea of Berners-Lee was to put together the
hypertext concept and the Internet
• The World Wide Web (WWW) is a hypertext distributed
over the global network
• Documents, namely Web pages, can reside on
servers scattered around the world (World Wide) to
build a single globally interconnected virtual net
(the Web)
• Pages generally consist of several resources: text, static
images, dynamic images, streams, …
• Resources within a page can also reside at several
different locations on different servers
• Any document allows “jumping” to any other one,
independently of its location
• Those jumps enable Web navigation (or surfing)
Web basic items
The implementation of a global hypertext needs three
basic concepts:
1. A mechanism to locate any document of interest
2. A protocol to access the resources that compose the
document and to transfer their content to the requestor
3. A language to describe the hypertext documents
(needed to compose and constitute the Web pages)
and two types of “physical” supports:
• Servers capable of providing all the resources that
compose the documents and of making their contents
available
• Clients capable of requesting and visualizing the
documents of interest and of supporting the
navigation from one document to the others
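Web Model: Scheme - Architecture
[Figure: a client machine running the Browser and a server machine running the Web Server with its resources (HTML documents); the URL goes from the browser to the server and the HTML document comes back]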
Web Model: basic items
• The Web is based on a Client/Server model
• Clients, called Web Browsers
• use the HTTP protocol to reach servers
• use URLs to identify resources
• ask servers for Web pages and render the page
information
• Servers, called Web Servers
• are always waiting for connections and requests from
new clients
• use the HTTP protocol to interact with clients
• reply to clients with the contents of the Web pages that have
been specifically requested
A well-known distributed case
Our first case is accessing a Web page
• The system is distributed, or simply located over the global
network …
• A user surfs Web pages that are stored and located at
several different servers (transparently with respect to their location)
[Figure: user vision vs. technical vision (architecture) of the client/server interaction. On the client node (local system), a FORM collects the user input and shows the output through an HTTP client; request and reply travel over TCP/IP across the Internet to the HTTP server on the remote server node (server system), where the processing takes place, possibly through CGI, external applications, and support applications]
Web formula
Put as a market advertisement, the initial Web vision
can be described by the “formula” below:
WWW = URL + HTTP + HTML
• URL (Uniform Resource Locator): to address resources
available on servers
• HTTP (HyperText Transfer Protocol): to allow resource content
transfer
• HTML (HyperText Markup Language): to allow the
representation of hypertext documents
First Historical Home page
• Home page is typically the first access page in a
specified system, e.g., the default page in either a
Web server or a browser
• It may contain links to other referred pages
[Figure: the first Web home page]
BASICS: UNIFORM RESOURCE LOCATOR
WWW = URL + HTTP + HTML
The first term of our “Web formula” addresses
some core problems:
• How to identify the server capable of giving us back a
document that is part of the hypertext (either a page or a
resource within a page)?
• How to identify the resource (part of the hypertext)
we are interested in?
• Which mechanism to use in accessing the resource?
The unifying answer to the above questions is
a naming scheme for any Web resource, called URL
URL: UNIFORM RESOURCE LOCATOR
Uniform Resource Locator (URL): it identifies a
resource via its primary access mechanism
(connected to its Internet location)
The term resource identifies any entity that has a precise
identity (so it can be made available separately or as
part of a document): a (text) document, an image, a
service, a video, a collection of other resources, …
The URL can also carry information about the resource
contents, together with information on how to get and
access it
http://middleware.unibo.it/
http://www.lia.disi.unibo.it/Courses/WebTech/
While there are many other naming conventions (and
available name systems), the URL system is the most widely used
and also the most significant
UNIFORM RESOURCE LOCATOR
URLs can also give information on how to access the
identified resource, including the protocol requested for the transfer
of the resource itself
• Typically the URL also names the protocol to be used
for the transfer
• The remaining part is protocol-dependent
• The most common form (HTTP-like scheme) has the syntax below
(<protocol> and <host> are compulsory; optional items are within [ ])
<protocol>://[<username>:<password>@]<host>[:<port>][/<path>][?<query>][#<fragment>]
• The above URL form is shared by many protocols in
common use: HTTP, HTTPS, FTP, WAP, …
• E-mail not allowed (asynchronous: it may take too long)
URL components in HTTP-like formats
• <protocol>: the protocol to be used to
access the server (HTTP, HTTPS, FTP, MMS, …)
• <host>: server address for the resource; either a
domain name or an IP address
• <path>: the path (pathname) in the file system
of the server that identifies and names the resource.
If missing, a default page is intended (i.e., the home page)
• <port>: the port (transport-level endpoint) to be used
(TCP for HTTP). The default port for HTTP, when not
otherwise specified, is 80
• <username>:<password>: optional credentials for
user authentication
• <query>: character string that embeds
additional information (command parameters) for the
server. Usually in the format <parameter>=<value>:
parameter1=value1&parameter2=value2…
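As a minimal sketch, the components listed above can be pulled apart with Python's standard urllib.parse module. The sample URL is hypothetical: the credentials, the query parameters, and the fragment are made-up placeholders added only to show every component at once.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: credentials, query and fragment are invented for illustration.
url = ("http://user:secret@lia.disi.unibo.it:8080/Courses/index.html"
       "?parameter1=value1&parameter2=value2#top")

parts = urlsplit(url)

# Map the parsed pieces onto the component names used in the slides.
components = {
    "protocol": parts.scheme,
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,                # an int, or None when the URL omits it
    "path": parts.path,
    "query": parse_qs(parts.query),    # parameter -> list of values
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:10} {value}")
```

When the port is omitted, parts.port is None and the client falls back to the protocol default (80 for HTTP).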
Example of a URL with an HTTP format
http://lia.disi.unibo.it:8080/Courses/index.html
• http: communication protocol with the server
(the HTTP protocol is the default one for the Web)
• lia.disi.unibo.it: domain name of the server that stores the
Web page
• 8080: server port
• /Courses/index.html: file path in the file
system of the server
HyperText Transfer Protocol - HTTP
WWW = URL + HTTP + HTML
HTTP is the acronym of HyperText Transfer Protocol,
the protocol used for the interaction between Web
servers and clients, i.e., to transfer the requests from
clients to servers and the replies from servers to clients
HTTP is an Internet protocol that specifies the
message format and the details of the whole Web
interaction (operation requests and message replies)
• HTTP is a client/server protocol where
clients are allowed to ask for different operations and
servers are expected to wait for operation requests and
to answer them (clients have the initiative and the server
answers with a result)
HyperText Transfer Protocol - HTTP
HTTP -> a client/server protocol
HTTP is an Internet protocol that drives any
communication in Web scenarios
HTTP is client/server in the sense that the client asks for
an operation and waits until the server gives
the reply back to it
In other words, it is a Request/Reply protocol: any
operation is organized in one request and one
corresponding reply, in a two-step protocol for the client
The server, of course, should honor any request by taking
it into account and executing the corresponding operation to
produce the awaited reply
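The request/reply scheme can be sketched without any network, as a plain function call: the client builds one request and blocks until the server returns one reply. Everything here is an assumption made for illustration (the RESOURCES table, the host name www.example.org, the function names); a real server would of course listen on a TCP port.

```python
# Hypothetical in-memory "server": maps URL paths to page contents.
RESOURCES = {"/index.html": "<html><body>Hello World!</body></html>"}


def handle_request(request: str) -> str:
    """Server side: take one request and produce exactly one reply."""
    request_line = request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return "HTTP/1.1 501 Not Implemented\r\n\r\n"
    body = RESOURCES.get(path)
    if body is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    return ("HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body.encode('ascii'))}\r\n"
            "\r\n" + body)


def client_get(path: str) -> str:
    """Client side: send the request, then wait for the reply (two steps)."""
    request = f"GET {path} HTTP/1.1\r\nHost: www.example.org\r\n\r\n"
    return handle_request(request)  # the call returns only when the reply is ready


print(client_get("/index.html").split("\r\n", 1)[0])
print(client_get("/missing.html").split("\r\n", 1)[0])
```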
HTTP Messages: example of a request
The protocol employs messages in ASCII format
(readable text) that are very verbose, long, and
complex
An HTTP request example:
GET /somedir/page.html HTTP/1.1
Host: www.unibo.it
Connection: close
User-agent: Mozilla/4.0
Accept: text/html,image/gif,image/jpeg
Accept-language:it

• Request line (first line): the command
(GET, POST, …), the requested
resource, and the protocol version
• Header lines: the lines that follow, up to the empty line
• Empty body: nothing follows the header lines in this request
• Connection: close asks to close the connection at request
completion (non-persistent connection)
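Since the message is readable text, splitting it into request line, header lines, and body takes only a few lines of code. The sketch below is an illustration, not a complete HTTP parser: the function name parse_request is made up, and it assumes a well-formed message with CRLF line endings.

```python
def parse_request(message: str):
    """Split a request into command, resource, version, headers and body."""
    head, _, body = message.partition("\r\n\r\n")    # the empty line ends the headers
    lines = head.split("\r\n")
    method, resource, version = lines[0].split(" ")  # the request line
    headers = {}
    for line in lines[1:]:                           # the header lines
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, resource, version, headers, body


REQUEST = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Host: www.unibo.it\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/4.0\r\n"
    "Accept: text/html,image/gif,image/jpeg\r\n"
    "Accept-language:it\r\n"
    "\r\n"
)

method, resource, version, headers, body = parse_request(REQUEST)
print(method, resource, version)
print(headers["connection"])
```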
A more complex example of a request
GET /search?q=Introduction+to+XML+and+Web+Technologies HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Gecko/20040803
Accept: text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: da,en-us;q=0.8,en;q=0.5,sw;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.google.com/
• Connection: keep-alive: do not close the connection
at request completion (persistent connection)
• Keep-Alive: 300: interval for which the connection is kept
open (persistent connection time)
Protocol commands: REQUEST / REPLY
Several messages can be exchanged
between client and server
It is a client-initiative protocol, so it consists of a client
request and a server reply
• REQUEST Commands
• GET and POST to send requests for pages
• PUT and DELETE to change or delete pages
• HEAD, OPTIONS, TRACE for management
• with several different attributes to better specify
‘how to’ carry out operations
• REPLY and attributes
• Unique format for the result in the reply
• with several different items in the reply: one is the
HTML of the requested page
Request Commands: GET
• GET
The most frequently used command (operation): it is
the one activated by clicking a hypertext link in
an HTML document, or by typing a URL in
the address bar at the top of a browser
window
It allows both:
• requesting a resource from a server
• passing parameters to better specify the
operation (the <query> part of the URL)
It has:
• a maximum length per URL (imposed in practice by browsers
and servers); that limitation allows
only for a limited number of parameters
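A short sketch of how GET parameters end up in the <query> part of the URL, using Python's standard urllib.parse. The search phrase is the one that appears in the Google request example; the rest is illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The parameters travel inside the URL itself, after the "?".
params = {"q": "Introduction to XML and Web Technologies"}
url = "http://www.google.com/search?" + urlencode(params)
print(url)

# The server side reverses the encoding to recover the parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Spaces are encoded as "+" so that the whole request still fits on the request line.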
Request Commands: POST
• POST
It requests a resource by sending a complete message
Differently from GET, the details about the operation
and its computation are not within the URL: they are
contained in a different part of the message,
the message body
• There is no maximum length limit for the
parameters in a request
POST is typically used to ask the server for more complex
operations (see the following), such as
submitting the information for personalized contents
(such as an HTML form for a CGI application)
This transmission of information does not necessarily
require the presence/creation of a resource on
the server
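The difference from GET can be seen by preparing (without sending) a POST request with Python's standard urllib.request. The target URL and the form fields are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, as an HTML form could submit them.
form = {"name": "Mario", "course": "Web Basics"}
body = urlencode(form).encode("ascii")

# Hypothetical CGI endpoint; the request is only built here, never sent.
request = Request(
    "http://www.example.org/cgi-bin/register",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.get_method())  # a Request carrying data defaults to POST
print(request.full_url)      # no parameters in the URL
print(request.data)          # the parameters are in the message body
```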
Request Commands: PUT and DELETE
• PUT
Requests the server to store a resource at the
specified URL
• The PUT method transmits information from the
client to the server
• Differently from POST, it implies the creation of a
resource (or its replacement if it already exists)
• The PUT argument is the resource that one
intends to obtain with a subsequent GET
using the same name
• DELETE
Requests the server to delete the resource at the
specified URL
Both commands are typically disabled in public servers
Request Commands: HEAD, OPTIONS & TRACE
• HEAD: similar to the GET method, but the server
answers only with the headers, without
sending the body
• Used for URL verification
• Validity: the resource exists and is not empty
• Accessibility: no authentication requested
• OPTIONS: to request information about the available
options in the communication
• TRACE: to invoke the remote loop-back interface
at the application level for the request message
• The client is allowed to see what the server has
received: typically used by diagnostic
processes and in the testing phases of Web
architectures
Reply Format
An HTTP reply example:
HTTP/1.1 200 OK
Connection: close
Date: Thu, 06 Aug 1998 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 …...
Content-Length: 6821
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD
HTML 4.01 Transitional//EN">
<html>...</html>
• Status line (first line): protocol, status code, status phrase
• Header part: the lines that follow, up to the empty line
• Message body: in this case, the requested HTML page
HTTP 1.0: the server closes the connection at request
completion
HTTP 1.1: the server keeps the connection open, and closes
it only in case of the clause “Connection: close”
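The connection rule stated above can be written down as a small check on the status line and the headers of a reply. This is a simplified sketch of the rule as given in the slides (the function names are made up, and HTTP 1.0 keep-alive extensions are ignored).

```python
def parse_reply_head(reply: str):
    """Return (protocol, status code, status phrase, headers) of a reply."""
    head = reply.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    protocol, code, phrase = lines[0].split(" ", 2)  # the status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")         # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return protocol, int(code), phrase, headers


def connection_stays_open(protocol: str, headers: dict) -> bool:
    """HTTP 1.0 closes after each reply; HTTP 1.1 closes only on request."""
    if protocol == "HTTP/1.0":
        return False
    return headers.get("connection", "").lower() != "close"


REPLY = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: close\r\n"
    "Date: Thu, 06 Aug 1998 12:00:15 GMT\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)

protocol, code, phrase, headers = parse_reply_head(REPLY)
print(code, phrase, connection_stays_open(protocol, headers))
```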
Status Codes
The status code is a three-digit code that gives
information about the outcome of the request: the first digit is the
class of the answer and the other two are
answer specific
There are 5 classes:
• 1xx: Informational. A temporary answer to the request, while
execution goes on (reserved but not used in HTTP 1.0; defined since HTTP 1.1)
• 2xx: Successful. The server received, understood correctly,
and accepted the request
• 3xx: Redirection. The server received and understood the request correctly,
but other client actions are needed before executing the
requested operation
• 4xx: Client error. The requested operation cannot be honored
because of problems in the request (syntax error or unauthorized request)
• 5xx: Server error. The requested operation may be a
correct one, but the server cannot satisfy it, possibly because of an
internal problem (server or other applications, such as CGI)
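Because the class is simply the first digit, a client can classify any status code with an integer division. A minimal sketch (the names STATUS_CLASSES and status_class are illustrative):

```python
# The five classes, indexed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the class of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (100, 200, 301, 404, 500):
    print(code, status_class(code))
```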
Examples of Status Codes
• 100 Continue (sent if the client has not sent the
body yet)
• 200 Ok (successful GET)
• 201 Created (successful PUT)
• 301 Moved permanently (the URL is no longer valid;
the reply normally indicates the new location)
• 400 Bad request (syntax error in request)
• 401 Unauthorized (missing authorization)
• 403 Forbidden (request understood but refused)
• 404 Not found (no resource matches the URL, e.g., an error in the URL)
• 500 Internal server error (typically an invalid
answer, e.g. from a CGI)
• 501 Not implemented (method unknown to the
server)
HyperText Markup Language - HTML
WWW = URL + HTTP + HTML
HTML is the acronym of HyperText Markup Language,
the language used to describe the Web pages that are
the nodes of the hypertext (made of HTML texts and
hyperlinks)
HTML is a special (text-based) language to code
and qualify text by means of markers (markup language)
• A language to code text is a formal tool to
specify how to represent a document stored as
text on a digital medium, in such a way that it can
be handled by machines, and stored and
processed as text
Markup Languages
A markup language consists of:
• A set of instructions called tags or markups
(markers) to represent the text document
properties
• a syntax to give rules about the markup process
• a semantics to define the application domain
and give suggestions about markup process
The markers are inserted directly into the text they
are referring to
• A tag is expressed as a sequence of characters,
enclosed in special characters that mark the
tag and allow distinguishing it from
normal text: <tag>
• Tags <a href="remoteURL"> are <p> mixed in
<hr> the text </p>
Markup Languages: Descriptive or Declarative
The most widely used markup languages are descriptive and
text-based, in the sense that they give information
about (they declare) the text content by adding
recognizable text
In particular, they can describe the editorial
structure, typically made of components
(content objects) organized in a hierarchy via tags
• Header, introduction, body, appendices, …
• Chapters, subchapters, acts, scenes, poems, …
• Titles, epigraphs, abstract, …
• Paragraphs, verses, words, dictionary entries, …
• Emphasis, citations, …
HTML
HTML is modeled after a richer and more
expressive language, SGML (Standard
Generalized Markup Language)
• HTML makes it possible to put together documents
with a simple structure that contain text,
images, interactive objects, and hypertext
connections to other documents
• Apart from content description, HTML also associates
a graphic meaning with the items it defines
• It gives instructions on how to graphically
render the defined items
That double meaning (and mixed objectives) can
introduce complexity and create problems
Tag
HTML tags are the way to define the markup of
HTML items
• Tags are delimited by the
two characters “<” and “>” (angle brackets)
• Some tags come in pairs; for instance, <p> and
</p>, respectively called start tag and end tag
• The text between start tag and end tag is called
the item content or tag content
• An HTML document contains text and tags, and
it is composed of text delimited by tags:
<p>Text of a paragraph</p>
• Item: the whole <p>Text of a paragraph</p>
• Start tag: <p>
• Item / tag content: Text of a paragraph
• End tag: </p>
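The start tag / content / end tag structure is what a parser sees when it reads the document. A minimal sketch with Python's standard html.parser module (the class name TagLogger is made up) records the three parts of the paragraph item above:

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record, in order, the start tags, contents and end tags met in the text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag))

    def handle_data(self, data):
        self.events.append(("content", data))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))


logger = TagLogger()
logger.feed("<p>Text of a paragraph</p>")
logger.close()

for kind, value in logger.events:
    print(f"{kind:10} {value}")
```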
Links
In HTML, within the text you can also mix tags that
express links to other documents, called
references
• The tag is <a>; its href attribute indicates the URL of the
precise document you refer to; typically you
express the link part, the URL, and the text
associated with it, to be visualized in the Web page
<a href="http://java.sun.com/products/jdk">
Java Development Kit (JDK)
</a>
• In general the form is
<a href="some URL"> any text </a>
• The URL is accepted in any form
• The text is not constrained in any form
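What a browser does with such links can be sketched in a few lines: scan the page and collect, for every link, the target URL and the text shown to the user. The class name LinkCollector is an assumption for illustration; only Python's standard html.parser is used.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (URL, visible text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the link being read, if any
        self._text = []     # pieces of text met inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces in the visible text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


collector = LinkCollector()
collector.feed('<a href="http://java.sun.com/products/jdk">\n'
               'Java Development Kit (JDK)\n</a>')
collector.close()
print(collector.links)
```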
HTML SOURCE
<head> <title> Getting Started</title> </head>
<body> <h1> Getting Started <img src=../images/Start.gif height=40
width=40 align=top> </h1> <p> <h3><em> by Kathy Walrath and Mary
Campione</em></h3> <p>
The lessons in this trail show you the simplest possible Java programs
and tell you how to compile and run them. They then go on to explain
the programs, giving you the background knowledge you need to
understand how they work.
<p> ........... <p align=center> <center>
<applet code=Animator.class codebase="../example" width=55
height=68>
<param name=endimage value=10> <param
name=pauses value="2500|100">
</applet> </center> </p>
<hr><strong> Before you go on:</strong> If you don't own a Java
development environment, you might want to download the
<a href="http://java.sun.com/products/jdk"> Java Development Kit
(JDK)</a> The JDK provides a compiler you can use to compile all
kinds of Java programs. It also provides an interpreter you can use to
run Java applications. To run Java applets, you can .............
Basic Structure of an HTML document
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN">
<html>
<head>
<title>Hello
document</title>
</head>
<body>
Hello World!
</body>
</html>
Header
The header contains the essential information
about the page that is not visualized by the browser
(service information), and it is identified by the tag
<head>
It may contain several items, such as:
• <title>: page title (shown in the upper part of the
browser window)
• <meta>: metadata for external
applications (e.g., search engines) or for the browser
(national language, character encodings for non-Latin
alphabets, …)
• <base>: to give a base (anchor) for the references in links
• <link>: connections to external files: CSS,
scripts, icons visualized in the address bar of the browser
• <script>: executable code potentially used by the
document
• <style>: information on the style sheets to use (local CSS)
Body
The document body contains the document part
typically shown by browsers
The tag <body> identifies and contains the whole
document, with several attributes, among which:
• background=uri
defines the URI of an image to be used as the
background for the visualization
• text=color
defines the color of the text
• bgcolor=color
alternatively to background, defines the color of the
background of the page
• lang=language
defines the page language, e.g., lang="it"
• …
Item types in the body
• Titles: titles in a hierarchy
• Text structures: paragraphs, indentation, etc.
• Text aspect and font: bold, italic, etc.
• Items and lists: numbered, bulleted
• Tables
• Forms: buttons, checkboxes and radio buttons,
jump menus, input fields, etc.
• Hypertext links and anchors
• Images and other multimedia contents (audio,
video, animations, etc.)
• Interactive contents: scripts, external
applications, …
Body example
<body>
<h1>Title</h1>
<p>This is a complete
paragraph of a
document.</p>
<p>Another paragraph<br>with
a line break</p>
<hr>
<p>Example of a bulleted list,
the shopping list:</p>
<ul>
<li>Bread</li>
<li>Milk</li>
<li>Ham</li>
<li>Cheese</li>
</ul>
</body>
[Figure: browser visualization of the body above]
Physical Tags
• <tt>...</tt>: monospaced (fixed-width) text
• <i>...</i>: italic
• <b>...</b>: bold
• <u>...</u>: underlined (deprecated)
• <s>...</s>: strike-out text
Examples (source → rendering):
<tt>monospaced text</tt> → monospaced text
<i>italic text</i> → italic text
<b>bold text</b> → bold text
<u>underlined text</u> → underlined text
<s>struck-out text</s> → struck-out text
Browser market usage
[Figure: browser market usage chart]
Browser market share 2015
[Figure: browser market share chart, 2015]
CLIENT/SERVER actions to ACCESS a PAGE
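[Figure: sequence of actions
1. The USER requests a document (HTTP://WWW.FOO.IT) through the BROWSER
2. The BROWSER sends an HTTP REQUEST to the HTTP SERVER
3. The HTTP SERVER identifies the file in its file system (INDEX.HTM, FIG.GIF, STYLE.CSS)
4. The HTTP SERVER sends back an HTTP RESPONSE
5. The BROWSER is provided with the file and visualizes it to the USER]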
HTTP request / response format at a glance
Request format: server address | request method | options | path + filename (in general, content)
Request Method (with parameters)
GET      access to one URL
POST     new content appended to the request (widely used)
HEAD     request of header only (for caching control)
PUT      new page publication
DELETE   Web page removal
Reply format: status code | resource information | page content
status code: either success or failure (e.g., file not found)
resource information: some resource-specific data
content: more information (the HTML content)
Servizi Multimediali per l’Interazione
Staff
Antonio Corradi - Distributed Systems & Middleware
E-mail: [email protected]
Ph: 051 20 93083
Luca Foschini - Mobile distributed, Social Systems
Research assistant at DISI
E-mail: [email protected]
Ph: 051 20 93541