HTML Notes

Hypertext Markup Language (HTML)
Is a throwback to another age that runs counter to the last decades' trends toward
graphically based WYSIWYG word processing. If your first experience with word processing
was with a GUI-based application like MS Word, then HTML scripting seemed primitive and
broken. But people accustomed to the Unix world saw it as command-driven and
state-sensitive.
An HTML document is an ordinary text file, and the document's ultimate appearance is
controlled by “magical sequences of characters scattered about the text that is the
document's real 'message.'” Preparing a document in HTML is more like application
development than desktop publishing because it involves an endless cycle of modifying the
HTML source in a text editor, loading the file into a browser to see how it looks and prints,
figuring out what the problems are, and going back to the text editor to make changes.
Some things to remember, since HTML does not have the same goals as a word processor:
• It is a method of encoding complex hypermedia documents that focuses on document
structure rather than physical appearance.
• Documents have to be portable across every imaginable CPU architecture, operating
system, network transport, and method of mass storage.
• There is no way to predict the capabilities of the system on which the document will
be viewed or printed, even to the level of a minimum screen resolution, number of
colors, or included fonts.
HTML Structure
HTML formatting commands, known as tags, are merely reserved sequences of characters
that start with a < (less than) and end with a > (greater than). HTML tags are not case
sensitive. In most cases, the tags are used in symmetric pairs, with the closing tag indicated by
the inclusion of the / (forward slash) character: <x>This is text.</x> This scheme of
symmetric tags comes from the industry-standard SGML (Standard Generalized Markup
Language). HTML through version 4.01 is based on SGML: it is described by an SGML document
type definition (DTD), but it doesn't begin to exploit the full capabilities provided by SGML.
Five different versions, or levels, of HTML have appeared on the WWW, and documents written
to each still coexist there:
HTML 0.9 / 1.0 (1989 - 1994)
Concerns itself mainly with control of headings, lists, and character formatting. It is very easy
to learn and use. It has eight categories of commands:
1. Structural – identify a valid HTML document and indicate the beginning and end of
logical sections within the document.
2. Text-Flow – indicate the ends of paragraphs, forced line breaks, headings, indented or
pre-formatted text.
3. Headings – provide for the formatting of six different levels of headings and
subheadings.
4. Character-Formatting – apply a “physical” or “logical” style to a stream of characters.
5. List – format several different kinds of lists.
6. Special Character Escape Sequences – used to display glyphs that are not in the
ASCII character set, characters that cannot be entered with the author's keyboard, or
characters that would otherwise be interpreted as HTML commands.
7. In-Line Graphics – identify an external file as a graphic resource to be retrieved and
displayed within the document's text by the browser.
8. Anchor – creates a hyperlink or serves as the target of a hyperlink.
HTML 2.0 (1995)
Added menu commands and interactive forms. Browser makers started to create their own
features, which required additional tags that were not part of the actual HTML
specification. The W3C was formed between HTML 1.0 and HTML 2.0.
HTML 3.0 / 3.2 (1995 - 1997)
Added background bitmaps, a rich set of commands for formatting tables, and expanded
options for form elements. This version also allowed web pages to include complex
mathematical equations. Because the W3C delayed agreeing on the next version of HTML after
2.0, HTML 3.2 was released instead of HTML 3.0. It included support for CSS
(Cascading Style Sheets), but browser manufacturers did not support it very well in their
browsers. Browser manufacturers instead included support for frames, even though the HTML 3.2
specification did not include this feature.
HTML 4.0 / 4.01 (1997 - 1999)
Adopted many browser-specific element types and attributes, but at the same time sought to
phase out Netscape's visual markup features by marking them as deprecated in favor of style
sheets. This version added support for style sheets and scripting ability for multimedia
elements. HTML 4.01 continued to separate presentation styling information from the actual
content by the use of style sheets, because in HTML 3.2 the presentation styling information
was included directly in each web page, which made sites difficult to maintain. With style
sheets, it is possible to change the appearance of an entire website by changing just the
style sheet(s). In earlier versions of HTML, making the same change across the entire website
meant changing the styling information in every individual page! (A site with many pages
needed many edits before its appearance could be changed.)
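A minimal sketch of the contrast described above. The first fragment follows the older habit of writing presentation into the page with the <font> tag (obsolete in HTML5); the second keeps only structure in the page and links to a shared style sheet. The file name site.css and the class name intro are hypothetical, and the style rule is shown in a comment because it would live in that separate file.

```html
<!-- Older approach: presentation is repeated inside every page. -->
<p><font face="Arial" color="navy">Welcome to our site.</font></p>

<!-- Style-sheet approach, part 1: in the <head> of every page,
     link to one shared style sheet (hypothetical file name). -->
<link rel="stylesheet" href="site.css">

<!-- Style-sheet approach, part 2: in the <body>, mark structure only. -->
<p class="intro">Welcome to our site.</p>

<!-- site.css would contain a rule such as:
     p.intro { font-family: Arial, sans-serif; color: navy; }
     Editing that one rule restyles every page that links to the file. -->
```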
HTML 5.0 (2014)
Its core aims are to improve the language with support for the latest multimedia while keeping
it easily readable by humans and consistently understood by computers and devices (web
browsers, parsers, etc.). HTML5 is intended to subsume not only HTML 4, but also XHTML 1
and DOM Level 2 HTML. HTML5 is a response to the fact that the HTML and XHTML in
common use on the World Wide Web have a mixture of features introduced by various
specifications, along with those introduced by software products such as web browsers and
those established by common practice. It is also an attempt to define a single markup
language. It includes detailed processing models to encourage more interoperable
implementations; it extends, improves and rationalizes the markup available for documents,
and introduces markup and application programming interfaces (APIs) for complex web
applications. For the same reasons, HTML5 is also a potential candidate for cross-platform
mobile applications. Many features of HTML5 have been built with the consideration of being
able to run on low-powered devices such as smartphones and tablets.
In particular, HTML5 adds many new syntactic features. These include the new <video>,
<audio> and <canvas> elements, as well as the integration of scalable vector graphics
(SVG) content (replacing generic <object> tags), and MathML for mathematical formulas.
These features are designed to make it easy to include and handle multimedia and graphical
content on the web without having to resort to proprietary plugins and APIs. Other new page
structure elements, such as <main>, <section>, <article>, <header>, <footer>,
<aside>, <nav> and <figure>, are designed to enrich the semantic content of documents.
New attributes have been introduced, some elements and attributes have been removed and
some elements, such as <a>, <cite> and <menu> have been changed, redefined or
standardized. The APIs and Document Object Model (DOM) are no longer afterthoughts, but
are fundamental parts of the HTML5 specification. HTML5 also defines in some detail the
required processing for invalid documents so that syntax errors will be treated uniformly by all
conforming browsers and other user agents.
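A minimal HTML5 page sketch using several of the structure elements listed above together with a <video> element. The file names (index.html, about.html, clip.mp4) are hypothetical. The text inside <video> is fallback content, shown only by browsers that do not support the element.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>HTML5 structure sketch</title>
</head>
<body>
  <!-- Site-wide banner with navigation links -->
  <header>
    <nav><a href="index.html">Home</a> | <a href="about.html">About</a></nav>
  </header>
  <!-- The page's main content -->
  <main>
    <article>
      <h1>Article title</h1>
      <p>Article text.</p>
      <figure>
        <!-- Native video playback, no plugin required -->
        <video src="clip.mp4" controls width="320">
          Your browser does not support the video element.
        </video>
        <figcaption>A short clip played without a plugin.</figcaption>
      </figure>
    </article>
    <!-- Content related to, but separate from, the article -->
    <aside>Related links.</aside>
  </main>
  <footer>Page footer.</footer>
</body>
</html>
```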
Basic HTML 5 Documents
A basic HTML document looks like this:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Sample page</title>
</head>
<body>
<h1>Sample page</h1>
<p>This is a <a href="demo.html">simple</a> sample.</p>
<!-- this is a comment -->
</body>
</html>
Elements of HTML
1. Structural Tags
The three most important HTML tags in this group are <html>, <head>, and <body>. They
provide fundamental identification and document organization information for the browser that
retrieves the document.
A DOCTYPE is a required preamble, needed for legacy reasons. When it is omitted,
browsers tend to use a different rendering mode (often called “quirks mode”) that is
incompatible with some specifications. Including the DOCTYPE in a document ensures that the
browser makes a best-effort attempt at following the relevant specifications.
The DOCTYPE for HTML 5 is <!DOCTYPE html>.
<html>
</html>
The html element represents the root of an HTML document. The <html> and </html> tags are
placed at the beginning and end of the entire document, respectively, and identify the
document as a valid document encoded in a particular version of the hypertext markup language.
In more precise language: these tags bound the portion of the document that the browser is
expected to understand and render on the screen for the viewer. Note: Browsers are able to
cope with documents that omit these tags, but it is good practice to include them.
Authors are encouraged to specify a lang attribute on the root html element, giving the
document's language. This aids speech synthesis tools to determine what pronunciations to
use, translation tools to determine what rules to use, and so forth.
<!DOCTYPE html> <html lang="en">
<head>
</head>
The head element represents a collection of metadata for the document. It is important to
note that nothing in the head section appears in the client area of the browser's window.
Only a few tags are allowed to appear between these tags. The most important tag within the
head section is <title> </title>.
Text between the <title> tags is displayed by the browser in the title bar of its window (or in
its tab), but more importantly, this is the text that becomes a bookmark in the Bookmark list of
the browser. Authors should keep titles specific and brief.
There must be no more than one title element per document.
The meta element represents various kinds of metadata that cannot be expressed using
other tags. It is meant for browsers and HTTP servers that can take advantage of it. The
most common meta element involves character set encoding for the document, for example:
<meta charset="utf-8"/>
<body>
</body>
The body element represents the content of the document, i.e., the portion of the document
that will be displayed to the viewer within the client area of the browser's window. In
conforming documents, there is only one body element. The body consists of an arbitrary
mixture of text paragraphs, horizontal rules, headings, lists, hyperlinks, and in-line graphics,
along with embedded tags for character formatting.
2. Text-Flow Tags
The three most commonly used tags are <p> </p>, <br>, and <hr>. Note that these three tags
represent the two ways that tags are formed: pairs and singletons.
<p>
</p>
Used to indicate the beginning and end of a textual paragraph. Other tags permitting, all the
text between these two tags can be re-flowed by the browser according to the window size,
the metrics of the display fonts, etc. By convention, browsers use one line height's worth of
white space to set each paragraph apart from the next paragraph.
<br>
Forces a line break, or a hard return. It does not imply the end of a logical paragraph, and a
line terminated with <br> is not followed by any extra white space. Its typical use is to ensure
that the elements of a name and address are displayed on separate lines rather than reflowed and run together by the browser.
<hr>
Draws a horizontal rule (or line) across the client area of its window. It has a secondary effect
of acting like a <br> tag; the flow of the paragraph is interrupted at the point of the <hr> tag,
the <hr> rule is drawn with a “reasonable” amount of white space above and below, and then
text display resumes at the left margin below the rule.
To increase white space, place <p>&nbsp;</p> tags before and after the <hr> tag.
Different browsers create many variations of the horizontal rule but it remains the preferable
way to separate text instead of using hard-coded separators (in-line graphics) because:
1) the browser knows the window's width and system's capabilities and thus can render
the horizontal rule efficiently and appropriately,
2) it also gives documents a consistent look for the viewer, and
3) it is compact and can be transferred faster than a series of underscores or a graphic
file.
<pre>
</pre>
Delimits text that is preformatted. Unlike ordinary text, which is displayed in an attractive
proportional font and reflowed at the browser's discretion, the text marked off by <pre> tags
is displayed in a non-proportional font and the arrangement of white space is respected.
<blockquote>
</blockquote>
Frames a quotation or an extract from another source. The quoted text is indented from both
margins, preceded and followed by a browser-dependent amount of extra white space and
may be displayed in a different font.
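The text-flow tags above can be combined as in this short sketch; the name, address, and table contents are made up for illustration.

```html
<!-- A paragraph: the browser may reflow this text to fit the window. -->
<p>This paragraph can be reflowed by the browser to fit the window.</p>

<!-- <br> forces line breaks without ending the paragraph. -->
<p>Jane Doe<br>
123 Main Street<br>
Springfield</p>

<!-- A horizontal rule separating sections. -->
<hr>

<!-- Inside <pre>, spaces and line breaks are kept exactly as typed. -->
<pre>
Item     Qty
apples     3
pears     12
</pre>

<!-- A quotation, indented from both margins by the browser. -->
<blockquote>
  <p>Quoted text is indented from both margins.</p>
</blockquote>
```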
3. Heading Tags
These elements represent headings for their sections; each has a rank given by the number
in its name. The h1 element has the highest rank, the h6 element has the lowest
rank, and two elements with the same name have equal rank. Heading tags must be used in pairs.
<h1> </h1> <h2> </h2> <h3> </h3> <h4> </h4> <h5> </h5> <h6> </h6>
Although the actual font and font size that are used are browser dependent, you can be
assured that an h1 element will be larger and more conspicuous than an h2 element, and so on.
Heading tags have an important effect on text flow. When a heading tag is encountered, the
current paragraph is terminated and the heading text is displayed left-aligned and in a visually
distinct font with extra white space preceding and following the heading; text flow is then
resumed at the left margin. The amount of white space is about one line height in the heading's
font. To add more white space, see the <p>&nbsp;</p> technique under <hr> above. You cannot
reduce the white space with HTML alone; style sheets can.
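A small sketch of heading ranks, using this document's own outline as the example; the browser chooses the actual fonts and spacing.

```html
<!-- Highest rank: the document title -->
<h1>HTML Notes</h1>
<!-- Next rank down: a major section -->
<h2>Elements of HTML</h2>
<!-- Subsections share the same, lower rank -->
<h3>Structural Tags</h3>
<p>Body text for this subsection.</p>
<h3>Text-Flow Tags</h3>
<p>Body text for this subsection.</p>
```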
4. Character-Formatting Tags
These tags can be divided into two subgroups: physical character attributes tags and logical
character attribute tags. The tags are always paired and you insert them directly into the
character stream; they do not affect indentation, spacing, or line breaks.
Physical attribute tags (equivalent to the formatting you would apply in a word processor)
Tag pair               Default rendering
<b> </b>               Boldface
<i> </i>               Italic
<u> </u>               Underline
<tt> </tt>             Teletype text (non-proportional font like Courier); obsolete in HTML5
Logical attribute tags
More abstract and numerous; roughly analogous to the “character styles” in a word processor.
Tag pair               Meaning (typical rendering)
<address> </address>   Contact information for the author of the HTML document (Italic)
<cite> </cite>         Citation (Italic)
<code> </code>         Computer code, such as HTML directives (fixed-width font)
<dfn> </dfn>           Definition (Italic)
<em> </em>             Emphasis (Italic)
<kbd> </kbd>           Keyboard input (fixed-width font)
<samp> </samp>         Sample output of a command (fixed-width font)
<strong> </strong>     Strong emphasis (boldface)
<var> </var>           Program variable (Italic)
There is no way to predict what a particular browser will do with these logical tags. Many just
map the various tags to the physical tags. But the future may make them more significant
because the tags tell the browser something about the nature of the text as well as its
appearance. Web authors are encouraged to use them instead of physical tags whenever
appropriate.
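A short sketch of logical tags used for their meaning rather than their look; the sentences are illustrative only. Note that < and > inside <code> still have to be written as character entities (see Special Characters below) so that the browser does not treat them as a tag.

```html
<!-- <dfn> marks the term being defined -->
<p>A <dfn>tag</dfn> is a reserved sequence of characters enclosed in &lt; and &gt;.</p>
<!-- <kbd> marks keys the user presses -->
<p>Press <kbd>Ctrl</kbd>+<kbd>S</kbd> to save the file.</p>
<!-- <code> marks computer code; entities keep <br> from being parsed -->
<p>The <code>&lt;br&gt;</code> tag forces a line break.</p>
<!-- <var> marks a variable; <samp> marks program output -->
<p>Set <var>n</var> to the number of items. The program printed <samp>File saved.</samp></p>
<!-- <em> and <strong> mark emphasis and strong emphasis -->
<p><em>Never</em> nest comments; this is <strong>important</strong>.</p>
```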
5. List-Formatting Tags
HTML supports three types of lists: ordered <ol>, unordered <ul>, and definition <dl>. The tags
are always paired.
<ol>
</ol>
Ordered Lists
Are intended for sequential operations or algorithms, and the browser automatically generates
numbers for each item in the list. Individual items within are flagged with the <li> pair of
tags.
<ul>
</ul>
Unordered Lists
Are used for simple shopping lists of items where order is irrelevant and each item in the list is
displayed with a preceding bullet. Individual items within are flagged with the <li> pair of
tags.
<dl>
</dl>
Definition Lists
Provide a specialized dictionary-like or glossary-like formatting for terms and their associated
descriptions. Individual items within <dl> lists are delimited by <dt> and <dd> tag pairs to
mark the term and definition respectively.
Nested lists are fully supported and some browsers even make an attempt to use a
reasonable hierarchy of differently shaped or colored bullets for unordered lists. Hanging
indents, spacing, and other critical aspects of lists are not under the author's control. The
later introduction of tables improved the way one can display ordered information.
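A sketch of all three list types. The nested unordered list is placed inside an <li> element of the outer list, which is how HTML expresses nesting; the item text is illustrative.

```html
<!-- Ordered list: the browser numbers the steps automatically -->
<ol>
  <li>Edit the HTML source in a text editor.</li>
  <li>Load the file into a browser.</li>
  <li>Fix any problems and repeat.</li>
</ol>

<!-- Unordered list with a nested list inside the second item -->
<ul>
  <li>Milk</li>
  <li>Bread
    <ul>
      <li>Whole wheat</li>
      <li>Rye</li>
    </ul>
  </li>
</ul>

<!-- Definition list: <dt> is the term, <dd> its description -->
<dl>
  <dt>HTML</dt>
  <dd>Hypertext Markup Language</dd>
  <dt>DTD</dt>
  <dd>Document type definition</dd>
</dl>
```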
6. Special Characters
The original character set for HTML documents was ISO 8859-1, an 8-bit single-byte coded
graphical character set, also known as Latin Alphabet Number One, or Latin-1. This 256-character
set includes many graphic elements and accented characters needed for text written in the most
widely used European languages as well as English. The first 128 character codes are
essentially the traditional ASCII. Today, authors should use UTF-8. In HTML, to declare that the
character encoding is UTF-8, the author can include the following markup near the top of the
document (in the head element): <meta charset="utf-8">
When an author needs to use a character not easily produced with their keyboard, or to use a
reserved character (<, >, &, "), HTML defines special “escape sequences,” known as
“character entities.” Examples:
Character            Named entity   Numeric entity
<                    &lt;           &#60;
>                    &gt;           &#62;
&                    &amp;          &#38;
"                    &quot;         &#34;
©                    &copy;         &#169;
®                    &reg;          &#174;
Non-breaking space   &nbsp;         &#160;
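A short sketch of character entities in use; the company name is hypothetical. A browser displays the first paragraph as: In HTML, write <p> to start a paragraph.

```html
<!-- &lt; and &gt; show the tag as text instead of starting a paragraph -->
<p>In HTML, write &lt;p&gt; to start a paragraph.</p>
<!-- Named and numeric entities produce the same character -->
<p>Fish &amp; Chips &#38; Peas</p>
<p>&copy; Example Corp. and &#169; Example Corp. look identical.</p>
<!-- &quot; lets a double quote appear inside a double-quoted attribute -->
<p title="She said &quot;hi&quot;">Hover over this text.</p>
<!-- &nbsp; keeps "10" and "km" together on one line -->
<p>The trail is 10&nbsp;km long.</p>
```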
7. In-Line Graphics Tags
The most exciting aspect of the WWW is its “multimedia” capability, i.e., the ability to merge
pictures, icons, video clips, and sound seamlessly with the supporting text and present the
result in a visually rich, attractive, and integrated manner. This is the main reason the WWW
exploded out of nowhere during 1994 and virtually overnight eclipsed its text-only
predecessors.
The smooth integration of graphics is more apparent than real. From the document author's
viewpoint, proper handling of graphics is enormously time-consuming. Why?
• Image acquisition and ownership issues
• Aesthetic issues
• Hyperlink validation issues
• Technical issues of image formats and palette mapping
• Performance issues
At the simplest level, graphics elements are in-lined with text by use of the <img> tag. The
tag includes a URL that specifies the actual location of the graphics object in a separate file
and some optional display-tweaking information. In other words, the graphic is not actually
embedded into the HTML document but is incorporated by reference. The URL may be
absolute or relative, so the graphic may come from anywhere.
The full form of the <img> tag:
<img src="URL" alt="text" width="num" height="num">
The browser will retrieve any graphics object referred to by the src attribute in a separate
transaction, then merge it into the displayed text according to the optional width and height
attributes.
The alt attribute specifies text to be displayed in place of the graphic, for example by
character-mode (text-only) browsers. The alt attribute on images is a very important
accessibility attribute. Authoring useful alt attribute content requires the author to carefully
consider the context in which the image appears and the function that image may have in that context.
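A sketch of alt text chosen by context, with hypothetical image files. An informative image gets alt text that carries its information, an image inside a link gets alt text describing the destination, and a purely decorative image gets an empty alt so that screen readers skip it.

```html
<!-- Informative image: alt conveys what the picture shows -->
<img src="sales-chart.png" alt="Bar chart showing sales rising each month from January to March"
     width="400" height="300">

<!-- Image used as a link: alt describes where the link goes -->
<a href="index.html"><img src="home-icon.png" alt="Home page"></a>

<!-- Purely decorative image: empty alt, so assistive tools ignore it -->
<img src="divider.png" alt="">
```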
Examples of scenarios where users benefit from text alternatives for images:
• They have a very slow connection and are browsing with images disabled.
• They have a vision impairment and use text to speech software.
• They have a cognitive impairment and use text to speech software.
• They are using a text-only browser.
• They are listening to the page being read out by a voice Web browser.
• They have images disabled to save on download costs.
• They have problems loading images or the source of an image is wrong.
8. Anchor Tags
Used to encode hyperlinks, the colored or underlined portions of text or bitmaps with a special
border that users can click on to jump to another document/object or to another location in the
same document. Basic format:
<a href="URL">some text here</a>
The URL is the destination, and the user sees the underlined or colored “some text here”. The
URL can be:
• Absolute – containing the full host name and file name of the target document/object
• Relative – the host name and path are assumed to be the same as those of the document
containing the anchor tag
• Local – a file residing on the machine running the browser rather than on a server.
Between the <a> and </a> tags you can insert any amount of text, an <img> tag for an inline graphic, or a combination of the two.
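To link to a labeled location within a target document, use the following general format:
<a href="URL#label">user-visible text here</a>
The label must be encoded in the destination document with an anchor tag:
<a name="label">user-visible text here</a>
In HTML5 the name attribute on <a> is obsolete; instead, give the target element an id
attribute, for example <h2 id="label">user-visible text here</h2>.
For hyperlinks that merely cause the browser to reposition itself within the current document:
<a href="#label">user-visible text here</a>
No URL is required, but the # character is mandatory. This is a useful technique for creating
indexes. Unfortunately, the BACK button can have an unexpected effect: each in-page jump is
added to the browser's history, so BACK returns to the previous position in the same document
rather than to the previous page.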
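A sketch of the URL forms and label links described above. The host www.example.com, the file names, and the label lists are hypothetical; the target uses an id attribute, which HTML5 uses in place of the obsolete name attribute on <a>.

```html
<!-- Absolute URL: full host name and file name -->
<a href="https://www.example.com/docs/intro.html">Introduction</a>

<!-- Relative URL: resolved against the location of this document -->
<a href="chapter2.html">Chapter 2</a>

<!-- Local file on the machine running the browser -->
<a href="file:///home/user/notes.html">My local notes</a>

<!-- Label in another document: chapter2.html must contain id="lists" -->
<a href="chapter2.html#lists">Lists, in Chapter 2</a>

<!-- Label in this document: jumps to the h2 below -->
<a href="#lists">Jump to Lists</a>

<h2 id="lists">Lists</h2>
```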
9. Comments
Like most programming languages, HTML includes commenting – text that is retained in the
HTML document but not displayed to the viewer. Comments are used to add information to
your documents for the people who maintain them, not for the end user.
Comments are valid in both your document's body and head sections. Please note that
although comments are not displayed on the page, anyone can read them by viewing the
source code of your document. General format:
<!-- some comments
-->
Note: Never nest comments, because the second <!-- will be ignored but the first --> will be
honored, causing the rest of the outer comment to be exposed to the viewer. Also use caution
with comments surrounding HTML tags, because some browsers may go ahead and render them
even inside your comment tags.
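A sketch of the nesting problem described in the note above. The first comment is hidden as expected. In the second, the comment ends at the inner -->, so the sentence that follows, including the stray --> at its end, appears on the page as ordinary text.

```html
<!-- A correct comment: nothing here is displayed. -->

<!-- Outer comment (do not do this):
<!-- inner comment -->
This sentence is displayed as page text, followed by a stray marker. -->
```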
10. More About <meta> Tags Inside the <head> Section
Metadata is data (information) about data. The <meta> tag provides metadata about the
HTML document. Metadata is not displayed on the page, but it is machine-parsable.
Meta elements are typically used to specify page description, keywords, author of the
document, last modified, and other metadata. The metadata can be used by browsers (how
to display content or reload page), search engines (keywords), or other web services.
Note: <meta> tags always go inside the <head> element.
Note: Metadata is always passed as name/value pairs.
Note: The content attribute MUST be defined if the name or the http-equiv attribute is defined.
If neither of these is defined, the content attribute CANNOT be defined.
Example 1 - Define keywords for search engines:
<meta name="keywords" content="HTML, CSS, XML, XHTML, JavaScript">
Example 2 - Define a description of your web page:
<meta name="description" content="Free Web tutorials on HTML and CSS">
Example 3 - Define the author of a page:
<meta name="author" content="Hege Refsnes">
Example 4 - Refresh document every 30 seconds:
<meta http-equiv="refresh" content="30">