Hypertext Markup Language (HTML) is a throwback to another age; it runs counter to recent decades' trend toward graphical WYSIWYG word processing. If your first experience with word processing was with a GUI-based application like MS Word, HTML markup may seem primitive and broken. People accustomed to command-line computing (like the Unix world), however, recognize it as command-driven and state-sensitive. An HTML document is an ordinary text file, and the document's ultimate appearance is controlled by “magical sequences of characters scattered about the text that is the document's real 'message.'” Preparing a document in HTML is more like application development than desktop publishing. It involves an endless cycle of modifying the HTML source in a text editor, loading the file into a browser to see how it looks and prints, figuring out what the problems are, and going back to the text editor to make changes.

Some things to remember, since HTML does not have the same goals as a word processor:
• It is a method of encoding complex hypermedia documents that focus on document structure rather than physical appearance.
• Documents have to be portable across every imaginable CPU architecture, operating system, network transport, and method of mass storage.
• There is no way to predict the capabilities of the system on which the document will be viewed or printed, even down to a minimum screen resolution, number of colors, or installed fonts.

HTML Structure
HTML formatting commands, known as tags, are merely reserved sequences of characters that start with a < (less than) and end with a > (greater than). HTML tags are not case-sensitive. In most cases, tags are used in symmetric pairs, with the closing tag indicated by the inclusion of the / (forward slash) character:
<x>This is text.</x>
This scheme of symmetric tags comes from the industry-standard SGML (Standard Generalized Markup Language).
HTML up to version 4.01 is based on SGML. It is described by an SGML document type definition (DTD), but it doesn't begin to exploit the full capabilities provided by SGML. Five major versions or levels of HTML have coexisted on the WWW:

HTML 0.9 / 1.0 (1989 - 1994)
This version concerns itself mainly with control of headings, lists, and character formatting. It is very easy to learn and use. It has eight categories of commands:
1. Structural – identify a valid HTML document and indicate the beginning and end of logical sections within the document.
2. Text-Flow – indicate the ends of paragraphs, forced line breaks, headings, and indented or preformatted text.
3. Headings – format six different levels of headings and subheadings.
4. Character-Formatting – apply a “physical” or “logical” style to a stream of characters.
5. List – format several different kinds of lists.
6. Special Character Escape Sequences – display glyphs that are not in the ASCII character set, characters that cannot be entered with the author's keyboard, or characters that would otherwise be interpreted as HTML commands.
7. In-Line Graphics – identify an external file as a graphic resource to be retrieved and displayed by the browser within the document's text.
8. Anchor – create a hyperlink or serve as the target of a hyperlink.

HTML 2.0 (1995)
Added menu commands and interactive forms. Browser makers started to create their own features, which required additional tags that were not part of the actual HTML specification. The W3C (World Wide Web Consortium) was formed between HTML 1.0 and HTML 2.0.

HTML 3.0 / 3.2 (1995 - 1997)
Adds background bitmaps, a rich set of commands for formatting tables, and expanded options for form elements. The HTML 3.0 draft also proposed markup for complex mathematical equations. Because the W3C delayed agreeing on the successor to HTML 2.0, HTML 3.2 was created instead of HTML 3.0.
HTML 3.2 made provision for CSS (Cascading Style Sheets), but browser manufacturers did not support CSS well in their browsers. Instead, they added support for frames, even though the HTML 3.2 specification did not include that feature.

HTML 4.0 / 4.01 (1997 - 1999)
This version adopted many browser-specific element types and attributes. At the same time, it sought to phase out Netscape's visual markup features by marking them as deprecated in favor of style sheets. It added support for style sheets, scripting, and multimedia elements. HTML 4.01 focused on separating presentation (styling) information from the actual content through the use of style sheets. HTML 3.2 pages had been difficult to maintain because presentation information was included directly in each web page. With style sheets, it is possible to change the appearance of an entire website by changing just the style sheet(s). In earlier versions of HTML, making the same change for the entire website meant changing the styling information in every individual page, so a site with many pages required many edits before its appearance could be changed.

HTML 5.0 (2014)
Its core aims are to improve the language with support for the latest multimedia while keeping it easily readable by humans and consistently understood by computers and devices (web browsers, parsers, etc.). HTML5 is intended to subsume not only HTML 4, but also XHTML 1 and DOM Level 2 HTML. HTML5 responds to the fact that the HTML and XHTML in common use on the World Wide Web mix features from several sources. Some come from various specifications, some from software products such as web browsers, and some from common practice. It is also an attempt to define a single markup language.
It includes detailed processing models to encourage more interoperable implementations. It extends, improves, and rationalizes the markup available for documents, and it introduces markup and application programming interfaces (APIs) for complex web applications. For the same reasons, HTML5 is also a potential candidate for cross-platform mobile applications. Many features of HTML5 were built with low-powered devices such as smartphones and tablets in mind.

In particular, HTML5 adds many new syntactic features. These include the new <video>, <audio>, and <canvas> elements. They also include the integration of scalable vector graphics (SVG) content, replacing generic <object> tags, and MathML for mathematical formulas. These features make it easy to include and handle multimedia and graphical content on the web without resorting to proprietary plugins and APIs. Other new page-structure elements, such as <main>, <section>, <article>, <header>, <footer>, <aside>, <nav>, and <figure>, are designed to enrich the semantic content of documents. New attributes have been introduced, and some elements and attributes have been removed. Some elements, such as <a>, <cite>, and <menu>, have been changed, redefined, or standardized. The APIs and Document Object Model (DOM) are no longer afterthoughts but fundamental parts of the HTML5 specification. HTML5 also defines in some detail the required processing for invalid documents, so that all conforming browsers and other user agents treat syntax errors uniformly.

Basic HTML 5 Documents
A basic HTML document looks like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Sample page</title>
  </head>
  <body>
    <h1>Sample page</h1>
    <p>This is a <a href="demo.html">simple</a> sample.</p>
    <!-- this is a comment -->
  </body>
</html>

Elements of HTML
1. Structural Tags
The three most important HTML tags in this group are <html>, <head>, and <body>.
They provide fundamental identification and document-organization information for the browser that retrieves the document.

A DOCTYPE is a required preamble, kept for legacy reasons. When it is omitted, browsers tend to use a different rendering mode ("quirks mode") that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications. The DOCTYPE for HTML 5 is <!DOCTYPE html>.

<html> </html>
The html element represents the root of an HTML document. Its start and end tags are placed at the beginning and end of the entire document, respectively, and identify the document as HTML. More precisely, these tags bound the portion of the document that the browser is expected to understand and render on the screen for the viewer. Note: Browsers can cope with documents that omit this tag, but it is good practice to include it.

Authors are encouraged to specify a lang attribute on the root html element, giving the document's language. This helps speech synthesis tools determine what pronunciations to use, translation tools determine what rules to use, and so forth:
<!DOCTYPE html>
<html lang="en">

<head> </head>
The head element represents a collection of metadata for the document. Nothing in the head section appears in the client area of the browser's window. Only a few tags are allowed between these tags. The most important tag within the head section is <title> </title>.

<title> </title>
The browser displays the text between the <title> tags in the title bar (or tab) of its window. More importantly, this is the text that becomes the bookmark name in the browser's bookmark list. Authors should keep their titles specific and brief. There must be no more than one title element per document.
<meta>
The meta element represents various kinds of metadata that cannot be expressed using other tags. It is meant for browsers and HTTP servers that can take advantage of it. The most common meta element declares the document's character encoding, for example:
<meta charset="utf-8"/>

<body> </body>
The body element represents the content of the document, i.e., the portion of the document that will be displayed to the viewer within the client area of the browser's window. A conforming document has only one body element. The body consists of an arbitrary mixture of text paragraphs, horizontal rules, headings, lists, hyperlinks, and in-line graphics, along with embedded tags for character formatting.

2. Text-Flow Tags
The three most commonly used are <p> </p>, <br>, and <hr>. Note that these three tags represent the two ways that tags are formed: pairs and singletons.

<p> </p>
Used to indicate the beginning and end of a textual paragraph. Other tags permitting, the browser can reflow all the text between these two tags according to the window size, the metrics of the display fonts, etc. By convention, browsers use one line height's worth of white space to set each paragraph apart from the next.

<br>
Forces a line break, or hard return. It does not imply the end of a logical paragraph, and a line terminated with <br> is not followed by any extra white space. Its typical use is to ensure that the elements of a name and address are displayed on separate lines rather than reflowed and run together by the browser.

<hr>
Draws a horizontal rule (or line) across the client area of the window. It has a secondary effect of acting like a <br> tag. The flow of the paragraph is interrupted at the point of the <hr> tag, and the rule is drawn with a “reasonable” amount of white space above and below. Text display then resumes at the left margin below the rule. To increase white space, place <p> </p> tags before and after the <hr> tag.
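A short sketch of the name-and-address use of <br> described above, together with <p> and <hr>. The name, street, and city are invented placeholders, not values from any real page:

```html
<!-- Each <br> forces a hard return without starting a new paragraph -->
<p>
  Jane Doe<br>
  123 Main Street<br>
  Springfield, ST 00000
</p>
<!-- <hr> is a singleton: it has no closing tag -->
<hr>
<!-- Text resumes at the left margin, in a new paragraph below the rule -->
<p>Office hours are posted on the course page.</p>
```

Without the <br> tags, the browser would reflow the three address lines into one run of text separated only by spaces. It would wrap that text only where the window width requires.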
Different browsers render many variations of the horizontal rule. Even so, it remains preferable to hard-coded separators (in-line graphics) for separating text because:
1) the browser knows the window's width and the system's capabilities and thus can render the horizontal rule efficiently and appropriately,
2) it gives documents a consistent look for the viewer, and
3) it is compact and can be transferred faster than a series of underscores or a graphic file.

<pre> </pre>
Delimits preformatted text. Ordinary text is displayed in an attractive proportional font and reflowed at the browser's discretion. By contrast, text marked off by <pre> tags is displayed in a non-proportional (monospaced) font, and its arrangement of white space and line breaks is respected.

<blockquote> </blockquote>
Frames a quotation or an extract from another source. The quoted text is indented from both margins and preceded and followed by a browser-dependent amount of extra white space. It may also be displayed in a different font.

3. Heading Tags
These elements represent headings for their sections; each has a rank given by the number in its name. The h1 element has the highest rank, the h6 element has the lowest rank, and two elements with the same name have equal rank. They must be used in pairs.
<h1> </h1> <h2> </h2> <h3> </h3> <h4> </h4> <h5> </h5> <h6> </h6>
The actual font and font size are browser-dependent. Still, you can be assured that an h1 element will be larger and more conspicuous than an h2 element, and so on.

Heading tags have an important effect on text flow. When a heading tag is encountered, the current paragraph is terminated. The heading text is then displayed left-aligned in a visually distinct font, with extra white space before and after it, and text flow resumes at the left margin. The amount of white space is one line height of the heading's font. To add more white space, see <hr> above. HTML markup alone offers no way to reduce this white space (style sheets can).
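The following sketch combines heading ranks with <pre> and <blockquote>. The chapter titles, table values, and quoted text are placeholders chosen only for illustration:

```html
<h1>Chapter 1: Markup Basics</h1>
<h2>1.1 Preformatted Text</h2>
<!-- Inside <pre>, spacing and line breaks are displayed exactly as typed -->
<pre>
Name      Score
Alice        92
Bob          87
</pre>
<h2>1.2 Quotations</h2>
<!-- Most browsers indent a blockquote from both margins by default -->
<blockquote>
  <p>An extract from another source goes here.</p>
</blockquote>
```

The two h2 headings have equal rank. By default, they are displayed in the same size, smaller than the h1 above them.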
4. Character-Formatting Tags
These tags can be divided into two subgroups: physical character attribute tags and logical character attribute tags. The tags are always paired, and you insert them directly into the character stream. They do not affect indentation, spacing, or line breaks.

Physical attribute tags (equivalent to the formatting you would apply in a word processor):
<b> </b> – Boldface
<i> </i> – Italic
<u> </u> – Underline
<tt> </tt> – Teletype text (non-proportional font like Courier; obsolete in HTML5)

Logical attribute tags are more abstract and numerous, roughly analogous to the “character styles” in a word processor:
<address> </address> – Contact information for the author of the HTML document (italic)
<cite> </cite> – Citation (italic)
<code> </code> – Fragment of computer code (fixed-width font)
<dfn> </dfn> – Definition (italic)
<em> </em> – Emphasis (italic)
<kbd> </kbd> – Keyboard input (fixed-width font)
<samp> </samp> – Sample output of a command (fixed-width font)
<strong> </strong> – Strong emphasis (boldface)
<var> </var> – Program variable (italic)

There is no way to predict exactly what a particular browser will do with these logical tags; many simply map them to physical styles. The tags may become more significant in the future, however, because they tell the browser something about the nature of the text as well as its appearance. Web authors are encouraged to use them instead of physical tags whenever appropriate.

5. List-Formatting Tags
HTML supports three types of lists: ordered <ol>, unordered <ul>, and definition <dl>. The tags are always paired.

<ol> </ol> Ordered Lists
Intended for sequential operations or algorithms; the browser automatically generates numbers for each item in the list. Individual items within are flagged with the <li> pair of tags.

<ul> </ul> Unordered Lists
Used for simple shopping lists of items where order is irrelevant; each item in the list is displayed with a preceding bullet.
Individual items within are flagged with the <li> pair of tags.

<dl> </dl> Definition Lists
Provide specialized dictionary-like or glossary-like formatting for terms and their associated descriptions. Individual items within <dl> lists are delimited by <dt> and <dd> tag pairs, which mark the term and the definition respectively.

Nested lists are fully supported. Some browsers even attempt to use a reasonable hierarchy of differently shaped or colored bullets for nested unordered lists. Hanging indents, spacing, and other critical aspects of lists are not under the author's control. The later introduction of tables improved the way one can display ordered information.

6. Special Characters
The original character set for HTML documents is ISO 8859-1, an 8-bit single-byte coded graphic character set also known as Latin Alphabet Number One, or Latin-1. This 256-character set includes many graphic symbols and the accented characters needed for text written in the most widely used Western European languages as well as English. The first 128 character codes are identical to traditional ASCII. Today authors should use UTF-8. To declare that the character encoding is UTF-8, the author can include the following markup near the top of the document (in the head element):
<meta charset="utf-8">

Sometimes an author needs a character that is not easily produced with the keyboard, or a reserved character (<, >, &, "). For these cases HTML defines special escape sequences known as character entities. Examples:

Character          Entity     Numeric
<                  &lt;       &#60;
>                  &gt;       &#62;
&                  &amp;      &#38;
"                  &quot;     &#34;
©                  &copy;     &#169;
®                  &reg;      &#174;
(no-break space)   &nbsp;     &#160;

7. In-Line Graphics Tags
The most exciting aspect of the WWW is its “multimedia” capability. This is the ability to merge pictures, icons, video clips, and sound seamlessly with the supporting text, presenting the result in a visually rich, attractive, and integrated manner. This is the main reason the WWW exploded out of nowhere during 1994 and virtually overnight eclipsed its text-only predecessors.
The smooth integration of graphics is more apparent than real. From the document author's viewpoint, proper handling of graphics is enormously time-consuming. Why?
• image acquisition and ownership issues
• aesthetic issues
• hyperlink validation issues
• technical issues of image formats and palette mapping
• performance issues

At the simplest level, graphics elements are in-lined with text by use of the <img> tag. The tag includes a URL that specifies the actual location of the graphics object in a separate file, plus some optional display-tweaking information. In other words, the graphic is not actually embedded in the HTML document but is incorporated by reference. The URL may be absolute or relative, so the graphic may come from anywhere. The full form of the <img> tag:
<img src="URL" alt="text" width="num" height="num">
The browser retrieves any graphics object referred to by the src attribute in a separate transaction. It then merges the graphic into the displayed text according to the optional width and height attributes. The alt attribute specifies text to be displayed in place of the graphic, for example by character-mode-only browsers.

The alt attribute on images is a very important accessibility attribute. To write useful alt text, the author must carefully consider the context in which the image appears and the function the image has in that context. Examples of scenarios where users benefit from text alternatives for images:
• They have a very slow connection and are browsing with images disabled.
• They have a vision impairment and use text-to-speech software.
• They have a cognitive impairment and use text-to-speech software.
• They are using a text-only browser.
• They are listening to the page being read out by a voice web browser.
• They have images disabled to save on download costs.
• They have problems loading images, or the source of an image is wrong.

8. Anchor Tags
Anchor tags encode hyperlinks. These are the colored or underlined portions of text, or bitmaps with a special border, that users can click to jump to another document/object or to another location in the same document. Basic format:
<a href="URL">some text here</a>
The URL is the destination, and the user sees the underlined or colored “some text here”. The URL can be:
• Absolute – containing the full host name and file name of the target document/object
• Relative – the host name and path are assumed to be the same as those of the document containing the anchor tag
• Local – a file residing on the machine running the browser rather than on a server
Between the <a> and </a> tags you can insert any amount of text, an <img> tag for an in-line graphic, or a combination of the two.

To link to a labeled location within a target document, use the following general format:
<a href="URL#label">user-visible text here</a>
The label must be encoded in the destination document with an anchor tag:
<a name="label">user-visible text here</a>
For hyperlinks that merely cause the browser to reposition itself within the current document:
<a href="#label">user-visible text here</a>
No URL is required, but the # character is mandatory. This is a useful technique for creating indexes. Unfortunately, the BACK button can have an unexpected effect. Each jump within the page is added to the browser's history, so pressing BACK returns to the previous position in the same document rather than to the previous page.

9. Comments
Like most programming languages, HTML supports comments: text that is retained in the HTML document but not displayed to the viewer. Comments add information to your documents for the people who maintain them, not for the end user. Comments are valid in both the body and the head sections of a document. Note that although comments are not directly visible, browser tools that show the source code of your document will reveal them. General format:
<!-- some comments -->
Note: Never nest comments. The second <!-- is ignored but the first --> is honored, so the rest of the outer comment, including its closing -->, is exposed to the viewer.
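The image, anchor, and comment forms from sections 7 through 9 can be combined in a short sketch. The file names (notes.html, index.html, images/logo.png) and all label and alt text are hypothetical placeholders. The link target uses the id attribute, which HTML5 recommends in place of the name attribute on <a> (now marked obsolete). The last line deliberately shows the nesting mistake described above.

```html
<!-- An index line: the first link jumps within this document,
     the second jumps to a labeled spot in another (hypothetical) page -->
<p><a href="#contents">Contents</a> | <a href="notes.html#week2">Week 2 notes</a></p>

<!-- An in-line graphic used as a link; the alt text stands in if the image cannot be shown -->
<a href="index.html"><img src="images/logo.png" alt="Course home page" width="120" height="40"></a>

<!-- The target of the #contents link above -->
<h2 id="contents">Contents</h2>
<p>Section list goes here.</p>

<!-- WRONG: a nested comment ends at the first closing marker -->
<!-- outer <!-- inner --> this text and the final marker are shown -->
```

In the last line, the browser ends the comment at the first -->. It therefore displays "this text and the final marker are shown -->" as ordinary page text.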
Also use caution with comments surrounding HTML tags, because some browsers may render those tags even inside your comment.

10. More About <meta> Tags Inside the <head> Section
Metadata is data (information) about data. The <meta> tag provides metadata about the HTML document. Metadata is not displayed on the page, but it is machine-parsable. Meta elements are typically used to specify the page description, keywords, author of the document, last-modified date, and other metadata. The metadata can be used by browsers (how to display content or reload the page), search engines (keywords), or other web services.
Note: <meta> tags always go inside the <head> element.
Note: Metadata is always passed as name/value pairs.
Note: The content attribute MUST be defined if the name or the http-equiv attribute is defined. If neither is defined, the content attribute CANNOT be defined.
Example 1 - Define keywords for search engines:
<meta name="keywords" content="HTML, CSS, XML, XHTML, JavaScript">
Example 2 - Define a description of your web page:
<meta name="description" content="Free Web tutorials on HTML and CSS">
Example 3 - Define the author of a page:
<meta name="author" content="Hege Refsnes">
Example 4 - Refresh the document every 30 seconds:
<meta http-equiv="refresh" content="30">
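The following sketch shows where these elements sit in the head section, using a hypothetical page that has moved to a new address. The title, description, author, and the example.com URL are placeholder values. The refresh element uses the "seconds; url=..." form of the content value, which redirects rather than reloading the current page:

```html
<head>
  <!-- Declare the encoding before the other elements -->
  <meta charset="utf-8">
  <title>Page Moved</title>
  <!-- name/content pairs, read by search engines and other tools -->
  <meta name="description" content="This page has moved to a new address">
  <meta name="author" content="Course Staff">
  <!-- http-equiv/content pair: after 5 seconds, load the URL given after "url=" -->
  <meta http-equiv="refresh" content="5; url=https://www.example.com/new-page.html">
</head>
```

Omitting the url= part, as in Example 4, makes the browser reload the current page instead.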