Internet

The Internet is often described as an information superhighway that gives access to information over the web. It can also be defined in the following ways:

The Internet is a worldwide system of interconnected computer networks.
The Internet uses the standard Internet protocol suite (TCP/IP).
Every computer on the Internet is identified by a unique IP address.
An IP address is a unique set of numbers (such as 110.22.33.114) that identifies a computer's location.
A special computer, the DNS (Domain Name System) server, associates names with IP addresses so that a user can locate a computer by name.
For example, a DNS server resolves the host name www.tutorialspoint.com to a particular IP address to uniquely identify the computer on which this website is hosted.
The Internet is accessible to users all over the world.

Evolution

The concept of the Internet originated in 1969 and has undergone several technological and infrastructural changes, as discussed below:

The Internet grew out of the Advanced Research Projects Agency Network (ARPANET).
ARPANET was developed by the United States Department of Defense.
The basic purpose of ARPANET was to provide communication among the various bodies of government.
Initially, there were only four nodes, formally called hosts.
In 1972, ARPANET spread across the globe with 23 nodes located in different countries and thus became known as the Internet.
Over time, with the invention of new technologies such as the TCP/IP protocols, DNS, WWW, browsers, and scripting languages, the Internet became a medium to publish and access information over the web.

Advantages

The Internet covers almost every aspect of life one can think of. Here, we will discuss some of the advantages of the Internet:

The Internet allows us to communicate with people at remote locations. There are various apps available on the web that use the Internet as a medium for communication.
One can find various social networking sites such as:
o Facebook
o Twitter
o Yahoo
o Google+
o Flickr
o Orkut

One can surf for any kind of information over the Internet. Information on topics such as technology, health and science, social studies, geography, information technology, and products can be found with the help of a search engine.

Apart from communication and information, the Internet also serves as a medium for entertainment. The following are the various modes of entertainment over the Internet:
o Online Television
o Online Games
o Songs
o Videos
o Social Networking Apps

The Internet allows us to use many services, such as:
o Internet Banking
o Matrimonial Services
o Online Shopping
o Online Ticket Booking
o Online Bill Payment
o Data Sharing
o E-mail

The Internet enables electronic commerce, which allows business deals to be conducted on electronic systems.

Disadvantages

Although the Internet has proved to be a powerful source of information in almost every field, it also has many disadvantages, discussed below:

There is always a chance of losing personal information such as name, address, and credit card number. Therefore, one should be very careful while sharing such information, and should use credit cards only on authenticated sites.

Another disadvantage is spamming. Spamming refers to unwanted e-mails sent in bulk. These e-mails serve no purpose and can obstruct the entire system.

Viruses can easily spread to computers connected to the Internet. Such virus attacks may cause your system to crash or your important data to be deleted.

Another major threat on the Internet is pornography. There are many pornographic sites, and letting children use the Internet unsupervised can expose them to this content and affect their healthy mental development.

There are various websites that do not provide authentic information. This leads to misconceptions among many people.
Packet Switching

The shortcomings of message switching gave birth to the idea of packet switching. The entire message is broken down into smaller chunks called packets. The switching information is added to the header of each packet, and each packet is transmitted independently.

It is easier for intermediate networking devices to store small packets, and small packets do not take up many resources either on the carrier path or in the internal memory of switches.

Packet switching enhances line efficiency, as packets from multiple applications can be multiplexed over the carrier. The Internet uses the packet switching technique. Packet switching also makes it possible to differentiate data streams based on priority: packets are stored and forwarded according to their priority to provide quality of service.

Internet Domain Name System

Overview

Before DNS existed, one had to download a host file containing host names and their corresponding IP addresses. But with the increase in the number of hosts on the Internet, the size of the host file also increased. This resulted in increased traffic from downloading this file. To solve this problem, the DNS system was introduced.

The Domain Name System helps to resolve a host name to an address. It uses a hierarchical naming scheme and a distributed database of IP addresses and associated names.

IP Address

An IP address is a unique logical address assigned to a machine on the network. An IP address exhibits the following properties:

An IP address is the unique address assigned to each host present on the Internet.
An IP address (IPv4) is 32 bits (4 bytes) long.
An IP address consists of two components: a network component and a host component.
Each of the 4 bytes is represented by a number from 0 to 255, separated by dots. For example, 137.170.4.124.

An IP address is a 32-bit number, whereas domain names are easy-to-remember names. For example, when we enter an email address we always enter a symbolic string such as [email protected].
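The dotted notation described above can be worked through in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the 16-bit network/host split is an assumption chosen for demonstration, since the text does not say where the boundary lies for 137.170.4.124.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Pack four dot-separated bytes (each 0 to 255) into one 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address needs exactly four parts")
    value = 0
    for part in parts:
        byte = int(part)
        if not 0 <= byte <= 255:
            raise ValueError(f"byte out of range: {part}")
        value = (value << 8) | byte  # shift earlier bytes left, append this one
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


address = "137.170.4.124"
as_int = dotted_to_int(address)

# Cross-check against the standard library's own conversion.
assert as_int == int(ipaddress.IPv4Address(address))

# Assumed split for illustration only: first 16 bits network, last 16 bits host.
network_part = as_int & 0xFFFF0000
host_part = as_int & 0x0000FFFF

print(as_int.bit_length() <= 32)    # the value fits in 32 bits
print(int_to_dotted(network_part))  # network component under the assumed split
print(host_part)                    # host component as a plain number
```

The manual shift-and-mask version is shown only to make the "4 bytes, 32 bits" statement concrete; in practice the `ipaddress` module does the same conversion.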
Uniform Resource Locator (URL)

A Uniform Resource Locator (URL) is a web address which uniquely identifies a document on the Internet. This document can be a web page, image, audio, video, or anything else present on the web.

For example, www.tutorialspoint.com/internet_technology/index.html is a URL to the file index.html, which is stored on the tutorialspoint web server under the internet_technology directory.

URL Types

There are two forms of URL, as listed below:
1. Absolute URL
2. Relative URL

ABSOLUTE URL

An absolute URL is the complete address of a resource on the web. This complete address comprises the protocol used, the server name, the path name, and the file name.

For example, http://www.tutorialspoint.com/internet_technology/index.htm, where:
http is the protocol.
tutorialspoint.com is the server name.
/internet_technology is the path name.
index.htm is the file name.

The protocol part tells the web browser how to handle the file. Other protocols that can be used to create a URL include:
o FTP
o https
o Gopher
o mailto
o news

RELATIVE URL

A relative URL is a partial address of a web page. Unlike an absolute URL, the protocol and server parts are omitted from a relative URL.

Relative URLs are used for internal links, i.e. to create links to files that are part of the same website as the web page on which you are placing the link.

For example, to link an image on tutorialspoint.com/internet_technology/internet_referemce_models, we can use a relative URL of the form /internet_technologies/internet-osi_model.jpg.

Difference between Absolute and Relative URL

Absolute URL | Relative URL
Used to link web pages on different websites. | Used to link web pages within the same website.
Difficult to manage. | Easy to manage.
Changes when the server name or directory name changes. | Remains the same even if we change the server name or directory name.
Takes time to access. | Comparatively faster to access.
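The parts of an absolute URL, and the way a relative URL is completed against the page that contains it, can be sketched with Python's standard `urllib.parse` module. The helper names `describe` and `is_relative` are invented for this example, and the URLs are the illustrative ones used in the text above rather than verified live addresses.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def describe(url: str) -> dict:
    """Split an absolute URL into protocol, server, path name and file name."""
    parts = urlsplit(url)
    directory, file_name = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "server": parts.netloc,
        "directory": directory,
        "file": file_name,
    }


def is_relative(url: str) -> bool:
    """A relative URL omits both the protocol and the server part."""
    parts = urlsplit(url)
    return not parts.scheme and not parts.netloc


page = "http://www.tutorialspoint.com/internet_technology/index.htm"
image = "/internet_technologies/internet-osi_model.jpg"

print(describe(page))
print(is_relative(page), is_relative(image))

# A browser resolves the relative URL against the page that contains the link.
print(urljoin(page, image))
```

Because the relative URL carries no protocol or server name, moving the site to a different server changes only the base page address; the relative link itself stays the same, which is the point made in the comparison table.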
Domain Name System Architecture

The Domain Name System comprises domain names, the domain name space, and name servers, which are described below:

Domain Names

A domain name is a symbolic string associated with an IP address. There are several domain names available; some of them are generic, such as com, edu, gov, and net, while others are country-level domain names, such as au, in, za, and us.

The following table shows the generic top-level domain names:

Domain Name | Meaning
com | Commercial business
edu | Education
gov | U.S. government agency
int | International entity
mil | U.S. military
net | Networking organization
org | Non-profit organization

The following table shows the country top-level domain names:

Domain Name | Meaning
au | Australia
in | India
cl | Chile
fr | France
us | United States
za | South Africa
uk | United Kingdom
jp | Japan
es | Spain
de | Germany
ca | Canada
ee | Estonia
hk | Hong Kong

Domain Name Space

The domain name space refers to the hierarchy in the Internet naming structure. This hierarchy has multiple levels (from 0 to 127), with a root at the top. The following diagram shows the domain name space hierarchy:

In the above diagram each subtree represents a domain. Each domain can be partitioned into sub domains, and these can be further partitioned, and so on.

Name Server

A name server contains the DNS database. This database comprises various names and their corresponding IP addresses. Since it is not possible for a single server to maintain the entire DNS database, the information is distributed among many DNS servers.

The hierarchy of servers is the same as the hierarchy of names.
The entire name space is divided into zones.

Zones

A zone is a collection of nodes (sub domains) under the main domain. The server maintains a database called a zone file for every zone.

If the domain is not further divided into sub domains, then domain and zone refer to the same thing.
The information about the nodes in a sub domain is stored in the servers at the lower levels; however, the original server keeps a reference to these lower-level servers.

TYPES OF NAME SERVERS

The following are the three categories of name servers that manage the entire Domain Name System:
1. Root Server
2. Primary Server
3. Secondary Server

ROOT SERVER

The root server is the top-level server, whose zone consists of the entire DNS tree. It does not contain the information about domains but delegates the authority to other servers.

PRIMARY SERVER

A primary server stores a file about its zone. It has the authority to create, maintain, and update the zone file.

SECONDARY SERVER

A secondary server transfers the complete information about a zone from another server, which may be a primary or a secondary server. The secondary server does not have the authority to create or update a zone file.

DNS Working

DNS translates a domain name into an IP address automatically. The following steps describe the domain resolution process:

1. When we type www.tutorialspoint.com into the browser, it asks the local DNS server for its IP address. Here the local DNS server is at the ISP end.
2. When the local DNS server does not find the IP address of the requested domain name, it forwards the request to the root DNS server and asks it for the IP address.
3. The root DNS server replies with a delegation: it does not know the IP address of www.tutorialspoint.com, but it knows the IP address of the com DNS server.
4. The local DNS server then asks the com DNS server the same question.
5. The com DNS server replies in the same way: it does not know the IP address of www.tutorialspoint.com, but it knows the address of the tutorialspoint.com DNS server.
6. The local DNS server then asks the tutorialspoint.com DNS server the same question.
7. The tutorialspoint.com DNS server replies with the IP address of www.tutorialspoint.com.
8. Finally, the local DNS server sends the IP address of www.tutorialspoint.com to the computer that sent the request.
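The resolution steps above can be imitated with a small in-memory model. This is a toy simulation, not a real DNS client: the server labels, the `SERVERS` table, the `resolve` function, and the address 192.0.2.10 (taken from a range reserved for documentation) are all assumptions made for illustration.

```python
# Each toy server maps a name suffix either to a referral (the next server
# to ask) or to a final answer (an IP address).
SERVERS = {
    "root": {"com": ("referral", "com-server")},
    "com-server": {"tutorialspoint.com": ("referral", "tutorialspoint-server")},
    "tutorialspoint-server": {"www.tutorialspoint.com": ("answer", "192.0.2.10")},
}


def resolve(name, cache):
    """Play the role of the local DNS server: follow referrals from the root."""
    if name in cache:
        return cache[name], ["cache"]

    server = "root"
    asked = []
    while True:
        asked.append(server)
        record = None
        for suffix, entry in SERVERS[server].items():
            if name == suffix or name.endswith("." + suffix):
                record = entry
                break
        if record is None:
            raise LookupError(f"{server} knows nothing about {name}")

        kind, value = record
        if kind == "answer":
            cache[name] = value  # remember the result for later requests
            return value, asked
        server = value  # delegation: ask the server we were referred to


local_cache = {}
print(resolve("www.tutorialspoint.com", local_cache))
print(resolve("www.tutorialspoint.com", local_cache))
```

The first call walks root, com, and tutorialspoint.com servers in turn; the second is answered from the local cache, which corresponds to step 1 succeeding without any further queries. For a real lookup, Python's `socket.gethostbyname` hands the question to the operating system's resolver instead.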
Internet Services

Internet services allow us to access huge amounts of information such as text, graphics, sound, and software over the Internet. The following diagram shows the four different categories of Internet services.

Communication Services

There are various communication services available that offer exchange of information with individuals or groups. The following table gives a brief introduction to these services:

S.N. | Service | Description
1 | Electronic Mail | Used to send electronic messages over the Internet.
2 | Telnet | Used to log on to a remote computer that is attached to the Internet.
3 | Newsgroup | Offers a forum for people to discuss topics of common interest.
4 | Internet Relay Chat (IRC) | Allows people from all over the world to communicate in real time.
5 | Mailing Lists | Used to organize groups of Internet users to share common information through e-mail.
6 | Internet Telephony (VoIP) | Allows Internet users to talk across the Internet to any PC equipped to receive the call.
7 | Instant Messaging | Offers real-time chat between individuals and groups of people, e.g. Yahoo Messenger, MSN Messenger.

Information Retrieval Services

There exist several information retrieval services offering easy access to information present on the Internet. The following table gives a brief introduction to these services:

S.N. | Service | Description
1 | File Transfer Protocol (FTP) | Enables users to transfer files.
2 | Archie | An updated database of public FTP sites and their content. It helps to search for a file by its name.
3 | Gopher | Used to search, retrieve, and display documents on remote sites.
4 | Very Easy Rodent Oriented Netwide Index to Computerized Archives (VERONICA) | A Gopher-based resource. It allows access to the information resources stored on Gopher servers.

Web Services

Web services allow exchange of information between applications on the web. Using web services, applications can easily interact with each other.
The web services are offered using the concept of Utility Computing.

World Wide Web (WWW)

WWW is also known as W3. It offers a way to access documents spread over several servers on the Internet. These documents may contain text, graphics, audio, video, and hyperlinks. The hyperlinks allow users to navigate between the documents.

Video Conferencing

Video conferencing, or video teleconferencing, is a method of communicating by two-way video and audio transmission with the help of telecommunication technologies.

Modes of Video Conferencing

POINT-TO-POINT

This mode of conferencing connects two locations only.

MULTI-POINT

This mode of conferencing connects more than two locations through a Multipoint Control Unit (MCU).

Internet Protocols

Transmission Control Protocol (TCP)

TCP is a connection-oriented protocol and offers end-to-end packet delivery. It acts as the backbone for a connection. It exhibits the following key features:

Transmission Control Protocol (TCP) corresponds to the Transport Layer of the OSI model.
TCP is a reliable and connection-oriented protocol.
TCP offers:
o Stream Data Transfer
o Reliability
o Efficient Flow Control
o Full-duplex operation
o Multiplexing
TCP offers connection-oriented end-to-end packet delivery.
TCP ensures reliability by sequencing bytes with a forwarding acknowledgement number that indicates to the destination the next byte the source expects to receive.
It retransmits the bytes not acknowledged within a specified time period.

TCP Services

TCP offers the following services to the processes at the application layer:
o Stream Delivery Service
o Sending and Receiving Buffers
o Bytes and Segments
o Full Duplex Service
o Connection Oriented Service
o Reliable Service

STREAM DELIVERY SERVICE

TCP is stream oriented because it allows the sending process to send data as a stream of bytes and the receiving process to obtain data as a stream of bytes.
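The stream-oriented behaviour described above can be observed on a single machine with Python's standard `socket` module. The sketch below is only a demonstration under assumptions: the function name `stream_demo` is invented, the connection runs over the loopback address 127.0.0.1 on a port chosen by the operating system, and the data is small enough to fit in the socket buffers.

```python
import socket


def stream_demo(chunks):
    """Send several chunks over a loopback TCP connection and return all bytes read."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]  # port picked by the operating system
        with socket.create_connection(("127.0.0.1", port)) as sender:
            receiver, _ = listener.accept()
            with receiver:
                for chunk in chunks:
                    sender.sendall(chunk)  # three separate writes...
                sender.shutdown(socket.SHUT_WR)  # tell the receiver the stream has ended

                received = bytearray()
                while True:
                    data = receiver.recv(4096)
                    if not data:  # empty read means end of stream
                        break
                    received.extend(data)
    return bytes(received)


result = stream_demo([b"stream ", b"of ", b"bytes"])
print(result)  # ...arrive as one continuous run of bytes
```

The receiver has no way to tell where one write ended and the next began; it simply reads bytes in order until the stream closes. Applications that need message boundaries have to add them on top of TCP, for example with a length prefix or a delimiter.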
SENDING AND RECEIVING BUFFERS

The sending and receiving processes may not be able to produce and consume data at the same speed; therefore, TCP needs buffers for storage at the sending and receiving ends.

BYTES AND SEGMENTS

At the transport layer, the Transmission Control Protocol (TCP) groups the bytes into a packet. This packet is called a segment. Before transmission, these segments are encapsulated in IP datagrams.

FULL DUPLEX SERVICE

Transmitting data in duplex mode means that data flows in both directions at the same time.

CONNECTION ORIENTED SERVICE

TCP offers connection-oriented service in the following manner:
1. The TCP of process 1 informs the TCP of process 2 and gets its approval.
2. The TCP of process 1 and the TCP of process 2 exchange data in both directions.
3. After completing the data exchange, when the buffers on both sides are empty, the two TCPs destroy their buffers.

RELIABLE SERVICE

For the sake of reliability, TCP uses an acknowledgement mechanism.

Internet Protocol (IP)

Internet Protocol is a connectionless and unreliable protocol. It gives no guarantee of successful transmission of data. In order to make it reliable, it must be paired with a reliable protocol such as TCP at the transport layer.

Internet Protocol transmits data in the form of a datagram, as shown in the following diagram:

Points to remember:
The length of a datagram is variable.
The datagram is divided into two parts: header and data.
The length of the header is 20 to 60 bytes.
The header contains information for routing and delivery of the packet.

User Datagram Protocol (UDP)

Like IP, UDP is a connectionless and unreliable protocol. It doesn't require making a connection with the host to exchange data. Since UDP is an unreliable protocol, there is no mechanism for ensuring that the data sent is received.

UDP transmits data in the form of a datagram.
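The connectionless nature of UDP can be seen in a short loopback experiment with Python's standard `socket` module. This is a sketch under assumptions: the function name `udp_demo` is invented, both sockets live on the same machine at 127.0.0.1, and the two-second timeout is an arbitrary choice so the receiver does not wait forever if the datagram is lost.

```python
import socket


def udp_demo(payload: bytes) -> bytes:
    """Send one datagram to a loopback receiver without setting up a connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # let the operating system pick a port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee

        # No handshake: the sender simply addresses the datagram and sends it.
        sender.sendto(payload, receiver.getsockname())

        data, _source = receiver.recvfrom(2048)  # one call returns one whole datagram
        return data


print(udp_demo(b"hello over UDP"))
```

Unlike a TCP sender, the UDP sender never learns whether the datagram arrived. On the loopback interface loss is unlikely, but across a real network the receive call could time out, and any retry logic would have to be written by the application.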
The UDP datagram consists of five parts, as shown in the following diagram:

Points to remember:
UDP is used by applications that typically transmit a small amount of data at one time.
UDP provides protocol ports, i.e. a UDP message contains both the source and destination port numbers, which makes it possible for the UDP software at the destination to deliver the message to the correct application program.

File Transfer Protocol (FTP)

FTP is used to copy files from one host to another. FTP does this in the following manner:

FTP creates two processes, a control process and a data transfer process, at both ends, i.e. at the client as well as at the server.
FTP establishes two different connections: one for data transfer and the other for control information.
The control connection is made between the control processes, while the data connection is made between the data transfer processes.
FTP uses port 21 for the control connection and port 20 for the data connection.

Trivial File Transfer Protocol (TFTP)

Trivial File Transfer Protocol is also used to transfer files, but it transfers them without authentication. Unlike FTP, TFTP does not separate control and data information. Since there is no authentication, TFTP lacks security features, and its use is therefore not recommended.

Key points
TFTP makes use of UDP for data transport. Each TFTP message is carried in a separate UDP datagram.
The first two bytes of a TFTP message specify the type of message.
The TFTP session is initiated when a TFTP client sends a request to upload or download a file.
The request is sent from an ephemeral UDP port to UDP port 69 of a TFTP server.

Difference between FTP and TFTP

S.N. | Parameter | FTP | TFTP
1 | Operation | Transferring files | Transferring files
2 | Authentication | Yes | No
3 | Protocol | TCP | UDP
4 | Ports | 21 for control, 20 for data | 69 for the initial request; ephemeral ports (e.g. 3214, 4012) for the transfer
5 | Control and Data | Separated | Not separated
6 | Data Transfer | Reliable | Unreliable

Telnet

Telnet is a protocol used to log in to a remote computer on the Internet. There are a number of Telnet clients with user-friendly interfaces. The following diagram shows a person logged in to computer A who, from there, logs in remotely to computer B.

Hyper Text Transfer Protocol (HTTP)

HTTP is a communication protocol. It defines the mechanism for communication between a browser and a web server. It is also called a request and response protocol because the communication between browser and server takes place in request and response pairs.

HTTP Request

An HTTP request comprises lines which contain:
o Request line
o Header fields
o Message body

Key Points
The first line, i.e. the request line, specifies the request method, for example GET or POST.
The second line specifies a header which indicates the domain name of the server from which index.htm is retrieved.

HTTP Response

Like an HTTP request, an HTTP response also has a certain structure. An HTTP response contains:
o Status line
o Headers
o Message body

E-mail Protocols

E-mail protocols are sets of rules that help the client to properly transmit information to or from the mail server. In this tutorial, we will discuss various protocols such as SMTP, POP, and IMAP.

SMTP

SMTP stands for Simple Mail Transfer Protocol. It was first proposed in 1982. It is a standard protocol used for sending e-mail efficiently and reliably over the Internet.

Key Points:
SMTP is an application-level protocol.
SMTP is a connection-oriented protocol.
SMTP is a text-based protocol.
It handles the exchange of messages between e-mail servers over a TCP/IP network.
Apart from transferring e-mail, SMTP also provides notification regarding incoming mail.
When you send e-mail, your e-mail client sends it to your e-mail server, which in turn contacts the recipient's mail server using an SMTP client.
The servers exchange SMTP commands that specify the sender's and receiver's e-mail addresses, along with the message to be sent.
The exchange of commands between servers is carried out without the intervention of any user.
If a message cannot be delivered, an error report is sent to the sender, which makes SMTP a reliable protocol.

SMTP Commands

The following table describes some of the SMTP commands:

S.N. | Command | Description
1 | HELO | This command initiates the SMTP conversation.
2 | EHLO | This is an alternative command to initiate the conversation. It indicates that the sending server wants to use the extended SMTP (ESMTP) protocol.
3 | MAIL FROM | This indicates the sender's address.
4 | RCPT TO | It identifies the recipient of the mail. In order to deliver the same message to multiple users, this command can be repeated multiple times.
5 | SIZE | This command lets the server know the size of the attached message in bytes.
6 | DATA | The DATA command signifies that a stream of data will follow. Here the stream of data refers to the body of the message.
7 | QUIT | This command is used to terminate the SMTP connection.
8 | VRFY | This command is used by the receiving server to verify whether the given username is valid or not.
9 | EXPN | It is the same as VRFY, except that it lists all the user names when used with a distribution list.

IMAP

IMAP stands for Internet Message Access Protocol. It was first proposed in 1986. There exist five versions of IMAP, as follows:
1. Original IMAP
2. IMAP2
3. IMAP3
4. IMAP2bis
5. IMAP4

Key Points:
IMAP allows the client program to manipulate e-mail messages on the server without downloading them to the local computer.
The e-mail is held and maintained by the remote server.
It enables us to take actions such as downloading or deleting mail without reading it.
It enables us to create, manipulate, and delete remote message folders called mailboxes.
IMAP enables users to search their e-mails.
It allows concurrent access to multiple mailboxes on multiple mail servers.

IMAP Commands

The following table describes some of the IMAP commands:

S.N. | Command | Description
1 | LOGIN | This command identifies the user to the server and opens the session.
2 | CAPABILITY | This command requests a listing of the capabilities that the server supports.
3 | NOOP | This command is used as a periodic poll for new messages or message status updates during a period of inactivity.
4 | SELECT | This command selects a mailbox so that its messages can be accessed.
5 | EXAMINE | It is the same as the SELECT command, except that no change to the mailbox is permitted.
6 | CREATE | It is used to create a mailbox with a specified name.
7 | DELETE | It is used to permanently delete a mailbox with a given name.
8 | RENAME | It is used to change the name of a mailbox.
9 | LOGOUT | This command informs the server that the client is done with the session. The server must send a BYE untagged response before the OK response and then close the network connection.

POP

POP stands for Post Office Protocol. It is generally used to support a single client. There are several versions of POP, but POP3 is the current standard.

Key Points
POP is an application-layer Internet standard protocol.
Since POP supports offline access to messages, it requires less Internet usage time.
POP does not offer a search facility.
In order to access the messages, it is necessary to download them.
It allows only one mailbox to be created on the server.
It is not suitable for accessing non-mail data.
POP commands are generally abbreviated into codes of three or four letters, e.g. STAT.

POP Commands

The following table describes some of the POP commands:

S.N. | Command | Description
1 | USER / PASS | These commands log the user in and open the session.
2 | STAT | It is used to display the number of messages currently in the mailbox.
3 | LIST | It is used to get a summary of the messages, where each message summary is shown.
4 | RETR | It is used to retrieve (download) a message from the mailbox.
5 | DELE | It is used to delete a message.
6 | RSET | It is used to reset the session to its initial state.
7 | QUIT | It is used to log off the session.

Comparison between POP and IMAP

S.N. | POP | IMAP
1 | Generally used to support a single client. | Designed to handle multiple clients.
2 | Messages are accessed offline. | Messages are accessed online, although it also supports offline mode.
3 | POP does not offer a search facility. | It offers the ability to search e-mails.
4 | All the messages have to be downloaded. | It allows selective transfer of messages to the client.
5 | Only one mailbox can be created on the server. | Multiple mailboxes can be created on the server.
6 | Not suitable for accessing non-mail data. | Suitable for accessing non-mail data, i.e. attachments.
7 | POP commands are generally abbreviated into codes of three or four letters, e.g. STAT. | IMAP commands are not abbreviated; they are written in full, e.g. STATUS.
8 | It requires minimum use of server resources. | Clients are totally dependent on the server.
9 | Mails once downloaded cannot be accessed from some other location. | Allows mails to be accessed from multiple locations.
10 | The e-mails are not downloaded automatically. | Users can view the headings and sender of e-mails and then decide to download.
11 | POP requires less Internet usage time. | IMAP requires more Internet usage time.

Markup language

From Wikipedia, the free encyclopedia

[Figure: Example of RecipeBook, a simple markup language based on XML for creating recipes. The markup can be converted to HTML, PDF and Rich Text Format using a programming language or XSL.]
A markup language is a system for annotating a document in a way that is syntactically distinguishable from the text.[1] The idea and terminology evolved from the "marking up" of paper manuscripts, i.e., the revision instructions by editors, traditionally written with a blue pencil on authors' manuscripts.[citation needed] In digital media this "blue pencil instruction text" was replaced by tags, that is, instructions are expressed directly by tags or "instruction text encapsulated by tags." Examples include typesetting instructions such as those found in troff, TeX and LaTeX, or structural markers such as XML tags. Markup instructs the software that displays the text to carry out appropriate actions, but is omitted from the version of the text that users see.

Some markup languages, such as the widely used HTML, have pre-defined presentation semantics, meaning that their specification prescribes how to present the structured data. Others, such as XML, do not have them and are general purpose.

HyperText Markup Language (HTML), one of the document formats of the World Wide Web, is an instance of SGML (though, strictly, it does not comply with all the rules of SGML), and follows many of the markup conventions used in the publishing industry in the communication of printed work between authors, editors, and printers.[citation needed]

Types

There are three main general categories of electronic markup:[2][3]

Presentational markup
The kind of markup used by traditional word-processing systems: binary codes embedded within document text that produce the WYSIWYG effect. Such markup is usually hidden from human users, even authors or editors.

Procedural markup
Markup is embedded in text and provides instructions for programs that are to process the text. Well-known examples include troff, TeX, and PostScript. It is expected that the processor will run through the text from beginning to end, following the instructions as encountered.
Text with such markup is often edited with the markup visible and directly manipulated by the author. Popular procedural-markup systems usually include programming constructs, so macros or subroutines can be defined and invoked by name.

Descriptive markup
Markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed. Well-known examples include LaTeX, HTML, and XML. The objective is to decouple the inherent structure of the document from any particular treatment or rendition of it. Such markup is often described as "semantic". An example of descriptive markup would be HTML's <cite> tag, which is used to label a citation. Descriptive markup (sometimes called logical markup or conceptual markup) encourages authors to write in a way that describes the material conceptually, rather than visually.[4]

There is considerable blurring of the lines between the types of markup. In modern word-processing systems, presentational markup is often saved in descriptive-markup-oriented systems such as XML, and then processed procedurally by implementations. The programming constructs in procedural-markup systems such as TeX may be used to create higher-level markup systems that are more descriptive, such as LaTeX.

In recent years, a number of small and largely unstandardized markup languages have been developed to allow authors to create formatted text via web browsers, for use in wikis and web forums. These are sometimes called lightweight markup languages. Markdown or the markup language used by Wikipedia are examples of such wiki markup.

History

Etymology and origin

The term markup is derived from the traditional publishing practice of "marking up" a manuscript, which involves adding handwritten annotations in the form of conventional symbolic printer's instructions in the margins and text of a paper manuscript or printed proof.
For centuries, this task was done primarily by skilled typographers known as "markup men"[5] or "copy markers"[6] who marked up text to indicate what typeface, style, and size should be applied to each part, and then passed the manuscript to others for typesetting by hand. Markup was also commonly applied by editors, proofreaders, publishers, and graphic designers, and indeed by document authors.

GenCode

The first well-known public presentation of markup languages in computer text processing was made by William W. Tunnicliffe at a conference in 1967, although he preferred to call it generic coding. It can be seen as a response to the emergence of programs such as RUNOFF that each used their own control notations, often specific to the target typesetting device. In the 1970s, Tunnicliffe led the development of a standard called GenCode for the publishing industry and later was the first chair of the International Organization for Standardization committee that created SGML, the first standard descriptive markup language. Book designer Stanley Rice published speculation along similar lines in 1970.[7] Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed the theory and a working implementation of descriptive markup in actual use.

However, IBM researcher Charles Goldfarb is more commonly seen today as the "father" of markup languages. Goldfarb hit upon the basic idea while working on a primitive document management system intended for law firms in 1969, and helped invent IBM GML later that same year. GML was first publicly disclosed in 1973.

In 1975, Goldfarb moved from Cambridge, Massachusetts to Silicon Valley and became a product planner at the IBM Almaden Research Center. There, he convinced IBM's executives to deploy GML commercially in 1978 as part of IBM's Document Composition Facility product, and it was widely used in business within a few years.
SGML, which was based on both GML and GenCode, was an ISO project on which Goldfarb began working in 1974.[8] Goldfarb eventually became chair of the SGML committee. SGML was first released by ISO as the ISO 8879 standard in October 1986.

troff and nroff
Some early examples of computer markup languages available outside the publishing industry can be found in typesetting tools on Unix systems such as troff and nroff. In these systems, formatting commands were inserted into the document text so that typesetting software could format the text according to the editor's specifications. Getting a document printed correctly was an iterative process of trial and error.[9] The availability of WYSIWYG ("what you see is what you get") publishing software supplanted much use of these languages among casual users, though serious publishing work still uses markup to specify the non-visual structure of texts, and WYSIWYG editors now usually save documents in a markup-language-based format.

TeX
Another major publishing standard is TeX, created and refined by Donald Knuth in the 1970s and '80s. TeX concentrated on detailed layout of text and font descriptions to typeset mathematical books. This required Knuth to spend considerable time investigating the art of typesetting. TeX is mainly used in academia, where it is a de facto standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used.

Scribe, GML and SGML
The first language to make a clean distinction between structure and presentation was Scribe, developed by Brian Reid and described in his doctoral thesis in 1980.[10] Scribe was revolutionary in a number of ways, not least that it introduced the idea of styles separated from the marked-up document, and of a grammar controlling the usage of descriptive elements.
Scribe influenced the development of Generalized Markup Language (later SGML) and is a direct ancestor to HTML and LaTeX.

In the early 1980s, the idea that markup should be focused on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee.

SGML specified a syntax for including the markup in documents, as well as one for separately describing which tags were allowed, and where (the Document Type Definition (DTD) or schema). This allowed authors to create and use any markup they wished, selecting tags that made the most sense to them and were named in their own natural languages. Thus, SGML is properly a meta-language, and many particular markup languages are derived from it. From the late '80s on, most substantial new markup languages have been based on the SGML system, including for example TEI and DocBook. SGML was promulgated as an International Standard, ISO 8879, by the International Organization for Standardization in 1986.

SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, many found it cumbersome and difficult to learn, a side effect of a design that attempted to do too much and be too flexible. For example, SGML made end tags (or start tags, or even both) optional in certain contexts, reportedly because its developers thought markup would be done manually by overworked support staff who would appreciate saving keystrokes.

HTML
In 1989, physicist Sir Tim Berners-Lee wrote a memo proposing an Internet-based hypertext system,[11] then specified HTML and wrote the browser and server software in the last part of 1990.
The first publicly available description of HTML was a document called "HTML Tags", first mentioned on the Internet by Berners-Lee in late 1991.[12][13] It describes 18 elements comprising the initial, relatively simple design of HTML. Except for the hyperlink tag, these were strongly influenced by SGMLguid, an in-house SGML-based documentation format at CERN. Eleven of these elements still exist in HTML 4.[14]

Berners-Lee considered HTML an SGML application. The Internet Engineering Task Force (IETF) formally defined it as such with the mid-1993 publication of the first proposal for an HTML specification: the "Hypertext Markup Language (HTML)" Internet-Draft by Berners-Lee and Dan Connolly, which included an SGML Document Type Definition to define the grammar.[15] Many of the HTML text elements are found in the 1988 ISO technical report TR 9537, Techniques for using SGML, which in turn covers the features of early text formatting languages such as that used by the RUNOFF command developed in the early 1960s for the CTSS (Compatible Time-Sharing System) operating system. These formatting commands were derived from those used by typesetters to manually format documents. Steven DeRose[16] argues that HTML's use of descriptive markup (and the influence of SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility that it enabled. HTML became the main markup language for creating web pages and other information that can be displayed in a web browser, and is quite likely the most used markup language in the world today.

XML
XML (Extensible Markup Language) is a meta markup language that is now widely used. XML was developed by the World Wide Web Consortium, in a committee created and chaired by Jon Bosak.
The main purpose of XML was to simplify SGML by focusing on a particular problem: documents on the Internet.[17] XML remains a meta-language like SGML, allowing users to create any tags needed (hence "extensible") and then describe those tags and their permitted uses. XML adoption was helped because every XML document can be written in such a way that it is also an SGML document, so existing SGML users and software could switch to XML fairly easily. However, XML eliminated many of the more complex and human-oriented features of SGML in order to simplify implementations in environments such as documents and publications. It appeared to strike a happy medium between simplicity and flexibility, and was rapidly adopted for many other uses. XML is now widely used for communicating data between applications.

XHTML
From January 2000 until HTML5 was released, all W3C Recommendations for HTML were based on XML rather than SGML, using the abbreviation XHTML (Extensible HyperText Markup Language). The language specification requires that XHTML Web documents be well-formed XML documents. This allows for more rigorous and robust documents while using tags familiar from HTML.

One of the most noticeable differences between HTML and XHTML is the rule that all tags must be closed: empty HTML tags such as <br> must either be closed with a regular end tag, or replaced by a special form, <br /> (the space before the "/" is optional, but frequently used because it enables some pre-XML Web browsers, and SGML parsers, to accept the tag). Another is that all attribute values in tags must be quoted. Finally, all tag and attribute names within the XHTML namespace must be lowercase to be valid. HTML, on the other hand, was case-insensitive.

Other XML-based applications
Many XML-based applications now exist, including the Resource Description Framework as RDF/XML, XForms, DocBook, SOAP, and the Web Ontology Language (OWL).
For a partial list of these, see List of XML markup languages.

Features
A common feature of many markup languages is that they intermix the text of a document with markup instructions in the same data stream or file. This is not necessary; it is possible to isolate markup from text content, using pointers, offsets, IDs, or other methods to coordinate the two. Such "standoff markup" is typical for the internal representations that programs use to work with marked-up documents. However, embedded or "inline" markup is much more common elsewhere. Here, for example, is a small section of text marked up in HTML:

    <h1>Anatidae</h1>
    <p>
    The family <i>Anatidae</i> includes ducks, geese, and swans,
    but <em>not</em> the closely related screamers.
    </p>

The codes enclosed in angle brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes h1, p, and em are examples of semantic markup, in that they describe the intended purpose or meaning of the text they include. Specifically, h1 means "this is a first-level heading", p means "this is a paragraph", and em means "this is an emphasized word or phrase". A program interpreting such structural markup may apply its own rules or styles for presenting the various pieces of text, using different typefaces, boldness, font size, indentation, colour, or other styles, as desired. A tag such as h1 (heading level 1) might be presented in a large bold sans-serif typeface, for example, or in a monospaced (typewriter-style) document it might be underscored, or it might not change the presentation at all.

In contrast, the i tag in HTML is an example of presentational markup; it is generally used to specify a particular characteristic of the text (in this case, the use of an italic typeface) without specifying the reason for that appearance.
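The difference between inline and standoff markup described above can be made concrete with a small sketch. The Python below is only an illustration, not a standard tool: it uses the standard library's html.parser module to split the Anatidae example into plain text plus a separate list of (tag, start, end) character offsets. The names StandoffBuilder, to_standoff and SAMPLE are hypothetical, and the sketch assumes well-nested input with no empty elements such as <br>.

```python
from html.parser import HTMLParser

# The inline-markup example from the text, written on one line so that
# the character offsets are easy to follow.
SAMPLE = (
    "<h1>Anatidae</h1>"
    "<p>The family <i>Anatidae</i> includes ducks, geese, and swans, "
    "but <em>not</em> the closely related screamers.</p>"
)


class StandoffBuilder(HTMLParser):
    """Hypothetical helper: separates text content from its markup."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # plain-text pieces, in document order
        self.length = 0        # characters of plain text seen so far
        self.open_tags = []    # stack of (tag, start_offset) not yet closed
        self.annotations = []  # completed (tag, start, end) spans

    def handle_starttag(self, tag, attrs):
        # Remember where in the plain text this element begins.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Assumption: input is well nested, so the most recently opened
        # tag is the one being closed.
        if not self.open_tags or self.open_tags[-1][0] != tag:
            raise ValueError(f"unexpected end tag: {tag}")
        _, start = self.open_tags.pop()
        self.annotations.append((tag, start, self.length))

    def handle_data(self, data):
        self.text_parts.append(data)
        self.length += len(data)


def to_standoff(markup):
    """Return (plain_text, annotations) for a well-nested markup string."""
    builder = StandoffBuilder()
    builder.feed(markup)
    builder.close()
    return "".join(builder.text_parts), builder.annotations


if __name__ == "__main__":
    text, annotations = to_standoff(SAMPLE)
    print(text)
    for tag, start, end in annotations:
        # Each annotation points into the text instead of being embedded in it.
        print(tag, start, end, repr(text[start:end]))
```

In the standoff form the text contains no tags at all; each annotation refers to a span of it by offset, which is roughly how a program might hold a marked-up document internally.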
The Text Encoding Initiative (TEI) has published extensive guidelines[18] for how to encode texts of interest in the humanities and social sciences, developed through years of international cooperative work. These guidelines are used by projects encoding historical documents, the works of particular scholars, periods, or genres, and so on.