Intro to XML


HTML versus XML

The most salient difference between HTML and XML is that HTML describes presentation, whereas XML describes content. An HTML document rendered in a web browser is readable by humans. XML is designed to be readable by both humans and machines.

Consider the following HTML.

<html>
<head><title>Books</title></head>

<body>

<h2>Books</h2>
<hr>

<em>Sense and Sensibility</em>, <b>Jane Austen</b>, 1811<br>
<em>Pride and Prejudice</em>, <b>Jane Austen</b>, 1813<br>
<em>Alice in Wonderland</em>, <b>Lewis Carroll</b>, 1866<br>
<em>Through the Looking Glass</em>, <b>Lewis Carroll</b>, 1872<br>

</body>
</html>

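Rendered in a browser, the HTML above displays a "Books" heading over a horizontal rule, followed by four lines, one per book, each showing the title in italics, the author in bold, and the year in plain text.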

The HTML above describes how the bibliographic information is to be formatted for a human viewing it in a web browser. Knowing that Sense and Sensibility is enclosed in <em> (emphasis) tags does not, however, help a program determine that it is the title of a book. XML addresses this gap by describing what the data is rather than how it looks.
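To make the problem concrete, here is a minimal Python sketch of a program that tries to recover book titles from presentation markup alone. It assumes the standard library's html.parser module; the EmTextCollector class and the extra emphasized sentence in the sample page are hypothetical additions made for illustration. The program can only guess that emphasized text is a title, and the guess fails as soon as emphasis is used for anything else.

```python
from html.parser import HTMLParser

# Abbreviated copy of the HTML above, plus one extra emphasized word
# (an assumption added here to show how the guess breaks).
PAGE = """
<html>
<head><title>Books</title></head>
<body>
<h2>Books</h2>
<hr>
<em>Sense and Sensibility</em>, <b>Jane Austen</b>, 1811<br>
<em>Pride and Prejudice</em>, <b>Jane Austen</b>, 1813<br>
<p>Prices are <em>not</em> listed.</p>
</body>
</html>
"""


class EmTextCollector(HTMLParser):
    """Hypothetical helper: collects the text found inside <em> elements."""

    def __init__(self):
        super().__init__()
        self.in_em = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.in_em = True

    def handle_endtag(self, tag):
        if tag == "em":
            self.in_em = False

    def handle_data(self, data):
        # Keep text only while we are inside an <em> element.
        if self.in_em:
            self.found.append(data)


collector = EmTextCollector()
collector.feed(PAGE)
collector.close()

# The list mixes real titles with the unrelated emphasized word "not".
print(collector.found)
```

Nothing in the markup distinguishes the two real titles from the stray emphasized word, so the program has no principled way to filter its results.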

The following is XML describing the contents of the books HTML page above.

<books>
   <book>
      <title>Sense and Sensibility</title>
      <author>Jane Austen</author>
      <year>1811</year>
   </book>

   <book>
      <title>Pride and Prejudice</title>
      <author>Jane Austen</author>
      <year>1813</year>
   </book>

   <book>
      <title>Alice in Wonderland</title>
      <author>Lewis Carroll</author>
      <year>1866</year>
   </book>

   <book>
      <title>Through the Looking Glass</title>
      <author>Lewis Carroll</author>
      <year>1872</year>
   </book>
</books>

A program parsing this data can take advantage of the fact that every book title is enclosed in <title> tags. How would such a program know which tags to expect? An XML document may be accompanied by an optional description of its grammar. A grammar describes which tags may appear in the XML document and how those tags may be nested; it serves as a schema, or road map, for the document. Originally, an XML grammar was specified in a DTD (Document Type Definition). A newer standard, XML Schema (often abbreviated XSD), has since been adopted and addresses some of the limitations of DTDs.
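As a rough illustration of both ideas, the following Python sketch uses the standard library's xml.etree.ElementTree module to pull titles out of the books XML and then checks the nesting by hand. The follows_book_structure function and the EXPECTED_CHILDREN list are hypothetical names introduced here; the check is only a hand-written stand-in for the kind of rule a DTD or XML Schema would declare, not real schema validation.

```python
import xml.etree.ElementTree as ET

# The books document from above, abbreviated to two entries.
BOOKS_XML = """
<books>
   <book>
      <title>Sense and Sensibility</title>
      <author>Jane Austen</author>
      <year>1811</year>
   </book>
   <book>
      <title>Alice in Wonderland</title>
      <author>Lewis Carroll</author>
      <year>1866</year>
   </book>
</books>
"""

root = ET.fromstring(BOOKS_XML)

# Every title is in a <title> element, so no guessing is required.
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)

# Hypothetical rule: each <book> holds exactly these children, in order.
EXPECTED_CHILDREN = ["title", "author", "year"]


def follows_book_structure(root):
    """Return True if root is <books> containing only well-shaped <book>s."""
    if root.tag != "books":
        return False
    return all(
        book.tag == "book"
        and [child.tag for child in book] == EXPECTED_CHILDREN
        for book in root
    )


print(follows_book_structure(root))
```

A real grammar would let a general-purpose validating parser perform this check, so that each program would not need to encode the document structure itself.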

As can be seen above, the XML contains no information indicating how the document should be rendered in a browser. XML therefore separates data from presentation. The benefit is that the same data can be presented in a variety of ways without being duplicated (e.g., consider making book titles bold and authors italic).
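The sketch below shows one way this separation might play out in practice: a single XML source is rendered twice, once in the style of the original HTML page and once with bold titles and italic authors. It assumes Python's standard xml.etree.ElementTree and html modules; the render function and its parameters are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# One copy of the data (abbreviated to two books).
BOOKS_XML = """
<books>
   <book>
      <title>Sense and Sensibility</title>
      <author>Jane Austen</author>
      <year>1811</year>
   </book>
   <book>
      <title>Through the Looking Glass</title>
      <author>Lewis Carroll</author>
      <year>1872</year>
   </book>
</books>
"""


def render(root, title_tag, author_tag):
    """Hypothetical helper: one HTML line per book, wrapped in the given tags."""
    lines = []
    for book in root.findall("book"):
        # Escape text so characters such as & or < stay valid in HTML.
        title = escape(book.findtext("title"))
        author = escape(book.findtext("author"))
        year = escape(book.findtext("year"))
        lines.append(
            f"<{title_tag}>{title}</{title_tag}>, "
            f"<{author_tag}>{author}</{author_tag}>, {year}<br>"
        )
    return "\n".join(lines)


root = ET.fromstring(BOOKS_XML)

# Same data, two presentations; the XML itself is never modified.
original_style = render(root, "em", "b")   # emphasized titles, bold authors
alternate_style = render(root, "b", "i")   # bold titles, italic authors

print(original_style)
print(alternate_style)
```

Changing the presentation means changing only the arguments to render; the book data is stored once and left untouched.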



Author: Saverio Perugini Computer Science Dept : VA TECH. (c) Copyright 2002.
Last Updated: 3/18/2003