Optimisation

Most Web pages, apart from those with excessive JavaScript content, occupy between 2 and 7KB, a size determined by the number of characters in the ‘source’. In practice, most problems with loading time are caused by large image files. Having said this, a small two-colour GIF image can actually take less space than the equivalent HTML text. In addition, an image that is used several times on the same page is downloaded only once and then reused from the browser’s cache, thereby avoiding extra download time.

The process of reducing an HTML file to the smallest possible size is known as optimisation. This can be achieved by removing unnecessary characters or elements, or by using tricks with graphics, all of which are described below. The job can be done by hand or with an optimisation application, which can process all the HTML files in your website in one go.

Unnecessary White Space Characters

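Although spaces, tabs and CR (carriage return) or LF (line feed) characters are useful while you create a Web page, they’re of little use to a browser, which treats any run of them as a single space (except inside elements such as pre, where white space is preserved). Hence text in the hierarchical format of:-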

<head>

    <title>

        My Web Page

    </title>

</head>

<body>

    <p>

        My text

    </p>

</body>

can be reduced to a plain format, such as:-

<head>

<title>My Web Page</title>

</head>

<body>

<p>My text</p>

</body>

or even to a compact format, such as:-

<head><title>My Web Page</title></head><body><p>My text</p></body>
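An optimisation application performs this kind of reduction automatically. The following Python function is a minimal sketch of the idea, not a production tool. The name minify_whitespace is hypothetical, and the function assumes the page contains no pre, textarea, script or style content (where white space matters) and no ‘<’ or ‘>’ characters in its text. It also removes the space between adjacent inline elements, so <b>one</b> <i>two</i> would be displayed as ‘onetwo’:-

```python
import re

def minify_whitespace(html):
    """Naive white-space stripper (hypothetical sketch, not a full minifier).

    Assumes no <pre>, <textarea>, <script> or <style> content, where white
    space is significant, and no '<' or '>' characters in the page text.
    """
    # 1. Collapse every run of spaces, tabs, CRs and LFs into a single space.
    html = re.sub(r"\s+", " ", html)
    # 2. Drop any space that follows a tag...
    html = re.sub(r">\s+", ">", html)
    # 3. ...and any space that precedes one.
    html = re.sub(r"\s+<", "<", html)
    return html.strip()

page = """<head>
    <title>
        My Web Page
    </title>
</head>
<body>
    <p>
        My text
    </p>
</body>"""

print(minify_whitespace(page))
# <head><title>My Web Page</title></head><body><p>My text</p></body>
```

The single space inside ‘My text’ survives because it isn’t next to a tag.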

Unnecessary spaces can also be removed from other statements. For example, a line such as:-

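<meta name = "keywords" content = "one, two, three" />

can be reduced to the following, although the space between the two attributes must be kept, since attributes have to be separated by white space:-

<meta name="keywords" content="one,two,three" />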
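Inside tags, the same reduction can be scripted. The following tidy_attributes function is a hypothetical sketch. It only touches text between ‘<’ and ‘>’, so the page content is left alone. It assumes that no attribute value contains ‘>’ or an ‘=’ surrounded by meaningful spaces:-

```python
import re

def tidy_attributes(html):
    """Remove white space around '=' inside tags (hypothetical sketch).

    Assumes attribute values never contain '>' or a meaningful ' = '.
    """
    def tidy_tag(match):
        # name = "x"  becomes  name="x"
        return re.sub(r"\s*=\s*", "=", match.group(0))
    # Only text between '<' and '>' is rewritten; page text is untouched.
    return re.sub(r"<[^>]+>", tidy_tag, html)

print(tidy_attributes('<meta name = "keywords" content = "one, two, three" />'))
# <meta name="keywords" content="one, two, three" />
```

Spaces inside attribute values, such as those after the commas, are left as they are. A value is data rather than markup, so it is best edited by hand.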

Unnecessary Tag Groups

Some tags don’t actually contribute to the content of a Web page, despite the fact that they contain useful information about the document. For example, many pages don’t include a DOCTYPE declaration, since it has no visible effect on the content. However, the declaration indicates the version of HTML used in the file: without this information it’s impossible to validate the contents (see below). Many browsers also use it to decide whether to display the page in ‘standards’ or ‘quirks’ mode.

Some of the meta tags found in the head of a document are unnecessary, although, once again, they can contain useful information. The following tags can often be removed, although some are worth keeping:-

<meta name="author"…

Contains the name of the author who created the page, which should be retained in copyright material.

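<meta http-equiv="content-type"…

Indicates the kind of content in the Web page and its character set. Unless your server already sends this information, the tag is worth keeping, since without it the browser may have to guess the character set.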

<meta name="description"…

Describes the content of the document, which can be of use to a search engine.

<meta name="generator"…

Contains the name of the application that created the document, although this isn’t usually required.

<meta name="keywords"…

Provides a list of keywords for search engines. You should use 75 characters or fewer for this purpose.
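Once you decide that a tag isn’t needed, removing it from many files is easily automated. The following sketch deletes meta tags by name. The strip_meta name is hypothetical, and the function assumes the name attribute comes first, as in <meta name="generator" …. It deliberately leaves http-equiv tags, such as content-type, alone:-

```python
import re

def strip_meta(html, names):
    """Remove <meta name="..."> tags whose name is listed (hypothetical sketch).

    Assumes 'name' is the first attribute of the tag; http-equiv tags
    are not matched and so are kept.
    """
    for name in names:
        pattern = r'<meta\s+name\s*=\s*"' + re.escape(name) + r'"[^>]*>\s*'
        # Also swallow the white space after the tag so no blank line is left.
        html = re.sub(pattern, "", html, flags=re.IGNORECASE)
    return html

page = '<meta name="generator" content="Example Editor" />\n<title>My Web Page</title>'
print(strip_meta(page, ["generator"]))
# <title>My Web Page</title>
```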

Finally, there are those tags that are produced unnecessarily by a Web-authoring application. Some programs split a single run of formatting into several adjacent tags, as in the following:-

<b>Bold text. </b><b>More bold text.</b>

instead of:-

<b>Bold text. More bold text.</b>

while others produce empty tags that are entirely redundant, as in:-

Some text. <b></b>More text.
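Both kinds of redundant markup can be removed with simple substitutions. This sketch, using the hypothetical name remove_redundant_tags, handles only the b, i, em and strong tags. It also assumes they carry no attributes and that the closing and opening tags are directly adjacent:-

```python
import re

# Formatting tags assumed safe to merge when they carry no attributes.
SIMPLE_TAGS = r"(b|i|em|strong)"

def remove_redundant_tags(html):
    """Merge split formatting runs and drop empty pairs (hypothetical sketch)."""
    # '</b><b>' disappears, so two adjacent bold runs become one.
    html = re.sub("</" + SIMPLE_TAGS + r"><\1>", "", html)
    # Empty pairs such as '<b></b>' are deleted outright.
    html = re.sub("<" + SIMPLE_TAGS + r"></\1>", "", html)
    return html

print(remove_redundant_tags("<b>Bold text. </b><b>More bold text.</b>"))
# <b>Bold text. More bold text.</b>
print(remove_redundant_tags("Some text. <b></b>More text."))
# Some text. More text.
```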

Unnecessary Closing Tags

The XHTML 1.0 standard demands that every opening tag has a corresponding closing tag (empty elements, such as br and img, are closed within the tag itself, as in <br />). This means that the following lines:-

<p>First line

<p>Second line

must be replaced by:-

<p>First line</p>

<p>Second line</p>

The older HTML 4.0 specification permits some closing tags to be omitted, including those used for the p, colgroup, dd, dt, li and option elements. Unfortunately, some browsers interpret the junctions of such elements in different ways, meaning that leaving out these tags can significantly change the appearance of a Web page. As a rule, closing tags should always be included.

The closing tags for body and html are required by XHTML but optional in HTML 4.0. Browsers such as Internet Explorer and Netscape don’t need them, although little is gained by omitting them.
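A missing closing tag is easy to overlook by eye. The following sketch uses the HTMLParser class from Python’s standard html.parser module to list elements that are never closed or that are closed out of order. The UnclosedTagFinder and find_unclosed names are hypothetical, and empty elements such as br and img are skipped because they take no closing tag:-

```python
from html.parser import HTMLParser

# Empty elements that never take a separate closing tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class UnclosedTagFinder(HTMLParser):
    """Tracks open elements and records closing-tag problems (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for every element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass                 # self-closing forms such as <br /> need no match

    def handle_endtag(self, tag):
        # Search down the stack for the matching opening tag.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                # Anything opened after it was never explicitly closed.
                for t, line in self.stack[i + 1:]:
                    self.problems.append(f"<{t}> (line {line}) not closed before </{tag}>")
                del self.stack[i:]
                return
        self.problems.append(f"</{tag}> (line {self.getpos()[0]}) has no opening tag")

def find_unclosed(html):
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    # Whatever is still on the stack was never closed at all.
    leftover = [f"<{t}> (line {line}) never closed" for t, line in finder.stack]
    return finder.problems + leftover

for problem in find_unclosed("<p>First line\n<p>Second line"):
    print(problem)
```

Applied to the two unclosed paragraphs shown earlier, it reports both p elements as never closed. A page that passes this check may still fail validation for other reasons.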

Other Tag Considerations

Some tag attributes can influence the speed of page loading. For example, including width and height attributes in an image tag can improve performance, since the browser can lay out the page before the image has been downloaded, while deprecated attributes such as align or hspace often slow things down.

The repeated use of attributes can also make a page sluggish. For example, you should avoid using valign="top" for every td entry in a table. Instead, you should employ <tr valign="top"> at the start of each row. Similarly, you shouldn’t use deprecated tags, especially the font tag, as in <font face="times">, but should use stylesheets wherever possible.
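The valign change can also be made automatically. The following hoist_valign function is a hypothetical sketch. It only rewrites rows written as a plain <tr> with no attributes, and it does nothing unless every cell in the row has valign="top". It isn’t safe for nested tables, because the row pattern stops at the first </tr>:-

```python
import re

ROW = re.compile(r"<tr>(.*?)</tr>", re.DOTALL)          # rows with no attributes
CELL_TOP = re.compile(r'(<td\b[^>]*?)\s+valign="top"')  # a td carrying valign="top"

def hoist_valign(html):
    """Move valign="top" from every cell of a row onto the row (hypothetical sketch)."""
    def fix_row(match):
        body = match.group(1)
        cells = re.findall(r"<td\b[^>]*>", body)
        # Only rewrite rows in which every cell asks for the same alignment.
        if cells and all('valign="top"' in cell for cell in cells):
            return '<tr valign="top">' + CELL_TOP.sub(r"\1", body) + "</tr>"
        return match.group(0)
    return ROW.sub(fix_row, html)

print(hoist_valign('<tr><td valign="top">A</td><td valign="top">B</td></tr>'))
# <tr valign="top"><td>A</td><td>B</td></tr>
```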

If you’re really desperate to reduce the size of a document you can replace long tags with shorter ones. For example, you could consider replacing all the cite (citation) tags with i (italic) tags. In practice, however, the saving in space is small. In addition, the cite tag conveys a real meaning with regard to the nature of the text, rather than just information about the printed style of the material.
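The saving is easy to quantify. Each <cite>…</cite> pair is six characters longer than <i>…</i>, three in the opening tag and three in the closing tag. This small sketch counts the characters that the substitution would remove; cite_to_i_saving is a hypothetical name:-

```python
def cite_to_i_saving(html):
    """Characters saved by swapping <cite> for <i> (hypothetical sketch)."""
    shorter = html.replace("<cite>", "<i>").replace("</cite>", "</i>")
    return len(html) - len(shorter)

sample = "See <cite>Hamlet</cite> and <cite>Macbeth</cite>."
print(cite_to_i_saving(sample))  # 12: six characters per citation
```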

Graphical Tricks

Data can also be minimised by using small graphics. Alternatively, if your page contains repetitive graphics, you can build up a picture from separate parts held within a table element. For example, a framed picture can be created using four border images (a title bar, a base bar and left and right edges) around a central trash icon, all assembled in the following table:-

<table border="0" width="159" cellspacing="0" cellpadding="0">

    <tr valign="top">

        <td colspan="3"><img src="images/title_bar.gif" alt="" /></td>

    </tr>

    <tr valign="top">

        <td><img src="images/left_edge.gif" alt="" /></td>

        <td>

            <br /><br /><br />

            <img src="images/trash.gif" alt="" />

        </td>

        <td><img src="images/right_edge.gif" alt="" /></td>

    </tr>

    <tr valign="top">

        <td colspan="3"><img src="images/base_bar.gif" alt="" /></td>

    </tr>

</table>

Another small graphic file, usually a transparent image called spacer.gif, can be added and sized with its width and height attributes to ‘fine tune’ the positioning of any of the parts.

Validation

Irrespective of whether a page has been optimised or not, it should conform to an appropriate HTML standard. You can check your material using a process known as validation. There are two ways of doing this: you can use an online validation service, such as the W3C Markup Validation Service, or employ a suitable HTML text-editing application, such as BBEdit. The latter is preferable if you don’t want to spend too much time on the Internet.

Whatever method you choose, your pages must have a suitable DOCTYPE declaration. Failure to include such a declaration can cause your validation software to assume, perhaps incorrectly, that you’re using a specific version of HTML.
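A quick check for a DOCTYPE declaration, and for the version of HTML it names, can be made before validation. In this sketch the doctype_version name is hypothetical, and the table covers only a few common public identifiers:-

```python
import re

# Public identifiers for some common DOCTYPEs (not an exhaustive list).
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.0//EN": "HTML 4.0 Strict",
    "-//W3C//DTD HTML 4.0 Transitional//EN": "HTML 4.0 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
}

def doctype_version(html):
    """Return the HTML version named by the page's DOCTYPE, or None (hypothetical sketch)."""
    m = re.search(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', html, re.IGNORECASE)
    if m is None:
        return None  # no DOCTYPE: a validator would have to guess the version
    return KNOWN_DOCTYPES.get(m.group(1), "unrecognised: " + m.group(1))

print(doctype_version('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                      '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'))
# XHTML 1.0 Strict
```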

Reasons and Effects of Failure

The causes of validation failure are too numerous to describe here. Suffice it to say, most problems are due to tags being in the wrong order (often as a result of producing pages in a WYSIWYG Web-authoring application) or incorrectly enclosing other tags. For example, the inline span element (like the deprecated font element) can’t contain block-level elements such as p. If such a tag is wrapped around several paragraphs, the Web page concerned is bound to fail its validation test.
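Mistakes of this kind can be caught before a page is submitted for validation. The following sketch, also built on Python’s html.parser module, reports block-level elements that start while an inline element is still open. The class and function names and the short lists of inline and block elements are assumptions, not a complete rule set:-

```python
from html.parser import HTMLParser

# Assumed, incomplete lists of inline and block-level elements.
INLINE = {"a", "b", "cite", "em", "font", "i", "span", "strong"}
BLOCK = {"blockquote", "div", "h1", "h2", "h3", "h4", "h5", "h6",
         "ol", "p", "pre", "table", "ul"}

class BlockInInlineChecker(HTMLParser):
    """Flags block elements nested inside inline ones (hypothetical sketch)."""

    def __init__(self):
        super().__init__()
        self.open_inline = []   # inline elements currently open, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK and self.open_inline:
            line = self.getpos()[0]
            self.errors.append(f"<{tag}> on line {line} is inside <{self.open_inline[-1]}>")
        if tag in INLINE:
            self.open_inline.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_inline:
            # Close the innermost matching inline element and anything inside it.
            last = len(self.open_inline) - 1 - self.open_inline[::-1].index(tag)
            del self.open_inline[last:]

def check_block_in_inline(html):
    checker = BlockInInlineChecker()
    checker.feed(html)
    checker.close()
    return checker.errors

for error in check_block_in_inline('<span class="note"><p>One</p><p>Two</p></span>'):
    print(error)
```

Here both paragraphs are reported, because the span has been wrapped around them.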

The implications of failure aren’t obvious: a page with hundreds of errors can appear perfect, since modern browsers are tolerant of such mistakes. However, some browsers, perhaps on another computer platform, can make different assumptions about your errors, possibly giving an unexpected result. So, to avoid problems, your Web pages should always conform to the relevant standards.

©Ray White 2004.