Each web page on the internet consists of a simple text file written in Hypertext Markup Language (HTML). This is one of a family of markup languages used to store information on computer systems. Other varieties include Extensible Markup Language (XML) and server-parsed HTML (SHTML), which is ordinary HTML containing Server Side Includes (SSI), instructions that the web server carries out before it sends the page to the browser.
A text document used for a web page normally has a filename extension (the dot and letters at the end of the name) of .html, although the older .htm is also recognised. There are many different types of text file, distinguished by their character encoding, some designed for specific computer operating systems, but the variety almost universally used for web pages is the 8‑bit Unicode Transformation Format (UTF‑8).
Any kind of text, in any language, including pictographic characters and emojis, can be entered into a UTF‑8 document. If you create a web page using a UTF‑8 text file, the characters you type or enter will be faithfully reproduced in a modern web browser, provided the page declares its encoding as UTF‑8 and the device has a font that contains them.
Any plain text editor can be used to create a web page, but the job is much easier with a specialised one that incorporates utilities for processing HTML content, as well as a preview facility that lets you view the page as it would appear in a web browser.
Begin by opening a new text document, ensuring that the encoding is set to Unicode (UTF‑8) and that the line endings are set to Unix (LF). Line Feed (LF) is the control code that begins a new line in text files on the Unix family of operating systems, which are widely used for web servers.
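As a rough illustration, a new UTF‑8 document might start out something like the sketch below. The page title, the language code and the sample text are placeholders chosen for this example; the meta line simply tells the browser that the file was saved as UTF‑8, so it should match the setting used in your editor.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declares the encoding; it should match how the editor saved the file. -->
  <meta charset="utf-8">
  <!-- Placeholder title, shown in the browser tab. -->
  <title>My first page</title>
</head>
<body>
  <!-- With UTF-8, accented letters, other scripts and emoji can be typed directly. -->
  <p>Café, 日本語 and 🙂 all appear as typed.</p>
</body>
</html>
```

Saved with an .html extension, a file like this can be opened directly in a browser to check that every character is displayed correctly.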
HTML, like XML and other related markup languages, uses angle bracket characters, ＜ (also known as ‘less than’) and ＞ (also known as ‘greater than’), to delineate tags. Here’s a simple example:
﹤p﹥This is ﹤em﹥emphasised﹤/em﹥ text.﹤/p﹥
Seen in a browser, this appears as a paragraph of text with the word ‘emphasised’ picked out, usually in italics.
The spacing above and below the paragraph, as well as the size and style of the font and the variation caused by emphasis, are determined by Cascading Style Sheets (CSS), usually kept in a separate file. Should no such file exist, the content is displayed using the browser’s defaults, which can vary depending on the software or the device.
A section of HTML code consisting of a matching opening tag and closing tag, such as ＜em＞ and ＜/em＞, together with the content between them, is called an element.
Some elements, such as ＜wbr＞ (optional word break), consist of a single tag and have no closing tag.
Browsers usually treat white space characters, such as tabs, line feeds and other control codes that don't represent any visible character, as a normal space between words. Similarly, multiple spaces, or single spaces in combination with other white space characters, are interpreted as a single space. This means that line feed characters are best left at the end of real lines, and other white space characters avoided.
A ＜wbr＞ tag placed in the middle of a word or string of text allows the browser to wrap the text at that point, should there be insufficient space in the window.
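The fragment below is a small sketch of both behaviours, using made-up sample text. The first two paragraphs should look identical in a browser, and the long word in the third should only break at the marked point when the window is too narrow to hold it.

```html
<!-- The line feeds and runs of spaces below each collapse to a single space. -->
<p>This    is
   a   single
   line.</p>

<!-- Displayed the same as the paragraph above. -->
<p>This is a single line.</p>

<!-- <wbr> marks a point where the browser may wrap a long word if it has to. -->
<p>Supercalifragilistic<wbr>expialidocious</p>
```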
Raw HTML can end up being difficult to read, especially in a large and complex document, so the text is often formatted to ease editing, although this makes no difference to the appearance of the page as seen in a browser. The option to do this should be available in your text editor. Here’s the example above in a formatted style:
＜p＞ ␊
␉ This is ␊
␉ ␉ ＜em＞ ␊
␉ ␉ ␉ emphasised ␊
␉ ␉ ＜/em＞ ␊
␉ text. ␊
＜/p＞
As you can see, ␉ (horizontal tab) and ␊ (line feed) characters have been added. Various kinds of formatting exist, so you’re free to use whatever format is easiest for you. Note, however, that the more sophisticated styles of formatting may add too many line feeds, introducing unwanted spaces in your final content.
Formatting an HTML file can almost double its size, which means it takes up more space on the server, takes longer to download and therefore appears more slowly in a browser. To minimise size, your text editor should also have a facility to optimise the document, removing all the unwanted control codes that were introduced by formatting.
The beginning of each tag in HTML is marked by the ＜ character, meaning it can't be used as it stands elsewhere in your text. Normally this isn’t a problem, since the character isn’t that commonly used.
When you really need to use a ＜ character there are two options:
1. Use one of the other Unicode variations of ‘less than’ characters, along with the matching ‘greater than’ symbols.
2. Employ a character entity, a mechanism that was in use before UTF‑8 became widespread. This tells the browser to treat a ＜ or ＞ as a character, not as part of HTML. To do this, simply type ＆lt; or ＆gt; in their place.
You should also take care with ＆ (ampersand) where it might create other character entities by mistake. Such entities consist of ＆ followed by letters and/or numbers and ending in a ; (semicolon). Chances are this won't happen very often.
Because of the use of ＆ in character entities, some older validation software may object to the existence of ＆ itself within an HTML document. If this happens you should replace all instances of ＆ with ＆amp; or avoid the character entirely within the content of your text.
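As a brief sketch of character entities in use, with invented sample sentences, the markup below should display the less than, greater than and ampersand characters as ordinary text instead of treating them as part of the HTML.

```html
<!-- &lt; shows as <, &gt; shows as > and &amp; shows as &. -->
<p>If a &lt; b &amp; b &lt; c, then c &gt; a.</p>

<!-- Showing a tag as text, so the browser doesn't act on it. -->
<p>Use the &lt;em&gt; tag for emphasis.</p>
```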
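In almost every web page you can click on hyperlinks, portions of ‘clickable’ text that take you to other web locations. Each hyperlink requires an anchor element, written with the ＜a＞ tag and an href attribute that holds the destination, as in the following (the address is only a placeholder):
＜a href="https://www.example.com/"＞here＜/a＞
where ‘here’ is the hyperlink text. Such text can also be a partial or complete URL, such as:
＜a href="https://www.example.com/"＞www.example.com＜/a＞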
You can also use an image as a ‘clickable’ object, as in:
＜a href="https://www.apple.com"＞ ＜img src="apple.png" alt="Link to Apple"＞ ＜/a＞
which includes the alt attribute, supplying a text alternative for those with visual impairments.
Links can also be made for bookmarks, linking to other parts of the same page, parts of another page on the same site or even parts of a page on another site. You can do this with something like:
＜a href="#C4"＞Chapter 4＜/a＞
which provides a link to the following ＜h2＞ heading elsewhere on the page, identified by its id attribute:
＜h2 id="C4"＞Chapter 4＜/h2＞
To bookmark a location in another page, the link can be of the form:
＜a href="html_demo.html#C4"＞Chapter 4＜/a＞
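where C4 is the id of the required location on the page at html_demo.html. The id value can be any combination of letters and numbers, provided it contains no spaces and is unique within the page. A link to #top returns to the start of the document in modern browsers, and giving an element at the end of the page the id bottom allows navigation to the other end.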
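The bookmark links described above might be combined along the following lines. This is only a sketch: the ids C1 and C2, the chapter titles and the body text are invented for illustration, while html_demo.html and C4 are taken from the example above.

```html
<!-- A simple row of bookmark links; each href names an id after the # sign. -->
<p>
  <a href="#C1">Chapter 1</a> |
  <a href="#C2">Chapter 2</a> |
  <a href="html_demo.html#C4">Chapter 4 (on another page)</a>
</p>

<!-- The id on each heading is the target of the matching link above. -->
<h2 id="C1">Chapter 1</h2>
<p>Text of the first chapter. <a href="#top">Back to top</a></p>

<h2 id="C2">Chapter 2</h2>
<p>Text of the second chapter. <a href="#top">Back to top</a></p>
```

Each id should appear only once in a page, otherwise the browser cannot tell which location a link refers to.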