Markup Languages

A text file, which is usually identified by an extension of .txt, frequently contains plain text in the form of ASCII characters that anyone can read. However, it’s possible to convey additional information in a text file by means of a markup language. The most common language is HyperText Markup Language (HTML), which is dealt with elsewhere in these guides.

All markup languages employ special groups of characters, usually known as tags, which convey the extra information, such as text formatting and layout. The result, which can be tricky to read in a standard text editor, is normally viewed in a suitable browser application.

The document itself can be a plain ASCII text file or a Unicode text file encoded in one of the Unicode Transformation Formats (UTF), such as UTF-8 or UTF-16, although some applications, including older Web browsers, can’t accommodate every kind of UTF file.

Markup Language Editors

The creation of a text file containing a markup language can be approached in two ways. By using an advanced text-processor, such as BBEdit, you can work directly on the raw text, although this can be time-consuming and difficult. Alternatively, you can use a WYSIWYG editor suited to your chosen markup language, allowing you to use intuitive methods to create text and other elements.

SGML Derivatives

Traditional markup languages have been developed from the Standard Generalised Markup Language (SGML), using tags to convey text style, formatting and other information. Of these, the most common variations are listed below. In the Classic Mac OS all these files have a type code of TEXT, although they should always be identified by the filename extensions shown here.

Hypertext Markup Language (HTML)  .htm/.html

Used to create Web pages on the World Wide Web. It employs Uniform Resource Locators (URLs) for links to other documents or graphics, the latter supplied as GIF, PNG or JPEG files. The formatting of a group of pages, or even an entire Web site, can be defined using a Cascading Style Sheet (CSS). If necessary, different CSSs can be used for specific pages.

Dynamic HTML (DHTML)  .htm/.html

A variation of HTML that accommodates animated layers, behaviours and style sheets by using JavaScript to manipulate a CSS. This can be exploited in recent Web browsers, such as Explorer and Netscape, when working with active channels. Since this kind of coding actually contains ‘executables’ there’s some possibility of conveying a virus.

Extensible HyperText Markup Language (XHTML)  .htm/.html

A stricter reformulation of HTML that follows XML syntax rules. As in XML, tag and attribute names have to be in lowercase, every opening tag must have a matching closing tag and all attribute values must be inside quotes.
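
These rules can be checked mechanically. The following Python sketch, which is only an illustration and not part of any XHTML toolkit, uses the standard xml.etree.ElementTree parser to test a fragment for well-formedness and then confirms that every tag and attribute name is in lowercase. The function name check_xhtml_fragment is an assumption made for this example:-

```python
import xml.etree.ElementTree as ET


def check_xhtml_fragment(fragment):
    """Return a list of problems found in an XHTML fragment (empty if none)."""
    try:
        # An XML parser rejects unclosed tags and unquoted attribute values.
        root = ET.fromstring(fragment)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]

    problems = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            problems.append("tag <%s> is not lowercase" % element.tag)
        for name in element.attrib:
            if name != name.lower():
                problems.append("attribute %s is not lowercase" % name)
    return problems
```

A fragment such as <p class="x">Hello<br /></p> passes, whereas <p>Hello<br></p> is reported as not well-formed because the br tag is never closed.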

Extensible Markup Language (XML)  .xml

Unlike HTML, this describes the content of the text, not its actual formatting. Once coded in XML, the material can be converted, as required, into HTML by means of a suitable application. The conversion relies on the XML tags that have been designed for the content, as specified in a Document Type Definition (DTD) file or in a standalone document declaration (SDD) in the XML file itself. The appearance of a document can be modified by using a CSS file or by using another file written in Extensible Stylesheet Language (XSL), itself a variation of XML. Further information about elements can be contained in an Element Definition Document (EDD).
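
As a rough illustration of converting content-describing XML into HTML, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The recipe, title and step tags are invented for this example, as is the function name recipe_to_html, and a real conversion would more likely be driven by a DTD and an XSL stylesheet:-

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content-describing XML; the tag names are invented here.
SAMPLE_XML = """<recipe>
  <title>Toast</title>
  <step>Slice the bread.</step>
  <step>Toast until golden &amp; crisp.</step>
</recipe>"""


def recipe_to_html(xml_text):
    """Convert the sample <recipe> vocabulary into a simple HTML fragment."""
    root = ET.fromstring(xml_text)
    # The XML says what each item is; the HTML tags chosen here say how it looks.
    lines = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    lines.append("<ol>")
    for step in root.findall("step"):
        lines.append("  <li>%s</li>" % escape(step.text or ""))
    lines.append("</ol>")
    return "\n".join(lines)
```

Calling recipe_to_html(SAMPLE_XML) gives a heading followed by a numbered list. The same XML could just as easily be turned into a different layout, which is the point of keeping content and formatting apart.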

Lasso Dynamic Markup Language (LDML)

Used for getting access to databases over the Internet, including the SQL variety.

Wireless Markup Language (WML)  .wml

This is a variation of HTML that’s specially designed to suit the small displays used on Internet-equipped mobile phones, commonly known as WAP (Wireless Application Protocol) phones. All files, both text and graphics, must contain less than 1,400 bytes and all images have to be in black-and-white form, not greyscale.

Other Languages

Other languages, often based on one of the above or on XML, include:-

Claris Dynamic Markup Language (CDML)  .htm/.html

An expanded version of HTML, as used in Web-creation applications such as Claris Home Page and FileMaker Pro, employing tags of the form <X-CLARIS...>, which are ignored by a Web browser.

ColdFusion Markup Language (CFML)  .cfm/.cfml

Devised by Allaire for use with a Web site that has dynamic page content and Java Database Connectivity (JDBC), allowing anyone who visits the site to retrieve database information.

Compact HTML (CHTML)  .htm/.html

A variation of HTML that’s used for i-mode devices.

Document Object Model Language (DOML)

Another markup language based on XML.

Extensible Forms Description Language (XFDL)

Based on XML, but specially designed for creating, viewing and entering data into complex forms, including legal contracts.

Extensible Style Language  .xml

Extensible Style Sheet Language (XSL)  .xsl

Extensible Style Sheet Language Transformations (XSLT)  .xslt

More languages based on XML.

FileMaker Dynamic Markup Language (FDML)  .fdml

A variant of HTML that uses special tags, such as [FMP-Record], to link a FileMaker database into a Web page.

Personalised Print Markup Language (PPML)

An XML-based language, not intended for the Web, but designed for plugging variable data from a database, such as text or images, into a printed document that has fixed elements. The effect is similar to the mail merge feature found in AppleWorks and other general-purpose applications.

Precision Graphic Markup Language (PGML)

Based on PostScript, this language was developed from XML by Adobe for sending vector graphics over the Web. It can be used as an alternative to Flash, which is a popular binary format.

Standard Generalised Markup Language (SGML)  .sgm/.sgml

A general-purpose markup language used for publishing various kinds of documents.

Synchronised Multimedia Integration Language (SMIL)  .smil

This XML-based language is understood by RealPlayer G2 and QuickTime 4.1 or later. It can convey synchronised sound and video over the Web, as well as text, images and Flash animations.

Vector Markup Language (VML)  .vml

Also based on XML, this alternative to Flash carries vector graphics.

Virtual Reality Modeling Language (VRML)  .ivr/.vrml/.wrl/.wrz

This language is used by three-dimensional (3D) Web sites, but also supports ordinary two-dimensional vector images. It can be decoded by Apple’s QuickDraw 3D software.

Rich Text Format (RTF)

Although not usually considered a markup language, RTF has many similar characteristics. It’s commonly used with word processors, usually when interchanging documents via Microsoft’s Word application. Each file consists of normal text interspersed with special strings of characters that represent information about font styles and formatting. RTFs are also supported by numerous other applications, including ClarisWorks, MacWrite II, Works and WriteNow.

There are several variations in the RTF standard, causing some applications to reject specific files. Information about the contents of a document can be gleaned by examining it with a text editor such as BBEdit. The type of file is indicated in the first line of text, sometimes known as a file header. For example, a document that uses the Windows (ANSI) character set usually begins with:-

{\rtf1\ansi

while one that uses the Mac OS character set should begin with:-

{\rtf1\mac

As this implies, all RTF commands are preceded by a \ (backslash) whilst tabular lists and other data related to commands are held in groups of { and } brackets. Each document begins with a list of fonts, followed by stylesheet data, paper size and margin information. Here’s an example paragraph:-

\pard…{\f21 This is }{\f21 \b bold}{\f21 now }{\f21 \i italic}

{\f21 now }{\f21 \ul underline}{\f21 .}\par

where the … after \pard actually stands for a string of other commands, \f21 relates to the font in the font list at the beginning of the document and \b, \i and \ul indicate font styles. The result is rendered as:-

This is bold now italic

now underline.
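
To show how such groups can be picked apart, here is a minimal Python sketch that splits a fragment into runs of text with their styles. It only handles flat, un-nested groups like those in the example paragraph and is in no way a complete RTF reader. The names rtf_runs, GROUP, CONTROL and STYLES are assumptions made for this illustration:-

```python
import re

# Matches one flat group such as {\f21 \b bold}; nested groups are not handled.
GROUP = re.compile(r"\{([^{}]*)\}")
# A control word: backslash, letters, optional number, optional delimiting space.
CONTROL = re.compile(r"\\([a-z]+)(-?\d+)? ?")

STYLES = {"b": "bold", "i": "italic", "ul": "underline"}


def rtf_runs(rtf):
    """Return (text, styles) pairs for each flat group in an RTF fragment."""
    runs = []
    for body in GROUP.findall(rtf):
        # Collect the control words, then strip them to leave the visible text.
        words = [m.group(1) for m in CONTROL.finditer(body)]
        text = CONTROL.sub("", body)
        styles = sorted(STYLES[w] for w in words if w in STYLES)
        runs.append((text, styles))
    return runs
```

Font commands such as \f21 are recognised as control words and discarded, since the sketch doesn’t read the font list.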

RTFs in Mac OS X

Mac OS X lets you create RTF files using the TextEdit application. A TextEdit document is often created in a special folder, known as a bundle or package, that behaves as a single file. This folder, also containing the PICT image files for the document, has a filename extension of .rtfd.

When viewed in a text editor, a TextEdit RTF file begins with:-

{\rtf1\mac\ansicpg10000\cocoartf100

reminding us that Cocoa is the native programming environment of OS X. Such documents aren’t acceptable to some applications, although the RTF translator supplied with later versions of MacLinkPlus Deluxe works well. However, embedded graphics, linked to PICT files in the same folder as the RTF, aren’t always understood. As seen in the raw data, these appear as:-

{{\NeXTGraphic __RES1000__.pict \width5100 \height740

}¨}

where __RES1000__.pict is the name of the appropriate graphic file. This kind of line, which is really in a non-standard form, also clearly betrays the NeXT origins of Mac OS X.
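
The header lines quoted in this section can also be recognised by a program. The following Python sketch reads the first line of a file and reports the RTF version number, the character set and whether a \cocoartf command is present. It assumes the header follows the simple pattern seen in the examples above, and the name describe_rtf_header is invented for this illustration:-

```python
import re

# \rtfN, then a character-set keyword, then any further control words.
HEADER = re.compile(r"^\{\\rtf(\d+)\\(ansi|mac|pca|pc)((?:\\[a-z]+-?\d*)*)")


def describe_rtf_header(first_line):
    """Return a dict describing the header, or None if it is not RTF."""
    match = HEADER.match(first_line)
    if match is None:
        return None
    extras = re.findall(r"\\([a-z]+)(-?\d*)", match.group(3))
    return {
        "version": int(match.group(1)),
        "charset": match.group(2),
        # True when a \cocoartfN control word is present, as TextEdit writes.
        "cocoa": any(word == "cocoartf" for word, _ in extras),
    }
```

A check of this kind could be used to warn that a file came from TextEdit before passing it to an application that may reject it.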

Palm Markup Language (PML)

This proprietary markup language is used to create documents for the Palm OS. A document containing PML can be created in a text editor such as BBEdit or by creating a word-processing file in the Word application and applying a special macro known as word2pml.

The completed PML file can be dropped onto DropBook, a special application that converts the content into an electronic book (eBook). The resultant file, which is identified by a .pdb extension, can be viewed using Palm Reader (Palm), either on your Palm organiser or on a standard computer.

Palm Formats

Palm Reader can also be used to view documents that have been created in DOC format, a standard type of file found in the Palm OS. Various applications can create these files, such as Pordible, which converts standard .txt files to .pdb DOC files and vice versa.

The Language

PML is very simple and is similar in some ways to RTF. Each command is preceded by a \ (backslash) and, where it takes a parameter, is followed by = (equals) and the parameter within straight quotes.

The following table summarises the standard commands:-

Command                   Meaning
\p                        Page break (no tag at end)
\x                        New chapter (no tag at end)
\Xn                       New chapter, indented n levels (no tag at end)
\c                        Center align
\r                        Right-align
\i                        Italic
\u                        Underline
\o                        Overstrike
\v                        Invisible (comment)
\t                        Indent
\T="50%"                  Indent by percentage of page width
\W="50%"                  Embed horizontal rule by percentage of page width
\n                        Normal font style
\s                        Standard font
\b                        Bold font
\l                        Large font
\axxx                     Non-ASCII Palm character; xxx is decimal value
\m="imagename.png"        Image of the specified name (no tag at end)
\q="#linkanchor"the_text  Link to anchor in document, with 'the_text' underlined
\Q="linkanchor"           Anchor for link (no tag at end)
\-                        Soft hyphen, only shown if word is broken by line
\B                        Bold style
\Sp                       Superscript
\Sb                       Subscript
\Fn="footnote1"1          Footnote '1', content given at end of document (see below)
\Sd="sidebar1"the_text    Sidebar link, content given at end of document (see below)
\Cn="Chapter Title"       Chapter title; chapter number n

Many of these tags are used at the beginning and end of the required effect. So, for example, if you want the word bold in the following to be presented using a bold font you must use:-

This is a \bbold\b word

but if you want to employ the current font in a bold style you must use:-

This is a \Bboldened\B word
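
The way a command switches an effect on and then off again can be sketched in Python. The following function turns the paired \b, \B, \i and \u commands into HTML tags. The choice of HTML tags, particularly strong for \B, is an assumption made purely for illustration, as is the name pml_toggles_to_html, and the other PML commands are left untouched:-

```python
import re

# Toggle commands from the table, mapped to an assumed HTML equivalent.
TOGGLES = {"b": "b", "B": "strong", "i": "i", "u": "u"}


def pml_toggles_to_html(pml):
    """Replace paired b, B, i and u commands with opening and closing tags."""
    open_now = set()

    def swap(match):
        command = match.group(1)
        tag = TOGGLES[command]
        if command in open_now:
            # Second occurrence: the effect ends here.
            open_now.remove(command)
            return "</%s>" % tag
        # First occurrence: the effect starts here.
        open_now.add(command)
        return "<%s>" % tag

    return re.sub(r"\\([bBiu])", swap, pml)
```

Given the first example line above, this returns the text with the word bold enclosed in a pair of b tags.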

Footnote and sidebar content is specified in XML form at the end of the document, as in:-

<sidebar id="sidebar1">

Here's the \itext\i contained in the sidebar

</sidebar>
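
A sidebar block of this kind can be located with a simple pattern match. The Python sketch below gathers each sidebar’s content under its id. It assumes the blocks are written exactly as in the example, with the id in double quotes, and the name extract_sidebars is invented for this illustration:-

```python
import re

# One <sidebar id="...">...</sidebar> block; surrounding blank lines are trimmed.
SIDEBAR = re.compile(r'<sidebar id="([^"]+)">\s*(.*?)\s*</sidebar>', re.DOTALL)


def extract_sidebars(pml):
    """Return a dict mapping each sidebar id to its raw PML content."""
    return {name: body for name, body in SIDEBAR.findall(pml)}
```

The content is returned as raw PML, so commands such as \i are still present in the result.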

Converting Plain Text to PML

Any ASCII text file can be converted to PML using the Find & Replace feature found in a text editing application such as BBEdit. This can be applied to your own work or to non-copyright material, such as that published by Project Gutenberg. The following procedure can be used:-

  1. Ensure that your file only contains ASCII characters (those that you can see on your keyboard). If necessary, replace the offending characters by suitable alternatives.
  2. Remove any single - (hyphen) characters from the ends of lines but not a -- (double hyphen).
  3. Remove line-ending CR (carriage return) and/or LF (line feed) codes by replacing, in order:-
    1. any double line ending codes by a special string, such as •••.
    2. the remaining line ending codes by a space.
    3. the special strings, ••• in this case, by double line endings.
  4. Remove all double spaces.
  5. Replace any ` (backquote) characters by a ' (straight quote).
  6. Replace any occurrence of "' (a double-quote followed by a single quote) by "\a160', where \a160 is a special PML code representing NBSP (non-breaking space).
  7. Replace any occurrences of '" (a single-quote followed by a double quote) by '\a160".
  8. Replace ... (run of periods) by \a133, the latter representing an ellipsis in PML.
  9. Replace capitalised words, as appropriate, by normal characters within \i codes, which makes the text italic.
  10. As appropriate, replace words that begin and end with a _ (underscore) by normal characters contained within \u codes, which makes the text underlined.

Note that the replacement of single line endings by a space, as shown in step 3, may not always be appropriate, as, for example, in poetic verse.
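
For anyone who prefers a script to repeated Find & Replace operations, the procedure above can be sketched in Python. This is only an approximation under stated assumptions: step 9 is left out because it needs editorial judgement, step 10 is applied blindly to every pair of underscores on a line, and step 2 is followed literally, so a word that was hyphenated across a line ends up with a space in it. The names text_to_pml and PLACEHOLDER are invented for this example:-

```python
import re

PLACEHOLDER = "\u2022\u2022\u2022"  # the special string used in step 3


def text_to_pml(text):
    """Apply steps 1 to 8 and step 10 of the procedure to a string."""
    # Step 1: raise UnicodeEncodeError if any non-ASCII character is present.
    text.encode("ascii")
    # Treat CR/LF and lone CR line endings as LF before going further.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 2: drop a single hyphen at a line end, but leave a double hyphen.
    text = re.sub(r"(?<!-)-\n", "\n", text)
    # Step 3: keep paragraph breaks, turn the other line endings into spaces.
    text = text.replace("\n\n", PLACEHOLDER)
    text = text.replace("\n", " ")
    text = text.replace(PLACEHOLDER, "\n\n")
    # Step 4: collapse runs of spaces until no double space remains.
    while "  " in text:
        text = text.replace("  ", " ")
    # Step 5: backquote to straight quote.
    text = text.replace("`", "'")
    # Steps 6 and 7: separate adjacent quotes with \a160 (NBSP).
    text = text.replace("\"'", "\"\\a160'")
    text = text.replace("'\"", "'\\a160\"")
    # Step 8: a run of three or more periods becomes \a133 (ellipsis).
    text = re.sub(r"\.{3,}", lambda m: "\\a133", text)
    # Step 10: _word_ becomes \uword\u; this is applied blindly here.
    text = re.sub(r"_([^_\s][^_\n]*)_", lambda m: "\\u%s\\u" % m.group(1), text)
    return text
```

As with the manual method, the output should be checked by eye, particularly for verse and for any text in which underscores aren’t used to mark emphasis.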

Reference

Peanut Press website at www.peanutpress.com.

©Ray White 2004.