Character Coding

In a computer, each kind of character in a string of text must have its own code. A complete set of characters represented by such codes is known as a character set.

The maximum number of characters in a set is determined by the number of bits that are available for each code. Very early computers only used seven bits for each character, an eighth bit being employed for parity error detection. Unfortunately, such 7-bit coding only accommodated 128 characters, although these were fully standardised and known as the ASCII character set (see below).

Later systems employed more sophisticated error handling, allowing all eight bits to be used for character representation. This newer 8-bit coding accommodated 256 characters, although the codes not defined by ASCII aren’t fully standardised. Finally, there’s 16-bit coding (double-byte coding), providing 65,536 characters, including those used in pictographic languages.
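The arithmetic behind these figures can be sketched in a few lines of Python. The function names are purely illustrative, and the use of even parity is an assumption, since the text above doesn’t say which parity scheme a given machine employed.

```python
def capacity(bits: int) -> int:
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits


def add_even_parity(code: int) -> int:
    """Pack a 7-bit code into 8 bits, using the top bit for even parity.

    Assumption: even parity, so the eighth bit is chosen to make the
    total number of 1 bits in the byte even.
    """
    if not 0 <= code < 128:
        raise ValueError("code must fit in seven bits")
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


for bits in (7, 8, 16):
    print(bits, "bits ->", capacity(bits), "codes")

# 'A' is 65 (binary 1000001): two 1 bits, so the parity bit stays clear.
print(format(add_even_parity(ord("A")), "08b"))
# 'C' is 67 (binary 1000011): three 1 bits, so the parity bit is set.
print(format(add_even_parity(ord("C")), "08b"))
```

A receiver using the same convention would count the 1 bits in each byte and treat an odd total as a transmission error.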

ASCII Character Set

The American Standard Code for Information Interchange (ASCII) was devised for sending data over communications links. It corresponds to the American National Standards Institute (ANSI) standard X3.4-1986 and is similar to the International Organization for Standardization (ISO) specification ISO 646.

In these standards, each character is given a 7-bit code, with a value between 0 and 127. This accommodates 128 characters, usually those printed on a keyboard, as shown below.

Although the set includes all of the usual letters, numbers and punctuation, it unfortunately excludes many non-English or accented characters and other familiar symbols.

ASCII also defines the codes from 0 to 31 as control codes, each identified by a two or three-letter mnemonic. Examples include HT (Horizontal Tab), CR (Carriage Return), LF (Line Feed) and FF (Form Feed). In practice, most applications in the Classic Mac OS show such characters as a square box, although in some instances they’re also used by the operating system for real-world characters. Many of these codes are often ignored, although some devices use the BEL (Bell) code.
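As a rough illustration, the control codes named above can be listed with their decimal and hex values in Python. The dictionary and helper names are hypothetical and chosen only for this sketch.

```python
# A few of the control codes named above, keyed by mnemonic.
CONTROL_CODES = {
    "BEL": 7,   # Bell
    "HT": 9,    # Horizontal Tab
    "LF": 10,   # Line Feed
    "FF": 12,   # Form Feed
    "CR": 13,   # Carriage Return
}


def is_ascii_control(code: int) -> bool:
    """True for the control codes 0 to 31 described in the text."""
    return 0 <= code <= 31


def is_ascii(code: int) -> bool:
    """True for any value that fits in the 7-bit range 0 to 127."""
    return 0 <= code <= 127


for name, code in CONTROL_CODES.items():
    print(f"{name:<3} decimal {code:>2}  hex {code:02X}")

# Python's own escape sequences map onto the same codes.
print(ord("\t") == CONTROL_CODES["HT"])
# An unaccented letter is within ASCII; an accented one is not.
print(is_ascii(ord("A")), is_ascii(ord("é")))
```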

The table below shows how ASCII values are encoded in both decimal and hex:

ASCII Codes

Finding the character represented by a number from this kind of table is actually quite easy. For example, suppose you need the character represented by decimal 69. Simply find the row containing the nearest number, 64 in this case, and then move along to a column that increases the value to 69, which in this instance is 5. Hence we deduce that the character is E.
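The same lookup can be sketched in Python. This assumes a table laid out in rows of 16 codes, which fits the example above but isn’t confirmed by the text; `table_position` is an illustrative name.

```python
COLUMNS = 16  # assumed table width, matching one hex digit per column


def table_position(code: int) -> tuple[int, int]:
    """Return (row start, column offset) for a code in a 16-column table."""
    row, column = divmod(code, COLUMNS)
    return row * COLUMNS, column


row_start, column = table_position(69)
print(row_start, column)          # the row beginning at 64, column 5
print(chr(row_start + column))    # the character at that position
```

Python’s built-in `chr` and `ord` functions perform the conversion directly, so `chr(69)` gives the same character without consulting a table.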

The following tables show the keys used to create these characters in the Classic Mac OS:-

Classic Mac Keys 1

* Not all Mac keyboards can produce these codes

For a British keyboard: Shift-3 is used in the US version of the Mac OS

Classic Mac Keys 2

Mac OS Character Sets

The Mac OS, in common with other modern computer systems, uses 8-bit codes, providing 256 characters in the set, of which 0 to 127 are used for standard ASCII characters. The remainder, 128 to 255, are used for special characters and are coded to match the Mac’s own operating system.

The set used on your machine depends on the script used to represent your language. The following table shows the Mac character sets that are used for each kind of script:-

Script              Set
Western (Roman)     MacRoman
Central European    MacCE
Arabic              MacArabic
Croatian            MacCroatian
Cyrillic            MacCyrillic
Devanagari          MacDevanagari
Farsi               MacFarsi
Greek               MacGreek
Gujarati            MacGujarati
Hebrew              MacHebrew
Icelandic           MacIcelandic
Romanian            MacRomanian
Turkish             MacTurkish
Ukrainian           MacUkrainian
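Because each script has its own set, a code above 127 can stand for a different character depending on which set is in use. The sketch below uses Python’s built-in `mac_roman` and `mac_cyrillic` codecs, which are Python’s names for MacRoman and MacCyrillic; `decode_byte` is an illustrative helper.

```python
def decode_byte(value: int, codec: str) -> str:
    """Decode a single byte value using the named Python codec."""
    return bytes([value]).decode(codec)


# The same code, 128 (hex 80), read with two different Mac character sets.
for codec in ("mac_roman", "mac_cyrillic"):
    char = decode_byte(0x80, codec)
    print(f"{codec}: code 128 -> {char} (U+{ord(char):04X})")
```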

The Roman character set, as used in the Mac OS for western languages, is shown below:-

Mac OS Roman Character Set

As you can see, the characters up to 127 (hex 7F) are the same as the standard ASCII character set. In order to obtain some of the extra characters you must press a deadkey followed by a second key. For example, to get the character Ü you must press Option-U followed by Shift-U.
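Both points can be checked with Python’s `mac_roman` codec, which is assumed here to match the Mac OS Roman set described above. The variable names are illustrative.

```python
low_codes = bytes(range(128))

# Every code from 0 to 127 decodes to the same character in both sets.
same_as_ascii = low_codes.decode("mac_roman") == low_codes.decode("ascii")
print(same_as_ascii)

# Above 127 the Mac set uses its own assignments.
mac_code = "Ü".encode("mac_roman")[0]
print(f"Mac OS Roman code for Ü: decimal {mac_code}, hex {mac_code:02X}")
```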

To add to the confusion, some Mac fonts contain non-standard characters. For example, older versions of the Geneva font have a curious sheep character that changes into a rabbit in the larger font sizes. Similarly, some fonts are designed to work as a system font with later versions of the Classic Mac OS, providing the special symbols that appear in pull-down menus. The diagram below shows how such a font uses codes from 1 to 31 to represent such characters:-

whilst the following kind of font, intended for older versions of the Mac OS, uses higher codes:-

The required key combinations for non-ASCII Mac characters are given in the following tables:-

Mac OS Non-ASCII Keys 1
For a British keyboard: Option-3 is used in the US version of the Mac OS
Mac OS Non-ASCII Keys 2

NBSP Non-breaking space

* No key combination available

# Character also available via a deadkey sequence

Windows Character Sets

The Windows operating system also uses its own character sets, as shown in the following table:-

Script              Set
Western             1252
Central European    1250
Arabic              1256
Baltic              1257
Cyrillic            1251
Greek               1253
Hebrew              1255
Turkish             1254
Vietnamese          1258
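Reading text with the wrong set produces the wrong characters, which is why the set must be known. The following sketch uses Python’s `cp1251` and `cp1252` codecs for the Cyrillic and Western sets; the sample word is only an illustration.

```python
original = "Привет"  # Russian for "hello"

raw = original.encode("cp1251")    # codes as stored using the Cyrillic set
misread = raw.decode("cp1252")     # the same codes read using the Western set
recovered = raw.decode("cp1251")   # the same codes read using the correct set

print(raw.hex(" "))
print(misread)
print(recovered == original)
```

The codes themselves never change; only their interpretation does.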

Most western countries, excluding central Europe, employ the Windows 1252 set, which is essentially based on the ISO 8859-1 standard (see below). The 1252 character set is shown below:-

Windows Character Set

ISO 8859 Character Sets

The International Organization for Standardization (ISO) has defined several 8-bit character sets, mainly intended for transferring textual information between different computer platforms. The common ISO 8859 standard defines the following character sets:-

Script              Set
Western             ISO-8859-1
Western             ISO-8859-15
Central European    ISO-8859-2
South European      ISO-8859-3
Arabic              ISO-8859-6
Baltic              ISO-8859-4
Baltic              ISO-8859-13
Celtic              ISO-8859-14
Cyrillic            ISO-8859-5
Greek               ISO-8859-7
Hebrew              ISO-8859-8-I
Hebrew (Visual)     ISO-8859-8
Nordic              ISO-8859-10
Turkish             ISO-8859-9

The ISO 8859-1 character set, also known as the Latin-1 set, is employed to represent the characters in many western languages, excluding those of central Europe. Fortunately, this set is basically the same as Windows 1252 (see above), although the codes from 128 to 159 aren’t included in the standard, since they’re commonly used in alternative character sets, as shown below:-

ISO Character Set
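The relationship between the two sets can be checked with a short Python sketch, using Python’s `cp1252` and `latin-1` codecs as stand-ins for Windows 1252 and ISO 8859-1. Note that Python’s `latin-1` codec passes codes 128 to 159 through as control characters rather than rejecting them, and `decode_or_none` is an illustrative helper.

```python
def decode_or_none(value: int, codec: str):
    """Decode one byte, returning None where the codec has no mapping."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


# Codes 160 to 255 give the same characters in both sets.
same_high = all(
    decode_or_none(v, "cp1252") == decode_or_none(v, "latin-1")
    for v in range(160, 256)
)
print(same_high)

# Codes 128 to 159 are where the two sets part company.
for v in (0x80, 0x85, 0x99):
    win = decode_or_none(v, "cp1252")
    iso = decode_or_none(v, "latin-1")
    print(f"{v}: cp1252 {win!r}, latin-1 {iso!r}")
```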

These values should always be avoided when using ISO 8859-1 coding. Instead, the characters represented by these codes must be conveyed using 16-bit values, usually in the form of Unicode.
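As a minimal sketch of this, the euro sign occupies code 128 in Windows 1252 but has no code in ISO 8859-1, so it must be carried as a Unicode value instead. Python’s `cp1252`, `latin-1` and `utf-16-be` codecs are used here for illustration.

```python
euro = bytes([0x80]).decode("cp1252")   # Windows 1252 code 128

print(f"U+{ord(euro):04X}")             # its Unicode code point
utf16 = euro.encode("utf-16-be")        # the same character as a 16-bit value
print(utf16.hex())

# ISO 8859-1 has no code for this character, so encoding fails.
try:
    euro.encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```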

©Ray White 2004.