Character Coding

In a computer, each kind of character in a string of text must have its own code. A complete set of characters represented by such codes is known as a character set.

The maximum number of characters in a set is determined by the number of bits that are available for each code. Very early computers only used seven bits for each character, an eighth bit being employed for parity error detection. Unfortunately, such 7-bit coding only accommodated 128 characters, although these were fully standardised and known as the ASCII character set (see below).
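As a rough illustration of how the eighth bit can carry parity, the sketch below assumes even parity (the text does not say which kind was used) and places the parity bit in the top bit position. The function names are hypothetical and chosen only for this example.

```python
def add_even_parity(code):
    """Return an 8-bit value: the 7-bit code with an even-parity bit in bit 7."""
    if not 0 <= code <= 127:
        raise ValueError("expected a 7-bit code (0 to 127)")
    ones = bin(code).count("1")
    parity_bit = ones % 2              # 1 only when the count of 1 bits is odd
    return code | (parity_bit << 7)


def parity_ok(byte):
    """True when the received byte contains an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))       # 'C' is hex 43, which has three 1 bits
print(hex(sent))                       # 0xc3
print(parity_ok(sent))                 # True
print(parity_ok(sent ^ 0x04))          # False: a single flipped bit is detected
```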

Later systems employed more sophisticated error handling, allowing all eight bits to be used for character representation. This newer 8-bit coding accommodated 256 characters, although the codes not defined by ASCII aren’t fully standardised. Finally, there’s 16-bit coding (double-byte coding), providing 65,536 characters, including those used in pictographic languages.
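The character counts quoted above follow directly from the number of bits in each code, as this small sketch shows. The helper name is an assumption made for the example.

```python
def code_count(bits):
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits


for bits in (7, 8, 16):
    print(f"{bits}-bit coding: {code_count(bits)} codes")
```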

ASCII Character Set

The American Standard Code for Information Interchange (ASCII) was devised for sending data over communications links. It corresponds to the American National Standards Institute (ANSI) standard X3.4-1986 and is similar to the International Organization for Standardization (ISO) specification ISO 646.

In these standards, each character is given a 7-bit code, with a value between 0 and 127. This accommodates 128 characters, usually those printed on a keyboard, as shown below.

Although the set includes all of the usual letters, numbers and punctuation, it unfortunately excludes many non-English or accented characters and other familiar symbols.

ASCII also defines the codes from 0 to 31 as control codes, each identified by a two or three-letter mnemonic. Examples include HT (Horizontal Tab), CR (Carriage Return), LF (Line Feed) and FF (Form Feed). In practice, most applications in the Classic Mac OS show such characters as a square box, although in some instances they’re also used by the operating system for real-world characters. Many of these codes are ignored, although some devices use the BEL (Bell) code.
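Several of these control codes have escape sequences in most programming languages. The following Python sketch prints the values of a few of them; the dictionary and function names are assumptions made for the example.

```python
# Mnemonics from the text mapped to Python's escape sequences.
CONTROL_CODES = {"BEL": "\a", "HT": "\t", "LF": "\n", "FF": "\f", "CR": "\r"}


def is_control_code(code):
    """True for the codes 0 to 31 that ASCII sets aside as control codes."""
    return 0 <= code <= 31


for mnemonic, char in CONTROL_CODES.items():
    code = ord(char)
    print(f"{mnemonic:<3} decimal {code:>2}  hex {code:02X}  control: {is_control_code(code)}")
```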

The table below shows how ASCII values are encoded in both decimal and hex:

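Hex      00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    Dec   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
00    0 NUL SOH STX ETX EOT ENQ ACK BEL  BS  HT  LF  VT  FF  CR  SO  SI
10   16 DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN  EM SUB ESC  FS  GS  RS  US
20   32       !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
30   48   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40   64   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
50   80   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60   96   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
70  112   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL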

Finding the character represented by a number from this kind of table is quite easy. For example, suppose you need the character represented by decimal 69. Find the row whose starting number is the largest one that doesn’t exceed 69, which is 64 in this case, and then move along to the column whose number increases the value to 69, which in this instance is 5. Hence we deduce that the character is E.
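The same lookup can be expressed as a division by 16, since each row of the table holds 16 codes. This is only a sketch, and the function name is hypothetical.

```python
def locate(code):
    """Return (row start, column) for a code in a 16-column table."""
    row_index, column = divmod(code, 16)
    return row_index * 16, column


row_start, column = locate(69)
print(row_start, column)    # 64 5
print(chr(69))              # E
print(f"{69:02X}")          # 45, the same code in hex
```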

The following tables show the keys used to create these characters in the Classic Mac OS:-

Hex  Dec  Char  Keys Pressed            Hex  Dec  Char  Keys Pressed
00   0    NUL   Not available *         20   32         Space
01   1    SOH   Ctrl-A or Home *        21   33   !     Shift-1
02   2    STX   Ctrl-B *                22   34   "     Shift-'
03   3    ETX   Ctrl-C or Enter *       23   35   #     Option-3 •
04   4    EOT   Ctrl-D or End *         24   36   $     Shift-4
05   5    ENQ   Ctrl-E or Help *        25   37   %     Shift-5
06   6    ACK   Ctrl-F *                26   38   &     Shift-7
07   7    BEL   Ctrl-G *                27   39   '     '
08   8    BS    Ctrl-H or Delete        28   40   (     Shift-9
09   9    HT    Ctrl-I or Tab *         29   41   )     Shift-0
0A   10   LF    Ctrl-J *                2A   42   *     Shift-8
0B   11   VT    Ctrl-K or Page Up *     2B   43   +     Shift-=
0C   12   FF    Ctrl-L or Page Down *   2C   44   ,     ,
0D   13   CR    Ctrl-M or Return        2D   45   -     -
0E   14   SO    Ctrl-N *                2E   46   .     .
0F   15   SI    Ctrl-O *                2F   47   /     /
10   16   DLE   Ctrl-P or F1-F15 *      30   48   0     0
11   17   DC1   Ctrl-Q *                31   49   1     1
12   18   DC2   Ctrl-R *                32   50   2     2
13   19   DC3   Ctrl-S *                33   51   3     3
14   20   DC4   Ctrl-T *                34   52   4     4
15   21   NAK   Ctrl-U *                35   53   5     5
16   22   SYN   Ctrl-V *                36   54   6     6
17   23   ETB   Ctrl-W *                37   55   7     7
18   24   CAN   Ctrl-X *                38   56   8     8
19   25   EM    Ctrl-Y *                39   57   9     9
1A   26   SUB   Ctrl-Z *                3A   58   :     Shift-;
1B   27   ESC   Esc or Clear            3B   59   ;     ;
1C   28   FS    Left Arrow *            3C   60   <     Shift-,
1D   29   GS    Right Arrow *           3D   61   =     =
1E   30   RS    Up Arrow *              3E   62   >     Shift-.
1F   31   US    Down Arrow *            3F   63   ?     Shift-/

* Not all Mac keyboards can produce these codes

• For a British keyboard: Shift-3 is used in the US version of the Mac OS

Hex  Dec  Char  Keys Pressed    Hex  Dec  Char  Keys Pressed
40   64   @     Shift-2         60   96   `     `
41   65   A     Shift-A         61   97   a     A
42   66   B     Shift-B         62   98   b     B
43   67   C     Shift-C         63   99   c     C
44   68   D     Shift-D         64   100  d     D
45   69   E     Shift-E         65   101  e     E
46   70   F     Shift-F         66   102  f     F
47   71   G     Shift-G         67   103  g     G
48   72   H     Shift-H         68   104  h     H
49   73   I     Shift-I         69   105  i     I
4A   74   J     Shift-J         6A   106  j     J
4B   75   K     Shift-K         6B   107  k     K
4C   76   L     Shift-L         6C   108  l     L
4D   77   M     Shift-M         6D   109  m     M
4E   78   N     Shift-N         6E   110  n     N
4F   79   O     Shift-O         6F   111  o     O
50   80   P     Shift-P         70   112  p     P
51   81   Q     Shift-Q         71   113  q     Q
52   82   R     Shift-R         72   114  r     R
53   83   S     Shift-S         73   115  s     S
54   84   T     Shift-T         74   116  t     T
55   85   U     Shift-U         75   117  u     U
56   86   V     Shift-V         76   118  v     V
57   87   W     Shift-W         77   119  w     W
58   88   X     Shift-X         78   120  x     X
59   89   Y     Shift-Y         79   121  y     Y
5A   90   Z     Shift-Z         7A   122  z     Z
5B   91   [     [               7B   123  {     {
5C   92   \     \               7C   124  |     |
5D   93   ]     ]               7D   125  }     }
5E   94   ^     Shift-6         7E   126  ~     Shift-`
5F   95   _     Shift--         7F   127  DEL   Forward Delete

Mac OS Character Sets

The Mac OS, in common with other modern computer systems, uses 8-bit codes, providing 256 characters in the set, of which 0 to 127 are used for standard ASCII characters. The remainder, 128 to 255, are used for special characters and are coded to match the Mac’s own operating system.

The set used on your machine depends on the script used to represent your language. The following table shows the Mac character sets that are used for each kind of script:-

Script            Set             Script      Set
Western (Roman)   MacRoman        Greek       MacGreek
Central European  MacCE           Gujarati    MacGujarati
Arabic            MacArabic       Hebrew      MacHebrew
Croatian          MacCroatian     Icelandic   MacIcelandic
Cyrillic          MacCyrillic     Romanian    MacRomanian
Devanagari        MacDevanagari   Turkish     MacTurkish
Farsi             MacFarsi        Ukrainian   MacUkrainian

The Roman character set, as used in the Mac OS for western languages, is shown below:- 

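Hex      00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    Dec   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
00    0 NUL SOH STX ETX EOT ENQ ACK BEL  BS  HT  LF  VT  FF  CR  SO  SI
10   16 DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN  EM SUB ESC  FS  GS  RS  US
20   32       !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
30   48   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40   64   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
50   80   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60   96   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
70  112   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL
80  128   Ä   Å   Ç   É   Ñ   Ö   Ü   á   à   â   ä   ã   å   ç   é   è
90  144   ê   ë   í   ì   î   ï   ñ   ó   ò   ô   ö   õ   ú   ù   û   ü
A0  160   †   °   ¢   £   §   •   ¶   ß   ®   ©   ™   ´   ¨   ≠   Æ   Ø
B0  176   ∞   ±   ≤   ≥   ¥   µ   ∂   ∑   ∏   π   ∫   ª   º   Ω   æ   ø
C0  192   ¿   ¡   ¬   √   ƒ   ≈   ∆   «   »   …       À   Ã   Õ   Œ   œ
D0  208   –   —   “   ”   ‘   ’   ÷   ◊   ÿ   Ÿ   ⁄   €   ‹   ›   ﬁ   ﬂ
E0  224   ‡   ·   ‚   „   ‰   Â   Ê   Á   Ë   È   Í   Î   Ï   Ì   Ó   Ô
F0  240       Ò   Ú   Û   Ù   ı   ˆ   ˜   ¯   ˘   ˙   ˚   ¸   ˝   ˛   ˇ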

As you can see, the characters up to 127 (hex 7F) are the same as the standard ASCII character set. In order to obtain some of the extra characters you must press a deadkey followed by a second key. For example, to get the character Ü you must press Option-U followed by Shift-U.
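The relationship between the codes and characters can be checked in Python, whose standard library includes a codec named mac_roman for this character set. The sketch below decodes hex 86, confirms that the printable ASCII range is unchanged, and shows that the same character has a different code in Windows 1252 (covered later in this document).

```python
raw = bytes([0x86])                       # hex 86, decimal 134
print(raw.decode("mac_roman"))            # Ü

# The printable ASCII codes decode identically under both codecs.
ascii_range = bytes(range(0x20, 0x7F))
print(ascii_range.decode("mac_roman") == ascii_range.decode("ascii"))   # True

print("Ü".encode("mac_roman").hex())      # 86
print("Ü".encode("cp1252").hex())         # dc
```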

To add to the confusion, some Mac fonts contain non-standard characters. For example, older versions of the Geneva font have a curious sheep character that changes into a rabbit in the larger font sizes. Similarly, some fonts are designed to work as a system font with later versions of the Classic Mac OS, providing the special symbols that appear in pull-down menus. The diagram below shows how such a font uses codes from 1 to 31 to represent such characters:-

whilst the following kind of font, intended for older versions of the Mac OS, uses higher codes:-

The required key combinations for non-ASCII Mac characters are given in the following tables:-

Hex  Dec  Char  Keys Pressed            Hex  Dec  Char  Keys Pressed
80   128  Ä     Option-U, Shift-A       A0   160  †     Option-T
81   129  Å     Shift-Option-A          A1   161  °     Shift-Option-8
82   130  Ç     Shift-Option-C          A2   162  ¢     Option-4
83   131  É     Option-E, Shift-E       A3   163  £     Shift-3 •
84   132  Ñ     Option-N, Shift-N       A4   164  §     Option-6
85   133  Ö     Option-U, Shift-O       A5   165  •     Option-8
86   134  Ü     Option-U, Shift-U       A6   166  ¶     Option-7
87   135  á     Option-E, A             A7   167  ß     Option-S
88   136  à     Option-`, A             A8   168  ®     Option-R
89   137  â     Option-I, A             A9   169  ©     Option-G
8A   138  ä     Option-U, A             AA   170  ™     Option-2
8B   139  ã     Option-N, A             AB   171  ´     Option-E, Space
8C   140  å     Option-A                AC   172  ¨     Option-U, Space
8D   141  ç     Option-C                AD   173  ≠     Option-=
8E   142  é     Option-E, E             AE   174  Æ     Shift-Option-'
8F   143  è     Option-`, E             AF   175  Ø     Shift-Option-O
90   144  ê     Option-I, E             B0   176  ∞     Option-5
91   145  ë     Option-U, E             B1   177  ±     Shift-Option-=
92   146  í     Option-E, I             B2   178  ≤     Option-,
93   147  ì     Option-`, I             B3   179  ≥     Option-.
94   148  î     Option-I, I             B4   180  ¥     Option-Y
95   149  ï     Option-U, I             B5   181  µ     Option-M
96   150  ñ     Option-N, N             B6   182  ∂     Option-D
97   151  ó     Option-E, O             B7   183  ∑     Option-W
98   152  ò     Option-`, O             B8   184  ∏     Shift-Option-P
99   153  ô     Option-I, O             B9   185  π     Option-P
9A   154  ö     Option-U, O             BA   186  ∫     Option-B
9B   155  õ     Option-N, O             BB   187  ª     Option-9
9C   156  ú     Option-E, U             BC   188  º     Option-0
9D   157  ù     Option-`, U             BD   189  Ω     Option-Z
9E   158  û     Option-I, U             BE   190  æ     Option-'
9F   159  ü     Option-U, U             BF   191  ø     Option-O

• For a British keyboard: Option-3 is used in the US version of the Mac OS

Hex  Dec  Char  Keys Pressed            Hex  Dec  Char  Keys Pressed
C0   192  ¿     Shift-Option-/          E0   224  ‡     Shift-Option-7
C1   193  ¡     Option-1                E1   225  ·     Shift-Option-9
C2   194  ¬     Option-L                E2   226  ‚     Shift-Option-0
C3   195  √     Option-V                E3   227  „     Shift-Option-W
C4   196  ƒ     Option-F                E4   228  ‰     Shift-Option-E
C5   197  ≈     Option-X                E5   229  Â     Shift-Option-R #
C6   198  ∆     Option-J                E6   230  Ê     Shift-Option-T #
C7   199  «     Option-\                E7   231  Á     Shift-Option-Y #
C8   200  »     Shift-Option-\          E8   232  Ë     Shift-Option-U #
C9   201  …     Option-;                E9   233  È     Shift-Option-I #
CA   202  NBSP  Option-Space            EA   234  Í     Shift-Option-S #
CB   203  À     Option-`, Shift-A       EB   235  Î     Shift-Option-D #
CC   204  Ã     Option-N, Shift-A       EC   236  Ï     Shift-Option-F #
CD   205  Õ     Option-N, Shift-O       ED   237  Ì     Shift-Option-G #
CE   206  Œ     Shift-Option-Q          EE   238  Ó     Shift-Option-H #
CF   207  œ     Option-Q                EF   239  Ô     Shift-Option-J #
D0   208  –     Option--                F0   240        Shift-Option-K
D1   209  —     Shift-Option--          F1   241  Ò     Shift-Option-L #
D2   210  “     Option-[                F2   242  Ú     Shift-Option-; #
D3   211  ”     Shift-Option-[          F3   243  Û     Shift-Option-Z #
D4   212  ‘     Option-]                F4   244  Ù     Shift-Option-X #
D5   213  ’     Shift-Option-]          F5   245  ı     Shift-Option-B
D6   214  ÷     Option-/                F6   246  ˆ     Shift-Option-N
D7   215  ◊     Shift-Option-V          F7   247  ˜     Shift-Option-M
D8   216  ÿ     Option-U, Y             F8   248  ¯     Shift-Option-,
D9   217  Ÿ     Shift-Option-` #        F9   249  ˘     Shift-Option-.
DA   218  ⁄     Shift-Option-1          FA   250  ˙     Option-H
DB   219  €     Shift-Option-2          FB   251  ˚     Option-K
DC   220  ‹     Shift-Option-3          FC   252  ¸     *
DD   221  ›     Shift-Option-4          FD   253  ˝     *
DE   222  ﬁ     Shift-Option-5          FE   254  ˛     *
DF   223  ﬂ     Shift-Option-6          FF   255  ˇ     *

NBSP Non-breaking space

* No key combination available

# Character also available via a deadkey sequence

Windows Character Sets

The Windows operating system also uses its own character sets, as shown in the following table:-

Script            Set    Script      Set
Western           1252   Greek       1253
Central European  1250   Hebrew      1255
Arabic            1256   Turkish     1254
Baltic            1257   Vietnamese  1258
Cyrillic          1251
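To see why the choice of set matters, the sketch below decodes one and the same byte under three of these sets. It assumes Python, whose standard codecs for the Windows sets are named cp1250 to cp1258.

```python
raw = bytes([0xE1])    # hex E1, decimal 225

for code_page in ("cp1252", "cp1251", "cp1253"):
    # The same code yields a Latin, Cyrillic or Greek letter.
    print(code_page, raw.decode(code_page))
```

The three lines show á, б and α respectively, so text saved under one set appears as nonsense when read under another.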

Most western countries, excluding central Europe, employ the Windows 1252 set, which is essentially based on the ISO 8859-1 standard (see below). The 1252 character set is shown below:-

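Hex      00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    Dec   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
00    0 NUL SOH STX ETX EOT ENQ ACK BEL  BS  HT  LF  VT  FF  CR  SO  SI
10   16 DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN  EM SUB ESC  FS  GS  RS  US
20   32       !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
30   48   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40   64   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
50   80   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60   96   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
70  112   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~ DEL
80  128   €       ‚   ƒ   „   …   †   ‡   ˆ   ‰   Š   ‹   Œ       Ž
90  144       ‘   ’   “   ”   •   –   —   ˜   ™   š   ›   œ       ž   Ÿ
A0  160       ¡   ¢   £   ¤   ¥   ¦   §   ¨   ©   ª   «   ¬   -   ®   ¯
B0  176   °   ±   ²   ³   ´   µ   ¶   ·   ¸   ¹   º   »   ¼   ½   ¾   ¿
C0  192   À   Á   Â   Ã   Ä   Å   Æ   Ç   È   É   Ê   Ë   Ì   Í   Î   Ï
D0  208   Ð   Ñ   Ò   Ó   Ô   Õ   Ö   ×   Ø   Ù   Ú   Û   Ü   Ý   Þ   ß
E0  224   à   á   â   ã   ä   å   æ   ç   è   é   ê   ë   ì   í   î   ï
F0  240   ð   ñ   ò   ó   ô   õ   ö   ÷   ø   ù   ú   û   ü   ý   þ   ÿ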
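A row of this kind of table can be regenerated by decoding each code in turn. The following is a minimal sketch using Python's cp1252 codec; the function name is hypothetical, and codes that the codec leaves undefined are returned as empty strings.

```python
def table_row(row_start, encoding="cp1252"):
    """Return the 16 characters of one table row; '' marks an undefined code."""
    cells = []
    for code in range(row_start, row_start + 16):
        try:
            cells.append(bytes([code]).decode(encoding))
        except UnicodeDecodeError:
            cells.append("")
    return cells


print("".join(table_row(0xC0)))     # ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
print(table_row(0x80)[0])           # €
```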

ISO 8859 Character Sets

The International Organization for Standardization (ISO) has defined several 8-bit character sets, mainly intended for transferring textual information between different computer platforms. The common ISO 8859 standard defines the following character sets:-

Script            Set          Script           Set
Western           ISO-8859-1   Celtic           ISO-8859-14
Western           ISO-8859-15  Cyrillic         ISO-8859-5
Central European  ISO-8859-2   Greek            ISO-8859-7
South European    ISO-8859-3   Hebrew           ISO-8859-8-I
Arabic            ISO-8859-6   Hebrew (Visual)  ISO-8859-8
Baltic            ISO-8859-4   Nordic           ISO-8859-10
Baltic            ISO-8859-13  Turkish          ISO-8859-9
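Even the two Western sets listed above differ in a few codes. As a small illustration, this Python sketch decodes hex A4 under each of them, using the codec names that Python accepts for these sets.

```python
raw = bytes([0xA4])    # hex A4, decimal 164

print(raw.decode("iso-8859-1"))     # ¤ (the general currency sign)
print(raw.decode("iso-8859-15"))    # € (the euro sign)
```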

The ISO 8859-1 character set, also known as the Latin-1 set, is employed to represent the characters in many western languages, excluding those of central Europe. Fortunately, this set is basically the same as Windows 1252 (see above), although the codes from 128 to 159 aren’t included in the standard, since they’re commonly used in alternative character sets, as shown below:-

Hex  000102030405060708090A0B0C0D0E0F
 Dec0123456789101112131415
80128^ŠŒ
90144ı`´^˜¯˘˙¨šœ˝˛Ÿ
Hex  000102030405060708090A0B0C0D0E0F
 Dec0123456789101112131415
80128 à^äŠÎ
90144ı`´^˜¯˘˙¨ªº¸˝˛ˇ

These values should always be avoided when using ISO 8859-1 coding. Instead, the characters represented by these codes must be conveyed using 16-bit values, usually in the form of Unicode.
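The advice above can be sketched in Python as follows. The function name is an assumption made for the example; the codecs are those supplied with the standard library. The euro sign is used because it occupies code 128 in Windows 1252 but has no code at all in ISO 8859-1.

```python
def uses_reserved_codes(data):
    """True if any byte falls in the range 128 to 159 (hex 80 to 9F)."""
    return any(0x80 <= byte <= 0x9F for byte in data)


text = "€5"
windows_bytes = text.encode("cp1252")
print(uses_reserved_codes(windows_bytes))    # True: the euro sign is code 128

try:
    text.encode("iso-8859-1")
except UnicodeEncodeError:
    print("no ISO 8859-1 code for the euro sign")

# As 16-bit Unicode values the text is unambiguous.
print(text.encode("utf-16-be").hex())        # 20ac0035
```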

©Ray White 2004.