Unicode

Normal 8-bit coding provides 256 characters in a standard character set, which is insufficient for all the special symbols, punctuation and accented characters used in various languages. And it’s certainly inadequate for the vast range of symbols used in pictographic languages such as Chinese or Japanese.

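Unicode, which is kept in step with the ISO/IEC 10646 standard, was originally built around 16-bit codes, giving up to 65,536 characters. The standard has since grown beyond this limit, but the 16-bit range described here still holds the characters in everyday use. These are grouped as shown in the following table:-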

From     To       Usage
0        8191     Alphabetic Characters (0-255 as ISO)
8192     12287    Alphabetic punctuation, symbols, dingbats
12288    16383    Pictographic, auxiliary alphabets, punctuation
16384    59391    Pictographic characters
59392    65024    Special
65025    65535    Software development
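
The grouping can be turned into a simple lookup. The following is only a sketch: the names `GROUPS` and `usage_for_code` are invented for this example, and the boundaries are copied from the table exactly as printed.

```python
from bisect import bisect_right

# (first code, usage) pairs copied from the table of groups.
GROUPS = [
    (0, "Alphabetic characters (0-255 as ISO)"),
    (8192, "Alphabetic punctuation, symbols, dingbats"),
    (12288, "Pictographic, auxiliary alphabets, punctuation"),
    (16384, "Pictographic characters"),
    (59392, "Special"),
    (65025, "Software development"),
]
STARTS = [start for start, _ in GROUPS]


def usage_for_code(code: int) -> str:
    """Return the usage group for a 16-bit code (hypothetical helper)."""
    if not 0 <= code <= 65535:
        raise ValueError("code must be in the range 0 to 65535")
    # bisect_right finds the first group that starts after `code`;
    # the group we want is the one just before it.
    return GROUPS[bisect_right(STARTS, code) - 1][1]


print(usage_for_code(ord("A")))   # a Latin letter
print(usage_for_code(0x20AC))     # the euro symbol
```

Keeping only the first code of each group means the list stays sorted and a binary search is enough, with no need to store the end of each range.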

Although all of these Unicode codes are fully standardised, many applications or computer operating systems are limited to showing only some of the characters.

Further details regarding some of these groups appear in the following sections.

Codes 0 to 255: Alphabetic Characters and Punctuation

The characters generated by codes 0 to 127 are identical to those defined by the ASCII standard, and the full range from 0 to 255 matches the ISO 8859-1 standard, also known as Latin-1. This makes it easy to convert material that’s coded using a western-based character set, often known as Roman, into Unicode form.
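
As a small illustration of why this conversion is easy, the sketch below uses Python's built-in `latin-1` codec (Python is simply the language chosen for the examples here, not something the original material depends on). Each 8-bit value decodes to the Unicode code with the same number.

```python
# All 256 possible byte values, treated as ISO 8859-1 (Latin-1) text.
latin1_bytes = bytes(range(256))

# Decoding Latin-1 maps each byte value to the Unicode code of the same number.
text = latin1_bytes.decode("latin-1")
codes = [ord(ch) for ch in text]

print(codes == list(range(256)))      # True
print("caf\xe9".encode("latin-1"))    # b'caf\xe9'
```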

The ISO 8859-1 character set is as follows:-

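Hex  Dec  00   01   02   03   04   05   06   07   08   09   0A   0B   0C   0D   0E   0F
          0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
00   0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
10   16   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
20   32   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
30   48   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
40   64   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
50   80   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
60   96   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
70   112  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
80   128  (no printable characters)
90   144  (no printable characters)
A0   160  NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B0   176  °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C0   192  À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D0   208  Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E0   224  à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F0   240  ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ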

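The codes from 128 to 159 should be avoided. ISO 8859-1 reserves them for control codes rather than printable characters, and some systems, such as the Windows Western code page, use them for non-standard characters instead.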
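
The risk with this range can be seen by decoding the same bytes in two ways. The sketch below uses Python's `latin-1` and `cp1252` codecs; Windows code page 1252 is chosen here only as one familiar example of a character set that puts printable characters in this range.

```python
raw = bytes([0x80, 0x93, 0x94])      # three bytes from the 128-159 range

as_latin1 = raw.decode("latin-1")    # ISO 8859-1: invisible control codes
as_cp1252 = raw.decode("cp1252")     # Windows-1252: euro sign and curly quotes

print([hex(ord(ch)) for ch in as_latin1])   # ['0x80', '0x93', '0x94']
print([hex(ord(ch)) for ch in as_cp1252])   # ['0x20ac', '0x201c', '0x201d']
```

The same three bytes end up as quite different Unicode codes, which is why material containing them needs its original character set to be known before conversion.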

Codes 256 to 912: Special Characters and Accents

This area is used for less common accented characters, many of which appear with an accent separate from the letter itself, as shown below. Characters that can’t be displayed by your browser appear as a ? (query) or as a keyboard button symbol.

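Hex  Dec  00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
          0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
100  256  Ā   ā   Ă   ă   Ą   ą   Ć   ć   Ĉ   ĉ   Ċ   ċ   Č   č   Ď   ď
110  272  Đ   đ   Ē   ē   Ĕ   ĕ   Ė   ė   Ę   ę   Ě   ě   Ĝ   ĝ   Ğ   ğ
120  288  Ġ   ġ   Ģ   ģ   Ĥ   ĥ   Ħ   ħ   Ĩ   ĩ   Ī   ī   Ĭ   ĭ   Į   į
130  304  İ   ı   Ĳ   ĳ   Ĵ   ĵ   Ķ   ķ   ĸ   Ĺ   ĺ   Ļ   ļ   Ľ   ľ   Ŀ
140  320  ŀ   Ł   ł   Ń   ń   Ņ   ņ   Ň   ň   ŉ   Ŋ   ŋ   Ō   ō   Ŏ   ŏ
150  336  Ő   ő   Œ   œ   Ŕ   ŕ   Ŗ   ŗ   Ř   ř   Ś   ś   Ŝ   ŝ   Ş   ş
160  352  Š   š   Ţ   ţ   Ť   ť   Ŧ   ŧ   Ũ   ũ   Ū   ū   Ŭ   ŭ   Ů   ů
170  368  Ű   ű   Ų   ų   Ŵ   ŵ   Ŷ   ŷ   Ÿ   Ź   ź   Ż   ż   Ž   ž   ſ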

Of these, and other codes in this group, the following are the most useful:-

Hex  Dec  Description     Char    Hex  Dec  Description        Char
152  338  OE ligature     Œ       2C6  710  Letter circumflex  ˆ
153  339  oe ligature     œ       2D8  728  Breve accent       ˘
160  352  S caron         Š       2D9  729  Dot accent         ˙
161  353  s caron         š       2DA  730  Ring accent        ˚
178  376  Y diaeresis     Ÿ       2DB  731  Ogonek             ˛
192  402  Small hook f    ƒ       2DC  732  Small tilde        ˜

The remaining codes in this block are assigned to other obscure characters.
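
The hex and decimal columns in these tables are two ways of writing the same code. The sketch below shows the conversions in Python, along with HTML numeric character references, on the assumption that the codes are being used in web pages; `html.unescape` is part of Python's standard library.

```python
import html

code = 0x152                       # hex 152 from the table

print(code)                        # 338, the decimal form
print(chr(code))                   # the OE ligature itself
print(hex(ord("\u0161")))          # 0x161, the code for s caron

# A web page can refer to the same character by decimal or hex number.
print(html.unescape("&#338;") == chr(code))    # True
print(html.unescape("&#x152;") == chr(code))   # True
```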

Codes 913 to 982: Greek Characters

These characters are often used in maths and other applications. For simplicity, unassigned codes have been omitted from the following table.

Hex  Dec  Description      Char    Hex  Dec  Description               Char
391  913  Capital alpha    Α       3B4  948  Small delta               δ
392  914  Capital beta     Β       3B5  949  Small epsilon             ε
393  915  Capital gamma    Γ       3B6  950  Small zeta                ζ
394  916  Capital delta    Δ       3B7  951  Small eta                 η
395  917  Capital epsilon  Ε       3B8  952  Small theta               θ
396  918  Capital zeta     Ζ       3B9  953  Small iota                ι
397  919  Capital eta      Η       3BA  954  Small kappa               κ
398  920  Capital theta    Θ       3BB  955  Small lambda              λ
399  921  Capital iota     Ι       3BC  956  Small mu                  μ
39A  922  Capital kappa    Κ       3BD  957  Small nu                  ν
39B  923  Capital lambda   Λ       3BE  958  Small xi                  ξ
39C  924  Capital mu       Μ       3BF  959  Small omicron             ο
39D  925  Capital nu       Ν       3C0  960  Small pi                  π
39E  926  Capital xi       Ξ       3C1  961  Small rho                 ρ
39F  927  Capital omicron  Ο       3C2  962  Small final sigma         ς
3A0  928  Capital pi       Π       3C3  963  Small sigma               σ
3A1  929  Capital rho      Ρ       3C4  964  Small tau                 τ
3A3  931  Capital sigma    Σ       3C5  965  Small upsilon             υ
3A4  932  Capital tau      Τ       3C6  966  Small phi                 φ
3A5  933  Capital upsilon  Υ       3C7  967  Small chi                 χ
3A6  934  Capital phi      Φ       3C8  968  Small psi                 ψ
3A7  935  Capital chi      Χ       3C9  969  Small omega               ω
3A8  936  Capital psi      Ψ       3D1  977  Small theta symbol        ϑ
3A9  937  Capital omega    Ω       3D2  978  Upsilon with hook symbol  ϒ
3B1  945  Small alpha      α       3D5  981  Phi symbol                ϕ
3B2  946  Small beta       β       3D6  982  Pi symbol                 ϖ
3B3  947  Small gamma      γ
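
A list like this can be produced by stepping through a range of codes and skipping any that have no assigned character. The sketch below does so with Python's `unicodedata` module; the helper name `assigned_in_range` is invented for this example, and the names it prints are the official Unicode ones rather than the shorter descriptions used in the table.

```python
import unicodedata


def assigned_in_range(first: int, last: int) -> list[tuple[int, str, str]]:
    """List (code, character, official name) for assigned codes in a range."""
    rows = []
    for code in range(first, last + 1):
        ch = chr(code)
        # unicodedata.name returns the default (here None) when the code
        # has no name in Python's copy of the Unicode database.
        name = unicodedata.name(ch, None)
        if name is not None:
            rows.append((code, ch, name))
    return rows


greek = assigned_in_range(0x391, 0x3A9)   # capital alpha to capital omega
for code, ch, name in greek:
    print(f"{code:X}  {code}  {ch}  {name}")
```

Code 3A2 (930) does not appear in the output, which matches the gap between capital rho and capital sigma in the table.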

Codes 8192 to 12287: Special Characters and Punctuation

These codes are used for rather less common characters and punctuation. The following table only shows the more usual characters, with numerous rows omitted for clarity. Some codes don’t appear to create any visible character but are in fact used for a range of different types of spaces. Those characters that can’t be displayed by your browser are indicated by a ? or by a keyboard button symbol.

Hex   Dec   00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
            0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
2000  8192
2010  8208
2020  8224
20A0  8352
2110  8464
2120  8480
2130  8496
2190  8592
21B0  8624
21D0  8656
2210  8720
2230  8752
2240  8768
2260  8800
2280  8832
2290  8848
22A0  8864
22B0  8880
22C0  8896
22D0  8912
22E0  8928
22F0  8944
2300  8960
2310  8976
25C0  9664
2660  9824
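
Any row of a table like this can be generated from its starting code. The sketch below is one way of doing it in Python; the function name `table_row` and the use of a dot for codes with nothing visible to show are choices made for this example only.

```python
import unicodedata


def table_row(base: int) -> str:
    """Format 16 consecutive codes starting at `base` as one table row."""
    cells = []
    for code in range(base, base + 16):
        ch = chr(code)
        # Show only letters, marks, numbers, punctuation and symbols; spaces,
        # control/format codes and unassigned codes become a dot placeholder.
        visible = unicodedata.category(ch)[0] in "LMNPS"
        cells.append(ch if visible else ".")
    return f"{base:04X} {base:5d}  " + " ".join(cells)


for base in (0x2000, 0x2010, 0x2190, 0x2660):
    print(table_row(base))
```

Whether each character then appears correctly still depends on the fonts available to the program showing the output.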

Of these, the following are commonly used:-

Hex   Dec   Description               Char  Hex   Dec   Description             Char
2002  8194  N-space                         2205  8709  Empty set               ∅
2003  8195  M-space                         2207  8711  Nabla                   ∇
2009  8201  Thin space                      2208  8712  Element of              ∈
200C  8204  Zero width non-joiner           2209  8713  Not an element          ∉
200D  8205  Zero width joiner               220B  8715  Contains as member      ∋
200E  8206  Left-to-right mark              220F  8719  Product                 ∏
200F  8207  Right-to-left mark              2211  8721  Sum                     ∑
2013  8211  N-dash                    –     2212  8722  Minus                   −
2014  8212  M-dash                    —     2217  8727  Low asterisk            ∗
2018  8216  Left quote                ‘     221A  8730  Radical or square root  √
2019  8217  Right quote               ’     221D  8733  Proportional            ∝
201A  8218  Single low-9 quote        ‚     221E  8734  Infinity                ∞
201C  8220  Left double quote         “     2220  8736  Angle                   ∠
201D  8221  Right double quote        ”     2227  8743  Logical AND             ∧
201E  8222  Double low-9 quote        „     2228  8744  Logical OR              ∨
2020  8224  Dagger                    †     2229  8745  Cap                     ∩
2021  8225  Double dagger             ‡     222A  8746  Cup                     ∪
2022  8226  Bullet                    •     222B  8747  Integral                ∫
2026  8230  Horizontal ellipsis       …     2234  8756  Therefore               ∴
2030  8240  Per mille sign            ‰     223C  8764  Similar to              ∼
2032  8242  Prime                     ′     2245  8773  Approximately equal     ≅
2033  8243  Double prime              ″     2248  8776  Asymptotic              ≈
2039  8249  Single left angle quote   ‹     2260  8800  Not equal               ≠
203A  8250  Single right angle quote  ›     2261  8801  Equivalent              ≡
203E  8254  Overline                  ‾     2264  8804  Less-than or equal      ≤
2044  8260  Fraction slash            ⁄     2265  8805  Greater-than or equal   ≥
20AC  8364  Euro symbol               €     2282  8834  Subset                  ⊂
2111  8465  Imaginary part            ℑ     2283  8835  Superset                ⊃
2118  8472  Weierstrass p             ℘     2284  8836  Not subset              ⊄
211C  8476  Real part                 ℜ     2286  8838  Subset or equal         ⊆
2122  8482  Trade mark                ™     2287  8839  Superset or equal       ⊇
2135  8501  Alef symbol               ℵ     2295  8853  Circled plus            ⊕
2190  8592  Left arrow                ←     2297  8855  Circled times           ⊗
2191  8593  Up arrow                  ↑     22A5  8869  Perpendicular           ⊥
2192  8594  Right arrow               →     22C5  8901  Dot operator            ⋅
2193  8595  Down arrow                ↓     2308  8968  Left ceiling            ⌈
2194  8596  Left right arrow          ↔     2309  8969  Right ceiling           ⌉
21B5  8629  Carriage return arrow     ↵     230A  8970  Left floor              ⌊
21D0  8656  Left double arrow         ⇐     230B  8971  Right floor             ⌋
21D1  8657  Up double arrow           ⇑     2329  9001  Left angle bracket      〈
21D2  8658  Right double arrow        ⇒     232A  9002  Right angle bracket     〉
21D3  8659  Down double arrow         ⇓     25CA  9674  Lozenge                 ◊
21D4  8660  Left right double arrow   ⇔     2660  9824  Black spades            ♠
2200  8704  For all                   ∀     2663  9827  Black clubs             ♣
2202  8706  Partial differential      ∂     2665  9829  Black hearts            ♥
2203  8707  There exists              ∃     2666  9830  Black diamonds          ♦
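
The first few entries in this table produce nothing visible, so it can help to inspect them by code rather than by eye. The sketch below uses Python's `unicodedata` module to print the general category and official name of each; the list `invisible` is simply the set of such codes picked out of the table for this example.

```python
import unicodedata

# Codes from the table that produce no visible mark of their own.
invisible = [0x2002, 0x2003, 0x2009, 0x200C, 0x200D, 0x200E, 0x200F]

for code in invisible:
    ch = chr(code)
    # Category "Zs" is a space separator; "Cf" is a formatting code.
    print(f"{code:04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

# The different spaces are distinct characters, not copies of the ordinary space.
print("a\u2003b" == "a b")     # False
print("a\u2003b".split())      # ['a', 'b'], so it still counts as white space
```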

©Ray White 2004.