Normal 8-bit coding provides 256 characters in a standard character set, which is insufficient for all the special symbols, punctuation and accented characters used in various languages. And it’s certainly inadequate for the vast range of symbols used in pictographic languages such as Chinese or Japanese.
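Unicode, also known as ISO 10646-1, was originally based on 16-bit codes, which define up to 65,536 characters. Later versions of the standard extend beyond this range, but all the codes described here fall within the first 65,536. These are grouped as shown in the following table:-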
From | To | Usage |
---|---|---|
0 | 8191 | Alphabetic |
8192 | 12287 | Alphabetic |
12288 | 16383 | Pictographic |
16384 | 59391 | Pictographic |
59392 | 65024 | Special |
65025 | 65535 | Software |
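As a rough illustration, the ranges in the table above can be expressed as a simple lookup. The following Python sketch is for illustration only: the name `unicode_group` and the layout of `GROUPS` are hypothetical, and the labels are simply those given in the table.

```python
# Group boundaries copied from the table above: (first code, last code, usage).
GROUPS = [
    (0, 8191, "Alphabetic"),
    (8192, 12287, "Alphabetic"),
    (12288, 16383, "Pictographic"),
    (16384, 59391, "Pictographic"),
    (59392, 65024, "Special"),
    (65025, 65535, "Software"),
]


def unicode_group(code):
    """Return the usage label for a 16-bit code, per the table above."""
    for first, last, usage in GROUPS:
        if first <= code <= last:
            return usage
    raise ValueError(f"{code} is outside the 16-bit range 0 to 65535")


if __name__ == "__main__":
    print(2 ** 8)    # number of codes available with 8 bits
    print(2 ** 16)   # number of codes available with 16 bits
    for char in ("A", "\u20ac", "\u3042"):
        # ord() gives the code number of a character
        print(char, ord(char), unicode_group(ord(char)))
```

The lookup only covers 16-bit codes, so anything above 65535 is rejected rather than guessed at.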
Although all of these Unicode codes are fully standardised, many applications or computer operating systems are limited to showing only some of the characters.
Further details regarding some of these groups appear in the following sections.
The characters generated by the codes from 0 to 255 are identical to those defined by the ISO 8859-1 standard, also known as Latin-1, whose first 128 codes are in turn identical to the ASCII standard. This makes it easy to convert material that’s coded using a western-based character set, often known as Roman, into Unicode form.
The ISO 8859-1 character set is as follows:-
The codes from 128 to 159 should be avoided, as these can be used for non-standard characters.
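The conversion described above can be tried out in a few lines. This is a minimal Python sketch using the standard `latin-1` and `cp1252` codecs; the sample bytes are made up for illustration, and Windows code page 1252 is used here only as one example of a character set that puts non-standard characters in the 128 to 159 area.

```python
# Every byte value 0..255, decoded as ISO 8859-1 (Latin-1), gives the
# Unicode character with the same code number.
for code in range(256):
    assert bytes([code]).decode("latin-1") == chr(code)

# Converting Latin-1 text to Unicode is therefore a direct decode.
latin1_bytes = b"caf\xe9"            # 0xE9 is e-acute in ISO 8859-1
text = latin1_bytes.decode("latin-1")
print(text, [ord(c) for c in text])

# Codes 128..159 are where encodings disagree: Windows code page 1252
# places printable characters there, ISO 8859-1 does not.
ambiguous = b"\x80"
print(repr(ambiguous.decode("latin-1")))   # a control code in Latin-1
print(repr(ambiguous.decode("cp1252")))    # the euro sign in cp1252
```

The same byte giving two different results is the practical reason for avoiding this area.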
This area is used for less common accented characters, many of which appear with an accent separate to the letter itself, as shown below. Characters that can’t be displayed by your browser appear as a ? (query) or as a keyboard button symbol.
Of these, and other codes in this group, the following are the most useful:-
Hex | Dec | Description | Char |
---|---|---|---|
152 | 338 | OE ligature | Œ |
153 | 339 | oe ligature | œ |
160 | 352 | S caron | Š |
161 | 353 | s caron | š |
178 | 376 | Y diaeresis | Ÿ |
192 | 402 | Small hook f | ƒ |
2C6 | 710 | Letter circumflex | ˆ |
2D8 | 728 | Breve accent | ˘ |
2D9 | 729 | Dot accent | ˙ |
2DA | 730 | Ring accent | ˚ |
2DB | 731 | Ogonek | ˛ |
2DC | 732 | Small tilde | ˜ |
The remaining codes in this block are assigned to other obscure characters.
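The Hex and Dec columns in the table above are two ways of writing the same code number. The following Python sketch, using only standard functions, shows how they relate for the OE ligature, and what tends to happen when a character has no place in an 8-bit set. The choice of this particular character is arbitrary.

```python
import html

code = int("152", 16)        # hex column -> decimal column
print(code, hex(code))

char = chr(code)             # code number -> character
print(char, ord(char))

# HTML numeric character references accept either form.
print(html.unescape("&#338;"), html.unescape("&#x152;"))

# A character outside an 8-bit set cannot be encoded in it; with
# errors="replace" Python substitutes a question mark instead.
print("\u0152".encode("latin-1", errors="replace"))
```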
These Greek characters are often used in maths and other applications. For simplicity, unassigned codes have been omitted from the following table.
Hex | Dec | Description | Char |
---|---|---|---|
391 | 913 | Capital alpha | Α |
392 | 914 | Capital beta | Β |
393 | 915 | Capital gamma | Γ |
394 | 916 | Capital delta | Δ |
395 | 917 | Capital epsilon | Ε |
396 | 918 | Capital zeta | Ζ |
397 | 919 | Capital eta | Η |
398 | 920 | Capital theta | Θ |
399 | 921 | Capital iota | Ι |
39A | 922 | Capital kappa | Κ |
39B | 923 | Capital lambda | Λ |
39C | 924 | Capital mu | Μ |
39D | 925 | Capital nu | Ν |
39E | 926 | Capital xi | Ξ |
39F | 927 | Capital omicron | Ο |
3A0 | 928 | Capital pi | Π |
3A1 | 929 | Capital rho | Ρ |
3A3 | 931 | Capital sigma | Σ |
3A4 | 932 | Capital tau | Τ |
3A5 | 933 | Capital upsilon | Υ |
3A6 | 934 | Capital phi | Φ |
3A7 | 935 | Capital chi | Χ |
3A8 | 936 | Capital psi | Ψ |
3A9 | 937 | Capital omega | Ω |
3B1 | 945 | Small alpha | α |
3B2 | 946 | Small beta | β |
3B3 | 947 | Small gamma | γ |
3B4 | 948 | Small delta | δ |
3B5 | 949 | Small epsilon | ε |
3B6 | 950 | Small zeta | ζ |
3B7 | 951 | Small eta | η |
3B8 | 952 | Small theta | θ |
3B9 | 953 | Small iota | ι |
3BA | 954 | Small kappa | κ |
3BB | 955 | Small lambda | λ |
3BC | 956 | Small mu | μ |
3BD | 957 | Small nu | ν |
3BE | 958 | Small xi | ξ |
3BF | 959 | Small omicron | ο |
3C0 | 960 | Small pi | π |
3C1 | 961 | Small rho | ρ |
3C2 | 962 | Small final sigma | ς |
3C3 | 963 | Small sigma | σ |
3C4 | 964 | Small tau | τ |
3C5 | 965 | Small upsilon | υ |
3C6 | 966 | Small phi | φ |
3C7 | 967 | Small chi | χ |
3C8 | 968 | Small psi | ψ |
3C9 | 969 | Small omega | ω |
3D1 | 977 | Small theta symbol | ϑ |
3D2 | 978 | Upsilon with hook symbol | ϒ |
3D5 | 981 | Phi symbol | ϕ |
3D6 | 982 | Pi symbol | ϖ |
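A table of this kind can be generated rather than typed by hand. The sketch below uses Python's standard `unicodedata` module; the helper name `table_rows` is hypothetical. Note that the names it prints are the official Unicode character names, which are longer than the descriptions used in the table above, and that unassigned codes (such as hex 3A2) are skipped in the same way.

```python
import unicodedata


def table_rows(first, last):
    """Yield (hex, decimal, name, char) for assigned codes in a range."""
    for code in range(first, last + 1):
        char = chr(code)
        name = unicodedata.name(char, None)   # None if unassigned
        if name is not None:
            yield f"{code:X}", code, name, char


if __name__ == "__main__":
    # Capital letters alpha (hex 391) to omega (hex 3A9).
    for hex_code, dec_code, name, char in table_rows(0x391, 0x3A9):
        print(f"{hex_code} | {dec_code} | {name} | {char} |")
```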
These codes are used for rather less common characters and punctuation. The following table only shows the more usual characters, with numerous rows omitted for clarity. Some codes don’t appear to create any visible character but are in fact used for a range of different types of spaces. Those characters that can’t be displayed by your browser are indicated by a ? or by a keyboard button symbol.
Of these, the following are commonly used:-
Hex | Dec | Description | Char |
---|---|---|---|
2002 | 8194 | N-space | |
2003 | 8195 | M-space | |
2009 | 8201 | Thin space | |
200C | 8204 | Zero-width non-joiner | |
200D | 8205 | Zero-width joiner | |
200E | 8206 | Left-to-right mark | |
200F | 8207 | Right-to-left mark | |
2013 | 8211 | N-dash | – |
2014 | 8212 | M-dash | — |
2018 | 8216 | Left single quote | ‘ |
2019 | 8217 | Right single quote | ’ |
201A | 8218 | Single low-9 quote | ‚ |
201C | 8220 | Left double quote | “ |
201D | 8221 | Right double quote | ” |
201E | 8222 | Double low-9 quote | „ |
2020 | 8224 | Dagger | † |
2021 | 8225 | Double dagger | ‡ |
2022 | 8226 | Bullet | • |
2026 | 8230 | Horizontal ellipsis | … |
2030 | 8240 | Per mille | ‰ |
2032 | 8242 | Prime | ′ |
2033 | 8243 | Double prime | ″ |
2039 | 8249 | Single left angle quote | ‹ |
203A | 8250 | Single right angle quote | › |
203E | 8254 | Overline | ‾ |
2044 | 8260 | Fraction slash | ⁄ |
20AC | 8364 | Euro | € |
2111 | 8465 | Imaginary part | ℑ |
2118 | 8472 | Weierstrass p | ℘ |
211C | 8476 | Real part | ℜ |
2122 | 8482 | Trade mark | ™ |
2135 | 8501 | Alef symbol | ℵ |
2190 | 8592 | Left arrow | ← |
2191 | 8593 | Up arrow | ↑ |
2192 | 8594 | Right arrow | → |
2193 | 8595 | Down arrow | ↓ |
2194 | 8596 | Left-right arrow | ↔ |
21B5 | 8629 | Carriage return arrow | ↵ |
21D0 | 8656 | Left double arrow | ⇐ |
21D1 | 8657 | Up double arrow | ⇑ |
21D2 | 8658 | Right double arrow | ⇒ |
21D3 | 8659 | Down double arrow | ⇓ |
21D4 | 8660 | Left-right double arrow | ⇔ |
2200 | 8704 | For all | ∀ |
2202 | 8706 | Partial differential | ∂ |
2203 | 8707 | There exists | ∃ |
2205 | 8709 | Empty set | ∅ |
2207 | 8711 | Nabla | ∇ |
2208 | 8712 | Element of | ∈ |
2209 | 8713 | Not an element of | ∉ |
220B | 8715 | Contains as member | ∋ |
220F | 8719 | Product | ∏ |
2211 | 8721 | Sum | ∑ |
2212 | 8722 | Minus | − |
2217 | 8727 | Low asterisk | ∗ |
221A | 8730 | Radical | √ |
221D | 8733 | Proportional to | ∝ |
221E | 8734 | Infinity | ∞ |
2220 | 8736 | Angle | ∠ |
2227 | 8743 | Logical and | ∧ |
2228 | 8744 | Logical or | ∨ |
2229 | 8745 | Cap | ∩ |
222A | 8746 | Cup | ∪ |
222B | 8747 | Integral | ∫ |
2234 | 8756 | Therefore | ∴ |
223C | 8764 | Similar to | ∼ |
2245 | 8773 | Approximately equal to | ≅ |
2248 | 8776 | Asymptotic to | ≈ |
2260 | 8800 | Not equal to | ≠ |
2261 | 8801 | Equivalent to | ≡ |
2264 | 8804 | Less-than or equal to | ≤ |
2265 | 8805 | Greater-than or equal to | ≥ |
2282 | 8834 | Subset of | ⊂ |
2283 | 8835 | Superset of | ⊃ |
2284 | 8836 | Not a subset of | ⊄ |
2286 | 8838 | Subset of or equal to | ⊆ |
2287 | 8839 | Superset of or equal to | ⊇ |
2295 | 8853 | Circled plus | ⊕ |
2297 | 8855 | Circled times | ⊗ |
22A5 | 8869 | Perpendicular | ⊥ |
22C5 | 8901 | Dot operator | ⋅ |
2308 | 8968 | Left ceiling | ⌈ |
2309 | 8969 | Right ceiling | ⌉ |
230A | 8970 | Left floor | ⌊ |
230B | 8971 | Right floor | ⌋ |
2329 | 9001 | Left angle bracket | 〈 |
232A | 9002 | Right angle bracket | 〉 |
25CA | 9674 | Lozenge | ◊ |
2660 | 9824 | Black spade | ♠ |
2663 | 9827 | Black club | ♣ |
2665 | 9829 | Black heart | ♥ |
2666 | 9830 | Black diamond | ♦ |
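Since the space and zero-width codes in the table above produce nothing visible, it can help to inspect them by code number rather than by eye. The following Python sketch uses the standard `unicodedata` module; the list `INVISIBLE` and the helper `describe` are hypothetical names chosen for illustration.

```python
import unicodedata

# Codes from the table above that produce no visible mark.
INVISIBLE = [0x2002, 0x2003, 0x2009, 0x200C, 0x200D, 0x200E, 0x200F]


def describe(code):
    """Return a printable summary of a code that may be invisible."""
    char = chr(code)
    return (
        f"{code:X}",
        unicodedata.category(char),   # "Zs" = space, "Cf" = format control
        char.isspace(),
        unicodedata.name(char),
    )


if __name__ == "__main__":
    for code in INVISIBLE:
        print(describe(code))
```

The category distinguishes true spaces, which occupy width, from the zero-width and direction codes, which only affect how neighbouring characters are joined or ordered.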
©Ray White 2004.