ASCII and ISO-8859-1
Two character sets have long been the most widely used on computers:
- ASCII (Basic Latin)
- ISO-8859-1 (Latin-1)
The table below shows the range of codes each character set occupies inside a computer.
| Charset | Hex | Dec | Bin |
|---|---|---|---|
| ASCII (Basic Latin) | 0x00 - 0x7F | 0 - 127 | 00000000 - 01111111 |
| ISO-8859-1 (Latin-1) | 0x80 - 0xFF | 128 - 255 | 10000000 - 11111111 |
Every ASCII and Latin-1 character is stored in a single byte (8 bits). ASCII itself needs only 7 bits, so the high bit of an ASCII byte is always 0.
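A quick Python sketch can check the table. The helpers `describe` and `charset_row` are hypothetical names used only for illustration: one formats a byte value as hex, decimal and 8-bit binary like the table does, the other reports which row it falls in. Only Python's built-in `latin-1` codec is used.

```python
def describe(code: int) -> str:
    """Format a single-byte code like the table: hex, decimal, 8-bit binary."""
    return f"0x{code:02X} {code} {code:08b}"


def charset_row(code: int) -> str:
    """Return which row of the table a single-byte code falls in."""
    if not 0 <= code <= 0xFF:
        raise ValueError(f"{code} does not fit in one byte")
    return "ASCII" if code <= 0x7F else "ISO-8859-1 (Latin-1)"


# The boundary values of both rows.
for code in (0x00, 0x7F, 0x80, 0xFF):
    print(charset_row(code), describe(code))

# Every one of the 256 Latin-1 characters encodes to exactly one byte.
assert all(len(chr(c).encode("latin-1")) == 1 for c in range(256))
```

The first loop prints the four boundaries, e.g. `ASCII 0x7F 127 01111111` and `ISO-8859-1 (Latin-1) 0x80 128 10000000`, where the high bit flips from 0 to 1.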
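ISO-8859-1 is often described as a superset of ASCII: its first 128 codes (0x00 - 0x7F) are exactly the ASCII characters, and it adds another 128 in 0x80 - 0xFF. This is easy to misread from the table, whose Latin-1 row lists only the upper half that Latin-1 adds on top of ASCII.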
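The superset relationship can be checked directly with Python's built-in codecs. This is a minimal sketch; `latin1_extends_ascii` is a hypothetical helper name. It decodes all 128 ASCII byte values both ways and then shows a character that Latin-1 can encode but ASCII cannot.

```python
def latin1_extends_ascii() -> bool:
    """True if all 128 ASCII byte values decode identically as ASCII and Latin-1."""
    all_ascii = bytes(range(0x80))
    return all_ascii.decode("ascii") == all_ascii.decode("latin-1")


print(latin1_extends_ascii())            # True
print("caf\u00e9".encode("latin-1"))     # b'caf\xe9': "é" is the single byte 0xE9
try:
    "caf\u00e9".encode("ascii")          # 0xE9 lies outside ASCII's 0x00-0x7F
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)      # ordinal not in range(128)
```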
ISO-10646 and Unicode
Around 1991, the ISO-10646 and Unicode projects merged their work, and the two character sets have been kept consistent since then.
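Strictly speaking, Unicode is a character set; UTF-8, UTF-16 and the other forms listed here are ways of encoding it as bytes. Still, when people say "Unicode" while talking about an encoding, they usually mean UTF-16. The common encodings are: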
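| Encoding | Description |
|---|---|
| UTF-7 | relatively unpopular 7-bit encoding, often considered obsolete |
| UTF-8 | variable-width encoding using 8-bit code units (1 to 4 bytes per character) |
| UCS-2 and UTF-16 | 16-bit code units; UCS-2 is fixed-width and supports only the BMP (Basic Multilingual Plane, U+0000 - U+FFFF), while UTF-16 represents characters outside the BMP with surrogate pairs (two 16-bit units), which makes it variable-width |
| UCS-4 and UTF-32 | functionally identical 32-bit fixed-width encodings |
| UTF-EBCDIC | unpopular encoding intended for EBCDIC-based mainframe systems |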
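To see the width differences in practice, the sketch below encodes a few sample characters with Python's built-in codecs and counts the bytes. The `-le` codec variants are used so that no byte order mark is added. `encoded_sizes` and `utf16_units` are hypothetical helper names; the latter computes UTF-16 code units by hand to show the surrogate pair used for a character outside the BMP.

```python
def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Bytes needed for one character in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


def utf16_units(cp: int) -> list[int]:
    """Hand-rolled UTF-16: BMP code points use one 16-bit unit,
    code points above U+FFFF become a high/low surrogate pair."""
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000                  # leaves a 20-bit value
    return [0xD800 + (v >> 10),       # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]     # low surrogate carries the bottom 10 bits


# "A", "é", "€" and an emoji outside the BMP (U+1F600).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    units = " ".join(f"{u:04X}" for u in utf16_units(ord(ch)))
    print(f"U+{ord(ch):04X}", encoded_sizes(ch), "UTF-16 units:", units)

# The hand-rolled units match Python's own big-endian UTF-16 codec.
manual = b"".join(u.to_bytes(2, "big") for u in utf16_units(0x1F600))
assert manual == "\U0001F600".encode("utf-16-be")
```

UTF-8 grows from 1 to 4 bytes across the samples and UTF-32 stays at 4, while UTF-16 uses 2 bytes for the three BMP characters but 4 bytes (units `D83D DE00`) for U+1F600, which UCS-2 cannot represent at all.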
Unicode 2.0 became widely adopted, and the standard has since gone through many later versions.


