DB2 Study Notes

Software Development

A Summary of the UTF-8 and Unicode FAQ

Contents

  1. ASCII and ISO-8859-1 
  2. ISO-10646 and Unicode 
  3. Reference

ASCII and ISO-8859-1 

Two character sets are the most widely used on computers:

  • ASCII (Basic Latin)
  • ISO-8859-1 (Latin-1)

The table below shows the internal code range of each of the two character sets.

Charset                Hex          Dec        Bin
ASCII (Basic Latin)    0x00 - 0x7F  0 - 127    00000000 - 01111111
ISO-8859-1 (Latin-1)   0x80 - 0xFF  128 - 255  10000000 - 11111111

All ASCII and Latin-1 characters are encoded in one byte (8 bits).
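As a minimal sketch of the ranges above, the following Python snippet sorts a character into one of the two ranges and shows that it encodes to a single byte. The helper name `classify` is hypothetical and chosen only for this illustration; `latin-1` is Python's codec name for ISO-8859-1.

```python
def classify(ch: str) -> str:
    """Return the range from the table that a single character falls in."""
    code = ord(ch)
    if code <= 0x7F:
        return "ASCII"
    if code <= 0xFF:
        return "Latin-1"
    return "outside Latin-1"


# "A" is in the ASCII range; "\u00e9" (e with acute accent) is in the Latin-1 range.
for ch in ("A", "\u00e9"):
    encoded = ch.encode("latin-1")  # always exactly one byte per character
    print(classify(ch), f"0x{encoded[0]:02X}", len(encoded))
# prints:
# ASCII 0x41 1
# Latin-1 0xE9 1
```

Characters above 0xFF, such as CJK ideographs, cannot be encoded with the `latin-1` codec at all, which is why `classify` reports them separately.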

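People also refer to ISO-8859-1 as a superset of ASCII, which makes the naming a little messy: strictly speaking, the full ISO-8859-1 set covers 0x00 - 0xFF with ASCII as its lower half, and the range 0x80 - 0xFF listed above is only the part that Latin-1 adds.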
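The "superset" description can be checked directly. The sketch below assumes Python's built-in `ascii`, `latin-1`, and `cp037` (an EBCDIC code page) codecs; the function name `is_ascii_superset` is made up for this example.

```python
def is_ascii_superset(codec: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same text as they do under ASCII."""
    low_half = bytes(range(0x80))
    return low_half.decode(codec) == low_half.decode("ascii")


print(is_ascii_superset("latin-1"))  # True
print(is_ascii_superset("cp037"))    # False: EBCDIC lays its letters out differently

# The upper half is what Latin-1 adds; plain ASCII rejects it.
try:
    b"\xe9".decode("ascii")
except UnicodeDecodeError:
    print("0xE9 is not ASCII")
print(b"\xe9".decode("latin-1") == "\u00e9")  # True
```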

ISO-10646 and Unicode 

Around 1991, the ISO-10646 and Unicode projects merged their work and agreed to keep the two character sets consistent from then on.

People often say "Unicode" when they are really referring to an encoding method; in that usage it usually means UTF-16.

UTF-7
relatively unpopular 7-bit encoding, often considered obsolete
UTF-8
8-bit, variable-width encoding
UCS-2 and UTF-16
16-bit encodings; UCS-2 is fixed-width and supports only the BMP, while UTF-16 is variable-width and reaches the other planes through surrogate pairs
UCS-4 and UTF-32
functionally identical 32-bit fixed-width encodings
UTF-EBCDIC
unpopular encoding intended for EBCDIC-based mainframe systems
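To make the width differences concrete, here is a small sketch that measures how many bytes one character takes in three of these encodings. The helper name `encoded_lengths` is hypothetical. It uses the big-endian codec variants so that no byte order mark is added to the count. Python has no separate UCS-2 codec, so UCS-2 is not shown.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    codecs = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(ch.encode(name)) for name in codecs}


samples = {
    "ASCII letter A": "A",
    "Latin-1 letter U+00E9": "\u00e9",
    "BMP ideograph U+4E2D": "\u4e2d",
    "non-BMP character U+1F600": "\U0001F600",
}
for label, ch in samples.items():
    print(label, encoded_lengths(ch))
# prints:
# ASCII letter A {'utf-8': 1, 'utf-16-be': 2, 'utf-32-be': 4}
# Latin-1 letter U+00E9 {'utf-8': 2, 'utf-16-be': 2, 'utf-32-be': 4}
# BMP ideograph U+4E2D {'utf-8': 3, 'utf-16-be': 2, 'utf-32-be': 4}
# non-BMP character U+1F600 {'utf-8': 4, 'utf-16-be': 4, 'utf-32-be': 4}
```

The last row is the case that separates UTF-16 from UCS-2: a character outside the BMP needs two 16-bit units (a surrogate pair) in UTF-16, while UTF-32 stays at four bytes throughout.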

Unicode 2.0 and its successors are now in wide use.

Reference

 

This blog is licensed under a Creative Commons License.