Character Subset Allocation (in BMP)
In ISO/IEC
10646-1:2000 (Unicode 3.0), the code space of the BMP (Basic Multilingual
Plane) is divided into several areas, which are in turn subdivided into
character blocks:
General Script Area (0000-1FFF)
It consists of alphabetic and syllabic scripts that have relatively
small character sets, such as Latin, Cyrillic, Greek, Hebrew, Arabic,
Thai, and Devanagari
Symbols Area (2000-28FF)
It includes a large variety of symbols and dingbats for punctuation,
mathematics, chemistry, technical notation, and other specialized uses
CJK Phonetics and Symbols Area (2E80-33FF)
It includes punctuation, symbols, radicals, and phonetics for Chinese,
Japanese, and Korean
CJK Ideographs Area (3400-9FA5)
It consists of 27,484 unified CJK ideographs
Yi Syllables Area (A000-A4C6)
It consists of 1,165 syllables and 50 Yi radicals
Hangul Syllables Area (AC00-D7A3)
It consists of 11,172 pre-composed Korean Hangul syllables
Surrogates Area (D800-DFFF)
It consists of 1,024 high-half surrogates (D800-DBFF) and 1,024 low-half
surrogates (DC00-DFFF), which the surrogate extension method combines in
pairs to address more than 1 million additional codes reserved for
future expansion
Private Use Area (E000-F8FF)
It contains 6,400 code positions used for defining user-specific
or vendor-specific characters
Compatibility and Specials Area (F900-FFFD)
It contains many of the characters from widely used corporate and
national standards that have other representations in Unicode encoding,
as well as several special-use characters
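
Some of the counts quoted above follow directly from the range boundaries, and the "more than 1 million codes" figure follows from pairing the two surrogate halves. The Python sketch below checks that arithmetic. The names span and combine_surrogates are illustrative only, and the split of the Surrogates Area into D800-DBFF (high half) and DC00-DFFF (low half) is the usual UTF-16 layout, assumed here rather than taken from the list:

```python
def span(first: int, last: int) -> int:
    """Number of code positions in the inclusive range first..last."""
    return last - first + 1


# Contiguous areas: the size is simply last - first + 1.
HANGUL_SYLLABLES = span(0xAC00, 0xD7A3)  # Hangul Syllables Area
PRIVATE_USE = span(0xE000, 0xF8FF)       # Private Use Area

# Assumed split of the Surrogates Area (standard UTF-16 layout).
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF

# Every high-half surrogate can be paired with every low-half surrogate.
EXTRA_CODES = span(HIGH_FIRST, HIGH_LAST) * span(LOW_FIRST, LOW_LAST)


def combine_surrogates(high: int, low: int) -> int:
    """Map a (high, low) surrogate pair to the code position it addresses."""
    if not HIGH_FIRST <= high <= HIGH_LAST:
        raise ValueError(f"not a high-half surrogate: {high:04X}")
    if not LOW_FIRST <= low <= LOW_LAST:
        raise ValueError(f"not a low-half surrogate: {low:04X}")
    # 10 bits come from each half; the result starts just past the BMP.
    return 0x10000 + ((high - HIGH_FIRST) << 10) + (low - LOW_FIRST)


if __name__ == "__main__":
    print(HANGUL_SYLLABLES, PRIVATE_USE, EXTRA_CODES)
    print(f"{combine_surrogates(0xD800, 0xDC00):X}")
    print(f"{combine_surrogates(0xDBFF, 0xDFFF):X}")
```

The first and last pairs address 10000 and 10FFFF respectively, so the surrogate mechanism covers exactly 1,024 x 1,024 = 1,048,576 code positions beyond the BMP.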