Questions about ISO 10646

1. What are coding standards for Chinese characters?
They are standards that assign internal codes to Chinese characters so that computers can manipulate and display them. Big5 is commonly used in both Taiwan and Hong Kong, and Guo Biao (GB2312-80) is commonly used in Mainland China. In recent years, Unicode (equivalent to the ISO/IEC 10646 standard) has been implemented on many systems, and conversion facilities between Big5 and Unicode and between GB and Unicode are provided.
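As an illustration of such conversion, the following minimal sketch uses the "big5" and "gb2312" codecs that ship with Python. The sample character is chosen arbitrarily, and the variable names are only for this example.

```python
# Sketch: one character under three coding standards, converted via Unicode.
ch = "中"  # U+4E2D, an arbitrary sample character

big5_bytes = ch.encode("big5")    # internal code under Big5
gb_bytes = ch.encode("gb2312")    # internal code under GB2312-80

# Converting either internal code back to Unicode gives the same character.
from_big5 = big5_bytes.decode("big5")
from_gb = gb_bytes.decode("gb2312")

# Big5 -> GB conversion goes through Unicode as the pivot.
big5_to_gb = big5_bytes.decode("big5").encode("gb2312")

print(f"U+{ord(ch):04X}", big5_bytes.hex(), gb_bytes.hex())
# prints: U+4E2D a4a4 d6d0
```

The same character has a different internal code in each standard, which is why a conversion step is needed whenever text moves between them.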

2. What is ISO/IEC 10646?
ISO/IEC 10646, named the Universal Multiple-Octet Coded Character Set, is an international coding standard developed by the International Organization for Standardization (ISO). The standard applies to the representation, transmission, interchange, processing, storage, input and presentation, in electronic form, of the written forms (scripts) of the world's languages.
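To show what "multiple-octet" means in practice, here is a small sketch. It assumes Python's "utf-16-be" and "utf-32-be" codecs as stand-ins for the two-octet and four-octet forms of the standard; for a character such as the one below, the bytes are the code value written out in two or four octets.

```python
# Sketch: the same code value written as two octets and as four octets.
ch = "中"  # code value 4E2D, an arbitrary sample character

two_octets = ch.encode("utf-16-be")    # assumed stand-in for the two-octet form
four_octets = ch.encode("utf-32-be")   # assumed stand-in for the four-octet form

print(two_octets.hex(), four_octets.hex())
# prints: 4e2d 00004e2d
```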

3. Why do we need to unify coding standards?
When electronic information is sent from one place to another and the two systems use different coding standards, the information may be mis-coded or displayed incorrectly even if code conversion is applied. The aim of the ISO 10646 standard is to provide a universal coding standard that contains all the scripts of the world, so that electronic information can be communicated, exchanged and processed without conversion between systems, and so that different scripts can be processed on the same platform.
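The problem can be reproduced in a few lines. This is only a sketch, assuming a sender that uses Big5 and a receiver that wrongly assumes GB2312; the sample text and variable names are illustrative.

```python
# Sketch: text encoded under one standard and read under another.
original = "中文"

sent = original.encode("big5")                      # sender uses Big5
misread = sent.decode("gb2312", errors="replace")   # receiver assumes GB2312
correct = sent.decode("big5")                       # receiver knows it is Big5

print(repr(misread), repr(correct))

# With one universal code on both ends, no guessing is needed.
universal = original.encode("utf-8").decode("utf-8")
```

The misread string is not the original text, even though no byte was lost in transit; only the interpretation of the bytes differs.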

4. What is the current status of ISO/IEC 10646?
The year 2000 version of ISO 10646, ISO/IEC 10646-1:2000, was released in October 2000. It includes 6,582 additional ideographic characters (Extension A, see the next paragraph) on top of the 20,902 ideographic characters already defined in ISO/IEC 10646-1:1993. All characters in the Hong Kong Supplementary Character Set (HKSCS) that are not yet included in ISO 10646 have been submitted to the Ideographic Rapporteur Group (IRG) for inclusion in future releases of ISO 10646.

Ideographic character extensions to ISO/IEC 10646-1:1993 are carried out in phases, referred to as Extension A, Extension B, Extension C and so on. Extension A, which contains 6,582 ideographic characters, was released in October 2000. Extension B was published in ISO/IEC 10646-2:2001, with ideographic characters drawn mainly from the Kang Xi Dictionary, Han Yu Da Zi Dian and Han Yu Da Ci Dian. The IRG is currently working on Extension C, and its publication date is yet to be determined.

5. What is Unicode?
The Unicode Consortium is an industry consortium that produces the Unicode Standard and provides implementations for it. The Unicode Standard is code-for-code identical to ISO/IEC 10646. In general, Unicode provides character names and code values together with important implementation algorithms, properties and semantic information, whereas ISO/IEC 10646 provides the same names and code values. Unicode is therefore said to be the implementation of ISO/IEC 10646.
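The difference between a bare name and code value and the additional properties can be seen with Python's standard unicodedata module. This is only a sketch: the module reflects a later Unicode version than the ones discussed here, and the two sample characters are arbitrary.

```python
import unicodedata

# Sketch: code value and name, plus one property (the general category).
for ch in ("A", "中"):
    code_value = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)          # character name
    category = unicodedata.category(ch)  # a property beyond name and code value
    print(code_value, name, category)
# prints:
# U+0041 LATIN CAPITAL LETTER A Lu
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D Lo
```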

Unicode 3.0 is code-for-code identical to ISO/IEC 10646-1:2000 for all encoded characters, including the East Asian (Han) ideographic characters. Unicode 3.1, the latest version, adds 44,946 characters (42,711 of them ideographic) to the 49,194 characters already defined in Unicode 3.0. Unicode 3.1 was released in 2001.
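The totals implied by these figures can be worked out directly. The sketch below uses only the numbers quoted in this FAQ; the variable names are illustrative.

```python
# Sketch: character totals derived from the figures quoted above.
unicode_3_0 = 49_194        # characters defined in Unicode 3.0
added_in_3_1 = 44_946       # characters added by Unicode 3.1
ideographs_added = 42_711   # ideographic characters among the additions

total_3_1 = unicode_3_0 + added_in_3_1
non_ideographic_added = added_in_3_1 - ideographs_added

print(total_3_1, non_ideographic_added)
# prints: 94140 2235
```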

6. What is the relationship between Unicode and ISO/IEC 10646?
The Unicode Standard is fully compatible with the international standard ISO/IEC 10646-1:2000. A formal convergence of Unicode and ISO/IEC 10646 was negotiated, and in January 1992 their repertoires were merged into one universal standard for coding multilingual text, so that their respective versions are kept synchronized. In general, Unicode provides character names and code values together with important implementation algorithms, properties and semantic information, whereas ISO/IEC 10646 provides the same names and code values. Unicode is therefore said to be the implementation of ISO/IEC 10646.

7. What is Big5?
The name "Big5" refers to the five large computer makers in Taiwan that developed this de facto standard, also called an industrial standard, in 1984. It contains 13,051 distinct Chinese characters in two levels, arranged by number of strokes and then by radical. It is also commonly used in Hong Kong.

8. What is Guo Biao (GB)?
Guo Biao (GB) stands for a series of standards issued by the government in Mainland China. The most commonly used computer coding standard is GB2312-80, which is sometimes referred to as GB2312 or simply GB. Its characters are arranged in two levels: the first is ordered by pronunciation (Pinyin) and the second by radical and then by number of strokes.
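The Pinyin ordering of the first level can be observed by sorting characters on their GB2312 codes. This is a minimal sketch with three arbitrarily chosen first-level characters (pronounced a, bei and zhong), using Python's built-in "gb2312" codec.

```python
# Sketch: GB2312 code order follows Pinyin for first-level characters,
# while Unicode code order does not.
chars = ["中", "北", "啊"]  # zhong, bei, a

by_gb = sorted(chars, key=lambda c: c.encode("gb2312"))
by_unicode = sorted(chars, key=ord)

print(by_gb)       # ['啊', '北', '中']  (a, bei, zhong)
print(by_unicode)  # ['中', '北', '啊']
```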

9. How are character subsets allocated in the BMP?
In ISO/IEC 10646-1:2000 (Unicode 3.0), the code space of the Basic Multilingual Plane (BMP) is divided into several character areas: General Script Area (0000-1FFF), Symbols Area (2000-28FF), CJK Phonetics and Symbols Area (2E80-33FF), CJK Ideographs Area (3400-9FA5), Yi Syllables Area (A000-A4C6), Hangul Syllables Area (AC00-D7A3), Surrogates Area (D800-DFFF), Private Use Area (E000-F8FF), and Compatibility and Specials Area (F900-FA2D).
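The allocation above can be expressed as a simple lookup table. The sketch below copies the ranges exactly as listed in this answer; the function name bmp_area is hypothetical, and code values that fall between the listed ranges return None.

```python
# Sketch: look up the BMP area of a code value, using the ranges listed above.
BMP_AREAS = [
    (0x0000, 0x1FFF, "General Script Area"),
    (0x2000, 0x28FF, "Symbols Area"),
    (0x2E80, 0x33FF, "CJK Phonetics and Symbols Area"),
    (0x3400, 0x9FA5, "CJK Ideographs Area"),
    (0xA000, 0xA4C6, "Yi Syllables Area"),
    (0xAC00, 0xD7A3, "Hangul Syllables Area"),
    (0xD800, 0xDFFF, "Surrogates Area"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF900, 0xFA2D, "Compatibility and Specials Area"),
]


def bmp_area(code_value):
    """Return the name of the listed area containing code_value, or None."""
    for first, last, name in BMP_AREAS:
        if first <= code_value <= last:
            return name
    return None


print(bmp_area(ord("A")))   # General Script Area
print(bmp_area(ord("中")))  # CJK Ideographs Area
print(bmp_area(0x2900))     # None (not inside any range listed above)
```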