Questions about ISO 10646
1. What are coding standards for Chinese characters?
A coding standard for Chinese characters is a set of internal codes that
computers use to manipulate and display Chinese characters. Big5 is
commonly used in both Taiwan and Hong Kong, while Guo Biao (GB2312-80)
is commonly used in Mainland China. In recent years, Unicode (equivalent
to the ISO/IEC 10646 standard) has been implemented on many systems, and
conversion facilities between Big5 and Unicode and between GB and
Unicode are provided.
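To make the conversion path concrete, here is a minimal sketch using only Python's built-in "big5", "gb2312" and "utf-8" codecs. It encodes one character under each standard and converts Big5 bytes to GB2312 bytes by going through Unicode. The helper name `big5_to_gb2312` is hypothetical, chosen for illustration; real conversion facilities may handle unmappable characters differently.

```python
def big5_to_gb2312(data: bytes) -> bytes:
    """Hypothetical helper: convert Big5 bytes to GB2312 bytes via Unicode.

    Raises UnicodeEncodeError if a character has no GB2312 code.
    """
    text = data.decode("big5")     # Big5 bytes -> Unicode string
    return text.encode("gb2312")   # Unicode string -> GB2312 bytes

ch = "中"  # U+4E2D, present in Big5, GB2312 and Unicode
for codec in ("big5", "gb2312", "utf-8"):
    # Same character, three different internal codes
    print(f"{codec:7} {ch.encode(codec).hex(' ')}")

# Big5 code A4A4 converted to its GB2312 equivalent
print(big5_to_gb2312(b"\xa4\xa4").hex(" "))
```

Going through Unicode means each legacy standard needs only one mapping table (to and from Unicode) rather than one table per pair of standards.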
2. What is ISO/IEC 10646?
ISO/IEC 10646, named the Universal Multiple-Octet Coded Character Set,
is an international coding standard developed jointly by the
International Organization for Standardization (ISO) and the
International Electrotechnical Commission (IEC). The standard applies
to the representation, transmission, interchange, processing, storage,
input and presentation of the written form of the world's languages
(scripts) in electronic form.
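The "multiple-octet" in the name refers to storing each character's code position in more than one octet. The sketch below assumes the two classic forms: a two-octet form (UCS-2, which can only reach the Basic Multilingual Plane) and a four-octet form (UCS-4), both big-endian. Python has no codec named "ucs-2" or "ucs-4", so the function `ucs_octets` (a hypothetical name) builds the octets directly from the code position and cross-checks the four-octet form against Python's "utf-32-be" codec.

```python
from typing import Optional, Tuple

def ucs_octets(ch: str) -> Tuple[Optional[bytes], bytes]:
    """Return (two-octet form or None, four-octet form), big-endian."""
    code = ord(ch)                    # the character's code position
    ucs4 = code.to_bytes(4, "big")    # four-octet form covers the whole code space
    # Two-octet form can only represent the BMP (code positions up to FFFF)
    ucs2 = code.to_bytes(2, "big") if code <= 0xFFFF else None
    return ucs2, ucs4

for ch in ("A", "中"):
    ucs2, ucs4 = ucs_octets(ch)
    print(ch, ucs2.hex(" "), ucs4.hex(" "))
    # For these characters, UTF-32BE yields the same four octets
    assert ucs4 == ch.encode("utf-32-be")
```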
3. Why do we need to unify coding standards?
When electronic information is sent from one place to another where
the systems use different coding standards, the information may become
mis-coded or incorrectly displayed, even if code conversion is applied.
The aim of the ISO 10646 standard is to provide a universal coding
standard that covers all the scripts in the world, so that electronic
information can be communicated, exchanged and processed without
conversion between systems, and different scripts can be processed on
the same platform.
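Both problems described above can be reproduced with Python's built-in codecs. In this sketch, Big5 bytes decoded as if they were GB2312 silently turn into a different character, and a mixed-script sample (Chinese, Korean and English, chosen for illustration) cannot be encoded in either legacy standard at all, while UTF-8 handles it. The helper `can_encode` is a hypothetical name.

```python
# 1. Same bytes, wrong standard: the text is mis-coded without any error.
big5_bytes = "中".encode("big5")        # b'\xa4\xa4' in Big5
misread = big5_bytes.decode("gb2312")   # valid GB2312, but a different character
print(misread, misread == "中")

# 2. A legacy set cannot hold every script, so conversion can fail outright.
def can_encode(text: str, codec: str) -> bool:
    """Hypothetical helper: True if every character of text exists in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

sample = "中文 한국어 English"   # Chinese, Hangul and Latin on one line
for codec in ("big5", "gb2312", "utf-8"):
    print(codec, can_encode(sample, codec))
```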
4. What is the current status of ISO/IEC 10646?
The 2000 edition of ISO 10646, ISO/IEC 10646-1:2000, was released in
October 2000. It includes 6,582 additional ideographic characters
(Extension A, see next paragraph) on top of the 20,902 ideographic
characters already defined in ISO/IEC 10646-1:1993. All characters in
the HKSCS (Hong Kong Supplementary Character Set) that are not yet
included in ISO 10646 have already been submitted to the IRG
(Ideographic Rapporteur Group) for inclusion in future releases of
ISO 10646.
Ideographic character extensions to ISO/IEC 10646-1:1993 are carried
out in phases, referred to as Extension A, Extension B, Extension C,
and so on. Extension A, which contains 6,582 ideographic characters,
was released in October 2000. Extension B was published in ISO/IEC
10646-2:2001, with ideographic characters drawn mainly from the Kang Xi
Dictionary, Han Yu Da Zi Dian and Han Yu Da Ci Dian. Currently, the IRG
is working on Extension C, and its publication date is yet to be
determined.
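The character counts above can be checked against the code ranges each block occupies. The ranges in this sketch come from the Unicode code charts rather than from the text above, and the calculation assumes every code position in each range is an assigned ideograph (which holds for these three blocks). An inclusive range from `first` to `last` holds `last - first + 1` positions.

```python
# Code ranges per the Unicode code charts (not given in the text above).
IDEOGRAPH_RANGES = {
    "URO (10646-1:1993)": (0x4E00, 0x9FA5),          # original unified ideographs
    "Extension A (10646-1:2000)": (0x3400, 0x4DB5),
    "Extension B (10646-2:2001)": (0x20000, 0x2A6D6),  # outside the BMP
}

# Inclusive ranges: last - first + 1 code positions each.
counts = {name: last - first + 1 for name, (first, last) in IDEOGRAPH_RANGES.items()}
total_10646_1_2000 = counts["URO (10646-1:1993)"] + counts["Extension A (10646-1:2000)"]

for name, n in counts.items():
    print(f"{name:28} {n:>7,}")
print(f"{'Total in 10646-1:2000':28} {total_10646_1_2000:>7,}")
```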
5. What is Unicode?
The Unicode Consortium is an industry consortium that produces the
Unicode Standard and provides implementations of it. The Unicode
Standard is code-for-code identical to ISO/IEC 10646. In general,
Unicode provides the character names and code values together with
important implementation algorithms, properties and semantic
information, whereas ISO/IEC 10646 provides the same names and code
values. For this reason, Unicode is often described as an
implementation of ISO/IEC 10646.
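The split between shared names and code values and Unicode's extra properties is visible in Python's standard `unicodedata` module, sketched below. The code value and name are common to both standards; the general category and East Asian Width are examples of the additional Unicode properties. Note that `unicodedata` reflects whatever Unicode version ships with the interpreter (`unicodedata.unidata_version`), not specifically Unicode 3.0.

```python
import unicodedata

ch = "中"
code_value = f"U+{ord(ch):04X}"            # code value shared by both standards
name = unicodedata.name(ch)                # character name shared by both standards
category = unicodedata.category(ch)        # Unicode general category ("Lo" = other letter)
width = unicodedata.east_asian_width(ch)   # Unicode East Asian Width ("W" = wide)

print(code_value, name, category, width)
print("Unicode data version:", unicodedata.unidata_version)
```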
Unicode 3.0 is code-for-code identical to ISO/IEC 10646-1:2000 for all
encoded characters, including the East Asian (Han) ideographic
characters. Unicode 3.1, the latest version at the time of writing, was
released in 2001; it adds 44,946 characters (42,711 of them
ideographic) to the 49,194 characters already defined in Unicode 3.0.
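A short worked check of these figures follows, along with one practical consequence of the new ideographs: Extension B starts at U+20000, outside the Basic Multilingual Plane, so in UTF-16 such a character takes a surrogate pair (four bytes) instead of a single 16-bit unit. The variable names are illustrative only.

```python
unicode_30_total = 49_194
added_in_31 = 44_946
ideographs_added = 42_711

unicode_31_total = unicode_30_total + added_in_31        # characters in Unicode 3.1
non_ideographs_added = added_in_31 - ideographs_added    # new non-ideographic characters

# First Extension B ideograph lies beyond U+FFFF, so UTF-16 needs two units.
ext_b_first = chr(0x20000)
utf16 = ext_b_first.encode("utf-16-be")   # high surrogate, then low surrogate

print(unicode_31_total, non_ideographs_added, utf16.hex(" "))
```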
6. What is the relationship between Unicode and ISO/IEC 10646?
The Unicode Standard is fully compatible with the international
standard ISO/IEC 10646-1:2000. A formal convergence of Unicode and
ISO/IEC 10646 was negotiated, and in January 1992 their repertoires
were merged into one universal standard for coding multilingual text,
so that their respective versions would stay synchronized. As noted in
question 5, Unicode adds implementation algorithms, properties and
semantic information to the character names and code values that both
standards share.
7. What is Big5?
The name "Big5" comes from the five large computer makers in Taiwan
that developed this de facto (industry) standard in 1984. It contains
13,051 distinct Chinese characters, arranged in two levels, each
ordered by number of strokes and then by radical. Big5 is also commonly
used in Hong Kong.
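The two-level layout can be sketched as a range check on a character's Big5 code. The level boundaries used here (level 1 at 0xA440-0xC67E with 5,401 characters, level 2 at 0xC940-0xF9D5 with 7,652) are taken from common descriptions of Big5, not from the answer above, so treat them as assumptions; 5,401 + 7,652 = 13,053 code positions, and the figure of 13,051 distinct characters reflects two characters that are encoded twice. The function names are hypothetical.

```python
from typing import Optional

# Assumed Big5 level boundaries (not stated in the answer above).
LEVEL_1 = (0xA440, 0xC67E)   # frequently used characters
LEVEL_2 = (0xC940, 0xF9D5)   # less frequently used characters

def big5_code(ch: str) -> int:
    """Hypothetical helper: Big5 code as an integer (raises if Big5 lacks ch)."""
    return int.from_bytes(ch.encode("big5"), "big")

def big5_level(ch: str) -> Optional[int]:
    """Hypothetical helper: 1 or 2 for Chinese characters, None otherwise."""
    code = big5_code(ch)
    if LEVEL_1[0] <= code <= LEVEL_1[1]:
        return 1
    if LEVEL_2[0] <= code <= LEVEL_2[1]:
        return 2
    return None   # ASCII, symbols and other non-hanzi codes

# Stroke ordering within level 1: 一 (1 stroke) sorts before 中 (4 strokes).
for ch in ("一", "中"):
    print(ch, hex(big5_code(ch)), big5_level(ch))
```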
8. What is Guo Biao (GB)?
Guo Biao ("national standard") refers to a series of standards issued
by the government of Mainland China. The most commonly used computer
coding standard is GB2312-80, which is sometimes referred to as GB2312
or simply GB. Its characters are arranged in two levels: the first is
ordered by pronunciation (Pinyin), and the second by radical and then
number of strokes.
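GB2312 places each character at a row (qu) and cell (wei) position, and the level can be read off the row. This sketch assumes the EUC-CN byte form produced by Python's "gb2312" codec, where each byte equals the row or cell number plus 0xA0, and assumes level 1 occupies rows 16-55 and level 2 rows 56-87; those row boundaries are not stated in the answer above. The function names are hypothetical.

```python
from typing import Optional, Tuple

def gb2312_row_cell(ch: str) -> Tuple[int, int]:
    """Hypothetical helper: (row, cell) of a double-byte GB2312 character.

    Assumes EUC-CN bytes (row/cell + 0xA0); raises ValueError for
    single-byte (ASCII) input because only one byte is produced.
    """
    first, second = ch.encode("gb2312")
    return first - 0xA0, second - 0xA0

def gb2312_level(ch: str) -> Optional[int]:
    row, _ = gb2312_row_cell(ch)
    if 16 <= row <= 55:    # assumed level 1: ordered by Pinyin
        return 1
    if 56 <= row <= 87:    # assumed level 2: ordered by radical, then strokes
        return 2
    return None            # lower rows hold symbols, kana, Greek, Cyrillic, etc.

# Pinyin order in level 1: 啊 (a) is the first character, 中 (zhong) comes later.
for ch in ("啊", "中"):
    print(ch, gb2312_row_cell(ch), gb2312_level(ch))
```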
9. How are character subsets allocated in the BMP?
In ISO/IEC 10646-1:2000 (Unicode 3.0), the code space of the BMP (Basic
Multilingual Plane) is divided into several character areas, given here
as hexadecimal code ranges: General Script Area (0000-1FFF), Symbols
Area (2000-28FF), CJK Phonetics and Symbols Area (2E80-33FF), CJK
Ideographs Area (3400-9FA5), Yi Syllables Area (A000-A4C6), Hangul
Syllables Area (AC00-D7A3), Surrogates Area (D800-DFFF), Private Use
Area (E000-F8FF), and Compatibility and Specials Area (F900-FA2D).
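The allocation above can be used as a lookup table. This minimal sketch copies the ranges exactly as listed and returns the area containing a given code point, or `None` for gaps the list does not cover (for example 2900-2E7F). The table and function names are hypothetical, and the ranges reflect only the ISO/IEC 10646-1:2000 (Unicode 3.0) allocation described here.

```python
from typing import Optional

# Ranges copied from the list above (ISO/IEC 10646-1:2000 / Unicode 3.0).
BMP_AREAS = [
    (0x0000, 0x1FFF, "General Script Area"),
    (0x2000, 0x28FF, "Symbols Area"),
    (0x2E80, 0x33FF, "CJK Phonetics and Symbols Area"),
    (0x3400, 0x9FA5, "CJK Ideographs Area"),
    (0xA000, 0xA4C6, "Yi Syllables Area"),
    (0xAC00, 0xD7A3, "Hangul Syllables Area"),
    (0xD800, 0xDFFF, "Surrogates Area"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF900, 0xFA2D, "Compatibility and Specials Area"),
]

def bmp_area(code_point: int) -> Optional[str]:
    """Hypothetical helper: name of the listed area containing code_point."""
    for first, last, name in BMP_AREAS:
        if first <= code_point <= last:
            return name
    return None   # not covered by the list above

for cp in (ord("A"), ord("中"), 0xAC00, 0xE000, 0x2A00):
    print(f"U+{cp:04X}", bmp_area(cp))
```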