Ideograph Characters and Unification

The CJK unified ideographs in ISO/IEC 10646-1 comprise 27,484 ideographs, derived from over 66,000 ideographs found in various national and regional standards for coded character sets. These standards are referred to as the "source codes" or "sources".

The source code standards are arranged in five groups according to their origins. The groups are identified as the G-, T-, J-, K- and V-sources (simplified Chinese, traditional Chinese, Japanese, Korean and Vietnamese, respectively).

For the purposes of ISO/IEC 10646, a unification process is applied to the ideographic characters taken from the codes in the source groups. In this process single ideographs from two or more of the source groups are associated together, and a single code position is assigned to them in this standard. The associations are made according to a set of procedures that are described below. Ideographs that are thus associated are described here as "unified".
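
The association step described above can be pictured as grouping source characters and giving each group one code position. The following is only a minimal sketch: the source codes, the "shape" keys and the position numbers are hypothetical placeholders rather than values from any standard, and the function name unify is introduced here purely for illustration.

```python
from collections import defaultdict

# Hypothetical source entries: (source group, source code, abstract-shape key).
# None of these codes or keys are real values from a national standard.
SOURCE_ENTRIES = [
    ("G", "G-0001", "shape-A"),
    ("T", "T-0101", "shape-A"),
    ("J", "J-0201", "shape-A"),
    ("K", "K-0301", "shape-B"),
    ("J", "J-0202", "shape-B"),
    ("V", "V-0401", "shape-C"),
]


def unify(entries):
    """Associate source characters that share a key under one code position.

    Positions are simply numbered 0, 1, 2, ... in order of first appearance;
    this numbering is an assumption of the sketch, not how positions are
    actually allocated.
    """
    groups = defaultdict(dict)
    for source_group, source_code, shape_key in entries:
        groups[shape_key][source_group] = source_code
    # One code position per group of associated ("unified") ideographs.
    return {position: members for position, members in enumerate(groups.values())}


table = unify(SOURCE_ENTRIES)
for position, members in table.items():
    print(position, members)
```

In this sketch the decision that two source characters share a key is taken as given; the procedures that follow describe how such a decision is reached.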

The unification procedure consists of:

1. Scope of unification
Ideographs that are unrelated in historical derivation (non-cognate characters) have not been unified. An association between ideographs from different sources is made here if their shapes are sufficiently similar, according to the following system of classification.

2. Two level classification
A two-level system of classification is used to differentiate (a) between abstract shapes and (b) between actual shapes determined by particular typefaces. Variant forms of an ideograph, which cannot be unified, are identified based on the difference between their abstract shapes.

3. Procedure
A unification procedure is used to determine whether two ideographs have the same abstract shape or different ones. The unification procedure has two stages, applied in the following order:

(a) Analysis of component structure;
The component structure of each ideograph is examined. A component of an ideograph is a geometrical combination of primitive elements. Alternative ideographs can be configured from the same set of components. Components can be combined to create a new component with a more complicated structure. An ideograph, therefore, can be defined as a component tree, where the top node is the ideograph itself, and the bottom nodes are the primitive elements.
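
The component tree described above can be modelled as a small recursive data structure. This is an illustrative sketch only: the class name Component, its methods, and the node labels ("left", "p1", and so on) are hypothetical and do not correspond to any real ideograph or to names used in the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A node in a component tree; a node without children is a primitive element."""
    name: str
    children: List["Component"] = field(default_factory=list)

    def is_primitive(self) -> bool:
        return not self.children

    def primitives(self) -> List[str]:
        """Return the bottom nodes (primitive elements) from left to right."""
        if self.is_primitive():
            return [self.name]
        found: List[str] = []
        for child in self.children:
            found.extend(child.primitives())
        return found

    def depth(self) -> int:
        """Number of levels below this node (0 for a primitive element)."""
        if self.is_primitive():
            return 0
        return 1 + max(child.depth() for child in self.children)


# Hypothetical ideograph: the top node is the ideograph itself, built from
# two components, each of which is a combination of primitive elements.
example = Component("ideograph", [
    Component("left", [Component("p1"), Component("p2")]),
    Component("right", [Component("p3")]),
])

print(example.primitives())
print(example.depth())
```

Representing components and ideographs with the same node type reflects the statement that components can be combined into more complicated components.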

(b) Analysis of component features;
The components located at corresponding nodes of two ideographs are compared, starting from the most superior node.
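
A top-down comparison of corresponding nodes might be sketched as follows. The text here does not state which features are compared, so this example assumes, purely for illustration, that two nodes match when they have the same label and the same number of child components. The tuple representation and the function name first_difference are likewise hypothetical.

```python
from collections import deque

# A node is represented as (label, [child nodes]); labels are placeholders.


def first_difference(a, b):
    """Walk two component trees level by level from the top node.

    Returns the path of the first pair of corresponding nodes that differ,
    or None if no difference is found. The matching rule (same label and
    same number of children) is an assumption of this sketch.
    """
    queue = deque([(a, b, "top")])
    while queue:
        (label_a, kids_a), (label_b, kids_b), path = queue.popleft()
        if label_a != label_b or len(kids_a) != len(kids_b):
            return path
        for index, (kid_a, kid_b) in enumerate(zip(kids_a, kids_b)):
            queue.append((kid_a, kid_b, f"{path}/{index}"))
    return None


first = ("ideograph", [("left", []), ("right", [("p1", []), ("p2", [])])])
second = ("ideograph", [("left", []), ("right", [("p1", []), ("p3", [])])])

print(first_difference(first, first))
print(first_difference(first, second))
```

A breadth-first walk is used so that higher nodes are always examined before the nodes beneath them, in keeping with starting from the most superior node.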