Characters and Unification
The CJK unified ideographs in ISO/IEC 10646-1 comprise 27,484 ideographs, derived from over 66,000 ideographs found in various national and regional standards for coded character sets. These standards are called the "source codes" or "sources".
The source code standards are shown in five groups according to their origins. The groups are identified as the G-, T-, J-, K- and V-sources (simplified Chinese / traditional Chinese / Japanese / Korean / Vietnamese).
For the purposes of ISO/IEC 10646, a unification process is applied to the ideographic characters taken from the codes in the source groups. In this process single ideographs from two or more of the source groups are associated together, and a single code position is assigned to them in this standard. The associations are made according to a set of procedures that are described below. Ideographs that are thus associated are described here as "unified".
The procedure consists of:
1. Scope of unification
Ideographs that are unrelated in historical derivation (non-cognate
characters) have not been unified. An association between ideographs
from different sources is made here if their shapes are sufficiently
similar, according to the following system of classification.
2. Two-level classification
A two-level system of classification is used to differentiate (a)
between abstract shapes and (b) between actual shapes determined
by particular typefaces. Variant forms of an ideograph, which cannot
be unified, are identified based on the difference between their
abstract shapes.
A unification procedure is used to determine whether two ideographs
have the same abstract shape or different ones. The unification
procedure has two stages, applied in the following order:
(a) Analysis of component structure;
The component structure of each ideograph is examined. A component
of an ideograph is a geometrical combination of primitive elements.
Alternative ideographs can be configured from the same set of components.
Components can be combined to create a new component with a more
complicated structure. An ideograph, therefore, can be defined as
a component tree, where the top node is the ideograph itself, and
the bottom nodes are the primitive elements.
(b) Analysis of component features;
The components located at corresponding nodes of two ideographs
are compared, starting from the most superior node.
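
The two stages can be illustrated with a minimal sketch. It is not taken from the standard: the Component class, its layout field, and the functions primitives, nodes_match and first_difference are hypothetical names chosen for illustration, and the matching rule (same layout and same number of children for composite nodes, same label for primitive elements) is an assumed stand-in for whatever features are actually compared. The example pair 峰 and 峯 is built from the same two components, 山 and 夆, which are treated as primitive elements here only for brevity.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Component:
    """One node of a component tree (hypothetical representation)."""
    label: str                       # illustrative name of the component
    layout: str = ""                 # assumed field: how the children are arranged
    children: List["Component"] = field(default_factory=list)

    def is_primitive(self) -> bool:
        # Bottom nodes of the tree are the primitive elements.
        return not self.children


def primitives(node: Component) -> List[str]:
    """Collect the primitive elements at the bottom of the tree, left to right."""
    if node.is_primitive():
        return [node.label]
    found: List[str] = []
    for child in node.children:
        found.extend(primitives(child))
    return found


def nodes_match(x: Component, y: Component) -> bool:
    """Assumed comparison rule for two corresponding nodes."""
    if x.is_primitive() or y.is_primitive():
        return x.is_primitive() and y.is_primitive() and x.label == y.label
    return x.layout == y.layout and len(x.children) == len(y.children)


def first_difference(a: Component, b: Component) -> Optional[Tuple[int, ...]]:
    """Compare corresponding nodes level by level, starting from the top node.

    Returns the path (child indices from the top) of the first mismatch,
    or None if every pair of corresponding nodes matches.
    """
    queue = deque([((), a, b)])
    while queue:
        path, x, y = queue.popleft()
        if not nodes_match(x, y):
            return path
        for i, (cx, cy) in enumerate(zip(x.children, y.children)):
            queue.append((path + (i,), cx, cy))
    return None


# Two ideographs configured from the same set of components.
peak_a = Component("峰", "left-right", [Component("山"), Component("夆")])
peak_b = Component("峯", "top-bottom", [Component("山"), Component("夆")])

same_components = sorted(primitives(peak_a)) == sorted(primitives(peak_b))
difference_at = first_difference(peak_a, peak_b)
```

In this sketch same_components is true while difference_at is the empty path, meaning the two trees already differ at the top node. A breadth-first walk is used so that higher nodes are always compared before lower ones.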