General Structure of UCS
The normative specification of the structure is given below. In
ISO/IEC 10646, the value of any octet is expressed in hexadecimal
notation, from 00 to FF.
The canonical form of this coded character set - the way in which
it is to be conceived - uses a four-dimensional coding space, regarded
as a single entity, consisting of 128 three-dimensional groups (00-7F).
Each group consists of 256 two-dimensional planes (00-FF). Each
plane consists of 256 one-dimensional rows (00-FF), each row containing
256 cells (00-FF). Thus, each plane contains 65,536 character positions
(0000-FFFF).
A character is located and coded at a cell within this coding space
or the cell is declared unused. Each character is located within
the coded character set in terms of its Group-octet, Plane-octet,
Row-octet, and Cell-octet. Therefore, the available coding space
is 128 x 256 x 256 x 256 = 2^31 (2,147,483,648) cells.
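The addressing scheme described above can be sketched in a few lines of Python. This is only an illustration, not part of the standard: the function names split_ucs and join_ucs are hypothetical, and the sketch assumes that a code position is handled as a single integer whose four octets, from most to least significant, are the Group-octet, Plane-octet, Row-octet, and Cell-octet.

```python
# Illustrative sketch; split_ucs and join_ucs are hypothetical helper names.
# Assumption: a code position is one integer, with the Group-octet as the
# most significant octet and the Cell-octet as the least significant.

CODING_SPACE = 128 * 256 * 256 * 256  # 2**31 cells in total


def split_ucs(code):
    """Return (group, plane, row, cell) for a code position."""
    if not 0 <= code < CODING_SPACE:
        raise ValueError("code position outside the 31-bit coding space")
    group = (code >> 24) & 0x7F  # 128 groups: 00-7F
    plane = (code >> 16) & 0xFF  # 256 planes per group: 00-FF
    row = (code >> 8) & 0xFF     # 256 rows per plane: 00-FF
    cell = code & 0xFF           # 256 cells per row: 00-FF
    return group, plane, row, cell


def join_ucs(group, plane, row, cell):
    """Rebuild the code position from its four octets."""
    if not (0 <= group <= 0x7F and all(0 <= o <= 0xFF for o in (plane, row, cell))):
        raise ValueError("octet out of range")
    return (group << 24) | (plane << 16) | (row << 8) | cell


print(split_ucs(0x4E2D))  # (0, 0, 78, 45), i.e. Group 00, Plane 00, Row 4E, Cell 2D
```

Any position whose Group-octet and Plane-octet are both 00 lies in the first plane of the first group.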
In addition, ISO/IEC 10646 defines two forms for representing
characters in interchange: the Four-Octet Canonical Form and the
Two-Octet BMP Form. ISO/IEC 10646 designates the first plane (Plane 00
of Group 00) as the Basic Multilingual Plane (BMP).
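As a rough illustration of how the two forms differ, the sketch below serializes a code position both ways. The function names are hypothetical, and the octet order (most significant octet first) is an assumption made for this example, since the text above does not state it.

```python
# Illustrative sketch; function names are hypothetical.
# Assumption: octets are written most significant first (big-endian).


def to_four_octet_form(code):
    """Serialize a code position as Group, Plane, Row, Cell octets."""
    if not 0 <= code < 2**31:
        raise ValueError("code position outside the 31-bit coding space")
    return code.to_bytes(4, "big")


def to_two_octet_bmp_form(code):
    """Serialize a BMP code position as Row and Cell octets only."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code position is not in Plane 00 of Group 00")
    return code.to_bytes(2, "big")


code = 0x4E2D
print(to_four_octet_form(code).hex())     # 00004e2d
print(to_two_octet_bmp_form(code).hex())  # 4e2d
```

In this sketch the two-octet form simply drops the Group-octet and Plane-octet, which is possible only because both are 00 for every position in the BMP. Positions outside the BMP are rejected by the two-octet function.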
Figure 1: Entire coding space of ISO/IEC 10646

Figure 2: Group 00 of ISO/IEC 10646
