General Structure of UCS

The normative specification of the structure is given below. In ISO/IEC 10646, the value of any octet is expressed in hexadecimal notation, from 00 to FF.

The canonical form of this coded character set - the way in which it is to be conceived - uses a four-dimensional coding space, regarded as a single entity, consisting of 128 three-dimensional groups (00-7F). Each group consists of 256 two-dimensional planes (00-FF). Each plane consists of 256 one-dimensional rows (00-FF), and each row contains 256 cells (00-FF). Thus, each plane contains 65,536 character positions (0000-FFFF).

A character is located and coded at a cell within this coding space, or the cell is declared unused. Each character is identified within the coded character set by its Group-octet, Plane-octet, Row-octet, and Cell-octet. The available coding space is therefore 128 x 256 x 256 x 256 = 2^31 (2,147,483,648) cells.
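The four-octet addressing described above can be sketched in Python as follows. This is only an illustration: the names CODING_SPACE, split_octets, and join_octets are hypothetical and do not come from ISO/IEC 10646, and the sketch does not check whether a cell is assigned or unused.

```python
from typing import Tuple

# Hypothetical names, for illustration only.
# Size of the coding space: groups x planes x rows x cells.
CODING_SPACE = 128 * 256 * 256 * 256  # equals 2**31


def split_octets(code: int) -> Tuple[int, int, int, int]:
    """Split a code position into its (group, plane, row, cell) octets."""
    if not 0 <= code < CODING_SPACE:
        raise ValueError(f"code position out of range: {code:#x}")
    group = (code >> 24) & 0xFF  # 00-7F, because code < 2**31
    plane = (code >> 16) & 0xFF  # 00-FF
    row = (code >> 8) & 0xFF     # 00-FF
    cell = code & 0xFF           # 00-FF
    return group, plane, row, cell


def join_octets(group: int, plane: int, row: int, cell: int) -> int:
    """Combine the four octets back into a single code position."""
    if not 0 <= group <= 0x7F:
        raise ValueError("Group-octet must be in the range 00-7F")
    for name, value in (("Plane", plane), ("Row", row), ("Cell", cell)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name}-octet must be in the range 00-FF")
    return (group << 24) | (plane << 16) | (row << 8) | cell
```

For example, split_octets(0x00004E2D) returns (0x00, 0x00, 0x4E, 0x2D): Group 00, Plane 00, Row 4E, Cell 2D. Passing those four octets to join_octets gives back 0x4E2D.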

In addition, ISO/IEC 10646 defines two forms for representing characters in interchange: the Four-Octet Canonical Form and the Two-Octet BMP Form. It designates the first plane (Plane 00 of Group 00) as the Basic Multilingual Plane (BMP).
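The difference between the two forms can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that octets are serialized most significant octet first; it also does not check whether a given cell is actually assigned a character.

```python
# Hypothetical helper names, for illustration only.
# Assumption: octets are written most significant first.

def to_four_octet(code: int) -> bytes:
    """Four-Octet Canonical Form: Group, Plane, Row, Cell octets."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError(f"outside the coding space: {code:#x}")
    return code.to_bytes(4, "big")


def to_two_octet_bmp(code: int) -> bytes:
    """Two-Octet BMP Form: Row and Cell octets only.

    Only positions in Plane 00 of Group 00 (0000-FFFF) fit in this form.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError(f"not in the BMP: {code:#x}")
    return code.to_bytes(2, "big")
```

Under these assumptions, position 4E2D becomes the octets 00 00 4E 2D in the four-octet form and 4E 2D in the two-octet form. A position outside Plane 00 of Group 00, such as 00010000, has no two-octet representation, so to_two_octet_bmp raises ValueError for it.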

Figure 1: Entire coding space of ISO/IEC 10646

Figure 2: Group 00 of ISO/IEC 10646