Private Use Area



         


Unicode reserves 1,114,112 (= 2^20 + 2^16) code points, and currently assigns characters to more than 96,000 of those code points. The first 256 codes precisely match those of ISO 8859-1, the most popular 8-bit character encoding in the "Western world"; as a result, the first 128 characters are also identical to ASCII.

The Unicode code space for characters is divided into 17 "planes" and each plane has 65,536 (= 2^16) code points.
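The plane arithmetic can be checked with a short sketch. The names PLANE_SIZE, PLANE_COUNT and plane_of are illustrative, not part of any standard API; the sketch assumes only that each plane spans 65,536 consecutive code points starting at zero, so the plane number is the code point shifted right by 16 bits.

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane (2**16)
PLANE_COUNT = 17       # planes 0 through 16


def plane_of(code_point: int) -> int:
    """Return the number (0-16) of the plane containing code_point."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16


# 17 planes of 65,536 code points give the total size of the code space.
assert PLANE_SIZE * PLANE_COUNT == 1_114_112 == 2**20 + 2**16

print(plane_of(ord("A")))   # 0  (Basic Multilingual Plane)
print(plane_of(0x1D11E))    # 1  (a musical symbol in the SMP)
print(plane_of(0x10FFFF))   # 16 (last code point of the code space)
```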

There is much controversy among CJK specialists, particularly Japanese ones, about the desirability and technical merit of the "Han unification" process used to map multiple Chinese and Japanese character sets into a single set of unified glyphs. (See Chinese character encoding)

The cap of ~2^20 code points exists in order to maintain compatibility with the UTF-16 encoding, which can only address that range (see below). Only about ten percent of the Unicode code space is currently in use. Furthermore, ranges of characters have been tentatively blocked out for every known unencoded script, and while Unicode may need another plane for ideographic characters, there are ten planes that would be needed only if previously unknown scripts with tens of thousands of characters were discovered. This ~20-bit limit is unlikely to be reached in the near future.
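The origin of the cap can be illustrated with a minimal sketch, assuming the usual UTF-16 surrogate-pair scheme: a code point above U+FFFF is written as one of 1,024 high surrogates (starting at U+D800) followed by one of 1,024 low surrogates (starting at U+DC00). The function name to_surrogate_pair and the constants are illustrative only.

```python
HIGH_START = 0xD800          # first high (leading) surrogate
LOW_START = 0xDC00           # first low (trailing) surrogate
SURROGATES_PER_HALF = 0x400  # 1,024 values in each surrogate range


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = code_point - 0x10000        # a 20-bit value
    high = HIGH_START + (offset >> 10)   # top 10 bits
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low


# 1,024 x 1,024 pairs address exactly 2**20 code points beyond the BMP,
# so the highest code point UTF-16 can reach is U+10FFFF.
assert SURROGATES_PER_HALF ** 2 == 2**20
assert 0x10000 + SURROGATES_PER_HALF ** 2 - 1 == 0x10FFFF

print([hex(unit) for unit in to_surrogate_pair(0x10FFFF)])  # ['0xdbff', '0xdfff']
```

Together with the 2^16 code points of the first plane, these 2^20 supplementary code points give the 1,114,112 total.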

Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.


As of Unicode 4.0.1, the BMP includes the following scripts:

Several scripts are expected to be included in the next revision of Unicode:

Several other scripts are proposed for inclusion in the BMP, including:

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

As of Unicode 4.0.1, Plane One includes the following scripts:

Private Use Area

A Private Use Area (PUA) is one of several ranges of code points that are reserved for private use. The Unicode standard does not assign any characters in these ranges; their meaning is left to private agreement between the parties that use them.

The Basic Multilingual Plane includes a Private Use Area in the range U+E000–U+F8FF (57344–63743), and Plane Fifteen (U+F0000–U+FFFFF) and Plane Sixteen (U+100000–U+10FFFF) are completely reserved for private use as well.
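A membership test for these ranges might look like the following sketch. The table PRIVATE_USE_RANGES and the function is_private_use are hypothetical names; the boundaries are taken directly from the ranges quoted above and treat Planes Fifteen and Sixteen as private use in their entirety.

```python
# Inclusive (first, last) code point pairs, as quoted in the text above.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFF),    # Plane Fifteen
    (0x100000, 0x10FFFF),  # Plane Sixteen
)


def is_private_use(code_point: int) -> bool:
    """Return True if code_point lies in one of the private use ranges."""
    return any(first <= code_point <= last for first, last in PRIVATE_USE_RANGES)


print(is_private_use(0xE000))    # True
print(is_private_use(ord("A")))  # False
print(0xF8FF - 0xE000 + 1)       # 6400 code points in the BMP range
```

An application that receives such a code point cannot interpret it from the standard alone; it needs an out-of-band agreement, such as a font or a registry, to know what the code point is meant to represent.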

The Private Use Area is a concept inherited from certain Asian encoding systems. These systems used private use areas to encode Japanese gaiji (rare personal name characters) in application-specific ways. Similarly, the ConScript Unicode Registry aims to coordinate the mapping into the PUAs of scripts that are not yet encoded in Unicode or have been rejected by it.

Other planes

Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 rare Chinese characters that are mostly historic, although there are some modern ones. Plane 14, the Supplementary Special-purpose Plane (SSP), currently contains some non-recommended language tag characters and some variation selector characters.






This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.