HS Password (10FFF0-10FFFF) (End for UTF-16)

Balanced Rating: 7.97
Average Rating: 1.00
Votes: 0

by G64(2)


Info:

Created on 22nd February 2022. Last edited on 22nd February 2022.
License: FontStruct License

2 Comments

Speaking of UTF-16: in 1989 there was a draft of ISO 10646, published in 1990, that defined 128 groups of 256 planes of 256 rows of 256 cells. This allowed for 2,147,483,648 characters, but only 679,477,248 could actually have been encoded, because the draft forbade the byte values of control characters (0x00 to 0x1F and 0x80 to 0x9F) in any of the group, plane, row, and cell bytes. For example, the Latin capital letter A was defined in group 0x20, plane 0x20, row 0x20, and cell 0x41.
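
As a rough check of those two figures, here is a minimal Python sketch. The variable names are illustrative and not taken from the draft; the only inputs are the byte ranges quoted above.

```python
# Byte values the draft forbade: C0 controls (0x00-0x1F) and C1 controls (0x80-0x9F).
FORBIDDEN = set(range(0x00, 0x20)) | set(range(0x80, 0xA0))

# The group byte has 128 possible values (0x00-0x7F);
# the plane, row, and cell bytes have 256 possible values each.
group_values = [b for b in range(0x80) if b not in FORBIDDEN]
other_values = [b for b in range(0x100) if b not in FORBIDDEN]

# Size of the full four-byte space, ignoring the control-byte rule.
total = 128 * 256 ** 3

# Positions that remain once every byte must avoid the forbidden values.
encodable = len(group_values) * len(other_values) ** 3

print(f"{total:,}")      # 2,147,483,648
print(f"{encodable:,}")  # 679,477,248
```

The rule leaves 96 usable group values and 192 usable values for each of the other three bytes, and 96 x 192 x 192 x 192 gives the smaller number.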

One could have encoded the characters in three ways:

1: UCS-4, four bytes for every character, which enabled the simple encoding of all characters.

2: UCS-2, two bytes for every character, which enabled the encoding of the first plane, 0x20, the Basic Multilingual Plane, which contained the first 36,864 codepoints; other planes and groups could have been accessed via ISO 2022 escape sequences.

3: UTF-1, which encoded all the characters in byte sequences of varying length (from 1 to 5 bytes), in which no byte had the value of a control code.
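
To make the fixed-width forms a little more concrete, here is a small Python sketch of the 36,864 figure and of how the Latin capital A would have looked in the draft. The helper `is_allowed` is a hypothetical name, and the two-byte form assumes that UCS-2 simply kept the row and cell bytes of a Basic Multilingual Plane character; it is an illustration, not text from the draft.

```python
def is_allowed(byte):
    """True if the byte value is outside the forbidden control ranges."""
    return not (0x00 <= byte <= 0x1F or 0x80 <= byte <= 0x9F)

# Usable values for a single row or cell byte.
allowed = [b for b in range(0x100) if is_allowed(b)]

# One plane is rows x cells, so this is the capacity of the first plane.
bmp_cells = len(allowed) ** 2
print(bmp_cells)  # 36864

# Latin capital A: group 0x20, plane 0x20, row 0x20, cell 0x41.
ucs4_a = bytes([0x20, 0x20, 0x20, 0x41])

# Assumption: the two-byte form drops the group and plane bytes.
ucs2_a = ucs4_a[2:]

print(ucs4_a.hex(" "))  # 20 20 20 41
print(ucs2_a.hex(" "))  # 20 41
```

Every byte in both forms passes `is_allowed`, which is why the first usable position in each dimension was 0x20 rather than 0x00.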

In 1990, two initiatives for a universal character set existed: Unicode, with 16 bits for every character for 65,536 possible characters, and ISO 10646, which I explained above. Software companies refused to accept the complexity and size requirements of the ISO standard and convinced a number of ISO National Bodies to vote against it. ISO officials realized that they could not continue to support the standard in its current state and negotiated its unification with Unicode. Two changes took place: the prohibition on control bytes was lifted (which allowed for codepoints like 0x0000103F (ဿ, Myanmar Letter Great Sa)), and the repertoire of the Basic Multilingual Plane was synchronized with that of Unicode.

Some years later, the situation changed in Unicode: 65,536 characters were not enough, and Version 2.0 (1996) added the surrogate mechanism to encode up to 1,112,064 characters within 17 planes. ISO 10646 was then limited to contain as many characters as can be encoded by UTF-16, and no more (a bit over 1.1 million instead of more than 679 million). UCS-4 was then incorporated with the same limitation, under the name UTF-32, although it is not very useful outside internal program data.
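
For anyone curious where 1,112,064 comes from and how the surrogate mechanism reaches the last codepoints (the range this FontStruction covers), here is a minimal Python sketch. The function name `to_surrogates` is hypothetical; the arithmetic is the standard UTF-16 surrogate calculation.

```python
def to_surrogates(cp):
    """Split a codepoint above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("codepoint is outside the supplementary planes")
    v = cp - 0x10000  # 20-bit offset into the 16 supplementary planes
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return high, low

# 17 planes of 65,536 codepoints, minus the 2,048 surrogate values themselves.
encodable = 17 * 0x10000 - 0x800
print(f"{encodable:,}")  # 1,112,064

# First and last codepoints of the range 10FFF0-10FFFF.
for cp in (0x10FFF0, 0x10FFFF):
    high, low = to_surrogates(cp)
    print(f"U+{cp:06X} -> {high:04X} {low:04X}")
# U+10FFF0 -> DBFF DFF0
# U+10FFFF -> DBFF DFFF
```

DBFF DFFF is the largest surrogate pair there is, which is why U+10FFFF is the end of what UTF-16 can address.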

TL;DR: UTF-16 arose from the original draft that was written in 1989 and published in 1990; it was then unified with Unicode, then 65,536 characters were not enough, so surrogates were defined in Version 2.0.

Comment by Bryndan W. Meyerholt (BWM) 22nd February 2022

Typos: "encodede" should have been "encoded", "Multilanugal" should have been "Multilingual", and "IsO" should have been "ISO".

Comment by Bryndan W. Meyerholt (BWM) 22nd February 2022