Glyph Naming and Encoding

Viewing a font by many different character indexing methods and groupings is a key FontLab VI feature.

Here is how it works:

A font is mostly a big collection of glyphs that are used to represent many characters. On an average screen the Font Window can show only a few hundred character cells, so we need some way to browse the font “through” the Font Window. In addition, different (and older) font formats use different methods to encode characters.

In FontLab, you can choose the Encodings menu (in the top part of the Font Window) to display a subset of the glyph collection.

In the following sections you will find more information about encoding modes, Unicode and name-based identification, and the character-glyph model. See the About Glyphs article for more details.

See Language Support and Glyphs for more details on selecting characters for your font.

Characters, Codes and Glyphs

In addition to storing each glyph, a font contains header data with general information about the font, such as the family name, the style name, the copyright string, the ascender and descender values, and more.
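
As an aside, the same header data can be inspected outside FontLab as well. The sketch below uses the open-source fontTools Python library (not part of FontLab) and a hypothetical font file name, purely as an illustration of where this information lives in a compiled font:

```python
# Illustration only: reading header data from a compiled font with the
# open-source fontTools library; "MyFont.otf" is a hypothetical file name.
from fontTools.ttLib import TTFont

font = TTFont("MyFont.otf")

name = font["name"]                 # the OpenType 'name' table
print(name.getDebugName(1))         # family name
print(name.getDebugName(2))         # style (subfamily) name
print(name.getDebugName(0))         # copyright string

os2 = font["OS/2"]                  # the OS/2 metrics table
print(os2.sTypoAscender, os2.sTypoDescender)   # ascender and descender values
```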

Simply speaking, text in digital form is a collection of character codes (or “codepoints”) — integer numbers. When you enter text into a computer, it turns your keystrokes into numbers, assigning a character code to each character that you type. When the computer needs to show text on screen or print it, it accesses a font and turns those character codes into visual shapes.
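
A few lines of Python make this visible (a minimal sketch, using Unicode codepoints as the character codes):

```python
# Characters are stored as integer character codes.
text = "Aä€"
codes = [ord(ch) for ch in text]          # characters -> character codes
print(codes)                              # [65, 228, 8364]

# To display the text, the codes are turned back into characters,
# and a font supplies the visual shape (glyph) for each one.
print("".join(chr(code) for code in codes))   # Aä€
```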

A character encoding standard is a table that defines the relation between characters and the codes that are used to represent these characters in the computer.

Character Encoding Standards

There are many character encoding standards (sometimes called codepages) in use around the world to support different languages.

One major difference between the encoding standards (besides the assignment of codes) is the size of the code. There are one-byte, double-byte and multi-byte encoding standards. With a one-byte encoding, each character in the text is encoded using exactly one byte (8 bits of information). This means that only 256 different characters can be encoded in a single one-byte encoding standard.

A double-byte encoding uses two bytes (16 bits) for every character, so it’s possible to map 65,536 characters.

Multi-byte encoding standards use from one to four bytes for every character, expanding the code space to billions of possible codes.
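
The arithmetic behind these sizes, and the behaviour of a familiar multi-byte encoding (UTF-8, used here only as an example), can be checked directly in Python:

```python
# How many distinct codes fit into one, two, or four bytes.
print(2 ** 8)    # 256          -> one-byte encodings
print(2 ** 16)   # 65536        -> double-byte encodings
print(2 ** 32)   # 4294967296   -> theoretical four-byte code space

# UTF-8 is a multi-byte encoding: one to four bytes per character.
for ch in ("A", "ä", "€", "😀"):
    print(ch, len(ch.encode("utf-8")), "bytes")
# A 1, ä 2, € 3, 😀 4
```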

The biggest problem of single-byte encoding standards, such as the old Mac and Windows codepages, is their limited capacity. With only 256 slots (available codepoints), usually only characters from one alphabet (writing system) can be encoded. So for example it is not possible to encode Latin and Cyrillic text using a single codepage.

256 character codes are not even sufficient to encode various accented (diacritic) characters from different languages that use the Roman alphabet. This is why separate codepages were created for Western European languages (English, German, French etc.), Central and Eastern European languages (Polish, Czech, Hungarian etc.), Baltic languages (Latvian, Lithuanian, Estonian etc.) and so on.
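
For example (a small Python sketch, using the Windows codepages as stand-ins for the language groups above), the Hungarian letter ő fits in the Central European codepage but has no slot in the Western European one:

```python
ch = "ő"   # o with double acute, needed for Hungarian

print(ch.encode("cp1250"))   # encodes fine in Windows Central European
try:
    ch.encode("cp1252")      # Windows Western has no slot for it
except UnicodeEncodeError:
    print("ő cannot be encoded in the Western European codepage")
```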

In addition, different companies assign character codes differently. For example, the letter ä (adieresis) is represented by the character code 228 in the Windows Western (WinANSI, 1252) codepage used by Microsoft, and by the character code 138 in the MacOS Roman codepage used by Apple. The confusion becomes evident when you realize that the same code (138) in the Windows Western codepage represents Š (Scaron), a character that has no code at all in MacOS Roman. On the Macintosh, the Scaron is only available in the MacOS Central European codepage, under the code 225!
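
Python ships codecs for these legacy codepages, so the mismatch is easy to reproduce (a sketch; mac_latin2 is Python's approximation of the MacOS Central European codepage):

```python
# The same character gets different codes in different vendors' codepages.
print("ä".encode("cp1252")[0])      # 228 in Windows Western (1252)
print("ä".encode("mac_roman")[0])   # 138 in MacOS Roman

print("Š".encode("cp1252")[0])      # 138 again, but now it means Scaron
try:
    "Š".encode("mac_roman")         # Scaron has no code in MacOS Roman
except UnicodeEncodeError:
    print("Scaron is not available in MacOS Roman")

# The Central European codepage does include Scaron, under yet another code.
print("Š".encode("mac_latin2")[0])
```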

Most of this confusion is avoided by using Unicode as a standard, all-encompassing encoding. Apple, Microsoft and most other companies have supported Unicode for many years now. The old encodings are now mostly useful as a way of viewing a specific glyphset rather than as a way of mapping codes. By default, FontLab exports fonts with a Unicode encoding.
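
After export, the Unicode character-to-glyph mapping ends up in the font's cmap table. As an illustration (again using the fontTools library and a hypothetical file name), it can be inspected like this:

```python
# Illustration only: a compiled font maps Unicode codepoints to glyphs
# in its 'cmap' table; "MyFont-Regular.otf" is a hypothetical file name.
from fontTools.ttLib import TTFont

font = TTFont("MyFont-Regular.otf")
cmap = font.getBestCmap()           # {Unicode codepoint: glyph name}

print(cmap.get(0x00E4))             # e.g. "adieresis" -> the glyph for ä
print(cmap.get(0x0160))             # e.g. "Scaron"    -> the glyph for Š
```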