Conversion table format

My preferred table format is:

For each character, a line containing

0xnn or 0xnnnn

the character's code in the charset (hexadecimal),

whitespace

0xnnnn

the character's code in Unicode (hexadecimal),

whitespace (optional)

a comment, beginning with # and extending to the end of the line.

The character lines are sorted according to their first column.

Optionally, the table starts with some comment lines (beginning with #).

This is the format in which most of the unicode.org tables come. It has the advantage of being very easy to manipulate using grep and sed.
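
As a rough illustration, one line of this format could be parsed in C along the following lines. This is only a sketch: the function name parse_table_line and its return convention are invented for this example, and tolerating leading whitespace and blank lines is an assumption that goes beyond the description above.

```c
#include <stdio.h>

/* Parse one line of a conversion table.
   Returns  1 for a character line (both codes are stored),
            0 for a comment line or a blank line,
           -1 for anything else.
   The name and the return convention are hypothetical. */
int parse_table_line(const char *line,
                     unsigned int *charset_code,
                     unsigned int *unicode_code)
{
    int consumed = 0;

    /* Assumption: leading blanks are tolerated. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* Comment lines start with #; blank lines are skipped as well. */
    if (*line == '#' || *line == '\n' || *line == '\r' || *line == '\0')
        return 0;

    /* Two hexadecimal codes, each written with a 0x prefix.
       %n records how many characters were consumed so far. */
    if (sscanf(line, "0x%x 0x%x%n",
               charset_code, unicode_code, &consumed) != 2)
        return -1;

    /* Only optional whitespace and an optional # comment may follow. */
    line += consumed;
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line != '#' && *line != '\n' && *line != '\r' && *line != '\0')
        return -1;

    return 1;
}
```

A caller would typically read the table with fgets and pass each line to this function, ignoring the lines for which it returns 0.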

My preferred table format for tables I generate myself is:

For each character, a line containing

0xnn or 0xnnnn

the character's code in the charset (hexadecimal),

a tab

0xnnnn

the character's code in Unicode (hexadecimal).

The character lines are sorted according to their first column.

This is a special case of the above. It has the advantage of being very easy to manipulate using grep, sed, sort, uniq, join, diff and small C programs (scanf, printf). Plus, it is rather compact.
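
To give an idea of what such a small C program boils down to, here is a sketch of writing and reading one line of the tab-separated format. The function names format_table_line and scan_table_line are made up for this example, and the uppercase hex digits and the zero padding to two or four digits are assumptions, not requirements of the format.

```c
#include <stdio.h>

/* Format one character line into buf:
   charset code, a tab, Unicode code, newline.
   Returns the length of the line, or -1 if buf is too small.
   Assumption: uppercase hex digits, zero-padded. */
int format_table_line(char *buf, size_t size,
                      unsigned int charset_code,
                      unsigned int unicode_code)
{
    /* 0xnn for codes that fit in one byte, 0xnnnn otherwise. */
    int charset_width = charset_code <= 0xFF ? 2 : 4;
    int n = snprintf(buf, size, "0x%0*X\t0x%04X\n",
                     charset_width, charset_code, unicode_code);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Read the two codes back from such a line.
   Returns 1 on success, 0 otherwise. */
int scan_table_line(const char *line,
                    unsigned int *charset_code,
                    unsigned int *unicode_code)
{
    /* A whitespace character in a scanf format matches any run of
       whitespace, so spaces are accepted in place of the tab too. */
    return sscanf(line, "0x%x\t0x%x", charset_code, unicode_code) == 2;
}
```

Because every line has the same two fields, the output of format_table_line can be fed directly to sort, join or diff without further cleanup.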

Comparison of conversion tables
Bruno Haible <bruno@clisp.org>

Last modified: 31 December 2003.