Mimer SQL Unicode Collation Charts
Languages - Predefined and Downloadable
Below are specifications of the sorting adjustments, so-called tailorings, that various languages
need in order to get the correct national sort order relative to the default Unicode sort order.
In the table below, languages with their names bolded are among the predefined collations
included in the current version of Mimer SQL.
For some of the languages that are not bolded, the collation definition is available and
can be used by copy and paste. Where applicable (see Uyghur, for example),
the language's page contains a Collation link at the top of the page
that leads to the CREATE COLLATION statement used to define the collation.
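To make the idea of a tailoring concrete, the following is a minimal Python sketch, not a Mimer SQL collation definition. It assumes a hypothetical Swedish-style tailoring in which å, ä and ö are placed after z, and compares it with a rough stand-in for the untailored order in which accented letters sort together with their base letters. The names TAILORED_ALPHABET, default_like_key and tailored_key are invented for this illustration.

```python
import unicodedata

# Hypothetical tailoring: the letters å, ä, ö are placed after z, as in Swedish.
TAILORED_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")
RANK = {ch: i for i, ch in enumerate(TAILORED_ALPHABET)}


def default_like_key(word):
    """Approximate an untailored order: accented letters sort with their base letter."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tailored_key(word):
    """Rank each letter by its position in the tailored alphabet."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after every known letter.
    return [RANK.get(ch, len(TAILORED_ALPHABET)) for ch in composed]


words = ["zon", "ära", "apa", "öra", "åka"]
default_order = sorted(words, key=default_like_key)
tailored_order = sorted(words, key=tailored_key)
```

With these five words, default_order places "åka" first because å is treated like a, while tailored_order moves "åka", "ära" and "öra" to the end, after "zon". A real tailoring is expressed as rules relative to the default collation table rather than as a fixed alphabet, so this sketch only shows the effect, not the mechanism.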
In this context a script is a collection of symbols used to represent textual information.
The Unicode Character Database (UCD)
provides data for a mapping from Unicode characters to script names.
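As an illustration of such a mapping, the sketch below parses a few lines written in the layout of the UCD file Scripts.txt and looks up the script name of a character. The excerpt is a small hand-picked sample rather than the full data file, and the helper names parse_scripts and script_of are assumptions made for this example.

```python
import re

# A small sample in the layout of the UCD file Scripts.txt (not the full data).
SAMPLE = """\
0020          ; Common # Zs       SPACE
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0391..03A1    ; Greek # L&  [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
0400..0481    ; Cyrillic # L& [130] CYRILLIC CAPITAL LETTER IE WITH GRAVE..CYRILLIC SMALL LETTER KOPPA
"""

# One code point or a range "XXXX..YYYY", followed by ";" and the script name.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return a list of (first_code_point, last_code_point, script_name) tuples."""
    ranges = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and comments
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        ranges.append((first, last, match.group(3)))
    return ranges


def script_of(char, ranges):
    """Look up the script name of a single character; 'Unknown' if not listed."""
    code_point = ord(char)
    for first, last, name in ranges:
        if first <= code_point <= last:
            return name
    return "Unknown"


RANGES = parse_scripts(SAMPLE)
```

For example, script_of("A", RANGES) gives "Latin" and script_of("7", RANGES) gives "Common". Characters that the sample does not cover, such as lowercase letters, come back as "Unknown" here only because the excerpt is incomplete.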
European Ordering Rules (EOR) is a standard
that defines how the Latin, Greek and Cyrillic scripts should be sorted.
It is intended to provide guidance on sorting European repertoires in Unicode.
ISO/IEC 8859-1 (SQL datatype CHAR)
The following script for Latin-1 representation
is used with the CHAR datatype in SQL.
Unicode (SQL datatype NCHAR)
Below are scripts for the Unicode representation,
used with the NCHAR datatype in SQL.
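One practical consequence of the split between the two representations is that only text within the ISO/IEC 8859-1 repertoire can be stored in the Latin-1 form. The sketch below is a rough client-side check written in Python; the function name fits_latin1 is hypothetical, and the check says nothing about how Mimer SQL itself validates data.

```python
def fits_latin1(text):
    """Return True if every character of text exists in ISO/IEC 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Characters up to U+00FF are representable in Latin-1; others are not.
latin1_ok = fits_latin1("Ångström")
needs_unicode = not fits_latin1("Łódź")
```

Under this assumption, a value such as "Ångström" would fit the Latin-1 representation, while "Łódź" would need the Unicode representation.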
The Default Unicode Collation Element Table (DUCET) is provided in the
AllKeys table, as stated in the
specification for the Unicode Collation Algorithm (UCA).
This table provides a mapping from characters to collation elements.
The following scripts represent different parts of the table,
given in the order they are defined.
The Variable script above includes characters that may be set to Ignorable by using a collation option.
These characters include the space, punctuation marks and most symbols.
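The mapping from characters to collation elements, and the effect of treating variable characters as ignorable, can be sketched as follows. The sample lines follow the layout of the UCA allkeys.txt file, where a leading * marks a variable collation element, but the weight values shown are illustrative and differ between UCA versions. The function names are assumptions, and the key uses primary weights only, which is a strong simplification of the full algorithm.

```python
import re

# Sample lines in the layout of allkeys.txt; the weights are illustrative only.
SAMPLE = """\
0020  ; [*0209.0020.0002] # SPACE
002D  ; [*020D.0020.0002] # HYPHEN-MINUS
0061  ; [.1CAD.0020.0002] # LATIN SMALL LETTER A
0062  ; [.1CC6.0020.0002] # LATIN SMALL LETTER B
"""

# One collation element: [flag primary.secondary.tertiary], flag "*" = variable.
CE_RE = re.compile(r"\[([.*])([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]")


def parse_allkeys(text):
    """Map each character sequence to a list of (is_variable, p, s, t) tuples."""
    table = {}
    for line in text.splitlines():
        codes, sep, rest = line.partition(";")
        if not sep:
            continue  # not a mapping line
        chars = "".join(chr(int(code, 16)) for code in codes.split())
        elements = rest.split("#")[0]  # drop the trailing comment
        table[chars] = [
            (flag == "*", int(p, 16), int(s, 16), int(t, 16))
            for flag, p, s, t in CE_RE.findall(elements)
        ]
    return table


def primary_key(text, table, ignore_variable):
    """Build a primary-weight sort key, optionally skipping variable elements."""
    key = []
    for char in text:
        for is_variable, primary, _secondary, _tertiary in table.get(char, []):
            if is_variable and ignore_variable:
                continue  # treated as ignorable
            key.append(primary)
    return key


TABLE = parse_allkeys(SAMPLE)
words = ["ab", "a b", "aa"]
with_variables = sorted(words, key=lambda w: primary_key(w, TABLE, False))
without_variables = sorted(words, key=lambda w: primary_key(w, TABLE, True))
```

When variable elements are kept, "a b" sorts first because the space has a lower primary weight than any letter. When they are ignored, "a b" and "ab" get the same key and "aa" sorts before both.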
The Common script above includes digits, currency symbols, etc.