23.1. Locale Support
Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, etc. Postgres Pro uses the standard ISO C and POSIX locale facilities provided by the server operating system. For additional information refer to the documentation of your system.
23.1.1. Overview
Locale support is automatically initialized when a database cluster is created using initdb
. initdb
will initialize the database cluster with the locale setting of its execution environment by default, so if your system is already set to use the locale that you want in your database cluster then there is nothing else you need to do. If you want to use a different locale (or you are not sure which locale your system is set to), you can instruct initdb
exactly which locale to use by specifying the --locale
option. For example:
initdb --locale=ru_RU
This example for Unix systems sets the locale to Russian (ru
) as spoken in Russia (RU
). Other possibilities might include en_US
(U.S. English) and fr_CA
(French Canadian). If more than one character set can be used for a locale then the specifications can take the form language_territory.codeset
. For example, fr_BE.UTF-8
represents the French language (fr) as spoken in Belgium (BE), with a UTF-8 character set encoding.
What locales are available on your system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a
will provide a list of available locales. Windows uses more verbose locale names, such as German_Germany
or Swedish_Sweden.1252
, but the principles are the same.
Occasionally it is useful to mix rules from several locales, e.g., use English collation rules but Spanish messages. To support that, a set of locale subcategories exist that control only certain aspects of the localization rules:
LC_COLLATE | String sort order |
LC_CTYPE | Character classification (What is a letter? Its upper-case equivalent?) |
LC_MESSAGES | Language of messages |
LC_MONETARY | Formatting of currency amounts |
LC_NUMERIC | Formatting of numbers |
LC_TIME | Formatting of dates and times |
The category names translate into names of initdb
options to override the locale choice for a specific category. For instance, to set the locale to French Canadian, but use U.S. rules for formatting currency, use initdb --locale=fr_CA --lc-monetary=en_US
.
If you want the system to behave as if it had no locale support, use the special locale name C
, or equivalently POSIX
.
Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE
and LC_CTYPE
are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 23.2.) The default values for these categories are determined when initdb
is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE
command.
The other locale categories can be changed whenever desired by setting the server configuration parameters that have the same name as the locale categories (see Section 19.11.2 for details). The values that are chosen by initdb
are actually only written into the configuration file postgresql.conf
to serve as defaults when the server is started. If you remove these assignments from postgresql.conf
then the server will inherit the settings from its execution environment.
Note that the locale behavior of the server is determined by the environment variables seen by the server, not by the environment of any client. Therefore, be careful to configure the correct locale settings before starting the server. A consequence of this is that if client and server are set up in different locales, messages might appear in different languages depending on where they originated.
Note
When we speak of inheriting the locale from the execution environment, this means the following on most operating systems: For a given locale category, say the collation, the following environment variables are consulted in this order until one is found to be set: LC_ALL
, LC_COLLATE
(or the variable corresponding to the respective category), LANG
. If none of these environment variables are set then the locale defaults to C
.
Some message localization libraries also look at the environment variable LANGUAGE
which overrides all other locale settings for the purpose of setting the language of messages. If in doubt, please refer to the documentation of your operating system, in particular the documentation about gettext.
To enable messages to be translated to the user's preferred language, NLS must have been selected at build time (configure --enable-nls
). All other locale support is built in automatically.
23.1.2. Behavior
The locale settings influence the following SQL features:
Sort order in queries using
ORDER BY
or the standard comparison operators on textual dataPattern matching operators (
LIKE
,SIMILAR TO
, and POSIX-style regular expressions); locales affect both case insensitive matching and the classification of characters by character-class regular expressionsThe ability to use indexes with
LIKE
clauses
The drawback of using locales other than C
or POSIX
in Postgres Pro is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE
. For this reason use locales only if you actually need them.
As a workaround to allow Postgres Pro to use indexes with LIKE
clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.10 for more information. Another approach is to create indexes using the C
collation, as discussed in Section 23.2.
23.1.3. Selecting Locales
Locales can be selected in different scopes depending on requirements. The above overview showed how locales are specified using initdb
to set the defaults for the entire cluster. The following list shows where locales can be selected. Each item provides the defaults for the subsequent items, and each lower item allows overriding the defaults on a finer granularity.
As explained above, the environment of the operating system provides the defaults for the locales of a newly initialized database cluster. In many cases, this is enough: If the operating system is configured for the desired language/territory, then Postgres Pro will by default also behave according to that locale.
As shown above, command-line options for
initdb
specify the locale settings for a newly initialized database cluster. Use this if the operating system does not have the locale configuration you want for your database system.A locale can be selected separately for each database. The SQL command
CREATE DATABASE
and its command-line equivalentcreatedb
have options for that. Use this for example if a database cluster houses databases for multiple tenants with different requirements.Locale settings can be made for individual table columns. This uses an SQL object called collation and is explained in Section 23.2. Use this for example to sort data in different languages or customize the sort order of a particular table.
Finally, locales can be selected for an individual query. Again, this uses SQL collation objects. This could be used to change the sort order based on run-time choices or for ad-hoc experimentation.
23.1.4. Locale Providers
Postgres Pro supports multiple locale providers. This specifies which library supplies the locale data. One standard provider name is libc
, which uses the locales provided by the operating system C library. These are the locales used by most tools provided by the operating system. Another provider is icu
, which uses the external ICU library. ICU locales can only be used if support for ICU was configured when Postgres Pro was built.
The commands and tools that select the locale settings, as described above, each have an option to select the locale provider. The examples shown earlier all use the libc
provider, which is the default. Here is an example to initialize a database cluster using the ICU provider:
initdb --locale-provider=icu --icu-locale=en
See the description of the respective commands and programs for details. Note that you can mix locale providers at different granularities, for example use libc
by default for the cluster but have one database that uses the icu
provider, and then have collation objects using either provider within those databases.
Which locale provider to use depends on individual requirements. For most basic uses, either provider will give adequate results. For the libc provider, it depends on what the operating system offers; some operating systems are better than others. For advanced uses, ICU offers more locale variants and customization options.
23.1.5. Problems
If locale support doesn't work according to the explanation above, check that the locale support in your operating system is correctly configured. To check what locales are installed on your system, you can use the command locale -a
if your operating system provides it.
Check that Postgres Pro is actually using the locale that you think it is. The LC_COLLATE
and LC_CTYPE
settings are determined when a database is created, and cannot be changed except by creating a new database. Other locale settings including LC_MESSAGES
and LC_MONETARY
are initially determined by the environment the server is started in, but can be changed on-the-fly. You can check the active locale settings using the SHOW
command.
Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to make use of the error code scheme instead.