Re: Controlling locale and impact on LIKE statements - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: Controlling locale and impact on LIKE statements
Date
Msg-id 20070906125743.GD6186@alvh.no-ip.org
Whole thread Raw
In response to Controlling locale and impact on LIKE statements  ("Martin Langhoff" <martin.langhoff@gmail.com>)
List pgsql-general
Martin Langhoff escribió:
> On 9/5/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > Martin Langhoff escribió:
> >
> > > As I have a Pg install where the locale is already en_US.UTF-8, and
> > > the database already exists, is there a DB-scoped way of controlling
> > > the locale?
> >
> > Not really.
>
> Ah well. But I do have to wonder why... if each database can have its
> own encoding, that is likely to be matched with a locale. Isn't that
> the main usage scenario? In fact, with unicode encodings, it's likely
> that all your DBs are utf-8 encoded, but each may have its own locale.

The problem is twofold:

1. index ordering is dependent on locale, and
2. there are some indexes over text columns on shared tables, that is,
tables to are in all databases (pg_database, pg_authid, etc).

So you cannot really change the locale without making those indexes
invalid.  It has been said in the past that it is possible to work
around this, which would allow us to change locale per database, but it
hasn't gotten done yet.

> And yet, right now it's all affected by the locale the cluster was
> init'd under. In my case, old Pg installations have been upgraded a
> few times from a Debian Sarge (C locale). Newer DB servers based on
> ubuntu are getting utf-8-ish locales. And all this variation is
> impacting something that should be per DB...
>
> Is this too crazy to ask? ;-)

Well, you are not the only one to have asked this, so it's probably not
crazy.  It just hasn't gotten any hacker motivated enough yet, though.

> > You are right and Eloy is wrong on that discussion.  There is not
> > anything the DB can do to use the regular index if the locale is not C
> > for LIKE queries.  There are good reasons for this.  There's not much
> > option beyond creating the pattern_ops index.
>
> Are the reasons *really* good? ;-)

Well, I can't remember them ATM :-)  But this was given deep
consideration and the pattern_ops were the best solution to be found.

--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."                                      (E. Dijkstra)

pgsql-general by date:

Previous
From: Tino Wildenhain
Date:
Subject: Re: Alias "all fields"?
Next
From: Lincoln Yeoh
Date:
Subject: Re: Need suggestion on how best to update 3 million rows