Re: Comparing strings with non-ASCII characters - Mailing list pgsql-general

From Tom Lane
Subject Re: Comparing strings with non-ASCII characters
Date
Msg-id 21525.1037199502@sss.pgh.pa.us
Whole thread Raw
In response to Comparing strings with non-ASCII characters  (Artur Rataj <arataj@iitis.gliwice.pl>)
Responses Re: Comparing strings with non-ASCII characters  (Artur Rataj <arataj@iitis.gliwice.pl>)
List pgsql-general
Artur Rataj <arataj@iitis.gliwice.pl> writes:
> Anyway, why the index optimizations are disabled for the ~ operator
> and not for the < or > operators in the locale?

The < and > operators are compatible with the index ordering, by
definition.  The trouble with ~ is that it is not doing ordering,
but pattern matching.  You can exploit an index to narrow down
the candidates for a left-anchored pattern only when the index
ordering is strict lexicographic ... and in all too many non-C
locales, it ain't.  Digraphs, multipass sorting rules, and things
like that are killers.

We have tried and failed (repeatedly) to find a way to use non-C
indexes for LIKE and ~.  If you'd like to try again, you're more
than welcome, but I suggest you read the PG list archives to learn
the reasons why previous attempts didn't work.

The long-run answer to this will probably be to allow individual columns
to be declared as being of particular locales; then you could assign C
locale to a column you need to do pattern matching on.  This seems to
require building our own locale library :-( ... so it's not going to
happen quickly.

BTW, you might think about using full-text search (contrib/tsearch
or contrib/fulltextindex) instead of pattern matching.

            regards, tom lane

pgsql-general by date:

Previous
From: "Areski"
Date:
Subject: Re: Performance : Optimize query
Next
From: Stephan Szabo
Date:
Subject: Re: index question..