Re: 7.3 no longer using indexes for LIKE queries - Mailing list pgsql-general

From Tom Lane
Subject Re: 7.3 no longer using indexes for LIKE queries
Date
Msg-id 18114.1039034514@sss.pgh.pa.us
Whole thread Raw
In response to Re: 7.3 no longer using indexes for LIKE queries  (Matthew Gabeler-Lee <mgabelerlee@zycos.com>)
Responses Re: 7.3 no longer using indexes for LIKE queries  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
Matthew Gabeler-Lee <mgabelerlee@zycos.com> writes:
> How about, as an intermediate solution, a list of 'sane' locales in which
> the optimization applied to the C/POSIX locale works?

If you provide such a list, we'll be happy to improve locale_is_like_safe().

Offhand though, I suspect that *most* if not all non-C locales have
problems; even en_US, which has no character set issues, still manages
to insist on a multipass sort algorithm :-(.  An example on a recent
Linux release:

[tgl@g3]$ echo -e  "a\na a\naa\na  a\nab\na b" | LC_ALL=en_US sort
a
aa
a a
a  a
ab
a b

There's no way to use an index ordered like this to look for strings
beginning "a ", because the sorting of spaces depends on what comes
after them.

Making any real dent in the problem will probably require in-depth
analysis of common locale (mis)behaviors.  For example, if space sorting
is the only thing that's funny about en_US, it might make sense for us
to support a modified form of the LIKE optimization that doesn't
consider space as a "safe" prefix character (ie, we could index only
for "a" not "a ", even if the pattern is LIKE 'a %').

I have no idea exactly what sort of compromises would be effective
though.  Any localedef experts out there?

            regards, tom lane

pgsql-general by date:

Previous
From: Matthew Gabeler-Lee
Date:
Subject: Re: 7.3 no longer using indexes for LIKE queries
Next
From: Joseph Shraibman
Date:
Subject: where did debug_print_query go in 7.3???