Home > mailing lists

Sigh, LIKE indexing is still broken in foreign locales - Mailing list pgsql-bugs

From	Tom Lane
Subject	Sigh, LIKE indexing is still broken in foreign locales
Date	June 7, 2000 22:23:02
Msg-id	14045.960430926@sss.pgh.pa.us Whole thread Raw
In response to	LIKE bug (Moucha Václav <MouchaV@Radiomobil.cz>)
Responses	Re: [HACKERS] Sigh, LIKE indexing is still broken in foreign locales Re: [HACKERS] Sigh, LIKE indexing is still broken in foreign locales
List	pgsql-bugs

Tree view

Moucha Václav <MouchaV@Radiomobil.cz> writes:
> 1. Compilation
>    ./configure --enable-locale    # not needed for RPMS precompiled binaries 

> 2. Starting postmaster
>    export LC_CTYPE=cs_CZ
>    export LC_COLLATE=cs_CZ        # this setting is important for the
> bug result
>    postmaster -S -D /home/pgsql/data -o '-Fe'    

> 3. SQL steps
>    create table test (name text);
>    insert into test values ('�');    # the first char is E1 from LATIN 2
> coding
>    insert into test values ('�b');
>    create index test_index on test (name);
>    set cpu_tuple_cost=1;        # force backend to use index
> scanning
>    select * from test where name like '�%';

> BUG: Only 1 line is selected with '�' only instead of both lines.

The problem here is that given the search pattern '\341%', the planner
generates index limit conditionsname >= '\341' AND name < '\342';

Apparently, in CZ locale it is true that '\341' is less than '\342',
but it does not follow from that that all strings starting with '\341'
are less than '\342'.  In fact '\341b' is considered greater than '\342'.

Since '\341' and '\342' are two different accented forms of 'a'
(if I'm looking at the right character set), this is perhaps not so
improbable as all that.  Evidently the collation rule is that different
accent forms sort the same unless the strings would otherwise be
considered equal, in which case an ordering is assigned to them.

So, the rule we thought we had for generating index bounds falls flat,
and we're back to the same old question: given a proposed prefix string,
how can we generate bounds that are certain to be considered <= and >=
all strings starting with that prefix?

I am now thinking that maybe we should search for a string that compares
greater than "fooz" when the prefix is "foo" --- that is, append a 'z'
to the prefix string.  But I wouldn't be surprised if that fails too
in some locales.

I'm also wondering if the left-hand inequality ('foo' <= any string
beginning with 'foo') might fail in some locales ... we haven't seen
it reported but who knows ...
        regards, tom lane

pgsql-bugs by date:

From: Tom Lane
Date: 07 June 2000, 19:22:42
Subject: Re: Bug?

From: Giles Lean
Date: 08 June 2000, 02:43:25
Subject: Re: [HACKERS] Sigh, LIKE indexing is *still* broken in foreign locales

Sigh, LIKE indexing is still broken in foreign locales - Mailing list pgsql-bugs

Previous

Next

Sigh, LIKE indexing is *still* broken in foreign locales - Mailing list pgsql-bugs

Previous

Next

Sigh, LIKE indexing is still broken in foreign locales - Mailing list pgsql-bugs