Thread: select fails on indexed varchars.

select fails on indexed varchars.

From
Alex Krohn
Date:
Hi,

First off I'm running:

links=# select version() ;
                               version
---------------------------------------------------------------------
 PostgreSQL 7.0.3 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66
(1 row)

Now, if I have a table with an index, I'm not able to do some selects on
it. To reproduce:

        links=# create table foo ( a char(25) );
        CREATE
        links=# create index foodx on foo (a);
        CREATE
        links=# insert into foo values ('Test/Test');
        INSERT 29689 1
        links=# select * from foo;
                     a
        ---------------------------
         Test/Test
        (1 row)

        links=# select * from foo where a like 'Test/%'
        links-# ;
         a
        ---
        (0 rows)

# Strange result, why 0 rows, the select failed.

        links=# select * from foo where a like 'Test%';
                     a
        ---------------------------
         Test/Test
        (1 row)

# This one's fine.

        links=# select * from foo where a like 'Test/T%';
                     a
        ---------------------------
         Test/Test
        (1 row)

# And so is this one.

It seems the /% causes a problem. If I drop the index, the selects work
fine. Is this a bug, am I missing something? If you need any other
system info, please let me know. I did an RPM install on a pretty plain
Redhat 6.2 system.

Please reply to alex@gossamer-threads.com with any ideas.

Thanks!

Alex

--------------------  Gossamer Threads Inc.  ----------------------
Alex Krohn                        Email: alex@gossamer-threads.com
Internet Consultant               Phone: (604) 687-5804
http://www.gossamer-threads.com   Fax  : (604) 687-5806

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
>         links=# select * from foo where a like 'Test/%'
>         links-# ;
>          a
>         ---
>         (0 rows)

This looks like an artifact of the known problems with LIKE index
optimization in non-ASCII locales.  What locale are you running the
postmaster in?

            regards, tom lane

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
> [postgres@penguin pgsql]$ locale
> LANG=en_US
> LC_CTYPE="en_US"
> LC_NUMERIC="en_US"
> LC_TIME="en_US"
> LC_COLLATE="en_US"
> LC_MONETARY="en_US"
> LC_MESSAGES="en_US"
> LC_ALL=en_US
> [postgres@penguin pgsql]$

> Postmaster is running as user pgsql. Any ideas on workarounds?

Use locale "C" unless you have a really good reason why you need
en_US sorting order.

Beware of changing the postmaster's locale on the fly, however,
since that will leave you with corrupted (out-of-order) indexes.
Safest to dump/initdb in new locale/reload.

            regards, tom lane

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
>> Beware of changing the postmaster's locale on the fly, however,
>> since that will leave you with corrupted (out-of-order) indexes.
>> Safest to dump/initdb in new locale/reload.

> How would I go about changing that? Setting LANG and LC_ALL in the pgsql
> users home directory .bashrc? Or do I need to edit the startup file?

I'd recommend setting LANG/LC_xxx directly in the script you use to fire
up the postmaster.  This ensures it will be right no matter whether the
postmaster is launched by a boot script, by hand by someone logged in as
pgsql, by hand by someone su'd from another account with different
locale, yadda yadda.

            regards, tom lane

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
> So I added:
> LANG=C
> LC_ALL=C
> to the /etc/rc.d/init.d/postgres file and stop, restarted the server. I
> then dropped and recreated the database. However still same results.

LC_COLLATE overrides LC_ALL, I think --- didn't you previously show us
that all the LC_xxx family variables were set in your default
environment?  You may need to set (or unset if you prefer) all of 'em.

Also, I'd really recommend an initdb, not the above half-baked approach,
because the above will not fix any problems that the template1 indexes
might have with a changed sort order.

            regards, tom lane

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
> I added to the startup file:
> LANG=C
> LC_CTYPE=C
> LC_NUMERIC=C
> LC_TIME=C
> LC_COLLATE=C
> LC_MONETARY=C
> LC_MESSAGES=C
> LC_ALL=C

Seems reasonable.  It's possible you needed "export" commands in there
too, but I wouldn't have thought so (anything coming in from the outer
environment should be exported already).

> After this, my create test and select still produced the same error. Ugh.

Hm, maybe I'm barking up the wrong tree.  Let's try a direct test.
What do you get from

    select 'a_b'::text < 'ac'::text;

    select 'A_B'::text < 'ac'::text;

On my machine, these produce 't' in C locale, but 'f' in en_US locale.

            regards, tom lane

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi,

> Alex Krohn <alex@gossamer-threads.com> writes:
> >> Beware of changing the postmaster's locale on the fly, however,
> >> since that will leave you with corrupted (out-of-order) indexes.
> >> Safest to dump/initdb in new locale/reload.
>
> > How would I go about changing that? Setting LANG and LC_ALL in the pgsql
> > users home directory .bashrc? Or do I need to edit the startup file?
>
> I'd recommend setting LANG/LC_xxx directly in the script you use to fire
> up the postmaster.  This ensures it will be right no matter whether the
> postmaster is launched by a boot script, by hand by someone logged in as
> pgsql, by hand by someone su'd from another account with different
> locale, yadda yadda.

So I added:

LANG=C
LC_ALL=C

to the /etc/rc.d/init.d/postgres file and stop, restarted the server. I
then dropped and recreated the database. However still same results.

Is this only my installation that has this problem? It's just a very
plain RedHat 6.2 with rpm'd install of Postgres, so I'm a little wary
about how many other people will be having this problem as well.

Cheers,

Alex

--------------------  Gossamer Threads Inc.  ----------------------
Alex Krohn                        Email: alex@gossamer-threads.com
Internet Consultant               Phone: (604) 687-5804
http://www.gossamer-threads.com   Fax  : (604) 687-5806

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi,

> > So I added:
> > LANG=C
> > LC_ALL=C
> > to the /etc/rc.d/init.d/postgres file and stop, restarted the server. I
> > then dropped and recreated the database. However still same results.
>
> LC_COLLATE overrides LC_ALL, I think --- didn't you previously show us
> that all the LC_xxx family variables were set in your default
> environment?  You may need to set (or unset if you prefer) all of 'em.
>
> Also, I'd really recommend an initdb, not the above half-baked approach,
> because the above will not fix any problems that the template1 indexes
> might have with a changed sort order.

I added to the startup file:

LANG=C
LC_CTYPE=C
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE=C
LC_MONETARY=C
LC_MESSAGES=C
LC_ALL=C

as well as to the postgres users default environment. I then shut down
postmaster, and as user postgres ran `initdb /var/lib/pgsql`. I then ran
/etc/rc.d/init.d/postgres start as root, and then as user postgres ran
`createdb mytest`.

After this, my create test and select still produced the same error. Ugh.

Cheers,

Alex

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi Tom,

> > [postgres@penguin pgsql]$ locale
> > LANG=en_US
> > LC_CTYPE="en_US"
> > LC_NUMERIC="en_US"
> > LC_TIME="en_US"
> > LC_COLLATE="en_US"
> > LC_MONETARY="en_US"
> > LC_MESSAGES="en_US"
> > LC_ALL=en_US
> > [postgres@penguin pgsql]$
>
> > Postmaster is running as user pgsql. Any ideas on workarounds?
>
> Use locale "C" unless you have a really good reason why you need
> en_US sorting order.
>
> Beware of changing the postmaster's locale on the fly, however,
> since that will leave you with corrupted (out-of-order) indexes.
> Safest to dump/initdb in new locale/reload.

How would I go about changing that? Setting LANG and LC_ALL in the pgsql
users home directory .bashrc? Or do I need to edit the startup file?

Cheers,

Alex

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi Tom,

> Alex Krohn <alex@gossamer-threads.com> writes:
> >         links=# select * from foo where a like 'Test/%'
> >         links-# ;
> >          a
> >         ---
> >         (0 rows)
>
> This looks like an artifact of the known problems with LIKE index
> optimization in non-ASCII locales.  What locale are you running the
> postmaster in?

Is this what you are looking for:

[postgres@penguin pgsql]$ locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_ALL=en_US
[postgres@penguin pgsql]$

Postmaster is running as user pgsql. Any ideas on workarounds?

Cheers,

Alex

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
>> On my machine, these produce 't' in C locale, but 'f' in en_US locale.

> Seem to be in C locale:

So it does.  Okay, what was the complete test case again?
I'm afraid I didn't save your original message because I wrote it off
as a known problem ...

            regards, tom lane

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi Tom,

> > I added to the startup file:
> > LANG=C
> > LC_CTYPE=C
> > LC_NUMERIC=C
> > LC_TIME=C
> > LC_COLLATE=C
> > LC_MONETARY=C
> > LC_MESSAGES=C
> > LC_ALL=C
>
> Seems reasonable.  It's possible you needed "export" commands in there
> too, but I wouldn't have thought so (anything coming in from the outer
> environment should be exported already).
>
> > After this, my create test and select still produced the same error. Ugh.
>
> Hm, maybe I'm barking up the wrong tree.  Let's try a direct test.
> What do you get from
>
>     select 'a_b'::text < 'ac'::text;
>
>     select 'A_B'::text < 'ac'::text;
>
> On my machine, these produce 't' in C locale, but 'f' in en_US locale.

Seem to be in C locale:

links=# select 'a_b'::text < 'ac'::text;
 ?column?
----------
 t
(1 row)

links=# select 'A_B'::text < 'ac'::text;
 ?column?
----------
 t
(1 row)

links=#

Cheers,

Alex

--------------------  Gossamer Threads Inc.  ----------------------
Alex Krohn                        Email: alex@gossamer-threads.com
Internet Consultant               Phone: (604) 687-5804
http://www.gossamer-threads.com   Fax  : (604) 687-5806

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
>>> Seem to be in C locale:
>>
>> So it does.  Okay, what was the complete test case again?
>> I'm afraid I didn't save your original message because I wrote it off
>> as a known problem ...

> Here it is:

>         links=# create table foo ( a char(25) );
>         CREATE
>         links=# create index foodx on foo (a);
>         CREATE
>         links=# insert into foo values ('Test/Test');
>         INSERT 29689 1
>         links=# select * from foo;
>                      a
>         ---------------------------
>          Test/Test
>         (1 row)

>         links=# select * from foo where a like 'Test/%';
>          a
>         ---
>         (0 rows)

How odd.  I get 'Test/Test' from the last select, under both 7.0.2
and current sources, when using C locale.  The query certainly looks
like the kind that would suffer from the LIKE-optimization problem in
non-C locales ... but we seem to have established that you've gotten
the postmaster switched into C locale.

What does EXPLAIN VERBOSE select * from foo where a like 'Test/%';
show?

            regards, tom lane

Re: select fails on indexed varchars.

From
Tom Lane
Date:
Alex Krohn <alex@gossamer-threads.com> writes:
>> What does EXPLAIN VERBOSE select * from foo where a like 'Test/%';
>> show?

Well, the indexqual is just what it should be for C locale:

:indxqual ((
{ EXPR :typeOid 16  :opType op :oper
{ OPER :opno 1061 :opid 1052 :opresulttype 16 } :args (
{ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1}
{ CONST :consttype 1042 :constlen -1 :constisnull false :constvalue  9 [ 9 0 0 0 84 101 115 116 47 ]  :constbyval false
})} 
{ EXPR :typeOid 16  :opType op :oper
{ OPER :opno 1058 :opid 1049 :opresulttype 16 } :args (
{ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1}
{ CONST :consttype 1042 :constlen -1 :constisnull false :constvalue  9 [
9 0 0 0 84 101 115 116 48 ]  :constbyval false })}))

This mess translates as

    a >= 'Test/'::bpchar AND a < 'Test0'::bpchar

which is what the LIKE index optimizer is supposed to generate.
I infer that one or the other of these conditions yields false on your
machine, which should not be happening if the thing is in C locale.

            regards, tom lane

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi Tom,

> Alex Krohn <alex@gossamer-threads.com> writes:
> >> On my machine, these produce 't' in C locale, but 'f' in en_US locale.
>
> > Seem to be in C locale:
>
> So it does.  Okay, what was the complete test case again?
> I'm afraid I didn't save your original message because I wrote it off
> as a known problem ...

Here it is:

        links=# create table foo ( a char(25) );
        CREATE
        links=# create index foodx on foo (a);
        CREATE
        links=# insert into foo values ('Test/Test');
        INSERT 29689 1
        links=# select * from foo;
                     a
        ---------------------------
         Test/Test
        (1 row)

        links=# select * from foo where a like 'Test/%';
         a
        ---
        (0 rows)

Cheers,

Alex

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi Tom,

> >> So it does.  Okay, what was the complete test case again?
> >> I'm afraid I didn't save your original message because I wrote it off
> >> as a known problem ...
>
> > Here it is:
>
> >         links=# create table foo ( a char(25) );
> >         CREATE
> >         links=# create index foodx on foo (a);
> >         CREATE
> >         links=# insert into foo values ('Test/Test');
> >         INSERT 29689 1
> >         links=# select * from foo;
> >                      a
> >         ---------------------------
> >          Test/Test
> >         (1 row)
>
> >         links=# select * from foo where a like 'Test/%';
> >          a
> >         ---
> >         (0 rows)
>
> How odd.  I get 'Test/Test' from the last select, under both 7.0.2
> and current sources, when using C locale.  The query certainly looks
> like the kind that would suffer from the LIKE-optimization problem in
> non-C locales ... but we seem to have established that you've gotten
> the postmaster switched into C locale.
>
> What does EXPLAIN VERBOSE select * from foo where a like 'Test/%';
> show?

Here's a cut and paste, not sure if there is a nicer way to output it:

links=# EXPLAIN VERBOSE select * from foo where a like 'Test/%';
NOTICE:  QUERY DUMP:

{ INDEXSCAN :startup_cost 0.00 :total_cost 8.14 :rows 10 :width 12 :state <> :qptargetlist ({ TARGETENTRY :resdom {
RESDOM:resno 1 :restype 1042 :restypmod 29 :resname a :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr {
VAR:varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR
:typeOid16  :opType op :oper { OPER :opno 1211 :opid 850 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype
1042:vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 25 :constlen -1 :constisnull false
:constvalue 10 [ 10 0 0 0 84 101 115 116 47 37 ]  :constbyval false })}) :lefttree <> :righttree <> :extprm () :locprm
():initplan <> :nprm 0  :scanrelid 1 :indxid ( 18825) :indxqual (({ EXPR :typeOid 16  :opType op :oper { OPER :opno
1061:opid 1052 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0
:varnoold1 :varoattno 1} { CONST :consttype 1042 :constlen -1 :constisnull false :constvalue  9 [ 9 0 0 0 84 101 115
11647 ]  :constbyval false })} { EXPR :typeOid 16  :opType op :oper { OPER :opno 1058 :opid 1049 :opresulttype 16 }
:args({ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST
:consttype1042 :constlen -1 :constisnull false :constvalue  9 [ 9 0 0 0 84 101 115 116 48 ]  :constbyval false })}))
:indxqualorig(({ EXPR :typeOid 16  :opType op :oper { OPER :opno 1061 :opid 1052 :opresulttype 16 } :args ({ VAR :varno
1:varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 1042 :constlen -1
:constisnullfalse :constvalue  9 [ 9 0 0 0 84 101 115 116 47 ]  :constbyval false })} { EXPR :typeOid 16  :opType op
:oper{ OPER :opno 1058 :opid 1049 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29
:varlevelsup0 :varnoold 1 :varoattno 1} { CONST :consttype 1042 :constlen -1 :constisnull false :constvalue  9 [ 9 0 0
084 101 115 116 48 ]  :constbyval false })})) :ind 
NOTICE:  QUERY PLAN:

Index Scan using foodx on foo  (cost=0.00..8.14 rows=10 width=12)

EXPLAIN
links=#

Cheers,

Alex

Re: select fails on indexed varchars.

From
Tom Lane
Date:
> links=# select * from foo where a < 'Test0'::bpchar;
>  a
> ---
> (0 rows)

> links=#

> Are you saying the second test should have returned true under C locale?

Yes.  You are not really in C locale, or at least your postmaster isn't.
This looks like ISO sorting rules to me --- perhaps you are in en_US
locale.

> Is this a version dependant bug? Will downgrading to 6.x get me going?

No.  Fix your locale.  I'm afraid I don't have any more advice to offer
on that than I gave already ...

            regards, tom lane

Re: select fails on indexed varchars.

From
Alex Krohn
Date:
Hi Tom,

> >> What does EXPLAIN VERBOSE select * from foo where a like 'Test/%';
> >> show?
>
> Well, the indexqual is just what it should be for C locale:
>
> :indxqual ((
> { EXPR :typeOid 16  :opType op :oper
> { OPER :opno 1061 :opid 1052 :opresulttype 16 } :args (
> { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1}
> { CONST :consttype 1042 :constlen -1 :constisnull false :constvalue  9 [ 9 0 0 0 84 101 115 116 47 ]  :constbyval
false})}  
> { EXPR :typeOid 16  :opType op :oper
> { OPER :opno 1058 :opid 1049 :opresulttype 16 } :args (
> { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 29  :varlevelsup 0 :varnoold 1 :varoattno 1}
> { CONST :consttype 1042 :constlen -1 :constisnull false :constvalue  9 [
> 9 0 0 0 84 101 115 116 48 ]  :constbyval false })}))
>
> This mess translates as
>
>     a >= 'Test/'::bpchar AND a < 'Test0'::bpchar
>
> which is what the LIKE index optimizer is supposed to generate.
> I infer that one or the other of these conditions yields false on your
> machine, which should not be happening if the thing is in C locale.

Here's what I get:

links=# select * from foo where a >= 'Test/'::bpchar;
             a
---------------------------
 Test/Test
(1 row)

links=# select * from foo where a < 'Test0'::bpchar;
 a
---
(0 rows)

links=#

Are you saying the second test should have returned true under C locale?

Is this a version dependant bug? Will downgrading to 6.x get me going?

Cheers,

Alex