Thread: Index problem

Index problem

From
Patrik Kudo
Date:
Hi!

We're using postgres 6.5.3 for an application where we, among other
things, store information about users. On two separate occations on
two different servers we've experienced problems with indices on the
same table (out of over 100).

The problem is that postgres refuse to find a certain row in the table
when performing a simple select using the primary (unique) key like
this:

select * from elever where userid = 'walther';

When selecting all rows the row in question is shown. When doing the
following, I get the result I want:

select * from elever where userid like 'w%'; <-- OK
select * from elever where userid like 'wa%'; <-- OK
select * from elever where userid like 'wal%'; <-- OK
select * from elever where userid like 'walt%'; <-- OK

but not

select * from elever where userid like 'walth%'; <-- Not OK!

Droping and recreating the index solves the problem, but that's not
good enough for me since the problem has reoccured on a different
machine with a different database. vacuum and vacuum analyze does not
report any problem with the table.

Both times the problem occured with userid's starting with a
"w". Postgres is running with a Swedish locale, and on FreeBSD this
means that "w" and "v" (among a number of other letter) are treated
equally when collating/sorting. I suppose this could be part of the
problem.

Is this a known problem? If so, is it fixed in 7.0.2? I've not seen
this happen on any of our postgres 7.0.2 systems, but as I can't even
reproduce it on the 6.5.3, that's no guarantee it's fixed...

For the record... I know we shouldn't run 6.5.3 and we will upgrade to
7.x within soon...

psql -c 'select version()'
PostgreSQL 6.5.3 on i386-unknown-freebsd4.1, compiled by cc

uname -a
FreeBSD 4.1-RELEASE #0: Tue Aug 29 15:31:01 CEST 2000

Regards,
Patrik Kudo

ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Känns det oklart? Fråga på!



Re: Index problem

From
Tom Lane
Date:
Patrik Kudo <kudo@partitur.se> writes:
> select * from elever where userid like 'walth%'; <-- Not OK!

> Droping and recreating the index solves the problem, but that's not
> good enough for me since the problem has reoccured on a different
> machine with a different database. vacuum and vacuum analyze does not
> report any problem with the table.

> Both times the problem occured with userid's starting with a
> "w". Postgres is running with a Swedish locale, and on FreeBSD this
> means that "w" and "v" (among a number of other letter) are treated
> equally when collating/sorting. I suppose this could be part of the
> problem.

> Is this a known problem? If so, is it fixed in 7.0.2? I've not seen
> this happen on any of our postgres 7.0.2 systems, but as I can't even
> reproduce it on the 6.5.3, that's no guarantee it's fixed...

Hmm.  I can think of two known issues that may be relevant.  First off,
there are indeed known problems with LIKE index optimization in
non-ASCII locales.  I believe that 7.0 fixes the problems for locales
that just have a non-ASCII sort order of individual characters, but
we've since realized that it can still do the wrong thing in locales
where there are special rules for handling multi-character sequences.
I don't know the Swedish rules offhand, so don't know if that's a
problem for you.

However, a LIKE optimization problem would not be fixed by dropping and
recreating the index.  This makes me think you are being bit by the
other issue: if you compile with LOCALE support then it is critical that
the postmaster *always* be started with the same LOCALE settings (at
least the same LC_COLLATE value).  If LOCALE varies that means your
text sort ordering varies, which means that indexes on text columns may
appear out-of-order, which causes index searches to miss entries they
should have found.  If you insert entries under different LOCALE
settings then you may wind up with an index that is not consistent with
*any* single LOCALE; the only cure for that is to drop and rebuild the
index.

Unfortunately, it's way too easy to get bit by this bug.  The most
common error is to start the postmaster by hand from a shell account
whose LOCALE environment is different from what's supplied when the
postmaster is started from a boot-time script.  Best bet is to set
the correct LOCALE values in a wrapper script that you use in both
cases.

We have talked about fixing this by saving the active LOCALE variables
at initdb time, and having the postmaster adopt those values whenever
it's started.  But it hasn't got done yet.  (Peter, would it be
easy to make GUC handle this?  There'd need to be some way to cause
guc.c to do a putenv() ...)
        regards, tom lane


Re: Index problem

From
Peter Eisentraut
Date:
Tom Lane writes:

> We have talked about fixing this by saving the active LOCALE variables
> at initdb time, and having the postmaster adopt those values whenever
> it's started.  But it hasn't got done yet.  (Peter, would it be
> easy to make GUC handle this?  There'd need to be some way to cause
> guc.c to do a putenv() ...)

I thought pg_control was the place for this.  Any idea how long a locale
string can legally be?  Doesn't look too hard beyond that.

Then again, perhaps this should be a per-database thing, much like the
encoding.  The postmaster doesn't do any collating, so it should be safe
to have different locale in different backends. -- Unless someone puts an
index on pg_database.datname. :-)

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Index problem

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> But it hasn't got done yet.  (Peter, would it be
>> easy to make GUC handle this?  There'd need to be some way to cause
>> guc.c to do a putenv() ...)

> I thought pg_control was the place for this.  Any idea how long a locale
> string can legally be?  Doesn't look too hard beyond that.

That would work too, I guess.

> Then again, perhaps this should be a per-database thing, much like the
> encoding.  The postmaster doesn't do any collating, so it should be safe
> to have different locale in different backends. -- Unless someone puts an
> index on pg_database.datname. :-)

There already is an index on pg_group.groname, so I think we'd better be
conservative and require the same locale across the whole installation.

Hmm ... doesn't that mean that per-database encoding is a bogus concept
too!?
        regards, tom lane