Thread: Index problem
Hi! We're using postgres 6.5.3 for an application where we, among other things, store information about users. On two separate occations on two different servers we've experienced problems with indices on the same table (out of over 100). The problem is that postgres refuse to find a certain row in the table when performing a simple select using the primary (unique) key like this: select * from elever where userid = 'walther'; When selecting all rows the row in question is shown. When doing the following, I get the result I want: select * from elever where userid like 'w%'; <-- OK select * from elever where userid like 'wa%'; <-- OK select * from elever where userid like 'wal%'; <-- OK select * from elever where userid like 'walt%'; <-- OK but not select * from elever where userid like 'walth%'; <-- Not OK! Droping and recreating the index solves the problem, but that's not good enough for me since the problem has reoccured on a different machine with a different database. vacuum and vacuum analyze does not report any problem with the table. Both times the problem occured with userid's starting with a "w". Postgres is running with a Swedish locale, and on FreeBSD this means that "w" and "v" (among a number of other letter) are treated equally when collating/sorting. I suppose this could be part of the problem. Is this a known problem? If so, is it fixed in 7.0.2? I've not seen this happen on any of our postgres 7.0.2 systems, but as I can't even reproduce it on the 6.5.3, that's no guarantee it's fixed... For the record... I know we shouldn't run 6.5.3 and we will upgrade to 7.x within soon... psql -c 'select version()' PostgreSQL 6.5.3 on i386-unknown-freebsd4.1, compiled by cc uname -a FreeBSD 4.1-RELEASE #0: Tue Aug 29 15:31:01 CEST 2000 Regards, Patrik Kudo ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol Känns det oklart? Fråga på!
Patrik Kudo <kudo@partitur.se> writes: > select * from elever where userid like 'walth%'; <-- Not OK! > Droping and recreating the index solves the problem, but that's not > good enough for me since the problem has reoccured on a different > machine with a different database. vacuum and vacuum analyze does not > report any problem with the table. > Both times the problem occured with userid's starting with a > "w". Postgres is running with a Swedish locale, and on FreeBSD this > means that "w" and "v" (among a number of other letter) are treated > equally when collating/sorting. I suppose this could be part of the > problem. > Is this a known problem? If so, is it fixed in 7.0.2? I've not seen > this happen on any of our postgres 7.0.2 systems, but as I can't even > reproduce it on the 6.5.3, that's no guarantee it's fixed... Hmm. I can think of two known issues that may be relevant. First off, there are indeed known problems with LIKE index optimization in non-ASCII locales. I believe that 7.0 fixes the problems for locales that just have a non-ASCII sort order of individual characters, but we've since realized that it can still do the wrong thing in locales where there are special rules for handling multi-character sequences. I don't know the Swedish rules offhand, so don't know if that's a problem for you. However, a LIKE optimization problem would not be fixed by dropping and recreating the index. This makes me think you are being bit by the other issue: if you compile with LOCALE support then it is critical that the postmaster *always* be started with the same LOCALE settings (at least the same LC_COLLATE value). If LOCALE varies that means your text sort ordering varies, which means that indexes on text columns may appear out-of-order, which causes index searches to miss entries they should have found. If you insert entries under different LOCALE settings then you may wind up with an index that is not consistent with *any* single LOCALE; the only cure for that is to drop and rebuild the index. Unfortunately, it's way too easy to get bit by this bug. The most common error is to start the postmaster by hand from a shell account whose LOCALE environment is different from what's supplied when the postmaster is started from a boot-time script. Best bet is to set the correct LOCALE values in a wrapper script that you use in both cases. We have talked about fixing this by saving the active LOCALE variables at initdb time, and having the postmaster adopt those values whenever it's started. But it hasn't got done yet. (Peter, would it be easy to make GUC handle this? There'd need to be some way to cause guc.c to do a putenv() ...) regards, tom lane
Tom Lane writes: > We have talked about fixing this by saving the active LOCALE variables > at initdb time, and having the postmaster adopt those values whenever > it's started. But it hasn't got done yet. (Peter, would it be > easy to make GUC handle this? There'd need to be some way to cause > guc.c to do a putenv() ...) I thought pg_control was the place for this. Any idea how long a locale string can legally be? Doesn't look too hard beyond that. Then again, perhaps this should be a per-database thing, much like the encoding. The postmaster doesn't do any collating, so it should be safe to have different locale in different backends. -- Unless someone puts an index on pg_database.datname. :-) -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: >> But it hasn't got done yet. (Peter, would it be >> easy to make GUC handle this? There'd need to be some way to cause >> guc.c to do a putenv() ...) > I thought pg_control was the place for this. Any idea how long a locale > string can legally be? Doesn't look too hard beyond that. That would work too, I guess. > Then again, perhaps this should be a per-database thing, much like the > encoding. The postmaster doesn't do any collating, so it should be safe > to have different locale in different backends. -- Unless someone puts an > index on pg_database.datname. :-) There already is an index on pg_group.groname, so I think we'd better be conservative and require the same locale across the whole installation. Hmm ... doesn't that mean that per-database encoding is a bogus concept too!? regards, tom lane