Thread: Repeatedly breaking indexes
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, I have encountered serious problem with PostgreSQL 7.3.4, the same with previous version 7.3.2 (installed from RPMs postgresql-7.3.4-1PGDG) running on dual processor machine. Maybe dual processors cause described problems but I suppose it should work on N-processor machine. Redhat 8.0 is running on this machine. I have a DB with 50 tables, following three are the most problematic: zone, qmail_account, qmail_dot (scheme at the end of this mail). * zone has a unique key (name) * qmail_account has column domain which refers to zone(name) qmail_account has a unique key (username,domain) * qmail_dot has columns username,domain which refers to qmail_account(username,domain) qmail_dot has a unique key (username,domain,extension) * there are preinsert/preupdate triggers on tables which DON'T modify indexed columns There are more RI/FK constraints, see scheme. - ----- 'LIKE' / '=' PROBLEM Several times I came to the situation when I was not able to insert a record to the table zone because of unique index. But when I tried to select the record "where name='domain'" I got no result. When i used "where name LIKE 'domain'" I got it. LIKE is without wild chars %. After "update zone set name='domain' where name like 'domain'" I was able to select it with '='. So these operators acts differently on the same indexes. - ----- SESSION PROBLEM Another problem is similar to the previous. My coleague tried to select domain from the zone table and got no result, so he inserted the record. Yes, this time the insert succeeded. Then he selected the record with '=' and he saw his newly inserted record. I told his that I already inserted this domain and I did select with '=' and saw my record. When I used 'like' I saw both records. When he used 'like' he still saw only his record. He closed the session and open new, then he saw both records with 'like'. After playing with it I used REINDEX TABLE {zone,qmail_account,qmail_dot} FORCE and problem was fixed. But for a while only. After some time the indexes broke again on table qmail_account and qmail was complaining about missing accounts, but I saw the record with the EXACTLY same select query. After closing session and opening new one I didn't see it as well :-( After reindex everything is fine. - ----- Duplicates on unique index The last problem with indexes I encountered is that when I have unique index sometime it's possible to have duplicates. I don't know when it happens, but unlike in previous problem this time I see both record using '=' operator. - ---- tables CREATE TABLE zone ( id SERIAL, contractid int not null REFERENCES contract (id), name varchar(100) not null unique DEFAULT '', public_mail boolean not null DEFAULT '0', public_web boolean not null DEFAULT '0', ns1id int not null REFERENCES server (id) DEFAULT '0', ns2id int REFERENCES server (id) DEFAULT '0', ns3id int REFERENCES server (id) DEFAULT '0', mx0id int not null REFERENCES server (id) DEFAULT '0', mx10id int REFERENCES server (id) DEFAULT '0', mx100id int REFERENCES server (id) DEFAULT '0', expire date null, extra text null DEFAULT '', comment varchar(200) null DEFAULT '', PRIMARY KEY(id) ); CREATE INDEX zone_contractid_idx ON zone (contractid); CREATE TABLE qmail_account ( id SERIAL, contractid int not null REFERENCES contract (id), username varchar(50), domain varchar(100) not null REFERENCES zone (name) ON UPDATE CASCADE, email varchar(255), password varchar(50), passwordenc boolean DEFAULT 'n', uid int DEFAULT '8', gid int DEFAULT '11', home varchar(100), hardquota int DEFAULT '50', valid_from timestamp null DEFAULT 'now()', valid_to timestamp null, enabled boolean DEFAULT '1', use_dot_qmail boolean DEFAULT '0', lastlogin timestamp, lastdelivery timestamp, comment varchar(200) null DEFAULT '', PRIMARY KEY(id) ); CREATE INDEX qmail_account_contractid_idx ON qmail_account (contractid); CREATE INDEX qmail_account_username_idx ON qmail_account (username); CREATE UNIQUE INDEX qmail_account_email ON qmail_account (email); CREATE UNIQUE INDEX qmail_account_email_idx ON qmail_account (username,domain); CREATE TABLE qmail_dot ( id SERIAL, username varchar(50), domain varchar(100) not null REFERENCES zone (name) ON UPDATE CASCADE, extension varchar(64), dotqmail text, FOREIGN KEY(username,domain) REFERENCES qmail_account(username,domain) ON UPDATE CASCADE ON DELETE CASCADE, PRIMARY KEY(id) ); CREATE UNIQUE INDEX qmail_dot_key ON qmail_dot (username,domain,extension); - -- Martin Edlman Fortech s.r.o, Litomysl Public PGP key: http://edas.visaci.cz/#keys -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/pjYCoFlEKJy9loQRApIXAJ0VTkKloaYVM9vF9Ai6rgv7+oiEvwCfRW/l 4oLXCT46Gd9SwHCLd+eZsLs= =7yx5 -----END PGP SIGNATURE-----
Martin Edlman <edlman@fortech.cz> writes: > [ unstable index behavior ] I'm wondering about hardware problems --- how sure are you that you don't have flaky RAM or a bad disk drive? Also, what locale settings are you using in the database (use pg_controldata to verify this)? It seems possible that a broken locale definition could mess up indexes. I know the above sounds like passing the buck, but when you're the only one reporting such troubles, I have to ask what's different about your installation ... regards, tom lane
On Mon, 3 Nov 2003, Tom Lane wrote: > Martin Edlman <edlman@fortech.cz> writes: > > [ unstable index behavior ] > > I'm wondering about hardware problems --- how sure are you that you > don't have flaky RAM or a bad disk drive? I'll check this, the RAID controller reports this. RAM should be OK. I had the same problem on the old (same type) server. We installed new server (HW) with some BIOS upgrades and maintance utils and put the disks from the old server to the new one. I'll try to install PgSQL on some ordinary HW and test it. > Also, what locale settings are you using in the database (use > pg_controldata to verify this)? It seems possible that a broken > locale definition could mess up indexes. I use cs_CZ locale. But any of indexes we are talking about doesn't use czech chars, furthermore even any of these tables doesn't contain czech chars. $ pg_controldata /var/lib/pgsql/data pg_control version number: 72 Catalog version number: 200211021 Database cluster state: in production pg_control last modified: Current log file ID: 37 Next log file segment: 116 Latest checkpoint location: 25/732BA740 Prior checkpoint location: 25/732B04F8 Latest checkpoint's REDO location: 25/732BA740 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's StartUpID: 44 Latest checkpoint's NextXID: 83236896 Latest checkpoint's NextOID: 30054204 Time of latest checkpoint: Database block size: 8192 Blocks per segment of large relation: 131072 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: Floating point Maximum length of locale name: 128 LC_COLLATE: cs_CZ.ISO8859-2 LC_CTYPE: cs_CZ.ISO8859-2 > I know the above sounds like passing the buck, but when you're the only > one reporting such troubles, I have to ask what's different about your > installation ... I understand. I suppose that if my problem is somehow general you would already know about it. Regards, -- Martin Edlman Fortech s.r.o, Litomysl Public PGP key: http://edas.visaci.cz/#keys
On Nov-04 2003, Tue, 07:07 +0100 Martin Edlman <edlman@fortech.cz> wrote: > I use cs_CZ locale. But any of indexes we are talking about doesn't use > czech chars, furthermore even any of these tables doesn't contain czech > chars. Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause of your problems -- I've been using that one for years on many production postgres systems (often huge and constantly loaded) and have never observed the problems you're describing. -- Tomas Szepe <szepe@pinerecords.com>
Tomas Szepe <szepe@pinerecords.com> writes: > Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause > of your problems -- I've been using that one for years on many production > postgres systems (often huge and constantly loaded) and have never observed > the problems you're describing. Thanks for the info. But are you using cs_CZ.ISO8859-2 in particular on Red Hat 8.0 in particular? If it is a locale-related issue, it might be specific to that particular variant on that platform. I have RH 8.0 here, and could easily run some tests, but I'm not sure what to look for. A quick run of the regression tests didn't reveal any issues, other than expectable differences from C locale in sort ordering. regards, tom lane
On Nov-04 2003, Tue, 09:24 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Tomas Szepe <szepe@pinerecords.com> writes: > > Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause > > of your problems -- I've been using that one for years on many production > > postgres systems (often huge and constantly loaded) and have never observed > > the problems you're describing. > > Thanks for the info. But are you using cs_CZ.ISO8859-2 in particular on > Red Hat 8.0 in particular? If it is a locale-related issue, it might be > specific to that particular variant on that platform. Slackware 7.1 -> 9.1, which directly translates to vanilla glibc. -- Tomas Szepe <szepe@pinerecords.com>
Hello, finally the problem is solved. The problem was with locales. I installed and initialised the database under cs_CZ, so Pg recorded cs_CZ in all its configurations and startup scripts. After that I changed /etc/sysconfig/i18n to use en_US locale as system default. And that was the problem. According to "ps axe" the postmaster was running under en_US locales instead of cs_CZ as I expected. So I changed /etc/sysconfig/i18n to use cs_CZ and everything is ok from then. All indexes seem to be working fine and there are no problems. But I would like to know why it was not started under cs_CZ when all Pg config files specify cs_CZ and ~postgres/.bash_profile contains export LANG="cs_CZ" and postmaster is run from it's rc script as su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA ...." So I expected that cs_CZ will be used in this case. :-( GRrrrrr. Ok now it's working, thank you all who helped. Martin > Tomas Szepe <szepe@pinerecords.com> writes: > > Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause > > of your problems -- I've been using that one for years on many production > > postgres systems (often huge and constantly loaded) and have never observed > > the problems you're describing. > > Thanks for the info. But are you using cs_CZ.ISO8859-2 in particular on > Red Hat 8.0 in particular? If it is a locale-related issue, it might be > specific to that particular variant on that platform. > > I have RH 8.0 here, and could easily run some tests, but I'm not sure > what to look for. A quick run of the regression tests didn't reveal > any issues, other than expectable differences from C locale in sort > ordering. > > regards, tom lane > -- Martin Edlman Fortech s.r.o, Litomysl Public PGP key: http://edas.visaci.cz/#keys
Martin Edlman <edlman@fortech.cz> writes: > finally the problem is solved. The problem was with locales. > I installed and initialised the database under cs_CZ, so Pg > recorded cs_CZ in all its configurations and startup scripts. > After that I changed /etc/sysconfig/i18n to use en_US locale as > system default. > And that was the problem. According to "ps axe" the postmaster was > running under en_US locales instead of cs_CZ as I expected. "ps" is not a reliable guide to the locale settings being used by Postgres. The postmaster will adopt LC_COLLATE and LC_CTYPE from the settings recorded in pg_control (by initdb) regardless of its environment. Other, noncritical, settings such as LC_MESSAGES are taken from postgresql.conf if specified there. The environment values reported by "ps" would only be used for noncritical settings, and then only if not overridden by postgresql.conf. So I'm not convinced that you've correctly identified the problem. However, it seems possible that part of the issue is misbehavior if the various LC_xxx settings aren't all alike --- could you dig further and try to isolate it? regards, tom lane
> "ps" is not a reliable guide to the locale settings being used by > Postgres. Maybe it's RedHat related, I don't know. When I had en_US in /etc/sysconfig/i18n and cs_CZ in ~postgres/.bash_profile and when I did (as root) `su - postgres -s /bin/sh -c "echo $LANG"' I saw LANG=en_US (!!!), when I did `su - postgres' and then `echo $LANG' (as postgres) I saw LANG=cs_CZ. So I believe that ps gives correct info and I believe that postmaster was running with en_US locales. It seems that the problem is with `su' which may not run user's .bash* scripts when executing command with -c. > The postmaster will adopt LC_COLLATE and LC_CTYPE from the settings > recorded in pg_control (by initdb) regardless of its environment. That's what I was hoping for, but unfortunately I didn't work this way. I isolated the problem when trying Pg on different HW. I installed RH from install CD and fortunatelly installed it with cs_CZ as default. Then evertyhing worked - the same Pg, the same database. After that I looked at `ps axe' and saw the difference in LANG. > So I'm not convinced that you've correctly identified the problem. > However, it seems possible that part of the issue is misbehavior if > the various LC_xxx settings aren't all alike --- could you dig further > and try to isolate it? I set ALL LC_xxx variables to cs_CZ in postgres.conf but it didn't help. Only setting LANG to cs_CZ in systemwide settings. Regards, Martin -- Martin Edlman Fortech s.r.o, Litomysl Public PGP key: http://edas.visaci.cz/#keys