Thread: Repeatedly breaking indexes

Repeatedly breaking indexes

From
Martin Edlman
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

    I have encountered serious problem with PostgreSQL 7.3.4, the same
with previous version 7.3.2 (installed from RPMs postgresql-7.3.4-1PGDG)
running on dual processor machine. Maybe dual processors cause described
problems but I suppose it should work on N-processor machine. Redhat 8.0
is running on this machine.

    I have a DB with 50 tables, following three are the most
problematic: zone, qmail_account, qmail_dot (scheme at the end of this
mail).

* zone has a unique key (name)
* qmail_account has column domain which refers to zone(name)
  qmail_account has a unique key (username,domain)
* qmail_dot has columns username,domain which refers to
  qmail_account(username,domain)
  qmail_dot has a unique key (username,domain,extension)
* there are preinsert/preupdate triggers on tables which DON'T modify
  indexed columns

There are more RI/FK constraints, see scheme.

- -----
'LIKE' / '=' PROBLEM

Several times I came to the situation when I was not able to insert a
record to the table zone because of unique index.

But when I tried to select the record "where name='domain'" I got no
result. When i used "where name LIKE 'domain'" I got it.  LIKE is without
wild chars %.

After "update zone set name='domain' where name like 'domain'" I was able
to select it with '='.

So these operators acts differently on the same indexes.

- -----
SESSION PROBLEM

Another problem is similar to the previous. My coleague tried to select
domain from the zone table and got no result, so he inserted the record.
Yes, this time the insert succeeded. Then he selected the record with '='
and he saw his newly inserted record.
I told his that I already inserted this domain and I did select with '='
and saw my record. When I used 'like' I saw both records.
When he used 'like' he still saw only his record. He closed the session
and open new, then he saw both records with 'like'.

After playing with it I used REINDEX TABLE {zone,qmail_account,qmail_dot}
FORCE and problem was fixed. But for a while only. After some time the
indexes broke again on table qmail_account and qmail was complaining about
missing accounts, but I saw the record with the EXACTLY same select query.
After closing session and opening new one I didn't see it as well :-(
After reindex everything is fine.

- -----
Duplicates on unique index

The last problem with indexes I encountered is that when I have unique
index sometime it's possible to have duplicates. I don't know when it
happens, but unlike in previous problem this time I see both record using
'=' operator.

- ----
tables

CREATE TABLE zone (
    id SERIAL,
    contractid int not null REFERENCES contract (id),
    name varchar(100) not null unique DEFAULT '',
      public_mail boolean not null DEFAULT '0',
    public_web boolean not null DEFAULT '0',
        ns1id int not null REFERENCES server (id) DEFAULT '0',
        ns2id int REFERENCES server (id) DEFAULT '0',
    ns3id int REFERENCES server (id) DEFAULT '0',
    mx0id int not null REFERENCES server (id) DEFAULT '0',
    mx10id int REFERENCES server (id) DEFAULT '0',
        mx100id int REFERENCES server (id) DEFAULT '0',
        expire date null,
        extra text null DEFAULT '',
        comment varchar(200) null DEFAULT '',
        PRIMARY KEY(id)
);

CREATE INDEX zone_contractid_idx ON zone (contractid);

CREATE TABLE qmail_account (
        id SERIAL,
    contractid int not null REFERENCES contract (id),
    username varchar(50),
      domain varchar(100) not null REFERENCES zone (name) ON UPDATE
        CASCADE,
    email varchar(255),
        password varchar(50),
        passwordenc boolean DEFAULT 'n',
        uid int DEFAULT '8',
        gid int DEFAULT '11',
    home varchar(100),
        hardquota int DEFAULT '50',
        valid_from timestamp null DEFAULT 'now()',
        valid_to timestamp null,
        enabled boolean DEFAULT '1',
        use_dot_qmail boolean DEFAULT '0',
        lastlogin timestamp,
        lastdelivery timestamp,
        comment varchar(200) null DEFAULT '',
        PRIMARY KEY(id)
);

CREATE INDEX qmail_account_contractid_idx ON qmail_account (contractid);
CREATE INDEX qmail_account_username_idx ON qmail_account (username);
CREATE UNIQUE INDEX qmail_account_email ON qmail_account (email);
CREATE UNIQUE INDEX qmail_account_email_idx ON qmail_account (username,domain);

CREATE TABLE qmail_dot (
        id SERIAL,
        username varchar(50),
        domain varchar(100) not null REFERENCES zone (name) ON UPDATE
        CASCADE,
        extension varchar(64),
        dotqmail text,
        FOREIGN KEY(username,domain) REFERENCES
        qmail_account(username,domain) ON UPDATE CASCADE ON
        DELETE CASCADE,
        PRIMARY KEY(id)
);

CREATE UNIQUE INDEX qmail_dot_key ON qmail_dot (username,domain,extension);


- --

Martin Edlman
Fortech s.r.o, Litomysl
Public PGP key: http://edas.visaci.cz/#keys


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/pjYCoFlEKJy9loQRApIXAJ0VTkKloaYVM9vF9Ai6rgv7+oiEvwCfRW/l
4oLXCT46Gd9SwHCLd+eZsLs=
=7yx5
-----END PGP SIGNATURE-----

Re: Repeatedly breaking indexes

From
Tom Lane
Date:
Martin Edlman <edlman@fortech.cz> writes:
> [ unstable index behavior ]

I'm wondering about hardware problems --- how sure are you that you
don't have flaky RAM or a bad disk drive?

Also, what locale settings are you using in the database (use
pg_controldata to verify this)?  It seems possible that a broken
locale definition could mess up indexes.

I know the above sounds like passing the buck, but when you're the only
one reporting such troubles, I have to ask what's different about your
installation ...

            regards, tom lane

Re: Repeatedly breaking indexes

From
Martin Edlman
Date:
On Mon, 3 Nov 2003, Tom Lane wrote:

> Martin Edlman <edlman@fortech.cz> writes:
> > [ unstable index behavior ]
>
> I'm wondering about hardware problems --- how sure are you that you
> don't have flaky RAM or a bad disk drive?

I'll check this, the RAID controller reports this. RAM should be OK. I had
the same problem on the old (same type) server. We installed new server
(HW) with some BIOS upgrades and maintance utils and put the disks from
the old server to the new one.
I'll try to install PgSQL on some ordinary HW and test it.

> Also, what locale settings are you using in the database (use
> pg_controldata to verify this)?  It seems possible that a broken
> locale definition could mess up indexes.

I use cs_CZ locale. But any of indexes we are talking about doesn't use
czech chars, furthermore even any of these tables doesn't contain czech
chars.

$ pg_controldata /var/lib/pgsql/data
pg_control version number:            72
Catalog version number:               200211021
Database cluster state:               in production
pg_control last modified:
Current log file ID:                  37
Next log file segment:                116
Latest checkpoint location:           25/732BA740
Prior checkpoint location:            25/732B04F8
Latest checkpoint's REDO location:    25/732BA740
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's StartUpID:        44
Latest checkpoint's NextXID:          83236896
Latest checkpoint's NextOID:          30054204
Time of latest checkpoint:
Database block size:                  8192
Blocks per segment of large relation: 131072
Maximum length of identifiers:        64
Maximum number of function arguments: 32
Date/time type storage:               Floating point
Maximum length of locale name:        128
LC_COLLATE:                           cs_CZ.ISO8859-2
LC_CTYPE:                             cs_CZ.ISO8859-2


> I know the above sounds like passing the buck, but when you're the only
> one reporting such troubles, I have to ask what's different about your
> installation ...

I understand. I suppose that if my problem is somehow general you would
already know about it.

Regards,

--

Martin Edlman
Fortech s.r.o, Litomysl
Public PGP key: http://edas.visaci.cz/#keys

Re: Repeatedly breaking indexes

From
Tomas Szepe
Date:
On Nov-04 2003, Tue, 07:07 +0100
Martin Edlman <edlman@fortech.cz> wrote:

> I use cs_CZ locale. But any of indexes we are talking about doesn't use
> czech chars, furthermore even any of these tables doesn't contain czech
> chars.

Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause
of your problems -- I've been using that one for years on many production
postgres systems (often huge and constantly loaded) and have never observed
the problems you're describing.

--
Tomas Szepe <szepe@pinerecords.com>

Re: Repeatedly breaking indexes

From
Tom Lane
Date:
Tomas Szepe <szepe@pinerecords.com> writes:
> Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause
> of your problems -- I've been using that one for years on many production
> postgres systems (often huge and constantly loaded) and have never observed
> the problems you're describing.

Thanks for the info.  But are you using cs_CZ.ISO8859-2 in particular on
Red Hat 8.0 in particular?  If it is a locale-related issue, it might be
specific to that particular variant on that platform.

I have RH 8.0 here, and could easily run some tests, but I'm not sure
what to look for.  A quick run of the regression tests didn't reveal
any issues, other than expectable differences from C locale in sort
ordering.

            regards, tom lane

Re: Repeatedly breaking indexes

From
Tomas Szepe
Date:
On Nov-04 2003, Tue, 09:24 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Tomas Szepe <szepe@pinerecords.com> writes:
> > Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause
> > of your problems -- I've been using that one for years on many production
> > postgres systems (often huge and constantly loaded) and have never observed
> > the problems you're describing.
>
> Thanks for the info.  But are you using cs_CZ.ISO8859-2 in particular on
> Red Hat 8.0 in particular?  If it is a locale-related issue, it might be
> specific to that particular variant on that platform.

Slackware 7.1 -> 9.1, which directly translates to vanilla glibc.

--
Tomas Szepe <szepe@pinerecords.com>

Re: Repeatedly breaking indexes - SOLVED

From
Martin Edlman
Date:
Hello,

    finally the problem is solved. The problem was with locales.

    I installed and initialised the database under cs_CZ, so Pg
recorded cs_CZ in all its configurations and startup scripts.
    After that I changed /etc/sysconfig/i18n to use en_US locale as
system default.
    And that was the problem. According to "ps axe" the postmaster was
running under en_US locales instead of cs_CZ as I expected.
    So I changed /etc/sysconfig/i18n to use cs_CZ and everything is ok
from then. All indexes seem to be working fine and there are no problems.

    But I would like to know why it was not started under cs_CZ when
all Pg config files specify cs_CZ and ~postgres/.bash_profile contains
export LANG="cs_CZ" and postmaster is run from it's rc script as

su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl  -D $PGDATA ...."

    So I expected that cs_CZ will be used in this case. :-( GRrrrrr.

    Ok now it's working, thank you all who helped.

Martin

> Tomas Szepe <szepe@pinerecords.com> writes:
> > Martin, you can probably rule out the cs_CZ (LATIN2) locale as the cause
> > of your problems -- I've been using that one for years on many production
> > postgres systems (often huge and constantly loaded) and have never observed
> > the problems you're describing.
>
> Thanks for the info.  But are you using cs_CZ.ISO8859-2 in particular on
> Red Hat 8.0 in particular?  If it is a locale-related issue, it might be
> specific to that particular variant on that platform.
>
> I have RH 8.0 here, and could easily run some tests, but I'm not sure
> what to look for.  A quick run of the regression tests didn't reveal
> any issues, other than expectable differences from C locale in sort
> ordering.
>
>             regards, tom lane
>

--

Martin Edlman
Fortech s.r.o, Litomysl
Public PGP key: http://edas.visaci.cz/#keys

Re: Repeatedly breaking indexes - SOLVED

From
Tom Lane
Date:
Martin Edlman <edlman@fortech.cz> writes:
>     finally the problem is solved. The problem was with locales.

>     I installed and initialised the database under cs_CZ, so Pg
> recorded cs_CZ in all its configurations and startup scripts.
>     After that I changed /etc/sysconfig/i18n to use en_US locale as
> system default.
>     And that was the problem. According to "ps axe" the postmaster was
> running under en_US locales instead of cs_CZ as I expected.

"ps" is not a reliable guide to the locale settings being used by
Postgres.

The postmaster will adopt LC_COLLATE and LC_CTYPE from the settings
recorded in pg_control (by initdb) regardless of its environment.
Other, noncritical, settings such as LC_MESSAGES are taken from
postgresql.conf if specified there.  The environment values reported by
"ps" would only be used for noncritical settings, and then only if
not overridden by postgresql.conf.

So I'm not convinced that you've correctly identified the problem.
However, it seems possible that part of the issue is misbehavior if
the various LC_xxx settings aren't all alike --- could you dig further
and try to isolate it?

            regards, tom lane

Re: Repeatedly breaking indexes - SOLVED

From
Martin Edlman
Date:
> "ps" is not a reliable guide to the locale settings being used by
> Postgres.

Maybe it's RedHat related, I don't know.
When I had en_US in /etc/sysconfig/i18n and cs_CZ in
~postgres/.bash_profile and when I did (as root) `su - postgres -s /bin/sh
-c "echo $LANG"' I saw LANG=en_US (!!!), when I did `su - postgres' and
then `echo $LANG' (as postgres) I saw LANG=cs_CZ.
So I believe that ps gives correct info and I believe that postmaster was
running with en_US locales. It seems that the problem is with `su' which
may not run user's .bash* scripts when executing command with -c.

> The postmaster will adopt LC_COLLATE and LC_CTYPE from the settings
> recorded in pg_control (by initdb) regardless of its environment.

That's what I was hoping for, but unfortunately I didn't work this way.

I isolated the problem when trying Pg on different HW. I installed RH from
install CD and fortunatelly installed it with cs_CZ as default. Then
evertyhing worked - the same Pg, the same database. After that I looked at
`ps axe' and saw the difference in LANG.

> So I'm not convinced that you've correctly identified the problem.
> However, it seems possible that part of the issue is misbehavior if
> the various LC_xxx settings aren't all alike --- could you dig further
> and try to isolate it?

I set ALL LC_xxx variables to cs_CZ in postgres.conf but it didn't help.
Only setting LANG to cs_CZ in systemwide settings.

Regards, Martin

--

Martin Edlman
Fortech s.r.o, Litomysl
Public PGP key: http://edas.visaci.cz/#keys