Index Corruption - Mailing list pgsql-general

From Dylan Adams
Subject Index Corruption
Date
Msg-id CA+Rt69Yb6W2fXRYb2P57LWSVS7wf8gGxedunNLChqp2api9pow@mail.gmail.com
Whole thread Raw
Responses Re: Index Corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Index Corruption  (pasman pasmański <pasman.p@gmail.com>)
Re: Index Corruption  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-general
We moved to PostgreSQL about 2 years ago and have been very happy with it
overall. The only major issue that we've had is intermittent index corruption.
This manifests itself as either "duplicate key value violates unique constraint"
or "could not read block 37422 of relation 1663/18663/19063: read only 0 of 8192
bytes." REINDEXing the table solves the problem. We do sometimes have bugs that
cause unique index violations, so the first error is especially annoying.

We've encountered the problem with both JDBC and libpq clients. The problem has
persisted through upgrades to our database servers (from 32bit CentOS 5.3 with
PostgreSQL 8.3.9 to 64bit CentOS 5.6 with PostgreSQL 8.4.8, all with stock
kernels).

Our database servers are fully virtualized, running under VMware Server on Dell
PowerEdge Servers. We use battery backed raid controllers (PERC4/5/6),
configured for RAID 10. We also experienced the problem when we had physical
database servers. Servers all have 2 CPUs.

PostgreSQL is installed from the PGDG RPMs (yum.postgresql.org). Our only
non-default postgresql.conf options are the typical tunables: shared_buffers,
checkpoint_segments, effective_cache_size, default_statistics_target, etc. WAL
settings are defaults (save checkpoint_segments). No clustering, no addons.
We've tuned autovacuum to be more aggressive in an attempt to address some bloat
issues; this didn't seem to have any impact on the frequency of index
corruption.

The databases are unloaded nightly. We do a weekly pg_dumpall as an integrity
check.

We've only seen the problem in our many weekly batch processes. Typically, these
processes DELETE a large subset of data from a table and then repopulate with
the same values in indexed fields. Depending on the dataset, they run from 30
minutes to 8 hours. These processes are scheduled so that only one program
(which is single threaded) is updating any given table at a time.

We have 12 database servers and we usually have about 1 incident per week.
Sometimes we'll go for weeks without any occurrences, and then we'll have a
flurry of them.

My primary question: is this normal? There isn't an overwhelming amount of
messages in the archives about index corruption, which leads me to think that
there's something with our configuration or our processes that is making us more
susceptible.

Is there something we should be doing to make index corruption less likely? Is
there anyway to do an index integrity check so we can be more proactive with
REINDEXing?

Thanks,
dylan

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Has Pg 9.1.0 been released today?
Next
From: Gavin Flower
Date:
Subject: Re: Has Pg 9.1.0 been released today?