Thread: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP

vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP

From
Lonni Friedman
Date:
Greetings,
I've got an annoying problem.  I'm currently running PostgreSQL-7.3.4
on Linux (x86).  This problem started with 7.3.3.  I've got a database
that is on the larger side (about 3GB dump).  I run "vacuumdb -z -a
-f" religiously via a cronjob three times a day.

All of a sudden last month (after about 3 years) I started getting
this warning when vacuumdb was run:

INFO:  Index pg_largeobject_loid_pn_index: Pages 903; Tuples 323847:
Deleted 0.    CPU 0.04s/0.07u sec elapsed 0.10 sec.
WARNING:  Index pg_largeobject_loid_pn_index: NUMBER OF INDEX' TUPLES
(323847) IS NOT THE SAME AS HEAP' (323802).
    Recreate the index.

So I put postgresql into standalone mode, recreated the index, and
everything was ok for about 2 days, and then the problem would return.
 I did some Googling and found that this was a potential bug in older
versions of postgresql, but was supposedly fixed in 7.3.4 and later
versions.  So I upgraded to 7.3.4 (using the semi-official RPMs on the
postgresql.org ftp servers).  Dropped into standalone mode, reindexed,
and everything was fine for about the past month.

Until this morning when it came back again.  The server where this is
running isn't having any hardware problems, isn't getting shutdown
improperly or anything like that.  Its current uptime is 209 days, and
postgresql is never shutdown improperly.

Now I'd be willing to upgrade further, but I really can't afford
unnecessary downtime.  So I'd like some guidance/input on which
version of postgresql will not have this bug.  Or maybe this isn't the
bug at all, and there's some other weird problem?

Either way, any and all advice is appreciated.

Thanks!
Lonni

Lonni Friedman <netllama@gmail.com> writes:
> All of a sudden last month (after about 3 years) I started getting
> this warning when vacuumdb was run:

> INFO:  Index pg_largeobject_loid_pn_index: Pages 903; Tuples 323847:
> Deleted 0.    CPU 0.04s/0.07u sec elapsed 0.10 sec.
> WARNING:  Index pg_largeobject_loid_pn_index: NUMBER OF INDEX' TUPLES
> (323847) IS NOT THE SAME AS HEAP' (323802).
>     Recreate the index.

Hmm.  Is it always that same index, or have you seen this on other indexes?

I'm not aware of any known bug in 7.3.* that could lead to this sort of
error.  Don't suppose you have any suggestions about how to reproduce
the problem?

            regards, tom lane

Lonni Friedman <netllama@gmail.com> writes:
> Unfortunately, i have no clue how to replicate this.  It was happening
> fairly consistantly before i upgraded from 7.3.3 to 7.3.4 (like nearly
> every vacuumdb run).

> Then nothing for a month after going to 7.3.4, and now its happening
> every vacuumdb run again.

Once the complaint starts appearing, I'd expect it to continue until you
reindex the index.

            regards, tom lane

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP

From
Lonni Friedman
Date:
Its _always_ that same index.  No others have had this problem.
Unfortunately, i have no clue how to replicate this.  It was happening
fairly consistantly before i upgraded from 7.3.3 to 7.3.4 (like nearly
every vacuumdb run).

Then nothing for a month after going to 7.3.4, and now its happening
every vacuumdb run again.

On Wed, 05 May 2004 11:30:33 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Lonni Friedman <netllama@gmail.com> writes:
> > All of a sudden last month (after about 3 years) I started getting
> > this warning when vacuumdb was run:
>
> > INFO:  Index pg_largeobject_loid_pn_index: Pages 903; Tuples 323847:
> > Deleted 0.    CPU 0.04s/0.07u sec elapsed 0.10 sec.
> > WARNING:  Index pg_largeobject_loid_pn_index: NUMBER OF INDEX' TUPLES
> > (323847) IS NOT THE SAME AS HEAP' (323802).
> >       Recreate the index.
>
> Hmm.  Is it always that same index, or have you seen this on other indexes?
>
> I'm not aware of any known bug in 7.3.* that could lead to this sort of
> error.  Don't suppose you have any suggestions about how to reproduce
> the problem?
>
>                         regards, tom lane

Lonni Friedman <netllama@gmail.com> writes:
> On Wed, 05 May 2004 12:31:21 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Once the complaint starts appearing, I'd expect it to continue until you
>> reindex the index.

> That's exactly what happens.  It consistantly errors until reindexed.
> Any suggestions?  thanks.

You are seemingly triggering some sort of bug in the backend's
large-object code, causing extra index entries to be made ...
but I sure haven't the foggiest idea how that would happen.

Perhaps you could look at the parts of your application code that work
with large objects and see if you are doing anything "off the beaten
track" that might suggest a way to trigger the bug reproducibly.
I'm sure we could fix it if we could see it happening.

            regards, tom lane

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP

From
Lonni Friedman
Date:
On Wed, 05 May 2004 12:31:21 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Lonni Friedman <netllama@gmail.com> writes:
> > Unfortunately, i have no clue how to replicate this.  It was happening
> > fairly consistantly before i upgraded from 7.3.3 to 7.3.4 (like nearly
> > every vacuumdb run).
>
> > Then nothing for a month after going to 7.3.4, and now its happening
> > every vacuumdb run again.
>
> Once the complaint starts appearing, I'd expect it to continue until you
> reindex the index.

That's exactly what happens.  It consistantly errors until reindexed.
Any suggestions?  thanks.

Lonni Friedman <netllama@gmail.com> writes:
> hrmmm, i'm not sure what would constitute 'off the beaten track'.

Neither am I ... if we knew what you were doing that triggers the bug,
we'd already be halfway there :-(

            regards, tom lane

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP

From
Lonni Friedman
Date:
On Wed, 05 May 2004 13:56:41 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni Friedman <netllama@gmail.com> writes:
> > On Wed, 05 May 2004 12:31:21 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Once the complaint starts appearing, I'd expect it to continue until you
> >> reindex the index.
>
> > That's exactly what happens.  It consistantly errors until reindexed.
> > Any suggestions?  thanks.
>
> You are seemingly triggering some sort of bug in the backend's
> large-object code, causing extra index entries to be made ...
> but I sure haven't the foggiest idea how that would happen.
>
> Perhaps you could look at the parts of your application code that work
> with large objects and see if you are doing anything "off the beaten
> track" that might suggest a way to trigger the bug reproducibly.
> I'm sure we could fix it if we could see it happening.

hrmmm, i'm not sure what would constitute 'off the beaten track'.  is
there something specific that i could look for?  or what is considered
to be normal?

I can tell you that the large objects that are getting chucked into
the database are generally under 1MB in size (each), as they're mostly
M$ documents.  There are a few that are between 1MB & 10MB but I don't
think there's anything larger than 10MB.

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Denis Braekhus
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lonni Friedman wrote:
|>>Then nothing for a month after going to 7.3.4, and now its happening
|>>every vacuumdb run again.
|>Once the complaint starts appearing, I'd expect it to continue until you
|>reindex the index.
| That's exactly what happens.  It consistantly errors until reindexed.
| Any suggestions?  thanks.

A long shot, but we experienced a similar kind of issue, though our
dataset was a bit different (not so many large objects).
After having reindexed a couple of times with about the same results
(working fine for some time, then complaining again), I dropped the
index and recreated it. That seemed to fix the issue at our
installation, we had no such problems after that. Ran for some months
after that before I took the plunge and moved to 7.4..

This might not at all work for you though, and as removing the index
will probably kill your performance for a while I cannot promise you a
fix with this workaround..

Best Regards
- --
Denis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-nr2 (Windows XP)

iD8DBQFAmqdovsCA6eRGOOARAptYAJ0YF6+tGtyA55hqOf8UaF+q2JnN4wCfdQtV
r4nBoDljvFDGlXh3OhNSPcM=
=6xLx
-----END PGP SIGNATURE-----

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT THE SAME AS HEAP

From
Lonni Friedman
Date:
Thanks for your reply.  I thought (perhaps erroneously) that there
wasn't any real difference between dropping an index then recreating
it, and just reindexing an index?

On Thu, 06 May 2004 23:00:25 +0200, Denis Braekhus <denis@startsiden.no> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Lonni Friedman wrote:
> |>>Then nothing for a month after going to 7.3.4, and now its happening
> |>>every vacuumdb run again.
> |>Once the complaint starts appearing, I'd expect it to continue until you
> |>reindex the index.
> | That's exactly what happens.  It consistantly errors until reindexed.
> | Any suggestions?  thanks.
>
> A long shot, but we experienced a similar kind of issue, though our
> dataset was a bit different (not so many large objects).
> After having reindexed a couple of times with about the same results
> (working fine for some time, then complaining again), I dropped the
> index and recreated it. That seemed to fix the issue at our
> installation, we had no such problems after that. Ran for some months
> after that before I took the plunge and moved to 7.4..
>
> This might not at all work for you though, and as removing the index
> will probably kill your performance for a while I cannot promise you a
> fix with this workaround..
>
> Best Regards
> - --
> Denis

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Denis Braekhus
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lonni Friedman wrote:

| Thanks for your reply.  I thought (perhaps erroneously) that there
| wasn't any real difference between dropping an index then recreating
| it, and just reindexing an index?

I am definitely not sure, and I agree it sounds logical that they would
produce the same results. However my experience was that dropping and
re-creating the index worked.

The docs say :
"Another approach to dealing with a corrupted user-table index is just
to drop and recreate it. This may in fact be preferable if you would
like to maintain some semblance of normal operation on the table
meanwhile. REINDEX acquires exclusive lock on the table, while CREATE
INDEX only locks out writes not reads of the table."

Indicating that they should produce the same results, but that they work
differently. I am not sure what that implies, but maybe someone else knows ?

Regards
- --
Denis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-nr2 (Windows XP)

iD8DBQFAnPzivsCA6eRGOOARAl1OAKC0zcgN409n7ylgyHV61J9/o4LsBgCgqEpJ
yT24Y03fQItzhbRlxHyUg8s=
=YBoz
-----END PGP SIGNATURE-----

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Tom Lane
Date:
Denis Braekhus <denis@startsiden.no> writes:
> Indicating that they should produce the same results, but that they work
> differently. I am not sure what that implies, but maybe someone else knows ?

The only difference the docs are talking about is what kind of lock is
held while the rebuild proceeds.

A reindex builds a new index file from scratch, and AFAICS should give
the same results as dropping/recreating the index --- at least in terms
of what's in the file proper.  The only theory I can come up with for
your experience is that there was some corruption in the system catalog
rows describing the index.  That would not get fixed by a reindex.
However, I haven't the foggiest idea what sort of corruption might
allow the index to seem to work (and not, say, crash the reindex itself
which is going to use that information...) yet allow problems to appear
much later on.  Too bad the evidence is gone now.

            regards, tom lane

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Denis Braekhus
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane wrote:
|>Indicating that they should produce the same results, but that they work
|>differently. I am not sure what that implies, but maybe someone else
knows ?
| The only difference the docs are talking about is what kind of lock is
| held while the rebuild proceeds.

Yes I understood that, but the docs didn't (as you do now) excplicitly
explain the different ways they work. I expected it to be as you say,
however I was not 100% sure. Thanks for clarifying.

| A reindex builds a new index file from scratch, and AFAICS should give
| the same results as dropping/recreating the index --- at least in terms
| of what's in the file proper.  The only theory I can come up with for
| your experience is that there was some corruption in the system catalog
| rows describing the index.  That would not get fixed by a reindex.
| However, I haven't the foggiest idea what sort of corruption might
| allow the index to seem to work (and not, say, crash the reindex itself
| which is going to use that information...) yet allow problems to appear
| much later on.  Too bad the evidence is gone now.

Yes, sorry about not bringing up the issue at the right time, however my
main focus at that time was to bring the production system back to normal..

Regards
- --
Denis
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-nr2 (Windows XP)

iD8DBQFAnh+gvsCA6eRGOOARAlivAKCl8aIuii8GeSFLetWn+exBVXnptwCeKMUr
wjAEgS7gP1LQeS/xZdiC03g=
=ZRI6
-----END PGP SIGNATURE-----

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Andrew Sullivan
Date:
On Sat, May 08, 2004 at 12:31:28PM -0400, Tom Lane wrote:
> However, I haven't the foggiest idea what sort of corruption might
> allow the index to seem to work (and not, say, crash the reindex itself
> which is going to use that information...) yet allow problems to appear
> much later on.  Too bad the evidence is gone now.

Dunno if this is any help, but on a 7.2 system I saw a REINDEX which
was interrupted leave the index at least partially working.  We ended
up with an index which seemed fine, but which didn't contain certain
rows (so those rows were not visible when the SELECT criterion was
the indexed field).  This was extremely puzzling, but a DROP INDEX;
CREATE INDEX pair solved it.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
        --Roger Brinner

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Tom Lane
Date:
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> Dunno if this is any help, but on a 7.2 system I saw a REINDEX which
> was interrupted leave the index at least partially working.  We ended
> up with an index which seemed fine, but which didn't contain certain
> rows (so those rows were not visible when the SELECT criterion was
> the indexed field).  This was extremely puzzling, but a DROP INDEX;
> CREATE INDEX pair solved it.

Hmm, I would expect that behavior for an overwrite-in-place REINDEX,
but 7.2 only seems to use overwrite-in-place for critical system
catalogs.  What were you reindexing exactly?  Were you running a
standalone backend?

            regards, tom lane

Re: vacuumdb is failing with NUMBER OF INDEX TUPLES NOT

From
Andrew Sullivan
Date:
On Mon, May 10, 2004 at 07:49:42PM -0400, Tom Lane wrote:
>
> Hmm, I would expect that behavior for an overwrite-in-place REINDEX,
> but 7.2 only seems to use overwrite-in-place for critical system
> catalogs.  What were you reindexing exactly?  Were you running a
> standalone backend?

Not as far as I know (I didn't cause the problem, I only fixed it
later, so I'm relying on the report of the person who ticked the
problem to understand what happened).  It was definitely on a user
column, and IIRC it was a two-field unique index.  My memory is hazy
on it, though -- I didn't report it because it wasn't a system I
could leave in that state for debugging.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca