Thread: Errors on VACUUM

Errors on VACUUM

From
Bob Smith
Date:
I received the following error from a VACUUM ANALYZE:

NOTICE:  FlushRelationBuffers(place, 454): block 231 is referenced
(private 0, global 4)
FATAL 1:  VACUUM (vc_repair_frag): FlushRelationBuffers returned -2

and psql lost the connection right after that.  This was repeatable,
after the first occurrence I re-started the postmaster and tried again,
same result.  The particular table in question is static data, that is
it was originally filled in once by COPY and there were no subsequent
inserts or deletes.  But I had just added a column to the table and
updated it from another table to eliminate a frequently-done join, that
was the reason for doing the VACUUM.

Since I had the text file with the complete table contents from the
original COPY, I decided to re-build the table, so next I did DROP
TABLE, and I got this:

NOTICE:  Buffer Leak: [004] (freeNext=-3, freePrev=-3,
relname=place_pid, blockNum=1, flags=0xc, refcount=2 -1)
NOTICE:  Buffer Leak: [005] (freeNext=-3, freePrev=-3, relname=place,
blockNum=231, flags=0xc, refcount=4 -1)
NOTICE:  Buffer Leak: [008] (freeNext=-3, freePrev=-3, relname=place,
blockNum=85, flags=0xc, refcount=3 -1)
NOTICE:  Buffer Leak: [011] (freeNext=-3, freePrev=-3, relname=place,
blockNum=0, flags=0xc, refcount=2 -1)
.
.

repeating about 20 more times with blockNum varying, and finally ending
with the table being dropped.  I then re-built the table from scratch
and everything seems to be fine now, VACUUM no longer gives errors.

Here is my system configuration:

PostgreSQL 7.0.2
RedHat Linux 7.0
P-III 800, 768MB RAM, 80GB disk

So I'm wondering what happened here, what might have caused the original
error, if something else is potentially still corrupted, and if I should
maybe re-build the whole database to be safe?

Advice much appreciated!

Bob Smith
Hammett & Edison, Inc.
bsmith@h-e.com


Re: Errors on VACUUM

From
Tom Lane
Date:
Bob Smith <bsmith@h-e.com> writes:
> PostgreSQL 7.0.2

Time to update ...

I'm rather surprised that restarting the postmaster didn't make the
error go away, but it's unlikely that anyone will care to investigate
--- unless you can reproduce the problem in 7.1.3 or later.

            regards, tom lane

Re: Errors on VACUUM

From
Bob Smith
Date:
On Friday, March 15, 2002, at 09:53 , Tom Lane wrote:

> Bob Smith <bsmith@h-e.com> writes:
>> PostgreSQL 7.0.2
>
> Time to update ...

I know, I know...  But I'm about two weeks from bringing a whole new
server on-line with the latest version of Postgres, so I don't want to
hassle with upgrading the old one.  I just need to keep it limping along
for a little while longer.

> I'm rather surprised that restarting the postmaster didn't make the
> error go away, but it's unlikely that anyone will care to investigate
> --- unless you can reproduce the problem in 7.1.3 or later.

So this isn't an error that would be caused by some sort of file
problem?  The server got new hard drives a few months ago and files went
through a backup/restore with tar, I didn't use pg_dump/pg_restore, so
it occurred to me that something might have been scrambled.  But until
this VACUUM failure, the only problem I've _ever_ had, before or after
the drive upgrade, was one occurrence of the "Buffer Leak" error which I
couldn't reproduce.

Well, I guess I just keep my fingers crossed for a couple of weeks until
I have my new server up.

Thanks!

Bob Smith
Hammett & Edison, Inc.
bsmith@h-e.com


Re: Errors on VACUUM

From
Tom Lane
Date:
Bob Smith <bsmith@h-e.com> writes:
>> I'm rather surprised that restarting the postmaster didn't make the
>> error go away, but it's unlikely that anyone will care to investigate
>> --- unless you can reproduce the problem in 7.1.3 or later.

> So this isn't an error that would be caused by some sort of file
> problem?

AFAICT what you have there is just dangling buffer reference counts;
ie, something didn't release its reference to a disk page buffer in
shared memory.  We used to have quite a number of bugs of that ilk,
though I've not heard of such lately.  They're fairly harmless
normally.

It's not clear to me why a restart (which would naturally clear shared
memory) didn't fix the problem.  Perhaps you had some configuration
on disk that tickled the original buffer-leak bug again each time you
restarted; but I dunno what.  VACUUM itself was never prone to such
errors, so it's hard to see how a restart and immediate VACUUM would
show the problem.  Were you doing other stuff in between?  Have you
got any interesting stuff (like functional indexes, maybe) on this
table?

            regards, tom lane

Re: Errors on VACUUM

From
Bob Smith
Date:
On Friday, March 15, 2002, at 03:07 , Tom Lane wrote:

> It's not clear to me why a restart (which would naturally clear shared
> memory) didn't fix the problem.

Er, um, (humble look), I just studied my syslog, and in fact the
postmaster _didn't_ restart.  I tried to restart it using the Red Hat
control script from /etc/rc.d/init.d,  but apparently that script lied
to me when it said the restart succeeded.  The syslog clearly shows the
postmaster didn't actually restart until about 10 minutes later, _after_
the second error.  So I was talking to the same postmaster every time.
Arrggh.

I'm reassured now that this is just a passing memory management issue
and not a symptom of a more serious problem, I'm sure the error wouldn't
have happened the second time if the postmaster had in fact restarted.
Everything looks fine now so I'm going to stop worrying about it.

Thanks much for your help!

Bob Smith
Hammett & Edison, Inc.
bsmith@h-e.com