Re: postgres bug report - Mailing list pgsql-bugs

From Tom Lane
Subject Re: postgres bug report
Date
Msg-id 13121.964190662@sss.pgh.pa.us
Whole thread Raw
In response to postgres bug report  (jfaith <jfaith@cemsys.com>)
List pgsql-bugs
jfaith <jfaith@cemsys.com> writes:
> A daily 'vacuum verbose' cron failed, the end of the log follows:-
> NOTICE:  Index crd1: Pages 533; Tuples 104774: Deleted 6159. CPU
> 0.43s/0.81u sec.
> pqReadData() -- backend closed the channel unexpectedly.

Is there a coredump file from the vacuum crash?  (Look in the
data/base/DBNAME/ directory for the database.)  If so a backtrace
from it would be helpful.

> The prtc_ncn process died when the cron ran. On restart it seemed to be
> fine.

> At the time the cron runs prtcwri would not have any activity and did
> not die,
> when audits came though later its log file indicated they were performed
> correctly. However when I ran psql selects on cards did not show the new
> data!

This seems odd.  A hard crash in the vacuum process should lead the
postmaster to abort all other active backends.  Why would prtc_ncn's
backend quit but not prtcwri's ?

> If it will be helpfull I can simulate most of these cicumstances on a
> test machine here in an attempt to recreate the failure. What debugging
> options should I use?

Compile with -g and assert checking (configure --enable-cassert handles
the second, for the first the easiest way is "gmake PROFILE=-g all").

If you did not get a corefile above, this probably indicates you are
starting the postmaster with coredumps disabled --- on many systems
processes started from boot scripts or init run with "ulimit -c 0" by
default, and you have to change that to get cores from backends.  Check
that before trying very hard to simulate the failure (an easy way to
verify whether you get cores from crashes is "kill -SEGV backendPID" on
a running backend...)

> I've just looked at the code for vacuum. If I understand correctly it
> locks each table in turn exclusively.

Yes.  There is no alternative to that.  A trick that many people use
is to drop indexes on a table, vacuum the table, rebuild indexes;
index-vacuuming is slower than a full rebuild at present :-(.  However
if you are using the indexes to enforce UNIQUE constraints then this
may not be a good answer.

We have long-range plans to allow space recycling without VACUUM,
which'll make life easier for 24x7 operations.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: problem with view and case - please help
Next
From: Tom Lane
Date:
Subject: Re: Damn bug!