Thread: Ack...major(?) bug just found in v6.3.1...

Ack...major(?) bug just found in v6.3.1...

From
The Hermit Hacker
Date:
acctng=> vacuum radlog;
NOTICE:  BlowawayRelationBuffers(radlog, 3): block 786 is referenced
(private 0, last 0, global 53)
FATAL 1:  VACUUM (vc_rpfheap): BlowawayRelationBuffers returned -2
acctng=>

Just got this on one of my tables...indices are all dropped before doing
the vacuum...




Re: [HACKERS] Ack...major(?) bug just found in v6.3.1...

From
"Vadim B. Mikheev"
Date:
The Hermit Hacker wrote:
>
> acctng=> vacuum radlog;
> NOTICE:  BlowawayRelationBuffers(radlog, 3): block 786 is referenced
> (private 0, last 0, global 53)
                      ^^^^^^^^^
I assume that you got some FATAL error before the vacuum.
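To make the point about the counters concrete, here is a minimal sketch of the kind of check that produces this NOTICE. It assumes, without reference to the actual bufmgr.c source, that "private" and "last" count pins held by the current backend and "global" is the shared pin count across all backends. The struct, its field names and `blowaway_buffers()` are hypothetical. With private and last at 0 but global at 53, the pins must belong to some other backend, for example one that died without releasing them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified buffer descriptor; field names are illustrative. */
typedef struct {
    unsigned rel_id;   /* relation the cached page belongs to (0 = unused) */
    unsigned block;    /* block number within that relation */
    int private_refs;  /* pins held by this backend */
    int last_refs;     /* pins remembered from this backend's last xact */
    int global_refs;   /* shared pin count across all backends */
} FakeBufDesc;

/* Forget cached pages of rel_id at or beyond first_block (e.g. after VACUUM
 * truncates the relation). Returns 0 on success, or -2 as soon as a page in
 * that range is still pinned by anyone. */
static int blowaway_buffers(FakeBufDesc *bufs, int nbufs,
                            unsigned rel_id, unsigned first_block)
{
    for (int i = 0; i < nbufs; i++) {
        FakeBufDesc *b = &bufs[i];
        if (b->rel_id != rel_id || b->block < first_block)
            continue;
        if (b->private_refs > 0 || b->last_refs > 0 || b->global_refs != 0) {
            printf("block %u is referenced (private %d, last %d, global %d)\n",
                   b->block, b->private_refs, b->last_refs, b->global_refs);
            return -2;
        }
        b->rel_id = 0;  /* mark the slot unused */
    }
    return 0;
}
```

Fed values shaped like the log above (relation truncated to 3 blocks, block 786 with a global count of 53), this returns -2, which is what VACUUM then escalates to FATAL.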

We have problems with elog(FATAL): the backend just exits (with a normal exit code),
and the postmaster doesn't re-initialize shmem etc., even though the backend could
be holding spinlocks and pinned buffers. This leaves the system in an unpredictable
state!

IMO, on elog(FATAL) the backend should abort() (just as it does on ASSERT).
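Why abort() helps can be shown with plain POSIX process handling. In this minimal sketch (all names hypothetical, not the postmaster's actual code), a parent forks a child standing in for a backend; the child either exits cleanly or calls abort(), and the parent inspects the waitpid() status. Only the abnormal termination is distinguishable from a normal exit, so only then would a postmaster-like parent know it has to reinitialize shared memory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a backend hitting elog(FATAL). Never returns. */
static void fake_backend_fatal(int use_abort)
{
    if (use_abort) {
        /* Suppress the core file for this demo; a real backend may want it. */
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        abort();    /* dies with SIGABRT: visibly abnormal */
    }
    _exit(0);       /* indistinguishable from a clean backend exit */
}

/* Returns 1 if a postmaster-like parent would reinitialize shared memory,
 * 0 if it would not, -1 on fork/wait failure. */
static int postmaster_would_reset_shmem(int use_abort)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        fake_backend_fatal(use_abort);  /* child: does not return */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    /* Anything other than "exited with status 0" is treated as a crash. */
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

This is the same property ASSERT already relies on: a signal-terminated child is unambiguous, whereas a normal exit after FATAL looks exactly like a backend that finished its work.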

> FATAL 1:  VACUUM (vc_rpfheap): BlowawayRelationBuffers returned -2
> acctng=>
>
> Just got this on one of my tables...indices are all dropped before doing
> the vacuum...

Comments?

Vadim

Re: [HACKERS] Ack...major(?) bug just found in v6.3.1...

From
dg@illustra.com (David Gould)
Date:
> The Hermit Hacker wrote:
> >
> > acctng=> vacuum radlog;
> > NOTICE:  BlowawayRelationBuffers(radlog, 3): block 786 is referenced
> > (private 0, last 0, global 53)
>                       ^^^^^^^^^
> I assume that you got some FATAL error before the vacuum.
>
> We have problems with elog(FATAL): the backend just exits (with a normal exit code),
> and the postmaster doesn't re-initialize shmem etc., even though the backend could
> be holding spinlocks and pinned buffers. This leaves the system in an unpredictable
> state!
>
> IMO, on elog(FATAL) the backend should abort() (just as it does on ASSERT).

The other way to do this, and it might be a good idea anyway, is to review the code
for elog() calls made while holding a spinlock and make sure the lock is released first!
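The "release first" rule can be sketched as follows, assuming (for illustration only) that an error report unwinds the stack with longjmp() the way elog(ERROR) does, and using a C11 atomic_flag as a stand-in spinlock. The names `fake_elog_error()`, `touch_buffer()` and `lock_leaked()` are hypothetical. If the lock is not dropped before the report, nothing on the unwound path ever releases it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag buf_lock = ATOMIC_FLAG_INIT;  /* hypothetical spinlock */
static jmp_buf error_ctx;                        /* where "elog" unwinds to */

static void spin_acquire(atomic_flag *l) { while (atomic_flag_test_and_set(l)) ; }
static void spin_release(atomic_flag *l) { atomic_flag_clear(l); }

/* Hypothetical elog(ERROR) stand-in: report, then longjmp out. */
static void fake_elog_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    longjmp(error_ctx, 1);
}

/* Work done under the spinlock; 'fail' simulates detecting an error. */
static void touch_buffer(int fail, int release_first)
{
    spin_acquire(&buf_lock);
    if (fail) {
        if (release_first)
            spin_release(&buf_lock);  /* drop the lock BEFORE reporting */
        fake_elog_error("buffer is referenced");
    }
    spin_release(&buf_lock);
}

/* Returns 1 if the lock is still held after the error unwound the stack. */
static int lock_leaked(int release_first)
{
    int held;
    if (setjmp(error_ctx) == 0)
        touch_buffer(1, release_first);
    /* Probe without spinning: test_and_set returns the previous state. */
    held = atomic_flag_test_and_set(&buf_lock);
    atomic_flag_clear(&buf_lock);  /* reset so the demo can be rerun */
    return held;
}
```

In a real multi-backend system there is no such reset: the next process to call the equivalent of spin_acquire() on that lock would simply spin forever.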

This problem will get worse if we start allowing query cancellation from
the client, and once the spinlock backoff code goes in.

In the long run you have to make all the signal handlers safe. Safe means
they only set a flag, and the code periodically polls for it to catch whatever
the condition is. Obviously this doesn't work for SIGSEGV etc., but it does for I/O
and the timers.
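The flag-and-poll pattern might look like this minimal POSIX sketch. The handler does nothing but set a `volatile sig_atomic_t`; the main loop checks it only at points where no spinlock is held. Using SIGINT as the cancel signal, and the names `check_for_cancel()` and `process_rows()`, are assumptions for illustration. As noted, this cannot help with SIGSEGV, where execution cannot safely continue to the next poll.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* The only state the handler touches: safe to write from a signal handler. */
static volatile sig_atomic_t cancel_pending = 0;

static void cancel_handler(int signo)
{
    (void) signo;
    cancel_pending = 1;  /* no locks, no malloc, no elog in here */
}

static int install_cancel_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cancel_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Called by the main code at safe points, e.g. with no spinlock held. */
static int check_for_cancel(void)
{
    if (!cancel_pending)
        return 0;
    cancel_pending = 0;
    /* Here it is safe to release buffers/locks and report the cancel. */
    return 1;
}

/* Hypothetical long-running loop that polls between units of work.
 * Returns the number of rows processed before stopping. */
static int process_rows(int nrows, int cancel_at)
{
    int done = 0;
    for (int i = 0; i < nrows; i++) {
        if (i == cancel_at)
            raise(SIGINT);  /* simulate a cancel request arriving */
        if (check_for_cancel())
            break;          /* stop at a consistent point */
        done++;             /* "work" on row i */
    }
    return done;
}
```

raise() is used only to simulate a cancel arriving mid-loop; POSIX guarantees that the handler has run before raise() returns, so the very next poll sees the flag.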

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.