Thread: bufmgr and smgr don't talk to each other, apparently

From: Tom Lane

I have just noticed something that's been broken for a good long while
(at least since 6.3): bufmgr.c expects that I/O errors will result in
an SM_FAIL return code from the smgr.c routines, but smgr.c does no
such thing: it does elog(ERROR) if it sees a failure.  All of the
"error handling" paths in bufmgr.c are dead code and have been since
at least 6.3.
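
To make the dead-code point concrete, here is a minimal standalone sketch in C. It is not the PostgreSQL source: every toy_* name, the numeric values given to SM_SUCCESS and SM_FAIL, and the use of setjmp/longjmp as a stand-in for elog(ERROR) are assumptions made purely for illustration. It only shows that when the callee raises instead of returning a failure code, the caller's failure branch can never run.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical stand-ins; the values are illustrative, not taken from smgr.h. */
#define SM_SUCCESS 1
#define SM_FAIL    0

jmp_buf error_context;      /* where the toy elog(ERROR) unwinds to */
int     recovery_path_runs; /* counts entries into the caller's SM_FAIL branch */

/* Toy elog(ERROR): report the problem and unwind; never returns to its caller. */
void toy_elog_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    longjmp(error_context, 1);
}

/* Toy storage-manager write: on failure it raises instead of returning SM_FAIL. */
int toy_smgr_write(int simulate_io_failure)
{
    if (simulate_io_failure)
        toy_elog_error("cannot write block");
    return SM_SUCCESS;
}

/* Toy buffer-manager caller that still checks for SM_FAIL. */
int toy_flush_buffer(int simulate_io_failure)
{
    int status = toy_smgr_write(simulate_io_failure);

    if (status == SM_FAIL)
    {
        /* Never reached: the callee either succeeds or unwinds past us. */
        recovery_path_runs++;
        return SM_FAIL;
    }
    return SM_SUCCESS;
}

/* Returns 1 if the flush returned normally, 0 if the toy elog aborted it. */
int run_flush(int simulate_io_failure)
{
    if (setjmp(error_context) != 0)
        return 0;

    int status = toy_flush_buffer(simulate_io_failure);
    assert(status == SM_SUCCESS); /* the only value that can come back */
    return 1;
}
```

In this sketch run_flush(0) returns 1 and run_flush(1) returns 0, and in both cases recovery_path_runs stays at zero, which is the sense in which the SM_FAIL handling is dead code.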

It seems to me that we should either reduce smgr.c's elog()s to NOTICEs,
or rip out all of the dead code in bufmgr.c.  I'm inclined to the
latter, since the former seems likely to create new bugs.

I'm also thinking that AbortBufferIO is *way* overstepping its authority
by forcing a postmaster restart if it notices a double write failure.
The dirty buffer is a problem, no doubt, but this solution looks like
urban renewal via A-bomb.  I'd rather just keep failing anytime some
transaction tries to write the buffer --- better that than taking out
all active transactions whether they'd ever touched that buffer or not.
If the write failure really is permanent, the dbadmin would eventually
have to intervene via a manual restart, but a manual restart at the time
of the dbadmin's choosing seems better than forcing a failure under
load.
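
A rough sketch of the alternative policy, again in standalone C with hypothetical names (ToyBuffer, toy_pool, toy_write_buffer and friends are invented for this example and do not correspond to bufmgr.c): a write that fails leaves the buffer dirty and reports failure only to whoever tried to write it, so transactions that never touch the bad buffer are unaffected.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBUFFERS 4

/* Hypothetical buffer descriptor, for illustration only. */
typedef struct ToyBuffer
{
    bool dirty;
    bool device_broken; /* simulates a permanent write failure */
} ToyBuffer;

ToyBuffer toy_pool[TOY_NBUFFERS];

/* Mark every buffer dirty and make exactly one of them unwritable. */
void toy_pool_init(int broken_buffer)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        toy_pool[i].dirty = true;
        toy_pool[i].device_broken = (i == broken_buffer);
    }
}

/*
 * Try to write one buffer.  On failure the buffer simply stays dirty and
 * the caller is told about it; nothing else in the pool is disturbed.
 */
bool toy_write_buffer(int buf)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS);

    if (toy_pool[buf].device_broken)
        return false;

    toy_pool[buf].dirty = false;
    return true;
}

/* Each transaction writes one buffer; count how many of them fail. */
int toy_count_failed_transactions(const int *touched, int ntransactions)
{
    int failed = 0;

    for (int i = 0; i < ntransactions; i++)
        if (!toy_write_buffer(touched[i]))
            failed++;
    return failed;
}
```

With buffer 2 broken and five transactions touching buffers 0, 1, 2, 3 and 2, only the two that touch buffer 2 fail in this model, whereas a forced restart would take out all five.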

Comments?
        regards, tom lane


RE: bufmgr and smgr don't talk to each other, apparently

From: "Hiroshi Inoue"

> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
> 
> I have just noticed something that's been broken for a good long while
> (at least since 6.3): bufmgr.c expects that I/O errors will result in
> an SM_FAIL return code from the smgr.c routines, but smgr.c does no
> such thing: it does elog(ERROR) if it sees a failure.  All of the

except smgropen(). It's not easy to return from the mdxxx() routines in
case of errors. Fortunately, I succeeded in returning from mdopen() in
the 'file non-existent' case.

> "error handling" paths in bufmgr.c are dead code and have been since
> at least 6.3.
> 
> It seems to me that we should either reduce smgr.c's elog()s to NOTICEs,
> or rip out all of the dead code in bufmgr.c.  I'm inclined to the
> latter, since the former seems likely to create new bugs.
>

I also prefer the latter. Even if smgr returned SM_FAIL, the md code
already calls elog(ERROR) in many places.

Regards.

Hiroshi Inoue


Re: bufmgr and smgr don't talk to each other, apparently

From: Tom Lane

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> (at least since 6.3): bufmgr.c expects that I/O errors will result in
>> an SM_FAIL return code from the smgr.c routines, but smgr.c does no
>> such thing: it does elog(ERROR) if it sees a failure.  All of the

> except smgropen().

Right.  I'm mainly looking at the block read/write/flush calls,
which have a lot of now-useless error recovery code after them.

> I also prefer the latter. Even if smgr returned SM_FAIL, the md code
> already calls elog(ERROR) in many places.

Good point, and the fd.c level may have some elogs too...
        regards, tom lane