Re: 7.4.5 losing committed transactions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: 7.4.5 losing committed transactions
Msg-id 5883.1096065425@sss.pgh.pa.us
In response to Re: 7.4.5 losing committed transactions  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: 7.4.5 losing committed transactions  (Jan Wieck <JanWieck@Yahoo.com>)
List pgsql-hackers
Jan Wieck <JanWieck@Yahoo.com> writes:
> Is it somehow possible that the commit record was still sitting in the 
> shared WAL buffers (unwritten) when the response got sent to the client? 

I don't think so.  What I see in the two cases I have now are:

(1) The backend that was doing the "lost" transaction is *not* the one
I kill -9'd.  I know this in both cases because I know which table has
the missing entries, and I can see that that instance of the script got
a "WARNING: terminating connection because of crash of another server
process" message rather than just a connection closure.

(2) There's a pretty fair distance in the WAL log between the entries
made by the "lost" transaction and the checkpoint made by recovery ---
a dozen or so other transactions were made and committed in between.
It seems unlikely that this transaction would have been the only one to
lose a WAL record if something like that had happened.

What I'm currently speculating about is that there might be some
weirdness associated with the very act of sending out the WARNING.
quickdie() isn't doing anything to ensure that the system is in a good
state before it calls ereport --- which is probably not so cool
considering it is a signal handler.  It might be wise to reset at least
the elog.c state before doing this.

Can you still reproduce the problem if you take out the ereport call
in quickdie()?

BTW, what led you to develop this test setup ... had you already seen
something that made you suspect a data loss problem?
        regards, tom lane