Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high. - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.
Date
Msg-id 5538A018.9030608@iki.fi
In response to Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-bugs
On 04/23/2015 04:20 AM, Michael Paquier wrote:
> On Thu, Apr 23, 2015 at 7:09 AM,  <grant@amazon.com> wrote:
>> 1.       Set max_prepared_transactions to 2 in postgresql.conf.
>> 2.       start Postgres.
>> 3.       Create two uncommitted prepared transactions: BEGIN; PREPARE
>> TRANSACTION 'test1'; BEGIN; PREPARE TRANSACTION 'test2';
>> 4.       Set max_prepared_transactions to 1 in postgresql.conf.
>> 5.       Restart Postgres again.
>>
>> At this point the startup will fail with a fatal but the postmaster process
>> keeps running.
>>
>> LOG:  database system was interrupted; last known up at 2015-04-16 17:19:56
>> PDT
>> LOG:  database system was not properly shut down; automatic recovery in
>> progress
>> LOG:  record with zero length at 0/1826C70
>> LOG:  redo is not required
>> LOG:  recovering prepared transaction 1685
>> LOG:  recovering prepared transaction 1683
>> FATAL:  maximum number of prepared transactions reached
>> HINT:  Increase max_prepared_transactions (currently 1).
>>
>> Looks like there may be an LWLock that is not correctly getting released
>> which causes the process to hang rather than exit.
>
> Yep, the startup process remains stuck here, so we should release the
> lock before issuing ERROR in twophase.c:

As Tom noted in the other thread, that is not generally required.
LWLockReleaseAll() is called during process exit.

Hmm. What happens here is that when the startup process is about to
exit, because of the ERROR, it calls the shmem-exit hooks.
AtProcExit_Twophase sees that MyLockedGxact != NULL, and it tries to
clear MyLockedGxact->locking_backend to release the entry.

That's bogus. MyLockedGxact != NULL indicates that the backend is
currently operating on the entry. But that's not the case at that point
during RecoverPreparedTransactions(). It has already completed
recovering the previous transaction, and has not yet locked a
GlobalTransaction entry for the next one.

RecoverPreparedTransactions() should clear MyLockedGxact after it has
recovered the transaction, after the StandbyReleaseLockTree() call.
That's analogous to the normal PREPARE TRANSACTION path, in
PrepareTransaction(), where we call PostPrepare_Twophase() after
releasing the locks.
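
To make the sequence above concrete, here is a toy model in plain C. It is only a sketch, not the twophase.c code: every Toy/toy_ name is made up for illustration, the "lock" is a simple flag, and it assumes that the exit hook needs the same lock the failing code was still holding when the error was raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INVALID_BACKEND (-1)
#define TOY_MY_BACKEND      1

/* Toy stand-in for a GlobalTransaction entry; not the real struct. */
typedef struct ToyGxact {
    int locking_backend;    /* process working on the entry, or TOY_INVALID_BACKEND */
} ToyGxact;

typedef enum ToyOutcome {
    TOY_RECOVERED_ALL,      /* every prepared transaction got a slot */
    TOY_EXITED_CLEANLY,     /* error raised, exit hook had nothing to do */
    TOY_EXIT_RELEASED,      /* exit hook released an entry (the case it exists for) */
    TOY_EXIT_WOULD_HANG     /* exit hook needs a lock this process already holds */
} ToyOutcome;

static bool      toy_lock_held = false;   /* models a non-reentrant lock */
static ToyGxact *toy_my_locked = NULL;    /* models MyLockedGxact */

/* Models PostPrepare_Twophase(): we are done with the entry, so forget it. */
static void toy_post_prepare(void)
{
    if (toy_my_locked != NULL) {
        toy_my_locked->locking_backend = TOY_INVALID_BACKEND;
        toy_my_locked = NULL;
    }
}

/* Models the proc-exit hook (assumption: it takes the same lock). */
static ToyOutcome toy_proc_exit_hook(void)
{
    if (toy_my_locked == NULL)
        return TOY_EXITED_CLEANLY;
    if (toy_lock_held)
        return TOY_EXIT_WOULD_HANG;       /* a real lock acquire would block here */
    toy_my_locked->locking_backend = TOY_INVALID_BACKEND;
    toy_my_locked = NULL;
    return TOY_EXIT_RELEASED;
}

/* Models recovering nprepared transactions into nslots entries. */
static ToyOutcome toy_recover(ToyGxact *slots, int nslots, int nprepared,
                              bool clear_after_each)
{
    assert(slots != NULL && nslots >= 0);
    toy_lock_held = false;
    toy_my_locked = NULL;

    for (int i = 0; i < nprepared; i++) {
        toy_lock_held = true;             /* take the lock to claim a slot */
        if (i >= nslots)
            return toy_proc_exit_hook();  /* error raised with the lock held */
        slots[i].locking_backend = TOY_MY_BACKEND;
        toy_my_locked = &slots[i];
        toy_lock_held = false;

        /* ... recovery work for transaction i would happen here ... */

        if (clear_after_each)
            toy_post_prepare();           /* the proposed fix */
    }
    return TOY_RECOVERED_ALL;
}
```

With one slot and two prepared transactions, toy_recover(slots, 1, 2, false) ends in TOY_EXIT_WOULD_HANG, because the stale pointer left over from the first transaction sends the exit hook after a lock the process still holds. With clear_after_each set to true, the same call ends in TOY_EXITED_CLEANLY: the process still fails, but it exits instead of hanging.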

Attached is a patch for that. I introduced this bug in commit
bb38fb0d43c8d7ff54072bfd8bd63154e536b384, which added the proc-exit hook
and the call to PostPrepare_Twophase(), so unless you see more bugs in
this, I'll commit and backpatch this to all versions.

- Heikki


