Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high. - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.
Msg-id CAB7nPqT5y9892TzLRzZ+CxvG6FwA206SoXcSbsvP6b-i=n3AUw@mail.gmail.com
In response to BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.  (grant@amazon.com)
Responses Re: BUG #13128: Postgres deadlock on startup failure when max_prepared_transactions is not sufficiently high.  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-bugs
On Thu, Apr 23, 2015 at 7:09 AM,  <grant@amazon.com> wrote:
> 1.       Set max_prepared_transactions to 2 in postgresql.conf.
> 2.       Start Postgres.
> 3.       Create two uncommitted prepared transactions: BEGIN; PREPARE TRANSACTION 'test1'; BEGIN; PREPARE TRANSACTION 'test2';
> 4.       Set max_prepared_transactions to 1 in postgresql.conf.
> 5.       Restart Postgres.
>
> At this point, startup fails with a FATAL error, but the postmaster process
> keeps running.
>
> LOG:  database system was interrupted; last known up at 2015-04-16 17:19:56 PDT
> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  record with zero length at 0/1826C70
> LOG:  redo is not required
> LOG:  recovering prepared transaction 1685
> LOG:  recovering prepared transaction 1683
> FATAL:  maximum number of prepared transactions reached
> HINT:  Increase max_prepared_transactions (currently 1).
>
> Looks like there may be an LWLock that is not correctly getting released,
> which causes the process to hang rather than exit.

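Yep, the startup process remains stuck here: it raises the error while
still holding the lock, and the exit callback (AtProcExit_Twophase ->
AtAbort_Twophase) then blocks trying to acquire that same lock again.
So we should release the lock before issuing the ERROR in twophase.c.
Backtrace of the stuck startup process: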
  * frame #0: 0x00007fff95004b72 libsystem_kernel.dylib`semop + 10
    frame #1: 0x000000010f59530f postgres`PGSemaphoreLock(sema=0x00000001187a1190) + 63 at pg_sema.c:387
    frame #2: 0x000000010f632af9 postgres`LWLockAcquireCommon(lock=0x000000010fc452e0, mode=LW_EXCLUSIVE, valptr=0x0000000000000000, val=0) + 377 at lwlock.c:1037
    frame #3: 0x000000010f63296b postgres`LWLockAcquire(l=0x000000010fc452e0, mode=LW_EXCLUSIVE) + 43 at lwlock.c:900
    frame #4: 0x000000010f2f975d postgres`AtAbort_Twophase + 93 at twophase.c:284
    frame #5: 0x000000010f2f9f24 postgres`AtProcExit_Twophase(code=1, arg=0) + 20 at twophase.c:246
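To make the failure mode concrete, here is a toy model in C. It is not PostgreSQL code: every name in it (ToyLock, toy_reserve_slot, toy_exit_hook, toy_recover) is hypothetical, the toy lock only stands in for the LWLock seen in the backtrace, and setjmp/longjmp stands in for ereport(). It assumes, as the backtrace suggests, that the error is raised while the lock is still held and that the exit hook then tries to take the same non-reentrant lock. Where the real LWLockAcquire would sleep on its semaphore forever, the toy lock simply reports the self-deadlock.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an LWLock: exclusive and NOT reentrant. */
typedef struct { bool held; } ToyLock;

static ToyLock toy_state_lock;   /* plays the role of the two-phase state lock */
static jmp_buf toy_error_jmp;    /* where toy_error() unwinds to */
static int toy_num_used;         /* slots in use; file scope so it survives longjmp */

/* Returns false instead of blocking if the lock is already held.
 * The real lock would sleep here forever, which is the hang in the report. */
static bool toy_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void toy_release(ToyLock *lock)
{
    assert(lock->held);          /* releasing an unheld lock is a bug */
    lock->held = false;
}

/* Models ereport(ERROR), which the startup process turns into FATAL: never returns. */
static void toy_error(const char *msg)
{
    fprintf(stderr, "FATAL:  %s\n", msg);
    longjmp(toy_error_jmp, 1);
}

/* Models reserving a prepared-transaction slot while holding the lock. */
static void toy_reserve_slot(int max_slots, bool release_before_error)
{
    (void) toy_acquire(&toy_state_lock);   /* always free at this point in the model */
    if (toy_num_used >= max_slots)
    {
        if (release_before_error)
            toy_release(&toy_state_lock);  /* the kind of fix proposed in this thread */
        toy_error("maximum number of prepared transactions reached");
    }
    toy_num_used++;
    toy_release(&toy_state_lock);
}

/* Models the exit hook (AtProcExit_Twophase -> AtAbort_Twophase) that takes
 * the lock again. Returns true if the real process would deadlock here. */
static bool toy_exit_hook(void)
{
    if (!toy_acquire(&toy_state_lock))
        return true;
    toy_release(&toy_state_lock);
    return false;
}

/* Recovers n_prepared transactions with room for only max_slots.
 * Returns true if the error path would self-deadlock. */
static bool toy_recover(int n_prepared, int max_slots, bool release_before_error)
{
    toy_num_used = 0;
    toy_state_lock.held = false;
    if (setjmp(toy_error_jmp) == 0)
    {
        for (int i = 0; i < n_prepared; i++)
            toy_reserve_slot(max_slots, release_before_error);
        return false;                      /* no error: clean startup */
    }
    /* Error path: run the exit hook, as process exit would. */
    return toy_exit_hook();
}
```

With two prepared transactions and room for only one, toy_recover(2, 1, false) reports a self-deadlock, while toy_recover(2, 1, true), which releases the lock before raising the error as proposed above, exits cleanly.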

The attached patch fixes the problem for me, and I think it should be
backpatched as well.
(Note to self: check the other calls of LWLockAcquire/Release to see
if there are other code paths in the same situation).
Regards,
--
Michael

