Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Msg-id 20150226145354.GA2384@alvh.no-ip.org
In response to Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running  ("MauMau" <maumau307@gmail.com>)
List pgsql-hackers
FWIW a fix for this has been posted to all active branches:

Author: Andres Freund <andres@anarazel.de>
Branch: master [fd6a3f3ad] 2015-02-26 12:50:07 +0100
Branch: REL9_4_STABLE [d72115112] 2015-02-26 12:50:07 +0100
Branch: REL9_3_STABLE [abce8dc7d] 2015-02-26 12:50:07 +0100
Branch: REL9_2_STABLE [d67076529] 2015-02-26 12:50:07 +0100
Branch: REL9_1_STABLE [5c8dabecd] 2015-02-26 12:50:08 +0100
Branch: REL9_0_STABLE [82e0d6eb5] 2015-02-26 12:50:08 +0100
    Reconsider when to wait for WAL flushes/syncrep during commit.

    Up to now RecordTransactionCommit() waited for WAL to be flushed (if
    synchronous_commit != off) and to be synchronously replicated (if
    enabled), even if a transaction did not have a xid assigned. The primary
    reason for that is that sequence's nextval() did not assign a xid, but
    are worthwhile to wait for on commit.

    This can be problematic because sometimes read only transactions do
    write WAL, e.g. HOT page prune records. That then could lead to read only
    transactions having to wait during commit. Not something people expect
    in a read only transaction.

    This lead to such strange symptoms as backends being seemingly stuck
    during connection establishment when all synchronous replicas are
    down. Especially annoying when said stuck connection is the standby
    trying to reconnect to allow syncrep again...

    This behavior also is involved in a rather complicated <= 9.4 bug where
    the transaction started by catchup interrupt processing waited for
    syncrep using latches, but didn't get the wakeup because it was already
    running inside the same overloaded signal handler. Fix the issue here
    doesn't properly solve that issue, merely papers over the problems. In
    9.5 catchup interrupts aren't processed out of signal handlers anymore.

    To fix all this, make nextval() acquire a top level xid, and only wait for
    transaction commit if a transaction both acquired a xid and emitted WAL
    records.  If only a xid has been assigned we don't uselessly want to
    wait just because of writes to temporary/unlogged tables; if only WAL
    has been written we don't want to wait just because of HOT prunes.

    The xid assignment in nextval() is unlikely to cause overhead in
    real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
    anyway, for another only usage of nextval() without using the result in
    an insert or similar is affected.

    Discussion: 20150223165359.GF30784@awork2.anarazel.de,
        369698E947874884A77849D8FE3680C2@maumau,
        5CF4ABBA67674088B3941894E22A0D25@maumau

    Per complaint from maumau and Thom Brown

    Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
    better to be consistent behavior across all maintained branches.
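
For readers skimming the thread, the rule the commit message describes can be sketched roughly as below. This is only an illustration, not the code that was committed: the function name should_wait_for_commit() and its two flags are hypothetical, and the sketch ignores the synchronous_commit setting and whether syncrep is enabled at all.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a PostgreSQL function: decide whether commit
 * should wait for the WAL flush / synchronous replication, following the
 * rule stated in the commit message. Assumes synchronous_commit != off.
 */
bool
should_wait_for_commit(bool xid_assigned, bool wrote_wal)
{
    /* Wait only if the transaction both acquired a xid and emitted WAL. */
    return xid_assigned && wrote_wal;
}

/* Walk through the cases named in the commit message. */
void
commit_wait_examples(void)
{
    /* Read-only transaction that emitted a HOT prune record: no wait. */
    assert(!should_wait_for_commit(false, true));

    /* Writes to temporary/unlogged tables only (xid, no WAL): no wait. */
    assert(!should_wait_for_commit(true, false));

    /* nextval() that logged a sequence record, now with a xid: wait. */
    assert(should_wait_for_commit(true, true));

    /* Nothing assigned, nothing written: no wait. */
    assert(!should_wait_for_commit(false, false));
}
```

Under this rule a backend that only performed a HOT prune while connecting would no longer wait on a synchronous standby that is down, which is the symptom reported in this thread.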
 

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


