Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Msg-id 20140707071448.GB29124@alap3.anarazel.de
In response to [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running  ("MauMau" <maumau307@gmail.com>)
List pgsql-hackers
Hi,

On 2014-07-04 22:59:15 +0900, MauMau wrote:
> My customer reported a strange connection hang problem.  He and I couldn't
> reproduce it.  I haven't been able to understand the cause, but I can think
> of one hypothesis.  Could you give me your opinions on whether my hypothesis
> is correct, and a direction on how to fix the problem?  I'm willing to
> submit a patch if necessary.

> The connection attempt is waiting for a reply from the standby.  This is
> strange, because we didn't anticipate that the connection establishment (and
> subsequent SELECT queries) would update something and write some WAL.  The
> doc says:
> 
> http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION
> 
> "When requesting synchronous replication, each commit of a write transaction
> will wait until confirmation is received that the commit has been written to
> the transaction log on disk of both the primary and standby server.
> ...
> Read only transactions and transaction rollbacks need not wait for replies
> from standby servers. Subtransaction commits do not wait for responses from
> standby servers, only top-level commits."
> 
> 
> [Hypothesis]
> Why does the connection processing emit WAL?
> 
> Probably, it did page-at-a-time vacuum during access to pg_database and
> pg_authid for client authentication.  src/backend/access/heap/README.HOT
> describes:

> [How to fix]
> Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
> synchronous_standby_names='" to pg_ctl start in the recovery script would
> prevent the problem.

> But isn't there anything to fix in PostgreSQL?  I think the doc needs
> improvement so that users won't misunderstand that only write transactions
> would block at commit.

I think we should rework RecordTransactionCommit() to wait for the
standby only if `markXidCommitted' is set, rather than whenever
`wrote_xlog' is. There really isn't a reason to make a read-only
transaction's commit wait just because it did some HOT pruning.
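
To illustrate the difference, here is a rough standalone sketch. It is
not the actual xact.c code: the function names are hypothetical, the
"current" rule is my assumption about what the existing check amounts
to, and the "proposed" rule is one reading of the suggestion above.
Only the two flag names are taken from RecordTransactionCommit().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not PostgreSQL source code.
 *
 * markXidCommitted: the transaction has an XID that commit will mark
 *                   as committed.
 * wrote_xlog:       the transaction emitted at least one WAL record.
 */

/* Assumed current rule: wait for the standby whenever WAL was written. */
bool
waits_for_standby_current(bool markXidCommitted, bool wrote_xlog)
{
    (void) markXidCommitted;    /* not consulted under this rule */
    return wrote_xlog;
}

/* Proposed rule: wait only when an XID is being marked committed. */
bool
waits_for_standby_proposed(bool markXidCommitted, bool wrote_xlog)
{
    (void) wrote_xlog;          /* WAL from pruning alone no longer matters */
    return markXidCommitted;
}

/* Walk through the two cases discussed in this thread. */
void
check_hot_pruning_case(void)
{
    /* Read-only transaction that pruned a page: no XID, but WAL written. */
    assert(waits_for_standby_current(false, true));
    assert(!waits_for_standby_proposed(false, true));

    /* Ordinary write transaction: has an XID and wrote WAL. */
    assert(waits_for_standby_current(true, true));
    assert(waits_for_standby_proposed(true, true));
}
```

Under this model only the first case changes: a backend that merely
pruned pg_database or pg_authid during authentication would no longer
block on the standby, while write transactions still wait as before.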

Greetings,

Andres Freund


