Thread: pg_rewind hangs if --source-server is used and syncrep is enabled

pg_rewind hangs if --source-server is used and syncrep is enabled

From
Michael Banck
Date:
Hi,

if pg_rewind is told to fetch data via a libpq connection
(--source-server), synchronous replication is enabled and there is only
one sync standby (pilot error, but sill); pg_rewinding the old master
hangs at the CREATE TEMPORARY TABLE step (CREATE TABLE waiting for
X/XXXXXXXX).  At least this happened to one of our clients while
evaluating pg_rewind.

To the user, the last thing printed is "need to copy XXXX MB [...]".  If
the user cancels the pg_rewind command with ^C, the backend keeps
hanging around even in --dry-run mode.  That won't hurt too much as it
does not seem to block future pg_rewind runs after synchronous_commit
has been set to a different value, but looks surprising to me.

Not sure whether pg_rewind could error out gracefully without hanging in
this case, but maybe it could/should clean up the query on SIGTERM? And
at the least, maybe this caveat should be documented?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: pg_rewind hangs if --source-server is used and syncrep is enabled

From
Michael Banck
Date:
On Wed, Oct 05, 2016 at 04:39:39PM +0200, Michael Banck wrote:
> if pg_rewind is told to fetch data via a libpq connection
> (--source-server), synchronous replication is enabled and there is only
> one sync standby (pilot error, but sill); pg_rewinding the old master
> hangs at the CREATE TEMPORARY TABLE step (CREATE TABLE waiting for
> X/XXXXXXXX).  At least this happened to one of our clients while
> evaluating pg_rewind.
>
> To the user, the last thing printed is "need to copy XXXX MB [...]".  If
> the user cancels the pg_rewind command with ^C, the backend keeps
> hanging around even in --dry-run mode.  That won't hurt too much as it
> does not seem to block future pg_rewind runs after synchronous_commit
> has been set to a different value, but looks surprising to me.
>
> Not sure whether pg_rewind could error out gracefully without hanging in
> this case,

My colleague Christoph Berg pointed out that pg_rewind could just set
synchronous_commit = local before creating the temporary table, which
indeed works, proof-of-concept patch attached


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachment

Re: pg_rewind hangs if --source-server is used and syncrep is enabled

From
Michael Paquier
Date:
On Wed, Oct 5, 2016 at 11:53 PM, Michael Banck
<michael.banck@credativ.de> wrote:
> On Wed, Oct 05, 2016 at 04:39:39PM +0200, Michael Banck wrote:
>> To the user, the last thing printed is "need to copy XXXX MB [...]".  If
>> the user cancels the pg_rewind command with ^C, the backend keeps
>> hanging around even in --dry-run mode.  That won't hurt too much as it
>> does not seem to block future pg_rewind runs after synchronous_commit
>> has been set to a different value, but looks surprising to me.

Oops.

>> Not sure whether pg_rewind could error out gracefully without hanging in
>> this case,
>
> My colleague Christoph Berg pointed out that pg_rewind could just set
> synchronous_commit = local before creating the temporary table, which
> indeed works, proof-of-concept patch attached

Even synchronous_commit = off would not matter much, and we could just
use that for performance reasons. The temporary table used in this
context is just used to track the delta chunks of blocks, so this
solution sounds better to me. I'll patch 9.4's pg_rewind similarly to
what is decided here, and we could as well use an extra PQexec call to
bring more clarity for the code, now an extra round-trip could be a
big deal where network lag matters, but compared to the COPY chunks
sent out that would not matter much I guess. I am just posting another
version, and added a CF entry to not lose track of it:
https://commitfest.postgresql.org/11/811/
--
Michael

Attachment

Re: pg_rewind hangs if --source-server is used and syncrep is enabled

From
Heikki Linnakangas
Date:
On 10/06/2016 02:24 AM, Michael Paquier wrote:
> On Wed, Oct 5, 2016 at 11:53 PM, Michael Banck
> <michael.banck@credativ.de> wrote:
>> My colleague Christoph Berg pointed out that pg_rewind could just set
>> synchronous_commit = local before creating the temporary table, which
>> indeed works, proof-of-concept patch attached
>
> Even synchronous_commit = off would not matter much, and we could just
> use that for performance reasons. The temporary table used in this
> context is just used to track the delta chunks of blocks, so this
> solution sounds better to me. I'll patch 9.4's pg_rewind similarly to
> what is decided here, and we could as well use an extra PQexec call to
> bring more clarity for the code, now an extra round-trip could be a
> big deal where network lag matters, but compared to the COPY chunks
> sent out that would not matter much I guess.

Committed, thanks! I moved the call to where we establish the 
connection, that felt slightly more natural.

- Heikki




Re: pg_rewind hangs if --source-server is used and syncrep is enabled

From
Michael Paquier
Date:
On Thu, Oct 6, 2016 at 7:37 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Committed, thanks! I moved the call to where we establish the connection,
> that felt slightly more natural.

Thanks for the commit. Indeed that's better with the other sanity checks.
-- 
Michael