Re: BUG #14321: pg_basebackup --xlog-method=stream fails - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #14321: pg_basebackup --xlog-method=stream fails
Msg-id CAB7nPqR1bRAdE2SruRfpH39B5cO-sHO-_tqOrpTY0foJXvh-rw@mail.gmail.com
In response to BUG #14321: pg_basebackup --xlog-method=stream fails  (Jürgen Strobel <juergen+postgresql@strobel.info>)
Responses Re: BUG #14321: pg_basebackup --xlog-method=stream fails  (Jürgen Strobel <juergen+postgresql@strobel.info>)
List pgsql-bugs
On Sat, Sep 10, 2016 at 9:10 AM, Jürgen Strobel
<juergen+postgresql@strobel.info> wrote:
> First, I do have another WAL archive (usually).
> But no, I only see the first WAL segments up to the point when the problem
> occurs, then nothing more.
>
> The timeline as far as I can tell is:
>
> 1. pg_basebackup --xlog-method=stream starts and creates 2 connections for
> backup and WAL streaming.
> 2. The VM's crappy IO system hiccups and stalls the whole VM for a
> surprisingly long time.

I know that people can do fancy things here, believe me.

> 3. The server runs into wal_sender_timeout and closes the WAL streaming
> connection.
> 4. pg_basebackup prints the warning, and continues the filesystem copy, *but
> makes no effort to re-open the WAL streaming connection*. With ps I see a
> zombie child of the pg_basebackup process, I assume that's the one doing the
> WAL streaming.
> 5. pg_basebackup finishes up with the second half of pg_xlog missing, and the
> DB fails to start.
>
> In contrast, if the same problem occurs while running pg_receivexlog, it waits
> for 5 seconds then reopens the connection. I think that pg_basebackup should
> show the same resilience.

You can blame your VM here to begin with :(
Even with the default values of pg_basebackup --status-interval and
wal_sender_timeout on the server there is enough margin to prevent
things from getting killed, but if things get heavily constrained on I/O...
Well, there is not much that any software could do... Now I agree that
there would be room for improvement to make pg_basebackup retry a
stream instead of failing, and that may be something that people would
be willing to have. But it is hard to see improvements in this area as
anything other than a new feature, rather than a bug fix.
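
For illustration only, the retry behaviour discussed here (pause, then
reconnect, as pg_receivexlog is reported to do in the quoted message) could
look roughly like the sketch below. It is Python, not pg_basebackup's actual
code: `stream_from`, `StreamDisconnected`, and the integer positions are
hypothetical stand-ins, and the 5-second delay is simply the figure from the
report. A real implementation would also have to resume from the last WAL
position it safely received, which the sketch models only loosely.

```python
import time


class StreamDisconnected(Exception):
    """Hypothetical error: the WAL streaming connection was closed.

    last_pos is the last position known to have been received.
    """

    def __init__(self, last_pos):
        super().__init__(f"stream closed at position {last_pos}")
        self.last_pos = last_pos


def stream_with_retry(stream_from, start_pos, retry_delay=5.0,
                      max_attempts=None, sleep=time.sleep):
    """Call stream_from(pos) until it completes, reconnecting after drops.

    stream_from is a hypothetical callable that streams WAL starting at
    pos and returns the final position, or raises StreamDisconnected.
    """
    pos = start_pos
    attempts = 0
    while True:
        attempts += 1
        try:
            return stream_from(pos)
        except StreamDisconnected as exc:
            # Give up only if the caller set a limit and it is reached.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            # Resume from where the stream stopped, after a fixed pause.
            pos = exc.last_pos
            sleep(retry_delay)
```

With `max_attempts=None` the loop retries indefinitely; passing a limit
makes the failure visible to the caller instead of leaving an incomplete
backup behind silently.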

Anyway, replication slots would not help here if you just rely on
pg_basebackup to finish the job.
-- 
Michael
