
Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Date	February 11, 2017 13:07:59
Msg-id	CABUevExpVYuLUgoNgYNNHxFmZqo3PuuaKgcVwYE5B5wCGScZkQ@mail.gmail.com
In response to	[HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint (Michael Banck <michael.banck@credativ.de>)
Responses	Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
List	pgsql-hackers


On Sat, Feb 11, 2017 at 10:38 AM, Michael Banck <michael.banck@credativ.de> wrote:

Hi,

one take-away from the Gitlab Post-Mortem[1] appears to be that after
their secondary lost replication, they were confused about what
pg_basebackup was doing when they tried to rebuild it. It just sat there
and did nothing (even with --verbose), so they assumed something was
wrong with either the primary or the connection, and restarted it
several times.

AFAICT, it turns out the checkpoint was written on the master (they
probably did not use -c fast), but this wasn't obvious to them:

Yeah, I've seen this happen to a number of people, and it sounds like that's what happened here as well. I've considered something along the lines of the patch you posted, but never got around to actually doing anything about it.

ISTM that even with WAL streaming, nothing would be written on the
client server until the checkpoint is complete, as do_pg_start_backup()
runs the checkpoint and only returns the starting WAL location
afterwards.

The attached (untested) patch is meant to kick off a discussion on how
to improve the situation. It is supposed to mention the checkpoint when
--verbose is used, and it adds a paragraph about the checkpoint being
run to the Notes section of the documentation.

Docs look good to me, other than claiming that pg_basebackup runs on a server (it can run anywhere). I would just say "during which pg_basebackup will appear idle". How does that sound to you?

As for the code, while I haven't tested it, isn't the "checkpoint completed" message in the wrong place? Doesn't PQsendQuery() return immediately, meaning the check needs to go *after* the PQgetResult() call?
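
To illustrate the ordering question above, here is a minimal Python simulation. It is not libpq and not the patch under discussion: `pool.submit()` merely stands in for a call that returns as soon as the request is handed off (as PQsendQuery() does), and `pending.result()` stands in for the call that blocks until the server answers (as PQgetResult() does). The function names, the message texts, and the WAL location are all made up for the sketch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_checkpoint(log, lock, seconds):
    """Stand-in for the server-side checkpoint (hypothetical, not PostgreSQL code)."""
    time.sleep(seconds)
    with lock:
        log.append("server: checkpoint done")
    return "0/2000028"  # made-up starting WAL location


def start_backup(verbose=True, seconds=0.2):
    """Simulate starting a base backup and return (start_lsn, ordered event log)."""
    log = []
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Analogous to PQsendQuery(): hands the request off and returns at once,
        # so nothing is known about the checkpoint at this point.
        pending = pool.submit(simulated_checkpoint, log, lock, seconds)
        if verbose:
            with lock:
                log.append("client: waiting for checkpoint")
        # Analogous to PQgetResult(): blocks until the server has answered.
        start_lsn = pending.result()
        # Only here is it accurate to report that the checkpoint has finished.
        if verbose:
            with lock:
                log.append("client: checkpoint completed")
    return start_lsn, log


if __name__ == "__main__":
    lsn, events = start_backup()
    for event in events:
        print(event)
    print("start location:", lsn)
```

If the "checkpoint completed" line were emitted right after `pool.submit()` instead, it would be logged while the simulated checkpoint was still running, which is the concern raised about the patch.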

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
