Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Msg-id c869c5ba-0aa2-06df-4e1d-60169cf7e230@2ndquadrant.com
In response to Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
List pgsql-hackers
On 02/17/2017 08:17 PM, Jim Nasby wrote:
> On 2/14/17 5:18 PM, Robert Haas wrote:
>> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> I'd rather have a --quiet mode instead.  If you're running it by hand,
>>> you're likely to omit the switch, whereas when writing the cron job
>>> you're going to notice lack of switch even before you let the job run
>>> once.
>>
>> Well, that might've been a better way to design it, but changing it
>> now would break backward compatibility and I'm not really sure that's
>
> Meh... it's really only going to affect cronjobs or scripts, which are
> easy enough to fix, and you're not going to have that many of them (or
> if you do you certainly have an automated way to push the update).
>

I think you're underestimating the breakage and overestimating how easy
it's going to be to fix it. It's true we'd only change this in a major
version, so people should assume possible breakage and test.
>> a good idea.  Even if it is, it's a separate concern from whether or
>> not in the less-quiet mode we should point out that we're waiting for
>> a checkpoint on the server side.
>
> Well, --quiet was suggested because of confusion from pg_basebackup
> twiddling its thumbs...

I'm in favor of the '--verbose' route. People are used to it when
investigating issues, and it does not break existing cron jobs. I can
live with --quiet though, as long as we don't resort to some craziness
along the lines of "if there's a tty, be verbose, otherwise be quiet".
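A minimal Python sketch of the '--verbose' route, assuming a hypothetical tool (the program name `basebackup-sketch`, the function `checkpoint_notices`, and the message texts are illustrative only, not pg_basebackup's code). Without the flag the output is unchanged; with it, a notice explains the checkpoint wait; and nothing depends on whether a tty is attached.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI following the '--verbose' route: without -v the
    # output is exactly what it was before, so cron jobs see nothing new.
    parser = argparse.ArgumentParser(prog="basebackup-sketch")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print notices, e.g. while waiting for a checkpoint")
    return parser


def checkpoint_notices(verbose: bool, fast: bool) -> list[str]:
    """Messages to print around the server-side checkpoint.

    Deliberately no sys.stdout.isatty() check: what gets printed depends
    only on the flags the user passed, not on whether a tty is attached.
    """
    if not verbose:
        return []
    mode = "fast" if fast else "spread"
    return [f"waiting for {mode} checkpoint on the server to complete",
            "checkpoint completed"]


if __name__ == "__main__":
    args = build_parser().parse_args(["--verbose"])
    for line in checkpoint_notices(args.verbose, fast=False):
        print(line)
```

The point of the design is that the only switch is explicit: a user who is puzzled by a silent wait adds -v, and an unattended job keeps its old, quiet output.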

I have my doubts about this actually addressing gitlab-like mistakes,
though, because it's a helluva jump from "It's waiting and not doing
anything" to "We need to remove the datadir." (One reason is that a
non-empty directory is a local issue, and there's no reason why the
tool should wait instead of just reporting an error.)
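To illustrate the ordering argument, a Python sketch with hypothetical names (`check_target_dir`, `run_backup`, `TargetDirError`); it is not how pg_basebackup is actually structured. It validates the local target directory before starting anything on the server, so a non-empty directory produces an immediate error instead of a wait.

```python
import os
import tempfile
from typing import Callable


class TargetDirError(Exception):
    """The local target directory cannot be used for the backup."""


def check_target_dir(path: str) -> None:
    # Purely local precondition: the target must be missing or empty.
    if not os.path.exists(path):
        return
    if not os.path.isdir(path):
        raise TargetDirError(f'"{path}" exists but is not a directory')
    if os.listdir(path):
        raise TargetDirError(f'directory "{path}" exists but is not empty')


def run_backup(path: str, start_remote_backup: Callable[[], None]) -> None:
    # Cheap local check first, so a bad target fails immediately...
    check_target_dir(path)
    # ...and only then the remote step, which may wait on a checkpoint.
    start_remote_backup()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "PG_VERSION"), "w").close()  # make it non-empty
        try:
            run_backup(d, lambda: print("starting backup on the server"))
        except TargetDirError as e:
            print(f"error: {e}")  # reported at once; server never contacted
```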

FWIW, before messing with the pg_basebackup code, perhaps we should
improve the documentation and clearly explain the meaning of the 'fast'
and 'spread' checkpoint modes. Right now, the pg_basebackup docs only
say this:
  Sets checkpoint mode to fast or spread (default) (see Section 24.3.3).

which is pretty damn useless when you're investigating an issue. And
the referenced section (Making a Base Backup Using the Low Level API)
does not clearly explain how this maps to pg_start_backup(_,?).

What about adding a paragraph to the pg_basebackup docs explaining that
with 'fast' it requests an immediate checkpoint, while with 'spread' it
waits for a spread checkpoint?
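A rough sketch of what such a paragraph could spell out, written as Python helpers with hypothetical names (`basebackup_argv`, `start_backup_sql`; the target path in the demo is just an example). pg_basebackup itself drives the backup over the replication protocol rather than calling pg_start_backup(), so the SQL mapping here is conceptual: 'fast' corresponds to pg_start_backup(label, true) and 'spread' to pg_start_backup(label, false). (The function was renamed pg_backup_start in PostgreSQL 15.)

```python
CHECKPOINT_MODES = ("fast", "spread")


def _check_mode(checkpoint: str) -> None:
    if checkpoint not in CHECKPOINT_MODES:
        raise ValueError(f"invalid checkpoint mode: {checkpoint!r}")


def basebackup_argv(target_dir: str, checkpoint: str = "spread",
                    verbose: bool = False) -> list[str]:
    """pg_basebackup command line for the given checkpoint mode."""
    _check_mode(checkpoint)
    argv = ["pg_basebackup", "-D", target_dir, f"--checkpoint={checkpoint}"]
    if verbose:
        argv.append("--verbose")
    return argv


def start_backup_sql(label: str, checkpoint: str = "spread") -> str:
    """Low-level API call with the same checkpoint behaviour.

    fast   -> pg_start_backup(label, true): immediate checkpoint, returns
              as soon as it is done
    spread -> pg_start_backup(label, false): the checkpoint's I/O is spread
              out (see checkpoint_completion_target), so this can take a
              long time while doing nothing visible
    """
    _check_mode(checkpoint)
    fast = "true" if checkpoint == "fast" else "false"
    quoted = "'" + label.replace("'", "''") + "'"  # SQL string literal
    return f"SELECT pg_start_backup({quoted}, {fast});"


if __name__ == "__main__":
    print(" ".join(basebackup_argv("/var/lib/pgsql/standby", "fast")))
    print(start_backup_sql("nightly", "spread"))
```

The 'spread' case is the one that looks like a hang: the default mode waits for a throttled checkpoint, which is exactly the silent wait the thread is about.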

regards

-- Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


