Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Msg-id CA+Tgmoa6tRN-dDAEygmmr9K5HiiZ6xt=g0e-3E7o=TGAuAwm+w@mail.gmail.com
In response to Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I have my doubts about this actually addressing gitlab-like mistakes,
> though, because it's a helluva jump from "It's waiting and not doing
> anything," to "We need to remove the datadir." (One of the reasons being
> that a non-empty directory is a local issue, and there's no reason why the
> tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people
making multiple serious errors, including failing to test that the
ostensible backups could actually be restored.  I was taught that rule
#1 as far as backups are concerned is to test that you can restore
them, so that seems like a big miss.  However, I don't think the fact that
they made other mistakes is a reason not to improve the things we can
improve and, certainly, having some way for pg_basebackup to tell you
that it's waiting for the master to checkpoint will help the next
person who is confused by that particular thing.  That person may go
on to be confused by something else, but then again maybe not.
Improving the reporting in this case stands on its own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


