On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I have my doubts about this actually addressing gitlab-like mistakes,
> though, because it's a helluva jump from "It's waiting and not doing
> anything," to "We need to remove the datadir." (One of the reasons being
> that non-empty directory is a local issue, and there's no reason why the
> tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people
making multiple serious errors, including failing to test that the
ostensible backups could actually be restored. I was taught that rule
#1 as far as backups are concerned is to test that you can restore
them, so that seems like a big miss. However, I don't think the fact
that they made other mistakes is a reason not to improve the things we
can improve. Certainly, having some way for pg_basebackup to tell you
that it's waiting for the master to checkpoint will help the next
person who is confused by that particular thing. That person may go
on to be confused by something else, but then again maybe not.
Improving the reporting in this case stands on its own merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company