Re: backup manifests - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: backup manifests
Msg-id 20200103193559.GY3195@tamriel.snowman.net
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Jan 3, 2020 at 12:01 PM Stephen Frost <sfrost@snowman.net> wrote:
> > You're certainly intending to do *something* with the manifest, and
> > while I appreciate that you feel you've come up with a complete use-case
> > that this simple manifest will be sufficient for, I frankly doubt
> > that'll actually be the case.  Not long ago it wasn't completely clear
> > that a manifest at *all* was even going to be necessary for the specific
> > use-case you had in mind (I'll admit I wasn't 100% sure myself at the
> > time either), but now that we're down the road of having one, I can't
> > agree with the blanket assumption that we're never going to want to
> > extend it, or even that it won't be necessary to add to it before this
> > particular use-case is fully addressed.
> >
> > And the same goes for the other things that were discussed up-thread
> > regarding memory context and error handling and such.
>
> Well, I don't know how to make you happy here.

I suppose I should admit that, first off, I don't feel you're required
to make me happy, and I don't think it's necessary to make me happy to
get this feature into PG.

Since you expressed that interest though, I'll go out on a limb and say
that what would make me *really* happy would be to think about where the
project should be taking pg_basebackup, what we should be working on
*today* to address the concerns we hear about from our users, and to
consider the best way to implement solutions to what they're actively
asking a core backup solution to provide.  I get that maybe that isn't
how the world works, and that sometimes the people who write our
paychecks want us to work on something else.  And yes, I'm sure there
are some users who are asking for this specific thing, but I certainly
don't think it's a common ask of pg_basebackup, or what users feel is
missing from the backup options we offer in core.  In fact, we had users
on this list specifically saying they *wouldn't* use this feature
(referring to the differential backup stuff, of course) because of the
things which are missing, which is pretty darn rare.

That's what would make *me* happy.  Even some comments about how to
*get* there while also working towards these features would be likely
to make me happy.  Instead, I feel like we're being told that we need
this feature badly in v13 and we're going to cut bait and do whatever
is necessary to get us there.

> It looks to me like
> insisting on a JSON-format manifest will likely mean that this doesn't
> get into PG13 or PG14 or probably PG15, because a port of all that
> machinery to work in frontend code will be neither simple nor quick.

I certainly understand that these things take time, sometimes quite a
bit of it, as the past 2 years have shown in this other little side
project, and that was hacking without having to go through the much
larger effort involved in getting things into PG core.  That doesn't
mean that kind of effort isn't worthwhile, or that we shouldn't spend
the time on something just because it's a bunch of work.  I do feel
what you're after here is a multi-year project, and I've said before
that I don't agree that this is a feature (the differential backup with
pg_basebackup thing) that makes any sense going into PG at this time.
I'm also not trying to block this feature, though; I'm just trying to
share the experience that we've gained from working in this area for
quite a while and, hopefully, help guide the effort in PG away from
pitfalls and in a good direction long-term.

> If you want this to happen for this release, you've got to be willing
> to settle for something that can be implemented in the time we have.

I'm not sure what you're expecting here, but for my part, at least, I'm
not going to be terribly upset if this feature doesn't make this release
because there's an agreement and understanding that the current
direction isn't a good long-term solution.  Nor am I going to be
terribly upset about the time that's been spent on this particular
approach, given that there's been no shortage of people commenting, for
quite some time now, that they'd rather see an extensible format, like
JSON.

All that said, one thing we've done is to consider that *we* are the
ones writing the JSON, while also being the ones reading it: we don't
need the parsing side to understand and deal with *any* JSON that might
exist out there, just whatever it is the server creates (or created).
It may be possible to use that to simplify the parser, or at least to
accept that it might not perform as well if it ends up being given
something else.  I'm not sure how helpful that will be to you, but I
recall David finding it a helpful thought.
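
A rough sketch of that idea, written in Python for brevity rather than
the C a frontend tool would need, and assuming a hypothetical manifest
layout (the keys "Version", "Files", "Path" and "Size" are made up for
illustration, not taken from any patch): a reader that accepts only the
narrow subset of JSON the server itself writes (objects, arrays, strings
whose only escapes are backslash-quote and backslash-backslash, and
non-negative integers) and rejects everything else instead of handling it.

```python
class ManifestParseError(ValueError):
    """Raised when input falls outside the JSON subset the writer emits."""


def parse_subset(text):
    """Parse a restricted JSON subset: objects, arrays, simple strings and
    non-negative integers. Floats, true/false, null and unicode escapes are
    rejected rather than handled."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1

    def expect(ch):
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != ch:
            raise ManifestParseError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def parse_string():
        nonlocal pos
        expect('"')
        out = []
        while True:
            if pos >= len(text):
                raise ManifestParseError("unterminated string")
            ch = text[pos]
            pos += 1
            if ch == '"':
                return "".join(out)
            if ch == "\\":
                # Only the two escapes our own writer would ever produce.
                if pos >= len(text) or text[pos] not in '"\\':
                    raise ManifestParseError(f"unsupported escape at offset {pos}")
                ch = text[pos]
                pos += 1
            out.append(ch)

    def parse_int():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in "0123456789":
            pos += 1
        if start == pos:
            raise ManifestParseError(f"unexpected input at offset {start}")
        return int(text[start:pos])

    def parse_value():
        nonlocal pos
        skip_ws()
        if pos >= len(text):
            raise ManifestParseError("unexpected end of input")
        if text[pos] == '"':
            return parse_string()
        if text[pos] not in "{[":
            return parse_int()
        # Objects and arrays share one loop: item ("," item)* then close.
        close = "}" if text[pos] == "{" else "]"
        result = {} if close == "}" else []
        pos += 1
        skip_ws()
        if pos < len(text) and text[pos] == close:
            pos += 1
            return result
        while True:
            if close == "}":
                key = parse_string()
                expect(":")
                result[key] = parse_value()
            else:
                result.append(parse_value())
            skip_ws()
            if pos < len(text) and text[pos] == ",":
                pos += 1
                continue
            expect(close)
            return result

    value = parse_value()
    skip_ws()
    if pos != len(text):
        raise ManifestParseError(f"trailing data at offset {pos}")
    return value
```

A C version for frontend code would follow the same shape; the sketch
only shows that the grammar can shrink well below full JSON when the
writer and the reader are the same project.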

> I'm not sure whether what you and David are arguing boils down to
> thinking that I'm wrong when I say that doing that is hard, or whether
> you know it's hard but you just don't care because you'd rather see
> the feature go nowhere than use a format other than JSON. I don't see
> much difference between the latter position and a desire to block the
> feature permanently. And if it's the former then you have yet to make
> any suggestions for how to get it done with reasonable effort.

There's a great deal of daylight between the two positions you're
suggesting I might hold (and I don't speak for David).

I *do* think there's a lot of work that would need to be done here to
make this a good solution.  I'm *not* completely against other formats
besides JSON.  Even more so though, I am *not* arguing that this
feature should go 'nowhere', whether it uses JSON or not.

What I don't care for is a hand-hacked, inflexible format that's going
to require everyone down the road to implement their own parser for it,
plus bespoke code for every version of the custom format that there
ends up being, *including* PG core, to be clear.  Whatever utility ends
up using this manifest is going to need to support older versions of
it, just like pg_dump deals with older versions of custom-format dumps
(though we still get people complaining about not being able to use
older tools with newer dumps; it'd be awful nice if we could use JSON,
or something, and then just *add* things that wouldn't break older
tools, except for the rare case where we don't have a choice).  Not to
mention the debugging grief and such, since we can't just use a tool
like jq to check out what's going on.
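
A minimal sketch of the "just *add* things" property, in Python and
again assuming a hypothetical manifest layout ("Version", "Files",
"Path", "Size", "Checksum" and "Start-LSN" are illustrative names, not
anything PG defines): an older reader pulls out only the keys it knows,
so a newer writer can add fields without breaking it, and a version
bump is kept for the rare change that can't be made compatibly.

```python
import json

# Hypothetical: the newest manifest format version this (older) tool knows.
SUPPORTED_VERSION = 1


def read_file_entries(manifest_text):
    """Return (path, size) pairs, silently ignoring keys this tool doesn't know."""
    doc = json.loads(manifest_text)
    version = doc.get("Version", 1)
    if version > SUPPORTED_VERSION:
        # Reserved for the rare change that genuinely can't be made compatibly.
        raise ValueError(f"manifest version {version} is newer than supported")
    return [(entry["Path"], entry["Size"]) for entry in doc["Files"]]


# What an older server might write: only the fields this tool knows about.
old = '{"Version": 1, "Files": [{"Path": "base/1/1259", "Size": 8192}]}'

# What a newer server might write: a per-file field and a top-level key were
# added without bumping the version, so the older tool keeps working.
new = json.dumps({
    "Version": 1,
    "Files": [{"Path": "base/1/1259", "Size": 8192, "Checksum": "abc123"}],
    "Start-LSN": "0/2000028",
})

print(read_file_entries(old))  # [('base/1/1259', 8192)]
print(read_file_entries(new))  # [('base/1/1259', 8192)]
```

The same self-describing structure is what lets a generic tool be
pointed at a manifest while debugging; with this hypothetical layout,
`jq '.Files[].Path'` would list every file without any format-specific
code.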

As to the reference to pg_hba.conf: I don't think the packagers would
necessarily agree that there's been little grief around that, but even
so, a given pg_hba.conf is only going to be used with a given major
version.  Sure, it might have to be updated to the newer major version's
format if we change the format and someone copies the old version over
to the new version, but that happens during a major version upgrade of
the server, and at least newer tools don't have to deal with the older
pg_hba.conf version.

Also, pg_hba.conf doesn't seem like a terribly good example in any case:
the last time the actual structure of that file was changed in a
breaking way was in 2002, when the 'user' column was added, and the
example pg_hba.conf from that commit works just fine with PG12, it
seems, based on some quick tests.  There have been other
backwards-incompatible changes, of course, the last being 6 years ago, I
think, when 'krb5' was removed.  I suppose there is some chance that you
might have a PG12-configured pg_hba.conf, try copying it back to a PG11
or PG10 server, and find that it doesn't work, but that strikes me as far
less of an issue than trying to read a PG12 backup with a PG11 tool,
which we know people do because they complain about it on the lists
with pg_dump/pg_restore.

Thanks,

Stephen
