Re: Updated backup APIs for non-exclusive backups - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: Updated backup APIs for non-exclusive backups |
Date | |
Msg-id | CABUevExxspZG5jQN8ahd=0yy-9TGJBxP_BLQ_EM1JSDqqb1EzA@mail.gmail.com Whole thread Raw |
In response to | Re: Updated backup APIs for non-exclusive backups (David Steele <david@pgmasters.net>) |
Responses |
Re: Updated backup APIs for non-exclusive backups
Re: Updated backup APIs for non-exclusive backups |
List | pgsql-hackers |
On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david@pgmasters.net> wrote:
On 3/30/16 4:18 AM, Magnus Hagander wrote:
>
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> This certainly looks like it would work but it raises the barrier for
> implementing backups by quite a lot. It's fine for backrest or barman
> but it won't be pleasant for anyone who has home-grown scripts.
>
>
> How much does it really raise the bar, though?
>
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
>
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.
Then what would home-grown scripts use, the deprecated API that we know
has issues?
They should use either pg_basebackup/pg_receivexlog, or they should use a tool like backrest or barman.
> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database.
All that's required is to spawn a psql process. I'm no shell expert but
that's simple enough.
Oh, it's certainly doable, but you also need to come back and talk to it at a later time (to call stop backup), and do something useful with a multiline output. None of that is particularly easy. Certainly doable, but it becomes the wrong tool for the job.
> But those scripts are likely broken anyway.
Yes, probably. Backup and especially archiving correctly are harder
than most people realize.
Exactly. Which is why we should discourage people from doing that, and instead point them to the tools that do the job right. This is the whole reason we're making this change in the first place.
> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.
But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last. pgBackRest already
does that and Barman probably does, too.
It does (I did doublecheck that at some point).
I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.
pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.
So maybe we should at least start that way. And just document that clearly, and then use the patch that we have right now?
Except we add so it still returns the stoppoint() as well (which is returned from the current pg_stop_backup, but not in the new one).
We can always extend the function with more columns later if we need - it's changing the ones we have that's a big problem there :)
Then we start this way and we can keep improving if necessary. But the advantage of sticking to this for now is we don't have to touch the recovery side at all. And we're pretty late in the cycle ATM.
> If we
> don't care about that, then we can go back to just saying "copy
> pg_control last and we're done". But you yourself complained about that
> requirement because it's too easy to get wrong (though you advocated
> using backup_label to transfer the data over -- but that has the
> potential for getting more complicated if we now or at any point in the
> future want more than one field to transfer, for example).
Perhaps. I'm still not convinced that getting some information from
backup_label and other information from pg_control is a good solution.
I would rather write the recovery point into the backup_label and use
that instead of the value in pg_control. Text files are much easier to
parse and push around accurately (and test).
What about all the *other* functionality that's then in pg_control?
We could do as Heikki at one point suggested in our chat as well which is basically write all of pg_control out to the backup_label file, and reconstruct it from that. But that makes that file a lot more complicated...
//Magnus
pgsql-hackers by date: