Re: [HACKERS] Creating backup history files for backups taken from standbys - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [HACKERS] Creating backup history files for backups taken from standbys
Date
Msg-id CAHGQGwGS48YZfCe1CrtukMmRvYhJwbF5WwxcXUXrm6suhAXsuQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Creating backup history files for backups taken fromstandbys  (David Steele <david@pgmasters.net>)
Responses Re: [HACKERS] Creating backup history files for backups taken fromstandbys  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Mon, Mar 5, 2018 at 11:01 PM, David Steele <david@pgmasters.net> wrote:
> On 3/5/18 1:06 AM, Michael Paquier wrote:
>> On Fri, Mar 02, 2018 at 03:41:57PM -0500, David Steele wrote:
>>> On 3/2/18 1:03 PM, Fujii Masao wrote:
>>>> On Fri, Mar 2, 2018 at 1:07 PM, Michael Paquier <michael@paquier.xyz> wrote:
>>>>> We would talk about two backups running
>>>>> simultaneously on a standby, which would overlap with each other to
>>>>> generate a file aimed only at being helpful for debugging purposes, and
>>>>> we provide no information now for backups taken from standbys.  We could
>>>>> of course make that logic a bit smarter by checking if there is an
>>>>> extsing file with the same name and create a new file with a different
>>>>> name.  But is that worth the complication? That's where I am not
>>>>> convinced, and that's the reason why this patch is doing things this
>>>>> way.
>>
>> Sorry for my late reply, I was thinking more about the whole thing.
>>
>>>> What about including the backup history file in the base backup instead of
>>>> creating it in the standby's pg_wal and archiving it? As the good side effect
>>>> of this approach, we can use the backup history file for debugging purpose
>>>> even when WAL archiving is not enabled.
>>
>> That's not going to be much helpful for tar'ed base backups as the
>> backup history file would be at the end of the stream :(  For a debug
>> file, that's not really helpful.
>>
>>> I don't think this is a good idea, even if it does solve the race
>>> condition.  First, it would be different from the way primary-style
>>> backups generate this file.  Second, these files would not be excluded
>>> from future restore / backup cycles so they could build up after a while
>>> and cause confusion.  I guess we could teach PG to delete them or
>>> pg_basebackup to ignore them, but that doesn't seem worth the effort.
>>
>> Something I did not consider though is that this causes archive commands
>> to choke on those identical file names, so any archive command which is
>> not able to handle that would cause WAL to bloat on the standby.  For
>> this reason I would rather drop the current approach taken.  Remember
>> that the docs tell to use a plain "cp" for example, so this would cause
>> standbys to go silently down.
>
> That's a good point.  The pgBackRest archive command would definitely
> not like this.
>
>> Something that I would find way more portable though is the addition of
>> the backup history file in the output of pg_stop_backup(), which would
>> map as well with what Fujii-san's idea to add this file to the backup
>> data itself, and do the same for backups taken on a primary.  However
>> this would mean moving the 'c' marker marking the end of the tar stream
>> within do_pg_stop_backup(), and I am in no way a fan of that: the
>> handling of the tar stream should be out of reach of the start and stop
>> phases.
>
> If the file is stored with the backup instead of the archive then I
> don't think it adds much.  Why not just add stop time and stop LSN to
> backup_label and get rid of the history file entirely.
>
> I've been working closely with backup for nearly five years now and I've
> need had need to look at a .backup file.  Other than writing code to
> handle it in archiving, I've barely given it a thought.
>
>> Another idea that I have here as well would be to change a bit the
>> history backup file name so as the backup name is added to it, but that
>> does not fly high as pg_basebackup uses its own name with spaces
>> (Yeah!), or even better the PID of the backend taking the backup.
>
> This could work, but seems complicated for little benefit.  It would
> also require some archive commands to be updated to understand the new
> format.
>
>> The renaming is more appealing, still not perfect.  So my mood of the
>> day would be to just drop the patch entirely.
>
> I think that's the best idea for now.  It doesn't seem worth using
> cycles in the last CF coming up with a new approach.

I have no objection to mark the patch "returned with feedback".

Regards,

-- 
Fujii Masao


pgsql-hackers by date:

Previous
From: Douglas Doole
Date:
Subject: Re: INOUT parameters in procedures
Next
From: Alvaro Herrera
Date:
Subject: Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?