Re: Add notes to pg_combinebackup docs - Mailing list pgsql-hackers

From David Steele
Subject Re: Add notes to pg_combinebackup docs
Msg-id f0fd7c33-645f-467e-a5f9-0a875c4c035c@pgmasters.net
In response to Re: Add notes to pg_combinebackup docs  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers

On 4/12/24 22:40, Tomas Vondra wrote:
> On 4/12/24 11:50, David Steele wrote:
>> On 4/12/24 19:09, Magnus Hagander wrote:
>>> On Fri, Apr 12, 2024 at 12:14 AM David Steele <david@pgmasters.net> wrote:
>>>
>>>      > But yeah, having to keep the backups as expanded directories is not
>>>      > great, I'd love to have .tar. Not necessarily because of the disk space
>>>      > (in my experience the compression in filesystems works quite well for
>>>      > this purpose), but mostly because it's more compact and allows working
>>>      > with backups as a single piece of data (e.g. it's much clearer what the
>>>      > checksum of a single .tar is, compared to a directory).
>>>
>>>      But again, object stores are commonly used for backup these days and
>>>      billing is based on data stored rather than any compression that can be
>>>      done on the data. Of course, you'd want to store the compressed tars in
>>>      the object store, but that does mean storing an expanded copy somewhere
>>>      to do pg_combinebackup.
>>>
>>> Object stores are definitely getting more common. I wish they were
>>> getting a lot more common than they actually are, because they
>>> simplify a lot.  But they're in my experience still very far from
>>> being a majority.
>>
>> I see it the other way, especially the last few years. The majority seem
>> to be object stores followed up closely by NFS. Directly mounted storage
>> on the backup host appears to be rarer.
>>
> 
> One thing I'd mention is that not having built-in support for .tar and
> .tgz backups does not mean it's impossible to use pg_combinebackup with
> archives. You can mount them using e.g. "ratarmount" and then use that
> as source directories for pg_combinebackup.
> 
> It's not entirely frictionless because AFAICS it's necessary to take the
> backup in plain format and then create the .tar to get the expected "flat"
> directory structure (and not manifest + 2x tar). But other than that it
> seems to work fine (based on my limited testing).
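To make the plain-format-then-tar step concrete, here is a minimal Python sketch of just the packing part. It assumes the backup was already taken with pg_basebackup in plain format into a local directory; the function name and paths are hypothetical. Every top-level entry is stored under its own name, so the archive root holds backup_manifest and the data directory contents directly (the "flat" layout) rather than a manifest plus two tars. Mounting the result (e.g. with ratarmount) and pointing pg_combinebackup at the mount point is not shown, and the expanded directory still has to exist on the backup side while packing.

```python
import os
import tarfile


def pack_plain_backup(backup_dir: str, tar_path: str) -> list[str]:
    """Pack a plain-format backup directory into one uncompressed tar.

    Hypothetical helper for illustration. Each top-level entry is added
    under its own name (arcname), so the archive root mirrors backup_dir:
    backup_manifest, PG_VERSION, base/, global/, ... with no extra
    leading directory component.
    """
    with tarfile.open(tar_path, "w") as tar:
        # tarfile.add() recurses into directories (in sorted order since
        # Python 3.7); sorting the top level keeps the layout deterministic.
        for entry in sorted(os.listdir(backup_dir)):
            tar.add(os.path.join(backup_dir, entry), arcname=entry)
    # Report what ended up in the archive, in archive order.
    with tarfile.open(tar_path, "r:") as tar:
        return tar.getnames()
```

Usage (hypothetical paths): `pack_plain_backup("/backups/full", "/backups/full.tar")`.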

Well, that's certainly convoluted and doesn't really help a lot in terms 
of space consumption, it just shifts the additional space required to 
the backup side. I doubt this is something we'd be willing to add to our 
documentation so it would be up to the user to figure out and script.

> FWIW the "archivemount" performs terribly, so adding this capability
> into pg_combinebackup is clearly far from trivial.

I imagine this would perform pretty badly. And yes, doing it efficiently 
is not trivial but certainly doable. Scanning the tar file and matching 
to entries in the manifest is one way, but I would prefer to store the 
offsets into the tar file in the manifest then assemble an ordered list 
of work to do on each tar file. But of course the latter requires a 
manifest-centric approach, which is not what we have right now.
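The two approaches above can be sketched in Python. This is an illustration under stated assumptions, not how pg_combinebackup works: it assumes uncompressed tars containing only plain regular files (no sparse members), a parsed backup_manifest whose "Files" entries carry "Path" and "Size" (entries using "Encoded-Path" are skipped), and it relies on TarInfo.offset_data, an attribute CPython's tarfile sets on each member but does not document. scan_tar is the single sequential scan; plan_reads models the manifest-centric alternative, where offsets are known up front (here they come from the scan, standing in for a hypothetical per-file offset field in the manifest) and the work for one tar is sorted by offset.

```python
import json
import tarfile
from typing import NamedTuple


class TarEntry(NamedTuple):
    path: str
    offset: int  # byte offset of the member's data within the tar file
    size: int    # member size in bytes


def load_manifest(manifest_path: str) -> dict:
    """Parse a backup_manifest file (it is JSON)."""
    with open(manifest_path, encoding="utf-8") as f:
        return json.load(f)


def scan_tar(tar_path: str) -> dict[str, TarEntry]:
    """Approach 1: one sequential pass over an uncompressed tar, recording
    where each regular file's data starts. Uses TarInfo.offset_data, which
    CPython sets on every member but does not document."""
    index: dict[str, TarEntry] = {}
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = TarEntry(member.name, member.offset_data, member.size)
    return index


def match_manifest(manifest: dict, index: dict[str, TarEntry]) -> list[str]:
    """Return manifest paths that are missing from the tar or whose size
    differs; an empty list means every listed file was found."""
    problems = []
    for f in manifest["Files"]:
        path = f.get("Path")
        if path is None:  # "Encoded-Path" entries are not handled here
            continue
        entry = index.get(path)
        if entry is None or entry.size != f["Size"]:
            problems.append(path)
    return problems


def plan_reads(index: dict[str, TarEntry], wanted: list[str]) -> list[TarEntry]:
    """Approach 2: with offsets known up front (in the proposal, stored in
    the manifest), order the work for one tar by offset so the file is
    read in a single forward sweep."""
    return sorted((index[p] for p in wanted), key=lambda e: e.offset)


def read_planned(tar_path: str, plan: list[TarEntry]) -> dict[str, bytes]:
    """Execute a plan with plain seeks and reads; no tar parsing needed."""
    out: dict[str, bytes] = {}
    with open(tar_path, "rb") as f:
        for entry in plan:
            f.seek(entry.offset)
            out[entry.path] = f.read(entry.size)
    return out
```

Sorting by offset means each tar is consumed front to back instead of seeking back and forth, regardless of the order in which the combine logic asks for files.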

Regards,
-David


