Re: Add notes to pg_combinebackup docs - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add notes to pg_combinebackup docs
Msg-id a5e5e36f-f09d-4165-ab4f-edc7701ea5d7@enterprisedb.com
In response to Re: Add notes to pg_combinebackup docs  (David Steele <david@pgmasters.net>)
Responses Re: Add notes to pg_combinebackup docs
List pgsql-hackers
On 4/11/24 02:01, David Steele wrote:
> On 4/9/24 19:44, Tomas Vondra wrote:
>>
>> On 4/9/24 09:59, Martín Marqués wrote:
>>> Hello,
>>>
>>> While doing some work/research on the new incremental backup feature
>>> some limitations were not listed in the docs. Mainly the fact that
>>> pg_combinebackup works with plain format and not tar.
>>>
>>
>> Right. The docs mostly imply this by talking about output directory and
>> backup directories, but making it more explicit would not hurt.
>>
>> FWIW it'd be great if we could make incremental backups work with tar
>> format in the future too. People probably don't want to keep the
>> expanded data directory around, and having to extract everything before
>> combining the backups is not very convenient. Reading and writing the
>> tar directly would make this simpler.
> 
> I have a hard time seeing this feature as being very useful, especially
> for large databases, until pg_combinebackup works on tar (and compressed
> tar). Right now restoring an incremental requires at least twice the
> space of the original cluster, which is going to take a lot of users by
> surprise.
> 

I do agree it'd be nice if pg_combinebackup worked with .tar directly,
without having to extract the directories first. No argument there, but
as I said in the other thread, I believe that's something we can add
later. That's simply how incremental development works.

I can certainly imagine other ways to do pg_combinebackup, e.g. by
"merging" the increments into the data directory, instead of creating a
copy. But again, I don't think that has to be in v1.

> I know you have made some improvements here for COW filesystems, but my
> experience is that Postgres is generally not run on such filesystems,
> though that is changing a bit.
> 

I'd say XFS is a pretty common choice, for example. And it's one of the
filesystems that work great with pg_combinebackup.

However, who says this has to be the filesystem the Postgres instance
runs on? Who in their right mind puts backups on the same volume as the
instance anyway? So the backups can live on a different filesystem, even
if that filesystem is not ideal for running the database.

FWIW I think it's fine to tell users that to minimize the disk space
requirements, they should use a CoW filesystem and --copy-file-range.
It's true the docs don't currently say that.
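For illustration, combining a chain of backups on a CoW filesystem might
look something like this (paths are made up; --copy-file-range is the
option mentioned above):

    # full backup plus two incrementals, combined into a new directory;
    # --copy-file-range lets a CoW filesystem share blocks instead of
    # copying them, so the combined copy needs very little extra space
    pg_combinebackup --copy-file-range -o /backups/restored \
        /backups/full /backups/incr1 /backups/incr2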

All of this also depends on how people do the restore. With the CoW
stuff they can do a quick (and small) copy on the backup server, and
then copy the result to the actual instance. Or they can do the restore
on the target directly (e.g. by mounting a r/o volume with backups), in
which case the CoW won't really help.
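A rough sketch of the first workflow (hostnames and paths are made up):

    # on the backup server: cheap CoW combine into a staging directory
    pg_combinebackup --copy-file-range -o /backups/staging \
        /backups/full /backups/incr1

    # shipping the result to the database host pays the full I/O cost
    # regardless of CoW
    rsync -a /backups/staging/ dbhost:/var/lib/postgresql/data/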

But yeah, having to keep the backups as expanded directories is not
great; I'd love to have .tar. Not necessarily because of the disk space
(in my experience filesystem compression works quite well for this
purpose), but mostly because it's more compact and allows working with a
backup as a single piece of data (e.g. it's much clearer what the
checksum of a single .tar is, compared to a directory).
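To illustrate that last point (just an example, not a proposal):

    # one archive, one well-defined checksum
    sha256sum base.tar

    # for a directory you first have to pick a convention - which files,
    # in what order - before the checksum even means anything
    find /backups/full -type f -print0 | sort -z \
        | xargs -0 sha256sum | sha256sum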

>>> Around the same time, Tomas Vondra tested incremental backups with a
>>> cluster where he enabled checksums after taking the previous full
>>> backup. After combining the backups the synthetic backup had pages
>>> with checksums and other pages without checksums which ended in
>>> checksum errors.
>>
>> I'm not sure just documenting this limitation is sufficient. We can't
>> make the incremental backups work in this case (it's as if someone
>> messes with cluster without writing stuff into WAL), but I think we
>> should do better than silently producing (seemingly) corrupted backups.
>>
>> I say seemingly, because the backup is actually fine, the only problem
>> is it has checksums enabled in the controlfile, but the pages from the
>> full backup (and the early incremental backups) have no checksums.
>>
>> What we could do is detect this in pg_combinebackup, and either
>> disable checksums (with a warning and a hint to re-enable them later),
>> or just print that the user needs to disable them.
>>
>> I was thinking maybe we could detect this while taking the backups, and
>> force taking a full backup if checksums got enabled since the last
>> backup. But we can't do that because we only have the manifest from the
>> last backup, and the manifest does not include info about checksums.
> 
> I'd say making a new full backup is the right thing to do in this case.
> It should be easy enough to store the checksum state of the cluster in
> the manifest.
> 

Agreed.
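
Just to sketch what that could look like (the key name is made up, this
is not an existing manifest field): backup_manifest is JSON, so it could
be a single extra top-level entry that pg_basebackup compares against the
current cluster state before allowing an incremental backup.

    # hypothetical manifest entry recording the checksum state at backup
    # time, e.g.:
    #
    #   "Data-Checksums": "on"
    #
    # pg_basebackup could compare it against the live cluster, e.g. the
    # value reported by:
    pg_controldata "$PGDATA" | grep 'Data page checksum version'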


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


