Re: Add a different archive_command example for Linux / Unix - Mailing list pgsql-docs

From gparc@free.fr
Subject Re: Add a different archive_command example for Linux / Unix
Msg-id 705658729.361810855.1707401435362.JavaMail.zimbra@free.fr
In response to Re: Add a different archive_command example for Linux / Unix  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Add a different archive_command example for Linux / Unix
List pgsql-docs

Thanks, Stephen, for your detailed reply and broad perspective.
But I see the cp example command used **as is** most of the time.

Regards
Gilles

----- Original Message -----
> From: "Stephen Frost" <sfrost@snowman.net>
> To: "gparc" <gparc@online.fr>, "pgsql-docs" <pgsql-docs@lists.postgresql.org>
> Sent: Thursday, February 8, 2024 13:00:13
> Subject: Re: Add a different archive_command example for Linux / Unix

> Greetings,
>
> * PG Doc comments form (noreply@postgresql.org) wrote:
>> Hello,
>> in
>> https://www.postgresql.org/docs/16/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>> the example given could be improved for Linux / Unix environment.
>>
>> As the cp command is buffered, it can return success even though the
>> data never reached the disk, which puts backups at risk.
>
> Yup.
>
>> I propose to use dd command with its fsync option.
>>
>> So the equivalent example would be:
>>
>> archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M
>> conv=fsync,excl status=none' # Unix
>>
>> What do you think ?
>
> This doesn't fsync the directory though, for one thing, and there are
> other considerations beyond that when having archive_command run and
> more generally when doing backups with PG.  In short, the example in the
> documentation is not meant to ever be used as-is, but is intended to
> show how the %p/%f replacement is done when the command is called, so
> that backup tool authors know how it works.
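>
> Just to illustrate that first point (and this is a sketch only, not
> something to actually use), even the dd version would need a second
> step to flush the directory entry, e.g. with GNU coreutils' sync,
> which accepts file arguments:
>
> archive_command = 'dd if=%p of=/mnt/server/archivedir/%f bs=1M conv=fsync,excl status=none && sync /mnt/server/archivedir'
>
> ...and even that still leaves all of the other problems below.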
>
> In reality though, to write backup software for PG, you really do need
> to know PG in much more detail than the documentation provides, which
> means reading the source- for example, backup software should be
> checking the pg_control file's CRC as it's possible to read it just as
> it's being written and end up with an invalid pg_control file in the
> backup, making the backup invalid.  There's been some discussion about
> how to improve this situation but nothing exists today from PG, so
> backup authors have to handle it.  This is just one example; there are
> lots of others- unlogged table handling, temporary file handling, etc,
> etc.
>
> I'd strongly recommend using one of the existing well maintained backup
> tools which have been written specifically for PG for your backups.
> Writing a new backup tool for PG is a good bit of work and isn't
> really reasonable to do with shell scripts or simple unix commands.
>
> I do feel that we could improve the documentation around this by
> dropping comments like "using any convenient file-system-backup tool
> such as tar or cpio" as those don't, for example, support any way to
> reasonably deal with unlogged tables by themselves.  Technically you
> could scan the data directory and provide an exclude file, or not
> include unlogged table files in the list of files to include, but then
> you're starting to get into things like how to tell if a file is
> associated with an unlogged table or not, and while that's covered deep in the
> documentation, we don't make any mention or reference to unlogged tables
> in the backup documentation.  Perhaps an addition to the low-level
> documentation under 'Backing Up The Data Directory' along these lines
> would be helpful:
>
> #########
> You should omit any unlogged relation files (other than the 'init' fork)
> as they will be reset to be empty upon recovery and backing them up will
> simply increase your backup size (potentially significantly) and slow
> down the restore process.  Unlogged tables have an init fork (link to
> storage-init.html), which is a file with the same filenode number as the
> relation but with an '_init' suffix added (link to
> storage-file-layout.html).  When an '_init' fork exists for a given
> relation, the '_init' file should be included in the backup, but all
> other files for that relation (the 'main' fork, which does not have a
> suffix, and all other forks which exist other than the 'init' fork)
> should be excluded from the backup.
> #########
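>
> As a rough sketch of what a backup tool has to do here (bash, ignoring
> tablespaces under pg_tblspc, and assuming the current directory is the
> data directory):
>
> shopt -s nullglob
> for init in base/*/*_init; do
>     rel=${init%_init}                # e.g. base/16384/24576
>     # keep the '_init' file itself, but exclude the main fork, its
>     # extra segments ('.1', '.2', ...) and the fsm/vm forks
>     for f in "$rel" "$rel".[0-9]* "$rel"_fsm* "$rel"_vm*; do
>         printf '%s\n' "$f" >> exclude.list
>     done
> done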
>
> There's also no way for tar or cpio to directly validate that the copy
> of pg_control that they copied is valid.  Worse, on a restore, they'll
> restore pg_control at some arbitrary point, and then if the restore doesn't
> complete for whatever reason, you might end up with a cluster that can
> be started, run for a while, but be missing whole tables.  While it was
> only demonstrated relatively recently that the pg_control file can, in
> fact, be invalid when read during a backup, it's a real issue that's
> been around for, probably, forever, and there isn't really a good way to
> address it today.  Still, perhaps we should include in the
> documentation, again under 'Backing Up The Data Directory', something
> like:
>
> #########
> You must be sure to check that the copy of pg_control included in the
> backup is valid.  The pg_controldata tool can be used for this purpose-
> after making the copy of the 'data_dir/global/pg_control' file which
> will be included in the backup, restore it into a new directory
> 'test_data_dir/global' and then run 'pg_controldata' on 'test_data_dir'
> and ensure that no error is returned.  If an error is returned, attempt
> to re-copy the 'data_dir/global/pg_control' file and test again (during
> the backup), or fail the backup entirely.  A backup without a valid
> pg_control file is not able to be restored.
>
> Further, on restore, it is strongly recommended to restore the
> pg_control file last, and only after successfully restoring the rest
> of the database, to prevent the database from being started either while
> the restore is happening or in the event that not all of the files which
> were part of the backup were able to be restored.
> #########
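>
> A sketch of that check (the warning text can vary between versions and
> locales, hence forcing the C locale and checking both the exit status
> and the output):
>
> mkdir -p test_data_dir/global
> cp "$PGDATA"/global/pg_control test_data_dir/global/
> if ! out=$(LC_ALL=C pg_controldata test_data_dir 2>&1) ||
>    printf '%s\n' "$out" | grep -q WARNING; then
>     echo "copied pg_control appears invalid, re-copy or fail" >&2
>     exit 1
> fi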
>
> Of course, there's more, such as the lack of any discussion about making
> sure to have a separate manifest of all of the files that were copied as
> part of the backup to allow you to make sure that, for example, the tar
> file that was created for the backup didn't just get truncated somewhere
> along the way.  The above alludes to this idea, at least.
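>
> Even something as crude as the following (a sketch, and not the format
> that pg_basebackup's manifest actually uses) would catch a truncated
> archive:
>
> # while taking the backup, from inside the data directory:
> find . -type f -print0 | xargs -0 sha256sum > /path/to/backup.manifest
>
> # after restoring, from inside the restored directory:
> sha256sum -c /path/to/backup.manifest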
>
> I will note that while we don't talk about all of these things in the
> documentation, pg_basebackup and pg_receivewal do handle things like
> doing an fsync() after the backup has completed or when receiving WAL
> (including of directories), skipping unlogged relations and temporary
> files, and producing a manifest with a per-file checksum which the
> pg_verifybackup tool can check, etc.  While pg_basebackup
> doesn't yet handle the issue of an invalid read of pg_control, there's
> ongoing discussion about how to address that and patches have been
> proposed to deal with it.  Hopefully something will be done there soon,
> as pg_basebackup seems to be at particular risk in certain cases,
> particularly when run against a replica, where the pg_control file is
> rewritten much more frequently.  This isn't to say it's very likely-
> it's not, just that it is a thing that could happen and could lead to an
> invalid backup.
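>
> For example, a plain-format backup taken with a SHA256 manifest:
>
> pg_basebackup -D /path/to/backupdir --manifest-checksums=SHA256
>
> can later be checked with:
>
> pg_verifybackup /path/to/backupdir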
>
> Really, rewriting the entire "how to perform a low-level backup" and
> re-focusing that section of the documentation to be for backup tool
> authors rather than having it seem like it's a unix administrator or
> general DBA's documentation for performing backups would go a
> long way towards improving the situation, for my ... slightly more than
> 2 cents (or your favorite currency), anyway.
>
> Thanks!
>
> Stephen


