Re: Duplicate history file? - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Duplicate history file?
Date
Msg-id 20210615024844.lnemu7jytlwh3jcj@nol
In response to Re: Duplicate history file?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Tue, Jun 15, 2021 at 10:20:37AM +0900, Kyotaro Horiguchi wrote:
> 
> Actually there's large room for losing data with cp. Yes, we would
> need additional redundancy of storage and periodic integrity
> inspection of the storage and archives, and maybe need copies at
> other sites on the other side of the Earth. But those are too much for
> some kinds of users.  They have the right and responsibility to decide
> how durable/reliable their archive needs to be.  (Putting aside some
> hardware/geological requirements :p)

Note that most of those considerations are orthogonal to what a proper
archive_command should be responsible for.

Yes, users are responsible for deciding whether they want a valid and durable
backup or not, but we should assume a sensible default behavior, which is a
valid and durable archive_command.  We don't document a default of fsync = off
followed later by a recommendation explaining why you shouldn't do that, and I
think it should be the same for the archive_command.  The problem with the
current documentation is that many users will just blindly copy/paste whatever
is in it without reading any further.

As an example, a few hours ago a French user on the French bulletin board
said that he fixed his "postmaster.pid already exists" error with
pg_resetxlog -f, referring to some article explaining how to start postgres in
case of "PANIC: could not locate a valid checkpoint record".  Arguably
that article didn't bother to document the implications of executing
pg_resetxlog, but given that the user's original problem had literally nothing
to do with what was documented, I really doubt that it would have changed
anything.

> If we mandate some
> characteristics on the archive_command, we should take them into core.

I agree.

> I remember I saw some discussions on archive command along this line but
> I think it had ended at the point of something like "we cannot
> design a one-fits-all interface conforming to the requirements" or
> something (sorry, I don't remember it in detail..)

I also agree, but this problem is solved by making archive_command
customisable.  Providing something that can reliably work in some general and
limited cases would be a huge improvement.

> Well. rman used rsync/ssh in its documentation in the past and now
> looks like providing barman-wal-archive so it seems that you're right
> on that point.  So, do we recommend them in our documentation? (I'm
> not sure they actually conform to the requirement, though..)

We could maybe bless some third-party backup solutions, but this will probably
lead to a lot more discussion, so it's better to discuss that in a different
thread.  Note that I don't have deep knowledge of any of those tools so I
don't have an opinion.

> If we write an example with a pseudo tool name, requiring some
> characteristics on the tool, then not telling about the minimal tools,
> I think that it is equivalent to that we are inhibiting certain users
> from using archive_command even if they really don't want such level
> of durability.

I have already seen customers complaining about losing backups because their
archive_command didn't ensure that the copy was durable.  Some users may not
care about losing their backups in such a case, but I really think that the
majority of users expect a backup to be valid and durable without
even considering that it may not be the case.  It should be the default behavior,
not optional.
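
To make "durable" a bit more concrete, here is a minimal sketch of what a
durable copy step could look like, as opposed to a plain cp.  It is only an
illustration under assumptions: the function name archive_wal and the ".tmp"
suffix are made up for this example, it assumes a POSIX system where a
directory can be opened and fsync'ed, and it is not what PostgreSQL or any
existing backup tool actually does.

```python
import filecmp
import os
import shutil


def archive_wal(src: str, dest_dir: str) -> int:
    """Copy one file into dest_dir durably; return a shell-style exit code.

    Hypothetical helper: 0 means "archived" (or already archived with the
    same content), 1 means "refused to overwrite a different file".
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    tmp = dest + ".tmp"  # assumed naming convention for the in-progress copy

    if os.path.exists(dest):
        # Identical content already archived (e.g. a retried attempt or a
        # duplicate history file): report success.  Different content:
        # fail rather than silently overwrite the archive.
        return 0 if filecmp.cmp(src, dest, shallow=False) else 1

    # Write to a temporary name and flush it to stable storage first, so a
    # crash never leaves a partial file under the final name.
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())

    # Atomic on POSIX when tmp and dest are on the same filesystem.
    os.rename(tmp, dest)

    # fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return 0
```

A small wrapper script could call sys.exit(archive_wal(sys.argv[1],
sys.argv[2])) and be wired in as archive_command with %p and the archive
directory as arguments.  Note that this sketch does not protect against two
processes archiving the same file concurrently, and says nothing about the
durability of the destination storage itself.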


