Re: Two fsync related performance issues? - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Two fsync related performance issues?
Msg-id CAMsr+YHtLs-OTqOi2Vn3HRUsa=mB1ObWJtfPuVVq+Z5Z-Ewz3g@mail.gmail.com
In response to Re: Two fsync related performance issues?  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Two fsync related performance issues?
List pgsql-hackers


On Wed, 14 Oct 2020, 13:06 Michael Paquier, <michael@paquier.xyz> wrote:
On Wed, Oct 14, 2020 at 02:48:18PM +1300, Thomas Munro wrote:
> On Wed, Oct 14, 2020 at 12:53 AM Michael Banck
> <michael.banck@credativ.de> wrote:
>> One question about this: Did you consider the case of a basebackup being
>> copied/restored somewhere and the restore/PITR being started? Shouldn't
>> Postgres then sync the whole data directory first in order to assure
>> durability, or do we consider this to be on the tool that does the
>> copying? Or is this not needed somehow?
>
> To go with precise fsyncs, we'd have to say that it's the job of the
> creator of the secondary copy.  Unfortunately that's not a terribly
> convenient thing to do (or at least the details vary).

Yeah, it is safer to assume that it is the responsibility of the
backup tool to ensure that, because a host could be unplugged just
after taking a backup, and having Postgres do this work at the
beginning of archive recovery would not help in most cases.

Let's document that assumption in the pg_basebackup docs and in the docs on creating a replica from a file system copy, with a reference to initdb's datadir sync option.
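For anyone wondering what "the copying tool syncs the data directory" would amount to in practice, here is a rough sketch in Python. It is only an illustration, not what pg_basebackup or initdb actually do internally. The function name fsync_tree is made up for this example, and it assumes a POSIX system where a directory can be opened read-only and fsync'd (as on Linux). It does not follow symlinks, so tablespaces linked from pg_tblspc would need a separate pass. Where initdb is available, I believe `initdb --sync-only -D <datadir>` is the datadir sync option meant above and is the simpler choice.

```python
import os


def fsync_tree(root):
    """Flush every regular file and directory under root to stable storage.

    Returns the number of files and directories that were fsync'd.
    Symlinks are skipped, not followed (hypothetical helper, illustration only).
    """
    synced = 0
    # Walk bottom-up so a directory is synced after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced += 1
        # Sync the directory itself so that its entries are durable too.
        fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        synced += 1
    return synced
```

A copying tool would call something like fsync_tree("/path/to/restored/datadir") once the copy finishes and before reporting success, so that the restore does not depend on Postgres syncing everything at startup.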

IMO this comes back to the point where we usually should not care much
how long a backup takes as long as it is done right.  Users care much
more about how long a restore takes until consistency is reached.  And
this is in line with things that have been done via bc34223b or
96a7128.
--
Michael
