Re: fdatasync performance problem with large number of DB files - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: fdatasync performance problem with large number of DB files
Date
Msg-id CA+hUKG+z+ET2-o8cX5XBqPTNzNUXzMAZKP2YU8cgz53cdC65OA@mail.gmail.com
Whole thread Raw
In response to Re: fdatasync performance problem with large number of DB files  (David Steele <david@pgmasters.net>)
Responses Re: fdatasync performance problem with large number of DB files  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Thanks Justin and David.  Replies to two emails inline:

On Sat, Mar 20, 2021 at 12:01 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, Mar 19, 2021 at 06:28:46PM +1300, Thomas Munro wrote:
> > +++ b/doc/src/sgml/config.sgml
>
> > +        <productname>PostgreSQL</productname> will recursively open and fsync
> > +        all files in the data directory before crash recovery begins.  This
>
> Maybe it should say "data, tablespace, and WAL directories", or just "critical
> directories".

Fair point.  Here's what I went with:

        When set to <literal>fsync</literal>, which is the default,
        <productname>PostgreSQL</productname> will recursively open and
        synchronize all files in the data directory before crash
recovery
        begins.  The search for files will follow symbolic links for the WAL
        directory and each configured tablespace (but not any other symbolic
        links).

> > +     {
> > +             {"recovery_init_sync_method", PGC_POSTMASTER, ERROR_HANDLING_OPTIONS,
> > +                     gettext_noop("Sets the method for synchronizing the data directory before crash recovery."),
> > +             },
>
> "and tablespaces and WAL"

I feel like that's getting too detailed for the GUC description?

On Sat, Mar 20, 2021 at 2:55 AM David Steele <david@pgmasters.net> wrote:
> I had a look at the patch and it looks good to me.

Thanks!

> Should we mention in the docs that the contents of non-standard symlinks
> will not be synced, i.e. anything other than tablespaces and pg_wal? Or
> can we point them to some docs saying not to do that (if such exists)?

Good idea.  See above for the adjustment I went with to describe the
traditional behaviour, and then I also made a similar change to the
syncfs part, which, I hope, manages to convey that the new mode
matches the existing policy on symlinks:

        On Linux, <literal>syncfs</literal> may be used instead, to ask the
        operating system to synchronize the whole file systems that contain the
        data directory, the WAL file and each tablespace (but not any other
        file systems that may be reachable through symbolic links).

I thought about adding some text along the lines that such symlinks
are not expected, but I think you're right that what we really need is
a good place to point to.  I mean, generally you can't mess around
with the files managed by PostgreSQL and expect everything to keep
working correctly, but it wouldn't hurt to make an explicit statement
about symlinks and where they're allowed (or maybe there is one
already and I failed to find it).  There are hints though, like
pg_basebackup's documentation which tells you it won't follow or
preserve them in general, but... hmm, it also contemplates various
special subdirectories (pg_dynshmem, pg_notify, pg_replslot, ...) that
might be symlinks without saying why.

> > Ok, I made the changes you suggested.  Let's see if anyone else would
> > like to vote for or against the concept of the 0002 patch
> > (recovery_init_sync_method=none).
>
> It worries me that this needs to be explicitly "turned off" after the
> initial recovery. Seems like something of a foot gun.
>
> Since we have not offered this functionality before I'm not sure we
> should rush to introduce it now. For backup solutions that do their own
> syncing, syncfs() should provide excellent performance so long as the
> file system is not shared, which is something the user can control (and
> is noted in the docs).

Thanks.  I'm leaving the 0002 patch "on ice" until someone can explain
how you're supposed to use it without putting a hole in your foot.

I pushed the 0001 patch.  Thanks to all who reviewed.  Of course,
further documentation improvement patches are always welcome.

(One silly thing I noticed is that our comments generally think
"filesystem" is one word, but our documentation always has a space;
this patch followed the local convention in both cases!)



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: Andres Freund
Date:
Subject: shared memory stats: high level design decisions: consistency, dropping