Re: fdatasync performance problem with large number of DB files - Mailing list pgsql-hackers

From David Steele
Subject Re: fdatasync performance problem with large number of DB files
Msg-id 453b323b-9532-2885-32f9-8976f2605c60@pgmasters.net
In response to Re: fdatasync performance problem with large number of DB files  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 3/19/21 7:16 PM, Thomas Munro wrote:
> Thanks Justin and David.  Replies to two emails inline:
> 
> Fair point.  Here's what I went with:
> 
>          When set to <literal>fsync</literal>, which is the default,
>          <productname>PostgreSQL</productname> will recursively open and
>          synchronize all files in the data directory before crash recovery
>          begins.  The search for files will follow symbolic links for the WAL
>          directory and each configured tablespace (but not any other symbolic
>          links).
> 

+1
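
To make the proposed wording concrete, here is a rough sketch of the traversal it describes, written in Python rather than C for brevity. It is not the actual PostgreSQL code: the function names are made up for illustration, it assumes a POSIX system where a directory can be opened read-only and fsync'd, and it assumes the usual pg_wal and pg_tblspc names inside the data directory.

```python
import os


def fsync_path(path):
    # Open read-only and fsync; on POSIX systems this also works for directories.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def walk_and_sync(directory, synced):
    # Hypothetical helper: fsync every regular file and directory below
    # `directory`, never following symbolic links found along the way.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue  # "but not any other symbolic links"
            if entry.is_dir(follow_symlinks=False):
                walk_and_sync(entry.path, synced)
            elif entry.is_file(follow_symlinks=False):
                fsync_path(entry.path)
                synced.append(entry.path)
    fsync_path(directory)
    synced.append(directory)


def sync_data_directory(data_dir):
    # Sync the data directory, then follow only the WAL directory symlink
    # and the per-tablespace symlinks under pg_tblspc.
    synced = []
    walk_and_sync(data_dir, synced)

    wal = os.path.join(data_dir, "pg_wal")
    if os.path.islink(wal):
        walk_and_sync(os.path.realpath(wal), synced)

    tblspc = os.path.join(data_dir, "pg_tblspc")
    if os.path.isdir(tblspc):
        with os.scandir(tblspc) as entries:
            for entry in entries:
                if entry.is_symlink() and os.path.isdir(entry.path):
                    walk_and_sync(os.path.realpath(entry.path), synced)
    return synced
```

With this shape, a symlinked pg_stat_tmp or log directory would simply be skipped, which is the behavior the doc text spells out.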

> I thought about adding some text along the lines that such symlinks
> are not expected, but I think you're right that what we really need is
> a good place to point to.  I mean, generally you can't mess around
> with the files managed by PostgreSQL and expect everything to keep
> working correctly

WRT symlinks, I'm not sure that's fair to say. From PG's perspective 
it's just a dir/file after all. Other than pg_wal, I have seen 
pg_stat/pg_stat_tmp sometimes symlinked, plus config files and the log dir.

pgBackRest takes a pretty liberal approach here. We preserve all 
dir/file symlinks no matter where they appear and allow all of them to 
be remapped on restore.

> but it wouldn't hurt to make an explicit statement
> about symlinks and where they're allowed (or maybe there is one
> already and I failed to find it).  

I couldn't find it either and I would be in favor of it. For instance, 
pgBackRest forbids tablespaces inside PGDATA and when people complain 
(more often than you might imagine) we can just point to the code/docs.

> There are hints though, like
> pg_basebackup's documentation which tells you it won't follow or
> preserve them in general, but... hmm, it also contemplates various
> special subdirectories (pg_dynshmem, pg_notify, pg_replslot, ...) that
> might be symlinks without saying why.

Right, pg_dynshmem is another one that I've seen symlinked. Some things 
are nice to have on fast storage. pg_notify and pg_replslot are similar 
since they get written to a lot in certain configurations.

>> It worries me that this needs to be explicitly "turned off" after the
>> initial recovery. Seems like something of a foot gun.
>>
>> Since we have not offered this functionality before I'm not sure we
>> should rush to introduce it now. For backup solutions that do their own
>> syncing, syncfs() should provide excellent performance so long as the
>> file system is not shared, which is something the user can control (and
>> is noted in the docs).
> 
> Thanks.  I'm leaving the 0002 patch "on ice" until someone can explain
> how you're supposed to use it without putting a hole in your foot.

+1
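
For anyone following along, a minimal sketch of what "doing your own syncing" with syncfs() might look like for a tool outside the server. It is illustrative only: it assumes Linux with a libc that exports syncfs(), calls it through ctypes, and the helper names are hypothetical. It issues one syncfs() per distinct file system (by st_dev) among the given directories, which is also why a shared file system hurts: everything dirty on it gets flushed, not just the database files.

```python
import ctypes
import os


def syncfs_path(path):
    # Flush the whole file system that contains `path` using syncfs(2).
    # Assumption: Linux, with syncfs available in the C library.
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)


def syncfs_each(paths):
    # Call syncfs once per distinct file system among `paths`
    # (e.g. data directory, WAL directory, tablespace locations).
    # Returns the number of file systems that were flushed.
    seen = set()
    for path in paths:
        dev = os.stat(path).st_dev
        if dev not in seen:
            syncfs_path(path)
            seen.add(dev)
    return len(seen)
```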

> (One silly thing I noticed is that our comments generally think
> "filesystem" is one word, but our documentation always has a space;
> this patch followed the local convention in both cases!)

Personally I prefer "file system".

Regards,
-- 
-David
david@pgmasters.net


