Re: Two fsync related performance issues? - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Two fsync related performance issues?
Date
Msg-id CAMsr+YH0qzia9yDGsDgsCM8du_HZtiP8utpZnhCHBJ_RrA+ZSw@mail.gmail.com
In response to Two fsync related performance issues?  (Paul Guo <pguo@pivotal.io>)
Responses Re: Two fsync related performance issues?
List pgsql-hackers


On Tue, 12 May 2020, 08:42 Paul Guo, <pguo@pivotal.io> wrote:
Hello hackers,

1. StartupXLOG() fsyncs the whole data directory early in crash recovery. I'm wondering if we could skip some directories (at least pg_log/ and the table directories), since WAL etc. could ensure consistency. Here is the related code:

      if (ControlFile->state != DB_SHUTDOWNED &&
          ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
      {
          RemoveTempXlogFiles();
          SyncDataDirectory();
      }

This would actually be a good candidate for a thread pool. Dispatch sync requests and don't wait. Come back later when they're done. 

I'm unsure whether that's at all feasible, though, given that pretty much all the Pg APIs aren't thread-safe: no palloc, no elog/ereport, etc. However, I don't think we're ready to run bgworkers or use shm_mq etc. at that stage either.

Of course if OSes would provide asynchronous IO interfaces that weren't utterly vile and broken, we wouldn't have to worry...
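
For illustration, the dispatch-and-come-back-later idea might look roughly like the sketch below. The worker uses only POSIX calls (pthreads, open, fsync), so it never touches palloc or elog. SyncJobs, SyncPool, sync_pool_start() and sync_pool_finish() are made-up names, not PostgreSQL APIs, and a real version would need to cope with directories, proper error reporting, and platforms where fsync on a read-only descriptor fails.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SYNC_WORKERS 8      /* arbitrary cap for this sketch */

/* Hypothetical shared job list; not a PostgreSQL structure. */
typedef struct SyncJobs
{
    const char    **paths;      /* files to flush */
    size_t          npaths;
    size_t          next;       /* next unclaimed index, guarded by lock */
    int             failures;   /* failed open/fsync count, guarded by lock */
    pthread_mutex_t lock;
} SyncJobs;

typedef struct SyncPool
{
    SyncJobs        jobs;
    pthread_t       threads[MAX_SYNC_WORKERS];
    int             nthreads;
} SyncPool;

/* Worker: claim paths one at a time and fsync each. Plain POSIX only. */
static void *
sync_worker(void *arg)
{
    SyncJobs   *jobs = arg;

    for (;;)
    {
        size_t      i;
        int         fd;
        int         ok;

        pthread_mutex_lock(&jobs->lock);
        i = jobs->next++;
        pthread_mutex_unlock(&jobs->lock);
        if (i >= jobs->npaths)
            break;

        fd = open(jobs->paths[i], O_RDONLY);
        ok = (fd >= 0 && fsync(fd) == 0);
        if (fd >= 0)
            close(fd);

        if (!ok)
        {
            pthread_mutex_lock(&jobs->lock);
            jobs->failures++;
            pthread_mutex_unlock(&jobs->lock);
        }
    }
    return NULL;
}

/* Dispatch the sync requests and return immediately. */
static int
sync_pool_start(SyncPool *pool, const char **paths, size_t npaths, int nworkers)
{
    int         i;

    assert(nworkers > 0 && nworkers <= MAX_SYNC_WORKERS);
    pool->jobs.paths = paths;
    pool->jobs.npaths = npaths;
    pool->jobs.next = 0;
    pool->jobs.failures = 0;
    pthread_mutex_init(&pool->jobs.lock, NULL);
    pool->nthreads = 0;

    for (i = 0; i < nworkers; i++)
    {
        if (pthread_create(&pool->threads[pool->nthreads], NULL,
                           sync_worker, &pool->jobs) != 0)
            break;
        pool->nthreads++;
    }
    return pool->nthreads;      /* number of workers actually started */
}

/* Come back later: wait for the workers and return the failure count. */
static int
sync_pool_finish(SyncPool *pool)
{
    int         i;

    /* If no thread could be started, do the work in the caller instead. */
    if (pool->nthreads == 0)
        (void) sync_worker(&pool->jobs);
    for (i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->jobs.lock);
    return pool->jobs.failures;
}
```

The caller would invoke sync_pool_start() before the rest of startup and sync_pool_finish() at the point where the data directory must be durable; any error reporting would then happen in the main thread, where elog is safe to use.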



2. RecreateTwoPhaseFile() writes a state file for a prepared transaction and fsyncs it. It might be better to fsync all the files once after writing them, given that the kernel can flush asynchronously while the file contents are being written. If TwoPhaseState->numPrepXacts is large, we could batch the work to stay under the fd resource limit. I have not tested this yet, but it should speed up checkpoint/restartpoint a bit.
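
As a rough sketch of the write-everything-then-fsync idea with batching: the function below writes each file of a batch first, then fsyncs and closes them in a second pass, so at most batch_size descriptors are open at once. StateFile, write_files_batched() and MAX_SYNC_BATCH are hypothetical names for illustration only; this is not the existing RecreateTwoPhaseFile() logic, and it leaves out CRC handling and the fsync of the containing directory that durable creation of new files also needs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SYNC_BATCH 64       /* assumed upper bound on open fds per batch */

/* Hypothetical description of one state file to write. */
typedef struct StateFile
{
    const char *path;
    const void *data;
    size_t      len;
} StateFile;

/*
 * Write all files, fsyncing once per batch rather than once per file.
 * Returns 0 on success, -1 on the first error.
 */
static int
write_files_batched(const StateFile *files, size_t nfiles, size_t batch_size)
{
    int         fds[MAX_SYNC_BATCH];
    size_t      start;

    assert(batch_size > 0 && batch_size <= MAX_SYNC_BATCH);

    for (start = 0; start < nfiles; start += batch_size)
    {
        size_t      n = nfiles - start;
        size_t      opened = 0;
        size_t      i;
        int         rc = 0;

        if (n > batch_size)
            n = batch_size;

        /* Pass 1: write every file in the batch, without fsync. */
        for (i = 0; i < n; i++)
        {
            const StateFile *f = &files[start + i];
            int         fd = open(f->path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

            if (fd < 0)
            {
                rc = -1;
                break;
            }
            fds[opened++] = fd;
            /* A short write is treated as an error in this sketch. */
            if (write(fd, f->data, f->len) != (ssize_t) f->len)
            {
                rc = -1;
                break;
            }
        }

        /* Pass 2: fsync (unless already failed) and close each file. */
        for (i = 0; i < opened; i++)
        {
            if (rc == 0 && fsync(fds[i]) != 0)
                rc = -1;
            if (close(fds[i]) != 0)
                rc = -1;
        }

        if (rc != 0)
            return -1;
    }
    return 0;
}
```

The hope is that by the time pass 2 runs, the kernel has already started flushing the earlier files, so each fsync has less left to wait for.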

I seem to recall some hints we can set on an FD or mmapped range that encourage dirty buffers to be written without blocking us, too. I'll have to look them up...
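
One plausible candidate for such a hint, as a sketch only: on Linux, sync_file_range() with SYNC_FILE_RANGE_WRITE asks the kernel to start writeback of dirty pages in a range without waiting for it to finish, and posix_fadvise() with POSIX_FADV_DONTNEED is a more portable call that can have a similar effect. hint_writeback() is a hypothetical helper name. Neither call is a durability guarantee, so an fsync is still needed afterwards, and whether these are the exact interfaces meant above is an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes sync_file_range() on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to begin writing back dirty pages in
 * [offset, offset + nbytes) of fd. nbytes == 0 means "to end of file".
 * Advisory only: returns 0 if the hint was accepted, -1 otherwise.
 */
static int
hint_writeback(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);

#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    /* Initiates write-out; does not wait for completion or flush metadata. */
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* posix_fadvise() returns an error number rather than setting errno. */
    return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED) == 0 ? 0 : -1;
#endif
}
```

A caller would issue the hint right after each write and treat a failure as harmless, since the later fsync is what actually provides durability.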
 
