Two fsync related performance issues? - Mailing list pgsql-hackers

From	Paul Guo
Subject	Two fsync related performance issues?
Date	May 12, 2020 03:42:23
Msg-id	CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw@mail.gmail.com
Responses	Re: Two fsync related performance issues? (Fujii Masao <masao.fujii@oss.nttdata.com>) Re: Two fsync related performance issues? (Robert Haas <robertmhaas@gmail.com>) Re: Two fsync related performance issues? (Craig Ringer <craig@2ndquadrant.com>) Re: Two fsync related performance issues? (Thomas Munro <thomas.munro@gmail.com>)
List	pgsql-hackers

Hello hackers,

1. StartupXLOG() fsyncs the whole data directory early in crash recovery. I'm wondering whether we could skip some directories (at least pg_log/ and the table directories), since WAL, etc. could ensure consistency. Here is the related code:

    if (ControlFile->state != DB_SHUTDOWNED &&
        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
    {
        RemoveTempXlogFiles();
        SyncDataDirectory();
    }

I have this concern because I saw an issue in a real production environment where the startup process needed 10+ seconds to start WAL replay after a relaunch caused by elog(PANIC) (it was seen on Greenplum, a Postgres-based product, but it is a common issue in Postgres as well). I strongly suspect the delay was mostly due to this fsync. It has also been noticed that fsync on public clouds is much slower than on local storage, so the slowness should be more severe there. If we at least skipped fsync on the table directories, we could avoid a lot of per-file fsync calls, which may save many seconds during crash recovery.
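To make the idea concrete, here is a minimal standalone sketch of a directory walk that fsyncs everything except a skip list. It is not PostgreSQL code and does not reflect how SyncDataDirectory() is implemented; the names skip_dirs, is_skipped, fsync_path and sync_dir_selective are hypothetical, and treating "base" as the table directory is an assumption made only for illustration. Whether skipping these directories is actually safe is exactly the open question here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical skip list: top-level entries that would not be fsynced. */
static const char *const skip_dirs[] = {"pg_log", "base", NULL};

/* Return 1 if 'name' is on the skip list, 0 otherwise. */
static int
is_skipped(const char *name)
{
    for (int i = 0; skip_dirs[i] != NULL; i++)
        if (strcmp(name, skip_dirs[i]) == 0)
            return 1;
    return 0;
}

/* fsync one file or directory; returns 0 on success, -1 on failure. */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Recursively fsync regular files and directories under 'dir'.
 * When 'top_level' is nonzero, entries named in skip_dirs are ignored.
 * Returns the number of successful fsync calls, or -1 if 'dir' cannot
 * be opened.
 */
static long
sync_dir_selective(const char *dir, int top_level)
{
    DIR *d;
    struct dirent *de;
    long count = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;

    while ((de = readdir(d)) != NULL)
    {
        char path[4096];
        struct stat st;
        int n;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (top_level && is_skipped(de->d_name))
            continue;

        n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t) n >= sizeof(path))
            continue;           /* path too long for this sketch */
        if (lstat(path, &st) != 0)
            continue;

        if (S_ISDIR(st.st_mode))
        {
            long sub = sync_dir_selective(path, 0);

            if (sub > 0)
                count += sub;
        }
        else if (S_ISREG(st.st_mode))
        {
            if (fsync_path(path) == 0)
                count++;
        }
    }
    closedir(d);

    /* Sync the directory itself so its entries are durable too. */
    if (fsync_path(dir) == 0)
        count++;
    return count;
}
```

Calling sync_dir_selective(datadir, 1) would skip the listed top-level directories, while passing 0 syncs everything, which gives a simple way to compare the number of fsync calls with and without the skip list.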

2. CheckPointTwoPhase()

This may be a small issue.

See the code below,

    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
    {
        ...
        RecreateTwoPhaseFile(gxact->xid, buf, len);
        ...
    }

RecreateTwoPhaseFile() writes a state file for a prepared transaction and fsyncs it. It might be better to fsync all the files once after writing them, given that the kernel is able to flush asynchronously while those file contents are being written. If TwoPhaseState->numPrepXacts is large, we could do this in batches to stay under the fd resource limit. I have not tested this yet, but it should speed up checkpoints/restartpoints a bit.
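A rough standalone sketch of the batching idea follows. It is not the PostgreSQL implementation: struct state_file, write_all, write_files_batched and the BATCH_SIZE value are all hypothetical, and details such as the CRC that the real state files carry are left out. Each batch writes its files first and only then fsyncs and closes them, so at most BATCH_SIZE descriptors are open at once.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define BATCH_SIZE 64           /* hypothetical cap on simultaneously open fds */

/* Hypothetical description of one state file to be written. */
struct state_file
{
    const char *path;
    const void *buf;
    size_t      len;
};

/* Write the whole buffer, retrying on short writes and EINTR. */
static int
write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        ssize_t w = write(fd, p, len);

        if (w < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t) w;
    }
    return 0;
}

/*
 * Write 'nfiles' files in batches of at most BATCH_SIZE.
 * Returns 0 on success, -1 if any open/write/fsync/close failed.
 */
static int
write_files_batched(const struct state_file *files, size_t nfiles)
{
    int     fds[BATCH_SIZE];
    size_t  done = 0;
    int     rc = 0;

    assert(files != NULL || nfiles == 0);

    while (done < nfiles && rc == 0)
    {
        size_t n = nfiles - done;
        size_t opened = 0;

        if (n > BATCH_SIZE)
            n = BATCH_SIZE;

        /* Phase 1: write without syncing, so the kernel may start writeback. */
        for (; opened < n; opened++)
        {
            const struct state_file *f = &files[done + opened];

            fds[opened] = open(f->path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
            if (fds[opened] < 0)
            {
                rc = -1;
                break;
            }
            if (write_all(fds[opened], f->buf, f->len) != 0)
            {
                opened++;       /* this fd is open and must still be closed */
                rc = -1;
                break;
            }
        }

        /* Phase 2: fsync (unless an error already occurred) and close. */
        for (size_t i = 0; i < opened; i++)
        {
            if (rc == 0 && fsync(fds[i]) != 0)
                rc = -1;
            if (close(fds[i]) != 0)
                rc = -1;
        }
        done += n;
    }
    return rc;
}
```

The batch size trades descriptor usage against how much writeback the kernel can overlap before the first fsync; 64 is an arbitrary placeholder, not a measured value.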

Any thoughts?

Regards.