Re: Two fsync related performance issues? - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: Two fsync related performance issues?
Date	May 12, 2020 06:55:37
Msg-id	6d2709b3-3965-9cea-d5a0-2b0de33dd5a9@oss.nttdata.com
In response to	Two fsync related performance issues? (Paul Guo <pguo@pivotal.io>)
Responses	Re: Two fsync related performance issues? (Michael Paquier <michael@paquier.xyz>)
List	pgsql-hackers


On 2020/05/12 9:42, Paul Guo wrote:
> Hello hackers,
> 
> 1. StartupXLOG() does fsync on the whole data directory early in the crash
> recovery. I'm wondering if we could skip some directories (at least the
> pg_log/, table directories) since wal, etc could ensure consistency.
 

I agree that we can skip the log directory, but I'm not sure if skipping
the table directories is really safe. Also, ISTM that we can skip the
directories whose contents are removed or zeroed during recovery,
for example, pg_snapshots, pg_subtrans, etc.
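
To make the idea concrete, here is a minimal standalone sketch of a recursive fsync pass that skips selected top-level directories. It is not PostgreSQL's SyncDataDirectory(); the names skip_dirs, should_skip, fsync_path and sync_tree are hypothetical, and the skip list only mirrors the candidates mentioned above. Whether skipping any of them is actually safe is exactly the open question in this thread.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations (lstat, fsync) */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical skip list for illustration; not an established safe set. */
static const char *const skip_dirs[] = {
    "pg_log", "pg_snapshots", "pg_subtrans", NULL
};

/* Return 1 if a top-level entry name is on the skip list, else 0. */
static int should_skip(const char *name)
{
    for (int i = 0; skip_dirs[i] != NULL; i++)
        if (strcmp(name, skip_dirs[i]) == 0)
            return 1;
    return 0;
}

/* fsync a single file or directory. Returns 0 on success, -1 on failure. */
static int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Recursively fsync regular files and directories under path.
 * Entries on the skip list are ignored only at the top level.
 * Returns the number of paths fsynced, or -1 on the first error.
 */
static long sync_tree(const char *path, int top_level)
{
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (top_level && should_skip(de->d_name))
            continue;

        char child[4096];
        int n = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
        struct stat st;
        if (n < 0 || (size_t) n >= sizeof(child) || lstat(child, &st) != 0) {
            closedir(dir);
            return -1;
        }
        if (S_ISDIR(st.st_mode)) {
            long sub = sync_tree(child, 0);
            if (sub < 0) {
                closedir(dir);
                return -1;
            }
            count += sub;
        } else if (S_ISREG(st.st_mode)) {
            if (fsync_path(child) != 0) {
                closedir(dir);
                return -1;
            }
            count++;
        }
    }
    closedir(dir);

    /* Flush the directory itself so its entries are durable too. */
    if (fsync_path(path) != 0)
        return -1;
    return count + 1;
}
```

Calling sync_tree(datadir, 1) would flush everything except the listed top-level directories. The time saved depends on how many files live under the skipped directories, so it would need to be measured.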

> Here is the related code.
> 
>        if (ControlFile->state != DB_SHUTDOWNED &&
>            ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
>        {
>            RemoveTempXlogFiles();
>            SyncDataDirectory();
>        }
> 
> I have this concern since I saw an issue in a real product environment that
> the startup process needs 10+ seconds to start wal replay after relaunch due
> to elog(PANIC) (it was seen on postgres based product Greenplum but it is a
> common issue in postgres also). I highly suspect the delay was mostly due to
> this. Also it is noticed that on public clouds fsync is much slower than that
> on local storage so the slowness should be more severe on cloud. If we at
> least disable fsync on the table directories we could skip a lot of file
> fsync - this may save a lot of seconds during crash recovery.
> 
> 2.  CheckPointTwoPhase()
> 
> This may be a small issue.
> 
> See the code below,
> 
> for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>      RecreateTwoPhaseFile(gxact->xid, buf, len);
> 
> RecreateTwoPhaseFile() writes a state file for a prepared transaction and
> does fsync. It might be good to do fsync for all files once after writing
> them, given the kernel is able to do asynchronous flush when writing those
> file contents. If the TwoPhaseState->numPrepXacts is large we could do
> batching to avoid the fd resource limit. I did not test them yet but this
> should be able to speed up checkpoint/restartpoint a bit.
 
> 
> Any thoughts?

It seems worth making the patch and measuring the performance improvement.
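
As a starting point for such a measurement, the batching idea in point 2 could be sketched roughly as below. This is a standalone illustration, not a patch against twophase.c: write_state_file, flush_batch, write_files_batched and MAX_BATCH are hypothetical names, and a short write is simply treated as an error. Files are written first and left open, then each batch is fsynced and closed together, so the number of open descriptors never exceeds the batch size.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations (fsync) */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_BATCH 64 /* hypothetical upper bound on open fds per batch */

/* Write one state file and return its still-open fd, or -1 on error. */
static int write_state_file(const char *dir, int id,
                            const char *buf, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%08X", dir, (unsigned) id);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len) {
        close(fd);
        return -1;
    }
    return fd; /* caller fsyncs and closes later */
}

/* fsync and close every fd in the batch. Returns 0 if all succeeded. */
static int flush_batch(const int *fds, int nfds)
{
    int rc = 0;
    for (int i = 0; i < nfds; i++) {
        if (fsync(fds[i]) != 0)
            rc = -1;
        if (close(fds[i]) != 0)
            rc = -1;
    }
    return rc;
}

/*
 * Write nfiles files, deferring fsync until batch_size files are pending.
 * Returns the number of batches flushed, or -1 on error.
 */
static int write_files_batched(const char *dir, int nfiles, int batch_size,
                               const char *buf, size_t len)
{
    int fds[MAX_BATCH];
    int pending = 0;
    int batches = 0;

    assert(batch_size > 0 && batch_size <= MAX_BATCH);
    for (int i = 0; i < nfiles; i++) {
        int fd = write_state_file(dir, i, buf, len);
        if (fd < 0) {
            flush_batch(fds, pending); /* best effort; avoid leaking fds */
            return -1;
        }
        fds[pending++] = fd;
        if (pending == batch_size) {
            if (flush_batch(fds, pending) != 0)
                return -1;
            pending = 0;
            batches++;
        }
    }
    if (pending > 0) {
        if (flush_batch(fds, pending) != 0)
            return -1;
        batches++;
    }
    return batches;
}
```

For example, write_files_batched(dir, 10, 4, buf, len) flushes in three batches of 4, 4 and 2 files. A real patch would also have to keep whatever durability steps RecreateTwoPhaseFile() performs today, and the containing directory would need its own fsync for the new entries to be durable.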

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
