Speed up the removal of WAL files - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject Speed up the removal of WAL files
Date
Msg-id 0A3221C70F24FB45833433255569204D1F81B0C8@G01JPEXMBYT05
Responses Re: Speed up the removal of WAL files  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Hello,

The attached patch speeds up the removal of WAL files on old timelines.  I'll add this to the next CF.


BACKGROUND
==================================================

We need to meet a strict availability requirement from a potential customer.  They will use synchronous streaming
replication.  The allowed failover duration, from the failure through failure detection to the completion of failover, is
10 seconds.  Even one second is precious.

While testing on a fast machine with an SSD, we observed a gap of about 2 seconds between the following two messages.
No other messages were logged between them.

LOG:  archive recovery complete
LOG:  MultiXact member wraparound protections are now enabled


CAUSE
==================================================

Judging from the source code, RemoveNonParentXlogFiles() seems to account for the time.  It syncs the pg_wal directory
every time it deletes a WAL file.  max_wal_size was set to 48GB, so about 1,000 WAL files were probably deleted, and
hence the pg_wal directory was synced about as many times.


FIX
==================================================

unlink() the WAL files, then sync the pg_wal directory once at the end.

Unfortunately, the original machine is no longer available, so I confirmed the speedup on a VM with an HDD.

[time to remove 1,000 WAL files including the directory sync]
unpatched:  2.45 seconds
patched:    0.81 seconds


Regards
Takayuki Tsunakawa

