Re: WAL prefetch - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: WAL prefetch |
Date | |
Msg-id | ef234489-1875-cde1-1ff1-0a58de95fb9b@postgrespro.ru |
In response to | Re: WAL prefetch (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: WAL prefetch |
List | pgsql-hackers |
On 15.06.2018 18:03, Amit Kapila wrote:
> On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>>
>> On 15.06.2018 07:36, Amit Kapila wrote:
>>> On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfrost@snowman.net>
>>> wrote:
>>>>> I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME
>>>>> RAID 10 storage device and 256Gb of RAM connected using InfiniBand.
>>>>> The speed of synchronous replication between two nodes is increased from
>>>>> 56k TPS to 60k TPS (on pgbench with scale 1000).
>>>> I'm also surprised that it wasn't a larger improvement.
>>>>
>>>> Seems like it would make sense to implement in core using
>>>> posix_fadvise(), perhaps in the wal receiver and in RestoreArchivedFile
>>>> or nearby.. At least, that's the thinking I had when I was chatting w/
>>>> Sean.
>>>>
>>> Doing in-core certainly has some advantage such as it can easily reuse
>>> the existing xlog code rather trying to make a copy as is currently
>>> done in the patch, but I think it also depends on whether this is
>>> really a win in a number of common cases or is it just a win in some
>>> limited cases.
>>>
>> I am completely agree. It was my mail concern: on which use cases this
>> prefetch will be efficient.
>> If "full_page_writes" is on (and it is safe and default value), then first
>> update of a page since last checkpoint will be written in WAL as full page
>> and applying it will not require reading any data from disk.
>>
> What exactly you mean by above? AFAIU, it needs to read WAL to apply
> full page image. See below code:
>
> XLogReadBufferForRedoExtended()
> {
> ..
>     /* If it has a full-page image and it should be restored, do it. */
>     if (XLogRecBlockImageApply(record, block_id))
>     {
>         Assert(XLogRecHasBlockImage(record, block_id));
>         *buf = XLogReadBufferExtended(rnode, forknum, blkno,
>             get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
>         page = BufferGetPage(*buf);
>         if (!RestoreBlockImage(record, block_id, page))
> ..
> }
>
>

Sorry for my confusing statement. We definitely need to read the page image from WAL. What I meant is that in the case of a full-page write we do not need to read the page being updated from the database: it can simply be overwritten.

pg_prefaulter and my wal_prefetch do not prefetch WAL pages themselves. There is no point in doing that, because they have just been written by wal_receiver and so should already be present in the file system cache. wal_prefetch prefetches the blocks referenced by WAL records. But in the case of a full-page write such a prefetch is not needed and is even harmful.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
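To illustrate the distinction made above, here is a minimal sketch in C of prefetching only those referenced blocks that do not come with a full-page image. It is not code from wal_prefetch or pg_prefaulter. The `BlockRef` struct, the `needs_prefetch` and `prefetch_blocks` functions, and the `BLOCK_SIZE` constant are hypothetical names introduced for this example, and the sketch assumes a single relation file, ignoring how PostgreSQL splits relations into segments. The only real API used is `posix_fadvise()` with `POSIX_FADV_WILLNEED`, as suggested earlier in the thread.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192 /* assumption: the default PostgreSQL page size */

/* Hypothetical, simplified view of one block reference in a WAL record. */
typedef struct BlockRef
{
    uint32_t blkno;   /* block number within the relation file */
    bool     has_fpi; /* record carries a full-page image to be restored */
} BlockRef;

/*
 * A block is worth prefetching only if redo has to read its old contents.
 * When a full-page image will be restored, the page is simply overwritten.
 */
static bool
needs_prefetch(const BlockRef *ref)
{
    return !ref->has_fpi;
}

/*
 * Ask the kernel to read ahead every referenced block that lacks a
 * full-page image. Returns the number of hints accepted by the kernel.
 */
static size_t
prefetch_blocks(int fd, const BlockRef *refs, size_t nrefs)
{
    size_t issued = 0;

    for (size_t i = 0; i < nrefs; i++)
    {
        if (!needs_prefetch(&refs[i]))
            continue; /* skip: the page will be overwritten anyway */

        off_t offset = (off_t) refs[i].blkno * BLOCK_SIZE;

        /* posix_fadvise() returns 0 on success, an error number otherwise */
        if (posix_fadvise(fd, offset, BLOCK_SIZE, POSIX_FADV_WILLNEED) == 0)
            issued++;
    }
    return issued;
}
```

A caller would open the relation file, collect the block references decoded from a batch of WAL records, and pass them to `prefetch_blocks` ahead of replay. Skipping blocks with a full-page image avoids reading pages that redo will overwrite without looking at them.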