Re: WAL prefetch - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: WAL prefetch
Date
Msg-id ef234489-1875-cde1-1ff1-0a58de95fb9b@postgrespro.ru
In response to Re: WAL prefetch  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: WAL prefetch
List pgsql-hackers

On 15.06.2018 18:03, Amit Kapila wrote:
> On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>>
>> On 15.06.2018 07:36, Amit Kapila wrote:
>>> On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost <sfrost@snowman.net>
>>> wrote:
>>>>> I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb
>>>>> NVME
>>>>> RAID 10 storage device and 256Gb of RAM connected using InfiniBand.
>>>>> The speed of synchronous replication between two nodes is increased from
>>>>> 56k
>>>>> TPS to 60k TPS (on pgbench with scale 1000).
>>>> I'm also surprised that it wasn't a larger improvement.
>>>>
>>>> Seems like it would make sense to implement in core using
>>>> posix_fadvise(), perhaps in the wal receiver and in RestoreArchivedFile
>>>> or nearby..  At least, that's the thinking I had when I was chatting w/
>>>> Sean.
>>>>
>>> Doing in-core certainly has some advantage such as it can easily reuse
>>> the existing xlog code rather trying to make a copy as is currently
>>> done in the patch, but I think it also depends on whether this is
>>> really a win in a number of common cases or is it just a win in some
>>> limited cases.
>>>
>> I completely agree. It was my main concern: in which use cases this
>> prefetch will be efficient.
>> If "full_page_writes" is on (and it is safe and default value), then first
>> update of a page since last checkpoint will be written in WAL as full page
>> and applying it will not require reading any data from disk.
>>
> What exactly you mean by above?  AFAIU, it needs to read WAL to apply
> full page image.  See below code:
>
> XLogReadBufferForRedoExtended()
> {
> ..
> /* If it has a full-page image and it should be restored, do it. */
> if (XLogRecBlockImageApply(record, block_id))
> {
> Assert(XLogRecHasBlockImage(record, block_id));
> *buf = XLogReadBufferExtended(rnode, forknum, blkno,
>    get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
> page = BufferGetPage(*buf);
> if (!RestoreBlockImage(record, block_id, page))
> ..
> }
>
>

Sorry for my confusing statement.
We definitely need to read the page image from WAL.
What I meant is that in the case of a full-page write we do not need to
read the updated page from the database.
It can simply be overwritten.

pg_prefaulter and my wal_prefetch do not prefetch WAL pages themselves.
There is no point in doing so, because they have just been written by
wal_receiver and so should be present in the file system cache.
wal_prefetch prefetches the blocks referenced by WAL records. But in
the case of full-page writes such a prefetch is not needed and is even harmful.
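
To make the distinction concrete, here is a minimal sketch of the decision
described above. It is not code from pg_prefaulter or wal_prefetch: the names
SketchBlockRef, sketch_prefetch_block and SKETCH_BLCKSZ are hypothetical, the
block reference is assumed to have already been decoded from the WAL record,
and the 8192-byte page size is an assumption (PostgreSQL's default). It only
illustrates "hint the data block unless the record carries a full-page image",
using posix_fadvise() as suggested earlier in the thread.

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_BLCKSZ 8192        /* assumed page size (PostgreSQL default) */

/* Hypothetical, simplified view of one block referenced by a WAL record. */
typedef struct SketchBlockRef
{
    unsigned int blkno;           /* block number within the relation file */
    bool         has_fpi;         /* record carries a full-page image to apply */
} SketchBlockRef;

/*
 * Issue a read-ahead hint for the data block referenced by a WAL record.
 *
 * Returns 1 if a hint was issued, 0 if the block was skipped because redo
 * will overwrite it with the full-page image anyway, and -1 if
 * posix_fadvise() reported an error.
 */
static int
sketch_prefetch_block(int fd, const SketchBlockRef *ref)
{
    assert(ref != NULL);

    /* Full-page image: the old page content is never read, so skip it. */
    if (ref->has_fpi)
        return 0;

    /* posix_fadvise() returns 0 on success or an error number on failure. */
    if (posix_fadvise(fd,
                      (off_t) ref->blkno * SKETCH_BLCKSZ,
                      SKETCH_BLCKSZ,
                      POSIX_FADV_WILLNEED) != 0)
        return -1;

    return 1;
}
```

With full_page_writes on, the first record touching each page after a
checkpoint takes the "skip" branch, which is why the benefit of prefetch
depends so much on the workload and checkpoint frequency.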

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


