Re: Improve WALRead() to suck data directly from WAL buffers when possible - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date
Msg-id 20221212.115717.1183698540772933396.horikyota.ntt@gmail.com
In response to Improve WALRead() to suck data directly from WAL buffers when possible  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
At Fri, 9 Dec 2022 14:33:39 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> The patch introduces concurrent readers for the WAL buffers, so far
> only there are concurrent writers. In the patch, WALRead() takes just
> one lock (WALBufMappingLock) in shared mode to enable concurrent
> readers and does minimal things - checks if the requested WAL page is
> present in WAL buffers, if so, copies the page and releases the lock.
> I think taking just WALBufMappingLock is enough here as the concurrent
> writers depend on it to initialize and replace a page in WAL buffers.
> 
> I'll add this to the next commitfest.
> 
> Thoughts?

This adds a copy of (at least) a whole page on every WAL *record*
read, and it contends with all WAL writers by holding WALBufMappingLock
on a very busy page during the copy. I'm a bit doubtful that this
results in an overall improvement. It seems to me that almost all
pread()s here are served from the file buffer, so it is unclear to me
that copying a whole WAL page (and then copying the target record
again) wins over a pread() call that copies only the record to be
read. Do you have actual numbers for how often WAL reads go to disk,
or for the performance gain or real I/O reduction this patch offers?
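
To make the copy-volume concern concrete, here is a rough sketch of the
bytes moved per record read on each path. The function names are
hypothetical and not taken from the patch. It assumes the default 8 kB
WAL block size and a record that fits within a single page, and it
ignores lock contention and syscall overhead entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; 8192 is the default XLOG_BLCKSZ. */
#define SKETCH_PAGE_SIZE 8192

/*
 * Hypothetical helper: bytes moved when the whole page is copied out of
 * the WAL buffers first and the record is then copied out of that page.
 */
size_t
sketch_bytes_via_page_copy(size_t record_len)
{
    assert(record_len <= SKETCH_PAGE_SIZE);   /* single-page record only */
    return SKETCH_PAGE_SIZE + record_len;
}

/*
 * Hypothetical helper: bytes moved when pread() fetches only the record.
 */
size_t
sketch_bytes_via_pread(size_t record_len)
{
    return record_len;
}
```

For a 100-byte record this gives 8292 bytes against 100, which is why
measured numbers would be needed to show that avoiding the pread() pays
for the extra copying.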

This patch copies the bleeding-edge WAL page without recording the
(next) insertion point or checking whether all in-progress insertions
behind the target LSN have finished. Thus the copied page may have
holes. That said, the sequential nature of the reads and the fact
that WAL buffers are zero-initialized may make it work for recovery,
but I don't think it also works for replication.
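
The hole problem can be illustrated with a single-threaded simulation.
This is only a sketch under simplifying assumptions: the types and
functions are hypothetical, the page is tiny, and the interleaving of
two inserters is played out by hand rather than with real concurrency.
Space is reserved first and filled later, so a copy taken in between
can contain a zeroed gap in front of already-written data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64     /* tiny page, for illustration only */

/* Hypothetical stand-in for one WAL buffer page. */
typedef struct SketchPage
{
    unsigned char data[SKETCH_PAGE_SIZE];   /* zero-initialized contents */
    size_t        reserved_upto;            /* space handed out so far */
} SketchPage;

void
sketch_page_init(SketchPage *p)
{
    memset(p, 0, sizeof(*p));
}

/* Reserve len bytes and return their offset; the bytes are filled later. */
size_t
sketch_reserve(SketchPage *p, size_t len)
{
    size_t off = p->reserved_upto;

    assert(off + len <= SKETCH_PAGE_SIZE);
    p->reserved_upto += len;
    return off;
}

/* Finish an insertion by writing a non-zero payload into its slot. */
void
sketch_fill(SketchPage *p, size_t off, size_t len, unsigned char byte)
{
    assert(byte != 0);
    assert(off + len <= p->reserved_upto);
    memset(p->data + off, byte, len);
}

/* True if any byte in [from, upto) of a copied page is still zero. */
bool
sketch_has_hole(const unsigned char *copy, size_t from, size_t upto)
{
    for (size_t i = from; i < upto; i++)
    {
        if (copy[i] == 0)
            return true;
    }
    return false;
}
```

If inserter A reserves bytes 0..15, inserter B reserves 16..31, and B
fills its slot before A does, a page copy taken at that moment has B's
data preceded by 16 zero bytes. A sequential reader in recovery would
stop at the zeros, but a reader that ships the page onward has no way
to tell from the copy alone that the range is incomplete.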

I remember that one of the advantages of reading in-memory WAL
records is that it allows walsender to presend the unwritten
records. So perhaps this feature should track how far the buffer is
filled with valid content (or how far we can presend).
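
One way to picture such tracking is a watermark computed from the
in-progress insertions. The following is a minimal single-threaded
sketch, not a proposal for the actual data structure: every name in it
is hypothetical, the number of inserter slots is fixed arbitrarily, and
all locking and memory-ordering concerns are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_INSERTERS 4                  /* arbitrary slot count */
#define SKETCH_NOT_INSERTING ((size_t) -1)      /* marks an idle slot */

/* Hypothetical shared state describing reserved and in-progress space. */
typedef struct SketchInsertState
{
    size_t reserved_upto;                       /* next insertion point */
    size_t inserting_at[SKETCH_MAX_INSERTERS];  /* start offset per slot */
} SketchInsertState;

void
sketch_state_init(SketchInsertState *s)
{
    s->reserved_upto = 0;
    for (int i = 0; i < SKETCH_MAX_INSERTERS; i++)
        s->inserting_at[i] = SKETCH_NOT_INSERTING;
}

/* Reserve len bytes for the given slot and mark it as in progress. */
size_t
sketch_begin_insert(SketchInsertState *s, int slot, size_t len)
{
    size_t start;

    assert(slot >= 0 && slot < SKETCH_MAX_INSERTERS);
    assert(s->inserting_at[slot] == SKETCH_NOT_INSERTING);
    start = s->reserved_upto;
    s->inserting_at[slot] = start;
    s->reserved_upto += len;
    return start;
}

/* Mark the slot's insertion as completely written. */
void
sketch_end_insert(SketchInsertState *s, int slot)
{
    assert(slot >= 0 && slot < SKETCH_MAX_INSERTERS);
    s->inserting_at[slot] = SKETCH_NOT_INSERTING;
}

/*
 * Everything below the returned offset is fully written: it is the start
 * of the oldest in-progress insertion, or the insertion point if no
 * insertion is in progress.
 */
size_t
sketch_valid_upto(const SketchInsertState *s)
{
    size_t upto = s->reserved_upto;

    for (int i = 0; i < SKETCH_MAX_INSERTERS; i++)
    {
        if (s->inserting_at[i] != SKETCH_NOT_INSERTING &&
            s->inserting_at[i] < upto)
            upto = s->inserting_at[i];
    }
    return upto;
}
```

A reader such as walsender would then copy or presend only bytes below
sketch_valid_upto(), so a later insertion that finishes early does not
expose the unfinished range in front of it.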

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


