On Fri, Dec 23, 2022 at 3:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Dec 12, 2022 at 8:27 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
>
> Thanks for providing thoughts.
>
> > At Fri, 9 Dec 2022 14:33:39 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > > The patch introduces concurrent readers for the WAL buffers, so far
> > > only there are concurrent writers. In the patch, WALRead() takes just
> > > one lock (WALBufMappingLock) in shared mode to enable concurrent
> > > readers and does minimal things - checks if the requested WAL page is
> > > present in WAL buffers, if so, copies the page and releases the lock.
> > > I think taking just WALBufMappingLock is enough here as the concurrent
> > > writers depend on it to initialize and replace a page in WAL buffers.
> > >
> > > I'll add this to the next commitfest.
> > >
> > > Thoughts?
> >
> > This adds copying of the whole page (at least) at every WAL *record*
> > read,
>
> In the worst case yes, but that may not always be true. On a typical
> production server with decent write traffic, it happens that the
> callers of WALRead() read a full WAL page of size XLOG_BLCKSZ bytes or
> MAX_SEND_SIZE bytes.
I agree with this.
> > This patch copies the bleeding edge WAL page without recording the
> > (next) insertion point nor checking whether all in-progress insertion
> > behind the target LSN have finished. Thus the copied page may have
> > holes. That being said, the sequential-reading nature and the fact
> > that WAL buffers are zero-initialized may make it work for recovery,
> > but I don't think this also works for replication.
>
> WALRead() callers are smart enough to take the flushed bytes only.
> Although they read the whole WAL page, they calculate the valid bytes.
Right
On first read the patch looks good, although it needs some more
thoughts on 'XXX' comments in the patch.
And also I do not like that XLogReadFromBuffers() is using 3 bools
hit/partial hit/miss, instead of this we can use an enum or some
tristate variable, I think that will be cleaner.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com