Re: Improve WALRead() to suck data directly from WAL buffers when possible - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Improve WALRead() to suck data directly from WAL buffers when possible |
Date | |
Msg-id | 2ef04861c0f77e7ae78b703770cc2bbbac3d85e6.camel@j-davis.com Whole thread Raw |
In response to | Re: Improve WALRead() to suck data directly from WAL buffers when possible (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Improve WALRead() to suck data directly from WAL buffers when possible
|
List | pgsql-hackers |
On Thu, 2023-11-02 at 22:38 +0530, Bharath Rupireddy wrote: > > I suppose the question is: should reading from the WAL buffers an > > intentional thing that the caller does explicitly by specific > > callers? > > Or is it an optimization that should be hidden from the caller? > > > > I tend toward the former, at least for now. > > Yes, it's an optimization that must be hidden from the caller. As I said, I tend toward the opposite: that specific callers should read from the buffers explicitly in the cases where it makes sense. I don't think this is the most important point right now though, let's sort out the other details. > > > At any given point of time, WAL buffer pages are maintained as a > circularly sorted array in an ascending order from > OldestInitializedPage to InitializedUpTo (new pages are inserted at > this end). I don't see any reference to OldestInitializedPage or anything like it, with or without your patch. Am I missing something? > - Read 6 pages starting from LSN 80. Nothing is read from WAL buffers > as the page at LSN 80 doesn't exist despite other 5 pages starting > from LSN 90 exist. This is what I imagined a "partial hit" was: read the 5 pages starting at 90. The caller would then need to figure out how to read the page at LSN 80 from the segment files. I am not saying we should support this case; perhaps it doesn't matter. I'm just describing why that term was confusing for me. > If a caller asks for an unflushed WAL read > intentionally or unintentionally, XLogReadFromBuffers() reads only 4 > pages starting from LSN 150 to LSN 180 and will leave the remaining 2 > pages for the caller to deal with. This is the partial hit that can > happen. To me that's more like an EOF case. "Partial hit" sounds to me like the data exists but is not available in the cache (i.e. go to the segment files); whereas if it encountered the end, the data is not available at all. > > > WALBufMappingLock protects both xlblocks and WAL buffer pages [1][2]. > I'm not sure how using the memory barrier, not WALBufMappingLock, > prevents writers from replacing WAL buffer pages while readers > reading > the pages. It doesn't *prevent* that case, but it does *detect* that case. We don't want to prevent writers from replacing WAL buffers, because that would mean we are slowing down the critical WAL writing path. Let me explain the potential problem cases, and how the barriers prevent them: Potential problem 1: the page is not yet resident in the cache at the time the memcpy begins. In this case, the first read barrier would ensure that the page is also not yet resident at the time xlblocks[idx] is read into endptr1, and we'd break out of the loop. Potential problem 2: the page is evicted before the memcpy finishes. In this case, the second read barrier would ensure that the page was also evicted before xlblocks[idx] is read into endptr2, and again we'd detect the problem and break out of the loop. I assume here that, if xlblocks[idx] holds the endPtr of the desired page, all of the bytes for that page are resident at that moment. I don't think that's true right now: AdvanceXLInsertBuffers() zeroes the old page before updating xlblocks[nextidx]. I think it needs something like: pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr); pg_write_barrier(); before the MemSet. I didn't review your latest v14 patch yet. Regards, Jeff Davis
pgsql-hackers by date: