Thread: Track Oldest Initialized WAL Buffer Page

From
Bharath Rupireddy
Date:
Hi,

While working on [1], I was looking for a quick way to tell if a WAL
record is present in the WAL buffers array without scanning but I
couldn't find one. Hence, I put up a patch that basically tracks the
oldest initialized WAL buffer page, named OldestInitializedPage, in
XLogCtl. With OldestInitializedPage, we can easily state the following
properties of the WAL buffers array:

1) At any given point in time, pages in the WAL buffers array are
sorted in ascending order from OldestInitializedPage up to
InitializedUpTo. Note that we verify this property in assert-enabled
builds; see IsXLogBuffersArraySorted() in the patch for more details.

2) OldestInitializedPage is monotonically increasing (by virtue of how
postgres generates WAL records), that is, its value never decreases.
This property lets someone read its value without a lock. There's no
problem even if its value is slightly stale i.e. concurrently being
updated. One can still use it for finding if a given WAL record is
available in WAL buffers. At worst, one might get false positives
(i.e. OldestInitializedPage may tell that the WAL record is available
in WAL buffers, but when one actually looks at it, it isn't really
available). This is more efficient and performant than acquiring a
lock for reading. Note that we may not need a lock to read
OldestInitializedPage but we need to update it holding
WALBufMappingLock.

3) One can start traversing WAL buffers from OldestInitializedPage
till InitializedUpTo to list out all valid WAL records and stats, and
expose them via SQL-callable functions to users, for instance, as
pg_walinspect functions.

4) The WAL buffers array is inherently organized as a circular, sorted,
rotated array with OldestInitializedPage as its pivot (first element),
such that the LSNs of both the previous and the next buffer pages (if
valid) are greater than OldestInitializedPage.
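Properties 1 and 4 can be checked mechanically. The following is a standalone toy (not IsXLogBuffersArraySorted() from the patch) that treats an array of page end pointers, like XLogCtl->xlblocks, as a ring. The helper names and the use of 0 to mark a never-used slot are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;               /* local stand-in type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0) /* assumed: 0 marks a never-used slot */

/* Index of the slot with the smallest valid end pointer, or -1 if none. */
static int
ring_oldest_idx(const XLogRecPtr *xlblocks, int nbuffers)
{
    int         oldest = -1;

    for (int i = 0; i < nbuffers; i++)
    {
        if (xlblocks[i] == InvalidXLogRecPtr)
            continue;
        if (oldest < 0 || xlblocks[i] < xlblocks[oldest])
            oldest = i;
    }
    return oldest;
}

/*
 * Properties 1 and 4: walking forward circularly from the oldest slot, the
 * valid end pointers strictly increase, and no valid slot appears after a
 * never-used one.
 */
static bool
ring_is_rotated_sorted(const XLogRecPtr *xlblocks, int nbuffers)
{
    int         start = ring_oldest_idx(xlblocks, nbuffers);
    bool        seen_invalid = false;

    if (start < 0)
        return true;            /* nothing initialized yet */

    for (int k = 1; k < nbuffers; k++)
    {
        int         cur = (start + k) % nbuffers;
        int         prev = (start + k - 1) % nbuffers;

        if (xlblocks[cur] == InvalidXLogRecPtr)
        {
            seen_invalid = true;
            continue;
        }
        if (seen_invalid || xlblocks[cur] <= xlblocks[prev])
            return false;
    }
    return true;
}
```

For instance, with four slots holding end pointers {40960, 49152, 24576, 32768}, the oldest page sits in slot 2, and reading slots 2, 3, 0, 1 gives an ascending sequence. With XLOG_BLCKSZ at 8192, that oldest page starts at 24576 - 8192 = 16384.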

Thoughts?

[1] https://www.postgresql.org/message-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Track Oldest Initialized WAL Buffer Page

From
Nathan Bossart
Date:
On Tue, Feb 07, 2023 at 07:30:00PM +0530, Bharath Rupireddy wrote:
> +        /*
> +         * Try updating oldest initialized XLog buffer page.
> +         *
> +         * Update it if we are initializing an XLog buffer page for the first
> +         * time or if XLog buffers are full and we are wrapping around.
> +         */
> +        if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
> +            (!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
> +             XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
> +        {
> +            Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
> +
> +            XLogCtl->OldestInitializedPage = NewPageBeginPtr;
> +        }

nitpick: I think you can simplify the conditional to

    if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
        XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)
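The suggested simplification is the Boolean identity A || (!A && B) == A || B, where A stands for XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) and B for the buffer-index comparison: when A is false, !A is true and the second disjunct reduces to B. A tiny exhaustive check of the identity, as a standalone sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>

/* Shape of the v1 condition: A || (!A && B). */
static bool
original_cond(bool a, bool b)
{
    return a || (!a && b);
}

/* Shape of the simplified condition suggested in review: A || B. */
static bool
simplified_cond(bool a, bool b)
{
    return a || b;
}

/* True if both conditions agree on all four input combinations. */
static bool
conditions_equivalent(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (original_cond(a, b) != simplified_cond(a, b))
                return false;
    return true;
}
```

Because || short-circuits, in both forms B is evaluated only when A is false, so the simplified condition still never computes a buffer index from an invalid pointer.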

It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
Doesn't that set it to the beginning of the newest initialized page?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Track Oldest Initialized WAL Buffer Page

From
Bharath Rupireddy
Date:
On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 07, 2023 at 07:30:00PM +0530, Bharath Rupireddy wrote:
> > +             /*
> > +              * Try updating oldest initialized XLog buffer page.
> > +              *
> > +              * Update it if we are initializing an XLog buffer page for the first
> > +              * time or if XLog buffers are full and we are wrapping around.
> > +              */
> > +             if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
> > +                     (!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
> > +                      XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
> > +             {
> > +                     Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
> > +
> > +                     XLogCtl->OldestInitializedPage = NewPageBeginPtr;
> > +             }
>
> nitpick: I think you can simplify the conditional to
>
>         if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
>                 XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)

Oh, yes, done that.

> It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
> Doesn't that set it to the beginning of the newest initialized page?

Yes, that's the intention, see below. OldestInitializedPage points to
the start address of the oldest initialized page, whereas
InitializedUpTo points to the end address of the newest initialized
page. With this, one can easily track all the WAL between
OldestInitializedPage and InitializedUpTo.

+        /*
+         * OldestInitializedPage and InitializedUpTo are always the starting
+         * and ending addresses, respectively, of (the same or different) XLog
+         * buffer pages. Hence, they can never be equal, even if there's only
+         * one initialized page in XLog buffers.
+         */
+        Assert(XLogCtl->OldestInitializedPage != XLogCtl->InitializedUpTo);
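A worked example of why that Assert holds, as a standalone sketch: XLOG_BLCKSZ is fixed at its default of 8192, the helper names are hypothetical, and initialized pages are assumed to be consecutive. A page start and a page end differ by at least one full page, so with n >= 1 initialized pages, InitializedUpTo - OldestInitializedPage = n * XLOG_BLCKSZ > 0.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* local stand-in type */
#define XLOG_BLCKSZ 8192        /* assumed default WAL block size */

/* Start address of the WAL page containing ptr. */
static XLogRecPtr
page_start(XLogRecPtr ptr)
{
    return ptr - ptr % XLOG_BLCKSZ;
}

/* End address (exclusive) of the WAL page containing ptr. */
static XLogRecPtr
page_end(XLogRecPtr ptr)
{
    return page_start(ptr) + XLOG_BLCKSZ;
}

/*
 * Value InitializedUpTo would have if npages consecutive pages, the first
 * of which starts at oldest_start, have been initialized.
 */
static XLogRecPtr
initialized_up_to(XLogRecPtr oldest_start, uint64_t npages)
{
    return oldest_start + npages * XLOG_BLCKSZ;
}
```

With a single initialized page containing LSN 20000, OldestInitializedPage would be page_start(20000) = 16384 and InitializedUpTo would be page_end(20000) = 24576, so the two differ even in the one-page case.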

Thanks for looking at it. I'm attaching the v2 patch, which addresses
the above review comment, for further review.

--
Bharath Rupireddy

Re: Track Oldest Initialized WAL Buffer Page

From
Nathan Bossart
Date:
On Tue, Feb 28, 2023 at 11:12:29AM +0530, Bharath Rupireddy wrote:
> On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
>> Doesn't that set it to the beginning of the newest initialized page?
> 
> Yes, that's the intention, see below. OldestInitializedPage points to
> the start address of the oldest initialized page whereas the
> InitializedUpTo points to the end address of the latest initialized
> page. With this, one can easily track all the WAL between
> OldestInitializedPage and InitializedUpTo.

This is where I'm confused.  Why would we set the variable for the start
address of the _oldest_ initialized page to the start address of the
_newest_ initialized page?  I must be missing something obvious, so sorry
if this is a silly question.

-- 
Nathan Bossart



Re: Track Oldest Initialized WAL Buffer Page

From
Bharath Rupireddy
Date:
On Wed, Mar 1, 2023 at 9:49 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 28, 2023 at 11:12:29AM +0530, Bharath Rupireddy wrote:
> > On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
> >> Doesn't that set it to the beginning of the newest initialized page?
> >
> > Yes, that's the intention, see below. OldestInitializedPage points to
> > the start address of the oldest initialized page whereas the
> > InitializedUpTo points to the end address of the latest initialized
> > page. With this, one can easily track all the WAL between
> > OldestInitializedPage and InitializedUpTo.
>
> This is where I'm confused.  Why would we set the variable for the start
> address of the _oldest_ initialized page to the start address of the
> _newest_ initialized page?  I must be missing something obvious, so sorry
> if this is a silly question.

That's the crux of the patch. Let me clarify it a bit.

Firstly, we try to set OldestInitializedPage at the end of recovery,
but only conditionally, that is, only when the last replayed WAL record
ends partway through a WAL page (i.e. the end of WAL is not
page-aligned).

Secondly, we set OldestInitializedPage when a WAL buffer page is
initialized for the first time, so the case missed by the condition
above is covered too.

Also, OldestInitializedPage isn't updated for every newly initialized
page, only when the buffer holding the previous OldestInitializedPage
is being reused, i.e. when wal_buffers is full and wraps around. Please
see the comment and the condition
XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx, which
holds true when we're wrapping around past the previous
OldestInitializedPage.

+        /*
+         * Try updating oldest initialized XLog buffer page.
+         *
+         * Update it if we are initializing an XLog buffer page for the first
+         * time or if XLog buffers are full and we are wrapping around.
+         */
+        if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
+            XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)
+        {
+            Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
+
+            XLogCtl->OldestInitializedPage = NewPageBeginPtr;
+        }
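To see what the wraparound condition XLogRecPtrToBufIdx(OldestInitializedPage) == nextidx detects, here is a toy model (not the patch's code) in which pages are initialized strictly in LSN order into a ring of slots, using the same modulo mapping that XLogRecPtrToBufIdx uses. The ring size, the `ring` struct, `init_next_page()` and `oldest_resident_start()` are all hypothetical. The model reports which page gets evicted and recomputes the oldest still-resident page by scanning, so it can be compared with whatever value a tracking variable holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;               /* local stand-in type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLOG_BLCKSZ 8192
#define NBUFFERS 4                          /* hypothetical ring size, in pages */

/* Same modulo mapping as XLogRecPtrToBufIdx(), with a fixed ring size. */
#define BufIdx(ptr) ((int) (((ptr) / XLOG_BLCKSZ) % NBUFFERS))

typedef struct
{
    XLogRecPtr  xlblocks[NBUFFERS];   /* end ptr of page in each slot, 0 = unused */
    XLogRecPtr  initialized_up_to;    /* end of the newest initialized page */
} ring;

/*
 * Initialize the page starting at initialized_up_to.  Returns true if its
 * slot already held a page (wraparound); that page's start goes to *evicted.
 */
static bool
init_next_page(ring *r, XLogRecPtr *evicted)
{
    XLogRecPtr  new_start = r->initialized_up_to;
    int         nextidx = BufIdx(new_start);
    bool        wrapped = r->xlblocks[nextidx] != InvalidXLogRecPtr;

    if (wrapped)
        *evicted = r->xlblocks[nextidx] - XLOG_BLCKSZ;
    r->xlblocks[nextidx] = new_start + XLOG_BLCKSZ;
    r->initialized_up_to = new_start + XLOG_BLCKSZ;
    return wrapped;
}

/* Start of the oldest page still resident, found by scanning all slots. */
static XLogRecPtr
oldest_resident_start(const ring *r)
{
    XLogRecPtr  min_end = InvalidXLogRecPtr;

    for (int i = 0; i < NBUFFERS; i++)
        if (r->xlblocks[i] != InvalidXLogRecPtr &&
            (min_end == InvalidXLogRecPtr || r->xlblocks[i] < min_end))
            min_end = r->xlblocks[i];
    return min_end == InvalidXLogRecPtr ? InvalidXLogRecPtr
                                        : min_end - XLOG_BLCKSZ;
}
```

Starting at LSN 16384 with four slots, the fifth page (start 49152) maps to the same slot as the page at 16384 and evicts it. In this model the oldest page that remains resident then starts at 16384 + XLOG_BLCKSZ = 24576, while the page just initialized (49152) is the newest one.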

--
Bharath Rupireddy



Re: Track Oldest Initialized WAL Buffer Page

From
Heikki Linnakangas
Date:
On 07/02/2023 16:00, Bharath Rupireddy wrote:
> Hi,
> 
> While working on [1], I was looking for a quick way to tell if a WAL
> record is present in the WAL buffers array without scanning but I
> couldn't find one.

/* The end-ptr of the page that contains the record */
expectedEndPtr = recptr + (XLOG_BLCKSZ - recptr % XLOG_BLCKSZ);

/* get the buffer where the record is, if it's in WAL buffers at all */
idx = XLogRecPtrToBufIdx(recptr);

/* prevent the WAL buffer from being evicted while we look at it */
LWLockAcquire(WALBufMappingLock, LW_SHARED);

/* Check if the page we're interested in is in the buffer */
found = XLogCtl->xlblocks[idx] == expectedEndPtr;

LWLockRelease(WALBufMappingLock);
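The lookup above works because a given page can only ever be loaded into one slot. A standalone sketch of the same arithmetic (not PostgreSQL code: a plain array stands in for XLogCtl->xlblocks, the slot count and the page_in_buffers() helper are hypothetical, and locking is left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* local stand-in type */
#define XLOG_BLCKSZ 8192
#define NBUFFERS 4              /* hypothetical number of WAL buffers */

/*
 * True if the WAL page containing recptr is loaded in the only slot it can
 * occupy.  xlblocks[] holds the end pointer of the page in each slot.
 */
static bool
page_in_buffers(const XLogRecPtr *xlblocks, XLogRecPtr recptr)
{
    /* End pointer of the page that contains the record. */
    XLogRecPtr  expectedEndPtr = recptr + (XLOG_BLCKSZ - recptr % XLOG_BLCKSZ);

    /* The slot that page maps to. */
    int         idx = (int) ((recptr / XLOG_BLCKSZ) % NBUFFERS);

    return xlblocks[idx] == expectedEndPtr;
}
```

For example, LSN 20000 lies 3616 bytes into the page starting at 16384, so the expected end pointer is 24576 and the slot is 2. An LSN exactly on a page boundary belongs to the page that starts there, so its expected end is a full XLOG_BLCKSZ past it.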

> Hence, I put up a patch that basically tracks the
> oldest initialized WAL buffer page, named OldestInitializedPage, in
> XLogCtl. With OldestInitializedPage, we can easily state the following
> properties of the WAL buffers array:
> 
> 1) At any given point in time, pages in the WAL buffers array are
> sorted in ascending order from OldestInitializedPage up to
> InitializedUpTo. Note that we verify this property in assert-enabled
> builds; see IsXLogBuffersArraySorted() in the patch for more details.
> 
> 2) OldestInitializedPage is monotonically increasing (by virtue of how
> postgres generates WAL records), that is, its value never decreases.
> This property lets someone read its value without a lock. There's no
> problem even if its value is slightly stale i.e. concurrently being
> updated. One can still use it for finding if a given WAL record is
> available in WAL buffers. At worst, one might get false positives
> (i.e. OldestInitializedPage may tell that the WAL record is available
> in WAL buffers, but when one actually looks at it, it isn't really
> available). This is more efficient and performant than acquiring a
> lock for reading. Note that we may not need a lock to read
> OldestInitializedPage but we need to update it holding
> WALBufMappingLock.

You actually hint at the above solution here, so I'm confused. If you're
OK with slightly stale results, you can skip the WALBufMappingLock
above too, and perform an atomic read of xlblocks[idx] instead.
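A minimal sketch of that lock-free variant, using C11 <stdatomic.h> rather than PostgreSQL's own atomics API (the array and function names are hypothetical, and this is not backend code). The atomic load rules out torn values, but since no lock is held, a true answer may already be stale by the time the caller acts on it, so it can only serve as a hint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* local stand-in type */
#define XLOG_BLCKSZ 8192
#define NBUFFERS 4              /* hypothetical number of WAL buffers */

/* End pointer of the page in each slot; zero-initialized (unused). */
static _Atomic XLogRecPtr xlblocks[NBUFFERS];

/* Writer side: publish that the page starting at start occupies its slot. */
static void
publish_page(XLogRecPtr start)
{
    int         idx = (int) ((start / XLOG_BLCKSZ) % NBUFFERS);

    atomic_store_explicit(&xlblocks[idx], start + XLOG_BLCKSZ,
                          memory_order_release);
}

/* Reader side: unlocked, possibly stale check for the page holding recptr. */
static bool
page_probably_in_buffers(XLogRecPtr recptr)
{
    XLogRecPtr  expectedEndPtr = recptr + (XLOG_BLCKSZ - recptr % XLOG_BLCKSZ);
    int         idx = (int) ((recptr / XLOG_BLCKSZ) % NBUFFERS);

    return atomic_load_explicit(&xlblocks[idx], memory_order_acquire)
        == expectedEndPtr;
}
```

A caller that needs to actually read the page would still have to recheck under the mapping lock, or otherwise verify after copying that the slot wasn't reused meanwhile.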

> 3) One can start traversing WAL buffers from OldestInitializedPage
> till InitializedUpTo to list out all valid WAL records and stats, and
> expose them via SQL-callable functions to users, for instance, as
> pg_walinspect functions.
> 
> 4) The WAL buffers array is inherently organized as a circular, sorted,
> rotated array with OldestInitializedPage as its pivot (first element),
> such that the LSNs of both the previous and the next buffer pages (if
> valid) are greater than OldestInitializedPage.

These properties are true; maybe we should document them explicitly in a
comment. But I don't see the point of tracking OldestInitializedPage. It
seems cheap enough that we could, if there's a need for it, but I don't
see the need.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Track Oldest Initialized WAL Buffer Page

From
Daniel Gustafsson
Date:
> On 3 Jul 2023, at 15:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> But I don't see the point of tracking OldestInitializedPage. It seems
> cheap enough that we could, if there's a need for it, but I don't see
> the need.

Based on the above comments, and the thread stalling, I am marking this
returned with feedback.  Please feel free to continue the discussion here and
re-open a new entry in a future CF if there is a new version of the patch.

--
Daniel Gustafsson




Re: Track Oldest Initialized WAL Buffer Page

From
Bharath Rupireddy
Date:
On Mon, Jul 3, 2023 at 6:57 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>

Thanks a lot for responding. Sorry for the late reply.

> On 07/02/2023 16:00, Bharath Rupireddy wrote:
> > Hi,
> >
> > While working on [1], I was looking for a quick way to tell if a WAL
> > record is present in the WAL buffers array without scanning but I
> > couldn't find one.
>
> /* The end-ptr of the page that contains the record */
> expectedEndPtr = recptr + (XLOG_BLCKSZ - recptr % XLOG_BLCKSZ);
>
> /* get the buffer where the record is, if it's in WAL buffers at all */
> idx = XLogRecPtrToBufIdx(recptr);
>
> /* prevent the WAL buffer from being evicted while we look at it */
> LWLockAcquire(WALBufMappingLock, LW_SHARED);
>
> /* Check if the page we're interested in is in the buffer */
> found = XLogCtl->xlblocks[idx] == expectedEndPtr;
>
> LWLockRelease(WALBufMappingLock);

This is exactly what I'm doing in the 0001 patch here
https://www.postgresql.org/message-id/CALj2ACU3ZYzjOv4vZTR+LFk5PL4ndUnbLS6E1vG2dhDBjQGy2A@mail.gmail.com.

My bad! I should have stated the requirement clearly: I want to avoid
taking WALBufMappingLock just to peek into wal_buffers and determine
whether the WAL buffer page containing the required WAL record exists.

> You actually hint at the above solution here, so I'm confused. If you're
> OK with slightly stale results, you can skip the WALBufMappingLock
> above too, and perform an atomic read of xlblocks[idx] instead.

I get that, and I see that GetXLogBuffer first reads xlblocks without a
lock and then, to confirm, takes the lock anyway in
AdvanceXLInsertBuffer:

     * However, we don't hold a lock while we read the value. If someone has
     * just initialized the page, it's possible that we get a "torn read" of
     * the XLogRecPtr if 64-bit fetches are not atomic on this platform. In
     * that case we will see a bogus value. That's ok, we'll grab the mapping
     * lock (in AdvanceXLInsertBuffer) and retry if we see anything else than
     * the page we're looking for. But it means that when we do this unlocked
     * read, we might see a value that appears to be ahead of the page we're
     * looking for. Don't PANIC on that, until we've verified the value while
     * holding the lock.
     */
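To make the "torn read" in that comment concrete: on a platform where a 64-bit load is done as two 32-bit loads, a reader racing with a writer can combine the high half of one value with the low half of the other. The sketch below only does the bit arithmetic (it does not race real threads); `torn_value()` is a hypothetical helper and the values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine the high 32 bits of one value with the low 32 bits of another,
 * as an unsynchronized pair of 32-bit loads could.
 */
static uint64_t
torn_value(uint64_t high_from, uint64_t low_from)
{
    return (high_from & 0xFFFFFFFF00000000ULL) |
           (low_from & 0x00000000FFFFFFFFULL);
}
```

If a slot's end pointer changes from 0xFFFFE000 to 0x100002000 (crossing a 4 GiB boundary), one torn mix reads as 0x1FFFFE000, far ahead of both real values, which is exactly why the code must not PANIC before rechecking under the lock. The other mix reads as 0x2000, far behind.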

The 0001 patch at
https://www.postgresql.org/message-id/CALj2ACU3ZYzjOv4vZTR+LFk5PL4ndUnbLS6E1vG2dhDBjQGy2A@mail.gmail.com
reads the WAL buffer page while holding WALBufMappingLock. Instead, the
patch could do the first check without WALBufMappingLock, something
like [1]:

[1]
{
    idx = XLogRecPtrToBufIdx(ptr);
    expectedEndPtr = ptr;
    expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;

    /*
     * Do a stale read of xlblocks without WALBufMappingLock. All the callers
     * of this function are expected to read WAL that's already flushed to disk
     * from WAL buffers. If this stale read says the requested WAL buffer page
     * doesn't exist, it means that the WAL buffer page either is being or has
     * already been replaced for reuse. If this stale read says the requested
     * WAL buffer page exists, we then take WALBufMappingLock and re-read the
     * xlblocks to ensure the WAL buffer page really exists and nobody is
     * replacing it meanwhile.
     */
    endptr = XLogCtl->xlblocks[idx];

    /* Requested WAL isn't available in WAL buffers. */
    if (expectedEndPtr != endptr)
        return NULL;

    /*
     * Requested WAL is available in WAL buffers, so recheck the existence
     * under the WALBufMappingLock and read if the page still exists, otherwise
     * return.
     */
    LWLockAcquire(WALBufMappingLock, LW_SHARED);

    endptr = XLogCtl->xlblocks[idx];

    /* Page was replaced meanwhile; release the lock before bailing out. */
    if (expectedEndPtr != endptr)
    {
        LWLockRelease(WALBufMappingLock);
        return NULL;
    }

    /*
     * We found the WAL buffer page containing the given XLogRecPtr. Get the
     * starting address of the page and a pointer to the right location in
     * that page for the given XLogRecPtr.
     */
    page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
    data = page + ptr % XLOG_BLCKSZ;

    /*
     * In this sketch WALBufMappingLock is still held here: the caller must
     * copy the data out and then release the lock, so that the page can't
     * be replaced while it is being read.
     */
    return data;
}
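The control flow of [1] can be exercised outside PostgreSQL with a toy model. In this sketch, a C11 atomic_flag spinlock stands in for WALBufMappingLock (it is exclusive, whereas the real code takes the LWLock in shared mode), atomic loads stand in for the plain xlblocks reads (avoiding torn reads), the data are copied out while the lock is held so the page cannot be replaced mid-copy, and the lock is released on every path. All names here are hypothetical, and the writer is assumed to update a slot only while holding the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* local stand-in type */
#define XLOG_BLCKSZ 8192
#define NBUFFERS 4              /* hypothetical number of WAL buffers */

static _Atomic XLogRecPtr xlblocks[NBUFFERS];        /* end ptr per slot */
static char pages[NBUFFERS * XLOG_BLCKSZ];           /* page contents */
static atomic_flag mapping_lock = ATOMIC_FLAG_INIT;  /* stand-in lock */

static void
lock_acquire(void)
{
    while (atomic_flag_test_and_set(&mapping_lock))
        ;                       /* spin */
}

static void
lock_release(void)
{
    atomic_flag_clear(&mapping_lock);
}

/* Writer: install the page starting at start, filled with byte fill. */
static void
install_page(XLogRecPtr start, char fill)
{
    int         idx = (int) ((start / XLOG_BLCKSZ) % NBUFFERS);

    lock_acquire();
    memset(pages + (size_t) idx * XLOG_BLCKSZ, fill, XLOG_BLCKSZ);
    atomic_store(&xlblocks[idx], start + XLOG_BLCKSZ);
    lock_release();
}

/* Copy len bytes at ptr (within one page) into dst; false if not buffered. */
static bool
read_from_buffers(XLogRecPtr ptr, char *dst, size_t len)
{
    int         idx = (int) ((ptr / XLOG_BLCKSZ) % NBUFFERS);
    XLogRecPtr  expectedEndPtr = ptr + (XLOG_BLCKSZ - ptr % XLOG_BLCKSZ);

    assert(len <= (size_t) (expectedEndPtr - ptr));

    /* Unlocked, possibly stale check: cheap early exit. */
    if (atomic_load(&xlblocks[idx]) != expectedEndPtr)
        return false;

    /* Recheck under the lock; the page can't be replaced while we hold it. */
    lock_acquire();
    if (atomic_load(&xlblocks[idx]) != expectedEndPtr)
    {
        lock_release();
        return false;
    }
    memcpy(dst, pages + (size_t) idx * XLOG_BLCKSZ + ptr % XLOG_BLCKSZ, len);
    lock_release();
    return true;
}
```

Copying inside the function, rather than returning a pointer into the buffer, keeps the lock's scope local, so no caller can forget to release it.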

--
Bharath Rupireddy