Re: Improve WALRead() to suck data directly from WAL buffers when possible - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date
Msg-id 20230306220027.GA3122138@nathanxps13
Whole thread Raw
In response to Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Improve WALRead() to suck data directly from WAL buffers when possible  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
+void
+XLogReadFromBuffers(XLogRecPtr startptr,
+                    TimeLineID tli,
+                    Size count,
+                    char *buf,
+                    Size *read_bytes)

Since this function presently doesn't return anything, can we have it
return the number of bytes read instead of storing it in a pointer
variable?

+    ptr = startptr;
+    nbytes = count;
+    dst = buf;

These variables seem superfluous.

+            /*
+             * Requested WAL isn't available in WAL buffers, so return with
+             * what we have read so far.
+             */
+            break;

nitpick: I'd move this to the top so that you can save a level of
indentation.

    if (expectedEndPtr != endptr)
        break;

    ... logic for when the data is found in the WAL buffers ...

+                /*
+                 * All the bytes are not in one page. Read available bytes on
+                 * the current page, copy them over to output buffer and
+                 * continue to read remaining bytes.
+                 */

Is it possible to memcpy more than a page at a time?

+            /*
+             * The fact that we acquire WALBufMappingLock while reading the WAL
+             * buffer page itself guarantees that no one else initializes it or
+             * makes it ready for next use in AdvanceXLInsertBuffer().
+             *
+             * However, we perform basic page header checks for ensuring that
+             * we are not reading a page that just got initialized. Callers
+             * will anyway perform extensive page-level and record-level
+             * checks.
+             */

Hm.  I wonder if we should make these assertions instead.

+    elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for given LSN %X/%X, Timeline ID %u",
+         *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli);

I definitely don't think we should put an elog() in this code path.
Perhaps this should be guarded behind WAL_DEBUG.

+        /*
+         * Check if we have read fully (hit), partially (partial hit) or
+         * nothing (miss) from WAL buffers. If we have read either partially or
+         * nothing, then continue to read the remaining bytes the usual way,
+         * that is, read from WAL file.
+         */
+        if (count == read_bytes)
+        {
+            /* Buffer hit, so return. */
+            return true;
+        }
+        else if (read_bytes > 0 && count > read_bytes)
+        {
+            /*
+             * Buffer partial hit, so reset the state to count the read bytes
+             * and continue.
+             */
+            buf += read_bytes;
+            startptr += read_bytes;
+            count -= read_bytes;
+        }
+
+        /* Buffer miss i.e., read_bytes = 0, so continue */

I think we can simplify this.  We effectively take the same action any time
"count" doesn't equal "read_bytes", so there's no need for the "else if".

    if (count == read_bytes)
        return true;

    buf += read_bytes;
    startptr += read_bytes;
    count -= read_bytes;

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: using memoize in in paralel query decreases performance
Next
From: Melanie Plageman
Date:
Subject: Re: add PROCESS_MAIN to VACUUM