Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: WIP: WAL prefetch (another approach)
Date
Msg-id 20220309064633.woyf53w3odnhltah@jrouhaud
Whole thread Raw
In response to Re: WIP: WAL prefetch (another approach)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: WIP: WAL prefetch (another approach)
List pgsql-hackers
Hi,

On Tue, Mar 08, 2022 at 06:15:43PM +1300, Thomas Munro wrote:
> On Wed, Dec 29, 2021 at 5:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > https://github.com/macdice/postgres/tree/recovery-prefetch-ii
>
> Here's a rebase.  This mostly involved moving hunks over to the new
> xlogrecovery.c file.  One thing that seemed a little strange to me
> with the new layout is that xlogreader is now a global variable.  I
> followed that pattern and made xlogprefetcher a global variable too,
> for now.

I for now went through 0001, TL;DR the patch looks good to me.  I have a few
minor comments though, mostly to make things a bit clearer (at least to me).

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 2340dc247b..c129df44ac 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -407,10 +407,10 @@ XLogDumpRecordLen(XLogReaderState *record, uint32 *rec_len, uint32 *fpi_len)
     * add an accessor macro for this.
     */
    *fpi_len = 0;
+   for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
    {
        if (XLogRecHasBlockImage(record, block_id))
-           *fpi_len += record->blocks[block_id].bimg_len;
+           *fpi_len += record->record->blocks[block_id].bimg_len;
    }
(and similar in that file, xlogutils.c and xlogreader.c)

This could use XLogRecGetBlock?  Note that this macro is for now never used.

xlogreader.c also has some similar forgotten code that could use
XLogRecMaxBlockId.


+ * See if we can release the last record that was returned by
+ * XLogNextRecord(), to free up space.
+ */
+void
+XLogReleasePreviousRecord(XLogReaderState *state)

The comment seems a bit misleading, as I first understood it as it could be
optional even if the record exists.  Maybe something more like "Release the
last record if any"?


+    * Remove it from the decoded record queue.  It must be the oldest item
+    * decoded, decode_queue_tail.
+    */
+   record = state->record;
+   Assert(record == state->decode_queue_tail);
+   state->record = NULL;
+   state->decode_queue_tail = record->next;

The naming is a bit counter intuitive to me, as before reading the rest of the
code I wasn't expecting the item at the tail of the queue to have a next
element.  Maybe just inverting tail and head would make it clearer?


+DecodedXLogRecord *
+XLogNextRecord(XLogReaderState *state, char **errormsg)
+{
[...]
+       /*
+        * state->EndRecPtr is expected to have been set by the last call to
+        * XLogBeginRead() or XLogNextRecord(), and is the location of the
+        * error.
+        */
+
+       return NULL;

The comment should refer to XLogFindNextRecord, not XLogNextRecord?
Also, is it worth an assert (likely at the top of the function) for that?


 XLogRecord *
 XLogReadRecord(XLogReaderState *state, char **errormsg)
+{
[...]
+   if (decoded)
+   {
+       /*
+        * XLogReadRecord() returns a pointer to the record's header, not the
+        * actual decoded record.  The caller will access the decoded record
+        * through the XLogRecGetXXX() macros, which reach the decoded
+        * recorded as xlogreader->record.
+        */
+       Assert(state->record == decoded);
+       return &decoded->header;

I find it a bit weird to mention XLogReadRecord() as it's the current function.


+/*
+ * Allocate space for a decoded record.  The only member of the returned
+ * object that is initialized is the 'oversized' flag, indicating that the
+ * decoded record wouldn't fit in the decode buffer and must eventually be
+ * freed explicitly.
+ *
+ * Return NULL if there is no space in the decode buffer and allow_oversized
+ * is false, or if memory allocation fails for an oversized buffer.
+ */
+static DecodedXLogRecord *
+XLogReadRecordAlloc(XLogReaderState *state, size_t xl_tot_len, bool allow_oversized)

Is it worth clearly stating that it's the reponsability of the caller to update
the decode_buffer_head (with the real size) after a successful decoding of this
buffer?


+   if (unlikely(state->decode_buffer == NULL))
+   {
+       if (state->decode_buffer_size == 0)
+           state->decode_buffer_size = DEFAULT_DECODE_BUFFER_SIZE;
+       state->decode_buffer = palloc(state->decode_buffer_size);
+       state->decode_buffer_head = state->decode_buffer;
+       state->decode_buffer_tail = state->decode_buffer;
+       state->free_decode_buffer = true;
+   }

Maybe change XLogReaderSetDecodeBuffer to also handle allocation and use it
here too?  Otherwise XLogReaderSetDecodeBuffer should probably go in 0002 as
the only caller is the recovery prefetching.

+   return decoded;
+}

I would find it a bit clearer to explicitly return NULL here.


    readOff = ReadPageInternal(state, targetPagePtr,
                               Min(targetRecOff + SizeOfXLogRecord, XLOG_BLCKSZ));
-   if (readOff < 0)
+   if (readOff == XLREAD_WOULDBLOCK)
+       return XLREAD_WOULDBLOCK;
+   else if (readOff < 0)

ReadPageInternal comment should be updated to mention the new XLREAD_WOULDBLOCK
possible return value.

It's also not particulary obvious why XLogFindNextRecord() doesn't check for
this value.  AFAICS callers don't (and should never) call it with a
nonblocking == true state, maybe add an assert for that?


@@ -468,7 +748,7 @@ restart:
            if (pageHeader->xlp_info & XLP_FIRST_IS_OVERWRITE_CONTRECORD)
            {
                state->overwrittenRecPtr = RecPtr;
-               ResetDecoder(state);
+               //ResetDecoder(state);

AFAICS this is indeed not necessary anymore, so it can be removed?


 static void
 ResetDecoder(XLogReaderState *state)
 {
[...]
+   /* Reset the decoded record queue, freeing any oversized records. */
+   while ((r = state->decode_queue_tail))

nit: I think it's better to explicitly check for the assignment being != NULL,
and existing code is more frequently written this way AFAICS.


+/* Return values from XLogPageReadCB. */
+typedef enum XLogPageReadResultResult

typo



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Logical replication timeout problem
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Optionally automatically disable logical replication subscriptions on error