Re: Adding SMGR discriminator to buffer tags - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Adding SMGR discriminator to buffer tags
Date
Msg-id CA+hUKG+8awhBh0ZZ9=2rRuXj8MA4JF2AKo5fqMxNrOX0kN8Ctw@mail.gmail.com
Whole thread Raw
In response to Re: Adding SMGR discriminator to buffer tags  (Shawn Debnath <sdn@amazon.com>)
Responses Re: Adding SMGR discriminator to buffer tags  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Jul 12, 2019 at 11:19 AM Shawn Debnath <sdn@amazon.com> wrote:
> On Fri, Jul 12, 2019 at 10:16:21AM +1200, Thomas Munro wrote:
> +
> +               /* TODO: How should we handle other smgr IDs? */
> +               if (smgrid != SMGR_MD)
>                         continue;
>
> All files are copied verbatim from source to target except for relation
> files. So this would include slru data and undo data. From what I read
> in the docs, I do not believe we need any special handling for either
> new SMGRs and your current code should suffice.
>
> process_block_change() is very relation specific so if different
> handling is required by different SMGRs, it would make sense to call on
> smgr specific functions instead.

Right.  And since undo and slru etc data will be WAL-logged with block
references, it's entirely possible to teach it to scan them properly,
though it's not clear whether it's worth doing that.  Ok, good, TODO
removed.

> Can't wait for the SMGR_MD to SMGR_REL change :-) It will make
> understanding this code a tad bit easier.

Or could we retrofit different words that start with M and D?

Here's a new version of the patch set (ie the first 3 patches in the
undo patch set, and the part that I think you need for slru work),
this time with the pg_buffercache changes as a separate commit since
it's somewhat independent and has a different (partial) reviewer.

I was starting to think about whether I might be able to commit these,
but now I see that this increase in WAL size is probably not
acceptable:

@@ -727,6 +734,8 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
                }
                if (!samerel)
                {
+                       memcpy(scratch, ®buf->smgrid, sizeof(SmgrId));
+                       scratch += sizeof(SmgrId);
                        memcpy(scratch, ®buf->rnode, sizeof(RelFileNode));
                        scratch += sizeof(RelFileNode);
                }

@@ -1220,8 +1221,10 @@ DecodeXLogRecord(XLogReaderState *state,
XLogRecord *record, char **errormsg)
                        }
                        if (!(fork_flags & BKPBLOCK_SAME_REL))
                        {
+                               COPY_HEADER_FIELD(&blk->smgrid, sizeof(SmgrId));
                                COPY_HEADER_FIELD(&blk->rnode,
sizeof(RelFileNode));
                                rnode = &blk->rnode;
+                               smgrid = blk->smgrid;
                        }

That's an enum, so it works out to a word per record.  The obvious way
to avoid increasing the size is shove the SMGR ID into the same space
that holds the forknum.  Unlike BufferTag, where forknum currently
swims in 32 bits which this patch chops in half, XLogRecorBlockHeader
is already crammed into a uint8 fork_flags of which it has only the
lower nibble, and the upper nibble is used for eg BKP_BLOCK_xxx flag
bits, and there isn't even a spare bit to say 'has non-zero SMGR ID'.
Rats.  I suppose I could change it to a byte.  I wonder if one extra
byte per WAL record is acceptable.  Anyone?

-- 
Thomas Munro
https://enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Paul Guo
Date:
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Next
From: "Daniel Westermann (DWE)"
Date:
Subject: Documentation fix for adding a column with a default value