Re: Extensible Rmgr for Table AMs - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Extensible Rmgr for Table AMs
Date
Msg-id 20220118095332.6xtlcjoyxobv6cbk@jrouhaud
Whole thread Raw
In response to Re: Extensible Rmgr for Table AMs  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Extensible Rmgr for Table AMs
List pgsql-hackers
Hi,

On Mon, Jan 17, 2022 at 12:42:06AM -0800, Jeff Davis wrote:
> On Mon, 2021-11-08 at 15:36 -0800, Jeff Davis wrote:
> > The attached patch (against v14, so it's easier to test columnar) is
> > somewhat like a simplified version of [3] combined with refactoring
> > to
> > make decoding a part of the rmgr.
> 
> New patches attached (v3). Essentially the content as v2, but split
> into 3 patches and rebased.
> 
> Review on patch 0001 would be helpful. It makes decoding just another
> method of rmgr, which makes a lot of sense to me from a code
> organization standpoint regardless of the other patches. Is there any
> reason not to do that?

I think that it's a clean and sensible approach, so +1 for me.

There's a bit of 0003 present in 002:

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..612b1b3723 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -738,7 +738,7 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                              (uint32) SizeOfXLogRecord, record->xl_tot_len);
        return false;
    }
-   if (record->xl_rmid > RM_MAX_ID)
+   if (record->xl_rmid > RM_MAX_ID && record->xl_rmid < RM_CUSTOM_MIN_ID)
    {

> 
> The other patches then make rmgr extensible, which in turn makes
> decoding extensible and solves the logical replication problem for
> custom table AMs. The most obvious way to make rmgr extensible would be
> to expand the rmgr table, but I was concerned about making that dynamic
> (right now the structure is entirely constant and perhaps that's
> important for some optimizations?). So, at the cost of complexity I
> made a separate, dynamic rmgr table to hold the custom entries.

I'm not sure about performance overhead, but allowing extension to access the
main table directly seems a bit scary.  We definitely accepted some overhead
with the various extensible support, and this new GetRmgr() doesn't look like
it's going to cost a lot for the builtin case, especially compared to the cost
of any of the code associated with the rmgr.

Also, to answer your initial email I think it's a better way to go compared to
your previous approach, given the limitations and performance of the generic
xlog infrastructure, and hopefully index AMs should be able to rely on this
too.

A few random comments on 0003:

 #define RM_MAX_ID              (RM_NEXT_ID - 1)
+#define RM_CUSTOM_MIN_ID       128
+#define RM_CUSTOM_MAX_ID       UINT8_MAX

It would be a good idea to add a StaticAssertStmt here to make sure that
there's no overlap in the ranges.

+
+/*
+ * RmgrId to use for extensions that require an RmgrId, but are still in
+ * development and have not reserved their own unique RmgrId yet. See:
+ * https://wiki.postgresql.org/wiki/ExtensibleRmgr
+ */
+#define RM_EXPERIMENTAL_ID     128

I'm a bit dubious about the whole "register your ID in the Wiki" approach.  It
might not be a problem since there probably won't be hundreds of users, and I
don't have any better suggestion since it has to be consistent across nodes.

+   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
+        rmgr->rm_name, rmid);

Should it be a DEBUG message instead?  Also s/customer/custom/

+RmgrData
+GetCustomRmgr(RmgrId rmid)
+{
+   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
+       elog(PANIC, "custom rmgr id %d out of range", rmid);

Should this be an assert?

+#define GetBuiltinRmgr(rmid) RmgrTable[(rmid)]
+#define GetRmgr(rmid) ((rmid < RM_CUSTOM_MIN_ID) ? \
+                      GetBuiltinRmgr(rmid) : GetCustomRmgr(rmid))

rmid should be protected in the macro.

> The custom rmgr API would probably be marked "experimental" for a
> while, and I don't expect lots of people to use it given the challenges
> of a production-quality table AM. But it's still important, because
> without it a table AM has no chance to participate in logical
> replication.

How you plan to mark it experimental?



pgsql-hackers by date:

Previous
From: Juan José Santamaría Flecha
Date:
Subject: [PATCH] allow src/tools/msvc/clean.bat script to be called from the root of the source tree
Next
From: Peter Eisentraut
Date:
Subject: Re: Replace uses of deprecated Python module distutils.sysconfig