Thread: Extensible Rmgr for Table AMs

Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
Motivation:

I'm working on a columnar compression AM[0]. Currently, it uses generic
xlog, which works for crash recovery and physical replication, but not
logical decoding/replication.

Extensible rmgr would enable the table AM to support its own
redo/decode hooks and WAL format, so that it could support crash
recovery, physical replication, and logical replication.

Background:

I submitted another patch[0] to add new logical records, which could be
used to support logical decoding directly, without the need for
extensible rmgr and without any assumptions about the table AM. This
was designed to be easy to use, but inefficient. Amit raised
concerns[1] about whether it could meet the needs of zheap. Andres
suggested (off-list) that it would be better to just tackle the
extensible rmgr problem.

The idea for extensible rmgr has been proposed before[3]. The biggest
argument against it seemed to be that there was no complete use
case[4], so the worry was that something would be left out. Columnar is
complete enough that I think it qualifies as a good use case.

A subsequent proposal[5] was shot down because of a (potential?) need
for catalog access[6]. The attached patch does not use the catalog;
instead, it relies on table AM authors choosing IDs that don't conflict
with each other. This seems like a reasonable answer, considering that
there will likely be very few table AMs that go far enough to fully
support WAL including decoding.

Are there any other major arguments/objections that I missed?

Proposal:

The attached patch (against v14, so it's easier to test columnar) is
somewhat like a simplified version of [3] combined with refactoring to
make decoding a part of the rmgr.

  * adds a new RmgrData method rm_decode
  * refactors decode.c to use
the new method
  * add a layer of indirection GetRmgr to find an rmgr
   
 * fast path to find builtin rmgr in RmgrTable
    * to find a custom
rmgr, traverses list of custom rmgrs that
      are currently loaded
(unlikely to ever be more than a few)
  * rmgr IDs from 0-127 are
reserved for builtin rmgrs
  * rmgr IDs from 128-255 are reserved for
custom rmgrs
    * table AM authors need to avoid collisions between
rmgr IDs

I have tested with columnar using a simple WAL format for logical
decoding only, and I'm still using generic xlog for recovery and
physical replication. I haven't tested the redo path, or how easy it
might be to do something like generic xlog.

Questions:

0. Do we want to go this route, or something simpler like my other
proposal, which introduces new logical record types[0]?

1. I am allocating the custom rmgr list in TopMemoryContext, and it
only works when loading as a part of shared_preload_libraries. This
avoids the need for shared memory in Simon's patch[3]. Is that the
right thing to do?

2. If we go this route, what do we do with generic xlog? It seems like
a half feature, since it doesn't work with logical decoding.

3. If the custom rmgr throws an error during redo, the server won't
start. Should we have a GUC to turn non-builtin redo into a no-op to
reduce the impact of bugs in the implementation of a custom rmgr?

4. Do we want to encourage index AMs to use this mechanism as well? I
didn't really look into how suitable it is, but at a high level it
seems reasonable.

Regards,
    Jeff Davis

[0] 
https://postgr.es/m/20ee0b0ae6958804a88fe9580157587720faf664.camel@j-davis.com
[1] 
https://postgr.es/m/CAA4eK1JVDnbQ80ULdZuhzQkzr_yYhfON-tg%3Dd1U7aWjK_R1ixQ%40mail.gmail.com
[2] https://github.com/citusdata/citus/tree/master/src/backend/columnar
[3] https://postgr.es/m/1229541840.4793.79.camel%40ebony.2ndQuadrant
[4] https://postgr.es/m/20992.1232667957%40sss.pgh.pa.us
[5] https://postgr.es/m/1266774840.7341.29872.camel%40ebony
[6] https://postgr.es/m/26134.1266776040%40sss.pgh.pa.us



Attachment

Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Mon, 2021-11-08 at 15:36 -0800, Jeff Davis wrote:
> The attached patch (against v14, so it's easier to test columnar) is
> somewhat like a simplified version of [3] combined with refactoring
> to
> make decoding a part of the rmgr.

I created a wiki page here: 
https://wiki.postgresql.org/wiki/ExtensibleRmgr

To coordinate reservation of RmgrIds, to avoid conflicts. I don't
expect it to be a practical problem given how much work it takes to
create a new table AM that needs full WAL support, but might as well
have some transparency on how to choose a new RmgrId.

I also updated the patch to point to the wiki page in the comments, and
added in a new RM_EXPERIMENTAL_ID that can be used while an extension
is still in development. Hopefully this will prevent people reserving
lots of RmgrIds for extensions that never get released.

Regards,
    Jeff Davis


Attachment

Re: Extensible Rmgr for Table AMs

From
Robert Haas
Date:
On Mon, Nov 8, 2021 at 6:36 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Extensible rmgr would enable the table AM to support its own
> redo/decode hooks and WAL format, so that it could support crash
> recovery, physical replication, and logical replication.

Without taking a position on your implementation, which I have not
studied, I like this idea in concept and I think it's an important
goal.

> Are there any other major arguments/objections that I missed?

ISTR some discussion of the fact that uninstalling the extension that
uses this facility, or failing to install it on your standby, will
lead to an unusable database. Personally, I don't see that as a big
problem: we should just document that if you choose to use an
extension like this, then (1) it needs to be installed on all standbys
and (2) if you ever want to get rid of it, you need to stop using it,
drop all the objects created with it, and then wait until all the WAL
previously generated by that extension is gone not only from pg_wal
but from any archived WAL files or backups that you intend to use with
that cluster before you actually nuke it. Users who don't want to
abide by those restrictions need not install such extensions. Users
who don't read the documentation might end up sad, but it doesn't seem
particularly likely, and it's hardly the only part of the
documentation that users shouldn't ignore.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Mon, 2021-11-08 at 15:36 -0800, Jeff Davis wrote:
> The attached patch (against v14, so it's easier to test columnar) is
> somewhat like a simplified version of [3] combined with refactoring
> to
> make decoding a part of the rmgr.

New patches attached (v3). Essentially the content as v2, but split
into 3 patches and rebased.

Review on patch 0001 would be helpful. It makes decoding just another
method of rmgr, which makes a lot of sense to me from a code
organization standpoint regardless of the other patches. Is there any
reason not to do that?

The other patches then make rmgr extensible, which in turn makes
decoding extensible and solves the logical replication problem for
custom table AMs. The most obvious way to make rmgr extensible would be
to expand the rmgr table, but I was concerned about making that dynamic
(right now the structure is entirely constant and perhaps that's
important for some optimizations?). So, at the cost of complexity I
made a separate, dynamic rmgr table to hold the custom entries.

The custom rmgr API would probably be marked "experimental" for a
while, and I don't expect lots of people to use it given the challenges
of a production-quality table AM. But it's still important, because
without it a table AM has no chance to participate in logical
replication.

Regards,
    Jeff Davis


Attachment

Re: Extensible Rmgr for Table AMs

From
Julien Rouhaud
Date:
Hi,

On Mon, Jan 17, 2022 at 12:42:06AM -0800, Jeff Davis wrote:
> On Mon, 2021-11-08 at 15:36 -0800, Jeff Davis wrote:
> > The attached patch (against v14, so it's easier to test columnar) is
> > somewhat like a simplified version of [3] combined with refactoring
> > to
> > make decoding a part of the rmgr.
> 
> New patches attached (v3). Essentially the content as v2, but split
> into 3 patches and rebased.
> 
> Review on patch 0001 would be helpful. It makes decoding just another
> method of rmgr, which makes a lot of sense to me from a code
> organization standpoint regardless of the other patches. Is there any
> reason not to do that?

I think that it's a clean and sensible approach, so +1 for me.

There's a bit of 0003 present in 002:

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..612b1b3723 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -738,7 +738,7 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                              (uint32) SizeOfXLogRecord, record->xl_tot_len);
        return false;
    }
-   if (record->xl_rmid > RM_MAX_ID)
+   if (record->xl_rmid > RM_MAX_ID && record->xl_rmid < RM_CUSTOM_MIN_ID)
    {

> 
> The other patches then make rmgr extensible, which in turn makes
> decoding extensible and solves the logical replication problem for
> custom table AMs. The most obvious way to make rmgr extensible would be
> to expand the rmgr table, but I was concerned about making that dynamic
> (right now the structure is entirely constant and perhaps that's
> important for some optimizations?). So, at the cost of complexity I
> made a separate, dynamic rmgr table to hold the custom entries.

I'm not sure about performance overhead, but allowing extension to access the
main table directly seems a bit scary.  We definitely accepted some overhead
with the various extensible support, and this new GetRmgr() doesn't look like
it's going to cost a lot for the builtin case, especially compared to the cost
of any of the code associated with the rmgr.

Also, to answer your initial email I think it's a better way to go compared to
your previous approach, given the limitations and performance of the generic
xlog infrastructure, and hopefully index AMs should be able to rely on this
too.

A few random comments on 0003:

 #define RM_MAX_ID              (RM_NEXT_ID - 1)
+#define RM_CUSTOM_MIN_ID       128
+#define RM_CUSTOM_MAX_ID       UINT8_MAX

It would be a good idea to add a StaticAssertStmt here to make sure that
there's no overlap in the ranges.

+
+/*
+ * RmgrId to use for extensions that require an RmgrId, but are still in
+ * development and have not reserved their own unique RmgrId yet. See:
+ * https://wiki.postgresql.org/wiki/ExtensibleRmgr
+ */
+#define RM_EXPERIMENTAL_ID     128

I'm a bit dubious about the whole "register your ID in the Wiki" approach.  It
might not be a problem since there probably won't be hundreds of users, and I
don't have any better suggestion since it has to be consistent across nodes.

+   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
+        rmgr->rm_name, rmid);

Should it be a DEBUG message instead?  Also s/customer/custom/

+RmgrData
+GetCustomRmgr(RmgrId rmid)
+{
+   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
+       elog(PANIC, "custom rmgr id %d out of range", rmid);

Should this be an assert?

+#define GetBuiltinRmgr(rmid) RmgrTable[(rmid)]
+#define GetRmgr(rmid) ((rmid < RM_CUSTOM_MIN_ID) ? \
+                      GetBuiltinRmgr(rmid) : GetCustomRmgr(rmid))

rmid should be protected in the macro.

> The custom rmgr API would probably be marked "experimental" for a
> while, and I don't expect lots of people to use it given the challenges
> of a production-quality table AM. But it's still important, because
> without it a table AM has no chance to participate in logical
> replication.

How you plan to mark it experimental?



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Tue, 2022-01-18 at 17:53 +0800, Julien Rouhaud wrote:
> > Review on patch 0001 would be helpful. It makes decoding just
> > another
> > method of rmgr, which makes a lot of sense to me from a code
> > organization standpoint regardless of the other patches. Is there
> > any
> > reason not to do that?
> 
> I think that it's a clean and sensible approach, so +1 for me.

Thank you, committed 0001. Other patches not committed yet.

> There's a bit of 0003 present in 002:

I just squashed 0002 and 0003 together. Not large enough to keep
separate.

> A few random comments on 0003:
> 
>  #define RM_MAX_ID              (RM_NEXT_ID - 1)
> +#define RM_CUSTOM_MIN_ID       128
> +#define RM_CUSTOM_MAX_ID       UINT8_MAX
> 
> It would be a good idea to add a StaticAssertStmt here to make sure
> that
> there's no overlap in the ranges.

Done.

> +
> +/*
> + * RmgrId to use for extensions that require an RmgrId, but are
> still in
> + * development and have not reserved their own unique RmgrId yet.
> See:
> + * https://wiki.postgresql.org/wiki/ExtensibleRmgr
> + */
> +#define RM_EXPERIMENTAL_ID     128
> 
> I'm a bit dubious about the whole "register your ID in the Wiki"
> approach.  It
> might not be a problem since there probably won't be hundreds of
> users, and I
> don't have any better suggestion since it has to be consistent across
> nodes.

Agree, but I don't see a better approach, either.

I do some sanity checking, which should catch collisions when they
happen.

> +   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
> +        rmgr->rm_name, rmid);
> 
> Should it be a DEBUG message instead?  Also s/customer/custom/

It seems like a fairly important thing to have in the log. Only a
couple extensions will ever encounter this message, and only at server
start.

Typo fixed.

> +RmgrData
> +GetCustomRmgr(RmgrId rmid)
> +{
> +   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
> +       elog(PANIC, "custom rmgr id %d out of range", rmid);
> 
> Should this be an assert?

If we make it an Assert, then it won't be caught in production builds.

> +#define GetBuiltinRmgr(rmid) RmgrTable[(rmid)]
> +#define GetRmgr(rmid) ((rmid < RM_CUSTOM_MIN_ID) ? \
> +                      GetBuiltinRmgr(rmid) : GetCustomRmgr(rmid))
> 
> rmid should be protected in the macro.

Done.

> How you plan to mark it experimental?

I suppose it doesn't need to be marked explicitly -- there are other
APIs that change. For instance, the ProcessUtility_hook changed, and
that's used much more widely.

As long as we generally agree that some kind of custom rmgrs are the
way to go, if we change the API or implementation around from version
to version I can easily update my table access method.

Regards,
    Jeff Davis


Attachment

Re: Extensible Rmgr for Table AMs

From
Julien Rouhaud
Date:
Hi,

On Mon, Jan 31, 2022 at 06:36:59PM -0800, Jeff Davis wrote:
> On Tue, 2022-01-18 at 17:53 +0800, Julien Rouhaud wrote:
> 
> > There's a bit of 0003 present in 002:
> 
> I just squashed 0002 and 0003 together. Not large enough to keep
> separate.

Agreed.

> > +   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
> > +        rmgr->rm_name, rmid);
> > 
> > Should it be a DEBUG message instead?  Also s/customer/custom/
> 
> It seems like a fairly important thing to have in the log. Only a
> couple extensions will ever encounter this message, and only at server
> start.

Ok.

> > +RmgrData
> > +GetCustomRmgr(RmgrId rmid)
> > +{
> > +   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
> > +       elog(PANIC, "custom rmgr id %d out of range", rmid);
> > 
> > Should this be an assert?
> 
> If we make it an Assert, then it won't be caught in production builds.

Sure, but it seems something so fundamental that it should get done right
during the development phase rather than spending cycles in optimized builds to
check for it.
But even if that happened it would get caught by the final "not found" PANIC
anyway, and for end users I don't think that getting this error instead would
make much difference.

> > How you plan to mark it experimental?
> 
> I suppose it doesn't need to be marked explicitly -- there are other
> APIs that change. For instance, the ProcessUtility_hook changed, and
> that's used much more widely.
> As long as we generally agree that some kind of custom rmgrs are the
> way to go, if we change the API or implementation around from version
> to version I can easily update my table access method.

Agreed, API breaks happen often and that's not really a problem for third-party
extensions, especially if that comes with hard compile failure.

Other than that the patch looks good to me, as you said we just need a decision
on whether custom rmgrs are wanted or not.



Re: Extensible Rmgr for Table AMs

From
Julien Rouhaud
Date:
On Tue, Feb 01, 2022 at 12:39:38PM +0800, Julien Rouhaud wrote:
> 
> Other than that the patch looks good to me, as you said we just need a decision
> on whether custom rmgrs are wanted or not.

One last thing, did you do some benchmark with a couple custom rmgr to see how
much the O(n) access is showing up in profiles?



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Tue, 2022-02-01 at 20:45 +0800, Julien Rouhaud wrote:
> On Tue, Feb 01, 2022 at 12:39:38PM +0800, Julien Rouhaud wrote:
> > 
> > Other than that the patch looks good to me, as you said we just
> > need a decision
> > on whether custom rmgrs are wanted or not.
> 
> One last thing, did you do some benchmark with a couple custom rmgr
> to see how
> much the O(n) access is showing up in profiles?

What kind of a test case would be reasonable there? You mean having a
lot of custom rmgrs?

I was expecting that few people would have more than one custom rmgr
loaded anyway, so a sparse array or hashtable seemed wasteful. If
custom rmgrs become popular we probably need to have a larger ID space
anyway, but it seems like overengineering to do so now.

Regards,
    Jeff Davis





Re: Extensible Rmgr for Table AMs

From
Julien Rouhaud
Date:
Hi,

On Tue, Feb 01, 2022 at 03:38:32PM -0800, Jeff Davis wrote:
> On Tue, 2022-02-01 at 20:45 +0800, Julien Rouhaud wrote:
> > 
> > One last thing, did you do some benchmark with a couple custom rmgr
> > to see how
> > much the O(n) access is showing up in profiles?
> 
> What kind of a test case would be reasonable there? You mean having a
> lot of custom rmgrs?
> 
> I was expecting that few people would have more than one custom rmgr
> loaded anyway, so a sparse array or hashtable seemed wasteful. If
> custom rmgrs become popular we probably need to have a larger ID space
> anyway, but it seems like overengineering to do so now.

I agree that having dozen of custom rmgrs doesn't seem likely, but I also have
no idea of how much overhead you get by not doing a direct array access.  I
think it would be informative to benchmark something like simple OLTP write
workload on a fast storage (or a ramdisk, or with fsync off...), with the used
rmgr being the 1st and the 2nd custom rmgr.  Both scenario still seems
plausible and shouldn't degenerate on good hardware.



Re: Extensible Rmgr for Table AMs

From
Robert Haas
Date:
On Thu, Feb 3, 2022 at 12:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> I agree that having dozen of custom rmgrs doesn't seem likely, but I also have
> no idea of how much overhead you get by not doing a direct array access.  I
> think it would be informative to benchmark something like simple OLTP write
> workload on a fast storage (or a ramdisk, or with fsync off...), with the used
> rmgr being the 1st and the 2nd custom rmgr.  Both scenario still seems
> plausible and shouldn't degenerate on good hardware.

I think it would be hard to measure the overhead of this approach on a
macrobenchmark. That having been said, I find this a surprising
implementation choice. I think that the approaches that are most worth
considering are:

(1) reallocate the array if needed so that we can continue to just do
RmgrTable[rmid]
(2) have one array for builtins and a second array for extensions and
do rmid < RM_CUSTOM_MIN_ID ? BuiltinRmgrTable[rmid] :
ExtensionRmgrTable[rmid]
(3) change RmgrTable to be an array of pointers to structs rather than
an an array of structs. then the structs don't move around and can be
const, but the pointers can be moved into a larger array if required

I'm not really sure which is best. My intuition for what will be
cheapest on modern hardware is pretty shaky. However, I can't see how
it can be the thing the patch is doing now; a linear search seems like
it has to be the slowest option.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Extensible Rmgr for Table AMs

From
Julien Rouhaud
Date:
Hi,

On Fri, Feb 04, 2022 at 09:10:42AM -0500, Robert Haas wrote:
> On Thu, Feb 3, 2022 at 12:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > I agree that having dozen of custom rmgrs doesn't seem likely, but I also have
> > no idea of how much overhead you get by not doing a direct array access.  I
> > think it would be informative to benchmark something like simple OLTP write
> > workload on a fast storage (or a ramdisk, or with fsync off...), with the used
> > rmgr being the 1st and the 2nd custom rmgr.  Both scenario still seems
> > plausible and shouldn't degenerate on good hardware.
> 
> I think it would be hard to measure the overhead of this approach on a
> macrobenchmark.

Yeah that's also my initial thought, but I wouldn't be terribly surprised to be
wrong.

> That having been said, I find this a surprising
> implementation choice. I think that the approaches that are most worth
> considering are:
> 
> (1) reallocate the array if needed so that we can continue to just do
> RmgrTable[rmid]
> (2) have one array for builtins and a second array for extensions and
> do rmid < RM_CUSTOM_MIN_ID ? BuiltinRmgrTable[rmid] :
> ExtensionRmgrTable[rmid]
> (3) change RmgrTable to be an array of pointers to structs rather than
> an an array of structs. then the structs don't move around and can be
> const, but the pointers can be moved into a larger array if required
> 
> I'm not really sure which is best. My intuition for what will be
> cheapest on modern hardware is pretty shaky. However, I can't see how
> it can be the thing the patch is doing now; a linear search seems like
> it has to be the slowest option.

I guess the idea was to have a compromise between letting rmgr authors choose
arbitrary ids to avoid any conflicts, especially with private implementations,
without wasting too much memory.  But those approaches would be pretty much
incompatible with the current definition:

+#define RM_CUSTOM_MIN_ID       128
+#define RM_CUSTOM_MAX_ID       UINT8_MAX

even if you only allocate up to the  max id found, nothing guarantees that you
won't get a quite high id.



Re: Extensible Rmgr for Table AMs

From
Robert Haas
Date:
On Fri, Feb 4, 2022 at 9:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> I guess the idea was to have a compromise between letting rmgr authors choose
> arbitrary ids to avoid any conflicts, especially with private implementations,
> without wasting too much memory.  But those approaches would be pretty much
> incompatible with the current definition:
>
> +#define RM_CUSTOM_MIN_ID       128
> +#define RM_CUSTOM_MAX_ID       UINT8_MAX
>
> even if you only allocate up to the  max id found, nothing guarantees that you
> won't get a quite high id.

Right, which I guess raises another question: if the maximum is
UINT8_MAX, which BTW I find perfectly reasonable, why are we not just
defining this as an array of size 256? There's no point in adding code
complexity to save a few kB of memory.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Extensible Rmgr for Table AMs

From
Julien Rouhaud
Date:
On Fri, Feb 04, 2022 at 09:53:09AM -0500, Robert Haas wrote:
> On Fri, Feb 4, 2022 at 9:48 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > I guess the idea was to have a compromise between letting rmgr authors choose
> > arbitrary ids to avoid any conflicts, especially with private implementations,
> > without wasting too much memory.  But those approaches would be pretty much
> > incompatible with the current definition:
> >
> > +#define RM_CUSTOM_MIN_ID       128
> > +#define RM_CUSTOM_MAX_ID       UINT8_MAX
> >
> > even if you only allocate up to the  max id found, nothing guarantees that you
> > won't get a quite high id.
> 
> Right, which I guess raises another question: if the maximum is
> UINT8_MAX, which BTW I find perfectly reasonable, why are we not just
> defining this as an array of size 256? There's no point in adding code
> complexity to save a few kB of memory.

Agreed, especially if combined with your suggested approach 3 (array of
pointers).



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Fri, 2022-02-04 at 22:56 +0800, Julien Rouhaud wrote:
> > Right, which I guess raises another question: if the maximum is
> > UINT8_MAX, which BTW I find perfectly reasonable, why are we not
> > just
> > defining this as an array of size 256? There's no point in adding
> > code
> > complexity to save a few kB of memory.
> 
> Agreed, especially if combined with your suggested approach 3 (array
> of
> pointers).

Implemented and attached. I also updated pg_waldump and pg_rewind to do
something reasonable.

Additionally, I now have a reasonably complete implementation of a
custom resource manager now:

https://github.com/citusdata/citus/tree/custom-rmgr-15

(Not committed or intended to actually be used right now -- just a
POC.)

Offline, Andres mentioned that I should test recovery performance if we
take your approach, because making the RmgrTable non-const could impact
optimizations. Not sure if that would be a practical concern compared
to all the other work done at REDO time.

Regards,
    Jeff Davis


Attachment

Re: Extensible Rmgr for Table AMs

From
Andres Freund
Date:
Hi,

On 2022-03-23 21:43:08 -0700, Jeff Davis wrote:
>  /* must be kept in sync with RmgrData definition in xlog_internal.h */
>  #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
> -    { name, redo, desc, identify, startup, cleanup, mask, decode },
> +    &(struct RmgrData){ name, redo, desc, identify, startup, cleanup, mask, decode },
>  
> -const RmgrData RmgrTable[RM_MAX_ID + 1] = {
> +static RmgrData *RmgrTable[RM_MAX_ID + 1] = {
>  #include "access/rmgrlist.h"
>  };

I think this has been discussed before, but to me it's not obvious that it's a
good idea to change RmgrTable from RmgrData to RmgrData *. That adds an
indirection, without obvious benefit.


> +
> +/*
> + * Start up all resource managers.
> + */
> +void
> +StartupResourceManagers()

(void)


> +void
> +RegisterCustomRmgr(RmgrId rmid, RmgrData *rmgr)
> +{
> +    if (rmid < RM_MIN_CUSTOM_ID)
> +        ereport(PANIC, errmsg("custom rmgr id %d is out of range", rmid));
> +
> +    if (!process_shared_preload_libraries_in_progress)
> +        ereport(ERROR,
> +                (errmsg("custom rmgr must be registered while initializing modules in shared_preload_libraries")));
> +
> +    ereport(LOG,
> +            (errmsg("registering custom rmgr \"%s\" with ID %d",
> +                    rmgr->rm_name, rmid)));
> +
> +    if (RmgrTable[rmid] != NULL)
> +        ereport(PANIC,
> +                (errmsg("custom rmgr ID %d already registered with name \"%s\"",
> +                        rmid, RmgrTable[rmid]->rm_name)));
> +
> +    /* check for existing rmgr with the same name */
> +    for (int i = 0; i <= RM_MAX_ID; i++)
> +    {
> +        const RmgrData *existing_rmgr = RmgrTable[i];
> +
> +        if (existing_rmgr == NULL)
> +            continue;
> +
> +        if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
> +            ereport(PANIC,
> +                    (errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
> +                            existing_rmgr->rm_name)));
> +    }
> +
> +    /* register it */
> +    RmgrTable[rmid] = rmgr;
> +}

Random idea: Might be worth emitting the id->name mapping just after a redo
location is determined, to make it easier to debug things.


> +RmgrData *
> +GetRmgr(RmgrId rmid)
> +{
> +    return RmgrTable[rmid];
> +}

Given this is so simple, why incur the cost of a function call? Rather than
continuing to expose RmgrTable and move GetRmgr() into the header, as a static
inline? As-is this also prevent the compiler from optimizing across repeated
GetRmgr() calls (which often won't be possible anyway, but still)..


> -    if (record->xl_rmid > RM_MAX_ID)
> +    if (record->xl_rmid > RM_MAX_BUILTIN_ID && record->xl_rmid < RM_MIN_CUSTOM_ID)
>      {
>          report_invalid_record(state,
>                                "invalid resource manager ID %u at %X/%X",

Shouldn't this continue to enforce RM_MAX_ID as well?


> @@ -1604,12 +1603,7 @@ PerformWalRecovery(void)
>  
>          InRedo = true;
>  
> -        /* Initialize resource managers */
> -        for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
> -        {
> -            if (RmgrTable[rmid].rm_startup != NULL)
> -                RmgrTable[rmid].rm_startup();
> -        }
> +        StartupResourceManagers();

Personally I'd rather name it ResourceManagersStartup() or RmgrStartup().


> @@ -1871,7 +1860,7 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord *record, TimeLineID *repl
>          xlogrecovery_redo(xlogreader, *replayTLI);
>  
>      /* Now apply the WAL record itself */
> -    RmgrTable[record->xl_rmid].rm_redo(xlogreader);
> +    GetRmgr(record->xl_rmid)->rm_redo(xlogreader);
>  
>      /*
>       * After redo, check whether the backup pages associated with the WAL

So we have just added one indirect call and one pointer indirection
(previously RmgrTable could be resolved by the linker, now it needs to be
dereferenced again), that's not too bad. I was afraid there'd be multiple
calls.


> @@ -2101,16 +2090,16 @@ xlog_outdesc(StringInfo buf, XLogReaderState *record)
>      uint8        info = XLogRecGetInfo(record);
>      const char *id;
>  
> -    appendStringInfoString(buf, RmgrTable[rmid].rm_name);
> +    appendStringInfoString(buf, GetRmgr(rmid)->rm_name);
>      appendStringInfoChar(buf, '/');
>  
> -    id = RmgrTable[rmid].rm_identify(info);
> +    id = GetRmgr(rmid)->rm_identify(info);
>      if (id == NULL)
>          appendStringInfo(buf, "UNKNOWN (%X): ", info & ~XLR_INFO_MASK);
>      else
>          appendStringInfo(buf, "%s: ", id);
>  
> -    RmgrTable[rmid].rm_desc(buf, record);
> +    GetRmgr(rmid)->rm_desc(buf, record);
>  }

Like here. It's obviously not as performance critical as replay. But it's
still a shame to add 3 calls to GetRmgr, that each then need to dereference
RmgrTable. The compiler won't be able to optimize any of that away.


> @@ -117,8 +117,8 @@ LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *recor
>  
>      rmid = XLogRecGetRmid(record);
>  
> -    if (RmgrTable[rmid].rm_decode != NULL)
> -        RmgrTable[rmid].rm_decode(ctx, &buf);
> +    if (GetRmgr(rmid)->rm_decode != NULL)
> +        GetRmgr(rmid)->rm_decode(ctx, &buf);
>      else
>      {
>          /* just deal with xid, and done */

This one might actually matter, overhead wise.


> @@ -473,7 +473,7 @@ static void
>  XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
>  {
>      const char *id;
> -    const RmgrDescData *desc = &RmgrDescTable[XLogRecGetRmid(record)];
> +    const RmgrDescData *desc = GetRmgrDesc(XLogRecGetRmid(record));
>      uint32        rec_len;
>      uint32        fpi_len;
>      RelFileNode rnode;
> @@ -658,7 +658,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
>       * calculate column totals.
>       */
>  
> -    for (ri = 0; ri < RM_NEXT_ID; ri++)
> +    for (ri = 0; ri < RM_MAX_ID; ri++)
>      {
>          total_count += stats->rmgr_stats[ri].count;
>          total_rec_len += stats->rmgr_stats[ri].rec_len;

> @@ -694,6 +694,9 @@ XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
>              fpi_len = stats->rmgr_stats[ri].fpi_len;
>              tot_len = rec_len + fpi_len;
>  
> +            if (ri > RM_MAX_BUILTIN_ID && count == 0)
> +                continue;
> +

Ah, I see. I had written a concerned comment about the previous hunk...


>              XLogDumpStatsRow(desc->rm_name,
>                               count, total_count, rec_len, total_rec_len,
>                               fpi_len, total_fpi_len, tot_len, total_len);
> @@ -913,16 +916,16 @@ main(int argc, char **argv)
>                          exit(EXIT_SUCCESS);
>                      }
>  
> -                    for (i = 0; i <= RM_MAX_ID; i++)
> +                    for (i = 0; i <= RM_MAX_BUILTIN_ID; i++)
>                      {
> -                        if (pg_strcasecmp(optarg, RmgrDescTable[i].rm_name) == 0)
> +                        if (pg_strcasecmp(optarg, GetRmgrDesc(i)->rm_name) == 0)
>                          {
>                              config.filter_by_rmgr[i] = true;
>                              config.filter_by_rmgr_enabled = true;
>                              break;
>                          }
>                      }
> -                    if (i > RM_MAX_ID)
> +                    if (i > RM_MAX_BUILTIN_ID)
>                      {
>                          pg_log_error("resource manager \"%s\" does not exist",
>                                       optarg);

So we can't filter by rmgr id for non-builtin rmgr's? That sucks. Maybe add a
custom:<i> syntax? Or generally allow numerical identifiers in addition to the
names?


Greetings,

Andres Freund



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Mon, 2022-03-28 at 11:00 -0700, Andres Freund wrote:
> I think this has been discussed before, but to me it's not obvious
> that it's a
> good idea to change RmgrTable from RmgrData to RmgrData *. That adds
> an
> indirection, without obvious benefit.

I did some performance tests. I created a narrow table, took a base
backup, loaded 100M records, finished the base backup. Then I recovered
using the different build combinations (patched/unpatched, clang/gcc).

                 compiler    run1   run2
    unpatched:   gcc         102s   106s
    patched:     gcc         107s   105s
    unpatched:   clang       109s   110s
    patched:     clang       109s   111s

I don't see a clear signal that this patch worsens performance. The
102s run was the very first run, so I suspect it was just due to the
processor starting out cold. Let me know if you think the test is
testing the right paths.

Perhaps I should address your other perf concerns around GetRmgr (make
it static inline, reduce number of calls), and then leave the
indirection for the sake of cleanliness?

If you are still concerned, I can switch back to separate tables to
eliminate the indirection for built-in rmgrs. Separate rmgr tables
still require a branch (to decide which table to access), but it should
be a highly predictable one.

Regards,
    Jeff Davis





Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Mon, 2022-03-28 at 11:00 -0700, Andres Freund wrote:
> > +/*
> > + * Start up all resource managers.
> > + */
> > +void
> > +StartupResourceManagers()
> 
> (void)

Fixed.

> Random idea: Might be worth emitting the id->name mapping just after
> a redo
> location is determined, to make it easier to debug things.

Not quite sure I understood this idea, do you mean dump all rmgrs and
the IDs when performing recovery?

> > +RmgrData *
> > +GetRmgr(RmgrId rmid)
> > +{
> > +    return RmgrTable[rmid];
> > +}
> 
> Given this is so simple, why incur the cost of a function call?
> Rather than
> continuing to expose RmgrTable and move GetRmgr() into the header, as
> a static
> inline?

Made it static inline.

> Shouldn't this continue to enforce RM_MAX_ID as well?

Done.

> Personally I'd rather name it ResourceManagersStartup() or
> RmgrStartup().

Done.

> Like here. It's obviously not as performance critical as replay. But
> it's
> still a shame to add 3 calls to GetRmgr, that each then need to
> dereference
> RmgrTable. The compiler won't be able to optimize any of that away.

Changed to only call it once and save it in a variable for each call
site where it makes sense.

> So we can't filter by rmgr id for non-builtin rmgr's? That sucks.
> Maybe add a
> custom:<i> syntax? Or generally allow numerical identifiers in
> addition to the
> names?

Good idea. I changed it to allow "custom###" to mean the custom rmgr
with ID ### (3 digits).

I still may change it to go back to two RmgrTables (one for builtin and
one for custom) to remove the lingering performance doubts. Other than
that, and some cleanup, this is pretty close to the version I intend to
commit.

Regards,
    Jeff Davis


Attachment

Re: Extensible Rmgr for Table AMs

From
Andres Freund
Date:
Hi,

On 2022-03-31 14:20:51 -0700, Jeff Davis wrote:
> If you are still concerned, I can switch back to separate tables to
> eliminate the indirection for built-in rmgrs. Separate rmgr tables
> still require a branch (to decide which table to access), but it should
> be a highly predictable one.

I still think the easiest and fastest would be to just make RmgrTable longer,
and not const. When registering, copy the caller provided struct into the
respective RmgrData element.  Yes, we'd waste a bit of space at the end of the
array, but it's typically not going to be touched and thus not be backed by
"actual" memory.

Greetings,

Andres Freund



Re: Extensible Rmgr for Table AMs

From
Bharath Rupireddy
Date:
On Mon, Apr 4, 2022 at 8:33 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> I still may change it to go back to two RmgrTables (one for builtin and
> one for custom) to remove the lingering performance doubts. Other than
> that, and some cleanup, this is pretty close to the version I intend to
> commit.

I just had a quick look over the v6 patch, few comments:

1) Why can't rmid be chosen by the extensions in sequential order from
(129 till 255), say, to start with a columnar extension can choose
129, another extension can register 130 and so on right? This way, the
RmgrStartup, RmgrCleanup, and XLogDumpDisplayRecord can avoid looping
over all the 256 entries, with a global variable like
current_max_rmid(representing the current number of rmgers)? Is there
any specific reason to allow them to choose rmgrid randomly between
129 till 255? Am I missing some discussion here?
2) RM_MAX_ID is now UINT8_MAX - do we need to make it configurable? or
do we think that only a few (127) custom rmgrs can exist?
3) Do we need to talk about extensible rmgrs and
https://wiki.postgresql.org/wiki/ExtensibleRmgr in documentation
somewhere? This will help extension developers to refer when needed.
4) Do we need to change this description?
       <para>
        The default value of this setting is the empty string, which disables
        the feature.  It can be set to <literal>all</literal> to check all
        records, or to a comma-separated list of resource managers to check
        only records originating from those resource managers.  Currently,
        the supported resource managers are <literal>heap</literal>,
        <literal>heap2</literal>, <literal>btree</literal>,
<literal>hash</literal>,
        <literal>gin</literal>, <literal>gist</literal>,
<literal>sequence</literal>,
        <literal>spgist</literal>, <literal>brin</literal>, and
<literal>generic</literal>. Onl
        superusers can change this setting.
5) Do we need to add a PGDLLIMPORT qualifier to RegisterCustomRmgr()
(possibly for RmgrStartup, RmgrTable, RmgrCleanup as well?)  so that
the Windows versions of extensions can use this feature?
6) Should the below strcmp pg_strcmp? The custom rmgrs can still use
rm_name with different cases and below code fail to catch it?
+ if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
+ ereport(PANIC,
+ (errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
+ existing_rmgr->rm_name)));
7) What's the intention of the below code? It seems like below it
checks if there's already a rmgr with the given name (RM_MAX_ID). Had
it been RM_MAX_BUILTIN_ID instead of  RM_MAX_ID, the error message
does make sense. Is the intention here to not have duplicate rmgrs in
the entire RM_MAX_ID(both builtin and custom)?
+ /* check for existing rmgr with the same name */
+ for (int i = 0; i <= RM_MAX_ID; i++)
+ {
+ const RmgrData *existing_rmgr = RmgrTable[i];
+
+ if (existing_rmgr == NULL)
+ continue;
+
+ if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
+ ereport(PANIC,
+ (errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
+ existing_rmgr->rm_name)));
8) Per https://wiki.postgresql.org/wiki/ExtensibleRmgr, 128 is
reserved, can we have it as a macro? Or something like (RM_MAX_ID/2+1)
+#define RM_MIN_CUSTOM_ID 128
9) Thinking if there's a way to test the code, maybe exposing
RegisterCustomRmgr as a function? I think we need to have a dummy
extension that will be used for testing this kind of patches/features
but that's a separate discussion IMO.

Regards,
Bharath Rupireddy.



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Sun, 2022-04-03 at 21:33 -0700, Andres Freund wrote:
> I still think the easiest and fastest would be to just make RmgrTable
> longer,
> and not const. When registering, copy the caller provided struct into
> the
> respective RmgrData element.  Yes, we'd waste a bit of space at the
> end of the
> array, but it's typically not going to be touched and thus not be
> backed by
> "actual" memory.

Sounds good to me. I tried to break down the performance between these
approaches and didn't get a clear signal, so I'll go with your
intuition here.

Posting new patch in response to Bharath's review, which will include
this change.

Note that GetRmgr() also has an unlikely branch where it tests for the
validity of the rmgr before using it, so that it can throw a nice error
message if someone forgot to include the module in
shared_preload_libraries. I expect this will be highly predictable and
therefore not a problem.

Regards,
    Jeff Davis





Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Mon, 2022-04-04 at 11:15 +0530, Bharath Rupireddy wrote:
> 1) Why can't rmid be chosen by the extensions in sequential order
> from
> (129 till 255), say, to start with a columnar extension can choose
> 129, another extension can register 130 and so on right?

I'm not sure what you mean by "chosen by the extensions in sequential
order". If you mean:

(a) that it would depend on the library load order, and that postgres
would assign the ID; then that won't work because the rmgr IDs need to
be stable across restarts and config changes

(b) that there should be a convention where people reserve IDs on the
wiki page in sequential order; then that's fine with me

> 2) RM_MAX_ID is now UINT8_MAX - do we need to make it configurable?
> or
> do we think that only a few (127) custom rmgrs can exist?

127 should be plenty for quite a long time. If we need more, we can
come up with a different solution (which is OK because the WAL format
changes in new major versions).

> 3) Do we need to talk about extensible rmgrs and
> https://wiki.postgresql.org/wiki/ExtensibleRmgr in documentation
> somewhere? This will help extension developers to refer when needed.

We don't have user-facing documentation for every extension hook, and I
don't think it would be a good idea to document this one (at least in
the user-facing docs). Otherwise, we'd have to document the whole
resource manager interface, and that seems like way too much of the
internals.

> 4) Do we need to change this description?
>        <para>
>         The default value of this setting is the empty string, which
> disables
>         the feature.  It can be set to <literal>all</literal> to
> check all
>         records, or to a comma-separated list of resource managers to
> check
>         only records originating from those resource
> managers.  Currently,
>         the supported resource managers are <literal>heap</literal>,
>         <literal>heap2</literal>, <literal>btree</literal>,
> <literal>hash</literal>,
>         <literal>gin</literal>, <literal>gist</literal>,
> <literal>sequence</literal>,
>         <literal>spgist</literal>, <literal>brin</literal>, and
> <literal>generic</literal>. Onl
>         superusers can change this setting.

Yes, you're right, I updated the docs for pg_waldump and the
wal_consistency_checking GUC.

> 5) Do we need to add a PGDLLIMPORT qualifier to RegisterCustomRmgr()
> (possibly for RmgrStartup, RmgrTable, RmgrCleanup as well?)  so that
> the Windows versions of extensions can use this feature?

That seems to only be required for variables, not functions.

I don't think we want to mark RmgrTable this way, because it's not
intended to be accessed by extensions directly. It's accessed by
GetRmgr(), which is 'static inline', but that's also not intended to be
used by extensions.

> 6) Should the below strcmp pg_strcmp? The custom rmgrs can still use
> rm_name with different cases and below code fail to catch it?
> + if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
> + ereport(PANIC,
> + (errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
> + existing_rmgr->rm_name)));

There are already a lot of places where users can choose identifiers
that differ only in uppercase/lowercase. I don't see a reason to make
an exception here.

> 7) What's the intention of the below code? It seems like below it
> checks if there's already a rmgr with the given name (RM_MAX_ID). Had
> it been RM_MAX_BUILTIN_ID instead of  RM_MAX_ID, the error message
> does make sense. Is the intention here to not have duplicate rmgrs in
> the entire RM_MAX_ID(both builtin and custom)?

Thank you. I redid a lot of the error messages and checked them out to
make sure they are helpful.

> 8) Per https://wiki.postgresql.org/wiki/ExtensibleRmgr, 128 is
> reserved, can we have it as a macro? Or something like
> (RM_MAX_ID/2+1)

It's already defined as RM_EXPERIMENTAL_ID. Is that what you meant or
did I misunderstand?

I don't see the point in defining it as RM_MAX_ID/2+1.

> 9) Thinking if there's a way to test the code, maybe exposing
> RegisterCustomRmgr as a function? I think we need to have a dummy
> extension that will be used for testing this kind of patches/features
> but that's a separate discussion IMO.

It's already exposed as a C function in xlog_internal.h.

You mean as a SQL function? I don't think that makes sense. It can only
be called while processing shared_preload_libraries (on server startup)
anyway.

Other changes this version:

 * implemented Andres's proposed performance change[1]
 * fix wal_consistency_checking
 * general cleanup

Regards,
    Jeff Davis

[1] 
https://postgr.es/m/20220404043337.ocjnni7hknjsibhg@alap3.anarazel.de

Attachment

Re: Extensible Rmgr for Table AMs

From
Bharath Rupireddy
Date:
On Tue, Apr 5, 2022 at 5:55 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2022-04-04 at 11:15 +0530, Bharath Rupireddy wrote:
> > 1) Why can't rmid be chosen by the extensions in sequential order
> > from
> > (129 till 255), say, to start with a columnar extension can choose
> > 129, another extension can register 130 and so on right?
>
> I'm not sure what you mean by "chosen by the extensions in sequential
> order". If you mean:
>
> (b) that there should be a convention where people reserve IDs on the
> wiki page in sequential order; then that's fine with me

Yes, I meant this. If we do this, then a global variable
current_max_rmid can be maintained (by RegisterCustomRmgr) and the
other functions RmgrStartup, RmgrCleanup and RegisterCustomRmgr can
avoid for-loops with full range RM_MAX_ID (for (int id = 0; id <
current_max_rmid; id++)).

> > 3) Do we need to talk about extensible rmgrs and
> > https://wiki.postgresql.org/wiki/ExtensibleRmgr in documentation
> > somewhere? This will help extension developers to refer when needed.
>
> We don't have user-facing documentation for every extension hook, and I
> don't think it would be a good idea to document this one (at least in
> the user-facing docs). Otherwise, we'd have to document the whole
> resource manager interface, and that seems like way too much of the
> internals.

With the custom rmgrs, now it's time to have it in the documentation
(probably not immediately, even after this patch gets in) describing
what rmgrs are, what are built-in and custom rmgrs and how and when an
extension needs to register, load and use rmgrs etc.

> > 4) Do we need to change this description?
> >        <para>
> > <literal>generic</literal>. Onl
> >         superusers can change this setting.
>
> Yes, you're right, I updated the docs for pg_waldump and the
> wal_consistency_checking GUC.

+       <para>
+        Extensions may define additional resource managers with modules loaded
+        in <xref linkend="guc-shared-preload-libraries"/>. However, the names
+        of the custom resource managers are not available at the time the

+ errhint("Include the extension module that implements this resource
manager in shared_preload_libraries.")));

With the above, do you mean to say that only the extensions that load
their library via shared_preload_libraries are allowed to have custom
rmgrs? How about extensions using {session, local}_preload_libraries,
LOAD command? How about extensions that don't load the shared library
via {session, local, shared}_preload_libraries or LOAD command and let
any of its functions load the shared library on first use?

> > 5) Do we need to add a PGDLLIMPORT qualifier to RegisterCustomRmgr()
> > (possibly for RmgrStartup, RmgrTable, RmgrCleanup as well?)  so that
> > the Windows versions of extensions can use this feature?
>
> That seems to only be required for variables, not functions.

You are right. For instance, the functions DefineCustomXXX are not
using PGDLLIMPORT qualifer, I believe they can be used by extensions
on WINDOWS.

> > 6) Should the below strcmp pg_strcmp? The custom rmgrs can still use
> > rm_name with different cases and below code fail to catch it?
> > + if (!strcmp(existing_rmgr->rm_name, rmgr->rm_name))
> > + ereport(PANIC,
> > + (errmsg("custom rmgr \"%s\" has the same name as builtin rmgr",
> > + existing_rmgr->rm_name)));
>
> There are already a lot of places where users can choose identifiers
> that differ only in uppercase/lowercase. I don't see a reason to make
> an exception here.

I think postgres parses the user-specified strings and converts them
to lowercase unless they are double-quoted, that is why we see strcmp,
not pg_strcasecmp. But, the custom rmgr name isn't parsed by postgres
right?

For instance - extension 1, with id 130 and name 'rmgr_foo' and
extension 2 with id 131 and name 'Rmgr_foo'/'RMGR_FOO'...., then the
following code may not catch it right? Do you want rmgr names to be
case-sensitive/insensitive?

+ if (!strcmp(RmgrTable[i].rm_name, rmgr->rm_name))
+ ereport(PANIC,
+ (errmsg("failed to register custom resource manager \"%s\" with ID
%d", rmgr->rm_name, rmid),
+ errdetail("Existing resource manager with ID %d has the same name.", i)));

> > 8) Per https://wiki.postgresql.org/wiki/ExtensibleRmgr, 128 is
> > reserved, can we have it as a macro? Or something like
> > (RM_MAX_ID/2+1)
>
> It's already defined as RM_EXPERIMENTAL_ID. Is that what you meant or
> did I misunderstand?

+#define RM_MAX_ID UINT8_MAX
+#define RM_MAX_BUILTIN_ID (RM_NEXT_ID - 1)
+#define RM_MIN_CUSTOM_ID 128
+#define RM_N_CUSTOM_IDS (RM_MAX_ID - RM_MIN_CUSTOM_ID + 1)
+#define RMID_IS_BUILTIN(rmid) ((rmid) <= RM_MAX_BUILTIN_ID)
+#define RMID_IS_CUSTOM(rmid) ((rmid) >= RM_MIN_CUSTOM_ID && (rmid) < RM_MAX_ID)
+#define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) || RMID_IS_CUSTOM((rmid)))

To me the above seems complex, why can't we just have the following,
it's okay even if we don't use some of the macros in the *.c files?
total rmgrs = 256
built in = 128 (index 0 - 127)
custom = 128 (index 128 - 255)

#define RM_N_IDS (UINT8_MAX + 1)
#define RM_N_BUILTIN_IDS (RM_N_IDS / 2)
#define RM_N_CUSTOM_IDS (RM_N_IDS / 2)
#define RM_MAX_ID (RM_N_IDS - 1)
#define RM_MAX_BUILTIN_ID (RM_NEXT_ID - 1)
#define RM_MIN_CUSTOM_ID (RM_N_IDS / 2)
#define RM_MAX_CUSTOM_ID RM_MAX_ID
#define RMID_IS_BUILTIN(rmid) ((rmid) <= RM_MAX_BUILTIN_ID)
#define RMID_IS_CUSTOM(rmid) ((rmid) >= RM_MIN_CUSTOM_ID && (rmid) <=
RM_MAX_CUSTOM_ID)
#define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) || RMID_IS_CUSTOM((rmid)))

If okay, you can specify in a comment that out of UINT8_MAX + 1 rmgrs,
first half rmgr ids are supposed to be reserved for built in rmgrs and
second half rmgr ids for extensions.

> I don't see the point in defining it as RM_MAX_ID/2+1.
>
> > 9) Thinking if there's a way to test the code, maybe exposing
> > RegisterCustomRmgr as a function? I think we need to have a dummy
> > extension that will be used for testing this kind of patches/features
> > but that's a separate discussion IMO.
>
> It's already exposed as a C function in xlog_internal.h.
>
> You mean as a SQL function? I don't think that makes sense. It can only
> be called while processing shared_preload_libraries (on server startup)
> anyway.

Yes, an SQL function showing all of the available rmgrs. With the
custom rmgrs, I think, an SQL function like pg_resource_managers()
returning a set of {name, id, (if possible - loaded_by_extension_name,
rm_redo, rm_desc, rm_identify ... api names)}

Some more comments:

1) Is there any specific reason to have a function to emit a PANIC
message? And it's being used only in one place in GetRmgr.
+void
+RmgrNotFound(RmgrId rmid)

2) How about "If rm_name is NULL, the corresponding RmgrTable[] entry
is considered invalid."?
+ * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h). If rm_name is
+ * NULL, the structure is considered invalid.

3) How about adding errdetail something like "Custom resource manager
ID must be between %d and %d (both inclusive)", RM_MAX_BUILTIN_ID,
RM_MAX_ID)?
+ if (rmid < RM_MIN_CUSTOM_ID)
+ ereport(PANIC, errmsg("custom resource manager ID %d is out of range", rmid));

4) How about adding errdetail something like "Custom resource manager
must have a name"
+ if (rmgr->rm_name == NULL)
+ ereport(PANIC, errmsg("custom resource manager is invalid"));

5) How about "successfully registered custom resource manager"?
+ (errmsg("registered custom resource manager \"%s\" with ID %d",

6) Shouldn't this be ri <= RM_NEXT_ID?
- for (ri = 0; ri < RM_NEXT_ID; ri++)
+ for (ri = 0; ri < RM_MAX_ID; ri++)

- for (ri = 0; ri < RM_NEXT_ID; ri++)
+ for (ri = 0; ri <= RM_MAX_ID; ri++)

7) Unrelated to this patch, why is there always an extra slot RM_MAX_ID + 1?

-const RmgrDescData RmgrDescTable[RM_MAX_ID + 1] = {
+static const RmgrDescData RmgrDescTable[RM_MAX_BUILTIN_ID + 1] = {
 #include "access/rmgrlist.h"

-const RmgrData RmgrTable[RM_MAX_ID + 1] = {
+RmgrData RmgrTable[RM_MAX_ID + 1] = {

Regards,
Bharath Rupireddy.



Re: Extensible Rmgr for Table AMs

From
Jeff Davis
Date:
On Tue, 2022-04-05 at 15:12 +0530, Bharath Rupireddy wrote:
> Yes, I meant this. If we do this, then a global variable
> current_max_rmid can be maintained (by RegisterCustomRmgr) and the
> other functions RmgrStartup, RmgrCleanup and RegisterCustomRmgr can
> avoid for-loops with full range RM_MAX_ID (for (int id = 0; id <
> current_max_rmid; id++)).

There are only 255 entries, so the loops in those functions aren't a
significant overhead.

If we expand it, then we can come up a different solution. We can
change it easily enough later, because the WAL format changes between
releases and so do the APIs for any extension that might be using this.

> With the custom rmgrs, now it's time to have it in the documentation
> (probably not immediately, even after this patch gets in) describing
> what rmgrs are, what are built-in and custom rmgrs and how and when
> an
> extension needs to register, load and use rmgrs etc.

Added a documentation section right after Generic WAL.

> With the above, do you mean to say that only the extensions that load
> their library via shared_preload_libraries are allowed to have custom
> rmgrs? How about extensions using {session, local}_preload_libraries,
> LOAD command? How about extensions that don't load the shared library
> via {session, local, shared}_preload_libraries or LOAD command and
> let
> any of its functions load the shared library on first use?

Correct. The call to RegisterCustomRmgr *must* happen in the postmaster
before any other processes (even the startup process) are launched.
Otherwise recovery wouldn't work. The only place to make that happen is
shared_preload_libraries.

> For instance - extension 1, with id 130 and name 'rmgr_foo' and
> extension 2 with id 131 and name 'Rmgr_foo'/'RMGR_FOO'...., then the
> following code may not catch it right? Do you want rmgr names to be
> case-sensitive/insensitive?

You convinced me. pg_waldump.c uses pg_strcasecmp(), so I'll use it as
well.

> #define RM_N_IDS (UINT8_MAX + 1)
> #define RM_N_BUILTIN_IDS (RM_N_IDS / 2)
> #define RM_N_CUSTOM_IDS (RM_N_IDS / 2)
> #define RM_MAX_ID (RM_N_IDS - 1)
> #define RM_MAX_BUILTIN_ID (RM_NEXT_ID - 1)
> #define RM_MIN_CUSTOM_ID (RM_N_IDS / 2)
> #define RM_MAX_CUSTOM_ID RM_MAX_ID
> #define RMID_IS_BUILTIN(rmid) ((rmid) <= RM_MAX_BUILTIN_ID)
> #define RMID_IS_CUSTOM(rmid) ((rmid) >= RM_MIN_CUSTOM_ID && (rmid) <=
> RM_MAX_CUSTOM_ID)
> #define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) ||
> RMID_IS_CUSTOM((rmid)))

I improved this a bit. But I didn't use your version that's based on
division, that was more confusing to me.

> Yes, an SQL function showing all of the available rmgrs. With the
> custom rmgrs, I think, an SQL function like pg_resource_managers()
> returning a set of {name, id, (if possible -
> loaded_by_extension_name,
> rm_redo, rm_desc, rm_identify ... api names)}

Added.

> 
> Some more comments:
> 
> 1) Is there any specific reason to have a function to emit a PANIC
> message? And it's being used only in one place in GetRmgr.
> +void
> +RmgrNotFound(RmgrId rmid)

Good call -- better to raise an error, and it can get escalated if it
happens in the wrong place. Done.

> 2) How about "If rm_name is NULL, the corresponding RmgrTable[] entry
> is considered invalid."?
> + * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h). If
> rm_name is
> + * NULL, the structure is considered invalid.

Done.

> 3) How about adding errdetail something like "Custom resource manager
> ID must be between %d and %d (both inclusive)", RM_MAX_BUILTIN_ID,
> RM_MAX_ID)?
> + if (rmid < RM_MIN_CUSTOM_ID)
> + ereport(PANIC, errmsg("custom resource manager ID %d is out of
> range", rmid));

Done.

> 4) How about adding errdetail something like "Custom resource manager
> must have a name"
> + if (rmgr->rm_name == NULL)
> + ereport(PANIC, errmsg("custom resource manager is invalid"));

Improved this message.

> 5) How about "successfully registered custom resource manager"?
> + (errmsg("registered custom resource manager \"%s\" with ID %d",

That's a little much, we don't want to sound too excited that it worked
;-)

> 6) Shouldn't this be ri <= RM_NEXT_ID?
> - for (ri = 0; ri < RM_NEXT_ID; ri++)
> + for (ri = 0; ri < RM_MAX_ID; ri++)
> 
> - for (ri = 0; ri < RM_NEXT_ID; ri++)
> + for (ri = 0; ri <= RM_MAX_ID; ri++)

That would leave out the custom rmgrs.

> 7) Unrelated to this patch, why is there always an extra slot
> RM_MAX_ID + 1?

RM_MAX_ID is 255, but we need 256 entries from 0-255 (inclusive).

Good suggestions, thank you for the review!

I'm happy with this patch and I committed it. Changes from last
version:

   1. I came up with a better solution (based on a suggestion from
Andres) to handle wal_consistency_checking properly. We can't
understand the custom resource managers when the configuration file is
first loaded, so if we enounter an unknown resource manager, I set a
flag and reprocess it after the shared_preload_libraries are loaded.

   2. Added pg_get_wal_resource_managers().

   3. Documentation.

Regards,
    Jeff Davis





Re: Extensible Rmgr for Table AMs

From
Andres Freund
Date:
Hi,

On 2022-04-06 23:15:27 -0700, Jeff Davis wrote:
> I'm happy with this patch and I committed it.

That caused breakage with WAL_DEBUG enabled. Fixed that. But it was
half-broken before, at least since 70e81861fad, because 'rmid' didn't refer to
the current record's rmid anymore, but to rmid from "Initialize resource
managers" - a constant.


Causes plenty new warnings here:

In file included from /home/andres/src/postgresql/src/include/access/xlogrecord.h:14,
                 from /home/andres/src/postgresql/src/include/access/xlogreader.h:41,
                 from /home/andres/src/postgresql/src/include/access/brin_xlog.h:17,
                 from /home/andres/src/postgresql/src/backend/access/transam/rmgr.c:10:
/home/andres/src/postgresql/src/backend/access/transam/rmgr.c: In function ‘RegisterCustomRmgr’:
/home/andres/src/postgresql/src/include/access/rmgr.h:42:66: warning: comparison is always true due to limited range of
datatype [-Wtype-limits]
 
   42 |                                                           (rmid) <= RM_MAX_CUSTOM_ID)
      |                                                                  ^~
/home/andres/src/postgresql/src/backend/access/transam/rmgr.c:104:14: note: in expansion of macro ‘RMID_IS_CUSTOM’
  104 |         if (!RMID_IS_CUSTOM(rmid))
      |              ^~~~~~~~~~~~~~
In file included from /home/andres/src/postgresql/src/include/c.h:814,
                 from /home/andres/src/postgresql/src/include/postgres.h:46,
                 from /home/andres/src/postgresql/src/bin/pg_waldump/rmgrdesc.c:9:
/home/andres/src/postgresql/src/bin/pg_waldump/rmgrdesc.c: In function ‘GetRmgrDesc’:
/home/andres/src/postgresql/src/include/access/rmgr.h:42:66: warning: comparison is always true due to limited range of
datatype [-Wtype-limits]
 
   42 |                                                           (rmid) <= RM_MAX_CUSTOM_ID)
      |                                                                  ^~
/home/andres/src/postgresql/src/bin/pg_waldump/rmgrdesc.c:89:9: note: in expansion of macro ‘Assert’
   89 |         Assert(RMID_IS_VALID(rmid));
      |         ^~~~~~
/home/andres/src/postgresql/src/include/access/rmgr.h:43:57: note: in expansion of macro ‘RMID_IS_CUSTOM’
   43 | #define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) || RMID_IS_CUSTOM((rmid)))
      |                                                         ^~~~~~~~~~~~~~
/home/andres/src/postgresql/src/bin/pg_waldump/rmgrdesc.c:89:16: note: in expansion of macro ‘RMID_IS_VALID’
   89 |         Assert(RMID_IS_VALID(rmid));
      |                ^~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/rmgr.h:42:66: warning: comparison is always true due to limited range of
datatype [-Wtype-limits]
 
   42 |                                                           (rmid) <= RM_MAX_CUSTOM_ID)
      |                                                                  ^~
/home/andres/src/postgresql/src/bin/pg_waldump/rmgrdesc.c:89:9: note: in expansion of macro ‘Assert’
   89 |         Assert(RMID_IS_VALID(rmid));
      |         ^~~~~~
/home/andres/src/postgresql/src/include/access/rmgr.h:43:57: note: in expansion of macro ‘RMID_IS_CUSTOM’
   43 | #define RMID_IS_VALID(rmid) (RMID_IS_BUILTIN((rmid)) || RMID_IS_CUSTOM((rmid)))
      |                                                         ^~~~~~~~~~~~~~
/home/andres/src/postgresql/src/bin/pg_waldump/rmgrdesc.c:89:16: note: in expansion of macro ‘RMID_IS_VALID’
   89 |         Assert(RMID_IS_VALID(rmid));
      |                ^~~~~~~~~~~~~

Greetings,

Andres Freund



Re: Extensible Rmgr for Table AMs

From
Andres Freund
Date:
Hi,

On 2022-04-06 23:35:05 -0700, Andres Freund wrote:
> Causes plenty new warnings here:

And my machine isn't alone. There's also:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2022-04-07%2006%3A40%3A14

rmgrdesc.c:44:1: error: missing braces around initializer [-Werror=missing-braces]
rmgrdesc.c:44:1: error: (near initialization for 'CustomNumericNames[0]') [-Werror=missing-braces]
rmgrdesc.c:45:1: error: missing braces around initializer [-Werror=missing-braces]
rmgrdesc.c:45:1: error: (near initialization for 'CustomRmgrDesc[0]') [-Werror=missing-braces]
cc1: all warnings being treated as errors

Greetings,

Andres Freund



Re: Extensible Rmgr for Table AMs

From
Andres Freund
Date:
Hi,

On 2022-04-06 23:56:40 -0700, Andres Freund wrote:
> On 2022-04-06 23:35:05 -0700, Andres Freund wrote:
> > Causes plenty new warnings here:
> 
> And my machine isn't alone. There's also:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2022-04-07%2006%3A40%3A14
> 
> rmgrdesc.c:44:1: error: missing braces around initializer [-Werror=missing-braces]
> rmgrdesc.c:44:1: error: (near initialization for 'CustomNumericNames[0]') [-Werror=missing-braces]
> rmgrdesc.c:45:1: error: missing braces around initializer [-Werror=missing-braces]
> rmgrdesc.c:45:1: error: (near initialization for 'CustomRmgrDesc[0]') [-Werror=missing-braces]
> cc1: all warnings being treated as errors

(pushing a fix for this one in a few)

There's also:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-07%2006%3A49%3A14

ccache /opt/developerstudio12.6/bin/cc  -m64 -Xa -v -O  findtimezone.o initdb.o localtime.o -L../../../src/port
-L../../../src/common-L../../../src/fe_utils -lpgfeutils -L../../../src/common -lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq-lpq   -L/home/nm/sw/nopath/uuid-64/lib -L/home/nm/sw/nopath/openldap-64/lib
-Wl,--as-needed-Wl,-R'/home/nm/farm/studio64v12_6/HEAD/inst/lib'  -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl
-lcrypto-lgssapi_krb5 -lz -lreadline -ltermcap -lnsl -lsocket -lm  -o initdb
 
Undefined            first referenced
 symbol                  in file
RmgrTable                           initdb.o

IIRC sun studio is annoying because it emits references to global variables
when they're mentioned in a static inline, even if that inline is unused.

Greetings,

Andres Freund



Re: Extensible Rmgr for Table AMs

From
Andres Freund
Date:
Hi,

On 2022-04-07 00:45:34 -0700, Andres Freund wrote:
> On 2022-04-06 23:56:40 -0700, Andres Freund wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-07%2006%3A49%3A14
> 
> ccache /opt/developerstudio12.6/bin/cc  -m64 -Xa -v -O  findtimezone.o initdb.o localtime.o -L../../../src/port
-L../../../src/common-L../../../src/fe_utils -lpgfeutils -L../../../src/common -lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq-lpq   -L/home/nm/sw/nopath/uuid-64/lib -L/home/nm/sw/nopath/openldap-64/lib
-Wl,--as-needed-Wl,-R'/home/nm/farm/studio64v12_6/HEAD/inst/lib'  -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl
-lcrypto-lgssapi_krb5 -lz -lreadline -ltermcap -lnsl -lsocket -lm  -o initdb
 
> Undefined            first referenced
>  symbol                  in file
> RmgrTable                           initdb.o
> 
> IIRC sun studio is annoying because it emits references to global variables
> when they're mentioned in a static inline, even if that inline is unused.

Looks like a #ifndef FRONTEND around the inlines should work? Or just go back
to the macros, but without the RM_MAX_CUSTOM_ID check that's implied due to
the type width anyway.

Greetings,

Andres Freund



Re: Extensible Rmgr for Table AMs

From
Bharath Rupireddy
Date:
On Thu, Apr 7, 2022 at 11:45 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I'm happy with this patch and I committed it. Changes from last
> version:

Hi Jeff, I think there's a typo [1] in pg_waldump.c, we might miss to
take account of the last custom resource mgr stats. Please fix it.

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 7d92dcaf87..30ca7684bd 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -720,7 +720,7 @@ XLogDumpDisplayStats(XLogDumpConfig *config,
XLogDumpStats *stats)
         * calculate column totals.
         */

-       for (ri = 0; ri < RM_MAX_ID; ri++)
+       for (ri = 0; ri <= RM_MAX_ID; ri++)
        {
                total_count += stats->rmgr_stats[ri].count;
                total_rec_len += stats->rmgr_stats[ri].rec_len;

Regards,
Bharath Rupireddy.



Re: Extensible Rmgr for Table AMs

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I'm happy with this patch and I committed it.

wrasse is less happy:

ccache /opt/developerstudio12.6/bin/cc  -m64 -Xa -v -O  findtimezone.o initdb.o localtime.o -L../../../src/port
-L../../../src/common-L../../../src/fe_utils -lpgfeutils -L../../../src/common -lpgcommon -L../../../src/port -lpgport
-L../../../src/interfaces/libpq-lpq   -L/home/nm/sw/nopath/uuid-64/lib -L/home/nm/sw/nopath/openldap-64/lib
-Wl,--as-needed-Wl,-R'/home/nm/farm/studio64v12_6/HEAD/inst/lib'  -lpgcommon -lpgport -lxslt -lxml2 -lpam -lssl
-lcrypto-lgssapi_krb5 -lz -lreadline -ltermcap -lnsl -lsocket -lm  -o initdb 
Undefined            first referenced
 symbol                  in file
RmgrTable                           initdb.o
ld: fatal: symbol referencing errors

I think this means the static inline functions you added in
xlog_internal.h need to be wrapped in #ifndef FRONTEND.

            regards, tom lane