Thread: Reloptions for table access methods

Reloptions for table access methods

From
Jeff Davis
Date:
A custom table access method might want to add a new reloption to
control something specific to that table access method. Unfortunately,
if you add a new option of type RELOPT_KIND_HEAP, it will immediately
fail because of the validation that happens in fillRelOptions().

Right now, heap reloptions (e.g. FILLFACTOR) are validated in two
places: parseRelOptions() and fillRelOptions().

parseRelOptions() validates against boolRelOpts[], intRelOpts[], etc.
This validation is extensible by add_bool_reloption(), etc.

fillRelOptions() validates when filling in a struct to make sure there
aren't "leftover" options. It does this using a hard-coded parsing
table that is not extensible.

Index access methods get total control over validation of reloptions,
but that doesn't fit well with heaps, because all heaps need the
vacuum-related options.

I considered some other approaches, but they all seemed like over-
engineering, so the attached patch just passes validate=false to
fillRelOptions() for heaps. That allows custom table access methods to
just define new options of kind RELOPT_KIND_HEAP.

Regards,
    Jeff Davis


Attachment

Re: Reloptions for table access methods

From
gkokolatos@pm.me
Date:




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, 1 September 2020 09:18, Jeff Davis <pgsql@j-davis.com> wrote:

> A custom table access method might want to add a new reloption to
> control something specific to that table access method. Unfortunately,
> if you add a new option of type RELOPT_KIND_HEAP, it will immediately
> fail because of the validation that happens in fillRelOptions().
>
> Right now, heap reloptions (e.g. FILLFACTOR) are validated in two
> places: parseRelOptions() and fillRelOptions().
>
> parseRelOptions() validates against boolRelOpts[], intRelOpts[], etc.
> This validation is extensible by add_bool_reloption(), etc.
>
> fillRelOptions() validates when filling in a struct to make sure there
> aren't "leftover" options. It does this using a hard-coded parsing
> table that is not extensible.
>
> Index access methods get total control over validation of reloptions,
> but that doesn't fit well with heaps, because all heaps need the
> vacuum-related options.
>
> I considered some other approaches, but they all seemed like over-
> engineering, so the attached patch just passes validate=false to
> fillRelOptions() for heaps. That allows custom table access methods to
> just define new options of kind RELOPT_KIND_HEAP.

I have yet to look at the diff. I simply wanted to give you my wholehearted +1 to the idea. It is a necessary and an
essentialpart of developing access methods. 

I have worked extensively in the merge of PG12 into Greenplum, with a focus to the tableam api. Handling reloptions has
beenquite a pain to do cleanly. Given the nature of Greenplum's table access methods, we were forced to take it a
coupleof steps further. We did use an approach which I am certain that you have considered and discarded as
over-engineeringfor postgres. 

In short, I am really excited to see a patch for this topic!

>
> Regards,
> Jeff Davis





Re: Reloptions for table access methods

From
Alvaro Herrera
Date:
On 2020-Aug-31, Jeff Davis wrote:

> fillRelOptions() validates when filling in a struct to make sure there
> aren't "leftover" options. It does this using a hard-coded parsing
> table that is not extensible.

Hmm, I think that if we're going to do this, we should do it for all
AMs, not just table AMs, since surely index AMs also want extensible
reloptions; and I think that makes the 'validate' mechanism dead code
and so we should remove it.

I think this means that you can have tables with mistyped options, and
you'd not get an error message about them.  Is that right?  Is that what
we want?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reloptions for table access methods

From
Jeff Davis
Date:
On Tue, 2020-09-01 at 12:20 -0400, Alvaro Herrera wrote:
> Hmm, I think that if we're going to do this, we should do it for all
> AMs, not just table AMs, since surely index AMs also want extensible
> reloptions; and I think that makes the 'validate' mechanism dead code
> and so we should remove it.

Index AMs totally control the validation, so as they add options with
add_bool_reloption, they can also add to their custom parsing table.

I'm fine removing the "validate" parameter completely for the sake of
consistency.

> I think this means that you can have tables with mistyped options,
> and
> you'd not get an error message about them.  Is that right?  Is that
> what
> we want?

No, mistyped options (e.g. "fillfactory=50") will still cause an error,
because there are two layers of validation.

The problem case would be a situation like the following:

1. Someone loads an extension that creates a new reloption
"custom_reloption" of kind RELOPT_KIND_HEAP for their table access
method "my_tableam". 

2. They then create a table and forget to specify "USING my_tableam",
but use the option "custom_reloption=123".

Ideally, that would throw an error because "custom_reloption" is only
valid for "my_tableam"; but with my patch, no error would be thrown
because the extension has already added the reloption. It would just
create a normal heap and "custom_reloption=123" would be ignored.



I went with the simple approach because fixing that problem seemed a
bit over-engineered. Here are some thoughts on what we could do:

* Consider StdRdOptions to be an extensible struct (where postgres just
looks at the StdRdOptions fields, but the extension can cast it to the
more specialized struct with extra fields). This is awkward, because
it's not a "normal" struct. Strings are represented as offsets that
point to memory past the end of the struct. We'd need an extra AM hook
that allocateReloptStruct can call into.

* We could refactor the validation logic so that extra options don't
count immediately as an error, but we instead save them in a different
array, and pass them to an AM-specific validation routine. But we have
no place to really put these options once they are parsed, because rel-
>rd_options is already holding the StdRdOptions struct, and there's no
separate place in the catalog for AM-specific reloptions. So the
extension would then need to re-parse them, and stash them in
rd_amcache or something.

* We could introduce a new AM routine that returns a custom
relopt_kind, created with add_reloption_kind(). Then, we could change
heap_reloptions to call default_reloptions like:

    return default_reloptions(reloptions, validate,
        RELOPT_KIND_HEAP|customReloptKind);

This still requires my patch, because we need to avoid the second
validation step. It also requires some extra code for TOAST tables,
because they can also be a custom table access method. The benefit over
my current patch is that extensions wouldn't be creating new options
for RELOPT_KIND_HEAP, they would create them only for their own custom
relopt kind, so if you try to use those options with a heap, you would
get an error.


Suggestions welcome.

Regards,
    Jeff Davis





Re: Reloptions for table access methods

From
gkokolatos@pm.me
Date:




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, 1 September 2020 20:21, Jeff Davis <pgsql@j-davis.com> wrote:


>
> I'm fine removing the "validate" parameter completely for the sake of
> consistency.

FWIW, the more I think about this, I would agree with the removal.
However, isn't this mechanism used for other things too, e.g. attribute_reloptions,
tablespace_reloptions, that rely on the validation at that layer?. If my understanding
is correct, then simply removing the parameter would not cut it and a more extensive
refactoring would be needed.

>
> > [snip]
>
> The problem case would be a situation like the following:
>
> 1.  Someone loads an extension that creates a new reloption
>     "custom_reloption" of kind RELOPT_KIND_HEAP for their table access
>     method "my_tableam".
>
> 2.  They then create a table and forget to specify "USING my_tableam",
>     but use the option "custom_reloption=123".
>
>     Ideally, that would throw an error because "custom_reloption" is only
>     valid for "my_tableam"; but with my patch, no error would be thrown
>     because the extension has already added the reloption. It would just
>     create a normal heap and "custom_reloption=123" would be ignored.

This is something that I struggle to understand as an "error". In the example,
the set RELOPT_KIND_HEAP was extended for everyone. Regardless of whether
the newly added member will be used or not.

I mean, if the intention was to add reloptions specific to the extension,
shouldn't a new RELOPT_KIND_XXX be introduced? You seem to be thinking
along the same lines. Please, correct me if I understand you wrong.

What I am trying to say, is that with the current patch, I feel the behaviour
is not strange nor unexpected.


> I went with the simple approach because fixing that problem seemed a
> bit over-engineered.

Fixing that problem seems worth it on the long run. I do see the benefit of
the simple approach on the meantime.

//Georgios

>
>     Regards,
>     Jeff Davis
>





Re: Reloptions for table access methods

From
Simon Riggs
Date:
On Tue, 1 Sept 2020 at 18:21, Jeff Davis <pgsql@j-davis.com> wrote:

> I went with the simple approach because fixing that problem seemed a
> bit over-engineered. Here are some thoughts on what we could do:

The simple patch is admirable, but not something we should put into core.

I definitely don't agree with the premise that all existing heap
options should be common across all new or extension tableAMs. There
are dozens of such options and I don't believe that they would all
have meaning in all future storage plugins.

I think options should just work exactly the same for Indexes and Tables.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reloptions for table access methods

From
Jeff Davis
Date:
On Tue, 2020-12-15 at 17:37 +0000, Simon Riggs wrote:
> 
> I definitely don't agree with the premise that all existing heap
> options should be common across all new or extension tableAMs. There
> are dozens of such options and I don't believe that they would all
> have meaning in all future storage plugins.

I agree in principle, but looking at the options that are present
today, a lot of them are integrated into general database features,
like autovacuum, toast, logical replication, and parellel query
planning. 

We need to have some answer about how these features should interact
with a custom AM if we can't assume anything about the reloptions it
understands.

> I think options should just work exactly the same for Indexes and
> Tables.

How should that work with the existing code? Should we have separate AM
hooks for each of these interaction points, and then have the AM figure
out how to handle its options? What about the toast.* options?

It feels like some common options would make sense to avoid too much
code duplication.

I am not trying to push this in a specific direction, but I don't have
a lot of good answers, so I went for the simplest thing I could think
of that would allow an extension to have its own options, even if it's
a bit hacky. I'm open to alternatives.

Regards,
    Jeff Davis





Re: Reloptions for table access methods

From
Michael Paquier
Date:
On Tue, Dec 15, 2020 at 03:59:02PM -0800, Jeff Davis wrote:
> How should that work with the existing code? Should we have separate AM
> hooks for each of these interaction points, and then have the AM figure
> out how to handle its options? What about the toast.* options?

It seems to me that we would want a callback for options that is part
of TableAmRoutine to do the loading work, no?  The bigger picture is
more complex than that though because all in-core AMs rely on
build_reloptions() to do the work which itself depends on a et of
pre-existing reloptions (all the static relOpts array).  In the end,
I'd like to think that we should remove the dependency between relopt
build and initialization state, then switch all the different AMs to
do something similar to create_reloptions_table() in dummy_index_am.c
to define a new set of reloptions, except that we'd want to preload
all the options at backend startup or something similar to that for
all in-core AMs, for tables and indexes.

> It feels like some common options would make sense to avoid too much
> code duplication.

Having a set of options that can be plugged in to any AMs, like a set
of preset options for autovacuum or toast makes sense to have.

> I am not trying to push this in a specific direction, but I don't have
> a lot of good answers, so I went for the simplest thing I could think
> of that would allow an extension to have its own options, even if it's
> a bit hacky. I'm open to alternatives.

FWIW, I agree with Simon's feeling that bypassing a sanity check does
not feel like a good solution in the long term.
--
Michael

Attachment

Re: Reloptions for table access methods

From
Simon Riggs
Date:
On Tue, 15 Dec 2020 at 23:59, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2020-12-15 at 17:37 +0000, Simon Riggs wrote:
> >
> > I definitely don't agree with the premise that all existing heap
> > options should be common across all new or extension tableAMs. There
> > are dozens of such options and I don't believe that they would all
> > have meaning in all future storage plugins.
>
> I agree in principle, but looking at the options that are present
> today, a lot of them are integrated into general database features,
> like autovacuum, toast, logical replication, and parellel query
> planning.
>
> We need to have some answer about how these features should interact
> with a custom AM if we can't assume anything about the reloptions it
> understands.
>
> > I think options should just work exactly the same for Indexes and
> > Tables.
>
> How should that work with the existing code? Should we have separate AM
> hooks for each of these interaction points, and then have the AM figure
> out how to handle its options? What about the toast.* options?
>
> It feels like some common options would make sense to avoid too much
> code duplication.
>
> I am not trying to push this in a specific direction, but I don't have
> a lot of good answers, so I went for the simplest thing I could think
> of that would allow an extension to have its own options, even if it's
> a bit hacky. I'm open to alternatives.

Your argument seems to be that all new Table AMs should offer some
kind of support for these "standard" options, even if it is to accept
and then ignore them, which would presumably require them to document
that. If that is the case, then at very least offer a doc patch that
says that so people know what to do and doc comments describe how we
should treat new parameters in the future.

But you cannot seriously argue that a one line patch that changes user
visible behavior can be acceptable in Postgres core without tests or
docs or code comments. You know as a Committer that that position is
not sustainable.

Minor changes could be OK if they are complete. Please either push
forward to a usable commit or withdraw, so other options can be
considered.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reloptions for table access methods

From
Jeff Davis
Date:
On Wed, 2020-12-30 at 21:23 +0000, Simon Riggs wrote:
> But you cannot seriously argue that a one line patch that changes
> user
> visible behavior can be acceptable in Postgres core without tests or
> docs or code comments.

Hi Simon,

Often, good documented APIs come about after a few extensions gain
experience hacking things together using undocumented assumptions and
internal APIs. But in this case, extension authors have no opportunity
to hack in reloptions for a TableAM, because of this needless extra
check that my patch would eliminate.

The patch does not have any user-visible impact. To have any effect, an
extension would need to use these internal APIs in a specific way that
is not documented externally. I see my tiny patch as a tiny improvement
to the backend code because it eliminates a useless extra check, and
therefore doesn't need much justification. If others see it as a burden
on the engine code, that's a different story.

If we insist that this must be a fully-supported feature or nothing at
all, then I'll withdraw the patch. But I doubt that will result in a
proper feature for v14, it just means that when we do get around to
supporting extensible reloptions, there will be no hacks in the wild to
learn from.

Regards,
    Jeff Davis





Re: Reloptions for table access methods

From
Simon Riggs
Date:
On Sat, Jan 2, 2021 at 6:44 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2020-12-30 at 21:23 +0000, Simon Riggs wrote:
> > But you cannot seriously argue that a one line patch that changes
> > user
> > visible behavior can be acceptable in Postgres core without tests or
> > docs or code comments.
>
> Hi Simon,
>
> Often, good documented APIs come about after a few extensions gain
> experience hacking things together using undocumented assumptions and
> internal APIs. But in this case, extension authors have no opportunity
> to hack in reloptions for a TableAM, because of this needless extra
> check that my patch would eliminate.
>
> The patch does not have any user-visible impact. To have any effect, an
> extension would need to use these internal APIs in a specific way that
> is not documented externally. I see my tiny patch as a tiny improvement
> to the backend code because it eliminates a useless extra check, and
> therefore doesn't need much justification. If others see it as a burden
> on the engine code, that's a different story.
>
> If we insist that this must be a fully-supported feature or nothing at
> all, then I'll withdraw the patch. But I doubt that will result in a
> proper feature for v14, it just means that when we do get around to
> supporting extensible reloptions, there will be no hacks in the wild to
> learn from.

Thanks for the reply. I'm trying to get my head around this before a
longer reply.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Reloptions for table access methods

From
David Steele
Date:
On 1/7/21 2:32 PM, Simon Riggs wrote:
> On Sat, Jan 2, 2021 at 6:44 PM Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> On Wed, 2020-12-30 at 21:23 +0000, Simon Riggs wrote:
>>> But you cannot seriously argue that a one line patch that changes
>>> user
>>> visible behavior can be acceptable in Postgres core without tests or
>>> docs or code comments.
>>
>> Often, good documented APIs come about after a few extensions gain
>> experience hacking things together using undocumented assumptions and
>> internal APIs. But in this case, extension authors have no opportunity
>> to hack in reloptions for a TableAM, because of this needless extra
>> check that my patch would eliminate.
>>
>> The patch does not have any user-visible impact. To have any effect, an
>> extension would need to use these internal APIs in a specific way that
>> is not documented externally. I see my tiny patch as a tiny improvement
>> to the backend code because it eliminates a useless extra check, and
>> therefore doesn't need much justification. If others see it as a burden
>> on the engine code, that's a different story.
>>
>> If we insist that this must be a fully-supported feature or nothing at
>> all, then I'll withdraw the patch. But I doubt that will result in a
>> proper feature for v14, it just means that when we do get around to
>> supporting extensible reloptions, there will be no hacks in the wild to
>> learn from.
> 
> Thanks for the reply. I'm trying to get my head around this before a
> longer reply.

Simon, have you had time to formulate your thoughts on this patch?

Regards,
-- 
-David
david@pgmasters.net