Thread: Reloptions for table access methods
A custom table access method might want to add a new reloption to control something specific to that table access method. Unfortunately, if you add a new option of type RELOPT_KIND_HEAP, it will immediately fail because of the validation that happens in fillRelOptions(). Right now, heap reloptions (e.g. FILLFACTOR) are validated in two places: parseRelOptions() and fillRelOptions(). parseRelOptions() validates against boolRelOpts[], intRelOpts[], etc. This validation is extensible by add_bool_reloption(), etc. fillRelOptions() validates when filling in a struct to make sure there aren't "leftover" options. It does this using a hard-coded parsing table that is not extensible. Index access methods get total control over validation of reloptions, but that doesn't fit well with heaps, because all heaps need the vacuum-related options. I considered some other approaches, but they all seemed like over- engineering, so the attached patch just passes validate=false to fillRelOptions() for heaps. That allows custom table access methods to just define new options of kind RELOPT_KIND_HEAP. Regards, Jeff Davis
Attachment
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Tuesday, 1 September 2020 09:18, Jeff Davis <pgsql@j-davis.com> wrote: > A custom table access method might want to add a new reloption to > control something specific to that table access method. Unfortunately, > if you add a new option of type RELOPT_KIND_HEAP, it will immediately > fail because of the validation that happens in fillRelOptions(). > > Right now, heap reloptions (e.g. FILLFACTOR) are validated in two > places: parseRelOptions() and fillRelOptions(). > > parseRelOptions() validates against boolRelOpts[], intRelOpts[], etc. > This validation is extensible by add_bool_reloption(), etc. > > fillRelOptions() validates when filling in a struct to make sure there > aren't "leftover" options. It does this using a hard-coded parsing > table that is not extensible. > > Index access methods get total control over validation of reloptions, > but that doesn't fit well with heaps, because all heaps need the > vacuum-related options. > > I considered some other approaches, but they all seemed like over- > engineering, so the attached patch just passes validate=false to > fillRelOptions() for heaps. That allows custom table access methods to > just define new options of kind RELOPT_KIND_HEAP. I have yet to look at the diff. I simply wanted to give you my wholehearted +1 to the idea. It is a necessary and an essentialpart of developing access methods. I have worked extensively in the merge of PG12 into Greenplum, with a focus to the tableam api. Handling reloptions has beenquite a pain to do cleanly. Given the nature of Greenplum's table access methods, we were forced to take it a coupleof steps further. We did use an approach which I am certain that you have considered and discarded as over-engineeringfor postgres. In short, I am really excited to see a patch for this topic! > > Regards, > Jeff Davis
On 2020-Aug-31, Jeff Davis wrote: > fillRelOptions() validates when filling in a struct to make sure there > aren't "leftover" options. It does this using a hard-coded parsing > table that is not extensible. Hmm, I think that if we're going to do this, we should do it for all AMs, not just table AMs, since surely index AMs also want extensible reloptions; and I think that makes the 'validate' mechanism dead code and so we should remove it. I think this means that you can have tables with mistyped options, and you'd not get an error message about them. Is that right? Is that what we want? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 2020-09-01 at 12:20 -0400, Alvaro Herrera wrote: > Hmm, I think that if we're going to do this, we should do it for all > AMs, not just table AMs, since surely index AMs also want extensible > reloptions; and I think that makes the 'validate' mechanism dead code > and so we should remove it. Index AMs totally control the validation, so as they add options with add_bool_reloption, they can also add to their custom parsing table. I'm fine removing the "validate" parameter completely for the sake of consistency. > I think this means that you can have tables with mistyped options, > and > you'd not get an error message about them. Is that right? Is that > what > we want? No, mistyped options (e.g. "fillfactory=50") will still cause an error, because there are two layers of validation. The problem case would be a situation like the following: 1. Someone loads an extension that creates a new reloption "custom_reloption" of kind RELOPT_KIND_HEAP for their table access method "my_tableam". 2. They then create a table and forget to specify "USING my_tableam", but use the option "custom_reloption=123". Ideally, that would throw an error because "custom_reloption" is only valid for "my_tableam"; but with my patch, no error would be thrown because the extension has already added the reloption. It would just create a normal heap and "custom_reloption=123" would be ignored. I went with the simple approach because fixing that problem seemed a bit over-engineered. Here are some thoughts on what we could do: * Consider StdRdOptions to be an extensible struct (where postgres just looks at the StdRdOptions fields, but the extension can cast it to the more specialized struct with extra fields). This is awkward, because it's not a "normal" struct. Strings are represented as offsets that point to memory past the end of the struct. We'd need an extra AM hook that allocateReloptStruct can call into. * We could refactor the validation logic so that extra options don't count immediately as an error, but we instead save them in a different array, and pass them to an AM-specific validation routine. But we have no place to really put these options once they are parsed, because rel- >rd_options is already holding the StdRdOptions struct, and there's no separate place in the catalog for AM-specific reloptions. So the extension would then need to re-parse them, and stash them in rd_amcache or something. * We could introduce a new AM routine that returns a custom relopt_kind, created with add_reloption_kind(). Then, we could change heap_reloptions to call default_reloptions like: return default_reloptions(reloptions, validate, RELOPT_KIND_HEAP|customReloptKind); This still requires my patch, because we need to avoid the second validation step. It also requires some extra code for TOAST tables, because they can also be a custom table access method. The benefit over my current patch is that extensions wouldn't be creating new options for RELOPT_KIND_HEAP, they would create them only for their own custom relopt kind, so if you try to use those options with a heap, you would get an error. Suggestions welcome. Regards, Jeff Davis
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Tuesday, 1 September 2020 20:21, Jeff Davis <pgsql@j-davis.com> wrote: > > I'm fine removing the "validate" parameter completely for the sake of > consistency. FWIW, the more I think about this, I would agree with the removal. However, isn't this mechanism used for other things too, e.g. attribute_reloptions, tablespace_reloptions, that rely on the validation at that layer?. If my understanding is correct, then simply removing the parameter would not cut it and a more extensive refactoring would be needed. > > > [snip] > > The problem case would be a situation like the following: > > 1. Someone loads an extension that creates a new reloption > "custom_reloption" of kind RELOPT_KIND_HEAP for their table access > method "my_tableam". > > 2. They then create a table and forget to specify "USING my_tableam", > but use the option "custom_reloption=123". > > Ideally, that would throw an error because "custom_reloption" is only > valid for "my_tableam"; but with my patch, no error would be thrown > because the extension has already added the reloption. It would just > create a normal heap and "custom_reloption=123" would be ignored. This is something that I struggle to understand as an "error". In the example, the set RELOPT_KIND_HEAP was extended for everyone. Regardless of whether the newly added member will be used or not. I mean, if the intention was to add reloptions specific to the extension, shouldn't a new RELOPT_KIND_XXX be introduced? You seem to be thinking along the same lines. Please, correct me if I understand you wrong. What I am trying to say, is that with the current patch, I feel the behaviour is not strange nor unexpected. > I went with the simple approach because fixing that problem seemed a > bit over-engineered. Fixing that problem seems worth it on the long run. I do see the benefit of the simple approach on the meantime. //Georgios > > Regards, > Jeff Davis >
On Tue, 1 Sept 2020 at 18:21, Jeff Davis <pgsql@j-davis.com> wrote: > I went with the simple approach because fixing that problem seemed a > bit over-engineered. Here are some thoughts on what we could do: The simple patch is admirable, but not something we should put into core. I definitely don't agree with the premise that all existing heap options should be common across all new or extension tableAMs. There are dozens of such options and I don't believe that they would all have meaning in all future storage plugins. I think options should just work exactly the same for Indexes and Tables. -- Simon Riggs http://www.EnterpriseDB.com/
On Tue, 2020-12-15 at 17:37 +0000, Simon Riggs wrote: > > I definitely don't agree with the premise that all existing heap > options should be common across all new or extension tableAMs. There > are dozens of such options and I don't believe that they would all > have meaning in all future storage plugins. I agree in principle, but looking at the options that are present today, a lot of them are integrated into general database features, like autovacuum, toast, logical replication, and parellel query planning. We need to have some answer about how these features should interact with a custom AM if we can't assume anything about the reloptions it understands. > I think options should just work exactly the same for Indexes and > Tables. How should that work with the existing code? Should we have separate AM hooks for each of these interaction points, and then have the AM figure out how to handle its options? What about the toast.* options? It feels like some common options would make sense to avoid too much code duplication. I am not trying to push this in a specific direction, but I don't have a lot of good answers, so I went for the simplest thing I could think of that would allow an extension to have its own options, even if it's a bit hacky. I'm open to alternatives. Regards, Jeff Davis
On Tue, Dec 15, 2020 at 03:59:02PM -0800, Jeff Davis wrote: > How should that work with the existing code? Should we have separate AM > hooks for each of these interaction points, and then have the AM figure > out how to handle its options? What about the toast.* options? It seems to me that we would want a callback for options that is part of TableAmRoutine to do the loading work, no? The bigger picture is more complex than that though because all in-core AMs rely on build_reloptions() to do the work which itself depends on a et of pre-existing reloptions (all the static relOpts array). In the end, I'd like to think that we should remove the dependency between relopt build and initialization state, then switch all the different AMs to do something similar to create_reloptions_table() in dummy_index_am.c to define a new set of reloptions, except that we'd want to preload all the options at backend startup or something similar to that for all in-core AMs, for tables and indexes. > It feels like some common options would make sense to avoid too much > code duplication. Having a set of options that can be plugged in to any AMs, like a set of preset options for autovacuum or toast makes sense to have. > I am not trying to push this in a specific direction, but I don't have > a lot of good answers, so I went for the simplest thing I could think > of that would allow an extension to have its own options, even if it's > a bit hacky. I'm open to alternatives. FWIW, I agree with Simon's feeling that bypassing a sanity check does not feel like a good solution in the long term. -- Michael
Attachment
On Tue, 15 Dec 2020 at 23:59, Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, 2020-12-15 at 17:37 +0000, Simon Riggs wrote: > > > > I definitely don't agree with the premise that all existing heap > > options should be common across all new or extension tableAMs. There > > are dozens of such options and I don't believe that they would all > > have meaning in all future storage plugins. > > I agree in principle, but looking at the options that are present > today, a lot of them are integrated into general database features, > like autovacuum, toast, logical replication, and parellel query > planning. > > We need to have some answer about how these features should interact > with a custom AM if we can't assume anything about the reloptions it > understands. > > > I think options should just work exactly the same for Indexes and > > Tables. > > How should that work with the existing code? Should we have separate AM > hooks for each of these interaction points, and then have the AM figure > out how to handle its options? What about the toast.* options? > > It feels like some common options would make sense to avoid too much > code duplication. > > I am not trying to push this in a specific direction, but I don't have > a lot of good answers, so I went for the simplest thing I could think > of that would allow an extension to have its own options, even if it's > a bit hacky. I'm open to alternatives. Your argument seems to be that all new Table AMs should offer some kind of support for these "standard" options, even if it is to accept and then ignore them, which would presumably require them to document that. If that is the case, then at very least offer a doc patch that says that so people know what to do and doc comments describe how we should treat new parameters in the future. But you cannot seriously argue that a one line patch that changes user visible behavior can be acceptable in Postgres core without tests or docs or code comments. You know as a Committer that that position is not sustainable. Minor changes could be OK if they are complete. Please either push forward to a usable commit or withdraw, so other options can be considered. -- Simon Riggs http://www.EnterpriseDB.com/
On Wed, 2020-12-30 at 21:23 +0000, Simon Riggs wrote: > But you cannot seriously argue that a one line patch that changes > user > visible behavior can be acceptable in Postgres core without tests or > docs or code comments. Hi Simon, Often, good documented APIs come about after a few extensions gain experience hacking things together using undocumented assumptions and internal APIs. But in this case, extension authors have no opportunity to hack in reloptions for a TableAM, because of this needless extra check that my patch would eliminate. The patch does not have any user-visible impact. To have any effect, an extension would need to use these internal APIs in a specific way that is not documented externally. I see my tiny patch as a tiny improvement to the backend code because it eliminates a useless extra check, and therefore doesn't need much justification. If others see it as a burden on the engine code, that's a different story. If we insist that this must be a fully-supported feature or nothing at all, then I'll withdraw the patch. But I doubt that will result in a proper feature for v14, it just means that when we do get around to supporting extensible reloptions, there will be no hacks in the wild to learn from. Regards, Jeff Davis
On Sat, Jan 2, 2021 at 6:44 PM Jeff Davis <pgsql@j-davis.com> wrote: > > On Wed, 2020-12-30 at 21:23 +0000, Simon Riggs wrote: > > But you cannot seriously argue that a one line patch that changes > > user > > visible behavior can be acceptable in Postgres core without tests or > > docs or code comments. > > Hi Simon, > > Often, good documented APIs come about after a few extensions gain > experience hacking things together using undocumented assumptions and > internal APIs. But in this case, extension authors have no opportunity > to hack in reloptions for a TableAM, because of this needless extra > check that my patch would eliminate. > > The patch does not have any user-visible impact. To have any effect, an > extension would need to use these internal APIs in a specific way that > is not documented externally. I see my tiny patch as a tiny improvement > to the backend code because it eliminates a useless extra check, and > therefore doesn't need much justification. If others see it as a burden > on the engine code, that's a different story. > > If we insist that this must be a fully-supported feature or nothing at > all, then I'll withdraw the patch. But I doubt that will result in a > proper feature for v14, it just means that when we do get around to > supporting extensible reloptions, there will be no hacks in the wild to > learn from. Thanks for the reply. I'm trying to get my head around this before a longer reply. -- Simon Riggs http://www.EnterpriseDB.com/
On 1/7/21 2:32 PM, Simon Riggs wrote: > On Sat, Jan 2, 2021 at 6:44 PM Jeff Davis <pgsql@j-davis.com> wrote: >> >> On Wed, 2020-12-30 at 21:23 +0000, Simon Riggs wrote: >>> But you cannot seriously argue that a one line patch that changes >>> user >>> visible behavior can be acceptable in Postgres core without tests or >>> docs or code comments. >> >> Often, good documented APIs come about after a few extensions gain >> experience hacking things together using undocumented assumptions and >> internal APIs. But in this case, extension authors have no opportunity >> to hack in reloptions for a TableAM, because of this needless extra >> check that my patch would eliminate. >> >> The patch does not have any user-visible impact. To have any effect, an >> extension would need to use these internal APIs in a specific way that >> is not documented externally. I see my tiny patch as a tiny improvement >> to the backend code because it eliminates a useless extra check, and >> therefore doesn't need much justification. If others see it as a burden >> on the engine code, that's a different story. >> >> If we insist that this must be a fully-supported feature or nothing at >> all, then I'll withdraw the patch. But I doubt that will result in a >> proper feature for v14, it just means that when we do get around to >> supporting extensible reloptions, there will be no hacks in the wild to >> learn from. > > Thanks for the reply. I'm trying to get my head around this before a > longer reply. Simon, have you had time to formulate your thoughts on this patch? Regards, -- -David david@pgmasters.net