Thread: Per-table storage parameters for TableAM/IndexAM extensions

Per-table storage parameters for TableAM/IndexAM extensions

From
Sadhuprasad Patro
Date:
Hi,

Currently all the storage options for a table are very much specific
to the heap but a different AM might need some user defined AM
specific parameters to help tune the AM. So here is a patch which
provides an AM level routine so that instead of getting parameters
validated using “heap_reloptions” it will call the registered AM
routine.

e.g:
-- create a new access method and table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER <new_tableam_handler>;

CREATE TABLE mytest (a int) USING myam ;

--a new parameter is to set storage parameter for only myam as below
ALTER TABLE mytest(squargle_size = '100MB');

The user-defined parameters will have meaning only for the "myam",
otherwise error will be thrown. Our relcache already allows the
AM-specific cache to be stored for each relation.

Open Question: When a user changes AM, then what should be the
behavior for not supported storage options? Should we drop the options
and go with only system storage options?
Or throw an error, in which case the user has to clean the added parameters.

Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com/

Attachment

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Dilip Kumar
Date:
On Wed, Dec 29, 2021 at 10:38 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
Hi,

Currently all the storage options for a table are very much specific
to the heap but a different AM might need some user defined AM
specific parameters to help tune the AM. So here is a patch which
provides an AM level routine so that instead of getting parameters
validated using “heap_reloptions” it will call the registered AM
routine.

+1 for the idea.
 

e.g:
-- create a new access method and table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER <new_tableam_handler>;

CREATE TABLE mytest (a int) USING myam ;

--a new parameter is to set storage parameter for only myam as below
ALTER TABLE mytest(squargle_size = '100MB');

This will work for CREATE TABLE as well I guess as normal relation storage parameter works now right?


The user-defined parameters will have meaning only for the "myam",
otherwise error will be thrown. Our relcache already allows the
AM-specific cache to be stored for each relation.

Open Question: When a user changes AM, then what should be the
behavior for not supported storage options? Should we drop the options
and go with only system storage options?
Or throw an error, in which case the user has to clean the added parameters.

IMHO, if the user is changing the access method for the table then it should be fine to throw an error if there are some parameters which are not supported by the new AM.  So that user can take a calculative call and first remove those storage options before changing the AM.

I have a few comments on the patch, mostly cosmetics.

1.
+ Assert(routine->taboptions != NULL);

Why AM is not allowed to register the NULL function, if NULL is registered that means the AM
does not support any of the storage parameters.
2.
@@ -1358,6 +1358,7 @@ untransformRelOptions(Datum options)
  return result;
 }
 
+
 /*
  * Extract and parse reloptions from a pg_class tuple.
  *

Unwanted hunk (added extra line)

3.
+ * Parse options for heaps, views and toast tables. This is
+ * implementation of relOptions for access method heapam.
  */

Better to say access method heap instead of heapam.
4.
+ * Parse options for tables.
+ *
+ * taboptions tables AM's option parser function
+ *      reloptions options as text[] datum
+ *      validate error flag

Function header comment formatting is not proper, it also has uneven spacing between words.
5.
-extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc)
+extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc,
+ reloptions_function taboptions)

Indentation is not proper, run pgindent on this.

5.
>Currently all the storage options for a table are very much specific to the heap but a different AM might need some user defined AM specific parameters to help tune the AM. So here is a patch which provides an AM level routine so that instead of getting >parameters validated using “heap_reloptions” it will call the registered AM routine.

Wrap these long commit message lines at 80 characters.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Rushabh Lathia
Date:


On Wed, Dec 29, 2021 at 10:38 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
Hi,

Currently all the storage options for a table are very much specific
to the heap but a different AM might need some user defined AM
specific parameters to help tune the AM. So here is a patch which
provides an AM level routine so that instead of getting parameters
validated using “heap_reloptions” it will call the registered AM
routine.


This is a good idea. +1.

e.g:
-- create a new access method and table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER <new_tableam_handler>;

CREATE TABLE mytest (a int) USING myam ;

--a new parameter is to set storage parameter for only myam as below
ALTER TABLE mytest(squargle_size = '100MB');

I syntax here is,  ALTER TABLE <table_name> SET ( attribute_option = value );


The user-defined parameters will have meaning only for the "myam",
otherwise error will be thrown. Our relcache already allows the
AM-specific cache to be stored for each relation.

Open Question: When a user changes AM, then what should be the
behavior for not supported storage options? Should we drop the options
and go with only system storage options?
Or throw an error, in which case the user has to clean the added parameters.

I think throwing an error makes more sense, so that the user can clean that. 

Here are a few quick cosmetic review comments:

1)
@@ -1372,7 +1373,8 @@ untransformRelOptions(Datum options)
  */
 bytea *
 extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
-  amoptions_function amoptions)
+ amoptions_function amoptions,
+ reloptions_function taboptions)

Indentation has been changed and needs to be fixed.

2)

  case RELKIND_MATVIEW:
            options = table_reloptions(taboptions, classForm->relkind, datum, false);
break;

Going beyond line limit.

3)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 9befe01..6324d7e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2581,6 +2581,7 @@ static const TableAmRoutine heapam_methods = {
  .index_build_range_scan = heapam_index_build_range_scan,
  .index_validate_scan = heapam_index_validate_scan,
 
+ .taboptions = heap_reloptions,

Instead of taboptions can name this as relation_options to be in sink with other members. 

4) 

@@ -2427,7 +2428,7 @@ do_autovacuum(void)
  */
  MemoryContextSwitchTo(AutovacMemCxt);
  tab = table_recheck_autovac(relid, table_toast_map, pg_class_desc,
- effective_multixact_freeze_max_age);
+ classRel->rd_tableam->taboptions, effective_multixact_freeze_max_age);
  if (tab == NULL)

Split the another added parameter to function in the next line.
 
5)

Overall patch has many indentation issues, I would suggest running the
pgindent to fix those.



Regards
Rushabh Lathia

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Jelte Fennema
Date:
Big +1, this is a great addition!

I think it would be very useful if there were some tests for this new
feature. Something similar to the tests for storage parameters for
index AMs in src/test/modules/dummy_index_am.

Apart from that I think the documentation for table storage parameters
needs to be updated in doc/src/sgml/ref/create_table.sgml. It now
needs to indicate that these parameters are different for each table
access method. Similar to this paragraph in the create index storage
parameter section of the docs:

> Each index method has its own set of allowed storage parameters.
> The B-tree, hash, GiST and SP-GiST index methods all accept this
> parameter

Jelte



Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Sadhuprasad Patro
Date:
On Mon, Jan 17, 2022 at 4:47 PM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> Big +1, this is a great addition!
>
> I think it would be very useful if there were some tests for this new
> feature. Something similar to the tests for storage parameters for
> index AMs in src/test/modules/dummy_index_am.
>
Sure, I will refer to the index AM test and add the test cases needed.

> Apart from that I think the documentation for table storage parameters
> needs to be updated in doc/src/sgml/ref/create_table.sgml. It now
> needs to indicate that these parameters are different for each table
> access method. Similar to this paragraph in the create index storage
> parameter section of the docs:

Sure, I will add the documentation part for this.

As of now, I have fixed the comments from Dilip & Rushabh and have
done some more changes after internal testing and review. Please find
the latest patch attached.

Thanks & Regards
SadhuPrasad
www.EnterpriseDB.com

Attachment

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Jeff Davis
Date:
On Tue, 2022-01-18 at 22:44 +0530, Sadhuprasad Patro wrote:
> As of now, I have fixed the comments from Dilip & Rushabh and have
> done some more changes after internal testing and review. Please find
> the latest patch attached.

Hi,

Thank you for working on this! Some questions/comments:

At a high level, it seems there are some options that are common to all
tables, regardless of the AM. For instance, the vacuum/autovacuum
options. (Even if the AM doesn't require vacuum, then it needs to at
least be able to communicate that somehow.) I think parallel_workers
and user_catalog_table also fit into this category. That means we need
all of StdRdOptions to be the same, with the possible exception of
toast_tuple_target and/or fillfactor.

The current patch just leaves it up to the AM to return a bytea that
can be cast to StdRdOptions, which seems like a fragile API.

That makes me think that what we really want is to have *extra* options
for a table AM, not an entirely custom set. Do you agree?

If so, I suggest you refactor so that if validation doesn't recognize a
parameter, it calls a table AM method to validate it, and lets it in if
validation succeeds. That way all the stuff around StdRdOptions is
unchanged. When the table AM needs the parameter value, it can parse
pg_class.reloptions for itself and save it in rd_amcache.

Regards,
    Jeff Davis





Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Robert Haas
Date:
On Wed, Dec 29, 2021 at 12:08 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
> Open Question: When a user changes AM, then what should be the
> behavior for not supported storage options? Should we drop the options
> and go with only system storage options?
> Or throw an error, in which case the user has to clean the added parameters.

A few people have endorsed the error behavior here but I foresee problems.

Imagine that I am using the "foo" tableam with "compression=lots" and
I want to switch to the "bar" AM which does not support that option.
If I remove the "compression=lots" option using a separate command,
the "foo" table AM may rewrite my whole table and decompress
everything. Then when I convert to the "bar" AM it's going to have to
be rewritten again. That's painful. I clearly need some way to switch
AMs without having to rewrite the table twice.

It's also interesting to consider the other direction. If I am
switching from "bar" to "foo" I would really like to be able to add
the "compression=lots" option at the same time I make the switch.
There needs to be some syntax for that.

One way to solve the first of these problem is to silently drop
unsupported options. Maybe a better way is to have syntax that allows
you to specify options to be added and removed at the time you switch
AMs e.g.:

ALTER TABLE mytab SET ACCESS METHOD bar OPTIONS (DROP compression);
ALTER TABLE mytab SET ACCESS METHOD foo OPTIONS (ADD compression 'lots');

I don't like that particular syntax a ton personally but it does match
what we already use for ALTER SERVER. Unfortunately it's wildly
inconsistent with what we do for ALTER TABLE. Another idea might be
something like:

ALTER TABLE mytab SET ACCESS METHOD bar RESET compression;
ALTER TABLE mytab SET ACCESS METHOD foo SET compression = 'lots';

You'd need to be able to do multiple things with one command e.g.

ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
preferred_fruit = 'banana';

Regardless of the details, I don't think it's viable to just say,
well, rewrite the table multiple times if that's what it takes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Dilip Kumar
Date:
On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:

Imagine that I am using the "foo" tableam with "compression=lots" and
I want to switch to the "bar" AM which does not support that option.
If I remove the "compression=lots" option using a separate command,
the "foo" table AM may rewrite my whole table and decompress
everything. Then when I convert to the "bar" AM it's going to have to
be rewritten again. That's painful. I clearly need some way to switch
AMs without having to rewrite the table twice.

I agree with you, if we force users to drop the option as a separate command then we will have to rewrite the table twice.
 
It's also interesting to consider the other direction. If I am
switching from "bar" to "foo" I would really like to be able to add
the "compression=lots" option at the same time I make the switch.
There needs to be some syntax for that.

One way to solve the first of these problem is to silently drop
unsupported options. Maybe a better way is to have syntax that allows
you to specify options to be added and removed at the time you switch
AMs e.g.:

+1


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Sadhuprasad Patro
Date:
> On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>
>> Imagine that I am using the "foo" tableam with "compression=lots" and
>> I want to switch to the "bar" AM which does not support that option.
>> If I remove the "compression=lots" option using a separate command,
>> the "foo" table AM may rewrite my whole table and decompress
>> everything. Then when I convert to the "bar" AM it's going to have to
>> be rewritten again. That's painful. I clearly need some way to switch
>> AMs without having to rewrite the table twice.
>
Agreed. Better to avoid multiple rewrites here. Thank you for figuring out this.


> You'd need to be able to do multiple things with one command e.g.

> ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> preferred_fruit = 'banana';

+1
Silently dropping some options is not right and it may confuse users
too. So I would like to go
for the command you have suggested, where the user should be able to
SET & RESET multiple
options in a single command for an object.

Thanks & Regards
SadhuPrasad
http://www.enterprisedb.com



Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Simon Riggs
Date:
On Thu, 17 Feb 2022 at 17:55, Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
>
> > On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >>
> >> Imagine that I am using the "foo" tableam with "compression=lots" and
> >> I want to switch to the "bar" AM which does not support that option.
> >> If I remove the "compression=lots" option using a separate command,
> >> the "foo" table AM may rewrite my whole table and decompress
> >> everything. Then when I convert to the "bar" AM it's going to have to
> >> be rewritten again. That's painful. I clearly need some way to switch
> >> AMs without having to rewrite the table twice.
> >
> Agreed. Better to avoid multiple rewrites here. Thank you for figuring out this.
>
>
> > You'd need to be able to do multiple things with one command e.g.
>
> > ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> > preferred_fruit = 'banana';
>
> +1
> Silently dropping some options is not right and it may confuse users
> too. So I would like to go
> for the command you have suggested, where the user should be able to
> SET & RESET multiple
> options in a single command for an object.

I prefer ADD/DROP to SET/RESET. The latter doesn't convey the meaning
accurately to me.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Sadhuprasad Patro
Date:
On Fri, Feb 18, 2022 at 10:48 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 17 Feb 2022 at 17:55, Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
> >
> > > On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >>
> > >>
> > >> Imagine that I am using the "foo" tableam with "compression=lots" and
> > >> I want to switch to the "bar" AM which does not support that option.
> > >> If I remove the "compression=lots" option using a separate command,
> > >> the "foo" table AM may rewrite my whole table and decompress
> > >> everything. Then when I convert to the "bar" AM it's going to have to
> > >> be rewritten again. That's painful. I clearly need some way to switch
> > >> AMs without having to rewrite the table twice.
> > >
> > Agreed. Better to avoid multiple rewrites here. Thank you for figuring out this.
> >
> >
> > > You'd need to be able to do multiple things with one command e.g.
> >
> > > ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> > > preferred_fruit = 'banana';
> >
> > +1
> > Silently dropping some options is not right and it may confuse users
> > too. So I would like to go
> > for the command you have suggested, where the user should be able to
> > SET & RESET multiple
> > options in a single command for an object.
>
> I prefer ADD/DROP to SET/RESET. The latter doesn't convey the meaning
> accurately to me.

I have added a dummy test module for table AM and did the document
change in the latest patch attached...
The Next plan is to provide users to change the AM storage parameters
swiftly through a single command. I will work on the same and give
another version.

As of now I will go with the ADD/DROP keywords for "ALTER TABLE" command.

Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com

Attachment

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Andres Freund
Date:
Hi,

On 2022-02-24 12:26:08 +0530, Sadhuprasad Patro wrote:
> I have added a dummy test module for table AM and did the document
> change in the latest patch attached...

The test module doesn't build on windows, unfortunately... Looks like you need
to add PGDLLIMPORT to a few variables:
[01:26:18.539] c:\cirrus\src\test\modules\dummy_table_am\dummy_table_am.c(488): warning C4700: uninitialized local
variable'rel' used [c:\cirrus\dummy_table_am.vcxproj]
 
[01:26:18.539] dummy_table_am.obj : error LNK2001: unresolved external symbol synchronize_seqscans
[c:\cirrus\dummy_table_am.vcxproj]
[01:26:18.539] .\Debug\dummy_table_am\dummy_table_am.dll : fatal error LNK1120: 1 unresolved externals
[c:\cirrus\dummy_table_am.vcxproj]
[01:26:18.539]     1 Warning(s)
[01:26:18.539]     2 Error(s)

https://cirrus-ci.com/task/5067519584108544?logs=build#L2085

Marked the CF entry as waiting-on-author.

Greetings,

Andres



Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Sadhuprasad Patro
Date:
On Tue, Mar 22, 2022 at 7:24 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-02-24 12:26:08 +0530, Sadhuprasad Patro wrote:
> > I have added a dummy test module for table AM and did the document
> > change in the latest patch attached...
>
> The test module doesn't build on windows, unfortunately... Looks like you need
> to add PGDLLIMPORT to a few variables:
> [01:26:18.539] c:\cirrus\src\test\modules\dummy_table_am\dummy_table_am.c(488): warning C4700: uninitialized local
variable'rel' used [c:\cirrus\dummy_table_am.vcxproj]
 
> [01:26:18.539] dummy_table_am.obj : error LNK2001: unresolved external symbol synchronize_seqscans
[c:\cirrus\dummy_table_am.vcxproj]
> [01:26:18.539] .\Debug\dummy_table_am\dummy_table_am.dll : fatal error LNK1120: 1 unresolved externals
[c:\cirrus\dummy_table_am.vcxproj]
> [01:26:18.539]     1 Warning(s)
> [01:26:18.539]     2 Error(s)
>
> https://cirrus-ci.com/task/5067519584108544?logs=build#L2085
>
> Marked the CF entry as waiting-on-author.

HI,
Thank you for the feedback Andres. I will take care of the same.

As of now attached is a new patch on this to support the addition of
new option parameters or drop the old parameters through ALTER TABLE
command.
Need some more testing on this, which is currently in progress.
Providing the patch to get early feedback in case of any major
comments...

New Command:
ALTER TABLE name SET ACCESS METHOD amname [ OPTIONS ( ADD | DROP
option 'value' [, ... ] ) ];


Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com

Attachment

Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Greg Stark
Date:
This patch still has code warnings on the cfbot and I don't think
they're platform-specific:

[00:28:28.236] gram.y: In function ‘base_yyparse’:
[00:28:28.236] gram.y:2851:58: error: passing argument 2 of
‘makeDefElemExtended’ from incompatible pointer type
[-Werror=incompatible-pointer-types]
[00:28:28.236] 2851 | $$ = makeDefElemExtended(NULL, $2, NULL,
DEFELEM_DROP, @2);
[00:28:28.236] | ~~~~~~~~~^~~~~~~
[00:28:28.236] | |
[00:28:28.236] | DefElem *
[00:28:28.236] In file included from gram.y:58:
[00:28:28.236] ../../../src/include/nodes/makefuncs.h:102:60: note:
expected ‘char *’ but argument is of type ‘DefElem *’
[00:28:28.236] 102 | extern DefElem *makeDefElemExtended(char
*nameSpace, char *name, Node *arg,
[00:28:28.236] | ~~~~~~^~~~

I gather the patch is still a WIP and ideally we would want to give
feedback on patches in CFs when the author's are looking for it but
this is the last week before feature freeze and the main focus is on
committable patches. I'll move it to next CF.



Re: Per-table storage parameters for TableAM/IndexAM extensions

From
Jacob Champion
Date:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3495/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob