Thread: ALTER TABLE SET ACCESS METHOD on partitioned tables
Hello,
This is a fresh thread to continue the discussion on ALTER TABLE SET
ACCESS METHOD when applied to partition roots, as requested.
Current behavior (HEAD):
CREATE TABLE am_partitioned(x INT, y INT)
PARTITION BY hash (x);
ALTER TABLE am_partitioned SET ACCESS METHOD heap2;
ERROR: cannot change access method of a partitioned table
Potential behavior options:
(A) Don't recurse to existing children and ensure that the new am gets
inherited by any new children. (ALTER TABLE SET TABLESPACE behavior)
(B) Recurse to existing children and modify their am. Also, ensure that
any new children inherit the new am.
A patch [1] was introduced earlier by Justin to implement
(A). v1-0001-Allow-ATSETAM-on-partition-roots.patch contains a rebase
of that patch against latest HEAD, with minor updates on comments and
some additional test coverage.
I think that (B) is necessary for partition hierarchies with a high
number of partitions. One typical use case in Greenplum, for
instance, is to convert heap tables containing cold data to append-only
storage at the root or subroot level of partition hierarchies consisting
of thousands of partitions. Asking users to ALTER individual partitions
is cumbersome and error-prone.
Furthermore, I believe that (B) should be the default and (A) can be
chosen by using the ONLY clause. This would give us the best of both
worlds and would make the use of ONLY consistent. The patch
v1-0002-Make-ATSETAM-recurse-by-default.patch achieves that.
Thoughts?
Regards,
Soumyadeep (VMware)
[1] https://www.postgresql.org/message-id/20210308010707.GA29832%40telsasoft.com
This is a fresh thread to continue the discussion on ALTER TABLE SET
ACCESS METHOD when applied to partition roots, as requested.
Current behavior (HEAD):
CREATE TABLE am_partitioned(x INT, y INT)
PARTITION BY hash (x);
ALTER TABLE am_partitioned SET ACCESS METHOD heap2;
ERROR: cannot change access method of a partitioned table
Potential behavior options:
(A) Don't recurse to existing children and ensure that the new am gets
inherited by any new children. (ALTER TABLE SET TABLESPACE behavior)
(B) Recurse to existing children and modify their am. Also, ensure that
any new children inherit the new am.
A patch [1] was introduced earlier by Justin to implement
(A). v1-0001-Allow-ATSETAM-on-partition-roots.patch contains a rebase
of that patch against latest HEAD, with minor updates on comments and
some additional test coverage.
I think that (B) is necessary for partition hierarchies with a high
number of partitions. One typical use case in Greenplum, for
instance, is to convert heap tables containing cold data to append-only
storage at the root or subroot level of partition hierarchies consisting
of thousands of partitions. Asking users to ALTER individual partitions
is cumbersome and error-prone.
Furthermore, I believe that (B) should be the default and (A) can be
chosen by using the ONLY clause. This would give us the best of both
worlds and would make the use of ONLY consistent. The patch
v1-0002-Make-ATSETAM-recurse-by-default.patch achieves that.
Thoughts?
Regards,
Soumyadeep (VMware)
[1] https://www.postgresql.org/message-id/20210308010707.GA29832%40telsasoft.com
Attachment
Hi,
+ tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ accessMethodId = ((Form_pg_class) GETSTRUCT(tup))->relam;
- /* look up the access method, verify it is for a table */
- if (accessMethod != NULL)
- accessMethodId = get_table_am_oid(accessMethod, false);
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ accessMethodId = ((Form_pg_class) GETSTRUCT(tup))->relam;
- /* look up the access method, verify it is for a table */
- if (accessMethod != NULL)
- accessMethodId = get_table_am_oid(accessMethod, false);
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
Shouldn't the validity of tup be checked before relam field is accessed ?
Cheers
Thanks for copying me. I didn't look closely yet, but this comment is wrong: + * Since these have no storage the tablespace can be updated with a simple + * metadata only operation to update the tablespace. As I see it, AMs are a strong parallel to tablespaces. The default tablespace is convenient: 1) explicitly specified tablespace; 2) tablespace of parent, partitioned table; 3) DB tablespace; 4) default_tablespace: https://www.postgresql.org/message-id/20190423222633.GA8364%40alvherre.pgsql It'd be convenient if AMs worked the same way (and a bit odd that they don't). Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to --no-tablespace. -- Justin
On Wed, May 18, 2022 at 4:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I didn't look closely yet, but this comment is wrong:
>
> + * Since these have no storage the tablespace can be updated with a simple
> + * metadata only operation to update the tablespace.
Good catch. Fixed.
> It'd be convenient if AMs worked the same way (and a bit odd that they don't).
> Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to
> --no-tablespace.
I agree that ATSET AM should behave in a similar fashion to ATSET tablespaces.
However, the way that ATSET tablespace currently behaves is not consistent with
the ONLY clause.
>
> + * Since these have no storage the tablespace can be updated with a simple
> + * metadata only operation to update the tablespace.
Good catch. Fixed.
> It'd be convenient if AMs worked the same way (and a bit odd that they don't).
> Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to
> --no-tablespace.
I agree that ATSET AM should behave in a similar fashion to ATSET tablespaces.
However, the way that ATSET tablespace currently behaves is not consistent with
the ONLY clause.
On a given partition root:
ALTER TABLE ONLY am_partitioned SET TABLESPACE ts;
has the same effect as:
ALTER TABLE am_partitioned SET TABLESPACE ts;
We are missing out on the feature to set the AM/tablespace throughout the
partition hierarchy, with one command.
Regards,
Soumyadeep (VMware)
Attachment
On Wed, May 18, 2022 at 5:49 PM Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote:
On Wed, May 18, 2022 at 4:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:> I didn't look closely yet, but this comment is wrong:
>
> + * Since these have no storage the tablespace can be updated with a simple
> + * metadata only operation to update the tablespace.
Good catch. Fixed.
> It'd be convenient if AMs worked the same way (and a bit odd that they don't).
> Note that in v15, pg_dump/restore now allow --no-table-am, an exact parallel to
> --no-tablespace.
I agree that ATSET AM should behave in a similar fashion to ATSET tablespaces.
However, the way that ATSET tablespace currently behaves is not consistent with
the ONLY clause.
On a given partition root:
ALTER TABLE ONLY am_partitioned SET TABLESPACE ts;
has the same effect as:
ALTER TABLE am_partitioned SET TABLESPACE ts;
We are missing out on the feature to set the AM/tablespace throughout the
partition hierarchy, with one command.
Regards,
Soumyadeep (VMware)
Hi,
- /* look up the access method, verify it is for a table */
- if (accessMethod != NULL)
- accessMethodId = get_table_am_oid(accessMethod, false);
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
The validity check of tup should be done before fetching the value of relam field.
Cheers
On Wed, May 18, 2022 at 6:26 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> + accessMethodId = ((Form_pg_class) GETSTRUCT(tup))->relam;
>
> - /* look up the access method, verify it is for a table */
> - if (accessMethod != NULL)
> - accessMethodId = get_table_am_oid(accessMethod, false);
> + if (!HeapTupleIsValid(tup))
> + elog(ERROR, "cache lookup failed for relation %u", relid);
>
> The validity check of tup should be done before fetching the value of relam field.
>
> - /* look up the access method, verify it is for a table */
> - if (accessMethod != NULL)
> - accessMethodId = get_table_am_oid(accessMethod, false);
> + if (!HeapTupleIsValid(tup))
> + elog(ERROR, "cache lookup failed for relation %u", relid);
>
> The validity check of tup should be done before fetching the value of relam field.
Thanks. Fixed and rebased.
Regards,
Soumyadeep (VMware)
Attachment
On Thu, Jun 09, 2022 at 11:21:58AM -0700, Soumyadeep Chakraborty wrote: > Thanks. Fixed and rebased. I think that I am OK with the concept of this patch to use a partitioned table's relam as a reference when creating a partition rather than relying on the default GUC, in a way similar to tablespaces. One worry I see is that forcing a recursion on the leaves on ALTER TABLE could silently break partitions where multiple table AMs are used across separate partitions if ALTER TABLE SET ACCESS METHOD is used on one of the parents, though it seems like this is not something I would much worry about as now the command is an error. A second worry is that we would just break existing creation flows that rely on the GUC defining the default AM. This is worth a close lookup at pg_dump to make sure that we do things correctly with this patch in place.. Did you check dump and restore flows with partition trees and --no-table-access-method? Perhaps there should be some regression tests with partitioned tables? + /* + * For partitioned tables, when no access method is specified, we + * default to the parent table's AM. + */ + Assert(list_length(inheritOids) == 1); + /* XXX: should implement get_rel_relam? */ + relid = linitial_oid(inheritOids); + tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); + if (!HeapTupleIsValid(tup)) + elog(ERROR, "cache lookup failed for relation %u", relid); Having a wrapper for that could be useful, yes. We don't have any code paths that would directly need that now, from what I can see, though. This patch gives one reason to have one. -- Michael
Attachment
On Mon, Mar 20, 2023 at 09:30:50AM +0900, Michael Paquier wrote: > Did you check dump and restore flows with partition > trees and --no-table-access-method? Perhaps there should be > some regression tests with partitioned tables? I was looking at the patch, and as I suspected the dumps generated are forgetting to apply the AM to the partitioned tables. For example, assuming that the default AM is heap, the patch creates child_0_10 with heap2 as AM, which is what we want: CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler; CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id) USING heap2; CREATE TABLE child_0_10 PARTITION OF parent_tab FOR VALUES FROM (0) TO (10); However a dump produces that (content cut except for its most relevant parts): CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler; SET default_tablespace = ''; CREATE TABLE public.parent_tab ( id integer ) PARTITION BY RANGE (id); SET default_table_access_method = heap2; CREATE TABLE public.child_0_10 ( id integer ); This would restore the previous contents incorrectly, where parent_tab would use heap and child_0_10 would use heap2, causing any partitions created after the restore to use silently heap. This is going to require a logic similar to tablespaces, where generate SET commands on default_table_access_method so as --no-table-access-method in pg_dump and pg_restore are able to work correctly. Having tests to cover all that is a must-have. -- Michael
Attachment
On Tue, Mar 28, 2023 at 09:13:10AM +0900, Michael Paquier wrote: > On Mon, Mar 20, 2023 at 09:30:50AM +0900, Michael Paquier wrote: > > Did you check dump and restore flows with partition > > trees and --no-table-access-method? Perhaps there should be > > some regression tests with partitioned tables? > > I was looking at the patch, and as I suspected the dumps generated > are forgetting to apply the AM to the partitioned tables. The patch said: + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) pg_dump was missing a similar change that's conditional on RELKIND_HAS_TABLE_AM(). It looks like those are the only two places that need be be specially handled. I dug up my latest patch from 2021 and incorporated the changes from the 0001 patch here, and added a test case. I realized that one difference with tablespaces is that, as written, partitioned tables will *always* have an AM specified, and partitions will never use default_table_access_method. Is that what's intended ? Or do we need logic similar tablespaces, such that the relam of a partitioned table is set only if it differs from default_table_am ? -- Justin
Attachment
On Mon, Mar 27, 2023 at 11:34:35PM -0500, Justin Pryzby wrote: > I realized that one difference with tablespaces is that, as written, > partitioned tables will *always* have an AM specified, and partitions > will never use default_table_access_method. Is that what's intended ? > > Or do we need logic similar tablespaces, such that the relam of a > partitioned table is set only if it differs from default_table_am ? Hmm. This is a good point. It is true that the patch feels incomplete on this side. I don't see why we could not be flexible, and allow a value of 0 in a partitioned table's relam to mean that we would pick up the database default in this case when a partition is is created on it. This would have the advantage to be consistent with older versions where we fallback on the default. We cannot be completely consistent with the reltablespace of the leaf partitions unfortunately, as relam should always be set if a relation has storage. And allowing a value of 0 means that there are likely other tricky cases with dumps? Another thing: would it make sense to allow an empty string in default_table_access_method so as we'd always fallback to a database default? -- Michael
Attachment
On Tue, Mar 28, 2023 at 02:56:28PM +0900, Michael Paquier wrote: > Hmm. This is a good point. It is true that the patch feels > incomplete on this side. I don't see why we could not be flexible, > and allow a value of 0 in a partitioned table's relam to mean that we > would pick up the database default in this case when a partition is > is created on it. This would have the advantage to be consistent with > older versions where we fallback on the default. We cannot be > completely consistent with the reltablespace of the leaf partitions > unfortunately, as relam should always be set if a relation has > storage. And allowing a value of 0 means that there are likely other > tricky cases with dumps? > > Another thing: would it make sense to allow an empty string in > default_table_access_method so as we'd always fallback to a database > default? FYI, I am not sure that I will be able to look more at this patch by the end of the commit fest, and there are quite more points to consider. Perhaps at this stage we'd better mark it as returned with feedback? I understand that I've arrived late at this party :/ -- Michael
Attachment
On Mon, Mar 27, 2023 at 11:34:36PM -0500, Justin Pryzby wrote: > On Tue, Mar 28, 2023 at 09:13:10AM +0900, Michael Paquier wrote: > > On Mon, Mar 20, 2023 at 09:30:50AM +0900, Michael Paquier wrote: > > > Did you check dump and restore flows with partition > > > trees and --no-table-access-method? Perhaps there should be > > > some regression tests with partitioned tables? > > > > I was looking at the patch, and as I suspected the dumps generated > > are forgetting to apply the AM to the partitioned tables. > > The patch said: > > + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) > > pg_dump was missing a similar change that's conditional on > RELKIND_HAS_TABLE_AM(). It looks like those are the only two places > that need be be specially handled. > > I dug up my latest patch from 2021 and incorporated the changes from the > 0001 patch here, and added a test case. > > I realized that one difference with tablespaces is that, as written, > partitioned tables will *always* have an AM specified, and partitions > will never use default_table_access_method. Is that what's intended ? > > Or do we need logic similar tablespaces, such that the relam of a > partitioned table is set only if it differs from default_table_am ? Actually .. I think it'd be a mistake if the relam needed to be interpretted differently for partitioned tables than other relkinds. I updated the patch to allow intermediate partitioned tables to inherit relam from their parent. Michael wrote: > .. and there are quite more points to consider. What more points ? There's nothing that's been raised here. In fact, your message last week is the first comment since last June .. -- Justin
Attachment
On Thu, Mar 30, 2023 at 12:07:58AM -0500, Justin Pryzby wrote: > On Mon, Mar 27, 2023 at 11:34:36PM -0500, Justin Pryzby wrote: > > On Tue, Mar 28, 2023 at 09:13:10AM +0900, Michael Paquier wrote: > > > On Mon, Mar 20, 2023 at 09:30:50AM +0900, Michael Paquier wrote: > > > > Did you check dump and restore flows with partition > > > > trees and --no-table-access-method? Perhaps there should be > > > > some regression tests with partitioned tables? > > > > > > I was looking at the patch, and as I suspected the dumps generated > > > are forgetting to apply the AM to the partitioned tables. > > > > The patch said: > > > > + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) > > > > pg_dump was missing a similar change that's conditional on > > RELKIND_HAS_TABLE_AM(). It looks like those are the only two places > > that need be be specially handled. > > > > I dug up my latest patch from 2021 and incorporated the changes from the > > 0001 patch here, and added a test case. > > > > I realized that one difference with tablespaces is that, as written, > > partitioned tables will *always* have an AM specified, and partitions > > will never use default_table_access_method. Is that what's intended ? > > > > Or do we need logic similar tablespaces, such that the relam of a > > partitioned table is set only if it differs from default_table_am ? > > Actually .. I think it'd be a mistake if the relam needed to be > interpretted differently for partitioned tables than other relkinds. > > I updated the patch to allow intermediate partitioned tables to inherit > relam from their parent. > > Michael wrote: > > .. and there are quite more points to consider. > > What more points ? There's nothing that's been raised here. In fact, > your message last week is the first comment since last June .. Michael ?
On Thu, Mar 30, 2023 at 12:07:58AM -0500, Justin Pryzby wrote: > What more points ? There's nothing that's been raised here. In fact, > your message last week is the first comment since last June .. When I wrote this message, I felt like this may still be missing something in the area of dump/restore. Perhaps my feeling on the matter is wrong, so consider this as a self-reminder not to be taken seriously until I can have a closer look at what's proposed here for v17. :p -- Michael
Attachment
On Mon, Apr 24, 2023 at 07:18:54PM -0500, Justin Pryzby wrote: > On Thu, Mar 30, 2023 at 12:07:58AM -0500, Justin Pryzby wrote: >> Actually .. I think it'd be a mistake if the relam needed to be >> interpretted differently for partitioned tables than other relkinds. >> >> I updated the patch to allow intermediate partitioned tables to inherit >> relam from their parent. > > Michael ? Sorry for dropping the subject for so long. I have spent some time looking at the patch. Here are a few comments. pg_class.h includes the following: /* * Relation kinds that support tablespaces: All relation kinds with storage * support tablespaces, except that we don't support moving sequences around * into different tablespaces. Partitioned tables and indexes don't have * physical storage, but they have a tablespace settings so that their * children can inherit it. */ #define RELKIND_HAS_TABLESPACE(relkind) \ ((RELKIND_HAS_STORAGE(relkind) || RELKIND_HAS_PARTITIONS(relkind)) \ && (relkind) != RELKIND_SEQUENCE) [...] /* * Relation kinds with a table access method (rd_tableam). Although sequences * use the heap table AM, they are enough of a special case in most uses that * they are not included here. */ #define RELKIND_HAS_TABLE_AM(relkind) \ ((relkind) == RELKIND_RELATION || \ (relkind) == RELKIND_TOASTVALUE || \ (relkind) == RELKIND_MATVIEW) It would look much more consistent with the tablespace case if we included partitioned tables in this case, but this refers to rd_tableam for the relcache which we surely don't want to fill for partitioned tables. I guess that at minimum a comment is in order? RELKIND_HAS_TABLE_AM() is much more spread than RELKIND_HAS_TABLESPACE(). * No need to add an explicit dependency for the toast table, as the * main table depends on it. */ - if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) + if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) || + relkind == RELKIND_PARTITIONED_TABLE) The comment at the top of this code block needs an update. if (stmt->accessMethod != NULL) + accessMethodId = get_table_am_oid(stmt->accessMethod, false); else if (stmt->partbound && + (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)) { + /* + * For partitions, if no access method is specified, default to the AM + * of the parent table. + */ + Assert(list_length(inheritOids) == 1); + accessMethodId = get_rel_relam(linitial_oid(inheritOids)); + if (!OidIsValid(accessMethodId)) + accessMethodId = get_table_am_oid(default_table_access_method, false); } + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) + accessMethodId = get_table_am_oid(default_table_access_method, false); This structure seems a bit weird. Could it be cleaner to group the second and third blocks together? I would imagine: if (accessMethod != NULL) { //Extract the AM defined in the statement } else { //This is a relkind that can use a default table AM. if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) { if (stmt->partbound) { //This is a partition, so look at what its partitioned //table holds. } else { //No partition, grab the default. } } } + /* + * Only do this for partitioned tables, for which this is just a + * catalog change. Tables with storage are handled by Phase 3. + */ + if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + ATExecSetAccessMethodNoStorage(rel, tab->newAccessMethod); Okay, there is a parallel with tablespaces in this logic. Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog update even if ALTER TABLE is defined to use the same table AM as what is currently set. There is no need to update the relation's pg_class entry in this case. Be careful that InvokeObjectPostAlterHook() still needs to be checked in this case. Perhaps some tests should be added in test_oat_hooks.sql? It would be tempted to add a new SQL file for that. + else if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + /* Do nothing: it's a catalog settings for partitions to inherit */ + } Actually, we could add an assertion telling that rd_rel->relam will always be valid. - if (RELKIND_HAS_TABLE_AM(tbinfo->relkind)) + if (RELKIND_HAS_TABLE_AM(tbinfo->relkind) || + tbinfo->relkind == RELKIND_PARTITIONED_TABLE) tableam = tbinfo->amname; I have spent some time pondering on this particular change, concluding that it should be OK. It passes my tests, as well. +-- partition hierarchies +-- upon ALTER, new children will inherit the new am, whereas the existing +-- children will remain untouched CREATE TABLE am_partitioned(x INT, y INT) PARTITION BY hash (x); +CREATE TABLE am_partitioned_1 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 0); +CREATE TABLE am_partitioned_2 PARTITION OF am_partitioned FOR VALUES WITH (MODULUS 3,REMAINDER 1); +ALTER TABLE am_partitioned_1 SET ACCESS METHOD heap2; ALTER TABLE am_partitioned SET ACCESS METHOD heap2; Hmm. I think that we should rewrite a bit this test rather than just adding contents on top of it. There is also an extra test I would be interesting in here: a partition tree with 2 levels of depth, an ALTER TABLE SET ACCESS METHOD run at level 1 on a partitioned table, and some new partitions attached to it to check that the new partitions inherit from the level 1 partitioned table, not the top-parent. alter_table.sgml should be updated to explain what happens when SET ACCESS METHOD is applied on a partitioned table. See for example SET TABLESPACE that explains what happens to partitions created afterwards, telling that there is no rewrite in this case. The regression test added to check pg_dump with a partition tree and the two table AMs was mixed with an existing one, but it seems to me that it should be independent of the rest? I have tweaked that as in the attached, on the way, using one partition that relies on the default defined by the parent, and a second that has a USING clause on heap. I did not touch the rest of the code. -- Michael
Attachment
On Thu, May 25, 2023 at 03:49:12PM +0900, Michael Paquier wrote: > looking at the patch. Here are a few comments. ... > * No need to add an explicit dependency for the toast table, as the > * main table depends on it. > */ > - if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) > + if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) || > + relkind == RELKIND_PARTITIONED_TABLE) > > The comment at the top of this code block needs an update. What do you think the comment ought to say ? It already says: src/backend/catalog/heap.c- * Make a dependency link to force the relation to be deleted if its src/backend/catalog/heap.c- * access method is. > Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog > update even if ALTER TABLE is defined to use the same table AM as what > is currently set. There is no need to update the relation's pg_class > entry in this case. Be careful that InvokeObjectPostAlterHook() still > needs to be checked in this case. Perhaps some tests should be added > in test_oat_hooks.sql? It would be tempted to add a new SQL file for > that. Are you suggesting to put this in a conditional: if oldrelam!=newAccessMethod ? + ((Form_pg_class) GETSTRUCT(tuple))->relam = newAccessMethod; + CatalogTupleUpdate(pg_class, &tuple->t_self, tuple); + + /* Update dependency on new AM */ + changeDependencyFor(RelationRelationId, relid, AccessMethodRelationId, + oldrelam, newAccessMethod); Why is that desirable ? I'd prefer to see it written with fewer conditionals, not more; then, it hits the same path every time. This ought to address your other comments. -- Justin
Attachment
On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote: > What do you think the comment ought to say ? It already says: > > src/backend/catalog/heap.c- * Make a dependency link to force the relation to be deleted if its > src/backend/catalog/heap.c- * access method is. This is the third location where we rely on the fact that RELKIND_HAS_TABLE_AM() does not include RELKIND_PARTITIONED_TABLE, so it seems worth documenting what we are relying on as a comment? Say: * Make a dependency link to force the relation to be deleted if its * access method is. * * No need to add an explicit dependency for the toast table, as the * main table depends on it. Partitioned tables have a table access * method defined, and RELKIND_HAS_TABLE_AM ignores them. >> Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog >> update even if ALTER TABLE is defined to use the same table AM as what >> is currently set. There is no need to update the relation's pg_class >> entry in this case. Be careful that InvokeObjectPostAlterHook() still >> needs to be checked in this case. Perhaps some tests should be added >> in test_oat_hooks.sql? It would be tempted to add a new SQL file for >> that. > > Are you suggesting to put this in a conditional: if oldrelam!=newAccessMethod ? Yes, that's what I would add with a few lines close to the beginning of ATExecSetTableSpaceNoStorage() to save from catalog updates if these are not needed. -- Michael
Attachment
Have you had a chance to address the comments raised by Michael in his last review such that a new patch revision can be submitted? -- Daniel Gustafsson
On Thu, Jun 01, 2023 at 08:50:50AM -0400, Michael Paquier wrote: > >> Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog > >> update even if ALTER TABLE is defined to use the same table AM as what > >> is currently set. There is no need to update the relation's pg_class > >> entry in this case. Be careful that InvokeObjectPostAlterHook() still > >> needs to be checked in this case. Perhaps some tests should be added > >> in test_oat_hooks.sql? It would be tempted to add a new SQL file for > >> that. > > > > Are you suggesting to put this in a conditional: if oldrelam!=newAccessMethod ? > > Yes, that's what I would add with a few lines close to the beginning > of ATExecSetTableSpaceNoStorage() to save from catalog updates if > these are not needed. I understand that it's possible for it to be conditional, but I don't undertand why it's desirable/important ? It still seems preferable to be unconditional. On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote: >> Why is that desirable ? I'd prefer to see it written with fewer >> conditionals, not more; then, it hits the same path every time. It's not conditional in the tablespace code that this follows, nor others like ATExecSetStatistics(), ATExecForceNoForceRowSecurity(), ATExecSetCompression(), etc, some of which are recently-added. If it's important to do here, isn't it also important to do everywhere else ? -- Justin
On Sun, Jul 16, 2023 at 09:37:28PM -0500, Justin Pryzby wrote: > I understand that it's possible for it to be conditional, but I don't > undertand why it's desirable/important ? Because it's cheaper on repeated commands, like no CCI necessary. > It's not conditional in the tablespace code that this follows, nor > others like ATExecSetStatistics(), ATExecForceNoForceRowSecurity(), > ATExecSetCompression(), etc, some of which are recently-added. If it's > important to do here, isn't it also important to do everywhere else ? Good point here. I am OK to discard this point. -- Michael
Attachment
On Thu, Jun 01, 2023 at 08:50:50AM -0400, Michael Paquier wrote: > On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote: > > What do you think the comment ought to say ? It already says: > > > > src/backend/catalog/heap.c- * Make a dependency link to force the relation to be deleted if its > > src/backend/catalog/heap.c- * access method is. > > This is the third location where we rely on the fact that > RELKIND_HAS_TABLE_AM() does not include RELKIND_PARTITIONED_TABLE, so > it seems worth documenting what we are relying on as a comment? Say: > * Make a dependency link to force the relation to be deleted if its > * access method is. > * > * No need to add an explicit dependency for the toast table, as the > * main table depends on it. Partitioned tables have a table access > * method defined, and RELKIND_HAS_TABLE_AM ignores them. You said that this location "relies on" the macro not including partitioned tables, but I would say the opposite: the places that use RELKIND_HAS_TABLE_AM() and do *not* say "or relkind==PARTITIONED_TABLE" are the ones that "rely on" that... Anyway, this updates various comments. No other changes. -- Justin
Attachment
2024-01 Commitfest. Hi, this patch was marked in CF as "Needs Review" [1], but there has been no activity on this thread for 6+ months. Is anything else planned, or can you post something to elicit more interest in reviews for the latest patch? Otherwise, if nothing happens then the CF entry will be closed ("Returned with feedback") at the end of this CF. ====== [1] https://commitfest.postgresql.org/46/3727/ Kind Regards, Peter Smith. Fujitsu Australia
> @@ -947,20 +947,22 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, > * a type of relation that needs one, use the default. > */ > if (stmt->accessMethod != NULL) > + accessMethodId = get_table_am_oid(stmt->accessMethod, false); > + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) > { > - accessMethod = stmt->accessMethod; > - > - if (partitioned) > - ereport(ERROR, > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > - errmsg("specifying a table access method is not supported on a partitioned table"))); > + if (stmt->partbound) > + { > + /* > + * For partitions, if no access method is specified, use the AM of > + * the parent table. > + */ > + Assert(list_length(inheritOids) == 1); > + accessMethodId = get_rel_relam(linitial_oid(inheritOids)); > + Assert(OidIsValid(accessMethodId)); > + } > + else > + accessMethodId = get_table_am_oid(default_table_access_method, false); > } I think this works similarly but not identically to tablespace defaults, and the difference could be confusing. You seem to have made it so that the partitioned table _always_ have a table AM, so the partitions can always inherit from it. I think it would be more sensible to _allow_ partitioned tables to have one, but not mandatory; if they don't have it, then a partition created from it would use default_table_access_method. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
On Thu, Feb 01, 2024 at 04:50:49PM +0100, Alvaro Herrera wrote: > I think this works similarly but not identically to tablespace defaults, > and the difference could be confusing. You seem to have made it so that > the partitioned table _always_ have a table AM, so the partitions can > always inherit from it. I think it would be more sensible to _allow_ > partitioned tables to have one, but not mandatory; if they don't have > it, then a partition created from it would use default_table_access_method. You mean to allow a value of 0 in pg_class.relam on a partitioned table to allow any partitions created on it to use the default AM in the GUC when the partition is created? Yes, this inconsistency was bothering me as well in the patch. -- Michael
Attachment
On Thu, Feb 1, 2024 at 10:51 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I think this works similarly but not identically to tablespace defaults, > and the difference could be confusing. You seem to have made it so that > the partitioned table _always_ have a table AM, so the partitions can > always inherit from it. I think it would be more sensible to _allow_ > partitioned tables to have one, but not mandatory; if they don't have > it, then a partition created from it would use default_table_access_method. I agree that we don't want this feature to invent any new behavior. If it's clearly and fully parallel to what we do for tablespaces, then I think it's probably OK, but anything less than that would be a cause for concern for me. -- Robert Haas EDB: http://www.enterprisedb.com
On 19.07.23 20:13, Justin Pryzby wrote: > On Thu, Jun 01, 2023 at 08:50:50AM -0400, Michael Paquier wrote: >> On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote: >>> What do you think the comment ought to say ? It already says: >>> >>> src/backend/catalog/heap.c- * Make a dependency link to force the relation to be deleted if its >>> src/backend/catalog/heap.c- * access method is. >> >> This is the third location where we rely on the fact that >> RELKIND_HAS_TABLE_AM() does not include RELKIND_PARTITIONED_TABLE, so >> it seems worth documenting what we are relying on as a comment? Say: >> * Make a dependency link to force the relation to be deleted if its >> * access method is. >> * >> * No need to add an explicit dependency for the toast table, as the >> * main table depends on it. Partitioned tables have a table access >> * method defined, and RELKIND_HAS_TABLE_AM ignores them. > > You said that this location "relies on" the macro not including > partitioned tables, but I would say the opposite: the places that use > RELKIND_HAS_TABLE_AM() and do *not* say "or relkind==PARTITIONED_TABLE" > are the ones that "rely on" that... > > Anyway, this updates various comments. No other changes. It would be helpful if this patch could more extensively document in its commit message what semantic changes it makes. Various options of possible behaviors were discussed in this thread, but it's not clear which behaviors were chosen in this particular patch version. The general idea is that you can set an access method on a partitioned table. That much seems very agreeable. But then what happens with this setting, how can you override it, how can you change it, what happens when you change it, what happens with existing partitions and new partitions, etc. -- and which of these behaviors are new and old. Many things to specify.
On Tue, Feb 20, 2024 at 03:47:46PM +0100, Peter Eisentraut wrote: > It would be helpful if this patch could more extensively document in its > commit message what semantic changes it makes. Various options of possible > behaviors were discussed in this thread, but it's not clear which behaviors > were chosen in this particular patch version. > > The general idea is that you can set an access method on a partitioned > table. That much seems very agreeable. But then what happens with this > setting, how can you override it, how can you change it, what happens when > you change it, what happens with existing partitions and new partitions, > etc. -- and which of these behaviors are new and old. Many things to > specify. The main point in this patch is the following code block in DefineRelation(), that defines the semantics about the AM set for a partitioned table: + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) { + if (stmt->partbound) + { + /* + * For partitions, if no access method is specified, use the AM of + * the parent table. + */ + Assert(list_length(inheritOids) == 1); + accessMethodId = get_rel_relam(linitial_oid(inheritOids)); + Assert(OidIsValid(accessMethodId)); + } + else + accessMethodId = get_table_am_oid(default_table_access_method, false); } This means that all partitioned tables would have pg_class.relam set, and that relam would never be 0: - The USING clause takes priority over default_table_access_method. - If no USING clause, default_table_access_method is the AM used Any partitions created from this partitioned table would inherit the AM set, ignoring default_table_access_method. Alvaro has made a very good point a couple of days ago at [1] where we should try to make the behavior stick closer to tablespaces, where it could be possible to set relam to 0 for a partitioned table, where a partition would inherit the AM set in the GUC when a USING clause is not defined (if USING specifies the AM, we'd just use it). Existing partitions should not be changed if the AM of their partitioned table changes, so you can think of the AM as a hint for the creation of new partitions. [1]: https://www.postgresql.org/message-id/202402011550.sfszd46247zi@alvherre.pgsql -- Michael
Attachment
On 21.02.24 07:40, Michael Paquier wrote: > This means that all partitioned tables would have pg_class.relam set, > and that relam would never be 0: > - The USING clause takes priority over default_table_access_method. > - If no USING clause, default_table_access_method is the AM used > > Any partitions created from this partitioned table would inherit the > AM set, ignoring default_table_access_method. > > Alvaro has made a very good point a couple of days ago at [1] where we > should try to make the behavior stick closer to tablespaces, where it > could be possible to set relam to 0 for a partitioned table, where a > partition would inherit the AM set in the GUC when a USING clause is > not defined (if USING specifies the AM, we'd just use it). Yes, I think most people agreed that that would be the preferred behavior.
On Wed, Feb 21, 2024 at 08:46:48AM +0100, Peter Eisentraut wrote: > Yes, I think most people agreed that that would be the preferred behavior. Challenge accepted. As of the patch attached. Tablespaces rely MyDatabaseTableSpace to fallback to the database's default if not specified, but we cannot do that for table AMs as there is no equivalent to dattablespace. I have implemented that so as we keep the default, historical behavior: if pg_class.relam is 0 for a partitioned table, use the AM defined by default_table_access_method. The patch only adds a path to switch to a different AM than the GUC when creating a new partition if and only if a partitioned table has been manipulated with ALTER TABLE SET ACCESS METHOD to update its AM to something else than the GUC. Similarly to tablespaces, CREATE TABLE USING is *not* supported for partitioned tables, same behavior as previously. There is a bit more regarding the handling of the entries in pg_depend, but nothing really complicated, knowing that there can be three possible patterns: - Add a new dependency if changing the AM to be something different than the GUC. - Remove the dependency if changing the AM to the value of the GUC, when something existing previously. - Update the dependency if switching between AMs that don't refer to the GUC at all. If the AM of a partitioned table is not changed, there is no need to update the catalogs at all. The prep phase of the sub-command is already aware of that, setting the new AM OID to InvalidOid in this case. The attached includes regression tests that check all the dependency entries, the contents of pg_class for partitioned tables, as well as the creation of partitions when pg_class.relam is not 0. I'd welcome more eyes regarding these changes. pg_dump needs to be tweaked to save the AM information of a partitioned table, like the previous versions. There are tests for these dump patterns, that needed a slight tweak to work. Docs have been refreshed. Thoughts, comments? -- Michael
Attachment
On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote: > On Wed, Feb 21, 2024 at 08:46:48AM +0100, Peter Eisentraut wrote: > > Yes, I think most people agreed that that would be the preferred behavior. > > Challenge accepted. As of the patch attached. Thanks for picking it up. I find it pretty hard to switch back to put the needed effort into a patch after a long period. > I have implemented that so as we keep the default, historical > behavior: if pg_class.relam is 0 for a partitioned table, use the AM > defined by default_table_access_method. The patch only adds a path to > switch to a different AM than the GUC when creating a new partition if > and only if a partitioned table has been manipulated with ALTER TABLE > SET ACCESS METHOD to update its AM to something else than the GUC. > Similarly to tablespaces, CREATE TABLE USING is *not* supported for > partitioned tables, same behavior as previously. This patch allows resetting relam=0 by running ALTER TABLE SET AM to the same value as the GUC. Maybe it'd be better to have an explicit SET DEFAULT (as in b9424d01 and 4f622503). -- Justin
On Thu, Feb 29, 2024 at 08:51:31AM -0600, Justin Pryzby wrote: > On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote: >> I have implemented that so as we keep the default, historical >> behavior: if pg_class.relam is 0 for a partitioned table, use the AM >> defined by default_table_access_method. The patch only adds a path to >> switch to a different AM than the GUC when creating a new partition if >> and only if a partitioned table has been manipulated with ALTER TABLE >> SET ACCESS METHOD to update its AM to something else than the GUC. >> Similarly to tablespaces, CREATE TABLE USING is *not* supported for >> partitioned tables, same behavior as previously. > > This patch allows resetting relam=0 by running ALTER TABLE SET AM to the > same value as the GUC. Maybe it'd be better to have an explicit SET > DEFAULT (as in b9424d01 and 4f622503). Outside the scope of this patch's thread, this looks like a good idea even for tables/matviews. And the semantics are pretty easy: if DEFAULT is specified, just set the access method to NULL in the parser and let tablecmds.c go the AM OID lookup in the prep phase if set to NULL. See 0001 attached. This one looks pretty good taken as an independent piece. When it comes to partitioned tables, there is a still a tricky case: what should we do when a user specifies a non-default value in the SET ACCESS METHOD clause and it matches default_table_access_method? Should the relam be 0 or should we force relam to be the OID of the given value given by the query? Implementation-wise, forcing the value to 0 is simpler, but I can get why it could be confusing as well, because the state of the catalogs does not reflect what was provided in the query. At the same time, the user has explicitly set the access method to be the same as the default, so perhaps 0 makes sense anyway in this case. 0002 does that, as that's simpler. I'm not sure if there is a case for forcing a value in relam if the query has the same value as the default. Thoughts? -- Michael
Attachment
On Fri, 1 Mar 2024 at 02:57, Michael Paquier <michael@paquier.xyz> wrote: > When it comes to partitioned tables, there is a still a tricky case: > what should we do when a user specifies a non-default value in the SET > ACCESS METHOD clause and it matches default_table_access_method? > Should the relam be 0 or should we force relam to be the OID of the > given value given by the query? Implementation-wise, forcing the > value to 0 is simpler, but I can get why it could be confusing as > well, because the state of the catalogs does not reflect what was > provided in the query. At the same time, the user has explicitly set > the access method to be the same as the default, so perhaps 0 makes > sense anyway in this case. I think we should set the AM OID explicitly. Because an important thing to consider is: What behaviour makes sense when later default_table_access_method is changed? I think if someone sets it explicitly on the partitioned table, they would want the AM of the partitioned table to stay the same when default_table_access_method is changed. Which requires storing the AM OID afaict.
On Fri, Mar 01, 2024 at 05:43:25AM +0100, Jelte Fennema-Nio wrote: > I think we should set the AM OID explicitly. Because an important > thing to consider is: What behaviour makes sense when later > default_table_access_method is changed? Per the latest discussion of the thread, we've kind of reached a consensus that we should keep the current historical bevahior on default, where relam remains at 0, causing new partitions to grab the GUC as AM. If we create a partitioned table attached to a partitioned table, it should be 0 as well. If the partitioned table has a non-0 relam, a new partitioned table created on it will inherit the same non-0 value. > I think if someone sets it explicitly on the partitioned table, they > would want the AM of the partitioned table to stay the same when > default_table_access_method is changed. Which requires storing the AM > OID afaict. If we allow relam to be non-0 for a partitioned table, it is equally important to give users a way to reset it at will. My point was a bit more subtle than that. For example, this sequence is clear to me: SET default_table_access_method = 'foo'; ALTER TABLE part SET ACCESS METHOD DEFAULT; The user wants to rely on the GUC, so relam should be 0, new partitions created on it will use the GUC. Now, what should this sequence mean? See: SET default_table_access_method = 'foo'; ALTER TABLE part SET ACCESS METHOD foo; Should the relam be 0 because the user requested a match with the GUC, or use the OID of the AM? There has to be some difference with tablespaces, because relations with physical storage (tables, matviews) can use a reltablespace of 0, but AMs have to be set for tables and matviews. Fun topic, especially once coupled with the internals of tablecmds.c that uses InvalidOid for the new access AM as a special value to work as a no-op. -- Michael
Attachment
On Fri, Mar 01, 2024 at 05:43:25AM +0100, Jelte Fennema-Nio wrote: > I think we should set the AM OID explicitly. Because an important > thing to consider is: What behaviour makes sense when later > default_table_access_method is changed? > I think if someone sets it explicitly on the partitioned table, they > would want the AM of the partitioned table to stay the same when > default_table_access_method is changed. Which requires storing the AM > OID afaict. Oops, I think I misread that. You just mean to always set relam when using an AM in the SET ACCESS METHOD clause. Apologies for the noise. -- Michael
Attachment
On Fri, 1 Mar 2024 at 06:15, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Mar 01, 2024 at 05:43:25AM +0100, Jelte Fennema-Nio wrote: > > I think we should set the AM OID explicitly. Because an important > > thing to consider is: What behaviour makes sense when later > > default_table_access_method is changed? > > I think if someone sets it explicitly on the partitioned table, they > > would want the AM of the partitioned table to stay the same when > > default_table_access_method is changed. Which requires storing the AM > > OID afaict. > > Oops, I think I misread that. You just mean to always set relam when > using an AM in the SET ACCESS METHOD clause. Apologies for the noise. Correct, I intended to say that "SET ACCESS METHOD heap" on a partitioned table should store heap its OID. Because while storing 0 might be simpler, it will result in (imho) wrong behaviour when later the default_table_access_method is changed. behavior won't result in the (imho) intended. i.e. it's not simply a small detail in what the catolog looks like, but there's an actual behavioural change.
On Fri, Mar 01, 2024 at 10:56:50AM +0900, Michael Paquier wrote: > On Thu, Feb 29, 2024 at 08:51:31AM -0600, Justin Pryzby wrote: > > On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote: > >> I have implemented that so as we keep the default, historical > >> behavior: if pg_class.relam is 0 for a partitioned table, use the AM > >> defined by default_table_access_method. The patch only adds a path to > >> switch to a different AM than the GUC when creating a new partition if > >> and only if a partitioned table has been manipulated with ALTER TABLE > >> SET ACCESS METHOD to update its AM to something else than the GUC. > >> Similarly to tablespaces, CREATE TABLE USING is *not* supported for > >> partitioned tables, same behavior as previously. > > > > This patch allows resetting relam=0 by running ALTER TABLE SET AM to the > > same value as the GUC. Maybe it'd be better to have an explicit SET > > DEFAULT (as in b9424d01 and 4f622503). > > Outside the scope of this patch's thread, this looks like a good idea > even for tables/matviews. And the semantics are pretty easy: if DEFAULT > is specified, just set the access method to NULL in the parser and let > tablecmds.c go the AM OID lookup in the prep phase if set to NULL. > See 0001 attached. This one looks pretty good taken as an independent > piece. > > When it comes to partitioned tables, there is a still a tricky case: > what should we do when a user specifies a non-default value in the SET > ACCESS METHOD clause and it matches default_table_access_method? I don't think it's tricky - it seems more like a weird hack in the previous patch version to make AMs behave like tablespaces, despite not being completely parallel, due to the absence of a pg_default AM. With the new 001, the hack can go away, and so it should. > Should the relam be 0 or should we force relam to be the OID of the > given value given by the query? You said "force" it to be the user-specified value, but I think that's not "forcing", it's respecting (but to take the user's desired value, and conditionally store 0 instead, that could be described as "forcing.") > Implementation-wise, forcing the value to 0 is simpler, but I can get > why it could be confusing as well, because the state of the catalogs > does not reflect what was provided in the query. > At the same time, the user has explicitly set the access method to be > the same as the default, so perhaps 0 makes sense anyway in this case. I think if the user sets something "explicitly", the catalog should reflect what they set. Tablespaces have dattablespace, but AMs don't -- it's a simpler case. For 001: we don't *need* to support "ALTER SET AM default" for leaf tables. It doesn't do anything that's not already possible. But, if AMs for partitioned tables are optional rather than required, then seems to be needed to allow (re)settinng relam=0. But for partitioned tables, I think it should set relam=0 directly. Currently it 1) falls through to default_table_am; and 2) detects that it's the default, so then sets relam to 0. Since InvalidOid On Fri, Mar 01, 2024 at 02:03:48PM +0900, Michael Paquier wrote: > Fun topic, especially once coupled with the internals of tablecmds.c > that uses InvalidOid for the new access AM as a special value to work > as a no-op. Since InvalidOid is already taken, I guess you might need to introduce a boolean flag, like set_relam, indicating that the statement has an ACCESS METHOD clause. > + * method defined so as their children can inherit it; however, this is handled so that > + * Do nothing: access methods is a setting that partitions can method (singular), or s/is/are/ On Wed, Feb 28, 2024 at 05:08:49PM +0900, Michael Paquier wrote: > Similarly to tablespaces, CREATE TABLE USING is *not* supported for > partitioned tables, same behavior as previously. Maybe I misunderstood what you're trying to say, but CREATE..TABLESPACE *is* supported for partitioned tables. I'm not sure why it wouldn't be supported to set the AM, too. In any case, it'd be a bit confusing for the error message to still say: postgres=# CREATE TABLE a(i int) PARTITION BY RANGE(a) USING heap2; ERROR: specifying a table access method is not supported on a partitioned table -- Justin
On Fri, Mar 01, 2024 at 03:03:14PM -0600, Justin Pryzby wrote: > I think if the user sets something "explicitly", the catalog should > reflect what they set. Tablespaces have dattablespace, but AMs don't -- > it's a simpler case. Okay. > For 001: we don't *need* to support "ALTER SET AM default" for leaf > tables. It doesn't do anything that's not already possible. But, if > AMs for partitioned tables are optional rather than required, then seems > to be needed to allow (re)settinng relam=0. Indeed, for non-partitioned tables DEFAULT is a sugar flavor. Not mandatory, still it's nice to have to not have to type an AM. > But for partitioned tables, I think it should set relam=0 directly. > Currently it 1) falls through to default_table_am; and 2) detects that > it's the default, so then sets relam to 0. > > Since InvalidOid is already taken, I guess you might need to introduce a > boolean flag, like set_relam, indicating that the statement has an > ACCESS METHOD clause. Yes, I don't see an alternative. The default needs a different field to be tracked down to the execution. >> + * method defined so as their children can inherit it; however, this is handled > > so that > >> + * Do nothing: access methods is a setting that partitions can > > method (singular), or s/is/are/ Indeed. Fixed both. > In any case, it'd be a bit confusing for the error message to still say: > > postgres=# CREATE TABLE a(i int) PARTITION BY RANGE(a) USING heap2; > ERROR: specifying a table access method is not supported on a partitioned table I was looking at this one as well and I don't see why we could not remove it, so you are right (missed the tablespace part last week). A partitioned table created as a partition of a partitioned table would inherit the relam of its parent (0 if default is set, or non-0 is something is set). I have added some regression tests for that. And I'm finishing with the attached. To summarize SET ACCESS METHOD on a partitioned table, the semantics are: - DEFAULT sets the relam to 0, any partitions with storage would use the GUC at creation time. Partitioned tables use a relam of 0. - If a value is set for the am, relam becomes non-0. Any partitions created on it inherit it (partitioned as well as non-partitioned tables). - No USING clause means to set its relam to 0. 0001 seems OK here, 0002 needs more eyes. The bulk of the changes is in the regression tests to cover all the cases I could think of. -- Michael
Attachment
On Mon, Mar 04, 2024 at 05:46:56PM +0900, Michael Paquier wrote: > > Since InvalidOid is already taken, I guess you might need to introduce a > > boolean flag, like set_relam, indicating that the statement has an > > ACCESS METHOD clause. > > Yes, I don't see an alternative. The default needs a different field > to be tracked down to the execution. The data structure you used (defaultAccessMethod) allows this, which is intended to be prohibited: postgres=# ALTER TABLE a SET access method default, SET access method default; ALTER TABLE As you wrote it, you pass the "defaultAccessMethod" bool to ATExecSetAccessMethodNoStorage(), which seems odd. Why not just pass the target amoid as newAccessMethod ? When I fooled with this on my side, I called it "chgAccessMethod" to follow "chgPersistence". I think "is default" isn't the right data structure. Attached a relative patch with my version. Also: I just realized that instead of adding a bool, we could test (tab->rewrite & AT_REWRITE_ACCESS_METHOD) != 0 +-- Default and AM set in in clause are the same, relam should be set. in in? -- Justin
Attachment
On Thu, Mar 07, 2024 at 08:02:00PM -0600, Justin Pryzby wrote: > As you wrote it, you pass the "defaultAccessMethod" bool to > ATExecSetAccessMethodNoStorage(), which seems odd. Why not just pass > the target amoid as newAccessMethod ? + /* + * Check that the table access method exists. + * Use the access method, if specified, otherwise (when not specified) 0 + * for partitioned tables or the configured default AM. + */ + if (amname != NULL) + amoid = get_table_am_oid(amname, false); + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + amoid = 0; + else + amoid = get_table_am_oid(default_table_access_method, false); .. While using this flow to choose the AM oid, that's neater than the versions I could come up with, pretty cool. > When I fooled with this on my side, I called it "chgAccessMethod" to > follow "chgPersistence". I think "is default" isn't the right data > structure. > > Attached a relative patch with my version. Thanks. I've applied the patch to add the DEFAULT clause for now, to ease the work on this patch. > Also: I just realized that instead of adding a bool, we could test > (tab->rewrite & AT_REWRITE_ACCESS_METHOD) != 0 Hmm. I've considered that, but the boolean style is able to do the work, while being more consistent, so I'm OK with what you are doing in your 0002. > +-- Default and AM set in in clause are the same, relam should be set. > > in in? Oops, fixed. I have spent more time reviewing the whole and the tests (I didn't see much value in testing the DEFAULT clause twice for the partitioned table case and there is a test in d61a6cad6418), tweaked a few comments and the documentation, did an indentation and a commit message draft. How does that look to you? The test coverage and the semantics do what we want them to do, so that looks rather reasonable here. A second or even third pair of eyes would not hurt. -- Michael
Attachment
On 2024-Mar-08, Michael Paquier wrote: > I have spent more time reviewing the whole and the tests (I didn't see > much value in testing the DEFAULT clause twice for the partitioned > table case and there is a test in d61a6cad6418), tweaked a few > comments and the documentation, did an indentation and a commit > message draft. > > How does that look to you? The test coverage and the semantics do > what we want them to do, so that looks rather reasonable here. A > second or even third pair of eyes would not hurt. I gave this a look. I found some of the comments a bit confusing or overly long, so I propose to reword them. I also propose a small doc change (during writing which I noticed that the docs for tablespace had been neglected and one comment too many; patch to be committed separately soon). I ended up also moving code in tablecmds.c so that all the AT*SetAccessMethod* routines appear together rather than mixed with the ones for tablespaces, and removing one CCI that seems unnecessary, at the bottom of ATExecSetAccessMethodNoStorage. 0001 is Michaël's patch, 0002 are my proposed changes. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Hay que recordar que la existencia en el cosmos, y particularmente la elaboración de civilizaciones dentro de él no son, por desgracia, nada idílicas" (Ijon Tichy)
Attachment
On 2024-Mar-19, Alvaro Herrera wrote: > 0001 is Michaël's patch, 0002 are my proposed changes. Doh, I sent the wrong set of attachments. But I see no reason to post again: what I attached as 0001 is what I wrote was going to be 0002, Michaël's patch is already in archives, and the CI tests with both applied on current master are running here: https://cirrus-ci.com/build/6404370015715328 Michaël, I'll leave this for you to push ... Thanks! -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Given that Michaël is temporarily gone, I propose to push the attached tomorrow. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ <inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell <crab> inflex: you know that "amalgam" means "mixture with mercury", more or less, right? <crab> i.e., "deadly poison"
Attachment
On Mar 21, 2024, at 13:07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Given that Michaël is temporarily gone, I propose to push the attached > tomorrow. Thanks for doing so. I’m wondering whether I should be an author of this patch at this stage, tbh. I wrote all the testsand rewrote most of the internals to adjust with the consensus reached ;) -- Michael
Hello Alvaro, 21.03.2024 15:07, Alvaro Herrera wrote: > Given that Michaël is temporarily gone, I propose to push the attached > tomorrow. Please look at a new anomaly introduced with 374c7a229. Starting from that commit, the following erroneous query: CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; triggers an assertion failure: TRAP: failed Assert("relation->rd_rel->relam == InvalidOid"), File: "relcache.c", Line: 1219, PID: 3706301 with the stack trace: ... #4 0x00007fe53ced67f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x000055f28555951e in ExceptionalCondition (conditionName=conditionName@entry=0x55f285744788 "relation->rd_rel->relam == InvalidOid", fileName=fileName@entry=0x55f285743f1c "relcache.c", lineNumber=lineNumber@entry=1219) at assert.c:66 #6 0x000055f285550450 in RelationBuildDesc (targetRelId=targetRelId@entry=16385, insertIt=insertIt@entry=false) at relcache.c:1219 #7 0x000055f285550769 in RelationClearRelation (relation=relation@entry=0x7fe5310dd178, rebuild=rebuild@entry=true) at relcache.c:2667 #8 0x000055f285550c41 in RelationFlushRelation (relation=0x7fe5310dd178) at relcache.c:2850 #9 0x000055f285550ca0 in RelationCacheInvalidateEntry (relationId=<optimized out>) at relcache.c:2921 #10 0x000055f285542551 in LocalExecuteInvalidationMessage (msg=0x55f2861b3160) at inval.c:738 #11 0x000055f28554159b in ProcessInvalidationMessages (group=0x55f2861b2e6c, func=func@entry=0x55f2855424a8 <LocalExecuteInvalidationMessage>) at inval.c:518 #12 0x000055f285542740 in CommandEndInvalidationMessages () at inval.c:1180 #13 0x000055f28509cbbd in AtCCI_LocalCache () at xact.c:1550 #14 0x000055f28509e88e in CommandCounterIncrement () at xact.c:1116 #15 0x000055f2851d0c8b in DefineRelation (stmt=stmt@entry=0x55f2861803b0, relkind=relkind@entry=102 'f', ownerId=10, ownerId@entry=0, typaddress=typaddress@entry=0x0, queryString=queryString@entry=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;") at tablecmds.c:1008 #16 0x000055f28540945d in ProcessUtilitySlow (pstate=pstate@entry=0x55f2861a9dc0, pstmt=pstmt@entry=0x55f286180510, queryString=queryString@entry=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x55f2861807d0, qc=0x7fff15b5d7c0) at utility.c:1203 #17 0x000055f28540911f in standard_ProcessUtility (pstmt=0x55f286180510, queryString=0x55f28617f870 "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;", readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55f2861807d0, qc=0x7fff15b5d7c0) at utility.c:1067 ... On 374c7a229~1 it fails with ERROR: "pg_am" is not partitioned Best regards, Alexander
On 2024-Mar-26, Alexander Lakhin wrote: > Hello Alvaro, > > 21.03.2024 15:07, Alvaro Herrera wrote: > > Given that Michaël is temporarily gone, I propose to push the attached > > tomorrow. > > Please look at a new anomaly introduced with 374c7a229. > Starting from that commit, the following erroneous query: > CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; > > triggers an assertion failure: > TRAP: failed Assert("relation->rd_rel->relam == InvalidOid"), File: "relcache.c", Line: 1219, PID: 3706301 Hmm, yeah, we're setting relam for relations that shouldn't have it. I propose the attached. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Attachment
On Thu, Mar 21, 2024 at 01:07:01PM +0100, Alvaro Herrera wrote: > Given that Michaël is temporarily gone, I propose to push the attached > tomorrow. Thanks. On Tue, Mar 26, 2024 at 12:05:47PM +0100, Alvaro Herrera wrote: > On 2024-Mar-26, Alexander Lakhin wrote: > > > Hello Alvaro, > > > > 21.03.2024 15:07, Alvaro Herrera wrote: > > > Given that Michaël is temporarily gone, I propose to push the attached > > > tomorrow. > > > > Please look at a new anomaly introduced with 374c7a229. > > Starting from that commit, the following erroneous query: > > CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; > > > > triggers an assertion failure: > > TRAP: failed Assert("relation->rd_rel->relam == InvalidOid"), File: "relcache.c", Line: 1219, PID: 3706301 > > Hmm, yeah, we're setting relam for relations that shouldn't have it. > I propose the attached. Looks right. That's how I originally wrote it, except for the "stmt->accessMethod != NULL" case. I prefered my way - the grammar should refuse to set stmt->accessMethod for inappropriate relkinds. And you could assert that. I also prefered to set "accessMethodId = InvalidOid" once, rather than twice. diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 8a02c5b05b6..050be89728f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -962,18 +962,21 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, * case of a partitioned table) the parent's, if it has one. */ if (stmt->accessMethod != NULL) - accessMethodId = get_table_am_oid(stmt->accessMethod, false); - else if (stmt->partbound) { - Assert(list_length(inheritOids) == 1); - accessMethodId = get_rel_relam(linitial_oid(inheritOids)); + Assert(RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE); + accessMethodId = get_table_am_oid(stmt->accessMethod, false); } - else - accessMethodId = InvalidOid; + else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE) + { + if (stmt->partbound) + { + Assert(list_length(inheritOids) == 1); + accessMethodId = get_rel_relam(linitial_oid(inheritOids)); + } - /* still nothing? use the default */ - if (RELKIND_HAS_TABLE_AM(relkind) && !OidIsValid(accessMethodId)) - accessMethodId = get_table_am_oid(default_table_access_method, false); + if (RELKIND_HAS_TABLE_AM(relkind) && !OidIsValid(accessMethodId)) + accessMethodId = get_table_am_oid(default_table_access_method, false); + } /* * Create the relation. Inherited defaults and constraints are passed in
On 2024-Mar-26, Justin Pryzby wrote: > Looks right. That's how I originally wrote it, except for the > "stmt->accessMethod != NULL" case. > > I prefered my way - the grammar should refuse to set stmt->accessMethod > for inappropriate relkinds. And you could assert that. Hmm, I didn't like this at first sight, because it looked convoluted and baroque, but I compared both versions for a while and I ended up liking yours more than mine, so I adopted it. > I also prefered to set "accessMethodId = InvalidOid" once, rather than > twice. Grumble. I don't like initialization at declare time, so far from the code that depends on the value. But the alternative would have been to assign right where this blocks starts, an additional line. I pushed it like you had it. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Hello Alvaro, 28.03.2024 18:58, Alvaro Herrera wrote: > Grumble. I don't like initialization at declare time, so far from the > code that depends on the value. But the alternative would have been to > assign right where this blocks starts, an additional line. I pushed it > like you had it. I've stumbled upon a test failure caused by the test query added in that commit: --- .../src/test/regress/expected/create_am.out 2024-03-28 12:14:11.700764888 -0400 +++ .../src/test/recovery/tmp_check/results/create_am.out 2024-03-31 03:10:28.172244122 -0400 @@ -549,7 +549,10 @@ ERROR: access method "btree" is not of type TABLE -- Other weird invalid cases that cause problems CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; -ERROR: "pg_am" is not partitioned +ERROR: deadlock detected +DETAIL: Process 3076180 waits for AccessShareLock on relation 1259 of database 16386; blocked by process 3076181. +Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; blocked by process 3076180. +HINT: See server log for query details. -- Drop table access method, which fails as objects depends on it DROP ACCESS METHOD heap2; ERROR: cannot drop access method heap2 because other objects depend on it 027_stream_regress_primary.log contains: 2024-03-31 03:10:26.728 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM FULL pg_class; ... 2024-03-31 03:10:26.797 EDT [3076180] pg_regress/create_am LOG: statement: CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x; ... 2024-03-31 03:10:28.183 EDT [3076181] pg_regress/vacuum LOG: statement: VACUUM FULL pg_database; This simple demo confirms the issue: for ((i=1;i<=20;i++)); do echo "iteration $i" echo "VACUUM FULL pg_class;" | psql >psql-1.log & echo "CREATE FOREIGN TABLE fp PARTITION OF pg_am DEFAULT SERVER x;" | psql >psql-2.log & wait done ... iteration 15 ERROR: "pg_am" is not partitioned iteration 16 ERROR: deadlock detected DETAIL: Process 2556377 waits for AccessShareLock on relation 1259 of database 16384; blocked by process 2556378. Process 2556378 waits for AccessShareLock on relation 2601 of database 16384; blocked by process 2556377. HINT: See server log for query details. ... Best regards, Alexander
On Sun, Mar 31, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote: > Hello Alvaro, > > 28.03.2024 18:58, Alvaro Herrera wrote: > > Grumble. I don't like initialization at declare time, so far from the > > code that depends on the value. But the alternative would have been to > > assign right where this blocks starts, an additional line. I pushed it > > like you had it. > > I've stumbled upon a test failure caused by the test query added in that > commit: > +ERROR: deadlock detected > +DETAIL: Process 3076180 waits for AccessShareLock on relation 1259 of database 16386; blocked by process 3076181. > +Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; blocked by process 3076180. I think means that, although it was cute to use pg_am in the reproducer given in the problem report, it's not a good choice to use here in the sql regression tests. -- Justin
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sun, Mar 31, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote: >> I've stumbled upon a test failure caused by the test query added in that >> commit: >> +ERROR: deadlock detected >> +DETAIL: Process 3076180 waits for AccessShareLock on relation 1259 of database 16386; blocked by process 3076181. >> +Process 3076181 waits for AccessShareLock on relation 2601 of database 16386; blocked by process 3076180. > I think means that, although it was cute to use pg_am in the reproducer > given in the problem report, it's not a good choice to use here in the > sql regression tests. Another case here: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sevengill&dt=2024-04-02%2001%3A32%3A17 AFAICS, e2395cdbe posits that taking exclusive lock on pg_am in the middle of a bunch of concurrent regression scripts couldn't possibly cause any problems. Really? regards, tom lane
On Tue, Apr 02, 2024 at 01:06:06AM -0400, Tom Lane wrote: > AFAICS, e2395cdbe posits that taking exclusive lock on pg_am in the > middle of a bunch of concurrent regression scripts couldn't possibly > cause any problems. Really? There is no need for a catalog here to trigger the failure, and it would have happened as long as a foreign table is used. The problem introduced in 374c7a229042 fixed by e2395cdbe83a comes from a thinko on my side, my apologies for that and the delay in replying. Thanks for the extra fix done in 13b3b62746ec, Alvaro. -- Michael
Attachment
On Mon, Apr 15, 2024 at 10:46:00AM +0900, Michael Paquier wrote: > There is no need for a catalog here to trigger the failure, and it > would have happened as long as a foreign table is used. The problem > introduced in 374c7a229042 fixed by e2395cdbe83a comes from a thinko > on my side, my apologies for that and the delay in replying. Thanks > for the extra fix done in 13b3b62746ec, Alvaro. While doing more tests with this feature, among other things, I've spotted an incorrect behavior with dump/restore with the handling of the GUC default_table_access_method when it comes to partitions. Imagine the following in database "a": CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id); CREATE TABLE parent_tab_2 (id int) PARTITION BY RANGE (id) USING heap; CREATE TABLE parent_tab_3 (id int) PARTITION BY RANGE (id); This leads to the following in pg_class: =# SELECT relname, relam FROM pg_class WHERE oid > 16000; relname | relam --------------+------- parent_tab | 0 parent_tab_2 | 2 parent_tab_3 | 0 (3 rows) Now, let's do the following: $ createdb b $ pg_dump | psql b $ psql b =# SELECT relname, relam FROM pg_class WHERE oid > 16000; relname | relam --------------+------- parent_tab | 0 parent_tab_2 | 0 parent_tab_3 | 0 (3 rows) And parent_tab_2 would now rely on the default GUC when creating new partitions rather than enforce heap. It seems to me that we are going to extend the GUC default_table_access_method with a "default" mode to be able to force relam to 0 and make a difference with the non-0 case, in the same way as ALTER TABLE SET ACCESS METHOD DEFAULT. The thing is that, like tablespaces, we have to rely on a GUC and not a USING clause to be able to handle --no-table-access-method. An interesting point comes to what we should do for default_table_access_method set to "default" when dealing with something else than a partitioned table, where an error may be adapted. Still, I'm wondering if there are more flavors I lack imagination for. This requires more careful design work. Perhaps somebody has a good idea? -- Michael
Attachment
On Tue, Apr 16, 2024 at 02:14:21PM +0900, Michael Paquier wrote: > It seems to me that we are going to extend the GUC > default_table_access_method with a "default" mode to be able to force > relam to 0 and make a difference with the non-0 case, in the same way > as ALTER TABLE SET ACCESS METHOD DEFAULT. The thing is that, like > tablespaces, we have to rely on a GUC and not a USING clause to be > able to handle --no-table-access-method. > > An interesting point comes to what we should do for > default_table_access_method set to "default" when dealing with > something else than a partitioned table, where an error may be > adapted. Still, I'm wondering if there are more flavors I lack > imagination for. This requires more careful design work. > > Perhaps somebody has a good idea? Actually, I've come up with an idea just after hitting the send button: let's use an extra ALTER TABLE SET ACCESS METHOD rather than rely on the GUC to set the AM of the partitioned table correctly. This extra command should be optional, depending on --no-table-access-method. If a partitioned table has 0 as relam, let's not add this extra ALTER TABLE at all. -- Michael
Attachment
On Tue, Apr 16, 2024 at 02:19:56PM +0900, Michael Paquier wrote: > Actually, I've come up with an idea just after hitting the send > button: let's use an extra ALTER TABLE SET ACCESS METHOD rather than > rely on the GUC to set the AM of the partitioned table correctly. > This extra command should be optional, depending on > --no-table-access-method. If a partitioned table has 0 as relam, > let's not add this extra ALTER TABLE at all. I have explored this idea, and while this is tempting this faces a couple of challenges: 1) Binary upgrades would fail because the table rewrite created by ALTER TABLE SET ACCESS METHOD for relkinds with physical storage expects heap_create_with_catalog to have a fixed OID, but the rewrite would require extra steps to be able to handle that, and I am not convinced that more binary_upgrade_set_next_heap_relfilenode() is a good idea. 2) We could limit these extra ALTER TABLE commands to be generated for partitioned tables. This is kind of confusing as resulting dumps would mix SET commands for default_table_access_method that would affect tables with physical storage, while partitioned tables would have their own extra ALTER TABLE commands. Another issue that needs more consideration is that TocEntrys don't hold any relkind information so pg_backup_archiver.c cannot make a difference with tables and partitioned tables to select if SET or ALTER TABLE should be generated. Several designs are possible, like: - Mix SET and ALTER TABLE commands in the dumps to set the AM, SET for tables and matviews, ALTER TABLE for relations without storage. This would bypass the binary upgrade problem with the fixed relid. - Use only SET, requiring a new "default" value for default_table_access_method that would force a partitioned table's relam to be 0. Be stricter with the "current" table AM tracked in pg_dump's backup archiver. - Use only ALTER TABLE commands, with extra binary upgrade tweaks to force relation OIDs for the second heap_create_with_catalog() done with the rewrite to update a relation's AM. With all that in mind, it may be better to revert 374c7a229042 and e2395cdbe83a from HEAD and reconsider how to tackle the dump issues in v18 or newer versions as all of the approaches I can think of lead to more complications of their own. Please see attached a non-polished POC that switches dumps to use ALTER TABLE, that I've used to detect the upgrade problems. Thoughts or comments are welcome. -- Michael
Attachment
On 2024-Apr-17, Michael Paquier wrote: > 2) We could limit these extra ALTER TABLE commands to be generated for > partitioned tables. This is kind of confusing as resulting dumps > would mix SET commands for default_table_access_method that would > affect tables with physical storage, while partitioned tables would > have their own extra ALTER TABLE commands. Hmm, cannot we simply add a USING clause to the CREATE TABLE command for partitioned tables? That would override the default_table_access_method, so it should give the correct result, no? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 2024-Apr-17, Alvaro Herrera wrote: > Hmm, cannot we simply add a USING clause to the CREATE TABLE command for > partitioned tables? That would override the > default_table_access_method, so it should give the correct result, no? Ah, upthread you noted that pg_restore-time --no-table-access-method needs to be able to elide it, so a dump-time USING clause doesn't work. I think it's easy enough to add a "bool ispartitioned" to TableInfo and use an ALTER TABLE or rely on the GUC depending on that -- seems easy enough. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Wed, Apr 17, 2024 at 09:50:02AM +0200, Alvaro Herrera wrote: > I think it's easy enough to add a "bool ispartitioned" to TableInfo and > use an ALTER TABLE or rely on the GUC depending on that -- seems easy > enough. Yeah, that would be easy enough to track but I was wondering about adding the relkind instead. Still, one thing that I found confusing is the dump generated in this case, as it would mix the SET and the ALTER TABLE commands so one reading the dumps may wonder why the SET has no effect for a CREATE TABLE PARTITION OF without USING. Perhaps that's fine and I just worry too much ;) The extra ALTER commands need to be generated after the object definitions, so we'd need a new subroutine similar to _selectTableAccessMethod() like a _selectTableAccessMethodNoStorage(). Or grouping both together is just simpler? -- Michael
Attachment
On 2024-Apr-17, Michael Paquier wrote: > Yeah, that would be easy enough to track but I was wondering about > adding the relkind instead. Still, one thing that I found confusing > is the dump generated in this case, as it would mix the SET and the > ALTER TABLE commands so one reading the dumps may wonder why the SET > has no effect for a CREATE TABLE PARTITION OF without USING. Perhaps > that's fine and I just worry too much ;) Hmm, maybe we should do a RESET of default_table_access_method before printing the CREATE TABLE to avoid the confusion. > The extra ALTER commands need to be generated after the object > definitions, so we'd need a new subroutine similar to > _selectTableAccessMethod() like a _selectTableAccessMethodNoStorage(). > Or grouping both together is just simpler? I think there should be two routines, since _select* routines just print a SET command; maybe the new one would be _printAlterTableAM() or something like that. Having _select() print an ALTER TABLE command depending on relkind (or the boolean flag) would be confusing, I think. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
BTW if nothing else, this thread led me to discover a 18-month-old typo in the Spanish translation of pg_dump: -msgstr " --no-tablespaces no volcar métodos de acceso de tablas\n" +msgstr " --no-table-access-method no volcar métodos de acceso de tablas\n" Oops. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Los dioses no protegen a los insensatos. Éstos reciben protección de otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)
On Wed, Apr 17, 2024 at 10:31:52AM +0200, Alvaro Herrera wrote: > On 2024-Apr-17, Michael Paquier wrote: >> Yeah, that would be easy enough to track but I was wondering about >> adding the relkind instead. Still, one thing that I found confusing >> is the dump generated in this case, as it would mix the SET and the >> ALTER TABLE commands so one reading the dumps may wonder why the SET >> has no effect for a CREATE TABLE PARTITION OF without USING. Perhaps >> that's fine and I just worry too much ;) > > Hmm, maybe we should do a RESET of default_table_access_method before > printing the CREATE TABLE to avoid the confusion. A hard reset would make the business around currTableAM that decides when to generate the SET default_table_access_method queries slightly more complicated, while increasing the number of queries run on the server. >> The extra ALTER commands need to be generated after the object >> definitions, so we'd need a new subroutine similar to >> _selectTableAccessMethod() like a _selectTableAccessMethodNoStorage(). >> Or grouping both together is just simpler? > > I think there should be two routines, since _select* routines just print > a SET command; maybe the new one would be _printAlterTableAM() or > something like that. Having _select() print an ALTER TABLE command > depending on relkind (or the boolean flag) would be confusing, I think. Fine by me to use two routines to generate the two different commands. I am finishing with the attached for now, making dumps, restores and upgrades work happily as far as I've tested. I was also worrying about a need to dump the protocol version to be able to track the relkind in the toc entries, but a45c78e3284b has already done one. The difference in AM handling between relations without storage and relations with storage pushes the relkind logic more within the internals of pg_backup_archiver.c. What do you think? -- Michael
Attachment
On 2024-Apr-18, Michael Paquier wrote: > On Wed, Apr 17, 2024 at 10:31:52AM +0200, Alvaro Herrera wrote: > > Hmm, maybe we should do a RESET of default_table_access_method before > > printing the CREATE TABLE to avoid the confusion. > > A hard reset would make the business around currTableAM that decides > when to generate the SET default_table_access_method queries slightly > more complicated, while increasing the number of queries run on the > server. Hmm, okay. (I don't think we really care too much about the number of queries, do we?) > Fine by me to use two routines to generate the two different commands. > I am finishing with the attached for now, making dumps, restores and > upgrades work happily as far as I've tested. Great. > I was also worrying about a need to dump the protocol version to be > able to track the relkind in the toc entries, but a45c78e3284b has > already done one. The difference in AM handling between relations > without storage and relations with storage pushes the relkind logic > more within the internals of pg_backup_archiver.c. Hmm, does this mean that every dump taking since a45c78e3284b (April 1st) and before this commit will be unrestorable? This doesn't worry me too much, because we aren't even in beta yet ... and I think we don't have a strict policy about it. > --- a/src/bin/pg_dump/t/002_pg_dump.pl > +++ b/src/bin/pg_dump/t/002_pg_dump.pl > @@ -4591,11 +4591,9 @@ my %tests = ( > CREATE TABLE dump_test.regress_pg_dump_table_am_child_2 > PARTITION OF dump_test.regress_pg_dump_table_am_parent FOR VALUES IN (2);', > regexp => qr/^ > - \QSET default_table_access_method = regress_table_am;\E > - (\n(?!SET[^;]+;)[^\n]*)* > - \n\QCREATE TABLE dump_test.regress_pg_dump_table_am_parent (\E > - (.*\n)* > \QSET default_table_access_method = heap;\E > + (.*\n)* > + \QALTER TABLE dump_test.regress_pg_dump_table_am_parent SET ACCESS METHOD regress_table_am;\E > (\n(?!SET[^;]+;)[^\n]*)* > \n\QCREATE TABLE dump_test.regress_pg_dump_table_am_child_1 (\E > (.*\n)* This looks strange -- why did you remove matching for the CREATE TABLE of the parent table? That line should appear shortly before the ALTER TABLE SET ACCESS METHOD for the same table, shouldn't it? Maybe your intention was to remove only the SET default_table_access_method = regress_table_am line ... but it's not clear to me why we have the "SET default_table_access_method = heap" line before the ALTER TABLE SET ACCESS METHOD. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations)
On Thu, Apr 18, 2024 at 06:17:56PM +0200, Alvaro Herrera wrote: > On 2024-Apr-18, Michael Paquier wrote: >> I was also worrying about a need to dump the protocol version to be >> able to track the relkind in the toc entries, but a45c78e3284b has >> already done one. The difference in AM handling between relations >> without storage and relations with storage pushes the relkind logic >> more within the internals of pg_backup_archiver.c. > > Hmm, does this mean that every dump taking since a45c78e3284b (April > 1st) and before this commit will be unrestorable? This doesn't worry me > too much, because we aren't even in beta yet ... and I think we don't > have a strict policy about it. I've been scanning the history of K_VERS_1_* in the recent years, and it does not seem that we have a case where we would have needed to bump the version twice in the same release cycle. Anyway, yes, any dump taken since 1_16 has been bumped would fail to restore with this patch in place. For an unreleased not-yet-in-beta branch, why should we care? Things are not set in stone, like extensions. If others have comments about this point, feel free of course. >> --- a/src/bin/pg_dump/t/002_pg_dump.pl >> +++ b/src/bin/pg_dump/t/002_pg_dump.pl >> @@ -4591,11 +4591,9 @@ my %tests = ( >> CREATE TABLE dump_test.regress_pg_dump_table_am_child_2 >> PARTITION OF dump_test.regress_pg_dump_table_am_parent FOR VALUES IN (2);', >> regexp => qr/^ >> - \QSET default_table_access_method = regress_table_am;\E >> - (\n(?!SET[^;]+;)[^\n]*)* >> - \n\QCREATE TABLE dump_test.regress_pg_dump_table_am_parent (\E >> - (.*\n)* >> \QSET default_table_access_method = heap;\E >> + (.*\n)* >> + \QALTER TABLE dump_test.regress_pg_dump_table_am_parent SET ACCESS METHOD regress_table_am;\E >> (\n(?!SET[^;]+;)[^\n]*)* >> \n\QCREATE TABLE dump_test.regress_pg_dump_table_am_child_1 (\E >> (.*\n)* > > This looks strange -- why did you remove matching for the CREATE TABLE > of the parent table? That line should appear shortly before the ALTER > TABLE SET ACCESS METHOD for the same table, shouldn't it? Yeah, with the ALTER in place that did not seem that mandatory but I don't mind keeping it, as well. > Maybe your > intention was to remove only the SET default_table_access_method > = regress_table_am line ... but it's not clear to me why we have the > "SET default_table_access_method = heap" line before the ALTER TABLE SET > ACCESS METHOD. This comes from the contents of the dump for regress_pg_dump_table_am_2, that uses heap as table AM. A SET is issued for it before dumping regress_pg_dump_table_am_parent and its partitions. One trick that I can think of to make the output parsing of the test more palatable is to switch the AMs used by the two partitions, so as we finish with two SET queries before each partition rather than one before the partitioned table. See the attached for the idea. -- Michael
Attachment
On Fri, Apr 19, 2024 at 10:41:30AM +0900, Michael Paquier wrote: > This comes from the contents of the dump for > regress_pg_dump_table_am_2, that uses heap as table AM. A SET is > issued for it before dumping regress_pg_dump_table_am_parent and its > partitions. One trick that I can think of to make the output parsing > of the test more palatable is to switch the AMs used by the two > partitions, so as we finish with two SET queries before each partition > rather than one before the partitioned table. See the attached for > the idea. FYI, as this is an open item, I am planning to wrap that at the beginning of next week after a second lookup. If there are any comments, feel free. -- Michael
Attachment
On Sat, Apr 20, 2024 at 11:55:46AM +0900, Michael Paquier wrote: > FYI, as this is an open item, I am planning to wrap that at the > beginning of next week after a second lookup. If there are any > comments, feel free. This one is now fixed with f46bee346c3b, and the open item is marked as fixed. -- Michael
Attachment
It occurred to me that psql \dP+ should show the AM of partitioned tables (and other partitioned rels). Arguably, this could've been done when \dP was introduced in v12, but at that point would've shown the AM only for partitioned indexes. But it makes a lot of sense to do it now that partitioned tables support AMs. I suggest to consider this for v17. regression=# \dP+ List of partitioned relations Schema | Name | Owner | Type | Table | Access method | Total size | Description --------+----------------------+---------+-------------------+----------------+---------------+------------+------------- public | mlparted | pryzbyj | partitioned table | | heap2 | 104 kB | public | tableam_parted_heap2 | pryzbyj | partitioned table | | | 32 kB | public | trigger_parted | pryzbyj | partitioned table | | | 0 bytes | public | upsert_test | pryzbyj | partitioned table | | | 8192 bytes | public | trigger_parted_pkey | pryzbyj | partitioned index | trigger_parted | btree | 16 kB | public | upsert_test_pkey | pryzbyj | partitioned index | upsert_test | btree | 8192 bytes | --- src/bin/psql/describe.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index f67bf0b8925..22a668409e7 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -4113,7 +4113,7 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose) PQExpBufferData title; PGresult *res; printQueryOpt myopt = pset.popt; - bool translate_columns[] = {false, false, false, false, false, false, false, false, false}; + bool translate_columns[] = {false, false, false, false, false, false, false, false, false, false}; const char *tabletitle; bool mixed_output = false; @@ -4181,6 +4181,14 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose) if (verbose) { + /* + * Table access methods were introduced in v12, and can be set on + * partitioned tables since v17. + */ + appendPQExpBuffer(&buf, + ",\n am.amname as \"%s\"", + gettext_noop("Access method")); + if (showNested) { appendPQExpBuffer(&buf, @@ -4216,6 +4224,9 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose) if (verbose) { + appendPQExpBufferStr(&buf, + "\n LEFT JOIN pg_catalog.pg_am am ON c.relam = am.oid"); + if (pset.sversion < 120000) { appendPQExpBufferStr(&buf, -- 2.42.0
On Tue, May 21, 2024 at 08:33:51AM -0500, Justin Pryzby wrote: > It occurred to me that psql \dP+ should show the AM of partitioned > tables (and other partitioned rels). ping
On Tue, May 21, 2024 at 08:33:51AM -0500, Justin Pryzby wrote: > It occurred to me that psql \dP+ should show the AM of partitioned > tables (and other partitioned rels). > Arguably, this could've been done when \dP was introduced in v12, but > at that point would've shown the AM only for partitioned indexes. > But it makes a lot of sense to do it now that partitioned tables support > AMs. I suggest to consider this for v17. Not sure that this is a must-have. It is nice to have, but extra information is a new feature IMO. Any extra opinions? I would suggest to attach a patch, that makes review easier. And so here is one. -- Michael
Attachment
On Thu, Jun 06, 2024 at 09:43:45AM +0900, Michael Paquier wrote: > Not sure that this is a must-have. It is nice to have, but extra > information is a new feature IMO. Any extra opinions? Hearing nothing, I've applied that on HEAD now that v18 is open. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Jun 06, 2024 at 09:43:45AM +0900, Michael Paquier wrote: >> Not sure that this is a must-have. It is nice to have, but extra >> information is a new feature IMO. Any extra opinions? > Hearing nothing, I've applied that on HEAD now that v18 is open. While this won't actually fail against a v10 or v11 server, it won't show anything useful either (because relam is zero for heaps in pre-v12 versions). Perhaps there should be a check to only add the extra column if server >= v12? regards, tom lane
On Mon, Jul 01, 2024 at 08:40:17PM -0400, Tom Lane wrote: > While this won't actually fail against a v10 or v11 server, it won't > show anything useful either (because relam is zero for heaps in > pre-v12 versions). Perhaps there should be a check to only add the > extra column if server >= v12? I've thought about that and would be OK to restrict things with this suggestion if you'd prefer it. However, I could not decide in favor of it as using a psql \dP+ >= v18 gives the possibility to show the AMs of partitioned indexes, as these are also part listed in \dP. So I've found that useful in itself. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Jul 01, 2024 at 08:40:17PM -0400, Tom Lane wrote: >> While this won't actually fail against a v10 or v11 server, it won't >> show anything useful either (because relam is zero for heaps in >> pre-v12 versions). Perhaps there should be a check to only add the >> extra column if server >= v12? > I've thought about that and would be OK to restrict things with this > suggestion if you'd prefer it. However, I could not decide in favor > of it as using a psql \dP+ >= v18 gives the possibility to show the > AMs of partitioned indexes, as these are also part listed in \dP. So > I've found that useful in itself. Ah, I'd forgotten that partitioned indexes are shown too. Never mind then. regards, tom lane