Thread: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
Hi PostgreSQL Community,
I have encountered an issue when attempting to use
Given the situation, I see two potential paths forward:
1/ Reintroduce Support for Sequences in pgstattuple: This would be a relatively small change. However, it's important to note that the purpose of
2/ Explicitly Block Sequence Support in pgstattuple: We could align sequences with other unsupported objects, such as foreign tables, by providing a more explicit error message. For instance:
Personally, I lean towards the second approach, as it promotes consistency and clarity. However, I would greatly appreciate the community's feedback and suggestions on the best way to proceed.
Based on the feedback received, I will work on the appropriate patch.
Looking forward to your comments and feedback.
[1] Reference to Earlier Discussion: For additional context, I previously discussed this issue on the pgsql-general mailing list. You can find the discussion https://www.postgresql.org/message-id/CACX%2BKaMOd3HHteOJNX7fkWxO%2BR%3DuLJkfKqE2-QUK8fKmKfOwqw%40mail.gmail.com. In that thread, it was suggested that this could be considered a documentation bug, and that we might update the documentation and regression tests accordingly.
Regards
Ayush Vatsa
AWS
I have encountered an issue when attempting to use
pgstattuple
extension with sequences. When executing the following command:SELECT * FROM pgstattuple('serial');This behaviour is observed in PostgreSQL versions post v11 [1] , where sequences support in
ERROR: only heap AM is supported
pgstattuple
used to work fine. However, this issue slipped through as we did not have any test cases to catch it.Given the situation, I see two potential paths forward:
1/ Reintroduce Support for Sequences in pgstattuple: This would be a relatively small change. However, it's important to note that the purpose of
pgstattuple
is to provide statistics like the number of tuples, dead tuples, and free space in a relation. Sequences, on the other hand, return only one value at a time and don’t have attributes like dead tuples. Therefore, the result for any sequence would consistently look something like this:SELECT * FROM pgstattuple('serial');
table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
8192 | 1 | 41 | 0.5 | 0 | 0 | 0 | 8104 | 98.93
(1 row)
2/ Explicitly Block Sequence Support in pgstattuple: We could align sequences with other unsupported objects, such as foreign tables, by providing a more explicit error message. For instance:
SELECT * FROM pgstattuple('x');This approach would ensure that the error handling for sequences is consistent with how other unsupported objects are handled.
ERROR: cannot get tuple-level statistics for relation "x"
DETAIL: This operation is not supported for foreign tables.
Personally, I lean towards the second approach, as it promotes consistency and clarity. However, I would greatly appreciate the community's feedback and suggestions on the best way to proceed.
Based on the feedback received, I will work on the appropriate patch.
Looking forward to your comments and feedback.
[1] Reference to Earlier Discussion: For additional context, I previously discussed this issue on the pgsql-general mailing list. You can find the discussion https://www.postgresql.org/message-id/CACX%2BKaMOd3HHteOJNX7fkWxO%2BR%3DuLJkfKqE2-QUK8fKmKfOwqw%40mail.gmail.com. In that thread, it was suggested that this could be considered a documentation bug, and that we might update the documentation and regression tests accordingly.
Regards
Ayush Vatsa
AWS
On Mon, Aug 26, 2024 at 11:44 AM Ayush Vatsa <ayushvatsa1810@gmail.com> wrote: > Hi PostgreSQL Community, > I have encountered an issue when attempting to use pgstattuple extension with sequences. When executing the following command: > > SELECT * FROM pgstattuple('serial'); > ERROR: only heap AM is supported > > This behaviour is observed in PostgreSQL versions post v11 [1] , where sequences support in pgstattuple used to work fine.However, this issue slipped through as we did not have any test cases to catch it. > > Given the situation, I see two potential paths forward: > 1/ Reintroduce Support for Sequences in pgstattuple: This would be a relatively small change. However, it's important tonote that the purpose of pgstattuple is to provide statistics like the number of tuples, dead tuples, and free space ina relation. Sequences, on the other hand, return only one value at a time and don’t have attributes like dead tuples. Therefore,the result for any sequence would consistently look something like this: > > SELECT * FROM pgstattuple('serial'); > table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space| free_percent > -----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+-------------- > 8192 | 1 | 41 | 0.5 | 0 | 0 | 0 | 8104| 98.93 > (1 row) > > > 2/ Explicitly Block Sequence Support in pgstattuple: We could align sequences with other unsupported objects, such as foreigntables, by providing a more explicit error message. For instance: > > SELECT * FROM pgstattuple('x'); > ERROR: cannot get tuple-level statistics for relation "x" > DETAIL: This operation is not supported for foreign tables. > > This approach would ensure that the error handling for sequences is consistent with how other unsupported objects are handled. > Personally, I lean towards the second approach, as it promotes consistency and clarity. However, I would greatly appreciatethe community's feedback and suggestions on the best way to proceed. > Based on the feedback received, I will work on the appropriate patch. > > Looking forward to your comments and feedback. I don't really see what the problem is here. You state that the information pgstattuple provides isn't really useful for sequences, so that means there's no real reason to do (1). As for (2), I'm not opposed to improving error messages but it's not clear to me why you think that the current one is bad. You say that we should provide a more explicit error message, but "only heap AM is supported" seems pretty explicit to me: it doesn't spell out that this only works for relkind='r', but since relam=heap is only possible for relkind='r', there's not really any other reasonable interpretation, which IMHO makes this pretty specific about what the problem is. Maybe you just find it confusing, but that's a bit different from whether it's explicit enough. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Mon, Aug 26, 2024 at 09:14:27PM +0530, Ayush Vatsa wrote: > Given the situation, I see two potential paths forward: > *1/ Reintroduce Support for Sequences in pgstattuple*: This would be a > relatively small change. However, it's important to note that the purpose > of pgstattuple is to provide statistics like the number of tuples, dead > tuples, and free space in a relation. Sequences, on the other hand, return > only one value at a time and don´t have attributes like dead tuples. > > [...] > > *2/ Explicitly Block Sequence Support in pgstattuple*: We could align > sequences with other unsupported objects, such as foreign tables, by > providing a more explicit error message. While it is apparently pretty uncommon to use pgstattuple on sequences, this is arguably a bug that should be fixed and back-patched. I've CC'd Michael Paquier, who is working on sequence AMs and may have thoughts. I haven't looked at his patch set for that, but I'm assuming that it might fill in pg_class.relam for sequences, which would have the same effect as option 1. I see a couple of other places we might want to look into as part of this thread. Besides pgstattuple, I see that pageinspect and pg_surgery follow a similar pattern. pgrowlocks does, too, but that one seems intentionally limited to RELKIND_RELATION. I also see that amcheck explicitly allows sequences: /* * Sequences always use heap AM, but they don't show that in the catalogs. * Other relkinds might be using a different AM, so check. */ if (ctx.rel->rd_rel->relkind != RELKIND_SEQUENCE && ctx.rel->rd_rel->relam != HEAP_TABLE_AM_OID) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("only heap AM is supported"))); IMHO it would be good to establish some level of consistency here. -- nathan
On Mon, Aug 26, 2024 at 1:26 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > While it is apparently pretty uncommon to use pgstattuple on sequences, > this is arguably a bug that should be fixed and back-patched. I don't understand what would make it a bug. > IMHO it would be good to establish some level of consistency here. Sure, consistency is good, all other things being equal, but just saying "well this used to work one way and now it works another way" isn't enough to say that there is a bug, or that something should be changed. -- Robert Haas EDB: http://www.enterprisedb.com
> You state that the
> information pgstattuple provides isn't really useful for sequences, so
> that means there's no real reason to do (1)
> information pgstattuple provides isn't really useful for sequences, so
> that means there's no real reason to do (1)
That's correct, but we should consider that up until v11,
sequences were supported in pgstattuple. Their support
was removed unintentionally (I believe so). Therefore, it might be worth
discussing whether it makes sense to reinstate support for sequences.
> why you think that the current one is bad
The current implementation has some drawbacks.
> why you think that the current one is bad
The current implementation has some drawbacks.
For instance, when encountering other unsupported objects, the error looks like this:
ERROR: cannot get tuple-level statistics for relation "x"
DETAIL: This operation is not supported for foreign tables.
However, for sequences, the message should explicitly
state that "This operation is not supported for sequences."
Currently, we're deducing that the heap access method (AM) is
for
relkind='r'
, so the message "only heap AM is supported" implies that only relkind='r' are supported.
This prompted my thoughts on the matter.
Moreover, if you refer to the code in pgstattuple.c,
you'll notice that sequences appear to be explicitly allowed in pgstattuple,
but it results in an error encountered here -
Therefore, I believe a small refactoring is needed to make the code cleaner and more consistent.
> IMHO it would be good to establish some level of consistency here.
Agree.
Let me know your thoughts.
Regards
Ayush Vatsa
AWS
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Mon, Aug 26, 2024 at 01:35:52PM -0400, Robert Haas wrote: > On Mon, Aug 26, 2024 at 1:26 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> While it is apparently pretty uncommon to use pgstattuple on sequences, >> this is arguably a bug that should be fixed and back-patched. > > I don't understand what would make it a bug. > >> IMHO it would be good to establish some level of consistency here. > > Sure, consistency is good, all other things being equal, but just > saying "well this used to work one way and now it works another way" > isn't enough to say that there is a bug, or that something should be > changed. The reason I think it's arguably a bug is because it used to work fine and then started ERROR-ing after commit 4b82664. I'm fine with saying that we don't think it's useful and intentionally deprecating it, but AFAICT no such determination has been made. I see no discussion about this on the thread for commit 4b82664, and the only caller of pgstat_heap() intentionally calls into the affected function for sequences (and has since pgstattuple was introduced 18 years ago): if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind) || rel->rd_rel->relkind == RELKIND_SEQUENCE) { return pgstat_heap(rel, fcinfo); } -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Aug 29, 2024 at 10:17:57PM +0530, Ayush Vatsa wrote: > Please find attached the patch that re-enables > support for sequences within the pgstattuple extension. > I have also included the necessary test cases for > sequences, implemented in the form of regress tests. Thanks. Robert, do you have any concerns with this? +select * from pgstattuple('serial'); + table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space| free_percent +-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+-------------- + 8192 | 1 | 41 | 0.5 | 0 | 0 | 0 | 8104| 98.93 +(1 row) I'm concerned that some of this might be platform-dependent and make the test unstable. Perhaps we should just select count(*) here. + /** + * Sequences don't fall under heap AM but are still + * allowed for obtaining tuple-level statistics. + */ I think we should be a bit more descriptive here, like the comment in verify_heapam.c: /* * Sequences always use heap AM, but they don't show that in the catalogs. * Other relkinds might be using a different AM, so check. */ -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Aug 29, 2024 at 12:36:35PM -0500, Nathan Bossart wrote: > +select * from pgstattuple('serial'); > + table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space| free_percent > +-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+-------------- > + 8192 | 1 | 41 | 0.5 | 0 | 0 | 0 | 8104 | 98.93 > +(1 row) > > I'm concerned that some of this might be platform-dependent and make the > test unstable. Perhaps we should just select count(*) here. Sure enough, the CI testing for 32-bit is failing here [0]. [0] https://api.cirrus-ci.com/v1/artifact/task/4798423386292224/testrun/build-32/testrun/pgstattuple/regress/regression.diffs -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Fri, Aug 30, 2024 at 12:17:47AM +0530, Ayush Vatsa wrote: > The patch with all the changes is attached. Looks generally reasonable to me. -- nathan
On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > Thanks. Robert, do you have any concerns with this? I don't know if I'm exactly concerned but I don't understand what problem we're solving, either. I thought Ayush said that the function wouldn't produce useful results for sequences; so then why do we need to change the code to enable it? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Fri, Aug 30, 2024 at 04:07:30PM -0400, Robert Haas wrote: > On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> Thanks. Robert, do you have any concerns with this? > > I don't know if I'm exactly concerned but I don't understand what > problem we're solving, either. I thought Ayush said that the function > wouldn't produce useful results for sequences; so then why do we need > to change the code to enable it? I suppose it would be difficult to argue that it is actually useful, given it hasn't worked since v11 and apparently nobody noticed until recently. If we're content to leave it unsupported, then sure, let's just remove the "relkind == RELKIND_SEQUENCE" check in pgstat_relation(). But I also don't have a great reason to _not_ support it. It used to work (which appears to have been intentional, based on the code), it was unintentionally broken, and it'd work again with a ~1 line change. "SELECT count(*) FROM my_sequence" probably doesn't provide a lot of value, but I have no intention of proposing a patch that removes support for that. All that being said, I don't have a terribly strong opinion, but I guess I lean towards re-enabling. Another related inconsistency I just noticed in pageinspect: postgres=# select t_data from heap_page_items(get_raw_page('s', 0)); t_data -------------------------------------- \x0100000000000000000000000000000000 (1 row) postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits) from heap_page_items(get_raw_page('s',0)); ERROR: only heap AM is supported -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Matthias van de Meent
Date:
On Fri, 30 Aug 2024, 23:06 Nathan Bossart, <nathandbossart@gmail.com> wrote: > > On Fri, Aug 30, 2024 at 04:07:30PM -0400, Robert Haas wrote: > > On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> Thanks. Robert, do you have any concerns with this? > > > > I don't know if I'm exactly concerned but I don't understand what > > problem we're solving, either. I thought Ayush said that the function > > wouldn't produce useful results for sequences; so then why do we need > > to change the code to enable it? > > I suppose it would be difficult to argue that it is actually useful, given > it hasn't worked since v11 and apparently nobody noticed until recently. > If we're content to leave it unsupported, then sure, let's just remove the > "relkind == RELKIND_SEQUENCE" check in pgstat_relation(). But I also don't > have a great reason to _not_ support it. It used to work (which appears to > have been intentional, based on the code), it was unintentionally broken, > and it'd work again with a ~1 line change. "SELECT count(*) FROM > my_sequence" probably doesn't provide a lot of value, but I have no > intention of proposing a patch that removes support for that. > > All that being said, I don't have a terribly strong opinion, but I guess I > lean towards re-enabling. > > Another related inconsistency I just noticed in pageinspect: > > postgres=# select t_data from heap_page_items(get_raw_page('s', 0)); > t_data > -------------------------------------- > \x0100000000000000000000000000000000 > (1 row) > > postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits) from heap_page_items(get_raw_page('s',0)); > ERROR: only heap AM is supported I don't think this is an inconsistency: heap_page_items works on a raw page-as-bytea (produced by get_raw_page) without knowing about or accessing the actual relation type of that page, so it doesn't have the context why it should error out if the page looks similar enough to a heap page. I could feed it an arbitrary bytea, and it should still work as long as that bytea looks similar enough to a heap page. tuple_data_split, however, uses the regclass to decode the contents of the tuple, and can thus determine with certainty based on that regclass that it was supplied incorrect (non-heapAM table's regclass) arguments. It therefore has enough context to bail out and stop trying to decode the page's tuple data. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Tue, Sep 03, 2024 at 10:19:33PM +0200, Matthias van de Meent wrote: > On Fri, 30 Aug 2024, 23:06 Nathan Bossart, <nathandbossart@gmail.com> wrote: >> Another related inconsistency I just noticed in pageinspect: >> >> postgres=# select t_data from heap_page_items(get_raw_page('s', 0)); >> t_data >> -------------------------------------- >> \x0100000000000000000000000000000000 >> (1 row) >> >> postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits) from heap_page_items(get_raw_page('s',0)); >> ERROR: only heap AM is supported > > I don't think this is an inconsistency: > > heap_page_items works on a raw page-as-bytea (produced by > get_raw_page) without knowing about or accessing the actual relation > type of that page, so it doesn't have the context why it should error > out if the page looks similar enough to a heap page. I could feed it > an arbitrary bytea, and it should still work as long as that bytea > looks similar enough to a heap page. > tuple_data_split, however, uses the regclass to decode the contents of > the tuple, and can thus determine with certainty based on that > regclass that it was supplied incorrect (non-heapAM table's regclass) > arguments. It therefore has enough context to bail out and stop trying > to decode the page's tuple data. My point is really that tuple_data_split() needlessly ERRORs for sequences. Other heap functions work fine for sequences, and we know it uses the heap table AM, so why should tuple_data_split() fail? I agree that the others needn't enforce relkind checks and that they might succeed in some cases where tuple_data_split() might not be appropriate. -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
Barring objections, I'm planning to commit v3 soon. Robert/Matthias, I'm not sure you are convinced this is the right thing to do (or worth doing, rather), but I don't sense that you are actually opposed to it, either. Please correct me if I am wrong. -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Matthias van de Meent
Date:
On Wed, 11 Sept 2024 at 21:36, Nathan Bossart <nathandbossart@gmail.com> wrote: > > Barring objections, I'm planning to commit v3 soon. Robert/Matthias, I'm > not sure you are convinced this is the right thing to do (or worth doing, > rather), but I don't sense that you are actually opposed to it, either. > Please correct me if I am wrong. Correct: I do think making heapam-related inspection functions have a consistent behaviour when applied to sequences is beneficial, with no preference towards specifically supporting or not supporting sequences in these functions. If people that work on sequences think it's better to also support inspection of sequences, then I think that's a good reason to add that support where it doesn't already exist. As for patch v3, that seems fine with me. Matthias van de Meent Neon (https://neon.tech)
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 11:19:00AM +0900, Michael Paquier wrote: > On Wed, Sep 11, 2024 at 11:02:43PM +0100, Matthias van de Meent wrote: >> As for patch v3, that seems fine with me. > > +1 from here as well, after looking at what v3 is doing for these two > modules. Committed. -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 04:39:14PM -0500, Nathan Bossart wrote: > Committed. Ugh, the buildfarm is unhappy with the new tests [0] [1]. Will fix. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2024-09-12%2022%3A54%3A45 [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skimmer&dt=2024-09-12%2022%3A38%3A13 -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Fri, Sep 13, 2024 at 09:47:36AM +0900, Michael Paquier wrote: > On Thu, Sep 12, 2024 at 07:41:30PM -0500, Nathan Bossart wrote: >> Ugh, the buildfarm is unhappy with the new tests [0] [1]. Will fix. > > I'd suggest to switch the test to return a count() and make sure that > one record exists. The data in the page does not really matter. That's what I had in mind. I see that skimmer is failing with this error: ERROR: cannot access temporary tables during a parallel operation This makes sense because that machine has debug_parallel_query/force_parallel_mode set to "regress", but this test file has used a temporary table for a couple of years without issue... -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 07:56:56PM -0500, Nathan Bossart wrote: > I see that skimmer is failing with this error: > > ERROR: cannot access temporary tables during a parallel operation > > This makes sense because that machine has > debug_parallel_query/force_parallel_mode set to "regress", but this test > file has used a temporary table for a couple of years without issue... Oh, the answer seems to be commits aeaaf52 and 47a22dc. In short, we can't use a temporary sequence in this test for versions older than v15. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes: > Here's a patch to make the sequence permanent and to make the output of > tuple_data_split() not depend on endianness. +1 --- I checked this on mamba's host and it does produce "\\x0100000000000001" regardless of endianness. regards, tom lane
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 10:40:11PM -0400, Tom Lane wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: >> Here's a patch to make the sequence permanent and to make the output of >> tuple_data_split() not depend on endianness. > > +1 --- I checked this on mamba's host and it does produce > "\\x0100000000000001" regardless of endianness. Thanks for checking. I'll commit this fix in the morning. -- nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 10:41:27PM -0500, Nathan Bossart wrote: > Thanks for checking. I'll commit this fix in the morning. Committed. -- nathan