Thread: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

Hi PostgreSQL Community,
I have encountered an issue when attempting to use pgstattuple extension with sequences. When executing the following command:
SELECT * FROM pgstattuple('serial');
ERROR: only heap AM is supported
This behaviour is observed in PostgreSQL versions post v11 [1] , where sequences support in pgstattuple used to work fine. However, this issue slipped through as we did not have any test cases to catch it.

Given the situation, I see two potential paths forward:
1/ Reintroduce Support for Sequences in pgstattuple: This would be a relatively small change. However, it's important to note that the purpose of pgstattuple is to provide statistics like the number of tuples, dead tuples, and free space in a relation. Sequences, on the other hand, return only one value at a time and don’t have attributes like dead tuples. Therefore, the result for any sequence would consistently look something like this:
SELECT * FROM pgstattuple('serial');
table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
8192 | 1 | 41 | 0.5 | 0 | 0 | 0 | 8104 | 98.93
(1 row)

2/ Explicitly Block Sequence Support in pgstattuple: We could align sequences with other unsupported objects, such as foreign tables, by providing a more explicit error message. For instance:
SELECT * FROM pgstattuple('x');
ERROR: cannot get tuple-level statistics for relation "x"
DETAIL: This operation is not supported for foreign tables.
This approach would ensure that the error handling for sequences is consistent with how other unsupported objects are handled.
Personally, I lean towards the second approach, as it promotes consistency and clarity. However, I would greatly appreciate the community's feedback and suggestions on the best way to proceed.
Based on the feedback received, I will work on the appropriate patch.

Looking forward to your comments and feedback.

[1] Reference to Earlier Discussion: For additional context, I previously discussed this issue on the pgsql-general mailing list. You can find the discussion https://www.postgresql.org/message-id/CACX%2BKaMOd3HHteOJNX7fkWxO%2BR%3DuLJkfKqE2-QUK8fKmKfOwqw%40mail.gmail.com. In that thread, it was suggested that this could be considered a documentation bug, and that we might update the documentation and regression tests accordingly.

Regards
Ayush Vatsa
AWS
On Mon, Aug 26, 2024 at 11:44 AM Ayush Vatsa <ayushvatsa1810@gmail.com> wrote:
> Hi PostgreSQL Community,
> I have encountered an issue when attempting to use pgstattuple extension with sequences. When executing the following
command:
>
> SELECT * FROM pgstattuple('serial');
> ERROR:  only heap AM is supported
>
> This behaviour is observed in PostgreSQL versions post v11 [1] , where sequences support in pgstattuple used to work
fine.However, this issue slipped through as we did not have any test cases to catch it. 
>
> Given the situation, I see two potential paths forward:
> 1/ Reintroduce Support for Sequences in pgstattuple: This would be a relatively small change. However, it's important
tonote that the purpose of pgstattuple is to provide statistics like the number of tuples, dead tuples, and free space
ina relation. Sequences, on the other hand, return only one value at a time and don’t have attributes like dead tuples.
Therefore,the result for any sequence would consistently look something like this: 
>
> SELECT * FROM pgstattuple('serial');
>  table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent |
free_space| free_percent 
>
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
>       8192 |           1 |        41 |           0.5 |                0 |              0 |                  0 |
8104|        98.93 
> (1 row)
>
>
> 2/ Explicitly Block Sequence Support in pgstattuple: We could align sequences with other unsupported objects, such as
foreigntables, by providing a more explicit error message. For instance: 
>
> SELECT * FROM pgstattuple('x');
> ERROR:  cannot get tuple-level statistics for relation "x"
> DETAIL:  This operation is not supported for foreign tables.
>
> This approach would ensure that the error handling for sequences is consistent with how other unsupported objects are
handled.
> Personally, I lean towards the second approach, as it promotes consistency and clarity. However, I would greatly
appreciatethe community's feedback and suggestions on the best way to proceed. 
> Based on the feedback received, I will work on the appropriate patch.
>
> Looking forward to your comments and feedback.

I don't really see what the problem is here. You state that the
information pgstattuple provides isn't really useful for sequences, so
that means there's no real reason to do (1). As for (2), I'm not
opposed to improving error messages but it's not clear to me why you
think that the current one is bad. You say that we should provide a
more explicit error message, but "only heap AM is supported" seems
pretty explicit to me: it doesn't spell out that this only works for
relkind='r', but since relam=heap is only possible for relkind='r',
there's not really any other reasonable interpretation, which IMHO
makes this pretty specific about what the problem is. Maybe you just
find it confusing, but that's a bit different from whether it's
explicit enough.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Mon, Aug 26, 2024 at 09:14:27PM +0530, Ayush Vatsa wrote:
> Given the situation, I see two potential paths forward:
> *1/ Reintroduce Support for Sequences in pgstattuple*: This would be a
> relatively small change. However, it's important to note that the purpose
> of pgstattuple is to provide statistics like the number of tuples, dead
> tuples, and free space in a relation. Sequences, on the other hand, return
> only one value at a time and don´t have attributes like dead tuples.
>
> [...] 
> 
> *2/ Explicitly Block Sequence Support in pgstattuple*: We could align
> sequences with other unsupported objects, such as foreign tables, by
> providing a more explicit error message.

While it is apparently pretty uncommon to use pgstattuple on sequences,
this is arguably a bug that should be fixed and back-patched.  I've CC'd
Michael Paquier, who is working on sequence AMs and may have thoughts.  I
haven't looked at his patch set for that, but I'm assuming that it might
fill in pg_class.relam for sequences, which would have the same effect as
option 1.

I see a couple of other places we might want to look into as part of this
thread.  Besides pgstattuple, I see that pageinspect and pg_surgery follow
a similar pattern.  pgrowlocks does, too, but that one seems intentionally
limited to RELKIND_RELATION.  I also see that amcheck explicitly allows
sequences:

    /*
     * Sequences always use heap AM, but they don't show that in the catalogs.
     * Other relkinds might be using a different AM, so check.
     */
    if (ctx.rel->rd_rel->relkind != RELKIND_SEQUENCE &&
        ctx.rel->rd_rel->relam != HEAP_TABLE_AM_OID)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("only heap AM is supported")));

IMHO it would be good to establish some level of consistency here.

-- 
nathan



On Mon, Aug 26, 2024 at 1:26 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> While it is apparently pretty uncommon to use pgstattuple on sequences,
> this is arguably a bug that should be fixed and back-patched.

I don't understand what would make it a bug.

> IMHO it would be good to establish some level of consistency here.

Sure, consistency is good, all other things being equal, but just
saying "well this used to work one way and now it works another way"
isn't enough to say that there is a bug, or that something should be
changed.

--
Robert Haas
EDB: http://www.enterprisedb.com



> You state that the
> information pgstattuple provides isn't really useful for sequences, so
> that means there's no real reason to do (1)
That's correct, but we should consider that up until v11, 
sequences were supported in pgstattuple. Their support 
was removed unintentionally (I believe so). Therefore, it might be worth 
discussing whether it makes sense to reinstate support for sequences.

> why you think that the current one is bad
The current implementation has some drawbacks. 
For instance, when encountering other unsupported objects, the error looks like this:
ERROR: cannot get tuple-level statistics for relation "x"
DETAIL: This operation is not supported for foreign tables.

However, for sequences, the message should explicitly 
state that "This operation is not supported for sequences."

Currently, we're deducing that the heap access method (AM) is 
for relkind='r', so the message "only heap AM is supported" implies 
that only relkind='r' are supported. 
This prompted my thoughts on the matter.

Moreover, if you refer to the code in pgstattuple.c
you'll notice that sequences appear to be explicitly allowed in pgstattuple, 
but it results in an error encountered here - 
Therefore, I believe a small refactoring is needed to make the code cleaner and more consistent.

> IMHO it would be good to establish some level of consistency here.
Agree.

Let me know your thoughts.

Regards
Ayush Vatsa
AWS
On Mon, Aug 26, 2024 at 01:35:52PM -0400, Robert Haas wrote:
> On Mon, Aug 26, 2024 at 1:26 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> While it is apparently pretty uncommon to use pgstattuple on sequences,
>> this is arguably a bug that should be fixed and back-patched.
> 
> I don't understand what would make it a bug.
> 
>> IMHO it would be good to establish some level of consistency here.
> 
> Sure, consistency is good, all other things being equal, but just
> saying "well this used to work one way and now it works another way"
> isn't enough to say that there is a bug, or that something should be
> changed.

The reason I think it's arguably a bug is because it used to work fine and
then started ERROR-ing after commit 4b82664.  I'm fine with saying that we
don't think it's useful and intentionally deprecating it, but AFAICT no
such determination has been made.  I see no discussion about this on the
thread for commit 4b82664, and the only caller of pgstat_heap()
intentionally calls into the affected function for sequences (and has since
pgstattuple was introduced 18 years ago):

    if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind) ||
        rel->rd_rel->relkind == RELKIND_SEQUENCE)
    {
        return pgstat_heap(rel, fcinfo);
    }

-- 
nathan



On Thu, Aug 29, 2024 at 10:17:57PM +0530, Ayush Vatsa wrote:
> Please find attached the patch that re-enables
> support for sequences within the pgstattuple extension.
> I have also included the necessary test cases for
> sequences, implemented in the form of regress tests.

Thanks.  Robert, do you have any concerns with this?

+select * from pgstattuple('serial');
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent |
free_space| free_percent 
 

+-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
+      8192 |           1 |        41 |           0.5 |                0 |              0 |                  0 |
8104|        98.93
 
+(1 row)

I'm concerned that some of this might be platform-dependent and make the
test unstable.  Perhaps we should just select count(*) here.

+    /**
+     * Sequences don't fall under heap AM but are still
+     * allowed for obtaining tuple-level statistics.
+     */

I think we should be a bit more descriptive here, like the comment in
verify_heapam.c:

    /*
     * Sequences always use heap AM, but they don't show that in the catalogs.
     * Other relkinds might be using a different AM, so check.
     */

-- 
nathan



On Thu, Aug 29, 2024 at 12:36:35PM -0500, Nathan Bossart wrote:
> +select * from pgstattuple('serial');
> + table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent |
free_space| free_percent 
 
>
+-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
> +      8192 |           1 |        41 |           0.5 |                0 |              0 |                  0 |
8104 |        98.93
 
> +(1 row)
> 
> I'm concerned that some of this might be platform-dependent and make the
> test unstable.  Perhaps we should just select count(*) here.

Sure enough, the CI testing for 32-bit is failing here [0].

[0]
https://api.cirrus-ci.com/v1/artifact/task/4798423386292224/testrun/build-32/testrun/pgstattuple/regress/regression.diffs

-- 
nathan



On Fri, Aug 30, 2024 at 12:17:47AM +0530, Ayush Vatsa wrote:
> The patch with all the changes is attached.

Looks generally reasonable to me.

-- 
nathan



On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Thanks.  Robert, do you have any concerns with this?

I don't know if I'm exactly concerned but I don't understand what
problem we're solving, either. I thought Ayush said that the function
wouldn't produce useful results for sequences; so then why do we need
to change the code to enable it?

--
Robert Haas
EDB: http://www.enterprisedb.com



On Fri, Aug 30, 2024 at 04:07:30PM -0400, Robert Haas wrote:
> On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Thanks.  Robert, do you have any concerns with this?
> 
> I don't know if I'm exactly concerned but I don't understand what
> problem we're solving, either. I thought Ayush said that the function
> wouldn't produce useful results for sequences; so then why do we need
> to change the code to enable it?

I suppose it would be difficult to argue that it is actually useful, given
it hasn't worked since v11 and apparently nobody noticed until recently.
If we're content to leave it unsupported, then sure, let's just remove the
"relkind == RELKIND_SEQUENCE" check in pgstat_relation().  But I also don't
have a great reason to _not_ support it.  It used to work (which appears to
have been intentional, based on the code), it was unintentionally broken,
and it'd work again with a ~1 line change.  "SELECT count(*) FROM
my_sequence" probably doesn't provide a lot of value, but I have no
intention of proposing a patch that removes support for that.

All that being said, I don't have a terribly strong opinion, but I guess I
lean towards re-enabling.

Another related inconsistency I just noticed in pageinspect:

    postgres=# select t_data from heap_page_items(get_raw_page('s', 0));
                    t_data
    --------------------------------------
     \x0100000000000000000000000000000000
    (1 row)

    postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits) from
heap_page_items(get_raw_page('s',0));
 
    ERROR:  only heap AM is supported

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Matthias van de Meent
Date:
On Fri, 30 Aug 2024, 23:06 Nathan Bossart, <nathandbossart@gmail.com> wrote:
>
> On Fri, Aug 30, 2024 at 04:07:30PM -0400, Robert Haas wrote:
> > On Thu, Aug 29, 2024 at 1:36 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> Thanks.  Robert, do you have any concerns with this?
> >
> > I don't know if I'm exactly concerned but I don't understand what
> > problem we're solving, either. I thought Ayush said that the function
> > wouldn't produce useful results for sequences; so then why do we need
> > to change the code to enable it?
>
> I suppose it would be difficult to argue that it is actually useful, given
> it hasn't worked since v11 and apparently nobody noticed until recently.
> If we're content to leave it unsupported, then sure, let's just remove the
> "relkind == RELKIND_SEQUENCE" check in pgstat_relation().  But I also don't
> have a great reason to _not_ support it.  It used to work (which appears to
> have been intentional, based on the code), it was unintentionally broken,
> and it'd work again with a ~1 line change.  "SELECT count(*) FROM
> my_sequence" probably doesn't provide a lot of value, but I have no
> intention of proposing a patch that removes support for that.
>
> All that being said, I don't have a terribly strong opinion, but I guess I
> lean towards re-enabling.
>
> Another related inconsistency I just noticed in pageinspect:
>
>     postgres=# select t_data from heap_page_items(get_raw_page('s', 0));
>                     t_data
>     --------------------------------------
>      \x0100000000000000000000000000000000
>     (1 row)
>
>     postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits) from
heap_page_items(get_raw_page('s',0)); 
>     ERROR:  only heap AM is supported

I don't think this is an inconsistency:

heap_page_items works on a raw page-as-bytea (produced by
get_raw_page) without knowing about or accessing the actual relation
type of that page, so it doesn't have the context why it should error
out if the page looks similar enough to a heap page. I could feed it
an arbitrary bytea, and it should still work as long as that bytea
looks similar enough to a heap page.
tuple_data_split, however, uses the regclass to decode the contents of
the tuple, and can thus determine with certainty based on that
regclass that it was supplied incorrect (non-heapAM table's regclass)
arguments. It therefore has enough context to bail out and stop trying
to decode the page's tuple data.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Tue, Sep 03, 2024 at 10:19:33PM +0200, Matthias van de Meent wrote:
> On Fri, 30 Aug 2024, 23:06 Nathan Bossart, <nathandbossart@gmail.com> wrote:
>> Another related inconsistency I just noticed in pageinspect:
>>
>>     postgres=# select t_data from heap_page_items(get_raw_page('s', 0));
>>                     t_data
>>     --------------------------------------
>>      \x0100000000000000000000000000000000
>>     (1 row)
>>
>>     postgres=# select tuple_data_split('s'::regclass, t_data, t_infomask, t_infomask2, t_bits) from
heap_page_items(get_raw_page('s',0));
 
>>     ERROR:  only heap AM is supported
> 
> I don't think this is an inconsistency:
> 
> heap_page_items works on a raw page-as-bytea (produced by
> get_raw_page) without knowing about or accessing the actual relation
> type of that page, so it doesn't have the context why it should error
> out if the page looks similar enough to a heap page. I could feed it
> an arbitrary bytea, and it should still work as long as that bytea
> looks similar enough to a heap page.
> tuple_data_split, however, uses the regclass to decode the contents of
> the tuple, and can thus determine with certainty based on that
> regclass that it was supplied incorrect (non-heapAM table's regclass)
> arguments. It therefore has enough context to bail out and stop trying
> to decode the page's tuple data.

My point is really that tuple_data_split() needlessly ERRORs for sequences.
Other heap functions work fine for sequences, and we know it uses the heap
table AM, so why should tuple_data_split() fail?  I agree that the others
needn't enforce relkind checks and that they might succeed in some cases
where tuple_data_split() might not be appropriate.

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
Barring objections, I'm planning to commit v3 soon.  Robert/Matthias, I'm
not sure you are convinced this is the right thing to do (or worth doing,
rather), but I don't sense that you are actually opposed to it, either.
Please correct me if I am wrong.

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Matthias van de Meent
Date:
On Wed, 11 Sept 2024 at 21:36, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Barring objections, I'm planning to commit v3 soon.  Robert/Matthias, I'm
> not sure you are convinced this is the right thing to do (or worth doing,
> rather), but I don't sense that you are actually opposed to it, either.
> Please correct me if I am wrong.

Correct: I do think making heapam-related inspection functions have a
consistent behaviour when applied to sequences is beneficial, with no
preference towards specifically supporting or not supporting sequences
in these functions. If people that work on sequences think it's better
to also support inspection of sequences, then I think that's a good
reason to add that support where it doesn't already exist.

As for patch v3, that seems fine with me.

Matthias van de Meent
Neon (https://neon.tech)



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 11:19:00AM +0900, Michael Paquier wrote:
> On Wed, Sep 11, 2024 at 11:02:43PM +0100, Matthias van de Meent wrote:
>> As for patch v3, that seems fine with me.
> 
> +1 from here as well, after looking at what v3 is doing for these two
> modules.

Committed.

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 04:39:14PM -0500, Nathan Bossart wrote:
> Committed.

Ugh, the buildfarm is unhappy with the new tests [0] [1].  Will fix.

[0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2024-09-12%2022%3A54%3A45
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skimmer&dt=2024-09-12%2022%3A38%3A13

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Fri, Sep 13, 2024 at 09:47:36AM +0900, Michael Paquier wrote:
> On Thu, Sep 12, 2024 at 07:41:30PM -0500, Nathan Bossart wrote:
>> Ugh, the buildfarm is unhappy with the new tests [0] [1].  Will fix.
> 
> I'd suggest to switch the test to return a count() and make sure that
> one record exists.  The data in the page does not really matter.

That's what I had in mind.  I see that skimmer is failing with this error:

    ERROR:  cannot access temporary tables during a parallel operation

This makes sense because that machine has
debug_parallel_query/force_parallel_mode set to "regress", but this test
file has used a temporary table for a couple of years without issue...

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 07:56:56PM -0500, Nathan Bossart wrote:
> I see that skimmer is failing with this error:
> 
>     ERROR:  cannot access temporary tables during a parallel operation
> 
> This makes sense because that machine has
> debug_parallel_query/force_parallel_mode set to "regress", but this test
> file has used a temporary table for a couple of years without issue...

Oh, the answer seems to be commits aeaaf52 and 47a22dc.  In short, we can't
use a temporary sequence in this test for versions older than v15.

-- 
nathan



Nathan Bossart <nathandbossart@gmail.com> writes:
> Here's a patch to make the sequence permanent and to make the output of
> tuple_data_split() not depend on endianness.

+1 --- I checked this on mamba's host and it does produce
"\\x0100000000000001" regardless of endianness.

            regards, tom lane



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 10:40:11PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> Here's a patch to make the sequence permanent and to make the output of
>> tuple_data_split() not depend on endianness.
> 
> +1 --- I checked this on mamba's host and it does produce
> "\\x0100000000000001" regardless of endianness.

Thanks for checking.  I'll commit this fix in the morning.

-- 
nathan



Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch

From
Nathan Bossart
Date:
On Thu, Sep 12, 2024 at 10:41:27PM -0500, Nathan Bossart wrote:
> Thanks for checking.  I'll commit this fix in the morning.

Committed.

-- 
nathan