Thread: Revive num_dead_tuples column of pg_stat_progress_vacuum

Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
Hi,

Commit 667e65aac3 changed num_dead_tuples and max_dead_tuples columns
to dead_tuple_bytes and max_dead_tuple_bytes columns, respectively.
But at PGConf.dev, I got feedback from multiple people that
num_dead_tuples information still can provide meaning insights for
users to understand the vacuum progress. One use case is to compare
num_dead_tuples to pg_stat_all_tables.n_dead_tup column.

I've attached the patch to revive num_dead_tuples column back to the
pg_stat_progress_vacuum view. This requires to bump catalog version.
We're post-beta1 but it should be okay as it's only for PG17.

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Peter Geoghegan
Date:
On Mon, Jun 3, 2024 at 5:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached the patch to revive num_dead_tuples column back to the
> pg_stat_progress_vacuum view. This requires to bump catalog version.
> We're post-beta1 but it should be okay as it's only for PG17.
>
> Feedback is very welcome.

Can we rename this to num_dead_item_ids (or something similar) in
passing? That way we'll avoid confusing the number of dead tuples
removed from the table by VACUUM (which includes dead heap-only
tuples, but excludes any preexisting LP_DEAD items left behind by
opportunistic pruning) with the number of dead item identifiers.

As you know, TIDStore stores TIDs that refer to dead item identifiers
in the heap, which is often very different to the number of dead
tuples removed by VACUUM. The VACUUM log output has reported on dead
item identifiers separately since 14. This seems like a good
opportunity to bring pg_stat_progress_vacuum in line.

--
Peter Geoghegan



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Alvaro Herrera
Date:
On 2024-Jun-03, Masahiko Sawada wrote:

> Commit 667e65aac3 changed num_dead_tuples and max_dead_tuples columns
> to dead_tuple_bytes and max_dead_tuple_bytes columns, respectively.
> But at PGConf.dev, I got feedback from multiple people that
> num_dead_tuples information still can provide meaning insights for
> users to understand the vacuum progress. One use case is to compare
> num_dead_tuples to pg_stat_all_tables.n_dead_tup column.

+1.

> @@ -2887,7 +2887,9 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
>      TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);
>      vacrel->dead_items_info->num_items += num_offsets;
>  
> -    /* update the memory usage report */
> +    /* update the progress information */
> +    pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
> +                                 vacrel->dead_items_info->num_items);
>      pgstat_progress_update_param(PROGRESS_VACUUM_DEAD_TUPLE_BYTES,
>                                   TidStoreMemoryUsage(dead_items));
>  }

You could use pgstat_progress_update_multi_param here.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."    (Freeman Dyson)



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
"Andrey M. Borodin"
Date:

> On 4 Jun 2024, at 00:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you! Vacuum enhancement is a really good step forward, and this small change would help a lot of observability
tools.


> On 4 Jun 2024, at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:
>
> Can we rename this to num_dead_item_ids (or something similar) in
> passing?

I do not insist, but many tools will have to adapt to this change [0,1]. However, most of tools will have to deal with
removedmax_dead_tuples anyway [2], so this is not that big problem. 


Best regards, Andrey Borodin.

[0] https://github.com/jalexandre0/pgmetrics/blob/61cb150f6d1e535513a6a07fc0c6d751fbed517e/collector/collect.go#L1289
[1]
https://github.com/DataDog/integrations-core/blob/250553f3b91d25de28add93afeb929e2abf67e53/postgres/datadog_checks/postgres/util.py#L517
[2] https://github.com/search?q=num_dead_tuples+language%3AGo&type=code


Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
On Wed, Jun 5, 2024 at 7:19 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 4 Jun 2024, at 00:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you! Vacuum enhancement is a really good step forward, and this small change would help a lot of observability
tools.
>
>
> > On 4 Jun 2024, at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > Can we rename this to num_dead_item_ids (or something similar) in
> > passing?
>
> I do not insist, but many tools will have to adapt to this change [0,1]. However, most of tools will have to deal
withremoved max_dead_tuples anyway [2], so this is not that big problem. 

True, this incompatibility would not be a big problem.

num_dead_item_ids seems good to me. I've updated the patch that
incorporated the comment from Álvaro[1].

Regards,

[1] https://www.postgresql.org/message-id/202406041535.pmyty3ci4pfd%40alvherre.pgsql

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
On Fri, Jun 7, 2024 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jun 5, 2024 at 7:19 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> >
> >
> >
> > > On 4 Jun 2024, at 00:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Thank you! Vacuum enhancement is a really good step forward, and this small change would help a lot of
observabilitytools. 
> >
> >
> > > On 4 Jun 2024, at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:
> > >
> > > Can we rename this to num_dead_item_ids (or something similar) in
> > > passing?
> >
> > I do not insist, but many tools will have to adapt to this change [0,1]. However, most of tools will have to deal
withremoved max_dead_tuples anyway [2], so this is not that big problem. 
>
> True, this incompatibility would not be a big problem.
>
> num_dead_item_ids seems good to me. I've updated the patch that
> incorporated the comment from Álvaro[1].

I'm going to push the v2 patch in a few days if there is no further comment.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Peter Geoghegan
Date:
On Thu, Jun 6, 2024 at 9:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> num_dead_item_ids seems good to me. I've updated the patch that
> incorporated the comment from Álvaro[1].

Great, thank you.

--
Peter Geoghegan



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
On Tue, Jun 11, 2024 at 10:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jun 7, 2024 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jun 5, 2024 at 7:19 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > >
> > >
> > >
> > > > On 4 Jun 2024, at 00:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Thank you! Vacuum enhancement is a really good step forward, and this small change would help a lot of
observabilitytools. 
> > >
> > >
> > > > On 4 Jun 2024, at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:
> > > >
> > > > Can we rename this to num_dead_item_ids (or something similar) in
> > > > passing?
> > >
> > > I do not insist, but many tools will have to adapt to this change [0,1]. However, most of tools will have to deal
withremoved max_dead_tuples anyway [2], so this is not that big problem. 
> >
> > True, this incompatibility would not be a big problem.
> >
> > num_dead_item_ids seems good to me. I've updated the patch that
> > incorporated the comment from Álvaro[1].
>
> I'm going to push the v2 patch in a few days if there is no further comment.
>

I was about to push the patch but let me confirm just in case: is it
okay to bump the catversion even after post-beta1? This patch
reintroduces a previously-used column to pg_stat_progress_vacuum so it
requires bumping the catversion.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Tom Lane
Date:
Masahiko Sawada <sawada.mshk@gmail.com> writes:
> I was about to push the patch but let me confirm just in case: is it
> okay to bump the catversion even after post-beta1?

Yes, that happens somewhat routinely.

            regards, tom lane



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Michael Paquier
Date:
On Thu, Jun 13, 2024 at 08:38:05PM -0400, Tom Lane wrote:
> Masahiko Sawada <sawada.mshk@gmail.com> writes:
>> I was about to push the patch but let me confirm just in case: is it
>> okay to bump the catversion even after post-beta1?
>
> Yes, that happens somewhat routinely.

Up to RC, even after beta2.  This happens routinely every year because
tweaks are always required for what got committed.  And that's OK to
do so now.
--
Michael

Attachment

Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
On Fri, Jun 14, 2024 at 9:41 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jun 13, 2024 at 08:38:05PM -0400, Tom Lane wrote:
> > Masahiko Sawada <sawada.mshk@gmail.com> writes:
> >> I was about to push the patch but let me confirm just in case: is it
> >> okay to bump the catversion even after post-beta1?
> >
> > Yes, that happens somewhat routinely.
>
> Up to RC, even after beta2.  This happens routinely every year because
> tweaks are always required for what got committed.  And that's OK to
> do so now.

Thank you both for confirmation. I'll push it shortly.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
On Fri, Jun 14, 2024 at 9:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jun 14, 2024 at 9:41 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Jun 13, 2024 at 08:38:05PM -0400, Tom Lane wrote:
> > > Masahiko Sawada <sawada.mshk@gmail.com> writes:
> > >> I was about to push the patch but let me confirm just in case: is it
> > >> okay to bump the catversion even after post-beta1?
> > >
> > > Yes, that happens somewhat routinely.
> >
> > Up to RC, even after beta2.  This happens routinely every year because
> > tweaks are always required for what got committed.  And that's OK to
> > do so now.
>
> Thank you both for confirmation. I'll push it shortly.
>

Pushed. Thank you for giving feedback and reviewing the patch!

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Robert Treat
Date:
On Thu, Jun 13, 2024 at 9:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Jun 14, 2024 at 9:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Fri, Jun 14, 2024 at 9:41 AM Michael Paquier <michael@paquier.xyz> wrote:
> > > On Thu, Jun 13, 2024 at 08:38:05PM -0400, Tom Lane wrote:
> > > > Masahiko Sawada <sawada.mshk@gmail.com> writes:
> > > >> I was about to push the patch but let me confirm just in case: is it
> > > >> okay to bump the catversion even after post-beta1?
> > > >
> > > > Yes, that happens somewhat routinely.
> > >
> > > Up to RC, even after beta2.  This happens routinely every year because
> > > tweaks are always required for what got committed.  And that's OK to
> > > do so now.
> >
> > Thank you both for confirmation. I'll push it shortly.
> >
>
> Pushed. Thank you for giving feedback and reviewing the patch!
>

One minor side effect of this change is the original idea of comparing
pg_stat_progress.num_dead_tuples to pg_stat_all_tables.n_dead_tup
column becomes less obvious. I presume the release notes for
pg_stat_progress_vacuum will be updated to also include this column
name change as well, so maybe that's enough for folks to figure things
out? At least I couldn't find anywhere in the docs where we have
described the relationship between these columns before. Thoughts?

Robert Treat
https://xzilla.net



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Masahiko Sawada
Date:
On Sat, Jun 15, 2024 at 8:47 PM Robert Treat <rob@xzilla.net> wrote:
>
> On Thu, Jun 13, 2024 at 9:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Fri, Jun 14, 2024 at 9:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > On Fri, Jun 14, 2024 at 9:41 AM Michael Paquier <michael@paquier.xyz> wrote:
> > > > On Thu, Jun 13, 2024 at 08:38:05PM -0400, Tom Lane wrote:
> > > > > Masahiko Sawada <sawada.mshk@gmail.com> writes:
> > > > >> I was about to push the patch but let me confirm just in case: is it
> > > > >> okay to bump the catversion even after post-beta1?
> > > > >
> > > > > Yes, that happens somewhat routinely.
> > > >
> > > > Up to RC, even after beta2.  This happens routinely every year because
> > > > tweaks are always required for what got committed.  And that's OK to
> > > > do so now.
> > >
> > > Thank you both for confirmation. I'll push it shortly.
> > >
> >
> > Pushed. Thank you for giving feedback and reviewing the patch!
> >
>
> One minor side effect of this change is the original idea of comparing
> pg_stat_progress.num_dead_tuples to pg_stat_all_tables.n_dead_tup
> column becomes less obvious. I presume the release notes for
> pg_stat_progress_vacuum will be updated to also include this column
> name change as well, so maybe that's enough for folks to figure things
> out?

The release note has been updated, and I think it would help users
understand the change.

> At least I couldn't find anywhere in the docs where we have
> described the relationship between these columns before. Thoughts?

It would be a good idea to improve the documentation, but I think that
we cannot simply compare these two numbers since the numbers that
these fields count are slightly different. For instance,
pg_stat_all_tables.n_dead_tup includes the number of dead tuples that
are going to be HOT-pruned.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Revive num_dead_tuples column of pg_stat_progress_vacuum

From
Peter Geoghegan
Date:
On Tue, Jun 18, 2024 at 8:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > At least I couldn't find anywhere in the docs where we have
> > described the relationship between these columns before. Thoughts?
>
> It would be a good idea to improve the documentation, but I think that
> we cannot simply compare these two numbers since the numbers that
> these fields count are slightly different. For instance,
> pg_stat_all_tables.n_dead_tup includes the number of dead tuples that
> are going to be HOT-pruned.

This isn't a small difference. It's actually a huge one in many cases:

https://www.postgresql.org/message-id/CAH2-WzkkGT2Gt4XauS5eQOQi4mVvL5X49hBTtWccC8DEqeNfKA@mail.gmail.com

Practically speaking they're just two different things, with hardly
any fixed relationship.

--
Peter Geoghegan