Re: Rename dead_tuples to dead_items in vacuumlazy.c - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Rename dead_tuples to dead_items in vacuumlazy.c
Date
Msg-id CAD21AoCYdn9n+-ZBD_WyJq-4Ws=E6rErcNDsB_1DTfDJ-DzwDw@mail.gmail.com
In response to Re: Rename dead_tuples to dead_items in vacuumlazy.c  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Dec 1, 2021 at 4:42 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Thanks! I'll change my parallel vacuum refactoring patch accordingly.
>
> Thanks again for working on that.
>
> > Regarding the commit, I think that there still is one place in
> > vacuumlazy.c where we can change "dead tuples" to "dead items":
> >
> >     /*
> >      * Allocate the space for dead tuples.  Note that this handles parallel
> >      * VACUUM initialization as part of allocating shared memory space used
> >      * for dead_items.
> >      */
> >     dead_items_alloc(vacrel, params->nworkers);
> >     dead_items = vacrel->dead_items;
>
> Oops. Pushed a fixup for that just now.

Thanks!

>
> > Also, the commit doesn't change either PROGRESS_VACUUM_MAX_DEAD_TUPLES
> > or PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?
>
> That was deliberate.
>
> It would be a bit strange to alter these constants without also
> updating the corresponding column names for the
> pg_stat_progress_vacuum system view. But if I kept the definition from
> system_views.sql in sync, then I would break user scripts -- for
> reasons that users don't care about. That didn't seem like the right
> approach.

Agreed.

>
> Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
> items are the same, and are just as much of a problem in the table as
> they are in each index". As you know, this is not really true, which
> is an important problem for us. Fixing it (perhaps as part of adding
> something like Robert's conveyor belt design) will likely require
> revising this model quite fundamentally (e.g., the vacthresh
> calculation in autovacuum.c:relation_needs_vacanalyze() would be
> replaced). When this happens, we'll probably need to update system
> views that have columns with names like "dead_tuples" -- because maybe
> we no longer specifically count dead items/tuples at all. I strongly
> suspect that the approach to statistics that we take for pg_statistic
> optimizer stats just doesn't work for dead items/tuples -- statistical
> sampling only produces useful statistics for the optimizer because
> certain delicate assumptions are met (even these assumptions only
> really work with a properly normalized database schema).
>
> Maybe revising the model used for autovacuum scheduling wouldn't
> include changing pg_stat_progress_vacuum, since that isn't technically
> "part of the model" --- I'm not sure. But it's not something that I am
> in a hurry to fix.

Understood.
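
For anyone following along, the vacthresh check mentioned above can be
sketched roughly as below. This is a simplified standalone illustration,
not the code in autovacuum.c: the function name needs_vacuum() and its
parameters are made up here, and the real relation_needs_vacanalyze()
also deals with insert thresholds, wraparound, and per-table settings.
The point is only that the decision rests on a single dead tuple count
compared against a threshold derived from reltuples.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a PostgreSQL function): returns true when the
 * estimated number of dead tuples exceeds the vacuum threshold.
 *
 * base_thresh  ~ autovacuum_vacuum_threshold     (default 50)
 * scale_factor ~ autovacuum_vacuum_scale_factor  (default 0.2)
 */
static bool
needs_vacuum(double reltuples, double dead_tuples,
             int base_thresh, double scale_factor)
{
    /* threshold grows linearly with the estimated table size */
    double vacthresh = (double) base_thresh + scale_factor * reltuples;

    return dead_tuples > vacthresh;
}
```

With the default settings, a table with 1000 tuples gets a threshold of
250, so 251 dead tuples would trigger a vacuum while 249 would not.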

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


