Re: Rename dead_tuples to dead_items in vacuumlazy.c - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Rename dead_tuples to dead_items in vacuumlazy.c
Date
Msg-id CAH2-WzkPRUkE6ad9u1zAseX-_aEHtnaQ8acr21C4Sb7vbeUK-w@mail.gmail.com
In response to Re: Rename dead_tuples to dead_items in vacuumlazy.c  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Rename dead_tuples to dead_items in vacuumlazy.c
List pgsql-hackers
On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

> Regarding the commit, I think that there is still one place in
> vacuumlazy.c where we can change "dead tuples" to "dead items":
>
>     /*
>      * Allocate the space for dead tuples.  Note that this handles parallel
>      * VACUUM initialization as part of allocating shared memory space used
>      * for dead_items.
>      */
>     dead_items_alloc(vacrel, params->nworkers);
>     dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.
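For readers following along: the dead_items array in the quoted snippet holds the TIDs of LP_DEAD line pointers collected during the heap scan, which later drive index vacuuming. That is why "items" fits better than "tuples". Below is a hypothetical, simplified C sketch of such a bounded TID array. The names Tid, DeadItems, dead_items_create and dead_items_add are illustrative only and are not PostgreSQL's actual definitions. The real sizing from maintenance_work_mem also applies caps that are omitted here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified TID: heap block number plus line pointer
 * offset (PostgreSQL's real ItemPointerData is a packed 6-byte struct). */
typedef struct Tid
{
    uint32_t block;
    uint16_t offset;
} Tid;

/* Hypothetical analogue of the dead_items array: a bounded array of
 * TIDs of LP_DEAD items that index vacuuming will later remove. */
typedef struct DeadItems
{
    int max_items;  /* number of slots allocated */
    int num_items;  /* number of slots in use */
    Tid items[];    /* C99 flexible array member */
} DeadItems;

/* Size the array from a memory budget in bytes (VACUUM derives its
 * budget from maintenance_work_mem and applies further caps). */
static DeadItems *
dead_items_create(size_t mem_budget_bytes)
{
    size_t n = mem_budget_bytes / sizeof(Tid);

    if (n < 1)
        n = 1;
    if (n > (size_t) INT_MAX)
        n = (size_t) INT_MAX;

    DeadItems *d = malloc(sizeof(DeadItems) + n * sizeof(Tid));
    if (d == NULL)
        return NULL;
    d->max_items = (int) n;
    d->num_items = 0;
    return d;
}

/* Record one LP_DEAD item.  Returns 0 when the array is full; in this
 * sketch that stands for the point where a real VACUUM would run a round
 * of index and heap vacuuming and then reset the array. */
static int
dead_items_add(DeadItems *d, uint32_t block, uint16_t offset)
{
    assert(d->num_items <= d->max_items);
    if (d->num_items == d->max_items)
        return 0;
    d->items[d->num_items].block = block;
    d->items[d->num_items].offset = offset;
    d->num_items++;
    return 1;
}
```

The entries are line pointers that pruning has already marked LP_DEAD, not whole tuples, so the array only needs to store a location per item.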

> Also, the commit doesn't change PROGRESS_VACUUM_MAX_DEAD_TUPLES
> or PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?

That was deliberate.

It would be a bit strange to alter these constants without also
updating the corresponding column names for the
pg_stat_progress_vacuum system view. But if I kept the definition from
system_views.sql in sync, then I would break user scripts -- for
reasons that users don't care about. That didn't seem like the right
approach.

Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
items are the same, and are just as much of a problem in the table as
they are in each index". As you know, this is not really true, which
is an important problem for us. Fixing it (perhaps as part of adding
something like Robert's conveyor belt design) will likely require
revising this model quite fundamentally (e.g., the vacthresh
calculation in autovacuum.c:relation_needs_vacanalyze() would be
replaced). When this happens, we'll probably need to update system
views that have columns with names like "dead_tuples" -- because maybe
we no longer specifically count dead items/tuples at all. I strongly
suspect that the approach to statistics that we take for pg_statistic
optimizer stats just doesn't work for dead items/tuples -- statistical
sampling only produces useful statistics for the optimizer because
certain delicate assumptions are met (even these assumptions only
really work with a properly normalized database schema).
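The vacthresh test mentioned above can be sketched roughly as follows. This is a simplified, hypothetical rendering, not the actual code of relation_needs_vacanalyze(): it assumes the documented defaults for autovacuum_vacuum_threshold (50) and autovacuum_vacuum_scale_factor (0.2), and it ignores per-table reloptions, the insert-based threshold, and anti-wraparound forcing. Note that it compares a single dead-tuple count, so DEAD tuples and LP_DEAD items are not distinguished, which is exactly the model being questioned here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed defaults of the autovacuum_vacuum_threshold and
 * autovacuum_vacuum_scale_factor settings (per-table reloptions can
 * override them in a real server). */
#define DEFAULT_VAC_BASE_THRESH 50
#define DEFAULT_VAC_SCALE_FACTOR 0.2

/* Simplified sketch of the dead-tuple test: a table needs vacuuming
 * once its dead-tuple count exceeds a threshold that grows linearly
 * with its estimated row count (reltuples). */
static bool
needs_vacuum(int base_thresh, double scale_factor,
             double reltuples, double n_dead_tuples)
{
    assert(base_thresh >= 0 && scale_factor >= 0.0);

    /* reltuples < 0 means "never vacuumed or analyzed"; treat as empty */
    if (reltuples < 0)
        reltuples = 0;

    double vacthresh = base_thresh + scale_factor * reltuples;
    return n_dead_tuples > vacthresh;
}
```

With these defaults, a table estimated at 1,000,000 rows gets a threshold of 50 + 0.2 * 1,000,000 = 200,050 dead tuples, so 150,000 dead tuples would not trigger a vacuum while 250,000 would.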

Maybe revising the model used for autovacuum scheduling wouldn't
include changing pg_stat_progress_vacuum, since that isn't technically
"part of the model" --- I'm not sure. But it's not something that I am
in a hurry to fix.

--
Peter Geoghegan