Thread: Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)?

Commit 7c91a0364f standardized the approach we take to estimating
pg_class.reltuples, so that everybody agrees on what that means.
Follow-up work by commit 3d351d91 defined a pg_class.reltuples of -1
as "unknown, probably never vacuumed".

The former commit added this code and comment to vacuumlazy.c:

    /*
     * Now we can provide a better estimate of total number of surviving
     * tuples (we assume indexes are more interested in that than in the
     * number of nominally live tuples).
     */
    ivinfo.num_heap_tuples = vacrelstats->new_rel_tuples;
    ivinfo.strategy = vac_strategy;

I don't see why it makes sense to treat indexes differently here. Why
allow the special case? Why include dead tuples like this?

We make a general assumption that pg_class.reltuples only includes
live tuples, which this code contravenes. It's quite clear that
indexes are no exception to the general rule, since CREATE INDEX quite
deliberately does reltuples accounting in a way that fits with the
usual definition (live tuples only), per comments in
heapam_index_build_range_scan. One of these code paths must be doing
it wrong -- I think it's vacuumlazy.c.

This also confuses the index AM definitions. Whenever we call
ambulkdelete routines, IndexVacuumInfo.num_heap_tuples will always
come from the heap relation's existing pg_class.reltuples, which could
even be -1 -- so clearly its value can only be a count of live tuples.
On the other hand IndexVacuumInfo.num_heap_tuples might include some
dead tuples when we call amvacuumcleanup routines, since (as shown)
the value comes from vacuumlazy.c's vacrelstats->new_rel_tuples. It
would be more logical if IndexVacuumInfo.num_heap_tuples was always
the pg_class.reltuples for the table (either the original/existing
value, or the value that it's just about to be updated to).

That said, I can see why we wouldn't want to allow pg_class.reltuples
to ever be -1 in the case of an index. So I think we should bring
vacuumlazy.c in line with everything else here, without allowing that
case. I believe that the "pg_class.reltuples is -1 even after a
VACUUM" case is completely impossible following the Postgres 15 work
on VACUUM, but we should still clamp for safety in
update_relstats_all_indexes (though not in the amvacuumcleanup path).

-- 
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> Commit 7c91a0364f standardized the approach we take to estimating
> pg_class.reltuples, so that everybody agrees on what that means.
> Follow-up work by commit 3d351d91 defined a pg_class.reltuples of -1
> as "unknown, probably never vacuumed".

> The former commit added this code and comment to vacuumlazy.c:

>     /*
>      * Now we can provide a better estimate of total number of surviving
>      * tuples (we assume indexes are more interested in that than in the
>      * number of nominally live tuples).
>      */
>     ivinfo.num_heap_tuples = vacrelstats->new_rel_tuples;
>     ivinfo.strategy = vac_strategy;

> I don't see why it makes sense to treat indexes differently here. Why
> allow the special case? Why include dead tuples like this?

The index has presumably got entries corresponding to dead tuples,
so that the number of entries it has ought to be more or less
num_heap_tuples, not reltuples (with discrepancies for concurrent
insertions of course).

> We make a general assumption that pg_class.reltuples only includes
> live tuples, which this code contravenes.

Huh?  This is not pg_class.reltuples.  If an index AM wants that, it
knows where to find it.

            regards, tom lane



On Mon, Apr 18, 2022 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I don't see why it makes sense to treat indexes differently here. Why
> > allow the special case? Why include dead tuples like this?
>
> The index has presumably got entries corresponding to dead tuples,
> so that the number of entries it has ought to be more or less
> num_heap_tuples, not reltuples (with discrepancies for concurrent
> insertions of course).

I guess that pg_class.reltuples has to include some "recently dead"
tuples in the case of an index, just because of the impracticality of
accurately counting index tuples while knowing if they're dead or
alive. However, it would be practical to update pg_class.reltuples to
a value "IndexBulkDeleteResult.num_index_tuples -
recently_dead_tuples" in update_relstats_all_indexes to compensate.
Then everything is consistent.

> > We make a general assumption that pg_class.reltuples only includes
> > live tuples, which this code contravenes.
>
> Huh?  This is not pg_class.reltuples.  If an index AM wants that, it
> knows where to find it.

It's not, but it is how we calculate
IndexBulkDeleteResult.num_index_tuples, which is related. Granted,
that won't be used to update pg_class for the index in the case where
it's just an estimate anyway.

-- 
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Apr 18, 2022 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Huh?  This is not pg_class.reltuples.  If an index AM wants that, it
>> knows where to find it.

> It's not, but it is how we calculate
> IndexBulkDeleteResult.num_index_tuples, which is related.

Well, the number of entries in an index needn't be exactly the
same as the number in the underlying heap.  If we're setting
an index's reltuples to the number of actual index entries
including dead entries, I don't have a problem with that:
unlike the case for table reltuples, it's not going to result
in bad estimates of the number of rows resulting from a query.
If the planner looks at index reltuples at all, it's doing so
for cost estimation purposes, where the count including dead
entries is probably the right thing to use.

If you want to make this cleaner, maybe there's a case for
splitting reltuples into two columns.  But then index AMs
would be on the hook to determine how many of their entries
are live, which is not really an index's concern.

            regards, tom lane



On Mon, Apr 18, 2022 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If the planner looks at index reltuples at all, it's doing so
> for cost estimation purposes, where the count including dead
> entries is probably the right thing to use.

Then why does heapam_index_build_range_scan do it the other way around?

I think that it probably doesn't matter that much in practice. The
inconsistency should be noted in update_relstats_all_indexes, though.

> If you want to make this cleaner, maybe there's a case for
> splitting reltuples into two columns.  But then index AMs
> would be on the hook to determine how many of their entries
> are live, which is not really an index's concern.

The main concern behind this is that we're using
vacrel->new_rel_tuples for the IndexVacuumInfo.num_heap_tuples value
in amvacuumcleanup (but not in ambulkdelete), which is calculated
towards the end of lazy_scan_heap, like so:

    /*
     * Also compute the total number of surviving heap entries.  In the
     * (unlikely) scenario that new_live_tuples is -1, take it as zero.
     */
    vacrel->new_rel_tuples =
        Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
        vacrel->missed_dead_tuples;

I think that this doesn't really belong here; new_rel_tuples should
only be used for VACUUM VERBOSE/server log output, once we return to
heap_vacuum_rel from lazy_scan_heap. We should use
vacrel->new_live_tuples as our IndexVacuumInfo.num_heap_tuples value
in the amvacuumcleanup path (instead of new_rel_tuples). That way the
rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
taken from pg_class.reltuples (for the heap rel). Either the existing
value, or the new value.

-- 
Peter Geoghegan



Peter Geoghegan <pg@bowt.ie> writes:
> I think that this doesn't really belong here; new_rel_tuples should
> only be used for VACUUM VERBOSE/server log output, once we return to
> heap_vacuum_rel from lazy_scan_heap. We should use
> vacrel->new_live_tuples as our IndexVacuumInfo.num_heap_tuples value
> in the amvacuumcleanup path (instead of new_rel_tuples). That way the
> rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
> taken from pg_class.reltuples (for the heap rel). Either the existing
> value, or the new value.

The places where index AMs refer to num_heap_tuples seem to be using
it as a ceiling on estimated index tuple counts.  Given that we should
be counting dead index entries, redefining it as you suggest would be
wrong.

            regards, tom lane



On Mon, Apr 18, 2022 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The places where index AMs refer to num_heap_tuples seem to be using
> it as a ceiling on estimated index tuple counts.  Given that we should
> be counting dead index entries, redefining it as you suggest would be
> wrong.

I would argue that it would be correct for the first time -- at least
if we take the behavior within heapam_index_build_range_scan (and
everywhere else) as authoritative. That's a feature, not a bug.

-- 
Peter Geoghegan



On Mon, Apr 18, 2022 at 1:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I would argue that it would be correct for the first time -- at least
> if we take the behavior within heapam_index_build_range_scan (and
> everywhere else) as authoritative. That's a feature, not a bug.

Attached draft patch shows what I have in mind. I think that the issue
should be treated as a bug, albeit a minor one that's not worth
backpatching. I would like to target Postgres 15 here.

Addressing this issue allowed me to move more state out of vacrel.
This patch continues the trend of moving everything that deals with
pg_class, statistics, or instrumentation from lazy_scan_heap into its
caller, heap_vacuum_rel().

I noticed GIN gives us another reason to go this way:
ginvacuumcleanup() always instructs lazyvacuum.c to set each GIN
index's pg_class.reltuples to whatever the value is that will be set
in the heap relation, even with a partial index. So defining reltuples
differently for indexes just doesn't seem reasonable to me. (Actually,
lazyvacuum.c won't end up setting the value in the GIN index's
pg_class entry when IndexVacuumInfo.estimated_count is set to true.
But that hardly matters -- either way, num_index_tuples comes from
num_heap_tuples in a GIN index, no matter what.)

-- 
Peter Geoghegan

Attachment