Optimizing away second VACUUM heap scan when only BRIN indexes on table - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Optimizing away second VACUUM heap scan when only BRIN indexes on table
Date
Msg-id CAM3SWZSvhJLdEPbB1v9bX_VDd78Dd0usWhdWcUyfmjcnAMNc6A@mail.gmail.com
Whole thread Raw
Responses Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
The BRIN README notes:

"""
Since no heap TIDs are stored in a BRIN index, it's not necessary to scan the
index when heap tuples are removed.  It might be that some summary values can
be tightened if heap tuples have been deleted; but this would represent an
optimization opportunity only, not a correctness issue.  It's simpler to
represent this as the need to re-run summarization on the affected page range
rather than "subtracting" values from the existing one.  This is not
currently implemented.
"""

While I haven't studied the question in depth, I'm pretty sure that
what we do to "VACUUM" BRIN indexes has no dependency on the heap TIDs
that are pointed to by the index, because there really aren't any. The
README goes on to say:

"""
Note that if there are no indexes on the table other than the BRIN index,
usage of maintenance_work_mem by vacuum can be decreased significantly, because
no detailed index scan needs to take place (and thus it's not necessary for
vacuum to save TIDs to remove).  It's unlikely that BRIN would be the only
indexes in a table, though, because primary keys can be btrees only, and so
we don't implement this optimization.
"""

I don't think it's realistic to suppose that BRIN indexes will rarely
be the only indexes on a table. I'm not especially concerned about
maintenance_work_mem in this scenario, though -- I'm much more
concerned about the possibly unnecessary second heap scan during a
VACUUM of what is presumably a very large table. A second heap scan
occurs (lazy_vacuum_heap() is called), where that would not occur if
there were no indexes whatsoever (see commit ed8969b1).

I realize that the second scan performed by lazy_vacuum_heap() only
visits those pages known to contain dead tuples. However, the
experience of seeing problems with the random sampling of ANALYZE
makes me think that that might not be very helpful. There is no good
reason to think that there won't be a uniform distribution of dead
tuples across the heap, and so only visiting pages known to contain
dead tuples might be surprisingly little help even when there are
relatively few VACUUM-able tuples in the table.

Has any thought been given to how we could make VACUUM avoid a second
heap scan iff there are only BRIN indexes, without compromising
anything else? In other words, by killing heap TIDs *before* any
"VACUUM" of BRIN index(es) occurs, avoiding a call to
lazy_vacuum_heap(), just as when there are no indexes on the table
whatsoever?

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: parallelism and sorting
Next
From: Chapman Flack
Date:
Subject: Re: problem with msvc linker - cannot build orafce