Thread: Optimizing away second VACUUM heap scan when only BRIN indexes on table

Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Peter Geoghegan
Date:
The BRIN README notes:

"""
Since no heap TIDs are stored in a BRIN index, it's not necessary to scan the
index when heap tuples are removed.  It might be that some summary values can
be tightened if heap tuples have been deleted; but this would represent an
optimization opportunity only, not a correctness issue.  It's simpler to
represent this as the need to re-run summarization on the affected page range
rather than "subtracting" values from the existing one.  This is not
currently implemented.
"""

While I haven't studied the question in depth, I'm pretty sure that
what we do to "VACUUM" BRIN indexes has no dependency on the heap TIDs
that are pointed to by the index, because there really aren't any. The
README goes on to say:

"""
Note that if there are no indexes on the table other than the BRIN index,
usage of maintenance_work_mem by vacuum can be decreased significantly, because
no detailed index scan needs to take place (and thus it's not necessary for
vacuum to save TIDs to remove).  It's unlikely that BRIN would be the only
indexes in a table, though, because primary keys can be btrees only, and so
we don't implement this optimization.
"""

I don't think it's realistic to suppose that BRIN indexes will rarely
be the only indexes on a table. I'm not especially concerned about
maintenance_work_mem in this scenario, though -- I'm much more
concerned about the possibly unnecessary second heap scan during a
VACUUM of what is presumably a very large table. A second heap scan
occurs (lazy_vacuum_heap() is called), where that would not occur if
there were no indexes whatsoever (see commit ed8969b1).

I realize that the second scan performed by lazy_vacuum_heap() only
visits those pages known to contain dead tuples. However, the
experience of seeing problems with the random sampling of ANALYZE
makes me think that that might not be very helpful. There is no good
reason to think that there won't be a uniform distribution of dead
tuples across the heap, and so only visiting pages known to contain
dead tuples might be surprisingly little help even when there are
relatively few VACUUM-able tuples in the table.

Has any thought been given to how we could make VACUUM avoid a second
heap scan iff there are only BRIN indexes, without compromising
anything else? In other words, by killing heap TIDs *before* any
"VACUUM" of BRIN index(es) occurs, avoiding a call to
lazy_vacuum_heap(), just as when there are no indexes on the table
whatsoever?

-- 
Peter Geoghegan



Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Jim Nasby
Date:
On 11/23/15 5:06 PM, Peter Geoghegan wrote:
> I realize that the second scan performed by lazy_vacuum_heap() only
> visits those pages known to contain dead tuples. However, the
> experience of seeing problems with the random sampling of ANALYZE
> makes me think that that might not be very helpful. There is no good
> reason to think that there won't be a uniform distribution of dead
> tuples across the heap, and so only visiting pages known to contain
> dead tuples might be surprisingly little help even when there are
> relatively few VACUUM-able tuples in the table.

Even worse is if you can't fit all the dead TIDs in memory and have to 
do multiple passes for no reason...

> Has any thought been given to how we could make VACUUM avoid a second
> heap scan iff there are only BRIN indexes, without compromising
> anything else? In other words, by killing heap TIDs*before*  any
> "VACUUM" of BRIN index(es) occurs, avoiding a call to
> lazy_vacuum_heap(), just as when there are no indexes on the table
> whatsoever?

ISTM the big question here is how vacuum would know it can skip this 
since we wouldn't want to hard-code this for BRIN.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 11/23/15 5:06 PM, Peter Geoghegan wrote:
> >I realize that the second scan performed by lazy_vacuum_heap() only
> >visits those pages known to contain dead tuples. However, the
> >experience of seeing problems with the random sampling of ANALYZE
> >makes me think that that might not be very helpful. There is no good
> >reason to think that there won't be a uniform distribution of dead
> >tuples across the heap, and so only visiting pages known to contain
> >dead tuples might be surprisingly little help even when there are
> >relatively few VACUUM-able tuples in the table.
> 
> Even worse is if you can't fit all the dead TIDs in memory and have to do
> multiple passes for no reason...

Since BRIN indexes cannot be primary keys nor unique keys, it's hard to
be convinced that the use case of a table with only BRIN indexes is
terribly interesting.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Peter Geoghegan
Date:
On Sun, Dec 20, 2015 at 6:14 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Since BRIN indexes cannot be primary keys nor unique keys, it's hard to
> be convinced that the use case of a table with only BRIN indexes is
> terribly interesting.

I'm not convinced of that.

-- 
Peter Geoghegan



Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
David Fetter
Date:
On Sun, Dec 20, 2015 at 11:14:46PM -0300, Alvaro Herrera wrote:
> Jim Nasby wrote:
> > On 11/23/15 5:06 PM, Peter Geoghegan wrote:
> > >I realize that the second scan performed by lazy_vacuum_heap() only
> > >visits those pages known to contain dead tuples. However, the
> > >experience of seeing problems with the random sampling of ANALYZE
> > >makes me think that that might not be very helpful. There is no good
> > >reason to think that there won't be a uniform distribution of dead
> > >tuples across the heap, and so only visiting pages known to contain
> > >dead tuples might be surprisingly little help even when there are
> > >relatively few VACUUM-able tuples in the table.
> > 
> > Even worse is if you can't fit all the dead TIDs in memory and have to do
> > multiple passes for no reason...
> 
> Since BRIN indexes cannot be primary keys nor unique keys, it's hard to
> be convinced that the use case of a table with only BRIN indexes is
> terribly interesting.

If you've got high-frequency logs, timestamptz might not operate at
fine enough a grain to form a primary key, but it's just the kind of
thing BRIN is great at narrowing down.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Simon Riggs
Date:
On 21 December 2015 at 02:14, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Jim Nasby wrote:
> On 11/23/15 5:06 PM, Peter Geoghegan wrote:
> >I realize that the second scan performed by lazy_vacuum_heap() only
> >visits those pages known to contain dead tuples. However, the
> >experience of seeing problems with the random sampling of ANALYZE
> >makes me think that that might not be very helpful. There is no good
> >reason to think that there won't be a uniform distribution of dead
> >tuples across the heap, and so only visiting pages known to contain
> >dead tuples might be surprisingly little help even when there are
> >relatively few VACUUM-able tuples in the table.
>
> Even worse is if you can't fit all the dead TIDs in memory and have to do
> multiple passes for no reason...

Since BRIN indexes cannot be primary keys nor unique keys, it's hard to
be convinced that the use case of a table with only BRIN indexes is
terribly interesting.

Given BRIN's characteristics, such a table design is compelling when the table is very large, yet possible only for certain use cases. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Peter Geoghegan
Date:
On Mon, Dec 21, 2015 at 1:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Given BRIN's characteristics, such a table design is compelling when the
> table is very large, yet possible only for certain use cases.

You can say the same thing about BRIN itself, of course.


-- 
Peter Geoghegan



Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table

From
Simon Riggs
Date:
On 21 December 2015 at 09:35, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Dec 21, 2015 at 1:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Given BRIN's characteristics, such a table design is compelling when the
> table is very large, yet possible only for certain use cases.

You can say the same thing about BRIN itself, of course.


AFAICS, this idea is workable and so I'd say "patches welcome" on it. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services