Thread: Optimizing away second VACUUM heap scan when only BRIN indexes on table
The BRIN README notes: """ Since no heap TIDs are stored in a BRIN index, it's not necessary to scan the index when heap tuples are removed. It might be that some summary values can be tightened if heap tuples have been deleted; but this would represent an optimization opportunity only, not a correctness issue. It's simpler to represent this as the need to re-run summarization on the affected page range rather than "subtracting" values from the existing one. This is not currently implemented. """ While I haven't studied the question in depth, I'm pretty sure that what we do to "VACUUM" BRIN indexes has no dependency on the heap TIDs that are pointed to by the index, because there really aren't any. The README goes on to say: """ Note that if there are no indexes on the table other than the BRIN index, usage of maintenance_work_mem by vacuum can be decreased significantly, because no detailed index scan needs to take place (and thus it's not necessary for vacuum to save TIDs to remove). It's unlikely that BRIN would be the only indexes in a table, though, because primary keys can be btrees only, and so we don't implement this optimization. """ I don't think it's realistic to suppose that BRIN indexes will rarely be the only indexes on a table. I'm not especially concerned about maintenance_work_mem in this scenario, though -- I'm much more concerned about the possibly unnecessary second heap scan during a VACUUM of what is presumably a very large table. A second heap scan occurs (lazy_vacuum_heap() is called), where that would not occur if there were no indexes whatsoever (see commit ed8969b1). I realize that the second scan performed by lazy_vacuum_heap() only visits those pages known to contain dead tuples. However, the experience of seeing problems with the random sampling of ANALYZE makes me think that that might not be very helpful. There is no good reason to think that there won't be a uniform distribution of dead tuples across the heap, and so only visiting pages known to contain dead tuples might be surprisingly little help even when there are relatively few VACUUM-able tuples in the table. Has any thought been given to how we could make VACUUM avoid a second heap scan iff there are only BRIN indexes, without compromising anything else? In other words, by killing heap TIDs *before* any "VACUUM" of BRIN index(es) occurs, avoiding a call to lazy_vacuum_heap(), just as when there are no indexes on the table whatsoever? -- Peter Geoghegan
On 11/23/15 5:06 PM, Peter Geoghegan wrote: > I realize that the second scan performed by lazy_vacuum_heap() only > visits those pages known to contain dead tuples. However, the > experience of seeing problems with the random sampling of ANALYZE > makes me think that that might not be very helpful. There is no good > reason to think that there won't be a uniform distribution of dead > tuples across the heap, and so only visiting pages known to contain > dead tuples might be surprisingly little help even when there are > relatively few VACUUM-able tuples in the table. Even worse is if you can't fit all the dead TIDs in memory and have to do multiple passes for no reason... > Has any thought been given to how we could make VACUUM avoid a second > heap scan iff there are only BRIN indexes, without compromising > anything else? In other words, by killing heap TIDs*before* any > "VACUUM" of BRIN index(es) occurs, avoiding a call to > lazy_vacuum_heap(), just as when there are no indexes on the table > whatsoever? ISTM the big question here is how vacuum would know it can skip this since we wouldn't want to hard-code this for BRIN. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table
From
Alvaro Herrera
Date:
Jim Nasby wrote: > On 11/23/15 5:06 PM, Peter Geoghegan wrote: > >I realize that the second scan performed by lazy_vacuum_heap() only > >visits those pages known to contain dead tuples. However, the > >experience of seeing problems with the random sampling of ANALYZE > >makes me think that that might not be very helpful. There is no good > >reason to think that there won't be a uniform distribution of dead > >tuples across the heap, and so only visiting pages known to contain > >dead tuples might be surprisingly little help even when there are > >relatively few VACUUM-able tuples in the table. > > Even worse is if you can't fit all the dead TIDs in memory and have to do > multiple passes for no reason... Since BRIN indexes cannot be primary keys nor unique keys, it's hard to be convinced that the use case of a table with only BRIN indexes is terribly interesting. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table
From
Peter Geoghegan
Date:
On Sun, Dec 20, 2015 at 6:14 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Since BRIN indexes cannot be primary keys nor unique keys, it's hard to > be convinced that the use case of a table with only BRIN indexes is > terribly interesting. I'm not convinced of that. -- Peter Geoghegan
On Sun, Dec 20, 2015 at 11:14:46PM -0300, Alvaro Herrera wrote: > Jim Nasby wrote: > > On 11/23/15 5:06 PM, Peter Geoghegan wrote: > > >I realize that the second scan performed by lazy_vacuum_heap() only > > >visits those pages known to contain dead tuples. However, the > > >experience of seeing problems with the random sampling of ANALYZE > > >makes me think that that might not be very helpful. There is no good > > >reason to think that there won't be a uniform distribution of dead > > >tuples across the heap, and so only visiting pages known to contain > > >dead tuples might be surprisingly little help even when there are > > >relatively few VACUUM-able tuples in the table. > > > > Even worse is if you can't fit all the dead TIDs in memory and have to do > > multiple passes for no reason... > > Since BRIN indexes cannot be primary keys nor unique keys, it's hard to > be convinced that the use case of a table with only BRIN indexes is > terribly interesting. If you've got high-frequency logs, timestamptz might not operate at fine enough a grain to form a primary key, but it's just the kind of thing BRIN is great at narrowing down. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 21 December 2015 at 02:14, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
--
Jim Nasby wrote:
> On 11/23/15 5:06 PM, Peter Geoghegan wrote:
> >I realize that the second scan performed by lazy_vacuum_heap() only
> >visits those pages known to contain dead tuples. However, the
> >experience of seeing problems with the random sampling of ANALYZE
> >makes me think that that might not be very helpful. There is no good
> >reason to think that there won't be a uniform distribution of dead
> >tuples across the heap, and so only visiting pages known to contain
> >dead tuples might be surprisingly little help even when there are
> >relatively few VACUUM-able tuples in the table.
>
> Even worse is if you can't fit all the dead TIDs in memory and have to do
> multiple passes for no reason...
Since BRIN indexes cannot be primary keys nor unique keys, it's hard to
be convinced that the use case of a table with only BRIN indexes is
terribly interesting.
Given BRIN's characteristics, such a table design is compelling when the table is very large, yet possible only for certain use cases.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table
From
Peter Geoghegan
Date:
On Mon, Dec 21, 2015 at 1:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Given BRIN's characteristics, such a table design is compelling when the > table is very large, yet possible only for certain use cases. You can say the same thing about BRIN itself, of course. -- Peter Geoghegan
On 21 December 2015 at 09:35, Peter Geoghegan <pg@heroku.com> wrote:
--
On Mon, Dec 21, 2015 at 1:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Given BRIN's characteristics, such a table design is compelling when the
> table is very large, yet possible only for certain use cases.
You can say the same thing about BRIN itself, of course.
AFAICS, this idea is workable and so I'd say "patches welcome" on it.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services