On Sat, Jul 18, 2015 at 5:11 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Yeah, that's a bit of an open problem: we don't have any mechanism to
> mark a block range as needing resummarization, yet. I don't have any
> great ideas there, TBH. Some options that were discussed but never led
> anywhere:
>
> 1. whenever a heap tuple is deleted that's minimum or maximum for a
> column, mark the index tuple as needing resummarization. On a future
> vacuuming pass the index would be updated. (I think this works for
> minmax, but I don't see how to apply it to inclusion).
>
> 2. have block ranges be resummarized randomly during vacuum.
>
> 3. Have index tuples last for only X transactions, marking them
> as needing resummarization once they expire.
>
> 4. Have a user-invoked function that re-runs summarization. That way
> the user can implement any of the above policies, or others.
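Option 1 can be sketched for minmax roughly as follows. This is a toy model, not BRIN code: RangeSummary, on_tuple_deleted and vacuum_pass are hypothetical names, and a summary is just a pair of integer bounds. Deleting an interior value cannot make the bounds any tighter, so only a deleted boundary value flags the range, and the actual recomputation waits for a later vacuum pass that sees only the surviving tuples.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RangeSummary:
    """Hypothetical minmax summary for one block range (not PostgreSQL's BRIN structs)."""
    lo: int
    hi: int
    needs_resummarize: bool = False


def on_tuple_deleted(summary: RangeSummary, value: int) -> None:
    """Flag the range only if the deleted value is one of its current bounds."""
    if value in (summary.lo, summary.hi):
        summary.needs_resummarize = True


def vacuum_pass(summary: RangeSummary, live_values: Sequence[int]) -> None:
    """Later vacuum: recompute a flagged range's bounds from tuples left after pruning."""
    if summary.needs_resummarize and live_values:
        summary.lo, summary.hi = min(live_values), max(live_values)
        summary.needs_resummarize = False
    # If the range is unflagged, or nothing survived, the old bounds stay:
    # still correct, merely loose.


s = RangeSummary(lo=10, hi=40)
on_tuple_deleted(s, 20)   # interior value: range not flagged
on_tuple_deleted(s, 40)   # current max: range flagged
vacuum_pass(s, [10, 30])  # 20 and 40 have since been pruned
print(s)  # RangeSummary(lo=10, hi=30, needs_resummarize=False)
```

This leans on minmax bounds being determined by individual boundary values; as noted above, it does not carry over directly to inclusion.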

Maybe I'm confused here, but it seems like the only time
re-summarization can be needed is when tuples are pruned. The mere
act of deleting a tuple, even if the delete goes on to commit, doesn't
create a scenario where re-summarization can work out to a win,
because there may still be snapshots that can see it. At the point
where we prune the tuple, though, there might well be a benefit in
re-summarizing, because now a newly-computed summary value won't need
to cover a value that previously had to be there.

But re-summarizing during HOT pruning is clearly impractical, so the
obvious thing to do is to make vacuum do it. We
know during phase one of vacuum whether we saw any dead tuples in page
range X-Y; if yes, re-summarize. The only reason not to do this is if
it causes us to do a lot of resummarization that frequently fails to
produce a smaller range. Do you have any experimental data suggesting
that this is or is not a problem?
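A rough sketch of that proposal, assuming block ranges of a hypothetical PAGES_PER_RANGE heap pages, dead tuples reported by phase one as (block, offset) pairs, and summaries stored as (min, max) integer pairs. None of these function names are actual vacuum or BRIN APIs. Because the old bounds already covered every surviving value, a recomputed summary can only stay the same or shrink, so the narrowed counter is exactly the number an experiment would need to collect to answer the question above.

```python
from typing import Dict, List, Set, Tuple

PAGES_PER_RANGE = 4  # hypothetical range size, kept small for the example


def ranges_with_dead_tuples(dead_tids: List[Tuple[int, int]]) -> Set[int]:
    """Phase one: map each dead tuple's (block, offset) to its block-range number."""
    return {block // PAGES_PER_RANGE for block, _offset in dead_tids}


def resummarize_ranges(
    summaries: Dict[int, Tuple[int, int]],
    live_by_range: Dict[int, List[int]],
    dirty: Set[int],
) -> int:
    """Recompute (min, max) for each dirty range; return how many got narrower."""
    narrowed = 0
    for r in sorted(dirty):
        live = live_by_range.get(r, [])
        if not live:
            continue  # nothing survived; keep the old bounds, which are still correct
        new = (min(live), max(live))
        # The old bounds covered every surviving value, so any change is a narrowing.
        if new != summaries[r]:
            narrowed += 1
        summaries[r] = new
    return narrowed


# Range 0 = blocks 0-3, range 1 = blocks 4-7, range 2 = blocks 8-11.
summaries = {0: (1, 100), 1: (50, 60), 2: (7, 9)}
dirty = ranges_with_dead_tuples([(1, 3), (5, 2)])  # range 2 saw no dead tuples
live_by_range = {0: [1, 40], 1: [50, 55, 60]}       # range 1 lost only an interior value
narrowed = resummarize_ranges(summaries, live_by_range, dirty)
print(dirty, narrowed, summaries)
# {0, 1} 1 {0: (1, 40), 1: (50, 60), 2: (7, 9)}
```

Range 1 illustrates the worry: vacuum pays for a resummarization that produces the same range.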
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company