Re: BRIN index and aborted transaction - Mailing list pgsql-hackers

From Robert Haas
Subject Re: BRIN index and aborted transaction
Date
Msg-id CA+Tgmoa=j9J8gGwbxttuKWk=KOqJNkTCo9djVhbLAmO1t390-g@mail.gmail.com
In response to Re: BRIN index and aborted transaction  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: BRIN index and aborted transaction
List pgsql-hackers
On Sat, Jul 18, 2015 at 5:11 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Yeah, that's a bit of an open problem: we don't have any mechanism to
> mark a block range as needing resummarization, yet.  I don't have any
> great ideas there, TBH.  Some options that were discussed but never led
> anywhere:
>
> 1. whenever a heap tuple is deleted that's minimum or maximum for a
> column, mark the index tuple as needing resummarization.  On a future
> vacuuming pass the index would be updated.  (I think this works for
> minmax, but I don't see how to apply it to inclusion).
>
> 2. have block ranges be resummarized randomly during vacuum.
>
> 3. Have index tuples last for only X number of transactions, marking them
> as needing summarization when that expires.
>
> 4. Have a user-invoked function that re-runs summarization.  That way
> the user can implement any of the above policies, or others.
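
To make option 1 concrete, here is a minimal sketch for the minmax case only (Alvaro notes it doesn't obviously carry over to inclusion). Everything in it is hypothetical: MinMaxSummary, summarize(), on_heap_delete() and vacuum_pass() are illustrative names, not PostgreSQL's BRIN code. Deleting a tuple that holds the range's minimum or maximum only flags the summary; a later vacuum pass rebuilds flagged summaries from the values that survived pruning.

```python
from dataclasses import dataclass


@dataclass
class MinMaxSummary:
    # Hypothetical per-block-range summary; not PostgreSQL's actual BRIN structs.
    lo: int
    hi: int
    needs_resummarize: bool = False


def summarize(values):
    # Build a min/max summary that covers every value in the block range.
    return MinMaxSummary(min(values), max(values))


def on_heap_delete(summary, deleted_value):
    # Option 1: deleting the current minimum or maximum may leave the summary
    # wider than necessary, so flag it rather than fixing it right away.
    if deleted_value in (summary.lo, summary.hi):
        summary.needs_resummarize = True


def vacuum_pass(summary, surviving_values):
    # On a later vacuum pass, rebuild flagged summaries from the values of
    # tuples that survived pruning; unflagged summaries are left untouched.
    if summary.needs_resummarize and surviving_values:
        return summarize(surviving_values)
    return summary


s = summarize([3, 7, 10])
on_heap_delete(s, 10)
s = vacuum_pass(s, [3, 7])
print(s.lo, s.hi)  # 3 7
```

Deleting a value strictly inside the range (say 5 out of 1..9) sets no flag, so vacuum leaves that summary alone.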

Maybe I'm confused here, but it seems like the only time
re-summarization can be needed is when tuples are pruned.  The mere
act of deleting a tuple, even if the delete goes on to commit, doesn't
create a scenario where re-summarization can work out to a win,
because there may still be snapshots that can see it.  At the point
where we prune the tuple, though, there might well be a benefit in
re-summarizing, because now a newly-computed summary value won't need
to cover a value that previously had to be there.
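
A rough sketch of why the delete alone doesn't help but pruning does. It assumes a heavily simplified visibility rule: a version whose deleting transaction committed with an xmax older than the oldest snapshot's xmin is dead to everyone and may be pruned. Transaction-ID wraparound and aborted transactions are ignored, and TupleVersion, prunable() and resummarize() are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    value: int
    # Deleting transaction ID (assumed committed), or None if never deleted.
    xmax: Optional[int] = None


def prunable(t, oldest_xmin):
    # Simplified rule: a committed delete is invisible to every snapshot
    # once its xmax precedes the oldest snapshot's xmin.
    return t.xmax is not None and t.xmax < oldest_xmin


def resummarize(versions, oldest_xmin):
    # The summary must still cover every version some snapshot can see,
    # so only prunable versions may be left out of the recomputed min/max.
    kept = [t.value for t in versions if not prunable(t, oldest_xmin)]
    return (min(kept), max(kept)) if kept else None


versions = [TupleVersion(3), TupleVersion(7), TupleVersion(100, xmax=50)]
print(resummarize(versions, oldest_xmin=40))  # (3, 100): an old snapshot may still see 100
print(resummarize(versions, oldest_xmin=60))  # (3, 7): 100 is prunable, so the range shrinks
```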

But re-summarizing whenever we HOT-prune seems clearly impractical,
so the natural thing to do is to make vacuum do it.  We
know during phase one of vacuum whether we saw any dead tuples in page
range X-Y; if yes, re-summarize.  The only reason not to do this is if
it causes us to do a lot of resummarization that frequently fails to
produce a narrower summary. Do you have any experimental data suggesting
that this is or is not a problem?
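
The bookkeeping for the vacuum approach could look roughly like this sketch. It assumes vacuum's first heap pass yields the block numbers on which it saw dead tuples (a hypothetical input) and maps each one to the start block of its BRIN range using the pages_per_range storage parameter (default 128). narrowed() is a hypothetical helper one could use to count how often resummarization actually tightens a min/max summary, which is the data asked for above.

```python
def ranges_to_resummarize(dead_tuple_pages, pages_per_range=128):
    # Each BRIN range covers heap blocks [start, start + pages_per_range);
    # collect the distinct range start blocks that contained dead tuples.
    return sorted({blk - blk % pages_per_range for blk in dead_tuple_pages})


def narrowed(old, new):
    # True if the recomputed (min, max) summary is strictly tighter than the old one.
    return new[0] >= old[0] and new[1] <= old[1] and new != old


print(ranges_to_resummarize([5, 130, 200, 300]))  # [0, 128, 256]
print(narrowed((3, 100), (3, 7)))  # True
print(narrowed((3, 7), (3, 7)))    # False
```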

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


