Kevin Grittner wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> >> I think the real solution to this problem is to avoid use of
> >> GetTransactionSnapshot(), and instead use GetLatestSnapshot(). As far
> >> as I can see, that should completely close the hole. This requires
> >> patching IndexBuildHeapRangeScan() to allow for that.
> >
> > Actually I think there's another problem: if a transaction starts and
> > inserts a tuple into the page range, then goes to sleep, and then
> > another session does the summarization of the page range, session 1 is
> > seen as "in progress" by session 2 (so the scan does not see the new
> > tuple), but the placeholder tuple was not modified either because it was
> > inserted later than the snapshot. So the update is lost.
> >
> > I think the only way to close this hole is to have summarize_range()
> > sleep until all open snapshots are gone after inserting the placeholder
> > tuple and before acquiring the snapshot, similarly to how CREATE INDEX
> > CONCURRENTLY does it.
>
> Please excuse my naiveté on the topic, but could you explain (or
> point me to the documented explanation) of why we don't scan using
> a non-MVCC snapshot and build the page range based on all non-dead
> tuples?

Because I don't know of any mechanism to lock out insertions into the
block range while the scan takes place. If you can suggest something
workable, that might be better than what I'm proposing.

Note that in my proposal, we would have to wait for all snapshots to go
away *for every page range*, which seems very troublesome. A better
answer might be to insert all placeholder tuples first, then wait for
concurrent snapshots to go away, then do each scan. But that is a
larger rework of the code.
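
To make the race and the proposed wait concrete, here is a toy model in
Python. It is only a sketch under loud assumptions: Txn, ToyRange,
summarize() and run() are hypothetical names invented for illustration,
not PostgreSQL functions; the summary is modeled as a plain set of values
rather than a real BRIN tuple; and "waiting" is simulated by marking the
open transactions committed, whereas the real summarizer would sleep
until they finish on their own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    """Toy transaction: an id and a committed flag (hypothetical)."""
    xid: int
    committed: bool = False


@dataclass
class ToyRange:
    """One page range: heap tuples plus an optional index summary."""
    heap: list = field(default_factory=list)  # (value, inserting Txn)
    summary: Optional[set] = None             # None = no index tuple yet

    def insert(self, value, txn):
        self.heap.append((value, txn))
        # An inserter can only update an index tuple that already exists.
        if self.summary is not None:
            self.summary.add(value)


def summarize(rng, open_txns, wait_for_open_txns):
    """Toy summarization: placeholder, optional wait, snapshot, scan."""
    rng.summary = set()                       # insert the placeholder tuple
    if wait_for_open_txns:
        for txn in open_txns:                 # stand-in for sleeping until
            txn.committed = True              # the open transactions end
    # Snapshot taken here: only committed transactions' tuples are visible.
    visible = {value for value, txn in rng.heap if txn.committed}
    rng.summary |= visible
    return rng.summary


def run(wait_for_open_txns):
    rng = ToyRange()
    t1 = Txn(xid=1)
    rng.insert(100, t1)                       # session 1 inserts, then sleeps
    result = set(summarize(rng, [t1], wait_for_open_txns))
    t1.committed = True                       # session 1 commits afterwards
    return result


print(run(wait_for_open_txns=False))  # set()  -> value 100 is lost
print(run(wait_for_open_txns=True))   # {100}  -> value 100 is summarized
```

Without the wait, the tuple is invisible to the scan and was inserted
before the placeholder existed, so nothing records it. In this model the
wait happens once per call to summarize(), which is the per-range cost
mentioned above.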

> I understand that the range being scanned would need to be
> locked, but we're OK with doing that for creation of other indexes.

That might be so, but this is not index creation; it's not acceptable to
lock the table during vacuuming. Do we have anything with finer
granularity than locking the entire table? (I don't think holding
buffer lock on all the pages in the range is acceptable.)

> (There is no mention of snapshots or locks in the BRIN README....)

Um.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services