Thread: Latest plans for Utilities with HOT

Latest plans for Utilities with HOT

From
"Simon Riggs"
Date:
Overview
--------
CREATE INDEX, CREATE INDEX CONCURRENTLY and VACUUM FULL all need some
adaptation to work correctly with HOT.

[This summary and proposal supercedes all previous proposals by me
regarding utilities with HOT]

The Problem
-----------

With HOT, CREATE INDEX may find tuples that are HEAP_ONLY_TUPLE, yet
have an index value(s) that differs from the indexable value of the
tuple version which is at the root of the hot chain.

e.g. we have a simple hot chain that looks like this on one page:

tup1(attr1=1, attr2=A)[ROOT] -> tup2(attr1=1, attr2=B)[HOT)

there is already an index(attr1), and we are trying to build an index on
attr2. Index(attr2) will to point to both tup1 and tup2. So by the end
of the build we need to have inserted an additional index pointer into
index(attr1) (yes, the other index) for tup2 and also broken the link
between tup1->tup2. This operation is known as chilling, or
de-HOT-ifying.

If we don't do this, we will have an index inconsistency. 

If we attempt to chill a tuple concurrently, we also get the possibility
of a concurrent index scan following multiple paths to the same tuple.
This effects all types of indexes, not just b-trees.

Solutions
---------

There are three basic solutions: LOCK,  UPDATE & WAIT and MARK

1. LOCK - Use some form of locking to prevent concurrent index scans
while we modify the various tuple versions and fiddle with the index
entries. This has some sub-options:
a) AccessExclusiveLock on main relation prevents all index accesses
b) Lock all of the indexes with an AccessExclusiveLock, in a consistent
order, so that no concurrent index scans are possible. This would be a
non-transactional lock, so could be dropped before end of transaction -
we're not locking the data we're just locking the physical access
mechanism.

2. UPDATE & WAIT - From Hannu - Perform cold-null-UPDATEs on each tuple,
which can happen concurrently. This can happen concurrently, but then
requires us to wait until we are older than all non-utility
transactions, so that these UPDATEs are visible to all.

3. MARK - From Heikki - verbatim:
One solution I thought of is to add another flag to heap tuples, 
CHILL_IN_PROGRESS. To chill a tuple, you would:

1. Mark heap tuple with CHILL_IN_PROGRESS
2. Insert missing index entries
3. Clear CHILL_IN_PROGRESS and HEAP_ONLY_TUPLE flags

Index scans would ignore tuples with CHILL_IN_PROGRESS and directly 
pointed to from the index. That way if we crash in the middle of step 2,
scans and updates would work normally after replay, as if the index 
entries weren't there. CREATE INDEX would have to fail if there's any 
CHILL_IN_PROGRESS tuples, because we wouldn't know which index entries 
need to be inserted; some might already be there. To clear the 
CHILL_IN_PROGRESS flag, a vacuum would be needed. Vacuum would remove 
all index entries for those tuples, but instead of removing the heap 
tuple in the 2nd scan it would just clear the CHILL_IN_PROGRESS flag, 
bringing us back to where we started.

Thoughts
--------

- CREATE INDEX CONCURRENTLY can use UPDATE & WAIT or MARK

- VACUUM FULL could use any of the solutions. It does currently hold
AccessExclusiveLock. UPDATE & WAIT would be very good.

- CREATE INDEX has the most problems, but MARK solves most of them.

Multiple index builds may need to chill different sets of tuples, but if
there is any overlap then the second build will fail when it sees the
CHILL_IN_PROGRESS flag that the other has set. So we can no longer
achieve multiple concurrent index builds without great care.

It might be possible to clean up old CHILL_IN_PROGRESS flags as we go,
but if we did that we'd need to make sure concurrent index scans never
pass each other, otherwise one index build would clean up the others
flags. So we have two choices:
a) we continue to allow concurrent index builds, but we throw an ERROR
when an index build finds a CHILL_IN_PROGRESS flag
b) we clean up CHILL_IN_PROGRESS flags as we go, but we give up the
ability to perform concurrent index builds

Proposal
--------

- CREATE INDEX can use the MARK solution as part of the first scan (cos
there's only one), setting flags called HEAP_CHILL_IN_PROGRESS.

As the index build scan proceeds we insert records directly into each
index using the standard WAL-logged insertion path. No spool file is
built up while we do this. Setting HEAP_CHILL_IN_PROGRESS must must also
write WAL records and dirty the block, since we might fail after we have
performed an index insert but before we remove the mark. This could mean
we'd end up writing lots of database blocks when we do CREATE INDEX.
(Note here that we use WAL to chill, even though the main index build
may still skip writing WAL as it does now, if archive_command is not
set).

Note we might fail to build an index because of a uniqueness violation
that occurs *during* the tuplesort run. So we must mark the tuple,
insert the indexes, clean the tuple and then offer the tuple to the
index callback.

We pick option a): continue to allow concurrent index builds, but we
throw an ERROR when an index build finds a HEAP_CHILL_IN_PROGRESS flag.
That's the optimistic approach. Building indexes concurrently only
usually happens after a restore/large load, which will be when there are
no HOT tuples or flags set anyway.

This means we also need to modify the VACUUM process to clean up after a
failed index build. VACUUM will build a second array of tids that can be
used to perform removal of HEAP_CHILL_IN_PROGRESS during
lazy_scan_index() [[perhaps with uncool_tuple()? :-) ]]. We use a second
array because we need only remove HEAP_CHILL_IN_PROGRESS flags when a
tuple will not be removed by the normal vacuum process, so there is by
definition no overlap between the two lists. We use an array of tids to
allow us to reuse all the existing code for heap/index cleaning.

- CREATE INDEX CONCURRENTLY can use the MARK solution as part of the
second scan. After the first scan, the new index will be visible to
other transactions and no new HOT tuples will be created that could
effect the new index. So chilling as part of the second scan will ensure
that we catch all HOT tuples.

- VACUUM FULL - The best solution, for now, is to make VACUUM FULL
perform a reindex on all indexes on the table. Chilling may require us
to modify considerably more index entries than previously. UPDATE & WAIT
would be very good, but probably should wait for the next release now,
since we now have changes to make to 4 utilities.

Are we cool with that?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Latest plans for Utilities with HOT

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> - VACUUM FULL - The best solution, for now, is to make VACUUM FULL> perform a reindex on all
indexeson the table. Chilling may require us> to modify considerably more index entries than previously. UPDATE & WAIT>
wouldbe very good, but probably should wait for the next release now,> since we now have changes to make to 4
utilities.

On my way back home, I was thinking about VACUUM FULL. Is there really
a problem with VACUUM FULL and HOT ? VF moves tuple chains in the
table and does that while holding AccessExclusive lock on the table.

If we prune the HOT-update chains (which we anyways do for lazy vacuum),
we can guarantee that the entire HOT-update chain will be moved, if
any of the tuples in the chain is moved. Also, when VF moves a tuple
chain appropriate index entries are inserted into all the indexes.
If we don't carry the HEAP_HOT_UPDATED or HEAP_ONLY_TUPLE flags
to the moved location, the HOT-update chain will be broken, but
that should not have any correctness problems. If VACUUM FULL
crashes before completeing its operation, the original HOT-update
chain would still remain valid and the new tuples will be discarded.

Is there something I am missing ?

Thanks,
Pavan


EnterpriseDB    http://www.enterprisedb.com



Re: Latest plans for Utilities with HOT

From
"Simon Riggs"
Date:
On Mon, 2007-03-05 at 21:39 +0530, Pavan Deolasee wrote:
> Simon Riggs wrote:
>  >
>  > - VACUUM FULL - The best solution, for now, is to make VACUUM FULL
>  > perform a reindex on all indexes on the table. Chilling may require us
>  > to modify considerably more index entries than previously. UPDATE & WAIT
>  > would be very good, but probably should wait for the next release now,
>  > since we now have changes to make to 4 utilities.
> 
> On my way back home, I was thinking about VACUUM FULL. Is there really
> a problem with VACUUM FULL and HOT ? VF moves tuple chains in the
> table and does that while holding AccessExclusive lock on the table.

No problem, just additional work for little benefit.

> If we prune the HOT-update chains (which we anyways do for lazy vacuum),
> we can guarantee that the entire HOT-update chain will be moved, if
> any of the tuples in the chain is moved. 

Currently each tuple is moved individually. You'd need to inspect the
whole HOT chain on a page, calculate space for that and then try to move
them all in one go. I was originally thinking that would be a problem,
but its not so bad - but it may cause us to end repair_frag() earlier
than we otherwise would depending upon the game of Tetris plays out.

Thats harder than it sounds. My concern is that VACUUM FULL code is
fairly ugly and doing this might introduce a bug into a rarely used
piece of code that we don't pick up. I would not choose that path
myself, but we can go there if the consensus is that doing a reindex
would be the wrong way to go. Reindex will be fast to code and much more
likely to be bug-free.

> Also, when VF moves a tuple
> chain appropriate index entries are inserted into all the indexes.
> If we don't carry the HEAP_HOT_UPDATED or HEAP_ONLY_TUPLE flags
> to the moved location, the HOT-update chain will be broken, but
> that should not have any correctness problems. If VACUUM FULL
> crashes before completeing its operation, the original HOT-update
> chain would still remain valid and the new tuples will be discarded.

You mean all moves will be cold?

So VACUUM FULL will actually bloat the indexes? I hope I've
misunderstood you.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Latest plans for Utilities with HOT

From
"Pavan Deolasee"
Date:
Simon Riggs wrote:
> On Mon, 2007-03-05 at 21:39 +0530, Pavan Deolasee wrote:
>   
> Currently each tuple is moved individually. You'd need to inspect the
> whole HOT chain on a page, calculate space for that and then try to move
> them all in one go. I was originally thinking that would be a problem,
> but its not so bad - but it may cause us to end repair_frag() earlier
> than we otherwise would depending upon the game of Tetris plays out.
>
>   
Umm.. I still need to look deeper to understand the VACUUM FULL code, 
but ISTM
that we can move tuple chains just the way its done today, without 
bothering to keep
HOT-update chains intact. The tuples may actually got into different 
pages and have
equal number of index entries. To my mind, this is not such a big 
problem because
we shouldn't expect too many HOT-update chains while running VACUUM FULL.
Isn't that true ?

> Thats harder than it sounds. My concern is that VACUUM FULL code is
> fairly ugly and doing this might introduce a bug into a rarely used
> piece of code that we don't pick up. I would not choose that path
> myself, but we can go there if the consensus is that doing a reindex
> would be the wrong way to go. Reindex will be fast to code and much more
> likely to be bug-free.
>
>   
Oh I agree. VACUUM FULL code is hard to assimilate. I am hoping that we can
make HOT work fine with VACUUM FULL without too many changes. If thats
not the case, I won't push for it.
> You mean all moves will be cold?
>
> So VACUUM FULL will actually bloat the indexes? I hope I've
> misunderstood you.
>
>   
Yes, thats what I meant. But are we really worried about bloating 
indexes while
VACUUM FULL ? Any other tuple movements would anyway cause index
inserts. And I don't really expect too many tuple chains since VACUUM FULL
runs with AccessExclusive lock on the table.

Thanks,
Pavan


Re: Latest plans for Utilities with HOT

From
"Simon Riggs"
Date:
On Mon, 2007-03-05 at 22:25 +0530, Pavan Deolasee wrote:
> Simon Riggs wrote:
> > On Mon, 2007-03-05 at 21:39 +0530, Pavan Deolasee wrote:
> >   
> > Currently each tuple is moved individually. You'd need to inspect the
> > whole HOT chain on a page, calculate space for that and then try to move
> > them all in one go. I was originally thinking that would be a problem,
> > but its not so bad - but it may cause us to end repair_frag() earlier
> > than we otherwise would depending upon the game of Tetris plays out.
> >
> >   
> Umm.. I still need to look deeper to understand the VACUUM FULL code, 
> but ISTM
> that we can move tuple chains just the way its done today, without 
> bothering to keep
> HOT-update chains intact. The tuples may actually got into different 
> pages and have
> equal number of index entries. To my mind, this is not such a big 
> problem because
> we shouldn't expect too many HOT-update chains while running VACUUM FULL.
> Isn't that true ?

Well, its true enough to be a great argument.

So what you're saying is: we do nothing and it just works. At least not
too badly, and at very least: no worse than it does today.

[Oh dear! I just finished writing prototype of VACUUM FULL-with-reindex
when I read this, so either way it looks like nothing more needed on
this utility. 1 down, 3 to go.]

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com