Re: HOT for PostgreSQL 8.3 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: HOT for PostgreSQL 8.3
Date
Msg-id 45CB37EF.4060500@enterprisedb.com
Whole thread Raw
In response to Re: HOT for PostgreSQL 8.3  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: HOT for PostgreSQL 8.3  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers
Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> The basic idea is that when a tuple is UPDATEd we can, in certain
>> circumstances, avoid inserting index tuples for a tuple. Such tuples are
>> marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to
>> other tuples.
> 
> What is VACUUM FULL going to do when it wants to move one of these things?

I suppose it could move the whole chain to the same target page, if 
there is one with enough space to accommodate the whole chain. Or we 
could just cop out and not move tuples marked with HEAP_ONLY_TUPLE. I 
think that would be acceptable; after the last transaction that can see 
the old tuple is finished, the old tuple is dead. After that, VACUUM 
FULL can remove the old tuple and move the remaining tuple as usual.

>> CREATE INDEX requires some careful work to allow it to identify and
>> correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as
>> a result of the new index.
> 
> I think you've glossed over the CREATE INDEX problem much too easily.
> The difficulty with that is going to be that de-HOT-ifying a tuple
> is going to require multiple updates that can't possibly be put into
> a single WAL record, and I don't think that WAL replay can clean up
> after an incomplete update (since it can't run user-defined functions
> and hence cannot be expected to compute index entries for itself).
> So I don't think you can do that while preserving crash safety.

Yeah, chilling tuples from HOT state to normal tuples is not easy.

One solution I thought of is to add another flag to heap tuples, 
CHILL_IN_PROGRESS. To chill a tuple, you would:

1. Mark heap tuple with CHILL_IN_PROGRESS
2. Insert missing index entries
3. Clear CHILL_IN_PROGRESS and HEAP_ONLY_TUPLE flags

Index scans would ignore tuples with CHILL_IN_PROGRESS and directly 
pointed to from the index. That way if we crash in the middle of step 2, 
scans and updates would work normally after replay, as if the index 
entries weren't there. CREATE INDEX would have to fail if there's any 
CHILL_IN_PROGRESS tuples, because we wouldn't know which index entries 
need to be inserted; some might already be there. To clear the 
CHILL_IN_PROGRESS flag, a vacuum would be needed. Vacuum would remove 
all index entries for those tuples, but instead of removing the heap 
tuple in the 2nd scan it would just clear the CHILL_IN_PROGRESS flag, 
bringing us back to where we started.


However, the easiest solution would be to make CREATE INDEX wait until 
the old tuple is dead. That should be ok at least for concurrent CREATE 
INDEX, because it already has that kind of a wait between 1st and 2nd 
phase.

>> Removing the root tuple will require a VACUUM *FULL*.
> 
> That seems unacceptable ... it won't take too long for your table to
> fill up with stubs, and we don't want to return to the bad old days
> when periodic VACUUM FULL was unavoidable.

Agreed.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Next
From: Heikki Linnakangas
Date:
Subject: Re: Bitmap index thoughts