Re: CREATE INDEX and HOT - revised design - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: CREATE INDEX and HOT - revised design
Date
Msg-id 2e78013d0703210933k3dd96b0cka9877431bb66e434@mail.gmail.com
Whole thread Raw
In response to Re: CREATE INDEX and HOT - revised design  (Bruce Momjian <bruce@momjian.us>)
Responses Re: CREATE INDEX and HOT - revised design  (Bruce Momjian <bruce@momjian.us>)
Re: CREATE INDEX and HOT - revised design  ("Merlin Moncure" <mmoncure@gmail.com>)
List pgsql-hackers


On 3/21/07, Bruce Momjian <bruce@momjian.us > wrote:
Bruce Momjian wrote:
>
> I have read the HOT discussion and wanted to give my input.  The major
> issue is that CREATE INDEX might require a HOT chain to be split apart
> if one of the new indexed columns changed in the HOT chain.

To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.

You mean index scans, right ? Sequential scans don't change with HOT.
 

I have a new idea.  There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.



In general I would like to preserve the HOT properties at the end of
creation of new index. All index should point to the same root tuple
if pointing to a HOT-chain. The things might become otherwise messy
with the line pointer redirection, reuse of LP_DELETEd tuples and
other things that we have put in.

Disabling HOT-updates using pg_class attribute has same issue
with my earlier proposal. What do we do if the backend crashes before
it can enable it again ? Who would reset the flag ? We could have lived
without DDL and CHILL command if we would have answers for
these questions.

A different idea here:

Would it be acceptable to force CREATE INDEX to be run outside
a transaction block just like CREATE INDEX CONCURRENTLY ? If thats
acceptable, we can do the following:

CREATE INDEX:

Since CREATE INDEX locks out table for UPDATEs, we just need
to wait enough before we start the heap scan so that when we do
heap scan, all HOT-chains can be pruned to a single tuple (with or
without the redirected line pointer). So when the new index is built,
we first prune the entire page of HOT-chains and insert the TID of
the root tuple into the new index. IOW we need to wait for all
transactions in the snapshot after acquiring ShareLock but before
we start heap scan.

CREATE INDEX CONCURRENTLY:

Simon has already posted a design that would work with CREATE
INDEX CONCURRENTLY. I think we need to tweak it a bit so
that we can handle the HOT-updated tuples after then first heap
scan, but before the index is visible to all transactions. Once the
new index is visible, the heap_update() code path would take care
of avoiding HOT-updates if the column on which new index is being
built is updated.

It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?

Comments ?

Thanks,
Pavan


--

EnterpriseDB     http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: August Zajonc
Date:
Subject: Re: Money type todos?
Next
From: Benjamin Arai
Date:
Subject: SoC Ideas for people looking for projects