Re: CREATE INDEX and HOT - revised design - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: CREATE INDEX and HOT - revised design
Date
Msg-id 200703211640.l2LGe2Y25389@momjian.us
Whole thread Raw
In response to Re: CREATE INDEX and HOT - revised design  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Responses Re: CREATE INDEX and HOT - revised design  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
List pgsql-hackers
Pavan Deolasee wrote:
> On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Bruce Momjian wrote:
> > >
> > > I have read the HOT discussion and wanted to give my input.  The major
> > > issue is that CREATE INDEX might require a HOT chain to be split apart
> > > if one of the new indexed columns changed in the HOT chain.
> >
> > To expand a little more, the problem is that when you split those HOT
> > chains, you have to insert new entries into the _existing_ indexes,
> > causing problems for concurrent sequential scans.
> 
> 
> You mean index scans, right ? Sequential scans don't change with HOT.

Sorry, yes, index scans.

> I have a new idea.  There has been a lot of focus on trying to tag each
> > tuple to indicate whether it is part of a HOT chain for individual
> > indexes, the idea being that some indexes will follow the HOT chain, and
> > some will not.
> >
> >
> 
> In general I would like to preserve the HOT properties at the end of
> creation of new index. All index should point to the same root tuple
> if pointing to a HOT-chain. The things might become otherwise messy
> with the line pointer redirection, reuse of LP_DELETEd tuples and
> other things that we have put in.
> 
> Disabling HOT-updates using pg_class attribute has same issue
> with my earlier proposal. What do we do if the backend crashes before
> it can enable it again ? Who would reset the flag ? We could have lived
> without DDL and CHILL command if we would have answers for
> these questions.

Let's focus on the xid idea, as stated in earlier emails.  It has fewer
restrictions.

> A different idea here:
> 
> Would it be acceptable to force CREATE INDEX to be run outside
> a transaction block just like CREATE INDEX CONCURRENTLY ? If thats
> acceptable, we can do the following:

Yea, that is possible, but not ideal.

> CREATE INDEX:
> 
> Since CREATE INDEX locks out table for UPDATEs, we just need
> to wait enough before we start the heap scan so that when we do
> heap scan, all HOT-chains can be pruned to a single tuple (with or
> without the redirected line pointer). So when the new index is built,
> we first prune the entire page of HOT-chains and insert the TID of
> the root tuple into the new index. IOW we need to wait for all
> transactions in the snapshot after acquiring ShareLock but before
> we start heap scan.

I am worried that will require CREATE INDEX to wait for a long time.

> CREATE INDEX CONCURRENTLY:
> 
> Simon has already posted a design that would work with CREATE
> INDEX CONCURRENTLY. I think we need to tweak it a bit so
> that we can handle the HOT-updated tuples after then first heap
> scan, but before the index is visible to all transactions. Once the
> new index is visible, the heap_update() code path would take care
> of avoiding HOT-updates if the column on which new index is being
> built is updated.
> 
> It seems much simpler to me do something like this. But important
> question is whether the restriction that CREATE INDEX can not
> be run in a transaction block is acceptable ?

Is the pg_index xid idea too complex?  It seems to give you the
per-tuple index bit, without the restrictions.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: CREATE INDEX and HOT - revised design
Next
From: Chris Browne
Date:
Subject: TOASTing smaller things