Re: CREATE INDEX and HOT - revised design - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: CREATE INDEX and HOT - revised design
Date
Msg-id 200703211447.l2LElnh18278@momjian.us
Whole thread Raw
In response to Re: CREATE INDEX and HOT - revised design  (Bruce Momjian <bruce@momjian.us>)
Responses Re: CREATE INDEX and HOT - revised design  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: CREATE INDEX and HOT - revised design  ("Simon Riggs" <simon@2ndquadrant.com>)
Re: CREATE INDEX and HOT - revised design  (Gregory Stark <stark@enterprisedb.com>)
Re: CREATE INDEX and HOT - revised design  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Re: CREATE INDEX and HOT - revised design  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
List pgsql-hackers
Bruce Momjian wrote:
> 
> I have read the HOT discussion and wanted to give my input.  The major
> issue is that CREATE INDEX might require a HOT chain to be split apart
> if one of the new indexed columns changed in the HOT chain.

To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.

I have a new idea.  There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.

A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index.  The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains.  Then doing lookups using that index, the new index
does not follow HOT chains.  We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created.  Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.

A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it.  The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created.  Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index.  Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.  

I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.

OK, so there is my first idea to do this cleanly.  Comments?

---------------------------------------------------------------------------




> As for the outline below, there is no way we are going to add new ALTER
> TABLE and CHILL commands to make this work.  Can you imagine what kind
> of warts we would have in the system if we added such things every time
> we got stuck?  And once the warts are in, it is hard to educate people
> once they are removed.  We need to keep going until we have a solution
> that is as user-invisible as possible.  While I understand the
> frustration that we have not discussed this enough, I don't want us
> rushing to a solution either until it has been totally thought through.
> 
> HOT is not a feature only a few people are going to want to use ---
> everyone will want it, and if the user process is cumbersome, we will get
> never-ending questions about how to make it work.
> 
> Let's all think about this for the next few days.
> 
> ---------------------------------------------------------------------------
> 
> Pavan Deolasee wrote:
> > 
> > 
> > There are  few things I realized over the weekend while going
> > through the code:
> > 
> > 1. It looks like a bad idea to use ALTER TABLE ..  to chill a table
> > becuase ALTER TABLE takes AccessExclusive lock on the table.
> > But it would still be a good idea to have ALTER TABLE .. to turn
> > HOT-updates ON/OFF.
> > 
> > 2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
> > its lock anyways and is prone to deadlock. So as long as we don't
> > create new deadlock scenarios, we should be fine.
> > 
> > 3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
> > So its should be acceptable if we run CHILL as a seperate transaction.
> > 
> > 4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
> > thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
> > VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
> > race conditions between all of these.
> > 
> > 
> > So here is my plan. Let me know your suggestions/comments/objections.
> > 
> > 
> > Changes to pg_class and new DDLs:
> > ---------------------------------
> > 
> > We add two boolean attributes to pg_class: hot_update and hot_fetch.
> > 
> > We introduce two DDLs to turn HOT on/off. Note that the DDL
> > itself does not CHILL the table, but only affects the
> > subsequent UPDATEs.
> > 
> > postgres=# ALTER TABLE <tblname> ENABLE HOT;
> > postgres=# ALTER TABLE <tblname> DISABLE HOT;
> > 
> > These DDLs would acquire AccessExclusive lock on the table and
> > set hot_update to true/false using simple_heap_update()
> > 
> > 
> > CREATE INDEX [CONCURRENTLY]:
> > ----------------------------
> > 
> > If a HEAP_ONLY tuple is found, error out with a HINT to run
> > CHILL on the table and then retry.
> > 
> > If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
> > the table.
> > 
> > 
> > CHILL utility:
> > --------------
> > 
> > We introduce a new command to chill a table. The syntax for the
> > same could be:
> > 
> > postgres=# CHILL [VERBOSE] <tblname>;
> > 
> > UPDATE/INSERT/SELECT would work while the table is being chilled.
> > But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
> > would be locked out. As a side-effect, HOT-updates are turned off on the
> > table and explicit ALTER TABLE ENABLE HOT is required to turn
> > HOT-updates on again.
> > 
> > Here is the algoirthm to CHILL table.
> > 
> > 1. Check if CHILL is running inside a transaction block, error
> > out if so.
> > 
> > 2. Start a new transaction
> > 
> > 3. Acquire ShareUpdateExclusiveLock on the relation. This would
> > allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
> > CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]
> > 
> > 4. Set hot_update to false and update pg_class using
> > simple_heap_update()
> > 
> > 5. Acquire ShareUpdateExclusiveLock for the entire session.
> > 
> > 6. Commit the transaction
> > 
> > 7. Start a new transaction
> > 
> > 8. Wait for all transactions in the current snapshot to finish.
> > This would ensure that there are no HOT-updates possible further
> > 
> > 9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
> > appropriate index entries and setting CHILL_IN_PROGRESS flag.
> > WAL log the operation
> > 
> > 10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
> > be sure whether the corresponding index entry already exists
> > or not. One option is to error out and force VACUUM on the table.
> > Alternatively, the index_insert can be enhanced to check if a
> > the same entry already exists.
> > 
> > 11. When the entire heap is chilled, set hot_fetch to false
> > and update pg_class using simple_heap_update()
> > 
> > 12. Commit the transaction
> > 
> > 13. Start a new transaction
> > 
> > 14. Wait for all transactions in the current snapshot to finish.
> > This would ensure that all the subsequent index scans would
> > only use direct path from the index.
> > 
> > 15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
> > and HOT_UPDATED flags. WAL log the operations. We may not need
> > this, but we can revisit this later to optimize WAL logging.
> > 
> > 16. When the second scan is complete, set hot_fetch to
> > true and update pg_class using simple_heap_update(). There are
> > no HOT_UPDATED tuples in the heap at this moment, but we should
> > reset the state neverthless.
> > 
> > 17. Commit the transaction.
> > 
> > If the CHILL command crashes before completing the operation,
> > we might be left with hot_update/hot_fetch turned OFF. Administrative
> > command is needed to turn them ON again. But there won't be any
> > correctness problems in the meantime.
> > 
> > The uncleaned tuples left with CHILL_IN_PROGRESS flags would
> > require VACUUM for cleanup.
> > 
> > Index Fetch:
> > ------------
> > 
> > If hot_fetch is true, we ignore direct paths from the
> > index to HEAP_ONLY tuples
> > 
> > If hot_fetch is false, we ignore HOT_UPDATED flags
> > and only use direct paths from the index.
> > 
> > VACUUM [FULL]:
> > --------------
> > 
> > If a CHILL_IN_PROGRESS flag found, collect that tuple for
> > index removal irrespective of whether the tuple is DEAD
> > or not. If the tuple is not DEAD, only the CHILL_IN_PROGRESS
> > flag is reset in the second pass.
> > 
> > 
> > Is the plan acceptable ? If there are no objections to the
> > algorithms or the behavior in general, I would start working
> > on this with a target of feature freeze.
> > 
> > 
> > Thanks,
> > Pavan
> > 
> > 
> > -- 
> > 
> > 
> > EnterpriseDB        http://www.enterprisedb.com
> > 
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >        choose an index scan if your joining column's datatypes do not
> >        match
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>          http://momjian.us
>   EnterpriseDB                               http://www.enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: "D'Arcy J.M. Cain"
Date:
Subject: Re: Money type todos?
Next
From: Tom Lane
Date:
Subject: Re: Effects of GUC settings on automatic replans