Thread: CREATE INDEX and HOT - revised design
There are few things I realized over the weekend while going through the code: 1. It looks like a bad idea to use ALTER TABLE .. to chill a table becuase ALTER TABLE takes AccessExclusive lock on the table. But it would still be a good idea to have ALTER TABLE .. to turn HOT-updates ON/OFF. 2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade its lock anyways and is prone to deadlock. So as long as we don't create new deadlock scenarios, we should be fine. 3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction. So its should be acceptable if we run CHILL as a seperate transaction. 4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and VACUUM FULL. We can do the same for CHILL to avoid any unnecessary race conditions between all of these. So here is my plan. Let me know your suggestions/comments/objections. Changes to pg_class and new DDLs: --------------------------------- We add two boolean attributes to pg_class: hot_update and hot_fetch. We introduce two DDLs to turn HOT on/off. Note that the DDL itself does not CHILL the table, but only affects the subsequent UPDATEs. postgres=# ALTER TABLE <tblname> ENABLE HOT; postgres=# ALTER TABLE <tblname> DISABLE HOT; These DDLs would acquire AccessExclusive lock on the table and set hot_update to true/false using simple_heap_update() CREATE INDEX [CONCURRENTLY]: ---------------------------- If a HEAP_ONLY tuple is found, error out with a HINT to run CHILL on the table and then retry. If CHILL_IN_PROGRESS tuple is found, VACUUM is required on the table. CHILL utility: -------------- We introduce a new command to chill a table. The syntax for the same could be: postgres=# CHILL [VERBOSE] <tblname>; UPDATE/INSERT/SELECT would work while the table is being chilled. But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL would be locked out. 
As a side-effect, HOT-updates are turned off on the table, and an
explicit ALTER TABLE ENABLE HOT is required to turn HOT-updates on
again.

Here is the algorithm to CHILL a table:

1. Check if CHILL is running inside a transaction block; error out if
   so.

2. Start a new transaction.

3. Acquire ShareUpdateExclusiveLock on the relation. This would allow
   concurrent SELECT/INSERT/UPDATE, but lock out concurrent CHILL,
   VACUUM [FULL] and CREATE INDEX [CONCURRENTLY].

4. Set hot_update to false and update pg_class using
   simple_heap_update().

5. Acquire ShareUpdateExclusiveLock for the entire session.

6. Commit the transaction.

7. Start a new transaction.

8. Wait for all transactions in the current snapshot to finish. This
   ensures that no further HOT-updates are possible.

9. Start the first heap scan; CHILL HEAP_ONLY tuples by inserting
   appropriate index entries and setting the CHILL_IN_PROGRESS flag.
   WAL-log the operation.

10. If CHILL_IN_PROGRESS is already set on the tuple, we can't be sure
    whether the corresponding index entry already exists or not. One
    option is to error out and force VACUUM on the table.
    Alternatively, index_insert can be enhanced to check whether the
    same entry already exists.

11. When the entire heap is chilled, set hot_fetch to false and update
    pg_class using simple_heap_update().

12. Commit the transaction.

13. Start a new transaction.

14. Wait for all transactions in the current snapshot to finish. This
    ensures that all subsequent index scans use only the direct path
    from the index.

15. Start the second heap scan. Reset the CHILL_IN_PROGRESS, HEAP_ONLY
    and HOT_UPDATED flags. WAL-log the operations. We may not need
    this, but we can revisit it later to optimize WAL logging.

16. When the second scan is complete, set hot_fetch to true and update
    pg_class using simple_heap_update(). There are no HOT_UPDATED
    tuples in the heap at this moment, but we should reset the state
    nevertheless.

17. Commit the transaction.
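The two heap scans above can be sketched as a toy simulation. This is
illustrative Python, not backend code: only the flag names come from
the proposal, and the data structures (a dict of TID to flag set, a
plain set as the "index") are invented.

```python
# Toy simulation of the two-pass CHILL algorithm.
HEAP_ONLY = "HEAP_ONLY"
HOT_UPDATED = "HOT_UPDATED"
CHILL_IN_PROGRESS = "CHILL_IN_PROGRESS"

def chill(heap, index):
    # First scan (step 9): insert an index entry for every HEAP_ONLY
    # tuple and mark it CHILL_IN_PROGRESS (WAL-logged in reality).
    for tid, flags in heap.items():
        if HEAP_ONLY in flags:
            if CHILL_IN_PROGRESS in flags:
                # Step 10: can't tell whether the entry already exists.
                raise RuntimeError("VACUUM required")
            index.add(tid)
            flags.add(CHILL_IN_PROGRESS)
    # (The real algorithm commits here and waits out the snapshot,
    # steps 11-14, so index scans stop following HOT chains.)
    # Second scan (step 15): clear all HOT-related flags.
    for flags in heap.values():
        flags -= {CHILL_IN_PROGRESS, HEAP_ONLY, HOT_UPDATED}

heap = {1: {HOT_UPDATED}, 2: {HEAP_ONLY}, 3: set()}
index = {1, 3}  # the HEAP_ONLY tuple 2 starts with no index entry
chill(heap, index)
assert index == {1, 2, 3}
assert all(not flags for flags in heap.values())
```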
If the CHILL command crashes before completing the operation, we might
be left with hot_update/hot_fetch turned OFF. An administrative command
is needed to turn them ON again, but there won't be any correctness
problems in the meantime.

The uncleaned tuples left with CHILL_IN_PROGRESS flags would require
VACUUM for cleanup.

Index Fetch:
------------

If hot_fetch is true, we ignore direct paths from the index to
HEAP_ONLY tuples.

If hot_fetch is false, we ignore HOT_UPDATED flags and only use direct
paths from the index.

VACUUM [FULL]:
--------------

If a CHILL_IN_PROGRESS flag is found, collect that tuple for index
removal irrespective of whether the tuple is DEAD or not. If the tuple
is not DEAD, only the CHILL_IN_PROGRESS flag is reset in the second
pass.

Is the plan acceptable? If there are no objections to the algorithms or
the behavior in general, I would start working on this with a target of
feature freeze.

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
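The index-fetch rule above can be sketched as follows, assuming a toy
heap where each TID maps to (flags, next-TID-in-chain); all names here
are invented for illustration and unrelated to the real executor code.

```python
# Toy sketch of the proposed hot_fetch rule.
def fetch_targets(index_entries, hot_fetch, heap):
    visited = []
    for tid in index_entries:
        flags, _next = heap[tid]
        if hot_fetch:
            # hot_fetch = true: ignore direct paths to HEAP_ONLY tuples;
            # they are reached by following the chain from the root.
            if "HEAP_ONLY" in flags:
                continue
            while tid is not None:        # follow the HOT chain
                visited.append(tid)
                tid = heap[tid][1]
        else:
            # hot_fetch = false: direct path only, HOT_UPDATED ignored.
            visited.append(tid)
    return visited

# Root tuple 1 was HOT-updated to HEAP_ONLY tuple 2.
heap = {1: ({"HOT_UPDATED"}, 2), 2: ({"HEAP_ONLY"}, None)}
assert fetch_targets([1], hot_fetch=True, heap=heap) == [1, 2]
# After CHILL, tuple 2 has its own index entry and chains are ignored:
assert fetch_targets([1, 2], hot_fetch=False, heap=heap) == [1, 2]
```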
I have read the HOT discussion and wanted to give my input. The major
issue is that CREATE INDEX might require a HOT chain to be split apart
if one of the new indexed columns changed in the HOT chain.

As for the outline below, there is no way we are going to add new ALTER
TABLE and CHILL commands to make this work. Can you imagine what kind
of warts we would have in the system if we added such things every time
we got stuck? And once the warts are in, it is hard to re-educate
people after they are removed. We need to keep going until we have a
solution that is as user-invisible as possible. While I understand the
frustration that we have not discussed this enough, I don't want us
rushing to a solution either until it has been thought through
completely.

HOT is not a feature only a few people are going to want to use ---
everyone will want it, and if the user-facing process is cumbersome, we
will get never-ending questions about how to make it work.

Let's all think about this for the next few days.

---------------------------------------------------------------------------

Pavan Deolasee wrote:
> There are few things I realized over the weekend while going
> through the code: [...]

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> I have read the HOT discussion and wanted to give my input. The major
> issue is that CREATE INDEX might require a HOT chain to be split apart
> if one of the new indexed columns changed in the HOT chain.

To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.

I have a new idea. There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain,
and some will not.

A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.

A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.
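One way the two proposed booleans could drive the UPDATE path is
sketched below. This is a toy Python sketch with invented names
(plan_update and its arguments are not PostgreSQL code); only the idea
that a pg_class flag forbids new HOT chains while a fresh index is
being absorbed comes from the proposal above.

```python
# Sketch: decide HOT vs cold for one UPDATE under the boolean scheme.
def plan_update(changed_indexed_cols, table_allows_hot, indexes):
    """Return (is_hot, indexes_to_insert_into) for one UPDATE."""
    is_hot = table_allows_hot and not changed_indexed_cols
    # A HOT update inserts no index entries; a cold one hits every index.
    return is_hot, ([] if is_hot else list(indexes))

# While the new index is being absorbed, the pg_class flag forbids HOT:
assert plan_update(set(), False, ["i_old", "i_new"]) == (False, ["i_old", "i_new"])
# Once all pre-index HOT chains are dead, HOT is re-enabled:
assert plan_update(set(), True, ["i_old", "i_new"]) == (True, [])
```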
I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.

OK, so there is my first idea to do this cleanly. Comments?

---------------------------------------------------------------------------

> As for the outline below, there is no way we are going to add new ALTER
> TABLE and CHILL commands to make this work. [...]
>
> Pavan Deolasee wrote:
> > There are few things I realized over the weekend while going
> > through the code: [...]

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> A different idea is to flag the _index_ as using HOT for the table or
> not, using a boolean in pg_index. The idea is that when a new index is
> created, it has its HOT boolean set to false and indexes all tuples and
> ignores HOT chains. Then doing lookups using that index, the new index
> does not follow HOT chains. We also add a boolean to pg_class to
> indicate no new HOT chains should be created and set that to false once
> the new index is created. Then, at some later time when all HOT chains
> are dead, we can enable HOT chain following for the new index and allow
> new HOT chains to be created.

When exactly would all HOT chains be dead? AFAICS, that would be after
the xid of CREATE INDEX gets older than the oldest xmin, and VACUUM is
run to prune and pointer-swing all HOT chains.

Would we have to wait after setting the new forbid_hot_updates flag in
pg_class, to make sure everyone sees the change? What if CREATE INDEX
crashes --- would we need a vacuum to reset the flag?

> A more sophisticated idea would be to place an xid, rather than a
> boolean, in pg_index to indicate which chains were created after the
> index was created to control whether the index should follow that HOT
> chain, or ignore it. The xmax of the head of the HOT chain can be used
> as an indicator of when the chain was created. Transactions started
> before the pg_index xid could continue following the old rules and
> insert into the _new_ index for HOT chain additions, and new
> transactions would create HOT chains that could skip adding to the new
> index. Cleanup of the hybrid HOT chains (some indexes take part, some
> do not) would be more complex.

What xid would you place in pg_index? The xid of the transaction
running CREATE INDEX, ReadNewTransactionId(), or what?

How does that work if you have a transaction that begins before CREATE
INDEX, and updates something after CREATE INDEX?
> I know we have xid wrap-around, but I think the VACUUM FREEZE could
> handle it by freezing the pg_index xid column value when it does the
> table.

I don't think you can freeze the xid column; we went through a similar
discussion on pg_class.relfrozenxid. But you can move it forward to the
oldest xmin.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Wed, 2007-03-21 at 10:47 -0400, Bruce Momjian wrote:
> A different idea is to flag the _index_ as using HOT for the table or
> not, using a boolean in pg_index. The idea is that when a new index is
> created, it has its HOT boolean set to false and indexes all tuples and
> ignores HOT chains. Then doing lookups using that index, the new index
> does not follow HOT chains. We also add a boolean to pg_class to
> indicate no new HOT chains should be created and set that to false once
> the new index is created. Then, at some later time when all HOT chains
> are dead, we can enable HOT chain following for the new index and allow
> new HOT chains to be created.

"Enable HOT chain following" would require us to hold an
AccessExclusiveLock on the index. We know that solution exists; the
question is: at what point would we ever request that lock? Or would we
just wait until that lock is next taken before enabling it, giving the
user no control over when it's taken?

A separate DDL command would be effectively the same as what Pavan has
recently suggested.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
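The locking claims in this subthread can be checked against the
documented table-level lock conflict matrix. Below is a minimal Python
sketch of the relevant fragment (illustrative only; the authoritative
matrix lives in the backend lock manager):

```python
# Fragment of PostgreSQL's documented lock conflict matrix.
CONFLICTS = {
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "AccessExclusiveLock": {
        "AccessShareLock", "RowShareLock", "RowExclusiveLock",
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
}

def conflicts(held, requested):
    return requested in CONFLICTS.get(held, set())

# CHILL's ShareUpdateExclusiveLock lets UPDATE (RowExclusiveLock) proceed
# but blocks a second CHILL, VACUUM, or CREATE INDEX [CONCURRENTLY]:
assert not conflicts("ShareUpdateExclusiveLock", "RowExclusiveLock")
assert conflicts("ShareUpdateExclusiveLock", "ShareUpdateExclusiveLock")
# An AccessExclusiveLock (as for "enable HOT chain following") blocks
# even plain readers:
assert conflicts("AccessExclusiveLock", "AccessShareLock")
```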
Sorry, I was a bit too quick to respond; I didn't understand at first
how this differs from Pavan's and Simon's proposals. Let me answer my
own questions.

Heikki Linnakangas wrote:
> Bruce Momjian wrote:
>> A different idea is to flag the _index_ as using HOT for the table or
>> not, using a boolean in pg_index. The idea is that when a new index is
>> created, it has its HOT boolean set to false and indexes all tuples and
>> ignores HOT chains. Then doing lookups using that index, the new index
>> does not follow HOT chains. We also add a boolean to pg_class to
>> indicate no new HOT chains should be created and set that to false once
>> the new index is created. Then, at some later time when all HOT chains
>> are dead, we can enable HOT chain following for the new index and allow
>> new HOT chains to be created.
>
> When exactly would all HOT chains be dead? AFAICS, that would be after
> the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run
> to prune and pointer-swing all HOT chains.

I still think that's true.

> Would we have to wait after setting the new forbid_hot_updates-flag in
> pg_class, to make sure everyone sees the change? What if CREATE INDEX
> crashes, would we need a vacuum to reset the flag?

You wouldn't need any extra waits to set the forbid_hot_updates flag:
CREATE INDEX locks the table and already sends relcache invalidations
to make the new index visible, and CREATE INDEX CONCURRENTLY waits
already.

>> A more sophisticated idea would be to place an xid, rather than a
>> boolean, in pg_index to indicate which chains were created after the
>> index was created to control whether the index should follow that HOT
>> chain, or ignore it. The xmax of the head of the HOT chain can be used
>> as an indicator of when the chain was created.
>> Transactions started
>> before the pg_index xid could continue following the old rules and
>> insert into the _new_ index for HOT chain additions, and new
>> transactions would create HOT chains that could skip adding to the new
>> index. Cleanup of the hybrid HOT chains (some indexes take part, some
>> do not) would be more complex.
>
> What xid would you place in pg_index? Xid of the transaction running
> CREATE INDEX, ReadNewTransactionId() or what?

Apparently ReadNewTransactionId(), to make sure there are no existing
tuples with an xmax smaller than that.

> How does that work if you have a transaction that begins before CREATE
> INDEX, and updates something after CREATE INDEX?

You actually explained that above... The HOT_UPDATED flag on a tuple
would basically mean that all indexes with xid < xmax don't contain an
index pointer for the tuple, and all others do. When inserting new
updated tuples, we'd also need to maintain that invariant.

>> I know we have xid wrap-around, but I think the VACUUM FREEZE could
>> handle it by freezing the pg_index xid column value when it does the
>> table.
>
> I don't think you can freeze the xid-column, we went through a similar
> discussion on pg_class.relfrozenxid. But you can move it forward to
> oldest xmin.

You could actually "freeze" the column, because unlike relfrozenxid we
never need to unfreeze it.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
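The invariant stated above can be written as a small predicate. This is
a Python sketch with invented names; the xids are plain integers,
whereas real code would need wraparound-aware comparisons.

```python
# Sketch of the invariant: an index whose pg_index xid is < the xmax of
# the HOT chain's head predates the chain, so it holds no direct pointer
# to the chained tuple (it reaches it via the root); any newer index
# points at every tuple version directly.
def index_has_direct_pointer(index_xid, head_xmax):
    return index_xid >= head_xmax

assert not index_has_direct_pointer(index_xid=100, head_xmax=200)  # old index
assert index_has_direct_pointer(index_xid=300, head_xmax=200)      # new index
```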
"Bruce Momjian" <bruce@momjian.us> writes:

> We also add a boolean to pg_class to indicate no new HOT chains should be
> created and set that to false once the new index is created.

Since we have all the index info in the relcache, we could just skim
through all the indexes when we build the relcache and decide then
whether we're allowed to do HOT updates. That avoids problems if we
crash while HOT updates are disabled.

I think we need to think harder about exactly what test we would
perform against this xid to determine the two relevant things:

a) whether to prohibit HOT updates (because the index is "too new")

b) whether to ignore HOT update chains when we use the index (because
   it's "too new" and any HOT update chains predate it).

I fear it may imply that we have to keep performing cold updates until
the first vacuum after the xid expires.

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
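One guess at what the relcache-derived test might look like, as a toy
Python sketch. All names are invented, and this is only one possible
reading of test (a) above; wraparound-aware xid comparison is omitted.

```python
# Derive HOT-update permission from the index list each time the
# relcache entry is built -- no persistent flag to be left stale by a
# crash.  HOT updates stay prohibited while any index is "too new",
# i.e. newer than the oldest transaction still running.
def hot_updates_allowed(index_creation_xids, oldest_xmin):
    return all(xid < oldest_xmin for xid in index_creation_xids)

assert hot_updates_allowed([50, 60], oldest_xmin=100)       # all indexes old
assert not hot_updates_allowed([50, 120], oldest_xmin=100)  # one too new
```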
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > A different idea is to flag the _index_ as using HOT for the table or
> > not, using a boolean in pg_index. The idea is that when a new index is
> > created, it has its HOT boolean set to false and indexes all tuples and
> > ignores HOT chains. Then doing lookups using that index, the new index
> > does not follow HOT chains. We also add a boolean to pg_class to
> > indicate no new HOT chains should be created and set that to false once
> > the new index is created. Then, at some later time when all HOT chains
> > are dead, we can enable HOT chain following for the new index and allow
> > new HOT chains to be created.
>
> When exactly would all HOT chains be dead? AFAICS, that would be after
> the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run
> to prune and pointer-swing all HOT chains.

Yes, I think so. We might need to have pg_autovacuum involved in
checking pg_index and doing a vacuum when needed, but that is only for
the boolean idea; the xid idea doesn't need it, I think.

> Would we have to wait after setting the new forbid_hot_updates-flag in
> pg_class, to make sure everyone sees the change? What if CREATE INDEX
> crashes, would we need a vacuum to reset the flag?

Uh, I am thinking we would just set it at the start of CREATE INDEX,
but actually, if we do the pg_index xid idea, we don't need to tell
them anything --- once they see the new index in pg_index, they will
automatically know whether to include the new index in the HOT tests
based on its own xid. Right now, if a new index is created, existing
backends know they need to insert into it, so it would be the same
mechanism, except they have to check the pg_index xid field.

If we use the xid idea and CREATE INDEX crashes, there is no problem,
because all the backends just keep going and never see the new pg_index
xid entry.
> > A more sophisticated idea would be to place an xid, rather than a
> > boolean, in pg_index to indicate which chains were created after the
> > index was created to control whether the index should follow that HOT
> > chain, or ignore it. The xmax of the head of the HOT chain can be used
> > as an indicator of when the chain was created. Transactions started
> > before the pg_index xid could continue following the old rules and
> > insert into the _new_ index for HOT chain additions, and new
> > transactions would create HOT chains that could skip adding to the new
> > index. Cleanup of the hybrid HOT chains (some indexes take part, some
> > do not) would be more complex.
>
> What xid would you place in pg_index? Xid of the transaction running
> CREATE INDEX, ReadNewTransactionId() or what?

I would say the current xid counter value at the time of index creation
completion, so currently running transactions are unaffected, but newly
started transactions create HOT chains that include the new index, and
we just clean up the old HOT chains as the currently running
transactions complete.

> How does that work if you have a transaction that begins before CREATE
> INDEX, and updates something after CREATE INDEX?

OK, that uses the old rules of always inserting into the new index,
even if it is creating a HOT chain. The bottom line is that we would
now have chains with different indexes involved, and the HOT head xmax
compared to pg_index tells us which indexes are involved in the chain.

> > I know we have xid wrap-around, but I think the VACUUM FREEZE could
> > handle it by freezing the pg_index xid column value when it does the
> > table.
>
> I don't think you can freeze the xid-column, we went through a similar
> discussion on pg_class.relfrozenxid. But you can move it forward to
> oldest xmin.

Sounds good.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
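Following the rule quoted from the first outline (transactions started
before the pg_index xid insert into the new index even for HOT-chain
additions; later transactions may skip it), the per-transaction
decision might be sketched as below. Invented names, plain integer
xids, wraparound ignored.

```python
# Per-transaction decision under the xid-in-pg_index scheme.
def hot_chain_includes_new_index(txn_start_xid, index_xid):
    """True if this transaction must insert into the new index even for
    HOT-chain additions (it started before the pg_index xid, so it
    follows the old rules); False if its chains may skip the index."""
    return txn_start_xid < index_xid

assert hot_chain_includes_new_index(txn_start_xid=90, index_xid=100)
assert not hot_chain_includes_new_index(txn_start_xid=110, index_xid=100)
```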
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> Bruce Momjian wrote:
> >
> > I have read the HOT discussion and wanted to give my input. The major
> > issue is that CREATE INDEX might require a HOT chain to be split apart
> > if one of the new indexed columns changed in the HOT chain.
>
> To expand a little more, the problem is that when you split those HOT
> chains, you have to insert new entries into the _existing_ indexes,
> causing problems for concurrent sequential scans.
You mean index scans, right ? Sequential scans don't change with HOT.
> I have a new idea. There has been a lot of focus on trying to tag each
> tuple to indicate whether it is part of a HOT chain for individual
> indexes, the idea being that some indexes will follow the HOT chain, and
> some will not.
In general I would like to preserve the HOT properties at the end of
creating the new index. All indexes should point to the same root tuple
if they point to a HOT-chain. Things might otherwise become messy
with the line pointer redirection, reuse of LP_DELETEd tuples and
other things that we have put in.
Disabling HOT-updates using a pg_class attribute has the same issue
as my earlier proposal. What do we do if the backend crashes before
it can enable it again? Who would reset the flag? We could have done
without the DDL and CHILL commands if we had answers to these
questions.
A different idea here:
Would it be acceptable to force CREATE INDEX to be run outside
a transaction block, just like CREATE INDEX CONCURRENTLY? If that's
acceptable, we can do the following:
CREATE INDEX:
Since CREATE INDEX locks out the table for UPDATEs, we just need
to wait long enough before starting the heap scan so that, by the time
we scan, every HOT-chain can be pruned to a single tuple (with or
without the redirected line pointer). So when the new index is built,
we first prune each page's HOT-chains and insert the TID of
the root tuple into the new index. IOW, we need to wait for all
transactions in the snapshot after acquiring ShareLock but before
we start the heap scan.
CREATE INDEX CONCURRENTLY:
Simon has already posted a design that would work with CREATE
INDEX CONCURRENTLY. I think we need to tweak it a bit so
that we can handle the HOT-updated tuples after the first heap
scan, but before the index is visible to all transactions. Once the
new index is visible, the heap_update() code path would take care
of avoiding HOT-updates if the column on which the new index is
built is updated.
It seems much simpler to me to do something like this. But the important
question is whether the restriction that CREATE INDEX cannot
be run in a transaction block is acceptable.
Comments?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Wed, 2007-03-21 at 10:47 -0400, Bruce Momjian wrote: > > > A different idea is to flag the _index_ as using HOT for the table or > > not, using a boolean in pg_index. The idea is that when a new index is > > created, it has its HOT boolean set to false and indexes all tuples and > > ignores HOT chains. Then doing lookups using that index, the new index > > does not follow HOT chains. We also add a boolean to pg_class to > > indicate no new HOT chains should be created and set that to false once > > the new index is created. Then, at some later time when all HOT chains > > are dead, we can enable HOT chain following for the new index and allow > > new HOT chains to be created. > > "enable HOT chain following" would require us to hold an > AccessExclusiveLock on the index. OK. > We know that solution exists, the question is: at what point would we > ever request that lock? Or would we just wait until that lock is next > taken before enabling it, giving the user no control over when its > taken? A separate DDL command would be effectively the same as what > Pavan has recently suggested. We could just request a lock and if it fails, try again later; it just delays HOT updates, which is not a big problem. Allowing the user to do it via a separate command seems to add nothing. The only thing I could see would be adding an option to CREATE INDEX which waits and then does it. However, given recent comments, I think the xid idea, while more complicated, is better because it has fewer restrictions on when things happen. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Ah, sounds like you have the idea clearly now. Great. --------------------------------------------------------------------------- Heikki Linnakangas wrote: > Sorry, I was a bit too quick to respond. I didn't understand at first > how this differs from Pavan's/Simon's proposals. > > Let me answer my own questions. > > Heikki Linnakangas wrote: > > Bruce Momjian wrote: > >> A different idea is to flag the _index_ as using HOT for the table or > >> not, using a boolean in pg_index. The idea is that when a new index is > >> created, it has its HOT boolean set to false and indexes all tuples and > >> ignores HOT chains. Then doing lookups using that index, the new index > >> does not follow HOT chains. We also add a boolean to pg_class to > >> indicate no new HOT chains should be created and set that to false once > >> the new index is created. Then, at some later time when all HOT chains > >> are dead, we can enable HOT chain following for the new index and allow > >> new HOT chains to be created. > > > > When exactly would all HOT chains be dead? AFAICS, that would be after > > the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run > > to prune and pointer-swing all HOT chains. > > I still think that's true. > > > Would we have to wait after setting the new forbid_hot_updates-flag in > > pg_class, to make sure everyone sees the change? What if CREATE INDEX > > crashes, would we need a vacuum to reset the flag? > > You wouldn't need to do any extra waits to set the forbid_hot_updates > flag, CREATE INDEX locks the table and already sends a relcache > invalidations to make the new index visible. CREATE INDEX CONCURRENTLY > waits already. > > >> A more sophisticated idea would be to place an xid, rather than a > >> boolean, in pg_index to indicate which chains were created after the > >> index was created to control whether the index should follow that HOT > >> chain, or ignore it. 
The xmax of the head of the HOT chain can be used > >> as an indicator of when the chain was created. Transactions started > >> before the pg_index xid could continue following the old rules and > >> insert into the _new_ index for HOT chain additions, and new > >> transactions would create HOT chains that could skip adding to the new > >> index. Cleanup of the hybrid HOT chains (some indexes take part, some > >> do not) would be more complex. > > > > What xid would you place in pg_index? Xid of the transaction running > > CREATE INDEX, ReadNewTransactionId() or what? > > Apparently ReadNewTransactionId to make sure there's no existing tuples > with an xmax smaller than that. > > > How does that work if you have a transaction that begins before CREATE > > INDEX, and updates something after CREATE INDEX? > > You actually explained that above... > > The HOT_UPDATED flag on a tuple would basically mean that all indexes > with xid < xmax doesn't contain an index pointer for the tuple, and all > others do. When inserting new updated tuples, we'd also need to maintain > that invariant. > > >> I know we have xid wrap-around, but I think the VACUUM FREEZE could > >> handle it by freezing the pg_index xid column value when it does the > >> table. > > > > I don't think you can freeze the xid-column, we went through a similar > > discussion on pg_class.relfrozenxid. But you can move it forward to > > oldest xmin. > > You could actually "freeze" the column, because unlike relfrozenid we > never need to unfreeze it. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
As stated in later email, I think we should focus on the xid idea because it is more flexible. --------------------------------------------------------------------------- Gregory Stark wrote: > > "Bruce Momjian" <bruce@momjian.us> writes: > > > We also add a boolean to pg_class to indicate no new HOT chains should be > > created and set that to false once the new index is created. > > Since we have all the index info in the relcache we could just skim through > all the indexes when we build the relcache and decide then whether we're > allowed to do HOT updates. That avoids problems if we crash while HOT updates > are disabled. > > I think we need to think harder about exactly what test we would perform > against this xid to determine the two relevant tests, > > a) whether to prohibit HOT updates (because the index is "too new") > > b) whether to ignore HOT update chains when we use the index (because it's > "too new" and any HOT update chains predate it). > > I fear it may imply that we have to keep performing cold updates until the > first vacuum after the xid expires. > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Pavan Deolasee wrote: > On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote: > > > > Bruce Momjian wrote: > > > > > > I have read the HOT discussion and wanted to give my input. The major > > > issue is that CREATE INDEX might require a HOT chain to be split apart > > > if one of the new indexed columns changed in the HOT chain. > > > > To expand a little more, the problem is that when you split those HOT > > chains, you have to insert new entries into the _existing_ indexes, > > causing problems for concurrent sequential scans. > > > You mean index scans, right ? Sequential scans don't change with HOT. Sorry, yes, index scans. > I have a new idea. There has been a lot of focus on trying to tag each > > tuple to indicate whether it is part of a HOT chain for individual > > indexes, the idea being that some indexes will follow the HOT chain, and > > some will not. > > > > > > In general I would like to preserve the HOT properties at the end of > creation of new index. All index should point to the same root tuple > if pointing to a HOT-chain. The things might become otherwise messy > with the line pointer redirection, reuse of LP_DELETEd tuples and > other things that we have put in. > > Disabling HOT-updates using pg_class attribute has same issue > with my earlier proposal. What do we do if the backend crashes before > it can enable it again ? Who would reset the flag ? We could have lived > without DDL and CHILL command if we would have answers for > these questions. Let's focus on the xid idea, as stated in earlier emails. It has fewer restrictions. > A different idea here: > > Would it be acceptable to force CREATE INDEX to be run outside > a transaction block just like CREATE INDEX CONCURRENTLY ? If thats > acceptable, we can do the following: Yea, that is possible, but not ideal. 
> CREATE INDEX: > > Since CREATE INDEX locks out table for UPDATEs, we just need > to wait enough before we start the heap scan so that when we do > heap scan, all HOT-chains can be pruned to a single tuple (with or > without the redirected line pointer). So when the new index is built, > we first prune the entire page of HOT-chains and insert the TID of > the root tuple into the new index. IOW we need to wait for all > transactions in the snapshot after acquiring ShareLock but before > we start heap scan. I am worried that will require CREATE INDEX to wait for a long time. > CREATE INDEX CONCURRENTLY: > > Simon has already posted a design that would work with CREATE > INDEX CONCURRENTLY. I think we need to tweak it a bit so > that we can handle the HOT-updated tuples after then first heap > scan, but before the index is visible to all transactions. Once the > new index is visible, the heap_update() code path would take care > of avoiding HOT-updates if the column on which new index is being > built is updated. > > It seems much simpler to me do something like this. But important > question is whether the restriction that CREATE INDEX can not > be run in a transaction block is acceptable ? Is the pg_index xid idea too complex? It seems to give you the per-tuple index bit, without the restrictions. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > As stated in later email, I think we should focus on the xid idea > because it is more flexible. Sorry if I was unclear. I agree, my comment and questions are all predicated on the assumption that we would go with xids. > --------------------------------------------------------------------------- > > Gregory Stark wrote: >> >> "Bruce Momjian" <bruce@momjian.us> writes: >> >> > We also add a boolean to pg_class to indicate no new HOT chains should be >> > created and set that to false once the new index is created. >> >> Since we have all the index info in the relcache we could just skim through >> all the indexes when we build the relcache and decide then whether we're >> allowed to do HOT updates. That avoids problems if we crash while HOT updates >> are disabled. >> >> I think we need to think harder about exactly what test we would perform >> against this xid to determine the two relevant tests, >> >> a) whether to prohibit HOT updates (because the index is "too new") >> >> b) whether to ignore HOT update chains when we use the index (because it's >> "too new" and any HOT update chains predate it). >> >> I fear it may imply that we have to keep performing cold updates until the >> first vacuum after the xid expires. >> >> -- >> Gregory Stark >> EnterpriseDB http://www.enterprisedb.com >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 1: if posting/reading through Usenet, please send an appropriate >> subscribe-nomail command to majordomo@postgresql.org so that your >> message can get through to the mailing list cleanly > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://www.enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> I am worried that will require CREATE INDEX to wait for a long time.
Not unless there are long running transactions. We are not waiting
for the lock, but only for the current transactions to finish.
> Is the pg_index xid idea too complex? It seems to give you the
> per-tuple index bit, without the restrictions.
How do we handle HEAP_ONLY tuple cleanup? If I understood
the proposal correctly, at the end of CREATE INDEX a HEAP_ONLY
tuple may have a pointer from the new index, isn't that right?
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > As stated in later email, I think we should focus on the xid idea > > because it is more flexible. > > Sorry if I was unclear. I agree, my comment and questions are all predicated > on the assumption that we would go with xids. OK, replies below. > > Gregory Stark wrote: > >> > >> "Bruce Momjian" <bruce@momjian.us> writes: > >> > >> > We also add a boolean to pg_class to indicate no new HOT chains should be > >> > created and set that to false once the new index is created. > >> > >> Since we have all the index info in the relcache we could just skim through > >> all the indexes when we build the relcache and decide then whether we're > >> allowed to do HOT updates. That avoids problems if we crash while HOT updates > >> are disabled. Uh, with xid, we don't disable HOT updates at all --- we just control, via xids, which indexes are part of which HOT update chain. > >> I think we need to think harder about exactly what test we would perform > >> against this xid to determine the two relevant tests, > >> > >> a) whether to prohibit HOT updates (because the index is "too new") Never. > >> b) whether to ignore HOT update chains when we use the index (because it's > >> "too new" and any HOT update chains predate it). We have to look at the xmax of the HOT chain head and compare that to the index xid. > >> I fear it may imply that we have to keep performing cold updates until the > >> first vacuum after the xid expires. Yes, effectively, that is true, because once we have the new index in place, we can't reuse the old HOT chains --- we need VACUUM at some point, but new HOT chains can be created. Actually, since we have a _write_ lock on the table, there are no table changes during index creation, so we only need to worry about changes after the index is created. After the index is created, old HOT chains have to be cleaned up via VACUUM because we have the new index pointing into the HOT chain. 
Existing transactions can't create new HOT chains because they would have to insert into the new index (because of their xid), so they might as well not use HOT. New transactions can create new HOT chains because their xid marks them as including the new index in the index group. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
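Bruce's two rules here (existing transactions keep inserting into the new index and so gain nothing from HOT; new transactions may skip it) amount to a small predicate at update time. A sketch in Python with invented names, purely illustrative:

```python
def may_do_hot_update(updater_xid, index_created_xids, indexed_col_changed):
    """Can this UPDATE be a HOT update under the pg_index xid scheme?

    A transaction older than some index's creation xid would have to
    insert into that index anyway, so it might as well do a cold update;
    and, as always, changing an indexed column rules HOT out entirely.
    """
    if indexed_col_changed:
        return False
    return all(updater_xid >= created for created in index_created_xids)
```

A transaction with xid 600 can still do HOT updates over an index completed at xid 500; one with xid 300 cannot, matching "existing transactions can't create new HOT chains".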
Pavan Deolasee wrote: > On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote: > > > > > > > > I am worried that will require CREATE INDEX to wait for a long time. > > > > Not unless there are long running transactions. We are not waiting > for the lock, but only for the current transactions to finish. Waiting for all transactions might take a while, no? > > Is the pg_index xid idea too complex? It seems to give you the > > per-tuple index bit, without the restrictions. > > > > How do we handle HEAP_ONLY tuple cleanup ? If I understood > the proposal correctly, at the end of the create index, a HEAP_ONLY > tuple may have pointer from the new index, isn't it ? Right. You would need vacuum to clean up the HEAP_ONLY tuples. I just sent an email about those details. We might have autovacuum check pg_index and do it automatically. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Effectively, my idea is not to chill/break the HOT chains during index creation, but rather to abandon them and wait for VACUUM to clean them up. My idea is much closer to the idea of a bit per index on every tuple, except the tuple xmax and pg_index xid replace them. --------------------------------------------------------------------------- Bruce Momjian wrote: > Pavan Deolasee wrote: > > On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote: > > > > > > > > > > > > I am worried that will require CREATE INDEX to wait for a long time. > > > > > > > > Not unless there are long running transactions. We are not waiting > > for the lock, but only for the current transactions to finish. > > Waiting for all transactions might take a while, no? > > > > Is the pg_index xid idea too complex? It seems to give you the > > > per-tuple index bit, without the restrictions. > > > > > > > > How do we handle HEAP_ONLY tuple cleanup ? If I understood > > the proposal correctly, at the end of the create index, a HEAP_ONLY > > tuple may have pointer from the new index, isn't it ? > > Right. You would need vacuum to clean up the HEAP_ONLY tuples. I just > sent an email about those deails. We might have autovacuum check > pg_index and do it automatically. > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://www.enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > It seems much simpler to me do something like this. But important > question is whether the restriction that CREATE INDEX can not > be run in a transaction block is acceptable ? yikes -- this is huge, huge price to pay, IMHO. Think about DDL that implies index creation such as adding unique constraint to a table, many postgresql users (including me) take advantage of that in update systems to production databases. merlin
On 3/21/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> > It seems much simpler to me do something like this. But important
> > question is whether the restriction that CREATE INDEX can not
> > be run in a transaction block is acceptable ?
> yikes -- this is huge, huge price to pay, IMHO. Think about DDL that
> implies index creation such as adding unique constraint to a table,
> many postgresql users (including me) take advantage of that in update
> systems to production databases.
I didn't quite understand that. How does it help to run CREATE
INDEX inside a transaction?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
> Effectively, my idea is not to chill/break the HOT chains during index
> creation, but rather to abandon them and wait for VACUUM to clean them
> up.
>
> My idea is much closer to the idea of a bit per index on every tuple,
> except the tuple xmax and pg_index xid replace them.
Waiting inside CREATE INDEX for transactions started
before us doesn't seem like a big problem to me. But maybe I am
mistaken.
Given a choice, I wouldn't complicate the fetch/update logic further. I
would rather fix the problem at hand, if I can. But I will go by the
community agreement on this issue since I don't have a clear
answer.
Also, I am wondering whether the information about which index was used
to fetch a tuple is always available. I haven't checked, but do we have
that information in a lossy bitmap heap scan?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > On 3/21/07, Merlin Moncure <mmoncure@gmail.com> wrote: >> >> On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: >> > It seems much simpler to me do something like this. But important >> > question is whether the restriction that CREATE INDEX can not >> > be run in a transaction block is acceptable ? >> >> yikes -- this is huge, huge price to pay, IMHO. Think about DDL that >> implies index creation such as adding unique constraint to a table, >> many postgresql users (including me) take advantage of that in update >> systems to production databases. >> > I didn't understand that quite well. How does it help to run CREATE > INDEX inside a transaction ? The problem is not so much CREATE INDEX per se, but other DDL commands that implicitly create an index, for example adding a PRIMARY KEY to a table. Some applications that I have written would fail badly if CREATE INDEX was disallowed inside a transaction - mostly, because they use plpgsql functions to manipulate database objects, and disallowing CREATE INDEX inside a transaction presumably means disallowing it from inside stored procedures. greetings, Florian Pflug
On 3/21/07, Florian G. Pflug <fgp@phlo.org> wrote: > Pavan Deolasee wrote: > > On 3/21/07, Merlin Moncure <mmoncure@gmail.com> wrote: > >> > >> On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > >> > It seems much simpler to me do something like this. But important > >> > question is whether the restriction that CREATE INDEX can not > >> > be run in a transaction block is acceptable ? > >> > >> yikes -- this is huge, huge price to pay, IMHO. Think about DDL that > >> implies index creation such as adding unique constraint to a table, > >> many postgresql users (including me) take advantage of that in update > >> systems to production databases. > >> > > I didn't understand that quite well. How does it help to run CREATE > > INDEX inside a transaction ? > The problem is not so much CREATE INDEX per se, but other DDL commands > that implicitly create an index, for example adding a PRIMARY KEY to a > table. > > Some applications that I have written would fail badly if CREATE INDEX > was disallowed inside a transaction - mostly, because they use plpgsql > functions to manipulate database objects, and disallowing CREATE INDEX > inside a transaction presumably means disallowing it from inside > stored procedures. speaking with pavan off list he seems to think that only 'create index' is outside transaction, not the other ddl flavors of it because they are generally acquiring a excl lock. so, in that sense it is possibly acceptable to me although still a pretty tough pill to swallow (thinking, guc time). It would also preclude ever integrating vanilla 'create index' to create table command, fwiw. merlin
Pavan Deolasee wrote: > On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote: > > > > > > Effectively, my idea is not to chill/break the HOT chains during index > > creation, but rather to abandon them and wait for VACUUM to clean them > > up. > > > > My idea is much closer to the idea of a bit per index on every tuple, > > except the tuple xmax and pg_index xid replace them. > > > > > > > Regarding waiting inside CREATE INDEX for transactions started > before us doesn't seem like a big problem to me. But may be I am > mistaken. > > Given a choice I won't complicate the fetch/update logic more. I would > rather fix the problem at hand, if I can. But I would go by the > community agreement on this issue since I don't have a clear > answer. Yep, we all want is simple. If we can do the xid idea, we hopefully fix the concurrent index scan and crash recovery issues. > Also, I am wondering whether the information that which index is used to > fetch a tuple is always available. I haven't checked, but do we have that > information in lossy bitmap heapscan ? Oh, that is an interesting problem because an index might have one index entry representing an entire HOT chain, while another index might represent each chain member by individual index entries. When we do the bitmaps, don't we access them by heap tid, meaning we would find all entries anyway? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > Also, I am wondering whether the information that which index is used to > > fetch a tuple is always available. I haven't checked, but do we have that > > information in lossy bitmap heapscan ? > > Oh, that is an interesting problem because an index might have one index > entry representing an entire HOT chain, while another index might > represent each chain member by individual index entries. When we do the > bitmaps, don't we access them by heap tid, meaning we would find all > entries anyway? I thinking some more, it would be a problem because while we are merging the tids, we are using index entries and haven't looked at the heap yet. I am guessing we would have to exclude the new index from bitmap joins with other indexes until the VACUUM happens. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Bruce Momjian wrote: > > > Also, I am wondering whether the information that which index is used to > > > fetch a tuple is always available. I haven't checked, but do we have that > > > information in lossy bitmap heapscan ? > > > > Oh, that is an interesting problem because an index might have one index > > entry representing an entire HOT chain, while another index might > > represent each chain member by individual index entries. When we do the > > bitmaps, don't we access them by heap tid, meaning we would find all > > entries anyway? > > I thinking some more, it would be a problem because while we are merging > the tids, we are using index entries and haven't looked at the heap yet. > I am guessing we would have to exclude the new index from bitmap joins > with other indexes until the VACUUM happens. Thinking some more, bitmap scans have a mode that tracks just the page numbers, rather than the tids --- if the index visibilities do not match, we would need to fall back to that mode. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, 2007-03-21 at 13:29 -0400, Bruce Momjian wrote: > Pavan Deolasee wrote: > > On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote: > > > > > > > > > > > > I am worried that will require CREATE INDEX to wait for a long time. > > > > > > > > Not unless there are long running transactions. We are not waiting > > for the lock, but only for the current transactions to finish. > > Waiting for all transactions might take a while, no? It would be nice if we could force CREATE INDEX to only run outside a transaction block, but that just isn't possible if you care about existing applications. Any PL function running a CREATE INDEX would fail and I'm pretty sure there's lots of them. Many of them use dynamic SQL, so you wouldn't even be able to scan them prior to run-time to know that they would fail. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Bruce Momjian wrote: > Bruce Momjian wrote: >> Bruce Momjian wrote: >>>> Also, I am wondering whether the information that which index is used to >>>> fetch a tuple is always available. I haven't checked, but do we have that >>>> information in lossy bitmap heapscan ? >>> Oh, that is an interesting problem because an index might have one index >>> entry representing an entire HOT chain, while another index might >>> represent each chain member by individual index entries. When we do the >>> bitmaps, don't we access them by heap tid, meaning we would find all >>> entries anyway? >> I thinking some more, it would be a problem because while we are merging >> the tids, we are using index entries and haven't looked at the heap yet. >> I am guessing we would have to exclude the new index from bitmap joins >> with other indexes until the VACUUM happens. > > Thinking some more, bitmap scans have a mode that tracks just the page > numbers, rather than the tids --- if the index visibilities do not > match, we would need to fall back to that mode. You don't need to scan the whole page like in the lossy bitmap mode, just all the tuples in the HOT-chain. You need to somehow pass the information that multiple indexes have been used in the bitmap scan to the bitmap heapscan node, so that it knows when the extra checking is required. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Bruce Momjian wrote: > > Bruce Momjian wrote: > >> Bruce Momjian wrote: > >>>> Also, I am wondering whether the information that which index is used to > >>>> fetch a tuple is always available. I haven't checked, but do we have that > >>>> information in lossy bitmap heapscan ? > >>> Oh, that is an interesting problem because an index might have one index > >>> entry representing an entire HOT chain, while another index might > >>> represent each chain member by individual index entries. When we do the > >>> bitmaps, don't we access them by heap tid, meaning we would find all > >>> entries anyway? > >> I thinking some more, it would be a problem because while we are merging > >> the tids, we are using index entries and haven't looked at the heap yet. > >> I am guessing we would have to exclude the new index from bitmap joins > >> with other indexes until the VACUUM happens. > > > > Thinking some more, bitmap scans have a mode that tracks just the page > > numbers, rather than the tids --- if the index visibilities do not > > match, we would need to fall back to that mode. > > You don't need to scan the whole page like in the lossy bitmap mode, > just all the tuples in the HOT-chain. > > You need to somehow pass the information that multiple indexes have been > used in the bitmap scan to the bitmap heapscan node, so that it knows > when the extra checking is required. That might be confusing because you are going to have some tids that are chains, and some that aren't. The hard part is making sure you don't include the same tid twice. Another idea is to set pg_index xid to FrozenTransactionId once the VACUUM happens, and if it not frozen, do something special for bitmap scans. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > You don't need to scan the whole page like in the lossy bitmap mode, > > just all the tuples in the HOT-chain. > > > > You need to somehow pass the information that multiple indexes have been > > used in the bitmap scan to the bitmap heapscan node, so that it knows > > when the extra checking is required. > > That might be confusing because you are going to have some tids that are > chains, and some that aren't. The hard part is making sure you don't > include the same tid twice. > > Another idea is to set pg_index xid to FrozenTransactionId once the > VACUUM happens, and if it not frozen, do something special for bitmap > scans. Anyway, perhaps we can leave the bitmap scan part to someone more familiar with that part of the code, like Tom. HOT is a huge feature, so let's keep working on it. The patch doesn't have to be 100% on April 1, but we certainly want to try to get as far as we can by then. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > Anyway, perhaps we can leave the bitmap scan part to someone more > familiar with that part of the code, like Tom. > > > Yeah. We all know he doesn't have enough to do ... cheers andrew
> speaking with pavan off list he seems to think that only 'create > index' is outside transaction, not the other ddl flavors of it because > they are generally acquiring a excl lock. so, in that sense it is > possibly acceptable to me although still a pretty tough pill to > swallow (thinking, guc time). It would also preclude ever integrating > vanilla 'create index' to create table command, fwiw. Just to signal that it is in use: we did use create index in transactions occasionally when we had to do DB schema upgrade on production systems for application upgrades which span multiple versions of our application (normally we upgrade versions one by one, but we have some systems which are upgraded rarely). In these occasions it was riskier than usually to run the cumulated upgrade scripts outside a transaction block. But that was mostly a convenience feature, we could always rearrange our upgrade scripts to do all the rest first and then all the index creation at the end if all the rest succeeded... but if implicit index creation fails (e.g. when adding a new field to a table which happens also to be a primary key) inside the transaction, that would hurt... mostly in more work/more risks of extended downtime, but it will have a factor of inconvenience. Cheers, Csaba.
On 3/22/07, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> speaking with pavan off list he seems to think that only 'create
> index' is outside transaction, not the other ddl flavors of it because
> they are generally acquiring a excl lock. so, in that sense it is
> possibly acceptable to me although still a pretty tough pill to
> swallow (thinking, guc time). It would also preclude ever integrating
> vanilla 'create index' to create table command, fwiw.
Just to signal that it is in use: we did use create index in
transactions occasionally when we had to do DB schema upgrade on
production systems for application upgrades which span multiple versions
of our application (normally we upgrade versions one by one, but we have
some systems which are upgraded rarely). In these occasions it was
riskier than usually to run the cumulated upgrade scripts outside a
transaction block.
But that was mostly a convenience feature, we could always rearrange our
upgrade scripts to do all the rest first and then all the index creation
at the end if all the rest succeeded... but if implicit index creation
fails (e.g. when adding a new field to a table which happens also to be
a primary key) inside the transaction, that would hurt... mostly in more
work/more risks of extended downtime, but it will have a factor of
inconvenience.
What I am hearing from many users is that it's probably not such
a nice thing to impose such a restriction. That's fair. It really helps to think
about a solution once you know what is acceptable and what is not.
I am back to the drawing board.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > What I am hearing from many users is that its probably not such > a nice thing to put such restriction. Thats fair. It really helps to think > about a solution once you know what is acceptable and what is not. That's likely to be the reaction for almost any restriction you can imagine. Performance improvements are great, but you can't ask people for whom current performance is adequate to pay a price in functionality for them. cheers andrew
On 3/22/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > On 3/22/07, Csaba Nagy <nagy@ecircle-ag.com> wrote: > > > speaking with pavan off list he seems to think that only 'create > > > index' is outside transaction, not the other ddl flavors of it because > > > they are generally acquiring a excl lock. so, in that sense it is > > > possibly acceptable to me although still a pretty tough pill to > > > swallow (thinking, guc time). It would also preclude ever integrating > > > vanilla 'create index' to create table command, fwiw. > > > > Just to signal that it is in use: we did use create index in > > transactions occasionally when we had to do DB schema upgrade on > > production systems for application upgrades which span multiple versions > > of our application (normally we upgrade versions one by one, but we have > > some systems which are upgraded rarely). In these occasions it was > > riskier than usually to run the cumulated upgrade scripts outside a > > transaction block. > > > > But that was mostly a convenience feature, we could always rearrange our > > upgrade scripts to do all the rest first and then all the index creation > > at the end if all the rest succeeded... but if implicit index creation > > fails (e.g. when adding a new field to a table which happens also to be > > a primary key) inside the transaction, that would hurt... mostly in more > > work/more risks of extended downtime, but it will have a factor of > > inconvenience. > > > > > > What I am hearing from many users is that its probably not such > a nice thing to put such restriction. Thats fair. It really helps to think > about a solution once you know what is acceptable and what is not. > > I am back to the drawing board. 
The objections to 'create index' being forced non-transactional come, from what I can see, from two general cases: * update scripts * functions that create tables, etc. ISTM that in both cases nobody would complain too much if the lock was escalated to an exclusive lock. So, maybe an alternative solution is this: * Make create index/create index concurrently extra-transactional as Pavan suggested. * Introduce index creation ability to alter/create table. If, as Pavan suggests, this integrates well with HOT due to the excl lock, alter table can be left transactional. Now, there is at least an alternative path to take in use cases previously covered by create index + transaction. And there is a clean break between classic DDL (alter table, etc) and non-transactional maintenance commands, like vacuum, which create index becomes part of. merlin
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.
A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.
Bruce, thanks for bringing up this idea.
As I think more about this idea, I think I am able to extend this further
to solve the problems we discussed around it. One of my concerns
was that we should not change the basic structure of the heap
with HOT-chains, so that it remains possible to just look at a
heap tuple and say whether it has any index pointer or not.
The way I propose to extend/modify the idea is to use pg_index xid
as suggested by Bruce to mark the index. This xid would guide the
visibility of the index. As we all know, CREATE INDEX locks out
UPDATEs on the table and further UPDATEs are possible only after
the transaction creating the new index commits.
When CREATE INDEX starts, it acquires ShareLock on the table.
At this point we may have one or more HOT-update chains in the
table. Tuples in this chain may be visible to one or more running
transactions. The fact that we have ShareLock on the table means
that all tuples in the chain except the one at the head are either
RECENTLY_DEAD or were UPDATEd by the same transaction
that is now running CREATE INDEX.
With this background, I propose to index ONLY the head of the
HOT-chain. The TID of the root tuple is used instead of the actual
TID of the tuple being indexed. This index will not be available to
the transactions which are started before the CREATE INDEX
transaction. Just like we use "indisvalid" flag to avoid including
an invalid index in the plan, we use the pg_index "xid" to decide
whether to use the index in the plan or not. Only transactions with
txid > pg_index:xid can see the index and use it.
In fact, serializable transactions started before CREATE INDEX
cannot see the index anyway, so all this is done to handle
read-committed transactions.
In this proposal we index only the latest version. But none of the
transactions started after CREATE INDEX can see the
older tuples anyway, and hence we should be fine even if we don't index
them in the new index. And none of the older transactions can see
the index, so again we are safe. The design also helps us to
preserve the heap HOT semantics and chain pruning and does not
need VACUUM or any special handling.
Can anyone spot a hole in this logic ? Comments ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
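Pavan's plan-time gating rule (only transactions with txid > pg_index xid see the index) can be sketched roughly as follows. This is an illustration only: `Xid` as a plain integer and the function name are invented, and real xids wrap around and need circular comparison, which is deliberately ignored here to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Xid;
#define INVALID_XID 0u        /* stand-in for InvalidTransactionId */

/* Sketch of the plan-time gate: the new index, which covers only the
 * head of each HOT chain, may appear in a plan only for transactions
 * that started after the xid recorded in its pg_index row.  A plain
 * '>' is used here only for illustration; real xid comparison must
 * handle wraparound. */
static bool
index_usable_by(Xid index_xid, Xid my_txid)
{
    if (index_xid == INVALID_XID)
        return true;          /* no HOT chains existed at build time */
    return my_txid > index_xid;
}
```

The INVALID_XID case models a table that had no HOT chains when the index was built, where no gating is needed and the index is usable immediately.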
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > When CREATE INDEX starts, it acquires ShareLock on the table. > At this point we may have one or more HOT-update chains in the > table. Tuples in this chain may be visible to one or more running > transactions. The fact that we have ShareLock on the table means > that all tuples in the chain except the one at the head either > RECENTLY_DEAD or were UPDATEd by the same transaction > that is now running CREATE INDEX. This logic seems pretty questionable to me in view of the recent VACUUM FULL bug fixes. We now know that an update chain can contain tuples that appear DEAD but are later than ones that are RECENTLY_DEAD. How are you defining a HOT chain exactly --- will it be just a part of an update chain that consists only of contiguous tuples that appear live or RECENTLY_DEAD to the CREATE INDEX transaction? > In fact, the serializable transactions started before CREATE INDEX > can not anyway see the index so all this is done to handle > read-committed transactions. You are laboring under an illusion that system catalog accesses are MVCC. SnapshotNow does not behave that way: the system can see the new index as soon as it's committed. (It had better, since it has to start updating the index immediately, whether it's safe to scan it or not.) I'm not sure whether that's fundamental to your argument or not, but it's certainly wrong. regards, tom lane
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> When CREATE INDEX starts, it acquires ShareLock on the table.
> At this point we may have one or more HOT-update chains in the
> table. Tuples in this chain may be visible to one or more running
> transactions. The fact that we have ShareLock on the table means
> that all tuples in the chain except the one at the head either
> RECENTLY_DEAD or were UPDATEd by the same transaction
> that is now running CREATE INDEX.
This logic seems pretty questionable to me in view of the recent VACUUM
FULL bug fixes. We now know that an update chain can contain tuples that
appear DEAD but are later than ones that are RECENTLY_DEAD. How are
you defining a HOT chain exactly --- will it be just a part of an
update chain that consists only of contiguous tuples that appear live or
RECENTLY_DEAD to the CREATE INDEX transaction?
A HOT-chain is a run of contiguous tuples that share
the same index keys for all the existing indexes and share a single
index entry. It does not really matter if there are RECENTLY_DEAD
tuples that appear before a DEAD tuple because from the VACUUM FULL
bug fix we know that they are DEAD too.
My argument is that it's enough to index only the LIVE tuple which
is at the end of the chain if we don't use the new index for queries
in transactions which were started before CREATE INDEX. I am
proposing to do that by storing an xid in the pg_index row. A
special case is where a tuple is UPDATEd multiple times by
the same transaction which is also creating the index, in which case
there are more than one LIVE versions of the tuple. But again
we are safe by indexing only the latest version because all other
versions would be invisible (even to us) once CREATE INDEX commits.
> In fact, the serializable transactions started before CREATE INDEX
> can not anyway see the index so all this is done to handle
> read-committed transactions.
You are laboring under an illusion that system catalog accesses are MVCC.
SnapshotNow does not behave that way: the system can see the new index
as soon as it's committed. (It had better, since it has to start
updating the index immediately, whether it's safe to scan it or not.)
I'm not sure whether that's fundamental to your argument or not, but
it's certainly wrong.
Oh, thanks for pointing that out. But that's certainly not fundamental
to the argument as you probably already guessed. The xid still controls
the usage of the index for query planning, somewhat similar to the "indisvalid"
flag for CREATE INDEX CONCURRENTLY.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
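The chain definition Pavan gives above (contiguous tuples sharing the same index keys for all existing indexes, behind a single index entry) implies a simple eligibility test at UPDATE time, sketched here with invented names and keys modeled as plain ints. The real test also requires room for the new version on the same heap page, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* An UPDATE can extend a HOT chain only if no indexed column changes,
 * so every index entry pointing at the chain stays valid for all
 * members.  Names and types are invented for illustration. */
static bool
update_can_be_hot(const int *old_keys, const int *new_keys, int nindexed)
{
    for (int i = 0; i < nindexed; i++)
        if (old_keys[i] != new_keys[i])
            return false;     /* indexed key changed: cold update */
    return true;
}
```
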
Pavan Deolasee wrote: > My argument is that its enough to index only the LIVE tuple which > is at the end of the chain if we don't use the new index for queries > in transactions which were started before CREATE INDEX. I am > proposing to do that by storing an xid in the pg_index row. A > special case is where a tuple is UPDATEd multiple times by > the same transaction which is also creating the index, in which case > there are more than one LIVE versions of the tuple. But again > we are safe by indexing only the latest version because all other > versions would be invisible (even to us) once CREATE INDEX commits. What if CREATE INDEX is run in a SERIALIZABLE transaction? > > In fact, the serializable transactions started before CREATE INDEX > > > can not anyway see the index so all this is done to handle > > > read-committed transactions. > > > > You are laboring under an illusion that system catalog accesses are MVCC. > > SnapshotNow does not behave that way: the system can see the new index > > as soon as it's committed. (It had better, since it has to start > > updating the index immediately, whether it's safe to scan it or not.) > > I'm not sure whether that's fundamental to your argument or not, but > > it's certainly wrong. > > > > Oh, thanks for pointing that out. But thats certainly not fundamental > to the argument as you probably already guessed. The xid still controls > the usage of index for query planning, somewhat similar to "isindvalid" > flag for CREATE INDEX CONCURRENTLY. I am glad you found the pg_index xid actually helps in other ways. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, 2007-03-22 at 22:11 +0530, Pavan Deolasee wrote: > With this background, I propose to index ONLY the head of the > HOT-chain. The TID of the root tuple is used instead of the actual > TID of the tuple being indexed. This index will not be available to > the transactions which are started before the CREATE INDEX > transaction. Just like we use "indisvalid" flag to avoid including > an invalid index in the plan, we use the pg_index "xid" to decide > whether to use the index in the plan or not. Only transactions with > txid > pg_index:xid can see the index and use it. > > In fact, the serializable transactions started before CREATE INDEX > can not anyway see the index so all this is done to handle > read-committed transactions. > > In this proposal we indexed only the latest version. But none of the > transactions started after CREATE INDEX can anyway see the > older tuples and hence we should be fine even if we don't index > them in the new index. And none of the older transaction can see > the index, so again we are safe. The design also helps us to > preserve the heap HOT semantics and chain pruning and does not > need VACUUM or any special handling. Well, ISTM you've nailed it. CREATE INDEX returns as soon as possible, but people will have to wait for their next transaction before they can see it and use it too. Nice role reversal to avoid having CREATE INDEX wait. No restrictions on the number of indexes, no restrictions on multiple concurrent index builders and we can do this in just one pass. The ShareLock taken by CREATE INDEX guarantees all transactions that wrote data to the table have completed and that no new data can be added until after the index build commits. So the end of the chain is visible to CREATE INDEX and won't change. As long as you index the latest committed version on each HOT chain, then I think it works. Clearly want to ignore aborted versions. 
Sounds like you'll need to read the HOT chains in sequence to ensure we don't repeat the VACUUM FULL error. If there are no HOT chains then it will be just a normal seq scan of each block, so there's no real speed loss for situations where no HOT updates have taken place, such as reload from pg_dump. Sounds like you'll need to store the Next TransactionId rather than the TransactionId of the CREATE INDEX. We don't need to store the ComboId as well, since all commands are planned in ComboId sequence, assuming plan invalidation blows away any earlier plans held by our own backend. There is a slight hole in that SERIALIZABLE transactions won't be able to use any indexes they build during their transaction, since they may need to be able to see prior data, but I don't think anybody is going to complain about that restriction. Anyone? Maybe we can use this technique for CREATE INDEX CONCURRENTLY as well, so that it doesn't have to wait either. That needs some careful thinking... it may not work the same because of the locking differences. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
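Simon's rule above (index the latest committed version on each HOT chain, ignoring aborted versions, while pointing the index entry at the root TID) might look like this at build time. This is a toy chain walk with invented types, not the actual index-build code.

```c
#include <assert.h>

#define NO_NEXT (-1)

typedef enum { TUP_DEAD, TUP_LIVE, TUP_ABORTED } TupStateSim;

typedef struct {
    TupStateSim state;
    int next;                 /* next chain member, or NO_NEXT */
} ChainTupleSim;

/* Walk the chain from its root and pick the member the new index
 * should cover: the latest live version, skipping aborted ones.  The
 * index entry itself would carry the root's TID, not this member's. */
static int
choose_member_to_index(const ChainTupleSim *chain, int root)
{
    int best = NO_NEXT;
    for (int off = root; off != NO_NEXT; off = chain[off].next)
        if (chain[off].state == TUP_LIVE)
            best = off;       /* a later live version supersedes */
    return best;
}
```

Reading each chain in sequence like this is also what avoids repeating the VACUUM FULL error: dead members between live ones are simply skipped, never indexed.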
"Simon Riggs" <simon@2ndquadrant.com> writes: > There is a slight hole in that SERIALIZABLE transactions won't be able > to use any indexes they build during their transaction, since they may > need to be able to see prior data, but I don't think anybody is going to > complain about that restriction. Anyone? Practically every statement I've seen in this thread that used the phrase "SERIALIZABLE transaction" was wrong to some extent, and this one is no different. The issue is not whether the whole transaction is serializable or not, it's how old is the oldest still-live snapshot, a thing that CREATE INDEX can't tell with any certainty in READ COMMITTED mode. So if your solution involves any explicit dependence on the transaction serializability mode, it's probably wrong. I'm not totally sure if you are expecting to be able to tell that, but I do know that the planner has no idea what snapshots a plan it makes will be used with. regards, tom lane
On Thu, 2007-03-22 at 16:16 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > There is a slight hole in that SERIALIZABLE transactions won't be able > > to use any indexes they build during their transaction, since they may > > need to be able to see prior data, but I don't think anybody is going to > > complain about that restriction. Anyone? > > Practically every statement I've seen in this thread that used the > phrase "SERIALIZABLE transaction" was wrong to some extent, and this > one is no different. > > The issue is not whether the whole transaction is serializable or not, > it's how old is the oldest still-live snapshot, a thing that CREATE > INDEX can't tell with any certainty in READ COMMITTED mode. So if your > solution involves any explicit dependence on the transaction > serializability mode, it's probably wrong. I'm not totally sure if you > are expecting to be able to tell that, but I do know that the planner > has no idea what snapshots a plan it makes will be used with. Thanks for correcting me. Reworded: There is a slight hole in that snapshots older than the CREATE INDEX must never be allowed to use the index. That means that SERIALIZABLE transactions and some other situations will need to be restricted. Personally, I would argue that such a restriction was an acceptable loss of functionality, since I can't think of a situation where such a thing would need to occur, though one may turn up. Currently, I don't know how to prevent this from happening. We'll need to examine this in more detail to see if there is a way. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Thu, 2007-03-22 at 16:16 -0400, Tom Lane wrote: > >> "Simon Riggs" <simon@2ndquadrant.com> writes: >> >>> There is a slight hole in that SERIALIZABLE transactions won't be able >>> to use any indexes they build during their transaction, since they may >>> need to be able to see prior data, but I don't think anybody is going to >>> complain about that restriction. Anyone? >>> >> Practically every statement I've seen in this thread that used the >> phrase "SERIALIZABLE transaction" was wrong to some extent, and this >> one is no different. >> >> The issue is not whether the whole transaction is serializable or not, >> it's how old is the oldest still-live snapshot, a thing that CREATE >> INDEX can't tell with any certainty in READ COMMITTED mode. So if your >> solution involves any explicit dependence on the transaction >> serializability mode, it's probably wrong. I'm not totally sure if you >> are expecting to be able to tell that, but I do know that the planner >> has no idea what snapshots a plan it makes will be used with. >> > > Thanks for correcting me. > > Reworded: There is a slight hole in that snapshots older than the CREATE > INDEX must never be allowed to use the index. That means that > SERIALIZABLE transactions and some other situations will need to be > restricted. Personally, I would argue that such a restriction was an > acceptable loss of functionality, since I can't think of a situation > where such a thing would need to occur, though one may turn up. > > Currently, I don't know how to prevent this from happening. We'll need > to examine this in more detail to see if there is a way. > I have seen and used transactions that create indexes inside a transaction, use them for the life of the transaction, and then drop them at the end. I think this is an acceptable solution to not be able to use the index in the current transaction if the table you are building the index on is HOT enabled. 
That way it is not really a functionality loss; it's just a restriction put in place if you are using a certain feature. We do not want to go breaking existing code. However, if HOT is enabled by default on tables, then we have a different situation. And if the expectation is that HOT will be enabled by default in future releases, then this needs to be considered now. Regards Russell Smith
On 3/23/07, Simon Riggs <simon@2ndquadrant.com> wrote:
The ShareLock taken by CREATE INDEX guarantees all transactions that
wrote data to the table have completed and that no new data can be added
until after the index build commits. So the end of the chain is visible
to CREATE INDEX and won't change. As long as you index the latest
committed version on each HOT chain, then I think it works.
It's slightly different for the HOT-chains created by this transaction which
is creating the index. We should index the latest version of the row which
is not yet committed. But that's ok because when CREATE INDEX commits
this latest version would also get committed.
Sounds like you'll need to store the Next TransactionId rather than the
TransactionId of the CREATE INDEX.
Yes, I agree.
There is a slight hole in that SERIALIZABLE transactions won't be able
to use any indexes they build during their transaction, since they may
need to be able to see prior data, but I don't think anybody is going to
complain about that restriction. Anyone?
Oh, I did not see that. If there are no HOT-chains in the table, we can
set the xid to InvalidTransactionId so that the index is usable immediately
after it is created, both in the current transaction and in transactions
which were started before CREATE INDEX. We can possibly further
improve it by checking whether the only HOT-chains are those created
by this transaction, and set the xid to InvalidTransactionId in that case too.
IMO with that we shall address most of the use cases. There are a few which
might still be impacted, but even for them there won't be any correctness
problem.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Ühel kenal päeval, N, 2007-03-22 kell 23:30, kirjutas Pavan Deolasee: > > > On 3/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > > When CREATE INDEX starts, it acquires ShareLock on the > table. > > At this point we may have one or more HOT-update chains in > the > > table. Tuples in this chain may be visible to one or more > running > > transactions. The fact that we have ShareLock on the table > means > > that all tuples in the chain except the one at the head > either > > RECENTLY_DEAD or were UPDATEd by the same transaction > > that is now running CREATE INDEX. > > This logic seems pretty questionable to me in view of the > recent VACUUM > FULL bug fixes. We now know that an update chain can contain > tuples that > appear DEAD but are later than ones that are > RECENTLY_DEAD. How are > you defining a HOT chain exactly --- will it be just a part of > an > update chain that consists only of contiguous tuples that > appear live or > RECENTLY_DEAD to the CREATE INDEX transaction? > > > HOT-chain is something which contains contiguous tuples that share > the same index keys for all the existing indexes and share a single > index entry. It does not really matter if there are RECENTLY_DEAD > tuples that appear before a DEAD tuple because from VACUUM FULL > bug fix we know that they are DEAD too. > > My argument is that its enough to index only the LIVE tuple which > is at the end of the chain if we don't use the new index for queries > in transactions which were started before CREATE INDEX. You mean, which were started before CREATE INDEX completes ? Just wait for all concurrent transactions to complete before marking the index as usable in plans. > I am > proposing to do that by storing an xid in the pg_index row. I don't think it is a good idea to store xid's anywhere but in xmin/xmax columns, as doing so would cause nasty xid wraparound problems. 
Instead you should wait, after completing the index, for all concurrent transactions to end before you mark the index as "usable for queries", similar to the way CREATE INDEX CONCURRENTLY does. > A > special case is where a tuple is UPDATEd multiple times by > the same transaction which is also creating the index, in which case > there are more than one LIVE versions of the tuple. But again > we are safe by indexing only the latest version because all other > versions would be invisible (even to us) once CREATE INDEX commits. > > > > In fact, the serializable transactions started before CREATE > INDEX > > can not anyway see the index so all this is done to handle > > read-committed transactions. > > You are laboring under an illusion that system catalog > accesses are MVCC. > SnapshotNow does not behave that way: the system can see the > new index > as soon as it's committed. (It had better, since it has to > start > updating the index immediately, whether it's safe to scan it > or not.) > I'm not sure whether that's fundamental to your argument or > not, but > it's certainly wrong. > > > Oh, thanks for pointing that out. But thats certainly not fundamental > to the argument as you probably already guessed. The xid still > controls the usage of index for query planning, somewhat similar to > "isindvalid" flag for CREATE INDEX CONCURRENTLY. Xids are unstable and will come back to bite you after 2G transactions. Why not just use the "indisvalid" flag? -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
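Hannu's "2G transactions" warning comes from xids being 32-bit counters compared modulo 2^31, in the style of PostgreSQL's TransactionIdPrecedes(). A raw xid stored in pg_index and compared with a plain '<' goes wrong once the counter wraps, as this small illustration shows (the typedef and function are simplified stand-ins, not the actual source):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Circular comparison in the style of TransactionIdPrecedes():
 * 'a' precedes 'b' if the signed 32-bit difference is negative. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}
```

After wraparound, a stored xid of 3000000000 is logically older than a new xid of 100, but a naive unsigned comparison says the opposite. VACUUM freezes xmin/xmax to keep those columns safe across wraparound; an xid parked in pg_index gets no such maintenance, which is Hannu's point.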
On Thu, 2007-03-22 at 07:09, Andrew Dunstan wrote: > Pavan Deolasee wrote: > > What I am hearing from many users is that its probably not such > > a nice thing to put such restriction. Thats fair. It really helps to think > > about a solution once you know what is acceptable and what is not. > > > That's likely to be the reaction for almost any restriction you can > imagine. Performance improvements are great, but you can't ask people > for whom current performance is adequate to pay a price in functionality > for them. An easy solution would be to not enable HOT by default, so people who don't want to pay the price of no in-transaction CREATE INDEX for HOT improvements don't have to. > cheers > > andrew -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
On Wed, 2007-03-21 at 14:06, Merlin Moncure wrote:
> On 3/21/07, Florian G. Pflug <fgp@phlo.org> wrote:
> > Pavan Deolasee wrote:
> > > On 3/21/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> > > > On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> > > > > It seems much simpler to me to do something like this. But the important
> > > > > question is whether the restriction that CREATE INDEX can not
> > > > > be run in a transaction block is acceptable ?
> > > >
> > > > yikes -- this is a huge, huge price to pay, IMHO. Think about DDL that
> > > > implies index creation such as adding a unique constraint to a table;
> > > > many postgresql users (including me) take advantage of that in update
> > > > systems to production databases.
> > >
> > > I didn't understand that quite well. How does it help to run CREATE
> > > INDEX inside a transaction ?
> >
> > The problem is not so much CREATE INDEX per se, but other DDL commands
> > that implicitly create an index, for example adding a PRIMARY KEY to a
> > table.

What I would really like is an official way to promote a UNIQUE index to a Primary Key, so that a PK could also be added without locking the table for long periods.

> > Some applications that I have written would fail badly if CREATE INDEX
> > was disallowed inside a transaction - mostly, because they use plpgsql
> > functions to manipulate database objects, and disallowing CREATE INDEX
> > inside a transaction presumably means disallowing it from inside
> > stored procedures.

We don't have stored procedures :( only functions. Maybe your problem will be fixed once we get proper stored procedures, which can begin and commit/rollback transactions inside their code.

> speaking with pavan off list he seems to think that only 'create
> index' is outside transaction, not the other ddl flavors of it because
> they are generally acquiring an excl lock. so, in that sense it is
> possibly acceptable to me although still a pretty tough pill to
> swallow (thinking, guc time). It would also preclude ever integrating
> vanilla 'create index' into the create table command, fwiw.

I don't think that any of the HOT restrictions apply when creating an index on an empty table.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
On 3/23/07, Hannu Krosing <hannu@skype.net> wrote:
>
> My argument is that its enough to index only the LIVE tuple which
> is at the end of the chain if we don't use the new index for queries
> in transactions which were started before CREATE INDEX.
You mean, which were started before CREATE INDEX completes ?
Just wait for all concurrent transactions to complete before marking the
index as usable in plans.
I thought there will be objections to that approach since it changes
the CREATE INDEX behavior and may also lead to new deadlock
conditions.
> I am
> proposing to do that by storing an xid in the pg_index row.
I don't think it is a good idea to store xid's anywhere but in xmin/xmax
columns, as doing so would cause nasty xid wraparound problems.
Maybe we can use VACUUM to clean up the xid. If the xid is less than
OldestXmin, the index is visible to all transactions and the xid can be
marked as InvalidTransactionId.
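The cleanup rule sketched here fits in a few lines of C. This is a hedged illustration, not PostgreSQL source: the function name `maybe_freeze_xcreate` is invented, the snapshot of state is passed in directly, and a plain `<` stands in for the wraparound-aware TransactionIdPrecedes() that real code would have to use:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* During VACUUM: once every transaction that started before the index
 * was created has finished (xcreate < OldestXmin), the index is visible
 * to all transactions, so the stored xid can be cleared.  Clearing it
 * is what avoids the xid wraparound hazard mentioned above. */
static TransactionId
maybe_freeze_xcreate(TransactionId xcreate, TransactionId oldest_xmin)
{
    if (xcreate != InvalidTransactionId && xcreate < oldest_xmin)
        return InvalidTransactionId;
    return xcreate;
}
```

Once the stored xid has been reset to InvalidTransactionId, it never needs to be compared against current xids again, which is the point of the VACUUM step.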
>
>
> Oh, thanks for pointing that out. But thats certainly not fundamental
> to the argument as you probably already guessed. The xid still
> controls the usage of index for query planning, somewhat similar to
> "isindvalid" flag for CREATE INDEX CONCURRENTLY.
Xids are unstable and will come back to bite you after 2G transactions.
Why not just use the "isindvalid" flag ?
Who would set the flag to true ? Unless of course we are waiting in
CREATE INDEX. But that seems to be less acceptable to me.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> > There is a slight hole in that SERIALIZABLE transactions won't be able
> > to use any indexes they build during their transaction, since they may
> > need to be able to see prior data, but I don't think anybody is going to
> > complain about that restriction. Anyone?
>
> Oh, I did not see that. If there are no HOT-chains in the table, we can
> set the xid to InvalidTransactionId so that the index is usable immediately
> after it is created in the current transaction, as well as in those
> transactions which were started before CREATE INDEX. We can possibly further
> improve it by checking if there are no HOT-chains except those created
> by this transaction and set the xid to InvalidTransactionId. IMO with that we
> shall address most of the use cases. There are a few which might
> still get impacted, but even for them there won't be any correctness
> problem.

Why exactly can't a SERIALIZABLE transaction use the index it created itself? If you add a pointer to the root of all HOT update chains where either the HEAD is alive, or some tuple is visible to the transaction creating the index, shouldn't this be sufficient for using the index in the creating transaction?

greetings, Florian Pflug
On 3/23/07, Florian G. Pflug <fgp@master.phlo.org> wrote:
Why exactly can't a SERIALIZABLE transaction use the index it created
itself? If you add a pointer to the root of all HOT update chains where
either the HEAD is alive, or some tuple is visible to the transaction
creating the index, shouldn't this be sufficient for using the index
in the creating transaction?
Tuples in the HOT-chain may not share the same index keys with
respect to the new index being built (they share the same keys for
all existing indexes though). So we don't know which index key
to use while building the index.
Thanks,
Pavan
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> On 3/23/07, Florian G. Pflug <fgp@master.phlo.org> wrote:
> > Why exactly can't a SERIALIZABLE transaction use the index it created
> > itself? If you add a pointer to the root of all HOT update chains where
> > either the HEAD is alive, or some tuple is visible to the transaction
> > creating the index, shouldn't this be sufficient for using the index
> > in the creating transaction?
>
> Tuples in the HOT-chain may not share the same index keys with
> respect to the new index being built (they share the same keys for
> all existing indexes though). So we don't know which index key
> to use while building the index.

Ah, of course - thanks for pointing that out.

greetings, Florian Pflug
Hannu Krosing wrote:
> I don't think it is a good idea to store xid's anywhere but in xmin/xmax
> columns, as doing so would cause nasty xid wraparound problems.
>
> Instead you should wait, after completing the index, for all
> concurrent transactions to end before you mark the index as "usable for
> queries", similar to the way CREATE INDEX CONCURRENTLY does.

We already discussed having VACUUM FREEZE deal with the pg_index xid column. I don't see how having CREATE INDEX wait for all current transactions to complete helps us from a usability perspective.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Pavan Deolasee wrote:
> > Xids are unstable and will come back to bite you after 2G transactions.
> >
> > Why not just use the "isindvalid" flag ?
>
> Who would set the flag to true ? Unless of course we are waiting in
> CREATE INDEX. But that seems to be less acceptable to me.

Agreed, and we have the problem of crash recovery if we do that. The xid seems to handle that automatically.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On 3/23/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
It's slightly different for the HOT-chains created by this transaction which
is creating the index. We should index the latest version of the row, which
is not yet committed. But that's OK because when CREATE INDEX commits
this latest version would also get committed.
Sounds like you'll need to store the Next TransactionId rather than the
TransactionId of the CREATE INDEX.
Just when I thought we had nailed down CREATE INDEX, I realized
that there is something more to worry about. The problem is with the HOT-chains
created by our own transaction which is creating the index. We thought
it will be enough to index the tuple at the head-of-the-chain since that
would be the visible copy once the transaction commits. We thought
of keeping the index unavailable for queries in pre-existing transactions
by setting a new "xid" attribute in pg_index. The question is what value
to assign to "xid". I though we would assign ReadNewTransactionId().
Now, a new transaction can start before we commit and hence have
transaction_id > xid. This transaction can still see the old tuple
(because the transaction creating the index is not yet committed)
which we did not index while creating the index. Once the transaction
creating the index commits, the index is also available to this new
transaction and we are in trouble at that point.
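The race above can be made concrete with a toy model. This is an illustrative sketch only: xids are plain counters with no wraparound handling, and the invented function `naive_xcreate_rule_is_unsafe()` merely checks that the two halves of the hazard can hold at the same time:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Returns true if a transaction can pass the naive "my xid > xcreate,
 * so the index is usable" test while its snapshot still sees old,
 * unindexed tuple versions -- the trouble described above. */
static bool
naive_xcreate_rule_is_unsafe(void)
{
    Xid next_xid = 100;

    Xid ci_xid = next_xid++;    /* transaction running CREATE INDEX */
    Xid xcreate = next_xid;     /* ReadNewTransactionId() stored in pg_index */

    (void) next_xid++;          /* some unrelated transaction */
    Xid late_xid = next_xid++;  /* starts before CREATE INDEX commits */

    /* The latecomer's snapshot still treats ci_xid as in progress, so it
     * can see the old tuple versions that were never indexed... */
    bool ci_committed_when_late_started = false;
    bool sees_unindexed_tuples = !ci_committed_when_late_started;

    /* ...yet it passes the naive xid check once CREATE INDEX commits. */
    bool index_usable = (late_xid > xcreate);

    (void) ci_xid;
    return index_usable && sees_unindexed_tuples;
}
```

Any fix has to close exactly this window between reading the next transaction id and the commit of the index-creating transaction.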
I think Tom had already seen this, but his comment got overlooked
in the flow of the discussion. If that's the case, I regret that.
Any idea how to handle this case ? One ugly hack I can think of
is to remember all those indexes created in the transaction for which
we had seen DELETE_IN_PROGRESS tuples while building
the index. At commit time, we (somehow) prevent new transactions
from starting, note the next transaction id, set it in pg_index and commit
the transaction. New transactions are then enabled again.
I know it looks ugly. Is there something better that we can do ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
On Wed, 2007-03-28 at 22:24 +0530, Pavan Deolasee wrote:
> Just when I thought we had nailed down CREATE INDEX, I realized
> that there is something more to worry about. The problem is with the
> HOT-chains created by our own transaction which is creating the index.
> We thought it will be enough to index the tuple at the head-of-the-chain
> since that would be the visible copy once the transaction commits. We
> thought of keeping the index unavailable for queries in pre-existing
> transactions by setting a new "xid" attribute in pg_index. The question
> is what value to assign to "xid". I thought we would assign
> ReadNewTransactionId().
>
> Any idea how to handle this case ?

Set it at the end, not the beginning.

If you are indexing a table that hasn't just been created by you, set the xcreate field in pg_index at the *end* of the build using ReadNewTransactionId(). Any xid less than that sees the index as invalid. If you created the table in this transaction (i.e. createSubId != 0) then set xcreate to the creating xid.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
> On Wed, 2007-03-28 at 22:24 +0530, Pavan Deolasee wrote:
> > Just when I thought we had nailed down CREATE INDEX, I realized
> > that there is something more to worry about. The problem is with the
> > HOT-chains created by our own transaction which is creating the index.
> > We thought it will be enough to index the tuple at the head-of-the-chain
> > since that would be the visible copy once the transaction commits. We
> > thought of keeping the index unavailable for queries in pre-existing
> > transactions by setting a new "xid" attribute in pg_index. The question
> > is what value to assign to "xid". I thought we would assign
> > ReadNewTransactionId().
>
> If you are indexing a table that hasn't just been created by you, set
> the xcreate field in pg_index at the *end* of the build using
> ReadNewTransactionId(). Any xid less than that sees the index as
> invalid. If you created the table in this transaction (i.e.
> createSubId != 0) then set xcreate to the creating xid.

Couldn't you store the creating transaction's xid in pg_index, and let other transactions check that against their snapshot like they would for any tuple's xmin or xmax? (With one exception - the creating transaction would consider indices it built itself invalid, which is not how things usually work for xmin/xmax.) This would mean that any transaction that believes that the creating transaction has committed also considers the index to be valid.

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes: > Couldn't you store the creating transaction's xid in pg_index, and > let other transaction check that against their snapshot like they > would for any tuple's xmin or xmax? What snapshot? I keep having to remind people that system catalog operations are SnapshotNow. In the particular context here, the place where doing something else would break down is that the planner has no idea when it makes a plan what snapshot(s) the plan might later be used with. regards, tom lane
On 3/28/07, Simon Riggs <simon@2ndquadrant.com> wrote:
Set it at the end, not the beginning.
At the end of what ? It does not help to set it at the end of CREATE
INDEX because the transaction may not commit immediately. In
the meantime, many new transactions may start with
transaction id > xcreate. All these transactions can see the old
tuple (which we did not index) and can also see the index once
CREATE INDEX commits.
If you are indexing a table that hasn't just been created by you, set
the xcreate field on pg_index at the *end* of the build using
ReadNewTransactionId(). Any xid less than that sees the index as
invalid. If you created the table in this transaction ( i.e.
createSubId != 0) then set xcreate to creating xid.
Why do we need to handle the case where table is created in
the same transaction ? Neither the table nor the index is visible
until we commit. So thats a simple case.
Thanks,
Pavan
EnterpriseDB http://www.enterprisedb.com
On 3/28/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Florian G. Pflug" <fgp@phlo.org> writes:
> Couldn't you store the creating transaction's xid in pg_index, and
> let other transaction check that against their snapshot like they
> would for any tuple's xmin or xmax?
What snapshot? I keep having to remind people that system catalog
operations are SnapshotNow. In the particular context here, the
place where doing something else would break down is that the planner
has no idea when it makes a plan what snapshot(s) the plan might later
be used with.
Tom, please correct me if I am wrong, but ISTM that this idea might
work in this context. In get_relation_info(), we would check if the "xcreate"
xid stored in pg_index for the index under consideration is seen as
committed with respect to the snapshot at that point of time.
Even if the snapshot changes later and the index becomes valid, we
might not replan and hence not use the index. But that doesn't seem
like a big problem to me.
So in get_relation_info():
We get the transaction snapshot. If it's a serializable transaction,
it can't change later. If it's a read-committed transaction, we don't
care anyway because the transaction can only see the last committed
version of the tuple in the table, and we have indexed that. And so
we are safe.
- if xcreate < snapshot->xmin, index is valid
- if xcreate > snapshot->xmax, index is invalid
- if snapshot->xmax > xcreate > snapshot->xmin, we check xids in the snapshot
to determine validity of the index.
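The three-way test above can be sketched in C. This is a minimal sketch, assuming a simplified snapshot struct (PostgreSQL's real SnapshotData differs, the function name is invented, and real xid comparisons must use the wraparound-aware TransactionIdPrecedes() rather than plain `<`):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified stand-in for PostgreSQL's SnapshotData. */
typedef struct
{
    TransactionId  xmin;   /* every xid < xmin had finished */
    TransactionId  xmax;   /* every xid >= xmax had not yet started */
    TransactionId *xip;    /* xids in [xmin, xmax) still in progress */
    int            xcnt;   /* number of entries in xip */
} SnapshotSketch;

/* Apply the three-way test above to the xcreate value stored in
 * pg_index: may this snapshot's transaction use the index? */
static bool
index_usable_for_snapshot(TransactionId xcreate, const SnapshotSketch *snap)
{
    if (xcreate < snap->xmin)
        return true;                /* finished before the snapshot: valid */
    if (xcreate >= snap->xmax)
        return false;               /* started after the snapshot: invalid */
    for (int i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xcreate)
            return false;           /* was still in progress: invalid */
    return true;
}
```

Note the boundary choice: xcreate equal to snapshot->xmax has to count as invalid, since the snapshot could not yet have seen that transaction as committed.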
I might have got something wrong here, but I hope I made the point.
Am I missing something here ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
> > Couldn't you store the creating transaction's xid in pg_index, and
> > let other transactions check that against their snapshot like they
> > would for any tuple's xmin or xmax?
>
> What snapshot? I keep having to remind people that system catalog
> operations are SnapshotNow. In the particular context here, the
> place where doing something else would break down is that the planner
> has no idea when it makes a plan what snapshot(s) the plan might later
> be used with.

Sorry - now that you say it, I remember that you've said that already multiple times... So the question is, why did this work until now, and why does CREATE INDEX+HOT just not seem to fit into this scheme?

I think the answer is that all other DDL statements manage to assure that any database objects they create or modify are usable for everybody else immediately after they are committed. This usually implies pretty strong locking requirements - for example, I think that the core reason why TRUNCATE needs an exclusive lock is precisely that guarantee it has to make.

Maybe this could somehow be relaxed? Could, for example, the planner be allowed to base some of its decisions on the SerializableSnapshot that every transaction (even read-only ones) possesses? It seems that this would prevent plans from living longer than a transaction, but maybe plan invalidation could help here?

greetings, Florian Pflug
On Wed, 2007-03-28 at 23:42 +0530, Pavan Deolasee wrote:
> On 3/28/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Set it at the end, not the beginning.
>
> At the end of what ? It does not help to set it at the end of CREATE
> INDEX because the transaction may not commit immediately. In
> the meantime, many new transactions may start with
> transaction id > xcreate. All these transactions can see the old
> tuple (which we did not index) and can also see the index once
> CREATE INDEX commits.

AtEOX_Reincarnate()... :-)

Set xcreate to InvalidTransactionId when we build the index. If we created an index in this transaction, as soon as we commit the top level transaction, run another top level transaction to set xcreate using ReadNewTransactionId(). During WAL replay, we remember any index creations and reset xcreate if we were unlucky enough to crash between the two transactions.

(I'll be offline now for a few hours until the flames subside.)

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> On 3/28/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Florian G. Pflug" <fgp@phlo.org> writes:
> > > Couldn't you store the creating transaction's xid in pg_index, and
> > > let other transactions check that against their snapshot like they
> > > would for any tuple's xmin or xmax?
> >
> > What snapshot? I keep having to remind people that system catalog
> > operations are SnapshotNow. In the particular context here, the
> > place where doing something else would break down is that the planner
> > has no idea when it makes a plan what snapshot(s) the plan might later
> > be used with.
>
> Tom, please correct me if I am wrong, but ISTM that this idea might
> work in this context. In get_relation_info(), we would check if the "xcreate"
> xid stored in pg_index for the index under consideration is seen as
> committed with respect to the snapshot at that point of time.
> Even if the snapshot changes later and the index becomes valid, we
> might not replan and hence not use the index. But that doesn't seem
> like a big problem to me.

The problem is use cases like

    PREPARE my_plan .... ;
    BEGIN;
    EXECUTE my_plan .... ;
    COMMIT ;

Is that "PREPARE" even run inside a transaction? Even if it is, it probably won't have created a snapshot...

I think allowing the use of some sort of snapshot from inside the planner would allow some locking to be relaxed, but there seem to be a lot of corner cases to consider :-(

OTOH, if you manage to make this work, a TRUNCATE that doesn't block concurrent selects might become possible to do. This would for example allow dropping and rebuilding subscriptions on a slony node while it is in use.

greetings, Florian Pflug
On 3/29/07, Florian G. Pflug <fgp@phlo.org> wrote:
Pavan Deolasee wrote:
> Tom, please correct me if I am wrong. But ISTM that this idea might
> work in this context. In get_relation_info(), we would check if "xcreate"
> xid stored in pg_index for the index under consideration is seen
> committed with respect to the snapshot at that point of time.
> Even if the snapshot changes later and index becomes valid, we
> might not replan and hence not use index. But that doesn't seem
> like a big problem to me.
The problem is use cases like
PREPARE my_plan .... ;
BEGIN;
EXECUTE my_plan .... ;
COMMIT ;
Oh, I see.
Is that "PREPARE" even run inside a transaction? Even if it is, it
probably won't have created a snapshot...
In this specific context, this particular case is easy to handle because
we are only concerned about the serializable transactions started before
CREATE INDEX commits. If PREPARE can see the new index, it
implies that the CI transaction is committed. So the transaction
starting after that can only see the tuple version that we have indexed.
But I don't know if this is the only case or there are more cases to
consider :-(
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> In this specific context, this particular case is easy to handle because
> we are only concerned about the serializable transactions started before
> CREATE INDEX commits. If PREPARE can see the new index, it
> implies that the CI transaction is committed. So the transaction
> starting after that can only see the tuple version that we have indexed.

Yes, but the non-index plan PREPARE generated will be used until the end of the session, not only until the end of the transaction. Imagine that it wasn't explicitly PREPAREd (where you might say this is acceptable), but rather just a query inside a plpgsql function, maybe even called from some app using connection pooling. This means that the non-index-using plan might get used for quite a long time, which contradicts the work Tom did on plan invalidation, I think.

Maybe Tom can comment on whether it's possible to use plan invalidation to eventually get rid of a stale plan in this context?

greetings, Florian Pflug
On 3/29/07, Florian G. Pflug <fgp@phlo.org> wrote:
Yes, but the non-index plan PREPARE generated will be used until the end
of the session, not only until the end of the transaction.
Frankly I don't know how this works, but are you sure that the plan will
be used until the end of the session ? Even if that's the case, it can
happen even today if we create a new index, but the existing sessions
will use the stale plan (assuming what you said is true).
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> On 3/29/07, Florian G. Pflug <fgp@phlo.org> wrote:
> > Yes, but the non-index plan PREPARE generated will be used until the end
> > of the session, not only until the end of the transaction.
>
> Frankly I don't know how this works, but are you sure that the plan will
> be used until the end of the session ? Even if that's the case, it can
> happen even today if we create a new index, but the existing sessions
> will use the stale plan (assuming what you said is true)

I've checked that:

************************************************************
test=# prepare myplan as select * from test where id=10000 ;
PREPARE
test=# explain execute myplan ;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 10000)
(2 rows)

!!!! Now I create an index in another session !!!!

test=# explain select * from test where id=10000 ;
                              QUERY PLAN
----------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=95.11..8248.45 rows=5000 width=36)
   Recheck Cond: (id = 10000)
   ->  Bitmap Index Scan on idx  (cost=0.00..93.86 rows=5000 width=0)
         Index Cond: (id = 10000)
(4 rows)

test=# explain execute myplan ;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 10000)
(2 rows)

!!! Index got used by the "select .." but not by "execute myplan ..." !!!

test=# prepare myplan2 as select * from test where id=10000 ;
PREPARE
test=# explain execute myplan2 ;
                            QUERY PLAN
-----------------------------------------------------------------
 Index Scan using idx on test  (cost=0.00..8.38 rows=1 width=37)
   Index Cond: (id = 10000)
(2 rows)

!!! A newly prepared plan of course uses the index !!!
************************************************************

So yes, plans get cached until the end of the session, and yes, 8.2 won't notice index creation either ;-)

The open question is how CVS HEAD with plan invalidation behaves. If it replans after the index-creating transaction commits, then basing index validity on a snapshot will break this, because upon replan the index might not be usable, but later on it may very well be (but the plan invalidation machinery won't realize that).

So this might not introduce a regression compared to 8.2, but to a future 8.3 with plan invalidation...

Sorry for being so unclear in my previous emails - I had confused myself ;-)

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes: > Pavan Deolasee wrote: >> Frankly I don't know this works, but are you sure that the plan will >> be used until the end of the session ? Even if thats the case, it can >> happen even today if we create a new index, but the existing sessions >> will use the stale plan (assuming what you said is true) > I've checked that: Evidently you weren't testing on HEAD. > The open question is how CVS HEAD with plan invalidation behaves. > If it replans after the index-creating transaction commits, then > basing index validity on a snapshot will break this, because upon > replay they index might not be useable, but later on it may very > well be (but that plan invalidation machinery won't realize that) It will replan at the first use of the plan after seeing the relcache inval sent by commit of the index-creating transaction. If you have two separate transactions to create an index and then mark it valid later, everything's fine because there are two inval events. However, if you design something where an index becomes usable due to the passage of time rather than an explicit mark-valid step, there's gonna be a problem. I'd suggest trying to stick to the way CREATE INDEX CONCURRENTLY does it... regards, tom lane
On 3/29/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It will replan at the first use of the plan after seeing the relcache
inval sent by commit of the index-creating transaction. If you have
two separate transactions to create an index and then mark it valid
later, everything's fine because there are two inval events.
However, if you design something where an index becomes usable due
to the passage of time rather than an explicit mark-valid step,
there's gonna be a problem. I'd suggest trying to stick to the
way CREATE INDEX CONCURRENTLY does it...
I had earlier proposed to do things the CIC way. But there were objections
to the additional wait introduced in CREATE INDEX, and I don't
think they were unreasonable. Maybe we can avoid the waits if there
are no HOT-chains in the table, but we still need agreement on that.
OTOH ISTM that the pg_index:xcreate solution may work fine if
we can keep the index unusable for those transactions which started
before CREATE INDEX could commit. I coded a quick prototype
where I use ActiveSnapshot in get_relation_info() to test if the
CREATE INDEX transaction is seen as "in-progress" by the
transaction. If seen as in-progress, the index is not used (even
though the CREATE INDEX is committed and hence can be seen
by SnapshotNow).
If an index which was earlier seen as unusable is marked "valid" as
time progresses, could there be some trouble ? I mean, as long as
we don't make the index usable before all the tuples which are not
indexed are DEAD, we should be fine.
Is there something I am missing ? Would it help to explain the idea
if I post the patch ?
CREATE INDEX and CREATE INDEX CONCURRENTLY turned
out to be much more difficult than I imagined earlier. While we are
discussing CREATE INDEX, I will post a design for CIC. I
restrained myself till now to avoid confusion, but with time running
out, it would be helpful to get agreement so that we can finish
the patch on time.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > However, if you design something where an index becomes usable due > to the passage of time rather than an explicit mark-valid step, > there's gonna be a problem. I'd suggest trying to stick to the > way CREATE INDEX CONCURRENTLY does it... I'm a bit skeptical about the idea of CREATE INDEX (ie, non-concurrent) creating an index that won't be used for a while. We get enough people asking why Postgres isn't using an index as it is... Besides, it seems if people are happy to have indexes take a long time to build they could just do a concurrent build. The reason they do non-concurrent builds is precisely because they're willing to take an exclusive lock in order to have them complete as soon as possible. Earlier we were talking about not inserting any HOT tuples until the index became valid. The goal of having an xid on the index was so we would know when we could start doing HOT updates again. That seems like a much lesser cost than not being able to use the index until all live transactions exit. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 3/29/07, Gregory Stark <stark@enterprisedb.com> wrote:
Besides, it seems if people are
happy to have indexes take a long time to build they could just do a
concurrent build.
I think we discussed this earlier. One of the downsides of CIC is that
it needs two complete heap scans. Apart from that, CIC itself needs
to wait for all existing transactions to finish, and more than one
instance of CIC cannot be run on a table at a time.
Earlier we were talking about not inserting any HOT tuples until the index
became valid. The goal of having an xid on the index was so we would know when
we could start doing HOT updates again. That seems like a much lesser cost
than not being able to use the index until all live transactions exit.
What I am proposing is to keep index unusable for existing transactions.
The index is available for all new transactions even if there are unfinished
existing transactions. Is that a big problem ? Well, I still need buy-in and
review from Tom and others on the design, but it seems workable to me.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote:
> > Earlier we were talking about not inserting any HOT tuples until the index
> > became valid. The goal of having an xid on the index was so we would know
> > when we could start doing HOT updates again. That seems like a much lesser
> > cost than not being able to use the index until all live transactions exit.
>
> What I am proposing is to keep index unusable for existing transactions.
> The index is available for all new transactions even if there are unfinished
> existing transactions. Is that a big problem? Well, I still need buy-in and
> review from Tom and others on the design, but it seems workable to me.

Yes, that seems totally acceptable to me. As I remember, the index is
usable by the transaction that created it, and new transactions. Hard
to see how someone would have a problem with that.

--
Bruce Momjian  <bruce@momjian.us>   http://momjian.us
EnterpriseDB                        http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > What I am proposing is to keep index unusable for existing transactions. > The index is available for all new transactions even if there are unfinished > existing transactions. Ah thanks, that makes a lot more sense. Sorry for the false alarm. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-03-29 at 13:55 -0400, Bruce Momjian wrote:
> Pavan Deolasee wrote:
> > > Earlier we were talking about not inserting any HOT tuples until the index
> > > became valid. The goal of having an xid on the index was so we would know
> > > when we could start doing HOT updates again. That seems like a much lesser
> > > cost than not being able to use the index until all live transactions exit.
> >
> > What I am proposing is to keep index unusable for existing transactions.
> > The index is available for all new transactions even if there are unfinished
> > existing transactions. Is that a big problem? Well, I still need buy-in and
> > review from Tom and others on the design, but it seems workable to me.
>
> Yes, that seems totally acceptable to me. As I remember, the index is
> usable by the transaction that created it, and new transactions. Hard
> to see how someone would have a problem with that.

Agreed.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
On Thu, 2007-03-29 at 22:08 +0530, Pavan Deolasee wrote:
> On 3/29/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It will replan at the first use of the plan after seeing the relcache
> > inval sent by commit of the index-creating transaction. If you have
> > two separate transactions to create an index and then mark it valid
> > later, everything's fine because there are two inval events.
> > However, if you design something where an index becomes usable due
> > to the passage of time rather than an explicit mark-valid step,
> > there's gonna be a problem. I'd suggest trying to stick to the
> > way CREATE INDEX CONCURRENTLY does it...
>
> I had earlier proposed to do things the CIC way. But there were objections
> to the additional wait introduced in CREATE INDEX, and I don't
> think they were unreasonable. Maybe we can avoid the wait if there
> are no HOT-chains in the table, but still we need agreement on that.
>
> OTOH ISTM that the pg_index:xcreate solution may work fine if
> we can keep the index unusable to those transactions which started
> before CREATE INDEX could commit.

Pavan, ISTM you have misunderstood Tom slightly.

Having the index invisible to all current transactions is acceptable.
However, the other backends will not receive an invalidation event,
which means even when they start new transactions they will still not
see it, which is not acceptable.

ISTM that the run-another-transaction-afterwards idea is the only one
that does everything I think we need. I really do wish we could put in a
wait, like CIC, but I just think it will break existing programs.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > ISTM that the run-another-transaction-afterwards idea is the only one > that does everything I think we need. I really do wish we could put in a > wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks "pg_dump --single-transaction", just for starters.) This means it can *not* commit partway through. Back to the drawing board :-( regards, tom lane
Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: >> ISTM that the run-another-transaction-afterwards idea is the only one >> that does everything I think we need. I really do wish we could put in a >> wait, like CIC, but I just think it will break existing programs. > > Actually, there's a showstopper objection to that: plain CREATE INDEX > has to be able to run within a larger transaction. (To do otherwise > breaks "pg_dump --single-transaction", just for starters.) This means > it can *not* commit partway through. I believe the original idea was to invent some kind of "on commit run this transaction" hook - similar to how files are deleted on commit, I think. At least I understood the "Run another transaction on commit" that way... greetings, Florian Pflug
On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > ISTM that the run-another-transaction-afterwards idea is the only one
> > that does everything I think we need. I really do wish we could put in a
> > wait, like CIC, but I just think it will break existing programs.
>
> Actually, there's a showstopper objection to that: plain CREATE INDEX
> has to be able to run within a larger transaction. (To do otherwise
> breaks "pg_dump --single-transaction", just for starters.) This means
> it can *not* commit partway through.

I agree with most of that, but that's why we are where we are, and I
don't think it's a showstopper at all.

The idea is to make note that the transaction has created an index
within a transaction block, so that after the top-level transaction
commits we sneak in an extra hidden transaction to update the pg_index
tuple with the xcreate of the first transaction. We don't do this after
the CREATE INDEX statement ends, only at the end of the transaction in
which it ran. We only do this if we are creating an index on a table
that is not a temporary table and was not created during the
transaction (so --single-transaction isn't broken and doesn't require
this additional action). i.e. MyTransactionCreatedVisibleIndex, with
special processing in xact.c.

The only other alternative is to forcibly throw a relcache inval event
in the same circumstances without running the additional transaction,
but the solution is mostly the same.

I agree this is weird, but no more weird a solution than CIC was when
it first came out. I don't like it, or think it's clever; I just think
it's the only thing on the table.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Simon Riggs wrote:
> On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>> ISTM that the run-another-transaction-afterwards idea is the only one
>>> that does everything I think we need. I really do wish we could put in a
>>> wait, like CIC, but I just think it will break existing programs.
>> Actually, there's a showstopper objection to that: plain CREATE INDEX
>> has to be able to run within a larger transaction. (To do otherwise
>> breaks "pg_dump --single-transaction", just for starters.) This means
>> it can *not* commit partway through.
>
> The idea is to make note that the transaction has created an index
> within a transaction block, so that after the top level transaction
> commits we sneak in an extra hidden transaction to update the pg_index
> tuple with the xcreate of the first transaction.
>
> The only other alternative is to forcibly throw a relcache inval event
> in the same circumstances without running the additional transaction,
> but the solution is mostly the same.

I think one alternative might be to store a list of xid's together with
a cached plan, and replan if the commit status (as perceived by the
transaction the plan will be executed in) of one of those xid's changes.

greetings, Florian Pflug
On 3/30/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> Pavan, ISTM you have misunderstood Tom slightly.

Oh, yes. Now that I re-read Tom's comment, his plan invalidation
design and code, I understand things better.

> Having the index invisible to all current transactions is acceptable.

Ok.

> However, the other backends will not receive an invalidation event,
> which means even when they start new transactions they will still not
> see it, which is not acceptable.

Agree.

> ISTM that the run-another-transaction-afterwards idea is the only one
> that does everything I think we need. I really do wish we could put in a
> wait, like CIC, but I just think it will break existing programs.

ISTM that the run-another-transaction-afterwards idea would have the same
problem with plan invalidation. When the second transaction commits,
the relcache invalidation event is generated. The event may get consumed
by other backends, but the index may still not be usable to them because
their xid < xcreate. If no more relcache invalidation events are generated
after that, the backends would continue to use the cached plan, even
if the index becomes usable to them later.

How about storing the snapshot which we used during planning in
CachedPlanSource, if at least one index was seen unusable because
its CREATE INDEX transaction was seen as in-progress? In
RevalidateCachedPlan(), we check if a snapshot is set in
CachedPlanSource and check if we are now using a different snapshot.
This triggers plan invalidation and re-planning. This would also help us
use the index early in read-committed transactions even if the transaction
started before CREATE INDEX committed.

Does this sound good?

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > How about storing the snapshot which we used during planning in > CachedPlanSource, if at least one index was seen unusable because > its CREATE INDEX transaction was seen as in-progress ? I'm getting tired of repeating this, but: the planner doesn't use a snapshot. System catalogs run on SnapshotNow. regards, tom lane
Tom Lane wrote:
> I'm getting tired of repeating this, but: the planner doesn't use a
> snapshot. System catalogs run on SnapshotNow.

I am really sorry if I sound foolish here. I am NOT suggesting that we
use a "snapshot" to read system catalogs. I understand that system
catalogs run on SnapshotNow, and all transactions, irrespective of when
they started, would see the changes to system catalogs as soon as the
transaction updating the system catalog commits.

What I am suggesting is to use ActiveSnapshot (actually Florian's idea)
to decide whether the transaction that created the index was still
running when we started. Isn't it the case that some snapshot will be
"active" when we plan? The active snapshot may change later in the same
transaction if we are running in read-committed mode, and we may need
to invalidate the plan.

Here is what I suggest to do in get_relation_info():

+           if (index->indcreatexid != InvalidTransactionId)
+           {
+               Assert(ActiveSnapshot);
+               if (XidInMVCCSnapshot(index->indcreatexid, ActiveSnapshot))
+               {
+                   index_close(indexRelation, NoLock);
+                   continue;
+               }
+               /*
+                * Otherwise the index is usable
+                */
+           }

Is there a problem with this?

I really appreciate all the help I am receiving on this. But there is
hardly anything else that I can do than post my thoughts and get
feedback, until we find a clean solution :-(

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: >> How about storing the snapshot which we used during planning in >> CachedPlanSource, if at least one index was seen unusable because >> its CREATE INDEX transaction was seen as in-progress ? > > I'm getting tired of repeating this, but: the planner doesn't use a > snapshot. System catalogs run on SnapshotNow. But it would still do that - it would just compare the createxid of the index against some snapshot, and the query would be replanned if the cached result of this comparison differs from the one the current snapshot yields. It might well be that this won't work, because the planner is invoked in situations where there is no active snapshot - I'm not sure if your comment refers to that case, or not. greetings, Florian Pflug
"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: > What I am suggesting is to use ActiveSnapshot (actually > Florian's idea) to decide whether the transaction that created > index was still running when we started. Isn't it the case that > some snapshot will be "active" when we plan ? I do not think you can assume that the plan won't be used later with some older snapshot. Consider recursive plpgsql functions for a counterexample: the inner occurrence might be the first to arrive at a given line of the function, hence the first to plan it, yet when we return to the outer instance we might revert to an older snapshot. regards, tom lane
Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: >> What I am suggesting is to use ActiveSnapshot (actually >> Florian's idea) to decide whether the transaction that created >> index was still running when we started. Isn't it the case that >> some snapshot will be "active" when we plan ? > > I do not think you can assume that the plan won't be used later with > some older snapshot. Consider recursive plpgsql functions for a > counterexample: the inner occurrence might be the first to arrive at > a given line of the function, hence the first to plan it, yet when we > return to the outer instance we might revert to an older snapshot. So maybe we'd need to use the SerializableSnapshot created at the start of each transaction for this check, and not the ActiveSnapshot? Could that work? What about doing PREPARE myplan select ... ; outside of a transaction? Will this be execute inside a transaction? Is is a query always planned upon it's first execution, and not when "PREPARE" is issued? greetings, Florian Pflug
On 3/30/07, Florian G. Pflug <fgp@phlo.org> wrote:
> What about doing
>   PREPARE myplan select ... ;
> outside of a transaction? Will this be executed inside a transaction?

I checked that. PREPARE runs with ActiveSnapshot set.

Pavan

--
EnterpriseDB   http://www.enterprisedb.com
"Florian G. Pflug" <fgp@phlo.org> writes: > Tom Lane wrote: >> I do not think you can assume that the plan won't be used later with >> some older snapshot. > So maybe we'd need to use the SerializableSnapshot created at the start > of each transaction for this check, and not the ActiveSnapshot? Could > that work? That might work, but it doesn't seem to address the core objection: there's no mechanism to cause the query to be replanned once the snapshot is new enough, because no relcache inval will happen. So most likely existing backends will keep using old plans that don't consider the index. regards, tom lane
On Fri, 2007-03-30 at 11:44 +0530, Pavan Deolasee wrote:
> ISTM that the run-another-transaction-afterwards idea would have the same
> problem with plan invalidation. When the second transaction commits,
> the relcache invalidation event is generated. The event may get consumed
> by other backends, but the index may still not be usable to them because
> their xid < xcreate. If no more relcache invalidation events are generated
> after that, the backends would continue to use the cached plan, even
> if the index becomes usable to them later.

Sounds like we need to allow create index invalidation events to be
processed at the end of the current transaction in the *receiving*
backend. That way we don't need to do the run-another-transaction
thing, and it seems a helluva lot cleaner way of doing this.

Messages of SHAREDINVALRELCACHE_ID would be augmented by a boolean
deferred flag on the SharedInvalRelcacheMsg struct. Received messages
would be stored in a third kind of InvalidationList, then processed
during AtEOXact_Inval() whether the receiving transaction commits or
not. (see src/backend/utils/cache/inval.c)

Not sure how we'd know to *send* the message marked as deferred, but it
seems like we can work that out also.

That seems to allow CIC to not have to wait until the end of time
either.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
On 3/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That might work, but it doesn't seem to address the core objection:
> there's no mechanism to cause the query to be replanned once the
> snapshot is new enough, because no relcache inval will happen. So
> most likely existing backends will keep using old plans that don't
> consider the index.

Can't we store the snapshot (or maybe the transaction id) which was
used to plan the query in CachedPlanSource, if and only if at least one
index was seen unusable? In RevalidateCachedPlan() we then check if
the snapshot has changed and replan the query in that case.

That would make the index usable in subsequent transactions
in the same session, though we may not be able to use the index
in the same transaction, even if it's running in read-committed mode.
Would that be acceptable?

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Tom Lane wrote: >>> I do not think you can assume that the plan won't be used later with >>> some older snapshot. > >> So maybe we'd need to use the SerializableSnapshot created at the start >> of each transaction for this check, and not the ActiveSnapshot? Could >> that work? > > That might work, but it doesn't seem to address the core objection: > there's no mechanism to cause the query to be replanned once the > snapshot is new enough, because no relcache inval will happen. So > most likely existing backends will keep using old plans that don't > consider the index. Pavan suggested storing the IndexSnapshot in the cached plan, and to compare it to the IndexSnapshot when the query is executed. If those two snapshots differ, the query would be replanned. My idea was to store a list of xid's together with the cached plan that are assumed to be uncommitted accoring to the IndexSnapshot. The query is replanned if upon execution the IndexSnapshot assumes that one of these xid's is committed. Those two ideas seem to be mostly equivalent, mine seems to be a bit more fine-grained, but at the cost of more work upon each query execution. greetings, Florian Pflug
On 3/30/07, Florian G. Pflug <fgp@phlo.org> wrote:
> My idea was to store a list of xid's together with the cached plan that
> are assumed to be uncommitted according to the IndexSnapshot. The query
> is replanned if upon execution the IndexSnapshot assumes that one of
> these xid's is committed.

Actually, if we are using the Serializable Snapshot then there is no
chance to replan the query before the transaction completes, and the
next transaction to start in the session must see the index and hence
we must replan. So it would be enough just to associate a transaction
id with the cached plan. If this xid is set and our transaction id is
different from it, we replan.

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
Pavan Deolasee wrote:
> On 3/30/07, Florian G. Pflug <fgp@phlo.org> wrote:
>> My idea was to store a list of xid's together with the cached plan that
>> are assumed to be uncommitted according to the IndexSnapshot. The query
>> is replanned if upon execution the IndexSnapshot assumes that one of
>> these xid's is committed.
>
> Actually, if we are using the Serializable Snapshot then there is no chance
> to replan the query before the transaction completes, and the next
> transaction to start in the session must see the index and hence
> we must replan. So it would be enough just to associate a transaction
> id with the cached plan. If this xid is set and our transaction id is
> different from it, we replan.

I believe this is true for the CREATE INDEX scenario. However,
comparing either the snapshot or the result of xid checks seems like it
might be useful for other things beside CREATE INDEX. I'm specifically
thinking about TRUNCATE here - the create index + HOT problems sound
quite similar to the problems a non-exclusive-locking TRUNCATE would
face.

greetings, Florian Pflug
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > Actually, if we are using Serializable Snapshot then there is no chance > to replan the query before the transaction completes and the next > transaction to start in the session must see the index and hence > we must replan. So it would be enough just to associate a transaction > id with the cached plan. If this xid is set and our transaction id is > different than that, we replan. Hm. So anytime we reject a potentially useful index as being not valid yet, we mark the plan as "only good for this top-level transaction"? That seems possibly workable --- in particular it doesn't get more complicated as soon as you consider multiple such indexes. regards, tom lane
On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote:
> Hm. So anytime we reject a potentially useful index as being not valid
> yet, we mark the plan as "only good for this top-level transaction"?
> That seems possibly workable --- in particular it doesn't get more
> complicated as soon as you consider multiple such indexes.

I like that because it's specific in dealing with the exact issue we
have - it doesn't rely on many other things happening correctly.

...and it also seems to provide a new route to avoiding the CIC wait.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
On 3/31/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote:
> >
> > Hm. So anytime we reject a potentially useful index as being not valid
> > yet, we mark the plan as "only good for this top-level transaction"?
> > That seems possibly workable --- in particular it doesn't get more
> > complicated as soon as you consider multiple such indexes.
>
> I like that because it's specific in dealing with the exact issue we
> have - it doesn't rely on many other things happening correctly.

Ok. Cool. I would finish this work then.

> ...and it also seems to provide a new route to avoiding the CIC wait.

Yeah, though I would like to take that up later.

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
On 3/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do not think you can assume that the plan won't be used later with
> some older snapshot. Consider recursive plpgsql functions for a
> counterexample: the inner occurrence might be the first to arrive at
> a given line of the function, hence the first to plan it, yet when we
> return to the outer instance we might revert to an older snapshot.

Thanks for making me aware of such scenarios. What it also means
is that a transaction may use an older snapshot after it created the
index. So to be on the safer side, we should not use an index created
in the same transaction if we saw HOT-updated RECENTLY_DEAD
or DELETE_IN_PROGRESS tuples while building the index (with HOT
we don't want to index these tuples). Not such a bad restriction to
live with. Maybe we can do something with command ids to solve
this once we get the basic stuff ready.

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com