Thread: Question: pg_class attributes and race conditions ?
What is the safest way to access/modify the pg_class attribute and still avoid any race conditions with the other backends ? A specific example is: To solve the CREATE INDEX problem with HOT, I am thinking of adding (along with other things) a pg_class boolean attribute, say hot_update_enable. All backends are supposed to check this attribute before they perform an UPDATE. The attribute would usually be available in relation->rd_rel My understanding is that the backend which sets this attribute must first acquire a lock on the heap relation of sufficient strength so as to ensure that there are no concurrent UPDATErs, update the pg_class row and then release the lock on the relation. This would ensure that no backend has a stale "Relation" pointer with stale value of hot_update_enable. Also, should I use heap_inplace_update() rather than simple_heap_update() because I want other backends to see the change immediately irrespective of their snapshot ? Is this a fair analysis ? Are there any rules I must follow to avoid any deadlock and race conditions. I know we should not be requesting a higher grade lock while holding a lower grade lock, but are there any other restrictions/best practices ? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > > > What is the safest way to access/modify the pg_class attribute > and still avoid any race conditions with the other backends ? > > A specific example is: To solve the CREATE INDEX problem with > HOT, I am thinking of adding (along with other things) a pg_class > boolean attribute, say hot_update_enable. All backends are > supposed to check this attribute before they perform an UPDATE. > The attribute would usually be available in relation->rd_rel > > My understanding is that the backend which sets this attribute > must first acquire a lock on the heap relation of sufficient > strength so as to ensure that there are no concurrent UPDATErs, > update the pg_class row and then release the lock on the relation. > This would ensure that no backend has a stale "Relation" > pointer with stale value of hot_update_enable. FWIW this is pretty much the same I wanted to do with setting relfrozenxid to FrozenTransactionId. To this end I wrote a patch to add a catalog pg_ntclass (later renamed to pg_class_nt), which was ultimately rejected for reasons I don't remember at the time. Maybe it would be illuminating to investigate that -- please see the archives. (I still think it would be good to have a pg_class_nt catalog, so it's not a dead idea). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: > My understanding is that the backend which sets this attribute > must first acquire a lock on the heap relation of sufficient > strength so as to ensure that there are no concurrent UPDATErs, > update the pg_class row and then release the lock on the relation. In what context are you proposing to do that, and won't this high-strength lock in itself lead to deadlocks? The whole thing sounds exceedingly ugly anyway --- for example what happens if the backend doing the CREATE INDEX fails and is therefore unable to clear the flag again? regards, tom lane
Tom Lane wrote:>> In what context are you proposing to do that, and won't this> high-strength lock in itself lead to deadlocks?>>The whole thing sounds exceedingly ugly anyway --- for example> what happens if the backend doing the CREATEINDEX fails and> is therefore unable to clear the flag again?> Let me state the problem and a vague solution I am thinking of. I would appreciate comments and suggestions. The major known issue left with HOT is support for CREATE INDEX and CREATE INDEX CONCURRENTLY. The problem is with HEAP_ONLY tuples in the heap which do not have index entries in the existing indexes. When we build a new index, some or all of the HEAP_ONLY tuples may need index entries in the new index. It would be very ugly if we try to keep the existing indexes without index entries for those tuples. A clean solution would be to add index entries for the HEAP_ONLY tuples in the existing indexes and break all the HOT-chains. I would leave the details, but rather explain what I have in mind at high level. Any help to fill in the details or any suggestions to do things differently would immensely help. This is what I have in mind: In the context of CREATE INDEX [CONCURRENTLY], We first disable HOT-updates on the table. This would ensure that no new HOT tuples are added while we CHILL the heap. (How do we do this ?) We then start scanning the heap and start building the new index. If a HEAP_ONLY tuple is found which needs to be indexed, we mark the tuple with a CHILL_IN_PROGRESS flag and insert index entries into all the existing indexes. (The buffer is exclusively locked and the operation is WAL logged). We do this until entire heap is scanned. At this point, we would have inserted missing index entries for the HEAP_ONLY tuples. Till this point, we don't use the direct index entries to fetch the HEAP_ONLY tuples to avoid duplicate fetches of the same tuple. We now wait for all the concurrent index scans to end and then disable HOT-chain following logic to fetch tuples. (How do we do this ?) At this point, all index scans would ONLY use the direct path from the index to fetch tuples. The HOT-chains are not followed to avoid duplicate fetches of the same tuple. A second pass over the heap is now required to clear the CHILL_IN_PROGRESS, HEAP_ONLY and HEAP_HOT_UPDATED flags. At the end of this step, all the indexes and the table are in sync. Once again we need to ensure that there are no concurrent index scans in progress and then enable HOT-fetch. Also, HOT-updates can be turned on. If CREATE INDEX crashes, VACUUM is required to clear the CHILL_IN_PROGRESS flags and the corresponding index entries are removed. Since VACUUM runs mutually exclusive to CREATE INDEX, we don't need any special mechanism to handle race conditions between them. There are some other details like running multiple CREATE INDEX in parallel and still be able to CHILL the table safely. May be one of them needs to act as the chiller and others wait for it finish successfully. Any thoughts on the overall approach ? Any suggestions to simplify things or any alternate designs ? Can something as simple as CHILLing the table holding VACUUM FULL strength lock be acceptable ? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: > Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? regards, tom lane
Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: >> Any thoughts on the overall approach ? > > Fragile and full of race conditions :-(. I thought from the beginning > that CREATE INDEX might be a showstopper for the whole HOT concept, > and it's starting to look like that's the case. > > I think what we need to get away from is the assumption that HOT-ness > for one index is the same as HOT-ness for all. What if we only applied > HOT to primary-key indexes, so that there was certainly not more than > one index per table that the property applies to? Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Sincerely, Joshua D. Drake > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Tom Lane wrote: > What if we only applied > HOT to primary-key indexes, so that there was certainly not more than > one index per table that the property applies to? The main objective of HOT is to enable retail vacuum of HOT-updated tuples. Doing the above would make it useless for that purpose, at least when there's more than one index on the table. Granted, there's a lot of tables with just one index out there, but it's a big limitation nevertheless. An extension of that idea, though is to store a flag per index in the HOT-updated tuple. We would then need a mapping between bits in the tuple header to indexes, for example as a new column in pg_index. Let's assume that we'd use one byte in the tuple header to store HOT-flags. That's enough to store the HOT-status for 8 indexes. A set bit means that the index corresponding that bit doesn't have an index pointer in it. When you CREATE INDEX, assign a bit for the new index that's not currently in use. When you scan the table to build the index, clear that bit for every tuple if set and insert index entry as usual. DROP INDEX wouldn't need to scan the heap to clear the flags, because we clear them on CREATE INDEX when necessary. If you run out of bits in the header, IOW have more than 8 indexes on a table, indexes unlucky enough to not have a bit assigned to them wouldn't be HOT-updateable. This would also enable us to skip index inserts for those indexes whose key columns are not updated, and do the index inserts as usual for the rest. The limitation that you can only retail vacuum HOT-updated tuples when none of the indexed keys were changed remains, but we've accepted that already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:>> Any thoughts on the overall approach ?>> Fragileand full of race conditions :-(.> Yes, it looks a bit complex. But IMHO we can get around that. Do you have any ideas in mind about doing that ? > I thought from the beginning> that CREATE INDEX might be a showstopper for the whole HOT concept,> and it's starting tolook like that's the case. I remember you raised this concern very early, but I am hopeful that we would be able to solve this. Would it be acceptable to have a simple (though not the best) solution for this release and then improve later on ? As I mentioned earlier, one option is to CHILL the table, if required, holding AccessExclusive lock, just like VACUUM FULL. I am assuming here that CREATE INDEX is not such a common activity, isn't that true ? > I think what we need to get away from is the assumption that HOT-ness> for one index is the same as HOT-ness for all. What if we only applied> HOT to primary-key indexes, so that there was certainly not more than> one index per table thatthe property applies to?> I think that will take away the ability to reuse HEAP_ONLY tuples without vacuuming the heap and index. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
On Fri, 2007-03-16 at 21:56 +0530, Pavan Deolasee wrote: > Any thoughts on the overall approach ? Any suggestions to > simplify things or any alternate designs ? Well your design is very different from what we discussed, so I think I should post my proposed design alongside this, for further discussion. - - - Proposed solutions for CREATE INDEX and CREATE INDEX CONCURRENTLY. CREATE INDEX CONCURRENTLY has no feature changes to work with HOT. CREATE INDEX works normally, except when HOT tuples are found, in which case special processing occurs requiring additional locking. The index build occurs in a single scan, as now. CREATE INDEX CONCURRENTLY ------------------------- We perform no chilling during the initial scan. We index the tuple identified by SnapshotNow, but we take the root tuple's htid, not the htid of the tuple version being indexed. We assume that the tuple version indexed will be the root of the HOT chain by the time the index is complete. Currently, we wait until all pre-existing transactions exit before we allow this to COMMIT. With HOT, we simply move the wait so it occurs *before* the second scan, then we can prune the HOT chains as we pass through the heap on the second scan. There will be no pre-existing HOT tuples and so no chilling is required. CREATE INDEX ------------ We add a field, xchill, to pg_class that stores TransactionIds. This is set to InvalidTransactionId if no index has been built yet. The concept of "main indexer" is introduced, so we allow multiple concurrent index builds, but only one of these can chill tuples at a time. 1. In IndexBuildHeapScan, as we pass through the table: a) if we find any any HOT rows, we check xchill and do one of steps (i-iii). Until this point, it hasn't been important whether we are the main or a secondary indexer. i) if xchill is InvalidTransactionId or is committed then we attempt to become main indexer immediately, following these steps before we continue building the index (1b) -- If the table is temp, or if we created the table then we immediately become the main indexer, so return immediately. If the table being indexed is already visible to everybody, then: -- Update pg_class entry for the table, setting the xchill field for the table to the builder's Xid. (Use heap_inplace_update for this, which is OK, whether we commit or abort). -- acquire AccessExclusiveLock on all existing indexes (not the heap) ii) If xchill is in-progress we wait for that transaction to complete, then do either step i) or iii). We cannot continue building our index until the other transaction commits because we cannot yet see the other index, yet we have to insert into it in order to correctly chill a tuple to allow *our* index to be built. iii) if xchill is aborted we abort also, saying that a VACUUM is needed. b) If we get here then we are the main indexer and can chill tuples. As we move through the scan we chill all HOT tuples, mark them HEAP_CHILL_IN_PROGRESS, write WAL for this and insert index entries for them in all existing indexes, as well as this one. Then remove CHILL_IN_PROGRESS flags, without writing WAL. c) release locks on indexes, before end of transaction 2. If we crash or a transaction abort occurs: - we cannot prune a HEAP_ONLY_TUPLE that points to a tuple with HEAP_CHILL_IN_PROGRESS. - VACUUM must be used to clean up after an aborted index build and needs some additional code to allow this to occur. 3. Concurrent index builds are allowed. If we are not the main indexer, then we can attempt to build an index, but any scan that sees a HOT tuple will block and wait for the main index builder to complete before it proceeds. 4. When an indexscan reads the table, if it finds a HEAP_CHILL_IN_PROGRESS tuple it may or may not be valid. Concurrent index scans and tuple chilling can mean that an index scan find the same tuple twice, by different routes, if a CREATE INDEX crashed. To avoid this an IndexScan will only find a tuple visible if it came across a HEAP_CHILL_IN_PROGRESS tuple using an indirect route, i.e. it followed the path from root->HOT tuple. In this design, CREATE INDEX does have a deadlock risk when it is used within a transaction *and* the index is being built on a publicly visible table (i.e. not just-built and not temp). IMHO that risk is acceptable, since if users are worried about concurrent access to a table during CREATE INDEX they can use CREATE INDEX CONCURRENTLY. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Joshua D. Drake" <jd@commandprompt.com> writes: > Just to throw my two bits in here :). If we do that, how does that > effect the idea that most people in the web world use (id serial primary > key), even though that is not what they are searching on? "affect". But I think you're right that generally you'll have two indexes. > More specifically, does HOT help conditions where a composite comes into > play (session_id,last_active) ... which would be a more heavily updated > index than just the primary key. Well if you're indexing a column that you're updating then you've already failed your saving throw. The case we're trying to deal with is when you're updating columns that *aren't* indexed and therefore really don't need redundant index pointers for each tuple version with identical to the old versions. Especially since those index pointers are what's preventing us from vacuuming the old tuple versions. If you are updating an index key then there's no question you're going to need vacuum to clean out your index. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:> Tom Lane wrote:>> What if we only applied>> HOT to primary-key indexes, so that there was certainlynot more than>> one index per table that the property applies to?>> The main objective of HOT is to enable retailvacuum of HOT-updated> tuples. Doing the above would make it useless for that purpose,> at least when there's morethan one index on the table. Granted,> there's a lot of tables with just one index out there, but it's a> big limitationnevertheless.> Agree. > An extension of that idea, though is to store a flag per index in> the HOT-updated tuple. We would then need a mappingbetween bits in> the tuple header to indexes, for example as a new column in pg_index.> I like the idea. The major objection would be that it adds a byte to the tuple header which when considered along with the null bitmap, may actually make the header 8 bytes larger in the worst case. Also, I am also worried about the additional complexity introduced with this. We can and should work on this idea, I am wondering whether it would be too much to do before the feature freeze. I am personally inclined towards doing something simpler to tackle the CREATE INDEX issue at the moment. But if that is not acceptable and/or you or anyone else is willing help me on this, we can work on a better solution. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
On Fri, 2007-03-16 at 12:40 -0400, Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: > > Any thoughts on the overall approach ? > > Fragile and full of race conditions :-(. I thought from the beginning > that CREATE INDEX might be a showstopper for the whole HOT concept, > and it's starting to look like that's the case. Seems like we can fix all but some strange CREATE INDEX use cases. Since we have CREATE INDEX CONCURRENTLY, seems like HOT is a showstopper for the whole CREATE INDEX concept. > I think what we need to get away from is the assumption that HOT-ness > for one index is the same as HOT-ness for all. Sounds interesting. I'd not considered that before. > What if we only applied > HOT to primary-key indexes, so that there was certainly not more than > one index per table that the property applies to? On its own, I don't think this is a sufficiently wide use-case. Perhaps we should do this PLUS make HOT-semantics optional for each additional index. i.e. HOT is always enforced on primary indexes and optionally on other indexes (but not by default). If you accept the HOT option on an index, you then accept the additional issues surrounding chilling tuples. Bear in mind that there aren't any at all if you use CREATE INDEX CONCURRENTLY and many other cases. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > >> Just to throw my two bits in here :). If we do that, how does that >> effect the idea that most people in the web world use (id serial primary >> key), even though that is not what they are searching on? > > "affect". But I think you're right that generally you'll have two indexes. > >> More specifically, does HOT help conditions where a composite comes into >> play (session_id,last_active) ... which would be a more heavily updated >> index than just the primary key. > > Well if you're indexing a column that you're updating then you've already > failed your saving throw. Just for everyone who missed this. Greg Stark obviously spends his time (or at some time) playing|ed D&D. I have an Epic level Sorcerer, how about you Greg? ;) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Simon Riggs wrote: > > What if we only applied > > HOT to primary-key indexes, so that there was certainly not more than > > one index per table that the property applies to? > > On its own, I don't think this is a sufficiently wide use-case. > > Perhaps we should do this PLUS make HOT-semantics optional for each > additional index. i.e. HOT is always enforced on primary indexes and > optionally on other indexes (but not by default). Here's is a very simple, low-tech idea. How about checking whether the new index requires chilling tuples; if it does, then elog(ERROR) until all the indexes have been manually chilled, which would be done with an "ALTER INDEX ... CHILL" command or something like that. Only when all indexes are known chilled, you can create another one, and then the user can "hotify" indexes as appropriate. (Disclaimer: I haven't followed the HOT design closely to know if this makes enough sense) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Fri, 2007-03-16 at 16:59 -0400, Alvaro Herrera wrote: > Here's is a very simple, low-tech idea. How about checking whether the > new index requires chilling tuples; if it does, then elog(ERROR) until > all the indexes have been manually chilled, which would be done with an > "ALTER INDEX ... CHILL" command or something like that. Only when all > indexes are known chilled, you can create another one, and then the user > can "hotify" indexes as appropriate. Well, I've spent two weeks searching for a design that does CREATE INDEX without changing existing functionality. What's been proposed is very close, but not exact. CREATE INDEX CONCURRENTLY can work, so we're just discussing the other increasingly edgy cases. I agree some kind of compromise on CREATE INDEX seems to be required if we want HOT without some drastic reductions in function. I'm happy to go for low tech approaches, or anything really. Simple is good, so we can hit feature freeze. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)
From
"Pavan Deolasee"
Date:
How do we move forward with the CREATE INDEX issue with HOT ? There are quite a few suggestions and objections. Can we please discuss and decide on the plan ? I am very comfortable with the current state of HOT, the results are encouraging and I hope this issue does not become a showstopper. Here is what different people have suggested: 1. Simon and I posted couple of designs which include prior suggestions from Heikki, Tom, Hannu and others. I believe these designs are not very distinct and if done correctly would help us keep the current behavior of CREATE INDEX and CREATE INDEX CONCURRENTLY same. There are concerns about race conditions and deadlock issues though. What is a general feeling at this point ? Are these issues very hard to address ? At a high level, IMO we need some or all of these things to make either of these designs work: - ability to enable/disable HOT-updates on the table - ability to enable/disable HOT-fetches on the table - abilityto wait for concurrent index scans to finish - ability to allow only one backend to CHILL the table How hard would these be without causing any race conditions and deadlocks ? 2. Heikki suggested an approach where we add a byte to tuple header and track HOT-ness of different indexes. The idea looks good but had a downside of increasing tuple header and complexity. 3. Alvaro suggested to have a ALTER TABLE .. SET CHILL kind of syntax to chill the table. I would like to extend that a bit further: Can we have a mechanism to explicitely ON/OFF HOT-updates on tables ? By default, HOT is disabled. If user wants to take advantage of HOT, he/she should do something like: ALTER TABLE test SET HOT ON; While creating an index, if a HEAP_ONLY tuple is found, CREATE INDEX [CONCURRENTLY] fails with an error and the user needs to SET HOT OFF and then try again. While turning HOT off, the entire table is CHILLed, holding AccessExclusive lock on the table. Once the new index is created, user can turn HOT on again. This infrastructure would ensure that the current behavioral expectations of CREATE INDEX [CONCURRENTLY] don't suddenly change unless user explicitely turns HOT on. In that case, we can very well assume that the user is aware of the advantages/disadvantages of HOT and also cost associated with creating a new index on HOT-updated tables. This would also allow users experiment with HOT and turn it off if there is no gain in their environment. Any thoughts on the these approaches ? What is feasible and acceptable ? I am more inclined towards the third approach for this release and improve things later on. Is this acceptable ? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Sat, 2007-03-17 at 12:43 +0530, Pavan Deolasee wrote: > How do we move forward with the CREATE INDEX issue with > HOT ? There are quite a few suggestions and objections. > Can we please discuss and decide on the plan ? I am very > comfortable with the current state of HOT, the results > are encouraging and I hope this issue does not become > a showstopper. My suggested plan would be: First, we should make CREATE INDEX CONCURRENTLY work, since we have a design to do that. Many other things are acceptable if that works fully and correctly with HOT, if it cannot, we have problems and that is a showstopper. Second, I can't put my finger on it exactly, but I think having a DDL command to chill a table, as Alvaro suggests, sounds like a very wise utility to include, whatever we do later. That does have the useful by-product of making CREATE INDEX work without change, but I would not want to rely on it in the longer term. While those are happening we can explore the other possibilities in more detail, but they do seem like safe bets to me. > Here is what different people have suggested: > > 1. Simon and I posted couple of designs which include prior > suggestions from Heikki, Tom, Hannu and others. I believe > these designs are not very distinct and if done correctly > would help us keep the current behavior of CREATE INDEX > and CREATE INDEX CONCURRENTLY same. Agreed. Apologies for confusing things by posting the second design, that wasn't very helpful or polite. > There are concerns > about race conditions and deadlock issues though. What > is a general feeling at this point ? Are these issues very > hard to address ? At a high level, IMO we need some or all > of these things to make either of these designs work: > > - ability to enable/disable HOT-updates on the table > - ability to enable/disable HOT-fetches on the table > - ability to wait for concurrent index scans to finish > - ability to allow only one backend to CHILL the table That seems like a good summary of what we need. I would add only that these need not be explicit user-level commands, just internal capabilities. > How hard would these be without causing any race conditions > and deadlocks ? We should be able to take advantage of two things: - We only need to chill some of the tuples, in some cases. - Chilling only requires us to lock the indexes, which would be more practical if the index locking were more transitory. I note that IndexScans hold the locks they have on an index until transaction end, even after the IndexScans and BitmapIndexScans complete. ISTM we could relax that locking; that seems OK since we earlier discussed removing locks completely on indexes. The lock on the heap would not be upgraded or changed. > 2. Heikki suggested an approach where we add a byte > to tuple header and track HOT-ness of different indexes. > The idea looks good but had a downside of increasing tuple > header and complexity. I'm very comfortable with the idea that HOT can be turned on/off for a table. That gives us a workaround to bugs. Previously, changing things like WITHOUT OIDS was done over two releases, so I'd suggest the same thing here. Add the option now, disabled, then look to make it the default option in the next release. We can override that with the default_use_hot parameter for those that feel bold, at least initially. I know Bruce has been long opposed to the idea of a table-level switch, which is why we've been trying so hard to avoid it. So we should add his -1 to this idea from the start. Right now, ideas around (2) sound like too much complexity. Maybe there's a good idea there waiting to break out, so we should pursue that also - but I'm not sure you can wait for that to happen. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)
From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: > While creating an index, if a HEAP_ONLY tuple is found, > CREATE INDEX [CONCURRENTLY] fails with an error and the > user needs to SET HOT OFF and then try again. While turning > HOT off, the entire table is CHILLed, holding AccessExclusive > lock on the table. Once the new index is created, user > can turn HOT on again. It hardly seems acceptable to require exclusive lock to chill a table. In production situations, knowing that you'd have to do that to do index maintenance on a large table would probably scare you off of ever enabling the feature at all. Last year we were getting beaten up about how it wasn't acceptable for CREATE INDEX to lock out writes for a long time; how is it suddenly acceptable to need to lock out both reads and writes for a long time before you can even think about creating an index? regards, tom lane
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Sat, 2007-03-17 at 11:45 -0400, Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes: > > While creating an index, if a HEAP_ONLY tuple is found, > > CREATE INDEX [CONCURRENTLY] fails with an error and the > > user needs to SET HOT OFF and then try again. While turning > > HOT off, the entire table is CHILLed, holding AccessExclusive > > lock on the table. Once the new index is created, user > > can turn HOT on again. > > It hardly seems acceptable to require exclusive lock to chill a table. > In production situations, knowing that you'd have to do that to do > index maintenance on a large table would probably scare you off of > ever enabling the feature at all. Last year we were getting beaten up > about how it wasn't acceptable for CREATE INDEX to lock out writes > for a long time; how is it suddenly acceptable to need to lock out > both reads and writes for a long time before you can even think > about creating an index? I agree with you: It isn't acceptable for us to contemplate an AccessExclusiveLock before we can build any index. We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is I think we can without significant difficulty. The problems are with CREATE INDEX, in some cases. I regret that I did not see those difficulties until recently, which is why I was concerned that we spent time on VACUUM FULL rather than this issue. I'm glad now that you both pressed ahead and solved that though. As a result of the issues, I think Pavan is playing safe, to make sure there is *an* option, so that we can build upwards from there. The proposal is pragmatism only, while we discuss other approaches. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)
From
"Pavan Deolasee"
Date:
Tom Lane wrote:> "Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:>> While creating an index, if a HEAP_ONLY tupleis found,>> CREATE INDEX [CONCURRENTLY] fails with an error and the>> user needs to SET HOT OFF and then try again.While turning>> HOT off, the entire table is CHILLed, holding AccessExclusive>> lock on the table. Once the new indexis created, user>> can turn HOT on again.>> It hardly seems acceptable to require exclusive lock to chill a table.>In production situations, knowing that you'd have to do that to do> index maintenance on a large table would probablyscare you off of> ever enabling the feature at all. Last year we were getting beaten up> about how it wasn't acceptablefor CREATE INDEX to lock out writes> for a long time; how is it suddenly acceptable to need to lock out> both readsand writes for a long time before you can even think> about creating an index?> Yeah, I agree. I was proposing this as a stop-gap solution though. Something which would help us solve the problem without changing the current behavior for non-HOT tables. So what do you suggest ? Do you feel that there is no way we can solve the problem ? ISTM that if we run CHILL as a seperate transaction (just like VACUUM), we should be able to CHILL the table with ShareUpdateExclusiveLock. Running it as a seperate transaction would reduce the risk of causing deadlocks. Is that a fair assessment ? If we need to CHILL with ShareUpdateExclusiveLock, IMHO we would again be back to something similar to the first approach. I know you said its fragile and full of race conditions, but do you think we can handle it better if we have a seperate DDL command, running within its own transaction ? The algorithm would look like: 1. Disable HOT-updates 2. CHILL the table by inserting appropriate index entries and marking tuples CHILL_IN_PROGRESS 3. Establish a point when there are no open index scans 4. Disable HOT-fetches 5. Scan the heap again, reset CHILL_IN_PROGRESS, HEAP_ONLY and HOT_UPDATED flags 6. Establish a point when there are no open index scans 7. Enable HOT-fetches 8. Enable HOT-updates I need help to do the step 1,3,4,6,7 and 8 (well all :-)) in a deadlock and race condition free mannner. Any suggestions ? Where do we keep the global state about HOT-updates/HOT-fetches ? If we keep it in pg_class, a crash of the CHILL command or the server may leave the pg_class row in a stale state. That does not look like a problem though. In the worst case, we might not be able to do HOT-updates without manual intervention. Again comments, suggestions ? I really appreciate everyone's time and patience. Help is what I need to solve this problem. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> As a result of the issues, I think Pavan is playing safe, to make sure> there is *an* option, so thatwe can build upwards from there. The> proposal is pragmatism only, while we discuss other approaches.> Absolutely true. I agree that CHILLing the table with AccessExclusive lock is not a solution that I like. I was looking at it just as a stop-gap solution, given the time left for feature freeze. But if there is something better that we can do, I am for it. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Sat, 2007-03-17 at 23:11 +0530, Pavan Deolasee wrote: > The algorithm would look like: > > 1. Disable HOT-updates The lock held by CREATE INDEX already prevents HOT updates. So steps 1 and 8 aren't needed. We need to be clear that we already have a solution to CREATE INDEX CONCURRENTLY. Do you agree that we do? Does anyone see a problem with the posted design for that? Hopefully it is only CREATE INDEX that we need to think about. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> We need to be clear that we already have a solution to CREATE INDEX> CONCURRENTLY. Do you agree thatwe do? Does anyone see a problem with> the posted design for that?>> Hopefully it is only CREATE INDEX that we need tothink about.> I agree. Lets first decide whether its only CREATE INDEX that needs solution or its both. Lets also decide whether we want to fix CREATE INDEX [CONCURRENTLY] or we want to provide a seperate DLL to CHILL the table and then build index normally. Tom has already rejected the idea of holding exclusive lock while chilling, but if we get around that, is rest of the approach acceptable ? If its going to be a seperate DLL, lets decide whether its acceptable to run it as a seperate transaction. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Sun, 2007-03-18 at 00:44 +0530, Pavan Deolasee wrote: > Simon Riggs wrote: > > > > We need to be clear that we already have a solution to CREATE INDEX > > CONCURRENTLY. Do you agree that we do? Does anyone see a problem with > > the posted design for that? If we have solved CREATE INDEX CONCURRENTLY, then I would propose that this becomes the default option for creating an index, when the statement is issued outside of a statement block. That seems better than reminding everybody to run with the CONCURRENTLY option, or advise them of different performance characteristics or behaviour of the normal CREATE INDEX route. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>>> We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is> I think we can without significantdifficulty.> Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though I am not completely convinced that we can do that without much changes to CREATE INDEX CONCURRENTLY logic. For example, I believe we still need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY. Otherwise we might end up creating two paths to the same tuple in the new index. Say, we have a table with two columns (int a, int b). We have an index on 'a' and building another index on 'b'. We got a tuple (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY, this tuple would be indexed. If the tuple is HOT-updated to (10, 30) before the first phase ends, the updated tuple would again get indexed in the second phase. This would lead to two paths to the latest visible tuple from the new index. Am I missing something in your design that stops this from happening ? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Mon, 2007-03-19 at 14:33 +0530, Pavan Deolasee wrote: > Simon Riggs wrote: > > > > > > We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is > > I think we can without significant difficulty. > > > > Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though > I am not completely convinced that we can do that without much changes > to CREATE INDEX CONCURRENTLY logic. For example, I believe we still > need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY. > Otherwise we might end up creating two paths to the same tuple in > the new index. > > Say, we have a table with two columns (int a, int b). We have an > index on 'a' and building another index on 'b'. We got a tuple > (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY, > this tuple would be indexed. If the tuple is HOT-updated to (10, 30) > before the first phase ends, the updated tuple would again get > indexed in the second phase. This would lead to two paths to the > latest visible tuple from the new index. > > Am I missing something in your design that stops this from > happening ? This problem is solved by moving the wait (for all transactions in reference snapshot to finish) so that it is now between the first and second scans, as described. During the second scan we would prune each block, so the only remaining tuple in the block when the second scan sees it would be (10,30) and it would no longer be a HOT tuple - the index would have a pointer to it, so no new index pointer would be added. The pointer to (10,30) is the same pointer that was added in the first phase for the tuple (10,20). The wait and subsequent prune ensures that all HOT tuples are now the root of their HOT chain. The index created in the fist phase ensures that the HOT chains are never added to. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Mon, 2007-03-19 at 09:28 +0000, Simon Riggs wrote: > On Mon, 2007-03-19 at 14:33 +0530, Pavan Deolasee wrote: > > Simon Riggs wrote: > > > > > > > > > We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is > > > I think we can without significant difficulty. > > > > > > > Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though > > I am not completely convinced that we can do that without much changes > > to CREATE INDEX CONCURRENTLY logic. For example, I believe we still > > need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY. > > Otherwise we might end up creating two paths to the same tuple in > > the new index. > > > > Say, we have a table with two columns (int a, int b). We have an > > index on 'a' and building another index on 'b'. We got a tuple > > (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY, > > this tuple would be indexed. If the tuple is HOT-updated to (10, 30) > > before the first phase ends, the updated tuple would again get > > indexed in the second phase. This would lead to two paths to the > > latest visible tuple from the new index. > > > > Am I missing something in your design that stops this from > > happening ? > > This problem is solved by moving the wait (for all transactions in > reference snapshot to finish) so that it is now between the first and > second scans, as described. > > During the second scan we would prune each block, so the only remaining > tuple in the block when the second scan sees it would be (10,30) and it > would no longer be a HOT tuple - the index would have a pointer to it, > so no new index pointer would be added. The pointer to (10,30) is the > same pointer that was added in the first phase for the tuple (10,20). > > The wait and subsequent prune ensures that all HOT tuples are now the > root of their HOT chain. The index created in the fist phase ensures > that the HOT chains are never added to. AFAICS this is all you need to make CREATE INDEX CONCURRENTLY work with HOT, which is even simpler than my original post. [This presumes that we do pruning automatically on a heap scan, not sure what the current state of that is, but it could be a scan option]. Index: src/backend/commands/indexcmds.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.157 diff -c -r1.157 indexcmds.c *** src/backend/commands/indexcmds.c 13 Mar 2007 00:33:39 -0000 1.157 --- src/backend/commands/indexcmds.c 19 Mar 2007 09:49:56 -0000 *************** *** 497,507 **** ActiveSnapshot = snapshot; /* - * Scan the index and the heap, insert any missing index entries. - */ - validate_index(relationId, indexRelationId, snapshot); - - /* * The index is now valid in the sense that it contains all currently * interesting tuples. But since it might not contain tuples deleted just * before the reference snap was taken, we have to wait out any --- 497,502 ---- *************** *** 514,519 **** --- 509,519 ---- for (ixcnt = 0; ixcnt < snapshot->xcnt; ixcnt++) XactLockTableWait(snapshot->xip[ixcnt]); + /* + * Scan the index and the heap, insert any missing index entries. + */ + validate_index(relationId, indexRelationId, snapshot); + /* Index can now be marked valid -- update its pg_index entry */ pg_index = heap_open(IndexRelationId, RowExclusiveLock); -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)
From
Heikki Linnakangas
Date:
Pavan Deolasee wrote: > 2. Heikki suggested an approach where we add a byte > to tuple header and track HOT-ness of different indexes. > The idea looks good but had a downside of increasing tuple > header and complexity. We would only need the extra byte in HOT-updated tuples. Alternatively, we could use the bits we have free in infomask2. There's currently 5 bits free, using just 2 or 3 of those would get us quite far. Or just one, which would be the Tom's suggestion of only using HOT for tables with a single index. Complexity is in the eye of the beholder. Chilling existing tuples isn't exactly trivial either, and neither is getting all the locking and waiting needed in the other proposals correct. The simplicity of the other proposals depend a lot on what kind of restrictions and changes to current semantics of CREATE INDEX [CONCURRENTLY] we accept. Which of the following restrictions are we OK with, if a table has HOT-updated tuples: 1. Throw an error 2. Require a vacuum after crash during CREATE INDEX 3. Do multiple heap-scan passes 4. Wait longer in CREATE INDEX CONCURRENTLY 5. Wait in CREATE INDEX, like we do in CREATE INDEX CONCURRENTLY 6. Lock the table exclusively 7. Disallow multiple CREATE INDEXes at the same time. I've lost track of which proposals lead to which restrictions. Maybe we should look at the restrictions first, and judge which ones are acceptable and which ones are not? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)
From
"Pavan Deolasee"
Date:
Heikki Linnakangas wrote:> Pavan Deolasee wrote:>> 2. Heikki suggested an approach where we add a byte>> to tuple headerand track HOT-ness of different indexes.>> The idea looks good but had a downside of increasing tuple>> header andcomplexity.>> We would only need the extra byte in HOT-updated tuples. Alternatively, we could use the bits we have free in infomask2. There's currently 5 bits free, using just 2 or 3 of those would get us quite far. Or just one, which would be the Tom's suggestion of only using HOT for tables with a single index.> We've already used three of those, two for tracking HEAP_ONLY and HOT_UPDATED tuples and one for tracking fragmented tuple. Doing it for just one index seems too restrictive. Are we ok with adding another byte to the tuple header ? > Complexity is in the eye of the beholder. Chilling existing tuples isn't exactly trivial either, and neither is getting all the locking and waiting needed in the other proposals correct.> I agree. I am just worried about the short term and long term solution. Your proposal is certainly the better of all as it also gives us the ability to restrict bloats on a index whose key does not change during UPDATE. I would like to do something which is acceptable and is also feasible to complete by feature freeze. Do you want to give a shot to this approach while I try to build the ALTER TABLE and CHILL utilities ? > The simplicity of the other proposals depend a lot on what kind of restrictions and changes to current semantics of CREATE INDEX [CONCURRENTLY] we accept. Which of the following restrictions are we OK with, if a table has HOT-updated tuples:>> 1. Throw an error> 2. Require a vacuum after crash during CREATE INDEX> 3. Domultiple heap-scan passes> 4. Wait longer in CREATE INDEX CONCURRENTLY> 5. Wait in CREATE INDEX, like we do in CREATE INDEXCONCURRENTLY> 6. Lock the table exclusively> 7. Disallow multiple CREATE INDEXes at the same time.>> I've lost trackof which proposals lead to which restrictions. Maybe we should look at the restrictions first, and judge which ones are acceptable and which ones are not?> This is a good summary. With the assumption that creating index is not very frequent operation, I would live with 1, 2, 3 and 4. But frankly I'm least knowledgable in this regard and would rely on others to decide. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
"Pavan Deolasee"
Date:
Simon Riggs wrote:>> This problem is solved by moving the wait (for all transactions in> reference snapshot to finish) sothat it is now between the first and> second scans, as described.>> During the second Vscan we would prune each block,so the only remaining> tuple in the block when the second scan sees it would be (10,30) and it> would no longer bea HOT tuple - the index would have a pointer to it,> so no new index pointer would be added. The pointer to (10,30) isthe> same pointer that was added in the first phase for the tuple (10,20).> The problem is that in the first phase, the pointer was inserted with key=20 whereas now its changed to 30. So we need to delete the old index entry and add a new one. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)
From
Heikki Linnakangas
Date:
Pavan Deolasee wrote: > Heikki Linnakangas wrote: > > Pavan Deolasee wrote: > > We would only need the extra byte in HOT-updated tuples. > Alternatively, we could use the bits we have free in infomask2. There's > currently 5 bits free, using just 2 or 3 of those would get us quite > far. Or just one, which would be the Tom's suggestion of only using HOT > for tables with a single index. > > > > We've already used three of those, two for tracking HEAP_ONLY > and HOT_UPDATED tuples and one for tracking fragmented tuple. HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have bits available for three indexes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Mon, 2007-03-19 at 16:06 +0530, Pavan Deolasee wrote: > Simon Riggs wrote: > > > > This problem is solved by moving the wait (for all transactions in > > reference snapshot to finish) so that it is now between the first and > > second scans, as described. > > > > During the second Vscan we would prune each block, so the only remaining > > tuple in the block when the second scan sees it would be (10,30) and it > > would no longer be a HOT tuple - the index would have a pointer to it, > > so no new index pointer would be added. The pointer to (10,30) is the > > same pointer that was added in the first phase for the tuple (10,20). > > > > The problem is that in the first phase, the pointer was inserted > with key=20 whereas now its changed to 30. So we need to delete the old > index entry and add a new one. So don't index HOT tuples in the first phase, wait until the second. That should be just a single if() test in IndexBuildHeapScan(). -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Mon, 2007-03-19 at 10:51 +0000, Heikki Linnakangas wrote: > Pavan Deolasee wrote: > > Heikki Linnakangas wrote: > > > Pavan Deolasee wrote: > > > We would only need the extra byte in HOT-updated tuples. > > Alternatively, we could use the bits we have free in infomask2. There's > > currently 5 bits free, using just 2 or 3 of those would get us quite > > far. Or just one, which would be the Tom's suggestion of only using HOT > > for tables with a single index. > > > > > > > We've already used three of those, two for tracking HEAP_ONLY > > and HOT_UPDATED tuples and one for tracking fragmented tuple. > > HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have > bits available for three indexes. ISTM that we are getting very close to a great idea here. I was unwilling to compromise to have HOT if only one index existed, but IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this release. (We can always use vertical partitioning techniques to allow additional access paths to be added to the same table - I'd be very happy to document that with worked examples, if requried). I trust that we will think of ways of extending that limit in later releases. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
Hannu Krosing
Date:
Ühel kenal päeval, E, 2007-03-19 kell 12:05, kirjutas Simon Riggs: > On Mon, 2007-03-19 at 10:51 +0000, Heikki Linnakangas wrote: > > Pavan Deolasee wrote: > > > Heikki Linnakangas wrote: > > > > Pavan Deolasee wrote: > > > > We would only need the extra byte in HOT-updated tuples. > > > Alternatively, we could use the bits we have free in infomask2. There's > > > currently 5 bits free, using just 2 or 3 of those would get us quite > > > far. Or just one, which would be the Tom's suggestion of only using HOT > > > for tables with a single index. > > > > > > > > > > We've already used three of those, two for tracking HEAP_ONLY > > > and HOT_UPDATED tuples and one for tracking fragmented tuple. > > > > HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have > > bits available for three indexes. But you probably have to do some kind of SUPERFULL VACUUM if you want to DROP and CREATE the third index. You will probably have to touch all tuples, regardless of weather they are live or not, or if will be moved or not, just to kclean ot bits for the just-deleted index. Maybe a CLUSTER would be an answer here. > ISTM that we are getting very close to a great idea here. > > I was unwilling to compromise to have HOT if only one index existed, but > IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this > release. (We can always use vertical partitioning techniques to allow > additional access paths to be added to the same table - I'd be very > happy to document that with worked examples, if requried). Maybe using more than one TOAST table as means of vertical partitioning ? > I trust that we will think of ways of extending that limit in later > releases. > -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
Heikki Linnakangas
Date:
Hannu Krosing wrote: > Ühel kenal päeval, E, 2007-03-19 kell 12:05, kirjutas Simon Riggs: >> On Mon, 2007-03-19 at 10:51 +0000, Heikki Linnakangas wrote: >>> Pavan Deolasee wrote: >>>> We've already used three of those, two for tracking HEAP_ONLY >>>> and HOT_UPDATED tuples and one for tracking fragmented tuple. >>> HEAP_ONLY_TUPLE would go away in favor of the per-index bits. So we have >>> bits available for three indexes. > > But you probably have to do some kind of SUPERFULL VACUUM if you want to > DROP and CREATE the third index. You will probably have to touch all > tuples, regardless of weather they are live or not, or if will be moved > or not, just to kclean ot bits for the just-deleted index. DROP INDEX wouldn't do anything extra. CREATE INDEX would have to clear the bit assigned to the new index, which would mean dirtying every heap page in the worst case. As a further optimization, CREATE INDEX could skip index inserts for HOT-updated tuples, if the key for new index wasn't changed, and leave the flag set. Vacuum isn't needed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Merlin Moncure"
Date:
On 3/17/07, Simon Riggs <simon@2ndquadrant.com> wrote: > I'm very comfortable with the idea that HOT can be turned on/off for a > table. That gives us a workaround to bugs. Previously, changing things > like WITHOUT OIDS was done over two releases, so I'd suggest the same > thing here. Add the option now, disabled, then look to make it the > default option in the next release. We can override that with the > default_use_hot parameter for those that feel bold, at least initially. > I know Bruce has been long opposed to the idea of a table-level switch, > which is why we've been trying so hard to avoid it. So we should add his > -1 to this idea from the start. Is fear of bugs a justification of guc setting? Or is there a trade-off involved with HOT? merlin
Re: CREATE INDEX and HOT (was Question:pg_classattributes and race conditions ?)
From
"Simon Riggs"
Date:
On Mon, 2007-03-19 at 10:29 -0500, Merlin Moncure wrote: > On 3/17/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > I'm very comfortable with the idea that HOT can be turned on/off for a > > table. That gives us a workaround to bugs. Previously, changing things > > like WITHOUT OIDS was done over two releases, so I'd suggest the same > > thing here. Add the option now, disabled, then look to make it the > > default option in the next release. We can override that with the > > default_use_hot parameter for those that feel bold, at least initially. > > I know Bruce has been long opposed to the idea of a table-level switch, > > which is why we've been trying so hard to avoid it. So we should add his > > -1 to this idea from the start. > > Is fear of bugs a justification of guc setting? Probably not on its own, but the inspiration was that we currently have user-visible behaviour in the recent proposals, hence the GUC. > Or is there a trade-off involved with HOT? At the moment, there is no downside to HOT in normal operation that I'm aware of, but its a great question. The problem we have is with normal CREATE INDEX because there are two sources of race conditions that complicate this: concurrent index scans and crash safety. Currently there are no perfect solutions to this. We have two main options: 1. additional locking, either within CIDX or as a separate DDL 2. additional complexity and possible limitation in the number of indexes to just 3 before we stop doing HOT updates. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Merlin Moncure"
Date:
On 3/19/07, Pavan Deolasee <pavan.deolasee@enterprisedb.com> wrote: > Yeah, I think CREATE INDEX CONCURRENTLY is much easier to solve. Though > I am not completely convinced that we can do that without much changes > to CREATE INDEX CONCURRENTLY logic. For example, I believe we still > need to lock out HOT-updates before we start CREATE INDEX CONCURRENTLY. > Otherwise we might end up creating two paths to the same tuple in > the new index. > > Say, we have a table with two columns (int a, int b). We have an > index on 'a' and building another index on 'b'. We got a tuple > (10, 20) in the heap. In the first phase of CREATE INDEX CONCURRENTLY, > this tuple would be indexed. If the tuple is HOT-updated to (10, 30) > before the first phase ends, the updated tuple would again get > indexed in the second phase. This would lead to two paths to the > latest visible tuple from the new index. just a thought...can you disable HOT on the fly? why not disable hot updates completely during these types of operations?. merlin
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Tom Lane wrote: >> What if we only applied >> HOT to primary-key indexes, so that there was certainly not more than >> one index per table that the property applies to? > > The main objective of HOT is to enable retail vacuum of HOT-updated tuples. > Doing the above would make it useless for that purpose, at least when there's > more than one index on the table. Granted, there's a lot of tables with just > one index out there, but it's a big limitation nevertheless. > > An extension of that idea, though is to store a flag per index in the > HOT-updated tuple. We would then need a mapping between bits in the tuple > header to indexes, for example as a new column in pg_index. I had an interesting thought this morning, these bits might less us do retail vacuum in a lot of cases. When you traverse an index and find that an index pointer points to a DEAD tuples you set the LP_DELETE flag. If at that time you set the tuple's bit indicating that the index pointer for that index then if we find all the bits are set (and whatever condition we have for ensuring that all indexes are represented by bits) we know there are no index pointers left and the tuple can now be retail vacuumed. I think in order for this to work we may want a rule that we don't have to dirty a page to set a "index pointer missing" bit though we would certainly need to dirty it (and wal log) it if we *clear* a bit. In other words a set bit would be a guarantee that the index pointer was missing but a clear bit would only be a hint that it might be present. The main problem with this is that it would necessitate WAL logging setting the LP_DELETE flag on index pointers which could be a large overhead for a SELECT. This interacts with two other proposed changes, HOT and truncating line pointers, which I think are both valuable. But I think it works with both. What I would suggest is the following: When we follow an index pointer, find a (non-truncated) DEAD tuple we truncate the line pointer, and initialize the length bits to an empty bitmask of "index pointer missing" flags. In the case of a HOT-updated tuple HOT would have to provide enough information for us to initialize the bitmask -- it's the same information that it needs anyways. Then we set the LP_DELETE flag on our own index pointer (and wal log it) and set the corresponding bit on the page. If we find an index pointer pointing to a truncated line pointer we set the flag in the bitmask. If we find that all the bits are set indicating that all indexes have successfully set their LP_DELETE flag then we the line pointer can be marked as !LP_USED. Of course only certain types of indexes would be able to do this, indexes that only ever have exactly one pointer to every tuple, and which have space for an LP_DELETED or equivalent flag. I believe currently this includes all except GIN. The part I'm most worried about with both this and the equivalent bits for HOT are maintaining the mapping from index to bit. I think it could be worked out, but it has to be done carefully. Dropping an index can't ever change the mapping and creating a new index can't ever leave a tuple with a bit incorrectly set for that index. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
"Jim C. Nasby"
Date:
On Mon, Mar 19, 2007 at 12:05:19PM +0000, Simon Riggs wrote: > I was unwilling to compromise to have HOT if only one index existed, but > IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this > release. (We can always use vertical partitioning techniques to allow > additional access paths to be added to the same table - I'd be very > happy to document that with worked examples, if requried). I'm not sure where we're sitting with this, but I've got another idea I haven't seen (one that I think is better than an arbitrary limit on the number of indexes)... what if we just disallow non-concurrent index builds on hot tables? It sounds like the additional pain involved in chilling an entire table and keeping it chilled for the index build is even more overhead than just doing a concurrent index build. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)
From
Russell Smith
Date:
Jim C. Nasby wrote: > On Mon, Mar 19, 2007 at 12:05:19PM +0000, Simon Riggs wrote: > >> I was unwilling to compromise to have HOT if only one index existed, but >> IMHO allowing HOT with <= 3 indexes is an acceptable compromise for this >> release. (We can always use vertical partitioning techniques to allow >> additional access paths to be added to the same table - I'd be very >> happy to document that with worked examples, if requried). >> > > I'm not sure where we're sitting with this, but I've got another idea I > haven't seen (one that I think is better than an arbitrary limit on the > number of indexes)... what if we just disallow non-concurrent index > builds on hot tables? It sounds like the additional pain involved in > chilling an entire table and keeping it chilled for the index build is > even more overhead than just doing a concurrent index build. > I thought about making it even simpler. Only allow CREATE INDEX builds on non HOT tables. However as I mentioned in another thread, this idea dies if you expect to be able to have HOT enabled by default in any future release. Chilling needs to be able to be done with a regular Vacuum style lock for chilling to be a usable reality. I'm sure there are use cases or this, but it seems unlikely that a high update table is going to have an index added to it. Am I a long way from reality when saying that? Regards Russell Smith