Thread: Re: [GENERAL] Concurrency problem building indexes
> > [ shrug... ]  Worksforme.
> >
> > There is a short interval at the end of the first CREATE INDEX on the
> > table where the problem would happen if another CREATE INDEX tries to
> > modify the pg_class row before the first one's committed.

> I did a pg_dumpall and removed the index creation commands.  The first
> time I run the index build, I usually get at least one occurrence.

I think that narrows it down nicely.  You create the table, load rows,
then without ANALYZE create the indexes, thus pg_class is not up to date,
and the update needs to be done.

My answer to this would be to (have an option to) omit this relpages and
reltuples update.  It is IMHO not the task of CREATE INDEX to update
statistics in the first place.  I have been burnt by that behavior when
creating indexes on empty tables in Informix and never liked it (IIRC pg
has a workaround for empty tables, though).

Wes, you could most likely solve your immediate problem if you did an
ANALYZE before creating the indexes.

Andreas
On 4/25/06 2:18 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> So invent some made-up data.  I'd be seriously surprised if this
> behavior has anything to do with the precise data being indexed.
> Experiment around till you've got something you don't mind posting
> that exhibits the behavior you see.

My initial attempts last night at duplicating it with a small result set
were not successful.  I'll see what I can do.

On 4/25/06 3:25 AM, "Zeugswetter Andreas DCP SD" <ZeugswetterA@spardat.at>
wrote:

> Wes, you could most likely solve your immediate problem if you did an
> analyze before creating the indexes.

I can try that.  Is that going to be a reasonable thing to do when
there's 100 million rows per table?  I obviously want to minimize the
number of sequential passes through the database.

Wes
"Zeugswetter Andreas DCP SD" <ZeugswetterA@spardat.at> writes:
> My answer to this would be to (have an option to) omit this relpages
> and reltuples update.

Wouldn't help, unless you want to get rid of relhasindex too.

> Wes, you could most likely solve your immediate problem if you did an
> analyze before creating the indexes.

No, because that would install an approximate tuple count that the first
CREATE INDEX would (most likely) still want to replace.

The best I can gather from Wes' mail is that he's somehow getting a
higher-than-chance probability that the first two CREATE INDEX commands
finish at almost exactly the same time, and thus neither one of them sees
the pg_class row as already updated with the correct values.  I can't
think of any effect in the PG code that would make CREATE INDEX commands
tend to converge rather than diverge, so maybe there is some external
effect here.  Wes, is your machine a dual processor by any chance?
Which OS X version exactly?

			regards, tom lane
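[Editor's note: the failure mode Tom describes can be sketched as two index builds racing to update the same shared catalog row. The Python below is purely an illustration of the control flow (PostgreSQL's real code path is C, in simple_heap_update); the class and method names are invented for the sketch, and visibility rules are simplified to just the conflict check that matters here.]

```python
class TupleConcurrentlyUpdated(Exception):
    """Stand-in for the 'tuple concurrently updated' error."""

class PgClassRow:
    """Toy stand-in for one pg_class row's reltuples field."""
    def __init__(self):
        self.reltuples = 0
        self.uncommitted_by = None  # transaction holding an uncommitted update

    def simple_update(self, xact, reltuples):
        # Mimics simple_heap_update: a second updater that finds an
        # uncommitted change by another transaction just errors out.
        if self.uncommitted_by not in (None, xact):
            raise TupleConcurrentlyUpdated("tuple concurrently updated")
        if self.reltuples == reltuples:
            return  # value already correct -- CREATE INDEX skips the update
        self.uncommitted_by = xact
        self.reltuples = reltuples

    def commit(self, xact):
        if self.uncommitted_by == xact:
            self.uncommitted_by = None

# Case 1: the first build commits before the second finishes -> no error,
# because the second build sees the row already updated with correct values.
row = PgClassRow()
row.simple_update("idx1", 100)
row.commit("idx1")
row.simple_update("idx2", 100)   # sees the correct count, does nothing

# Case 2: both builds finish at almost exactly the same time.
row2 = PgClassRow()
row2.simple_update("idx1", 100)  # not yet committed...
try:
    row2.simple_update("idx2", 100)
    failed = False
except TupleConcurrentlyUpdated:
    failed = True
print(failed)
```

The sketch shows why the window is normally narrow: only when the second update lands before the first one's commit does the error fire, which is why Tom finds the higher-than-chance hit rate surprising.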
On 4/25/06 9:58 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> I can't think of any effect in the PG code that would make CREATE INDEX
> commands tend to converge rather than diverge, so maybe there is some
> external effect here.  Wes, is your machine a dual processor by any
> chance?  Which OS X version exactly?

No, I'm running on a single-processor PowerBook.  I currently am running
OS X 10.4.5 on this system, 1 GB memory.

A while back, I saw a posting (I think from Tom) to the effect of index
creation converging due to disk caching.  It was along the lines of: the
slower index build would be reading from disk data cached by the first
index creation's read.  When the faster creation went out to read from
disk, the one reading from memory could catch up.  Possible?

Below is the (mostly psql) output from the creation script.  Timing is
enabled.  It's a bit difficult to follow because of the multiprocessing.
The index build script is Perl, using popen to talk to psql.  $|=1 is set
on the command pipe to flush each command to the process.  I can provide
that if you want.  If needed I could rewrite it using DBI to get
easier-to-read logging.  This is 6 tables, and I get three errors.  The
header table has 3 indexes, and the detail table has 4.

Wes

In buildIndex type=header date=20050904 which=1
939: header date       <<<<<----- The number is the PID of the child process
939: header msgid
\timing
Timing is on.
create index header_i_date_20050904 on header_20050904 (mdate) TABLESPACE indexes;
header 1 forked 939
In buildIndex type=header date=20050904 which=2
942: header originator
\timing
Timing is on.
create index header_i_originator_20050904 on header_20050904 (originator) TABLESPACE indexes;
header 2 forked 942
Waiting for index builds to complete
CREATE INDEX
Time: 79.463 ms
\q
ERROR:  tuple concurrently updated
create index header_i_msgid_20050904 on header_20050904 (messageid) TABLESPACE indexes;
942 completed.
CREATE INDEX
Time: 7.125 ms
\q
939 completed.
header 1 forked 946
header 2 forked 947
Waiting for index builds to complete
In buildIndex type=header date=20050905 which=1
In buildIndex type=header date=20050905 which=2
946: header date
946: header msgid
947: header originator
\timing
Timing is on.
create index header_i_originator_20050905 on header_20050905 (originator) TABLESPACE indexes;
\timing
Timing is on.
create index header_i_date_20050905 on header_20050905 (mdate) TABLESPACE indexes;
CREATE INDEX
Time: 444.957 ms
create index header_i_msgid_20050905 on header_20050905 (messageid) TABLESPACE indexes;
CREATE INDEX
Time: 569.063 ms
\q
947 completed.
CREATE INDEX
Time: 293.467 ms
\q
946 completed.
header 1 forked 953
In buildIndex type=header date=20050906 which=1
In buildIndex type=header date=20050906 which=2
header 2 forked 954
Waiting for index builds to complete
953: header date
953: header msgid
954: header originator
\timing
Timing is on.
create index header_i_originator_20050906 on header_20050906 (originator) TABLESPACE indexes;
\timing
Timing is on.
create index header_i_date_20050906 on header_20050906 (mdate) TABLESPACE indexes;
CREATE INDEX
Time: 1048.094 ms
create index header_i_msgid_20050906 on header_20050906 (messageid) TABLESPACE indexes;
ERROR:  tuple concurrently updated
\q
954 completed.
CREATE INDEX
Time: 566.794 ms
\q
953 completed.
In buildIndex type=detail date=20050904 which=1
960: detail originator
960: detail date
\timing
Timing is on.
create index detail_i_originator_20050904 on detail_20050904 (originator) TABLESPACE indexes;
detail 1 forked 960
In buildIndex type=detail date=20050904 which=2
963: detail recipient
963: detail msgid
\timing
Timing is on.
create index detail_i_recipient_20050904 on detail_20050904 (recipient) TABLESPACE indexes;
CREATE INDEX
Time: 11.679 ms
create index detail_i_msgid_20050904 on detail_20050904 (messageid) TABLESPACE indexes;
CREATE INDEX
Time: 4.952 ms
\q
detail 2 forked 963
Waiting for index builds to complete
963 completed.
CREATE INDEX
Time: 123.106 ms
create index detail_i_date_20050904 on detail_20050904 (mdate) TABLESPACE indexes;
CREATE INDEX
Time: 5.724 ms
\q
960 completed.
In buildIndex type=detail date=20050905 which=1
967: detail originator
967: detail date
\timing
Timing is on.
create index detail_i_originator_20050905 on detail_20050905 (originator) TABLESPACE indexes;
detail 1 forked 967
In buildIndex type=detail date=20050905 which=2
971: detail recipient
971: detail msgid
\timing
Timing is on.
create index detail_i_recipient_20050905 on detail_20050905 (recipient) TABLESPACE indexes;
detail 2 forked 971
Waiting for index builds to complete
CREATE INDEX
Time: 1223.597 ms
create index detail_i_date_20050905 on detail_20050905 (mdate) TABLESPACE indexes;
ERROR:  tuple concurrently updated
create index detail_i_msgid_20050905 on detail_20050905 (messageid) TABLESPACE indexes;
CREATE INDEX
Time: 629.056 ms
\q
967 completed.
CREATE INDEX
Time: 568.197 ms
\q
971 completed.
In buildIndex type=detail date=20050906 which=1
975: detail originator
975: detail date
\timing
Timing is on.
create index detail_i_originator_20050906 on detail_20050906 (originator) TABLESPACE indexes;
detail 1 forked 975
In buildIndex type=detail date=20050906 which=2
978: detail recipient
978: detail msgid
detail 2 forked 978
Waiting for index builds to complete
\timing
Timing is on.
create index detail_i_recipient_20050906 on detail_20050906 (recipient) TABLESPACE indexes;
CREATE INDEX
Time: 2680.001 ms
create index detail_i_date_20050906 on detail_20050906 (mdate) TABLESPACE indexes;
CREATE INDEX
Time: 2675.191 ms
create index detail_i_msgid_20050906 on detail_20050906 (messageid) TABLESPACE indexes;
CREATE INDEX
Time: 2022.232 ms
\q
CREATE INDEX
Time: 1910.771 ms
\q
978 completed.
975 completed.
Wes <wespvp@syntegra.com> writes:
> A while back, I saw a posting (I think from Tom) to the effect of index
> creation converging due to disk caching.  It was along the lines of the
> slower index would be reading from disk data cached by the first index
> creation's read.  When the faster creation went out to read from disk,
> the one reading from memory could catch up.  Possible?

There would be some convergence effect while reading the table contents,
but the subsequent sorting and index-writing would be competitive and
ought to diverge again.

			regards, tom lane
I'm late to this thread, but maybe we can make the process of storing the
new data in pg_class take a lock using LockObject() or something like
that to serialize the access to the pg_class row.  The idea would be that
this lock doesn't conflict with a LockRelation(), but it would of course
conflict with itself, so no two CREATE INDEXes can enter that code
section concurrently.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I'm late to this thread, but maybe we can make the process of storing
> the new data in pg_class take a lock using LockObject() or something
> like that to serialize the access to the pg_class row.

I'm inclined to think that the right solution is to fix UpdateStats and
setRelhasindex so that they don't use simple_heap_update, but call
heap_update directly and cope with HeapTupleUpdated (by looping around
and trying the update from scratch).

Another thing that's annoying here is that we update the pg_class row
twice in some cases --- really we ought to try to get this down to one
update.  (So we'd only need one instance of the looping logic, not two.)
I'm not entirely clear on the cleanest way to do that, but am currently
thinking that btbuild and friends ought to pass back the tuple counts
they obtained, rather than writing them into the catalogs for themselves.
IndexCloseAndUpdateStats ought to go away --- the index AM never had any
business doing that for itself.

			regards, tom lane
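[Editor's note: the fix Tom sketches — cope with HeapTupleUpdated by looping and retrying the update from scratch — is essentially an optimistic-concurrency retry loop. A hedged Python sketch of that control flow follows; heap_update and HeapTupleUpdated are the real backend names from the mail, but every function in this sketch (update_pg_class_stats, fetch_row, try_update) is invented for illustration, not the actual backend API.]

```python
class HeapTupleUpdated(Exception):
    """Stand-in for the backend's HeapTupleUpdated result code."""

def update_pg_class_stats(fetch_row, try_update, max_retries=100):
    # Instead of failing the whole CREATE INDEX with "tuple concurrently
    # updated", re-fetch the newest row version and retry the update
    # whenever someone else updated the row underneath us.
    for _ in range(max_retries):
        row = fetch_row()
        try:
            try_update(row)
            return row
        except HeapTupleUpdated:
            continue  # start over from scratch with the newest version
    raise RuntimeError("gave up after repeated concurrent updates")

# Demo: a fake catalog where the first update attempt loses the race
# once, then succeeds on the retry.
attempts = {"n": 0}
store = {"reltuples": 0}

def fetch_row():
    return dict(store)

def try_update(row):
    attempts["n"] += 1
    if attempts["n"] == 1:        # simulate a concurrent updater winning once
        raise HeapTupleUpdated
    store["reltuples"] = 100

update_pg_class_stats(fetch_row, try_update)
print(store["reltuples"])
```

The point of the loop is that a lost race becomes one extra iteration rather than a user-visible error, which is exactly the behavior difference between heap_update-with-retry and simple_heap_update.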
On Tue, Apr 25, 2006 at 12:48:04PM -0400, Alvaro Herrera wrote:
> I'm late to this thread, but maybe we can make the process of storing
> the new data in pg_class take a lock using LockObject() or something
> like that to serialize the access to the pg_class row.  The idea would
> be that this lock doesn't conflict with a LockRelation(), but it would
> of course conflict with itself so no two CREATE INDEXES can enter that
> code section concurrently.

Is there anything in comments/docs/list archives about why catalog access
uses a bunch of 'magic' instead of treating catalog tables the same as
every other table?  I realize that ultimately you have to bootstrap
somehow (kinda hard to read from pg_class if the info needed to do so is
in pg_class), but perhaps switching over to the regular access methods
after the system is up would be worthwhile.

Advantages:
  Allows for concurrent access (using MVCC)
  Potentially reduces locking requirements (if snapshots aren't required
    anymore, each backend should theoretically be able to rely on MVCC to
    get the right catalog info, though of course this depends on the
    actual operation)
  Should allow for much-sought-after triggers on the system catalogs

But I'm sure this has come up in the past, I just can't find any info
about why not to do this...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: Catalog Access (was: [GENERAL] Concurrency problem building indexes)
From: Martijn van Oosterhout
On Tue, Apr 25, 2006 at 12:25:35PM -0500, Jim C. Nasby wrote:
> Is there anything in comments/docs/list archives about why catalog
> access uses a bunch of 'magic' instead of treating catalog tables the
> same as every other table?  I realize that ultimately you have to
> bootstrap somehow (kinda hard to read from pg_class if the info needed
> to do so is in pg_class), but perhaps switching over to the regular
> access methods after the system is up would be worth-while.

I don't know if it's explicitly documented, but here's one mail that
describes some of the issues:

http://www.thelinuxreview.com/pgsql-hackers/2005-03/msg00840.php

I think the basic problem is that DDL can't really work within a
transaction.  If I do an ALTER TABLE, some of these changes need to show
up to concurrent transactions (maybe creating a unique index?).

I think it's like Tom says in that email: it could be done, but the
cost/benefit ratio isn't very good...

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
> I think the basic problem is that DDL can't really work within a
> transaction.  If I do an ALTER TABLE, some of these changes need to
> show up to concurrent transactions (maybe creating a unique index?).

The point is that DDL can't be MVCC.  If for instance you add an index to
a table, once you commit, every other transaction must *immediately*
start updating that index when they modify the table.  They can't use the
excuse of "not my problem because the catalog change postdates the
snapshot I'm using".  The drop-index case is even worse, since a
transaction that believes the index is still present is likely to try to
access/update a disk file that's not there anymore.  Adding/dropping
columns, constraints, triggers, etc. all have hazards of the same ilk.

> I think it's like Tom says in that email, it could be done, but the
> cost/benefit ratio isn't very good...

It's barely possible that we could make this happen, but it would be a
huge amount of work, and probably a huge amount of instability for a very
long time until we'd gotten all the corner cases sorted.  I think there
are much more productive uses for our development effort.

			regards, tom lane
Martijn van Oosterhout wrote:
> On Tue, Apr 25, 2006 at 12:25:35PM -0500, Jim C. Nasby wrote:
>> Is there anything in comments/docs/list archives about why catalog
>> access uses a bunch of 'magic' instead of treating catalog tables the
>> same as every other table?  I realize that ultimately you have to
>> bootstrap somehow (kinda hard to read from pg_class if the info needed
>> to do so is in pg_class), but perhaps switching over to the regular
>> access methods after the system is up would be worth-while.
>
> I don't know if it's explicitly documented, but here's one mail that
> describes some of the issues:
>
> http://www.thelinuxreview.com/pgsql-hackers/2005-03/msg00840.php

O.k., I need to fix that... that should be an archives link ;)

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00840.php

Martijn, how did you find that link with the linuxreview domain?

Joshua D. Drake

-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/
On Tue, Apr 25, 2006 at 11:57:13AM -0700, Joshua D. Drake wrote:
> Martijn van Oosterhout wrote:
> > http://www.thelinuxreview.com/pgsql-hackers/2005-03/msg00840.php
>
> O.k. I need to fix that... that should be an archives link
>
> http://archives.postgresql.org/pgsql-hackers/2005-03/msg00840.php
>
> Martijn, how did you find that link with the linuxreview domain?

Google.  It looked the same so I didn't pay attention to the fact the
site was different.  Oops :)

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Martijn van Oosterhout wrote:
> On Tue, Apr 25, 2006 at 11:57:13AM -0700, Joshua D. Drake wrote:
>> Martijn van Oosterhout wrote:
>>> http://www.thelinuxreview.com/pgsql-hackers/2005-03/msg00840.php
>>
>> O.k. I need to fix that... that should be an archives link
>>
>> http://archives.postgresql.org/pgsql-hackers/2005-03/msg00840.php
>>
>> Martijn, how did you find that link with the linuxreview domain?
>
> Google.  It looked the same so I didn't pay attention to the fact the
> site was different.  Oops :)

It is just a straggling domain... I just didn't want any confusion. :)

Joshua D. Drake

-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/
On Tue, 2006-04-25 at 13:58, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I think the basic problem is that DDL can't really work within a
> > transaction.  If I do an ALTER TABLE, some of these changes need to
> > show up to concurrent transactions (maybe creating a unique index?).
>
> The point is that DDL can't be MVCC.  If for instance you add an index
> to a table, once you commit every other transaction must *immediately*
> start updating that index when they modify the table.

How is it done in a way that all other backends see it, but only after
commit?  Is there some secret transaction isolation mode for DDL?  Maybe
something that fits between "read uncommitted" and "read committed"?  Or
is it just that catalog access is always done in read-committed mode,
even if the transaction is in "serializable"?

Would this take effect even inside a single command?  In other words, if
it were possible that an index appeared in the middle of a big update,
would the tuples updated after the index becomes visible also be added to
the index?

The reason I ask is that I'm still keen on implementing a CONCURRENT
INDEX command, and I want to get as much background info as possible
before diving in.

> They can't use the excuse of "not my problem because the catalog change
> postdates the snapshot I'm using".  The drop-index case is even worse,
> since a transaction that believes the index is still present is likely
> to try to access/update a disk file that's not there anymore.
> Adding/dropping columns, constraints, triggers, etc all have hazards of
> the same ilk.

At what point will an add/drop column become visible for parallel
transactions?  Can a trigger/constraint appear magically inside a
transaction?  So if I update 5 rows inside one serializable transaction,
is it possible that a trigger added to the table after the 2nd update
will fire for the last 3 updates?
BTW, I don't think that a stored procedure (cached plans) will pick up
something like an added/changed default even after commit in both
connections.

> > I think it's like Tom says in that email, it could be done, but the
> > cost/benefit ratio isn't very good...
>
> It's barely possible that we could make this happen, but it would be a
> huge amount of work, and probably a huge amount of instability for a
> very long time until we'd gotten all the corner cases sorted.  I think
> there are much more productive uses for our development effort.

True.

-------------------
Hannu
Hannu Krosing wrote:
> On Tue, 2006-04-25 at 13:58, Tom Lane wrote:
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > I think the basic problem is that DDL can't really work within a
> > > transaction.  If I do an ALTER TABLE, some of these changes need
> > > to show up to concurrent transactions (maybe creating a unique
> > > index?).
> >
> > The point is that DDL can't be MVCC.  If for instance you add an
> > index to a table, once you commit every other transaction must
> > *immediately* start updating that index when they modify the table.
>
> How is it done in a way that all other backends see it, but only after
> commit?
>
> Is there some secret transaction isolation mode for DDL?

Sort of.  Catalog accesses normally use SnapshotNow, instead of
ActiveSnapshot, which is normally used by regular access.  The use of
ActiveSnapshot is what makes a transaction read committed or
serializable; in a serializable transaction, ActiveSnapshot will point to
SerializableSnapshot, while in a read-committed transaction,
ActiveSnapshot will point to a snapshot acquired at the beginning of the
command by GetSnapshotData.  Have a look at GetTransactionSnapshot() in
tqual.c.  (The trick is grokking the differences among the various
HeapTupleSatisfies routines.)

> Would this take effect even inside a single command?  In other words,
> if it were possible that an index appeared in the middle of a big
> update, would the tuples updated after the index becomes visible also
> be added to the index?

This can't happen, because an UPDATE to a table requires acquiring a lock
(RowExclusiveLock) which conflicts with the lock taken by CREATE INDEX
(ShareLock).  You can see the conflict table in lock.c, the LockConflicts
array.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
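[Editor's note: the conflict Alvaro points at can be read straight off a conflict matrix like the LockConflicts array in lock.c. The Python sketch below keeps only the two lock modes under discussion, not PostgreSQL's full eight-mode table, so the entries here are a deliberately partial illustration.]

```python
# Partial lock-conflict table: for each held mode, the set of requested
# modes it conflicts with.  Note that ShareLock does NOT conflict with
# itself -- which is exactly why two CREATE INDEX commands can run on the
# same table concurrently and race on the pg_class row in the first place.
CONFLICTS = {
    "RowExclusiveLock": {"ShareLock"},   # UPDATE blocks CREATE INDEX
    "ShareLock": {"RowExclusiveLock"},   # CREATE INDEX blocks UPDATE
}

def conflicts(held, requested):
    """True if a request for `requested` must wait behind `held`."""
    return requested in CONFLICTS.get(held, set())

# An in-progress UPDATE (RowExclusiveLock) and CREATE INDEX (ShareLock)
# block each other, so an index cannot "appear" mid-update:
print(conflicts("RowExclusiveLock", "ShareLock"))
# ...but two CREATE INDEXes do not block each other:
print(conflicts("ShareLock", "ShareLock"))
```

The second lookup ties this subthread back to the original bug report: because ShareLock is self-compatible, nothing at the relation-lock level serializes two concurrent index builds' pg_class updates.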
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hannu Krosing wrote:
>> Would this take effect even inside a single command?  In other words,
>> if it were possible that an index appeared in the middle of a big
>> update, would the tuples updated after the index becomes visible also
>> be added to the index?

> This can't happen, because an UPDATE to a table requires acquiring a
> lock (RowExclusiveLock) which conflicts with a lock taken by the
> CREATE INDEX (ShareLock).

Right.  By and large, schema-changing operations take AccessExclusive
lock on the table they are changing, which guarantees (a) no concurrent
operation is touching the table, and (b) by the time the lock is
released, the schema-changing command is already committed, and so its
catalog changes appear valid to any subsequent transactions touching the
table, since they look at the catalogs with SnapshotNow rules.

CREATE INDEX is the only schema-changing op that I can think of offhand
that takes a non-exclusive lock, and so its little problem with two
concurrent operations on the same table is unique.

			regards, tom lane
On 4/25/06 12:24 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> I'm inclined to think that the right solution is to fix UpdateStats and
> setRelhasindex so that they don't use simple_heap_update, but call
> heap_update directly and cope with HeapTupleUpdated (by looping around
> and trying the update from scratch).

Is there a verdict on what can/should/will be done for this?  As far as I
can tell from all this, there appears to be no workaround (even kludgy)
other than to not build indexes in parallel - not an attractive option.

If I'm only building two indexes simultaneously, what would happen if I
tried to lock pg_class in the shorter index build transaction?  Besides
seeming like a bad idea...

Wes
On Wed, Apr 26, 2006 at 05:24:27PM -0500, Wes wrote:
> On 4/25/06 12:24 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> > I'm inclined to think that the right solution is to fix UpdateStats
> > and setRelhasindex so that they don't use simple_heap_update, but
> > call heap_update directly and cope with HeapTupleUpdated (by looping
> > around and trying the update from scratch).
>
> Is there a verdict on what can/should/will be done for this?  As far
> as I can tell from all this, there appears to be no workaround (even
> kludgy) other than to not build indexes in parallel - not an
> attractive option.
>
> If I'm only building two indexes simultaneously, what would happen if
> I tried to lock pg_class in the shorter index build transaction?
> Besides seeming like a bad idea...

Try running a first index build by itself and then running them in
parallel.  Hopefully once pg_class has an exact tuple count the
conflicting update won't happen.  If you actually have an exact tuple
count you could also try updating pg_class manually beforehand, but
that's not exactly a supported option...

Another possibility would be to patch the code so that if the tuple
count found by CREATE INDEX is within X percent of what's already in
pg_class it doesn't do the update.  Since there's already code to check
to see if the count is an exact match, this patch should be pretty
simple, and the community might well accept it into the code as well.

BTW, why are you limiting yourself to 2 indexes at once?  I'd expect
that for a table larger than memory you'd be better off building all the
indexes at once so that everything runs off a single sequential scan.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Try running a first index build by itself and then running them in
> parallel.

Yeah, this is probably the best workaround for now.  I think we should
look at making it fully concurrent-safe per upthread comments, but that
won't be happening in existing release branches.

Also, the only case where it's a problem is if the first two index
builds finish at almost exactly the same time.  It might be possible to
overlap the first two index builds with reasonable safety so long as you
choose indexes with very different sorting costs (eg, integer vs text
columns, different numbers of columns, etc).

			regards, tom lane
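[Editor's note: the workaround Jim suggests and Tom endorses — run one index build to completion first, then build the rest in parallel — looks roughly like the driver sketch below. build_index here is a placeholder for however the builds are actually issued (Wes's script uses a Perl popen pipe to psql); the index names are taken from the logs earlier in the thread but the function itself is hypothetical.]

```python
from concurrent.futures import ThreadPoolExecutor

def build_indexes(build_index, index_names):
    """Build the first index alone (it is the one that updates the
    pg_class stats and sets relhasindex), then build the remaining
    indexes concurrently so they still share most of the work."""
    if not index_names:
        return
    build_index(index_names[0])          # serialized first build
    with ThreadPoolExecutor() as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(build_index, index_names[1:]))

# Demo with a placeholder builder that just records each build.
built = []
build_indexes(built.append,
              ["header_i_date", "header_i_msgid", "header_i_originator"])
print(built[0])
```

Because the first build commits before any other build starts, the later builds find pg_class already correct and skip the conflicting update, which matches the "this appears to be stable" result Wes reports downthread.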
On Wed, Apr 26, 2006 at 06:42:53PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Try running a first index build by itself and then running them in
> > parallel.
>
> Yeah, this is probably the best workaround for now.  I think we should
> look at making it fully concurrent-safe per upthread comments, but
> that won't be happening in existing release branches.
>
> Also, the only case where it's a problem is if the first two index
> builds finish at almost exactly the same time.  It might be possible
> to overlap the first two index builds with reasonable safety so long
> as you choose indexes with very different sorting costs (eg, integer
> vs text columns, different numbers of columns, etc).

What about not updating if the tuple count is within X percent?  Would
that be safe enough to back-port?  I've been trying to think of a reason
why the current behavior of CREATE INDEX forcing reltuples to be 100%
accurate shouldn't be relaxed, but I can't think of one...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> What about not updating if the tuplecount is within X percent?  Would
> that be safe enough to back-port?

Even if you got agreement that it was a good idea (I don't think so
myself), it wouldn't help Wes, at least not for values of X smaller than
100.  Presumably, that first CREATE INDEX is trying to update reltuples
from zero to reality.

Also, the first CREATE INDEX has to set relhasindex = true, and that's
not fuzzy at all.

			regards, tom lane
On 4/26/06 5:34 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> Try running a first index build by itself and then running them in
> parallel.  Hopefully once pg_class has an exact tuple count the
> conflicting update won't happen.  If you actually have an exact tuple
> count you could also try updating pg_class manually beforehand, but
> that's not exactly a supported option...

I thought about that.  It would work well for the table with 3 indexes
(1/2) either way, but would be an extra pass on the one with 4 (1/2/1
instead of 2/2).

> Another possibility would be to patch the code so that if the
> tuplecount found by CREATE INDEX is within X percent of what's already
> in pg_class it doesn't do the update.  Since there's already code to
> check to see if the count is an exact match, this patch should be
> pretty simple, and the community might well accept it into the code as
> well.

I don't think that would help here.  I assume after the COPY, the tuple
count is zero, and after the first index build, it is exact.

Dumb question...  Since COPY has to lock the table, why doesn't it take
the current count in pg_class and increment it by the number of rows
inserted?  If you're doing a clean load of a table, that would result in
an exact count.

What about your idea of retrying the request if it detects a conflict?

> BTW, why are you limiting yourself to 2 indexes at once?  I'd expect
> that for a table larger than memory you'd be better off building all
> the indexes at once so that everything runs off a single sequential
> scan.

I don't know enough about the index build process.  My presumption was
that while you might get a gain during the read process, the head
contention during the sort/write process would be a killer.  I don't
have enough spindles (yet) to separate out the different indexes.  I
think you'd only want to do one table at a time to avoid head contention
during the read process.  Wouldn't this tend to exacerbate the current
problem, too?
In this specific case, I guess I could do 1,2 and 1,3 parallel builds
(for the 3-index/4-index tables).

Right now I'm running with rather restricted hardware (1 GB memory, two
2-disk RAID 0's, and a single disk).  If the pilot proves what I think is
possible, and I can get real hardware (i.e. an intelligent caching array
and some serious memory), things change a bit.

Wes
On Wed, Apr 26, 2006 at 07:13:08PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > What about not updating if the tuplecount is within X percent?
> > Would that be safe enough to back-port?
>
> Even if you got agreement that it was a good idea (I don't think so
> myself), it wouldn't help Wes, at least not for values of X smaller
> than 100.  Presumably, that first CREATE INDEX is trying to update
> reltuples from zero to reality.

It may be, but an ANALYZE would eliminate that need and be far faster
than waiting on one entire CREATE INDEX.  I'm thinking that even being
off by as much as 5% won't matter to the planner, and I can't think of
any possible reason to need an exact tuple count in pg_class...

> Also, the first CREATE INDEX has to set relhasindex = true, and that's
> not fuzzy at all.

Oh, will each index build try and do that?  Would changing that be
non-invasive enough to backpatch?  (I'm guessing it's just an added
conditional...)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On 4/26/06 5:42 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Yeah, this is probably the best workaround for now.  I think we should
> look at making it fully concurrent-safe per upthread comments, but
> that won't be happening in existing release branches.

I changed the index build script such that for each table it builds one
index by itself, then builds the remaining indexes in parallel.  This
appears to be stable.  I made several runs with no errors.  I've got
some more testing to do, then I'll try my big run.

Thanks

Wes
I forgot to mention it in the commit message, but this patch

http://archives.postgresql.org/pgsql-committers/2006-05/msg00069.php

includes a fix to avoid "tuple concurrently updated" failures in CREATE
INDEX's update of the parent relation's pg_class entry.

I'm still not convinced that that's where your problem is, however,
because AFAICS that ought to be a pretty low-probability scenario.  If
that fixes it for you, then there's something pushing the system to
complete index builds at about the same time, and I don't see what that
would be.

			regards, tom lane