Thread: vacuum timings
I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER);  Table is
400MB and index is 160MB.

With index on the single int4 column, I got:

	 78 seconds for a vacuum
	121 seconds for vacuum after deleting a single row
	662 seconds for vacuum after deleting the entire table

With no index, I got:

	 43 seconds for a vacuum
	 43 seconds for vacuum after deleting a single row
	 43 seconds for vacuum after deleting the entire table

I find this quite interesting.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> 400MB and index is 160MB.
> With index on the single int4 column, I got:
> 78 seconds for a vacuum
> 121 seconds for vacuum after deleting a single row
> 662 seconds for vacuum after deleting the entire table
> With no index, I got:
> 43 seconds for a vacuum
> 43 seconds for vacuum after deleting a single row
> 43 seconds for vacuum after deleting the entire table
> I find this quite interesting.

How long does it take to create the index on your setup --- ie,
if vacuum did a drop/create index, would it be competitive?

			regards, tom lane
Bruce Momjian wrote:
>
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> 400MB and index is 160MB.
>
> With index on the single int4 column, I got:
> 78 seconds for a vacuum
> 121 seconds for vacuum after deleting a single row
> 662 seconds for vacuum after deleting the entire table
>
> With no index, I got:
> 43 seconds for a vacuum
> 43 seconds for vacuum after deleting a single row
> 43 seconds for vacuum after deleting the entire table

With or without -F ?

Vadim
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
>
> I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> 400MB and index is 160MB.
>
> With index on the single int4 column, I got:
> 78 seconds for a vacuum

vc_vaconeind() is called once

> 121 seconds for vacuum after deleting a single row

vc_vaconeind() is called twice

Hmmm, vc_vaconeind() takes a pretty long time even if it does little.

> 662 seconds for vacuum after deleting the entire table

How about the case where half of the rows are deleted?  It would take
even longer.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
> Bruce Momjian wrote:
> >
> > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> > 400MB and index is 160MB.
> >
> > With index on the single int4 column, I got:
> > 78 seconds for a vacuum
> > 121 seconds for vacuum after deleting a single row
> > 662 seconds for vacuum after deleting the entire table
> >
> > With no index, I got:
> > 43 seconds for a vacuum
> > 43 seconds for vacuum after deleting a single row
> > 43 seconds for vacuum after deleting the entire table
>
> With or without -F ?

With no -F.  I can get you -F times tomorrow.

-- 
Bruce Momjian
pgman@candle.pha.pa.us
Hiroshi Inoue wrote:
>
> > With index on the single int4 column, I got:
> > 78 seconds for a vacuum
>
> vc_vaconeind() is called once
  ^^^^^^^^^^^^^^
  not called ?

> > 121 seconds for vacuum after deleting a single row
>
> vc_vaconeind() is called twice
>
> Hmmm, vc_vaconeind() takes a pretty long time even if it does little.

It reads all index leaf pages in any case...

Vadim
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
>
> Hiroshi Inoue wrote:
> >
> > > With index on the single int4 column, I got:
> > > 78 seconds for a vacuum
> >
> > vc_vaconeind() is called once
>   ^^^^^^^^^^^^^^
>   not called ?

Oops, you are right.  vc_scanoneind() is called once.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
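Vadim's point above --- the index pass reads every leaf page no matter how
little work there is to do --- can be sketched with a toy model.  This is
illustrative Python, not the actual vc_scanoneind()/vc_vaconeind() C code;
the page layout and function name here are invented for the sketch:

```python
# Toy model of a vacuum pass over a btree index: like the backend's index
# scan, it reads every leaf page to find entries whose heap tuples died,
# so the cost is proportional to index size even when almost nothing
# (or nothing at all) was deleted.

def scan_index_leaves(leaf_pages, dead_tids):
    """Scan all leaf pages; return (entries removed, pages read)."""
    removed, pages_read = 0, 0
    for page in leaf_pages:
        pages_read += 1                     # every leaf page is read
        live = [tid for tid in page if tid not in dead_tids]
        removed += len(page) - len(live)
        page[:] = live                      # rewrite the page in place
    return removed, pages_read

# 1,000 leaf pages of 100 index entries each.
index = [list(range(p * 100, (p + 1) * 100)) for p in range(1000)]

# Deleting a single row still forces a read of every leaf page.
removed, pages = scan_index_leaves(index, {12345})
print(removed, pages)   # 1 1000
```

This is why the 121-second "deleted a single row" vacuum is not much
cheaper than one with real work to do: the leaf-page I/O dominates.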
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I loaded 10,000,000 rows into CREATE TABLE test (x INTEGER); Table is
> > 400MB and index is 160MB.
> > With index on the single int4 column, I got:
> > 78 seconds for a vacuum
> > 121 seconds for vacuum after deleting a single row
> > 662 seconds for vacuum after deleting the entire table
> > With no index, I got:
> > 43 seconds for a vacuum
> > 43 seconds for vacuum after deleting a single row
> > 43 seconds for vacuum after deleting the entire table
> > I find this quite interesting.
>
> How long does it take to create the index on your setup --- ie,
> if vacuum did a drop/create index, would it be competitive?

OK, new timings with -F enabled:

	index	no index
	519	same		load
	247	"		first vacuum
	40	"		other vacuums

	1222	X		index creation
	90	X		first vacuum
	80	X		other vacuums

	<1	90		delete one row
	121	38		vacuum after delete 1 row

	346	344		delete all rows
	440	44		first vacuum
	20	<1		other vacuums (index is still same size)

Conclusions:

	o  indexes never get smaller
	o  drop/recreate index is slower than vacuum of indexes

What other conclusions can be made?

-- 
Bruce Momjian
pgman@candle.pha.pa.us
On Fri, 21 Jan 2000, Bruce Momjian wrote:

> OK, new timings with -F enabled:
>
> 	index	no index
> 	519	same		load
> 	247	"		first vacuum
> 	40	"		other vacuums
>
> 	1222	X		index creation
> 	90	X		first vacuum
> 	80	X		other vacuums
>
> 	<1	90		delete one row
> 	121	38		vacuum after delete 1 row
>
> 	346	344		delete all rows
> 	440	44		first vacuum
> 	20	<1		other vacuums (index is still same size)
>
> Conclusions:
>
> 	o  indexes never get smaller

this one, I thought, was a known?  if I remember right, Vadim changed it
so that space was reused, but the index never shrunk in size ... no?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Conclusions:
> o indexes never get smaller

Which we knew...

> o drop/recreate index is slower than vacuum of indexes

Quite a few people have reported finding the opposite in practice.
You should probably try vacuuming after deleting or updating some
fraction of the rows, rather than just the all or none cases.

			regards, tom lane
Tom Lane wrote:
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Conclusions:
> > o indexes never get smaller
>
> Which we knew...
>
> > o drop/recreate index is slower than vacuum of indexes
>
> Quite a few people have reported finding the opposite in practice.

I'm one of them.  On a 1.5 GB table with three indices it is about twice
as slow.  Probably because vacuuming indices breaks the system cache
policy.  (FreeBSD 3.3)

-- 
Dmitry Samersoff, DM\S
dms@wplus.net  http://devnull.wplus.net
* there will come soft rains
> Tom Lane wrote:
> >
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Conclusions:
> > > o indexes never get smaller
> >
> > Which we knew...
> >
> > > o drop/recreate index is slower than vacuum of indexes
> >
> > Quite a few people have reported finding the opposite in practice.
>
> I'm one of them.  On a 1.5 GB table with three indices it is about twice
> as slow.  Probably because vacuuming indices breaks the system cache
> policy.  (FreeBSD 3.3)

OK, we are researching what things can be done to improve this.  We are
toying with:

	locking the table for less duration, or taking only a read lock
	creating another copy of heap/indexes, and rename() over the old files
	improving heap vacuum speed
	improving index vacuum speed
	moving analyze out of vacuum

-- 
Bruce Momjian
pgman@candle.pha.pa.us
On Fri, 21 Jan 2000, Bruce Momjian wrote:

> OK, we are researching what things can be done to improve this.  We are
> toying with:
>
> 	locking the table for less duration, or taking only a read lock

if there is some way that we can work around the bug that I believe Tom
found with removing the lock altogether (ie. making use of MVCC), I think
that would be the best option ... if not possible, at least get things
down to a table lock vs the whole database?

a good example is the udmsearch that we are using on the site ... it uses
multiple tables to store the dictionary, each representing words of X size
... if I'm searching on a 4 letter word, and the whole database is locked
while it is working on the dictionary with 8 letter words, I'm sitting
there idle ... at least if we only locked the 8 letter table, everyone not
doing 8 letter searches can go on their merry way ...

Slightly longer vacuums, IMHO, are acceptable if, to the end users, they
are as transparent as possible ... locking per table would be slightly
slower, I think, because once a table is finished, the next table would
need to have an exclusive lock put on it before starting, so you'd have
to possibly wait for that ...?

> 	creating another copy of heap/indexes, and rename() over the old files

sounds to me like introducing a large potential for error here ...

> 	moving analyze out of vacuum

I think that should be done anyway ...
if we ever get to the point that we're able to re-use rows in tables, then
that would eliminate the immediate requirement for vacuum, but still
retain a requirement for a periodic analyze ... no?

Marc G. Fournier
Systems Administrator @ hub.org
Dmitry Samersoff wrote:
>
> > Quite a few people have reported finding the opposite in practice.
>
> I'm one of them.  On a 1.5 GB table with three indices it is about twice
> as slow.  Probably because vacuuming indices breaks the system cache
> policy.

I'm another.  Do the times increase linearly with each index added?  Do
the times increase linearly for each field in a composite index?  Does
the field type being indexed have any effect (varchar vs int)?

Mike Mascari
The Hermit Hacker <scrappy@hub.org> writes:
>> lock table for less duration, or read lock

> if there is some way that we can work around the bug that I believe Tom
> found with removing the lock altogether (ie. making use of MVCC), I think
> that would be the best option ... if not possible, at least get things
> down to a table lock vs the whole database?

Huh?  VACUUM only requires an exclusive lock on the table it is
currently vacuuming; there's no database-wide lock.

Even a single-table exclusive lock is bad, of course, if it's a large
table that's critical to a 24x7 application.  Bruce was talking about
the possibility of having VACUUM get just a write lock on the table;
other backends could still read it, but not write it, during the vacuum
process.  That'd be a considerable step forward for 24x7 applications,
I think.

It looks like that could be done if we rewrote the table as a new file
(instead of compacting-in-place), but there's a problem when it comes
time to rename the new files into place.  At that point you'd need to
get an exclusive lock to ensure all the readers are out of the table too
--- and upgrading from a plain lock to an exclusive lock is a well-known
recipe for deadlocks.  Not sure if this can be solved.

			regards, tom lane
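The lock-upgrade trap Tom mentions can be illustrated with a toy model
(plain Python; the function and the holder names are mine, not anything
in PostgreSQL's lock manager): a shared-lock holder that requests
exclusive must wait for all *other* shared holders to leave, so two
holders upgrading at once each wait on the other forever.

```python
# Toy deadlock check for shared -> exclusive lock upgrades.

def stuck_upgraders(shared_holders, upgraders):
    """Return the set of upgraders caught in a wait cycle."""
    # Each upgrader keeps its own shared lock and waits for the others.
    waiting = {u: {h for h in shared_holders if h != u} for u in upgraders}
    # An upgrader is stuck if someone it waits for is itself waiting
    # (and therefore will never release its shared lock).
    return {u for u, others in waiting.items()
            if any(o in waiting for o in others)}

# Two readers both try to upgrade: each waits on the other --- deadlock.
print(stuck_upgraders({"A", "B"}, ["A", "B"]))   # both A and B are stuck

# A single upgrader among plain readers just waits; no cycle.
print(stuck_upgraders({"A", "B"}, ["A"]))        # set()
```

The usual escapes are to take the strongest lock needed up front, or to
queue upgrade requests so only one holder may upgrade at a time.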
On Fri, 21 Jan 2000, Tom Lane wrote:

> It looks like that could be done if we rewrote the table as a new file
> (instead of compacting-in-place), but there's a problem when it comes
> time to rename the new files into place.  At that point you'd need to
> get an exclusive lock to ensure all the readers are out of the table too
> --- and upgrading from a plain lock to an exclusive lock is a well-known
> recipe for deadlocks.  Not sure if this can be solved.

What would it take to re-use space vs compacting/truncating the file?

Right now, people vacuum the database to clear out old, deleted records,
and truncate the tables ... if we were to change things so that an
insert/update were to find the next largest contiguous free block in the
table and re-use it, then, theoretically, you would eventually hit a fixed
table size assuming no new inserts, and only updates/deletes, right?

Eventually, you'd have "holes" in the table, where an inserted record was
smaller than the "next largest contiguous free block", but what's left
over is too small for any further additions ...
but I would think that that would greatly reduce how often you'd have to
do a vacuum, and, if we split out ANALYZE, you could use that to update
statistics ...

To speed up the search for the "next largest contiguous free block", a
special table.FAT could be used, similar to an index?

Marc G. Fournier
Systems Administrator @ hub.org
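What Marc describes is essentially a per-table free-space map with a fit
policy.  A minimal sketch (the class and method names are invented for
illustration, not a PostgreSQL API; it uses best fit, i.e. the smallest
hole that still holds the tuple, rather than the largest):

```python
# Sketch of free-space reuse: deleted tuples return their space to a
# map, and inserts are satisfied from the best-fitting free block so the
# heap file stops growing under a steady update/delete workload.

class FreeSpaceMap:
    def __init__(self):
        self.blocks = []            # list of (offset, size) free extents

    def release(self, offset, size):
        """A deleted/updated-away tuple gives its space back."""
        self.blocks.append((offset, size))

    def allocate(self, size):
        """Best fit: smallest free block that still holds the tuple."""
        fits = [b for b in self.blocks if b[1] >= size]
        if not fits:
            return None             # caller extends the file instead
        off, bsize = min(fits, key=lambda b: b[1])
        self.blocks.remove((off, bsize))
        if bsize > size:            # keep the leftover "hole"
            self.blocks.append((off + size, bsize - size))
        return off

fsm = FreeSpaceMap()
fsm.release(0, 40)       # two deleted tuples freed their space
fsm.release(100, 16)
print(fsm.allocate(16))  # 100: reuses the 16-byte hole exactly
print(fsm.allocate(64))  # None: nothing big enough, extend the file
```

The "holes too small for any further additions" Marc worries about are
exactly the leftover fragments this map accumulates; best fit keeps them
as small as possible but cannot eliminate them.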
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Conclusions:
> o drop/recreate index is slower than vacuum of indexes

BTW, I did some profiling of CREATE INDEX this evening (quite
unintentionally actually; I was interested in COPY IN, but the pg_dump
script I used as driver happened to create some indexes too).

I was startled to discover that 60% of the runtime of CREATE INDEX is
spent in _bt_invokestrat (which is called from tuplesort.c's
comparetup_index, and exists only to figure out which specific
comparison routine to call).  Of this, a whopping 4% was spent in the
useful subroutine, int4gt.  All the rest went into lookup and validation
checks that by rights should be done once per index creation, not once
per comparison.

In short: a fairly straightforward bit of optimization will eliminate
circa 50% of the CPU time consumed by CREATE INDEX.  All we need is to
figure out where to cache the lookup results.  The optimization would
improve insertions and lookups in indexes as well, if we can cache the
lookup results in those scenarios.

This was for a table small enough that tuplesort.c could do the sort
entirely in memory, so I'm sure the gains would be smaller for a large
table that requires a disk-based sort.  Still, it seems worth looking
into...

			regards, tom lane
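The hoist-the-lookup-out-of-the-comparator idea can be sketched outside
the backend (the real fix is in the C code around tuplesort.c; the
resolve_comparator() function below is an invented stand-in for the
_bt_invokestrat lookup-and-validation work):

```python
# Resolving the comparison routine once per sort vs once per comparison.

import functools
import random

lookups = 0

def resolve_comparator(type_name):
    # Stand-in for the per-call strategy lookup and validation checks.
    global lookups
    lookups += 1
    return {"int4": lambda a, b: (a > b) - (a < b)}[type_name]

def sort_uncached(keys):
    # Lookup re-done inside every single comparison.
    return sorted(keys, key=functools.cmp_to_key(
        lambda a, b: resolve_comparator("int4")(a, b)))

def sort_cached(keys):
    # Lookup hoisted out: exactly once per sort.
    cmp = resolve_comparator("int4")
    return sorted(keys, key=functools.cmp_to_key(cmp))

random.seed(42)
keys = random.sample(range(10_000), 1_000)

sort_uncached(keys); uncached = lookups
lookups = 0
sort_cached(keys); cached = lookups
print(cached)        # 1
# uncached is one lookup per comparison: thousands for 1,000 keys.
```

Both sorts return identical output; the only difference is where the
lookup cost lands, which matches the "60% in _bt_invokestrat, 4% in
int4gt" profile shape Tom reports.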
> I was startled to discover that 60% of the runtime of CREATE INDEX is
> spent in _bt_invokestrat (which is called from tuplesort.c's
> comparetup_index, and exists only to figure out which specific
> comparison routine to call).  Of this, a whopping 4% was spent in the
> useful subroutine, int4gt.  All the rest went into lookup and validation
> checks that by rights should be done once per index creation, not once
> per comparison.
>
> In short: a fairly straightforward bit of optimization will eliminate
> circa 50% of the CPU time consumed by CREATE INDEX.

Good job, Tom.  Clearly a huge win.

-- 
Bruce Momjian
pgman@candle.pha.pa.us
> -----Original Message-----
> From: owner-pgsql-hackers@postgresql.org
> [mailto:owner-pgsql-hackers@postgresql.org]On Behalf Of Tom Lane
>
> Quite a few people have reported finding the opposite in practice.
> You should probably try vacuuming after deleting or updating some
> fraction of the rows, rather than just the all or none cases.

Vacuum after deleting all rows isn't a worst case.  There's no moving in
that case, so vacuum doesn't need to call index_insert() corresponding
to the moving of heap tuples.

Vacuum after deleting half of the rows may be one of the worst cases.
In that case, index_delete() is called as many times as in the 'delete
all' case, and the expensive index_insert() is called for each moved-in
tuple.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Vacuum after deleting half of rows may be one of the worst case. Or equivalently, vacuum after updating all the rows. regards, tom lane
> Vacuum after deleting all rows isn't a worst case.  There's no moving in
> that case, so vacuum doesn't need to call index_insert() corresponding
> to the moving of heap tuples.
>
> Vacuum after deleting half of the rows may be one of the worst cases.
> In that case, index_delete() is called as many times as in the 'delete
> all' case, and the expensive index_insert() is called for each moved-in
> tuple.

I will test that.

-- 
Bruce Momjian
pgman@candle.pha.pa.us
Bruce Momjian wrote:
>
> > Vacuum after deleting half of the rows may be one of the worst cases.
> > In that case, index_delete() is called as many times as in the 'delete
> > all' case, and the expensive index_insert() is called for each moved-in
> > tuple.
>
> I will test that.

I tried my test case on a smaller scale than Bruce:

	CREATE TABLE t (id int4, dt int4);
	for (i=0; i < 2500000; i++)
		insert into t values ( i, (i * 1009) % 2500000);
	delete from t where id < 1250000;

1) vacuum after create index on t(id)
	405sec

2) vacuum after create index on t(dt)
	> 3600sec -- I gave up waiting for it to finish.

3) vacuum, then create index on t(id) and t(dt)
	90sec + 114sec + 143sec = 347sec

Seems random index insert is painful for vacuum.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
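The gap between cases 1) and 2) comes from the key order vacuum re-inserts
in.  A quick sketch of Hiroshi's dt distribution (smaller N so it runs
fast; 1009 is coprime to this N just as it is to 2,500,000, and the
variable names are mine):

```python
# dt = (i * 1009) % N hits every value exactly once (a permutation), but
# consecutive rows land 1009 apart, so each index_insert() during vacuum
# probes a different part of the btree instead of appending near the
# previous entry --- the random-insert pattern that made case 2) so slow.

N = 100_000
dt = [(i * 1009) % N for i in range(N)]

print(len(set(dt)) == N)    # True: a full permutation, no duplicates

# The gap between consecutive inserts is always 1009 (mod N):
gaps = {(dt[i + 1] - dt[i]) % N for i in range(N - 1)}
print(gaps)                 # {1009}
```

The t(id) index, by contrast, sees keys in nearly ascending order, so its
inserts stay on the same rightmost leaf pages and the cache stays warm.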