Thread: Vacuum only with 20% old tuples
I suggest that we change vacuum to only remove tuples if there are more
than 20% expired tuples.

When we do vacuum, we drop all indexes and recreate them.

This fixes the complaint about vacuum slowness when there are many
expired rows in the table.  We know this is caused by excessive index
updates.  It allows indexes to shrink (Jan pointed this out to me).  And
it fixes the TOAST problem with TOAST values in indexes.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
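A minimal sketch of the ratio test being proposed here, in C.  The names
are illustrative only, not actual vacuum.c symbols, and the tuple counts
are assumed to come from a pre-scan of the heap:

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical threshold: shrink the table (and rebuild its
     * indexes) only when at least 20% of its tuples are expired. */
    #define VACUUM_SHRINK_THRESHOLD 0.20

    static bool
    vacuum_should_shrink(double n_expired, double n_total)
    {
        if (n_total <= 0.0)
            return false;       /* empty table: nothing to reclaim */
        return n_expired / n_total >= VACUUM_SHRINK_THRESHOLD;
    }

    int
    main(void)
    {
        /* 300 of 1000 tuples expired -> 30% >= 20%, so shrink */
        printf("%s\n", vacuum_should_shrink(300.0, 1000.0) ? "shrink" : "skip");
        return 0;
    }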
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I suggest that we change vacuum to only remove tuples if there are
> more than 20% expired tuples.

> When we do vacuum, we drop all indexes and recreate them.

> This fixes the complaint about vacuum slowness when there are many
> expired rows in the table.  We know this is caused by excessive index
> updates.  It allows indexes to shrink (Jan pointed this out to me.)
> And it fixes the TOAST problem with TOAST values in indexes.

We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through an index rebuild on a system
table).  I think we should do this, but it's not all that simple...

I do not see what your 20% idea has to do with this, though, nor why
it's a good idea.  If I've told the thing to vacuum, I think it should
vacuum.  20% of a big table could be a lot of megabytes, and I don't
want some arbitrary decision in the code about whether I can reclaim
that space or not.

			regards, tom lane
> We can't "drop and recreate" without a solution to the relation
> versioning issue (unless you are prepared to accept a nonfunctional
> database after a failure partway through an index rebuild on a system
> table).  I think we should do this, but it's not all that simple...
>
> I do not see what your 20% idea has to do with this, though, nor
> why it's a good idea.  If I've told the thing to vacuum, I think
> it should vacuum.  20% of a big table could be a lot of megabytes,
> and I don't want some arbitrary decision in the code about whether
> I can reclaim that space or not.

Well, I think we should do a sequential scan before starting vacuum to
find the number of expired rows.

Now that we are dropping and recreating indexes, doing all that just to
remove a few tuples is a major waste.  The user cannot really know
whether the table is worth vacuuming in normal use; they are just going
to use the default.  Now, I think a FORCE option would be good, or the
ability to change the 20% default.

Remember, commercial databases don't even return unused space if you
remove all the rows in a table.  At least Informix doesn't, and I am
sure there are others.  I like vacuum, but let's not make it jump major
hurdles for small gain.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I suggest that we change vacuum to only remove tuples if there are
> > more than 20% expired tuples.
>
> > When we do vacuum, we drop all indexes and recreate them.
>
> > This fixes the complaint about vacuum slowness when there are many
> > expired rows in the table.  We know this is caused by excessive index
> > updates.  It allows indexes to shrink (Jan pointed this out to me.)
> > And it fixes the TOAST problem with TOAST values in indexes.
>
> We can't "drop and recreate" without a solution to the relation
> versioning issue (unless you are prepared to accept a nonfunctional
> database after a failure partway through an index rebuild on a system
> table).  I think we should do this, but it's not all that simple...

Is this topic independent of WAL in the first place?

Regards.

Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> We can't "drop and recreate" without a solution to the relation >> versioning issue (unless you are prepared to accept a nonfunctional >> database after a failure partway through index rebuild on a system >> table). I think we should do this, but it's not all that simple... > Is this topic independent of WAL in the first place ? Sure, unless Vadim sees some clever way of using WAL to eliminate the need for versioned relations. But as far as I've seen in the discussions, versioned relations are independent of WAL. Basically what I want here is to build the new index relation as a new file (set of files, if large) and then atomically commit it as the new version of the index. If we only want to solve the problem of rebuilding indexes, it's probably not necessary to have true versions, because nothing outside of pg_index refers to an index. You could build a complete new index (new OID, new pg_class and pg_attribute entries, the whole nine yards) as a new set of files, and delete the old index, and your commit of this transaction would atomically replace the index. (Vacuuming pg_index's own indexes this way might be a tad tricky though...) But that approach doesn't solve the problem of making a CLUSTER operation that really works the way it should. So I'd rather see us put the effort into doing relation versions. regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> We can't "drop and recreate" without a solution to the relation
> >> versioning issue (unless you are prepared to accept a nonfunctional
> >> database after a failure partway through an index rebuild on a system
> >> table).  I think we should do this, but it's not all that simple...
>
> > Is this topic independent of WAL in the first place?
>
> Sure, unless Vadim sees some clever way of using WAL to eliminate
> the need for versioned relations.  But as far as I've seen in the
> discussions, versioned relations are independent of WAL.
>
> Basically what I want here is to build the new index relation as
> a new file (set of files, if large) and then atomically commit it
> as the new version of the index.

Hmm, your plan seems to need WAL.  We must postpone building the
indexes until the end of tuple moving in vacuum.  Once tuple moving has
started, the consistency between the heap and the indexes would be
broken.  Currently (without WAL) this inconsistency could never be
recovered in case of rollback.

> If we only want to solve the problem of rebuilding indexes, it's
> probably not necessary to have true versions, because nothing outside
> of pg_index refers to an index.  You could build a complete new index
> (new OID, new pg_class and pg_attribute entries, the whole nine yards)
> as a new set of files, and delete the old index, and your commit of
> this transaction would atomically replace the index.  (Vacuuming
> pg_index's own indexes this way might be a tad tricky, though...)

???  Don't pg_class and pg_attribute need tricky handling as well?
pg_class alone seems to need tricky handling when we use relation
versioning.  Anyway, we couldn't rely on the indexes of the table
currently being vacuumed.  I don't think it's easy to maintain the
indexes of pg_class, pg_index, and pg_attribute all together properly.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> Basically what I want here is to build the new index relation as >> a new file (set of files, if large) and then atomically commit it >> as the new version of the index. > Hmm,your plan seems to need WAL. > We must postpone to build indexes until the end of tuple moving > in vacuum. Once tuple moving started,the consistency between > heap and indexes would be broken. Currently(without WAL) this > inconsistency could never be recovered in case of rollback. Why? The same commit that makes the new index valid would make the tuple movements valid. Actually, the way VACUUM currently works, the tuple movements have been committed before we start freeing index entries anyway. (One reason VACUUM is so inefficient with indexes is that there is a peak index usage where there are index entries for *both* old and new tuple positions. I don't feel a need to change that, as long as the duplicate entries are in the old index that we're hoping to get rid of.) >> this transaction would atomically replace the index. (Vacuuming >> pg_index's own indexes this way might be a tad tricky though...) > ??? Don't pg_class and pg_attribute needs tricky handling either ? > Seems pg_class alone needs to be tricky when we use rel versioning. Could be. I think working through how we handle system tables and indexes is the key stumbling block we've got to get past to have versioning. I don't know quite how to do it, yet. regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> Basically what I want here is to build the new index relation as
> >> a new file (set of files, if large) and then atomically commit it
> >> as the new version of the index.
>
> > Hmm, your plan seems to need WAL.  We must postpone building the
> > indexes until the end of tuple moving in vacuum.  Once tuple moving
> > has started, the consistency between the heap and the indexes would
> > be broken.  Currently (without WAL) this inconsistency could never
> > be recovered in case of rollback.
>
> Why?  The same commit that makes the new index valid would make the
> tuple movements valid.

Oops, I realize I wasn't correct.  Certainly it's not as dangerous as I
wrote.  But there remains a possibility that index tuples would point
to cleaned heap blocks unless we delete the index tuples for those heap
blocks.  Cleaned blocks could be reused by UPDATE operations.

Regards.

Hiroshi Inoue
Tom Lane wrote:
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I suggest that we change vacuum to only remove tuples if there are
> > more than 20% expired tuples.
>
> > When we do vacuum, we drop all indexes and recreate them.
>
> > This fixes the complaint about vacuum slowness when there are many
> > expired rows in the table.  We know this is caused by excessive index
> > updates.  It allows indexes to shrink (Jan pointed this out to me.)
> > And it fixes the TOAST problem with TOAST values in indexes.
>
> We can't "drop and recreate" without a solution to the relation
> versioning issue (unless you are prepared to accept a nonfunctional
> database after a failure partway through an index rebuild on a system
> table).  I think we should do this, but it's not all that simple...
>
> I do not see what your 20% idea has to do with this, though, nor
> why it's a good idea.  If I've told the thing to vacuum, I think
> it should vacuum.  20% of a big table could be a lot of megabytes,
> and I don't want some arbitrary decision in the code about whether
> I can reclaim that space or not.

I can see some value in having a _configurable_ threshold percentage of
deletes before vacuum kicked in and attempted to shrink table/index
on-disk file sizes.  This would let the end user decide, and 20% is
probably a reasonable default; if it isn't, then changing a default is
easier to do down the track.

I can also see that it could be done with (perhaps) a modification to
VACUUM syntax, say:

	VACUUM [VERBOSE] [SHRINK] ...

And I believe that the whole thing will go better if ANALYZE is taken
_out_ of vacuum, as was discussed on this list a month or two ago.

Cheers,
					Andrew.
--
_____________________________________________________________________
            Andrew McMillan, e-mail: Andrew@cat-it.co.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267
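A rough sketch of what a settable threshold might look like, with 20%
as the default.  The setting name and the SET-style interface are
invented for illustration; no such option exists:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical user-settable shrink threshold, defaulting to 20%. */
    static double vacuum_shrink_threshold = 0.20;

    static void
    set_shrink_threshold(const char *value)
    {
        double v = atof(value);

        if (v >= 0.0 && v <= 1.0)   /* ignore out-of-range settings */
            vacuum_shrink_threshold = v;
    }

    int
    main(void)
    {
        set_shrink_threshold("0.10");   /* e.g. SET vacuum_shrink_threshold = '0.10' */
        printf("threshold = %.0f%%\n", vacuum_shrink_threshold * 100.0);
        return 0;
    }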
Tom Lane wrote:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> We can't "drop and recreate" without a solution to the relation
> >> versioning issue (unless you are prepared to accept a nonfunctional
> >> database after a failure partway through an index rebuild on a system
> >> table).  I think we should do this, but it's not all that simple...
>
> > Is this topic independent of WAL in the first place?
>
> Sure, unless Vadim sees some clever way of using WAL to eliminate
> the need for versioned relations.  But as far as I've seen in the
> discussions, versioned relations are independent of WAL.
>
> Basically what I want here is to build the new index relation as
> a new file (set of files, if large) and then atomically commit it
> as the new version of the index.

Which implicitly says that we need to vacuum the TOAST relation AFTER
being completely done with the indices - in contrast to what you said
before.  Otherwise, the old index (the active one) would still refer to
entries that don't exist any more.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
> I can see some value in having a _configurable_ threshold percentage of
> deletes before vacuum kicked in and attempted to shrink table/index
> on-disk file sizes.  This would let the end user decide, and 20% is
> probably a reasonable default; if it isn't, then changing a default is
> easier to do down the track.
>
> I can also see that it could be done with (perhaps) a modification to
> VACUUM syntax, say:
>
> 	VACUUM [VERBOSE] [SHRINK] ...
>
> And I believe that the whole thing will go better if ANALYZE is taken
> _out_ of vacuum, as was discussed on this list a month or two ago.

The analyze process no longer locks the table exclusively.  It will be
made a separate command in 7.1, though an ANALYZE option will still be
available in VACUUM.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Tue, 11 Jul 2000, Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I suggest that we change vacuum to only remove tuples if there are
> > more than 20% expired tuples.
>
> > When we do vacuum, we drop all indexes and recreate them.
>
> > This fixes the complaint about vacuum slowness when there are many
> > expired rows in the table.  We know this is caused by excessive index
> > updates.  It allows indexes to shrink (Jan pointed this out to me.)
> > And it fixes the TOAST problem with TOAST values in indexes.
>
> We can't "drop and recreate" without a solution to the relation
> versioning issue (unless you are prepared to accept a nonfunctional
> database after a failure partway through an index rebuild on a system
> table).  I think we should do this, but it's not all that simple...
>
> I do not see what your 20% idea has to do with this, though, nor
> why it's a good idea.  If I've told the thing to vacuum, I think
> it should vacuum.  20% of a big table could be a lot of megabytes,
> and I don't want some arbitrary decision in the code about whether
> I can reclaim that space or not.

I wouldn't mind seeing some automagic vacuum happen *if* >20% of the
tuples are expired ... but I don't understand the limit when I tell it
to vacuum either ...
How about leaving vacuum as is, but extending REINDEX so that it
drops/rebuilds all indices on a TABLE | DATABASE?  Or does it do that
now?  From reading \h REINDEX, my thought is that it doesn't, but ...

On Tue, 11 Jul 2000, Bruce Momjian wrote:

> > We can't "drop and recreate" without a solution to the relation
> > versioning issue (unless you are prepared to accept a nonfunctional
> > database after a failure partway through an index rebuild on a system
> > table).  I think we should do this, but it's not all that simple...
> >
> > I do not see what your 20% idea has to do with this, though, nor
> > why it's a good idea.  If I've told the thing to vacuum, I think
> > it should vacuum.  20% of a big table could be a lot of megabytes,
> > and I don't want some arbitrary decision in the code about whether
> > I can reclaim that space or not.
>
> Well, I think we should do a sequential scan before starting vacuum to
> find the number of expired rows.
>
> Now that we are dropping and recreating indexes, doing all that just to
> remove a few tuples is a major waste.  The user cannot really know
> whether the table is worth vacuuming in normal use; they are just going
> to use the default.  Now, I think a FORCE option would be good, or the
> ability to change the 20% default.
>
> Remember, commercial databases don't even return unused space if you
> remove all the rows in a table.  At least Informix doesn't, and I am
> sure there are others.  I like vacuum, but let's not make it jump major
> hurdles for small gain.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> > I do not see what your 20% idea has to do with this, though, nor
> > why it's a good idea.  If I've told the thing to vacuum, I think
> > it should vacuum.  20% of a big table could be a lot of megabytes,
> > and I don't want some arbitrary decision in the code about whether
> > I can reclaim that space or not.
>
> I wouldn't mind seeing some automagic vacuum happen *if* >20% of the
> tuples are expired ... but I don't understand the limit when I tell it
> to vacuum either ...

I am confused by your comment.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Wed, 12 Jul 2000, Bruce Momjian wrote:

> > > I do not see what your 20% idea has to do with this, though, nor
> > > why it's a good idea.  If I've told the thing to vacuum, I think
> > > it should vacuum.  20% of a big table could be a lot of megabytes,
> > > and I don't want some arbitrary decision in the code about whether
> > > I can reclaim that space or not.
> >
> > I wouldn't mind seeing some automagic vacuum happen *if* >20% of the
> > tuples are expired ... but I don't understand the limit when I tell
> > it to vacuum either ...
>
> I am confused by your comment.

Make the backend reasonably intelligent ... periodically do a scan, as
you've suggested would be required for your 20% idea above, and if >20%
of the records are expired, auto-start a vacuum (settable, of course) ...
> On Wed, 12 Jul 2000, Bruce Momjian wrote:
>
> > > > I do not see what your 20% idea has to do with this, though, nor
> > > > why it's a good idea.  If I've told the thing to vacuum, I think
> > > > it should vacuum.  20% of a big table could be a lot of megabytes,
> > > > and I don't want some arbitrary decision in the code about whether
> > > > I can reclaim that space or not.
> > >
> > > I wouldn't mind seeing some automagic vacuum happen *if* >20% of the
> > > tuples are expired ... but I don't understand the limit when I tell
> > > it to vacuum either ...
> >
> > I am confused by your comment.
>
> Make the backend reasonably intelligent ... periodically do a scan, as
> you've suggested would be required for your 20% idea above, and if >20%
> of the records are expired, auto-start a vacuum (settable, of course) ...

It would be good if we could do vacuum without locking.  We could find
a table when things are mostly idle, and vacuum it then.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
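A toy version of that loop: periodically check the expired fraction and
issue a vacuum when the system looks idle.  Every function here is a
hypothetical stub standing in for a real server-side check:

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stubs; a real daemon would query the server. */
    static double expired_fraction(void) { return 0.25; }
    static int    system_is_idle(void)   { return 1; }
    static void   run_vacuum(void)       { puts("VACUUM issued"); }

    int
    main(void)
    {
        int i;

        for (i = 0; i < 3; i++)     /* bounded loop for the sketch */
        {
            if (expired_fraction() >= 0.20 && system_is_idle())
                run_vacuum();
            sleep(1);               /* poll interval */
        }
        return 0;
    }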
On Wed, 12 Jul 2000, Bruce Momjian wrote:

> > On Wed, 12 Jul 2000, Bruce Momjian wrote:
> >
> > > > > I do not see what your 20% idea has to do with this, though, nor
> > > > > why it's a good idea.  If I've told the thing to vacuum, I think
> > > > > it should vacuum.  20% of a big table could be a lot of megabytes,
> > > > > and I don't want some arbitrary decision in the code about whether
> > > > > I can reclaim that space or not.
> > > >
> > > > I wouldn't mind seeing some automagic vacuum happen *if* >20% of the
> > > > tuples are expired ... but I don't understand the limit when I tell
> > > > it to vacuum either ...
> > >
> > > I am confused by your comment.
> >
> > Make the backend reasonably intelligent ... periodically do a scan, as
> > you've suggested would be required for your 20% idea above, and if >20%
> > of the records are expired, auto-start a vacuum (settable, of course) ...
>
> It would be good if we could do vacuum without locking.  We could find
> a table when things are mostly idle, and vacuum it then.

Definitely :)
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of The Hermit Hacker
>
> How about leaving vacuum as is, but extending REINDEX so that it
> drops/rebuilds all indices on a TABLE | DATABASE?  Or does it do that
> now?  From reading \h REINDEX, my thought is that it doesn't, but ...

As for user tables, REINDEX can do it already, i.e.

	REINDEX TABLE table_name FORCE;

is possible under psql.  If REINDEX fails, PostgreSQL just ignores the
indexes of the table (i.e. index scans are never applied) and
REINDEX/VACUUM would recover the state.  Yes, VACUUM already has hidden
functionality to reindex.

As for system indexes, you must shut down the postmaster and invoke a
standalone postgres with the -P option.

	REINDEX DATABASE database_name FORCE;

would reindex (shrink) all system tables of the database.  It may be
possible even under the postmaster if REINDEX never fails.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
Hiroshi Inoue wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> > Behalf Of The Hermit Hacker
> >
> > How about leaving vacuum as is, but extending REINDEX so that it
> > drops/rebuilds all indices on a TABLE | DATABASE?  Or does it do that
> > now?  From reading \h REINDEX, my thought is that it doesn't, but ...
>
> As for user tables, REINDEX can do it already, i.e.
> REINDEX TABLE table_name FORCE; is possible under psql.
> If REINDEX fails, PostgreSQL just ignores the indexes of the table
> (i.e. index scans are never applied) and REINDEX/VACUUM would
> recover the state.  Yes, VACUUM already has hidden functionality
> to reindex.

Sorry, but there seem to be problems with that.

pgsql=# delete from t2;
DELETE 0
pgsql=# vacuum;
VACUUM
pgsql=# reindex table t2 force;
REINDEX
pgsql=# \c
You are now connected to database pgsql as user pgsql.
pgsql=# insert into t2 select * from t1;
FATAL 1:  btree: failed to add item to the page
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.

It happens too if I don't reconnect to the database between REINDEX and
INSERT.  Also if I drop the connection and restart the postmaster, so
it shouldn't be due to old blocks hanging around in the cache.

The interesting thing is that the btree index gets reset to 2 blocks.
Need to dive into it...

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
> -----Original Message-----
> From: Jan Wieck [mailto:JanWieck@t-online.de]
>
> Hiroshi Inoue wrote:
> > As for user tables, REINDEX can do it already, i.e.
> > REINDEX TABLE table_name FORCE; is possible under psql.
> > If REINDEX fails, PostgreSQL just ignores the indexes of the table
> > (i.e. index scans are never applied) and REINDEX/VACUUM would
> > recover the state.  Yes, VACUUM already has hidden functionality
> > to reindex.
>
> Sorry, but there seem to be problems with that.
>
> pgsql=# delete from t2;
> DELETE 0
> pgsql=# vacuum;
> VACUUM
> pgsql=# reindex table t2 force;
> REINDEX
> pgsql=# \c
> You are now connected to database pgsql as user pgsql.
> pgsql=# insert into t2 select * from t1;
> FATAL 1:  btree: failed to add item to the page
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
>
> It happens too if I don't reconnect to the database between REINDEX and
> INSERT.  Also if I drop the connection and restart the postmaster, so
> it shouldn't be due to old blocks hanging around in the cache.
>
> The interesting thing is that the btree index gets reset to 2 blocks.
> Need to dive into it...

Hmm, I couldn't reproduce it here.  What kind of indexes does t2 have?

Anyway, that the index gets reset to 2 blocks seems reasonable, because
t2 is empty.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp