Thread: Why vacuum?
I noticed the other day that one of my pg databases was slow, so I ran vacuum on it, which brought a question to mind: why the need? I looked at my oracle server and we aren't doing anything of the sort (that I can find), so why does pg need it? Any info? Thanks, - brandon b. palmer, bpalmer@crimelabs.net pgp: www.crimelabs.net/bpalmer.pgp5
El Mié 13 Dic 2000 16:41, bpalmer escribió: > I noticed the other day that one of my pg databases was slow, so I ran > vacuum on it, which brought a question to mind: why the need? I looked > at my oracle server and we aren't doing anything of the sort (that I can > find), so why does pg need it? Any info? I know nothing about Oracle, but I can tell you that Informix has an update statistics, which I don't know if it's similar to vacuum, but.... What vacuum does is clean the database from rows that were left during updates and deletes, non the less, the tables get shrincked, so searches get faster. Saludos... :-) -- System Administration: It's a dirty job, but someone told I had to do it. ----------------------------------------------------------------- Martín Marqués email: martin@math.unl.edu.ar Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Administrador de sistemas en math.unl.edu.ar -----------------------------------------------------------------
* Martin A. Marques <martin@math.unl.edu.ar> [001213 15:15] wrote: > El Mié 13 Dic 2000 16:41, bpalmer escribió: > > I noticed the other day that one of my pg databases was slow, so I ran > > vacuum on it, which brought a question to mind: why the need? I looked > > at my oracle server and we aren't doing anything of the sort (that I can > > find), so why does pg need it? Any info? > > I know nothing about Oracle, but I can tell you that Informix has an update > statistics, which I don't know if it's similar to vacuum, but.... > What vacuum does is clean the database from rows that were left during > updates and deletes, non the less, the tables get shrincked, so searches get > faster. Yes, postgresql requires vacuum quite often otherwise queries and updates start taking ungodly amounts of time to complete. If you're having problems because vacuum locks up your tables for too long you might want to check out: http://people.freebsd.org/~alfred/vacfix/ It has some tarballs that have patches to speed up vacuum depending on how you access your tables you can see up to a 20x reduction in vacuum time. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."
bpalmer wrote: > > I noticed the other day that one of my pg databases was slow, so I ran > vacuum on it, which brought a question to mind: why the need? I looked > at my oracle server and we aren't doing anything of the sort (that I can > find), so why does pg need it? Any info? Hi, I'm one of the people beeing slightly bitten by the current vacuum behaviour :), so i take the chance to add my suggestions to this question. FWIW, my thought is about a vacuumer process that, in background, scans each table for available blocks (for available I mean a block full of deleted rows whose tid is commited) and fills a cache of those blocks available to the backends. Whenever a backend needs to allocate a new block it looks for a free block in the cache, if it finds any, it can use it, else it proceeds as usual appending the block at the tail. The vacuumer would run with a very low priority, so that it doesn't suck precious CPU and I/O when the load on the machine is high. A small flag on each table would avoid the vacuumer to scan the table if no empty block is found and no tuple has been deleted. Ok, now tell me where this is badly broken :)) Just my .02 euro :) Bye! -- Daniele Orlandi
I have this nasty problem too, in early time, I don't know the problem, but we used it for a while, than we found our table growing too fast without insert any record( we use update), this behaviour most like M$ MSACCESS database I had used a long time ago which don't reuse deleted record space and full fill your hard disk after several hours, the nasty vaccum block any other users to operate on table, this is a big problem for a large table, because it will block tooo long to let other user to run query. we have a project affected by this problem, and sadly we decide to use closure source database- SYBASE on linux, wehavn't any other selections. :( note that SYBASE and Informix both have 'update statistics' command, but they run it fast in seconds, not block any other user, this is pretty. ya, what's good technology! XuYifeng ----- Original Message ----- From: Martin A. Marques <martin@math.unl.edu.ar> To: bpalmer <bpalmer@crimelabs.net>; <pgsql-hackers@postgresql.org> Sent: Thursday, December 14, 2000 7:08 AM Subject: Re: [HACKERS] Why vacuum? El Mié 13 Dic 2000 16:41, bpalmer escribió: > I noticed the other day that one of my pg databases was slow, so I ran > vacuum on it, which brought a question to mind: why the need? I looked > at my oracle server and we aren't doing anything of the sort (that I can > find), so why does pg need it? Any info? I know nothing about Oracle, but I can tell you that Informix has an update statistics, which I don't know if it's similar to vacuum, but.... What vacuum does is clean the database from rows that were left during updates and deletes, non the less, the tables get shrincked, so searches get faster. Saludos... :-) -- System Administration: It's a dirty job, but someone told I had to do it. ----------------------------------------------------------------- Martín Marqués email: martin@math.unl.edu.ar Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Administrador de sistemas en math.unl.edu.ar -----------------------------------------------------------------
I have this nasty problem too, in early time, I don't know the problem, but we used it for a while, than we found our table growing too fast without insert any record( we use update), this behaviour most like M$ MSACCESS database I had used a long time ago which don't reuse deleted record space and full fill your hard disk after several hours, the nasty vaccum block any other users to operate on table, this is a big problem for a large table, because it will block tooo long to let other user to run query. we have a project affected by this problem, and sadly we decide to use closure source database- SYBASE on linux, wehavn't any other selections. :( note that SYBASE and Informix both have 'update statistics' command, but they run it fast in seconds, not block any other user, this is pretty. ya, what's good technology! XuYifeng ----- Original Message ----- From: Martin A. Marques <martin@math.unl.edu.ar> To: bpalmer <bpalmer@crimelabs.net>; <pgsql-hackers@postgresql.org> Sent: Thursday, December 14, 2000 7:08 AM Subject: Re: [HACKERS] Why vacuum? El Mié 13 Dic 2000 16:41, bpalmer escribió: > I noticed the other day that one of my pg databases was slow, so I ran > vacuum on it, which brought a question to mind: why the need? I looked > at my oracle server and we aren't doing anything of the sort (that I can > find), so why does pg need it? Any info? I know nothing about Oracle, but I can tell you that Informix has an update statistics, which I don't know if it's similar to vacuum, but.... What vacuum does is clean the database from rows that were left during updates and deletes, non the less, the tables get shrincked, so searches get faster. Saludos... :-) -- System Administration: It's a dirty job, but someone told I had to do it. ----------------------------------------------------------------- Martín Marqués email: martin@math.unl.edu.ar Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Administrador de sistemas en math.unl.edu.ar -----------------------------------------------------------------
* xuyifeng <xyf@stocke.com> [001213 18:54] wrote: > I have this nasty problem too, in early time, I don't know the problem, but we used it for a while, > than we found our table growing too fast without insert any record( we use update), this behaviour > most like M$ MSACCESS database I had used a long time ago which don't reuse deleted record > space and full fill your hard disk after several hours, the nasty vaccum block any other users to operate > on table, this is a big problem for a large table, because it will block tooo long to let other user to run > query. we have a project affected by this problem, and sadly we decide to use closure source database > - SYBASE on linux, we havn't any other selections. :( > > note that SYBASE and Informix both have 'update statistics' command, but they run it fast in seconds, > not block any other user, this is pretty. ya, what's good technology! http://people.freebsd.org/~alfred/vacfix/ -Alfred
> Yes, postgresql requires vacuum quite often otherwise queries and > updates start taking ungodly amounts of time to complete. If you're > having problems because vacuum locks up your tables for too long > you might want to check out: But why? I don't know of other databases that need to be 'vacuum'ed. Do all others just do it internaly on a regular basis? What am I missing here? b. palmer, bpalmer@crimelabs.net pgp: www.crimelabs.net/bpalmer.pgp5
On Wed, 13 Dec 2000, bpalmer wrote: > > Yes, postgresql requires vacuum quite often otherwise queries and > > updates start taking ungodly amounts of time to complete. If you're > > having problems because vacuum locks up your tables for too long > > you might want to check out: > > But why? I don't know of other databases that need to be 'vacuum'ed. Do > all others just do it internaly on a regular basis? > > What am I missing here? PgSQL's storage manager is currently such that it doesn't overwrite 'deleted' records, but just keeps appending to the end of the table ... so, for instance, a client of ours whose table had 5 records in it that are updated *alot* grew a table to 64Meg that only contains ~8k worth of data ... vacuum'ng cleans out the cruft and truncates the file ... vadim, for v7.2, is planning on re-writing the storage manager to do proper overwriting of deleted space, which will reduce the requirement for vacuum to almost never ...
> But why? I don't know of other databases that need to be 'vacuum'ed. Do > all others just do it internaly on a regular basis? > > What am I missing here? Plenty of other databases need to be 'vacuumed'. For instance, if you have an ms access database with 5 MB of data in it, and then delete all the data, leaving only the forms, etc - you will be left with a 5MB mdb file still! If you then run 'Compact Database' (which is another word for 'vacuum'), the mdb file will be reduced down to 500k... Chris
On Thu, 14 Dec 2000, Christopher Kings-Lynne wrote: > Plenty of other databases need to be 'vacuumed'. For instance, if you have > an ms access database with 5 MB of data in it, and then delete all the data, > leaving only the forms, etc - you will be left with a 5MB mdb file still! > > If you then run 'Compact Database' (which is another word for 'vacuum'), the > mdb file will be reduced down to 500k... Ooh... Hope MS Access isn't going to be taken seriously as a benchmark here :-). The same is also true of MapInfo, by the way, but I'm not holding that up as a benchmark either ;-). > Chris Tim -- ----------------------------------------------- Tim Allen tim@proximity.com.au Proximity Pty Ltd http://www.proximity.com.au/ http://www4.tpg.com.au/users/rita_tim/
Tim Allen wrote: > > On Thu, 14 Dec 2000, Christopher Kings-Lynne wrote: > > > Plenty of other databases need to be 'vacuumed'. For instance, if you have > > an ms access database with 5 MB of data in it, and then delete all the data, > > leaving only the forms, etc - you will be left with a 5MB mdb file still! > > > > If you then run 'Compact Database' (which is another word for 'vacuum'), the > > mdb file will be reduced down to 500k... > > Ooh... Hope MS Access isn't going to be taken seriously as a benchmark > here :-). The same is also true of MapInfo, by the way, but I'm not > holding that up as a benchmark either ;-). :-) I think that the non-overwriting storage manager actually bought a lot more for PostgreSQL than it does for MS Access. In earlier versions of PostgreSQL it was possible to "time travel" your database and so run your query agains the database as it was at a particular time / date. This advanced feature turns out to be useful in very few situations, and is very expensive in terms of storage. Still, "if it works, don't fix it" also applies. The PostgreSQL storage manager is quite efficient as it is now, and most of us do have quiet periods when we can safely vacuum the database, which is why it has had to wait until now. This will be quite a big change for 7.2, and getting the performance right will no doubt challenge these hackers whom we are all greatly indebted to. Cheers, Andrew. -- _____________________________________________________________________ Andrew McMillan, e-mail: Andrew@catalyst.net.nz Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267
The Hermit Hacker wrote: > > On Wed, 13 Dec 2000, bpalmer wrote: > > > > Yes, postgresql requires vacuum quite often otherwise queries and > > > updates start taking ungodly amounts of time to complete. If you're > > > having problems because vacuum locks up your tables for too long > > > you might want to check out: > > > > But why? I don't know of other databases that need to be 'vacuum'ed. Do > > all others just do it internaly on a regular basis? > > > > What am I missing here? > > PgSQL's storage manager is currently such that it doesn't overwrite > 'deleted' records, but just keeps appending to the end of the table > ... so, for instance, a client of ours whose table had 5 records in it > that are updated *alot* grew a table to 64Meg that only contains ~8k worth > of data ... > > vacuum'ng cleans out the cruft and truncates the file ... > > vadim, for v7.2, is planning on re-writing the storage manager to do > proper overwriting of deleted space, which will reduce the requirement for > vacuum to almost never ... I hope that he does it in a way that allows it to retain the old behaviour for some tables if there is need for it. Also, vacuum and analyze should be separated (i.e. one should be able to analyze a table without vacuuming it.) Maybe use "ALTER TABLE/DATABASE UPDATE STATISTICS" for VACUUM ANALYZE as syntax. Time travel is/was an useful feature that is difficult to emulate efficiently using "other" means like rules/triggers ------------ Hannu
Hello, Another question about vacuum. Will vacuum/drop/create deadlocks be fixed in 7.0.x branch? That's really annoying. I cannot run vacuum automatically due to this. Just a patch will be really great. Is it so hard to fix? -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
El Mié 13 Dic 2000 22:24, xuyifeng escribió: > I have this nasty problem too, in early time, I don't know the problem, > but we used it for a while, than we found our table growing too fast > without insert any record( we use update), this behaviour most like M$ > MSACCESS database I had used a long time ago which don't reuse deleted > record space and full fill your hard disk after several hours, the nasty > vaccum block any other users to operate on table, this is a big problem > for a large table, because it will block tooo long to let other user to run > query. we have a project affected by this problem, and sadly we decide to > use closure source database - SYBASE on linux, we havn't any other > selections. :( > > note that SYBASE and Informix both have 'update statistics' command, but > they run it fast in seconds, not block any other user, this is pretty. ya, > what's good technology! I have to say that 'update statistics' does not take a few seconds if the databases have grownto be a bit large. At least thats what I have seen. Saludos... :-) -- System Administration: It's a dirty job, but someone told I had to do it. ----------------------------------------------------------------- Martín Marqués email: martin@math.unl.edu.ar Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Administrador de sistemas en math.unl.edu.ar -----------------------------------------------------------------
On Thu, Dec 14, 2000 at 01:16:20PM +0000, Hannu Krosing wrote: > The Hermit Hacker wrote: <snip> > > vadim, for v7.2, is planning on re-writing the storage manager to do > > proper overwriting of deleted space, which will reduce the requirement for > > vacuum to almost never ... > > I hope that he does it in a way that allows it to retain the old > behaviour > for some tables if there is need for it. Here as well. The framework is still mostly there for multiple storage managers: I hope Vadim takes advantage of it. <snip> > Time travel is/was an useful feature that is difficult to emulate > efficiently using "other" means like rules/triggers I've actually been doing this very thing this week. It's not _that_ horibble, but does interact really poorly with RI constraints: suddenly, all those unique PK columns aren't so unique! This is probably the biggest reason to do time travel in the backend. Having it on a per-table basis would be cool. Hmm, seems the biggest problem to doing it per table would be needing a couple optional system attributes (e.g. tt_start and tt_stop), and different indices, that know how to skip not-current tuples. Extra syntax to do queries at a particular time in the past would be nice, but not an inital requirement. Sounds like there's something in common here with the per tuple CRC discusson, as well as optional OID: a generic need for optional system attributes. Ross -- Open source code is like a natural resource, it's the result of providing food and sunshine to programmers, and then staying out of their way. [...] [It] is not going away because it has utility for both the developers and users independent of economic motivations. Jim Flynn, Sunnyvale, Calif.
"Martin A. Marques" wrote: > > El Mié 13 Dic 2000 16:41, bpalmer escribió: > > I noticed the other day that one of my pg databases was slow, so I ran > > vacuum on it, which brought a question to mind: why the need? I looked > > at my oracle server and we aren't doing anything of the sort (that I can > > find), so why does pg need it? Any info? > > I know nothing about Oracle, but I can tell you that Informix has an update > statistics, which I don't know if it's similar to vacuum, but.... > What vacuum does is clean the database from rows that were left during > updates and deletes, non the less, the tables get shrincked, so searches get > faster. > While I would like Postgres to perform statistics, one and a while, on it own. I like vacuum in general. I would rather trade unused disk space for performace. The last thing you need during high loads is the database thinking that it is time to clean up. -- http://www.mohawksoft.com
* mlw <markw@mohawksoft.com> [001214 09:30] wrote: > "Martin A. Marques" wrote: > > > > El Mié 13 Dic 2000 16:41, bpalmer escribió: > > > I noticed the other day that one of my pg databases was slow, so I ran > > > vacuum on it, which brought a question to mind: why the need? I looked > > > at my oracle server and we aren't doing anything of the sort (that I can > > > find), so why does pg need it? Any info? > > > > I know nothing about Oracle, but I can tell you that Informix has an update > > statistics, which I don't know if it's similar to vacuum, but.... > > What vacuum does is clean the database from rows that were left during > > updates and deletes, non the less, the tables get shrincked, so searches get > > faster. > > > > While I would like Postgres to perform statistics, one and a while, on > it own. I like vacuum in general. > > I would rather trade unused disk space for performace. The last thing > you need during high loads is the database thinking that it is time to > clean up. Even worse is having to scan a file that has grown 20x the size because you havne't vacuum'd in a while. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."