Thread: vacuum takes too long
Hi, insert a few row in a table (>50000) and do delete from mytable; vacuum verbose analyze; Why is this that slow? [I am not on hackers@postgreSQL.org] --- _ _ _(_)(_)_ David Wetzel, Turbocat's Development, (_) __ (_) Buchhorster Strasse, D-16567 Muehlenbeck/Berlin, FRG, _/ \_ Fax +49 33056 82835 NeXTmail dave@turbocat.de (______) http://www.turbocat.de/ DEVELOPMENT * CONSULTING * ADMINISTRATION WATCH OUT FOR TURBOFAX for OPENSTEP!
> Hi, > > insert a few row in a table (>50000) and do > delete from mytable; > vacuum verbose analyze; > > Why is this that slow? Analyze checks every column in every row. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> From: Bruce Momjian <maillist@candle.pha.pa.us> > > insert a few row in a table (>50000) and do > > delete from mytable; > > vacuum verbose analyze; > > > > Why is this that slow? > > Analyze checks every column in every row. even if you only type "vacuum verbose" it takes _very_ long. I deleted _all_ records with "delete from mytable;" before. A drop and a new create is faster. But what is when you delete (maybe 100000) rows but keep 100 in the table? I use 6.4.2 on NetBSD/i486 (that box makes gets 12MBytes/sec via the filesystem out of the drives) --- _ _ _(_)(_)_ David Wetzel, Turbocat's Development, (_) __ (_) Buchhorster Strasse, D-16567 Muehlenbeck/Berlin, FRG, _/ \_ Fax +49 33056 82835 NeXTmail dave@turbocat.de (______) http://www.turbocat.de/ DEVELOPMENT * CONSULTING * ADMINISTRATION
> > From: Bruce Momjian <maillist@candle.pha.pa.us> > > > > insert a few row in a table (>50000) and do > > > delete from mytable; > > > vacuum verbose analyze; > > > > > > Why is this that slow? > > > > Analyze checks every column in every row. > > even if you only type "vacuum verbose" it takes _very_ long. > > I deleted _all_ records with "delete from mytable;" before. > A drop and a new create is faster. But what is when you delete (maybe > 100000) rows but keep 100 in the table? > > I use 6.4.2 on NetBSD/i486 (that box makes gets 12MBytes/sec via the > filesystem out of the drives) Not sure what to say. Vacuum does take a while, and it is often faster to drop and recreate. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 6 Jan 1999, Bruce Momjian wrote: > > > From: Bruce Momjian <maillist@candle.pha.pa.us> > > > > > > insert a few row in a table (>50000) and do > > > > delete from mytable; > > > > vacuum verbose analyze; > > > > > > > > Why is this that slow? > > > > > > Analyze checks every column in every row. > > > > even if you only type "vacuum verbose" it takes _very_ long. > > > > I deleted _all_ records with "delete from mytable;" before. > > A drop and a new create is faster. But what is when you delete (maybe > > 100000) rows but keep 100 in the table? > > > > I use 6.4.2 on NetBSD/i486 (that box makes gets 12MBytes/sec via the > > filesystem out of the drives) > > Not sure what to say. Vacuum does take a while, and it is often faster > to drop and recreate. Let's ignore the 'analyze' part first...take a simple 'vacuum' command...what takes the longest? My understanding is a vacuum simplistically, takes and moves all rows "up" in the file to fill in any blanks resulting from updates and deletes, then truncates the end of the file... If so, is there no way of having vacuum running on its own? Basically, if my understanding is remotely correct, vaccum is defragmenting the table...so why can't the defragmenting be performed during idle time...or, at least some of it. Start at the top of the table, go to the first 'blank' section (a deleted record)...find the next good record that will fit in the space, move it there...clear out the old space, etc... if dba issues a 'vacuum', lock the table and do all records at once, but otherwise try and vacuum the table live... With the new MVCC serialization, this concept should be less intrusive on readers, no? Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> Start at the top of the table, go to the first 'blank' section (a deleted > record)...find the next good record that will fit in the space, move it > there...clear out the old space, etc... > > if dba issues a 'vacuum', lock the table and do all records at once, but > otherwise try and vacuum the table live... > > With the new MVCC serialization, this concept should be less intrusive on > readers, no? Wish I knew the answer. I can guess, but that isn't going to help. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 6 Jan 1999, Bruce Momjian wrote: > > Start at the top of the table, go to the first 'blank' section (a deleted > > record)...find the next good record that will fit in the space, move it > > there...clear out the old space, etc... > > > > if dba issues a 'vacuum', lock the table and do all records at once, but > > otherwise try and vacuum the table live... > > > > With the new MVCC serialization, this concept should be less intrusive on > > readers, no? > > Wish I knew the answer. I can guess, but that isn't going to help. Guess == throwing in ideas, even if they are incorrect...the way I figure it, I through out alot of guesses...some of them spark ideas in others and we see some really neat ideas come out of it :) Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
> > Wish I knew the answer. I can guess, but that isn't going to help. > > Guess == throwing in ideas, even if they are incorrect...the way I > figure it, I through out alot of guesses...some of them spark ideas in > others and we see some really neat ideas come out of it :) Yes. Good. I have been bugging Vadim about possible row reuse, but I don't know enough to understand the options. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 6 Jan 1999, Bruce Momjian wrote: > > > Wish I knew the answer. I can guess, but that isn't going to help. > > > > Guess == throwing in ideas, even if they are incorrect...the way I > > figure it, I through out alot of guesses...some of them spark ideas in > > others and we see some really neat ideas come out of it :) > > Yes. Good. I have been bugging Vadim about possible row reuse, but I > don't know enough to understand the options. I'm not sure about the row-reuse thing. What sort of performance hit will it have. As it is now, you add a row by zipping down to the end, add the row...bang, finished. with row-reuse, you have to search for a good fit, which could take time... Hrmmm...let's look at Oracle's "model"...bear in mind that I haven't dived very deep into it, so I could be totally off base here, but, with Oracle, you have a seperate "group" of processes started up for each 'instance', where, if I'm correct, an instance is the same as our database(?)... How hard would it be for us to implement something similar? When you start up the postmaster, it starts up 1 postgres "master process" for each database that it knows about. The point of the master process is effectively the garbage collector for the database, as well as the central 'traffic cop'... so, for example, I have 4 databases on my server...when you start up the system with your normal 'postmaster' process, it forks off 4 processes, one for each database. When you connect to port #### for database XXXX, the listening process (main postmaster) shunts the process over to the appropriate 'traffic cop' for handling... The 'traffic cop' would keep track of the number of connections to the database are currently open, and when zero, which woudl indicate idle time, process through a table in the database to clean it up. As soon as a new connection comes in, it would "finish" its cleanup by making sure the table is in a 'sane state' (ie. finish up with its current record) and then fork off the process, to wait quietly until its idle again... Then each database could effectively have their own shared memory pool that could be adjusted on a per database basis. Maybe even add a 'change threshold', where after X transactions (update, insert or delete), the table gets auto-vacuum'd (no analyze, just vacuum)...the threshold could be set on a per-table basis...the 'traffic cop' should be able to easily keep track of those sort of stats internally...no? Hell, the 'traffic cop' *should* be able to keep reasonably accurate stats to update the same tables that a 'vacuum analyze' maintains, adjusting those values periodically to give a semi-accurate picture. Periodically, a normal 'analyze' would have to be run... Its a thought...haven't got a clue as to the complexity of implementing, but...*shrug* Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Added to TODO: * Create a background process for each database that runs while database is idle, finding superceeded rows, gathering stats and vacuuming > On Wed, 6 Jan 1999, Bruce Momjian wrote: > > > > > Wish I knew the answer. I can guess, but that isn't going to help. > > > > > > Guess == throwing in ideas, even if they are incorrect...the way I > > > figure it, I through out alot of guesses...some of them spark ideas in > > > others and we see some really neat ideas come out of it :) > > > > Yes. Good. I have been bugging Vadim about possible row reuse, but I > > don't know enough to understand the options. > > I'm not sure about the row-reuse thing. What sort of performance hit will > it have. As it is now, you add a row by zipping down to the end, add the > row...bang, finished. with row-reuse, you have to search for a good fit, > which could take time... > > Hrmmm...let's look at Oracle's "model"...bear in mind that I haven't dived > very deep into it, so I could be totally off base here, but, with Oracle, > you have a seperate "group" of processes started up for each 'instance', > where, if I'm correct, an instance is the same as our database(?)... > > How hard would it be for us to implement something similar? When you > start up the postmaster, it starts up 1 postgres "master process" for each > database that it knows about. The point of the master process is > effectively the garbage collector for the database, as well as the central > 'traffic cop'... > > so, for example, I have 4 databases on my server...when you start up the > system with your normal 'postmaster' process, it forks off 4 processes, > one for each database. When you connect to port #### for database XXXX, > the listening process (main postmaster) shunts the process over to the > appropriate 'traffic cop' for handling... > > The 'traffic cop' would keep track of the number of connections to the > database are currently open, and when zero, which woudl indicate idle > time, process through a table in the database to clean it up. As soon as > a new connection comes in, it would "finish" its cleanup by making sure > the table is in a 'sane state' (ie. finish up with its current record) and > then fork off the process, to wait quietly until its idle again... > > Then each database could effectively have their own shared memory pool > that could be adjusted on a per database basis. Maybe even add a 'change > threshold', where after X transactions (update, insert or delete), the > table gets auto-vacuum'd (no analyze, just vacuum)...the threshold could > be set on a per-table basis...the 'traffic cop' should be able to easily > keep track of those sort of stats internally...no? > > Hell, the 'traffic cop' *should* be able to keep reasonably accurate stats > to update the same tables that a 'vacuum analyze' maintains, adjusting > those values periodically to give a semi-accurate picture. Periodically, > a normal 'analyze' would have to be run... > > Its a thought...haven't got a clue as to the complexity of implementing, > but...*shrug* > > Marc G. Fournier > Systems Administrator @ hub.org > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org > > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026