Thread: Why vacuum?

Why vacuum?

From
bpalmer
Date:
I noticed the other day that one of my pg databases was slow,  so I ran
vacuum on it,  which brought a question to mind:  why the need?  I looked
at my oracle server and we aren't doing anything of the sort (that I can
find),  so why does pg need it?  Any info?

Thanks,
- brandon


b. palmer,  bpalmer@crimelabs.net
pgp:  www.crimelabs.net/bpalmer.pgp5



Re: Why vacuum?

From
"Martin A. Marques"
Date:
El Mié 13 Dic 2000 16:41, bpalmer escribió:
> I noticed the other day that one of my pg databases was slow,  so I ran
> vacuum on it,  which brought a question to mind:  why the need?  I looked
> at my oracle server and we aren't doing anything of the sort (that I can
> find),  so why does pg need it?  Any info?

I know nothing about Oracle, but I can tell you that Informix has an update 
statistics, which I don't know if it's similar to vacuum, but....
What vacuum does is clean the database from rows that were left during 
updates and deletes, non the less, the tables get shrincked, so searches get 
faster.

Saludos... :-)

-- 
System Administration: It's a dirty job, 
but someone told I had to do it.
-----------------------------------------------------------------
Martín Marqués            email:     martin@math.unl.edu.ar
Santa Fe - Argentina        http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------


Re: Why vacuum?

From
Alfred Perlstein
Date:
* Martin A. Marques <martin@math.unl.edu.ar> [001213 15:15] wrote:
> El Mié 13 Dic 2000 16:41, bpalmer escribió:
> > I noticed the other day that one of my pg databases was slow,  so I ran
> > vacuum on it,  which brought a question to mind:  why the need?  I looked
> > at my oracle server and we aren't doing anything of the sort (that I can
> > find),  so why does pg need it?  Any info?
> 
> I know nothing about Oracle, but I can tell you that Informix has an update 
> statistics, which I don't know if it's similar to vacuum, but....
> What vacuum does is clean the database from rows that were left during 
> updates and deletes, non the less, the tables get shrincked, so searches get 
> faster.

Yes, postgresql requires vacuum quite often otherwise queries and
updates start taking ungodly amounts of time to complete.  If you're
having problems because vacuum locks up your tables for too long
you might want to check out:

http://people.freebsd.org/~alfred/vacfix/

It has some tarballs that have patches to speed up vacuum depending
on how you access your tables you can see up to a 20x reduction in
vacuum time.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Why vacuum?

From
Daniele Orlandi
Date:
bpalmer wrote:
> 
> I noticed the other day that one of my pg databases was slow,  so I ran
> vacuum on it,  which brought a question to mind:  why the need?  I looked
> at my oracle server and we aren't doing anything of the sort (that I can
> find),  so why does pg need it?  Any info?

Hi,

I'm one of the people beeing slightly bitten by  the current vacuum
behaviour :), so i take the chance to add my suggestions to this
question.

FWIW, my thought is about a vacuumer process that, in background, scans
each table for available blocks (for available I mean a block full of
deleted rows whose tid is commited) and fills a cache of those blocks
available to the backends.

Whenever a backend needs to allocate a new block it looks for a free
block in the cache, if it finds any, it can use it, else it proceeds as
usual appending the block at the tail.

The vacuumer would run with a very low priority, so that it doesn't suck
precious CPU and I/O when the load on the machine is high.

A small flag on each table would avoid the vacuumer to scan the table if
no empty block is found and no tuple has been deleted.

Ok, now tell me where this is badly broken :))

Just my .02 euro :)

Bye!

-- Daniele Orlandi


Re: Why vacuum?

From
"xuyifeng"
Date:
I have this nasty problem too,  in early time, I don't know the problem, but we used it for a while,
than we found our table growing too fast without insert any record( we use update), this behaviour 
most like M$ MSACCESS database I had used a long time ago which don't reuse deleted record 
space and full fill your hard disk after several hours,  the nasty vaccum block any other users to operate
on table,  this is a big problem for a large table, because it will block tooo long to let other user to run
query. we have a project affected by this problem, and sadly we decide to use closure source database- SYBASE on linux,
wehavn't any other selections. :(
 

note that SYBASE and Informix both have 'update statistics' command, but they run it fast in seconds,
not block any other user, this is pretty. ya, what's good technology!

XuYifeng

----- Original Message ----- 
From: Martin A. Marques <martin@math.unl.edu.ar>
To: bpalmer <bpalmer@crimelabs.net>; <pgsql-hackers@postgresql.org>
Sent: Thursday, December 14, 2000 7:08 AM
Subject: Re: [HACKERS] Why vacuum?


El Mié 13 Dic 2000 16:41, bpalmer escribió:
> I noticed the other day that one of my pg databases was slow,  so I ran
> vacuum on it,  which brought a question to mind:  why the need?  I looked
> at my oracle server and we aren't doing anything of the sort (that I can
> find),  so why does pg need it?  Any info?

I know nothing about Oracle, but I can tell you that Informix has an update 
statistics, which I don't know if it's similar to vacuum, but....
What vacuum does is clean the database from rows that were left during 
updates and deletes, non the less, the tables get shrincked, so searches get 
faster.

Saludos... :-)

-- 
System Administration: It's a dirty job, 
but someone told I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------



Re: Why vacuum?

From
"xuyifeng"
Date:
I have this nasty problem too,  in early time, I don't know the problem, but we used it for a while,
than we found our table growing too fast without insert any record( we use update), this behaviour 
most like M$ MSACCESS database I had used a long time ago which don't reuse deleted record 
space and full fill your hard disk after several hours,  the nasty vaccum block any other users to operate
on table,  this is a big problem for a large table, because it will block tooo long to let other user to run
query. we have a project affected by this problem, and sadly we decide to use closure source database- SYBASE on linux,
wehavn't any other selections. :(
 

note that SYBASE and Informix both have 'update statistics' command, but they run it fast in seconds,
not block any other user, this is pretty. ya, what's good technology!

XuYifeng

----- Original Message ----- 
From: Martin A. Marques <martin@math.unl.edu.ar>
To: bpalmer <bpalmer@crimelabs.net>; <pgsql-hackers@postgresql.org>
Sent: Thursday, December 14, 2000 7:08 AM
Subject: Re: [HACKERS] Why vacuum?


El Mié 13 Dic 2000 16:41, bpalmer escribió:
> I noticed the other day that one of my pg databases was slow,  so I ran
> vacuum on it,  which brought a question to mind:  why the need?  I looked
> at my oracle server and we aren't doing anything of the sort (that I can
> find),  so why does pg need it?  Any info?

I know nothing about Oracle, but I can tell you that Informix has an update 
statistics, which I don't know if it's similar to vacuum, but....
What vacuum does is clean the database from rows that were left during 
updates and deletes, non the less, the tables get shrincked, so searches get 
faster.

Saludos... :-)

-- 
System Administration: It's a dirty job, 
but someone told I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------




Re: Why vacuum?

From
Alfred Perlstein
Date:
* xuyifeng <xyf@stocke.com> [001213 18:54] wrote:
> I have this nasty problem too,  in early time, I don't know the problem, but we used it for a while,
> than we found our table growing too fast without insert any record( we use update), this behaviour 
> most like M$ MSACCESS database I had used a long time ago which don't reuse deleted record 
> space and full fill your hard disk after several hours,  the nasty vaccum block any other users to operate
> on table,  this is a big problem for a large table, because it will block tooo long to let other user to run
> query. we have a project affected by this problem, and sadly we decide to use closure source database
>  - SYBASE on linux, we havn't any other selections. :(
> 
> note that SYBASE and Informix both have 'update statistics' command, but they run it fast in seconds,
> not block any other user, this is pretty. ya, what's good technology!

http://people.freebsd.org/~alfred/vacfix/

-Alfred


Re: Why vacuum?

From
bpalmer
Date:
> Yes, postgresql requires vacuum quite often otherwise queries and
> updates start taking ungodly amounts of time to complete.  If you're
> having problems because vacuum locks up your tables for too long
> you might want to check out:

But why?  I don't know of other databases that need to be 'vacuum'ed.  Do
all others just do it internaly on a regular basis?

What am I missing here?



b. palmer,  bpalmer@crimelabs.net
pgp:  www.crimelabs.net/bpalmer.pgp5




Re: Why vacuum?

From
The Hermit Hacker
Date:
On Wed, 13 Dec 2000, bpalmer wrote:

> > Yes, postgresql requires vacuum quite often otherwise queries and
> > updates start taking ungodly amounts of time to complete.  If you're
> > having problems because vacuum locks up your tables for too long
> > you might want to check out:
> 
> But why?  I don't know of other databases that need to be 'vacuum'ed.  Do
> all others just do it internaly on a regular basis?
> 
> What am I missing here?

PgSQL's storage manager is currently such that it doesn't overwrite
'deleted' records, but just keeps appending to the end of the table
... so, for instance, a client of ours whose table had 5 records in it
that are updated *alot* grew a table to 64Meg that only contains ~8k worth
of data ...

vacuum'ng cleans out the cruft and truncates the file ...

vadim, for v7.2, is planning on re-writing the storage manager to do
proper overwriting of deleted space, which will reduce the requirement for
vacuum to almost never ... 



RE: Why vacuum?

From
"Christopher Kings-Lynne"
Date:
> But why?  I don't know of other databases that need to be 'vacuum'ed.  Do
> all others just do it internaly on a regular basis?
>
> What am I missing here?

Plenty of other databases need to be 'vacuumed'.  For instance, if you have
an ms access database with 5 MB of data in it, and then delete all the data,
leaving only the forms, etc - you will be left with a 5MB mdb file still!

If you then run 'Compact Database' (which is another word for 'vacuum'), the
mdb file will be reduced down to 500k...

Chris



RE: Why vacuum?

From
Tim Allen
Date:
On Thu, 14 Dec 2000, Christopher Kings-Lynne wrote:

> Plenty of other databases need to be 'vacuumed'.  For instance, if you have
> an ms access database with 5 MB of data in it, and then delete all the data,
> leaving only the forms, etc - you will be left with a 5MB mdb file still!
> 
> If you then run 'Compact Database' (which is another word for 'vacuum'), the
> mdb file will be reduced down to 500k...

Ooh... Hope MS Access isn't going to be taken seriously as a benchmark
here :-). The same is also true of MapInfo, by the way, but I'm not
holding that up as a benchmark either ;-).

> Chris

Tim

-- 
-----------------------------------------------
Tim Allen          tim@proximity.com.au
Proximity Pty Ltd  http://www.proximity.com.au/ http://www4.tpg.com.au/users/rita_tim/



Re: Why vacuum?

From
Andrew McMillan
Date:
Tim Allen wrote:
> 
> On Thu, 14 Dec 2000, Christopher Kings-Lynne wrote:
> 
> > Plenty of other databases need to be 'vacuumed'.  For instance, if you have
> > an ms access database with 5 MB of data in it, and then delete all the data,
> > leaving only the forms, etc - you will be left with a 5MB mdb file still!
> >
> > If you then run 'Compact Database' (which is another word for 'vacuum'), the
> > mdb file will be reduced down to 500k...
> 
> Ooh... Hope MS Access isn't going to be taken seriously as a benchmark
> here :-). The same is also true of MapInfo, by the way, but I'm not
> holding that up as a benchmark either ;-).

:-)

I think that the non-overwriting storage manager actually bought a lot
more for PostgreSQL than it does for MS Access.

In earlier versions of PostgreSQL it was possible to "time travel" your
database and so run your query agains the database as it was at a
particular time / date.  This advanced feature turns out to be useful in
very few situations, and is very expensive in terms of storage.

Still, "if it works, don't fix it" also applies.  The PostgreSQL storage
manager is quite efficient as it is now, and most of us do have quiet
periods when we can safely vacuum the database, which is why it has had
to wait until now.

This will be quite a big change for 7.2, and getting the performance
right will no doubt challenge these hackers whom we are all greatly
indebted to.

Cheers,                Andrew.
-- 
_____________________________________________________________________          Andrew McMillan, e-mail:
Andrew@catalyst.net.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267


Re: Why vacuum?

From
Hannu Krosing
Date:
The Hermit Hacker wrote:
> 
> On Wed, 13 Dec 2000, bpalmer wrote:
> 
> > > Yes, postgresql requires vacuum quite often otherwise queries and
> > > updates start taking ungodly amounts of time to complete.  If you're
> > > having problems because vacuum locks up your tables for too long
> > > you might want to check out:
> >
> > But why?  I don't know of other databases that need to be 'vacuum'ed.  Do
> > all others just do it internaly on a regular basis?
> >
> > What am I missing here?
> 
> PgSQL's storage manager is currently such that it doesn't overwrite
> 'deleted' records, but just keeps appending to the end of the table
> ... so, for instance, a client of ours whose table had 5 records in it
> that are updated *alot* grew a table to 64Meg that only contains ~8k worth
> of data ...
> 
> vacuum'ng cleans out the cruft and truncates the file ...
> 
> vadim, for v7.2, is planning on re-writing the storage manager to do
> proper overwriting of deleted space, which will reduce the requirement for
> vacuum to almost never ...

I hope that he does it in a way that allows it to retain the old
behaviour 
for some tables if there is need for it.

Also, vacuum and analyze should be separated (i.e. one should be able to 
analyze a table without vacuuming it.) 

Maybe use "ALTER TABLE/DATABASE UPDATE STATISTICS" for VACUUM ANALYZE as
syntax.

Time travel is/was an useful feature that is difficult to emulate
efficiently 
using "other" means like rules/triggers 

------------
Hannu


Re: Why vacuum?

From
Denis Perchine
Date:
Hello,

Another question about vacuum. Will vacuum/drop/create deadlocks be fixed in 
7.0.x branch? That's really annoying. I cannot run vacuum automatically due 
to this. Just a patch will be really great. Is it so hard to fix?

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------


Re: Why vacuum?

From
"Martin A. Marques"
Date:
El Mié 13 Dic 2000 22:24, xuyifeng escribió:
> I have this nasty problem too,  in early time, I don't know the problem,
> but we used it for a while, than we found our table growing too fast
> without insert any record( we use update), this behaviour most like M$
> MSACCESS database I had used a long time ago which don't reuse deleted
> record space and full fill your hard disk after several hours,  the nasty
> vaccum block any other users to operate on table,  this is a big problem
> for a large table, because it will block tooo long to let other user to run
> query. we have a project affected by this problem, and sadly we decide to
> use closure source database - SYBASE on linux, we havn't any other
> selections. :(
>
> note that SYBASE and Informix both have 'update statistics' command, but
> they run it fast in seconds, not block any other user, this is pretty. ya,
> what's good technology!

I have to say that 'update statistics' does not take a few seconds if the 
databases have grownto be a bit large. At least thats what I have seen.

Saludos... :-)

-- 
System Administration: It's a dirty job, 
but someone told I had to do it.
-----------------------------------------------------------------
Martín Marqués            email:     martin@math.unl.edu.ar
Santa Fe - Argentina        http://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-----------------------------------------------------------------


Re: Why vacuum?

From
"Ross J. Reedstrom"
Date:
On Thu, Dec 14, 2000 at 01:16:20PM +0000, Hannu Krosing wrote:
> The Hermit Hacker wrote:

<snip>

> > vadim, for v7.2, is planning on re-writing the storage manager to do
> > proper overwriting of deleted space, which will reduce the requirement for
> > vacuum to almost never ...
> 
> I hope that he does it in a way that allows it to retain the old
> behaviour 
> for some tables if there is need for it.

Here as well. The framework is still mostly there for multiple storage
managers: I hope Vadim takes advantage of it.

<snip>

> Time travel is/was an useful feature that is difficult to emulate
> efficiently using "other" means like rules/triggers 

I've actually been doing this very thing this week. It's not _that_
horibble, but does interact really poorly with RI constraints: suddenly,
all those unique PK columns aren't so unique! This is probably the biggest
reason to do time travel in the backend. Having it on a per-table basis
would be cool.

Hmm, seems the biggest problem to doing it per table would be needing
a couple optional system attributes (e.g. tt_start and tt_stop),
and different indices, that know how to skip not-current tuples. Extra
syntax to do queries at a particular time in the past would be nice, but
not an inital requirement.

Sounds like there's something in common here with the per tuple CRC
discusson, as well as optional OID: a generic need for optional system
attributes.

Ross
-- 
Open source code is like a natural resource, it's the result of providing
food and sunshine to programmers, and then staying out of their way.
[...] [It] is not going away because it has utility for both the developers 
and users independent of economic motivations.  Jim Flynn, Sunnyvale, Calif.


Re: Why vacuum?

From
mlw
Date:
"Martin A. Marques" wrote:
> 
> El Mié 13 Dic 2000 16:41, bpalmer escribió:
> > I noticed the other day that one of my pg databases was slow,  so I ran
> > vacuum on it,  which brought a question to mind:  why the need?  I looked
> > at my oracle server and we aren't doing anything of the sort (that I can
> > find),  so why does pg need it?  Any info?
> 
> I know nothing about Oracle, but I can tell you that Informix has an update
> statistics, which I don't know if it's similar to vacuum, but....
> What vacuum does is clean the database from rows that were left during
> updates and deletes, non the less, the tables get shrincked, so searches get
> faster.
> 

While I would like Postgres to perform statistics, one and a while, on
it own. I like vacuum in general.

I would rather trade unused disk space for performace. The last thing
you need during high loads is the database thinking that it is time to
clean up.

-- 
http://www.mohawksoft.com


Re: Why vacuum?

From
Alfred Perlstein
Date:
* mlw <markw@mohawksoft.com> [001214 09:30] wrote:
> "Martin A. Marques" wrote:
> > 
> > El Mié 13 Dic 2000 16:41, bpalmer escribió:
> > > I noticed the other day that one of my pg databases was slow,  so I ran
> > > vacuum on it,  which brought a question to mind:  why the need?  I looked
> > > at my oracle server and we aren't doing anything of the sort (that I can
> > > find),  so why does pg need it?  Any info?
> > 
> > I know nothing about Oracle, but I can tell you that Informix has an update
> > statistics, which I don't know if it's similar to vacuum, but....
> > What vacuum does is clean the database from rows that were left during
> > updates and deletes, non the less, the tables get shrincked, so searches get
> > faster.
> > 
> 
> While I would like Postgres to perform statistics, one and a while, on
> it own. I like vacuum in general.
> 
> I would rather trade unused disk space for performace. The last thing
> you need during high loads is the database thinking that it is time to
> clean up.

Even worse is having to scan a file that has grown 20x the size
because you havne't vacuum'd in a while.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."