Thread: AW: Why vacuum?

AW: Why vacuum?

From
Zeugswetter Andreas SB
Date:
> > Yes, postgresql requires vacuum quite often otherwise queries and
> > updates start taking ungodly amounts of time to complete.  If you're
> > having problems because vacuum locks up your tables for too long
> > you might want to check out:
> 
> But why?  I don't know of other databases that need to be 
> 'vacuum'ed.  Do
> all others just do it internaly on a regular basis?
> 
> What am I missing here?

They all have an overwriting storage manager. The current storage manager
of PostgreSQL is non overwriting, which has other advantages.

There seem to be 2 answers to the problem:
1. change to an overwrite storage manager
2. make vacuum concurrent capable

The tendency here seems to be towards an improved smgr.
But, it is currently extremely cheap to calculate where a new row
needs to be located physically. This task is *a lot* more expensive
in an overwrite smgr. It needs to maintain a list of pages with free slots,
which has all sorts of concurrency and persistence problems.

Andreas


Re: Why vacuum?

From
"Ross J. Reedstrom"
Date:
On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:
> 
> They all have an overwriting storage manager. The current storage manager
> of PostgreSQL is non overwriting, which has other advantages.
> 
> There seem to be 2 answers to the problem:
> 1. change to an overwrite storage manager
> 2. make vacuum concurrent capable
> 
> The tendency here seems to be towards an improved smgr.
> But, it is currently extremely cheap to calculate where a new row
> needs to be located physically. This task is *a lot* more expensive
> in an overwrite smgr. It needs to maintain a list of pages with free slots,
> which has all sorts of concurrency and persistence problems.
> 

Not to mention the recent thread here about people recovering data that
was accidently deleted, or from damaged db files: the old tuples serve
as redundant backup, in a way. Not a real compelling reason to keep a
non-overwriting smgr, but still a surprise bonus for those who need it.

Ross


Re: Why vacuum?

From
Daniele Orlandi
Date:
"Ross J. Reedstrom" wrote:
> 
> Not to mention the recent thread here about people recovering data that
> was accidently deleted, or from damaged db files: the old tuples serve
> as redundant backup, in a way. Not a real compelling reason to keep a
> non-overwriting smgr, but still a surprise bonus for those who need it.

The optimal would be a configurable behaviour. I wouldn't enable it on a
users table, neither on a log-type table (the former is a slowly
changing table, the second is a table with few updates/deletes), but a
fast-changing table like an  active sessions table would benefit a lot.

Currently, my active sessions table grows by 100K every 20 seconds, I
have to constantly vacuum it to keep the things reasonable. Other tables
would benefit a lot, pg_listener for example.

Bye!


Re: Why vacuum?

From
Daniele Orlandi
Date:
"Ross J. Reedstrom" wrote:
> 
> On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:
> >
> > The tendency here seems to be towards an improved smgr.
> > But, it is currently extremely cheap to calculate where a new row
> > needs to be located physically. This task is *a lot* more expensive
> > in an overwrite smgr.

I don't agree. If (as I have proposed) the search is made in the
background by a low priority process, you just have to lookup a cache
entry to find out where to write.

> > It needs to maintain a list of pages with free slots,
> > which has all sorts of concurrency and persistence problems.

Concurrency is a problem, but a spinlock on a shared-memory table should
suffice in the majority of the cases[1]. I may be wrong... but I think
it should be discussed.

[1] I believe that already there's a similar problem to synchronize the
backends when the want to append a new page.

Bye!


Re: Why vacuum?

From
Alfred Perlstein
Date:
* Ross J. Reedstrom <reedstrm@rice.edu> [001214 07:57] wrote:
> On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:
> > 
> > They all have an overwriting storage manager. The current storage manager
> > of PostgreSQL is non overwriting, which has other advantages.
> > 
> > There seem to be 2 answers to the problem:
> > 1. change to an overwrite storage manager
> > 2. make vacuum concurrent capable
> > 
> > The tendency here seems to be towards an improved smgr.
> > But, it is currently extremely cheap to calculate where a new row
> > needs to be located physically. This task is *a lot* more expensive
> > in an overwrite smgr. It needs to maintain a list of pages with free slots,
> > which has all sorts of concurrency and persistence problems.
> > 
> 
> Not to mention the recent thread here about people recovering data that
> was accidently deleted, or from damaged db files: the old tuples serve
> as redundant backup, in a way. Not a real compelling reason to keep a
> non-overwriting smgr, but still a surprise bonus for those who need it.

One could make vacuum optional such that it either:

1) always overwrites
2) will not overwrite data until a vacuum is called (perhaps with  a date option to specify how much deleted data you
wishto  reclaim) data can be marked free but not free for re-use  until vacuum is run.
 

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."