Thread: AW: Why vacuum?
> > Yes, postgresql requires vacuum quite often otherwise queries and > > updates start taking ungodly amounts of time to complete. If you're > > having problems because vacuum locks up your tables for too long > > you might want to check out: > > But why? I don't know of other databases that need to be > 'vacuum'ed. Do > all others just do it internaly on a regular basis? > > What am I missing here? They all have an overwriting storage manager. The current storage manager of PostgreSQL is non overwriting, which has other advantages. There seem to be 2 answers to the problem: 1. change to an overwrite storage manager 2. make vacuum concurrent capable The tendency here seems to be towards an improved smgr. But, it is currently extremely cheap to calculate where a new row needs to be located physically. This task is *a lot* more expensive in an overwrite smgr. It needs to maintain a list of pages with free slots, which has all sorts of concurrency and persistence problems. Andreas
On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote: > > They all have an overwriting storage manager. The current storage manager > of PostgreSQL is non overwriting, which has other advantages. > > There seem to be 2 answers to the problem: > 1. change to an overwrite storage manager > 2. make vacuum concurrent capable > > The tendency here seems to be towards an improved smgr. > But, it is currently extremely cheap to calculate where a new row > needs to be located physically. This task is *a lot* more expensive > in an overwrite smgr. It needs to maintain a list of pages with free slots, > which has all sorts of concurrency and persistence problems. > Not to mention the recent thread here about people recovering data that was accidently deleted, or from damaged db files: the old tuples serve as redundant backup, in a way. Not a real compelling reason to keep a non-overwriting smgr, but still a surprise bonus for those who need it. Ross
"Ross J. Reedstrom" wrote: > > Not to mention the recent thread here about people recovering data that > was accidently deleted, or from damaged db files: the old tuples serve > as redundant backup, in a way. Not a real compelling reason to keep a > non-overwriting smgr, but still a surprise bonus for those who need it. The optimal would be a configurable behaviour. I wouldn't enable it on a users table, neither on a log-type table (the former is a slowly changing table, the second is a table with few updates/deletes), but a fast-changing table like an active sessions table would benefit a lot. Currently, my active sessions table grows by 100K every 20 seconds, I have to constantly vacuum it to keep the things reasonable. Other tables would benefit a lot, pg_listener for example. Bye!
"Ross J. Reedstrom" wrote: > > On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote: > > > > The tendency here seems to be towards an improved smgr. > > But, it is currently extremely cheap to calculate where a new row > > needs to be located physically. This task is *a lot* more expensive > > in an overwrite smgr. I don't agree. If (as I have proposed) the search is made in the background by a low priority process, you just have to lookup a cache entry to find out where to write. > > It needs to maintain a list of pages with free slots, > > which has all sorts of concurrency and persistence problems. Concurrency is a problem, but a spinlock on a shared-memory table should suffice in the majority of the cases[1]. I may be wrong... but I think it should be discussed. [1] I believe that already there's a similar problem to synchronize the backends when the want to append a new page. Bye!
* Ross J. Reedstrom <reedstrm@rice.edu> [001214 07:57] wrote: > On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote: > > > > They all have an overwriting storage manager. The current storage manager > > of PostgreSQL is non overwriting, which has other advantages. > > > > There seem to be 2 answers to the problem: > > 1. change to an overwrite storage manager > > 2. make vacuum concurrent capable > > > > The tendency here seems to be towards an improved smgr. > > But, it is currently extremely cheap to calculate where a new row > > needs to be located physically. This task is *a lot* more expensive > > in an overwrite smgr. It needs to maintain a list of pages with free slots, > > which has all sorts of concurrency and persistence problems. > > > > Not to mention the recent thread here about people recovering data that > was accidently deleted, or from damaged db files: the old tuples serve > as redundant backup, in a way. Not a real compelling reason to keep a > non-overwriting smgr, but still a surprise bonus for those who need it. One could make vacuum optional such that it either: 1) always overwrites 2) will not overwrite data until a vacuum is called (perhaps with a date option to specify how much deleted data you wishto reclaim) data can be marked free but not free for re-use until vacuum is run. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."