Re: a vacuum thread is not the answer - Mailing list pgsql-hackers

From mlw
Subject Re: a vacuum thread is not the answer
Date
Msg-id 3C8F7CB4.DD172133@mohawksoft.com
Whole thread Raw
In response to a vacuum thread is not the answer  (mlw <markw@mohawksoft.com>)
List pgsql-hackers
Hannu Krosing wrote:
> 
> On Thu, 2002-03-07 at 20:15, mlw wrote:
> > I was just toying around with things, and you know, running vacuum in the
> > background doesn't work. It slows things down too much.
> >
> > The worst case senario is when one does this:
> >
> > update accounts set abalance = abalance + 1 ;
> >
> > This takes forever to run and doubles the size of the table.
> 
> How is this related to running vacuum in background ?
> 
> Does it run fast when vacuum is not running ?

The problem is that it doubles the size of a table. This invariably means that
you have more I/O. If there were a way to reuse old tulples, while they are
still in the buffer cache, then PostgreSQL could handle this query faster.

It was, however, pointed out that (obviously) you can't do reclaimation during
a transaction because if it fails or someone issues "rollback" you have broken
the database. 

So, I guess I'm saying ignore that part.

> 
> > Is there a way that a separate thread managing the freelist can perform a "per
> > row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
> > need to be able to recover rows already in memory for performance.
> 
> What could be possibly done (and is probably not very useful anyway) is
> reusing the row modified _in_the_same_transaction_ so that
> 
> begin;
> abalance = abalance + 1 ;
> abalance = abalance + 1 ;
> abalance = abalance + 1 ;
> end;
> 
> would consume just 2x the tablespace and not 4x. But this does not
> require a separate thread, just some changes in update logic.
> 
> OTOH, this will probably interfere with some transaction modes that make
> use of command ids.

I haven't looked at the code, so I don't even know if it is doable. Could a
small vacuum thread run in the background and monitor the buffer cache? When it
finds a buffer with an unreferenced tuple, do what vacuum does, but only to
that block?

Here is my problem with vacuum. It scans the whole damn table and it takes a
long time. In many, dare I say most, SQL databases, the rows which are updated
are likely a small percent.

If a small vacuum routine can be run against the blocks that are already in the
buffer, this will eliminate a block read, and focus more on blocks which are
likely to have been modified.


pgsql-hackers by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: select max(column) not using index
Next
From: Hannu Krosing
Date:
Subject: Re: select max(column) not using index