Thread: a vacuum thread is not the answer

a vacuum thread is not the answer

From
mlw
Date:
I was just toying around with things, and you know, running vacuum in the
background doesn't work. It slows things down too much.

The worst case senario is when one does this:

update accounts set abalance = abalance + 1 ;

This takes forever to run and doubles the size of the table.

Is there a way that a separate thread managing the freelist can perform a "per
row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
need to be able to recover rows already in memory for performance.


Re: a vacuum thread is not the answer

From
Jan Wieck
Date:
mlw wrote:
> I was just toying around with things, and you know, running vacuum in the
> background doesn't work. It slows things down too much.
>
> The worst case senario is when one does this:
>
> update accounts set abalance = abalance + 1 ;
>
> This takes forever to run and doubles the size of the table.
>
> Is there a way that a separate thread managing the freelist can perform a "per
> row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
> need to be able to recover rows already in memory for performance.
   So  you want to reuse space from rows before your transaction   committed? Fine, I'm all for it, as long as
       begin ;       update accounts set abalance = abalance + 1 ;       rollback ;
   still works.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com



Re: a vacuum thread is not the answer

From
Hannu Krosing
Date:
On Thu, 2002-03-07 at 20:15, mlw wrote:
> I was just toying around with things, and you know, running vacuum in the
> background doesn't work. It slows things down too much.
> 
> The worst case senario is when one does this:
> 
> update accounts set abalance = abalance + 1 ;
> 
> This takes forever to run and doubles the size of the table.

How is this related to running vacuum in background ?

Does it run fast when vacuum is not running ?

> Is there a way that a separate thread managing the freelist can perform a "per
> row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
> need to be able to recover rows already in memory for performance.

What could be possibly done (and is probably not very useful anyway) is
reusing the row modified _in_the_same_transaction_ so that 

begin;
abalance = abalance + 1 ;
abalance = abalance + 1 ;
abalance = abalance + 1 ;
end;

would consume just 2x the tablespace and not 4x. But this does not
require a separate thread, just some changes in update logic.

OTOH, this will probably interfere with some transaction modes that make
use of command ids.

--------------
Hannu



Re: a vacuum thread is not the answer

From
mlw
Date:
Hannu Krosing wrote:
> 
> On Thu, 2002-03-07 at 20:15, mlw wrote:
> > I was just toying around with things, and you know, running vacuum in the
> > background doesn't work. It slows things down too much.
> >
> > The worst case senario is when one does this:
> >
> > update accounts set abalance = abalance + 1 ;
> >
> > This takes forever to run and doubles the size of the table.
> 
> How is this related to running vacuum in background ?
> 
> Does it run fast when vacuum is not running ?

The problem is that it doubles the size of a table. This invariably means that
you have more I/O. If there were a way to reuse old tulples, while they are
still in the buffer cache, then PostgreSQL could handle this query faster.

It was, however, pointed out that (obviously) you can't do reclaimation during
a transaction because if it fails or someone issues "rollback" you have broken
the database. 

So, I guess I'm saying ignore that part.

> 
> > Is there a way that a separate thread managing the freelist can perform a "per
> > row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
> > need to be able to recover rows already in memory for performance.
> 
> What could be possibly done (and is probably not very useful anyway) is
> reusing the row modified _in_the_same_transaction_ so that
> 
> begin;
> abalance = abalance + 1 ;
> abalance = abalance + 1 ;
> abalance = abalance + 1 ;
> end;
> 
> would consume just 2x the tablespace and not 4x. But this does not
> require a separate thread, just some changes in update logic.
> 
> OTOH, this will probably interfere with some transaction modes that make
> use of command ids.

I haven't looked at the code, so I don't even know if it is doable. Could a
small vacuum thread run in the background and monitor the buffer cache? When it
finds a buffer with an unreferenced tuple, do what vacuum does, but only to
that block?

Here is my problem with vacuum. It scans the whole damn table and it takes a
long time. In many, dare I say most, SQL databases, the rows which are updated
are likely a small percent.

If a small vacuum routine can be run against the blocks that are already in the
buffer, this will eliminate a block read, and focus more on blocks which are
likely to have been modified.