Re: Autovacuum in the backend - Mailing list pgsql-hackers

From Gavin Sherry
Subject Re: Autovacuum in the backend
Msg-id Pine.LNX.4.58.0506161039220.18538@linuxworld.com.au
In response to Re: Autovacuum in the backend  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
On Wed, 15 Jun 2005, Bruce Momjian wrote:

>
> I am going to start working on it.  I am concerned it is a big job.
>
> I will post questions as I find them, and the one below is a good one.
>

I'm wondering if effort is being misdirected here. I remember when Mark
Wong at OSDL was running pg_autovacuum during a dbt run, he was seeing
significant performance loss -- I think on the order of 30% to 40% (I will
try to dig up a link to the results).

I think these results can be dramatically improved if the focus is on a
more effective vacuum.

In January I was in Toronto with Jan, Tom and others and some ideas about
vacuum were being discussed. The basic idea is that when we dirty a page,
we set a bit in a bitmap to say that the page has been dirtied. A
convenient place to do this is when we are writing dirty buffers out to
disk. In many situations, this can happen inside the bgwriter meaning that
there should be little contention for this bitmap. Of course, individual
backends may be writing pages out and would have to account for the
dirty pages at that point.
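
To make the idea concrete, here is a minimal sketch in Python (not backend
code). The names SegmentDirtyMap and write_buffer are hypothetical, and the
8 KB block size is an assumption (the usual default). It only shows where
the bit would be set: at the point a dirty buffer is written out.

```python
BLCKSZ = 8192                      # assumed default block size
SEGMENT_BYTES = 1 << 30            # 1 GB heap segment
PAGES_PER_SEGMENT = SEGMENT_BYTES // BLCKSZ


class SegmentDirtyMap:
    """Hypothetical per-segment bitmap: one bit per heap page."""

    def __init__(self):
        self.bits = bytearray(PAGES_PER_SEGMENT // 8)

    def mark_dirty(self, page_no):
        # Set the bit for this page within its byte.
        self.bits[page_no >> 3] |= 1 << (page_no & 7)

    def is_dirty(self, page_no):
        return bool(self.bits[page_no >> 3] & (1 << (page_no & 7)))


def write_buffer(dirty_map, page_no, flush):
    """Stand-in for the place where a dirty buffer goes to disk.

    `flush` is a hypothetical callback that performs the actual write.
    """
    flush(page_no)
    dirty_map.mark_dirty(page_no)
```

Whether the caller is the bgwriter or an individual backend, the same hook
would apply; the sketch says nothing about locking, which a real
implementation would have to address.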

Now this bitmap can be kept on a per heap segment basis (i.e., per 1 GB heap
file). You only need 2 pages for the bitmap to represent all the pages in
the segment (assuming the default 8 KB block size, 131072 pages at one bit
each is 16 KB), which is fairly nice. When vacuum is run, instead of
visiting every page, it would see which pages have been dirtied in the
bitmap and visit only those pages. With large tables and small numbers of
modified tuples/pages, the effect of this change would be pretty
impressive.
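
A rough sketch of the sizing arithmetic and of the bitmap-driven scan, in
Python for illustration only. The 8 KB block size is an assumption, and
dirty_pages, bitmap_vacuum and the vacuum_page callback are hypothetical
names, not existing functions.

```python
BLCKSZ = 8192                                   # assumed default block size
SEGMENT_BYTES = 1 << 30                         # 1 GB heap segment
PAGES_PER_SEGMENT = SEGMENT_BYTES // BLCKSZ     # 131072 pages
BITMAP_BYTES = PAGES_PER_SEGMENT // 8           # 16384 bytes, one bit per page
BITMAP_PAGES = BITMAP_BYTES // BLCKSZ           # 2 pages of bitmap


def dirty_pages(bits):
    """Yield page numbers whose bit is set, skipping all-zero bytes."""
    for byte_no, byte in enumerate(bits):
        if byte == 0:
            continue
        for bit in range(8):
            if byte & (1 << bit):
                yield byte_no * 8 + bit


def bitmap_vacuum(bits, vacuum_page):
    """Visit only the pages marked dirty, clearing each bit afterwards."""
    visited = 0
    for page_no in dirty_pages(bits):
        vacuum_page(page_no)                    # hypothetical per-page work
        bits[page_no >> 3] &= ~(1 << (page_no & 7)) & 0xFF
        visited += 1
    return visited
```

With three dirty pages in a 131072-page segment, the scan touches three
heap pages plus the 16 KB of bitmap rather than the whole 1 GB file.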

This also means that we could effectively implement some of the ideas
which are being floated around, such as having vacuum run only for a short
time period.
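
One way a time-limited run could fall out of this, sketched in Python under
assumptions: the set of dirty page numbers stands in for the bitmap, and
vacuum_with_budget and vacuum_page are hypothetical names. Pages not
reached before the deadline simply stay marked for the next run.

```python
import time


def vacuum_with_budget(dirty, vacuum_page, budget_seconds,
                       clock=time.monotonic):
    """Vacuum dirty pages until the time budget is used up.

    `dirty` is a set of page numbers standing in for the bitmap's set bits.
    Returns the number of pages processed; unprocessed pages remain in
    `dirty` so that a later run can pick them up.
    """
    deadline = clock() + budget_seconds
    done = 0
    for page_no in sorted(dirty):       # sorted() copies, so we may mutate
        if clock() >= deadline:
            break
        vacuum_page(page_no)            # hypothetical per-page work
        dirty.discard(page_no)
        done += 1
    return done
```

The clock is a parameter only so that the behaviour can be exercised
without real waiting.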

One problem is whether or not we have to guarantee that we account for
every dirtied page. I think that would be difficult in the presence of a
crash. One idea Neil mentioned is that on a crash, we could set all pages
in the bitmap to dirty and the first vacuum would effectively be a vacuum
full. The alternative is to say that we don't guarantee that this type of
vacuum is completely comprehensive and that it isn't a replacement for
vacuum full.
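
The first option (mark everything dirty after a crash) could look roughly
like the following Python sketch. The 8 KB block size is an assumption and
the function names are hypothetical. After the reset, the next
bitmap-driven vacuum has to visit every page in the segment.

```python
PAGES_PER_SEGMENT = (1 << 30) // 8192   # 1 GB segment, assumed 8 KB blocks


def new_bitmap():
    """Hypothetical empty per-segment bitmap, one bit per page."""
    return bytearray(PAGES_PER_SEGMENT // 8)


def reset_after_crash(bits):
    """Conservatively mark every page dirty, in place."""
    bits[:] = b"\xff" * len(bits)


def count_dirty(bits):
    """Number of pages currently marked dirty."""
    return sum(bin(b).count("1") for b in bits)
```

This trades one expensive vacuum after each crash for not having to make
the bitmap itself crash-safe.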

Thoughts? Comments?

Thanks,

Gavin

