Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Proposal: Another attempt at vacuum improvements
Date
Msg-id BANLkTi=STNpQFTa-e97OSwz9ts=kKQQP6Q@mail.gmail.com
In response to Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Thu, May 26, 2011 at 6:40 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>>> There are some other issues that we should think about too. Like
>>> recording free space and managing visibility map. The free space is
>>> recorded in the second pass today, but I don't see any reason why
>>> that can't be moved to the first pass. It's not clear though if we
>>> should also record free space after retail page vacuum or leave it as
>>> it is.
>>
>> Not sure.  Any idea why it's like that, or why we might want to change it?
>
> I think it precedes the HOT days when the dead space was reclaimed
> only during the second scan. Even post-HOT, if we know we would
> revisit the page anyways during the second scan, it makes sense to
> delay recording free space because the dead line pointers can add to
> it (if they are towards the end of the line pointer array). I remember
> discussing this briefly during HOT, but can't recollect why we decided
> not to update the FSM after retail vacuum. But the entire focus then
> was to keep things simple and that could be one reason.

It's important to keep in mind that page-at-a-time vacuum is happening
in the middle of a routine INSERT/UPDATE/DELETE operation, so we don't
want to do anything too expensive there.  Whether updating the FSM
falls into that category or not, I am not sure.

>> Currently, I believe the only way a page can get marked all-visible is
>> by vacuum.  But if we make this change, then it would be possible for
>> a HOT cleanup to encounter a situation where all-visible could be set.
>>  We probably want to make that work.
>
> Yes. That's certainly an option.
>
> We did not discuss where to store the information about the start-LSN
> of the last successful index vacuum. I am thinking about a new
> pg_class attribute, just because I can't think of anything better. Any
> suggestion ?

That seems fairly grotty, but I don't have a lot of brilliant ideas.
One possibility that occurred to me was to stick it in the special
space on the first page of the relation.  But that would mean that
every HOT cleanup would need to look at that page, which seems poor.
Even if we cached it after the first access, it still seems kinda
poor.  But it would make the unlogged case easier to handle...  and we
have thought previously about including some metadata in the relation
file itself to help with forensics (which table was this, anyway?).
So I don't know.
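
To make the "cache it after the first access" idea concrete, here is a
minimal sketch in C. Every name in it (SketchLsn, SketchFirstPageMeta,
SketchRelCache and the two functions) is hypothetical and invented for
illustration; none of this is PostgreSQL code, and the real special-space
layout and invalidation mechanism are left open. It only shows that a
per-relation cache would reduce the first-page lookup to once per cache
lifetime, at the price of serving a possibly stale value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr; a plain 64-bit value in this sketch. */
typedef uint64_t SketchLsn;

/* Hypothetical metadata kept in the special space of a relation's first page. */
typedef struct SketchFirstPageMeta
{
    SketchLsn last_index_vacuum_start_lsn;
} SketchFirstPageMeta;

/* Hypothetical per-backend cache entry for one relation. */
typedef struct SketchRelCache
{
    bool      lsn_valid;        /* has the value been loaded yet? */
    SketchLsn cached_lsn;       /* copy of the on-page value */
    int       first_page_reads; /* how often the first page was consulted */
} SketchRelCache;

/*
 * Return the start-LSN of the last successful index vacuum. The first page
 * is read only when the cache is empty; later calls are served from the cache.
 */
static SketchLsn
sketch_get_index_vacuum_lsn(SketchRelCache *cache,
                            const SketchFirstPageMeta *first_page)
{
    if (!cache->lsn_valid)
    {
        cache->cached_lsn = first_page->last_index_vacuum_start_lsn;
        cache->lsn_valid = true;
        cache->first_page_reads++;
    }
    return cache->cached_lsn;
}

/* Forget the cached value so that the next lookup rereads the first page. */
static void
sketch_invalidate_index_vacuum_lsn(SketchRelCache *cache)
{
    cache->lsn_valid = false;
}
```

In this sketch a HOT cleanup would call sketch_get_index_vacuum_lsn() and
touch the first page only on a cache miss. Until something calls the
invalidation function, the cached value lags behind any newer index vacuum,
and how that invalidation would be triggered is not addressed here.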

> Also for the first version, I wonder if we should let unlogged and
> temp tables be handled by the usual two pass vacuum. Once we have
> proven that one pass is better, we will extend that to other tables as
> discussed on this thread.

We can certainly do that for testing.  Whether we want to commit it
that way, I'm not sure.

> Do we need a modified syntax for vacuum, like "VACUUM mytab SKIP
> INDEX" or something similar ? That way, user can just vacuum the heap
> if she wishes so and can also help us with testing.

There's an extensible-options syntax you can use... VACUUM (index off) mytab.

> Do we need more autovacuum tuning parameters to control when to vacuum
> just the heap and when to vacuum the index as well ? Again, we can
> discuss and decide this later, but just wanted to mention this here.

Let's make tuning that a separate effort.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

