Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Proposal: Another attempt at vacuum improvements
Date
Msg-id BANLkTinzsJC8kZ1js7KzSqimXU3bqNA7Mw@mail.gmail.com
In response to Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Wed, May 25, 2011 at 11:51 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>> Agreed.  The only thing I'm trying to do further is to avoid the need
>> for a reshuffle when the special LSN storage is reclaimed.
>
> Ah, OK. That was never clear from your initial emails, or maybe I
> misread.

Sorry, I must not have explained it very well.  :-(

> So what you are saying is that by storing the LSN after the line pointer
> array, we might be able to reclaim LSN storage without shuffling. That
> makes sense. Having said that, it doesn't excite me too much, because I
> think we should do the dead line pointer reclaim operation during page
> pruning; we are already holding a cleanup lock at that time and will
> most likely do a reshuffle anyway.

I'll give that a firm maybe.  If there is no reshuffle, then you can
do this with just an exclusive content lock.  Maybe that's worthless,
but I'm not certain of it.  I guess we might need to see how the code
shakes out.

Also, reshuffling might be more expensive.  I agree that if there are
new dead tuples on the page, then you're going to be paying that price
anyway; but if not, it might be avoidable.
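To make the trade-off concrete, here is a toy C model comparing two places to keep an 8-byte LSN. It is not PostgreSQL's actual page layout or API: `ToyPage`, the `TOY_*` constants, and both reclaim functions are hypothetical. When the LSN sits directly after the line pointer array, giving its space back only lowers the pd_lower-like `lower` field. When it sits at the end of the page, every tuple has to slide over by 8 bytes and every line pointer offset has to be fixed, which is the reshuffle being discussed. Whether the first case really needs only an exclusive content lock is the open question above; the sketch does not settle it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ    8192
#define TOY_HDRSZ     24   /* hypothetical page header size */
#define TOY_LPSZ      4    /* one line pointer (4 bytes, like ItemIdData) */
#define TOY_LSNSZ     8    /* one 8-byte LSN, modelled as raw bytes */
#define TOY_MAXITEMS  64

/* Hypothetical toy page; line pointers are kept as parallel arrays. */
typedef struct ToyPage
{
    uint16_t lower;                  /* first free byte */
    uint16_t upper;                  /* start of tuple data */
    uint16_t special;                /* start of special space */
    uint16_t nitems;                 /* number of line pointers */
    uint16_t lp_off[TOY_MAXITEMS];   /* tuple offset per line pointer */
    uint16_t lp_len[TOY_MAXITEMS];   /* tuple length per line pointer */
    uint8_t  data[TOY_BLCKSZ];
} ToyPage;

/* Layout A: LSN occupies [lower - 8, lower), right after the line pointers. */
static size_t
reclaim_lsn_after_lps(ToyPage *p)
{
    assert(p->lower == TOY_HDRSZ + p->nitems * TOY_LPSZ + TOY_LSNSZ);
    p->lower -= TOY_LSNSZ;           /* hand the 8 bytes back to free space */
    return 0;                        /* no tuple bytes moved */
}

/* Layout B: LSN occupies [special, special + 8) at the end of the page. */
static size_t
reclaim_lsn_in_special(ToyPage *p)
{
    size_t nbytes = p->special - p->upper;

    assert(p->special + TOY_LSNSZ == TOY_BLCKSZ);
    /* Slide all tuple data toward the end of the page by 8 bytes... */
    memmove(p->data + p->upper + TOY_LSNSZ, p->data + p->upper, nbytes);
    /* ...and fix every line pointer that points into it: a reshuffle. */
    for (int i = 0; i < p->nitems; i++)
        p->lp_off[i] += TOY_LSNSZ;
    p->upper += TOY_LSNSZ;
    p->special += TOY_LSNSZ;
    return nbytes;                   /* tuple bytes moved */
}
```

The return value counts the tuple bytes each layout has to move: always zero for layout A, and the whole tuple area for layout B.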

> Also, a downside of storing the LSN after the line pointer array is that
> you may waste space because of alignment issues.

We could possibly store it unaligned and read it back two bytes at a
time.  Granted, that's not free.
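A minimal sketch of the unaligned option, assuming the LSN is 8 bytes and its slot is only guaranteed 2-byte alignment (line pointers are 4 bytes, so the slot right after the array is not always 8-byte aligned). `ToyLSN` and the two helpers are hypothetical names, not PostgreSQL functions. Each 16-bit piece is copied with `memcpy`, so the compiler can emit a plain 2-byte load or store without relying on wider alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t ToyLSN;   /* hypothetical 8-byte LSN */

/* Store an LSN at a slot that is only assumed to be 2-byte aligned. */
static void
lsn_store_2byte(uint8_t *dst, ToyLSN lsn)
{
    assert(((uintptr_t) dst & 1) == 0);   /* only 2-byte alignment assumed */
    for (int i = 0; i < 4; i++)
    {
        uint16_t part = (uint16_t) (lsn >> (16 * i));   /* bits 16i..16i+15 */
        memcpy(dst + 2 * i, &part, sizeof(part));
    }
}

/* Read it back two bytes at a time and reassemble the 64-bit value. */
static ToyLSN
lsn_load_2byte(const uint8_t *src)
{
    ToyLSN lsn = 0;

    for (int i = 0; i < 4; i++)
    {
        uint16_t part;

        memcpy(&part, src + 2 * i, sizeof(part));
        lsn |= (ToyLSN) part << (16 * i);
    }
    return lsn;
}
```

The extra shifts and four small loads per access are the "not free" part.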

> I also thought that the
> LSN might get in the way of extending the line pointer array, but that's
> probably not a big deal: if there is free space in the page (and
> there should be if we are adding a new tuple), it should be available
> immediately after the LSN.

Yeah.  I'm not sure how icky that is, though.
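One way to picture the line pointer extension problem, again as a hypothetical toy model rather than PostgreSQL's real code (`ToyPage`, `toy_add_line_pointer`, and the `TOY_*` sizes are made up): adding a line pointer when the LSN sits right after the array means moving the 8-byte LSN up by one line pointer into the free space, then writing the new line pointer where the LSN used to start. The only precondition is 4 free bytes between `lower` and `upper`, which matches the expectation that a page accepting a new tuple has room there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192
#define TOY_HDRSZ  24   /* hypothetical page header size */
#define TOY_LPSZ   4    /* one line pointer */
#define TOY_LSNSZ  8    /* one 8-byte LSN */

typedef struct ToyPage
{
    uint16_t lower;     /* first free byte (just past the LSN, if present) */
    uint16_t upper;     /* first byte of tuple data */
    uint16_t nitems;    /* number of line pointers */
    bool     has_lsn;   /* LSN stored right after the line pointer array? */
    uint8_t  data[TOY_BLCKSZ];
} ToyPage;

/* Append an opaque 4-byte line pointer, moving the LSN out of the way. */
static bool
toy_add_line_pointer(ToyPage *p, const uint8_t lp[TOY_LPSZ])
{
    uint16_t lp_end = TOY_HDRSZ + p->nitems * TOY_LPSZ;   /* new slot */

    if (p->upper - p->lower < TOY_LPSZ)
        return false;                /* no room for another line pointer */

    if (p->has_lsn)                  /* slide the LSN up by one line pointer */
        memmove(p->data + lp_end + TOY_LPSZ, p->data + lp_end, TOY_LSNSZ);

    memcpy(p->data + lp_end, lp, TOY_LPSZ);
    p->nitems++;
    p->lower += TOY_LPSZ;
    return true;
}
```

Note that each new line pointer shifts the LSN by 4 bytes, so with the 24-byte header assumed here its offset alternates between 8-byte aligned and not, which is where the alignment concern comes back in.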

> There are some other issues that we should think about too, like
> recording free space and managing the visibility map. The free space is
> recorded in the second pass today, but I don't see any reason why
> that can't be moved to the first pass. It's not clear, though, whether we
> should also record free space after a retail page vacuum or leave it as
> it is.

Not sure.  Any idea why it's like that, or why we might want to change it?

> For visibility maps, we should not update them while there are
> LP_DEAD line pointers on the page. Now, that's not good, because all
> tuples in the page may be visible, so we may lose some advantage, at
> least for a while; but if we mark the page all-visible, the vacuum scan
> would not find the dead line pointers in it, and that would leave
> dangling index pointers after an index vacuum.

Also, an index-only scan might return index tuples that are pointing
to dead line pointers.

Currently, I believe the only way a page can get marked all-visible is
by vacuum.  But if we make this change, then it would be possible for
a HOT cleanup to encounter a situation where all-visible could be set.
We probably want to make that work.
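The visibility rule from this exchange can be written as a small predicate. This is a hypothetical sketch, not PostgreSQL's actual pruning or vacuum code: the enum mirrors the four line pointer states (unused, normal, redirect, dead), and `visible_to_all` stands in for whatever visibility test the caller has already done. A page qualifies for all-visible only if no dead line pointer remains and every normal tuple is visible to everyone, so either vacuum or, per the suggestion above, a HOT cleanup that happens to leave the page in that state could call it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the four line pointer states; names are hypothetical. */
typedef enum
{
    TOY_LP_UNUSED,
    TOY_LP_NORMAL,
    TOY_LP_REDIRECT,
    TOY_LP_DEAD
} ToyLpFlag;

typedef struct ToyItem
{
    ToyLpFlag flag;
    bool      visible_to_all;   /* only meaningful for TOY_LP_NORMAL */
} ToyItem;

/*
 * A page may be marked all-visible only if no dead line pointer remains
 * (otherwise a vacuum that skips all-visible pages would never find it,
 * leaving index entries pointing at it) and every normal tuple is visible
 * to all transactions.
 */
static bool
toy_page_can_set_all_visible(const ToyItem *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (items[i].flag == TOY_LP_DEAD)
            return false;
        if (items[i].flag == TOY_LP_NORMAL && !items[i].visible_to_all)
            return false;
    }
    return true;
}
```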

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

