Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Proposal: Another attempt at vacuum improvements
Msg-id BANLkTi=fGT_fyYNSd44MXUuq7dkocmjU1Q@mail.gmail.com
In response to Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Wed, May 25, 2011 at 7:07 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>> But instead of allocating permanent space in the page header, which would
>> both reduce (admittedly only by 8 bytes) the amount of space available
>> for tuples, and more significantly have the effect of breaking on-disk
>> compatibility, I'm wondering if we could get by with making space for
>> that extra LSN only when it's actually present. In other words, when
>> it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
>> increment pd_upper, and use the extra space to store the LSN.  There
>> is an alignment problem to worry about there but that shouldn't be a
>> huge issue.
>
> That might work but would require us to move tuples around when the first
> dead line pointer gets generated in the page.

I'm confused.  A major point of the approach I was proposing was to
avoid having to move tuples around.

> You may argue that we should
> be holding a cleanup-lock when that happens and the dead line pointer
> creation is always followed by a call to PageRepairFragmentation(), so it
> should be easier to make space for the LSN.

I'm not sure if this is the same thing you're saying, but certainly
the only time we need to make space for this value is when we've just
removed tuples from the page and defragmented it, and at that point there
should certainly be 8 bytes free somewhere.
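The sketch below makes the "only when it's present" idea concrete. It is minimal and uses a toy page layout loosely modeled on PostgreSQL's page header. ToyPage, the PD_HAS_DEAD_LINE_PTR_LSN bit value and the helper names are all hypothetical. It covers only three things: locating the optional LSN at the first 8-byte-aligned offset after the line pointer array (the alignment issue mentioned above), the free-space check, and the flag bit. It deliberately leaves out how those 8 bytes are protected when later line pointers are added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192
#define TOY_ALIGN8(x) (((unsigned) (x) + 7u) & ~7u)
#define PD_HAS_DEAD_LINE_PTR_LSN 0x0008 /* hypothetical flag bit */

/* Toy header: a simplified stand-in for PageHeaderData. */
typedef struct ToyPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_flags;
    uint16_t pd_lower;   /* end of the line pointer array */
    uint16_t pd_upper;   /* start of tuple data */
    uint16_t pd_special; /* start of special space */
} ToyPageHeader;

typedef union ToyPage
{
    ToyPageHeader hdr;
    uint8_t bytes[TOY_BLCKSZ];
} ToyPage;

static void toy_page_init(ToyPage *p)
{
    assert(sizeof(ToyPageHeader) % 8 == 0); /* line pointers start aligned */
    memset(p, 0, sizeof *p);
    p->hdr.pd_lower = sizeof(ToyPageHeader);
    p->hdr.pd_upper = TOY_BLCKSZ;
    p->hdr.pd_special = TOY_BLCKSZ;
}

/* First 8-byte-aligned offset at or after the end of the line pointer array. */
static unsigned dead_lsn_offset(const ToyPage *p)
{
    return TOY_ALIGN8(p->hdr.pd_lower);
}

/* Store the LSN and set the flag, but only if the aligned 8 bytes fit in the
 * free space between the line pointers and the tuple data (as they should
 * right after pruning and defragmenting). */
static bool set_dead_line_ptr_lsn(ToyPage *p, uint64_t lsn)
{
    unsigned off = dead_lsn_offset(p);

    if (off + sizeof lsn > p->hdr.pd_upper)
        return false;
    memcpy(p->bytes + off, &lsn, sizeof lsn);
    p->hdr.pd_flags |= PD_HAS_DEAD_LINE_PTR_LSN;
    return true;
}

/* Returns false when the page carries no dead-line-pointer LSN. */
static bool get_dead_line_ptr_lsn(const ToyPage *p, uint64_t *lsn)
{
    if (!(p->hdr.pd_flags & PD_HAS_DEAD_LINE_PTR_LSN))
        return false;
    memcpy(lsn, p->bytes + dead_lsn_offset(p), sizeof *lsn);
    return true;
}
```

With three 4-byte line pointers, pd_lower is 28, so the LSN lands at offset 32. Pages that never set the flag pay nothing.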

> Instead of storing the LSN after the page header, would it be easier to set
> pd_special and store the LSN at the end of the page?

I was proposing storing it after the line pointer array, not after the
page header.  If we store it at the end of the page, I suspect we're
going to basically end up allocating permanent space for it, because
otherwise we'll have to shift all the tuple data forward and backward
by 8 bytes when we allocate or deallocate space for this.  Now, maybe
that's OK: I'm not sure.  But it's something to think about carefully.
If we are going to allocate permanent space, the special space seems
better than the page header, because we should be able to make that
work without breaking on-disk compatibility, and because AFAIUI we only
need the space for heap pages, not index pages.
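The permanent-allocation alternative can be sketched with the same kind of toy layout. All names here are hypothetical. Heap pages normally have no special space, so pd_special equals the block size. In this sketch a heap page is instead initialized with 8 bytes of special space at the end of the page. The LSN then sits at pd_special and never has to move when tuples are added or removed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192
#define TOY_HEAP_SPECIAL_SIZE 8 /* hypothetical: room for one 8-byte LSN */

typedef struct ToyPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;
} ToyPageHeader;

typedef union ToyPage
{
    ToyPageHeader hdr;
    uint8_t bytes[TOY_BLCKSZ];
} ToyPage;

/* Tuple data grows down from pd_special, so the reserved LSN slot is
 * always at the same place on every heap page. */
static void toy_heap_page_init(ToyPage *p)
{
    memset(p, 0, sizeof *p);
    p->hdr.pd_lower = sizeof(ToyPageHeader);
    p->hdr.pd_special = TOY_BLCKSZ - TOY_HEAP_SPECIAL_SIZE;
    p->hdr.pd_upper = p->hdr.pd_special;
}

static void toy_set_dead_lsn(ToyPage *p, uint64_t lsn)
{
    assert(p->hdr.pd_special == TOY_BLCKSZ - TOY_HEAP_SPECIAL_SIZE);
    memcpy(p->bytes + p->hdr.pd_special, &lsn, sizeof lsn);
}

static uint64_t toy_get_dead_lsn(const ToyPage *p)
{
    uint64_t lsn;

    memcpy(&lsn, p->bytes + p->hdr.pd_special, sizeof lsn);
    return lsn;
}

/* Free space left for line pointers and tuples. */
static unsigned toy_free_space(const ToyPage *p)
{
    return (unsigned) (p->hdr.pd_upper - p->hdr.pd_lower);
}
```

With an 8192-byte block this costs 8 / 8192, or roughly 0.1%, of every heap page.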

> I think that should not be so difficult to handle. I think handling this with
> the special space mechanism might be less complicated.

A permanent space allocation will certainly be simpler.  I'm just not
sure how much we care about giving up 8 bytes that could potentially
be used to store tuple data.

>> If the LSN stored in the page precedes the
>> start-of-last-successful-index-vacuum LSN, and if, further, we can get
>> a buffer cleanup lock on the page, then we can do a HOT cleanup and
>> life is good.  Otherwise, we can either (1) just forget about the
>> most-recent-dead-line-pointer LSN - not ideal but not catastrophic
>> either - or (2) if the start-of-last-successful-vacuum-LSN is old
>> enough, we could overwrite an LP_DEAD line pointer in place.
>
> I don't think we need the cleanup lock to turn the LP_DEAD line pointers to
> LP_UNUSED since that does not involve moving tuples around. So a simple
> EXCLUSIVE lock should be enough. But we would need to WAL-log the operation
> of turning DEAD to UNUSED, so it would be simpler to consolidate this in HOT
> pruning. There could be exceptions such as, say, a large number of DEAD line
> pointers accumulating towards the end, where reclaiming them would free up
> substantial space in the page. But maybe we can use those conditions to
> invoke HOT prune instead of handling them separately.

Makes sense.
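The quoted policy, together with Pavan's refinement, might be sketched as follows. Everything here is hypothetical, including the enum, the function names and the prune threshold. The thread does not define when the last-vacuum LSN is "old enough", so the sketch takes that as a caller-supplied flag. The second helper shows the suggestion to trigger a HOT prune when a long run of LP_DEAD line pointers sits at the end of the array. The 4-byte line pointer size matches sizeof(ItemIdData) in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLSN;

typedef enum ReclaimAction
{
    RECLAIM_HOT_CLEANUP,        /* cleanup lock held: prune and defragment */
    RECLAIM_OVERWRITE_IN_PLACE, /* option (2): exclusive lock suffices, but the
                                 * DEAD -> UNUSED change must be WAL-logged */
    RECLAIM_FORGET_LSN          /* option (1): drop the stored LSN */
} ReclaimAction;

/* Hypothetical policy for a page whose most recent dead line pointer was
 * created at page_dead_lsn. */
static ReclaimAction
choose_reclaim_action(ToyLSN page_dead_lsn, ToyLSN last_index_vacuum_start,
                      bool got_cleanup_lock, bool vacuum_lsn_old_enough)
{
    if (page_dead_lsn < last_index_vacuum_start && got_cleanup_lock)
        return RECLAIM_HOT_CLEANUP;
    if (vacuum_lsn_old_enough)
        return RECLAIM_OVERWRITE_IN_PLACE;
    return RECLAIM_FORGET_LSN;
}

/* Line pointer states, in the same order as PostgreSQL's LP_* values. */
typedef enum ToyLpFlag
{
    TOY_LP_UNUSED,
    TOY_LP_NORMAL,
    TOY_LP_REDIRECT,
    TOY_LP_DEAD
} ToyLpFlag;

#define TOY_ITEMID_SIZE 4 /* sizeof(ItemIdData) */

/* Bytes that would return to free space if the trailing run of LP_DEAD
 * line pointers could be truncated from the array. */
static unsigned
trailing_dead_bytes(const ToyLpFlag *lp, unsigned nlp)
{
    unsigned n = 0;

    while (n < nlp && lp[nlp - 1 - n] == TOY_LP_DEAD)
        n++;
    return n * TOY_ITEMID_SIZE;
}

/* Hypothetical trigger: route this case through HOT prune rather than a
 * separate code path. */
static bool
should_trigger_hot_prune(const ToyLpFlag *lp, unsigned nlp,
                         unsigned threshold_bytes)
{
    assert(lp != NULL || nlp == 0);
    return trailing_dead_bytes(lp, nlp) >= threshold_bytes;
}
```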

>> Another issue is that this causes problems for temporary and unlogged
>> tables, because no WAL records are generated and, therefore, the LSN
>> does not advance.  This is also a problem for GIST indexes; Heikki
>> fixed temporary GIST indexes by generating fake LSNs off of a
>> backend-local counter.  Unlogged GIST indexes are currently not
>> supported.  I think what we need to do is create an API to which you
>> can pass a relation and get an LSN.  If it's a permanent relation, you
>> get a regular LSN.  If it's a temporary relation, you get a fake LSN
>> based on a backend-local counter.  If it's an unlogged relation, you
>> get a fake LSN based on a shared-memory counter that is reset on
>> restart.  If we can encapsulate that properly, it should provide both
>> what we need to make this idea work and allow a somewhat graceful fix
>> for the GIST-vs-unlogged problem.
>
> Can you explain in more detail how things would work for unlogged tables? Do
> we use the same shared-memory counter for tracking the last successful index
> vacuum?

Yes.
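Here is a minimal sketch of the relation-to-LSN API described above, with the relation reduced to its persistence kind. The names and the stand-in WAL position are hypothetical. The temp counter is a plain static, which is backend-local because PostgreSQL backends are separate processes. The unlogged counter is an atomic that would have to live in shared memory. As answered above, that same unlogged counter would also supply the last-successful-index-vacuum LSN for unlogged relations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t ToyLSN;

typedef enum ToyPersistence
{
    TOY_REL_PERMANENT, /* WAL-logged: use the real WAL position */
    TOY_REL_TEMP,      /* backend-local fake LSNs */
    TOY_REL_UNLOGGED   /* shared fake LSNs, reset on restart */
} ToyPersistence;

static ToyLSN toy_wal_insert_lsn = 1000;       /* hypothetical stand-in for the WAL insert position */
static ToyLSN temp_lsn_counter = 0;            /* one copy per backend process */
static _Atomic ToyLSN unlogged_lsn_counter = 0; /* would live in shared memory */

static ToyLSN
toy_get_lsn_for_relation(ToyPersistence persistence)
{
    switch (persistence)
    {
        case TOY_REL_PERMANENT:
            return toy_wal_insert_lsn;
        case TOY_REL_TEMP:
            return ++temp_lsn_counter;
        case TOY_REL_UNLOGGED:
            /* atomic increment so concurrent backends never get the same value */
            return atomic_fetch_add(&unlogged_lsn_counter, 1) + 1;
    }
    assert(!"unknown persistence");
    return 0;
}
```

Fake LSNs from the temp and unlogged counters are only comparable with other values from the same counter. That is enough here, because a relation's page LSNs are only ever compared against vacuum LSNs of the same kind.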

> If so, how do we handle the case where, after a restart, a page may get an
> LSN less than the index vacuum LSN if the index vacuum happened before the
> crash/stop?

Well, on a crash, the unlogged relations get truncated, and so do their
indexes, so there's no problem.  On a clean shutdown, I guess we need to
arrange to save the counter across restarts.

Take a look at the existing logic around GetXLogRecPtrForTemp().
That's slightly different, because there we don't even need to be
consistent across backends.  We just need an increasing sequence of
values.  For unlogged relations things are a bit more complex - but it
seems manageable.
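The crash-versus-clean-shutdown handling might look roughly like this. ToyControlFile stands in for wherever the counter would be saved at shutdown. It and the other names are hypothetical. This is not the existing GetXLogRecPtrForTemp() logic, which only needs a backend-local increasing sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLSN;

/* Hypothetical persistent state written at shutdown. */
typedef struct ToyControlFile
{
    bool   clean_shutdown;
    ToyLSN saved_unlogged_lsn;
} ToyControlFile;

static ToyLSN unlogged_lsn_counter; /* the shared-memory counter in the real design */

static void
toy_shutdown(ToyControlFile *cf)
{
    cf->saved_unlogged_lsn = unlogged_lsn_counter;
    cf->clean_shutdown = true;
}

/* Returns true when unlogged relations (and their indexes) must be
 * truncated, i.e. when the previous run did not shut down cleanly. */
static bool
toy_startup(ToyControlFile *cf)
{
    if (cf->clean_shutdown)
    {
        /* continue the sequence so existing page LSNs stay comparable */
        unlogged_lsn_counter = cf->saved_unlogged_lsn;
        cf->clean_shutdown = false; /* a later crash must be detectable */
        return false;
    }
    /* relations are empty after truncation, so any starting value works */
    unlogged_lsn_counter = 1;
    assert(unlogged_lsn_counter > 0);
    return true;
}
```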

> We might be fooled into believing that the index pointers are
> all removed even for dead line pointers generated after the restart? We can
> possibly handle that by resetting the index vacuum LSN so that nothing gets
> removed until one cycle of heap and index vacuum is done. But I am not sure
> how easy it would be to reset the index vacuum LSNs for all unlogged
> relations at the end of recovery.

Yeah.  If we store the LSN in the system catalogs, it will be hard to
reset it after recovery, unless we also include some other identifier
that keeps track of restarts.
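One way to read the "other identifier that keeps track of restarts" idea is to store a restart epoch next to the LSN. A mark left over from before a restart is then simply ignored, and nothing has to be reset during recovery. Below is a minimal sketch with hypothetical names. Treating a stale mark as "no index vacuum has completed yet" gives the conservative behavior Pavan describes: nothing is reclaimed until a fresh heap-and-index vacuum cycle finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLSN;

/* Hypothetical catalog value: the LSN plus the restart epoch it belongs to. */
typedef struct ToyVacuumMark
{
    uint32_t restart_epoch;
    ToyLSN   index_vacuum_start;
} ToyVacuumMark;

static uint32_t current_restart_epoch = 1; /* bumped on every restart */

/* Dead line pointers stamped with page_dead_lsn may be reclaimed only if the
 * stored mark is from the current epoch and the last index vacuum started
 * after they were created. */
static bool
toy_dead_line_pointers_reclaimable(ToyVacuumMark mark, ToyLSN page_dead_lsn)
{
    assert(current_restart_epoch != 0);
    if (mark.restart_epoch != current_restart_epoch)
        return false; /* stale mark: behave as if no index vacuum has finished */
    return page_dead_lsn < mark.index_vacuum_start;
}
```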

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

