Re: Proposal: Another attempt at vacuum improvements - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Proposal: Another attempt at vacuum improvements
Date
Msg-id BANLkTimhxbWX6w3H=PTdxBuPgXi1b2UFug@mail.gmail.com
In response to Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Proposal: Another attempt at vacuum improvements  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Wed, May 25, 2011 at 1:43 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> I think the point is you cannot *always* put it just after the line
> pointer array without possibly shuffling the tuples. Remember we need
> to put the LSN when the dead line pointer is generated because we
> decided to prune away the dead tuple. Say, for example, the page is
> completely full and there are no dead line pointers and hence no LSN
> on the page. Also there is no free space after the line pointer array.
> Now say we prune dead tuples and generate dead line pointers, but the
> last line pointer in the array is still in-use and the first tuple
> immediately after the line pointer array is live. Since you generated
> dead line pointers you want to store the LSN on the page. Now, there
> is no way you can store it after the line pointer array without moving
> the live tuple somewhere else.

So far I agree.  But don't we always defragment immediately after
pruning dead tuples to line pointers?  The removal of even one tuple
will give us more than enough space to store the LSN.

> Let me summarize the sequence of operations and let me know if you
> still disagree with the general principle:
>
> 1. There are no dead line pointers in the page - we are good.
> 2. A few tuples become dead, HOT pruning is invoked either during normal
> operation or heap vacuum. The dead tuples are pruned away and
> truncated to dead line pointers. We already hold cleanup lock on the
> buffer. We set the flag in the page header and store the LSN (either
> at the end of line pointer array or at the end of the page)
> 3. Someday index vacuum is run and it removes the index pointers to
> the dead line pointers. We remember the start LSN of the index vacuum
> somewhere, maybe as a pg_class attribute (how index vacuum gets
> the list of dead line pointers is not material in the general scheme
> of things)
> 4. When the page is again chosen for pruning, we check if the flag is
> set in the header. If so, get the LSN stored in the page, check it
> against the last successful index vacuum LSN and if it precedes the
> index vacuum LSN, we turn the LP_DEAD line pointers to LP_UNUSED. The
> special LSN can be removed unless new LP_DEAD line pointers get
> generated during the pruning, in which case it's overwritten with the
> current LSN. Since we hold the buffer cleanup lock, the special LSN
> storage can be reclaimed by shuffling things around.
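
The decision in steps 2 and 4 above can be sketched as a toy model. This is
only an illustration under stated assumptions, not PostgreSQL code: ToyLsn,
ToyLpState, ToyPage, toy_mark_dead and toy_reclaim_dead_items are
hypothetical names, the page is reduced to a flag, one LSN and an array of
line pointer states, and locking and WAL logging are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (assumption, not the real type). */
typedef uint64_t ToyLsn;

/* Hypothetical line pointer states, loosely mirroring LP_UNUSED/LP_NORMAL/LP_DEAD. */
typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_DEAD } ToyLpState;

#define TOY_MAX_ITEMS 8

typedef struct {
    bool       has_prune_lsn;        /* page-header flag: a special LSN is stored */
    ToyLsn     prune_lsn;            /* LSN recorded when dead line pointers were made */
    int        nitems;               /* number of line pointers in use */
    ToyLpState items[TOY_MAX_ITEMS];
} ToyPage;

/* Step 2: pruning turns a tuple into a dead line pointer and stamps the page. */
void toy_mark_dead(ToyPage *page, int idx, ToyLsn current_lsn)
{
    assert(idx >= 0 && idx < page->nitems);
    page->items[idx] = TOY_LP_DEAD;
    page->has_prune_lsn = true;
    page->prune_lsn = current_lsn;   /* overwrites any older special LSN */
}

/*
 * Step 4: if the stored LSN precedes the start LSN of the last successful
 * index vacuum, the index no longer points at the dead line pointers, so
 * they can become unused and the special LSN can be dropped.
 * Returns the number of line pointers reclaimed.
 */
int toy_reclaim_dead_items(ToyPage *page, ToyLsn last_index_vacuum_start_lsn)
{
    int reclaimed = 0;

    assert(page->nitems <= TOY_MAX_ITEMS);
    if (!page->has_prune_lsn)
        return 0;                    /* step 1: nothing to do */
    if (page->prune_lsn >= last_index_vacuum_start_lsn)
        return 0;                    /* index vacuum may not have seen these yet */

    for (int i = 0; i < page->nitems; i++) {
        if (page->items[i] == TOY_LP_DEAD) {
            page->items[i] = TOY_LP_UNUSED;
            reclaimed++;
        }
    }
    page->has_prune_lsn = false;     /* special LSN storage is now reclaimable */
    return reclaimed;
}
```

In this model a page pruned at LSN 100 is left alone by an index vacuum that
started at LSN 100, and is cleaned up once one that started at LSN 150 has
completed.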

Agreed.  The only thing I'm trying to do further is to avoid the need
for a reshuffle when the special LSN storage is reclaimed.  For
example, consider:

1. There are three tuples on the page.  We are good.
2. Tuple #2 becomes dead.  The tuple is pruned to a line pointer.  The
page is defragmented.  At this point, it doesn't matter WHERE we put
the LSN - we are rearranging the whole page anyway.
3. Index vacuum is run.
4. Now we want to make the dead line pointer unused, and reclaim the
LSN storage.  If the LSN is stored at the end of the page, then we now
have to move all of the tuple data forward by 8 bytes.  But if it's
stored adjacent to the hole in the middle of the page, we need only
clear the page-header bits saying it's there (and maybe adjust
pd_lower).
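
To make the cost difference in step 4 concrete, here is a rough sketch under
simplifying assumptions. ToyLayoutPage, ToyLsnPlacement and toy_reclaim_lsn
are hypothetical names, the page is a bare 8192-byte array with only
pd_lower and pd_upper tracked, and fixing up line pointer offsets after a
move is omitted. It is meant to show the shape of the argument, not how the
real page code would look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192
#define TOY_LSN_SIZE  8      /* the 8 bytes mentioned above */

/* Hypothetical: the two candidate locations for the special LSN. */
typedef enum { TOY_LSN_AT_PAGE_END, TOY_LSN_NEXT_TO_HOLE } ToyLsnPlacement;

typedef struct {
    uint8_t         bytes[TOY_PAGE_SIZE];
    size_t          pd_lower;   /* end of line pointer array (plus LSN if adjacent) */
    size_t          pd_upper;   /* start of tuple data */
    bool            has_lsn;    /* page-header bit: special LSN present */
    ToyLsnPlacement placement;
} ToyLayoutPage;

/*
 * Reclaim the special LSN storage and return how many bytes of tuple data
 * had to be moved to do it.
 */
size_t toy_reclaim_lsn(ToyLayoutPage *page)
{
    size_t moved = 0;

    if (!page->has_lsn)
        return 0;
    assert(page->pd_lower <= page->pd_upper && page->pd_upper <= TOY_PAGE_SIZE);

    if (page->placement == TOY_LSN_AT_PAGE_END) {
        /* Tuple data sits in [pd_upper, end - 8); slide all of it up by 8. */
        assert(page->pd_upper <= TOY_PAGE_SIZE - TOY_LSN_SIZE);
        moved = TOY_PAGE_SIZE - TOY_LSN_SIZE - page->pd_upper;
        memmove(page->bytes + page->pd_upper + TOY_LSN_SIZE,
                page->bytes + page->pd_upper, moved);
        page->pd_upper += TOY_LSN_SIZE;
        /* A real page would also have to adjust every line pointer offset. */
    } else {
        /* LSN sits in [pd_lower - 8, pd_lower): just give it back to the hole. */
        assert(page->pd_lower >= TOY_LSN_SIZE);
        page->pd_lower -= TOY_LSN_SIZE;
    }
    page->has_lsn = false;   /* clear the header bit in both cases */
    return moved;
}
```

With the LSN at the end of the page, the work grows with the amount of tuple
data on the page; with the LSN next to the hole, it is a header update only.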

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

