Re: getting rid of freezing - Mailing list pgsql-hackers

From Robert Haas
Subject Re: getting rid of freezing
Msg-id CA+TgmoZMAPbJ554JuT68jGM4Ye3TeMUJGE3=VaCBDGKxAdh0Jw@mail.gmail.com
In response to getting rid of freezing  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: getting rid of freezing  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers

On Thu, May 23, 2013 at 1:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
>    anti-wraparound

Check.  We might want an option to force a scan of the whole relation.

> 2) When we can set all-visible guarantee that all tuples on the page are
>    fully hinted. During recovery do the same, so we don't need to log
>    all hint bits.
>    We can do this with only an exclusive lock on the buffer, we don't
>    need a cleanup lock.

I don't think this works.  Emitting XLOG_HEAP_VISIBLE for a heap page
does not emit an FPI for the heap page, only (if needed) for the
visibility map page.  So a subsequent crash that tears the page could
keep XLOG_HEAP_VISIBLE but lose other changes on the page - i.e. the
hint bits.

> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>    lock, remember the oldest xmin on that page. We could set all visible
>    in the former case, but we want the page to be cleaned up sometime
>    soonish.

I think you mean "in the latter case" not "in the former case".  If
not, then I'm confused.

> 4) If we can get the cleanup lock, purge dead tuples from the page and
>    the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.
>
> We don't even necessarily need to log the hint bits for all items since
> the redo for all_visible could make sure all items are hinted. The only
> problem is knowing up to where we can truncate pg_clog...

The redo for all_visible cannot make sure all items are hinted.
Again, there's no FPI on the heap page.  The heap page could in fact
contain dead tuples at the time we mark it all-visible.  Consider, for
example:

0. Checkpoint.
1. The buffer becomes all visible.
2. A tuple is inserted, making the buffer not-all-visible.
3. The page is written by the OS.
4. Crash.

Now, recovery will first find the record marking the buffer
all-visible, and will mark it all-visible.  Now the all-visible bit on
the page is flat-out wrong, but it doesn't matter because we haven't
reached consistency.  Next we'll find the heap-insert record, which
will have an FPI, since it's the first WAL-logged change to the buffer
since the last checkpoint.  Now the FPI fixes everything and we're
back in a sane state.
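
To make the sequence above concrete, here is a toy replay model in Python. It is only a sketch under stated assumptions, not PostgreSQL code: Page, WalRecord and replay are hypothetical names, a page is reduced to an all-visible flag plus a list of tuples, and the full-page image is assumed to hold the page as it looked right after the insert.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    # Toy heap page: one flag and a list of tuple labels (hypothetical model).
    all_visible: bool = False
    tuples: List[str] = field(default_factory=list)


@dataclass
class WalRecord:
    kind: str                         # "visible" or "insert"
    tuple_data: Optional[str] = None  # payload for "insert" records
    fpi: Optional[Page] = None        # full-page image, if the record has one


def replay(disk_page: Page, wal: List[WalRecord]) -> List[Page]:
    """Replay records in order; return the page state after each record."""
    page = deepcopy(disk_page)
    states = []
    for rec in wal:
        if rec.fpi is not None:
            # A full-page image replaces whatever is on the page.
            page = deepcopy(rec.fpi)
        elif rec.kind == "visible":
            page.all_visible = True
        elif rec.kind == "insert":
            page.tuples.append(rec.tuple_data)
            page.all_visible = False
        states.append(deepcopy(page))
    return states


# Steps 1-2: "visible" carries no heap FPI; the insert is the first
# WAL-logged heap change since the checkpoint, so it carries one.
post_insert = Page(all_visible=False, tuples=["t1", "t2"])
wal = [
    WalRecord(kind="visible"),
    WalRecord(kind="insert", tuple_data="t2", fpi=deepcopy(post_insert)),
]

# Step 3: the page that reached disk already contains the insert.
disk_page = deepcopy(post_insert)

# Step 4: crash, then recovery replays from the checkpoint.
after_visible, after_insert = replay(disk_page, wal)
```

In this model, after_visible has the all-visible flag set on a page that already holds t2, which is the transiently wrong state described above, and after_insert is back to the post-insert image because the FPI overwrote the page.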

Now in this particular case it wouldn't hurt anything if the redo
routine that set the all-visible bit also hinted all the tuples,
because the FPI is going to overwrite it anyway.  But suppose in lieu
of steps (3) and (4) we write half of the page and then crash, leaving
behind a torn page.  Now it's pretty crazy to think about trying to
hint tuples; the page may be in a completely insane state.
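
The torn-page case can be sketched the same way. Again this is a toy model with hypothetical names (torn_write, hint_all_items), and the layout is an assumption: the page is split into a "header half" holding an item count and a "tuple half" holding the tuple data, with the crash landing after only the header half was written.

```python
def torn_write(old: dict, new: dict) -> dict:
    """Simulate a crash mid-write: header half is new, tuple half is old."""
    return {"item_count": new["item_count"], "tuples": list(old["tuples"])}


def hint_all_items(page: dict) -> list:
    """Hypothetical redo step that tries to hint every item on the page."""
    hinted = []
    for i in range(page["item_count"]):
        # Trusts the header; on a torn page the header and data disagree.
        hinted.append((page["tuples"][i], "hinted"))
    return hinted


old_page = {"item_count": 1, "tuples": ["t1"]}
new_page = {"item_count": 2, "tuples": ["t1", "t2"]}
torn_page = torn_write(old_page, new_page)

try:
    hint_all_items(torn_page)
    hinting_failed = False
except IndexError:
    # The header claims two items but only one tuple made it to disk.
    hinting_failed = True
```

Here the header claims two items while only one tuple is present, so a redo routine that walks the items has nothing sane to work with. In the sketch this surfaces as an IndexError; on a real page the failure would not necessarily be that obvious.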

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company