Re: Freezing without write I/O - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Freezing without write I/O
Date
Msg-id 5242AD68.1090902@vmware.com
Whole thread Raw
In response to Re: Freezing without write I/O  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Freezing without write I/O
Re: Freezing without write I/O
Re: Freezing without write I/O
List pgsql-hackers
On 19.09.2013 16:24, Andres Freund wrote:
> On 2013-09-19 14:40:35 +0200, Andres Freund wrote:
>>>> * I think heap_lock_tuple() needs to unset all-visible, otherwise we
>>>>    won't vacuum that page again which can lead to problems since we
>>>>    don't do full-table vacuums again?
>>>
>>> It's OK if the page is never vacuumed again, the whole point of the patch is
>>> that old XIDs can be just left lingering in the table. The next time the
>>> page is updated, perhaps to lock a tuple again, the page will be frozen and
>>> the old xmax will be cleared.
>>
>> Yes, you're right, it should be possible to make it work that way. But
>> currently, we look at xmax and infomask of a tuple in heap_lock_tuple()
>> *before* the PageUpdateNeedsFreezing() call. Currently we would thus
>> create a new multixact with long dead xids as members and such.

Ah, I see.

> That reminds me of something:
>
> There are lots of checks sprayed around that unconditionally look at
> xmin, xmax or infomask.
>
> Things I noticed in a quick look after thinking about the previous
> statment:
> * AlterEnum() looks at xmin
> * The PLs look at xmin
> * So does afair VACUUM FREEZE, COPY and such.
> * Several HeapTupleSatisfiesVacuum() callers look at xmin and stuff
>    without checking for maturity or freezing the page first.
>    * lazy_scan_heap() itself if the page isn't already marked all_visible
>    * heap_page_is_all_visible()
> * rewrite_heap_tuple() does an unconditional heap_freeze_tuple() without
>    considering maturity/passing in Invalid*
> * heap_page_prune_opt() shouldn't do anything on a mature (or even all
>    visible) page.
> * heap_page_prune() marks a buffer dirty, writes a new xid without
>    changing the lsn.
> * heap_prune_chain() looks at tuples without considering page maturity,
>    but the current implementation actually looks safe.
> * CheckForSerializableConflictOut() looks at xmins.
> * pgrowlocks() looks at xmax unconditionally.
> * heap_update() computes, just like heap_lock_tuple(), a new
>    xmax/infomask before freezing.
> * Same for heap_lock_updated_tuple(). Not sure if that's an actual
>    concern, but it might if PageMatureLSN advances or such.
> * heap_getsysattr() should probably return different things for mature
>    pages.
>
> There's probably more.

Hmm, some of those are trivial, but others, rewrite_heap_tuple() are
currently only passed the HeapTuple, with no reference to the page where
the tuple came from. Instead of changing code to always pass that along
with a tuple, I think we should add a boolean to HeapTupleData, to
indicate if the tuple came from a mature page. If the flag is set, the
tuple should be considered visible to everyone, without looking at the
xmin/xmax. This might affect extensions, although usually external C
code that have to deal with HeapTuples will use functions like
heap_form_tuple to do so, and won't need to deal with low-level stuff or
visibility themselves.

Attached is a new version, which adds that field to HeapTupleData. Most
of the issues on you listed above have been fixed, plus a bunch of other
bugs I found myself. The bug that Jeff ran into with his count.pl script
has also been fixed.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Jeevan Chalke
Date:
Subject: Re: [PATCH] Revive line type
Next
From: Heikki Linnakangas
Date:
Subject: Re: Freezing without write I/O