Re: Freezing without write I/O - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Freezing without write I/O |
Date | |
Msg-id | 5242AD68.1090902@vmware.com Whole thread Raw |
In response to | Re: Freezing without write I/O (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Freezing without write I/O
Re: Freezing without write I/O Re: Freezing without write I/O |
List | pgsql-hackers |
On 19.09.2013 16:24, Andres Freund wrote: > On 2013-09-19 14:40:35 +0200, Andres Freund wrote: >>>> * I think heap_lock_tuple() needs to unset all-visible, otherwise we >>>> won't vacuum that page again which can lead to problems since we >>>> don't do full-table vacuums again? >>> >>> It's OK if the page is never vacuumed again, the whole point of the patch is >>> that old XIDs can be just left lingering in the table. The next time the >>> page is updated, perhaps to lock a tuple again, the page will be frozen and >>> the old xmax will be cleared. >> >> Yes, you're right, it should be possible to make it work that way. But >> currently, we look at xmax and infomask of a tuple in heap_lock_tuple() >> *before* the PageUpdateNeedsFreezing() call. Currently we would thus >> create a new multixact with long dead xids as members and such. Ah, I see. > That reminds me of something: > > There are lots of checks sprayed around that unconditionally look at > xmin, xmax or infomask. > > Things I noticed in a quick look after thinking about the previous > statment: > * AlterEnum() looks at xmin > * The PLs look at xmin > * So does afair VACUUM FREEZE, COPY and such. > * Several HeapTupleSatisfiesVacuum() callers look at xmin and stuff > without checking for maturity or freezing the page first. > * lazy_scan_heap() itself if the page isn't already marked all_visible > * heap_page_is_all_visible() > * rewrite_heap_tuple() does an unconditional heap_freeze_tuple() without > considering maturity/passing in Invalid* > * heap_page_prune_opt() shouldn't do anything on a mature (or even all > visible) page. > * heap_page_prune() marks a buffer dirty, writes a new xid without > changing the lsn. > * heap_prune_chain() looks at tuples without considering page maturity, > but the current implementation actually looks safe. > * CheckForSerializableConflictOut() looks at xmins. > * pgrowlocks() looks at xmax unconditionally. > * heap_update() computes, just like heap_lock_tuple(), a new > xmax/infomask before freezing. > * Same for heap_lock_updated_tuple(). Not sure if that's an actual > concern, but it might if PageMatureLSN advances or such. > * heap_getsysattr() should probably return different things for mature > pages. > > There's probably more. Hmm, some of those are trivial, but others, rewrite_heap_tuple() are currently only passed the HeapTuple, with no reference to the page where the tuple came from. Instead of changing code to always pass that along with a tuple, I think we should add a boolean to HeapTupleData, to indicate if the tuple came from a mature page. If the flag is set, the tuple should be considered visible to everyone, without looking at the xmin/xmax. This might affect extensions, although usually external C code that have to deal with HeapTuples will use functions like heap_form_tuple to do so, and won't need to deal with low-level stuff or visibility themselves. Attached is a new version, which adds that field to HeapTupleData. Most of the issues on you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed. - Heikki
Attachment
pgsql-hackers by date: