Re: Freezing without write I/O - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Freezing without write I/O
Date
Msg-id 20130918132235.GC21051@awork2.anarazel.de
In response to Re: Freezing without write I/O  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Freezing without write I/O
List pgsql-hackers
On 2013-09-16 16:59:28 +0300, Heikki Linnakangas wrote:
> Here's a rebased version of the patch, including the above-mentioned fixes.
> Nothing else new.

* We need some higher-level description of the algorithm somewhere in the source. I don't think I would have understood
the concept from the patch alone, without having read the thread previously.
 
* Why do we need to do the PageUpdateNeedsFreezing() dance in heap_page_prune? No xids should change during it.
* Why can we do a GetOldestXmin(allDbs = false) in BeginXidLSNRangeSwitch()?
* Is there any concrete reasoning behind the current values for XID_LSN_RANGE_INTERVAL and NUM_XID_LSN_RANGES, or is it
just gut feeling?
 
* The lsn ranges file can possibly become bigger than 512 bytes (the size we assume to be written atomically) and you
write it in place. If we fail halfway through writing, we seem to be able to recover by using the pageMatureLSN from the
last checkpoint, but it seems better to do the fsync(), rename(), fsync() dance either way.
 
* Should we preemptively freeze tuples on a page in lazy_scan_heap if we already have dirtied the page? That would make
future modifications cheaper.
 
* lazy_scan_heap now blocks acquiring a cleanup lock on every buffer that contains dead tuples. Shouldn't we use some
kind of cutoff xid there? That might block progress too heavily. Also the comment above it still refers to the old
logic.
* There's no way to force a full table vacuum anymore, that seems problematic to me.
* I wonder if CheckPointVarsup() doesn't need to update minRecoveryPoint. StartupVarsup() should be ok, because we
should only read one from the future during a basebackup?
 
* xidlsnranges_recently[_dirtied] are not obvious at first glance. Why can't we just reset dirty before the
WriteXidLSNRangesFile() call? There's only one process doing the writeout. Just because the checkpointing process could
be killed?
 
* I think we should either not require consuming a multixactid or use a function that doesn't need
MultiXactIdSetOldestMember(). If the transaction doing so lives for long it will unnecessarily prevent truncation of
mxacts.
* switchFinishXmin and nextSwitchXid should probably be either volatile or have a compiler barrier between accessing
shared memory and checking them. The compiler very well could optimize the local copies away and access shmem all the
time, which could lead to weird results.
 
* I wonder whether the fact that we're doing the range switches after acquiring an xid could be problematic if we're
preventing xid allocation due to the checks earlier in that function?
 
* I think heap_lock_tuple() needs to unset all-visible, otherwise we won't vacuum that page again which can lead to
problems since we don't do full-table vacuums again?
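
To illustrate the fsync(), rename(), fsync() sequence suggested above for the lsn ranges file, here is a minimal sketch
using plain POSIX calls. The helper names (fsync_dir, durable_replace_file), the ".tmp" suffix and the buffer sizes are
assumptions made for illustration only; they are not taken from the patch, and real backend code would go through
PostgreSQL's own file-access and error-reporting routines instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fsync a directory so a rename inside it is durable. */
static int
fsync_dir(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        int save_errno = errno;

        close(fd);
        errno = save_errno;
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical helper: replace dirpath/name with the given contents without
 * ever overwriting the old file in place. Returns 0 on success, -1 on error.
 */
static int
durable_replace_file(const char *dirpath, const char *name,
                     const void *data, size_t len)
{
    char        path[1024];
    char        tmppath[1024];
    const char *p = data;
    size_t      written = 0;
    int         n;
    int         fd;

    n = snprintf(path, sizeof(path), "%s/%s", dirpath, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    n = snprintf(tmppath, sizeof(tmppath), "%s/%s.tmp", dirpath, name);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* Write the new contents to a temporary file next to the target. */
    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    while (written < len)
    {
        ssize_t rc = write(fd, p + written, len - written);

        if (rc < 0)
        {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmppath);
            return -1;
        }
        written += (size_t) rc;
    }

    /* First fsync: the new contents must be on disk before the rename. */
    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* rename() switches old and new file atomically. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* Second fsync: make the new directory entry itself durable. */
    return fsync_dir(dirpath);
}
```

With this ordering a crash leaves either the complete old file or the complete new file under the final name, so the
512-byte atomic-write assumption no longer matters. Note that fsync() on a directory file descriptor works on Linux but
is not guaranteed on every platform.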
 

So, I think that's enough for a first look. Will think about general
issues a bit more.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


