Re: Summary of plans to avoid the annoyance of Freezing - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Summary of plans to avoid the annoyance of Freezing
Date
Msg-id 55EF6D63.9030505@BlueTreble.com
In response to Re: Summary of plans to avoid the annoyance of Freezing  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 9/6/15 7:25 AM, Andres Freund wrote:
> On 2015-08-10 07:03:02 +0100, Simon Riggs wrote:
>> I was previously a proponent of (2) as a practical way forwards, but my
>> proposal here today is that we don't do anything further on 2) yet, and
>> seek to make progress on 5) instead.
>>
>> If 5) fails to bring a workable solution by the Jan 2016 CF then we commit
>> 2) instead.
>>
>> If Heikki wishes to work on (5), that's good. Otherwise, I think it's
>> something I can understand and deliver by 1 Jan, though likely for 1 Nov CF.
>
> I highly doubt that we can get either variant into 9.6 if we only start
> to seriously review them by then. Heikki's lsn ranges patch essentially
> was a variant of 5) and it ended up being a rather complicated patch. I
> don't think using an explicit epoch is going to be that much simpler.
>
> So I think we need to decide now.
>
> My vote is that we should try to get freeze maps into 9.6 - that seems
> more realistic given that we have a patch right now. Yes, it might end
> up being superfluous churn, but it's rather localized. I think we've
> put off significant incremental improvements with the promise of more
> radical stuff too often.

I'm concerned with how to test this. Right now it's rather difficult to 
test things like epoch rollover, especially in a way that would expose 
race conditions and other corner cases. We obviously got burned by that 
on the MultiXact changes, and a lot of our best developers had to spend 
a huge amount of time fixing that. ISTM that a way to unit test things 
like CLOG/MXID truncation and visibility logic should be created before 
attempting a change like this. Would having this kind of test 
infrastructure have helped with the LSN patch development? More 
importantly, would it have reduced the odds of the MXID bugs, or made it 
easier to diagnose them?
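
To make the kind of unit test meant here concrete, below is a minimal
sketch in C. It is not PostgreSQL code: Xid, FIRST_NORMAL_XID,
xid_precedes, xid_advance and check_wraparound_ordering are hypothetical
names, and the comparison is only modeled on modulo-2^32 ordering of
32-bit transaction IDs, with a few low values assumed to be reserved. The
point is that a standalone test can walk a counter across the wraparound
boundary and assert that the ordering stays consistent, without needing
a running cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t Xid;

/* Assumption for this sketch: IDs below this value are reserved. */
#define FIRST_NORMAL_XID ((Xid) 3)

/* Circular "precedes": true if a is logically older than b, modulo 2^32. */
static bool
xid_precedes(Xid a, Xid b)
{
    /* Reserved IDs are compared as plain unsigned integers. */
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;

    /* Signed view of the 32-bit difference (assumes two's complement). */
    return (int32_t) (uint32_t) (a - b) < 0;
}

/* Advance an ID by one, skipping the reserved values after wraparound. */
static Xid
xid_advance(Xid x)
{
    x++;
    if (x < FIRST_NORMAL_XID)
        x = FIRST_NORMAL_XID;
    return x;
}

/* Walk a counter across the 2^32 boundary and check the ordering. */
static void
check_wraparound_ordering(void)
{
    Xid oldest = UINT32_MAX - 5;
    Xid cur = oldest;

    for (int i = 0; i < 20; i++)
    {
        Xid next = xid_advance(cur);

        /* Each new ID must follow its predecessor, never the reverse. */
        assert(xid_precedes(cur, next));
        assert(!xid_precedes(next, cur));

        /* The starting ID must stay "older" even after the counter wraps. */
        assert(xid_precedes(oldest, next));

        cur = next;
    }

    /* Sanity check: the raw counter value really did wrap around. */
    assert(cur < oldest);
}
```

Calling check_wraparound_ordering() from a test driver's main() would
exercise the boundary. A real harness would have to do the same for CLOG
and MXID truncation and for visibility checks, which is much harder
because those depend on shared state and concurrency.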

In any case, thanks Simon for the summary. I really like the idea and 
will help with it if I can.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


