Re: Recovery Test Framework - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Recovery Test Framework
Date
Msg-id 1231766569.18005.1135.camel@ebony.2ndQuadrant
In response to Re: Recovery Test Framework  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Recovery Test Framework  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
On Sun, 2009-01-11 at 12:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Recovery doesn't have a test framework as yet. I would like to add one
> > for this release, especially since we have so much recovery-related code
> > being added to the release (and manual testing is so time consuming).
> 
> I've been thinking for some time that putting replication into 8.4
> has proven to be an unreasonably optimistic goal.  Seeing new
> requirements like this one pop up two months after feature freeze
> kind of drives the point home.
> 
> I think it's time to back off and agree that we should target all this
> stuff for 8.5.  I don't want our first release of replication to be
> flaky, but I hardly see how it will be anything else if it ships in 8.4.

I understand that. As you know, I have been concerned and disappointed
by the progress of sync rep in particular, though I salute Fujii-san's
personal effort and skill.

Which patches were you thinking of when you say "all this stuff"?

As a main reviewer of Sync Rep, I can say it's shaping up nicely. I
don't have any reason now to veto it on architectural grounds, and it
covers many subtle, second-level issues very well that would be easily
missed in re-designs. It has some innovative features that make it best
in class. Is it flaky? Not fundamentally; code-wise I see it more as a
question of time. Does it do everything? No, some advanced features
(multiple streaming standbys, single command setup for small installs)
have been deferred to later releases.

Now that it's time to discuss such things, I personally think we have
run out of time for WAL I/O read-ahead ("Proposal of PITR performance
improvement"), especially since tests show it has little advantage with
FPW enabled. If we really need to, we could lose most of my
Infrastructure patch, since that adds fast failover and additional
performance with bgwriter.

If we insist upon cuts, we can lose some patches and code and yet still
maintain the popular headline features of both Sync Rep and Hot Standby.
Realistically, we need your attention if we are to include them. I can
list points where your attention would be especially welcome since the
patches are relatively large. Let's look at the detail of what we need
to do rather than the broad brush.

Back to the test framework: this is not specific to replication. PITR
and crash recovery are all manually tested and have been for years.
Testing Hot Standby *revealed* a bug in visibility maps that would
otherwise have gone unnoticed, and I think there are others, new and old.
Even if we reject replication entirely, a recovery test framework is
going to increase the robustness of what we *currently* have. It's not a
new requirement; what is new is that I now have some sponsorship money
explicitly earmarked for this, though as part of the HS project, and
indeed the test framework relies upon HS to operate at all.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


