Re: Recovery Test Framework - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Recovery Test Framework |
Date | |
Msg-id | 1231766569.18005.1135.camel@ebony.2ndQuadrant |
In response to | Re: Recovery Test Framework (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Recovery Test Framework |
List | pgsql-hackers |
On Sun, 2009-01-11 at 12:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Recovery doesn't have a test framework as yet. I would like to add one
> > for this release, especially since we have so much recovery-related code
> > being added to the release (and manual testing is so time consuming).
>
> I've been thinking for some time that putting replication into 8.4
> has proven to be an unreasonably optimistic goal. Seeing new
> requirements like this one pop up two months after feature freeze
> kind of drives the point home.
>
> I think it's time to back off and agree that we should target all this
> stuff for 8.5. I don't want our first release of replication to be
> flaky, but I hardly see how it will be anything else if it ships in 8.4.

I understand that. As you know, I have been concerned and disappointed by the progress of sync rep in particular, though salute Fujii-san's personal effort and skill.

Which patches were you thinking of when you say "all this stuff"?

As a main reviewer of Sync Rep, I can say it's shaping up nicely. I don't have any reason now to veto it for architectural reasons and it covers many subtle, second level issues very well that would be easily missed in re-designs. It has some innovative features that make it best in class.

Is it flaky? Not fundamentally; code wise I see it more as a question of time. Does it do everything? No, some advanced features (multiple streaming standbys, single command setup for small installs) have been deferred to later releases.

Now that it's time to discuss such things I personally think we have run out of time though for WAL I/O read-ahead ("Proposal of PITR performance improvement") especially since tests show it has little advantage with FPW enabled.

If we really need to we could lose most of my Infrastructure patch, since that adds fast failover and additional performance with bgwriter. If we insist upon cuts, we can lose some patches and code and yet still maintain the popular headline features of both Sync Rep and Hot Standby.

Realistically, we need your attention if we are to include them. I can list points where your attention would be especially welcome since the patches are relatively large. Let's look at the detail of what we need to do rather than the broad brush.

Back to the test framework: this is not relevant to replication. PITR and crash recovery are all manually tested and have been for years. Testing Hot Standby *revealed* a bug in visibility maps that went through otherwise unnoticed and I think there are others, new and old. Even if we reject replication entirely, a recovery test framework is going to increase the robustness of what we *currently* have. It's not a new requirement; what is new is I now have some sponsorship money explicitly earmarked for this, but as part of the HS project and indeed the test framework relies upon HS to operate at all.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support