Re: [CORE] Restore-reliability mode - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [CORE] Restore-reliability mode
Date
Msg-id CAB7nPqQsBAKuCWSqd834LwC0T+g8=yzD1GnPts6oMe4Ewrpjbg@mail.gmail.com
In response to Re: [CORE] Restore-reliability mode  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: [CORE] Restore-reliability mode  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [CORE] Restore-reliability mode  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On Fri, Jun 5, 2015 at 8:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>
> On 4 June 2015 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Josh,
>>
>> * Josh Berkus (josh@agliodbs.com) wrote:
>> > I would argue that if we delay 9.5 in order to do a 100% manual review
>> > of code, without adding any new automated tests or other non-manual
>> > tools for improving stability, then it's a waste of time; we might as
>> > well just release the beta, and our users will find more issues than we
>> > will.  I am concerned that if we declare a cleanup period, especially in
>> > the middle of the summer, all that will happen is that the project will
>> > go to sleep for an extra three months.
>>
>> This is the exact same concern that I have.  A delay just to have a
>> delay is not useful.  I completely agree that we need more automated
>> testing, etc, though getting all of that set up and running could be
>> done at any time too- there's no reason to wait, nor do I believe
>> delaying 9.5 would make such automated testing appear.
>>
>
> In terms of specific testing improvements, things I think we need to have
> covered and runnable on the buildfarm are:
>
> * pg_dump and pg_restore testing (because it's scary we don't do this)

We do test it to some extent through pg_upgrade, using the set of
objects that the regression test suite does not remove. Extension
dumps are not covered yet, though.

> * WAL archiving based warm standby testing with promotion
> * Two node streaming replication with promotion, both with a slot and with
> archive fallback
> * Three node cascading streaming replication with middle node promotion then
> tail end node promotion
> * Logical decoding streaming testing, comparing to expected decoded output
> * hard-kill the postmaster, start up from crashed datadir
> * pg_basebackup + start up from backup
> * pg_start_backup, rsync, pg_stop_backup, start up in hot standby
> * Tests of crash recovery during various DDL operations

Well, steps in this direction are the point of this patch, the
replication test suite:
https://commitfest.postgresql.org/5/197/
And this one, addition of Windows support for TAP tests:
https://commitfest.postgresql.org/5/207/

> * DDL deparse test coverage for all operations

What do you have in mind beyond what is already in objectaddress.sql
and src/test/modules/test_ddl_deparse/?

> * disk exhaustion tests both for pg_xlog and for the main datadir, showing
> we can recover OK when disk is filled then space is freed

This may be tricky. How would you emulate that?

> Is pg_tap a reasonable starting point for this sort of testing?

IMO, using the TAP machinery would be a good base for that. What is
lacking is a basic set of Perl routines that one can easily use to set
up test scenarios.
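
As a rough illustration of what such helper routines could look like,
here is a minimal sketch. It is written in Python rather than Perl, and
it is not taken from either patch above. The names Node and
promotion_scenario are hypothetical. The sketch only builds the command
lines (initdb, pg_ctl, pg_basebackup) for a two-node promotion scenario
and does not run them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical description of one PostgreSQL instance in a test."""
    name: str
    datadir: str
    port: int

    def init_cmd(self) -> List[str]:
        # Create a fresh data directory.
        return ["initdb", "-D", self.datadir]

    def start_cmd(self) -> List[str]:
        # -w waits until startup completes; -o passes options to postgres.
        return ["pg_ctl", "-D", self.datadir, "-l", self.datadir + ".log",
                "-o", "-p %d" % self.port, "-w", "start"]

    def promote_cmd(self) -> List[str]:
        # Ask a standby to leave recovery and accept writes.
        return ["pg_ctl", "-D", self.datadir, "promote"]

    def base_backup_cmd(self, standby: "Node") -> List[str]:
        # Take a base backup of this node into the standby's data directory.
        return ["pg_basebackup", "-D", standby.datadir,
                "-p", str(self.port), "-X", "stream"]


def promotion_scenario(master: Node, standby: Node) -> List[List[str]]:
    """Ordered commands for: init master, start it, clone, start standby, promote.

    Assumption: writing the standby's recovery configuration is left out
    here; a real test would have to do it before starting the standby.
    """
    return [
        master.init_cmd(),
        master.start_cmd(),
        master.base_backup_cmd(standby),
        standby.start_cmd(),
        standby.promote_cmd(),
    ]
```

A test script would then walk the list returned by
promotion_scenario() and run each command in order, checking the exit
status after every step.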

> How would a test that would've caught the multixact issues look?

I have not followed those discussions closely, so I am not sure about that.

Regards,
-- 
Michael


