Re: A test for replay of regression tests - Mailing list pgsql-hackers

From Andres Freund
Subject Re: A test for replay of regression tests
Date
Msg-id 20210423152031.unkv7mmvhw4pbjuc@alap3.anarazel.de
In response to A test for replay of regression tests  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: A test for replay of regression tests
List pgsql-hackers
Hi,

On 2021-04-23 17:37:48 +1200, Thomas Munro wrote:
> We have automated tests for many specific replication and recovery
> scenarios, but nothing that tests replay of a wide range of records.
> People working on recovery code often use installcheck (presumably
> along with other custom tests) to exercise it, sometimes with
> wal_consistency_checking enabled.  So, why don't we automate that?  Aside
> from exercising the WAL decoding machinery (which brought me here),
> that'd hopefully provide some decent improvements in coverage of the
> various redo routines, many of which are not currently exercised at
> all.

Yay.


> I'm not quite sure where it belongs, though.  The attached initial
> sketch patch puts it under src/test/recovery near other similar things,
> but I'm not sure if it's really OK to invoke make -C ../regress from
> here.

I'd say it's not ok, and we should just invoke pg_regress without make.
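
To illustrate, here is a rough sketch of what a direct invocation from the
TAP test could look like. The helper name pg_regress_command and all paths
are hypothetical placeholders, not taken from the patch; the options are
pg_regress's own command-line switches, and a real test would get host and
port from the node object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): build the argument list for
# running pg_regress directly against an already-running server, instead of
# going through "make -C ../regress".
sub pg_regress_command
{
	my (%opt) = @_;
	return (
		$opt{pg_regress},                   # path to the pg_regress binary
		"--host=$opt{host}",                # server to connect to
		"--port=$opt{port}",
		"--dlpath=$opt{regress_dir}",       # where the test shared library is looked up
		"--inputdir=$opt{regress_dir}",     # where the sql/ and expected/ files live
		"--schedule=$opt{regress_dir}/parallel_schedule",
		"--outputdir=$opt{outputdir}",      # separate dir, so a concurrent run is not clobbered
	);
}

# Placeholder values, for illustration only.
my @cmd = pg_regress_command(
	pg_regress  => '../regress/pg_regress',
	host        => '/tmp',
	port        => 5432,
	regress_dir => '../regress',
	outputdir   => 'tmp_check/regress_out',
);
print join(' ', @cmd), "\n";

# In the test itself the list form of system() avoids shell quoting issues:
#   system(@cmd) == 0 or die "pg_regress failed";
```

Skipping an individual test such as tablespace would presumably still need a
modified schedule file, which this sketch does not attempt.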


> Add a new TAP test under src/test/recovery that runs the regression
> tests with wal_consistency_checking=all.

Hm. I wonder if running with wal_consistency_checking=all doesn't also
reduce coverage of some aspects of recovery, by increasing record sizes
etc.


> I copied pg_upgrade/test.sh's trick of using a different
> outputdir to avoid clobbering a concurrent run under src/test/regress,
> and I also needed to invent a way to stop it from running the cursed
> tablespace test (deferring startup of the standby also works but eats
> way too much space, which I learned after blowing out a smallish
> development VM's disk).

That's because you are using wal_consistency_checking=all, right?
Because IIRC we don't generate that much WAL otherwise?


> +# Create some content on primary and check its presence in standby 1
> +$node_primary->safe_psql('postgres',
> +    "CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
> +
> +# Wait for standby to catch up
> +$node_primary->wait_for_catchup($node_standby_1, 'replay',
> +    $node_primary->lsn('insert'));

> +my $result =
> +  $node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
> +print "standby 1: $result\n";
> +is($result, qq(1002), 'check streamed content on standby 1');

Why is this needed?



Greetings,

Andres Freund


