Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys. - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.
Date
Msg-id CAB7nPqT6GGLwPzyPOR7BYVjYUkD=S2bA2QT6XhWB1WixWy+7BA@mail.gmail.com
Whole thread Raw
In response to Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Apr 14, 2016 at 11:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Apr 13, 2016 at 4:54 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii@postgresql.org> wrote:
>>> Add regression tests for multiple synchronous standbys.
>>>
>>> Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
>>> Reviewed-By: Kyotaro Horiguchi
>>
>> Well, we are not quite there yet:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-04-12%2016%3A00%3A06
>>
>> # Running: pg_ctl -D
>> /home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
>> reload
>> server signaled
>> not ok 2 - asterisk in synchronous_standby_names
>>
>> #   Failed test 'asterisk in synchronous_standby_names'
>> #   at t/007_sync_rep.pl line 26.
>> #          got: 'standby1|1|sync
>> # standby2|1|potential
>> # standby3|0|async'
>> #     expected: 'standby1|1|sync
>> # standby2|1|potential
>> # standby3|1|potential'
>
> This seems to be a timing issue.
>
> There can be small window after SIGHUP is sent before walsender updates
> its priority based on new s_s_names. If pg_stat_replication is checked
> before that update, it displays unexpected output. Probably we need to
> sleep a few second after pg_ctl reload before pg_stat_replication.

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
test_passed = 0;
while ($timeout < 30)
{    $result = $node->psql('SELECT blah FROM pg_stat_replication');    if ($result eq $expected)    {
test_passed= 1;        break;    }    sleep 1;    $timeout++;
 
}
ok($test_passed, $test_name);

What do you think?
-- 
Michael



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Support for N synchronous standby servers - take 2
Next
From: Michael Paquier
Date:
Subject: Re: Support for N synchronous standby servers - take 2