Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.
Date
Msg-id CAD21AoCO+eDr6xZD4Hvk5VE-V9wLZ0VTOD8b6BqYKYHXc5_3iQ@mail.gmail.com
Whole thread Raw
In response to Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Apr 14, 2016 at 11:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Apr 13, 2016 at 4:54 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii@postgresql.org> wrote:
>>>> Add regression tests for multiple synchronous standbys.
>>>>
>>>> Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
>>>> Reviewed-By: Kyotaro Horiguchi
>>>
>>> Well, we are not quite there yet:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-04-12%2016%3A00%3A06
>>>
>>> # Running: pg_ctl -D
>>> /home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
>>> reload
>>> server signaled
>>> not ok 2 - asterisk in synchronous_standby_names
>>>
>>> #   Failed test 'asterisk in synchronous_standby_names'
>>> #   at t/007_sync_rep.pl line 26.
>>> #          got: 'standby1|1|sync
>>> # standby2|1|potential
>>> # standby3|0|async'
>>> #     expected: 'standby1|1|sync
>>> # standby2|1|potential
>>> # standby3|1|potential'
>>
>> This seems to be a timing issue.
>>
>> There can be small window after SIGHUP is sent before walsender updates
>> its priority based on new s_s_names. If pg_stat_replication is checked
>> before that update, it displays unexpected output. Probably we need to
>> sleep a few second after pg_ctl reload before pg_stat_replication.
>
> Yes. I'd prefer avoid a hardcoded sleep and have something that relies
> on lookups of pg_stat_replication though, because there is no way to
> be sure that a sleep will ever be stable on heavily loaded machines,
> like the machines I am specialized in maintaining :)
> It kills a bit the purpose on having checks on pg_stat_replication as
> the validation tests are based on that, still I think that we had
> better base those checks on a loop that has a timeout, something like
> that in a subroutine:
> test_passed = 0;
> while ($timeout < 30)
> {
>      $result = $node->psql('SELECT blah FROM pg_stat_replication');
>      if ($result eq $expected)
>      {
>          test_passed = 1;
>          break;
>      }
>      sleep 1;
>      $timeout++;
> }
> ok($test_passed, $test_name);
>
> What do you think?

Look good to me. +1.

Regards,

--
Masahiko Sawada



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Wrong definition of pgwin32_bind.
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Support for N synchronous standby servers - take 2