On 3/26/22 22:37, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> I went over the patch again, polished the commit message a bit, and
>> pushed. May the buildfarm be merciful!
>
> Initial results aren't that great. komodoensis[1], petalura[2],
> and snapper[3] have all shown variants of
>
> # Failed test 'partitions with different replica identities not replicated correctly'
> # at t/031_column_list.pl line 734.
> # got: '2|4|
> # 4|9|'
> # expected: '1||5
> # 2|4|
> # 3||8
> # 4|9|'
> # Looks like you failed 1 test of 34.
> [18:19:36] t/031_column_list.pl ...............
> Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/34 subtests
>
> snapper reported different actual output than the other two:
> # got: '1||5
> # 3||8'
>
> The failure seems intermittent, as both komodoensis and petalura
> have also passed cleanly since the commit (snapper's only run once).
>
> This smells like an uninitialized-variable problem, but I've had
> no luck finding any problem under valgrind. Not sure how to progress
> from here.
>
I think I see the problem - there's a CREATE SUBSCRIPTION but the test
is not waiting for the tablesync to complete, so sometimes it finishes
in time and sometimes not. That'd explain the flaky behavior, and it's
just this one test that misses the sync AFAICS.
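
Something like the following, borrowed from the idiom the other
subscription TAP tests use, should be enough. This is only a sketch, not
the actual fix: I'm assuming the subscriber node variable is
$node_subscriber and that the wait goes right after the CREATE
SUBSCRIPTION in question.

```perl
# Sketch only: block until every table in the subscription has finished
# its initial sync ('s' = synchronized, 'r' = ready) before checking data.
# $node_subscriber is assumed to be the subscriber node used by the test.
my $synced_query =
  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
  or die "Timed out while waiting for subscriber to synchronize data";
```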

FWIW I did run this under valgrind a number of times, and also on
various ARM machines that tend to trip over memory issues.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company