On Wed, Sep 18, 2019 at 04:25:14PM +0530, Kuntal Ghosh wrote:
>Hello Michael,
>
>On Wed, Sep 18, 2019 at 6:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On my side, I have let this thing run for a couple of hours with a
>> patched version to include a sleep between the rename and the sync but
>> I could not reproduce it either:
>> #!/bin/bash
>> attempt=0
>> while true; do
>> attempt=$((attempt+1))
>> echo "Attempt $attempt"
>> cd $HOME/postgres/src/test/recovery/
>> PROVE_TESTS=t/006_logical_decoding.pl make check > /dev/null 2>&1
>> ERRNUM=$?
>> if [ $ERRNUM != 0 ]; then
>> echo "Failed at attempt $attempt"
>> exit $ERRNUM
>> fi
>> done
>I think the failing test is src/test/subscription/t/010_truncate.pl.
>I've tried to reproduce the same failure using your script in OS X
>10.14 and Ubuntu 18.04.2 (Linux version 5.0.0-23-generic), but
>couldn't reproduce the same.
>
I kinda suspect it might be just a coincidence that it fails during that
particular test. What likely plays a role here is a checkpoint timing
(AFAICS that's the thing removing the file). On most systems the tests
complete before any checkpoint is triggered, hence no issue.
Maybe aggressively triggering checkpoints on the running cluter from
another session would do the trick ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services