Re: [HACKERS] Still another race condition in recovery TAP tests - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Still another race condition in recovery TAP tests
Date
Msg-id CAB7nPqSnPRk-L_2bXdm0tOFiphZf1a=N+-+j=X2A9Bva=AZ_eA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Still another race condition in recovery TAP tests  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Still another race condition in recovery TAP tests  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Sep 10, 2017 at 12:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sat, Sep 9, 2017 at 1:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, even if we fixed this particular call site, I'm sure the issue
>>> would come up again.  Certainly we expect hot backups to work with
>>> a changing source directory.
>
>> In short, I'd still like to keep RecursiveCopy for now, but change its
>> code so as a copy() is not a hard failure. What do you think?
>
> The specific case we need to allow is "ENOENT on a file/dir that was
> there a moment ago".  I think it still behooves us to complain about
> anything else.  If you think it's a simple fix, have at it.  But
> I see at least three ways for _copypath_recurse to fail depending on
> exactly when the file disappears.

I have finally been able to take a couple of hours and looked at
problems in this module.

Errno is supported down to 5.8.8 per the docs, and those are set also
on Windows, copy() and opendir() complain with ENOENT for missing
entries:
http://perldoc.perl.org/5.8.8/Errno.html
readdir() does not though. If for example a directory gets removed in
the middle of a scan, then an empty list is returned. mkdir() would
not fail on ENOENT either, the parent directory would have already
been created if a child repository is being scanned.

With the check for -d and -f, this brings indeed the count to three
code paths. With the patch attached, I have added some manual sleep
calls in RecursiveCopy.pm and triggered manual deletions of the source
repository. The copy is able to complete in any case, warning about
missing directories or files. I have added warn messages in the patch
when ENOENT is triggered, though I think that those should be removed
in the final patch.

By the way, 010_logical_decoding_timelines.pl does not need to include
RecursiveCopy.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] The case for removing replacement selection sort
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Setting pd_lower in GIN metapage