Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_rewind failure by file deletion in source server
Date
Msg-id CAB7nPqT=tgOgr3-9pxRy-ABfYYms0rTan5R6f2nEV4rjOz9jdg@mail.gmail.com
Whole thread Raw
In response to Re: pg_rewind failure by file deletion in source server  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_rewind failure by file deletion in source server  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Fri, Jun 19, 2015 at 9:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 19, 2015 at 8:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 19, 2015 at 12:14 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>>> Listing the directories with pg_ls_dir() has the same problem.
>>>
>>> (After some discussion on IM with Heikki on this one).
>>> This is actually more tricky because pg_ls_dir() does not return '.'
>>> or '..' that we could use to identify that the directory actually
>>> exists or not when it is empty. Hence I think that we should add two
>>> options to pg_ls_dir:
>>> - include_self, default to false. If set to true, '.' is added in the
>>> list of items.
>>> - if_not_exists, to bypass error that a folder does not exist, default
>>> at false. If if_not_exists = true and include_self = true, returning
>>> only '.' would mean that the folder exist but that it is empty. If
>>> if_not_exists = true and include_self = false, no rows are returned.
>>> We could as well ERROR as well if both options are set like that. I am
>>> fine with any of them as long as behavior is properly documented.
>>
>> Including '.' to distinguish between an empty directory and a
>> nonexistent one seems like an unnecessarily complicated and
>> non-obvious API.  How about just one additional parameter bool
>> *exists.  If NULL and no directory, ERROR, else on return set *exists
>> to true or false.
>
> Err, wait.  You're talking about an SQL function, heh heh.  So that
> won't work.  Maybe what you proposed is the best we can do, then,
> although I would suggest that you rename the include_self parameter to
> something like include_dot_dirs and return both "." and "..".
> Returning only "." seems like it will seem weird to people.

So... Attached are a set of patches dedicated at fixing this issue:
- 0001, add if_not_exists to pg_tablespace_location, returning NULL if
path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing
functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
(thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed to
covering those things. pg_tablespace_location is platform-dependent,
so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

Attached is an updated test case triggering the issue
(rewind_test.bash), with the small patch attached that adds a pg_sleep
call in pg_rewind.c (20150623_pg_rewind_sleep.patch).

I imagine that this is a bug people are going to meet in the field
easily, particularly with temporary relation files or temporary XLOG
files.
Regards,
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: A couple of newlines missing in pg_rewind log entries
Next
From: Fabien COELHO
Date:
Subject: Re: checkpointer continuous flushing