Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: pg_rewind failure by file deletion in source server
Date
Msg-id 55894EB4.4080109@iki.fi
Whole thread Raw
In response to Re: pg_rewind failure by file deletion in source server  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: pg_rewind failure by file deletion in source server  (Fujii Masao <masao.fujii@gmail.com>)
Re: pg_rewind failure by file deletion in source server  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 06/23/2015 07:51 AM, Michael Paquier wrote:
> So... Attached are a set of patches dedicated at fixing this issue:

Thanks for working on this!

> - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
> path does not exist
> - 0002, same with pg_stat_file, returning NULL if file does not exist
> - 0003, same with pg_read_*file. I added them to all the existing
> functions for consistency.
> - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
> (thanks Robert for the naming!)
> - 0005, as things get complex, a set of regression tests aimed to
> covering those things. pg_tablespace_location is platform-dependent,
> so there are no tests for it.
> - 0006, the fix for pg_rewind, using what has been implemented before.

With thes patches, pg_read_file() will return NULL for any failure to 
open the file, which makes pg_rewind to assume that the file doesn't 
exist in the source server, and will remove the file from the 
destination. That's dangerous, those functions should check specifically 
for ENOENT.

There's still a small race condition with tablespaces. If you run CREATE 
TABLESPACE in the source server while pg_rewind is running, it's 
possible that the recursive query that pg_rewind uses sees the symlink 
in pg_tblspc/ directory, but its snapshot doesn't see the row in 
pg_tablespace yet. It will think that the symlink is a regular file, try 
to read it, and fail (if we checked for ENOENT).

Actually, I think we need try to deal with symlinks a bit harder. 
Currently, pg_rewind assumes that anything in pg_tblspace that has a 
matching row in pg_tablespace is a symlink, and nothing else is. I think 
symlinks to directories. I just noticed that pg_rewind fails miserable 
if pg_xlog is a symlink, because of that:

----
The servers diverged at WAL position 0/3023F08 on timeline 1.
Rewinding from last common checkpoint at 0/2000060 on timeline 1

"data-master//pg_xlog" is not a directory
Failure, exiting
----

I think we need to add a column to pg_stat_file output, to indicate 
symbolic links, and add a pg_readlink() function. That still leaves a 
race condition if the type of a file changes, i.e. a file is deleted and 
a directory with the same name is created in its place, but that seems 
acceptable. I don't think PostgreSQL ever does such a thing, so that 
could only happen if you mess with the data directory manually while the 
server is running.

I just realized another problem: We recently learned the hard way that 
some people have files in the data directory that are not writeable by 
the 'postgres' user 
(http://www.postgresql.org/message-id/20150523172627.GA24277@msg.df7cb.de). 
pg_rewind will try to overwrite all files it doesn't recognize as 
relation files, so it's going to fail on those. A straightforward fix 
would be to first open the destination file in read-only mode, and 
compare its contents, and only open the file in write mode if it has 
changed. It would still fail when the files really differ, but I think 
that's acceptable.

I note that pg_rewind doesn't need to distinguish between an empty and a 
non-existent directory, so it's quite silly for it to pass 
include_dot_dirs=true, and then filter out "." and ".." from the result 
set. The documentation should mention the main reason for including "." 
and "..": to distinguish between an empty and non-existent directory.

- Heikki




pgsql-hackers by date:

Previous
From: Sandro Santilli
Date:
Subject: Memory context at PG_init call ?
Next
From: Heikki Linnakangas
Date:
Subject: Re: A couple of newlines missing in pg_rewind log entries