Re: Concurrency issue in pg_rewind - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Concurrency issue in pg_rewind
Date
Msg-id ac2f431b-40dc-ca2b-b8c1-deb3c621b3f6@iki.fi
In response to Re: Concurrency issue in pg_rewind  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: Concurrency issue in pg_rewind
List pgsql-hackers
On 18/09/2020 10:17, Alexander Kukushkin wrote:
> At the same time, pg_rewind due to such "fatal" error leaves PGDATA in
> an inconsistent state with empty pg_control file, this is totally bad
> and easily fixable. We want the specific file to be absent and it is
> already absent, why should it be a fatal error and not warning?

Whenever pg_rewind runs into something unexpected, it fails loudly, so 
that the administrator can re-initialize from a base backup. That's the 
general rule. If a file goes missing while pg_rewind is running, that is 
unexpected. It could be a sign that the server was started concurrently, 
or another pg_rewind was started against it, for example.

I feel that we could make an exception of some sort here, but I'm not 
sure what exactly. I don't feel comfortable just downgrading the 
unexpected ENOENT on unlink() to a warning in all cases. Besides, scary 
warnings that you routinely ignore are not good either.
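
To make the trade-off concrete, here is a minimal sketch in C of what a narrowly scoped exception could look like. It is not pg_rewind code: the RemoveResult type, the remove_file_checked() function and its missing_ok flag are hypothetical names invented for illustration. The idea is that ENOENT is tolerated only where the caller explicitly opts in, and every other failure is still reported as an error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for illustration; not part of pg_rewind. */
typedef enum
{
    REMOVE_OK,              /* file existed and was unlinked */
    REMOVE_ALREADY_GONE,    /* ENOENT, and the caller allowed that */
    REMOVE_FAILED           /* anything else: caller should fail loudly */
} RemoveResult;

/*
 * Remove 'path'. A missing file is accepted only when 'missing_ok' is
 * true; in every other case the error is printed and REMOVE_FAILED is
 * returned so that the caller can abort.
 */
static RemoveResult
remove_file_checked(const char *path, bool missing_ok)
{
    if (unlink(path) == 0)
        return REMOVE_OK;

    if (errno == ENOENT && missing_ok)
        return REMOVE_ALREADY_GONE;

    fprintf(stderr, "could not remove file \"%s\": %s\n",
            path, strerror(errno));
    return REMOVE_FAILED;
}
```

With this shape there is no warning to ignore: the open question is only which call sites, if any, would be allowed to pass missing_ok = true.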

I have a hard time coming up with a general rule and justification 
that's not just "do X because WAL-G does Y". pg_rewind failing because 
WAL-G removed a file unexpectedly is one problem, but another is that 
the restore_command might get confused if pg_rewind removes a file 
that restore_command needs. This is hard when restore_command does 
things in the background, and there's no communication between the 
background process and pg_rewind.

The general principle is that pg_rewind is equivalent to overwriting the 
target with the source, only faster. Perhaps pg_wal should be an 
exception, and pg_rewind should leave alone any files under pg_wal that 
it doesn't recognize.
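
As a rough sketch of that last idea, the C fragment below decides whether a file is one that pg_rewind would be allowed to remove. What "recognize" means is an assumption made here for illustration: a 24-character uppercase-hex WAL segment name, the same name with a ".partial" suffix, or an 8-hex-digit timeline history file ending in ".history". The function names wal_name_is_recognized() and rewind_may_remove() are hypothetical and do not exist in pg_rewind.

```c
#include <stdbool.h>
#include <string.h>

static const char *const HEX_UPPER = "0123456789ABCDEF";

/* True if 'name' is exactly 'hexlen' uppercase hex digits plus 'suffix'. */
static bool
hex_prefix_then(const char *name, size_t hexlen, const char *suffix)
{
    /* strspn() == hexlen guarantees name has at least hexlen characters. */
    return strspn(name, HEX_UPPER) == hexlen &&
           strcmp(name + hexlen, suffix) == 0;
}

/* Assumed definition of a "recognized" file name directly under pg_wal. */
static bool
wal_name_is_recognized(const char *name)
{
    return hex_prefix_then(name, 24, "") ||          /* WAL segment */
           hex_prefix_then(name, 24, ".partial") ||  /* partial segment */
           hex_prefix_then(name, 8, ".history");     /* timeline history */
}

/*
 * 'relpath' is relative to the data directory. Files outside pg_wal follow
 * the normal rule (may be removed); files under pg_wal may be removed only
 * if their name is recognized, otherwise they are left alone.
 */
static bool
rewind_may_remove(const char *relpath)
{
    static const char prefix[] = "pg_wal/";
    size_t      prefixlen = sizeof(prefix) - 1;

    if (strncmp(relpath, prefix, prefixlen) != 0)
        return true;

    return wal_name_is_recognized(relpath + prefixlen);
}
```

Under this sketch, anything in a subdirectory of pg_wal, such as a scratch directory created by a background restore_command, is never recognized and is therefore left untouched.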

- Heikki


