Re: pg_rewind failure by file deletion in source server - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_rewind failure by file deletion in source server
Date
Msg-id CAB7nPqRu_w+Yck0DQTSMxiCV-pbW0D+SYcO6mVZi1fnPbL9uVA@mail.gmail.com
Whole thread Raw
In response to Re: pg_rewind failure by file deletion in source server  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_rewind failure by file deletion in source server
List pgsql-hackers
On Thu, Jun 11, 2015 at 5:48 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Jun 11, 2015 at 2:14 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Jun 11, 2015 at 1:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Shouldn't pg_rewind ignore that failure of operation? If the file is not
>>> found in source server, the file doesn't need to be copied to destination
>>> server obviously. So ISTM that pg_rewind safely can skip copying that file.
>>> Thought?
>>
>> I think that you should fail. Let's imagine that the master to be
>> rewound has removed a relation file before being stopped cleanly after
>> its standby has been promoted that was here at the last checkpoint
>> before forking, and that the standby still has the relation file after
>> promotion. You should be able to copy it to be able to replay WAL on
>> it. If the standby has removed a file in the file map after taking the
>> file map, I guess that the best thing to do is fail because the file
>> that should be here for the rewound node cannot be fetched.
>
> In this case, why do you think that the file should exist in the old master?
> Even if it doesn't exist, ISTM that the old master can safely replay the WAL
> records related to the file when it restarts. So what's the problem
> if the file doesn't exist in the old master?

Well, some user may want to rewind the master down to the point where
WAL forked, and then recover it immediately when a consistent point is
reached just at restart instead of replugging it into the cluster. In
this case I think that you need the relation file of the dropped
relation to get a consistent state. That's still cheaper than
recreating a node from a fresh base backup in some cases, particularly
if the last base backup taken is far in the past for this cluster.

>> Documentation should be made clearer about that with a better error
>> message...
>
> I'm wondering how we can recover (or rewind again) the old master from
> that error. This also would need to be documented if we decide not to
> fix any code regarding this problem...

FWIW, here is a scenario able to trigger the error with 1 master (port
5432, data at ~/data/5432) and 1 standby (port 5433, data at
~/data/5433).
$ psql -c 'create table aa as select generate_series(1,1000000)'
# Promote standby
$ pg_ctl promote -D ~/data/5433/
# Drop table on master
$ psql -c 'drop table aa'
DROP TABLE
$ pg_ctl stop -D ~/data/5432/

At this point there is no more relation file on master for 'aa', it is
still present on standby. Running pg_rewind at this point will work,
the relation file would be copied from the promoted standby to master.

$ lldb -- pg_rewind -D 5432 --source-server="port=5433 dbname=postgres"
Breakpoint pg_rewind after fetchSourceFileList() and before replaying
the changes from the block map, drop table 'aa' on standby and
checkpoint it, then the source file list is inconsistent and pg_rewind
will fail. This can just happen with --source-server, with
--source-pgdata

Adding a sleep() of a couple of seconds in pg_rewind may be better to
trigger directly the error ;), with DROP DATABASE for example.

Regards,
-- 
Michael



pgsql-hackers by date:

Previous
From: Naoya Anzai
Date:
Subject: Re: Comfortably check BackendPID with psql
Next
From: Fujii Masao
Date:
Subject: Re: pg_rewind failure by file deletion in source server