Re: run pg_rewind on an uncleanly shut down cluster. - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: run pg_rewind on an uncleanly shut down cluster.
Date
Msg-id CAB7nPqQNML02GENNkdmzJuDYwSFct7YJLzsPxhnnELd2LhZd3A@mail.gmail.com
Whole thread Raw
In response to run pg_rewind on an uncleanly shut down cluster.  (Oleksii Kliukin <alexk@hintbits.com>)
Responses Re: run pg_rewind on an uncleanly shut down cluster.  (Oleksii Kliukin <alexk@hintbits.com>)
Re: run pg_rewind on an uncleanly shut down cluster.  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, Oct 6, 2015 at 12:41 AM, Oleksii Kliukin <alexk@hintbits.com> wrote:
> pg_rewind -D postgresql0 --source-server="host=127.0.0.1 port=5433
> dbname=postgres"
> The servers diverged at WAL position 0/3000060 on timeline 1.
> could not open file "data/postgresql0/pg_xlog/000000010000000000000002":
> No such file or directory
>
> Note that this problem happens not 100% of time during the tests,
> sometimes pg_rewind can actually rewind the former master.

I don't think that there is any actual reason preventing us from
rewinding a node that has its state in pg_control set as something
else than DB_SHUTDOWNED, the important point here is to be sure that
the target node is *not* running while pg_rewind is running (perhaps
pg_rewind should perform an action in the target node to not have it
run, let's say that it creates a fake postgresql.conf with invalid
data and renames the existing one). Checking pg_control makes things
easier though, there is no need to rely on external binaries like
"pg_ctl status" or some parsing of postmaster.pid with kill(pid, 0)
for example.

> I know I can copy the segment back from the archive, but I'd like to
> avoid putting this logic into the failover tool if possible. Is there
> anything we can do to avoid the problem described above, or is there a
> better way to bring up the former master after the crash with pg_rewind?

Well, for 9.5 (and actually the same applies to the 9.3 and 9.4
version on github because I am keeping the previous versions
consistent with what is in 9.5), I guess no.

This missing segment is going to be needed in any case because the
list of blocks modified needs to be found, hence the question is "how
can pg_rewind guess where a WAL segment missing from the target's
pg_xlog is located?". And there are multiple answers:
- An archive path, then let's add an option to pg_rewind to add a
path, though this needs the archive path to be mounted locally, and
usually that's not the case.
- An existing node of the cluster, perhaps the segment is still
present on another standby node that already replayed it, though this
would need an extra node.
- The source node itself, if we are lucky the missing segment created
before WAL forked is still there. It may not be there though if it has
already been recycled.
At the end it seems to me that this is going to need some extra
operation by the failover tool or the system administrator either way,
and that any additional logic to check where this segment is located
is never going to satisfy completely the failover use cases. Hence I
would keep just pg_rewind out of that.
Regards,
-- 
Michael



pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: Multi-tenancy with RLS
Next
From: Oleksii Kliukin
Date:
Subject: Re: run pg_rewind on an uncleanly shut down cluster.