Thread: pg_rewind after promote

pg_rewind after promote

From

Emond Papegaaij

Date:

28 March 2024, 14:52:52

Hi,

We develop an application that uses PostgreSQL in combination with Pgpool as a database backend for a Jakarta EE application (on WildFly). This application supports running in a clustered setup with 3 nodes, providing both high availability and load balancing. Every node runs an instance of the database, a pgpool and the application server. Pgpool manages the PostgreSQL replication using async streaming replication, with 1 primary and 2 standby nodes.

The versions used are (containerized on debian:bullseye-slim):

PostgreSQL version 12.18

Pgpool2 version 4.5.0

The problem we are seeing happens during planned maintenance, for example, when updates are installed and the hosts need to reboot. We take the hosts out of the cluster one at a time, perform the updates and reboot, and bring the host back into the cluster. If the host that needs to be taken out has the role of the primary database, we need to perform a failover. For this, we perform several steps:

* we detach the primary database backend, forcing a failover

* pgpool selects a new primary database and promotes it

* the other 2 nodes (the old primary and the other standby) are rewound and streaming is resumed from the new primary

* the node that needed to be taken out of the cluster (the old primary) is shut down and rebooted

This works fine most of the time, but sometimes we see this message on one of the nodes:

pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required

This message seems timing related, as the first node might report that, while the second reports something like:

pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
pg_rewind: Done!
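
The two outcomes quoted above can be told apart mechanically before deciding whether to resume streaming. A minimal sketch, assuming the captured pg_rewind output contains the English messages shown in this thread (the wording may differ between versions or locales); the function name and the returned keys are hypothetical:

```python
import re

# Message fragments copied from the pg_rewind output quoted in this thread.
NO_REWIND = "no rewind required"
DIVERGED = re.compile(
    r"servers diverged at WAL location (\S+) on timeline (\d+)"
)


def classify_pg_rewind_output(output: str) -> dict:
    """Classify captured pg_rewind output into the cases seen in this thread."""
    match = DIVERGED.search(output)
    if match:
        # A divergence point was found; "Done!" indicates the rewind finished.
        status = "rewound" if "Done!" in output else "diverged"
        return {
            "status": status,
            "lsn": match.group(1),
            "timeline": int(match.group(2)),
        }
    if NO_REWIND in output:
        # Right after a promotion this result may be premature, so the
        # caller should not treat it as proof that streaming will work.
        return {"status": "no_rewind", "lsn": None, "timeline": None}
    return {"status": "unknown", "lsn": None, "timeline": None}
```

A failover script could retry pg_rewind after a short delay when the status is "no_rewind" immediately after a promotion, instead of resuming streaming right away.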

If we ignore the response from pg_rewind, streaming will break on the node that reported no rewind was required. On the new primary, we do observe the database moving from timeline 21 to 22, but it seems to take some time before the new timeline becomes observable by pg_rewind. This window, in which the new timeline exists but is not yet observed by pg_rewind, makes our failover much less reliable. So, I've got 2 questions:

1. Is my observation about the starting of a new timeline correct?

2. If yes, is there anything we can do to block the promotion process until the new timeline has fully materialized, either by waiting or, preferably, by forcing the new timeline to be started?
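
The "waiting" option in the second question could be sketched as a polling loop on the new primary. This is only a sketch under assumptions: that pg_rewind on this version takes the source timeline from the control file, and that `SELECT timeline_id FROM pg_control_checkpoint()` reports that value. `fetch_timeline` is a hypothetical callable that runs the query with whatever client the failover script already uses:

```python
import time
from typing import Callable

# Assumed query: pg_control_checkpoint() reports the timeline of the last
# checkpoint recorded in the control file.
TIMELINE_QUERY = "SELECT timeline_id FROM pg_control_checkpoint()"


def wait_for_timeline(
    fetch_timeline: Callable[[], int],
    old_timeline: int,
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the reported timeline is newer than old_timeline.

    fetch_timeline is expected to run TIMELINE_QUERY on the new primary
    and return the result as an int. Raises TimeoutError on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = fetch_timeline()
        if current > old_timeline:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"timeline still {current} after {timeout} seconds"
            )
        sleep(interval)
```

For example, with the timelines from this report, `wait_for_timeline(fetch, old_timeline=21)` would return once the query reports 22, and only then would the script run pg_rewind on the other nodes.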

Best regards,

Emond Papegaaij

Re: pg_rewind after promote

From

Laurenz Albe

Date:

28 March 2024, 15:21:55

On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:

>  * we detach the primary database backend, forcing a failover
>  * pgpool selects a new primary database and promotes it
>  * the other 2 nodes (the old primary and the other standby) are rewound
>    and streaming is resumed from the new primary
>  * the node that needed to be taken out of the cluster (the old primary)
>    is shutdown and rebooted
>
> This works fine most of the time, but sometimes we see this message on one of the nodes:
> pg_rewind: source and target cluster are on the same timeline
> pg_rewind: no rewind required
> This message seems timing related, as the first node might report that,
> while the second reports something like:
> pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
> pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
> pg_rewind: Done!
>
> If we ignore the response from pg_rewind, streaming will break on the node that reported
> no rewind was required. On the new primary, we do observe the database moving from timeline
> 21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.
>
> 1. Is my observation about the starting of a new timeline correct?
> 2. If yes, is there anything we can do during to block promotion process until the new
>    timeline has fully materialized, either by waiting or preferably forcing the new
>    timeline to be started?

This must be the problem addressed by commit 009eeee746 [1].

You'd have to upgrade to PostgreSQL v16, which would be a good idea anyway, given
that you are running v12.

A temporary workaround could be to explicitly trigger a checkpoint right after
promotion.

Yours,
Laurenz Albe


 [1]. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=009eeee746825090ec7194321a3db4b298d6571e

Re: pg_rewind after promote

From

Emond Papegaaij

Date:

28 March 2024, 16:17:08

On Thu, 28 Mar 2024 at 16:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:
> > This works fine most of the time, but sometimes we see this message on one of the nodes:
> > pg_rewind: source and target cluster are on the same timeline
> > pg_rewind: no rewind required
> > This message seems timing related, as the first node might report that,
> > while the second reports something like:
> > pg_rewind: servers diverged at WAL location 5/F28AB1A8 on timeline 21
> > pg_rewind: rewinding from last common checkpoint at 5/F27FCA98 on timeline 21
> > pg_rewind: Done!
> >
> > If we ignore the response from pg_rewind, streaming will break on the node that reported
> > no rewind was required. On the new primary, we do observe the database moving from timeline
> > 21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.
>
> This must be the problem addressed by commit 009eeee746 [1].

Thanks for the quick help!

This commit does seem to exactly address the problem we are seeing. Great to hear it's fixed in the latest version!

> You'd have to upgrade to PostgreSQL v16, which would be a good idea anyway, given
> that you are running v12.

This is quite high on our roadmap. We were at v12 when we introduced our HA setup. Before then, upgrading PostgreSQL was as simple as running pg_upgrade, but now we need to deal with upgrading an entire cluster. We are thinking about setting up logical replication to a single v16 node and resyncing the cluster from that node. We will make sure to upgrade before v12 is EOL (November this year).

> A temporary workaround could be to explicitly trigger a checkpoint right after
> promotion.

Would this be as simple as sending a CHECKPOINT to the new primary just after promoting? This would work fine for us until we've migrated to v16.

Best regards,

Emond Papegaaij

Re: pg_rewind after promote

From

Laurenz Albe

Date:

28 March 2024, 22:01:52

On Thu, 2024-03-28 at 17:17 +0100, Emond Papegaaij wrote:
> On Thu, 28 Mar 2024 at 16:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Thu, 2024-03-28 at 15:52 +0100, Emond Papegaaij wrote:
> > > pg_rewind: source and target cluster are on the same timeline
> > > pg_rewind: no rewind required
> > >
> > > If we ignore the response from pg_rewind, streaming will break on the node that reported
> > > no rewind was required. On the new primary, we do observe the database moving from timeline
> > > 21 to 22, but it seems this takes some time to materialize to be observable by pg_rewind.
> >
> > This must be the problem addressed by commit 009eeee746 [1]. 
> >
> > A temporary workaround could be to explicitly trigger a checkpoint right after
> > promotion.
>
> Would this be as simple as sending a CHECKPOINT to the new primary just after promoting?
> This would work fine for us until we've migrated to v16.

Yes, that would be the idea.
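
As an illustration of that workaround, a failover script could issue the CHECKPOINT through psql right after the promotion and before pg_rewind runs on the other nodes. A minimal sketch, assuming psql is on the PATH and that the connecting role is allowed to run CHECKPOINT (a superuser on v12); the function names, the default user and database, and the host name in the usage note are hypothetical:

```python
import subprocess
from typing import Callable, List


def checkpoint_command(
    host: str,
    port: int = 5432,
    user: str = "postgres",
    dbname: str = "postgres",
) -> List[str]:
    """Build a psql invocation that sends CHECKPOINT to the given server."""
    return [
        "psql",
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-d", dbname,
        "-X",  # do not read ~/.psqlrc
        "-q",  # quiet mode
        "-c", "CHECKPOINT",
    ]


def checkpoint_after_promote(
    host: str,
    run: Callable[..., object] = subprocess.run,
    **connection: object,
) -> None:
    """Run CHECKPOINT on the newly promoted primary; raise if psql fails."""
    run(checkpoint_command(host, **connection), check=True)
```

Calling `checkpoint_after_promote("new-primary.example")` once promotion has completed would then block until the checkpoint is written, since psql returns only after CHECKPOINT finishes.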

Yours,
Laurenz Albe