Thread: Replication with Patroni not working after killing secondary and starting again

Hi,
I am new to Patroni and PostgreSQL. We have set up a cluster with etcd (3 nodes), Patroni (2 nodes) and PostgreSQL (2 nodes), with replication from primary to secondary. It seemed to be working fine, so we started some tests. One of the tests gave us unsatisfactory results. Specifically, we start a long transaction with multiple inserts (we use a remote Java app for that), and while this transaction is running we kill the secondary database as follows:
First we list the processes:
ps aux | egrep '(patroni|postgresql)'
and then use kill -9 on the two processes that have "postgres" in the name. Then the following happens:

1) The Java app continues with the long transaction, which is OK
2) Patroni does not restart automatically on the secondary, which I assume is expected (?)
3) When the transaction finishes and we start Patroni on the secondary using the commands:
systemctl enable patroni
systemctl start patroni

 the database on the secondary starts after a while, but replication from the primary no longer works. New transactions are not replicated either. The same setup replicated data correctly before in other scenarios (e.g. short transactions, killing the secondary, restarting the secondary).

Thus my questions:
1) Is it normal that replication stops working if we kill the secondary postgres and start it again using Patroni? Do we need any additional steps beyond the commands above that start Patroni?
2) Is it normal that Patroni is not started again automatically after we kill it and postgres on the secondary?
3) Or is there something wrong with our setup, and should replication recover automatically after we kill the secondary and start Patroni again on it?

Thanks,

Zbigniew

Re: Replication with Patroni not working after killing secondary and starting again

From
"Peter J. Holzer"
Date:
On 2022-04-27 15:27:34 +0200, Zb B wrote:
> Hi,
> I am new to Patroni and PostgreSQL.We have set up a cluster with etcd (3
> nodes), Patroni (2 nodes) and PostgreSQL (2 nodes) with replication from
> primary to secondary.

Pretty much the setup we have.

> Seemed to be working fine and we started some tests. One of the tests
> gave us unsatisfactory results. Specifically when we start a long
> transaction with multiple inserts (we use remote Java app for that)
> and during execution of this transaction we kill the secondary
> database by using the following:
[...]
>  the database starts on secondary after a while but the replication from the
> primary is not working anymore.
[...]
> Thus my questions:
> 1) Is it normal that replication stops working if we kill secondary postgres
> and start it again using patroni? Do we need to do any additional steps except
> the commands above that start patroni?

No.

When the secondary starts up it should continue replicating from where
it stopped. However, it can only do this if the necessary information is
still available. If WAL files have been deleted in the meantime, it
can't replay them. There should be error messages in your logs on what
went wrong.

> 2) Is it normal that patroni is not started again automatically after we kill
> it and postgres on secondary?

Depends on your system setup. Did you tell systemd to automatically
restart patroni? Patroni obviously can't do anything by itself after
it's been killed.
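For reference, a systemd drop-in along these lines tells systemd to restart the service when it dies (a sketch; the unit name `patroni` and the file path are assumptions, adjust to your installation):

```ini
# /etc/systemd/system/patroni.service.d/restart.conf  (assumed path)
[Service]
Restart=on-failure
RestartSec=5
```

After adding it, run `systemctl daemon-reload` and restart the unit.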

> 3) Or there is something wrong with our setup and the replication should be
> recovered automatically after we kill the secondary and start the patroni again
> on secondary?

I assume (but you really haven't given us enough information, so I could
be wrong) that you haven't configured a replication slot and you haven't
enough WAL segments to last through the downtime naturally.
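If slots aren't in use, keeping more WAL around lets the standby survive longer downtime. A sketch of the relevant settings in Patroni's dynamic configuration (the parameter names are real; the values are only illustrative):

```yaml
postgresql:
  use_slots: true        # have Patroni manage physical replication slots
  parameters:
    wal_keep_size: 1GB   # extra WAL retained for standbys without slots (PG 13+)
```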

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

> When the secondary starts up it should continue replicating from where
> it stopped. However, it can only do this if the necessary information is
> still available. If WAL files have been deleted in the meantime, it
> can't replay them. There should be error messages in your logs on what
> went wrong

I did another test using a different wal_sender_timeout parameter, as the time the secondary was shut down was longer than the default of 60s for this parameter.
I was hoping it would help, but the result was the same (records were not replicated to the secondary after the Patroni start). Well, I have just verified again that the records were replicated to the secondary after about 15 minutes, so probably the timeout setting helped, or I was not patient enough before. Is it normal to wait so long for replication? (The original transaction on the primary took about 5 minutes and inserted about 3000 small records.) I am providing more details for completeness below:

I get the following errors on the primary DB:
2022-04-28 04:36:50.544 EDT [13794] WARNING:  archive_mode enabled, yet archive_command is not set
2022-04-28 04:37:34.893 EDT [14755] ERROR:  replication slot "xyzd3riardb05" does not exist
2022-04-28 04:37:34.893 EDT [14755] STATEMENT:  START_REPLICATION SLOT "xyzd3riardb05" 0/7000000 TIMELINE 18
2022-04-28 04:37:34.915 EDT [14756] ERROR:  replication slot "xyzd3riardb05" does not exist
2022-04-28 04:37:34.915 EDT [14756] STATEMENT:  START_REPLICATION SLOT "xyzd3riardb05" 0/7000000 TIMELINE 18
2022-04-28 04:37:39.925 EDT [14763] ERROR:  replication slot "xyzd3riardb05" does not exist
2022-04-28 04:37:39.925 EDT [14763] STATEMENT:  START_REPLICATION SLOT "xyzd3riardb05" 0/7000000 TIMELINE 18
2022-04-28 04:37:44.924 EDT [14768] ERROR:  replication slot "xyzd3riardb05" does not exist
2022-04-28 04:37:44.924 EDT [14768] STATEMENT:  START_REPLICATION SLOT "xyzd3riardb05" 0/7000000 TIMELINE 18
and after some time these errors stop appearing.

and when I execute:
su postgres -c "psql -c \"SELECT * FROM pg_replication_slots;\""

I get the following; the slot seems to exist:
 slot_name     | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 xyzd3riardb05 |        | physical  |        |          | f         | f      |            |      |              | 0/73289E8   |                     | reserved   |               | f
 pdc2b         |        | physical  |        |          | f         | f      |            |      |              | 0/726D398   |                     | reserved   |               | f
 pdc2b_standby |        | physical  |        |          | f         | f      |            |      |              | 0/726D398   |                     | reserved   |               | f
(3 rows)
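As an aside, restart_lsn values like the ones above can be compared numerically to see how far apart two positions are: an LSN is two hex numbers, the high and low 32 bits of a byte position. A small sketch in plain shell arithmetic:

```shell
# Convert an LSN such as 0/73289E8 into an absolute byte position
lsn_bytes() {
  hi=${1%%/*}
  lo=${1#*/}
  echo $(( (0x$hi << 32) | 0x$lo ))
}

# Byte distance between the two restart_lsn values shown above
echo $(( $(lsn_bytes 0/73289E8) - $(lsn_bytes 0/726D398) ))   # prints 767568
```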

And as I said, I have just verified that the records were replicated to the secondary after about 15 minutes.


Re: Replication with Patroni not working after killing secondary and starting again

From
"Peter J. Holzer"
Date:
On 2022-04-28 11:09:12 +0200, Zb B wrote:
> > When the secondary starts up it should continue replicating from where
> > it stopped. However, it can only do this if the necessary information is
> > still available. If WAL files have been deleted in the meantime, it
> > can't replay them. There should be error messages in your logs on what
> > went wrong
>
> I did another test using different wal_sender_timeout parameter, as the time of
> the secondary being shut down was longer than the default 60s for this
> parameter.

I don't think this will help. It will just make the primary slower in
noticing that the secondary is gone.


> I was hoping it would help but the result was the same (records were not
> replicated to the secondary after the patroni start). Well, I just verified
> again that the records were replicated after about 15 minutes to the secondary,
> so probably the timeout setting helped, or I was not patient enough before.

The latter, I suspect. Although I'm surprised that it takes so long. In
my experience, that takes only a few seconds, certainly less than a
minute for replication to start (how long it takes to finish depends on
the amount of data, of course).

Patroni can nuke the secondary database and create a fresh copy
(using basebackup). That might take 15 minutes (depending on the
database size). I don't think it does that automatically, though. Also I
think you would have noticed that.

What does `patronictl list` show during that interval?


> Is it normal to wait so long for the replication? (the original
> transaction in primary took about 5 minutes and was about 3000 small
> records). I am providing more details for completeness below:
>
> I get the following errors on the primary DB:
> 2022-04-28 04:36:50.544 EDT [13794] WARNING:  archive_mode enabled, yet
> archive_command is not set
> 2022-04-28 04:37:34.893 EDT [14755] ERROR:  replication slot "xyzd3riardb05"
> does not exist
> 2022-04-28 04:37:34.893 EDT [14755] STATEMENT:  START_REPLICATION SLOT
> "xyzd3riardb05" 0/7000000 TIMELINE 18
...
> and after some time such errors stop to appear.

So the replication slot is probably created after some time and then
replication starts to work.

I think that replication slot is managed by Patroni. So the question
would be: Why does Patroni take so long to create it? Did it log
anything?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

> What does `patronictl list` show during that interval?

Well, I can't reproduce the situation anymore. Now replication starts immediately after starting Patroni on the secondary. I did several switchovers in the meantime, though.

Meanwhile I did another test, in which I ran a Java app with a large number of *short* transactions (inserts), and during its execution I ran the Patroni switchover command:

patronictl -c /etc/patroni/patroni.yml switchover

It turned out the records were not replicated to the secondary, and when I tried to execute the switchover command on the primary I got the following error:
Error: This cluster has no master

When I tried to execute the switchover command on the secondary, it worked, but because there was a discrepancy between the primary and secondary, the records on the old primary were rolled back (the number of records on the primary and secondary became the same as it had been on the old secondary).

Apparently there is something wrong with my cluster. How do I debug it? Do I need to configure anything to make the replication synchronous?

 



Re: Replication with Patroni not working after killing secondary and starting again

From
"Peter J. Holzer"
Date:
On 2022-05-04 10:21:56 +0200, Zb B wrote:
> Apparently there is something wrong with my cluster. How do I debug it?
> Do I need to configure anything to make the replication synchronous?

Does https://patroni.readthedocs.io/en/latest/replication_modes.html
help?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

> Does https://patroni.readthedocs.io/en/latest/replication_modes.html
> help?

Thanks. I had found the same page in the meantime. The effects I experienced were caused by the fact that Patroni configures asynchronous replication by default. After changing it to synchronous replication, everything worked as expected.
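
Concretely, the setting involved is Patroni's `synchronous_mode` in the dynamic (DCS) configuration, per the linked docs. A minimal excerpt (everything else left at defaults):

```yaml
# set via `patronictl edit-config`, or in the bootstrap.dcs section of patroni.yml
synchronous_mode: true
```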