Re: Replication with Patroni not working after killing secondary and starting again - Mailing list pgsql-general
From | Zb B |
---|---|
Subject | Re: Replication with Patroni not working after killing secondary and starting again |
Date | |
Msg-id | CAKwARkbqwVc35dZWFLvrwL_6FxvwJSq-UEzFareEcoLvqqNYsA@mail.gmail.com |
In response to | Re: Replication with Patroni not working after killing secondary and starting again ("Peter J. Holzer" <hjp-pgsql@hjp.at>) |
Responses | Re: Replication with Patroni not working after killing secondary and starting again |
List | pgsql-general |
> What does `patronictl list` show during that interval?
Well, I can't reproduce the situation anymore. Replication now starts immediately after I start Patroni on the secondary. I have run several switchover commands in the meantime, though.
I also ran another test: a Java app executes a large number of *short* transactions (inserts), and while it is running I issue the Patroni switchover command:
patronictl -c /etc/patroni/patroni.yml switchover
It turned out that the records were not replicated to the secondary, and when I tried to execute the switchover command on the primary I got the following error:
Error: This cluster has no master
When I ran the switchover command on the secondary, it worked. But because the primary and the secondary had diverged, the records on the old primary were rolled back: the number of records on both nodes became the same, equal to what it had been on the old secondary.
Apparently there is something wrong with my cluster. How do I debug it? Do I need to configure anything to make the replication synchronous?
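For reference, a minimal sketch of the checks that could narrow this down. It assumes `psql` can connect locally as a superuser on the primary and that the Patroni config lives at the path used in the command above. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the config path is the one
# from the switchover command above and can be overridden via PATRONI_CFG.
PATRONI_CFG=${PATRONI_CFG:-/etc/patroni/patroni.yml}

# Which member does Patroni currently consider leader, and which replica?
cluster_members() {
    patronictl -c "$PATRONI_CFG" list
}

# Run on the primary: one row per connected standby.
# sync_state stays 'async' unless synchronous replication is configured.
standby_status() {
    psql -X -At -F ' | ' -c \
        "SELECT application_name, state, sync_state, sent_lsn, replay_lsn
           FROM pg_stat_replication;"
}

# Run on the primary: do the replication slots exist, and are they in use?
slot_status() {
    psql -X -At -F ' | ' -c \
        "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
}
```

If `standby_status` returns no rows while the app is inserting, the secondary is not streaming at all. If it shows `async`, commits on the primary do not wait for the secondary. Patroni's dynamic configuration has a `synchronous_mode` setting for that case; whether it is the right fix here is an open question.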
On Fri, 29 Apr 2022 at 22:33, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2022-04-28 11:09:12 +0200, Zb B wrote:
> > When the secondary starts up it should continue replicating from where
> > it stopped. However, it can only do this if the necessary information is
> > still available. If WAL files have been deleted in the meantime, it
> > can't replay them. There should be error messages in your logs on what
> > went wrong.
>
> I did another test using a different wal_sender_timeout value, as the
> secondary was shut down for longer than this parameter's default of 60s.
I don't think this will help. It will just make the primary slower in
noticing that the secondary is gone.
> I was hoping it would help but the result was the same (records were not
> replicated to the secondary after the patroni start). Well, I just verified
> again that the records were replicated after about 15 minutes to the secondary,
> so probably the timeout setting helped, or I was not patient enough before.
The latter, I suspect. Although I'm surprised that it takes so long. In
my experience, that takes only a few seconds, certainly less than a
minute for replication to start (how long it takes to finish depends on
the amount of data, of course).
Patroni can nuke the secondary database and create a fresh copy
(using basebackup). That might take 15 minutes (depending on the
database size). I don't think it does that automatically, though. Also I
think you would have noticed that.
What does `patronictl list` show during that interval?
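One way to capture that, as a rough sketch: sample `patronictl list` with a timestamp while the secondary is being restarted. The function name `watch_cluster` and its two arguments (number of samples, seconds between samples) are made up for illustration; the config path is the one used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: print a timestamp followed by `patronictl list`, repeated
# `samples` times with `interval` seconds between samples.
watch_cluster() {
    samples=${1:-60}
    interval=${2:-5}
    i=0
    while [ "$i" -lt "$samples" ]; do
        date '+%Y-%m-%d %H:%M:%S'
        patronictl -c "${PATRONI_CFG:-/etc/patroni/patroni.yml}" list
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_cluster 180 5 | tee patroni-list.log` would cover the 15 minutes mentioned above at 5-second resolution.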
> Is it normal to wait so long for the replication? (the original
> transaction in primary took about 5 minutes and was about 3000 small
> records). I am providing more details for completeness below:
>
> I get the following errors on the primary DB:
> 2022-04-28 04:36:50.544 EDT [13794] WARNING: archive_mode enabled, yet
> archive_command is not set
> 2022-04-28 04:37:34.893 EDT [14755] ERROR: replication slot "xyzd3riardb05"
> does not exist
> 2022-04-28 04:37:34.893 EDT [14755] STATEMENT: START_REPLICATION SLOT
> "xyzd3riardb05" 0/7000000 TIMELINE 18
...
> and after some time such errors stop appearing.
So the replication slot is probably created after some time and then
replication starts to work.
I think that replication slot is managed by Patroni. So the question
would be: Why does Patroni take so long to create it? Did it log
anything?
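To line the two logs up, a small sketch that pulls the timestamps of the "replication slot ... does not exist" errors out of the PostgreSQL log, so the last occurrence can be compared with what Patroni logged at that moment. It assumes each log entry is on one line in the `timestamp [pid] LEVEL:` layout seen in the excerpt above; `slot_errors` is a made-up name.

```shell
#!/bin/sh
# Sketch: read a PostgreSQL log on stdin and print "<timestamp> <slot name>"
# for every "replication slot ... does not exist" error.
slot_errors() {
    grep -E 'ERROR: +replication slot "[^"]+" does not exist' |
        sed -E 's/^(.*) \[[0-9]+\] ERROR: +replication slot "([^"]+)".*$/\1 \2/'
}
```

Running it as `slot_errors < postgresql.log | tail -n 1` (the file name is a placeholder) gives the time of the last failed attempt, which should be shortly before the slot was created.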
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"