Dear Alvaro,
Thanks for your answers. I was unaware of the shutdown record, and that
does make a difference. So I definitely must stop the primary first,
then use pg_controldata to obtain its checkpoint info. After that, can I
check on the replicas, while they are still up and running, whether
they have received the shutdown record? In other words, after shutting
down the primary, how will I know that a replica has received that
record and is safe to shut down?
Thanks for the clarifications.
Best regards,
Richard
On 2025-02-19 16:54, Álvaro Herrera wrote:
> On 2025-Feb-19, richard@kojedz.in wrote:
>
>> With this, I have the question, that after the shutdown of primary,
>> what is the guarantee for replicas having the same checkpoint
>> location? Why does the order of shutting down the servers matter? What
>> would be the really exact and reliable way to ensure that replicas
>> will have the same checkpoint location as the primary?
>
> The replicas can't write WAL by themselves, but they will replay
> whatever the primary has sent; by shutting down the primary first and
> letting the replicas catch up, you ensure that the replicas will
> actually receive the shutdown record and replay it. If you shut down
> the replicas first, they can obviously never catch up with the shutdown
> checkpoint of the primary.
>
> As I recall, if you do shut down the primary first, one potential
> danger is that the primary fails to send the checkpoint record before
> shutting down, so the replicas won't receive it and obviously will not
> replay it; or simply that they are behind enough that they receive it
> but don't replay it.
>
> You could use pg_controldata to read the last checkpoint info from all
> nodes. You can run it on the primary after shutting it down, and then
> on each replica while it's still running to ensure that the correct
> restartpoint has been created.
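
The comparison described above could be sketched roughly as follows.
This is only an illustration under assumptions, not a tested procedure:
it assumes pg_controldata is on PATH, that its output is untranslated
(hence LC_ALL=C), and that matching "Latest checkpoint location" values
are the criterion to check. The helper names (read_controldata,
parse_controldata, lsn_to_int, same_checkpoint) are made up for this
example.

```python
import os
import subprocess


def read_controldata(data_dir):
    """Run pg_controldata on a data directory and return its raw output."""
    env = dict(os.environ, LC_ALL="C")  # assumption: force untranslated labels
    result = subprocess.run(
        ["pg_controldata", "-D", data_dir],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout


def parse_controldata(text):
    """Turn 'Label:   value' lines into a dict keyed by label."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def lsn_to_int(lsn):
    """Convert an LSN such as '0/3000028' into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def same_checkpoint(primary_text, replica_text):
    """True if both outputs report the same latest checkpoint location."""
    key = "Latest checkpoint location"
    primary_lsn = parse_controldata(primary_text)[key]
    replica_lsn = parse_controldata(replica_text)[key]
    return lsn_to_int(primary_lsn) == lsn_to_int(replica_lsn)
```

Usage would be along the lines of calling read_controldata() once on
the stopped primary's data directory and once on each running replica's
data directory (the paths depend on the installation), then repeating
same_checkpoint() for a replica until it returns True before shutting
that replica down.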