Re: very long secondary->primary switch time - Mailing list pgsql-general

From	Tom Lane
Subject	Re: very long secondary->primary switch time
Date	April 27, 2021 20:16:01
Msg-id	2720706.1619554561@sss.pgh.pa.us
In response to	very long secondary->primary switch time (Tomas Pospisek <tpo2@sourcepole.ch>)
Responses	Re: very long secondary->primary switch time
List	pgsql-general

Tomas Pospisek <tpo2@sourcepole.ch> writes:
> I maintain a postgresql cluster that does failover via patroni. The 
> problem is that after a failover happens it takes the secondary too long 
> (that is about 35min) to come up and answer queries. The log of the 
> secondary looks like this:

> 04:00:29.777 [9679] LOG:  received promote request
> 04:00:29.780 [9693] FATAL:  terminating walreceiver process due to administrator command
> 04:00:29.780 [9679] LOG:  invalid record length at 320/B95A1EE0: wanted 24, got 0
> 04:00:29.783 [9679] LOG:  redo done at 320/B95A1EA8
> 04:00:29.783 [9679] LOG:  last completed transaction was at log time 2021-03-03 03:57:46.466342+01

> 04:35:00.982 [9679] LOG:  selected new timeline ID: 15
> 04:35:01.404 [9679] LOG:  archive recovery complete
> 04:35:02.337 [9662] LOG:  database system is ready to accept connections

> The cluster is "fairly large" with thousands of DBs (sic!) and ~1TB of data.

Hm.  WAL replay is already done at the "redo done" entry.  There is a
checkpoint after that, I believe, and there may be some effort to search
for dead files as well.  Still, if your I/O subsystem is better than
a wet noodle, 35 minutes is a long time to finish that.
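
To put a number on the gap: the time that has to be accounted for runs from the "redo done" entry to the "selected new timeline ID" entry in the quoted log. A minimal sketch of that arithmetic follows. It assumes both timestamps fall on the same day with no clock or DST change in between, since the log prefix carries only the time of day.

```python
from datetime import datetime

# Time-of-day stamps copied from the quoted log.
# Assumption: both entries are on the same day (no midnight or DST crossing).
REDO_DONE = "04:00:29.783"
NEW_TIMELINE = "04:35:00.982"


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two HH:MM:SS.fff stamps."""
    fmt = "%H:%M:%S.%f"  # %f accepts the 3-digit millisecond field
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


gap = gap_seconds(REDO_DONE, NEW_TIMELINE)
minutes, seconds = divmod(gap, 60)
print(f"{gap:.3f} s = {int(minutes)} min {seconds:.3f} s")
```

This prints `2071.199 s = 34 min 31.199 s`. Essentially all of the reported ~35 minutes is therefore spent after WAL replay has finished.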

One thing I'm not sure about is whether we try to do the checkpoint
at maximum speed.  If you have set GUC options to throttle checkpoint
I/O hard, that could perhaps explain this.

You could possibly learn more by strace'ing the startup process to
see what it's doing.
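
In the quoted log the startup process is the one logging as PID 9679 ("received promote request", "redo done", "selected new timeline ID"). A hypothetical helper for locating it and building the strace invocation is sketched below. It assumes Linux, and it assumes the process title visible in /proc/<pid>/cmdline contains both "postgres" and "startup". The exact title text varies by version and packaging, so check the output of `ps` by hand as well. Attaching usually requires root or the postgres user, subject to the system's ptrace restrictions. The function names are illustrative only.

```python
import os
from typing import List


def find_startup_pids(proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose process title looks like a PostgreSQL startup process.

    Assumption: PostgreSQL rewrites its argv, so /proc/<pid>/cmdline shows a
    title such as "postgres: startup ..."; the exact wording varies.
    """
    pids: List[int] = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # no /proc (non-Linux) or unreadable
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                title = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue  # process exited or is not readable by us
        if "postgres" in title and "startup" in title:
            pids.append(int(entry))
    return sorted(pids)


def strace_command(pid: int) -> List[str]:
    """Build an strace command line for attaching to a running process."""
    # -tt: wall-clock timestamps with microseconds
    # -T:  time spent inside each system call
    # -p:  attach to the already running process with this PID
    return ["strace", "-tt", "-T", "-p", str(pid)]


if __name__ == "__main__":
    for pid in find_startup_pids():
        print(" ".join(strace_command(pid)))
```

For the log above this would amount to running `strace -tt -T -p 9679` while the promotion is stalled. Long runs of fsync, open or unlink calls on data files would point at the end-of-recovery work rather than at replay.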

Also, what PG version is that exactly?

            regards, tom lane
