how to check that recovery is complete - Mailing list pgsql-general
From | Dmitry O Litvintsev |
---|---|
Subject | how to check that recovery is complete |
Date | |
Msg-id | SA0PR09MB71613DE760DBADC8712FDF17B9EE0@SA0PR09MB7161.namprd09.prod.outlook.com Whole thread Raw |
Responses |
Re: how to check that recovery is complete
|
List | pgsql-general |
Hi, I have a workflow where I recover from PITR backup and run a query on it. The program that runs query checks that it can connect to database in a loop, until it can, and then runs the query. This has worked fine far. Recently I upgraded to 11 and I see that I can connect to DB while recovery is not complete yet. See this: < 2020-11-05 03:34:36.114 CST >LOG: starting archive recovery < 2020-11-05 03:34:36.590 CST >LOG: restored log file "00000001000002EF000000F9" from archive < 2020-11-05 03:34:36.641 CST >LOG: redo starts at 2EF/F9000028 ... < 2020-11-05 03:34:46.392 CST >LOG: restored log file "00000001000002F000000008" from archive < 2020-11-05 03:34:46.658 CST 127.0.0.1 >FATAL: the database system is starting up < 2020-11-05 03:34:47.028 CST >LOG: restored log file "00000001000002F000000009" from archive You can see above fail to connect , but sometime into recover I see; < 2020-11-05 04:07:51.987 CST >LOG: restored log file "00000001000002F200000029" from archive < 2020-11-05 04:08:23.195 CST 127.0.0.1 >ERROR: canceling statement due to conflict with recovery < 2020-11-05 04:08:23.195 CST 127.0.0.1 >DETAIL: User query might have needed to see row versions that must be removed. < 2020-11-05 04:08:23.195 CST 127.0.0.1 >STATEMENT: select count(*) from file < 2020-11-05 04:08:23.195 CST >FATAL: terminating connection due to administrator command < 2020-11-05 04:08:23.195 CST >STATEMENT: select count(*) from file < 2020-11-05 04:08:23.195 CST >FATAL: terminating connection due to administrator command < 2020-11-05 04:08:23.195 CST >STATEMENT: select count(*) from file < 2020-11-05 04:08:23.232 CST >LOG: background worker "parallel worker" (PID 13577) exited with exit code 1 < 2020-11-05 04:08:23.244 CST >LOG: background worker "parallel worker" (PID 13578) exited with exit code 1 < 2020-11-05 04:08:23.244 CST 127.0.0.1 >FATAL: terminating connection due to conflict with recovery < 2020-11-05 04:08:23.244 CST 127.0.0.1 >DETAIL: User query might have needed to see row versions that must be removed. < 2020-11-05 04:08:23.244 CST 127.0.0.1 >HINT: In a moment you should be able to reconnect to the database and repeat yourcommand. < 2020-11-05 04:08:25.354 CST >LOG: restored log file "00000001000002F20000002A" from archive < 2020-11-05 04:08:55.555 CST 127.0.0.1 >ERROR: canceling statement due to conflict with recovery < 2020-11-05 04:08:55.555 CST 127.0.0.1 >DETAIL: User query might have needed to see row versions that must be removed. < 2020-11-05 04:08:55.555 CST 127.0.0.1 >STATEMENT: select count(*) from file < 2020-11-05 04:08:55.556 CST >FATAL: terminating connection due to administrator command < 2020-11-05 04:08:55.556 CST >STATEMENT: select count(*) from file < 2020-11-05 04:08:55.561 CST >FATAL: terminating connection due to administrator command < 2020-11-05 04:08:55.561 CST >STATEMENT: select count(*) from file < 2020-11-05 04:08:55.640 CST >LOG: background worker "parallel worker" (PID 13683) exited with exit code 1 < 2020-11-05 04:08:55.653 CST >LOG: background worker "parallel worker" (PID 13684) exited with exit code 1 < 2020-11-05 04:08:55.653 CST 127.0.0.1 >FATAL: terminating connection due to conflict with recovery < 2020-11-05 04:08:55.653 CST 127.0.0.1 >DETAIL: User query might have needed to see row versions that must be removed. < 2020-11-05 04:08:55.653 CST 127.0.0.1 >HINT: In a moment you should be able to reconnect to the database and repeat yourcommand. < 2020-11-05 04:09:00.307 CST >LOG: restored log file "00000001000002F20000002B" from archive As you can see a query "select count(*) from file" failed due to table not being restored yet. BUT connection was allowed before DB was ready.... Only few hours after ; < 2020-11-05 09:31:30.319 CST >LOG: archive recovery complete < 2020-11-05 09:34:51.729 CST >LOG: database system is ready to accept connections After which the query runs fine without errors. This is bad because I see that select count(*) takes progressively longer to execute and the count(*) is not what I am interested in. I run the "real" query after that. As a result I add hours to program execution time. Is there a more robust method to it? Ideally I do not want to be able to connect to db until : < 2020-11-05 09:31:30.319 CST >LOG: archive recovery complete < 2020-11-05 09:34:51.729 CST >LOG: database system is ready to accept connections And I believe this was the behavior before upgrade. If connection can't be disabled, how can I detect condition "database system is ready to accept connections" I believe "pg_isready" utility would succeed once it can connect. And as can see I could connect way before DB is really ready. Thanks! Dmitry
pgsql-general by date: