Thread: Postgres 9.1 - getting a continous archiving database to accept connections taking too long
Postgres 9.1 - getting a continous archiving database to accept connections taking too long
From
James Sebastian
Date:
I am very new to postgresql and have a continous archiving server to recover the database in case of a hardware failure or pg crash. Following http://www.postgresql.org/docs/9.1/static/continuous-archiving.html, I copy the WAL archives to another machine and apply them regularly. Recently due to pg server crash at my main server I tried to restore the database by pushing a trigger (an empty file, expecting smart failover) and the logs are given below.
prodrestore_error.log
======================
WAL file not present yet. Checking for trigger file...
trigger file found: smart failover
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CC
WAL file path: /backup/prod/db_backup/archive/0000000100000001000000CC
Restoring to: pg_xlog/RECOVERYXLOG
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/0000000100000001000000CC" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore: OK
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000002.history
WAL file path: /backup/prod/db_backup/archive/00000002.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
not restored
history file not found
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000001.history
WAL file path: /backup/prod/db_backup/archive/00000001.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
not restored
history file not found
postgresql-9.1-main.log
========================
2015-08-03 08:08:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:11:44 IST::@:[5542]:LOG: could not open file "pg_xlog/0000000100000001000000CD" (log file 1, segment 205): No such file or directory
2015-08-03 08:11:44 IST::@:[5542]:LOG: redo done at 1/CC001E10
2015-08-03 08:11:44 IST::@:[5542]:LOG: last completed transaction was at log time 2015-08-03 07:59:58.121908+05:30
2015-08-03 08:11:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:17:44 IST::@:[5542]:LOG: selected new timeline ID: 2
2015-08-03 08:23:45 IST::@:[5542]:LOG: archive recovery complete
2015-08-03 08:23:51 IST::@:[6088]:LOG: autovacuum launcher started
2015-08-03 08:23:51 IST::@:[5541]:LOG: database system is ready to accept connections
The whole process of looking for two history files 00000002.history and 00000001.history, 4 times and finally start the database took approximately 12 minutes. prodrestore_error.log
======================
WAL file not present yet. Checking for trigger file...
trigger file found: smart failover
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CC
WAL file path: /backup/prod/db_backup/archive/0000000100000001000000CC
Restoring to: pg_xlog/RECOVERYXLOG
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/0000000100000001000000CC" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore: OK
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000002.history
WAL file path: /backup/prod/db_backup/archive/00000002.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
not restored
history file not found
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000001.history
WAL file path: /backup/prod/db_backup/archive/00000001.history
Restoring to: pg_xlog/RECOVERYHISTORY
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore: cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
not restored
history file not found
postgresql-9.1-main.log
========================
2015-08-03 08:08:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:11:44 IST::@:[5542]:LOG: could not open file "pg_xlog/0000000100000001000000CD" (log file 1, segment 205): No such file or directory
2015-08-03 08:11:44 IST::@:[5542]:LOG: redo done at 1/CC001E10
2015-08-03 08:11:44 IST::@:[5542]:LOG: last completed transaction was at log time 2015-08-03 07:59:58.121908+05:30
2015-08-03 08:11:44 IST::@:[5542]:LOG: restored log file "0000000100000001000000CC" from archive
2015-08-03 08:17:44 IST::@:[5542]:LOG: selected new timeline ID: 2
2015-08-03 08:23:45 IST::@:[5542]:LOG: archive recovery complete
2015-08-03 08:23:51 IST::@:[6088]:LOG: autovacuum launcher started
2015-08-03 08:23:51 IST::@:[5541]:LOG: database system is ready to accept connections
"WAL file not present yet. Checking for trigger file...
trigger file found: fast failover
Trigger file: /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CF
WAL file path: /backup/prod/db_backup/archive/0000000100000001000000CF
Restoring to: pg_xlog/RECOVERYXLOG
Sleep interval: 60 seconds
Max wait interval: 0 forever
Command for restore: cp "/backup/prod/db_backup/archive/0000000100000001000000CF" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore: OK
and the restore took same time to check the history files.
In this case, I needed almost 14-15 minutes as it looked for 00000003.history aprt from 00000002.history and 00000001.history, even though I restarted the whole process by replacing the base backup afresh.
Am I doing something wrong here or is this was a fast failover expected to work?
3. Each iteration, I did to see the behaviour by stopping the current postgres server, deleting the previous base backup and restoring afresh added search for one more non existing history file like 00000004.history and 00000005.history, further slowing the process.
As I restore from the base backup each time, then push the trigger file after applying all WAL archives after the base backup, I would like to either ignore looking for history files altogether or control how many history files are tried. Also like to speed up starting the database in read write mode, so that I can point my application to this database and continue the business. What can I do for that ?
Thanks for all the help.
Regards,
James
Re: Postgres 9.1 - getting a continous archiving database to accept connections taking too long
From
James Sebastian
Date:
1. How I can make this as fast to look for history files and if not present proceed quickly to start database in read/write mode ? or not to look for history files at all.Sleep interval: 60 secondsThe whole process of looking for two history files 00000002.history and 00000001.history, 4 times and finally start the database took approximately 12 minutes.
Max wait interval: 0 forever
I could get an answer in another forum explaining how sleep time of 60 seconds, ensures almost 6 minutes for waiting for WAL files and history files.
3. Each iteration, I did to see the behaviour by stopping the current postgres server, deleting the previous base backup and restoring afresh added search for one more non existing history file like 00000004.history and 00000005.history, further slowing the process.As I restore from the base backup each time, then push the trigger file after applying all WAL archives after the base backup, I would like to either ignore looking for history files altogether or control how many history files are tried. A
Thanks
James