Thread: PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
Hi Team,
We are following https://www.postgresql.org/docs/14/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
Making An Exclusive Low-Level Backup.
After restoring, we see the server fails to start with below error:
PostgreSQL server log:
2023-08-04 16:47:47.227 IST [40582] LOG: starting PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2023-08-04 16:47:47.228 IST [40582] LOG: listening on IPv4 address "0.0.0.0", port 5414
2023-08-04 16:47:47.228 IST [40582] LOG: listening on IPv6 address "::", port 5414
2023-08-04 16:47:47.231 IST [40582] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5414"
2023-08-04 16:47:47.236 IST [40582] LOG: listening on Unix socket "/tmp/.s.PGSQL.5414"
2023-08-04 16:47:47.243 IST [40584] LOG: database system was interrupted; last known up at 2023-08-04 14:48:29 IST
2023-08-04 16:47:49.254 IST [40584] LOG: starting archive recovery
cp: cannot stat ‘/var/lib/pgsql/14/wal/00000004.history’: No such file or directory
2023-08-04 16:47:49.275 IST [40584] LOG: restored log file "000000040000000200000041" from archive
2023-08-04 16:47:49.290 IST [40584] LOG: redo starts at 2/41000028
2023-08-04 16:47:49.310 IST [40584] LOG: restored log file "000000040000000200000042" from archive
2023-08-04 16:47:49.344 IST [40584] LOG: restored log file "000000040000000200000043" from archive
2023-08-04 16:47:49.368 IST [40584] LOG: restored log file "000000040000000200000044" from archive
2023-08-04 16:47:49.395 IST [40584] LOG: restored log file "000000040000000200000045" from archive
2023-08-04 16:47:49.423 IST [40584] LOG: restored log file "000000040000000200000046" from archive
2023-08-04 16:47:49.460 IST [40584] LOG: restored log file "000000040000000200000047" from archive
2023-08-04 16:47:50.649 IST [40584] LOG: restored log file "000000040000000200000048" from archive
2023-08-04 16:47:51.219 IST [40584] LOG: restored log file "000000040000000200000049" from archive
cp: cannot stat ‘/var/lib/pgsql/14/wal/00000004000000020000004A’: No such file or directory
2023-08-04 16:47:51.603 IST [40584] LOG: redo done at 2/49013500 system usage: CPU: user: 0.00 s, system: 0.19 s, elapsed: 2.31 s
2023-08-04 16:47:51.603 IST [40584] LOG: last completed transaction was at log time 2023-08-04 14:58:51.021414+05:30
2023-08-04 16:47:51.628 IST [40584] LOG: restored log file "000000040000000200000049" from archive
2023-08-04 16:47:51.973 IST [40584] FATAL: WAL ends before end of online backup
2023-08-04 16:47:51.973 IST [40584] HINT: Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.
2023-08-04 16:47:51.975 IST [40582] LOG: startup process (PID 40584) exited with exit code 1
2023-08-04 16:47:51.975 IST [40582] LOG: terminating any other active server processes
2023-08-04 16:47:51.975 IST [40582] LOG: shutting down due to startup process failure
2023-08-04 16:47:51.977 IST [40582] LOG: database system is shut down
Backup_label from data directory:
[root@gkstandby2 data]# cat backup_label
START WAL LOCATION: 2/41000028 (file 000000040000000200000041)
CHECKPOINT LOCATION: 2/41000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: primary
START TIME: 2023-08-04 14:48:29 IST
LABEL: pgida_backup_5414_108625_1691140709
START TIMELINE: 4
[root@gkstandby2 data]#
Archive log directory:
[root@gkstandby2 wal]# ls
00000002000000020000003B 00000002000000020000003E.00000028.backup 000000040000000200000040.00000028.backup 000000040000000200000044 000000040000000200000048
00000002000000020000003C 00000002.history 000000040000000200000041 000000040000000200000045 000000040000000200000049
00000002000000020000003C.00000028.backup 0000000300000005000000E4 000000040000000200000042 000000040000000200000045.00005AE0.backup
00000002000000020000003D 00000004000000020000003F 000000040000000200000043 000000040000000200000046
00000002000000020000003E 000000040000000200000040 000000040000000200000043.00000028.backup 000000040000000200000047
[root@gkstandby2 wal]# /usr/pgsql-14/bin/pg_waldump 000000040000000200000046 | grep BACKUP
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 2/46000110, prev 2/46000098, desc: BACKUP_END 2/45005AE0
[root@gkstandby2 wal]#
We are restoring all the transaction logs required for base backup to be consistent.
BACKUP_END is present in the 000000040000000200000046 segment. Why did recovery fail to find this?
Regards,
Meera
Re: PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
"26.3.3.2. Making An Exclusive Low-Level Backup
Note
The exclusive backup method is deprecated and should be avoided. Prior to PostgreSQL 9.6, this was the only low-level method available, but it is now recommended that all users upgrade their scripts to use non-exclusive backups."
Use PgBackRest, instead.
@font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face {font-family:Consolas; panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; font-size:11.0pt; font-family:"Calibri",sans-serif;}a:link, span.MsoHyperlink {mso-style-priority:99; color:#0563C1; text-decoration:underline;}span.EmailStyle19 {mso-style-type:personal-compose; font-family:"Calibri",sans-serif; color:windowtext;}.MsoChpDefault {mso-style-type:export-only; font-size:10.0pt; font-family:"Calibri",sans-serif; mso-ligatures:none;}div.WordSection1 {page:WordSection1;} Hi Team,
We are following https://www.postgresql.org/docs/14/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
Making An Exclusive Low-Level Backup.
After restoring, we see the server fails to start with below error:
PostgreSQL server log:
2023-08-04 16:47:47.227 IST [40582] LOG: starting PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2023-08-04 16:47:47.228 IST [40582] LOG: listening on IPv4 address "0.0.0.0", port 5414
2023-08-04 16:47:47.228 IST [40582] LOG: listening on IPv6 address "::", port 5414
2023-08-04 16:47:47.231 IST [40582] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5414"
2023-08-04 16:47:47.236 IST [40582] LOG: listening on Unix socket "/tmp/.s.PGSQL.5414"
2023-08-04 16:47:47.243 IST [40584] LOG: database system was interrupted; last known up at 2023-08-04 14:48:29 IST
2023-08-04 16:47:49.254 IST [40584] LOG: starting archive recovery
cp: cannot stat ‘/var/lib/pgsql/14/wal/00000004.history’: No such file or directory
2023-08-04 16:47:49.275 IST [40584] LOG: restored log file "000000040000000200000041" from archive
2023-08-04 16:47:49.290 IST [40584] LOG: redo starts at 2/41000028
2023-08-04 16:47:49.310 IST [40584] LOG: restored log file "000000040000000200000042" from archive
2023-08-04 16:47:49.344 IST [40584] LOG: restored log file "000000040000000200000043" from archive
2023-08-04 16:47:49.368 IST [40584] LOG: restored log file "000000040000000200000044" from archive
2023-08-04 16:47:49.395 IST [40584] LOG: restored log file "000000040000000200000045" from archive
2023-08-04 16:47:49.423 IST [40584] LOG: restored log file "000000040000000200000046" from archive
2023-08-04 16:47:49.460 IST [40584] LOG: restored log file "000000040000000200000047" from archive
2023-08-04 16:47:50.649 IST [40584] LOG: restored log file "000000040000000200000048" from archive
2023-08-04 16:47:51.219 IST [40584] LOG: restored log file "000000040000000200000049" from archive
cp: cannot stat ‘/var/lib/pgsql/14/wal/00000004000000020000004A’: No such file or directory
2023-08-04 16:47:51.603 IST [40584] LOG: redo done at 2/49013500 system usage: CPU: user: 0.00 s, system: 0.19 s, elapsed: 2.31 s
2023-08-04 16:47:51.603 IST [40584] LOG: last completed transaction was at log time 2023-08-04 14:58:51.021414+05:30
2023-08-04 16:47:51.628 IST [40584] LOG: restored log file "000000040000000200000049" from archive
2023-08-04 16:47:51.973 IST [40584] FATAL: WAL ends before end of online backup
2023-08-04 16:47:51.973 IST [40584] HINT: Online backup started with pg_start_backup() must be ended with pg_stop_backup(), and all WAL up to that point must be available at recovery.
2023-08-04 16:47:51.975 IST [40582] LOG: startup process (PID 40584) exited with exit code 1
2023-08-04 16:47:51.975 IST [40582] LOG: terminating any other active server processes
2023-08-04 16:47:51.975 IST [40582] LOG: shutting down due to startup process failure
2023-08-04 16:47:51.977 IST [40582] LOG: database system is shut down
Backup_label from data directory:
[root@gkstandby2 data]# cat backup_label
START WAL LOCATION: 2/41000028 (file 000000040000000200000041)
CHECKPOINT LOCATION: 2/41000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: primary
START TIME: 2023-08-04 14:48:29 IST
LABEL: pgida_backup_5414_108625_1691140709
START TIMELINE: 4
[root@gkstandby2 data]#
Archive log directory:
[root@gkstandby2 wal]# ls
00000002000000020000003B 00000002000000020000003E.00000028.backup 000000040000000200000040.00000028.backup 000000040000000200000044 000000040000000200000048
00000002000000020000003C 00000002.history 000000040000000200000041 000000040000000200000045 000000040000000200000049
00000002000000020000003C.00000028.backup 0000000300000005000000E4 000000040000000200000042 000000040000000200000045.00005AE0.backup
00000002000000020000003D 00000004000000020000003F 000000040000000200000043 000000040000000200000046
00000002000000020000003E 000000040000000200000040 000000040000000200000043.00000028.backup 000000040000000200000047
[root@gkstandby2 wal]# /usr/pgsql-14/bin/pg_waldump 000000040000000200000046 | grep BACKUP
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 2/46000110, prev 2/46000098, desc: BACKUP_END 2/45005AE0
[root@gkstandby2 wal]#
We are restoring all the transaction logs required for base backup to be consistent.
BACKUP_END is present in the 000000040000000200000046 segment. Why did recovery fail to find this?
Regards,
Meera
Born in Arizona, moved to Babylonia.
Re: PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
Greetings, * Meera Nair (mnair@commvault.com) wrote: > We are following https://www.postgresql.org/docs/14/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP > Making An Exclusive Low-Level Backup. Exclusive backups have been removed, so you really don't want to be depending on it. > After restoring, we see the server fails to start with below error: Your archive doesn't seem to have all of the WAL which was generated during the backup, which means that your backup wasn't complete and the system cannot be restored from this backup. > We are restoring all the transaction logs required for base backup to be consistent. Evidently not, per the complaint from PG when it starts up. > BACKUP_END is present in the 000000040000000200000046 segment. Why did recovery fail to find this? It's possible to have multiple backups going concurrently. Presumably, this was from a different backup and that's why it wasn't picked up as being the end for this backup. You seem to have quite a few .backup files in your archive directory which would seem to support this. Not sure exactly what you're doing, but unless your goal is to spend a great deal of effort developing a PG backup solution, you're really better off using one of the existing solutions (eg: pg_basebackup or pgBackRest). If you are developing your own backup solution for PG, you definitely want to be using the new APIs and not using the exclusive backup method. Please review the current (15) documentation: https://www.postgresql.org/docs/current/continuous-archiving.html Thanks, Stephen
Attachment
RE: PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
Hi Stephen,
The backups we ran were all exclusive backups.
pg_start_backup(‘label’) returned 000000040000000200000041. Then we made a copy of data directory .
Then pg_stop_backup() was executed and it returned 000000040000000200000046.
After this 2 databases were dropped and recreated (using dropdb and createdb). Then we ran pg_switch_wal()
This returned 000000040000000200000049. Then we made a copy of the archive log directory.
To verify the copied data is restorable, we stopped server. Renamed data and archive log directories.
Replaced it with data and archive log directories copied earlier. In archive log directory logs from 41 to 49 is present.
Created recovery.signal and added restore_command to postgresql.conf file. And tried to start server. this failed with error in previous mail.
[root@gkstandby2 wal]# ls -ltr
total 262172
-rw------- 1 postgres postgres 42 Jun 13 01:08 00000002.history
-rw------- 1 postgres postgres 16777216 Jun 22 22:50 00000002000000020000003B
-rw------- 1 postgres postgres 357 Jun 22 22:51 00000002000000020000003C.00000028.backup
-rw------- 1 postgres postgres 16777216 Jun 22 22:51 00000002000000020000003C
-rw------- 1 postgres postgres 16777216 Jun 22 22:52 00000002000000020000003D
-rw------- 1 postgres postgres 357 Jun 22 22:52 00000002000000020000003E.00000028.backup
-rw------- 1 postgres postgres 16777216 Jun 22 22:52 00000002000000020000003E
-rw------- 1 postgres postgres 16777216 Aug 4 14:47 0000000300000005000000E4
-rw------- 1 postgres postgres 16777216 Aug 4 14:48 00000004000000020000003F
-rw------- 1 postgres postgres 16777216 Aug 4 14:48 000000040000000200000040
-rw------- 1 postgres postgres 16777216 Aug 4 14:49 000000040000000200000041
-rw------- 1 postgres postgres 351 Aug 4 14:49 000000040000000200000040.00000028.backup
-rw------- 1 postgres postgres 16777216 Aug 4 14:49 000000040000000200000042
-rw------- 1 postgres postgres 16777216 Aug 4 14:49 000000040000000200000043
-rw------- 1 postgres postgres 16777216 Aug 4 14:49 000000040000000200000044
-rw------- 1 postgres postgres 16777216 Aug 4 14:50 000000040000000200000045
-rw------- 1 postgres postgres 351 Aug 4 14:50 000000040000000200000043.00000028.backup
-rw------- 1 postgres postgres 16777216 Aug 4 14:50 000000040000000200000046
-rw------- 1 postgres postgres 358 Aug 4 14:50 000000040000000200000045.00005AE0.backup
-rw------- 1 postgres postgres 16777216 Aug 4 14:58 000000040000000200000047
-rw------- 1 postgres postgres 16777216 Aug 4 14:58 000000040000000200000048
-rw------- 1 postgres postgres 16777216 Aug 4 14:58 000000040000000200000049
[root@gkstandby2 wal]#
Regards,
Meera
-----Original Message-----
From: Stephen Frost <sfrost@snowman.net>
Sent: Tuesday, August 8, 2023 7:48 PM
To: Meera Nair <mnair@commvault.com>
Cc: pgsql-general@lists.postgresql.org; Punit Pranesh Koujalgi <pkoujalgi@commvault.com>
Subject: Re: PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
Greetings,
* Meera Nair (mnair@commvault.com) wrote:
> We are following
> https://www.postgresql.org/docs/14/continuous-archiving.html#BACKUP-LO
> WLEVEL-BASE-BACKUP
> Making An Exclusive Low-Level Backup.
Exclusive backups have been removed, so you really don't want to be depending on it.
> After restoring, we see the server fails to start with below error:
Your archive doesn't seem to have all of the WAL which was generated during the backup, which means that your backup wasn't complete and the system cannot be restored from this backup.
> We are restoring all the transaction logs required for base backup to be consistent.
Evidently not, per the complaint from PG when it starts up.
> BACKUP_END is present in the 000000040000000200000046 segment. Why did recovery fail to find this?
It's possible to have multiple backups going concurrently. Presumably, this was from a different backup and that's why it wasn't picked up as being the end for this backup. You seem to have quite a few .backup files in your archive directory which would seem to support this.
Not sure exactly what you're doing, but unless your goal is to spend a great deal of effort developing a PG backup solution, you're really better off using one of the existing solutions (eg: pg_basebackup or pgBackRest). If you are developing your own backup solution for PG, you definitely want to be using the new APIs and not using the exclusive backup method. Please review the current (15) documentation:
https://www.postgresql.org/docs/current/continuous-archiving.html
Thanks,
Stephen
Re: PostgreSQL 14.8 - server fails to start even though all transaction logs with base backup are restored
Greetings, * Meera Nair (mnair@commvault.com) wrote: > The backups we ran were all exclusive backups. You should not use exclusive backups. They've been deprecated for a long, long time and have been removed in the most recent version of PG. You really should not be trying to build your own tooling for this. Please, use an existing tool where the authors have already worked through how all of this works and understand how BACKUP_END works and how to reliably perform a backup and restore of PG. Despite the documentation seeming to imply that it's trivial to do this on your own, it really isn't. > pg_start_backup('label') returned 000000040000000200000041. Then we made a copy of data directory . > Then pg_stop_backup() was executed and it returned 000000040000000200000046. I'm really not sure what you're doing exactly, but it sure looks like the BACKUP_END in the 46 WAL file is associated with the .backup file for the backup started in the 45 segment. You have this file: 000000040000000200000045.00005AE0.backup and then this is what you reported from the 46 file: rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 2/46000110, prev 2/46000098, desc: BACKUP_END 2/45005AE0 Note that the 'BACKUP_END' data content there is the starting point of the backup for which that is the BACKUP_END for- 2/45005AE0. That matches with a backup being started in the 45 file and matches the start time of the .backup file that you have in the archive. That's why the restore of the backup started in the 41 segment isn't picking up on the BACKUP_END in the 46 file as being the one matching that backup- because that's from some other backup. > To verify the copied data is restorable, we stopped server. Renamed data and archive log directories. > Replaced it with data and archive log directories copied earlier. In archive log directory logs from 41 to 49 is present. The pg_wal dir can and should be empty at the start. > Created recovery.signal and added restore_command to postgresql.conf file. And tried to start server. this failed witherror in previous mail. The restore_command should be copying WAL files from the location where they were stored by the archive_command. Thanks, Stephen