Re: backup manifests - Mailing list pgsql-hackers
From | tushar |
---|---|
Subject | Re: backup manifests |
Date | |
Msg-id | 3d2b645c-b373-7946-8b33-f1fd879c368d@enterprisedb.com |
In response to | Re: backup manifests (Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>) |
Responses | Re: backup manifests |
List | pgsql-hackers |
Hi,
There is one scenario where I was somehow able to run pg_validatebackup successfully, but when I then tried to start the server, it failed.
Steps to reproduce -
--create 2 base backup directories
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db1
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D db2
--run pg_validatebackup, using the backup_manifest of the db1 directory against db2/. This gives an error
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
pg_validatebackup: error: checksum mismatch for file "backup_label"
--copy the backup_label of db1 to the db2 folder
[centos@tushar-ldap-docker bin]$ cp db1/backup_label db2/.
--run pg_validatebackup again; now it works fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup -m db1/backup_manifest db2/
pg_validatebackup: * manifest_checksum = 5b131aff4a4f86e2a53efd84b003a67b9f615decb0039f19033eefa6f43c1ede
pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$
--try to start the server
[centos@tushar-ldap-docker bin]$ ./pg_ctl -D db2 start -o '-p 7777'
waiting for server to start....2020-03-05 15:33:53.471 IST [24049] LOG: starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-05 15:33:53.471 IST [24049] LOG: listening on IPv6 address "::1", port 7777
2020-03-05 15:33:53.471 IST [24049] LOG: listening on IPv4 address "127.0.0.1", port 7777
2020-03-05 15:33:53.473 IST [24049] LOG: listening on Unix socket "/tmp/.s.PGSQL.7777"
2020-03-05 15:33:53.476 IST [24050] LOG: database system was interrupted; last known up at 2020-03-05 15:32:51 IST
2020-03-05 15:33:53.573 IST [24050] LOG: invalid checkpoint record
2020-03-05 15:33:53.573 IST [24050] FATAL: could not locate required checkpoint record
2020-03-05 15:33:53.573 IST [24050] HINT: If you are restoring from a backup, touch "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/recovery.signal" and add required recovery options.
If you are not restoring from a backup, try removing the file "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label".
Be careful: removing "/home/centos/pg13_bk_mani/edb/edbpsql/bin/db2/backup_label" will result in a corrupt cluster if restoring from a backup.
2020-03-05 15:33:53.574 IST [24049] LOG: startup process (PID 24050) exited with exit code 1
2020-03-05 15:33:53.574 IST [24049] LOG: aborting startup due to startup process failure
2020-03-05 15:33:53.575 IST [24049] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.
[centos@tushar-ldap-docker bin]$
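The startup failure above looks consistent with the copied backup_label still pointing at db1's checkpoint record, which the WAL inside db2 need not contain. This is an assumption, not something the output confirms. As a rough diagnostic sketch (not part of pg_validatebackup), the two label files could be compared like this. The helper names and the sample LSNs are hypothetical; only the "KEY: value" layout of backup_label is taken as given.

```python
def parse_backup_label(text):
    """Split the 'KEY: value' lines of a backup_label file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def lsn_to_int(lsn):
    """Convert an LSN written as 'hi/lo' (both hex) into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def checkpoint_lsn(label_text):
    """Return the CHECKPOINT LOCATION of a backup_label as an integer."""
    return lsn_to_int(parse_backup_label(label_text)["CHECKPOINT LOCATION"])


# Hypothetical label contents; the LSNs are made up for illustration.
LABEL_DB1 = """START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
"""
LABEL_DB2 = """START WAL LOCATION: 0/4000028 (file 000000010000000000000004)
CHECKPOINT LOCATION: 0/4000060
BACKUP METHOD: streamed
"""

if __name__ == "__main__":
    if checkpoint_lsn(LABEL_DB1) != checkpoint_lsn(LABEL_DB2):
        print("backup_label files point at different checkpoint records")
```

If the two labels disagree on the checkpoint location, a manifest check that only compares file checksums can still pass after the copy, while recovery has no matching checkpoint record to start from.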
regards,
On 3/5/20 1:09 PM, Rajkumar Raghuwanshi wrote:
Hi,

In a negative test scenario, if I change a size to -1 in backup_manifest, pg_validatebackup gives an error with a random size number.

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold --manifest-checksum 'SHA256'
[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a /tmp/bold/backup_manifest
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size 18446744073709551615 in the manifest

Thanks & Regards,
Rajkumar Raghuwanshi

On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:

On Wed, Mar 4, 2020 at 7:21 PM tushar <tushar.ahuja@enterprisedb.com> wrote:

Hi,

There is a scenario in which I add something inside the pg_tablespace directory, and I get an error like this:

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

but if I remove the 'PG_13_202002271' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified

This seems expected considering the current design, as we don't log directory entries in backup_manifest. In your case, you have a tablespace with no objects (an empty tablespace), so backup_manifest does not have any entry for it; hence when you remove this tablespace directory, the validator cannot detect it.

We can either document it or add entries for directories in the manifest. Robert may have a better idea on this.

--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
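
To illustrate the point quoted above, here is a minimal sketch of a check that, like the design described, only knows about files and never about directories. It is a toy model, not the pg_validatebackup code: the function names are hypothetical, the manifest is reduced to a set of relative paths, and the tablespace is a plain directory rather than a symlink.

```python
import os
import tempfile


def files_on_disk(root):
    """Collect relative paths of regular files only; directories are ignored."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return found


def compare(manifest_files, root):
    """Return (extra_on_disk, missing_from_disk) relative to the manifest."""
    on_disk = files_on_disk(root)
    return sorted(on_disk - manifest_files), sorted(manifest_files - on_disk)


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Toy backup: one real file plus an empty tablespace directory.
        os.makedirs(os.path.join(root, "global"))
        open(os.path.join(root, "global", "pg_control"), "w").close()
        empty_dir = os.path.join(root, "pg_tblspc", "16385", "PG_13_202002271")
        os.makedirs(empty_dir)
        manifest = {"global/pg_control"}  # no entry for the empty directory

        before = compare(manifest, root)

        # A stray file inside the directory is noticed...
        stray = os.path.join(empty_dir, "test")
        open(stray, "w").close()
        with_stray = compare(manifest, root)

        # ...but removing the (empty) directory itself is not.
        os.remove(stray)
        os.rmdir(empty_dir)
        after = compare(manifest, root)
        return before, with_stray, after


if __name__ == "__main__":
    print(demo())
```

In this model the result after removing the directory is the same as before anything was touched, which is why recording directory entries in the manifest (or documenting the limitation) is what would close the gap.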
--
regards,
tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company