Re: PgbackRest PointTIme Recovery : server unable to start back - Mailing list pgsql-general

From KK CHN
Subject Re: PgbackRest PointTIme Recovery : server unable to start back
Date
Msg-id CAKgGyB_=dmmEMC=Jrad00Ui2kbBB1+cs3dGMaWdQT6qMqPMDJQ@mail.gmail.com
Whole thread Raw
In response to Re: PgbackRest PointTIme Recovery : server unable to start back  (Muhammad Ikram <mmikram@gmail.com>)
Responses Re: PgbackRest PointTIme Recovery : server unable to start back
List pgsql-general
here the log outputs....

When I ran 

$sudo -u enterprisedb pgbackrest --stanza=Demo_Repo  --delta  --set=20240719-122703F_20240724-094727I --target-timeline=current --type=time "--target=2024-07-24 09:44:01.3255+05:30" --target-action=promote  restore



The Edb server  log out put shows 
...........................................................................
................................................................
2024-07-23 15:34:12 IST ERROR:  syntax error at or near "selecct" at character 1
2024-07-23 15:34:12 IST STATEMENT:  selecct current_timestamp
2024-07-23 15:34:26 IST FATAL:  database "enterprisedb" does not exist
2024-07-24 09:38:20 IST FATAL:  database "enterprisedb" does not exist
2024-07-24 09:42:17 IST FATAL:  database "test1" does not exist
2024-07-24 09:43:56 IST FATAL:  database "enterprisedb" does not exist
2024-07-24 09:45:07 IST FATAL:  database "enterprisedb" does not exist
2024-07-24 09:45:16 IST ERROR:  relation "important_table" does not exist at character 58
2024-07-24 09:45:16 IST STATEMENT:  begin; drop table important_table; commit; select * from important_table;
2024-07-24 09:46:00 IST LOG:  checkpoint starting: time
2024-07-24 09:46:02 IST LOG:  checkpoint complete: wrote 16 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=1.506 s, sync=0.001 s, total=1.509 s; sync files=9, longest=0.001 s, average=0.001 s; distance=93 kB, estimate=171 kB; lsn=0/3C045768, redo lsn=0/3C045730
2024-07-24 09:47:27 IST LOG:  checkpoint starting: immediate force wait
2024-07-24 09:47:27 IST LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=1, longest=0.001 s, average=0.001 s; distance=16106 kB, estimate=16106 kB; lsn=0/3D000060, redo lsn=0/3D000028
2024-07-24 09:47:27.788 P00   INFO: archive-push command begin 2.52.1: [pg_wal/00000009000000000000003C] --exec-id=34036-c1d6de0b --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-24 09:47:28.130 P00   INFO: pushed WAL file '00000009000000000000003C' to the archive
2024-07-24 09:47:28.230 P00   INFO: archive-push command end: completed successfully (444ms)



When  I ran       $ sudo systemctl start edb-as-16.service

4-07-26 11:32:56 IST LOG:  starting PostgreSQL 16.3 (EnterpriseDB Advanced Server 16.3.0) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3), 64-bit
2024-07-26 11:32:56 IST LOG:  listening on IPv4 address "0.0.0.0", port 5444
2024-07-26 11:32:56 IST LOG:  listening on IPv6 address "::", port 5444
2024-07-26 11:32:56 IST LOG:  listening on Unix socket "/tmp/.s.PGSQL.5444"
2024-07-26 11:32:56 IST LOG:  database system was interrupted; last known up at 2024-07-24 09:47:27 IST
2024-07-26 11:32:56 IST LOG:  starting point-in-time recovery to 2024-07-24 09:44:01.3255+05:30
2024-07-26 11:32:56 IST LOG:  starting backup recovery with redo LSN 0/3D000028, checkpoint LSN 0/3D000060, on timeline ID 9
2024-07-26 11:32:56.475 P00   INFO: archive-get command begin 2.52.1: [00000009.history, pg_wal/RECOVERYHISTORY] --exec-id=43293-1d583a8e --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:56.709 P00   INFO: found 00000009.history in the repo1: 16-1 archive
2024-07-26 11:32:56.809 P00   INFO: archive-get command end: completed successfully (335ms)
2024-07-26 11:32:56 IST LOG:  restored log file "00000009.history" from archive
2024-07-26 11:32:56.816 P00   INFO: archive-get command begin 2.52.1: [00000009000000000000003D, pg_wal/RECOVERYXLOG] --exec-id=43295-55c5d6f7 --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:56.999 P00   INFO: found 00000009000000000000003D in the repo1: 16-1 archive
2024-07-26 11:32:57.099 P00   INFO: archive-get command end: completed successfully (285ms)
2024-07-26 11:32:57 IST LOG:  restored log file "00000009000000000000003D" from archive
2024-07-26 11:32:57 IST LOG:  redo starts at 0/3D000028
2024-07-26 11:32:57.114 P00   INFO: archive-get command begin 2.52.1: [00000009000000000000003E, pg_wal/RECOVERYXLOG] --exec-id=43297-34867c05 --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:57.282 P00   INFO: found 00000009000000000000003E in the repo1: 16-1 archive
2024-07-26 11:32:57.382 P00   INFO: archive-get command end: completed successfully (270ms)
2024-07-26 11:32:57 IST LOG:  restored log file "00000009000000000000003E" from archive
2024-07-26 11:32:57.400 P00   INFO: archive-get command begin 2.52.1: [00000009000000000000003F, pg_wal/RECOVERYXLOG] --exec-id=43299-e2db2e1b --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:57.521 P00   INFO: unable to find 00000009000000000000003F in the archive
2024-07-26 11:32:57.621 P00   INFO: archive-get command end: completed successfully (222ms)
2024-07-26 11:32:57 IST LOG:  completed backup recovery with redo LSN 0/3D000028 and end LSN 0/3D000100
2024-07-26 11:32:57 IST LOG:  consistent recovery state reached at 0/3D000100
2024-07-26 11:32:57 IST LOG:  database system is ready to accept read-only connections
2024-07-26 11:32:57.632 P00   INFO: archive-get command begin 2.52.1: [00000009000000000000003F, pg_wal/RECOVERYXLOG] --exec-id=43301-f613dae9 --log-level-console=info --log-level-file=debug --pg1-path=/var/lib/edb/as16/data --pg-version-force=16 --repo1-host=10.10.20.7 --repo1-host-user=postgres --stanza=Demo_Repo
2024-07-26 11:32:57.761 P00   INFO: unable to find 00000009000000000000003F in the archive
2024-07-26 11:32:57.861 P00   INFO: archive-get command end: completed successfully (231ms)
2024-07-26 11:32:57 IST LOG:  redo done at 0/3E000060 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.75 s
2024-07-26 11:32:57 IST FATAL:  recovery ended before configured recovery target was reached
2024-07-26 11:32:57 IST LOG:  startup process (PID 43292) exited with exit code 1


ONLY inference  I can make is 

  INFO  unable to find   00000009000000000000003F in the archive    ( This means  the  EDB server  (10.10.20.6  ) unable to push the archives to the   Repo server(10.10.20.7 ) ?    Is that the reason for the  recovery and start backing of edb server fails ?


the pg_hba.conf   entry in the EDB Server machine is as 

host    all             all             127.0.0.1/32            ident
host    all             all             10.10.20.7/32          scram-sha-256
#host    all             all              10.10.20.7/32          trust
# IPv6 local connections:
host    all             all             ::1/128                 ident
#host    all             all             10.10.20.7/24           trust

# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             10.10.20.7/32           scram-sha-256
host    replication     all             127.0.0.1/32            ident
host    replication     all             ::1/128                 ident


Do I have to change anything in pg_hba.conf ?


my EDB Server conf as  this 

archive_mode = on
archive_command = 'pgbackrest --stanza=Demo_Repo  archive-push %p'
log_filename = 'postgresql.log'
max_wal_senders = 5
wal_level = replica


Any help ?

Krishane




On Fri, Jul 26, 2024 at 10:45 AM Muhammad Ikram <mmikram@gmail.com> wrote:
Hi KK CHN

Could you check server logs ?
Your service trace suggests that it started server and then failure happened

ul 26 09:48:49 service01 systemd[1]: Started EDB Postgres Advanced Server 16.
Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Main process exited, code=exited, status=1/FAILURE



Regards,
Ikram


On Fri, Jul 26, 2024 at 10:04 AM KK CHN <kkchn.in@gmail.com> wrote:
List, 

I am following the   PTR  on RHEL9 EPAS16.
I am able to do a  backup(Full, diff and incr)  and   restore from a full backup  and restart of EPAS16 works fine. 

But when I do an incremental backup  after doing the   procedures mentioned in the PTR section of the above  reference link and  try  restoring the EDB database from the INCR backup   and then starting up the  EPAS16 the server  always results in dead state 

 My repo server is another machine.  If  I do  a  full restore  on the DB server  ( sudo -u enterprisedb pgbackrest --stanza=Demo_Repo  --delta restore) it works  and the server starts without any issue.  
Restoring  from  Incremental backup tty output shows successful but edb service start  results in  failure. 

Any help is much appreciated. 

Krishane. 




STEPS followed:

after dropping  the table  pg-primary  Drop the important table ([section]
stopped the EDB server.

$ sudo -u enterprisedb pgbackrest --stanza=Demo_Repo    --delta  --set=20240719-122703F_20240724-094727I --target-timeline=current --type=time "--target=2024-07-24 09:44:01.3255+05:30" --target-action=promote  restore
.....................................................

2024-07-26 09:48:06.343 P00   INFO: restore command end: completed successfully (1035ms)


But    

[root@rservice01 ~]# sudo systemctl start edb-as-16.service
[root@service01 ~]# sudo systemctl status edb-as-16.service
× edb-as-16.service - EDB Postgres Advanced Server 16
     Loaded: loaded (/etc/systemd/system/edb-as-16.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Fri 2024-07-26 09:48:50 IST; 8s ago
   Duration: 242ms
    Process: 41903 ExecStartPre=/usr/edb/as16/bin/edb-as-16-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
    Process: 41908 ExecStart=/usr/edb/as16/bin/edb-postgres -D ${PGDATA} (code=exited, status=1/FAILURE)
   Main PID: 41908 (code=exited, status=1/FAILURE)
        CPU: 331ms

Jul 26 09:48:48 service01 systemd[1]: Starting EDB Postgres Advanced Server 16...
Jul 26 09:48:48 service01 edb-postgres[41908]: 2024-07-26 09:48:48 IST LOG:  redirecting log output to logging collector process
Jul 26 09:48:48 service01 edb-postgres[41908]: 2024-07-26 09:48:48 IST HINT:  Future log output will appear in directory "log".
Jul 26 09:48:49 service01 systemd[1]: Started EDB Postgres Advanced Server 16.
Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Main process exited, code=exited, status=1/FAILURE
Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Killing process 41909 (edb-postgres) with signal SIGKILL.
Jul 26 09:48:50 service01 systemd[1]: edb-as-16.service: Failed with result 'exit-code'.
[root@service01 ~]#

why is it unable to perform a restore and recovery  from an incr  backup ?







On The Repo Server
[root@service02 ~]#  sudo -u postgres pgbackrest --stanza=Demo_Repo info
stanza: Demo_Repo
    status: ok
    cipher: aes-256-cbc

    db (current)
        wal archive min/max (16): 000000020000000000000021/0000000B0000000000000041

        full backup: 20240719-122703F
            timestamp start/stop: 2024-07-19 12:27:03+05:30 / 2024-07-19 12:27:06+05:30
            wal start/stop: 00000002000000000000002A / 00000002000000000000002A
            database size: 61.7MB, database backup size: 61.7MB
            repo1: backup size: 9.6MB

        incr backup: 20240719-122703F_20240719-123353I
            timestamp start/stop: 2024-07-19 12:33:53+05:30 / 2024-07-19 12:33:56+05:30
            wal start/stop: 00000002000000000000002C / 00000002000000000000002C
            database size: 61.7MB, database backup size: 6.4MB
            repo1: backup size: 6.2KB
            backup reference list: 20240719-122703F

        diff backup: 20240719-122703F_20240719-123408D
            timestamp start/stop: 2024-07-19 12:34:08+05:30 / 2024-07-19 12:34:10+05:30
            wal start/stop: 00000002000000000000002E / 00000002000000000000002E
            database size: 61.7MB, database backup size: 6.4MB
            repo1: backup size: 6.4KB
            backup reference list: 20240719-122703F

        incr backup: 20240719-122703F_20240723-110212I
            timestamp start/stop: 2024-07-23 11:02:12+05:30 / 2024-07-23 11:02:15+05:30
            wal start/stop: 000000070000000000000038 / 000000070000000000000038
            database size: 48MB, database backup size: 6.4MB
            repo1: backup size: 9.8KB
            backup reference list: 20240719-122703F, 20240719-122703F_20240719-123408D

        incr backup: 20240719-122703F_20240723-141818I
            timestamp start/stop: 2024-07-23 14:18:18+05:30 / 2024-07-23 14:18:22+05:30
            wal start/stop: 00000008000000000000003C / 00000008000000000000003C
            database size: 75.4MB, database backup size: 33.8MB
            repo1: backup size: 4.7MB
            backup reference list: 20240719-122703F, 20240719-122703F_20240719-123408D, 20240719-122703F_20240723-110212I

        incr backup: 20240719-122703F_20240724-094727I
            timestamp start/stop: 2024-07-24 09:47:27+05:30 / 2024-07-24 09:47:30+05:30
            wal start/stop: 00000009000000000000003D / 00000009000000000000003D
            database size: 62MB, database backup size: 11.5MB
            repo1: backup size: 72KB
            backup reference list: 20240719-122703F, 20240719-122703F_20240719-123408D, 20240719-122703F_20240723-110212I, 20240719-122703F_20240723-141818





--
Muhammad Ikram

pgsql-general by date:

Previous
From: Muhammad Ikram
Date:
Subject: Re: Slow performance
Next
From: Muhammad Ikram
Date:
Subject: Re: PgbackRest PointTIme Recovery : server unable to start back