Problems starting slave - Mailing list pgsql-general

From Douglas Reed
Subject Problems starting slave
Date
Msg-id 1263466497.931257.1696257526649@mail.yahoo.com
List pgsql-general
Hi guys

The servers are virtual machines running on Nutanix.

We are running PostgreSQL 12 (12.10)

On Linux km-data1.rs.fsbtech.com 5.4.191-1.el7.elrepo.x86_64 #1 SMP Tue Apr 26 12:14:16 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

48G/16 x CPU (Master and slave)

Timeline

The system had a number of issues caused by Kafka and slots, and it would not always shut down cleanly. There were several incidents where the shutdown had to be forced.
We then found corruption in a number of indexes and tables and decided to recover from a backup (barman). Because of a missing WAL file, the restored data
ends about three hours earlier than we expected, but the restored server is working.

Attempts to build a slave:

On the slave, pg_basebackup at first failed with an error stating that the target directory was not empty:

    pg_basebackup: error: directory "/var/lib/pgsql/12/data" exists but is not empty

This happened even though we had emptied the directory /var/lib/pgsql/12/data (using rm -r ....). In the end we deleted the data directory and re-created it, making sure that
the permissions were the same. The restore then started successfully and completed with error=0.
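
A possible explanation for the "not empty" error, offered only as an assumption: as far as I know, pg_basebackup also counts hidden entries (dotfiles, or a lost+found directory on a mount point), and a glob such as `data/*` does not match dotfiles. A minimal sketch of a check that includes hidden entries follows; the helper name `dir_is_empty` is made up for illustration and is not part of PostgreSQL.

```shell
#!/bin/sh
# Report whether a directory is really empty, counting hidden entries too.
# "dir_is_empty" is a hypothetical helper name, not a PostgreSQL tool.
# Returns 0 if empty, 1 if it has any entry, 2 if it is not a directory.
dir_is_empty() {
    dir=$1
    [ -d "$dir" ] || return 2
    # ls -A lists dotfiles but omits "." and "..".
    # Note: if the directory is unreadable, ls prints nothing to stdout
    # and this would wrongly report "empty", so run it as a user
    # that can read the directory.
    [ -z "$(ls -A -- "$dir")" ]
}
```

Usage would be along the lines of `dir_is_empty /var/lib/pgsql/12/data && echo "really empty"`, run before pg_basebackup.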

When starting the instance we got the following message:

# systemctl start postgresql-12.service

    Job for postgresql-12.service failed because the control process exited with error code. See "systemctl status postgresql-12.service" and "journalctl -xe" for details.

We ran systemctl status postgresql-12.service and it returned:

     postgresql-12.service - PostgreSQL 12 database server
       Loaded: loaded (/usr/lib/systemd/system/postgresql-12.service; enabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since Mon 2023-10-02 15:26:43 BST; 54s ago
         Docs: https://www.postgresql.org/docs/12/static/
      Process: 9532 ExecStart=/usr/pgsql-12/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
      Process: 9526 ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
     Main PID: 9532 (code=exited, status=1/FAILURE)

    Oct 02 15:26:43 km-data2.rs.fsbtech.com postgres[9534]: [9-1] user=,db=,app=client= LOG:  entering standby mode
    Oct 02 15:26:43 km-data2.rs.fsbtech.com postmaster[9532]: cp: cannot stat ‘barman_wal/00000002.history’: No such file or directory
    Oct 02 15:26:43 km-data2.rs.fsbtech.com postmaster[9532]: cp: cannot stat ‘barman_wal/0000000200000C740000006D’: No such file or directory
    Oct 02 15:26:43 km-data2.rs.fsbtech.com postgres[9532]: [7-1] user=,db=,app=client= LOG:  startup process (PID 9534) was terminated by signal 6: Aborted
    Oct 02 15:26:43 km-data2.rs.fsbtech.com postgres[9532]: [8-1] user=,db=,app=client= LOG:  aborting startup due to startup process failure
    Oct 02 15:26:43 km-data2.rs.fsbtech.com postgres[9532]: [9-1] user=,db=,app=client= LOG:  database system is shut down
    Oct 02 15:26:43 km-data2.rs.fsbtech.com systemd[1]: postgresql-12.service: main process exited, code=exited, status=1/FAILURE
    Oct 02 15:26:43 km-data2.rs.fsbtech.com systemd[1]: Failed to start PostgreSQL 12 database server.
    Oct 02 15:26:43 km-data2.rs.fsbtech.com systemd[1]: Unit postgresql-12.service entered failed state.
    Oct 02 15:26:43 km-data2.rs.fsbtech.com systemd[1]: postgresql-12.service failed.

We also ran journalctl -xe and it returned:

    -- The start-up result is done.
    Oct 02 15:28:43 km-data2.rs.fsbtech.com sudo[10001]: pam_unix(sudo:session): session opened for user root by (uid=0)
    Oct 02 15:28:43 km-data2.rs.fsbtech.com sudo[10001]: pam_unix(sudo:session): session closed for user root
    Oct 02 15:28:43 km-data2.rs.fsbtech.com systemd[1]: Removed slice User Slice of root.
    -- Subject: Unit user-0.slice has finished shutting down
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit user-0.slice has finished shutting down.
    Oct 02 15:28:43 km-data2.rs.fsbtech.com filebeat[71555]: {"log.level":"info","@timestamp":"2023-10-02T15:28:43.427+0100","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":186},"message
    Oct 02 15:28:44 km-data2.rs.fsbtech.com sudo[10016]:   zabbix : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/etc/zabbix/scripts/postgres.sh connections sport
    Oct 02 15:28:44 km-data2.rs.fsbtech.com systemd[1]: Created slice User Slice of root.
    -- Subject: Unit user-0.slice has finished start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit user-0.slice has finished starting up.
    --
    -- The start-up result is done.
    Oct 02 15:28:44 km-data2.rs.fsbtech.com systemd[1]: Started Session c1058389 of user root.
    -- Subject: Unit session-c1058389.scope has finished start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit session-c1058389.scope has finished starting up.
    --
    -- The start-up result is done.
    Oct 02 15:28:44 km-data2.rs.fsbtech.com sudo[10016]: pam_unix(sudo:session): session opened for user root by (uid=0)
    Oct 02 15:28:44 km-data2.rs.fsbtech.com sudo[10016]: pam_unix(sudo:session): session closed for user root
    Oct 02 15:28:44 km-data2.rs.fsbtech.com systemd[1]: Removed slice User Slice of root.
    -- Subject: Unit user-0.slice has finished shutting down
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    --
    -- Unit user-0.slice has finished shutting down.
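
Most of the journalctl -xe output above comes from sudo, zabbix and filebeat rather than from PostgreSQL. A small sketch of a filter that keeps only the lines logged by the postgres and postmaster processes follows; the function name `pg_journal_lines` is hypothetical, and the pattern assumes the syslog-style "name[pid]:" prefix seen in the output above.

```shell
#!/bin/sh
# Keep only journal lines written by the postgres/postmaster processes.
# Reads journal text on stdin, writes the matching lines to stdout.
# "pg_journal_lines" is a hypothetical helper name.
pg_journal_lines() {
    grep -E ' (postgres|postmaster)\[[0-9]+\]: '
}
```

It could be used as `journalctl --no-pager --since "2023-10-02 15:26" | pg_journal_lines`. Asking journald for the unit directly with `journalctl -u postgresql-12.service` should give a similar result without any filter.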

The main error seems to be:

    Oct 02 15:26:43 km-data2.rs.fsbtech.com postmaster[9532]: cp: cannot stat ‘barman_wal/00000002.history’: No such file or directory
    Oct 02 15:26:43 km-data2.rs.fsbtech.com postmaster[9532]: cp: cannot stat ‘barman_wal/0000000200000C740000006D’: No such file or directory
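
For reading these two names: a WAL segment name is 24 hex digits whose first 8 digits are the timeline ID, and a timeline history file is named with those 8 digits plus ".history". The path given to cp is relative, and PostgreSQL runs restore_command with the data directory as its working directory, so barman_wal/ would be looked up under /var/lib/pgsql/12/data on the slave. A rough sketch for pulling the names out of a log and decoding the timeline follows; the helper names `wal_names` and `wal_timeline` are made up for illustration.

```shell
#!/bin/sh
# Extract WAL segment names (24 hex digits) and timeline history file
# names (8 hex digits + ".history") from log text on stdin.
# Both function names are hypothetical helpers, not PostgreSQL tools.
wal_names() {
    grep -oE '[0-9A-F]{8}\.history|[0-9A-F]{24}' | LC_ALL=C sort -u
}

# Print the timeline ID (in decimal) encoded in the first 8 hex digits
# of a WAL segment or history file name.
wal_timeline() {
    hex=$(printf '%s' "$1" | cut -c1-8)
    printf '%d\n' "0x$hex"
}
```

For example, `wal_timeline 0000000200000C740000006D` prints 2, so both files the startup process asked for belong to timeline 2.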

Any ideas, guys?



Doug Reed 
dougreed765@yahoo.com 
07973-132664
https://uk.linkedin.com/pub/douglas-reed/33/326/2b
