Thread: BUG #18666: Standby cluster does not start after building from Primary cluster.
BUG #18666: Standby cluster does not start after building from Primary cluster.
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18666 Logged by: Андрій Лозовський Email address: andrew.lozovsky@gmail.com PostgreSQL version: 16.4 Operating system: Ubuntu 20.04.5 LTS Description: Standby cluster does not start after building from Primary cluster. The copy2slave16.sh script was used for the building Standby cluster: #!/bin/bash . /home/postgres/pg.sh LD_LIBRARY_PATH=$PGLIB $PGBIN/psql -Upgadmin -d postgres -f /home/postgres/copy2slave16.sql > /home/postgres/copy.log 2> /home/postgres/copy.err copy2slave16.sql: SELECT pg_backup_start(label => 'label', fast => true); \! rsync -a /opt/pgdata-16/ <Standby_cluster_IP>:/opt/pgdata-16/ --exclude postmaster.pid --exclude pg_wal/ --exclude pg_replslot/ > /home/postgres/rsync.log 2> /home/postgres/rsync.err SELECT * FROM pg_backup_stop(wait_for_archive => true); There was also an attempt to building the Standby cluster using pg_basebackup. Problem was a same. The following messages are indicated in the log file: 2024-10-21 10:38:32.672 EEST [2816535] [] [] [] []LOG: starting PostgreSQL 16.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit 2024-10-21 10:38:32.672 EEST [2816535] [] [] [] []LOG: listening on IPv4 address "0.0.0.0", port 5432 2024-10-21 10:38:32.672 EEST [2816535] [] [] [] []LOG: listening on IPv6 address "::", port 5432 2024-10-21 10:38:32.677 EEST [2816535] [] [] [] []LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2024-10-21 10:38:32.685 EEST [2816540] [] [] [] []LOG: database system was interrupted; last known up at 2024-10-21 06:58:37 EEST cp: cannot stat '/opt/archivelog/00000002.history': No such file or directory 2024-10-21 10:38:32.725 EEST [2816540] [] [] [] []LOG: entering standby mode 2024-10-21 10:38:32.725 EEST [2816540] [] [] [] []LOG: invalid checkpoint record 2024-10-21 10:38:32.725 EEST [2816540] [] [] [] []PANIC: could not locate a valid checkpoint record 2024-10-21 10:38:32.851 EEST [2816535] [] [] [] []LOG: startup process (PID 2816540) was terminated by signal 6: Aborted 2024-10-21 10:38:32.851 EEST [2816535] [] [] [] []LOG: aborting startup due to startup process failure 2024-10-21 10:38:32.851 EEST [2816535] [] [] [] []LOG: [pg_stat_monitor] pgsm_shmem_shutdown: Shutdown initiated. 2024-10-21 10:38:32.872 EEST [2816535] [] [] [] []LOG: database system is shut down The pg_hba.conf file contains all the settings for the primary and standby clusters. The standby.signal file is created in the data directory of the standby cluster. # CUSTOMIZED OPTIONS in postgresql.conf on Standby cluster port = 5432 listen_addresses = '*' superuser_reserved_connections = 3 log_timezone = 'Europe/Kiev' timezone = 'Europe/Kiev' logging_collector = on log_directory = '/opt/pglogs' #log_filename = 'p2p_unk-%a.log' #log_filename = 'p2p_unp-%a.log' log_filename = 'p2p_una-%a.log' log_truncate_on_rotation = on log_line_prefix = '%m [%p] [%a] [%u] [%d] [%h]' log_statement = 'none' # none, ddl, mod, all log_connections = on autovacuum = on shared_preload_libraries = 'pg_stat_statements,pg_stat_monitor' synchronous_commit = on max_locks_per_transaction = 256 cluster_name = 'p2p_un' max_connections = 200 shared_buffers = 16GB effective_cache_size = 32GB maintenance_work_mem = 2GB checkpoint_completion_target = 0.9 wal_buffers = 16MB default_statistics_target = 100 random_page_cost = 1.2 effective_io_concurrency = 200 work_mem = 16MB huge_pages = try min_wal_size = 2GB max_wal_size = 8GB max_parallel_workers_per_gather = 4 max_parallel_workers = 16 max_parallel_maintenance_workers = 4 log_min_duration_statement = 1s#min_time_query_in_log archive_mode = on wal_level = replica #logical max_replication_slots = 14 #кількість підписок max_wal_senders = 15 #max_replication_slots+1 max_logical_replication_workers = 15 #max_replication_slots + резерв max_worker_processes = 16 #max_logical_replication_workers + 1 wal_keep_size = 32 #archive_command = 'rsync -aq %p postgres@<Standby_cluster_IP>:/opt/archivelog/%f'#AWS #archive_command = 'rsync -aq %p postgres@<Primary_cluster_IP>:/opt/archivelog/%f'#AWS archive_command = 'cp %p /dev/null' hot_standby = on primary_conninfo = 'user=pgadmin host=<Primary_cluster_IP> port=5432 user=pgadmin sslmode=prefer sslcompression=1' # Amazon #primary_conninfo = 'user=pgadmin host=<Standby_cluster_IP> port=5432 user=pgadmin sslmode=prefer sslcompression=1' # Amazon recovery_target_timeline = 'latest' restore_command = 'cp /opt/archivelog/%f %p' archive_cleanup_command = '/usr/pgsql-16/bin/pg_archivecleanup /opt/archivelog %r' #primary_slot_name = 'standby_slot_unk' #primary_slot_name = 'standby_slot_unp' primary_slot_name = 'standby_slot_una' log_lock_waits = 'on' deadlock_timeout = '100ms' # Parameters of the Server claster. OS - Ubuntu 20.04.5 LTS (Focal Fossa) CPU - 16 RAM - 64 Gb DB Size - 6.7T
Re: BUG #18666: Standby cluster does not start after building from Primary cluster.
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > Standby cluster does not start after building from Primary cluster. I think your archiving setup is broken: > archive_command = 'cp %p /dev/null' > ... > restore_command = 'cp /opt/archivelog/%f %p' BTW, "cp" is widely regarded as not being safe enough for production archiving, because it doesn't fsync. But this archive_command isn't even trying. So it's not surprising if there's not enough WAL to recover from. regards, tom lane
Re: BUG #18666: Standby cluster does not start after building from Primary cluster.
From
px shi
Date:
copy2slave16.sql:
SELECT pg_backup_start(label => 'label', fast => true);
\! rsync -a /opt/pgdata-16/ <Standby_cluster_IP>:/opt/pgdata-16/ --exclude
postmaster.pid --exclude pg_wal/ --exclude pg_replslot/ >
/home/postgres/rsync.log 2> /home/postgres/rsync.err
SELECT * FROM pg_backup_stop(wait_for_archive => true);
How is the
archive_command
parameter configured on your primary? I see primary and standby nodes aren't deployed on the same host, so is it set up to send WAL files to the /opt/archivelog
directory on the standby machine? Regards,
Pixian Shi
PG Bug reporting form <noreply@postgresql.org> 于2024年10月21日周一 19:33写道:
The following bug has been logged on the website:
Bug reference: 18666
Logged by: Андрій Лозовський
Email address: andrew.lozovsky@gmail.com
PostgreSQL version: 16.4
Operating system: Ubuntu 20.04.5 LTS
Description:
Standby cluster does not start after building from Primary cluster.
The copy2slave16.sh script was used for the building Standby cluster:
#!/bin/bash
. /home/postgres/pg.sh
LD_LIBRARY_PATH=$PGLIB $PGBIN/psql -Upgadmin -d postgres -f
/home/postgres/copy2slave16.sql > /home/postgres/copy.log 2>
/home/postgres/copy.err
copy2slave16.sql:
SELECT pg_backup_start(label => 'label', fast => true);
\! rsync -a /opt/pgdata-16/ <Standby_cluster_IP>:/opt/pgdata-16/ --exclude
postmaster.pid --exclude pg_wal/ --exclude pg_replslot/ >
/home/postgres/rsync.log 2> /home/postgres/rsync.err
SELECT * FROM pg_backup_stop(wait_for_archive => true);
There was also an attempt to building the Standby cluster using
pg_basebackup.
Problem was a same.
The following messages are indicated in the log file:
2024-10-21 10:38:32.672 EEST [2816535] [] [] [] []LOG: starting PostgreSQL
16.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1)
9.4.0, 64-bit
2024-10-21 10:38:32.672 EEST [2816535] [] [] [] []LOG: listening on IPv4
address "0.0.0.0", port 5432
2024-10-21 10:38:32.672 EEST [2816535] [] [] [] []LOG: listening on IPv6
address "::", port 5432
2024-10-21 10:38:32.677 EEST [2816535] [] [] [] []LOG: listening on Unix
socket "/tmp/.s.PGSQL.5432"
2024-10-21 10:38:32.685 EEST [2816540] [] [] [] []LOG: database system was
interrupted; last known up at 2024-10-21 06:58:37 EEST
cp: cannot stat '/opt/archivelog/00000002.history': No such file or
directory
2024-10-21 10:38:32.725 EEST [2816540] [] [] [] []LOG: entering standby
mode
2024-10-21 10:38:32.725 EEST [2816540] [] [] [] []LOG: invalid checkpoint
record
2024-10-21 10:38:32.725 EEST [2816540] [] [] [] []PANIC: could not locate a
valid checkpoint record
2024-10-21 10:38:32.851 EEST [2816535] [] [] [] []LOG: startup process (PID
2816540) was terminated by signal 6: Aborted
2024-10-21 10:38:32.851 EEST [2816535] [] [] [] []LOG: aborting startup due
to startup process failure
2024-10-21 10:38:32.851 EEST [2816535] [] [] [] []LOG: [pg_stat_monitor]
pgsm_shmem_shutdown: Shutdown initiated.
2024-10-21 10:38:32.872 EEST [2816535] [] [] [] []LOG: database system is
shut down
The pg_hba.conf file contains all the settings for the primary and standby
clusters.
The standby.signal file is created in the data directory of the standby
cluster.
# CUSTOMIZED OPTIONS in postgresql.conf on Standby cluster
port = 5432
listen_addresses = '*'
superuser_reserved_connections = 3
log_timezone = 'Europe/Kiev'
timezone = 'Europe/Kiev'
logging_collector = on
log_directory = '/opt/pglogs'
#log_filename = 'p2p_unk-%a.log'
#log_filename = 'p2p_unp-%a.log'
log_filename = 'p2p_una-%a.log'
log_truncate_on_rotation = on
log_line_prefix = '%m [%p] [%a] [%u] [%d] [%h]'
log_statement = 'none' # none, ddl, mod, all
log_connections = on
autovacuum = on
shared_preload_libraries = 'pg_stat_statements,pg_stat_monitor'
synchronous_commit = on
max_locks_per_transaction = 256
cluster_name = 'p2p_un'
max_connections = 200
shared_buffers = 16GB
effective_cache_size = 32GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.2
effective_io_concurrency = 200
work_mem = 16MB
huge_pages = try
min_wal_size = 2GB
max_wal_size = 8GB
max_parallel_workers_per_gather = 4
max_parallel_workers = 16
max_parallel_maintenance_workers = 4
log_min_duration_statement = 1s#min_time_query_in_log
archive_mode = on
wal_level = replica #logical
max_replication_slots = 14 #кількість підписок
max_wal_senders = 15 #max_replication_slots+1
max_logical_replication_workers = 15 #max_replication_slots + резерв
max_worker_processes = 16 #max_logical_replication_workers +
1
wal_keep_size = 32
#archive_command = 'rsync -aq %p
postgres@<Standby_cluster_IP>:/opt/archivelog/%f'#AWS
#archive_command = 'rsync -aq %p
postgres@<Primary_cluster_IP>:/opt/archivelog/%f'#AWS
archive_command = 'cp %p /dev/null'
hot_standby = on
primary_conninfo = 'user=pgadmin host=<Primary_cluster_IP> port=5432
user=pgadmin sslmode=prefer sslcompression=1' # Amazon
#primary_conninfo = 'user=pgadmin host=<Standby_cluster_IP> port=5432
user=pgadmin sslmode=prefer sslcompression=1' # Amazon
recovery_target_timeline = 'latest'
restore_command = 'cp /opt/archivelog/%f %p'
archive_cleanup_command = '/usr/pgsql-16/bin/pg_archivecleanup
/opt/archivelog %r'
#primary_slot_name = 'standby_slot_unk'
#primary_slot_name = 'standby_slot_unp'
primary_slot_name = 'standby_slot_una'
log_lock_waits = 'on'
deadlock_timeout = '100ms'
# Parameters of the Server claster.
OS - Ubuntu 20.04.5 LTS (Focal Fossa)
CPU - 16
RAM - 64 Gb
DB Size - 6.7T
Re: BUG #18666: Standby cluster does not start after building from Primary cluster.
From
"David G. Johnston"
Date:
On Monday, October 21, 2024, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18666
Logged by: Андрій Лозовський
Email address: andrew.lozovsky@gmail.com
PostgreSQL version: 16.4
Operating system: Ubuntu 20.04.5 LTS
Description:
There isn’t likely to be a bug here.
There was also an attempt to building the Standby cluster using
pg_basebackup.
Problem was a same.
I suggest to request help on using pg_basebackup on the -general mailing list.
David J.