BUG #16029: pg_basebackup - At least one WAL file is missing - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16029: pg_basebackup - At least one WAL file is missing |
Date | |
Msg-id | 16029-cc360b93eda43c60@postgresql.org Whole thread Raw |
Responses |
Re: BUG #16029: pg_basebackup - At least one WAL file is missing
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16029 Logged by: Georg H. Email address: silentx@arcor.de PostgreSQL version: 11.5 Operating system: CentOS Linux release 7.7.1908 Description: Hello, I've upgraded 2 clusters (single, no HA / replication except BARMANs synchronous WAL streaming) as described below from v10.10 to v11.5. Both clusters are on the same server, using same file systems, having identical postgresql.conf settings (except port / cluster name / folders) and have more or less same content (2 development stages) We take sequential backups of both clusters using BARMAN with backup_method = postgres (If set to postgres barman will use the pg_basebackup command to execute the backup). BARMAN conf for both clusters is also identical and using same PostgreSQL 11.5 binaries out of the same repo as the DB-server uses. Since the upgrade the backup of one of both clusters is failing 2-3 times a week with "error=At least one WAL file is missing. The first missing WAL file is xxx". In fact the "missing" WAL is archived to the backup server 2 seconds after "pg_stop_backup complete" (based on postgresql.log). The second cluster doesn't have this problem at all. I'm not sure if this is a bug or just a configuration problem. Noticeable is a difference between backup.info file (end_wal) and backup history file (STOP WAL LOCATION) contained in the basebackup. When failing, the backup history file is the one of STOP WAL LOCATION and the next WAL is in the error marked as missing as it's set as end_wal in backup.info e.g. (see full content below): backup history file is 000000010000000100000083.00000028.backup saying STOP WAL LOCATION: 1/83000130 (file 000000010000000100000083) and missing WAL = end_wal = 000000010000000100000084 I would expect, that "end_wal = STOP WAL LOCATION", so pointing to the same WAL as it does when backup succeed. --backup.info backup_label=None begin_offset=40 begin_time=2019-09-26 23:35:02+02:00 begin_wal=000000010000000100000083 begin_xlog=1/83000028 config_file=/pg_data/data/i02mspdev1/postgresql.conf copy_stats={'total_time': 23.154973, 'copy_time': 23.154973} deduplicated_size=2038909210 end_offset=96 end_time=2019-09-26 23:35:25.052851+02:00 end_wal=000000010000000100000084 end_xlog=1/84000060 error=At least one WAL file is missing. The first missing WAL file is 000000010000000100000084 hba_file=/pg_data/data/i02mspdev1/pg_hba.conf ident_file=/pg_data/data/i02mspdev1/pg_ident.conf included_files=None mode=postgres pgdata=/pg_data/data/i02mspdev1 server_name=dev_server_5433 size=2038909210 status=FAILED tablespaces=[('indexspace', 16400, '/pg_data/indexts/i02mspdev1')] timeline=1 version=110005 xlog_segment_size=16777216 --backup.info end --backup history START WAL LOCATION: 1/83000028 (file 000000010000000100000083) STOP WAL LOCATION: 1/83000130 (file 000000010000000100000083) CHECKPOINT LOCATION: 1/83000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2019-09-26 23:35:02 CEST LABEL: pg_basebackup base backup START TIMELINE: 1 STOP TIME: 2019-09-26 23:35:21 CEST STOP TIMELINE: 1 --backup history end --excerpt out of postgresql.log (Level DEBUG1) : 2019-09-26 23:35:01.732 CEST [166017] [unknown] streaming_barman 10.x.x.101 LOG: received replication command: IDENTIFY_SYSTEM 2019-09-26 23:35:02.010 CEST [166018] [unknown] streaming_barman 10.x.x.101 LOG: received replication command: SHOW data_directory_mode 2019-09-26 23:35:02.011 CEST [166018] [unknown] streaming_barman 10.x.x.101 LOG: received replication command: SHOW wal_segment_size 2019-09-26 23:35:02.023 CEST [166018] [unknown] streaming_barman 10.x.x.101 LOG: received replication command: IDENTIFY_SYSTEM 2019-09-26 23:35:02.038 CEST [166018] [unknown] streaming_barman 10.x.x.101 LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' FAST 2019-09-26 23:35:02.057 CEST [97464] DEBUG: performing replication slot checkpoint 2019-09-26 23:35:05.090 CEST [97468] DEBUG: archived write-ahead log file "000000010000000100000082" 2019-09-26 23:35:13.901 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: file "pg_internal.init" excluded from backup 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_dynshmem" excluded from backup 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_notify" excluded from backup 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_serial" excluded from backup 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_snapshots" excluded from backup 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_subtrans" excluded from backup 2019-09-26 23:35:13.916 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: file "pg_internal.init" excluded from backup 2019-09-26 23:35:14.060 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: file "pg_internal.init" excluded from backup 2019-09-26 23:35:14.120 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: file "pg_internal.init" excluded from backup 2019-09-26 23:35:17.744 CEST [166068] DEBUG: autovacuum: processing database "postgres" 2019-09-26 23:35:21.325 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_replslot" excluded from backup 2019-09-26 23:35:21.325 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: contents of directory "pg_stat_tmp" excluded from backup 2019-09-26 23:35:21.421 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: file "postmaster.opts" excluded from backup 2019-09-26 23:35:21.421 CEST [166018] [unknown] streaming_barman 10.x.x.101 DEBUG: file "postmaster.pid" excluded from backup 2019-09-26 23:35:22.682 CEST [97468] DEBUG: archived write-ahead log file "000000010000000100000083" 2019-09-26 23:35:23.569 CEST [97468] DEBUG: archived write-ahead log file "000000010000000100000083.00000028.backup" 2019-09-26 23:35:24.449 CEST [166018] [unknown] streaming_barman 10.x.x.101 NOTICE: pg_stop_backup complete, all required WAL segments have been archived 2019-09-26 23:35:25.547 CEST [166098] postgres barman 10.x.x.101 LOG: restore point "barman_20190926T233501" created at 1/85000090 2019-09-26 23:35:25.547 CEST [166098] postgres barman 10.x.x.101 STATEMENT: SELECT pg_create_restore_point('barman_20190926T233501') 2019-09-26 23:35:26.201 CEST [97468] DEBUG: archived write-ahead log file "000000010000000100000084" OS: CentOS Linux release 7.7.1908 (Core) on a virtual machine (on a HyperV host) with 12 GB RAM uname -a: Linux dev_server 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux We are using PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit installed from repo https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-$releasever-$basearch Upgraded from PostgreSQL 10.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit using pg_dumpall -> initdb -> psql load of dump (psql -f) -> vacuumdb -a -f -z initdb: /usr/pgsql-11/bin/initdb --pgdata=/pg_data/data/i02mspdev1/ --waldir=/pg_data/wals/i02mspdev01/11 --pwprompt --encoding=UTF8 --locale=en_US.UTF-8 --data-checksums barman -v 2.9 Barman by 2ndQuadrant (www.2ndQuadrant.com) server config: path_prefix = "/usr/pgsql-11/bin" --changed postgresql.conf settings: 1 "listen_addresses" "*" 2 "port" "5433" 3 "max_connections" "120" 4 "superuser_reserved_connections" "5" 5 "unix_socket_directories" "/var/run/modispostgresql, /tmp" 6 "tcp_keepalives_idle" "4" 7 "tcp_keepalives_interval" "4" 8 "tcp_keepalives_count" "2" 9 "shared_buffers" "2GB" 10 "temp_buffers" "512MB" 11 "work_mem" "16MB" 12 "maintenance_work_mem" "512MB" 13 "dynamic_shared_memory_type" "posix" 14 "effective_io_concurrency" "200" 15 "wal_level" "replica" 16 "wal_sync_method" "fdatasync" 17 "wal_compression" "on" 18 "wal_buffers" "16MB" 19 "max_wal_size" "2GB" 20 "min_wal_size" "1GB" 21 "checkpoint_completion_target" "0.9" 22 "archive_mode" "on" 23 "archive_command" "barman-wal-archive barman.server dev_server_5433 %p && test ! -f /pg_data/walarch/i02mspdev1/11/%f && cp %p /pg_data/walarch/i02mspdev1/11/%f " 24 "wal_keep_segments" "4" 25 "track_commit_timestamp" "on" 26 "synchronous_standby_names" "barman_receive_wal" 27 "hot_standby" "off" 28 "random_page_cost" "1.0" 29 "min_parallel_table_scan_size" "4MB" 30 "from_collapse_limit" "12" 31 "join_collapse_limit" "16" 32 "log_destination" "stderr" 33 "logging_collector" "on" 34 "log_directory" "log" 35 "log_filename" "postgresql-%Y-%m-%d_%H%M%S.log" 36 "log_truncate_on_rotation" "on" 37 "log_rotation_age" "1d" 38 "log_rotation_size" "10MB" 39 "syslog_ident" "postgres_i02mspdev1" 40 "log_min_messages" "debug1" 41 "log_line_prefix" "%m [%p] %d %u %h " 42 "log_lock_waits" "on" 43 "log_statement" "ddl" 44 "log_replication_commands" "on" 45 "log_timezone" "Europe/Berlin" 46 "cluster_name" "i02mspdev1" 47 "track_functions" "pl" 48 "temp_tablespaces" "/pg_tempts/i02mspdev1" 49 "datestyle" "iso, mdy" 50 "timezone" "Europe/Berlin" 51 "lc_messages" "en_US.UTF-8" 52 "lc_monetary" "en_US.UTF-8" 53 "lc_numeric" "en_US.UTF-8" 54 "lc_time" "en_US.UTF-8" 55 "default_text_search_config" "pg_catalog.english" 56 "shared_preload_libraries" "plugin_debugger,pg_stat_statements" kind regards Georg
pgsql-bugs by date: