Re: pgsql: Add contrib/pg_walinspect. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pgsql: Add contrib/pg_walinspect.
Date
Msg-id 298622.1651025665@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgsql: Add contrib/pg_walinspect.  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> Ok, that's the same for me.  Next question: why does the patch I
> posted not help?

I improved the instrumentation a bit, and it looks like what is
happening is that loc > read_upto, causing that code to "break"
independently of wait_for_wal.  success.log is from "make installcheck"
immediately after initdb; fail.log is from "make check".

            regards, tom lane

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530e..85dcb3a 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -15,7 +15,6 @@ REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf

 # Disabled because these tests require "wal_level=replica", which
 # some installcheck users do not have (e.g. buildfarm clients).
-NO_INSTALLCHECK = 1

 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index cf5db23..33d2c73 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -373,6 +373,9 @@ XLogNextRecord(XLogReaderState *state, char **errormsg)
          */
         Assert(!XLogRecPtrIsInvalid(state->EndRecPtr));

+        if(!*errormsg)
+          *errormsg = "decode_queue_head is null";
+
         return NULL;
     }

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 4257026..5ea59a3 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -951,6 +951,9 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,

         if (state->currTLI == currTLI)
         {
+          elog(LOG, "considering wait for wal: loc %X/%X read_upto %X/%X wait_for_wal %d",
+               LSN_FORMAT_ARGS(loc), LSN_FORMAT_ARGS(read_upto),
+               wait_for_wal);

             if (loc <= read_upto)
                 break;
2022-04-27 04:06:21.995 CEST [4503] LOG:  starting PostgreSQL 15devel on mipsel-unknown-linux-gnu, compiled by gcc
(Debian4.9.2-10+deb8u1) 4.9.2, 32-bit 
2022-04-27 04:06:21.997 CEST [4503] LOG:  listening on IPv6 address "::1", port 5440
2022-04-27 04:06:21.997 CEST [4503] LOG:  listening on IPv4 address "127.0.0.1", port 5440
2022-04-27 04:06:22.000 CEST [4503] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5440"
2022-04-27 04:06:22.021 CEST [4523] LOG:  database system was shut down at 2022-04-27 04:06:11 CEST
2022-04-27 04:06:22.048 CEST [4503] LOG:  database system is ready to accept connections
2022-04-27 04:06:30.619 CEST [4738] ERROR:  WAL start LSN must be less than end LSN
2022-04-27 04:06:30.619 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903EB0',
'0/1903E08');
2022-04-27 04:06:30.621 CEST [4738] ERROR:  WAL start LSN must be less than end LSN
2022-04-27 04:06:30.621 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903EB0',
'0/1903E08');
2022-04-27 04:06:30.623 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.623 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info('0/1903E08');
2022-04-27 04:06:30.623 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.623 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info('0/1903E08');
2022-04-27 04:06:30.626 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.626 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903E08',
'0/1903EB0');
2022-04-27 04:06:30.626 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.626 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903E08',
'0/1903EB0');
2022-04-27 04:06:30.630 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.630 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.630 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.630 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.633 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.633 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903E08',
'0/1903EB0');
2022-04-27 04:06:30.634 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.634 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903E08',
'0/1903EB0');
2022-04-27 04:06:30.638 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.638 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.639 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.639 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.653 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.653 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08',
'0/1903EB0')
                WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:06:30.654 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.654 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08',
'0/1903EB0')
                WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:06:30.657 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.657 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08',
'0/1903EB0')
                WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:06:30.658 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.658 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08',
'0/1903EB0')
                WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:06:37.735 CEST [4503] LOG:  received fast shutdown request
2022-04-27 04:06:37.829 CEST [4503] LOG:  aborting any active transactions
2022-04-27 04:06:37.835 CEST [4503] LOG:  background worker "logical replication launcher" (PID 4526) exited with exit
code1 
2022-04-27 04:06:40.402 CEST [4521] LOG:  shutting down
2022-04-27 04:06:40.406 CEST [4521] LOG:  checkpoint starting: shutdown immediate
2022-04-27 04:06:41.063 CEST [4521] LOG:  checkpoint complete: wrote 784 buffers (4.8%); 0 WAL file(s) added, 0
removed,0 recycled; write=0.218 s, sync=0.001 s, total=0.661 s; sync files=0, longest=0.000 s, average=0.000 s;
distance=4197kB, estimate=4197 kB 
2022-04-27 04:06:43.573 CEST [4503] LOG:  database system is shut down
2022-04-27 04:04:08.677 CEST postmaster[3167] LOG:  starting PostgreSQL 15devel on mipsel-unknown-linux-gnu, compiled
bygcc (Debian 4.9.2-10+deb8u1) 4.9.2, 32-bit 
2022-04-27 04:04:08.677 CEST postmaster[3167] LOG:  listening on Unix socket "/tmp/pg_regress-S5YZoI/.s.PGSQL.51696"
2022-04-27 04:04:08.708 CEST startup[3172] LOG:  database system was shut down at 2022-04-27 04:04:08 CEST
2022-04-27 04:04:08.737 CEST postmaster[3167] LOG:  database system is ready to accept connections
2022-04-27 04:04:12.135 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  WAL start LSN must be less than end
LSN
2022-04-27 04:04:12.135 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info('0/1903EE8','0/1903E40'); 
2022-04-27 04:04:12.137 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  WAL start LSN must be less than end
LSN
2022-04-27 04:04:12.137 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats('0/1903EE8','0/1903E40'); 
2022-04-27 04:04:12.138 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1903F90 wait_for_wal 0 
2022-04-27 04:04:12.138 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_record_info('0/1903E40');
2022-04-27 04:04:12.139 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1903F90 wait_for_wal 0 
2022-04-27 04:04:12.139 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_record_info('0/1903E40');
2022-04-27 04:04:12.142 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.142 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info('0/1903E40','0/1903EE8'); 
2022-04-27 04:04:12.143 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.143 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info('0/1903E40','0/1903EE8'); 
2022-04-27 04:04:12.146 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.146 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1904034read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  could not read WAL at 0/1903E40:
decode_queue_headis null 
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats('0/1903E40','0/1903EE8'); 
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats('0/1903E40','0/1903EE8'); 
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1904034read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  could not read WAL at 0/1903E40:
decode_queue_headis null 
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM
pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM
pg_get_wal_records_info('0/1903E40','0/1903EE8') 
                WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM
pg_get_wal_records_info('0/1903E40','0/1903EE8') 
                WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1002000read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM
pg_get_wal_records_info('0/1903E40','0/1903EE8') 
                WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc
0/1903E40read_upto 0/1904000 wait_for_wal 0 
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM
pg_get_wal_records_info('0/1903E40','0/1903EE8') 
                WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:04:12.297 CEST postmaster[3167] LOG:  received fast shutdown request
2022-04-27 04:04:12.299 CEST postmaster[3167] LOG:  aborting any active transactions
2022-04-27 04:04:12.306 CEST postmaster[3167] LOG:  background worker "logical replication launcher" (PID 3175) exited
withexit code 1 
2022-04-27 04:04:12.307 CEST checkpointer[3170] LOG:  shutting down
2022-04-27 04:04:12.310 CEST checkpointer[3170] LOG:  checkpoint starting: shutdown immediate
2022-04-27 04:04:12.536 CEST checkpointer[3170] LOG:  checkpoint complete: wrote 784 buffers (4.8%); 0 WAL file(s)
added,0 removed, 0 recycled; write=0.212 s, sync=0.001 s, total=0.230 s; sync files=0, longest=0.000 s, average=0.000
s;distance=4197 kB, estimate=4197 kB 
2022-04-27 04:04:12.866 CEST postmaster[3167] LOG:  database system is shut down

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: pgsql: Add contrib/pg_walinspect.
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Add contrib/pg_walinspect.