Thread: Re: pgsql: Add contrib/pg_walinspect.
Hi Jeff,

On Fri, Apr 08, 2022 at 07:27:44AM +0000, Jeff Davis wrote:
> Add contrib/pg_walinspect.
>
> Provides similar functionality to pg_waldump, but from a SQL interface
> rather than a separate utility.

The tests of pg_walinspect look unstable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2022-04-25%2001%3A48%3A47

 SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal(:'wal_lsn1');
- ok
-----
- t
-(1 row)
-
+ERROR:  could not read WAL at 0/1903E40

This points at ReadNextXLogRecord(), though I would not blame the test
suite itself, as it creates a physical replication slot beforehand.
Could this be an issue related to the addition of the circular WAL
decoding buffer, aka 3f1ce97?  I am adding an open item about that.

Thanks,
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> Could this be an issue related to the addition of the circular WAL
> decoding buffer, aka 3f1ce97?

I already whined about this at [1].

I've been wondering if the issue could be traced to topminnow's unusual
hardware properties, specifically that it has MAXALIGN 8 even though
it's only a 32-bit machine per sizeof(void *).  I think the only other
active buildfarm animal like that is my gaur ... but I've failed to
reproduce it on gaur.  Best guess at the moment is that it's a timing
issue that topminnow manages to reproduce often.

Anyway, as I said in the other thread, I can reproduce it on
topminnow's host.  Let me know if you have ideas about how to home in
on the cause.

			regards, tom lane

[1] https://www.postgresql.org/message-id/111657.1650910309%40sss.pgh.pa.us
On Tue, Apr 26, 2022 at 01:25:14AM -0400, Tom Lane wrote:
> I've been wondering if the issue could be traced to topminnow's unusual
> hardware properties, specifically that it has MAXALIGN 8 even though
> it's only a 32-bit machine per sizeof(void *).  I think the only
> other active buildfarm animal like that is my gaur ... but I've
> failed to reproduce it on gaur.  Best guess at the moment is that
> it's a timing issue that topminnow manages to reproduce often.

I have managed to miss your message.  Let's continue the discussion
there, then.

> Anyway, as I said in the other thread, I can reproduce it on
> topminnow's host.  Let me know if you have ideas about how to
> home in on the cause.

Nice.  I have not been able to reproduce it myself, but based on the
lack of reports from the buildfarm, that's not surprising.
--
Michael
On Tue, Apr 26, 2022 at 5:36 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Apr 26, 2022 at 01:25:14AM -0400, Tom Lane wrote:
> > I've been wondering if the issue could be traced to topminnow's unusual
> > hardware properties, specifically that it has MAXALIGN 8 even though
> > it's only a 32-bit machine per sizeof(void *).  I think the only
> > other active buildfarm animal like that is my gaur ... but I've
> > failed to reproduce it on gaur.  Best guess at the moment is that
> > it's a timing issue that topminnow manages to reproduce often.
>
> I have managed to miss your message.  Let's continue the discussion
> there, then.

I think it's a bug in pg_walinspect, so I'll move the discussion back
here.

Here's one rather simple way to fix it, which has survived running the
test a thousand times (using a recipe that failed for me quite soon,
after 20-100 attempts or so; I never figured out how to get the 50%
failure rate reported by Tom).  Explanation in the commit message.  You
can see that the comments near the first hunk already contemplated this
possibility, but just didn't try to handle it.

Another idea that I slept on, but rejected, is that the new WOULDBLOCK
return value introduced to support WAL prefetching could be used here
(it's a way of reporting a lack of data, different from errors).
Unfortunately it's not exposed to the XLogReadRecord() interface, as I
only intended it for use by XLogReadAhead().  I don't really think it's
a good idea to redesign that API at this juncture.

Maybe there is some other way I haven't considered -- is there a way to
get the LSN past the latest whole flushed record from shmem?
Thomas Munro <thomas.munro@gmail.com> writes:
> I think it's a bug in pg_walinspect, so I'll move the discussion back
> here.  Here's one rather simple way to fix it, that has survived
> running the test a thousand times (using a recipe that failed for me
> quite soon, after 20-100 attempts or so; I never figured out how to
> get the 50% failure rate reported by Tom).

Not sure what we're doing differently, but plain "make check" in
contrib/pg_walinspect fails pretty consistently for me on gcc23.  I
tried it again just now and got five failures in five attempts.

I then installed your patch and got the same failure, three times out
of three, so I don't think we're there yet.

Again, since I do have this problem in captivity, I'm happy to spend
some time poking at it.  But I could use a little guidance where to
poke, because I've not looked at any of the WAL prefetch stuff.

			regards, tom lane
On Wed, Apr 27, 2022 at 12:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I think it's a bug in pg_walinspect, so I'll move the discussion back
> > here.  Here's one rather simple way to fix it, that has survived
> > running the test a thousand times (using a recipe that failed for me
> > quite soon, after 20-100 attempts or so; I never figured out how to
> > get the 50% failure rate reported by Tom).
>
> Not sure what we're doing differently, but plain "make check" in
> contrib/pg_walinspect fails pretty consistently for me on gcc23.
> I tried it again just now and got five failures in five attempts.

I tried on the /home filesystem (a slow NFS mount) and then inside a
directory on /tmp to get ext4 (I saw that Noah had somehow got onto a
local filesystem, based on the presence of "ext4" in the pathname, and
I was trying everything I could think of).  I used what I thought might
be some relevant starter configure options copied from the animal:

./configure --prefix=$HOME/install --enable-cassert --enable-debug \
  --enable-tap-tests CC="ccache gcc -mips32r2" \
  CFLAGS="-O2 -funwind-tables" LDFLAGS="-rdynamic"

For me, "make check" always succeeds in contrib/pg_walinspect.  "make
installcheck" fails if I run it enough times in a loop, somewhere
around the 20th iteration or so, which I imagine has to do with WAL
page boundaries moving around:

for i in `seq 1 1000` ; do
  make -s installcheck || exit 1
done

> I then installed your patch and got the same failure, three times
> out of three, so I don't think we're there yet.

Hrmph...  Are you sure you rebuilt the contrib module?  Assuming so,
maybe it's failing in a different way for you and me.  For me, it
always fails after this break is reached in xlogutils.c:

		/* If asked, let's not wait for future WAL. */
		if (!wait_for_wal)
			break;

If you add a log message there, do you see that?

For me, the patch fixes it, because it teaches pg_walinspect that
messageless errors are a way of detecting end-of-data (due to the code
above, introduced by the pg_walinspect commit).
Thomas Munro <thomas.munro@gmail.com> writes:
> Hrmph...  Are you sure you rebuilt the contrib module?  Assuming so,
> maybe it's failing in a different way for you and me.  For me, it
> always fails after this break is reached in xlogutils.c:

> 	/* If asked, let's not wait for future WAL. */
> 	if (!wait_for_wal)
> 		break;

Hmm.  For me, that statement is not reached at all in successful
(make installcheck) runs.  In a failing run, it's reached with
wait_for_wal = false, after which we get the "could not read WAL"
failure.  Usually that happens twice, as per attached.

			regards, tom lane

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530e..85dcb3a 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -15,7 +15,6 @@ REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf
 
 # Disabled because these tests require "wal_level=replica", which
 # some installcheck users do not have (e.g. buildfarm clients).
-NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index cf5db23..33d2c73 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -373,6 +373,9 @@ XLogNextRecord(XLogReaderState *state, char **errormsg)
 	 */
 	Assert(!XLogRecPtrIsInvalid(state->EndRecPtr));
 
+	if (!*errormsg)
+		*errormsg = "decode_queue_head is null";
+
 	return NULL;
 }
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 4257026..4089efd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -955,6 +955,8 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 		if (loc <= read_upto)
 			break;
 
+		elog(LOG, "let's wait for wal: %d", wait_for_wal);
+
 		/* If asked, let's not wait for future WAL. */
 		if (!wait_for_wal)
 			break;

2022-04-27 03:40:09.253 CEST postmaster[32569] LOG:  starting PostgreSQL 15devel on mipsel-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10+deb8u1) 4.9.2, 32-bit
2022-04-27 03:40:09.254 CEST postmaster[32569] LOG:  listening on Unix socket "/tmp/pg_regress-g0HWzi/.s.PGSQL.51696"
2022-04-27 03:40:09.284 CEST startup[32574] LOG:  database system was shut down at 2022-04-27 03:40:09 CEST
2022-04-27 03:40:09.314 CEST postmaster[32569] LOG:  database system is ready to accept connections
2022-04-27 03:40:12.073 CEST client backend[32598] pg_regress/pg_walinspect ERROR:  WAL start LSN must be less than end LSN
2022-04-27 03:40:12.073 CEST client backend[32598] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903EE8','0/1903E40');
2022-04-27 03:40:12.075 CEST client backend[32598] pg_regress/pg_walinspect ERROR:  WAL start LSN must be less than end LSN
2022-04-27 03:40:12.075 CEST client backend[32598] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903EE8','0/1903E40');
2022-04-27 03:40:12.085 CEST client backend[32598] pg_regress/pg_walinspect LOG:  let's wait for wal: 0
2022-04-27 03:40:12.085 CEST client backend[32598] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 03:40:12.085 CEST client backend[32598] pg_regress/pg_walinspect ERROR:  could not read WAL at 0/1903E40: decode_queue_head is null
2022-04-27 03:40:12.085 CEST client backend[32598] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 03:40:12.091 CEST client backend[32598] pg_regress/pg_walinspect LOG:  let's wait for wal: 0
2022-04-27 03:40:12.091 CEST client backend[32598] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 03:40:12.091 CEST client backend[32598] pg_regress/pg_walinspect ERROR:  could not read WAL at 0/1903E40: decode_queue_head is null
2022-04-27 03:40:12.091 CEST client backend[32598] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 03:40:12.198 CEST postmaster[32569] LOG:  received fast shutdown request
2022-04-27 03:40:12.299 CEST postmaster[32569] LOG:  aborting any active transactions
2022-04-27 03:40:12.304 CEST postmaster[32569] LOG:  background worker "logical replication launcher" (PID 32577) exited with exit code 1
2022-04-27 03:40:12.306 CEST checkpointer[32572] LOG:  shutting down
2022-04-27 03:40:12.310 CEST checkpointer[32572] LOG:  checkpoint starting: shutdown immediate
2022-04-27 03:40:12.524 CEST checkpointer[32572] LOG:  checkpoint complete: wrote 784 buffers (4.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.199 s, sync=0.001 s, total=0.218 s; sync files=0, longest=0.000 s, average=0.000 s; distance=4197 kB, estimate=4197 kB
2022-04-27 03:40:12.838 CEST postmaster[32569] LOG:  database system is shut down
On Wed, Apr 27, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Hrmph...  Are you sure you rebuilt the contrib module?  Assuming so,
> > maybe it's failing in a different way for you and me.  For me, it
> > always fails after this break is reached in xlogutils.c:
>
> > 	/* If asked, let's not wait for future WAL. */
> > 	if (!wait_for_wal)
> > 		break;
>
> Hmm.  For me, that statement is not reached at all in successful
> (make installcheck) runs.  In a failing run, it's reached with
> wait_for_wal = false, after which we get the "could not read WAL"
> failure.  Usually that happens twice, as per attached.

Ok, that's the same for me.  Next question: why does the patch I posted
not help?  For me, the error "could not read WAL at %X/%X", seen in the
BF log, is raised by ReadNextXLogRecord() in pg_walinspect.c.  The
patch removes that ereport() entirely (and handles NULL in a couple of
places).
Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Apr 27, 2022 at 12:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Not sure what we're doing differently, but plain "make check" in
>> contrib/pg_walinspect fails pretty consistently for me on gcc23.
>> I tried it again just now and got five failures in five attempts.

> I tried on the /home filesystem (a slow NFS mount) and then inside a
> directory on /tmp to get ext4 (I saw that Noah had somehow got onto a
> local filesystem, based on the present of "ext4" in the pathname and I
> was trying everything I could think of).  I used what I thought might
> be some relevant starter configure options copied from the animal:

> ./configure --prefix=$HOME/install --enable-cassert --enable-debug
> --enable-tap-tests CC="ccache gcc -mips32r2" CFLAGS="-O2
> -funwind-tables" LDFLAGS="-rdynamic"

Hmph.  I'm also running it on the /home filesystem, and I used these
settings:

export CC="ccache gcc -mips32r2"
export CFLAGS="-O2 -funwind-tables"
export LDFLAGS="-rdynamic"

./configure --enable-debug --enable-cassert --with-systemd \
  --enable-nls --with-icu --enable-tap-tests \
  --with-system-tzdata=/usr/share/zoneinfo

plus uninteresting stuff like --prefix.  Now maybe some of these other
options affect this, but I'd be pretty surprised if so.  So I'm at a
loss why it behaves differently for you.

			regards, tom lane
On Wed, Apr 27, 2022 at 1:54 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Apr 27, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hmm.  For me, that statement is not reached at all in successful
> > (make installcheck) runs.  In a failing run, it's reached with
> > wait_for_wal = false, after which we get the "could not read WAL"
> > failure.  Usually that happens twice, as per attached.
>
> Ok, that's the same for me.  Next question: why does the patch I
> posted not help?  For me, the error "could not read WAL at %X/%X",
> seen on the BF log, is raised by ReadNextXLogRecord() in
> pg_walinspect.c.  The patch removes that ereport() entirely (and
> handles NULL in a couple of places).

BTW, if you had your local change from debug.patch (upthread), that'd
defeat the patch.  I mean this:

+	if (!*errormsg)
+		*errormsg = "decode_queue_head is null";
Thomas Munro <thomas.munro@gmail.com> writes:
> Ok, that's the same for me.  Next question: why does the patch I
> posted not help?

I improved the instrumentation a bit, and it looks like what is
happening is that loc > read_upto, causing that code to "break"
independently of wait_for_wal.  success.log is from "make installcheck"
immediately after initdb; fail.log is from "make check".

			regards, tom lane

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 960530e..85dcb3a 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -15,7 +15,6 @@ REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf
 
 # Disabled because these tests require "wal_level=replica", which
 # some installcheck users do not have (e.g. buildfarm clients).
-NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index cf5db23..33d2c73 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -373,6 +373,9 @@ XLogNextRecord(XLogReaderState *state, char **errormsg)
 	 */
 	Assert(!XLogRecPtrIsInvalid(state->EndRecPtr));
 
+	if (!*errormsg)
+		*errormsg = "decode_queue_head is null";
+
 	return NULL;
 }
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 4257026..5ea59a3 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -951,6 +951,9 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 
 		if (state->currTLI == currTLI)
 		{
+			elog(LOG, "considering wait for wal: loc %X/%X read_upto %X/%X wait_for_wal %d",
+				 LSN_FORMAT_ARGS(loc), LSN_FORMAT_ARGS(read_upto),
+				 wait_for_wal);
 			if (loc <= read_upto)
 				break;

success.log:

2022-04-27 04:06:21.995 CEST [4503] LOG:  starting PostgreSQL 15devel on mipsel-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10+deb8u1) 4.9.2, 32-bit
2022-04-27 04:06:21.997 CEST [4503] LOG:  listening on IPv6 address "::1", port 5440
2022-04-27 04:06:21.997 CEST [4503] LOG:  listening on IPv4 address "127.0.0.1", port 5440
2022-04-27 04:06:22.000 CEST [4503] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5440"
2022-04-27 04:06:22.021 CEST [4523] LOG:  database system was shut down at 2022-04-27 04:06:11 CEST
2022-04-27 04:06:22.048 CEST [4503] LOG:  database system is ready to accept connections
2022-04-27 04:06:30.619 CEST [4738] ERROR:  WAL start LSN must be less than end LSN
2022-04-27 04:06:30.619 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903EB0', '0/1903E08');
2022-04-27 04:06:30.621 CEST [4738] ERROR:  WAL start LSN must be less than end LSN
2022-04-27 04:06:30.621 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903EB0', '0/1903E08');
2022-04-27 04:06:30.623 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.623 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info('0/1903E08');
2022-04-27 04:06:30.623 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.623 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info('0/1903E08');
2022-04-27 04:06:30.626 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.626 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903E08', '0/1903EB0');
2022-04-27 04:06:30.626 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.626 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903E08', '0/1903EB0');
2022-04-27 04:06:30.630 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.630 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.630 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.630 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.633 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.633 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903E08', '0/1903EB0');
2022-04-27 04:06:30.634 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.634 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903E08', '0/1903EB0');
2022-04-27 04:06:30.638 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.638 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.639 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.639 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E08');
2022-04-27 04:06:30.653 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.653 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08', '0/1903EB0') WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:06:30.654 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.654 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08', '0/1903EB0') WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:06:30.657 CEST [4738] LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.657 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08', '0/1903EB0') WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:06:30.658 CEST [4738] LOG:  considering wait for wal: loc 0/1903E08 read_upto 0/1903F58 wait_for_wal 0
2022-04-27 04:06:30.658 CEST [4738] STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E08', '0/1903EB0') WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:06:37.735 CEST [4503] LOG:  received fast shutdown request
2022-04-27 04:06:37.829 CEST [4503] LOG:  aborting any active transactions
2022-04-27 04:06:37.835 CEST [4503] LOG:  background worker "logical replication launcher" (PID 4526) exited with exit code 1
2022-04-27 04:06:40.402 CEST [4521] LOG:  shutting down
2022-04-27 04:06:40.406 CEST [4521] LOG:  checkpoint starting: shutdown immediate
2022-04-27 04:06:41.063 CEST [4521] LOG:  checkpoint complete: wrote 784 buffers (4.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.218 s, sync=0.001 s, total=0.661 s; sync files=0, longest=0.000 s, average=0.000 s; distance=4197 kB, estimate=4197 kB
2022-04-27 04:06:43.573 CEST [4503] LOG:  database system is shut down

fail.log:

2022-04-27 04:04:08.677 CEST postmaster[3167] LOG:  starting PostgreSQL 15devel on mipsel-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10+deb8u1) 4.9.2, 32-bit
2022-04-27 04:04:08.677 CEST postmaster[3167] LOG:  listening on Unix socket "/tmp/pg_regress-S5YZoI/.s.PGSQL.51696"
2022-04-27 04:04:08.708 CEST startup[3172] LOG:  database system was shut down at 2022-04-27 04:04:08 CEST
2022-04-27 04:04:08.737 CEST postmaster[3167] LOG:  database system is ready to accept connections
2022-04-27 04:04:12.135 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  WAL start LSN must be less than end LSN
2022-04-27 04:04:12.135 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903EE8','0/1903E40');
2022-04-27 04:04:12.137 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  WAL start LSN must be less than end LSN
2022-04-27 04:04:12.137 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903EE8','0/1903E40');
2022-04-27 04:04:12.138 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1903F90 wait_for_wal 0
2022-04-27 04:04:12.138 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info('0/1903E40');
2022-04-27 04:04:12.139 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1903F90 wait_for_wal 0
2022-04-27 04:04:12.139 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_record_info('0/1903E40');
2022-04-27 04:04:12.142 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.142 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903E40','0/1903EE8');
2022-04-27 04:04:12.143 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.143 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info('0/1903E40','0/1903EE8');
2022-04-27 04:04:12.146 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.146 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1904034 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  could not read WAL at 0/1903E40: decode_queue_head is null
2022-04-27 04:04:12.147 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903E40','0/1903EE8');
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.149 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats('0/1903E40','0/1903EE8');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1904034 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect ERROR:  could not read WAL at 0/1903E40: decode_queue_head is null
2022-04-27 04:04:12.153 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal('0/1903E40');
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E40','0/1903EE8') WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.165 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E40','0/1903EE8') WHERE block_ref LIKE concat('%', '16391', '%') AND resource_manager = 'Heap';
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1002000 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E40','0/1903EE8') WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect LOG:  considering wait for wal: loc 0/1903E40 read_upto 0/1904000 wait_for_wal 0
2022-04-27 04:04:12.168 CEST client backend[3185] pg_regress/pg_walinspect STATEMENT:  SELECT COUNT(*) >= 1 AS ok FROM pg_get_wal_records_info('0/1903E40','0/1903EE8') WHERE resource_manager = 'Heap' AND record_type = 'INSERT';
2022-04-27 04:04:12.297 CEST postmaster[3167] LOG:  received fast shutdown request
2022-04-27 04:04:12.299 CEST postmaster[3167] LOG:  aborting any active transactions
2022-04-27 04:04:12.306 CEST postmaster[3167] LOG:  background worker "logical replication launcher" (PID 3175) exited with exit code 1
2022-04-27 04:04:12.307 CEST checkpointer[3170] LOG:  shutting down
2022-04-27 04:04:12.310 CEST checkpointer[3170] LOG:  checkpoint starting: shutdown immediate
2022-04-27 04:04:12.536 CEST checkpointer[3170] LOG:  checkpoint complete: wrote 784 buffers (4.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.212 s, sync=0.001 s, total=0.230 s; sync files=0, longest=0.000 s, average=0.000 s; distance=4197 kB, estimate=4197 kB
2022-04-27 04:04:12.866 CEST postmaster[3167] LOG:  database system is shut down
Thomas Munro <thomas.munro@gmail.com> writes:
> BTW, if you had your local change from debug.patch (upthread), that'd
> defeat the patch.  I mean this:

> +	if (!*errormsg)
> +		*errormsg = "decode_queue_head is null";

Oh!  Okay, I'll retry without that.

			regards, tom lane
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> BTW, if you had your local change from debug.patch (upthread), that'd
>> defeat the patch.  I mean this:

>> +	if (!*errormsg)
>> +		*errormsg = "decode_queue_head is null";

> Oh!  Okay, I'll retry without that.

I've now done several runs with your patch and not seen the test
failure.  However, I think we ought to rethink this API a bit rather
than just apply the patch as-is.  Even if it were documented, relying
on errormsg = NULL to mean something doesn't seem like a great plan.

			regards, tom lane
On Wed, Apr 27, 2022 at 8:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've now done several runs with your patch and not seen the test
> failure.  However, I think we ought to rethink this API a bit rather
> than just apply the patch as-is.  Even if it were documented, relying
> on errormsg = NULL to mean something doesn't seem like a great plan.

Sorry for being late to the game; I've been occupied with other stuff.

How about using the private_data of XLogReaderState for
read_local_xlog_page_no_wait, something like this?

typedef struct ReadLocalXLogPageNoWaitPrivate
{
	bool		end_of_wal;
} ReadLocalXLogPageNoWaitPrivate;

In read_local_xlog_page_no_wait:

	/* If asked, let's not wait for future WAL. */
	if (!wait_for_wal)
	{
		private_data->end_of_wal = true;
		break;
	}

The XLogReaderState field this would rely on:

	/*
	 * Opaque data for callbacks to use.  Not used by XLogReader.
	 */
	void	   *private_data;

Regards,
Bharath Rupireddy.
On Wed, Apr 27, 2022 at 8:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> How about using private_data of XLogReaderState for
> read_local_xlog_page_no_wait, something like this? [...]

I found an easy way to reproduce this consistently (I think on any
server).

I basically generated a huge WAL record (I used a fun extension that I
wrote - https://github.com/BRupireddy/pg_synthesize_wal, but one can use
pg_logical_emit_message as well):

session 1:
select * from pg_synthesize_wal_record(1*1024*1024);
--> generate a 1 MB WAL record and make a note of the output lsn (lsn1)

session 2:
select * from pg_get_wal_records_info_till_end_of_wal(lsn1);
\watch 1

session 1:
select * from pg_synthesize_wal_record(1000*1024*1024);
--> generate a ~1 GB WAL record, and we see ERROR: could not read WAL at
XXXXX in session 2.

Delay the checkpoint (set checkpoint_timeout to 1hr) so it does not
recycle the WAL files while we run the pg_walinspect functions; no other
changes are required from the default initdb settings on the server.

And, Thomas's patch fixes the issue.

Regards,
Bharath Rupireddy.
On Wed, Apr 27, 2022 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I found an easy way to reproduce this consistently (I think on any
> server). [...]
>
> And, Thomas's patch fixes the issue.

Here's a v2 patch (on top of Thomas's v1 at [1]) using private_data to
set the end-of-WAL flag. Please have a look at it.

[1] https://www.postgresql.org/message-id/CA%2BhUKGLtswFk9ZO3WMOqnDkGs6dK5kCdQK9gxJm0N8gip5cpiA%40mail.gmail.com

Regards,
Bharath Rupireddy.
On Wed, 2022-04-27 at 13:47 +0530, Bharath Rupireddy wrote:
> I found an easy way to reproduce this consistently (I think on any
> server).
>
> I basically generated a huge WAL record (I used a fun extension that I
> wrote - https://github.com/BRupireddy/pg_synthesize_wal, but one can
> use pg_logical_emit_message as well)

Thank you Bharath for creating the extension and the simple test case.
Thomas's patch solves the issue for me as well.

Tom, the debug patch you posted[0] seems to be setting the error message
if it's not already set. Thomas's patch uses the lack of a message as a
signal that we've reached the end of WAL. That explains why you are
still seeing the problem.

Obviously, that's a sign that Thomas's patch is not the cleanest
solution. But other approaches would be more invasive. I guess the
question is whether that's a good enough solution for now, and hopefully
we could improve the API later; or whether we need to come up with
something better.

When reviewing, I considered the inability to read old WAL and the
inability to read flushed-in-the-middle-of-a-record WAL as similar kinds
of errors that the user would need to deal with. But they are different:
the former can be avoided by creating a slot; the latter can't be easily
avoided, only retried.

Depending on the intended use cases, forcing the user to retry might be
reasonable, in which case we could consider this a test problem rather
than a real problem, and we might be able to do something simpler to
just stabilize the test.

Regards,
Jeff Davis

[0] https://postgr.es/m/295868.1651024073@sss.pgh.pa.us
On Wed, Apr 27, 2022 at 10:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Here's v2 patch (up on Thomas's v1 at [1]) using private_data to set
> the end of the WAL flag. Please have a look at it.

I don't have a strong view on whether it's better to use a NULL error
for this private communication between pg_walinspect.c and the read_page
callback it installs, or install a custom private_data to signal this
condition, or to give up on all this stuff completely and just wait (see
below for thoughts). I'd feel better about both no-wait options if the
read_page callback in question were actually in the contrib module, and
not in the core code. On the other hand, I'm not too hung up about it
because I'm really hoping to see the
get-rid-of-all-these-callbacks-and-make-client-code-do-the-waiting
scheme proposed by Horiguchi-san and Heikki developed further, to rip
much of this stuff out in a future release.

If you go with the private_data approach, a couple of small comments:

     if (record == NULL)
     {
+        ReadLocalXLogPageNoWaitPrivate *private_data;
+
+        /* return NULL, if end of WAL is reached */
+        private_data = (ReadLocalXLogPageNoWaitPrivate *)
+                            xlogreader->private_data;
+
+        if (private_data->end_of_wal)
+            return NULL;
+
         if (errormsg)
             ereport(ERROR,
                     (errcode_for_file_access(),
                      errmsg("could not read WAL at %X/%X: %s",
                             LSN_FORMAT_ARGS(first_record), errormsg)));
-        else
-            ereport(ERROR,
-                    (errcode_for_file_access(),
-                     errmsg("could not read WAL at %X/%X",
-                            LSN_FORMAT_ARGS(first_record))));
     }

I think you should leave that second ereport() call in, in this variant
of the patch, no? I don't know if anything else raises errors with no
message, but if we're still going to treat them as silent end-of-data
conditions then you might as well go with the v1 patch.

Another option might be to abandon this whole no-wait concept and revert
2258e76f's changes to xlogutils.c. pg_walinspect already does
preliminary checks that LSNs are in range, so you can't enter a value
that will wait indefinitely, and it's interruptible (it's a 1ms
sleep/check loop, not my favourite programming pattern, but that's
pre-existing code). If you're unlucky enough to hit the case where the
LSN is judged to be valid but is in the middle of a record that hasn't
been totally flushed yet, it'll just be a bit slower to return as we
wait for the inevitable later flush(es) to happen. The rest of your
record will *surely* be flushed pretty soon (or the flushing backend
panics the whole system and time ends). I don't imagine this is
performance critical work, so maybe that'd be acceptable?
On Thu, 2022-04-28 at 12:11 +1200, Thomas Munro wrote:
> Another option might be to abandon this whole no-wait concept and
> revert 2258e76f's changes to xlogutils.c. [...] I don't imagine this
> is performance critical work, so maybe that'd be acceptable?

I'm inclined toward this option. I was a bit iffy on those xlogutils.c
changes to begin with, and they are causing a problem now, so I'd like
to avoid layering on more workarounds.

The time when we need to wait is very narrow: only in this case where
it's earlier than the flush pointer, and the flush pointer is in the
middle of a record that's not fully flushed. And as you say, we won't be
waiting very long in that case, because once we start to write a WAL
record it had better finish soon.

Bharath, thoughts? When you originally introduced the nowait behavior, I
believe that was to solve the case where someone specifies an LSN range
well in the future, but we can still catch that and throw an error if we
see that it's beyond the flush pointer. Do you see a problem with just
doing that and getting rid of the nowait changes?

Regards,
Jeff Davis
On Thu, Apr 28, 2022 at 8:41 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Bharath, thoughts? When you originally introduced the nowait behavior,
> I believe that was to solve the case where someone specifies an LSN
> range well in the future, but we can still catch that and throw an
> error if we see that it's beyond the flush pointer. Do you see a
> problem with just doing that and getting rid of the nowait changes?

It's not just the flush pointer: without no-wait mode, the functions
would also wait if the start/input LSN is, say, the current flush LSN
minus 1, 2, or more bytes (before the previous record). If the functions
were to wait, by how much time should they wait? A timeout? Forever?
This is the complexity we wanted to avoid.

I would still vote for the private_data approach, and if the end of WAL
is reached when the flush LSN falls in the middle of a record, let's
just exit the loop and report the results back to the client.

I addressed Thomas's review comment and attached a v3 patch. Please have
a look.

Regards,
Bharath Rupireddy.
On Fri, 2022-04-29 at 10:46 +0530, Bharath Rupireddy wrote:
> It's not just the flush ptr, without no wait mode, the functions would
> wait if start/input lsn is, say current flush lsn - 1 or 2 or more
> (before the previous record) bytes. If the functions were to wait, by
> how much time should they wait? a timeout? forever?

I see, you're talking about the case of XLogFindNextRecord(), not
XLogReadRecord(). XLogFindNextRecord() is the only way to align the
user-provided start LSN on a valid record, but that calls
XLogReadRecord(), which may wait indefinitely.

If there were a different entry point that just did the alignment and
skipped past continuation records, we could prevent it from trying to
read the next record if we are already at the flush pointer. But without
some tweak to that API, nowait is still needed.

Committed your v3 patch with minor modifications.

Regards,
Jeff Davis