Thread: Possible bug in logical replication.

Possible bug in logical replication.

From
Konstantin Knizhnik
Date:
We got the following assertion failure on our buildfarm, running the master branch of Postgres, in the contrib/test_decoding regression tests:

2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL:  Streaming transactions committing after 0/2A00000, reading WAL from 0/29FFF1C.
2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT:  SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') 
TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", File: "xlogreader.c", Line: 241)


The stack trace is the following:

$ gdb -x ./gdbcmd --batch pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres pgsql.build/contrib/test_decoding/tmp_check/data/core
[New LWP 21678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildfarm regression [local] SELECT                                 '.
Program terminated with signal SIGABRT, Aborted.
#0  0xf7722c89 in __kernel_vsyscall ()
#0  0xf7722c89 in __kernel_vsyscall ()
#1  0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xf6b5f297 in __GI_abort () at abort.c:89
#4  0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", errorType=0x56b8bf61 "FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
#5  0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, errormsg=0xffaf2560) at xlogreader.c:241
#6  0x569c3191 in pg_logical_replication_slot_advance (startlsn=<optimized out>, moveto=44040564) at slotfuncs.c:370
#7  0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at slotfuncs.c:487
#8  0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, randomAccess=false) at execSRF.c:231
#9  0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
#10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 <FunctionRecheck>, accessMtd=0x568b3ed0 <FunctionNext>, node=0x57626cfc) at execScan.c:95
#11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 <FunctionNext>, recheckMtd=0x568b3ec0 <FunctionRecheck>) at execScan.c:162
#12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
#13 0x5689caba in ExecProcNode (node=0x57626cfc) at ../../../src/include/executor/executor.h:238
#14 ExecutePlan (execute_once=<optimized out>, dest=0x0, direction=NoMovementScanDirection, numberTuples=<optimized out>, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
#15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:368
#16 0x56a132cd in PortalRunSelect (portal=portal@entry=0x575c4ea8, forward=forward@entry=true, count=0, count@entry=2147483647, dest=0x576232d4) at pquery.c:932
#17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 "") at pquery.c:773
#18 0x56a0fbb7 in exec_simple_query (query_string=query_string@entry=0x57579070 "SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') ") at postgres.c:1122
#19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=<optimized out>, username=0x575a09f0 "buildfarm") at postgres.c:4153
#20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
#21 BackendStartup (port=0x5759a358) at postmaster.c:4033
#22 ServerLoop () at postmaster.c:1706
#23 0x5698a608 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1379
#24 0x566cf642 in main (argc=<optimized out>, argv=<optimized out>) at main.c:228



As you can see, the assertion failure happens because the specified startlsn (0x2a00000) is not considered valid.
This LSN is taken from the slot's confirmed flush LSN in pg_replication_slot_advance:
startlsn = MyReplicationSlot->data.confirmed_flush;


Unfortunately I was not able to reproduce the problem even after repeating these regression tests 1000 times, so this seems to be a very hard to reproduce, non-deterministic bug.
I wonder whether there is a guarantee that confirmed_flush always points to the start of a record.
Inspecting the xlogreader.c code shows that confirmed_flush is, for example, assigned EndRecPtr in DecodingContextFindStartpoint.
EndRecPtr is updated in XLogReadRecord, and if the record doesn't cross a page boundary, the following formula is used:
	state->EndRecPtr = RecPtr + MAXALIGN(total_len);

So if a record ends exactly at a page boundary, it looks like EndRecPtr can point to the page boundary.
This is confirmed by the following comment in the XLogReadRecord function:
	/*
	 * RecPtr is pointing to end+1 of the previous WAL record.  If we're
	 * at a page boundary, no more records can fit on the current page. We
	 * must skip over the page header, but we can't do that until we've
	 * read in the page, since the header size is variable.
	 */

But this means that confirmed_flush can also point to a page boundary, which will cause this assertion failure in XLogReadRecord:
	/*
	 * Caller supplied a position to start at.
	 *
	 * In this case, the passed-in record pointer should already be
	 * pointing to a valid record starting position.
	 */
	Assert(XRecOffIsValid(RecPtr));
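
To make the failure mode concrete, here is a minimal standalone sketch of the arithmetic (the 8192 and the shape of the validity check are taken from the assertion text above; the record start and total_len are made-up example values, and 24 is an assumed stand-in for MAXALIGN(sizeof(XLogPageHeaderData))):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;

    #define XLOG_BLCKSZ 8192
    #define MAXALIGN(x) (((uintptr_t) (x) + 7) & ~((uintptr_t) 7))
    /* Assumed stand-in for MAXALIGN(sizeof(XLogPageHeaderData)). */
    #define PAGE_HEADER_SIZE 24

    /* The check from the TRAP message: an offset is valid only if it lies
     * past the page header, so an LSN on an exact page boundary fails. */
    #define XRecOffIsValid(recptr) \
        ((recptr) % XLOG_BLCKSZ >= PAGE_HEADER_SIZE)

    int
    main(void)
    {
        /* A record starting at 0x29FFF00 with a hypothetical total_len of
         * 0x100 ends exactly on the page boundary 0x2A00000... */
        XLogRecPtr  RecPtr = 0x29FFF00;
        uint32_t    total_len = 0x100;
        XLogRecPtr  EndRecPtr = RecPtr + MAXALIGN(total_len);

        /* ...so EndRecPtr % XLOG_BLCKSZ == 0, XRecOffIsValid(EndRecPtr) is
         * false, and the assertion above would fire. */
        printf("EndRecPtr = %llX, valid = %d\n",
               (unsigned long long) EndRecPtr,
               XRecOffIsValid(EndRecPtr) ? 1 : 0);
        return 0;
    }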


Maybe this bug is so difficult to reproduce because it occurs only when the WAL record preceding the specified start point ends exactly at a page boundary. The probability of that seems to be quite small...

I am not so familiar with the WAL iteration code and logical decoding, so I would be pleased to receive confirmation or refutation of my analysis of the problem.
 
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Possible bug in logical replication.

From
Konstantin Knizhnik
Date:


On 17.05.2018 10:45, Konstantin Knizhnik wrote:
[...]

It looks like this is not the first time this problem has been encountered:

https://www.postgresql.org/message-id/flat/34d66f63-40a9-4c3e-c9a1-248d1e393d29%40enterprisedb.com#34d66f63-40a9-4c3e-c9a1-248d1e393d29@enterprisedb.com

I wonder who is now maintaining the logical replication code, and whether this bug is going to be fixed.
I think that using restart_lsn instead of confirmed_flush is not the right approach.
If restart_lsn is not available and confirmed_flush is pointing to a page boundary, we should in any case handle this and adjust startlsn to point to a valid record position (by just adding the page header size?).
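
A sketch of that adjustment (the header sizes below are assumptions standing in for SizeOfXLogShortPHD and SizeOfXLogLongPHD; as the XLogReadRecord comment quoted earlier says, the header size is variable -- the first page of each segment carries a long header -- which is exactly what makes a blind adjustment delicate):

    #include <stdint.h>

    typedef uint64_t XLogRecPtr;

    #define XLOG_BLCKSZ       8192
    #define WAL_SEG_SIZE      (16 * 1024 * 1024)
    /* Assumed sizes, standing in for SizeOfXLogShortPHD/SizeOfXLogLongPHD. */
    #define SHORT_PAGE_HEADER 24
    #define LONG_PAGE_HEADER  40

    /*
     * If lsn sits exactly on a page boundary, bump it past the page header
     * so that it points at a possible record start.  The first page of each
     * segment has a long header, so the skip size is not constant.
     */
    static XLogRecPtr
    skip_page_header(XLogRecPtr lsn)
    {
        if (lsn % XLOG_BLCKSZ != 0)
            return lsn;                        /* not on a boundary */
        if (lsn % WAL_SEG_SIZE == 0)
            return lsn + LONG_PAGE_HEADER;     /* segment start */
        return lsn + SHORT_PAGE_HEADER;        /* ordinary page start */
    }

    int
    main(void)
    {
        /* 0x2A00000 is mid-segment, so only the short header is skipped. */
        return skip_page_header(0x2A00000) == 0x2A00018 ? 0 : 1;
    }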

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:

> I think that using restart_lsn instead of confirmed_flush is not the right approach.
> If restart_lsn is not available and confirmed_flush is pointing to a page
> boundary, we should in any case handle this and adjust startlsn to point
> to a valid record position (by just adding the page header size?).

Well, restart_lsn is always available on a live slot: it is initially set
in ReplicationSlotReserveWal during slot creation.


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Thu, May 17, 2018 at 12:31:09PM +0300, Konstantin Knizhnik wrote:
> I wonder who is now maintaining the logical replication code, and whether
> this bug is going to be fixed.

I haven't looked at your problem in detail, so I cannot give a definite
conclusion, but when it comes to pg_replication_slot_advance, which is
new as of v11, you need to look at this commit:
commit: 9c7d06d60680c7f00d931233873dee81fdb311c6
author: Simon Riggs <simon@2ndQuadrant.com>
date: Wed, 17 Jan 2018 11:38:34 +0000
Ability to advance replication slots

Ability to advance both physical and logical replication slots using a
new user function pg_replication_slot_advance().

For logical advance that means records are consumed as fast as possible
and changes are not given to output plugin for sending. Makes 2nd phase
(after we reached SNAPBUILD_FULL_SNAPSHOT) of replication slot creation
faster, especially when there are big transactions as the reorder buffer
does not have to deal with data changes and does not have to spill to
disk.

Author: Petr Jelinek
Reviewed-by: Simon Riggs

I am adding an open item.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Kyotaro HORIGUCHI
Date:
At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher <a.sher@postgrespro.ru> wrote in <87in7md034.fsf@ars-thinkpad>
> 
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> 
> > I think that using restart_lsn instead of confirmed_flush is not the right approach.
> > If restart_lsn is not available and confirmed_flush is pointing to a page
> > boundary, we should in any case handle this and adjust startlsn to point
> > to a valid record position (by just adding the page header size?).
> 
> Well, restart_lsn is always available on a live slot: it is initially set
> in ReplicationSlotReserveWal during slot creation.

restart_lsn stays at the beginning of a transaction until the
transaction ends, so just using restart_lsn allows repeated
decoding of a transaction; in short, rewinding occurs. The
function works only on an inactive slot, so the current code works
fine on this point. In addition to that, restart_lsn can also be on a
page boundary.


We can see the problem easily.

1. Just create a logical replication slot; it is set to the current LSN.

  select pg_create_logical_replication_slot('s1', 'pgoutput');

2. Advance the LSN by two or three pages by doing anything.

3. Advance the slot to a page boundary.

  e.g. select pg_replication_slot_advance('s1', '0/9624000');

4. Advance the slot further; the server crashes.

So directly setting ctx->reader->EndRecPtr to startlsn fixes the
problem, but I found another problem here.

The function accepts any LSN, even one that is not at the beginning
of a record. We will see errors or crashes or infinite waiting or
maybe any kind of trouble with such values. The moved-to LSN must
always be at the "end of a record" (that is, at the start of the
next record). The attached patch also fixes this.

The documentation doesn't appear to require a fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d9e10263bb..d3cb777f9f 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -318,6 +318,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 
 /*
  * Helper function for advancing physical replication slot forward.
+ *
+ * This function accepts arbitrary LSN even if the LSN is not at the beginning
+ * of a record. This can lead to any kind of misbehavior but currently the
+ * value is used only to determine up to what wal segment to keep and
+ * successive implicit advancing fixes the state.
  */
 static XLogRecPtr
 pg_physical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
@@ -344,6 +349,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
     LogicalDecodingContext *ctx;
     ResourceOwner old_resowner = CurrentResourceOwner;
     XLogRecPtr    retlsn = InvalidXLogRecPtr;
+    XLogRecPtr    upto;
 
     PG_TRY();
     {
@@ -354,6 +360,13 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
                                     logical_read_local_xlog_page,
                                     NULL, NULL, NULL);
 
+        /*
+         * startlsn can be on a page boundary, but such a value is not accepted
+         * as an explicit parameter to XLogReadRecord. Set it in the reader context.
+         */
+        Assert(startlsn != InvalidXLogRecPtr);
+        upto = ctx->reader->EndRecPtr = startlsn;
+    
         CurrentResourceOwner = ResourceOwnerCreate(CurrentResourceOwner,
                                                    "logical decoding");
 
@@ -361,22 +374,18 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
         InvalidateSystemCaches();
 
         /* Decode until we run out of records */
-        while ((startlsn != InvalidXLogRecPtr && startlsn < moveto) ||
-               (ctx->reader->EndRecPtr != InvalidXLogRecPtr && ctx->reader->EndRecPtr < moveto))
+        while (ctx->reader->EndRecPtr <= moveto)
         {
             XLogRecord *record;
             char       *errm = NULL;
+ 
+            /* ctx->reader->EndRecPtr cannot go backward here */
+            upto = ctx->reader->EndRecPtr;
 
-            record = XLogReadRecord(ctx->reader, startlsn, &errm);
+            record = XLogReadRecord(ctx->reader, InvalidXLogRecPtr, &errm);
             if (errm)
                 elog(ERROR, "%s", errm);
 
-            /*
-             * Now that we've set up the xlog reader state, subsequent calls
-             * pass InvalidXLogRecPtr to say "continue from last record"
-             */
-            startlsn = InvalidXLogRecPtr;
-
             /*
              * The {begin_txn,change,commit_txn}_wrapper callbacks above will
              * store the description into our tuplestore.
@@ -384,18 +393,14 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)
             if (record != NULL)
                 LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-            /* check limits */
-            if (moveto <= ctx->reader->EndRecPtr)
-                break;
-
             CHECK_FOR_INTERRUPTS();
         }
 
         CurrentResourceOwner = old_resowner;
 
-        if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
+        if (startlsn != upto)
         {
-            LogicalConfirmReceivedLocation(moveto);
+            LogicalConfirmReceivedLocation(upto);
 
             /*
              * If only the confirmed_flush_lsn has changed the slot won't get

Re: Possible bug in logical replication.

From
Masahiko Sawada
Date:
On Fri, May 18, 2018 at 2:37 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Thu, 17 May 2018 13:54:07 +0300, Arseny Sher <a.sher@postgrespro.ru> wrote in <87in7md034.fsf@ars-thinkpad>
>>
>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>
>> > I think that using restart_lsn instead of confirmed_flush is not the right approach.
>> > If restart_lsn is not available and confirmed_flush is pointing to a page
>> > boundary, we should in any case handle this and adjust startlsn to point
>> > to a valid record position (by just adding the page header size?).
>>
>> Well, restart_lsn is always available on a live slot: it is initially set
>> in ReplicationSlotReserveWal during slot creation.
>
> restart_lsn stays at the beginning of a transaction until the
> transaction ends, so just using restart_lsn allows repeated
> decoding of a transaction; in short, rewinding occurs. The
> function works only on an inactive slot, so the current code works
> fine on this point. In addition to that, restart_lsn can also be on a
> page boundary.
>
>
> We can see the problem easily.
>
> 1. Just create a logical replication slot; it is set to the current LSN.
>
>   select pg_create_logical_replication_slot('s1', 'pgoutput');
>
> 2. Advance the LSN by two or three pages by doing anything.
>
> 3. Advance the slot to a page boundary.
>
>   e.g. select pg_replication_slot_advance('s1', '0/9624000');
>
> 4. Advance the slot further; the server crashes.
>
> So directly setting ctx->reader->EndRecPtr to startlsn fixes the
> problem, but I found another problem here.

I confirmed that the attached patch fixes this problem as well as the
same problem reported on another thread.

I'm not sure it's a good approach to change the state of the xlogreader
directly in the replication slot code, because it also means that we
have to take care of this code whenever the xlogreader code is changed.
Another possible way might be to make XLogFindNextRecord available in
backend code and move startlsn to the first valid record with an lsn
>= startlsn by using that function. Please find the attached patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Possible bug in logical replication.

From
Kyotaro HORIGUCHI
Date:
Hello.

At Wed, 23 May 2018 15:56:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+5Tz0Z8zHOmD=sU5F=cygoEjHs7WvbBDL07fH9ayVaw@mail.gmail.com>
> > So directly setting ctx->reader->EndRecPtr to startlsn fixes the
> > problem, but I found another problem here.
> 
> I confirmed that the attached patch fixes this problem as well as the
> same problem reported on another thread.
> 
> I'm not sure it's a good approach to change the state of the xlogreader
> directly in the replication slot code, because it also means that we
> have to take care of this code whenever the xlogreader code is changed.

XLogReadRecord checks whether state->ReadRecPtr is invalid in that
case and, if so, works the same as in the explicit-LSN case. That
suggests this usage is intended. (I found no actual use case,
though.) It also seems somewhat uneasy to me, though..
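
For illustration, a standalone toy model of the two entry modes under discussion (this is not the real xlogreader API; the constants and the header handling are simplified assumptions):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;

    #define InvalidXLogRecPtr  ((XLogRecPtr) 0)
    #define XLOG_BLCKSZ        8192
    #define PAGE_HEADER_SIZE   24   /* assumed short-header size */
    #define XRecOffIsValid(p)  ((p) % XLOG_BLCKSZ >= PAGE_HEADER_SIZE)

    typedef struct ToyReader
    {
        XLogRecPtr  EndRecPtr;      /* end+1 of the last record read */
    } ToyReader;

    /*
     * Toy version of the call being discussed: an explicit RecPtr must be a
     * valid record start, while InvalidXLogRecPtr means "continue from
     * EndRecPtr" and is allowed to silently skip a page header there.
     */
    static XLogRecPtr
    toy_read_record(ToyReader *state, XLogRecPtr RecPtr)
    {
        if (RecPtr == InvalidXLogRecPtr)
        {
            RecPtr = state->EndRecPtr;
            if (RecPtr % XLOG_BLCKSZ == 0)
                RecPtr += PAGE_HEADER_SIZE;   /* continuation mode may skip */
        }
        else
            assert(XRecOffIsValid(RecPtr));   /* explicit mode must not */
        return RecPtr;
    }

    int
    main(void)
    {
        ToyReader   r = {0x2A00000};   /* EndRecPtr left on a page boundary */

        /* Continuation mode copes with the boundary... */
        printf("continue from %llX\n",
               (unsigned long long) toy_read_record(&r, InvalidXLogRecPtr));
        /* ...while toy_read_record(&r, 0x2A00000) would fail the assertion. */
        return 0;
    }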

> Another possible way might be to make XLogFindNextRecord available in
> backend code and move startlsn to the first valid record with an lsn
> >= startlsn by using that function. Please find the attached patch.

Another reason for the code is the fact that confirmed_lsn is
storing EndRecPtr after the last XLogReadRecord call. That is,
by definition, confirmed_lsn must be at the start of a
record or on a page boundary, and we should error out if not. For that
reason, calling XLogFindNextRecord would not be the right thing to do
here. We should just skip a header if we are on a boundary, but that is
already done in XLogReadRecord.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Thu, May 24, 2018 at 10:14:01AM +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 23 May 2018 15:56:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+5Tz0Z8zHOmD=sU5F=cygoEjHs7WvbBDL07fH9ayVaw@mail.gmail.com>
>> Another possible way might be to make XLogFindNextRecord available in
>> backend code and move startlsn to the first valid record with an lsn
>> >= startlsn by using that function. Please find the attached patch.
>
> Another reason for the code is the fact that confirmed_lsn is
> storing EndRecPtr after the last XLogReadRecord call. That is,
> by definition, confirmed_lsn must be at the start of a
> record or on a page boundary, and we should error out if not. For that
> reason, calling XLogFindNextRecord would not be the right thing to do
> here. We should just skip a header if we are on a boundary, but that is
> already done in XLogReadRecord.

Maybe I am being too naive, but wouldn't it be just enough to update the
confirmed flush LSN to ctx->reader->ReadRecPtr?  This way, the slot
advances up to the beginning of the last record to which the user wants to
advance, and not to the beginning of the next record:
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -395,7 +395,7 @@ pg_logical_replication_slot_advance(XLogRecPtr startlsn, XLogRecPtr moveto)

         if (ctx->reader->EndRecPtr != InvalidXLogRecPtr)
         {
-            LogicalConfirmReceivedLocation(moveto);
+            LogicalConfirmReceivedLocation(ctx->reader->ReadRecPtr);

             /*
              * If only the confirmed_flush_lsn has changed the slot won't get

I agree with the point made above about not manually touching the
XLogReader context outside of xlogreader.c.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Hello,

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

> restart_lsn stays at the beginning of a transaction until the
> transaction ends, so just using restart_lsn allows repeated
> decoding of a transaction; in short, rewinding occurs. The
> function works only on an inactive slot, so the current code works
> fine on this point.

Sorry, I do not follow. restart_lsn is advanced whenever a consistent
snapshot is dumped (in xl_running_xacts) which is old enough to
wholly decode all xacts not yet confirmed by the client. Could you
please elaborate on what's wrong with that?

> In addition to that, restart_lsn can also be on a
> page boundary.

Do you have an example of that? restart_lsn is initially set to the WAL
insert position in ReplicationSlotReserveWal, and later it always points
to an xl_running_xacts record with a consistent snapshot dumped.

> So directly setting ctx->reader->EndRecPtr to startlsn fixes the
> problem, but I found another problem here.

There is a minor issue with the patch. Now slot advancement hangs
polling for new WAL on my example from [1]; most probably because we
must exit the loop when ctx->reader->EndRecPtr == moveto.

> The function accepts any LSN, even one that is not at the beginning
> of a record. We will see errors or crashes or infinite waiting or
> maybe any kind of trouble with such values. The moved-to LSN must
> always be at the "end of a record" (that is, at the start of the
> next record). The attached patch also fixes this.

Indeed, but we have these problems only if we are trying to read WAL
since confirmed_flush.

[1] https://www.postgresql.org/message-id/873720e4hf.fsf%40ars-thinkpad

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Michael Paquier <michael@paquier.xyz> writes:

> Maybe I am being too naive, but wouldn't it be just enough to update the
> confirmed flush LSN to ctx->reader->ReadRecPtr?  This way, the slot
> advances up to the beginning of the last record where user wants to
> advance, and not the beginning of the next record:

The same problem would have to be handled in pg_logical_slot_get_changes_guts
and in apply worker feedback; and there is a convention that all commits since
confirmed_flush must be decoded, so we risk decoding such a boundary
commit twice.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
This thread seems to have died down without any fix being proposed.
Simon, you own this open item.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Possible bug in logical replication.

From
Simon Riggs
Date:
On 6 June 2018 at 17:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> This thread seems to have died down without any fix being proposed.
> Simon, you own this open item.

Thanks, will look.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Thu, Jun 07, 2018 at 08:32:10AM +0100, Simon Riggs wrote:
> On 6 June 2018 at 17:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> This thread seems to have died down without any fix being proposed.
>> Simon, you own this open item.
>
> Thanks, will look.

Petr and I have found a couple of issues about the slot advance stuff on
this thread:
https://www.postgresql.org/message-id/20180525052805.GA15634%40paquier.xyz

The result is f731cfa which, per my tests, takes care of this issue:
first advancing a slot to a WAL page boundary and then advancing it to the
latest available LSN no longer triggers any assertions.  It would be nice
to have a double-check though, so I am leaving this thread on the list of
open items for now.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
Hello

Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
Michaël's commit fixes the reported bug?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Possible bug in logical replication.

From
Masahiko Sawada
Date:
On Fri, Jun 15, 2018 at 5:06 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Hello
>
> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?
>

I haven't reviewed that commit deeply yet, but I have confirmed that the
reported bug is fixed, using the following test case which previously led
to an assertion failure. This test case is the same as in the previous
report [1].

postgres(1:128004)=# select pg_create_logical_replication_slot('s1',
'pgoutput');
2018-06-15 18:04:31.301 JST [128004] LOG:  logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.301 JST [128004] DETAIL:  There are no running transactions.
2018-06-15 18:04:31.301 JST [128004] STATEMENT:  select
pg_create_logical_replication_slot('s1', 'pgoutput');
 pg_create_logical_replication_slot
------------------------------------
 (s1,0/16453C0)
(1 row)

postgres(1:128004)=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/16453D8
(1 row)

postgres(1:128004)=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/2000000
(1 row)

postgres(1:128004)=# select pg_replication_slot_advance('s1', '0/2000000');
2018-06-15 18:04:31.338 JST [128004] LOG:  starting logical decoding
for slot "s1"
2018-06-15 18:04:31.338 JST [128004] DETAIL:  Streaming transactions
committing after 0/16453C0, reading WAL from 0/1645388.
2018-06-15 18:04:31.338 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', '0/2000000');
2018-06-15 18:04:31.339 JST [128004] LOG:  logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.339 JST [128004] DETAIL:  There are no running transactions.
2018-06-15 18:04:31.339 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', '0/2000000');
 pg_replication_slot_advance
-----------------------------
 (s1,0/2000000)
(1 row)

postgres(1:128004)=# create table a (c int);
select pg_replication_slot_advance('s1', pg_current_wal_lsn());
CREATE TABLE
postgres(1:128004)=# select pg_replication_slot_advance('s1',
pg_current_wal_lsn());
2018-06-15 18:04:31.401 JST [128004] LOG:  starting logical decoding
for slot "s1"
2018-06-15 18:04:31.401 JST [128004] DETAIL:  Streaming transactions
committing after 0/2000000, reading WAL from 0/1645388.
2018-06-15 18:04:31.401 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', pg_current_wal_lsn());
2018-06-15 18:04:31.402 JST [128004] LOG:  logical decoding found
consistent point at 0/1645388
2018-06-15 18:04:31.402 JST [128004] DETAIL:  There are no running transactions.
2018-06-15 18:04:31.402 JST [128004] STATEMENT:  select
pg_replication_slot_advance('s1', pg_current_wal_lsn());
 pg_replication_slot_advance
-----------------------------
 (s1,0/2017828)
(1 row)

[1] https://www.postgresql.org/message-id/34d66f63-40a9-4c3e-c9a1-248d1e393d29%40enterprisedb.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?

I confirm that starting to read WAL from restart_lsn, as implemented in
f731cfa, fixes this issue, as well as the second issue tushar mentioned
at [1]. I think the code can still be improved a bit though --
consider the attached patch:
* pg_logical_replication_slot_advance comment was not very informative
  and even a bit misleading: it said that we use confirmed_flush_lsn and
  restart_lsn, but didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.

[1] https://www.postgresql.org/message-id/5f85bf41-098e-c4e1-7332-9171fef57a0a%40enterprisedb.com

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From d8ed8ae3eec54b716d7dbb35379d0047a96c6c75 Mon Sep 17 00:00:00 2001
From: Arseny Sher <sher-ars@yandex.ru>
Date: Fri, 15 Jun 2018 18:11:17 +0300
Subject: [PATCH] Cosmetic review for f731cfa.

* pg_logical_replication_slot_advance comment was not very informative and even
  a bit misleading: it said that we use confirmed_flush_lsn and restart_lsn, but
  didn't explain why.
* Excessive check in its main loop.
* Copy-paste comment fix.
---
 src/backend/replication/slotfuncs.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..597e81f4d9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,24 +341,26 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn allowing
+ * to recycle WAL and old catalog tuples. We do it in special fast_forward
+ * mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
 {
     LogicalDecodingContext *ctx;
     ResourceOwner old_resowner = CurrentResourceOwner;
+    /*
+     * Start reading WAL at restart_lsn, which certainly points to the valid
+     * record.
+     */
     XLogRecPtr    startlsn = MyReplicationSlot->data.restart_lsn;
     XLogRecPtr    retlsn = MyReplicationSlot->data.confirmed_flush;
 
     PG_TRY();
     {
-        /* restart at slot's confirmed_flush */
+        /* start_lsn doesn't matter here, we don't replay xacts at all */
         ctx = CreateDecodingContext(InvalidXLogRecPtr,
                                     NIL,
                                     true,
@@ -388,17 +390,10 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
              */
             startlsn = InvalidXLogRecPtr;
 
-            /*
-             * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-             * store the description into our tuplestore.
-             */
+            /* Changes are not actually produced in fast_forward mode. */
             if (record != NULL)
                 LogicalDecodingProcessRecord(ctx, ctx->reader);
 
-            /* Stop once the moving point wanted by caller has been reached */
-            if (moveto <= ctx->reader->EndRecPtr)
-                break;
-
             CHECK_FOR_INTERRUPTS();
         }
 
-- 
2.11.0


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
> I confirm that starting to read WAL from restart_lsn, as implemented in
> f731cfa, fixes this issue, as well as the second issue tushar mentioned
> at [1].

Thanks!

+    /*
+     * Start reading WAL at restart_lsn, which certainly points to the valid
+     * record.
+     */
     XLogRecPtr    startlsn = MyReplicationSlot->data.restart_lsn;
     XLogRecPtr    retlsn = MyReplicationSlot->data.confirmed_flush;

What does this one actually bring?

     PG_TRY();
     {
-        /* restart at slot's confirmed_flush */
+        /* start_lsn doesn't matter here, we don't replay xacts at all */
         ctx = CreateDecodingContext(InvalidXLogRecPtr,
                                     NIL,
                                     true,

Okay for this one.

-    /*
-     * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-     * store the description into our tuplestore.
-     */
+    /* Changes are not actually produced in fast_forward mode. */

This one is a good idea.  Now CreateDecodingContext is missing the
description of what fast_forward actually does, aka no changes are
produced.  Could you update your patch to reflect that?  That would be
useful for future callers of CreateDecodingContext as well.

-    /* Stop once the moving point wanted by caller has been reached */
-    if (moveto <= ctx->reader->EndRecPtr)
-        break;
-
     CHECK_FOR_INTERRUPTS();

It seems to me that we still want to have the slot forwarding finish in
this case even if this is interrupted.  Petr, isn't that the intention
here?
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
> It seems to me that we still want to have the slot forwarding finish in
> this case even if this is interrupted.  Petr, isn't that the intention
> here?

I have been chewing a bit more on the proposed patch, finishing with the
attached to close the loop.  Thoughts?
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Michael Paquier <michael@paquier.xyz> writes:

> On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
>> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
>> It seems to me that we still want to have the slot forwarding finish in
>> this case even if this is interrupted.  Petr, isn't that the intention
>> here?
>
> I have been chewing a bit more on the proposed patch, finishing with the
> attached to close the loop.  Thoughts?

Sorry for being pedantic, but it seems to me worthwhile to mention *why*
we need decoding machinery at all -- like I wrote:

+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).

Also,

>   * The slot's restart_lsn is used as start point for reading records,

This is clearly seen from the code; I propose to remove it.

>   * while confirmed_lsn is used as base point for the decoding context.

And as I wrote, this doesn't matter as changes are not produced.

>   * The LSN position to move to is checked by doing a per-record scan and
>   * logical decoding which makes sure that confirmed_lsn is updated to a
>   * LSN which allows the future slot consumer to get consistent logical
> - * changes.
> + * changes.  As decoding is done with fast_forward mode, no changes are
> + * actually generated.

confirmed_lsn is always updated to `moveto` unless we run out of WAL
earlier (and unless we try to move the slot backwards, which is obviously
forbidden) -- consistent changes are practically irrelevant
here. Moreover, we can directly set confirmed_lsn and still have
consistent changes afterwards, as restart_lsn and xmin of the slot are not
touched. What we actually do here is try to advance *restart_lsn and
xmin* as far as we can, but only up to the point which ensures that decoding
can assemble a consistent snapshot allowing it to fully decode all COMMITs
since the updated `confirmed_flush_lsn`. All this happens in
SnapBuildProcessRunningXacts.

> It seems to me that we still want to have the slot forwarding finish in
> this case even if this is interrupted.  Petr, isn't that the intention
> here?

Probably, though I am not sure what the point of this is. OK, I have kept
this check in the updated patch (with your comments) and am CC'ing Petr.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 61588d626f..76bafca41c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -335,11 +335,14 @@ CreateInitDecodingContext(char *plugin,
  *        The LSN at which to start decoding.  If InvalidXLogRecPtr, restart
  *        from the slot's confirmed_flush; otherwise, start from the specified
  *        location (but move it forwards to confirmed_flush if it's older than
- *        that, see below).
+ *        that, see below). Doesn't matter in fast_forward mode.
  *
  * output_plugin_options
  *        contains options passed to the output plugin.
  *
+ * fast_forward
+ *        bypasses the generation of logical changes.
+ *
  * read_page, prepare_write, do_write, update_progress
  *        callbacks that have to be filled to perform the use-case dependent,
  *        actual work.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2806e1076c..0a4985ef8c 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -341,12 +341,10 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
 
 /*
  * Helper function for advancing logical replication slot forward.
- * The slot's restart_lsn is used as start point for reading records,
- * while confirmed_lsn is used as base point for the decoding context.
- * The LSN position to move to is checked by doing a per-record scan and
- * logical decoding which makes sure that confirmed_lsn is updated to a
- * LSN which allows the future slot consumer to get consistent logical
- * changes.
+ * While we could just do LogicalConfirmReceivedLocation updating
+ * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
+ * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).
+ * We do it in special fast_forward mode without actual replay.
  */
 static XLogRecPtr
 pg_logical_replication_slot_advance(XLogRecPtr moveto)
@@ -358,7 +356,6 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 
     PG_TRY();
     {
-        /* restart at slot's confirmed_flush */
         ctx = CreateDecodingContext(InvalidXLogRecPtr,
                                     NIL,
                                     true,
@@ -388,10 +385,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
              */
             startlsn = InvalidXLogRecPtr;
 
-            /*
-             * The {begin_txn,change,commit_txn}_wrapper callbacks above will
-             * store the description into our tuplestore.
-             */
+            /* Changes are not actually produced in fast_forward mode. */
             if (record != NULL)
                 LogicalDecodingProcessRecord(ctx, ctx->reader);


Re: Possible bug in logical replication.

From
Petr Jelinek
Date:
On 20/06/18 09:59, Arseny Sher wrote:
> 
> Michael Paquier <michael@paquier.xyz> writes:
> 
>> On Mon, Jun 18, 2018 at 09:42:36PM +0900, Michael Paquier wrote:
>>> On Fri, Jun 15, 2018 at 06:27:56PM +0300, Arseny Sher wrote:
>>> It seems to me that we still want to have the slot forwarding finish in
>>> this case even if this is interrupted.  Petr, isn't that the intention
>>> here?
>>
>> I have been chewing a bit more on the proposed patch, finishing with the
>> attached to close the loop.  Thoughts?
> 
> Sorry for being pedantic, but it seems to me worthwhile to mention *why*
> we need decoding machinery at all -- like I wrote:
> 
> + * While we could just do LogicalConfirmReceivedLocation updating
> + * confirmed_flush_lsn, we'd better digest WAL to advance restart_lsn
> + * (allowing to recycle WAL) and xmin (allowing to vacuum old catalog tuples).
> 

+1

> Also,
> 
>>   * The slot's restart_lsn is used as start point for reading records,
> 
> This is clearly seen from the code; I propose to remove it.

Given that a bug fix was just done for this, I guess the extra clarity
there does not hurt.

> 
>>   * The LSN position to move to is checked by doing a per-record scan and
>>   * logical decoding which makes sure that confirmed_lsn is updated to a
>>   * LSN which allows the future slot consumer to get consistent logical
>> - * changes.
>> + * changes.  As decoding is done with fast_forward mode, no changes are
>> + * actually generated.
> 
> confirmed_lsn is always updated to `moveto` unless we run out of WAL
> earlier (and unless we try to move the slot backwards, which is obviously
> forbidden) -- consistent changes are practically irrelevant
> here. Moreover, we can directly set confirmed_lsn and still have
> consistent changes afterwards, as restart_lsn and xmin of the slot are not
> touched. What we actually do here is try to advance *restart_lsn and
> xmin* as far as we can, but only up to the point which ensures that decoding
> can assemble a consistent snapshot allowing it to fully decode all COMMITs
> since the updated `confirmed_flush_lsn`. All this happens in
> SnapBuildProcessRunningXacts.
> 

Those are two different things: there is the consistent snapshot for logical
decoding, without which we can't decode, and that's handled by restart_lsn
and xmin. But the consistent stream of data for the consumer is handled
by confirmed_lsn (and the comment says that). You have to take into
account that the next use of the slot can consume data (i.e., it may be done
via one of the other interfaces and not by a move). So I think what Michael
has here is correct.

>> It seems to me that we still want to have the slot forwarding finish in
>> this case even if this is interrupted.  Petr, isn't that the intention
>> here?
> 

Well, it seems wasteful to just exit there if we have already finished all
the requested work; it also gives some consistency with the coding of
get/peek_changes. Not very important for the functionality either way.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Thu, Jun 21, 2018 at 12:18:44PM +0200, Petr Jelinek wrote:
> On 20/06/18 09:59, Arseny Sher wrote:
>> Michael Paquier <michael@paquier.xyz> writes:
>>> It seems to me that we still want to have the slot forwarding finish in
>>> this case even if this is interrupted.  Petr, isn't that the intention
>>> here?
>
> Well, it seems wasteful to just exit there if we have already finished all
> the requested work; it also gives some consistency with the coding of
> get/peek_changes. Not very important for the functionality either way.

I like the concept of consistency.

Could it be possible to get a patch out of all the feedback and exchange
gathered here?  Petr, I think that it would not hurt if you, as the primary
author of the feature, used the set of words and comments you think is
best suited.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Kyotaro HORIGUCHI
Date:
Hello.

At Thu, 14 Jun 2018 16:06:43 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
<20180614200643.3my362zmfiwitrni@alvherre.pgsql>
> Can somebody (Arseny, Konstantin, horiguti, Sawada) please confirm that
> Michaël's commit fixes the reported bug?

pg_replication_slot_advance can advance uninitialized physical
slots, and that might not be good. (Logical slots always have
initially invalid values in the two LSN columns.)

About scanning from restart_lsn in the advancing function, I
think I have confirmed that the value always comes from
XLogRecordBuffer.origptr, which comes from ReadRecPtr, not
EndRecPtr, and so cannot be on a page boundary.

FWIW, as a result, it also looks fine to me regarding the
issue on this thread.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Fri, Jun 22, 2018 at 04:33:12PM +0900, Kyotaro HORIGUCHI wrote:
> pg_replication_slot_advance can advance uninitialized physical
> slots, and that might not be good. (Logical slots always have
> initially invalid values in the two LSN columns.)

The current logic is careful to ensure that users wanting to move to a new
position cannot move into the future, but it visibly intends to accept past
values.  Petr, what do you think?  KeepLogSeg() won't return
negative values, so some applications may take advantage of that.  Or
should advancing simply be disabled for non-initialized slots?
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
> Could it be possible to get a patch out of all the feedback and exchange
> gathered here?  Petr, I think that it would not hurt if you, as the primary
> author of the feature, used the set of words and comments you think is
> best suited.

I have seen no patch, so attached is one to finally close the loop and
this open item, which includes both my suggestions and what Arseny has
mentioned based on the latest emails exchanged.  Any objections to that?
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
On 2018-Jul-03, Michael Paquier wrote:

> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
> > Could it be possible to get a patch out of all the feedback and exchange
> > gathered here?  Petr, I think that it would not hurt if you, as the primary
> > author of the feature, used the set of words and comments you think is
> > best suited.
> 
> I have seen no patch, so attached is one to finally close the loop and
> this open item, which includes both my suggestions and what Arseny has
> mentioned based on the latest emails exchanged.  Any objections to that?

Let me review tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Michael Paquier <michael@paquier.xyz> writes:

> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
>> Could it be possible to get a patch out of all the feedback and exchange
>> gathered here?  Petr, I think that it would not hurt if you, as the primary
>> author of the feature, used the set of words and comments you think is
>> best suited.
>
> I have seen no patch, so attached is one to finally close the loop and
> this open item, which includes both my suggestions and what Arseny has
> mentioned based on the latest emails exchanged.  Any objections to that?

I'm practically happy with this.

>  * while confirmed_lsn is used as base point for the decoding context.

This line is excessive, as we now have a comment below saying it doesn't
matter.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:
> Let me review tomorrow.

Of course, please feel free.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Tue, Jul 03, 2018 at 09:16:42AM +0300, Arseny Sher wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> On Thu, Jun 21, 2018 at 07:31:20PM +0900, Michael Paquier wrote:
>>> Could it be possible to get a patch out of all the feedback and exchange
>>> gathered here?  Petr, I think that it would not hurt if you, as the primary
>>> author of the feature, used the set of words and comments you think is
>>> best suited.
>>
>> I have seen no patch, so attached is one to finally close the loop and
>> this open item, which includes both my suggestions and what Arseny has
>> mentioned based on the latest emails exchanged.  Any objections to that?
>
> I'm practically happy with this.
>
>>  * while confirmed_lsn is used as base point for the decoding context.
>
> This line is excessive, as we now have a comment below saying it doesn't
> matter.

Okay, let's do as you suggest then.  Do you find the attached suitable?
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Arseny Sher
Date:
Michael Paquier <michael@paquier.xyz> writes:

> Okay, let's do as you suggest then.  Do you find the attached suitable?

Yes, thanks!

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Wed, Jul 04, 2018 at 10:50:28AM +0900, Michael Paquier wrote:
> On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:
> > Let me review tomorrow.
>
> Of course, please feel free.

Alvaro, are you planning to look at that to close the loop?  The latest
version is here:
https://postgr.es/m/20180709070200.GC30202@paquier.xyz
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
On 2018-Jul-12, Michael Paquier wrote:

> On Wed, Jul 04, 2018 at 10:50:28AM +0900, Michael Paquier wrote:
> > On Tue, Jul 03, 2018 at 01:17:48AM -0400, Alvaro Herrera wrote:
> > > Let me review tomorrow.
> > 
> > Of course, please feel free.
> 
> Alvaro, are you planning to look at that to close the loop?  The latest
> version is here:
> https://postgr.es/m/20180709070200.GC30202@paquier.xyz

In the immortal words of Julian Bream: "yeah, I didn't like any of that".

I started thinking that the "while we could do X, we better not because
Y" new wording in the comment was misleading -- the comment is precisely
to convey that we must NOT do X, so why say "we could"?  I reworded that
comment a few times until it made sense.  Then I noticed the other
comments were either misplaced or slightly misleading, so I moved them
to their proper places, then reworded them thoroughly.

I also moved some assignments from the declaration section to the code
section, so that I could attach proper comments to each, to improve
clarity of *why* we do those things.

I then noticed that we get an XLogRecord from XLogReadRecord, but then
fail to do anything with it, so I changed the code to use a bool
instead, which I think is clearer.

I think the proposed comment before the LogicalDecodingProcessRecord
call failed to convey the important ideas, so I rewrote that one also.

There is no struct member called confirmed_flush_lsn anywhere.

The tense of some words in CreateDecodingContext was wrong.

I also back-patched two minor changes from Tom's 3cb646264e8c.

BTW I think I'm starting to have a vague idea of logical decoding now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
> In the immortal words of Julian Bream: "yeah, I didn't like any of
> that".

One Wikipedia lookup later, I still don't know where this quote comes
from, but at least I understand who the man is.

I may be missing something, but I cannot apply your patch on HEAD, so I
have not tested it.  Anyway, I read through it and nothing in it looks
logically wrong.

> I also moved some assignments from the declaration section to the code
> section, so that I could attach proper comments to each, to improve
> clarity of *why* we do those things.

To be pedantic here, you could move the declarations of startlsn,
old_resowner and ctx directly inside the PG_TRY block.
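
In code form, that would give roughly the following (a sketch only; I am
assuming startlsn and old_resowner keep coming from MyReplicationSlot
and CurrentResourceOwner, as in the current function):

    PG_TRY();
    {
        /* Declarations scoped to the only block that uses them. */
        XLogRecPtr      startlsn;
        ResourceOwner   old_resowner;
        LogicalDecodingContext *ctx;

        startlsn = MyReplicationSlot->data.confirmed_flush;
        old_resowner = CurrentResourceOwner;

        /* ... create the decoding context and advance as before ... */
    }
    PG_CATCH();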

> I then noticed that we get a XLogRecord from XLogReadRecord, but then
> fail to do anything with it, so I changed the code to use a bool
> instead, which I think is clearer.

A matter of taste perhaps; I was fine with just manipulating the record
pointer.

> I think the proposed comment before the LogicalDecodingProcessRecord
> call failed to convey the important ideas, so I rewrote that one also.
>
> There is no struct member called confirmed_flush_lsn anywhere.

This refers to the confirmed_flush_lsn column of the pg_replication_slots
system view.

> BTW I think I'm starting to have a vague idea of logical decoding
> now.

Nice.

>      PG_TRY();
>      {
> -        /* restart at slot's confirmed_flush */
> +        /*
> +         * Create our decoding context in fast_forward mode, passing start_lsn
> +         * as Invalid, so that we start processing from confirmed_flush.
> +         */

I'd rather mention InvalidXLogRecPtr directly here.  Invalid alone makes
no real sense.
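
Something like the following, roughly (a sketch of the call as I read
the function; the callback arguments, including
logical_read_local_xlog_page, are not part of your diff):

    /*
     * Create our decoding context in fast_forward mode, passing
     * InvalidXLogRecPtr as start_lsn so that we start processing from
     * the slot's confirmed_flush.
     */
    ctx = CreateDecodingContext(InvalidXLogRecPtr,
                                NIL,
                                true,   /* fast_forward */
                                logical_read_local_xlog_page,
                                NULL, NULL, NULL);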

> +      gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;

I would add parentheses for clarity.
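
That is:

    gotrecord = (XLogReadRecord(ctx->reader, startlsn, &errm) != NULL);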
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
On 2018-Jul-19, Michael Paquier wrote:

> On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
> > In the immortal words of Julian Bream: "yeah, I didn't like any of
> > that".
> 
> One Wikipedia lookup later, I still don't know where this quote comes
> from, but at least I understand who the man is.

https://twitter.com/alvherre/status/1019652397306703873

OK, maybe not that "immortal" after all.

> I may be missing something, but I cannot apply your patch on HEAD, so I
> have not tested it.  Anyway, I read through it and nothing in it looks
> logically wrong.

Sorry, I forgot to mention this -- it applies to 11.

> > I also moved some assignments from the declaration section to the code
> > section, so that I could attach proper comments to each, to improve
> > clarity of *why* we do those things.
> 
> To be pedantic here, you could move the declarations of startlsn,
> old_resowner and ctx directly inside the PG_TRY block.

Good idea, thanks.

> > I then noticed that we get an XLogRecord from XLogReadRecord, but then
> > fail to do anything with it, so I changed the code to use a bool
> > instead, which I think is clearer.
> 
> A matter of taste perhaps; I was fine with just manipulating the record
> pointer.

Yeah, it works out to the same thing really.  Maybe it's just me being
pedantic and annoyed when I realized that logical decoding does not
operate on the record itself but on the xlogreader struct instead.

TBH, what I was actually doing at first was just casting the result of
XLogReadRecord to void and not doing anything with it, until I realized
that the logical decoding call below was important because of its side
effects.  I thought that, since fast_forward mode was used,
XLogReadRecord would not return anything anyway -- but I added an assert
and realized that was not so.
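
For the archives, the decode loop has roughly this shape (a sketch with
the record pointer kept, cf. below, rather than the exact committed
text; moveto is the target LSN passed down from
pg_replication_slot_advance):

    while (ctx->reader->EndRecPtr < moveto)
    {
        char       *errm = NULL;
        XLogRecord *record;

        record = XLogReadRecord(ctx->reader, startlsn, &errm);
        if (errm)
            elog(ERROR, "%s", errm);

        /* After the first record, keep reading sequentially. */
        startlsn = InvalidXLogRecPtr;

        /*
         * Records do come back even in fast_forward mode, and each one
         * must still go through LogicalDecodingProcessRecord: no changes
         * are produced, but other modules (the snapshot builder, for
         * one) depend on its side effects.
         */
        if (record != NULL)
            LogicalDecodingProcessRecord(ctx, ctx->reader);

        CHECK_FOR_INTERRUPTS();
    }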

> > I think the proposed comment before the LogicalDecodingProcessRecord
> > call failed to convey the important ideas, so I rewrote that one also.
> > 
> > There is no struct member called confirmed_flush_lsn anywhere.
> 
> This refers to the confirmed_flush_lsn column of the pg_replication_slots
> system view.

Yeah, I think that's a bit misleading.  (I very frequently do a tag-jump
on identifiers in comments, and it's uncomfortable that in this case it
jumps to the Docbook source rather than to the struct declaration.)

> >      PG_TRY();
> >      {
> > -        /* restart at slot's confirmed_flush */
> > +        /*
> > +         * Create our decoding context in fast_forward mode, passing start_lsn
> > +         * as Invalid, so that we start processing from confirmed_flush.
> > +         */
> 
> I'd rather mention InvalidXLogRecPtr directly here.  Invalid alone makes
> no real sense.

OK.  I was of two minds about that.

> > +      gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;
> 
> I would add parentheses for clarity.

Or just put it back as a record pointer.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Wed, Jul 18, 2018 at 10:45:27PM -0400, Alvaro Herrera wrote:
> On 2018-Jul-19, Michael Paquier wrote:
>
>> On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
>> > In the immortal words of Julian Bream: "yeah, I didn't like any of
>> > that".
>>
>> One Wikipedia lookup later, I still don't know where this quote comes
>> from, but at least I understand who the man is.
>
> https://twitter.com/alvherre/status/1019652397306703873
>
> OK, maybe not that "immortal" after all.

:)

>> This refers to the confirmed_flush_lsn column of the pg_replication_slots
>> system view.
>
> Yeah, I think that's a bit misleading.  (I very frequently do a tag-jump
> on identifiers in comments, and it's uncomfortable that in this case it
> jumps to the Docbook source rather than to the struct declaration.)

Okay, this makes sense as well, and it's easier to grep.

>>> +      gotrecord = XLogReadRecord(ctx->reader, startlsn, &errm) != NULL;
>>
>> I would add parentheses for clarity.
>
> Or just put it back as a record pointer.

I am fine either way, so you can have the final say.  Please feel free
to choose whichever you prefer here.  That's no big deal.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
On 2018-Jul-19, Michael Paquier wrote:

> I am fine either way, so you can have the final say.  Please feel free
> to choose whichever you prefer here.  That's no big deal.

Okay.  You want to push it, or shall I?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Possible bug in logical replication.

From
Michael Paquier
Date:
On Thu, Jul 19, 2018 at 12:38:53AM -0400, Alvaro Herrera wrote:
> On 2018-Jul-19, Michael Paquier wrote:
>> I am fine either way, so you can have the final say.  Please feel free
>> to choose whichever you prefer here.  That's no big deal.
>
> Okay.  You want to push it, or shall I?

It seems to me that you are on it more than I am.  So please go ahead.
Thanks for asking.
--
Michael

Attachment

Re: Possible bug in logical replication.

From
Erik Rijkers
Date:
On 2018-07-19 03:42, Michael Paquier wrote:
> On Wed, Jul 18, 2018 at 02:30:53PM -0400, Alvaro Herrera wrote:
>> In the immortal words of Julian Bream: "yeah, I didn't like any of
>> that".
> 
> One Wikipedia lookup later, I still don't know where this quote comes
> from, but at least I understand who the man is.

Julian Bream is the 20th century's finest classical guitarist.

The quote is from Bream's 1978 guitar masterclass (at 30:25):

https://www.youtube.com/watch?v=iAXUzIrukt0



Re: Possible bug in logical replication.

From
Alvaro Herrera
Date:
On 2018-Jul-19, Michael Paquier wrote:

> On Thu, Jul 19, 2018 at 12:38:53AM -0400, Alvaro Herrera wrote:
> > On 2018-Jul-19, Michael Paquier wrote:
> >> I am fine either way, so you can have the final say.  Please feel free
> >> to choose whichever you prefer here.  That's no big deal.
> > 
> > Okay.  You want to push it, or shall I?
> 
> It seems to me that you are on it more than I am.  So please go ahead.
> Thanks for asking.

Done, thanks everyone.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services