On Thursday, September 16, 2021 8:36 PM Amit Kapila <amit.kapila16@gmail.com>:
> On Thu, Sep 16, 2021 at 6:29 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > After applying the patch,
> > I saw the same problem and can reproduce it by the following steps:
> >
> > 1) execute the SQLs.
> > -----------SQL-----------
> > CREATE TABLE decoding_test(x integer, y text);
> > SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
> > INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,4)
> s;
> >
> > -- use the lsn here to execute pg_recvlogical later
> > SELECT lsn FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)
> ORDER BY lsn DESC LIMIT 1;
> > INSERT INTO decoding_test(x,y) SELECT s, s::text FROM
> generate_series(5,50) s;
> > ----------------------
> >
> > 2) Then, if I execute the following command twice:
> > # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> > BEGIN 708
> > table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> > table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> > table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> > table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> > COMMIT 708
> >
> > # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> > BEGIN 709
> >
> > It still generated ' BEGIN 709' in the second time execution.
> > But it will output nothing in the second time execution without the patch.
> >
>
> I think here the reason is that the first_lsn of a transaction is
> always equal to end_lsn of the previous transaction (See comments
> above first_lsn and end_lsn fields of ReorderBufferTXN). I have not
> debugged but I think in StreamLogicalLog() the cur_record_lsn after
> receiving 'w' message, in this case, will be equal to endpos whereas
> we expect to be greater than endpos to exit. Before the patch, it will
> always get the 'k' message where we expect the received lsn to be
> equal to endpos to conclude that we can exit. Do let me know if your
> analysis differs?
After debugging it, I agree with your analysis.
WITH the patch:
in function StreamLogicalLog(), I can see the cur_record_lsn is equal
to endpos which result in unexpected record.
WITHOUT the patch:
In function StreamLogicalLog(), it first received a 'k' message which will break the
loop by the following code.
if (endposReached)
{
prepareToTerminate(conn, endpos, true, InvalidXLogRecPtr);
time_to_abort = true;
break;
}
Best regards,
Hou zj