RE: Logical replication keepalive flood - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Logical replication keepalive flood
Date
Msg-id OS0PR01MB57167898E5220CA118DB679A94DD9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Logical replication keepalive flood  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thursday, September 16, 2021 8:36 PM Amit Kapila <amit.kapila16@gmail.com>:
> On Thu, Sep 16, 2021 at 6:29 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > After applying the patch,
> > I saw the same problem and can reproduce it by the following steps:
> >
> > 1) execute the SQLs.
> > -----------SQL-----------
> > CREATE TABLE decoding_test(x integer, y text);
> > SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
> > INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,4)
> s;
> >
> > -- use the lsn here to execute pg_recvlogical later
> > SELECT lsn FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)
> ORDER BY lsn DESC LIMIT 1;
> > INSERT INTO decoding_test(x,y) SELECT s, s::text FROM
> generate_series(5,50) s;
> > ----------------------
> >
> > 2) Then, if I execute the following command twice:
> > # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> > BEGIN 708
> > table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> > table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> > table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> > table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> > COMMIT 708
> >
> > # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> > BEGIN 709
> >
> > It still generated ' BEGIN 709' in the second time execution.
> > But it will output nothing in the second time execution without the patch.
> >
> 
> I think here the reason is that the first_lsn of a transaction is
> always equal to end_lsn of the previous transaction (See comments
> above first_lsn and end_lsn fields of ReorderBufferTXN). I have not
> debugged but I think in StreamLogicalLog() the cur_record_lsn after
> receiving 'w' message, in this case, will be equal to endpos whereas
> we expect to be greater than endpos to exit. Before the patch, it will
> always get the 'k' message where we expect the received lsn to be
> equal to endpos to conclude that we can exit. Do let me know if your
> analysis differs?

After debugging it, I agree with your analysis.

WITH the patch:
in function StreamLogicalLog(), I can see the cur_record_lsn is equal
to endpos which result in unexpected record.

WITHOUT the patch:
In function StreamLogicalLog(), it first received a 'k' message which will break the
loop by the following code.

            if (endposReached)
            {
                prepareToTerminate(conn, endpos, true, InvalidXLogRecPtr);
                time_to_abort = true;
                break;
            }

Best regards,
Hou zj

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs
Next
From: Bharath Rupireddy
Date:
Subject: Re: Refactoring postgres_fdw code to rollback remote transaction