Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers

From Shlok Kyal
Subject Re: long-standing data loss bug in initial sync of logical replication
Date
Msg-id CANhcyEW4pq6+PO_eFn2q=23sgV1budN3y4SxpYBaKMJNADSDuA@mail.gmail.com
In response to long-standing data loss bug in initial sync of logical replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Mon, 9 Sept 2024 at 10:41, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Mon, 2 Sept 2024 at 10:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Aug 30, 2024 at 3:06 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> > >
> > > Next I am planning to test solely on the logical decoding side and
> > > will share the results.
> > >
> >
> > Thanks, the next set of proposed tests makes sense to me. It will also
> > be useful to generate some worst-case scenarios where the number of
> > invalidations is more to see the distribution cost in such cases. For
> > example, Truncate/Drop a table with 100 or 1000 partitions.
> >
> > --
> > With Regards,
> > Amit Kapila.
>
> Hi,
>
> I did some performance testing solely on the logical decoding side and
> found some performance degradation, for the following test case:
> 1. Created a publication on a single table, say 'tab_conc1'.
> 2. Created a second publication on a single table, say 'tp'.
> 3. Two sessions are running in parallel, say S1 and S2.
> 4. Begin a transaction in S1.
> 5. Now in a loop (this loop runs 'count' times):
>      S1: Insert a row in table 'tab_conc1'
>      S2: BEGIN; ALTER PUBLICATION ... DROP/ADD 'tp'; COMMIT;
> 6. COMMIT the transaction in S1.
> 7. Run 'pg_logical_slot_get_binary_changes' to get the decoded changes.
>
> Observation:
> With the fix, a new entry is added during decoding. While debugging I
> found that this entry appears only when we do an 'INSERT' in session S1
> after an 'ALTER PUBLICATION' in the other session in parallel (i.e. due
> to invalidation). I also observed that this new entry is related to
> sending the replica identity, attributes, etc., as the function
> 'logicalrep_write_rel' is called.
>
> Performance:
> We see a performance degradation because the new entries are sent
> during logical decoding. Results are an average of 5 runs.
>
> count    |    Head (sec)    |    Fix (sec)    |    Degradation (%)
> ------------------------------------------------------------------------------
> 10000   |    1.298            |    1.574         |    21.26348228
> 50000   |    22.892          |    24.997       |    9.195352088
> 100000 |    88.602          |    93.759       |    5.820410374
>
> I have also attached the test script here.
>
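
For readers without the attached script, the quoted test above can be sketched roughly as follows. This is only an illustrative outline, not the actual attached script; all object, slot, and publication names here are made up:

```sql
-- Setup (once); names are illustrative:
CREATE TABLE tab_conc1 (a int);
CREATE TABLE tp (a int);
CREATE PUBLICATION pub1 FOR TABLE tab_conc1;
CREATE PUBLICATION pub2 FOR TABLE tp;
SELECT pg_create_logical_replication_slot('test_slot', 'pgoutput');

-- Session S1:
BEGIN;
INSERT INTO tab_conc1 VALUES (1);   -- repeated 'count' times, interleaved with S2
COMMIT;

-- Session S2, in parallel, inside the loop:
BEGIN; ALTER PUBLICATION pub2 DROP TABLE tp; COMMIT;
BEGIN; ALTER PUBLICATION pub2 ADD TABLE tp; COMMIT;

-- Finally, decode the accumulated changes:
SELECT * FROM pg_logical_slot_get_binary_changes('test_slot', NULL, NULL,
       'proto_version', '4', 'publication_names', 'pub1,pub2');
```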

For the above case, I investigated the inconsistent degradation and
found that serialization was happening for large values of 'count'. So
I set 'logical_decoding_work_mem' to a large value to avoid
serialization, ran the above performance test again, and got the
following results:

count    |    Head (sec)    |    Fix (sec)         |    Degradation (%)
-----------------------------------------------------------------------------------
10000   |    0.415446      |    0.53596167    |    29.00874482
50000   |    7.950266      |    10.37375567  |    30.48312685
75000   |    17.192372    |    22.246715      |    29.39875312
100000 |    30.555903    |    39.431542      |    29.04721552

These results are an average of 3 runs. Here the degradation is
consistently around ~30%.
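
For reference, raising the spill threshold as described above can be done like this (the exact value is illustrative; the default is 64MB):

```sql
-- Illustrative: raise the per-session decoding memory limit so changes
-- are kept in memory instead of being serialized (spilled) to disk.
SET logical_decoding_work_mem = '1GB';
```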

Thanks and Regards,
Shlok Kyal


