RE: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: long-standing data loss bug in initial sync of logical replication |
Date | |
Msg-id | OSCPR01MB14966EB5F3B416E4689FB5A67F5D32@OSCPR01MB14966.jpnprd01.prod.outlook.com |
In response to | RE: long-standing data loss bug in initial sync of logical replication ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses | Re: long-standing data loss bug in initial sync of logical replication |
List | pgsql-hackers |
Hi hackers,

> Our team (mainly Shlok) did a performance testing with several workloads. Let me
> share them on -hackers. We did it for master/REL_17 branches, and in this post
> master's one will be discussed.

I posted the benchmark results for master in [1]. This post contains the results for a back branch, specifically REL_17_STABLE.

The observed trend is the same as on master: frequent DDL on published tables can cause a huge regression, but this is expected. In the other cases, the regression is small or nonexistent.

Used source
===========
The base code was the HEAD of REL_17_STABLE, and the compared patch was v16.

The main difference is that master tries to preserve relsync caches as much as possible, whereas REL_17_STABLE discards them more aggressively. Please refer to the recent commits 3abe9d and 588acf6.

The executed workloads were mostly the same as in master's case.

-----
Workload A: No DDL operation done in concurrent session
======================================
No regression was observed in this workload.

Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013706 | 0.013398 | -2.2496
100 | 0.014811 | 0.014821 | 0.0698
500 | 0.018288 | 0.018318 | 0.1640
1000 | 0.022613 | 0.022622 | 0.0413
2000 | 0.031812 | 0.031891 | 0.2504

-----
Workload B: DDL is happening but is unrelated to publication
========================================
A small regression (about 8-11%) was observed at high concurrency (500 or more concurrent transactions). This is because the DDL transaction sends invalidation messages to all the concurrent transactions.

Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013159 | 0.013305 | 1.1120
100 | 0.014718 | 0.014725 | 0.0476
500 | 0.018134 | 0.019578 | 7.9628
1000 | 0.022762 | 0.025228 | 10.8324
2000 | 0.032326 | 0.035638 | 10.2467

-----
Workload C. DDL is happening on the publication but on an unrelated table
============================================
We did not run this workload because we expected the same results as D. Commit 588acf6 is needed to optimize this workload.

-----
Workload D. DDL is happening on the related published table, and one insert is done per invalidation
=========================================
This workload showed a huge regression, the same as on the master branch (from about 16% at 50 concurrent transactions up to about 251% at 2000). This is expected because the distributed invalidation messages require all concurrent transactions to rebuild their relsync caches.

Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.013496 | 0.015588 | 15.5034
100 | 0.015112 | 0.018868 | 24.8517
500 | 0.018483 | 0.038714 | 109.4536
1000 | 0.023402 | 0.063735 | 172.3524
2000 | 0.031596 | 0.110860 | 250.8720

-----
Workload E. DDL is happening on the related published table, and 1000 inserts are done per invalidation
============================================
The large regression seen in D was not observed (at most about 17%, at 50 concurrent transactions). This is the same as in master's case, and is expected because decoding 1000 tuples takes much longer, so the cost of rebuilding the caches is relatively small.

Concurrent txn | Head (sec) | Patch (sec) | Degradation (%)
------------------ | ------------ | ------------ | ----------------
50 | 0.093019 | 0.108820 | 16.9869
100 | 0.188367 | 0.199621 | 5.9741
500 | 0.967896 | 0.970674 | 0.2870
1000 | 1.658552 | 1.803991 | 8.7691
2000 | 3.482935 | 3.682771 | 5.7376

[1]: https://www.postgresql.org/message-id/OSCPR01MB149661EA973D65EBEC2B60D98F5D32%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED
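The Degradation (%) column in the benchmark tables appears to be the relative slowdown of the patched build against HEAD, that is (Patch - Head) / Head × 100. The post does not state this formula; it is inferred here because it reproduces the posted values to within rounding of the six-decimal timings. A minimal Python sketch using the Workload D rows; the names `WORKLOAD_D` and `degradation_pct` are illustrative only:

```python
# Sketch: recompute the "Degradation (%)" column as the relative slowdown
# of the patched build versus HEAD. The formula is an assumption inferred
# from the posted numbers, not something stated in the original mail.

# Workload D on REL_17_STABLE: (concurrent txns, Head sec, Patch sec)
WORKLOAD_D = [
    (50, 0.013496, 0.015588),
    (100, 0.015112, 0.018868),
    (500, 0.018483, 0.038714),
    (1000, 0.023402, 0.063735),
    (2000, 0.031596, 0.110860),
]


def degradation_pct(head: float, patch: float) -> float:
    """Percentage by which the patched run is slower than HEAD."""
    return (patch - head) / head * 100.0


# Print one row per concurrency level, mirroring the table layout.
for txns, head, patch in WORKLOAD_D:
    print(f"{txns:>5} | {degradation_pct(head, patch):9.4f}")
```

A negative value, as in the first row of Workload A, means the patched run was faster than HEAD.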