Hello,
I ran a series of tests with the patch applied, covering both streaming
and non-streaming logical replication modes. In non-streaming mode, the
patch showed a significant performance improvement: up to +68% in the
best case, with a -6% regression in the worst case.
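For anyone wanting to reproduce the comparison, the two modes are
selected per subscription via its streaming option. The snippet below is
only a sketch; the subscription, connection and publication names are
placeholders, not the actual setup used in these tests:

    -- Non-streaming: large transactions are decoded and applied only
    -- after they commit on the publisher (placeholder names).
    CREATE SUBSCRIPTION test_sub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION test_pub
        WITH (streaming = off);

    -- Streaming: in-progress transactions are sent to the subscriber
    -- once they exceed logical_decoding_work_mem on the publisher.
    ALTER SUBSCRIPTION test_sub SET (streaming = on);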
In contrast, results in streaming mode were more modest. With the
default logical_decoding_work_mem of 64MB, I observed a +11.6%
improvement at best and a -6.7% degradation at worst. Increasing
logical_decoding_work_mem brought some incremental improvement (an
example of adjusting this setting follows the figures below):
At 128MB: +14.43% (best), -0.65% (worst)
At 256MB: +12.55% (best), -0.03% (worst)
At 512MB: +16.98% (best), -2.48% (worst)
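For reference, the memory limit can be raised on the publisher roughly
as follows; the value shown mirrors one of the settings above, but this
is only a sketch rather than the exact procedure used:

    -- Raise the decoding memory limit (128MB shown as an example).
    ALTER SYSTEM SET logical_decoding_work_mem = '128MB';
    SELECT pg_reload_conf();
    SHOW logical_decoding_work_mem;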
It's worth noting that streaming mode is enabled by default in logical
decoding, and as such, it's likely the mode most users and
applications are operating in. Non-streaming mode is typically only
used in more specialized setups or older deployments. Given this, the
broader benefit of the patch, especially considering its complexity,
may depend on how widely non-streaming mode is used in practice.
I'm sharing these findings in case others are interested in evaluating
the patch further. I believe the worst-case performance degradation
can be reduced with further code optimization. Feedback is welcome on
whether it's worthwhile to continue development based on these
results.
regards,
Ajin Cherian
Fujitsu Australia