If I publish a pgbench workload and subscribe to it, the subscription worker signals the WAL writer thousands of times a second, once for every async commit. This has a noticeable performance cost.
I've used a local variable to avoid waking up the WAL writer more than once for the same page boundary. This eliminates about 7/8 of the wake-ups.
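To illustrate the idea (this is a standalone sketch, not the attached patch): remember the page boundary we last signalled for, and skip the wake-up if the next async commit rounds down to the same boundary. Every name here (WalPos, WAL_PAGE_SIZE, maybe_wake_wal_writer, wake_wal_writer) is made up for the example, the 8192-byte page size is just an assumed value, and the wake-up itself is a stub that only counts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not PostgreSQL's actual types or constants. */
typedef uint64_t WalPos;                 /* a WAL position (LSN-like) */
#define WAL_PAGE_SIZE 8192u              /* assumed WAL page size in bytes */
#define INVALID_POS   UINT64_MAX         /* sentinel: nothing signalled yet */

/* Counts how often the (stubbed) wake-up was actually issued. */
static unsigned long wakeups_sent = 0;

/* Stub for the real signal; here it only increments the counter. */
static void
wake_wal_writer(void)
{
    wakeups_sent++;
}

/*
 * Called once per async commit with that commit's WAL position.
 * Returns true if a wake-up was sent, false if it was suppressed
 * because we already signalled for the same page boundary.
 */
bool
maybe_wake_wal_writer(WalPos commit_pos)
{
    /* Page boundary we last signalled for; persists across calls. */
    static WalPos last_signalled_page = INVALID_POS;
    WalPos      page_start;

    assert(commit_pos != INVALID_POS);

    /* Back off to the start of the page containing this commit. */
    page_start = commit_pos - (commit_pos % WAL_PAGE_SIZE);

    /* Same boundary as last time: a wake-up has already been requested. */
    if (page_start == last_signalled_page)
        return false;

    last_signalled_page = page_start;
    wake_wal_writer();
    return true;
}
```

With this shape, a burst of small commits that all land on one page costs a single wake-up, and the next wake-up is sent only when a commit crosses onto a new page.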
I'm testing it by running 1e6 transactions across 8 clients while replication is in effect, then waiting for the logical replica to catch up. This cycle takes 183.1 seconds on HEAD and 162.4 seconds with the attached patch (N=14; the p-value for the difference of the means is 6e-17).
If I suppress all wake-ups just to see what would happen, the runtime drops further, to 153.7 seconds.