On Sun, May 3, 2020 at 3:12 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've finally performed couple of tests involving more IO. The
> not-that-big dataset of 1.5 GB for the replica with the memory allowing
> fitting ~ 1/6 of it, default prefetching parameters and an update
> workload with uniform distribution. Rather a small setup, but causes
> stable reading into the page cache on the replica and allows to see a
> visible influence of the patch (more measurement samples tend to happen
> at lower latencies):
Thanks for these tests Dmitry. You didn't mention the details of the
workload, but one thing I'd recommend for a uniform/random workload
that's generating a lot of misses on the primary server using N
backends is to make sure that maintenance_io_concurrency is set to a
number like N*2 or higher, and to look at the queue depth on both
systems with iostat -x 1. Then you can experiment with ALTER SYSTEM
SET maintenance_io_concurrency = X; SELECT pg_reload_conf(); to try to
understand the way it works; there is a point where you've set it high
enough and the replica is able to handle the same rate of concurrent
I/Os as the primary. The default of 10 is actually pretty low unless
you've only got ~4 backends generating random updates on the primary.
That's with full_page_writes=off; if you leave it on, it takes a while
to get into a scenario where it has much effect.
Here's a rebase, after the recent XLogReader refactoring.