Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers
From: Dmitry Dolgov
Subject: Re: WIP: WAL prefetch (another approach)
Msg-id: 20200502151423.yf52i63u232fdfrg@localhost
In response to: Re: WIP: WAL prefetch (another approach) (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses: Re: WIP: WAL prefetch (another approach)
List: pgsql-hackers
> On Sat, Apr 25, 2020 at 09:19:35PM +0200, Dmitry Dolgov wrote:
> > On Tue, Apr 21, 2020 at 05:26:52PM +1200, Thomas Munro wrote:
> >
> > One report I heard recently said that if you get rid of I/O stalls,
> > pread() becomes cheap enough that the much higher frequency lseek()
> > calls I've complained about elsewhere[1] become the main thing
> > recovery is doing, at least on some systems, but I haven't pieced
> > together the conditions required yet. I'd be interested to know if
> > you see that.
>
> At the moment I've performed a couple of tests for replication in the
> case when almost everything is in memory (mostly by mistake, I was
> expecting that a postgres replica within a badly memory-limited cgroup
> would cause more IO, but it looks like the kernel does not evict pages
> anyway). Not sure if that's what you mean by getting rid of IO stalls,
> but in these tests profiling shows lseek & pread appearing in a similar
> number of samples.
>
> If I understand correctly, one can eventually measure the influence of
> prefetching by looking at the execution time of the different redo
> functions (assuming the data they operate on is already prefetched,
> they should be faster). I still have to clarify the exact reason, but
> even in the situation described above (in memory) there is some visible
> difference, e.g.

I've finally performed a couple of tests involving more IO. The setup: a
not-that-big dataset of 1.5 GB for the replica, memory allowing roughly
1/6 of it to fit, default prefetching parameters, and an update workload
with uniform distribution. Rather a small setup, but it causes stable
reading into the page cache on the replica and makes the influence of the
patch visible (more measurement samples tend to land at lower latencies):

    # with patch
    Function = b'heap_redo' [206]
         nsecs               : count     distribution
          1024 -> 2047       : 0        |                                        |
          2048 -> 4095       : 32833    |**********************                  |
          4096 -> 8191       : 59476    |****************************************|
          8192 -> 16383      : 18617    |************                            |
         16384 -> 32767      : 3992     |**                                      |
         32768 -> 65535      : 425      |                                        |
         65536 -> 131071     : 5        |                                        |
        131072 -> 262143     : 326      |                                        |
        262144 -> 524287     : 6        |                                        |

    # without patch
    Function = b'heap_redo' [130]
         nsecs               : count     distribution
          1024 -> 2047       : 0        |                                        |
          2048 -> 4095       : 20062    |***********                             |
          4096 -> 8191       : 70662    |****************************************|
          8192 -> 16383      : 12895    |*******                                 |
         16384 -> 32767      : 9123     |*****                                   |
         32768 -> 65535      : 560      |                                        |
         65536 -> 131071     : 1        |                                        |
        131072 -> 262143     : 460      |                                        |
        262144 -> 524287     : 3        |                                        |

Not that there were any doubts, but at the same time it was surprising to
me how well Linux readahead works in this situation.
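(For reference: the thread does not name the tool used to collect the
heap_redo histograms above; the output format resembles bcc's funclatency.
Below is a minimal BCC-based sketch of how such per-redo-function latency
histograms could be gathered, assuming a hypothetical postgres binary path
of /usr/local/pgsql/bin/postgres.)

    from bcc import BPF
    import time

    # Trace entry/return of heap_redo() and record a log2 histogram of the
    # call latency in nanoseconds, in the spirit of bcc's funclatency.
    bpf_text = """
    #include <uapi/linux/ptrace.h>

    BPF_HASH(start, u32, u64);
    BPF_HISTOGRAM(dist);

    int trace_entry(struct pt_regs *ctx)
    {
        u32 tid = bpf_get_current_pid_tgid();
        u64 ts = bpf_ktime_get_ns();
        start.update(&tid, &ts);
        return 0;
    }

    int trace_return(struct pt_regs *ctx)
    {
        u32 tid = bpf_get_current_pid_tgid();
        u64 *tsp = start.lookup(&tid);
        if (tsp == 0)
            return 0;               /* missed the entry probe */
        dist.increment(bpf_log2l(bpf_ktime_get_ns() - *tsp));
        start.delete(&tid);
        return 0;
    }
    """

    binary = "/usr/local/pgsql/bin/postgres"  # assumption: adjust to the actual install

    b = BPF(text=bpf_text)
    b.attach_uprobe(name=binary, sym="heap_redo", fn_name="trace_entry")
    b.attach_uretprobe(name=binary, sym="heap_redo", fn_name="trace_return")

    print("Tracing heap_redo() latency, Ctrl-C to stop and print the histogram")
    try:
        time.sleep(3600)
    except KeyboardInterrupt:
        pass

    b["dist"].print_log2_hist("nsecs")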
The results above were obtained with readahead disabled for both the
filesystem and the device; without that there was almost no difference,
since a lot of IO was avoided by readahead (which in fact accounted for
the majority of all reads):

    # with patch
    flags = Read
         usecs               : count     distribution
            16 -> 31         : 0        |                                        |
            32 -> 63         : 1        |********                                |
            64 -> 127        : 5        |****************************************|

    flags = ReadAhead-Read
         usecs               : count     distribution
            32 -> 63         : 0        |                                        |
            64 -> 127        : 131      |****************************************|
           128 -> 255        : 12       |***                                     |
           256 -> 511        : 6        |*                                       |

    # without patch
    flags = Read
         usecs               : count     distribution
            16 -> 31         : 0        |                                        |
            32 -> 63         : 0        |                                        |
            64 -> 127        : 4        |****************************************|

    flags = ReadAhead-Read
         usecs               : count     distribution
            32 -> 63         : 0        |                                        |
            64 -> 127        : 143      |****************************************|
           128 -> 255        : 20       |*****                                   |

The number of reads in this case was similar with and without the patch,
which means the results can't be attributed to a situation in which a page
is read too early, then evicted and read again later.
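(The exact commands used to disable readahead are not shown in the thread;
at the block-device level this is usually done with "blockdev --setra 0
<device>". A minimal sketch of the equivalent ioctl in Python, assuming a
hypothetical /dev/sdb backing the replica's data directory:)

    import fcntl
    import os

    # BLKRASET from <linux/fs.h> (_IO(0x12, 98)): set the device readahead
    # value, in 512-byte sectors; 0 disables device-level readahead.
    BLKRASET = 0x1262

    def set_device_readahead(device, sectors):
        """Equivalent of "blockdev --setra <sectors> <device>" (needs root)."""
        fd = os.open(device, os.O_RDONLY)
        try:
            fcntl.ioctl(fd, BLKRASET, sectors)
        finally:
            os.close(fd)

    # Hypothetical device backing the replica's data directory:
    # set_device_readahead("/dev/sdb", 0)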