Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers
From: Dmitry Dolgov
Subject: Re: WIP: WAL prefetch (another approach)
Msg-id: 20200502151423.yf52i63u232fdfrg@localhost
In response to: Re: WIP: WAL prefetch (another approach) (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses: Re: WIP: WAL prefetch (another approach)
List: pgsql-hackers
> On Sat, Apr 25, 2020 at 09:19:35PM +0200, Dmitry Dolgov wrote:
> > On Tue, Apr 21, 2020 at 05:26:52PM +1200, Thomas Munro wrote:
> >
> > One report I heard recently said that if you get rid of I/O stalls,
> > pread() becomes cheap enough that the much higher frequency lseek()
> > calls I've complained about elsewhere[1] become the main thing
> > recovery is doing, at least on some systems, but I haven't pieced
> > together the conditions required yet. I'd be interested to know if
> > you see that.
>
> At the moment I've performed a couple of tests for replication in the
> case when almost everything is in memory (mostly by mistake, I was
> expecting that a postgres replica within a badly memory-limited cgroup
> would cause more IO, but it looks like the kernel does not evict pages
> anyway). Not sure if that's what you mean by getting rid of IO stalls,
> but in these tests profiling shows lseek & pread appearing in a similar
> number of samples.
>
> If I understand correctly, one can eventually measure the influence of
> prefetching by looking at the execution time of the different redo
> functions (assuming the data they operate on is already prefetched,
> they should be faster). I still have to clarify the exact reason, but
> even in the situation described above (in memory) there is some visible
> difference, e.g.

I've finally performed a couple of tests involving more IO. The setup: a
not-that-big dataset of 1.5 GB for the replica, memory allowing roughly
1/6 of it to fit, default prefetching parameters, and an update workload
with uniform distribution. Rather a small setup, but it causes stable
reading into the page cache on the replica and makes the influence of the
patch visible (more measurement samples tend to land at lower latencies):

    # with patch
    Function = b'heap_redo' [206]
         nsecs               : count     distribution
          1024 -> 2047       : 0        |                                        |
          2048 -> 4095       : 32833    |**********************                  |
          4096 -> 8191       : 59476    |****************************************|
          8192 -> 16383      : 18617    |************                            |
         16384 -> 32767      : 3992     |**                                      |
         32768 -> 65535      : 425      |                                        |
         65536 -> 131071     : 5        |                                        |
        131072 -> 262143     : 326      |                                        |
        262144 -> 524287     : 6        |                                        |

    # without patch
    Function = b'heap_redo' [130]
         nsecs               : count     distribution
          1024 -> 2047       : 0        |                                        |
          2048 -> 4095       : 20062    |***********                             |
          4096 -> 8191       : 70662    |****************************************|
          8192 -> 16383      : 12895    |*******                                 |
         16384 -> 32767      : 9123     |*****                                   |
         32768 -> 65535      : 560      |                                        |
         65536 -> 131071     : 1        |                                        |
        131072 -> 262143     : 460      |                                        |
        262144 -> 524287     : 3        |                                        |

Not that there were any doubts, but at the same time it was surprising to
me how well Linux readahead works in this situation.
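(For reference: the thread does not name the tool used to collect the
heap_redo histograms above; the output format resembles bcc's funclatency.
Below is a minimal BCC-based sketch of how such per-redo-function latency
histograms could be gathered, assuming a hypothetical postgres binary path
of /usr/local/pgsql/bin/postgres.)

    from bcc import BPF
    import time

    # Trace entry/return of heap_redo() and record a log2 histogram of the
    # call latency in nanoseconds, in the spirit of bcc's funclatency.
    bpf_text = """
    #include <uapi/linux/ptrace.h>

    BPF_HASH(start, u32, u64);
    BPF_HISTOGRAM(dist);

    int trace_entry(struct pt_regs *ctx)
    {
        u32 tid = bpf_get_current_pid_tgid();
        u64 ts = bpf_ktime_get_ns();
        start.update(&tid, &ts);
        return 0;
    }

    int trace_return(struct pt_regs *ctx)
    {
        u32 tid = bpf_get_current_pid_tgid();
        u64 *tsp = start.lookup(&tid);
        if (tsp == 0)
            return 0;               /* missed the entry probe */
        dist.increment(bpf_log2l(bpf_ktime_get_ns() - *tsp));
        start.delete(&tid);
        return 0;
    }
    """

    binary = "/usr/local/pgsql/bin/postgres"  # assumption: adjust to the actual install

    b = BPF(text=bpf_text)
    b.attach_uprobe(name=binary, sym="heap_redo", fn_name="trace_entry")
    b.attach_uretprobe(name=binary, sym="heap_redo", fn_name="trace_return")

    print("Tracing heap_redo() latency, Ctrl-C to stop and print the histogram")
    try:
        time.sleep(3600)
    except KeyboardInterrupt:
        pass

    b["dist"].print_log2_hist("nsecs")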
The results above were obtained with readahead disabled for both the
filesystem and the device; without that there was almost no difference,
since a lot of IO was avoided by readahead (which in fact accounted for
the majority of all reads):

    # with patch
    flags = Read
         usecs               : count     distribution
            16 -> 31         : 0        |                                        |
            32 -> 63         : 1        |********                                |
            64 -> 127        : 5        |****************************************|

    flags = ReadAhead-Read
         usecs               : count     distribution
            32 -> 63         : 0        |                                        |
            64 -> 127        : 131      |****************************************|
           128 -> 255        : 12       |***                                     |
           256 -> 511        : 6        |*                                       |

    # without patch
    flags = Read
         usecs               : count     distribution
            16 -> 31         : 0        |                                        |
            32 -> 63         : 0        |                                        |
            64 -> 127        : 4        |****************************************|

    flags = ReadAhead-Read
         usecs               : count     distribution
            32 -> 63         : 0        |                                        |
            64 -> 127        : 143      |****************************************|
           128 -> 255        : 20       |*****                                   |

The number of reads in this case was similar with and without the patch,
which means the results can't be attributed to a situation in which a page
is read too early, then evicted and read again later.
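(The exact commands used to disable readahead are not shown in the thread;
at the block-device level this is usually done with "blockdev --setra 0
<device>". A minimal sketch of the equivalent ioctl in Python, assuming a
hypothetical /dev/sdb backing the replica's data directory:)

    import fcntl
    import os

    # BLKRASET from <linux/fs.h> (_IO(0x12, 98)): set the device readahead
    # value, in 512-byte sectors; 0 disables device-level readahead.
    BLKRASET = 0x1262

    def set_device_readahead(device, sectors):
        """Equivalent of "blockdev --setra <sectors> <device>" (needs root)."""
        fd = os.open(device, os.O_RDONLY)
        try:
            fcntl.ioctl(fd, BLKRASET, sectors)
        finally:
            os.close(fd)

    # Hypothetical device backing the replica's data directory:
    # set_device_readahead("/dev/sdb", 0)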