Home > mailing lists

Re: automatic analyze: readahead - add "IO read time" log message - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: automatic analyze: readahead - add "IO read time" log message
Date	November 4, 2020 14:19:05
Msg-id	20201104141905.dfoeu7pjarkxhml2@development Whole thread Raw
In response to	Re: automatic analyze: readahead - add "IO read time" log message (Jakub Wartak <Jakub.Wartak@tomtom.com>)
Responses	Re: automatic analyze: readahead - add "IO read time" log message Re: automatic analyze: readahead - add "IO read time" log message
List	pgsql-hackers

Tree view

On Wed, Nov 04, 2020 at 09:07:59AM +0000, Jakub Wartak wrote:
>
>Greetings Stephen,
>
>I saw up 410MB/s for a few seconds with this patch on NVMe, and that's
>huge ~5.2x improvement which is amazing for a such simple patch.
>
>The system and data was identical like last time, so results are directly comparable
>to the previous post. The only change is that I've applied Yours patch on top of
>REL12_STABLE as this is cluster where I was having original data. I think it deserves
> it's own commitfest entry, right? Or do we combine those two?
>
>Each test was conducted with
>1. alter system set effective_io_concurrency=$N;
>2. echo 3 > /proc/sys/vm/drop_caches; 3x sync; /etc/init.d/postgresql-12 restart;
>3. analyze verbose t;
>4. the measurements were observed by pidstat -d and iostat -dxm.
>
>no readahead (blockdev --setra 0), cold cache:
>effective_io_concurrency=64; => ~26MB/s
>effective_io_concurrency=32; => ~29MB/s
>effective_io_concurrency=8; => ~31MB/s
>effective_io_concurrency=4; => ~32MB/s
>effective_io_concurrency=1; => ~32MB/s
>effective_io_concurrency=0; => ~32MB/s
>
>readahead=256 (128kB, Linux defaults), cold cache:
>effective_io_concurrency=0; => ~60..68MB/s (analyze total elapsed time 3min 22s , baseline)
>effective_io_concurrency=1; => ~62MB/s
>effective_io_concurrency=4; => ~370MB/s
>effective_io_concurrency=8; => ~380..410MB/s <-- wow! up to 52k read IOPS, CPU util 100%
>effective_io_concurrency=16; => ~320..390MB/s (analyze total elapsed time ~39s)
>

Thanks for the testing, those are indeed quite promising numbers for
such a simple patch. I wonder how would this behave on a different
storage - I'll see if I can do some testing, but I guess it should be
the same as for other places that do prefetching.

>I think one needs to be sure to restart, so that autovacuum
>workers get the new effective_io_concurrency, maybe that's a
>documentation thing only.
>
>I have just one question, with this patch the strace I/O of analyze
>look like below:
>
>pread(72, ..., 8192, 738181120) = 8192
>fadvise64(72, 738500608, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 738607104, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 738754560, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 738762752, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739115008, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739155968, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739385344, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739696640, 8192, POSIX_FADV_WILLNEED) = 0
>pread(72, "..., 8192, 738500608) = 8192
>fadvise64(72, 738607104, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 738754560, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 738762752, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739115008, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739155968, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739385344, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739696640, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739934208, 8192, POSIX_FADV_WILLNEED) = 0
>pread(72, .., 8192, 738607104) = 8192
>fadvise64(72, 738754560, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 738762752, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739115008, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739155968, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739385344, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739696640, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739934208, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 740122624, 8192, POSIX_FADV_WILLNEED) = 0
>pread(72, ..., 8192, 738754560) = 8192
>fadvise64(72, 738762752, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739115008, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739155968, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739385344, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739696640, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 739934208, 8192, POSIX_FADV_WILLNEED) = 0
>fadvise64(72, 740122624, 8192, POSIX_FADV_WILLNEED) = 0
>
>If you highlight "738754560" in the output it appears to duplicate the
>syscalls issued until it preads() - in case of "738754560" offset it was
>asked for 3 times. Also I wouldn't  imagine in wildest dreams that
>posix_fadvise(POSIX_FADV_WILLNEED) is such a cheap syscall.
>

IMHO that'a a bug in the patch, which always tries to prefetch all
"future" blocks, including those that were already prefetched. It
probably needs to do something like bitmap heap scan where we track
what was already prefetched and only issue the new blocks.

BTW it seems your e-mail client does something funny, stripping the
"references" headers, which breaks threading and makes following the
discussion very hard (and will confuse the CF app too). If you look at
pgsql-hackers archives, each of your responses starts a new thread.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Tomas Vondra
Date: 04 November 2020, 13:50:30
Subject: Re: PANIC: could not fsync file "pg_multixact/..." since commit dee663f7843

From: Stephen Frost
Date: 04 November 2020, 14:30:41
Subject: Re: automatic analyze: readahead - add "IO read time" log message

Re: automatic analyze: readahead - add "IO read time" log message - Mailing list pgsql-hackers

Previous

Next