Re: automatic analyze: readahead - add "IO read time" log message - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: automatic analyze: readahead - add "IO read time" log message
Date
Msg-id 20201109180644.GJ16415@tamriel.snowman.net
Whole thread Raw
In response to Re: automatic analyze: readahead - add "IO read time" log message  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: automatic analyze: readahead - add "IO read time" log message  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 11/4/20 5:02 PM, Stephen Frost wrote:
> >* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >>>If you highlight "738754560" in the output it appears to duplicate the
> >>>syscalls issued until it preads() - in case of "738754560" offset it was
> >>>asked for 3 times. Also I wouldn't  imagine in wildest dreams that
> >>>posix_fadvise(POSIX_FADV_WILLNEED) is such a cheap syscall.
> >>
> >>IMHO that'a a bug in the patch, which always tries to prefetch all
> >>"future" blocks, including those that were already prefetched. It
> >>probably needs to do something like bitmap heap scan where we track
> >>what was already prefetched and only issue the new blocks.
> >
> >Updated patch attached which:
> >
> >- Starts out by pre-fetching the first effective_io_concurrency number
> >   of blocks we are going to want, hopefully making it so the kernel will
> >   trust our fadvise's over its own read-ahead, right from the start.
> >- Makes sure the prefetch iterator is pushed forward whenever the
> >   regular interator is moved forward.
> >- After each page read, issues a prefetch, similar to BitmapHeapScan, to
> >   hopefully avoiding having the prefetching get in the way of the
> >   regular i/o.
> >- Added some comments, ran pgindent, added a commit message.
>
> Nice, that was quick ;-)

:)

> >I do think we should also include patch that Jakub wrote previously
> >which adds information about the read rate of ANALYZE.
>
> +1

Attached is an updated patch which updates the documentation and
integrates Jakub's initial work on improving the logging around
auto-analyze (and I made the logging in auto-vacuum more-or-less match
it).

> >I'll look at integrating that into this patch and then look at a new
> >patch to do something similar for VACUUM in a bit.
>
> +1

I spent some time looking into this but it's a bit complicated..  For
some sound reasons, VACUUM will avoid skipping through a table when
there's only a few pages that it could skip (not skipping allows us to
move forward the relfrozenxid).  That said, perhaps we could start doing
prefetching once we've decided that we're skipping.  We'd need to think
about if we have to worry about the VM changing between the pre-fetching
and the time when we're actually going to ask for the page..  I don't
*think* that's an issue because only VACUUM would be changing the pages
to be all-frozen or all-visible, and so if we see a page that isn't one
of those then we're going to want to visit that page and that's not
going to change, and we probably don't need to care about a page that
used to be all-frozen and now isn't during this run- but if the prefetch
went ahead and got page 10, and now page 8 is not-all-frozen and the
actual scan is at page 5, then maybe it wants page 8 next and that isn't
what we pre-fetched...

Anyhow, all-in-all, definitely more complicated and probably best
considered and discussed independently.

> >If you're doing further benchmarking of ANALYZE though, this would
> >probably be the better patch to use.  Certainly improved performance
> >here quite a bit with effective_io_concurrency set to 16.
>
> Yeah. I'd expect this to be heavily dependent on hardware.

Sure, I agree with that too.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Jeevan Ladhe
Date:
Subject: Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
Next
From: Tomas Vondra
Date:
Subject: Re: automatic analyze: readahead - add "IO read time" log message