On 06/19/2018 02:33 PM, Konstantin Knizhnik wrote:
>
> On 19.06.2018 14:03, Tomas Vondra wrote:
>>
>> On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote:
>>>
>>> ...
>>>
>>> Also, there are two points which make prefetching into shared buffers
>>> more complex:
>>> 1. We need to spawn multiple workers to prefetch in parallel and
>>> somehow distribute the work between them.
>>> 2. We need to synchronize the recovery process with the prefetch, to
>>> prevent the prefetch from going too far ahead and doing useless work.
>>> The same problem exists for prefetching into the OS cache, but in that
>>> case the risk of a false prefetch is less critical.
>>>
>>
>> I think the main challenge here is that all buffer reads are currently
>> synchronous (correct me if I'm wrong), while posix_fadvise() allows
>> us to prefetch the buffers asynchronously.
>
> Yes, this is why we have to spawn several concurrent background workers
> to perform the prefetch.
Right. My point is that while spawning bgworkers probably helps, I don't
expect it to be enough to fill the I/O queues on modern storage systems.
Even if you start, say, 16 prefetch bgworkers, that's not going to be
enough for large arrays or SSDs. Those typically need way more than 16
requests in the queue.
Consider for example [1], from 2014, where Merlin reported how the S3500
(an Intel SATA SSD) behaves with different effective_io_concurrency values:
[1]
https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azsTdQ@mail.gmail.com
Clearly, you need to prefetch 32 to 64 blocks or so. Consider that you may
have multiple such devices in a single RAID array, and that this device is
from 2014 (and newer flash devices likely need even deeper queues).
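To make the queue-depth point concrete: a single process can keep many
reads in flight, without any bgworkers, by issuing
posix_fadvise(POSIX_FADV_WILLNEED) a fixed distance ahead of its
synchronous reads. This is roughly how effective_io_concurrency is used
for bitmap heap scans. Below is a minimal sketch, assuming a plain file of
8kB blocks; prefetch_and_read() and advise_block() are hypothetical
helpers (not PostgreSQL functions), and `distance` stands in for
effective_io_concurrency.

```c
#define _POSIX_C_SOURCE 200809L /* expose pread() and posix_fadvise() */

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* PostgreSQL's default block size */

/* Hint that block `blkno` will be needed soon. The call only queues
 * readahead and does not wait for the I/O. Errors are ignored because
 * the hint is purely advisory. */
static void advise_block(int fd, long blkno)
{
    (void) posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}

/*
 * Hypothetical helper: read blocks[0..n-1] into `out` (n * BLCKSZ bytes)
 * in the given order, keeping the window [i, i + distance) advised while
 * block i is being read. Returns 0 on success, -1 on an error or short read.
 */
int prefetch_and_read(int fd, const long *blocks, size_t n,
                      size_t distance, unsigned char *out)
{
    size_t next_advise = 0; /* index of the next block to advise */

    /* distance 0 would disable prefetching entirely */
    assert(distance > 0);

    for (size_t i = 0; i < n; i++)
    {
        /* Top up the prefetch window before blocking on block i. */
        while (next_advise < n && next_advise < i + distance)
            advise_block(fd, blocks[next_advise++]);

        /* Synchronous read; ideally the data is already in the page cache. */
        ssize_t got = pread(fd, out + i * BLCKSZ, BLCKSZ,
                            (off_t) blocks[i] * BLCKSZ);
        if (got != BLCKSZ)
            return -1;
    }
    return 0;
}
```

With distance = 64, up to 63 blocks beyond the one currently being read
have been advised, which is the kind of depth the S3500 numbers suggest,
all from one process.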
ISTM a small number of bgworkers is not going to be sufficient. It might
be enough for WAL prefetching (where we may easily run into the
redo-is-single-threaded bottleneck), but it's hardly a solution for
bitmap heap scans, for example. We'll need to invent something else for
that.
OTOH, my guess is that whatever solution we end up implementing for
bitmap heap scans will be applicable to WAL prefetching too. That is
why I'm suggesting that simply using posix_fadvise is not going to make
the direct I/O patch significantly more complicated.
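For WAL prefetching, the same posix_fadvise approach can also handle the
second concern quoted above (prefetch running too far ahead of recovery):
the redo process itself advises upcoming block references, but never more
than a fixed number of references past the current replay position. A
minimal sketch; wal_block_ref, wal_prefetcher and wal_prefetch_advance()
are hypothetical names, and decoding block references from WAL records is
assumed to happen elsewhere.

```c
#define _POSIX_C_SOURCE 200809L /* expose posix_fadvise() */

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Hypothetical: one data-block reference decoded from a WAL record. */
typedef struct
{
    int  fd;    /* open relation segment file */
    long blkno; /* block number within that file */
} wal_block_ref;

/* Hypothetical prefetcher state, owned by the process doing redo. */
typedef struct
{
    const wal_block_ref *refs; /* block references in WAL order */
    size_t nrefs;
    size_t next_prefetch;      /* first reference not yet advised */
    size_t max_ahead;          /* cap on how far prefetch runs ahead of redo */
} wal_prefetcher;

/*
 * Called by redo before applying reference `replayed`: advise every
 * reference in [max(next_prefetch, replayed), replayed + max_ahead).
 * The cap keeps prefetch from running arbitrarily far ahead of replay.
 * Returns the number of references advised in this call.
 */
size_t wal_prefetch_advance(wal_prefetcher *p, size_t replayed)
{
    size_t limit = replayed + p->max_ahead;
    size_t issued = 0;

    if (limit > p->nrefs)
        limit = p->nrefs;

    /* No point advising blocks that redo has already passed. */
    if (p->next_prefetch < replayed)
        p->next_prefetch = replayed;

    while (p->next_prefetch < limit)
    {
        const wal_block_ref *r = &p->refs[p->next_prefetch++];

        /* Advisory only: on failure redo simply reads the block itself. */
        (void) posix_fadvise(r->fd, (off_t) r->blkno * BLCKSZ, BLCKSZ,
                             POSIX_FADV_WILLNEED);
        issued++;
    }
    return issued;
}
```

Calling wal_prefetch_advance() once per replayed record keeps roughly
max_ahead reads queued, and since everything happens in the redo process,
no cross-process synchronization is needed.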
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services