Re: WAL prefetch - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: WAL prefetch
Msg-id 7d50a243-eb78-6a65-905c-4ddf425df16e@postgrespro.ru
In response to Re: WAL prefetch  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WAL prefetch
List pgsql-hackers

On 19.06.2018 14:03, Tomas Vondra wrote:
>
>
> On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote:
>>
>>
>> On 18.06.2018 23:47, Andres Freund wrote:
>>> On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
>>>> On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <andres@anarazel.de> 
>>>> wrote:
>>>>>> The posix_fadvise approach is not perfect, no doubt about that. 
>>>>>> But it
>>>>>> works pretty well for bitmap heap scans, and it's about 13249x 
>>>>>> better
>>>>>> (rough estimate) than the current solution (no prefetching).
>>>>> Sure, but investing in an architecture we know might not live long 
>>>>> also
>>>>> has its cost. Especially if it's not that complicated to do better.
>>>> My guesses are:
>>>>
>>>> - Using OS prefetching is a very small patch.
>>>> - Prefetching into shared buffers is a much bigger patch.
>>> Why?  The majority of the work is standing up a bgworker that does
>>> prefetching (i.e. reads WAL, figures out reads not in s_b, does
>>> prefetch). Allowing a configurable number + some synchronization 
>>> between
>>> them isn't that much more work.
>>
>> I do not think that prefetching into shared buffers requires much more 
>> effort or makes the patch more invasive...
>> It even simplifies it somewhat, because there is no need to maintain our 
>> own cache of prefetched pages...
>> But it will definitely have much more impact on Postgres performance: 
>> contention for buffer locks, throwing away pages accessed by 
>> read-only queries,...
>>
>> Also there are two points which make prefetching into shared buffers 
>> more complex:
>> 1. We need to spawn multiple workers to prefetch in parallel and 
>> somehow distribute work between them.
>> 2. We need to synchronize the recovery process with prefetch to prevent 
>> prefetch from going too far ahead and doing useless work.
>> The same problem exists for prefetch into the OS cache, but there the 
>> risk of false prefetch is less critical.
>>
>
> I think the main challenge here is that all buffer reads are currently 
> synchronous (correct me if I'm wrong), while the posix_fadvise() 
> allows us to prefetch the buffers asynchronously.

Yes, this is why we have to spawn several concurrent background workers 
to perform the prefetch.
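
For reference, the posix_fadvise() style of prefetch discussed above boils 
down to something like the sketch below. It is only an illustration, not 
code from the patch: prefetch_blocks() and its arguments are hypothetical 
names, and BLCKSZ is assumed to be the default 8192 bytes. The hint 
typically returns without waiting for the read to finish, which is why one 
process can keep many requests in flight.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <sys/types.h>

#define BLCKSZ 8192             /* assumed: default PostgreSQL block size */

/*
 * Hypothetical helper: ask the kernel to start reading the given blocks of
 * an already opened relation segment file into the OS page cache.
 * Returns the number of hints the kernel accepted.
 */
static int
prefetch_blocks(int fd, const unsigned *blocks, int nblocks)
{
    int accepted = 0;

    for (int i = 0; i < nblocks; i++)
    {
        off_t offset = (off_t) blocks[i] * BLCKSZ;

        /* posix_fadvise() returns 0 on success, an error number otherwise */
        if (posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_WILLNEED) == 0)
            accepted++;
    }
    return accepted;
}
```

With synchronous reads into shared buffers there is no such call, so each 
outstanding read needs its own worker.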
>
> I don't think simply spawning a couple of bgworkers to prefetch 
> buffers is going to be equal to async prefetch, unless we support some 
> sort of async I/O. Maybe something has changed recently, but every 
> time I looked for good portable async I/O API/library I got burned.
>
> Now, maybe a couple of bgworkers prefetching buffers synchronously 
> would be good enough for WAL prefetching - after all, we only need to 
> prefetch data fast enough for the recovery not to wait. But I doubt 
> it's going to be good enough for bitmap heap scans, for example.
>
> We need a prefetch that allows filling the I/O queues with hundreds of 
> requests, and I don't think sync prefetch from a handful of bgworkers 
> can achieve that.
>
> regards
>
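
Regarding synchronization between recovery and prefetch: one simple way to 
keep the prefetcher from running too far ahead is to bound its distance 
from the replay position. The sketch below is an assumption for 
illustration only. The Lsn type, prefetch_may_advance() and max_distance 
are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for a WAL position */

/*
 * Hypothetical check: may the prefetcher handle the record at prefetch_lsn
 * while recovery has replayed up to replay_lsn?
 */
static bool
prefetch_may_advance(Lsn replay_lsn, Lsn prefetch_lsn, uint64_t max_distance)
{
    /* At or behind replay: nothing to wait for, the prefetcher must catch up */
    if (prefetch_lsn <= replay_lsn)
        return true;

    /* Ahead of replay: allowed only within the configured window */
    return prefetch_lsn - replay_lsn <= max_distance;
}
```

A prefetch worker would sleep or poll while this returns false, so pages 
are not fetched so early that they get evicted before recovery needs them.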

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


