
From Konstantin Knizhnik
Subject Re: WAL prefetch
Date
Msg-id 27163fe9-fc41-b3de-76b3-a850f1b3c9e7@postgrespro.ru
In response to Re: WAL prefetch  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

On 18.06.2018 23:47, Andres Freund wrote:
> On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
>> On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <andres@anarazel.de> wrote:
>>>> The posix_fadvise approach is not perfect, no doubt about that. But it
>>>> works pretty well for bitmap heap scans, and it's about 13249x better
>>>> (rough estimate) than the current solution (no prefetching).
>>> Sure, but investing in an architecture we know might not live long also
>>> has it's cost. Especially if it's not that complicated to do better.
>> My guesses are:
>>
>> - Using OS prefetching is a very small patch.
>> - Prefetching into shared buffers is a much bigger patch.
> Why?  The majority of the work is standing up a bgworker that does
> prefetching (i.e. reads WAL, figures out reads not in s_b, does
> prefetch). Allowing a configurable number + some synchronization between
> them isn't that much more work.

I do not think that prefetching into shared buffers requires much more
effort or makes the patch more invasive...
In some ways it even simplifies things, because there is no need to
maintain a separate cache of prefetched pages...
But it will definitely have a much bigger impact on Postgres performance:
contention for buffer locks, eviction of pages that are being accessed by
read-only queries, ...
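
For comparison, prefetching into the OS cache essentially boils down to a
posix_fadvise call on the relation segment. A minimal sketch, where the
file descriptor and block number are placeholders:

#include <fcntl.h>

/* Ask the kernel to read one 8 KB block into its page cache ahead of
 * time; fd is the opened relation segment, blkno the target block. */
static void
prefetch_block(int fd, long blkno)
{
    (void) posix_fadvise(fd, (off_t) blkno * 8192, 8192,
                         POSIX_FADV_WILLNEED);
}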

Also there are two points which make prefetching into shared buffers
more complex:
1. We need to spawn multiple workers to prefetch in parallel and somehow
distribute the work between them.
2. We need to synchronize the recovery process with the prefetcher, to
prevent prefetching from running too far ahead and doing useless work
(see the sketch below).
The same problem exists for prefetching into the OS cache, but there the
risk of a wrong prefetch is less critical.
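
To illustrate point 2, one possible (purely hypothetical) way to throttle
the prefetcher is to compare its position in the WAL with the LSN already
replayed by the startup process. A rough fragment, assumed to run inside
the prefetch worker (prefetch_lsn and max_prefetch_distance are made-up
names):

/* Hypothetical throttling: do not prefetch pages referenced by WAL
 * records more than max_prefetch_distance bytes ahead of the position
 * already replayed by the startup process. */
XLogRecPtr replayed = GetXLogReplayRecPtr(NULL);

while (prefetch_lsn - replayed > max_prefetch_distance)
{
    pg_usleep(1000L);                   /* let recovery catch up */
    replayed = GetXLogReplayRecPtr(NULL);
}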


>
>
>> - It'll be five years before we have direct I/O.
> I think we'll have lost a significant market share by then if that's the
> case. Deservedly so.

I have implemented a number of DBMS engines (GigaBASE, GOODS, FastDB,
...) and supported direct I/O (as an option) in most of them.
But on most workloads I did not get any significant improvement in
performance.
Certainly, that may be a problem with my implementations... and the Linux
kernel has changed significantly since that time.
But there is one "axiom" which complicates the use of direct I/O: only
the OS knows at each moment how much free memory it has.
So only the OS can schedule memory efficiently so that all system RAM is
used.  It is very hard, if possible at all, to do this at the application
level.

As a result you have to be very conservative in choosing the size of
shared buffers, so that it fits in RAM and avoids swapping.
That may be possible if you have complete control over the server and
there is just one Postgres instance running on it.
But the trend now is towards virtualization and clouds, and in most cases
such an assumption does not hold. So double buffering
(or even triple buffering, if we take on-device caches into account) is
definitely an issue. But direct I/O does not seem to be a silver bullet
for solving it...
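
For context, "direct I/O" here means opening the data file with O_DIRECT
and reading into suitably aligned buffers, which is exactly why all
caching then has to be managed by the application. A minimal sketch
(path and sizes are placeholders):

#define _GNU_SOURCE                     /* O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    void *buf;
    int   fd = open("base/12345/16384", O_RDONLY | O_DIRECT);

    /* O_DIRECT bypasses the kernel page cache, so the buffer must be
     * aligned and every read hits the device unless the application
     * caches the block itself. */
    if (fd >= 0 && posix_memalign(&buf, 4096, 8192) == 0)
        (void) read(fd, buf, 8192);

    return 0;
}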


Concerning WAL prefetch I still have serious doubts whether it is needed
at all:
if the checkpoint interval is smaller than the amount of free memory in
the system, then the redo process should not have to read much.
And if the checkpoint interval is much larger than the OS cache (are
there cases when that is really needed?), then a fairly small patch (as
it seems to me now)
forcing a full page write when the distance between the page LSN and the
current WAL insertion point exceeds some threshold should eliminate
random reads in this case as well (see the sketch below).
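
A rough sketch of that check, assumed to sit somewhere on the WAL
insertion path next to the existing full-page-write decision
(wal_fpw_distance is a made-up GUC name):

/* Hypothetical rule: also emit a full page image when the page's LSN
 * lags too far behind the current insert position, so that recovery
 * never needs to read the old page version from disk. */
XLogRecPtr page_lsn   = PageGetLSN(page);
XLogRecPtr insert_lsn = GetXLogInsertRecPtr();

if (insert_lsn - page_lsn > wal_fpw_distance)
    needs_backup = true;                /* force a full page write */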

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


