Re: WAL prefetch - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: WAL prefetch
Date:
Msg-id: 27163fe9-fc41-b3de-76b3-a850f1b3c9e7@postgrespro.ru
In response to: Re: WAL prefetch (Andres Freund <andres@anarazel.de>)
Responses: Re: WAL prefetch; Re: WAL prefetch
List: pgsql-hackers
On 18.06.2018 23:47, Andres Freund wrote:
> On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
>> On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <andres@anarazel.de> wrote:
>>>> The posix_fadvise approach is not perfect, no doubt about that. But it
>>>> works pretty well for bitmap heap scans, and it's about 13249x better
>>>> (rough estimate) than the current solution (no prefetching).
>>> Sure, but investing in an architecture we know might not live long also
>>> has it's cost. Especially if it's not that complicated to do better.
>> My guesses are:
>>
>> - Using OS prefetching is a very small patch.
>> - Prefetching into shared buffers is a much bigger patch.
> Why? The majority of the work is standing up a bgworker that does
> prefetching (i.e. reads WAL, figures out reads not in s_b, does
> prefetch). Allowing a configurable number + some synchronization between
> them isn't that much more work.

I do not think that prefetching into shared buffers requires much more effort or makes the patch more invasive... It may even simplify things somewhat, because there is no need to maintain our own cache of prefetched pages... But it will definitely have much more impact on Postgres performance: contention for buffer locks, throwing away pages accessed by read-only queries, ...

Also there are two points which make prefetching into shared buffers more complex:
1. We need to spawn multiple workers to do the prefetch in parallel and somehow distribute the work between them.
2. We need to synchronize the recovery process with the prefetch workers, to prevent the prefetch from going too far ahead and doing useless work.
The same problem exists for prefetching into the OS cache, but there the risk of a false prefetch is less critical.

>
>> - It'll be five years before we have direct I/O.
> I think we'll have lost a significant market share by then if that's the
> case. Deservedly so.

I have implemented a number of DBMS engines (GigaBASE, GOODS, FastDB, ...) and supported direct IO (as an option) in most of them. But on most workloads I did not get any significant improvement in performance. Certainly, it may be a problem with my implementations... and the Linux kernel has changed significantly since that time. But there is one "axiom" which complicates the use of direct IO: only the OS knows at each moment in time how much free memory it has. So only the OS can schedule memory efficiently so that all system RAM is used. It is very hard, if at all possible, to do this at the application level. As a result you have to be very conservative in choosing the size of shared buffers, so that it fits in RAM and avoids swapping. That may be possible if you have complete control over the server and there is just one Postgres instance running on it. But the trend now is towards virtualization and clouds, and such an assumption does not hold in most cases. So double buffering (or even triple buffering, if we take into account on-device internal caches) is definitely an issue. But direct IO does not seem to be a silver bullet for solving it...

Concerning WAL prefetch, I still have serious doubts whether it is needed at all: if the checkpoint interval is smaller than the amount of free memory in the system, then the redo process should not have to read much. And if the checkpoint interval is much larger than the OS cache (are there cases when that is really needed?), then a quite small patch (as it seems to me now) forcing a full page write when the distance between the page LSN and the current WAL insertion point exceeds some threshold should eliminate random reads in this case as well.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
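P.S. For reference, a minimal sketch of what the posix_fadvise() approach discussed above amounts to: asking the kernel to read a block referenced by a decoded WAL record into the page cache ahead of redo, the same mechanism already used for bitmap heap scans. The helper name and file handling below are purely illustrative and are not taken from the actual patch.

    /*
     * Sketch only: hint the OS to prefetch one 8kB block of an already
     * opened relation segment. "fd" is the segment's file descriptor,
     * "blkno" the block number within that segment.
     */
    #include <fcntl.h>

    #define BLCKSZ 8192

    static void
    prefetch_block(int fd, unsigned int blkno)
    {
    #ifdef POSIX_FADV_WILLNEED
        /* Asynchronous read-ahead into the kernel page cache. */
        (void) posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                             POSIX_FADV_WILLNEED);
    #endif
    }

The call returns immediately; the actual read happens in the background, which is exactly why a separate prefetch process can run ahead of redo without blocking it.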
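P.P.S. And a hedged sketch of the full-page-write idea from my last paragraph: force a full page image whenever the page's LSN lags the current WAL insert position by more than a configurable threshold, so that redo never has to read a "cold" page from disk. The names fpw_lsn_distance and need_forced_fpw are hypothetical; only PageGetLSN() and GetXLogInsertRecPtr() are existing backend functions.

    #include "postgres.h"
    #include "access/xlog.h"
    #include "storage/bufpage.h"

    /* Hypothetical GUC: how much WAL may pass before we force an FPI. */
    static uint64 fpw_lsn_distance = 16 * 1024 * 1024;

    /*
     * Return true if the page was last modified so long ago (in WAL terms)
     * that the next WAL record touching it should carry a full page image.
     */
    static bool
    need_forced_fpw(Page page)
    {
        XLogRecPtr  page_lsn = PageGetLSN(page);
        XLogRecPtr  insert_lsn = GetXLogInsertRecPtr();

        return (insert_lsn - page_lsn) > fpw_lsn_distance;
    }

The cost, of course, is extra WAL volume, so the threshold would have to be tuned against the checkpoint interval; this is only meant to show that the check itself is cheap.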