Re: incremental-checkopints - Mailing list pgsql-hackers
| From | Matthias van de Meent |
|---|---|
| Subject | Re: incremental-checkopints |
| Date | |
| Msg-id | CAEze2WjaKqU=76Syhy-Tb1oURz_tbufFQws1X-QNak=fxYP_+g@mail.gmail.com |
| In response to | Re: incremental-checkopints (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
| Responses | Re: incremental-checkopints, Re: incremental-checkopints |
| List | pgsql-hackers |
On Wed, 26 Jul 2023 at 20:58, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 7/26/23 15:16, Matthias van de Meent wrote:
> > On Wed, 26 Jul 2023 at 14:41, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >>
> >> Hello
> >>
> >> On 2023-Jul-26, Thomas wen wrote:
> >>
> >>> Hi Hackes: I found this page :
> >>> https://pgsql-hackers.postgresql.narkive.com/cMxBwq65/incremental-checkopints,PostgreSQL
> >>> no incremental checkpoints have been implemented so far. When a
> >>> checkpoint is triggered, the performance jitter of PostgreSQL is very
> >>> noticeable. I think incremental checkpoints should be implemented as
> >>> soon as possible
> >>
> >> I think my first question is why do you think that is necessary; there
> >> are probably other tools to achieve better performance. For example,
> >> you may want to try making checkpoint_completion_target closer to 1, and
> >> the checkpoint interval longer (both checkpoint_timeout and
> >> max_wal_size). Also, changing shared_buffers may improve things. You
> >> can try adding more RAM to the machine.
> >
> > Even with all those tuning options, a significant portion of a
> > checkpoint's IO (up to 50%) originates from FPIs in the WAL, which (in
> > general) will most often appear at the start of each checkpoint due to
> > each first update to a page after a checkpoint needing an FPI.
>
> Yeah, FPIs are certainly expensive and can represent huge part of the
> WAL produced. But how would incremental checkpoints make that step
> unnecessary?
>
> > If instead we WAL-logged only the pages we are about to write to disk
> > (like MySQL's double-write buffer, but in WAL instead of a separate
> > cyclical buffer file), then a checkpoint_completion_target close to 1
> > would probably solve the issue, but with "WAL-logged torn page
> > protection at first update after checkpoint" we'll probably always
> > have higher-than-average FPI load just after a new checkpoint.
> >
> So essentially instead of WAL-logging the FPI on the first change, we'd
> only do that later when actually writing-out the page (either during a
> checkpoint or because of memory pressure)? How would you make sure
> there's enough WAL space until the next checkpoint? I mean, FPIs are a
> huge write amplification source ...

You don't make sure that there's enough space for the modifications,
but does it matter from a durability point of view? As long as the
page isn't written to disk before the FPI, we can replay non-FPI (but
fsynced) WAL on top of the old version of the page that you read from
disk, instead of only trusting FPIs from WAL.

> Imagine the system has max_wal_size set to 1GB, and does 1M updates
> before writing 512MB of WAL and thus triggering a checkpoint. Now it
> needs to write FPIs for 1M updates - easily 8GB of WAL, maybe more with
> indexes. What then?

Then you ignore the max_wal_size GUC, as PostgreSQL so often already
does. At least, it doesn't do what I expect it to do at face value -
limit the size of the WAL directory to the given size.

But more reasonably, you'd keep track of the count of modified pages
that are yet to be fully WAL-logged, and take that into account as a
debt against the current WAL insert pointer when considering
checkpoint distances and max_wal_size.
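To make that accounting idea concrete: below is a minimal, hypothetical standalone C sketch, not actual PostgreSQL code; names such as WalAccounting and effective_wal_distance are invented for illustration. It treats each dirty page that still owes an FPI as roughly one BLCKSZ of future WAL and adds that debt to the distance used for checkpoint pacing, using the 1GB / 1M-pages numbers from the example above.

```c
/*
 * Hypothetical sketch (not PostgreSQL code): count pages that are dirty
 * but whose full-page image has not yet been WAL-logged as "FPI debt",
 * and add that debt to the apparent WAL distance checked against
 * max_wal_size when deciding whether to trigger a checkpoint.
 */
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192             /* PostgreSQL heap page size */

typedef struct WalAccounting
{
    uint64_t wal_bytes_written; /* WAL inserted since the last checkpoint */
    uint64_t pending_fpi_pages; /* dirty pages still owing an FPI */
} WalAccounting;

/* Effective distance: real WAL plus the worst-case cost of deferred FPIs. */
static uint64_t
effective_wal_distance(const WalAccounting *acct)
{
    return acct->wal_bytes_written + acct->pending_fpi_pages * (uint64_t) BLCKSZ;
}

/* Trigger a checkpoint once the effective distance crosses the budget. */
static int
need_checkpoint(const WalAccounting *acct, uint64_t max_wal_size_bytes)
{
    return effective_wal_distance(acct) >= max_wal_size_bytes;
}

int
main(void)
{
    /* Numbers from the example above: 512MB of WAL, 1M not-yet-logged pages. */
    WalAccounting acct = { .wal_bytes_written = 512UL * 1024 * 1024,
                           .pending_fpi_pages = 1000000 };
    uint64_t max_wal_size = 1024UL * 1024 * 1024;   /* 1GB */

    printf("effective distance: %llu MB, checkpoint needed: %s\n",
           (unsigned long long) (effective_wal_distance(&acct) >> 20),
           need_checkpoint(&acct, max_wal_size) ? "yes" : "no");
    return 0;
}
```

With those numbers the effective distance is already well past 8GB against a 1GB budget, which is exactly the pressure this kind of accounting would surface before the deferred FPIs are actually written.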
---

The main issue that I see with "WAL-logging the FPI only when you write
the dirty page to disk" is that dirty page flushing also happens with
buffer eviction in ReadBuffer(). This change in behaviour would add a
WAL insertion penalty to that write, and make it a very common
occurrence that we'd have to write WAL + fsync the WAL when we have to
write the dirty page. It would thus add significant latency to the
dirty-write mechanism, which is probably an unpopular change.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
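As a rough illustration of why that path gets more expensive, here is a hypothetical, self-contained C sketch; it is not PostgreSQL code, and PageBuffer, log_full_page_image, and flush_dirty_buffer are invented stand-ins. It models a dirty-buffer write-out that, with deferred FPIs, must first insert the page image into WAL and flush WAL up to that point before the data block may be written (the write-ahead rule), which is the extra latency on the eviction path described above.

```c
/*
 * Hypothetical, self-contained sketch -- not PostgreSQL code.  Under
 * "log the FPI only when flushing the page", an eviction-driven flush
 * now pays for a WAL insert plus a WAL fsync before the data write.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned long WalPtr;       /* simplified stand-in for a WAL position */

typedef struct PageBuffer
{
    int    block;
    bool   dirty;
    bool   fpi_logged;              /* has this page's FPI been WAL-logged yet? */
    WalPtr page_lsn;                /* WAL must be durable up to here before the write */
} PageBuffer;

static WalPtr wal_insert_pos = 1000;    /* current end of WAL (model) */
static WalPtr wal_flush_pos  = 1000;    /* how far WAL has been fsynced (model) */

/* Model: insert a full-page image into WAL, return its end position. */
static WalPtr
log_full_page_image(PageBuffer *buf)
{
    wal_insert_pos += 8192;             /* roughly one page worth of WAL */
    printf("FPI for block %d inserted, WAL end now %lu\n",
           buf->block, wal_insert_pos);
    return wal_insert_pos;
}

/* Model: fsync WAL up to the given position. */
static void
flush_wal_upto(WalPtr upto)
{
    if (wal_flush_pos < upto)
    {
        wal_flush_pos = upto;
        printf("WAL fsynced up to %lu\n", upto);
    }
}

/*
 * Write a dirty buffer out.  With deferred FPIs, this path has to pay for
 * the FPI insert and the WAL flush before the data block can go to disk.
 */
static void
flush_dirty_buffer(PageBuffer *buf)
{
    if (!buf->fpi_logged)
    {
        buf->page_lsn = log_full_page_image(buf);   /* extra WAL insert here */
        buf->fpi_logged = true;
    }
    flush_wal_upto(buf->page_lsn);                  /* extra fsync on this path */
    printf("data block %d written to disk\n", buf->block);
    buf->dirty = false;
}

int
main(void)
{
    PageBuffer victim = { .block = 42, .dirty = true,
                          .fpi_logged = false, .page_lsn = 1000 };

    flush_dirty_buffer(&victim);    /* models a flush forced by buffer eviction */
    return 0;
}
```

Today the FPI cost is paid once, at the first modification after a checkpoint; in this sketch it instead lands on whichever backend happens to need the victim buffer, which is the latency concern raised in the message above.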