Re: checkpointer continuous flushing - V18 - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: checkpointer continuous flushing - V18
Date: 2016-03-10 21:27:57
Msg-id: 20160310212757.4xmnxs3pz62b2c5i@alap3.anarazel.de
In response to: Re: checkpointer continuous flushing - V18  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses: Re: checkpointer continuous flushing - V18  (Fabien COELHO <coelho@cri.ensmp.fr>)
List: pgsql-hackers
On 2016-03-08 09:28:15 +0100, Fabien COELHO wrote:
> 
> >>>Now I cannot see how having one context per table space would have a
> >>>significant negative performance impact.
> >>
> >>The 'dirty data' etc. limits are global, not per block device. By having
> >>several contexts with unflushed dirty data the total amount of dirty
> >>data in the kernel increases.
> >
> >Possibly, but how much?  Do you have experimental data to back up that
> >this is really an issue?
> >
> >We are talking about 32 (context size) * #table spaces * 8KB buffers = 4MB
> >of dirty buffers to manage for 16 table spaces, I do not see that as a
> >major issue for the kernel.

We flush in those increments, but that doesn't mean there's only that
much dirty data. I regularly see an order of magnitude more being dirty.


I had originally kept one context per tablespace after refactoring
this, but found that it gave worse results under rate-limited loads
even with only two tablespaces. That was on SSDs, though.
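
To make the tradeoff concrete, here's a simplified sketch of what a
flush context does (illustrative only, not the patch's actual data
structures; FLUSH_CONTEXT_SIZE, PendingFlush and schedule_flush are
made-up names): each context accumulates pending writeback requests and
asks the kernel to start writing them out once 32 entries have
accumulated. With one context per tablespace, each individual context
fills, and therefore flushes, more slowly than a single shared one
would.

/*
 * Illustrative sketch only -- not the patch's actual code.  A flush
 * context remembers recently written ranges and initiates kernel
 * writeback via sync_file_range() once 32 entries have accumulated.
 */
#define _GNU_SOURCE
#include <fcntl.h>

#define FLUSH_CONTEXT_SIZE 32		/* entries per context, as discussed above */

typedef struct PendingFlush
{
	int		fd;			/* file containing the dirty block */
	off_t	offset;		/* start of the dirty range */
	off_t	nbytes;		/* length of the dirty range, e.g. one 8kB block */
} PendingFlush;

typedef struct FlushContext
{
	int				npending;
	PendingFlush	pending[FLUSH_CONTEXT_SIZE];
} FlushContext;

/* Remember a just-written range; start kernel writeback once the context is full. */
static void
schedule_flush(FlushContext *cxt, int fd, off_t offset, off_t nbytes)
{
	cxt->pending[cxt->npending++] = (PendingFlush) {fd, offset, nbytes};

	if (cxt->npending >= FLUSH_CONTEXT_SIZE)
	{
		for (int i = 0; i < cxt->npending; i++)
			(void) sync_file_range(cxt->pending[i].fd,
								   cxt->pending[i].offset,
								   cxt->pending[i].nbytes,
								   SYNC_FILE_RANGE_WRITE);	/* initiate, don't wait */
		cxt->npending = 0;
	}
}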


> To complete the argument, the 4MB is just a worst-case scenario; in reality
> flushing the different contexts would be randomized over time, so the
> frequency of flushing a context would be exactly the same in both cases
> (shared or per-tablespace context) if the checkpoints are the same size,
> just that with a shared context each flush potentially targets all
> tablespaces with a few pages, while with the other version each flush
> targets one tablespace only.

The number of pages still in writeback (i.e. pages for which
sync_file_range() has been issued but whose writeback hasn't completed
yet) at the end of the checkpoint matters for the latency hit incurred
by the fsync()s from smgrsync(); at least by my measurements.
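
To spell out the mechanism (again just an illustrative sketch, not what
the checkpointer actually does; checkpoint_one_file is a made-up
helper): sync_file_range() with SYNC_FILE_RANGE_WRITE only initiates
writeback and returns immediately, while the final fsync() blocks until
every dirty page of the file, including ranges whose writeback is still
in flight, has reached disk. Whatever is still in writeback when the
fsync() is issued is paid for as fsync() latency.

/*
 * Minimal sketch of the interaction described above, not the patch's code.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192

static void
checkpoint_one_file(int fd, const char *buf, int nblocks)
{
	/* Write the buffers and ask the kernel to start writing them back. */
	for (int blkno = 0; blkno < nblocks; blkno++)
	{
		off_t	offset = (off_t) blkno * BLCKSZ;

		(void) pwrite(fd, buf, BLCKSZ, offset);	/* same payload everywhere, just for illustration */
		(void) sync_file_range(fd, offset, BLCKSZ,
							   SYNC_FILE_RANGE_WRITE);	/* non-blocking */
	}

	/*
	 * This fsync() cannot return before all of the writeback initiated
	 * above has finished; whatever is still outstanding here shows up
	 * as fsync() latency.
	 */
	(void) fsync(fd);
}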


My current plan is to commit this with the current behaviour (as in this
week[end]), and then do some actual benchmarking on this specific
part. It's imo a relatively minor detail.

Greetings,

Andres Freund


