Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Date
Msg-id alpine.DEB.2.10.1508240810170.14924@sto
In response to Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: checkpointer continuous flushing  (Michael Paquier <michael.paquier@gmail.com>)
Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hello Amit,

>> Can the script be started on its own at all?
>
> I have tried like below which results in same error, also I tried few
> other variations but could not succeed.
> ./avg.py

Hmmm... Ensure that the script is readable and executable:
  sh> chmod a+rx ./avg.py

Also check the file:
  sh> file ./avg.py
  ./avg.py: Python script, UTF-8 Unicode text executable

>> Sure... This is *already* the case with the current checkpointer, the
>> schedule is performed with respect to the initial number of buffers it
>> think it will have to write, and if someone else writes these buffers then
>> the schedule is skewed a little bit, or more... I have not changed this
>
> I don't know how good or bad it is to build  further on somewhat skewed
> logic,

The logic is no more skewed than it is with the current version: your 
remark about the estimation which may be wrong in some cases is clearly 
valid, but it is orthogonal (independent, unrelated, different) to what is 
addressed by this patch.

I currently have no reason to believe that the issue you raise is a major 
performance issue, but if it is, it may be addressed in another patch by 
whoever wants to do so.

What I have done is to demonstrate that generating a lot of random I/Os is 
a major performance issue (well, sure), and that this patch addresses this 
point and provides major speedups (3 to 5 times) and latency reductions 
(from over 60% unavailability to nearly full availability) for high OLTP 
write loads, by reordering and flushing checkpoint buffers in a sensible way.

> but the point is that unless it is required why to use it.

This is really required to avoid predictable performance regressions, see 
below.

>> I do not think that Heikki version worked wrt to balancing writes over
>> tablespaces,
>
> I also think that it doesn't balances over tablespaces, but the question 
> is why do we need to balance over tablespaces, can we reliably predict 
> in someway which indicates that performing balancing over tablespace can 
> help the workload.

The reason for the tablespace balancing is that in the current postgres 
buffers are written more or less randomly, so the writes are (probably) 
implicitly and statistically balanced over tablespaces because of this 
randomness, and indeed, AFAIK, people with multi-tablespace setups have not 
complained that postgres was using the disks sequentially.

However, once the buffers are sorted per file, the order becomes 
deterministic and there is no more implicit balancing, which means that if 
someone has a pg setup with several disks, postgres will write to these 
disks one after the other instead of in parallel.

This regression was pointed out by Andres Freund. I agree that such a 
regression for high end systems must be avoided, hence the tablespace 
balancing.
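
The loss of implicit balancing can be illustrated with a small sketch. It 
is not the patch code: the (tablespace, file, block) tags, the helper names 
and the sizes are all assumptions made up for the illustration. It counts 
how often two consecutive writes target different tablespaces, first in 
random order and then in sorted order.

```python
import random


def tablespace_switches(writes):
    """Count how often two consecutive writes target different tablespaces."""
    return sum(1 for a, b in zip(writes, writes[1:]) if a[0] != b[0])


def make_buffers(n_spaces=3, per_space=100, seed=0):
    """Build hypothetical (tablespace, file, block) tags in random order."""
    rng = random.Random(seed)
    buffers = [
        (f"ts{t}", f"file{rng.randrange(4)}", rng.randrange(1000))
        for t in range(n_spaces)
        for _ in range(per_space)
    ]
    rng.shuffle(buffers)
    return buffers


buffers = make_buffers()
# Random order: writes keep hopping between tablespaces (implicit balancing).
unsorted_switches = tablespace_switches(buffers)
# Sorted order: each tablespace is written in one contiguous run.
sorted_switches = tablespace_switches(sorted(buffers))
print(unsorted_switches, sorted_switches)
```

With three tablespaces the sorted order switches tablespace only twice, 
i.e. each disk is written in turn while the others stay idle.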

> I think here we are doing more engineering than required for this patch.

I do not think so. I think that Andres' remark is justified: it avoids a 
performance regression on high end systems which use tablespaces, which 
would be really undesirable.

About the balancing code, it is not that difficult, even if it is not 
trivial: the point is to select the tablespace for which the progress 
ratio (written/to_write) is below the overall progress ratio, so that it 
catches up, and to do so in a round robin manner, so that all tablespaces 
get to write things. I have both written a proof and tested the logic (in 
a separate script).
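
A minimal sketch of this selection rule, in the spirit of the separate test 
script but not taken from it or from the patch: the Tablespace class and 
balanced_write_order function are hypothetical names. Two assumptions are 
mine: the comparison is "not above" rather than strictly "below" (so that 
the very first write can be chosen when every ratio is 0), and it uses 
integer cross-multiplication to avoid floating point rounding.

```python
from dataclasses import dataclass


@dataclass
class Tablespace:
    name: str
    to_write: int     # buffers scheduled for this tablespace
    written: int = 0  # buffers written so far


def balanced_write_order(spaces):
    """Return tablespace names in the order in which buffers are written."""
    total = sum(s.to_write for s in spaces)
    done = 0
    cursor = 0  # round-robin starting point for the next search
    order = []
    while done < total:
        for k in range(len(spaces)):
            idx = (cursor + k) % len(spaces)
            s = spaces[idx]
            # written/to_write <= done/total, compared without division
            if s.written < s.to_write and s.written * total <= done * s.to_write:
                break
        else:
            # Cannot happen: the overall ratio is a weighted average of the
            # per-tablespace ratios, so one unfinished tablespace is behind.
            raise AssertionError("no tablespace is behind the overall progress")
        s.written += 1
        done += 1
        order.append(s.name)
        cursor = (idx + 1) % len(spaces)  # resume after the one just served
    return order
```

For instance, with two tablespaces A and B having 4 and 2 buffers to write, 
this sketch yields the order A B A B A A.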

-- 
Fabien.


