Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: checkpointer continuous flushing |
Date | |
Msg-id | alpine.DEB.2.10.1508240810170.14924@sto |
In response to | Re: checkpointer continuous flushing (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: checkpointer continuous flushing
Re: checkpointer continuous flushing |
List | pgsql-hackers |
Hello Amit,

>> Can the script be started on its own at all?
>
> I have tried like below which results in same error, also I tried few
> other variations but could not succeed.
> ./avg.py

Hmmm... Ensure that the script is readable and executable:

  sh> chmod a+rx ./avg.py

Also check the file:

  sh> file ./avg.py
  ./avg.py: Python script, UTF-8 Unicode text executable

>> Sure... This is *already* the case with the current checkpointer, the
>> schedule is performed with respect to the initial number of buffers it
>> think it will have to write, and if someone else writes these buffers then
>> the schedule is skewed a little bit, or more... I have not changed this
>
> I don't know how good or bad it is to build further on somewhat skewed
> logic,

The logic is no more skewed than it is with the current version: your
remark about the estimation, which may be wrong in some cases, is clearly
valid, but it is orthogonal (independent, unrelated, different) to what is
addressed by this patch.

I currently have no reason to believe that the issue you raise is a major
performance issue, but if it is, it may be addressed by another patch by
whoever wants to do so.

What I have done is to demonstrate that generating a lot of random I/Os is
a major performance issue (well, sure), and this patch addresses this
point and provides a major speedup (by a factor of 3 to 5) and latency
reductions (from +60% unavailability to nearly full availability) for high
OLTP write load, by reordering and flushing checkpoint buffers in a
sensible way.

> but the point is that unless it is required why to use it.

This is really required to avoid predictable performance regressions, see
below.

>> I do not think that Heikki version worked wrt to balancing writes over
>> tablespaces,
>
> I also think that it doesn't balances over tablespaces, but the question
> is why do we need to balance over tablespaces, can we reliably predict
> in someway which indicates that performing balancing over tablespace can
> help the workload.

The reason for the tablespace balancing is that in current postgres,
buffers are written more or less randomly, so the writes are (probably)
implicitly and statistically balanced over tablespaces because of this
randomness, and indeed, AFAIK, people with multi-tablespace setups have
not complained that postgres was using the disks sequentially.

However, once the buffers are sorted per file, the order becomes
deterministic and there is no more implicit balancing, which means that if
someone has a pg setup with several disks, it will write to them
sequentially instead of in parallel.

This regression was pointed out by Andres Freund. I agree that such a
regression for high-end systems must be avoided, hence the tablespace
balancing.

> I think here we are doing more engineering than required for this patch.

I do not think so. I think that Andres's remark is justified: it avoids a
performance regression on high-end systems which use tablespaces, which
would be really undesirable.

About the balancing code, it is not that difficult, even if it is not
trivial: the point is to select the tablespace for which the progress
ratio (written/to_write) is below the overall progress ratio, so that it
catches up, and to do so in a round-robin manner, so that all tablespaces
get to write things. I have also both written a proof and tested the logic
(in a separate script).

-- 
Fabien.
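
For illustration only, the selection rule described in the message (pick a tablespace whose written/to_write ratio is not ahead of the overall ratio, scanning round robin) might be sketched as below. This is not the patch's code or the author's test script: the names `Tablespace`, `next_tablespace` and `write_order` are hypothetical, the sketch counts one buffer per write, and it assumes "below" includes equality so that the very first write (when all ratios are zero) can proceed.

```python
from dataclasses import dataclass


@dataclass
class Tablespace:
    # Hypothetical bookkeeping record, not a PostgreSQL structure.
    name: str
    to_write: int      # buffers scheduled for this tablespace
    written: int = 0   # buffers written so far


def next_tablespace(spaces, start):
    """Return the index of the next tablespace to write to, or None if done.

    Scans round robin from `start` and picks the first unfinished
    tablespace whose progress ratio is at or below the overall ratio.
    """
    total_to_write = sum(s.to_write for s in spaces)
    total_written = sum(s.written for s in spaces)
    n = len(spaces)
    for step in range(n):
        i = (start + step) % n
        s = spaces[i]
        if s.written >= s.to_write:
            continue  # nothing left to write here
        # written/to_write <= total_written/total_to_write,
        # cross-multiplied to stay in integer arithmetic.
        if s.written * total_to_write <= total_written * s.to_write:
            return i
    return None


def write_order(spaces):
    """Simulate the checkpoint and return the sequence of tablespace names."""
    order = []
    cursor = 0
    while True:
        i = next_tablespace(spaces, cursor)
        if i is None:
            break
        spaces[i].written += 1
        order.append(spaces[i].name)
        cursor = (i + 1) % len(spaces)  # resume the scan after the one served
    return order
```

As long as some buffers remain, at least one unfinished tablespace is at or below the overall ratio (the overall ratio is a weighted average of the per-tablespace ratios), so the scan only returns None once everything has been written. With two tablespaces of 4 and 2 buffers, the writes interleave rather than draining one tablespace before starting the other.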