Re: Load Distributed Checkpoints, final patch - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: Load Distributed Checkpoints, final patch |
Date | |
Msg-id | 4688C71F.9040305@enterprisedb.com |
In response to | Re: Load Distributed Checkpoints, final patch (Heikki Linnakangas <heikki@enterprisedb.com>) |
Responses |
Re: Load Distributed Checkpoints, final patch
|
List | pgsql-patches |
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
>>> Smith's suggestion just to see what happens, but I doubt that will
>>> change my mind on the above decisions.
>>
>> When do you expect to have those results?
>
> In a few days. I'm doing long tests because the variability in the 1h
> tests was very high.

I ran two tests with 200 warehouses to see how LDC behaves on a badly overloaded system, see tests imola-319 and imola-320. It seems to work quite well. In fact, the checkpoint spike is, relatively speaking, less severe than with a smaller # of warehouses even in the baseline test run, and LDC smooths it very nicely.

After those two tests, I noticed that I had full_page_writes=off in all tests performed earlier :(. That undermines confidence in those results, so I ran more tests with full_page_writes on and off to compare the effect. I also wanted to compare the effectiveness of the patch when checkpoints are triggered by checkpoint_timeout versus checkpoint_segments.

imola-326 through imola-330 are all configured so that checkpoints happen at roughly 50-minute intervals. On imola-326, checkpoints are triggered by checkpoint_segments, and on imola-327 they're triggered by checkpoint_timeout. On imola-326, the write phase lasts ~7 minutes, and on imola-327, it lasts ~10 minutes. Because of full_page_writes, a lot more WAL is consumed right after starting the checkpoint, so we end up being more aggressive than necessary at the beginning.

For comparison, imola-328 has full_page_writes=off. Checkpoints last ~9 minutes there, and the graphs look very smooth. That suggests that spreading the writes over a longer time wouldn't make a difference, but smoothing the rush at the beginning of the checkpoint might. I'm going to try the algorithm I posted, which uses the WAL consumption rate from the previous checkpoint interval in the calculations.
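To make the pacing problem concrete, here is a minimal sketch in Python. It is not the patch code and not the algorithm posted on the list; the function names, the 100-segment budget, and the numbers in the example are all hypothetical. It only illustrates the idea: pacing a checkpoint against the WAL consumed so far overreacts to the burst of full-page writes, while pacing against an interval predicted from the previous interval's average WAL rate does not.

```python
def wal_fraction_now(segments_used, checkpoint_segments):
    """Fraction of the WAL budget consumed since this checkpoint started."""
    return segments_used / checkpoint_segments


def wal_fraction_smoothed(elapsed_s, prev_segments_per_s,
                          checkpoint_segments, checkpoint_timeout_s):
    """Hypothetical alternative: predict how long the interval will last
    from the previous interval's average WAL rate, then pace against
    elapsed time instead of the WAL consumed so far."""
    if prev_segments_per_s > 0:
        predicted_interval_s = checkpoint_segments / prev_segments_per_s
    else:
        # No history yet: fall back to the timeout-based interval.
        predicted_interval_s = checkpoint_timeout_s
    # A checkpoint is triggered by the timeout at the latest.
    predicted_interval_s = min(predicted_interval_s, checkpoint_timeout_s)
    return min(elapsed_s / predicted_interval_s, 1.0)


def is_on_schedule(write_progress, target_fraction):
    """True if the fraction of dirty buffers written so far has kept up
    with the fraction of the interval considered used."""
    return write_progress >= target_fraction


# Illustrative numbers only (assumptions, not measurements from the tests):
# the previous interval used 100 segments in 3000 s (50 minutes); 60 s into
# the new checkpoint, full-page writes have already consumed 10 segments.
prev_rate = 100 / 3000
burst = wal_fraction_now(segments_used=10, checkpoint_segments=100)
smoothed = wal_fraction_smoothed(elapsed_s=60, prev_segments_per_s=prev_rate,
                                 checkpoint_segments=100,
                                 checkpoint_timeout_s=3600)

progress = 0.05  # 5% of the dirty buffers written so far
print(is_on_schedule(progress, burst))     # paced by the WAL burst
print(is_on_schedule(progress, smoothed))  # paced by the previous rate
```

With these made-up numbers, the burst-based estimate treats the checkpoint as behind schedule and would speed up the writes, whereas the smoothed estimate considers the same progress sufficient.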
imola-329 is the same as imola-328, but with an updated CVS source tree instead of the older tree + patch. The purpose of this test was basically just to verify that what was committed works the same as the patch.

imola-330 is comparable with imola-327: checkpoints are triggered by timeout and full_page_writes=on. But 330 was patched to call PreallocXlogFiles in bgwriter, per Tom's idea. According to the logs, most WAL segments are created by bgwriter in that test, and response times look slightly better with the patch, though I'm not sure the difference is statistically significant.

As before, the results are available at http://community.enterprisedb.com/ldc/

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com