Re: Load Distributed Checkpoints, final patch - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Load Distributed Checkpoints, final patch
Date
Msg-id 4688C71F.9040305@enterprisedb.com
In response to Re: Load Distributed Checkpoints, final patch  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Load Distributed Checkpoints, final patch
List pgsql-patches
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
>>> Smith's suggestion just to see what happens, but I doubt that will
>>> change my mind on the above decisions.
>>
>> When do you expect to have those results?
>
> In a few days. I'm doing long tests because the variability in the 1h
> tests was very high.

I ran two tests with 200 warehouses to see how LDC behaves on a badly
overloaded system; see tests imola-319 and imola-320. It seems to work
quite well. In fact, the checkpoint spike is, relatively speaking, less
severe than with a smaller # of warehouses even in the baseline test run,
and LDC smooths it very nicely.

After those two tests, I noticed that I had full_page_writes=off in all
tests performed earlier :(. That undermines confidence in those
results, so I ran more tests with full_page_writes on and off to compare
the effect. I also wanted to compare the effectiveness of the patch when
checkpoints are triggered by either checkpoint_timeout or
checkpoint_segments.

imola-326 through imola-330 are all configured so that checkpoints happen
at roughly 50-minute intervals. On imola-326, checkpoints are triggered
by checkpoint_segments, and on imola-327 they're triggered by
checkpoint_timeout. On imola-326, the write phase lasts ~7 minutes, and
on imola-327, it lasts ~10 minutes. Because of full_page_writes, a lot
more WAL is consumed right after starting the checkpoint, so we end up
being more aggressive than necessary at the beginning.
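
To make that effect concrete, here is a minimal sketch of a pacing check
of this general shape. It is a simplification under stated assumptions,
not the patch's code: the function name is_on_schedule and its parameters
are made up for illustration, and it assumes the write phase is paced
against whichever is further along, the elapsed fraction of
checkpoint_timeout or the consumed fraction of checkpoint_segments.

```python
def is_on_schedule(progress, elapsed_secs, timeout_secs,
                   segs_consumed, checkpoint_segments):
    """Return True if the checkpoint's write phase is keeping up.

    progress: fraction of dirty buffers already written (0.0 to 1.0).
    All names are illustrative assumptions, not PostgreSQL identifiers.
    """
    time_fraction = elapsed_secs / timeout_secs
    wal_fraction = segs_consumed / checkpoint_segments
    # The checkpoint must stay ahead of both deadlines, so the stricter
    # (larger) fraction decides whether to sleep or keep writing.
    return progress >= max(time_fraction, wal_fraction)


# Steady WAL use: 10% of buffers written, 5% of both budgets used.
steady = is_on_schedule(0.10, 150, 3000, 5, 100)

# Burst of full-page images right after the checkpoint starts: same
# elapsed time, but 20% of the segment budget is already gone.
burst = is_on_schedule(0.10, 150, 3000, 20, 100)

print(steady, burst)  # prints: True False
```

In the burst case the same amount of completed work suddenly counts as
"behind schedule", so a writer driven by such a check would stop sleeping
and write as fast as it can until it catches up.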

For comparison, imola-328 has full_page_writes=off. Checkpoints last ~9
minutes there, and the graphs look very smooth. That suggests that
spreading the writes over a longer time wouldn't make a difference, but
smoothing the rush at the beginning of the checkpoint might. I'm going to
try the algorithm I posted, which uses the WAL consumption rate from the
previous checkpoint interval in the calculations.
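
The algorithm itself is not restated in this message, so the following is
only an illustration of the general idea and not the posted algorithm. It
assumes the WAL side of the pacing check is estimated from the previous
interval's average consumption rate instead of the segments actually
consumed so far. All names (expected_wal_fraction, prev_segs_per_sec,
is_on_schedule_smoothed) are hypothetical.

```python
def expected_wal_fraction(elapsed_secs, prev_segs_per_sec,
                          checkpoint_segments):
    """Estimate the share of the segment budget used so far, assuming WAL
    is consumed at the previous interval's average rate (hypothetical)."""
    used = elapsed_secs * prev_segs_per_sec
    return min(1.0, used / checkpoint_segments)


def is_on_schedule_smoothed(progress, elapsed_secs, timeout_secs,
                            prev_segs_per_sec, checkpoint_segments):
    """Pacing check that ignores the instantaneous WAL burst."""
    time_fraction = elapsed_secs / timeout_secs
    wal_fraction = expected_wal_fraction(
        elapsed_secs, prev_segs_per_sec, checkpoint_segments)
    return progress >= max(time_fraction, wal_fraction)


# Previous interval: 100 segments in 3000 seconds on average.
prev_rate = 100 / 3000

# 150 seconds into the checkpoint with 10% of the buffers written.
# The estimate is about 5% of the budget, regardless of how much WAL
# the initial full-page writes actually consumed.
smoothed = is_on_schedule_smoothed(0.10, 150, 3000, prev_rate, 100)
print(smoothed)  # prints: True
```

A sketch like this deliberately ignores the actual segment count; real
code would still have to make sure the checkpoint finishes before the
segment limit is really reached.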

Imola-329 is the same as imola-328, but with an updated CVS source tree
instead of the older tree + patch. The purpose of this test was just to
verify that what was committed works the same as the patch.

Imola-330 is comparable with imola-327: checkpoints are triggered by
timeout and full_page_writes=on. But 330 was patched to call
PreallocXlogFiles in bgwriter, per Tom's idea. According to the logs, most
WAL segments are created by bgwriter in that test, and response times
look slightly better with the patch, though I'm not sure the difference
is statistically significant.

As before, the results are available at
http://community.enterprisedb.com/ldc/

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
