Re: checkpointer continuous flushing - V18 - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing - V18
Date
Msg-id alpine.DEB.2.10.1603072043070.13457@sto
In response to Re: checkpointer continuous flushing - V18  (Andres Freund <andres@anarazel.de>)
Responses Re: checkpointer continuous flushing - V18  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

>>>> (1) with 16 tablespaces (1 per table) on 1 disk : 680.0 tps
>>>>    per second avg, stddev [ min q1 median d3 max ] <=300tps
>>>>    679.6 ± 750.4 [0.0, 317.0, 371.0, 438.5, 2724.0] 19.5%
>>>>
>>>> (2) with 1 tablespace on 1 disk : 956.0 tps
>>>>    per second avg, stddev [ min q1 median d3 max ] <=300tps
>>>>    956.2 ± 796.5 [3.0, 488.0, 583.0, 742.0, 2774.0] 2.1%
>
> Well, that's not a particularly meaningful workload. You increased the 
> number of flushed to the same number of disks considerably.

It is just a simple workload designed to emphasize the effect of having 
one context shared by all tablespaces instead of one per tablespace, 
without rewriting the patch and without a large host with multiple disks.

> For a meaningful comparison you'd have to compare using one writeback 
> context for N tablespaces on N separate disks/raids, and using N 
> writeback contexts for the same.

Sure, it would be better to do that, but that would require (1) rewriting 
the patch, which is a small amount of work, and also (2) having access to 
a machine with a number of disks/raids, which I do NOT have available.


What happens in the 16-tablespace workload is that much smaller flushes 
are performed on the 16 files written in parallel, so the tps performance 
is significantly degraded, despite the writes being sorted in each file. 
With one tablespace, all flushed buffers are in the same file, so flushes 
are much more effective.

When the context is shared and checkpointer buffer writes are balanced 
across tablespaces, then when the limit is reached the flush gets only a 
few buffers per tablespace, so this limits sequential writes to a few 
buffers, hence the performance degradation.
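
To make the mechanism above concrete, here is a small toy model in Python. 
It is only a sketch under stated assumptions, not the patch's code: the 
limit of 32 pending buffers (FLUSH_AFTER), the strict round-robin 
balancing, and the function names are all hypothetical choices for 
illustration. It measures how many consecutive blocks of one file each 
flush gets to write.

```python
from collections import defaultdict

FLUSH_AFTER = 32  # assumed limit of pending writes before a flush is issued


def balanced_writes(n_tablespaces, blocks_per_tablespace):
    """Yield (tablespace, block) pairs: sorted within each tablespace,
    interleaved round-robin across tablespaces (assumed balancing)."""
    for block in range(blocks_per_tablespace):
        for ts in range(n_tablespaces):
            yield ts, block


def sequential_runs(pending):
    """Return the lengths of runs of consecutive blocks, per tablespace,
    found in one batch of pending writes."""
    by_ts = defaultdict(list)
    for ts, block in pending:
        by_ts[ts].append(block)
    runs = []
    for blocks in by_ts.values():
        blocks.sort()
        length = 1
        for prev, cur in zip(blocks, blocks[1:]):
            if cur == prev + 1:
                length += 1
            else:
                runs.append(length)
                length = 1
        runs.append(length)
    return runs


def mean_run_length(n_tablespaces, blocks_per_tablespace, per_tablespace_context):
    """Average sequential run length per flush, with either one shared
    context or one context per tablespace."""
    contexts = defaultdict(list)
    runs = []
    for ts, block in balanced_writes(n_tablespaces, blocks_per_tablespace):
        key = ts if per_tablespace_context else 0  # 0 = the single shared context
        pending = contexts[key]
        pending.append((ts, block))
        if len(pending) >= FLUSH_AFTER:
            runs.extend(sequential_runs(pending))
            pending.clear()
    # Flush whatever is left at the end of the checkpoint.
    for pending in contexts.values():
        if pending:
            runs.extend(sequential_runs(pending))
    return sum(runs) / len(runs)


if __name__ == "__main__":
    print("16 tablespaces, shared context:  ", mean_run_length(16, 64, False))
    print("16 tablespaces, one context each:", mean_run_length(16, 64, True))
    print(" 1 tablespace,  shared context:  ", mean_run_length(1, 64, False))
```

In this model a shared context over 16 tablespaces flushes runs of 
32 / 16 = 2 blocks per file, while one context per tablespace (or a single 
tablespace) flushes runs of 32 blocks. Real checkpoints write only dirty 
buffers, so actual runs are shorter and less regular, but the ratio shows 
the effect.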

So I can explain the performance degradation *because* the flush context 
is shared between the tablespaces, which is a logical argument backed by 
experimental data, so it is better than handwaving. Given the available 
hardware, this is the best proof I can have that the context should be 
per tablespace.

Now I cannot see how having one context per tablespace would have a 
significant negative performance impact.

So the logical conclusion for me is that, without further experimental 
data, it is better to have one context per tablespace.

If you have hardware with plenty of disks available for testing, that 
would provide better data, obviously.

-- 
Fabien.
