Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> On Sat, 2009-06-20 at 13:15 +0200, Stefan Kaltenbrunner wrote:
>>>> 8192 6m43.203s/6m48.293s
>>>> 16384 6m24.980s/6m24.116s
>>>> 32768 6m20.753s/6m22.083s
>>>> 65536 6m22.913s/6m22.449s
>>>> 1048576 6m23.765s/6m24.645s
>>
>>> The rest of the patch should have had a greater effect on tables with
>>> thinner rows. Your results match my expectations, though I read from
>>> them that we should use 16384, since that provides some gain, not just a
>>> cancellation of the regression.
>>
>> +1 for using 16384 (ie, max ring buffer size 16MB). Maybe even more.
>> It seems likely that other cases might have an even bigger issue than
>> is exhibited in the couple of test cases we have here, so we should
>> leave some margin for error. Also, there's code in there to limit the
>> ring buffer to 1/8th of shared buffers, so we don't have to worry about
>> trashing the whole buffer arena in small configurations. Any limitation
>> at all is still a step forward over previous releases as far as not
>> trashing the arena is concerned.
>
> +1. You might get away with a smaller ring with narrow tables, where
> writing 16MB of data produces more than 16MB of WAL, but I don't think
> it can ever be the other way round. Leaving a little bit of room for
> error doesn't seem like a bad idea, though.
Yeah, 16MB seems like the best choice given the available data and how
far we are into the release cycle.
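For illustration, here is a rough sketch of how the resulting cap works out. It assumes the default 8kB BLCKSZ and the 1/8th-of-shared-buffers limit Tom mentions. The helper name is made up, and this is not the actual freelist.c code:

```python
# Assumed default PostgreSQL block size (8kB); builds can be configured differently.
BLCKSZ = 8192


def bulkwrite_ring_buffers(nbuffers: int, max_ring_bytes: int = 16 * 1024 * 1024) -> int:
    """Hypothetical helper: number of buffers a bulk-write (COPY) ring would use.

    The ring is capped at max_ring_bytes worth of blocks and at 1/8th of
    shared buffers (nbuffers), as described in this thread.
    """
    ring = max_ring_bytes // BLCKSZ   # 16MB -> 2048 blocks
    return min(ring, nbuffers // 8)   # never more than 1/8th of shared buffers


# shared_buffers = 32MB (4096 blocks): the 1/8th cap wins
print(bulkwrite_ring_buffers(4096))     # 512
# shared_buffers = 1GB (131072 blocks): the 16MB cap wins
print(bulkwrite_ring_buffers(131072))   # 2048
```

So on small configurations the 1/8th rule dominates, and the 16MB ceiling only matters once shared_buffers exceeds 128MB.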
>
> IIRC we experimented with an auto-tuning ring size when we worked on the
> original ring buffer patch. The idea is that you start with a small
> ring, and enlarge it in StrategyRejectBuffer. But that seems too risky
> for 8.4.
Agreed.
>
> I wonder if using the small ring showed any benefit when the COPY is not
> WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen,
> so the small ring might have some L2 cache benefits.
I did some limited testing on that, but I was unable to measure any
significant effect, especially since the difference between WAL-logged
and non-WAL-logged runs is rather small for a non-parallel COPY (i.e. in
the above example you get around 6m20s runtime for the WAL-logged case
and ~5m40s for the other).
Stefan