On 4/14/14, 5:51 PM, Joe Conway wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/14/2014 03:17 PM, Jim Nasby wrote:
>> On 4/14/14, 4:50 PM, Andres Freund wrote:
>>> On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
>>>> I realize there are many things that can be done to improve my
>>>> specific scenario, e.g. drop indexes before loading, change
>>>> various configs, etc. My purpose for this post is to ask if it
>>>> is really expected to get over 20 times as much WAL as heap
>>>> data?
>>>
>>> I'd bet a large percentage of this will be full page images of
>>> the index. The values you index are essentially distributed over
>>> the whole index, so you'll modify the same index values
>>> repeatedly. But often enough it won't be in the same checkpoint
>>> and thus will create full page images.
>>
>> My thought exactly...
>>
>> ISTM that we should be able to push all the index inserts to the
>> end of the transaction. That should greatly reduce the amount of
>> full page writes. That would also open the door for doing all the
>> index inserts in parallel.
>
> That's the thing. I'm sure there is tuning and other things to improve
> this particular case, but creating over 20 times as much WAL as real
> data seems like pathological behavior to me.
Can you take a look at what's actually going into WAL when the wheels fall
off? I think it should be pretty easy to test the theory that it's a ton of
full page writes of index leaf pages...
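
To make the theory concrete, here is a toy model of it in Python. It is
only a sketch under loud assumptions, not a measurement and not how the
backend is actually structured: the function name, the 64-byte per-insert
record size, and the page/checkpoint numbers are all made up for
illustration. The only rule it models is that the first modification of a
page after a checkpoint logs a full 8 kB page image.

```python
import random

PAGE_SIZE = 8192  # PostgreSQL's default block size, in bytes


def simulate_wal(n_inserts, n_leaf_pages, inserts_per_checkpoint,
                 record_size=64, seed=0):
    """Toy model of WAL volume for randomly distributed index inserts.

    Assumptions (hypothetical, for illustration only):
      - each insert lands on a uniformly random leaf page,
      - a checkpoint happens every `inserts_per_checkpoint` inserts,
      - the first touch of a page after a checkpoint costs one full
        page image; every insert also costs `record_size` bytes.

    Returns (full_page_image_bytes, ordinary_record_bytes).
    """
    rng = random.Random(seed)
    touched = set()   # pages already modified since the last checkpoint
    fpi_bytes = 0
    record_bytes = 0
    for i in range(n_inserts):
        if i % inserts_per_checkpoint == 0:
            touched.clear()           # checkpoint: every page is "cold" again
        page = rng.randrange(n_leaf_pages)
        if page not in touched:
            touched.add(page)
            fpi_bytes += PAGE_SIZE    # first touch since checkpoint
        record_bytes += record_size
    return fpi_bytes, record_bytes


if __name__ == "__main__":
    # Made-up numbers: a large index, checkpoints far apart in time but
    # still covering only a small fraction of the leaf pages.
    fpi, rec = simulate_wal(n_inserts=200_000, n_leaf_pages=1_000_000,
                            inserts_per_checkpoint=50_000)
    print("full page image bytes:", fpi)
    print("ordinary record bytes:", rec)
    print("ratio:", fpi / rec)
```

In this model, when the index has far more leaf pages than there are
inserts between checkpoints, almost every insert hits a page that has not
been touched since the last checkpoint, so the ratio of page-image bytes
to ordinary record bytes heads toward PAGE_SIZE / record_size. Whether
that is what is really happening here is exactly what looking at the
actual WAL would confirm or rule out.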
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net