Re: issue with gininsert under very high load - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: issue with gininsert under very high load
Date
Msg-id 52FBDBF5.2060800@dunslane.net
In response to Re: issue with gininsert under very high load  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 02/12/2014 02:50 PM, Andres Freund wrote:
> On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
>> On investigation I found that a number of processes were locked waiting for
>> one wedged process to end its transaction, which never happened (this
>> transaction should normally take milliseconds). oprofile revealed that
>> postgres was spending 87% of its time in s_lock(), and strace on the wedged
>> process revealed that it was in a tight loop constantly calling select(). It
>> did not respond to a SIGTERM. I attached a debugger to the wedged process
>> and got this backtrace:
>>
>>     #0  0x000000386a2eaf37 in semop () from /lib64/libc.so.6
>>     #1  0x00000000006021a7 in PGSemaphoreLock ()
>>     #2  0x00000000006463a1 in LWLockAcquire ()
>>     #3  0x0000000000631997 in ReadBuffer_common ()
>>     #4  0x00000000006322ee in ReadBufferExtended ()
>>     #5  0x0000000000467475 in ginPrepareFindLeafPage ()
>>     #6  0x000000000046663f in ginInsertItemPointers ()
>>     #7  0x0000000000462adc in ginEntryInsert ()
>>     #8  0x000000000046cad0 in ginInsertCleanup ()
>>     #9  0x000000000046d7c6 in ginHeapTupleFastInsert ()
>>     #10 0x0000000000462cfa in gininsert ()
>>     #11 0x0000000000722b33 in FunctionCall6Coll ()
>>     #12 0x000000000048bdff in index_insert ()
>>     #13 0x0000000000587595 in ExecInsertIndexTuples ()
>>     #14 0x00000000005948e1 in ExecModifyTable ()
>>     #15 0x000000000057e718 in ExecProcNode ()
>>     #16 0x000000000057d512 in standard_ExecutorRun ()
>>     [...]
>>
>>
>>
>> Armed with this, the client identified a single gin index (on an array of
>> text) that could have been involved. On removing the index the problem
>> simply went away, and they have now been live on 9.3 for 36 hours without
>> a mishap.
>>
>> The client is attempting to create a self-contained reproducible test, but
>> understandably is not going to re-add the index to their production system.
>>
>> I'm not terribly familiar with the gin code, so I'm reporting this in the
>> hope that other people who are more familiar with it than I am might know
>> where to look for a potential race condition or other bug that might have
>> caused this.
> That's a deficiency of the gin fastupdate cache: a) it bases its size
> on work_mem, which usually makes it *far* too big; b) it doesn't perform the
> cleanup in one go even when it could get a suitable lock, but does independent
> locking for each entry. That usually leads to absolutely horrific
> performance under concurrency.
>
> The fix is usually to just turn FASTUPDATE off for all indexes. If it's
> done after the initial creation, the table should be vacuumed afterwards
> to flush it.
>
> There's previously been talk about changing the limits to something more
> reasonable but it got stalled in bikeshedding IIRC.
>
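For reference, the mitigation described above would look roughly like the
following; the index and table names are placeholders rather than the client's
actual objects, so treat it as a sketch:

    -- hypothetical names; substitute the real gin index and its table
    ALTER INDEX some_gin_idx SET (fastupdate = off);

    -- if the index was created with fastupdate enabled (the default),
    -- vacuum the table afterwards to flush the pending list
    VACUUM some_table;

The same parameter can also be set at creation time via
CREATE INDEX ... USING gin (...) WITH (fastupdate = off).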

So this doesn't work just when it might be most useful?

cheers

andrew


