Re: issue with gininsert under very high load - Mailing list pgsql-hackers
From:           Andrew Dunstan
Subject:        Re: issue with gininsert under very high load
Date:
Msg-id:         52FBDBF5.2060800@dunslane.net
In response to: Re: issue with gininsert under very high load (Andres Freund <andres@2ndquadrant.com>)
List:           pgsql-hackers
On 02/12/2014 02:50 PM, Andres Freund wrote:
> On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
>> On investigation I found that a number of processes were locked waiting
>> for one wedged process to end its transaction, which never happened
>> (this transaction should normally take milliseconds). oprofile revealed
>> that postgres was spending 87% of its time in s_lock(), and strace on
>> the wedged process revealed that it was in a tight loop constantly
>> calling select(). It did not respond to a SIGTERM. I attached a
>> debugger to the wedged process and got this backtrace:
>>
>> #0  0x000000386a2eaf37 in semop () from /lib64/libc.so.6
>> #1  0x00000000006021a7 in PGSemaphoreLock ()
>> #2  0x00000000006463a1 in LWLockAcquire ()
>> #3  0x0000000000631997 in ReadBuffer_common ()
>> #4  0x00000000006322ee in ReadBufferExtended ()
>> #5  0x0000000000467475 in ginPrepareFindLeafPage ()
>> #6  0x000000000046663f in ginInsertItemPointers ()
>> #7  0x0000000000462adc in ginEntryInsert ()
>> #8  0x000000000046cad0 in ginInsertCleanup ()
>> #9  0x000000000046d7c6 in ginHeapTupleFastInsert ()
>> #10 0x0000000000462cfa in gininsert ()
>> #11 0x0000000000722b33 in FunctionCall6Coll ()
>> #12 0x000000000048bdff in index_insert ()
>> #13 0x0000000000587595 in ExecInsertIndexTuples ()
>> #14 0x00000000005948e1 in ExecModifyTable ()
>> #15 0x000000000057e718 in ExecProcNode ()
>> #16 0x000000000057d512 in standard_ExecutorRun ()
>> [...]
>>
>> Armed with this the client identified a single gin index (on an array
>> of text) that could have been involved, and on removing the index the
>> problem simply went away, and they have now been live on 9.3 for 36
>> hours without a mishap.
>>
>> The client is attempting to create a self-contained reproducible test,
>> but understandably is not going to re-add the index to their
>> production system.
>>
>> I'm not terribly familiar with the gin code, so I'm reporting this in
>> the hope that other people who are more familiar with it than I am
>> might know where to look for a potential race condition or other bug
>> that might have caused this.
>
> That's a deficiency of the gin fastupdate cache: a) it bases its size
> on work_mem, which usually makes it *far* too big; b) it doesn't
> perform the cleanup in one go if it can get a suitable lock, but does
> independent locking for each entry. That usually leads to absolutely
> horrific performance under concurrency.
>
> The fix is usually to just turn FASTUPDATE off for all indexes. If it's
> done after the initial creation, the table should be vacuumed
> afterwards to flush it.
>
> There's previously been talk about changing the limits to something
> more reasonable, but it got stalled in bikeshedding IIRC.

So this doesn't work just when it might be most useful?

cheers

andrew
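[Editor's note: for readers arriving at this thread, the workaround Andres describes would look roughly like the following DDL sketch. The index and table names are hypothetical; FASTUPDATE is a GIN storage parameter that can be changed without rebuilding the index, and the VACUUM afterwards flushes any entries already queued in the pending list, as Andres notes above.]

```sql
-- Disable the fastupdate pending-list mechanism on an existing GIN
-- index (names here are illustrative, not from the thread):
ALTER INDEX my_gin_idx SET (fastupdate = off);

-- Flush entries already accumulated in the pending list:
VACUUM my_table;

-- New indexes can be created with it off from the start:
CREATE INDEX my_gin_idx2 ON my_table USING gin (tags)
    WITH (fastupdate = off);
```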