Re: issue with gininsert under very high load - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: issue with gininsert under very high load |
Date | |
Msg-id | 20140212195046.GA5500@awork2.anarazel.de Whole thread Raw |
In response to | issue with gininsert under very high load (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: issue with gininsert under very high load
Re: issue with gininsert under very high load Re: issue with gininsert under very high load |
List | pgsql-hackers |
On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote: > On investigation I found that a number of processes were locked waiting for > one wedged process to end its transaction, which never happened (this > transaction should normally take milliseconds). oprofile revealed that > postgres was spending 87% of its time in s_lock(), and strace on the wedged > process revealed that it was in a tight loop constantly calling select(). It > did not respond to a SIGTERM. I attached a debugger to the wedged process > and got this backtrace: > > #0 0x000000386a2eaf37 in semop () from /lib64/libc.so.6 > #1 0x00000000006021a7 in PGSemaphoreLock () > #2 0x00000000006463a1 in LWLockAcquire () > #3 0x0000000000631997 in ReadBuffer_common () > #4 0x00000000006322ee in ReadBufferExtended () > #5 0x0000000000467475 in ginPrepareFindLeafPage () > #6 0x000000000046663f in ginInsertItemPointers () > #7 0x0000000000462adc in ginEntryInsert () > #8 0x000000000046cad0 in ginInsertCleanup () > #9 0x000000000046d7c6 in ginHeapTupleFastInsert () > #10 0x0000000000462cfa in gininsert () > #11 0x0000000000722b33 in FunctionCall6Coll () > #12 0x000000000048bdff in index_insert () > #13 0x0000000000587595 in ExecInsertIndexTuples () > #14 0x00000000005948e1 in ExecModifyTable () > #15 0x000000000057e718 in ExecProcNode () > #16 0x000000000057d512 in standard_ExecutorRun () > [...] > > > > Armed with this the client identified a single gin index (on an array of > text) that could have been involved, and on removing the index the problem > simply went away, and they have now been live on 9.3 for 36 hours without a > mishap. > > The client is attempting to create a self-contained reproducible test, but > understandably are not going to re-add the index to their production system. > > I'm not terribly familiar with the gin code, so I'm reporting this in the > hope that other people who are more familiar with it than I am might know > where to look for a potential race condition or other bug that might have > caused this. That's a deficiency of the gin fastupdate cache: a) it bases it's size on work_mem which usually makes it *far* too big b) it doesn't perform the cleanup in one go if it can get a suitable lock, but does independent locking for each entry. That usually leads to absolutely horrific performance under concurreny. The fix is usually to just turn FASTUPDATE off for all indexes. If it's done after the initial creation, the table should be vacuumed afterwards to flush it. There's previously been talk about changing the limits to something more reasonable but it got stalled in bikeshedding IIRC. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: