Re: issue with gininsert under very high load - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: issue with gininsert under very high load
Date
Msg-id 52FCEBBF.2070908@vmware.com
In response to Re: issue with gininsert under very high load  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: issue with gininsert under very high load  (Andrew Dunstan <andrew@dunslane.net>)
Re: issue with gininsert under very high load  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 02/13/2014 05:40 PM, Andrew Dunstan wrote:
>
> On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:
>> On 02/12/2014 10:50 PM, Andres Freund wrote:
>>> On February 12, 2014 9:33:38 PM CET, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Andres Freund <andres@2ndquadrant.com> writes:
>>>>> On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
>>>>>> On investigation I found that a number of processes were locked
>>>>>> waiting for one wedged process to end its transaction, which never
>>>>>> happened (this transaction should normally take milliseconds).
>>>>>> oprofile revealed that postgres was spending 87% of its time in
>>>>>> s_lock(), and strace on the wedged process revealed that it was in
>>>>>> a tight loop constantly calling select(). It did not respond to a
>>>>>> SIGTERM.
>>>>
>>>>> That's a deficiency of the gin fastupdate cache: a) it bases its
>>>>> size on work_mem, which usually makes it *far* too big; b) it doesn't
>>>>> perform the cleanup in one go if it can get a suitable lock, but does
>>>>> independent locking for each entry. That usually leads to absolutely
>>>>> horrific performance under concurrency.
>>>>
>>>> I'm not sure that what Andrew is describing can fairly be called a
>>>> concurrent-performance problem.  It sounds closer to a stuck lock.
>>>> Are you sure you've diagnosed it correctly?
>>>
>>> No. But I've several times seen similar backtraces where it wasn't
>>> actually stuck, just livelocked. I'm just on my mobile right now, but
>>> afair Andrew described a loop involving lots of semaphores and
>>> spinlocks, which shouldn't be the case if it were actually stuck.
>>> If there are dozens of processes waiting on the same lock, cleaning up
>>> a large number of items one by one, it's not surprising if it's
>>> dramatically slow.
>>
>> Perhaps we should use a lock to enforce that only one process tries to
>> clean up the pending list at a time.
>
> Is that going to serialize all these inserts?

It will serialize the cleanup process, which moves entries from the 
pending list to the tree proper. But that's better than the current 
situation. Currently, when two processes attempt it, they will both try 
to insert into the GIN tree, but one of them will notice that the other 
one already did the cleanup, and bail out. So only one process 
contributes to progress, while the others just waste their effort.

The processes should try to get the lock, and just give up if it's 
already held rather than wait. If someone else is already doing the 
cleanup, there's no need for the current process to do it.
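
A minimal sketch of the try-and-give-up idea, written in Python rather than PostgreSQL's C. The names PendingList, insert and try_cleanup are hypothetical and chosen for illustration only; this is not the actual GIN code, and threads stand in for backend processes:

```python
import threading


class PendingList:
    """Toy stand-in for a GIN pending list (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.pending = []  # entries waiting to be moved to the tree
        self.tree = []     # stand-in for the GIN tree proper
        self._list_lock = threading.Lock()     # protects pending and tree
        self._cleanup_lock = threading.Lock()  # held by the one active cleaner

    def insert(self, item):
        # Inserts only append to the pending list; they never wait for cleanup.
        with self._list_lock:
            self.pending.append(item)

    def try_cleanup(self):
        """Move pending entries to the tree unless someone else is doing it.

        Returns True if this caller performed the cleanup, False if it
        gave up because another caller already holds the cleanup lock.
        """
        # Non-blocking attempt: give up immediately instead of waiting.
        if not self._cleanup_lock.acquire(blocking=False):
            return False
        try:
            with self._list_lock:
                batch, self.pending = self.pending, []
                self.tree.extend(batch)
            return True
        finally:
            self._cleanup_lock.release()


if __name__ == "__main__":
    pl = PendingList()
    for i in range(5):
        pl.insert(i)
    did_cleanup = pl.try_cleanup()
    print(did_cleanup, pl.tree, pl.pending)
```

The point of the non-blocking acquire is that a caller who loses the race returns at once and leaves its entries on the pending list for the current or a later cleaner, instead of duplicating work that will be thrown away.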

- Heikki


