Re: Spinlocks, yet again: analysis and proposed patches - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Spinlocks, yet again: analysis and proposed patches
Date
Msg-id 8764t50yiz.fsf@stark.xeocode.com
In response to Re: Spinlocks, yet again: analysis and proposed patches  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > In the contended case you'll want a task switch anyway, so the futex
> > management should not matter.
> 
> No, we DON'T want a task switch.  That's the entire point: in a
> multiprocessor, it's a good bet that the spinlock is held by a task
> running on another processor, and doing a task switch will take orders
> of magnitude longer than just spinning until the lock is released.
> You should yield only after spinning long enough to make it a strong
> probability that the spinlock is held by a process that's lost the
> CPU and needs to be rescheduled.

Does the futex code make any attempt to record the CPU of the process grabbing
the lock? Clearly it wouldn't be a guarantee of anything, but if it's only used
for short-lived spinlocks taken while acquiring longer-lived locks, then maybe?
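
For reference, the spin-then-yield policy Tom describes above might look
roughly like the sketch below. This is only an illustration under my own
assumptions, not PostgreSQL's s_lock code: the names demo_spinlock,
demo_spin_acquire and demo_spin_release are hypothetical, and
SPINS_BEFORE_YIELD is an arbitrary placeholder rather than a tuned value.

```c
#include <sched.h>      /* sched_yield (POSIX) */
#include <stdatomic.h>  /* C11 atomic_flag */

/* Hypothetical threshold: how many failed attempts we tolerate before
 * assuming the holder has lost its CPU. Not a measured or tuned value. */
#define SPINS_BEFORE_YIELD 1000

typedef struct {
    atomic_flag held;
} demo_spinlock;

static void demo_spin_init(demo_spinlock *lock)
{
    atomic_flag_clear(&lock->held);
}

/* Acquire the lock; returns how many times we gave up the CPU. */
static int demo_spin_acquire(demo_spinlock *lock)
{
    int spins = 0;
    int yields = 0;

    /* test-and-set returns the previous value: true means someone holds it */
    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire)) {
        if (++spins >= SPINS_BEFORE_YIELD) {
            /* Spun long enough that the holder is probably descheduled:
             * only now pay for a trip through the scheduler. */
            sched_yield();
            spins = 0;
            yields++;
        }
    }
    return yields;
}

static void demo_spin_release(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

In the uncontended case the loop body never runs and no syscall is made,
which is the property being argued for; where the threshold should sit is
exactly the open question in this thread.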

> No; that page still says specifically "So a process calling
> sched_yield() now must wait until all other runnable processes in the
> system have used up their time slices before it will get the processor
> again."  I can prove that that is NOT what happens, at least not on
> a multi-CPU Opteron with current FC4 kernel.  However, if the newer
> kernels penalize a process calling sched_yield as heavily as this page
> claims, then it's not what we want anyway ...

Well, it would be no worse than select() or any other random I/O syscall.

It seems to me that what you've found is an outright bug in the Linux scheduler.
Perhaps posting it to linux-kernel would be worthwhile.

-- 
greg


