Re: [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance - Mailing list pgsql-performance

From Ivan Voras
Subject Re: [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
Msg-id i8kfft$e5j$1@dough.gmane.org
In response to Re: [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
On 10/07/10 02:39, Robert Haas wrote:
> On Wed, Oct 6, 2010 at 6:31 PM, Ivan Voras<ivoras@freebsd.org>  wrote:
>> On 10/04/10 20:49, Josh Berkus wrote:
>>
>>>> The other major bottleneck they ran into was a kernel one: reading from
>>>> the heap file requires a couple lseek operations, and Linux acquires a
>>>> mutex on the inode to do that. The proper place to fix this is
>>>> certainly in the kernel but it may be possible to work around in
>>>> Postgres.
>>>
>>> Or we could complain to Kernel.org.  They've been fairly responsive in
>>> the past.  Too bad this didn't get posted earlier; I just got back from
>>> LinuxCon.
>>>
>>> So you know someone who can speak technically to this issue? I can put
>>> them in touch with the Linux geeks in charge of that part of the kernel
>>> code.
>>
>> Hmmm... lseek? As in the "lseek() then read() or write()" idiom? AFAIK
>> it cannot be fixed, since you're modifying the global "stream position"
>> variable and something has got to lock that.
>
> Well, there are lock free algorithms using CAS, no?

Nothing is really "lock free" - in this case the algorithms simply push
the locking down to atomic operations on the CPU (and the memory bus).
Semantically, *something* has to lock the memory region for however
brief a period of time and then propagate that update to the other CPUs'
caches (i.e. invalidate them).
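
To make that concrete, here is a minimal sketch of a CAS-based update,
assuming a compiler with C11 <stdatomic.h>. The names stream_pos and
advance_pos are made up for illustration; this is not kernel code. There
is no mutex in the source, but each compare-and-swap is still an atomic
read-modify-write that the hardware has to serialize between cores.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared "stream position", standing in for a file offset. */
static _Atomic int64_t stream_pos = 0;

/*
 * Advance the shared position by delta without taking a mutex.
 * Returns the position as it was before this update.
 */
int64_t advance_pos(int64_t delta)
{
    int64_t old = atomic_load(&stream_pos);

    /*
     * If another thread changed stream_pos in the meantime, the CAS fails
     * and stores the current value into old, so we simply retry.
     */
    while (!atomic_compare_exchange_weak(&stream_pos, &old, old + delta))
        ;

    return old;
}
```

Under contention the retry loop does the waiting that a lock would
otherwise do, which is the sense in which the locking is pushed down
rather than removed.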

>> OTOH, pread() / pwrite() don't have to do that.
>
> Hey, I didn't know about those.  That sounds like it might be worth
> investigating, though I confess I lack a 48-core machine on which to
> measure the alleged benefit.

As Jon said, it will in any case halve the number of these syscalls (one
pread() instead of an lseek() followed by a read()), and the calls can be
wrapped in a C macro for the platforms which don't implement them.
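
A rough sketch of what such a wrapper could look like. HAVE_PREAD,
pg_pread and pread_fallback are hypothetical names picked for this
example, not existing PostgreSQL symbols; a real build would set the
flag from a configure check rather than defaulting it here.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical configure-time flag; defaulted to 1 only for this sketch. */
#ifndef HAVE_PREAD
#define HAVE_PREAD 1
#endif

#if HAVE_PREAD
/* One syscall, and the file descriptor's own offset is left untouched. */
#define pg_pread(fd, buf, len, off) pread((fd), (buf), (len), (off))
#else
/*
 * Fallback: the classic two-syscall idiom. Unlike pread(), this moves
 * the descriptor's offset, so it is not safe if the fd is shared.
 */
static ssize_t pread_fallback(int fd, void *buf, size_t len, off_t off)
{
    if (lseek(fd, off, SEEK_SET) == (off_t) -1)
        return -1;
    return read(fd, buf, len);
}
#define pg_pread(fd, buf, len, off) pread_fallback((fd), (buf), (len), (off))
#endif
```

A caller would then write pg_pread(fd, buf, len, offset) everywhere it
currently does lseek() followed by read().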

http://man.freebsd.org/pread

(and just in case it's needed: pread() is a special case of preadv()).
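
For illustration only, assuming a platform that provides preadv() in
<sys/uio.h> (it is not in POSIX, but FreeBSD and Linux have it): a
pread() can be expressed as a preadv() with a single buffer. The function
name pread_via_preadv is made up for this sketch.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read len bytes at offset off using preadv() with a one-element vector. */
ssize_t pread_via_preadv(int fd, void *buf, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return preadv(fd, &iov, 1, off);
}
```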
