Re: CLOG contention, part 2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: CLOG contention, part 2
Msg-id CA+Tgmob+xWFeuY4=kYL_sck1F2NfHcOO5cyJn2zaK_vyaqnGHw@mail.gmail.com
In response to Re: CLOG contention, part 2  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: CLOG contention, part 2
List pgsql-hackers
On Fri, Jan 27, 2012 at 8:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Jan 27, 2012 at 3:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Also, I think the general approach is wrong.  The only reason to have
>>> these pages in shared memory is that we can control access to them to
>>> prevent write/write and read/write corruption.  Since these pages are
>>> never written, they don't need to be in shared memory.   Just read
>>> each page into backend-local memory as it is needed, either
>>> palloc/pfree each time or using a single reserved block for the
>>> lifetime of the session.  Let the kernel worry about caching them so
>>> that the above mentioned reads are cheap.
>>
>> right -- exactly.  but why stop at one page?
>
> If you have more than one, you need code to decide which one to evict
> (just free) every time you need a new one.  And every process needs to
> be running this code, while the kernel is still going to need to make its
> own decisions for the entire system.  It seems simpler to just let the
> kernel do the job for everyone.  Are you worried that a read syscall
> is going to be slow even when the data is presumably cached in the OS?

I think that would be a very legitimate worry.  You're talking about
copying 8kB of data because you need two bits.  Even if the
user/kernel mode context switch is lightning-fast, that's a lot of
extra data copying.
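To make that cost concrete, here is a minimal C sketch of the read-a-page-per-lookup approach, assuming PostgreSQL's default 8kB BLCKSZ and the two-bits-per-transaction layout used in clog.c. The helper `clog_status_via_pread` is hypothetical, and for simplicity it treats `fd` as one flat file of CLOG pages, whereas real CLOG storage splits pages across segment files.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() */

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed layout: default BLCKSZ, 2 status bits per xid as in clog.c. */
#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CLOG_XACT_BITMASK   ((1 << CLOG_BITS_PER_XACT) - 1)

typedef uint32_t TransactionId;

/*
 * Hypothetical helper: look up one xid's status by copying its whole
 * CLOG page into a backend-local buffer.  Assumes fd is a single flat
 * file of pages (real CLOG storage uses multiple segment files).
 * Returns 0 and sets *status (0 = in progress, 1 = committed,
 * 2 = aborted, 3 = sub-committed), or -1 if the page can't be read.
 */
static int
clog_status_via_pread(int fd, TransactionId xid, int *status)
{
    static unsigned char page[BLCKSZ];  /* one reserved local block */
    uint32_t    pageno = xid / CLOG_XACTS_PER_PAGE;
    uint32_t    pgindex = xid % CLOG_XACTS_PER_PAGE;
    off_t       offset = (off_t) pageno * BLCKSZ;

    /* 8192 bytes are copied across the user/kernel boundary... */
    if (pread(fd, page, BLCKSZ, offset) != BLCKSZ)
        return -1;

    /* ...to extract just these two bits. */
    unsigned char byte = page[pgindex / CLOG_XACTS_PER_BYTE];
    int         bshift = (int) (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    *status = (byte >> bshift) & CLOG_XACT_BITMASK;
    return 0;
}
```

Every call moves a full BLCKSZ page out of the kernel to recover a value of at most 3, even when the page is already in the OS cache.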

In a previous commit, 33aaa139e6302e81b4fbf2570be20188bb974c4f, we
increased the number of CLOG buffers from 8 to 32 (except in very
low-memory configurations).  The main reason that change shows a win on
Nate Boley's 32-core test machine appears to be that it avoids the
scenario where there are, say, 12 people simultaneously wanting to
read 12 different CLOG pages when there are only 8 buffers, and so 4 of
them have to wait for a buffer to become available before they can even
think about starting a read.  The really bad latency spikes were
happening not because the I/O took a long time, but because it couldn't
be started immediately.  Those spikes are now gone as a result of that
commit.
Probably you can get them back with enough cores, but you'll probably
hit a lot of other, more serious problems first.
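The buffer-starvation arithmetic can be illustrated with a toy C model. This is an assumption-laden sketch, not the SLRU code: it supposes every buffer slot is busy with an in-flight read, so nothing can be evicted and only the number of distinct pages requested matters. The function name `requests_that_must_wait` is hypothetical.

```c
/*
 * Toy model (an assumption, not the real SLRU algorithm): each of the
 * nslots CLOG buffers holds one page, and every buffer is busy with an
 * in-flight read, so nothing can be evicted until a read finishes.
 * Backends asking for the same page share that page's buffer, so only
 * distinct pages compete for slots.  Returns how many distinct page
 * reads cannot start immediately.
 */
static int
requests_that_must_wait(const int *pages, int nrequests, int nslots)
{
    int         distinct = 0;

    for (int i = 0; i < nrequests; i++)
    {
        int         seen = 0;

        /* O(n^2) duplicate check: fine for a handful of backends */
        for (int j = 0; j < i; j++)
        {
            if (pages[j] == pages[i])
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct > nslots ? distinct - nslots : 0;
}
```

With twelve requests for twelve different pages, the model reports 4 waiters at 8 slots and none at 32, matching the numbers above; it says nothing about how long each read takes, which is the point: the wait comes from slot availability, not I/O latency.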

I assume that if there's any purpose to further optimization here,
it's either because the overall miss rate of the cache is too large,
or because the remaining locking costs are too high.  Unfortunately I
haven't yet had time to look at this patch and understand what it
does, or had machine cycles available to benchmark it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
