Re: CLOG contention, part 2 - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: CLOG contention, part 2
Msg-id: CA+Tgmob+xWFeuY4=kYL_sck1F2NfHcOO5cyJn2zaK_vyaqnGHw@mail.gmail.com
In response to: Re: CLOG contention, part 2 (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Fri, Jan 27, 2012 at 8:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Jan 27, 2012 at 3:16 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Also, I think the general approach is wrong.  The only reason to have
>>> these pages in shared memory is that we can control access to them to
>>> prevent write/write and read/write corruption.  Since these pages are
>>> never written, they don't need to be in shared memory.  Just read
>>> each page into backend-local memory as it is needed, either
>>> palloc/pfree each time or using a single reserved block for the
>>> lifetime of the session.  Let the kernel worry about caching them so
>>> that the above mentioned reads are cheap.
>>
>> right -- exactly.  but why stop at one page?
>
> If you have more than one, you need code to decide which one to evict
> (just free) every time you need a new one.  And every process needs to
> be running this code, while the kernel is still going to need to make its
> own decisions for the entire system.  It seems simpler to just let the
> kernel do the job for everyone.  Are you worried that a read syscall
> is going to be slow even when the data is presumably cached in the OS?

I think that would be a very legitimate worry.  You're talking about
copying 8kB of data because you need two bits.  Even if the
user/kernel mode context switch is lightning-fast, that's a lot of
extra data copying.

In a previous commit, 33aaa139e6302e81b4fbf2570be20188bb974c4f, we
increased the number of CLOG buffers from 8 to 32 (except in very
low-memory configurations).  The main reason that shows a win on Nate
Boley's 32-core test machine appears to be that it avoids the scenario
where there are, say, 12 people simultaneously wanting to read 12
different CLOG buffers, and so 4 of them have to wait for a buffer to
become available before they can even think about starting a read.
The really bad latency spikes were happening not because the I/O took
a long time, but because it couldn't be started immediately.  However,
those spikes are now gone as a result of the above commit.  You could
probably get them back with enough cores, but you'd probably hit a lot
of other, more serious problems first.  I assume that if there's any
purpose to further optimization here, it's either because the overall
miss rate of the cache is too large, or because the remaining locking
costs are too high.  Unfortunately, I haven't yet had time to look at
this patch and understand what it does, or machine cycles available to
benchmark it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company