On 24.11.2010 07:07, Robert Haas wrote:
> Per previous threats, I spent some time tonight running oprofile
> (using the directions Tom Lane was foolish enough to provide me back
> in May). I took testlibpq.c and hacked it up to just connect to the
> server and then disconnect in a tight loop without doing anything
> useful, hoping to measure the overhead of starting up a new
> connection. Ha, ha, funny about that:
>
> 120899 18.0616 postgres AtProcExit_Buffers
> 56891 8.4992 libc-2.11.2.so memset
> 30987 4.6293 libc-2.11.2.so memcpy
> 26944 4.0253 postgres hash_search_with_hash_value
> 26554 3.9670 postgres AllocSetAlloc
> 20407 3.0487 libc-2.11.2.so _int_malloc
> 17269 2.5799 libc-2.11.2.so fread
> 13005 1.9429 ld-2.11.2.so do_lookup_x
> 11850 1.7703 ld-2.11.2.so _dl_fixup
> 10194 1.5229 libc-2.11.2.so _IO_file_xsgetn
>
> In English: the #1 overhead here is actually something that happens
> when processes EXIT, not when they start. Essentially all the time is
> in two lines:
>
> 56920 6.6006 : for (i = 0; i< NBuffers; i++)
> : {
> 98745 11.4507 : if (PrivateRefCount[i] != 0)
Oh, that's quite surprising.
> Anything we can do about this? That's a lot of overhead, and it'd be
> a lot worse on a big machine with 8GB of shared_buffers.
Micro-optimizing that search for the non-zero value helps a little bit
(attached). It reduces the percentage shown by oprofile from about 16%
to 12% on my laptop.
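To illustrate the kind of micro-optimization meant here, below is a
minimal sketch of one way to scan an int32 array for a non-zero entry
faster than the one-element-at-a-time loop. It is not necessarily what
the attached patch does. The function name first_nonzero and its
signature are made up for this example and are not PostgreSQL
functions. The idea is simply to test two counters with one 64-bit
load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): return the index of the
 * first non-zero entry in counts[0..n-1], or -1 if every entry is zero.
 */
static long
first_nonzero(const int32_t *counts, size_t n)
{
    size_t      i = 0;

    assert(counts != NULL || n == 0);

    /* Test two 32-bit counters per iteration with a single 64-bit load. */
    for (; i + 2 <= n; i += 2)
    {
        uint64_t    pair;

        /* memcpy keeps the load safe regardless of alignment */
        memcpy(&pair, &counts[i], sizeof(pair));
        if (pair != 0)
            break;
    }

    /*
     * Finish with a plain scan. This pins down which element of the pair
     * was non-zero, and also covers an odd trailing element.
     */
    for (; i < n; i++)
    {
        if (counts[i] != 0)
            return (long) i;
    }
    return -1;
}
```

This only halves the number of comparisons. The whole array still has
to be read from memory, which is why the gain is modest.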
For bigger gains, I think you need to somehow make the PrivateRefCount
array smaller. Perhaps use only one byte for each buffer instead of an
int32, with some sort of overflow list for the rare case that a buffer
is pinned more than 255 times. Or make it a hash table instead of a
simple lookup array. But whatever you do, you have to be very careful
not to add overhead to PinBuffer/UnpinBuffer; those can already be quite
high in oprofile reports of real applications. It might be worth
experimenting a bit. At the moment PrivateRefCount takes up about 0.5MB
of memory per 1GB of shared_buffers (4 bytes for each 8kB buffer, at the
default block size). Machines with a high shared_buffers setting have no
shortage of memory, but a large array like that might waste a lot of
precious CPU cache.
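A rough sketch of the one-byte-plus-overflow idea, to make it concrete.
Everything here is hypothetical: N_BUFFERS stands in for NBuffers, and
pin, unpin, pin_count and the fixed-size overflow table are invented
for illustration, not taken from bufmgr.c. A real implementation would
need a growable overflow structure and careful benchmarking of the
common path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUFFERS     1024      /* illustrative stand-in for NBuffers */
#define MAX_OVERFLOW  8         /* illustrative cap on overflow entries */

typedef struct
{
    int         buf;            /* buffer index */
    uint32_t    extra;          /* pins beyond the 255 held in small_count */
} OverflowEntry;

static uint8_t small_count[N_BUFFERS];  /* one byte per buffer */
static OverflowEntry overflow[MAX_OVERFLOW];
static int  n_overflow = 0;

/* Linear search is fine as long as overflow entries are rare. */
static OverflowEntry *
find_overflow(int buf)
{
    for (int i = 0; i < n_overflow; i++)
        if (overflow[i].buf == buf)
            return &overflow[i];
    return NULL;
}

static void
pin(int buf)
{
    OverflowEntry *e;

    /* Common path: one byte compare and increment. */
    if (small_count[buf] < UINT8_MAX)
    {
        small_count[buf]++;
        return;
    }

    /* Rare path: the byte is saturated, count the rest in the overflow list. */
    e = find_overflow(buf);
    if (e == NULL)
    {
        assert(n_overflow < MAX_OVERFLOW);
        e = &overflow[n_overflow++];
        e->buf = buf;
        e->extra = 0;
    }
    e->extra++;
}

static void
unpin(int buf)
{
    OverflowEntry *e;

    assert(small_count[buf] > 0);

    /* Only a saturated byte can have an overflow entry to drain first. */
    if (small_count[buf] == UINT8_MAX && (e = find_overflow(buf)) != NULL)
    {
        if (--e->extra == 0)
            *e = overflow[--n_overflow];    /* drop entry, move last into slot */
        return;
    }
    small_count[buf]--;
}

static uint32_t
pin_count(int buf)
{
    uint32_t    c = small_count[buf];
    OverflowEntry *e;

    if (c == UINT8_MAX && (e = find_overflow(buf)) != NULL)
        c += e->extra;
    return c;
}
```

With this layout the array scanned at process exit is a quarter of the
int32 version's size, at the price of an extra branch in the pin and
unpin paths. Whether that trade is worth it is exactly what would have
to be measured.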
Now, the other question is whether this really matters. Even if we
eliminate that loop in AtProcExit_Buffers altogether, will
connect/disconnect still be so slow that you have to use a connection
pooler if you do that a lot?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com