Re: tweaking MemSet() performance - 7.4.5 - Mailing list pgsql-hackers

From Manfred Spraul
Subject Re: tweaking MemSet() performance - 7.4.5
Msg-id 414C567C.3060503@colorfullife.com
In response to Re: tweaking MemSet() performance - 7.4.5  (Marc Colosimo <mcolosimo@mitre.org>)
Responses Re: tweaking MemSet() performance - 7.4.5  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Marc Colosimo wrote:

> Oops, I used the same setting as in the old hacking message (-O2, gcc 
> 3.3). If I understand what you are saying, then it turns out yes, PG's 
> MemSet is faster for smaller blocksizes (see below, between 32 and 
> 64). I just replaced the whole MemSet with memset and it is not very 
> low when I profile.
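For reference, the "inline loop up to a limit, then memset" idea discussed
in this thread could look roughly like the sketch below. It is a
hypothetical helper (memset_hybrid and HYBRID_LOOP_LIMIT are made-up
names), loosely modeled on the shape of PG's MemSet macro: an inline word
loop for small, aligned, zero fills, and libc memset for everything else.
The limit of 32 is only the value floated in the quoted text, not a
measured optimum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit: the value suggested in the thread, not a tuned one. */
#define HYBRID_LOOP_LIMIT 32

/*
 * MemSet-style helper (sketch): if the fill value is zero and both the
 * start address and the length are multiples of sizeof(long), and the
 * length is at most HYBRID_LOOP_LIMIT bytes, clear it with an inline
 * word loop. Everything else is handed to the platform memset.
 */
static void memset_hybrid(void *start, int val, size_t len)
{
    const uintptr_t align_mask = sizeof(long) - 1;

    if (val == 0 &&
        ((uintptr_t) start & align_mask) == 0 &&
        (len & align_mask) == 0 &&
        len <= HYBRID_LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;           /* one word-sized store per iteration */
    }
    else
        memset(start, val, len); /* large, unaligned, or non-zero fills */
}
```

Whether the crossover really belongs at 32 bytes depends on how good the
platform memset is for small sizes, which is exactly what the profiling
above is trying to find out.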

Could you check what the OS-X memset function does internally?
One trick to speed up memset is to bypass the cache and bulk-write
directly from the write buffers to main memory. i386 CPUs support that,
and in microbenchmarks it's 3 times faster (or something like that).
Unfortunately, it's a loss in real-world tests: typically a structure is
initialized with memset and then immediately accessed. If the memset
bypasses the cache, the following access causes a cache-line miss,
which can be slow enough that the faster memset results in a net
performance loss.
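A minimal sketch of the cache-bypassing technique, assuming an x86
compiler with SSE2 intrinsics (_mm_stream_si128 and _mm_sfence from
<emmintrin.h>). This is only an illustration, not what the OS X memset
actually does; memset_zero_nontemporal is a hypothetical name. The
streaming stores go through write-combining buffers instead of allocating
cache lines, which is precisely why a structure cleared this way misses in
the cache on its first access afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Zero len bytes at dst using non-temporal (cache-bypassing) stores for
 * the 16-byte-aligned middle part. Falls back to plain memset when SSE2
 * is not available at compile time.
 */
static void memset_zero_nontemporal(void *dst, size_t len)
{
#if defined(__SSE2__)
    unsigned char *p = (unsigned char *) dst;
    /* Bytes needed to reach the next 16-byte boundary. */
    size_t head = (16 - ((uintptr_t) p & 15)) & 15;

    if (head > len)
        head = len;
    memset(p, 0, head);             /* unaligned head: ordinary stores */
    p += head;
    len -= head;

    const __m128i zero = _mm_setzero_si128();
    while (len >= 16)
    {
        /* Streaming store: does not allocate a cache line for p. */
        _mm_stream_si128((__m128i *) p, zero);
        p += 16;
        len -= 16;
    }
    _mm_sfence();                   /* order the streaming stores before later stores */
    memset(p, 0, len);              /* remaining tail: ordinary stores */
#else
    memset(dst, 0, len);
#endif
}
```

In a microbenchmark that only clears memory this avoids polluting the
cache; in real code that touches the structure right after clearing it,
the later cache misses can eat the gain, as described above.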

> I could squeeze more out of it if I spent more time trying to 
> understand it (change MEMSET_LOOP_LIMIT to 32 and then add memset 
> after that?). I'm now working on understanding spin locks and 
> friends. Putting in a sync call (in s_lock.h) is really a time killer 
> and bad for performance (it takes up 35 cycles).
>
That's the price you pay for weakly ordered memory access.
Linux on ppc uses eieio; on ppc64, lwsync is used. Could you check
whether they are faster?
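For comparison, a minimal test-and-set spinlock written with C11 atomics
(a standard that postdates this thread; this is not the s_lock.h code, and
tas_lock, tas_acquire and tas_release are hypothetical names). Instead of
a hand-coded sync, the unlock only requests release ordering, and on
PowerPC compilers typically implement a release store with lwsync rather
than a full sync. Inspecting the generated assembly on the target is the
way to confirm which barrier you actually get.

```c
#include <stdatomic.h>

/* Test-and-set spinlock built on a C11 atomic_flag. */
typedef struct
{
    atomic_flag flag;
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

static void tas_acquire(tas_lock *l)
{
    /* Acquire ordering: accesses after the lock cannot be hoisted above it. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;                       /* spin until the flag was previously clear */
}

static void tas_release(tas_lock *l)
{
    /*
     * Release ordering: accesses inside the critical section complete
     * before the flag is cleared. This is the barrier that replaces a
     * full "sync" in a hand-written unlock.
     */
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

The point of expressing it this way is that the compiler picks the
cheapest barrier that satisfies the requested ordering, instead of the
code always paying for the strongest one.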

--   Manfred

