Re: tweaking MemSet() performance - 7.4.5 - Mailing list pgsql-hackers

From Marc Colosimo
Subject Re: tweaking MemSet() performance - 7.4.5
Date
Msg-id DF0A0E72-121C-11D9-830D-000A95A5D8B2@mitre.org
In response to Re: tweaking MemSet() performance - 7.4.5  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
On Sep 29, 2004, at 7:37 AM, Bruce Momjian wrote:

> Karel Zak wrote:
>> On Sat, 2004-09-25 at 23:23 +0200, Manfred Spraul wrote:
>>> mcolosimo@mitre.org wrote:
>>>
>>>>> If the memset bypasses the cache then the following access will
>>>>> cause a cache line miss, which can be so slow that using the faster
>>>>> memset can result in a net performance loss.
>>>>
>>>> Could you suggest some structs to test? If I get your meaning, I 
>>>> would make a loop that sets then reads from the structure.
>>>>
>>> Read the sources and the cpu specs. Benchmarking such problems is
>>> virtually impossible.
>>> I don't have OS-X, thus I checked the Linux-kernel sources: It seems
>>> that the power architecture doesn't have the same problem as x86.
>>> There is a special clear cacheline instruction for large memsets and
>>> the rest is done through carefully optimized store
>>> byte/halfword/word/double word sequences.
>>>
>>> Thus I'd check what happens if you memset not perfectly aligned
>>> buffers.
>>> That's another point where over-optimized functions sometimes break
>>> down. If there is no slowdown, then I'd replace the postgres function
>>> with the OS provided function.
>>>

All memory obtained via malloc and friends will be aligned on OS X,
unless you remove padding (which I don't think you do).
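
As a rough sketch of how one might check both points (malloc alignment
and memset on a deliberately misaligned destination): the helper names
is_aligned and memset_at_offset_ok are hypothetical and do not come from
PostgreSQL or from the test programs in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if p is aligned to `align`, which must be a power of two. */
static int
is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t) p & (align - 1)) == 0;
}

/*
 * Zero `len` bytes starting `offset` bytes into a fresh malloc block, so
 * the destination is misaligned whenever offset is not a multiple of the
 * word size.  Returns 1 if exactly the requested bytes were cleared and
 * the bytes on either side were left alone, 0 otherwise (including on
 * allocation failure).
 */
static int
memset_at_offset_ok(size_t offset, size_t len)
{
    size_t total = offset + len + 1;
    unsigned char *buf = malloc(total);
    size_t i;
    int ok = 1;

    if (buf == NULL)
        return 0;

    memset(buf, 0xFF, total);       /* fill with a known pattern */
    memset(buf + offset, 0, len);   /* the call under test */

    for (i = 0; i < total; i++)
    {
        unsigned char expected =
            (i >= offset && i < offset + len) ? 0 : 0xFF;

        if (buf[i] != expected)
            ok = 0;
    }

    free(buf);
    return ok;
}
```

This only checks correctness at odd offsets. To see whether misaligned
buffers are slower, the same offsets would have to go into a timing loop.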

>>> I'd add some __builtin_constant_p() optimizations, but I guess Tom
>>> won't like gcc hacks ;-)
>>
>> I think it cannot be problem if you write it to some .h file (in port
>> directory?) as macro with "#ifdef GCC". The other thing is real
>> advantage of hacks like this in practical PG usage :-)
>
> The reason MemSet is a win is not that the C code is great but because
> it eliminates a function call.
>

Using MemSet really did speed things up. I think the function call
overhead is okay. As for real-world usage, the function
ExecMakeFunctionResult dropped from the top of the list when profiling
(now < 1% vs. 16% before)! The workload was a big, nasty delete (with
cascading) and insert in a cursor.
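
For readers who have not looked at the macro, the idea of avoiding the
function call can be sketched roughly as below. This is a simplified
illustration, not the actual PostgreSQL source: the SKETCH_ names and the
64-byte cutoff are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LONG_MASK   (sizeof(long) - 1)
#define SKETCH_LOOP_LIMIT  64   /* assumed cutoff, in bytes */

/*
 * Zero small, long-aligned regions with an inline store loop; hand
 * everything else (nonzero fill value, odd alignment or length, large
 * sizes) to the library memset().
 */
#define SKETCH_MEMSET(start, val, len) \
    do \
    { \
        void   *s_ptr = (void *) (start); \
        int     s_val = (val); \
        size_t  s_len = (len); \
        \
        if (((uintptr_t) s_ptr & SKETCH_LONG_MASK) == 0 && \
            (s_len & SKETCH_LONG_MASK) == 0 && \
            s_val == 0 && \
            s_len <= SKETCH_LOOP_LIMIT) \
        { \
            long *s_cur = (long *) s_ptr; \
            long *s_end = (long *) ((char *) s_ptr + s_len); \
            \
            while (s_cur < s_end) \
                *s_cur++ = 0; \
        } \
        else \
            memset(s_ptr, s_val, s_len); \
    } while (0)
```

Storing through a long pointer assumes the buffer may legitimately be
accessed as long (or that the build disables strict-aliasing
optimizations), so treat that cast as part of the sketch's assumptions.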

Here are results for a single-processor Mac G4 running OS X 10.3,
compiled with -O2. This time the Mac memset wins all around. Someone
posted that this wasn't the case.

PG MemSet:
pgmemset_test 32
0.670u 0.020s 0:00.70 98.5%     0+0k 0+0io 0pf+0w
pgmemset_test 64
1.060u 0.000s 0:01.05 100.9%    0+0k 0+0io 0pf+0w
pgmemset_test 128
1.750u 0.010s 0:01.76 100.0%    0+0k 0+0io 0pf+0w
pgmemset_test 512
6.010u 0.030s 0:06.04 100.0%    0+0k 0+0io 0pf+0w

Mac memset:
memset_test 32
0.660u 0.020s 0:00.67 101.4%    0+0k 0+0io 0pf+0w
memset_test 64
0.720u 0.000s 0:00.72 100.0%    0+0k 0+0io 0pf+0w
memset_test 128
0.800u 0.010s 0:00.81 100.0%    0+0k 0+0io 0pf+0w
memset_test 512
1.470u 0.010s 0:01.48 100.0%    0+0k 0+0io 0pf+0w

I also checked setting a byte after the memset, and it does slow things
down a tiny bit. But the slowdown is the same for both MemSet and memset
for sizes under 64.
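
A minimal sketch of that kind of set-then-touch loop is below. It is not
the test program used for the numbers above: time_memset and its
parameters are hypothetical names, and it measures CPU time with clock()
instead of the shell's time command.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Time `iterations` calls to memset() on a `size`-byte buffer.  If
 * touch_after is nonzero, one byte is written after every call, which
 * is the follow-up access discussed earlier in the thread.  Returns CPU
 * seconds, or -1.0 on bad arguments or allocation failure.
 */
static double
time_memset(size_t size, long iterations, int touch_after)
{
    /*
     * Calling through a volatile function pointer keeps the compiler
     * from inlining or dropping the memset calls.
     */
    void *(*volatile fill) (void *, int, size_t) = memset;
    unsigned char *buf;
    clock_t start, stop;
    long i;

    if (size == 0 || iterations <= 0)
        return -1.0;

    buf = malloc(size);
    if (buf == NULL)
        return -1.0;

    start = clock();
    for (i = 0; i < iterations; i++)
    {
        fill(buf, 0, size);
        if (touch_after)
            buf[size / 2] = 1;  /* access the freshly cleared memory */
    }
    stop = clock();

    free(buf);
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Comparing time_memset(size, n, 0) against time_memset(size, n, 1) for
sizes such as 32, 64, 128 and 512 would show the cost of the extra access.
A MemSet variant would need the same loop with the macro substituted for
the call.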



