Re: tweaking MemSet() performance - 7.4.5 - Mailing list pgsql-hackers
From | Marc Colosimo |
---|---|
Subject | Re: tweaking MemSet() performance - 7.4.5 |
Date | |
Msg-id | DF0A0E72-121C-11D9-830D-000A95A5D8B2@mitre.org |
In response to | Re: tweaking MemSet() performance - 7.4.5 (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
On Sep 29, 2004, at 7:37 AM, Bruce Momjian wrote:

> Karel Zak wrote:
>> On Sat, 2004-09-25 at 23:23 +0200, Manfred Spraul wrote:
>>> mcolosimo@mitre.org wrote:
>>>
>>>>> If the memset bypasses the cache then the following access will
>>>>> cause a cache line miss, which can be so slow that using the
>>>>> faster memset can result in a net performance loss.
>>>>
>>>> Could you suggest some structs to test? If I get your meaning, I
>>>> would make a loop that sets then reads from the structure.
>>>>
>>> Read the sources and the cpu specs. Benchmarking such problems is
>>> virtually impossible.
>>> I don't have OS-X, thus I checked the Linux-kernel sources: It seems
>>> that the power architecture doesn't have the same problem as x86.
>>> There is a special clear cacheline instruction for large memsets and
>>> the rest is done through carefully optimized store
>>> byte/halfword/word/double word sequences.
>>>
>>> Thus I'd check what happens if you memset not perfectly aligned
>>> buffers. That's another point where over-optimized functions
>>> sometimes break down. If there is no slowdown, then I'd replace the
>>> postgres function with the OS provided function.
>>>

all memory (via malloc and friends) will be aligned on OS X, unless you
remove padding (which I don't think you do)

>>> I'd add some __builtin_constant_p() optimizations, but I guess Tom
>>> won't like gcc hacks ;-)
>>
>> I think it cannot be problem if you write it to some .h file (in port
>> directory?) as macro with "#ifdef GCC". The other thing is real
>> advantage of hacks like this in practical PG usage :-)
>
> The reason MemSet is a win is not that the C code is great but because
> it eliminates a function call.
>

Using MemSet really did speed things up. I think the function overhead
is okay. As for real world usage, the function ExecMakeFunctionResult
dropped from the top of the list when profiling (now < 1% vs 16% before)!
This was doing a big nasty delete (w/ cascading) and insert in a cursor.

Here are results for a Mac G4 (single processor), OS 10.3, using -O2.
This time the Mac memset wins all around. Someone posted that this
wasn't the case.

PG MemSet:

pgmemset_test 32     0.670u 0.020s 0:00.70  98.5% 0+0k 0+0io 0pf+0w
pgmemset_test 64     1.060u 0.000s 0:01.05 100.9% 0+0k 0+0io 0pf+0w
pgmemset_test 128    1.750u 0.010s 0:01.76 100.0% 0+0k 0+0io 0pf+0w
pgmemset_test 512    6.010u 0.030s 0:06.04 100.0% 0+0k 0+0io 0pf+0w

Mac memset:

memset_test 32       0.660u 0.020s 0:00.67 101.4% 0+0k 0+0io 0pf+0w
memset_test 64       0.720u 0.000s 0:00.72 100.0% 0+0k 0+0io 0pf+0w
memset_test 128      0.800u 0.010s 0:00.81 100.0% 0+0k 0+0io 0pf+0w
memset_test 512      1.470u 0.010s 0:01.48 100.0% 0+0k 0+0io 0pf+0w

I also checked setting a byte after the memset, and it does slow down
a tiny bit. But the slowdown is the same for both MemSet and memset
for sizes under 64.
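
For anyone who wants to repeat the comparison, the kind of loop discussed above (fill a buffer, then optionally touch a byte afterwards) might look roughly like the sketch below. It is not the pgmemset_test/memset_test source used for the numbers above. The names sketch_memset, SKETCH_MEMSET_LIMIT and time_fill are made up for illustration, and sketch_memset is only a simplified stand-in for the idea behind MemSet (word-sized stores for small, aligned, zero fills, with a fallback to the libc memset), not the actual macro from the PostgreSQL tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical cutoff above which the sketch just calls memset(). */
#define SKETCH_MEMSET_LIMIT 1024

/*
 * Simplified stand-in for the MemSet idea (NOT the real PostgreSQL macro):
 * zero-fill with word-sized stores when the buffer is long-aligned, the
 * length is a multiple of sizeof(long), and the length is small.
 * Anything else falls back to the library memset().
 */
static void
sketch_memset(void *start, int val, size_t len)
{
    if (((uintptr_t) start & (sizeof(long) - 1)) == 0 &&
        (len & (sizeof(long) - 1)) == 0 &&
        val == 0 &&
        len <= SKETCH_MEMSET_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, val, len);
}

/*
 * Zero the first len bytes of buf iters times and return the CPU seconds
 * spent.  use_libc selects memset() over sketch_memset().  If touch is
 * nonzero, the last byte is read back after every fill, so a fill that
 * bypassed the cache would pay for the miss here.
 * Assumes len >= 1 and that buf holds at least len bytes.  An aggressive
 * optimizer may still simplify the loop, so treat the timings as rough.
 */
static double
time_fill(void *buf, size_t len, long iters, int use_libc, int touch)
{
    volatile unsigned char *bytes = (volatile unsigned char *) buf;
    volatile unsigned long sink = 0;
    clock_t t0 = clock();
    long i;

    for (i = 0; i < iters; i++)
    {
        if (use_libc)
            memset(buf, 0, len);
        else
            sketch_memset(buf, 0, len);

        if (touch)
            sink += bytes[len - 1];
    }
    return (double) (clock() - t0) / CLOCKS_PER_SEC;
}
```

A driver would call time_fill() once per size (32, 64, 128, 512) for each variant and print the returned seconds. Passing a pointer offset by one byte into the buffer would exercise the "not perfectly aligned" case Manfred mentions; in this sketch that path simply ends up in memset().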