Re: tweaking MemSet() performance - 7.4.5 - Mailing list pgsql-hackers
From | Marc Colosimo |
---|---|
Subject | Re: tweaking MemSet() performance - 7.4.5 |
Date | |
Msg-id | E60A72E0-08E7-11D9-B617-000A95A5D8B2@mitre.org Whole thread Raw |
In response to | Re: tweaking MemSet() performance - 7.4.5 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: tweaking MemSet() performance - 7.4.5
|
List | pgsql-hackers |
On Sep 17, 2004, at 3:55 PM, Tom Lane wrote:

> Marc Colosimo <mcolosimo@mitre.org> writes:
>> I'm using 7.4.5 on Mac OS X (G5) and was profiling it to see why it is
>> SO SLOW at committing inserts and deletes into a large database. One
>> of the many slowdowns was from MemSet. I found an old (2002) thread
>> about this and retried the tests (see below). The main point is that
>> the system memset crushes pg's!!
>
> Hmm. I tried to duplicate this on my G4 laptop, and found that they
> were more or less on a par for small-to-middling block sizes (using
> "gcc -O2"). Darwin's memset code must have some additional tweaks for
> use on G5 hardware. Good for Apple --- this is the sort of thing that
> OS vendors *ought* to be doing. The fact that we can beat the system
> memset on so many platforms is an indictment of those platforms.
>
>> Is it possible to add a define to call
>> the system memset at build time! This probably isn't the case on other
>> systems.
>
> Feel free to hack the definition of MemSet in src/include/c.h. See the
> comments for it for more context.
>
> Note that for small compile-time-constant block sizes (a case your test
> program doesn't test, but it's common in pgsql), gcc with a sufficiently
> high optimization setting can unroll the loop into a linear sequence of
> words zeroings. I would expect that to beat the system memset up to a
> few dozen words, no matter how tense the memset coding is. So you
> probably want to think in terms of reducing MEMSET_LOOP_LIMIT rather
> than diking out the macro code altogether. Or maybe reduce MemSet to
> "memset(...)" but leave MemSetAligned and/or MemSetTest/MemSetLoop
> as-is. In any case, reporting results without mentioning the compiler
> and optimization level in use isn't going to convince anybody ...

Oops, I used the same settings as in the old hackers message (-O2, gcc 3.3).
If I understand what you are saying, then it turns out that yes, PG's MemSet is faster for smaller block sizes (see below; the crossover is between 32 and 64). I just replaced the whole MemSet with memset, and it is now very low when I profile. I could squeeze more out of it if I spent more time trying to understand it (change MEMSET_LOOP_LIMIT to 32 and then call memset above that?).

I'm now working on understanding spinlocks and friends. Putting in a sync call (in s_lock.h) is really a time killer and bad for performance (it takes up 35 cycles).

Run on a single-processor G5 (1.8 GHz; the other run was on a DP 2 GHz G5):

pgMemSet:
 *  4  0.070u 0.000s 0:00.15 46.6% 0+0k 0+0io 0pf+0w
 *  8  0.090u 0.000s 0:00.16 56.2% 0+0k 0+0io 0pf+0w
 * 16  0.120u 0.000s 0:00.17 70.5% 0+0k 0+0io 0pf+0w
 * 32  0.180u 0.000s 0:00.29 62.0% 0+0k 0+0io 0pf+0w
 * 64  0.450u 0.000s 0:00.92 48.9% 0+0k 0+0io 0pf+0w

memset:
 *  4  0.170u 0.010s 0:00.44 40.9% 0+0k 0+0io 0pf+0w
 *  8  0.190u 0.000s 0:00.42 45.2% 0+0k 0+0io 0pf+0w
 * 16  0.190u 0.010s 0:00.39 51.2% 0+0k 0+0io 0pf+0w
 * 32  0.200u 0.000s 0:00.39 51.2% 0+0k 0+0io 0pf+0w
 * 64  0.260u 0.000s 0:00.38 68.4% 0+0k 0+0io 0pf+0w

Marc
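
For illustration, the hybrid idea discussed above (a word-at-a-time loop for small blocks, the system memset for everything larger) might look roughly like the sketch below. This is not the MemSet macro from src/include/c.h: the names hybrid_zero and HYBRID_LOOP_LIMIT are hypothetical, and the limit of 32 bytes is only the value floated in this message, not a tuned one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff in bytes; 32 is the value floated above, not a tuned one. */
#define HYBRID_LOOP_LIMIT 32

/*
 * Zero len bytes starting at start.
 * Illustrative only; this is not the MemSet macro from c.h.
 */
static void
hybrid_zero(void *start, size_t len)
{
    /*
     * Take the word loop only when the pointer and the length are both
     * long-aligned and the block is no larger than the cutoff.
     */
    if (((uintptr_t) start & (sizeof(long) - 1)) == 0 &&
        (len & (sizeof(long) - 1)) == 0 &&
        len <= HYBRID_LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
    {
        /* Larger or unaligned blocks go to the system memset. */
        memset(start, 0, len);
    }
}
```

Where the cutoff belongs depends on the platform, compiler, and optimization level, so it would need to be re-measured on each one rather than taken from the numbers above.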