Thread: NAMEDATALEN Changes
NAMEDATALEN's benchmarked are 32, 64, 128 and 512. Attached is the
shell script I used to do it.

First row of a set is the time(1) output for the pgbench -i run, second
is the actual benchmark. Aside from the 'real' time of 64 there is a
distinct increase in time required, but not a significant one.

Benchmarks were run for 3000 transactions with a scale factor of 5, but
only 1 client. If there is a preferred setting for pgbench I can do an
overnight run with it.

Machine is a dual 500MHz celery with 384MB RAM and 2 IBM Deskstars in
RAID 0, and a separate system drive.

Anything but 32 fails the 'name' check in the regression tests -- I
assume this is expected?

Don't know why 64 has a high 'real' time, but the system times are
appropriate.

NAMEDATALEN: 32
  158.97 real    1.81 user    0.14 sys
   80.58 real    1.30 user    3.81 sys

NAMEDATALEN: 64
  248.40 real    1.85 user    0.10 sys
   96.36 real    1.44 user    3.86 sys

NAMEDATALEN: 128
  156.74 real    1.84 user    0.10 sys
   94.36 real    1.47 user    4.01 sys

NAMEDATALEN: 512
  157.99 real    1.83 user    0.12 sys
  101.14 real    1.47 user    4.23 sys

--
Rod Taylor

Your eyes are weary from staring at the CRT. You feel sleepy. Notice
how restful it is to watch the cursor blink. Close your eyes. The
opinions stated above are yours. You cannot imagine why you ever felt
otherwise.
>
> Great! The numbers for namedatalen = 64 seem like an outlier; perhaps
> something else going on on your system? Did you try more than one run?

Ran it again shortly after sending the email. It fell in line (midway
between 32 and 128) on 'real' time, as would normally be expected.

The times for the other values, and the system times for 64, were very
close to the original, so I won't bother posting them again.
"Rod Taylor" <rbt@zort.ca> writes:
> [ some hard data ]

Great! The numbers for namedatalen = 64 seem like an outlier; perhaps
something else going on on your system? Did you try more than one run?

> Anything but 32 fails the 'name' check in the regression tests -- I
> assume this is expected?

Right. If you eyeball the actual diffs for the test you should see
that the diff is due to a long name not getting truncated where the
test expects it to be.

regards, tom lane
On Wednesday 13 February 2002 21:07, Rod Taylor wrote:
> NAMEDATALEN's benchmarked are 32, 64, 128 and 512. Attached is the
> shell script I used to do it.

Attached is a modified version for Linux, if anyone is interested.

Will run it overnight out of quasi-scientific interest.

Nice to have an idea what kind of effect my very long NAMEDATALEN
setting (128) has.

Yours

Ian Barwick
Rod Taylor writes:
> NAMEDATALEN's benchmarked are 32, 64, 128 and 512. Attached is the
> shell script I used to do it.

That's around a 15% performance loss for increasing it to 64 or 128.
Seems pretty scary actually.

--
Peter Eisentraut   peter_e@gmx.net
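As a cross-check on the 15% estimate, the relative cost can be recomputed from the 'real' benchmark times Rod posted (the second row of each set). What follows is a minimal Python sketch rather than anything from the thread itself; the names `benchmark_real`, `slowdown_pct` and `throughput_loss_pct` are made up for illustration. It reports both the increase in elapsed time and the equivalent drop in throughput, since every run processed the same 3000 transactions.

```python
# 'real' (wall-clock) seconds of the benchmark run for each NAMEDATALEN,
# copied from the second row of each set in Rod Taylor's results.
benchmark_real = {32: 80.58, 64: 96.36, 128: 94.36, 512: 101.14}


def slowdown_pct(times, baseline=32):
    """Percent increase in elapsed time relative to the baseline setting."""
    base = times[baseline]
    return {n: (t - base) / base * 100.0 for n, t in times.items()}


def throughput_loss_pct(times, baseline=32):
    """Percent drop in transactions per second relative to the baseline.

    The transaction count is the same for every run, so throughput is
    proportional to 1 / elapsed time.
    """
    base = times[baseline]
    return {n: (1.0 - base / t) * 100.0 for n, t in times.items()}


if __name__ == "__main__":
    slower = slowdown_pct(benchmark_real)
    loss = throughput_loss_pct(benchmark_real)
    for n in sorted(benchmark_real):
        print(f"NAMEDATALEN {n:>3}: +{slower[n]:.1f}% time, "
              f"-{loss[n]:.1f}% throughput")
```

On these figures the throughput loss for 64 and 128 comes out at roughly 16% and 15%, in line with the estimate above. A single run per setting is a thin basis, though, as the outlying init time for NAMEDATALEN = 64 shows.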
Peter Eisentraut <peter_e@gmx.net> writes:
> That's around a 15% performance loss for increasing it to 64 or 128.
> Seems pretty scary actually.

Some of that could be bought back by fixing hashname() to not iterate
past the first \0 when calculating the hash of a NAME datum; and then
cc_hashname could go away. Not sure how much this would buy though.

Looking closely at Rod's script, I realize that the user+sys times it
is reporting are not the backend's but the pgbench client's. So it's
impossible to tell from this how much of the extra cost is extra I/O
and how much is CPU. I'm actually quite surprised that the client side
shows any CPU-time difference at all; I wouldn't think it ever sees any
null-padded NAME values.

regards, tom lane
On Wed, 2002-02-13 at 20:00, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > That's around a 15% performance loss for increasing it to 64 or 128.
> > Seems pretty scary actually.
>
> Some of that could be bought back by fixing hashname() to not iterate
> past the first \0 when calculating the hash of a NAME datum; and then
> cc_hashname could go away. Not sure how much this would buy though.

I've attached a pretty trivial patch that implements this. Instead of
automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
bytes: this should improve the common case (small identifiers, 5-10
characters long), as well as reduce the penalty when NAMEDATALEN is
increased. The patch passes the regression tests, FWIW. I didn't remove
cc_hashname() -- I'll tackle that tomorrow unless anyone objects...

I also did some pretty simple benchmarks; however, I'd appreciate it
if anyone could confirm these results.

pgbench: scale factor 1, 1 client, 10000 transactions.
hardware: p3-850, 384 MB RAM, slow laptop IDE disk

Run 1: Patch applied, NAMEDATALEN = 32

number of transactions actually processed: 10000/10000
tps = 19.940020(including connections establishing)
tps = 19.940774(excluding connections establishing)

Run 2: Patch applied, NAMEDATALEN = 128

number of transactions actually processed: 10000/10000
tps = 20.849385(including connections establishing)
tps = 20.850010(excluding connections establishing)

Run 3: Vanilla CVS, NAMEDATALEN = 32
(This is to check that the patch doesn't cause performance regressions
for the "common case")

number of transactions actually processed: 10000/10000
tps = 18.295418(including connections establishing)
tps = 18.296191(excluding connections establishing)

The performance improvement @ NAMEDATALEN = 128 seems strange. As I
said, these benchmarks may not be particularly accurate, so I'd suggest
waiting for others to contribute results before drawing any
conclusions.

Oh, and this is my first "real" Pg patch, so my apologies if I've
screwed something up. ;-)

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
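The idea behind the patch can be sketched outside the backend. The Python model below is an illustration only: `make_name`, `hash_full` and `hash_strlen` are hypothetical names, and FNV-1a stands in for PostgreSQL's hash_any(), whose actual algorithm is not reproduced here. A NAME is modeled as the identifier followed by zero padding up to NAMEDATALEN bytes.

```python
def make_name(ident: str, namedatalen: int) -> bytes:
    """Model a NAME datum: the identifier, zero-padded to namedatalen bytes."""
    raw = ident.encode("ascii")
    if len(raw) >= namedatalen:
        raise ValueError("identifier must be shorter than NAMEDATALEN")
    return raw.ljust(namedatalen, b"\0")


def fnv1a(data: bytes) -> int:
    """Stand-in 32-bit hash (FNV-1a); this is NOT PostgreSQL's hash_any()."""
    h = 0x811C9DC5
    for byte in data:
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h


def hash_full(name: bytes) -> int:
    """Behaviour before the patch, as described: hash the whole buffer."""
    return fnv1a(name)


def hash_strlen(name: bytes) -> int:
    """Behaviour after the patch, as described: stop at the first NUL byte."""
    end = name.find(b"\0")
    return fnv1a(name if end < 0 else name[:end])


if __name__ == "__main__":
    for namedatalen in (32, 128, 512):
        name = make_name("pg_class", namedatalen)
        # hash_full walks len(name) bytes; hash_strlen walks only 8 here.
        print(namedatalen, len(name), name.find(b"\0"),
              hex(hash_full(name)), hex(hash_strlen(name)))
```

With the strlen()-style variant the number of bytes hashed depends only on the identifier, so the same name produces the same hash value whatever NAMEDATALEN is set to.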
On Wednesday 13 February 2002 23:59, Neil Conway wrote:
> On Wed, 2002-02-13 at 20:00, Tom Lane wrote:
[perf hit comment removed]
>
> I've attached a pretty trivial patch that implements this. Instead of
> automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
> bytes: this should improve both the common case (small identifiers, 5-10
> characters long), as well as reduce the penalty when NAMEDATALEN is
> increased. The patch passes the regression tests, FWIW. I didn't remove
> cc_hashname() -- I'll tackle that tomorrow unless anyone objects...
>
> I also did some pretty simple benchmarks; however, I'd appreciate it
> anyone could confirm these results.
>

Please bear with me on this, as this is my first posting with any real
content. Please don't hang me out to dry if I've overlooked anything;
I should point out that I'm making a rather large assumption. Please
correct as needed.

The primary assumption is that the actual key lengths can be less than
NAMEDATALEN. That is, the string "shortkey" is a valid input key (??)
which provides a key length of 8 bytes as input to the hash_any()
function, even though NAMEDATALEN may be something like 128 or larger.

If this assumption is correct, then wouldn't increasing the default
input key size (NAMEDATALEN) beyond the maximum actual key length be a
bug? That is to say, if we have a key with only 8 bytes of data and we
iterate over 128 bytes, wouldn't the resulting hash be arbitrary and
invalid, as it would be hashing memory which is not reflective of the
key being hashed?

If my assumptions are correct, then it sounds like using the strlen()
implementation (assuming input keys are valid C strings) is really the
proper implementation, short of using an adjusted
min(NAMEDATALEN, strlen()) type approach.

[snip - var NAMEDATALEN benchmark results]

Greg
Greg Copeland <greg@CopelandConsulting.Net> writes:
> if we have a key with only 8-bytes of data and we iterate over 128-bytes,
> wouldn't the resulting hash be arbitrary and invalid as it would be hashing
> memory which is not reflective of the key being hashed?

As long as we do it *consistently*, we can do it either way. Using the
trailing nulls in the hash does alter the computed hash value --- but
we're only ever gonna compare the hash value against other hash values
computed on other NAMEs by this same routine.

This all assumes that the inputs are valid NAMEs, viz strlen <
NAMEDATALEN and no funny business beyond the first \0. In practice,
however, if a bogus NAME were handed to us we would just as soon ignore
any characters beyond the first \0, because the ordering comparison
operators for NAME all do so (they're all based on strncmp), as do the
I/O routines etc. So this change actually makes the system more
self-consistent not less so.

regards, tom lane
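The consistency argument can be made concrete with a small Python model (an illustration, not backend code). It assumes equality is decided the way strncmp would decide it, looking no further than the first \0, and it uses zlib.crc32 purely as a stand-in hash; `name_eq`, `hash_all_bytes` and `hash_to_nul` are hypothetical names. A hash is only usable alongside an equality operator if values that compare equal also hash equal.

```python
import zlib

NAMEDATALEN = 32  # illustrative value


def name_eq(a: bytes, b: bytes) -> bool:
    """strncmp-style equality: bytes after the first NUL are ignored."""
    return a.split(b"\0", 1)[0] == b.split(b"\0", 1)[0]


def hash_all_bytes(name: bytes) -> int:
    """Hash the entire fixed-size buffer, padding included."""
    return zlib.crc32(name)


def hash_to_nul(name: bytes) -> int:
    """Hash only the bytes before the first NUL."""
    return zlib.crc32(name.split(b"\0", 1)[0])


# A valid NAME: identifier followed by nothing but zero bytes.
clean = b"shortkey".ljust(NAMEDATALEN, b"\0")
# A bogus NAME: stray bytes left behind after the terminating NUL.
bogus = b"shortkey\0junk".ljust(NAMEDATALEN, b"\0")

if __name__ == "__main__":
    print("compare equal:       ", name_eq(clean, bogus))
    print("full-buffer hashes ==", hash_all_bytes(clean) == hash_all_bytes(bogus))
    print("to-NUL hashes      ==", hash_to_nul(clean) == hash_to_nul(bogus))
```

For valid NAMEs both schemes agree with the comparison. Only the bogus buffer separates them: hashing the full buffer gives two values for names that compare equal, while hashing up to the first \0 does not.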
On Thu, Feb 14, 2002 at 12:59:40AM -0500, Neil Conway wrote:
> I've attached a pretty trivial patch that implements this. Instead of
> automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
> bytes: this should improve both the common case (small identifiers, 5-10
> characters long), as well as reduce the penalty when NAMEDATALEN is
> increased. The patch passes the regression tests, FWIW. I didn't remove
> cc_hashname() -- I'll tackle that tomorrow unless anyone objects...

Okay, I've attached a new version that removes cc_hashname(). As with
the previous patch, this passes the regression tests.

Feedback is welcome.

Cheers,

Neil
On Wednesday 13 February 2002 23:27, Ian Barwick wrote:
> On Wednesday 13 February 2002 21:07, Rod Taylor wrote:
> > NAMEDATALEN's benchmarked are 32, 64, 128 and 512. Attached is the
> > shell script I used to do it.
>
> Attached is a modified version for Linux, if anyone is interested.
>
> Will run it overnight out of quasi-scientific interest.
>
> Nice to have an idea what kind of effect my very long NAMEDATALEN setting
> (128) has.

Below are the probably quite uninformative results, run under Linux
2.2.16 on an AMD K2 350MHz with 256MB RAM, EIDE HDs and other
run-of-the-mill hardware.

I suspect some of the normal system jobs which usually run during the
night caused the wildly varying results.

Whatever else, for my purposes at least, any performance issues with
differing NAMEDATALEN lengths are nothing much to worry about.

NAMEDATALEN: 32
  220.73 real    3.39 user    0.10 sys
  110.03 real    2.77 user    4.42 sys

NAMEDATALEN: 64
  205.31 real    3.55 user    0.08 sys
  109.76 real    2.53 user    4.18 sys

NAMEDATALEN: 128
  224.65 real    3.35 user    0.10 sys
  121.30 real    2.60 user    3.89 sys

NAMEDATALEN: 256
  209.48 real    3.62 user    0.11 sys
  118.90 real    3.00 user    3.88 sys

NAMEDATALEN: 512
  204.65 real    3.36 user    0.14 sys
  115.12 real    2.54 user    3.88 sys

Ian Barwick