Thread: NAMEDATALEN Changes

NAMEDATALEN Changes

From

"Rod Taylor"

Date:

13 February 2002, 15:25:23

The NAMEDATALEN values benchmarked are 32, 64, 128 and 512.  Attached is
the shell script I used to do it.

First row of a set is the time(1) output for the pgbench -i run, second is
the actual benchmark.  Aside from the 'real' time for 64, there is a
distinct but not significant increase in time required.

Benchmarks were run for 3000 transactions with a scale factor of 5, but
only 1 client.  If there is a preferred setting for pgbench I can do
an overnight run with it.  Machine is a dual 500MHz Celeron with 384MB
RAM and 2 IBM Deskstars in RAID 0, and a separate system drive.

Anything but 32 fails the 'name' check in the regression tests -- I
assume this is expected?

Don't know why 64 has a high 'real' time, but the system times are
appropriate.

NAMEDATALEN: 32
158.97 real 1.81 user 0.14 sys
80.58 real 1.30 user 3.81 sys

NAMEDATALEN: 64
248.40 real 1.85 user 0.10 sys
96.36 real 1.44 user 3.86 sys

NAMEDATALEN: 128
156.74 real 1.84 user 0.10 sys
94.36 real 1.47 user 4.01 sys

NAMEDATALEN: 512
157.99 real 1.83 user 0.12 sys
101.14 real 1.47 user 4.23 sys

--
Rod Taylor

Your eyes are weary from staring at the CRT. You feel sleepy. Notice
how restful it is to watch the cursor blink. Close your eyes. The
opinions stated above are yours. You cannot imagine why you ever felt
otherwise.

Re: NAMEDATALEN Changes

From

"Rod Taylor"

Date:

13 February 2002, 17:12:31

> Great!  The numbers for namedatalen = 64 seem like an outlier; perhaps
> something else going on on your system?  Did you try more than one run?

Ran it again shortly after sending the email.  This time the 'real' time
for 64 fell in line (midway between 32 and 128), as would normally be
expected.  The times for the other values and 64's system times were very
close to the original so I won't bother posting them again.

Re: NAMEDATALEN Changes

From

Tom Lane

Date:

13 February 2002, 17:12:34

"Rod Taylor" <rbt@zort.ca> writes:
> [ some hard data ]

Great!  The numbers for namedatalen = 64 seem like an outlier; perhaps
something else going on on your system?  Did you try more than one run?

> Anything but 32 fails the 'name' check in the regression tests -- I
> assume this is expected?

Right.  If you eyeball the actual diffs for the test you should see that
the diff is due to a long name not getting truncated where the test
expects it to be.
        regards, tom lane
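
To make the failure mode concrete, here is a minimal Python model of the truncation described above. It assumes that a valid NAME holds fewer than NAMEDATALEN bytes, so an over-long identifier is cut to NAMEDATALEN - 1 bytes. The function name truncate_name and the sample identifier are hypothetical; this is a sketch, not PostgreSQL's actual code.

```python
def truncate_name(ident: str, namedatalen: int = 32) -> str:
    """Keep at most namedatalen - 1 bytes, leaving room for the trailing NUL."""
    # Assumption: ASCII-only identifiers, so one character is one byte.
    raw = ident.encode("ascii")
    return raw[: namedatalen - 1].decode("ascii")


long_name = "a" * 40  # hypothetical identifier longer than the default limit
for n in (32, 64):
    # Print the NAMEDATALEN setting and the length of the stored name.
    print(n, len(truncate_name(long_name, n)))
```

With a setting of 32 the 40-character name is cut to 31 characters; with 64 it survives intact, which is the kind of difference a test written for 32 would report as a diff.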

Re: NAMEDATALEN Changes

From

Ian Barwick

Date:

13 February 2002, 17:57:59

On Wednesday 13 February 2002 21:07, Rod Taylor wrote:
> NAMEDATALEN's benchmarked are 32, 64, 128 and 512.  Attached is the
> shell script I used to do it.

Attached is a modified version for Linux, if anyone is interested.

Will run it overnight out of quasi-scientific interest.

Nice to have an idea what kind of effect my very long NAMEDATALEN setting 
(128) has.

Yours

Ian Barwick

Re: NAMEDATALEN Changes

From

Peter Eisentraut

Date:

13 February 2002, 19:13:18

Rod Taylor writes:

> NAMEDATALEN's benchmarked are 32, 64, 128 and 512.  Attached is the
> shell script I used to do it.

That's around a 15% performance loss for increasing it to 64 or 128.
Seems pretty scary actually.

-- 
Peter Eisentraut   peter_e@gmx.net
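
One way to arrive at a figure in this range from the posted numbers is sketched below in Python. It uses only the 'real' times of the benchmark phase (the second row of each set) from Rod's message. Whether the estimate above was computed this way is an assumption, and the helper names are made up for the illustration.

```python
# 'real' seconds for the benchmark phase, copied from the figures in this thread.
real_seconds = {32: 80.58, 64: 96.36, 128: 94.36, 512: 101.14}
baseline = real_seconds[32]


def extra_time_pct(namedatalen: int) -> float:
    """Additional elapsed time relative to NAMEDATALEN = 32, in percent."""
    return (real_seconds[namedatalen] / baseline - 1.0) * 100.0


def throughput_loss_pct(namedatalen: int) -> float:
    """Drop in transactions per second relative to NAMEDATALEN = 32, in percent."""
    return (1.0 - baseline / real_seconds[namedatalen]) * 100.0


for n in (64, 128, 512):
    print(f"{n}: +{extra_time_pct(n):.1f}% time, -{throughput_loss_pct(n):.1f}% throughput")
```

Read as extra elapsed time, 64 and 128 come out at roughly 17 to 20%; read as lost throughput, they land near 15%.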

Re: NAMEDATALEN Changes

From

Tom Lane

Date:

13 February 2002, 20:02:17

Peter Eisentraut <peter_e@gmx.net> writes:
> That's around a 15% performance loss for increasing it to 64 or 128.
> Seems pretty scary actually.

Some of that could be bought back by fixing hashname() to not iterate
past the first \0 when calculating the hash of a NAME datum; and then
cc_hashname could go away.  Not sure how much this would buy though.

Looking closely at Rod's script, I realize that the user+sys times it is
reporting are not the backend's but the pgbench client's.  So it's
impossible to tell from this how much of the extra cost is extra I/O and
how much is CPU.  I'm actually quite surprised that the client side
shows any CPU-time difference at all; I wouldn't think it ever sees any
null-padded NAME values.
        regards, tom lane
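
A rough Python model of the hashname() change suggested above: a NAME is treated as a fixed-width, null-padded buffer, and the two variants differ only in how many bytes they feed to the hash. FNV-1a stands in for PostgreSQL's real hash function purely for illustration, and make_name, hash_whole_buffer and hash_up_to_nul are hypothetical names, not PostgreSQL APIs.

```python
NAMEDATALEN = 128  # assumed build-time setting for this illustration


def make_name(ident: bytes, namedatalen: int = NAMEDATALEN) -> bytes:
    """Null-pad ident to a fixed-width buffer, like a NAME datum."""
    if len(ident) >= namedatalen:
        raise ValueError("identifier must be shorter than namedatalen")
    return ident.ljust(namedatalen, b"\0")


def fnv1a(data: bytes) -> int:
    """32-bit FNV-1a; a stand-in hash, one loop iteration per input byte."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def hash_whole_buffer(name: bytes) -> int:
    """Hash every byte of the buffer, padding included."""
    return fnv1a(name)


def hash_up_to_nul(name: bytes) -> int:
    """Hash only the bytes before the first NUL."""
    end = name.find(b"\0")
    return fnv1a(name if end < 0 else name[:end])


name = make_name(b"pg_class")
print(len(name), "bytes hashed by the whole-buffer variant")
print(name.find(b"\0"), "bytes hashed by the stop-at-NUL variant")
```

For an 8-character identifier the second variant touches 8 bytes regardless of NAMEDATALEN, while the first touches the whole buffer, and the stop-at-NUL hash value no longer depends on the buffer width.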

Re: NAMEDATALEN Changes

From

Neil Conway

Date:

14 February 2002, 01:21:59

On Wed, 2002-02-13 at 20:00, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > That's around a 15% performance loss for increasing it to 64 or 128.
> > Seems pretty scary actually.
>
> Some of that could be bought back by fixing hashname() to not iterate
> past the first \0 when calculating the hash of a NAME datum; and then
> cc_hashname could go away.  Not sure how much this would buy though.

I've attached a pretty trivial patch that implements this. Instead of
automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
bytes: this should improve both the common case (small identifiers, 5-10
characters long), as well as reduce the penalty when NAMEDATALEN is
increased. The patch passes the regression tests, FWIW. I didn't remove
cc_hashname() -- I'll tackle that tomorrow unless anyone objects...

I also did some pretty simple benchmarks; however, I'd appreciate it
if anyone could confirm these results.

pgbench: scale factor 1, 1 client, 10000 transactions.

hardware: p3-850, 384 MB RAM, slow laptop IDE disk

Run 1: Patch applied, NAMEDATALEN = 32

    number of transactions actually processed: 10000/10000
    tps = 19.940020(including connections establishing)
    tps = 19.940774(excluding connections establishing)

Run 2: Patch applied, NAMEDATALEN = 128

    number of transactions actually processed: 10000/10000
    tps = 20.849385(including connections establishing)
    tps = 20.850010(excluding connections establishing)

Run 3: Vanilla CVS, NAMEDATALEN = 32
(This is to check that the patch doesn't cause performance regressions
for the "common case")

    number of transactions actually processed: 10000/10000
    tps = 18.295418(including connections establishing)
    tps = 18.296191(excluding connections establishing)

The performance improvement @ NAMEDATALEN = 128 seems strange. As I
said, these benchmarks may not be particularly accurate, so I'd suggest
waiting for others to contribute results before drawing any conclusions.

Oh, and this is my first "real" Pg patch, so my apologies if I've
screwed something up. ;-)

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Attachment

hash_len.patch

Re: NAMEDATALEN Changes

From

Greg Copeland

Date:

14 February 2002, 09:03:25

On Wednesday 13 February 2002 23:59, Neil Conway wrote:
> On Wed, 2002-02-13 at 20:00, Tom Lane wrote:

[perf hit comment removed]

>
> I've attached a pretty trivial patch that implements this. Instead of
> automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
> bytes: this should improve both the common case (small identifiers, 5-10
> characters long), as well as reduce the penalty when NAMEDATALEN is
> increased. The patch passes the regression tests, FWIW. I didn't remove
> cc_hashname() -- I'll tackle that tomorrow unless anyone objects...
>
> I also did some pretty simple benchmarks; however, I'd appreciate it
> if anyone could confirm these results.
>

Please bear with me on this, as this is my first posting with any real
content.  Please don't hang me out to dry if I've overlooked anything; note
that I'm making a rather large assumption.  Please correct as needed.

The primary assumption is that the actual key lengths can be less than
NAMEDATALEN.  That is, the string "shortkey" would be a valid input key (??),
providing a key length of 8 bytes as input to the hash_any() function
even though NAMEDATALEN may be something like 128 or larger.  If this
assumption is correct, then wouldn't increasing the default input key size
(NAMEDATALEN) beyond the maximum actual key length be a bug?  That is to say,
if we have a key with only 8 bytes of data and we iterate over 128 bytes,
wouldn't the resulting hash be arbitrary and invalid, as it would be hashing
memory which is not reflective of the key being hashed?

If my assumptions are correct, then it sounds like using the strlen() 
implementation (assuming input keys are valid C-strings) is really the proper 
implementation short of using an adjusted min(NAMEDATALEN,strlen()) type 
approach.

[snip - var NAMEDATALEN benchmark results]

Greg

Re: NAMEDATALEN Changes

From

Tom Lane

Date:

14 February 2002, 10:13:24

Greg Copeland <greg@CopelandConsulting.Net> writes:
> if we have a key with only 8 bytes of data and we iterate over 128 bytes,
> wouldn't the resulting hash be arbitrary and invalid as it would be hashing 
> memory which is not reflective of the key being hashed?

As long as we do it *consistently*, we can do it either way.  Using the
trailing nulls in the hash does alter the computed hash value --- but
we're only ever gonna compare the hash value against other hash values
computed on other NAMEs by this same routine.

This all assumes that the inputs are valid NAMEs, viz. strlen <
NAMEDATALEN and no funny business beyond the first \0.  In practice,
however, if a bogus NAME were handed to us we would just as soon ignore
any characters beyond the first \0, because the ordering comparison
operators for NAME all do so (they're all based on strncmp), as do the
I/O routines etc.  So this change actually makes the system more
self-consistent not less so.
        regards, tom lane
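
The consistency argument can be checked with a small Python model. It assumes a NAME is a fixed-width buffer, models a strncmp-style comparison as "compare up to the first \0", and uses zlib.crc32 as a stand-in hash (not what PostgreSQL uses). The "bogus" buffer carries one junk byte after the terminator; all function names are hypothetical.

```python
import zlib

NAMEDATALEN = 32  # assumed setting for this illustration


def c_string(buf: bytes) -> bytes:
    """Bytes before the first NUL, or the whole buffer if there is none."""
    return buf.split(b"\0", 1)[0]


def names_equal(a: bytes, b: bytes) -> bool:
    """Model of strncmp(a, b, NAMEDATALEN) == 0: nothing past the first NUL counts."""
    return c_string(a) == c_string(b)


def hash_whole_buffer(buf: bytes) -> int:
    """Hash all NAMEDATALEN bytes, including whatever follows the NUL."""
    return zlib.crc32(buf)


def hash_up_to_nul(buf: bytes) -> int:
    """Hash only the bytes the comparison operators would look at."""
    return zlib.crc32(c_string(buf))


valid = b"shortkey".ljust(NAMEDATALEN, b"\0")
bogus = b"shortkey\0X".ljust(NAMEDATALEN, b"\0")  # junk byte after the terminator

print(names_equal(valid, bogus))
print(hash_up_to_nul(valid) == hash_up_to_nul(bogus))
print(hash_whole_buffer(valid) == hash_whole_buffer(bogus))
```

The strncmp-style comparison and the stop-at-NUL hash both treat the two buffers as the same name; only the whole-buffer hash tells them apart.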

Re: NAMEDATALEN Changes

From

nconway@klamath.dyndns.org (Neil Conway)

Date:

14 February 2002, 13:53:40

On Thu, Feb 14, 2002 at 12:59:40AM -0500, Neil Conway wrote:
> I've attached a pretty trivial patch that implements this. Instead of
> automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
> bytes: this should improve both the common case (small identifiers, 5-10
> characters long), as well as reduce the penalty when NAMEDATALEN is
> increased. The patch passes the regression tests, FWIW. I didn't remove
> cc_hashname() -- I'll tackle that tomorrow unless anyone objects...

Okay, I've attached a new version that removes cc_hashname(). As with
the previous patch, this passes the regression tests. Feedback is welcome.

Cheers,

Neil

Attachment

hash_len.patch

Re: NAMEDATALEN Changes

From

Ian Barwick

Date:

14 February 2002, 16:22:18

On Wednesday 13 February 2002 23:27, Ian Barwick wrote:
> On Wednesday 13 February 2002 21:07, Rod Taylor wrote:
> > NAMEDATALEN's benchmarked are 32, 64, 128 and 512.  Attached is the
> > shell script I used to do it.
>
> Attached is a modified version for Linux, if anyone is interested.
>
> Will run it overnight out of quasi-scientific interest.
>
> Nice to have an idea what kind of effect my very long NAMEDATALEN setting
> (128) has.

Below are the probably quite uninformative results, run under Linux 2.2.16
on an AMD K2 350MHz with 256MB RAM, EIDE HDs and other run-of-the-mill
hardware.

I suspect some of the normal system jobs which usually run during the night
caused the wildly varying results. Whatever else, for my purposes at least,
any performance issues with differing NAMEDATALENgths are nothing much
to worry about.

NAMEDATALEN: 32
220.73 real 3.39 user 0.10 sys
110.03 real 2.77 user 4.42 sys

NAMEDATALEN: 64
205.31 real 3.55 user 0.08 sys
109.76 real 2.53 user 4.18 sys

NAMEDATALEN: 128
224.65 real 3.35 user 0.10 sys
121.30 real 2.60 user 3.89 sys

NAMEDATALEN: 256
209.48 real 3.62 user 0.11 sys
118.90 real 3.00 user 3.88 sys

NAMEDATALEN: 512
204.65 real 3.36 user 0.14 sys
115.12 real 2.54 user 3.88 sys

Ian Barwick