Thread: gperf anyone?

gperf anyone?

From
Peter Eisentraut
Date:
A while ago I played around with gperf (GNU perfect hash function
generator), abusing the keyword lookup in parser/keyword.c as playground.
Now before I delete this I was wondering if this would perhaps be of use
to the general public. I don't know how huge the speed advantage of this
is, I'm sure the parser/scanner speed is the least of our problems. But I
thunk especially ecpg could benefit from this. Btw., gperf is used by GCC,
so it's not a toy.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] gperf anyone?

From
The Hermit Hacker
Date:
On Wed, 19 Jan 2000, Peter Eisentraut wrote:

> A while ago I played around with gperf (GNU perfect hash function
> generator), abusing the keyword lookup in parser/keyword.c as playground.
> Now before I delete this I was wondering if this would perhaps be of use
> to the general public. I don't know how huge the speed advantage of this
> is, I'm sure the parser/scanner speed is the least of our problems. But I
> thunk especially ecpg could benefit from this. Btw., gperf is used by GCC,
> so it's not a toy.

Okay, that post told me absolutely nothing :(  What would we use it
for?  What is its purpose?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] gperf anyone?

From
Bruce Momjian
Date:
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> A while ago I played around with gperf (GNU perfect hash function
> generator), abusing the keyword lookup in parser/keyword.c as playground.
> Now before I delete this I was wondering if this would perhaps be of use
> to the general public. I don't know how huge the speed advantage of this
> is, I'm sure the parser/scanner speed is the least of our problems. But I
> thunk especially ecpg could benefit from this. Btw., gperf is used by GCC,
> so it's not a toy.

keywords are a fixed array, with a binary search to find a match.  Could
gperf be faster?  We also can not distribute GNU code.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] gperf anyone?

From
Alfred Perlstein
Date:
* Bruce Momjian <pgman@candle.pha.pa.us> [000118 17:14] wrote:
> [Charset ISO-8859-1 unsupported, filtering to ASCII...]
> > A while ago I played around with gperf (GNU perfect hash function
> > generator), abusing the keyword lookup in parser/keyword.c as playground.
> > Now before I delete this I was wondering if this would perhaps be of use
> > to the general public. I don't know how huge the speed advantage of this
> > is, I'm sure the parser/scanner speed is the least of our problems. But I
> > thunk especially ecpg could benefit from this. Btw., gperf is used by GCC,
> > so it's not a toy.
> 
> keywords are a fixed array, with a binary search to find a match.  Could
> gperf be faster?  

yes:

~ % gperf 
/* starting time is 21:10:49 */
postgresql 
really
kicks
butt
/* C code produced by gperf version 2.1 (K&R C version) */
/* Command-line: gperf  */



#define MIN_WORD_LENGTH 4
#define MAX_WORD_LENGTH 10
#define MIN_HASH_VALUE 4
#define MAX_HASH_VALUE 10
/*   4 keywords   7 is the maximum key range
*/

static int
hash (str, len)    register char *str;    register unsigned int  len;
{ static unsigned char hash_table[] =   {    10, 10, 10, 10, 10, 10, 10, 10, 10, 10,    10, 10, 10, 10, 10, 10, 10, 10,
10,10,    10, 10, 10, 10, 10, 10, 10, 10, 10, 10,    10, 10, 10, 10, 10, 10, 10, 10, 10, 10,    10, 10, 10, 10, 10, 10,
10,10, 10, 10,    10, 10, 10, 10, 10, 10, 10, 10, 10, 10,    10, 10, 10, 10, 10, 10, 10, 10, 10, 10,    10, 10, 10, 10,
10,10, 10, 10, 10, 10,    10, 10, 10, 10, 10, 10, 10, 10, 10, 10,    10, 10, 10, 10, 10, 10, 10, 10,  0, 10,    10, 10,
10,10, 10, 10, 10,  0,  0, 10,    10, 10,  0, 10,  0,  0,  0, 10, 10, 10,    10,  0, 10, 10, 10, 10, 10, 10,   };
returnlen + hash_table[str[len - 1]] + hash_table[str[0]];
 
}

char *
in_word_set (str, len)    register char *str;    register unsigned int len;
{
 static char * wordlist[] =   {     "", "", "", "",      "butt",      "kicks",      "really",      "", "", "",
"postgresql",   };
 
 if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)   {     register int key = hash (str, len);
     if (key <= MAX_HASH_VALUE && key >= MIN_HASH_VALUE)       {         register char *s = wordlist[key];
         if (*s == *str && !strcmp (str + 1, s + 1))           return s;       }   } return 0;
}
/* ending time is 21:10:58 */

A perfect hash should be much faster at the trivial expense of some space.

>From the distribution:While teaching a data structures course at University of California,Irvine, I developed a
programcalled GPERF that generates perfect hashfunctions for sets of key words.  A perfect hash function is simply:
   A hash function and a data structure that allows         recognition of a key word in a set of words using
exactly1 probe into the data structure.
 


> We also can not distribute GNU code.

I'm pretty sure that the code the gperf outputs is not covered under the
GPL, just gperf itself.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


Re: [HACKERS] gperf anyone?

From
Bruce Momjian
Date:
> A perfect hash should be much faster at the trivial expense of some space.
> 
> >From the distribution:
>  While teaching a data structures course at University of California,
>  Irvine, I developed a program called GPERF that generates perfect hash
>  functions for sets of key words.  A perfect hash function is simply:
>  
>           A hash function and a data structure that allows
>           recognition of a key word in a set of words using
>           exactly 1 probe into the data structure.
> 
> 
> > We also can not distribute GNU code.
> 
> I'm pretty sure that the code the gperf outputs is not covered under the
> GPL, just gperf itself.
> 

Can you run our keywords.c using our method and gperf and see if there
is any speed difference?


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] gperf anyone?

From
Don Baccus
Date:
At 07:36 PM 1/18/00 -0500, Bruce Momjian wrote:
>[Charset ISO-8859-1 unsupported, filtering to ASCII...]
>> A while ago I played around with gperf (GNU perfect hash function
>> generator), abusing the keyword lookup in parser/keyword.c as playground.
>> Now before I delete this I was wondering if this would perhaps be of use
>> to the general public. I don't know how huge the speed advantage of this
>> is, I'm sure the parser/scanner speed is the least of our problems. But I
>> thunk especially ecpg could benefit from this. Btw., gperf is used by GCC,
>> so it's not a toy.
>
>keywords are a fixed array, with a binary search to find a match.  Could
>gperf be faster?  We also can not distribute GNU code.

I wondered about this last, i.e. the use of GNU code since Postgres
is licensed differently.

The reality is that looking up keywords form a tiny fraction of the
time spent by any language system I can think of.  The current binary
search on a fixed array might be faster, might be slower than a perfect
hash on a particular machine depending on the calculation done to
do the hashing.

Whether faster or slower, though, I can't imagine either method taking
noticably more than 0% of the total time to process a query, even the
most simple queries.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] gperf anyone?

From
Tom Lane
Date:
> At 07:36 PM 1/18/00 -0500, Bruce Momjian wrote:
> I wondered about this last, i.e. the use of GNU code since Postgres
> is licensed differently.

AFAIK this is no worse than using flex or bison --- the source code of
gperf is GPL'ed, but its output is not.

Don Baccus <dhogaza@pacifier.com> writes:
> Whether faster or slower, though, I can't imagine either method taking
> noticably more than 0% of the total time to process a query, even the
> most simple queries.

I agree with Don that the performance benefit is likely to be
unmeasurable.  Still, there could be a win: we currently have to modify
keywords.c by hand every time we have to add/delete a keyword.  Does
gperf offer any aid for maintaining the keyword list?  If so, that'd
be sufficient reason to switch to it...
        regards, tom lane


Re: [HACKERS] gperf anyone?

From
Alfred Perlstein
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [000118 21:10] wrote:
> > At 07:36 PM 1/18/00 -0500, Bruce Momjian wrote:
> > I wondered about this last, i.e. the use of GNU code since Postgres
> > is licensed differently.
> 
> AFAIK this is no worse than using flex or bison --- the source code of
> gperf is GPL'ed, but its output is not.
> 
> Don Baccus <dhogaza@pacifier.com> writes:
> > Whether faster or slower, though, I can't imagine either method taking
> > noticably more than 0% of the total time to process a query, even the
> > most simple queries.
> 
> I agree with Don that the performance benefit is likely to be
> unmeasurable.  Still, there could be a win: we currently have to modify
> keywords.c by hand every time we have to add/delete a keyword.  Does
> gperf offer any aid for maintaining the keyword list?  If so, that'd
> be sufficient reason to switch to it...

any minimal performance boost shows over time, unfortunatly using gperf
will require that you either:

a) require gperf to be installed on a system that compiles postgresql
b) manually maintain the gperf compiled files in your CVS repo  (sort of like syscalls in FreeBSD)

Option B is not that bad at the expense of additional contributor
overhead.

I hope to be able to present some soft of bench to determine
if gperf is worth the additional effort of maintainance in the
near future.

in the meanwhile, happy hacking. :)

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


Re: [HACKERS] gperf anyone?

From
Michael Meskes
Date:
On Wed, Jan 19, 2000 at 12:38:18AM +0100, Peter Eisentraut wrote:
> is, I'm sure the parser/scanner speed is the least of our problems. But I
> thunk especially ecpg could benefit from this. Btw., gperf is used by GCC,

Why do you think ecpg would benefit more from it than the backend? :-)
Both parser are mostly the same. In fact running ecpg appears to use a lot
less time than the subsequent gcc run.

Michael
-- 
Michael Meskes                         | Go SF 49ers!
Th.-Heuss-Str. 61, D-41812 Erkelenz    | Go Rhein Fire!
Tel.: (+49) 2431/72651                 | Use Debian GNU/Linux!
Email: Michael@Fam-Meskes.De           | Use PostgreSQL!


Re: [HACKERS] gperf anyone?

From
Michael Meskes
Date:
On Tue, Jan 18, 2000 at 09:32:49PM -0800, Alfred Perlstein wrote:
> any minimal performance boost shows over time, unfortunatly using gperf
> will require that you either:
> 
> a) require gperf to be installed on a system that compiles postgresql
> b) manually maintain the gperf compiled files in your CVS repo
>    (sort of like syscalls in FreeBSD)

Sounds like the way we handle the bison/fley files. Why not do this for
gperf as well.

Michael
-- 
Michael Meskes                         | Go SF 49ers!
Th.-Heuss-Str. 61, D-41812 Erkelenz    | Go Rhein Fire!
Tel.: (+49) 2431/72651                 | Use Debian GNU/Linux!
Email: Michael@Fam-Meskes.De           | Use PostgreSQL!


Re: [HACKERS] gperf anyone?

From
Don Baccus
Date:
At 11:40 PM 1/18/00 -0500, Tom Lane wrote:

>I agree with Don that the performance benefit is likely to be
>unmeasurable.  Still, there could be a win: we currently have to modify
>keywords.c by hand every time we have to add/delete a keyword.  Does
>gperf offer any aid for maintaining the keyword list?  If so, that'd
>be sufficient reason to switch to it...

If so, yeah, it might make sense.  Without looking at the existing
code, though, the existing "binary search on a fixed array" makes
me think of a list of keywords in alphabetical order.  If true,
entering new keywords in alphabetical order doesn't seem like a terrible
burden on the implementor.  The resulting list is probably more readable
if kept alphabetical anyway...



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] gperf anyone?

From
Peter Eisentraut
Date:
On 2000-01-18, Bruce Momjian mentioned:

> Can you run our keywords.c using our method and gperf and see if there
> is any speed difference?

It seems to have a speed advantage of about 2.5. But in practice that
means that 1 million words take half a second. It's not a big deal to me,
I was just wondering before I throw it out. I guess it really only makes a
difference for compilers, which operate on 1000+ lines.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: [HACKERS] gperf anyone?

From
Peter Eisentraut
Date:
On 2000-01-18, Tom Lane mentioned:

> I agree with Don that the performance benefit is likely to be
> unmeasurable.  Still, there could be a win: we currently have to modify
> keywords.c by hand every time we have to add/delete a keyword.  Does
> gperf offer any aid for maintaining the keyword list?  If so, that'd
> be sufficient reason to switch to it...

That's a good point. It would allow you much more ordering freedom. The
file is attached for review. Of course adding/deleting keywords would now
require gperf. :(

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

Re: [HACKERS] gperf anyone?

From
Bruce Momjian
Date:
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> On 2000-01-18, Bruce Momjian mentioned:
> 
> > Can you run our keywords.c using our method and gperf and see if there
> > is any speed difference?
> 
> It seems to have a speed advantage of about 2.5. But in practice that
> means that 1 million words take half a second. It's not a big deal to me,
> I was just wondering before I throw it out. I guess it really only makes a
> difference for compilers, which operate on 1000+ lines.
> 

The big difference may be that the compiler has variables/types that are
added dynamically while running, while our list is static.  Insert time
for our types is zero because we don't add SQL keywords at runtime.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] gperf anyone?

From
The Hermit Hacker
Date:
On Wed, 19 Jan 2000, Bruce Momjian wrote:

> [Charset ISO-8859-1 unsupported, filtering to ASCII...]
> > On 2000-01-18, Bruce Momjian mentioned:
> > 
> > > Can you run our keywords.c using our method and gperf and see if there
> > > is any speed difference?
> > 
> > It seems to have a speed advantage of about 2.5. But in practice that
> > means that 1 million words take half a second. It's not a big deal to me,
> > I was just wondering before I throw it out. I guess it really only makes a
> > difference for compilers, which operate on 1000+ lines.
> > 
> 
> The big difference may be that the compiler has variables/types that are
> added dynamically while running, while our list is static.  Insert time
> for our types is zero because we don't add SQL keywords at runtime.

I'm curious...does it hurt us any to do this?  Like, will it slow things
down?  Is the end result cleaner, for neglible speed improvements?


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] gperf anyone?

From
Alfred Perlstein
Date:
* The Hermit Hacker <scrappy@hub.org> [000119 12:51] wrote:
> On Wed, 19 Jan 2000, Bruce Momjian wrote:
> 
> > [Charset ISO-8859-1 unsupported, filtering to ASCII...]
> > > On 2000-01-18, Bruce Momjian mentioned:
> > > 
> > > > Can you run our keywords.c using our method and gperf and see if there
> > > > is any speed difference?
> > > 
> > > It seems to have a speed advantage of about 2.5. But in practice that
> > > means that 1 million words take half a second. It's not a big deal to me,
> > > I was just wondering before I throw it out. I guess it really only makes a
> > > difference for compilers, which operate on 1000+ lines.
> > > 
> > 
> > The big difference may be that the compiler has variables/types that are
> > added dynamically while running, while our list is static.  Insert time
> > for our types is zero because we don't add SQL keywords at runtime.
> 
> I'm curious...does it hurt us any to do this?  Like, will it slow things
> down?  Is the end result cleaner, for neglible speed improvements?

You can pick up a copy of my test at:

http://www.freebsd.org/~alfred/gperf.tgz

It should compile two programs 'gperf' and 'norm', I was able to get
almost a 100% speed improvement:

~/gperf % time ./gperf 
./gperf  0.49s user 0.00s system 100% cpu 0.489 total
~/gperf % time ./norm 
./norm  0.91s user 0.00s system 99% cpu 0.918 total

One thing you'll want to consider is the overall application of this,
if the potentially sparse tables that gperf creates causes us to fault
in extra cache pages it may not be so cool.

I'm also pretty sure someone more proficient with hash tables and gperf
in particular could get better results than I did, I sort of guessed
at gperf command line switches until one seemed to work.

also... to accomplish the gperf testing I do a strlen on each word,
over and over to simulate the need for it (as gperf needs that)
however if the strlen is already available in the parser at this
time, i'm pretty sure it would be even faster.

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


Re: [HACKERS] gperf anyone?

From
Peter Eisentraut
Date:
On Wed, 19 Jan 2000, Alfred Perlstein wrote:

> also... to accomplish the gperf testing I do a strlen on each word,
> over and over to simulate the need for it (as gperf needs that)
> however if the strlen is already available in the parser at this
> time, i'm pretty sure it would be even faster.

Precisely, that was the idea.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] gperf anyone?

From
Peter Eisentraut
Date:
On Wed, 19 Jan 2000, Bruce Momjian wrote:

> [Charset ISO-8859-1 unsupported, filtering to ASCII...]
> > On 2000-01-18, Bruce Momjian mentioned:
> > 
> > > Can you run our keywords.c using our method and gperf and see if there
> > > is any speed difference?
> > 
> > It seems to have a speed advantage of about 2.5. But in practice that
> > means that 1 million words take half a second. It's not a big deal to me,
> > I was just wondering before I throw it out. I guess it really only makes a
> > difference for compilers, which operate on 1000+ lines.
> > 
> 
> The big difference may be that the compiler has variables/types that are
> added dynamically while running, while our list is static.  Insert time
> for our types is zero because we don't add SQL keywords at runtime.

No, compiler don't do this either. This is specifically for keyword
lookup. The whole idea is that you have one set of keywords that hardly
ever changes (such as for programming languages), then you take one
afternoon off, play around with the different options until you have a
lookup function which processes your particular keyword set fastest.

Again, this is not a big deal to me, I just did it to play around. In any
case it seems to run faster, but I wasn't sure if people wanted to bother.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] gperf anyone?

From
Don Baccus
Date:
At 12:34 PM 1/20/00 +0100, Peter Eisentraut wrote:

>No, compiler don't do this either. This is specifically for keyword
>lookup. The whole idea is that you have one set of keywords that hardly
>ever changes (such as for programming languages), then you take one
>afternoon off, play around with the different options until you have a
>lookup function which processes your particular keyword set fastest.

>Again, this is not a big deal to me, I just did it to play around. In any
>case it seems to run faster, but I wasn't sure if people wanted to bother.

The binary search could very easily be sped up, actually (I just peeked).

If you'd care to e-mail me your two test cases, the one using the
output of gperf and the one using the current binary search, I'd
be more than willing to demonstrate.

I have nothing against speeding things up, though again identifying
keywords takes a vanishingly small part of the time required to
execute a query (or compile a program, for that matter).  I more
or less dislike adding dependencies to external tools when it
can be avoided, though.  

I just built a new PC to do linux-based development on and it
would be fun to have your benchmark program to compare my 
new and old linux boxes anyway :)  (P 200 classic vs. P500E, guess
which will win?)



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.