Home > mailing lists

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

From	Joel Jacobson
Subject	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date	March 20, 2019 22:55:56
Msg-id	CAASwCXdc1Yh10=gS1JjtF=Yn-zRsTG01k0Yw3M2xVrs7K5zq6w@mail.gmail.com Whole thread Raw
In response to	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
List	pgsql-hackers

Tree view

Many thanks for working on this, amazing work, really nice you made it a separate reusable Perl-module.

The generated hash functions reads one character at a time.

I've seen a performance trick in other hash functions [1]

to instead read multiple bytes in each iteration,

and then handle the remaining bytes after the loop.

[1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29

I've done some testing and it looks like a ~30% speed-up of the generated

ScanKeywords_hash_func() function would be possible.

If you think this approach is promising, I would be happy to prepare a patch for it,

but I wanted to check with the project this idea has not already been considered and ruled out

for some technical reasons I've failed to see, is there any?

For this to work you would need to use larger constants for $hash_mult1 and $hash_mult2 though.

I've successfully used these values:

$hash_mult1 0x2c1b3c6d

$hash_mult2 (0x297a2d39, 0x85ebca6b, 0xc2b2ae35, 0x7feb352d, 0x846ca68b)

Here is the idea:

Generated C-code:

for (; keylen >= 4; keylen -= 4, k += 4)

{

uint32_t v;

memcpy(&v, k, 4);

v |= 0x20202020;

a = a * 739982445 + v;

b = b * 2246822507 + v;

}

uint32_t v = 0;

switch (keylen)

{

case 3:

memcpy(&v, k, 3);

v |= 0x202020;

break;

case 2:

memcpy(&v, k, 2);

v |= 0x2020;

break;

case 1:

memcpy(&v, k, 1);

v |= 0x20;

break;

}

a = a * 739982445 + v;

b = b * 2246822507 + v;

return h[a % 883] + h[b % 883];

(Reding 8 bytes a time instead would perhaps be a win since some keywords are quite long.)

Perl-code:

sub _calc_hash

{

my ($key, $mult, $seed) = @_;

my $result = $seed;

my $i=0;

my $keylen = length($key);

for (; $keylen>=4; $keylen-=4, $i+=4) {

my $cn = (ord(substr($key,$i+0,1)) << 0)

| (ord(substr($key,$i+1,1)) << 8)

| (ord(substr($key,$i+2,1)) << 16)

| (ord(substr($key,$i+3,1)) << 24);

$cn |= 0x20202020 if $case_fold;

$result = ($result * $mult + $cn) % 4294967296;

}

my $cn = 0;

if ($keylen == 3) {

$cn = (ord(substr($key,$i+0,1)) << 0)

| (ord(substr($key,$i+1,1)) << 8)

| (ord(substr($key,$i+2,1)) << 16);

$cn |= 0x202020 if $case_fold;

} elsif ($keylen == 2) {

$cn = (ord(substr($key,$i+0,1)) << 0)

| (ord(substr($key,$i+1,1)) << 8);

$cn |= 0x2020 if $case_fold;

} elsif ($keylen == 1) {

$cn = (ord(substr($key,$i+0,1)) << 0);

$cn |= 0x20 if $case_fold;

}

$result = ($result * $mult + $cn) % 4294967296;

return $result;

}

pgsql-hackers by date:

From: Alvaro Herrera
Date: 20 March 2019, 22:53:44
Subject: Re: PostgreSQL pollutes the file system

From: legrand legrand
Date: 20 March 2019, 23:05:06
Subject: Re: [survey] New "Stable" QueryId based on normalized query text

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) - Mailing list pgsql-hackers

Previous

Next