Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date
Msg-id 4E0F6568.6010809@enterprisedb.com
Whole thread Raw
In response to Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
List pgsql-hackers
On 02.07.2011 21:07, Tom Lane wrote:
> I wrote:
>> I tweaked the patch a bit further (don't pessimize the boundary case
>> where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
>> improve the comment) and attach that version below.  This is a little
>> bit faster but I still wonder if it's worth the price of adding an
>> obscure dependency on the header size.
>
> It occurred to me that we could very easily remove that objection by
> making the code dynamically detect when it's reached a suitably aligned
> trigram.  The attached version of the patch does it that way.  It seems
> to be a percent or so slower than my previous version, but I think that
> making the function robust against header changes is probably well worth
> that price.

Ah, that opens the door to do something I considered earlier but
rejected because of alignment: instead of three 32-bit word fetches, we
could fetch one 64-bit word and 32-bit word. Might shave a few more
cycles...

Meanwhile, I experimented with optimizing the other part of the loop:
the HASH() macros for setting the right bits in the signature. With the
default compile-time settings, the signature array is 95 bits.
Currently, it's stored in a BITVEC, which is a byte array, but we could
store it in two 64-bit integers, which makes it possible to write SETBIT
differently. I experimented with a few approaches, first was essentially
this:

+ /* Set the nth bit in the signature, in s1 or s2 */
+ #define HASH_S(h) \
+     do {                                    \
+         unsigned int n = HASHVAL(h);        \
+         if (((uint64) (n)) < 64)            \
+             s1 |= (uint64) 1<<(n);            \
+         else                                \
+             s2 |= (uint64) 1<<((n) - 64);    \
+     } while(0)

That was a bit faster on my x64 laptop, but slightly slower on my ia64
HP-UX box. My second try was to use lookup tables, patch attached. That
was yet faster on x64, and a small win on the ia64 box too. I'm not sure
it's worth the added code complexity, though.

Here's a summary of the timings I got with different versions:

ia64 HP-UX (anole):

unpatched: 11194.038 ms
fast_makesign-tom: 10064.980 ms
fast_makesign-2int: 10649.726 ms
fast_makesign-tbl: 9951.547 ms

x64 laptop:

unpatched: 4595,209 ms
fast_makesign-tom: 3346,548 ms
fast_makesign-2int: 3102,874 ms
fast_makesign-tbl: 2997,854 ms

I used the same "REINDEX INDEX i_words" test I used earlier, repeated
each run a couple of times, and took the lowest number.
fast_makesign-tom is the first patch you posted, I haven't tested your
latest one. fast_makesign-2int is with the HASH_S() macro above, and
has_makesign-tbl is with the attached patch.

PS. in case you wonder why the HP-UX box is so much slower than my
laptop; this box isn't really meant for performance testing. It's just
something I happen to have access to, I think it's a virtual machine of
some sort. The numbers are very repeatable, however.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Next
From: Tom Lane
Date:
Subject: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)