Re: [GENERAL] Creation of tsearch2 index is very slow - Mailing list pgsql-performance

From Martijn van Oosterhout
Subject Re: [GENERAL] Creation of tsearch2 index is very slow
Msg-id 20060120225720.GI31908@svana.org
In response to Re: [GENERAL] Creation of tsearch2 index is very slow  (Ron <rjpeace@earthlink.net>)
List pgsql-performance
On Fri, Jan 20, 2006 at 05:46:34PM -0500, Ron wrote:
> At 04:37 PM 1/20/2006, Martijn van Oosterhout wrote:
> >Given that all it's doing is counting bits, a simple fix would be to
> >loop over bytes, use XOR and count ones. For extreme speedup create a
> >lookup table with 256 entries to give you the answer straight away...
> For an even more extreme speedup, don't most modern CPUs have an asm
> instruction that counts the bits (un)set (AKA "population counting")
> in various size entities (4b, 8b, 16b, 32b, 64b, and 128b for 64b
> CPUs with SWAR instructions)?

Quite possibly, though I wouldn't have the foggiest idea how to get the
C compiler to generate it.
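For what it's worth, one route worth trying is a compiler builtin rather than inline asm. This is a sketch that assumes GCC or Clang; the original names no compiler. Both provide `__builtin_popcount` and `__builtin_popcountll`, and when building for x86 with `-mpopcnt` (or a `-march` that includes it) they are expected to compile to the POPCNT instruction. Without that flag the compiler falls back to a software routine. The function name `hamming_builtin` and the idea that signatures are equal-length byte arrays are illustrative assumptions, not the tsearch2 code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: Hamming distance between two equal-length bit
 * signatures using the GCC/Clang popcount builtins. With -mpopcnt on x86
 * the 64-bit builtin should become a single POPCNT instruction. */
static int hamming_builtin(const unsigned char *a, const unsigned char *b,
                           size_t len)
{
    int dist = 0;
    size_t i = 0;

    /* Process 8 bytes per step; memcpy avoids unaligned-access issues. */
    for (; i + 8 <= len; i += 8) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);
        memcpy(&y, b + i, 8);
        dist += __builtin_popcountll(x ^ y);
    }
    /* Count the remaining tail bytes one at a time. */
    for (; i < len; i++)
        dist += __builtin_popcount((unsigned int)(a[i] ^ b[i]));
    return dist;
}
```

Whether this actually beats a lookup table depends on the CPU and on how the binary is compiled, so it would need measuring.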

Given that even a lookup table will get you pretty close to that speed with
plain C coding, I think that's quite enough for a function that really
is just a small part of a much larger system...
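The byte-at-a-time approach quoted above (XOR the two signatures, then count the ones, optionally via a 256-entry table) might look like this in plain C. It is a sketch only: the function names are hypothetical, and it assumes the signatures are equal-length byte arrays. The table is generated at compile time with a well-known macro trick, so no initialisation call is needed:

```c
#include <stddef.h>

/* Build the 256-entry "ones per byte" table at compile time.
 * B2 covers 2 bits (0,1,1,2 extra ones), B4 covers 4 bits, B6 covers 6. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
static const unsigned char bits_in_byte[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Simple version: XOR each byte pair and count the set bits one by one. */
static int hamming_naive(const unsigned char *a, const unsigned char *b,
                         size_t len)
{
    int dist = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char x = (unsigned char)(a[i] ^ b[i]);
        while (x) {
            dist += x & 1;
            x >>= 1;
        }
    }
    return dist;
}

/* Table version: one lookup per byte instead of a loop over its bits. */
static int hamming_lut(const unsigned char *a, const unsigned char *b,
                       size_t len)
{
    int dist = 0;
    for (size_t i = 0; i < len; i++)
        dist += bits_in_byte[a[i] ^ b[i]];
    return dist;
}
```

The table costs 256 bytes and stays cache-resident during a tight loop, which is why it gets close to a hardware population count without any compiler-specific code.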

Better solution (as Tom points out): work out how to avoid calling the
distance function so often in the first place... At the moment each call
to gtsvector_picksplit seems to call it around 14262 times. Getting that
down by an order of magnitude will help far more than any faster bit
counting.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
