Re: [GENERAL] Creation of tsearch2 index is very slow - Mailing list pgsql-performance

From Martijn van Oosterhout
Subject Re: [GENERAL] Creation of tsearch2 index is very slow
Date
Msg-id 20060120223327.GH31908@svana.org
Whole thread Raw
In response to Re: [GENERAL] Creation of tsearch2 index is very slow  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, Jan 20, 2006 at 04:50:17PM -0500, Tom Lane wrote:
> (hemdistcache calls hemdistsign --- I think gprof is doing something
> funny with tail-calls here, and showing hemdistsign as directly called
> from gtsvector_picksplit when control really arrives through hemdistcache.)

It may be the compiler. All these functions are declared static, which
gives the compiler quite a bit of leeway to rearrange code.

> The bulk of the problem is clearly in this loop, which performs O(N^2)
> comparisons to find the two entries that are furthest apart in hemdist
> terms:

Ah. A while ago someone came onto the list asking about bit strings
indexing[1]. If I'd known tsearch worked like this I would have pointed
him to it. Anyway, before he went off to implement it he mentioned
"Jarvis-Patrick clustering", whatever that means.

Probably more relevent was this thread[2] on -hackers a while back with
pseudo-code[3]. How well it works, I don't know, it worked for him
evidently, he went away happy...

[1] http://archives.postgresql.org/pgsql-general/2005-11/msg00473.php
[2] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01067.php
[3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01069.php

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

pgsql-performance by date:

Previous
From: Ron
Date:
Subject: Re: [GENERAL] Creation of tsearch2 index is very slow
Next
From: Ron
Date:
Subject: Re: [GENERAL] Creation of tsearch2 index is very slow