Re: gsoc, text search selectivity and dllist enhancments - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: gsoc, text search selectivity and dllist enhancments
Date	July 4, 2008 16:20:23
Msg-id	486E77E8.6010404@enterprisedb.com
In response to	Re: gsoc, text search selectivity and dllist enhancments (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: gsoc, text search selectivity and dllist enhancments
List	pgsql-hackers

Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> The data structure I'd suggest is a simple array of pointers
>>> to the underlying hash table entries.  Since you have a predetermined
>>> maximum number of lexemes to track, you can just palloc the array once
>>> --- you don't need the expansibility properties of a list. 
> 
>> The number of lexemes isn't predetermined. It's 2 * (longest tsvector 
>> seen so far), and we don't know beforehand how long the longest tsvector is.
> 
> Hmm, I had just assumed without looking too closely that it was stats
> target times a fudge factor.  What is the rationale for doing it as
> above?  I don't think I like the idea of the limit varying over the
> course of the scan --- that means that lexemes in different places
> in the input will have significantly different probabilities of
> surviving to the final result.

Well, clearly if the list is smaller than the longest tsvector, 
inserting all elements of that long tsvector will flush out all other 
entries. Alternatively, if we throw away the newly inserted entries 
instead, some elements will never have a chance to climb up the list. 
I'm not sure where the "times two" figure comes from; maybe it's just a 
fudge factor. The bottom line is that the minimum size needed depends 
on the size of the longest tsvector.
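
To make the flushing argument concrete, here is a minimal sketch in Python. It is not the patch's code: the class BoundedLexemeList, the helper survivors(), and the eviction policy (drop the least recently touched entry when the list is full) are all assumptions chosen only to illustrate the first case above, where new entries are always admitted.

```python
from collections import OrderedDict


class BoundedLexemeList:
    """Toy fixed-capacity lexeme tracker (hypothetical, for illustration only).

    Assumed policy: a new lexeme is always admitted; when the list is full,
    the least recently touched entry is evicted to make room.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # lexeme -> count, most recently touched last

    def add(self, lexeme):
        if lexeme in self.entries:
            # Known lexeme: bump its count and mark it as recently touched.
            self.entries[lexeme] += 1
            self.entries.move_to_end(lexeme)
        else:
            if len(self.entries) >= self.capacity:
                # List is full: evict the least recently touched entry.
                self.entries.popitem(last=False)
            self.entries[lexeme] = 1

    def add_tsvector(self, lexemes):
        for lexeme in lexemes:
            self.add(lexeme)


def survivors(capacity, history, long_tsvector):
    """Return previously tracked lexemes still present after a long tsvector."""
    tracker = BoundedLexemeList(capacity)
    for tsvector in history:
        tracker.add_tsvector(tsvector)
    tracker.add_tsvector(long_tsvector)
    return [lex for lex in tracker.entries if lex not in long_tsvector]


history = [["cat", "dog"], ["cat", "fish"]]
long_tsvector = ["a", "b", "c", "d"]  # 4 distinct lexemes

# Capacity below the longest tsvector: every earlier entry is flushed out.
print(survivors(3, history, long_tsvector))

# Capacity of 2 * (longest tsvector): the earlier entries are still tracked.
print(survivors(2 * len(long_tsvector), history, long_tsvector))
```

With a capacity of 3, the four-lexeme tsvector pushes out "cat" even though it was the most frequent entry so far; with a capacity of 8 it survives. The sketch does not model the second case (discarding new entries when the list is full).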

(Jan is offline until Saturday...)

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
