Home > mailing lists

Re: Searching for substring with tsearch(1/2) - Mailing list pgsql-hackers

From	Teodor Sigaev
Subject	Re: Searching for substring with tsearch(1/2)
Date	December 10, 2003 08:20:53
Msg-id	3FD6E54E.7080109@sigaev.ru Whole thread Raw
In response to	Re: Searching for substring with tsearch(1/2) (Hannu Krosing <hannu@tm.ee>)
Responses	Re: Searching for substring with tsearch(1/2)
List	pgsql-hackers

Tree view

>>Tsearch was never minded as prefix search, and index structure doesn't support 
>>any kind of prefix or suffix. But you can write extension to tsearch, which will 
>>search by prefix. But such solution wiil not use index, only sequence scan.
> 
> 
> How efficient would tsearch be for really big expressions (where 'hu%'
> would be expanded (using a btree word index on one column word table) to
> tsearch equivalent of ( "human" or "humanity" or "humming" or "huge" or
> ..1000 words here...) before passing the expression to tsearch?

GiST index of tsearch doen't support prefix search, so it will works only by 
seqscan, as we know :) disk is much more slow than processor, speed will be 
limited by disk.


>>Prefix searches easy realized with inverted index, but it require a lot of 
>>programing.
>>The simplest way is:
>>create table invidx (
>>    lexeme text not null primary key,
>>         ids[] int
>>);
>>
>>where ids[] - array with identificators of documents which contains this word.
> 
> 
> How hard (or sensible ;) would be creating such an index using GiST ?
> As proved by tsearch GiST can cope well with many-to-many indexes.

Sorry, I don't understand. Do you mean that GiST supports one heap tuple in 
several index tuple? If yes then no :). GiST doesn't support this feature. I 
don't think that GiST may help in this situation.



> create table invidx (
>     lexeme   text not null,
>         textdate date not null,
>         ids[]    int,
>         primary  key (lexeme, textdate)
> );
> 
> which would partition the invidx table on textdate (or some other
> suitable datum)
> 
> 
>>2 If word is frequent then query with 'IN (select * from func()) may works slow...
> if it is often too slow then creating a temp table and doing a plain
> join may be faster.
Table structure as indidx decrease this problem.
-- 
Teodor Sigaev                                  E-mail: teodor@sigaev.ru

pgsql-hackers by date:

From: Jan Wieck
Date: 10 December 2003, 01:22:13
Subject: Re: ERROR: Index pg_toast_8443892_index is not a btree

From: Hannu Krosing
Date: 10 December 2003, 08:35:06
Subject: Re: Searching for substring with tsearch(1/2)

Re: Searching for substring with tsearch(1/2) - Mailing list pgsql-hackers

Previous

Next