Re: Term positions in GIN fulltext index - Mailing list pgsql-hackers

From	Yoann Moreau
Subject	Re: Term positions in GIN fulltext index
Date	November 4, 2011 07:15:25
Msg-id	4EB3BB33.9080801@univ-avignon.fr
In response to	Re: Term positions in GIN fulltext index (Florian Pflug <fgp@phlo.org>)
Responses	Re: Term positions in GIN fulltext index
List	pgsql-hackers

On 03/11/11 19:19, Florian Pflug wrote:
> There's a difference between values of type tsvector, and what GIN indices
> on columns or expressions of type tsvector store.

I was wondering what the point of storing the tsvector in the table
was; I now understand. So I should use the GIN index to rank my
documents, and work on the stored tsvectors for positions.

> As I pointed out above, you'll first need to make sure to store the result of
> to_tsvector in a column. Then, what you need seems to be a function that
> takes a tsvector value and returns the contained lexemes as individual rows.
>
> Postgres doesn't seem to contain such a function currently (don't believe that,
> though - go and recheck the documentation. I don't know all thousands of built-in
> functions by heart). But it's easy to add one. You could either use PL/pgSQL
> to parse the tsvector's textual representation, or write a C function. If you
> go the PL/pgSQL route, regexp_split_to_table() might come in handy.

This seems easier to implement than what I had in mind, so I'm going
to do that. But I'm concerned about the size of the database with the
GIN index plus the tsvector column, and about the cost of parsing the
whole tsvector of each document I need positions from (since I need
them for only a very few terms).
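
As a rough illustration of the parsing approach suggested above, here is
a minimal sketch that pulls the positions of a few terms out of a
tsvector's textual representation. It is written in Python on the client
side rather than in PL/pgSQL or C, purely as an assumption for
illustration. The function name `tsvector_positions` and its `wanted`
parameter are hypothetical. It assumes the usual output form
`'lexeme':pos,pos` with embedded quotes doubled and optional A-D weight
letters; backslash escapes are not handled.

```python
import re

# One entry of a tsvector's text form, e.g. 'cat':3,7A
# Assumption: lexemes are single-quoted and embedded quotes are doubled ('').
_ENTRY = re.compile(r"'((?:[^']|'')*)'(?::(\d+[A-D]?(?:,\d+[A-D]?)*))?")


def tsvector_positions(text, wanted=None):
    """Return {lexeme: [positions]} parsed from a tsvector's text output.

    If `wanted` is given, only lexemes contained in it are returned.
    """
    result = {}
    for match in _ENTRY.finditer(text):
        # Undo the quote doubling used inside quoted lexemes.
        lexeme = match.group(1).replace("''", "'")
        if wanted is not None and lexeme not in wanted:
            continue
        raw = match.group(2)
        # Drop the optional weight letter and keep the numeric position.
        positions = [int(p.rstrip("ABCD")) for p in raw.split(",")] if raw else []
        result[lexeme] = positions
    return result


if __name__ == "__main__":
    sample = "'a':1,6,10 'and':8 'ate':9A 'cat':3"
    print(tsvector_positions(sample, wanted={"cat", "a"}))
```

Passing `wanted` skips the lexemes that are not needed, but the whole
string is still scanned for every document, so the parsing-cost concern
above remains.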

Maybe some external fulltext engine managing lexemes and positions would 
be more efficient for my purpose. I'll try some different things and let 
you know the results.

Thanks to all for your help.
Regards,
Yoann Moreau
