Re: tsearch2 and pdf files - Mailing list pgsql-general

From	Magnus Hagander
Subject	Re: tsearch2 and pdf files
Date	December 12, 2006 03:51:08
Msg-id	6BCB9D8A16AC4241919521715F4D8BCEA0FDF9@algol.sollentuna.se
In response to	Re: tsearch2 and pdf files ("philip johnson" <philip.johnson@atempo.com>)
List	pgsql-general

> >> 1. Convert the PDF to a text file with e.g. xpdf
> >> 2. Insert parsed text to a table of your choice.
> >> 3. Make vectors from the text.
> >
> > Actually, if you're not going to use the headline() function, you can
> > just store it directly in a vector, cutting down on the size
> > requirements.
> What size requirements?

If you store both the text and the tsvector, that's going to use up a lot
more space than if you just store the tsvector. With a proper lexer and
such, the tsvector will be smaller than the text, so storing both will take
*more* than twice the space of the tsvector alone.
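
To make the arithmetic concrete, here is a minimal Python sketch. The byte
counts are hypothetical, picked only for illustration, and storage_ratio is
a made-up helper, not part of tsearch2 or PostgreSQL. It only shows that
whenever the tsvector is smaller than the text, keeping both columns costs
more than twice what the tsvector alone does.

```python
def storage_ratio(text_bytes: int, vector_bytes: int) -> float:
    """Return (text + vector) / vector.

    This is how many times larger it is to keep both columns
    than to keep only the vector.
    """
    if vector_bytes <= 0:
        raise ValueError("vector_bytes must be positive")
    return (text_bytes + vector_bytes) / vector_bytes


# Hypothetical sizes, assumed for illustration only (not measurements).
text_bytes = 100_000    # extracted PDF text
vector_bytes = 40_000   # assumed smaller tsvector for the same document

ratio = storage_ratio(text_bytes, vector_bytes)
print(ratio)  # 3.5, i.e. more than 2 because vector_bytes < text_bytes
```

The real ratio depends on the documents and the dictionaries in use, so
it is worth measuring on actual data before deciding which columns to keep.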

//Magnus
