Thread: Full text indexing of documents

Full text indexing of documents

From
"Leo"
Date:
How can I create the tsvector data for large text files WITHOUT loading the text into a column of the DB?
I did it using two tables (one holding the text, inserted with a Perl program), copying the tsvectors and then dropping the table with the text, but it is not easy to add data this way.
The text files are "frozen" HTML documents, so I don't need a trigger to update the tsvector; all I store is the URL of the text.
So what I need is a way to point to a file (maybe there is a special data type?) and create the tsvector from the document.

Re: Full text indexing of documents

From
"Rodrigo E. De León Plicet"
Date:
On Feb 18, 2008 10:08 AM, Leo <fleovey@jus.gov.ar> wrote:
> How can I create the tsvector data for large text files WITHOUT loading the
> text into a column of the DB?
> I did it using 2 tables (one with the text inserted with a perl program) and
> copying the tsvectors and removing the table with the text, but it is not
> easy to add data this way.
> The text files are "frozen" HTML documents so I don't need a trigger to
> update the tsvector, all I store is a URL of the text.
> So what I need  is a way to point to a file (maybe there is a special data
> type?) and create the tsvector from the document.

Try using pg_read_file() to see if it helps you:
http://www.postgresql.org/docs/8.2/static/functions-admin.html#FUNCTIONS-ADMIN-GENFILE
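
A minimal sketch of how the pg_read_file() suggestion might be used from a client program, assuming the built-in text search of PostgreSQL 8.3 or later. The table docs(url, tsv), the 'english' configuration, and the helper name index_file are hypothetical and not taken from this thread. Per the linked documentation, pg_read_file() is restricted to superusers and to files inside the database cluster directory or log_directory, so the path is resolved on the server, relative to the cluster directory.

```python
# Sketch only: the table docs(url text, tsv tsvector), the 'english'
# configuration and the example paths are assumptions, not from the thread.
INDEX_SQL = """
INSERT INTO docs (url, tsv)
SELECT %s,
       to_tsvector('english',
                   pg_read_file(%s, 0, (pg_stat_file(%s)).size))
"""


def index_file(cursor, url, server_path):
    """Store a tsvector for one server-side file without storing its text.

    cursor is any DB-API cursor that uses %s placeholders (psycopg2, for
    example). server_path is read by the server, not by the client.
    """
    # pg_stat_file() supplies the file size, so the whole file is read.
    params = (url, server_path, server_path)
    cursor.execute(INDEX_SQL, params)
    return params
```

Calling index_file(cur, 'http://example.org/a.html', 'html/a.html') once per document would add a row holding only the URL and the tsvector; the text itself never lands in a column.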

Re: Full text indexing of documents

From
"Leo"
Date:
On Feb 18, 2008 10:08 AM, Leo <fleovey@jus.gov.ar> wrote:
> How can I create the tsvector data for large text files WITHOUT loading the
> text into a column of the DB?
> I did it using 2 tables (one with the text inserted with a perl program) and
> copying the tsvectors and removing the table with the text, but it is not
> easy to add data this way.
> The text files are "frozen" HTML documents so I don't need a trigger to
> update the tsvector, all I store is a URL of the text.
> So what I need  is a way to point to a file (maybe there is a special data
> type?) and create the tsvector from the document.

> Try using pg_read_file() to see if it helps you:
> http://www.postgresql.org/docs/8.2/static/functions-admin.html#FUNCTIONS-ADMIN-GENFILE
This is exactly what I was looking for! Many thanks to Rodrigo.