tsvector from external files - Mailing list pgsql-general

From Perry Smith
Subject tsvector from external files
Msg-id 97E0D2CA-1302-4EB7-8F0C-34ED54F37DC9@gmail.com
Responses Re: tsvector from external files  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi,

In the documentation there is this statement:

Another possibility is to store the documents as simple text files in the file system. In this case, the database can be used to store the full text index and to execute searches, and some unique identifier can be used to retrieve the document from the file system.

It goes on to explain that there will be some limitations, but I believe this is the path I want to take.

Eventually I'm going to be using Ruby as my language to interface to libpq. I'm not asking for Ruby help, but some stepping stones would help me; for example, pointing out the relevant interface in libpq would help me a great deal.

For example, given to_tsvector([ config regconfig, ] document text), how do I give it an external file as the document text?  I can think of many possible approaches, but I thought I would ask here first for suggestions.
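
To make the question concrete, the most direct option would be to read the file on the client and pass its contents as the document parameter. The following is only a minimal sketch under several assumptions: the docs table, INDEX_SQL and index_file are hypothetical names, conn is assumed to be a PG::Connection from the Ruby pg gem (only its exec_params method is used), and the file is assumed to be UTF-8 text small enough to read into memory.

```ruby
# Hypothetical table for this sketch:
#   CREATE TABLE docs (path text PRIMARY KEY, tsv tsvector);
INDEX_SQL = <<~SQL
  INSERT INTO docs (path, tsv)
  VALUES ($1, to_tsvector($2::regconfig, $3::text))
SQL

# Read the whole file on the client and send its contents as an ordinary
# text parameter. Only the resulting tsvector is stored; the file path
# serves as the identifier for retrieving the document later.
# conn is assumed to respond to exec_params(sql, params).
def index_file(conn, path, config = "english")
  text = File.read(path, encoding: "UTF-8")
  conn.exec_params(INDEX_SQL, [path, config, text])
end
```

With a live connection this would be called as index_file(conn, "/some/file.txt"). The tsvector is built and stored on the server, so only the document text crosses the client / server interface, and only once.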

One approach is to go into a loop, feeding perhaps 4K blocks of text and combining the results with the || operator, but that has two disadvantages.  One is that the tsvector, as it grows, is being pushed back and forth across the client / server interface.  The other is that this approach will not give me exactly the same result as indexing the whole document at once (as explained in the description of the || operator).
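
One variant of the block-wise loop might sidestep the first disadvantage by doing the concatenation in an UPDATE on the server, so that only the text blocks are sent. This is a rough sketch, not a tested solution: the docs table, the SQL constants and both helper names are hypothetical, conn is assumed to be a PG::Connection from the Ruby pg gem, and blocks are broken at line ends rather than at exactly 4K so that no word is cut in half.

```ruby
# Hypothetical table: CREATE TABLE docs (path text PRIMARY KEY, tsv tsvector);
CREATE_ROW_SQL = "INSERT INTO docs (path, tsv) VALUES ($1, ''::tsvector)"
APPEND_SQL = <<~SQL
  UPDATE docs
     SET tsv = tsv || to_tsvector('english'::regconfig, $2::text)
   WHERE path = $1
SQL

# Yield blocks of at most about max_bytes, breaking only at line ends.
# A single line longer than max_bytes is still yielded as one block.
def each_block(io, max_bytes = 4096)
  buf = +""
  io.each_line do |line|
    if !buf.empty? && buf.bytesize + line.bytesize > max_bytes
      yield buf
      buf = +""
    end
    buf << line
  end
  yield buf unless buf.empty?
end

# Start with an empty tsvector, then append one block at a time.
# The growing tsvector stays on the server for the whole loop.
def index_in_blocks(conn, path, max_bytes = 4096)
  conn.exec_params(CREATE_ROW_SQL, [path])
  File.open(path, "r:UTF-8") do |io|
    each_block(io, max_bytes) do |block|
      conn.exec_params(APPEND_SQL, [path, block])
    end
  end
end
```

The second disadvantage remains: || offsets the positions in the right-hand vector by the largest position in the left-hand one, so the stored positions are only nearly equivalent to those from a single to_tsvector call over the whole document.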

The second approach is to create a large object first, but that seems inefficient too.  It's also not clear that I can pass a reference to a large object in place of the document text.

Thank you,
Perry Smith
