Re: integration of fulltext search in bytea/docs - Mailing list pgsql-general

From Sam Mason
Subject Re: integration of fulltext search in bytea/docs
Date
Msg-id 20090729152345.GF5407@samason.me.uk
In response to integration of fulltext search in bytea/docs  (Radek Novotný <radek.novotny@mediawork.cz>)
List pgsql-general
On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotný wrote:
> is there in the roadmap of postgre integration of fulltext searching in
> documents saved in blobs (bytea)?

Do you mean bytea or large-objects?

> Would be very very nice (postgre users can be proud to be first) to save
> documents into bytea and search that field via to_tsvector, to_tsquery ...

This seems easy.  For large objects, just use lo_export() to dump the
blob out to the filesystem, then use something like pl/perl to run
antiword on it, saving the results to another file and returning that
file line-by-line as a SETOF TEXT (I think this is the best way of
handling things anyway, in case the resulting text file is enormous).  If
this code were called "runfilter", we could use it like:

  UPDATE myfiles f SET tsidx = (
    SELECT ts_accum(to_tsvector(t))
    FROM runfilter(f.loid) t);

Where we've defined ts_accum to be:

  CREATE AGGREGATE ts_accum (tsvector) (
    SFUNC = tsvector_concat,
    STYPE = tsvector,
    INITCOND = ''
  );
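
A minimal sketch of what "runfilter" might look like, assuming PostgreSQL 9.0 or later with the untrusted plperlu language installed, antiword on the server's PATH, and a caller allowed to use the server-side lo_export().  The function body and the temporary file name are illustrative only.  Unlike the description above, it reads antiword's output from a pipe rather than from a second file, which still returns the text one line at a time.  Character-set handling of antiword's output is not addressed here.

```sql
-- Hypothetical sketch of runfilter(); not a tested, production-ready filter.
CREATE OR REPLACE FUNCTION runfilter(loid oid) RETURNS SETOF text AS $perl$
    use strict;
    use warnings;
    use File::Temp;

    my ($loid) = @_;
    die "invalid large object OID\n" unless defined $loid && $loid =~ /^\d+$/;

    # Scratch directory owned by the server's OS user; it is removed
    # when $dir goes out of scope at the end of the function.
    my $dir = File::Temp->newdir();
    my $src = "$dir/doc";    # assumed file name

    # Dump the large object to the filesystem with the server-side lo_export().
    spi_exec_query('SELECT lo_export(' . $loid . ', ' . quote_literal($src) . ')');

    # Run antiword and hand its output back one line at a time.
    open(my $fh, '-|', 'antiword', $src)
        or die "cannot run antiword: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        return_next($line);
    }
    close($fh) or die "antiword failed (exit status $?)\n";

    return undef;
$perl$ LANGUAGE plperlu;
```

With this in place, the UPDATE above would call runfilter() once per row of myfiles and fold the per-line tsvectors together with ts_accum.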

bytea is different because you know when the value has changed (i.e.
you can write a trigger), but you need to write more code to get the
bytea value out into the filesystem.
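
As a rough sketch of the bytea case, assuming PostgreSQL 9.0 or later (for PL/Perl's decode_bytea()), plperlu, and antiword on the server: the table "mydocs", its columns, and the functions "runfilter_bytea" and "mydocs_tsidx_update" are all made-up names for illustration.  It relies on the ts_accum aggregate defined above.

```sql
-- Hypothetical table holding the documents and their search index.
CREATE TABLE mydocs (
    id    serial PRIMARY KEY,
    body  bytea,
    tsidx tsvector
);

-- Write the bytea value to a temporary file and filter it through antiword.
CREATE OR REPLACE FUNCTION runfilter_bytea(doc bytea) RETURNS SETOF text AS $perl$
    use strict;
    use warnings;
    use File::Temp;

    my ($doc) = @_;

    # The temporary file is unlinked when $tmp goes out of scope.
    my $tmp = File::Temp->new();
    binmode($tmp);
    print $tmp decode_bytea($doc);   # PL/Perl passes bytea in its text form
    close($tmp);

    open(my $fh, '-|', 'antiword', $tmp->filename)
        or die "cannot run antiword: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        return_next($line);
    }
    close($fh) or die "antiword failed (exit status $?)\n";

    return undef;
$perl$ LANGUAGE plperlu STRICT;

-- Recompute the index only when the document itself changes.
CREATE OR REPLACE FUNCTION mydocs_tsidx_update() RETURNS trigger AS $body$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        IF NEW.body IS NOT DISTINCT FROM OLD.body THEN
            RETURN NEW;
        END IF;
    END IF;

    SELECT ts_accum(to_tsvector(line)) INTO NEW.tsidx
    FROM runfilter_bytea(NEW.body) AS t(line);

    RETURN NEW;
END;
$body$ LANGUAGE plpgsql;

CREATE TRIGGER mydocs_tsidx
    BEFORE INSERT OR UPDATE ON mydocs
    FOR EACH ROW EXECUTE PROCEDURE mydocs_tsidx_update();
```

Because the function is declared STRICT, a NULL body yields no lines and the index ends up as an empty tsvector.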

--
  Sam  http://samason.me.uk/
