Thread: integration of fulltext search in bytea/docs
is there in the roadmap of postgre integration of fulltext searching in documents saved in blobs (bytea)?
For example linux antiword can export fine text output that can be inserted into varchar field.
Would be very very nice (postgre users can be proud to be first) to save documents into bytea and search that field via to_tsvector, to_tsquery ...
--
www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming) a videoarchív ve formátu FLASH.
Bc. Radek Novotný
jednatel Mediawork group s.r.o.
tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz
On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotnnn wrote: > is there in the roadmap of postgre integration of fulltext searching in > documents saved in blobs (bytea)? Do you mean bytea or large-objects? > Would be very very nice (postgre users can be proud to be first) to save > documents into bytea and search that field via to_tsvector, to_tsquery ... This seems easy; for large objects, just use lo_export() to dump the blob out to the filesystem, and then use something like pl/perl to run antiword on it, saving the results to another file and then returning the file line-by-line as a SETOF TEXT (I think this is the best way of handling things in case the resulting text file is enormous anyway). If this code was called "runfilter" we can use it like: UPDATE myfiles f SET tsidx = ( SELECT ts_accum(to_tsvector(t)) FROM runfilter(f.loid) t); Where we've defined ts_accum to be: CREATE AGGREGATE ts_accum (tsvector) ( SFUNC = tsvector_concat, STYPE = tsvector, INITCOND = '' ); bytea is different because you know when the values has changed (i.e. write a trigger) but you need to write more code to get the bytea value out into the filesystem. -- Sam http://samason.me.uk/
Hello Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz> napsal(a): > Hello, > is there in the roadmap of postgre integration of fulltext searching in > documents saved in blobs (bytea)? > What I know, no. PostgreSQL doesn't know about others binary formats, so it cannot do it. > For example linux antiword can export fine text output that can be inserted > into varchar field. I understand it well. Three years ago, we used antiword. We stored two values - bytea - original word doc, and text - antiword output. The overhead is minimal, and this solution worked very well. > > Would be very very nice (postgre users can be proud to be first) to save > documents into bytea and search that field via to_tsvector, to_tsquery ... > It's should be very slow. You have to do repeated transformation. Pavel Stehule > -- > > www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online > streaming) a videoarchív ve formátu FLASH. > > Bc. Radek Novotný > jednatel Mediawork group s.r.o. > > tel.: +420 724 020 361 > email: radek.novotny@mediawork.cz > http://www.mediawork.cz > >
--
www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming) a videoarchív ve formátu FLASH.
Bc. Radek Novotný
jednatel Mediawork group s.r.o.
tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz
Dne St 29. července 2009 17:38:31 Pavel Stehule napsal(a):
> Hello
>
> Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>
>
> napsal(a):
> > Hello,
> > is there in the roadmap of postgre integration of fulltext searching in
> > documents saved in blobs (bytea)?
>
> What I know, no. PostgreSQL doesn't know about others binary formats,
> so it cannot do it.
>
> > For example linux antiword can export fine text output that can be
> > inserted into varchar field.
>
> I understand it well. Three years ago, we used antiword. We stored two
> values - bytea - original word doc, and text - antiword output. The
> overhead is minimal, and this solution worked very well.
>
> > Would be very very nice (postgre users can be proud to be first) to save
> > documents into bytea and search that field via to_tsvector, to_tsquery
> > ...
>
> It's should be very slow. You have to do repeated transformation.
>
> Pavel Stehule
>
> > --
> >
> > www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
> > streaming) a videoarchív ve formátu FLASH.
> >
> > Bc. Radek Novotný
> > jednatel Mediawork group s.r.o.
> >
> > tel.: +420 724 020 361
> > email: radek.novotny@mediawork.cz
> > http://www.mediawork.cz
On Thu, Jul 30, 2009 at 03:09:12PM +0200, Radek Novotnnn wrote: > Is there possible to create pg trigger that runs shell script? Yes, pl/perl can do this. -- Sam http://samason.me.uk/
On Jul 30, 2009, at 9:09 , Radek Novotný wrote: > Is there possible to create pg trigger that runs shell script? [Please don't top post.] Yes. You can use an untrusted language such as pl/perlu to run system commands. Michael Glaesemann grzm seespotcode net
In response to Radek Novotný : > Is there possible to create pg trigger that runs shell script? Sure, use an untrusted language for the trigger-function. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Michael Glaesemann <grzm@seespotcode.net> writes: > On Jul 30, 2009, at 9:09 , Radek Novotn� wrote: >> Is there possible to create pg trigger that runs shell script? > Yes. You can use an untrusted language such as pl/perlu to run system > commands. The fact that you can do it doesn't make it a good idea ... If you do this, you'll need to consider what happens if the database transaction rolls back after calling your trigger. The effects in the filesystem are still there, but the effects in the database aren't. regards, tom lane