Thread: integration of fulltext search in bytea/docs

integration of fulltext search in bytea/docs

From
Radek Novotný
Date:
Hello,
is there in the roadmap of postgre integration of fulltext searching in documents saved in blobs (bytea)?


For example linux antiword can export fine text output that can be inserted into varchar field.


Would be very very nice (postgre users can be proud to be first) to save documents into bytea and search that field via to_tsvector, to_tsquery ...


--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming) a videoarchív ve formátu FLASH.


Bc. Radek Novotný
jednatel Mediawork group s.r.o.


tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

Re: integration of fulltext search in bytea/docs

From
Sam Mason
Date:
On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotnnn wrote:
> is there in the roadmap of postgre integration of fulltext searching in
> documents saved in blobs (bytea)?

Do you mean bytea or large-objects?

> Would be very very nice (postgre users can be proud to be first) to save
> documents into bytea and search that field via to_tsvector, to_tsquery ...

This seems easy; for large objects, just use lo_export() to dump the
blob out to the filesystem, and then use something like pl/perl to run
antiword on it, saving the results to another file and then returning
the file line-by-line as a SETOF TEXT (I think this is the best way of
handling things in case the resulting text file is enormous anyway).  If
this code was called "runfilter" we can use it like:

  UPDATE myfiles f SET tsidx = (
    SELECT ts_accum(to_tsvector(t))
    FROM runfilter(f.loid) t);

Where we've defined ts_accum to be:

  CREATE AGGREGATE ts_accum (tsvector) (
    SFUNC = tsvector_concat,
    STYPE = tsvector,
    INITCOND = ''
  );

bytea is different because you know when the values has changed (i.e.
write a trigger) but you need to write more code to get the bytea value
out into the filesystem.

--
  Sam  http://samason.me.uk/

Re: integration of fulltext search in bytea/docs

From
Pavel Stehule
Date:
Hello

Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>
napsal(a):
> Hello,
> is there in the roadmap of postgre integration of fulltext searching in
> documents saved in blobs (bytea)?
>

What I know, no. PostgreSQL doesn't know about others binary formats,
so it cannot do it.

> For example linux antiword can export fine text output that can be inserted
> into varchar field.

I understand it well. Three years ago, we used antiword. We stored two
values - bytea - original word doc, and text - antiword output. The
overhead is minimal, and this solution worked very well.

>
> Would be very very nice (postgre users can be proud to be first) to save
> documents into bytea and search that field via to_tsvector, to_tsquery ...
>

It's should be very slow. You have to do repeated transformation.

Pavel Stehule

> --
>
> www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
> streaming) a videoarchív ve formátu FLASH.
>
> Bc. Radek Novotný
> jednatel Mediawork group s.r.o.
>
> tel.: +420 724 020 361
> email: radek.novotny@mediawork.cz
> http://www.mediawork.cz
>
>

Re: integration of fulltext search in bytea/docs

From
Radek Novotný
Date:
Is there possible to create pg trigger that runs shell script?
--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming) a videoarchív ve formátu FLASH.


Bc. Radek Novotný
jednatel Mediawork group s.r.o.


tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

Dne St 29. července 2009 17:38:31 Pavel Stehule napsal(a):
> Hello
>
> Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>
>
> napsal(a):
> > Hello,
> > is there in the roadmap of postgre integration of fulltext searching in
> > documents saved in blobs (bytea)?
>
> What I know, no. PostgreSQL doesn't know about others binary formats,
> so it cannot do it.
>
> > For example linux antiword can export fine text output that can be
> > inserted into varchar field.
>
> I understand it well. Three years ago, we used antiword. We stored two
> values - bytea - original word doc, and text - antiword output. The
> overhead is minimal, and this solution worked very well.
>
> > Would be very very nice (postgre users can be proud to be first) to save
> > documents into bytea and search that field via to_tsvector, to_tsquery
> > ...
>
> It's should be very slow. You have to do repeated transformation.
>
> Pavel Stehule
>
> > --
> >
> > www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
> > streaming) a videoarchív ve formátu FLASH.
> >
> > Bc. Radek Novotný
> > jednatel Mediawork group s.r.o.
> >
> > tel.: +420 724 020 361
> > email: radek.novotny@mediawork.cz
> > http://www.mediawork.cz


Re: integration of fulltext search in bytea/docs

From
Sam Mason
Date:
On Thu, Jul 30, 2009 at 03:09:12PM +0200, Radek Novotnnn wrote:
> Is there possible to create pg trigger that runs shell script?

Yes, pl/perl can do this.

--
  Sam  http://samason.me.uk/

Re: integration of fulltext search in bytea/docs

From
Michael Glaesemann
Date:
On Jul 30, 2009, at 9:09 , Radek Novotný wrote:

> Is there possible to create pg trigger that runs shell script?


[Please don't top post.]

Yes. You can use an untrusted language such as pl/perlu to run system
commands.

Michael Glaesemann
grzm seespotcode net




Re: integration of fulltext search in bytea/docs

From
"A. Kretschmer"
Date:
In response to Radek Novotný :
> Is there possible to create pg trigger that runs shell script?

Sure, use an untrusted language for the trigger-function.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: integration of fulltext search in bytea/docs

From
Tom Lane
Date:
Michael Glaesemann <grzm@seespotcode.net> writes:
> On Jul 30, 2009, at 9:09 , Radek Novotn� wrote:
>> Is there possible to create pg trigger that runs shell script?

> Yes. You can use an untrusted language such as pl/perlu to run system
> commands.

The fact that you can do it doesn't make it a good idea ...

If you do this, you'll need to consider what happens if the database
transaction rolls back after calling your trigger.  The effects in the
filesystem are still there, but the effects in the database aren't.

            regards, tom lane