Re: Replacement for Oracle Text - Mailing list pgsql-general

From	Josh Berkus
Subject	Re: Replacement for Oracle Text
Date	February 19, 2016 17:28:51
Msg-id	56C750C9.4000500@agliodbs.com
In response to	Replacement for Oracle Text (Daniel Westermann <daniel.westermann@dbi-services.com>)
Responses	Re: Replacement for Oracle Text
List	pgsql-general

On 02/19/2016 05:49 AM, s d wrote:
> On 19 February 2016 at 14:19, Bruce Momjian <bruce@momjian.us> wrote:
>
>     I wonder if PLPerl could be used to extract the words from a PDF
>     document and create a tsvector column from it.
>
>
>   I don't know about PLPerl (though I'm pretty sure it could be used
> for this purpose).  On the other hand, I've written code for this in
> Python, which should be easy to adapt for PLPython if necessary.

I'd swear someone already built something to do this.  All you need is a
library which reads PDF and transforms it into text, and then you can
FTS it.  I know there's a module for OpenOffice docs somewhere as well,
but heck if I can remember where.
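
For what it's worth, the "read the PDF, turn it into text, then FTS it"
idea can be sketched roughly as below.  This is only an illustration, not
the module I was thinking of: it assumes the pypdf library is installed
for the extraction step, and the `documents` table with its `filename`,
`body` and `body_tsv` columns is hypothetical.  The `%s` placeholders
assume a psycopg2-style driver; `to_tsvector` is the standard PostgreSQL
function.

```python
import re

# Hypothetical table and column names; adjust to your own schema.
INSERT_SQL = (
    "INSERT INTO documents (filename, body, body_tsv) "
    "VALUES (%s, %s, to_tsvector('english', %s))"
)


def pdf_to_text(path):
    """Extract plain text from a PDF file.

    Assumption: pypdf is installed.  It is imported lazily so the rest
    of this sketch works without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def clean_text(raw):
    """Normalize extracted text before indexing.

    Heuristic: re-join words hyphenated across a line break, then
    collapse every run of whitespace into a single space.
    """
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", joined).strip()


def index_params(filename, raw_text):
    """Build the parameter tuple that goes with INSERT_SQL."""
    body = clean_text(raw_text)
    # The body is passed twice: once stored as-is, once for to_tsvector().
    return (filename, body, body)
```

With an open database cursor, usage would be something like
`cur.execute(INSERT_SQL, index_params(path, pdf_to_text(path)))`.  The
same functions should be adaptable to PL/Python, where the tsvector could
be built inside the database instead.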

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
