Re: Indexing MS/Open Office and PDF documents - Mailing list pgsql-general

From Jeff Davis
Subject Re: Indexing MS/Open Office and PDF documents
Date
Msg-id 1331845976.26127.16.camel@sussancws0025
In response to Indexing MS/Open Office and PDF documents  (<Alexander.Bagerman@cognizant.com>)
Responses Re: Indexing MS/Open Office and PDF documents  (Richard Huxton <dev@archonet.com>)
Re: Indexing MS/Open Office and PDF documents  (dennis jenkins <dennis.jenkins.75@gmail.com>)
List pgsql-general
On Fri, 2012-03-16 at 01:57 +0530, Alexander.Bagerman@cognizant.com
wrote:
> Hi,
>
> We are looking to use Postgres 9 for the document storing and would
> like to take advantage of the full text search capabilities. We have
> hard time identifying MS/Open Office and PDF parsers to index stored
> documents and make them available for text searching. Any advice would
> be appreciated.

The first step is to find a library that can parse such documents, or
convert them to a format that can be parsed.

Once you have one, PostgreSQL lets you load arbitrary code as functions
in several languages (C, PL/Perl, PL/Python, and others), so you can
call the library from inside the database. It's hard to give more
specific advice until you've found the library you'd like to work with.
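
To make the two steps concrete, here is a rough sketch for one format
only. It assumes OpenDocument Text (.odt) input, which is a ZIP archive
whose body lives in content.xml, and it uses nothing but the Python
standard library. The function name odt_to_text is made up for this
example; it is not part of PostgreSQL or of any library mentioned in
this thread. MS Office and PDF files would need a different parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used for text elements in OpenDocument files.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_to_text(data: bytes) -> str:
    """Return the plain text of an OpenDocument Text file given as bytes.

    Hypothetical helper for illustration. It collects headings (text:h)
    and paragraphs (text:p) in document order. Paragraphs nested inside
    other paragraphs (for example in text boxes) would be emitted twice;
    a real parser should handle that case.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    wanted = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")
    paragraphs = []
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() also picks up text inside child spans.
            text = "".join(elem.itertext()).strip()
            if text:
                paragraphs.append(text)
    return "\n".join(paragraphs)
```

If PL/Python is installed, the same body could be wrapped in a PL/Python
function that takes a bytea argument and returns text, and the result
passed to to_tsvector() to build the search index. Whether to extract
inside the database or in the application before inserting is a design
choice that depends on the library you end up with.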

Regards,
    Jeff Davis
