Thread: BLOB & Searching

BLOB & Searching

From
jdwatson1@gmail.com
Date:
Hi,
I am not 100% sure what the best solution would be, so I was hoping
someone could point me in the right direction.

I usually develop in MS tools, such as .net, ASP, SQL Server etc...,
but I really want to expand my skillset and learn as much about
Postgres
as possible.


What I need to do, is design a DB that will index and store
approximately 300 word docs, each with a size no more that 1MB.  They
need to be able to seacrh the word documents for keyword/phrases to be
able to identify which one to use.


So, I need to write 2 web interfaces. A front end and a back end. Front

end for the users who will search for their documents, and a backend
for an admin person to upload new/ammended documents to the DB to be
searchable.


NOW.....  I could do this in the usual MS tools that I work with using
BLOB's and the built in Full-text searching that comes with SQL Server,

but i don't have these to work with. I am working with PostGres & JSP
pages


What I was hoping someone could help me out with was identifying the
best possible solution to use.


1. How can I store the word doc's in the DB, would it be best to use a
BLOB data type?


2. Does Postgres support full text searching of a word document once it

is loaded into the BLOB column & how would this work?   Would I have to
unload each BLOB object, convert it back to text to search, or does
Postgres have the ability to complete the full-text search of a BLOB,
like MSSQL Server & Oracle do?


3. Is there a way to export the Word Doc From the BLOB colum and dump
it into a PDF format (I guess I am asking if someone has seen or
written a PDF generator script/storedProc for Postgres)?


If someone could help me out, it would be greatly appreciated.


cheers,
James


Re: BLOB & Searching

From
"Jim C. Nasby"
Date:
On Mon, Jun 12, 2006 at 02:44:34PM -0700, jdwatson1@gmail.com wrote:
> 1. How can I store the word doc's in the DB, would it be best to use a
> BLOB data type?

Use a bytea field.

> 2. Does Postgres support full text searching of a word document once it

Nope.

> is loaded into the BLOB column & how would this work?   Would I have to
> unload each BLOB object, convert it back to text to search, or does
> Postgres have the ability to complete the full-text search of a BLOB,
> like MSSQL Server & Oracle do?

You'd want to store the plain-text version of the doc and then index
that using tsearch2. Actually, there may be a way to avoid storing the
text representation if you get clever with tsearch2... worst case you
might need to extend tsearch2 so you can feed it an arbitrary function
instead of a field.

> 3. Is there a way to export the Word Doc From the BLOB colum and dump
> it into a PDF format (I guess I am asking if someone has seen or
> written a PDF generator script/storedProc for Postgres)?

No, but you should be able to make that happen using an untrusted
language/function and some external tools.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: BLOB & Searching

From
"Joshua D. Drake"
Date:
Jim C. Nasby wrote:
> On Mon, Jun 12, 2006 at 02:44:34PM -0700, jdwatson1@gmail.com wrote:
>> 1. How can I store the word doc's in the DB, would it be best to use a
>> BLOB data type?
>
> Use a bytea field.
>
>> 2. Does Postgres support full text searching of a word document once it
>
> Nope.

Not natively. It would however be possible to use one of the .doc libs
out there to read the binary document and run it through a parser and
feed that to Tsearch2.

Sincerely,

Joshua D. Drake



--

             === The PostgreSQL Company: Command Prompt, Inc. ===
       Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
       Providing the most comprehensive  PostgreSQL solutions since 1997
                      http://www.commandprompt.com/