Thread: BLOB & Searching
Hi, I am not 100% sure what the best solution would be, so I was hoping someone could point me in the right direction. I usually develop in MS tools, such as .net, ASP, SQL Server etc..., but I really want to expand my skillset and learn as much about Postgres as possible. What I need to do, is design a DB that will index and store approximately 300 word docs, each with a size no more that 1MB. They need to be able to seacrh the word documents for keyword/phrases to be able to identify which one to use. So, I need to write 2 web interfaces. A front end and a back end. Front end for the users who will search for their documents, and a backend for an admin person to upload new/ammended documents to the DB to be searchable. NOW..... I could do this in the usual MS tools that I work with using BLOB's and the built in Full-text searching that comes with SQL Server, but i don't have these to work with. I am working with PostGres & JSP pages What I was hoping someone could help me out with was identifying the best possible solution to use. 1. How can I store the word doc's in the DB, would it be best to use a BLOB data type? 2. Does Postgres support full text searching of a word document once it is loaded into the BLOB column & how would this work? Would I have to unload each BLOB object, convert it back to text to search, or does Postgres have the ability to complete the full-text search of a BLOB, like MSSQL Server & Oracle do? 3. Is there a way to export the Word Doc From the BLOB colum and dump it into a PDF format (I guess I am asking if someone has seen or written a PDF generator script/storedProc for Postgres)? If someone could help me out, it would be greatly appreciated. cheers, James
On Mon, Jun 12, 2006 at 02:44:34PM -0700, jdwatson1@gmail.com wrote: > 1. How can I store the word doc's in the DB, would it be best to use a > BLOB data type? Use a bytea field. > 2. Does Postgres support full text searching of a word document once it Nope. > is loaded into the BLOB column & how would this work? Would I have to > unload each BLOB object, convert it back to text to search, or does > Postgres have the ability to complete the full-text search of a BLOB, > like MSSQL Server & Oracle do? You'd want to store the plain-text version of the doc and then index that using tsearch2. Actually, there may be a way to avoid storing the text representation if you get clever with tsearch2... worst case you might need to extend tsearch2 so you can feed it an arbitrary function instead of a field. > 3. Is there a way to export the Word Doc From the BLOB colum and dump > it into a PDF format (I guess I am asking if someone has seen or > written a PDF generator script/storedProc for Postgres)? No, but you should be able to make that happen using an untrusted language/function and some external tools. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > On Mon, Jun 12, 2006 at 02:44:34PM -0700, jdwatson1@gmail.com wrote: >> 1. How can I store the word doc's in the DB, would it be best to use a >> BLOB data type? > > Use a bytea field. > >> 2. Does Postgres support full text searching of a word document once it > > Nope. Not natively. It would however be possible to use one of the .doc libs out there to read the binary document and run it through a parser and feed that to Tsearch2. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/