Re: what is the best way of storing text+image documents in postgresql - Mailing list pgsql-general

From Tomas Vondra
Subject Re: what is the best way of storing text+image documents in postgresql
Date
Msg-id 4DEFDE8B.2000502@fuzzy.cz
In response to Re: what is the best way of storing text+image documents in postgresql  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
On 8.6.2011 21:37, John R Pierce wrote:
> On 06/08/11 6:06 AM, Craig Ringer wrote:
>>> 1. save .doc documents in bytea columns. and show them with a word
>>> reader in web page (disadvantage: it needs a proper .doc reader
>>> installed on user computer)
>>
>> 1a: Convert the .doc files to a standard format like PDF that most
>> browsers can display. That's what I'd do.
>
> That's harder to integrate with a website in the sense that the PDF
> documents are hard page formatted, and can at best be displayed in an
> <iframe> within your site, and half the time, only displayed in an
> external PDF viewer since browser-PDF integration remains flaky and
> bug-ridden after all these years.   PDF text won't flow to fit your page
> layout, etc etc.

OTOH, the probability that the visitor has a PDF reader is much higher
than for a .doc reader. Plus, a PDF is usually much easier to index. But
yes, building a website this way is a PITA.

> one approach to conversion might be to save the documents as an RTF type
> format, and run that through a preprocessor that reencodes them as a
> clean HTML or similar metalanguage that you can deal with intelligently.
>    MS Word's own HTML converter creates wretched HTML with tons of extra
> bizarro-world tags which likely will trip up your page formatting if you
> display these in context in your pages.

You could just as well run htmltidy or something similar on the HTML.
But in both cases this will seriously damage the formatting. So if the
OP wants to preserve it, the only viable solution is to keep the .doc
format, or convert it to a .pdf and pray there's a working viewer.

Or you can convert the PDF into images ("convert" from ImageMagick can
do that quite easily), display those images on the web, and offer the
PDF for download.
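
A minimal sketch of that PDF-to-images step, assuming ImageMagick's
"convert" is on the PATH (newer ImageMagick 7 installs may expose it as
"magick" instead) and that Ghostscript is available, since ImageMagick
relies on it to rasterise PDFs. The function names, the 150 dpi default
and the output naming pattern are illustrative choices, not anything
prescribed in this thread.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, density=150):
    """Return the argument list for ImageMagick's `convert` (hypothetical helper)."""
    pdf = Path(pdf_path)
    # %03d makes ImageMagick number the output files, one PNG per page.
    pattern = Path(out_dir) / f"{pdf.stem}-%03d.png"
    # -density sets the rasterisation resolution (dpi); it must come before the input.
    return ["convert", "-density", str(density), str(pdf), str(pattern)]


def pdf_to_images(pdf_path, out_dir, density=150):
    """Rasterise the PDF and return the generated image paths in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if convert exits with an error.
    subprocess.run(build_convert_command(pdf_path, out_dir, density), check=True)
    return sorted(Path(out_dir).glob(f"{Path(pdf_path).stem}-*.png"))
```

Calling pdf_to_images("report.pdf", "pages") would then give the list of
page images to show on the site, while the original PDF stays available
for download.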

regards
Tomas
