Re: PDF files: to store in database or not - Mailing list pgsql-general

From Eric Schwarzenbach
Subject Re: PDF files: to store in database or not
Date
Msg-id 7b0d4109-a722-9cd9-d324-dcea926bdff5@blackbrook.org
Whole thread Raw
In response to PDF files: to store in database or not  (Rich Shepard <rshepard@appl-ecosys.com>)
Responses Re: PDF files: to store in database or not
List pgsql-general
On 12/06/2016 01:30 PM, Rich Shepard wrote:
>   With no experience of storing binary data in a bytea column I don't
> know
> when its use is appropriate. I suspect that for an application I'm
> developing it would be better to store row-related documents outside the
> database, and want to learn if that is the appropriate approach.
>
>   Consider an application that manages a fleet of vehicles. There's a
> Vehicles table with information on each one (perhaps make, model, VIN,
> year
> of purchase) and a Services table. There are many PDF documents
> associated
> with each row in the tables: purchase contract, insurance form,
> service and
> maintenance records, etc.
>
>   My thinking is to not store these documents in the database, but to
> store
> them in subdirectories outside the database.
>
>   Your thoughts?
>
> Rich
I'd also be interested in answers to this that are give specific pros
and cons, and not in terms of "its better to do this than that."
What's "better" depends on how much you value the various pros and the
cons.

One of the pros of keeping them in the database is ease of protecting
adds and updates to the files and their related data with a transaction
and being able to have system where it iss pretty much impossible for
the documents to ever be out of sync with the related data.

I maintain some systems that do keep the documents outside of the
database, and the application code maintains the transactional integrity
of the files and data, and for the most part we don't have integrity
problems. In the worst of an add or update operation being interrupted
by a system crash or unexpected error, we have a new document saved but
the data about this document has not been written to the database and it
is as if that operation never happened. The file may really be there but
the system does not "know about it." This works even for updates because
our system versions documents and the old version is not written over,
there is simply a new version that the system never "knows" about.
Without versioning this would be more of a problem, and you would
probably need to protect yourself with code that does something like
temporarily keeping the last version of a file during an update and
switching over the metadata to reference the new document only at the
very last operation in the transaction.

We also have the potential of the database not matching the file store
when a system is migrated or "cloned." We are very careful about this,
but we've at least once had a case where a client's IT depart screwed it
up, and got a mismatched system to which they started writing new data.
Luckily this was a test or staging system and no production data was lost.

I've often wondered if we'd have been better off storing the files in
the database. This design decision was made some years ago, and our
concerns around this had to do with performance, but I don't know that
we had any real data that this should have been a concern, and I suspect
you could ameliorate if not eliminate this as an issue by careful
design. I'd loved to hear this idea confirmed or debunked by someone who
has more expertise (and ideally, done actual testing).

Cheers,

Eric


pgsql-general by date:

Previous
From: Moreno Andreo
Date:
Subject: Re: PDF files: to store in database or not
Next
From: Eric Schwarzenbach
Date:
Subject: Re: PDF files: to store in database or not