Thread: column type for pdf file

column type for pdf file

From
Emi Lu
Date:
Hello,

To save pdf files into postgresql8.3, what is the best column type?

bytea, blob, etc?

Thank you,
Emi


Re: column type for pdf file

From
Julien Cigar
Date:
Unless you've good reasons to do so it's best to store the file on the
file system and the file name/path in the database ...

On 05/18/2011 22:20, Emi Lu wrote:
> Hello,
>
> To save pdf files into postgresql8.3, what is the best column type?
>
> bytea, blob, etc?
>
> Thank you,
> Emi
>


Attachment

Re: column type for pdf file

From
Karsten Hilbert
Date:
On Wed, May 18, 2011 at 10:46:23PM +0200, Julien Cigar wrote:

> Unless you've good reasons to do so it's best to store the file on
> the file system

Why ?

If you suggest reasons are needed for storing the PDF in the
database I'd like to know the reasons for *not* doing so.

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: column type for pdf file

From
Julien Cigar
Date:
On 05/18/2011 23:00, Karsten Hilbert wrote:
> On Wed, May 18, 2011 at 10:46:23PM +0200, Julien Cigar wrote:
>
>> Unless you've good reasons to do so it's best to store the file on
>> the file system
> Why ?
>
> If you suggest reasons are needed for storing the PDF in the
> database I'd like to know the reasons for *not* doing so.
>

It increases the load, consume connections, but the biggest drawback is
probably the memory consumption ..

IMHO storing binary data in a database is almost always a bad idea .. it
could be OK to store things like avatars, small icons, etc, but
certainly not to store files of several MB ... file systems are done for
that !

> Karsten


Attachment

Re: column type for pdf file

From
Karsten Hilbert
Date:
On Wed, May 18, 2011 at 11:21:43PM +0200, Julien Cigar wrote:

> >>Unless you've good reasons to do so it's best to store the file on
> >>the file system
> >Why ?
> >
> >If you suggest reasons are needed for storing the PDF in the
> >database I'd like to know the reasons for *not* doing so.
> 
> It increases the load, consume connections, but the biggest drawback
> is probably the memory consumption ..

Thanks. At least now the OP knows some of the reasoning for
not doing so :-)

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: column type for pdf file

From
Julien Cigar
Date:
On 05/18/2011 23:27, Karsten Hilbert wrote:
> On Wed, May 18, 2011 at 11:21:43PM +0200, Julien Cigar wrote:
>
>>>> Unless you've good reasons to do so it's best to store the file on
>>>> the file system
>>> Why ?
>>>
>>> If you suggest reasons are needed for storing the PDF in the
>>> database I'd like to know the reasons for *not* doing so.
>> It increases the load, consume connections, but the biggest drawback
>> is probably the memory consumption ..
> Thanks. At least now the OP knows some of the reasoning for
> not doing so :-)
>

Yep sorry, I answered a bit too fast :)

> Karsten


Attachment

Re: column type for pdf file

From
Eric McKeeth
Date:
<div class="gmail_quote">On Wed, May 18, 2011 at 2:20 PM, Emi Lu <span dir="ltr"><<a
href="mailto:emilu@encs.concordia.ca">emilu@encs.concordia.ca</a>></span>wrote:<br /><blockquote class="gmail_quote"
style="margin:00 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> Hello,<br /><br /> To save pdf files into
postgresql8.3,what is the best column type?<br /><br /> bytea, blob, etc?<br /><br /> Thank you,<br /> Emi<font
color="#888888"></font><br/></blockquote></div><br />Everyone else has pointed out reasons for not doing this, and I
agreewith them that in the large majority of cases just storing a reference to a file stored outside the database is
preferable.However, to answer the question you asked, my rule of thumb is that if you need to store binary data in the
databaseis to use a bytea column, unless you need the random access capabilities that the large object interface
provides.A bytea column is typically easier to use, and has proper transactional behavior, enforcement of referential
integrity,etc.<br /><br />-Eric<br /> 

Re: column type for pdf file

From
Craig Ringer
Date:
On 05/19/2011 05:21 AM, Julien Cigar wrote:
> On 05/18/2011 23:00, Karsten Hilbert wrote:
>> On Wed, May 18, 2011 at 10:46:23PM +0200, Julien Cigar wrote:
>>
>>> Unless you've good reasons to do so it's best to store the file on
>>> the file system
>> Why ?
>>
>> If you suggest reasons are needed for storing the PDF in the
>> database I'd like to know the reasons for *not* doing so.
>>
>
> It increases the load, consume connections, but the biggest drawback is
> probably the memory consumption ..
>
> IMHO storing binary data in a database is almost always a bad idea .. it
> could be OK to store things like avatars, small icons, etc, but
> certainly not to store files of several MB ... file systems are done for
> that !

_however_, you lose transactional properties when doing this. You can 
land up with a file that's been added/updated where the associated 
transaction in the DB failed or rolled back. Very careful application 
programming and the use of 2 phase commit can provide reliable 
behaviour, but it's not trivial.

This is one area where I envy Microsoft. As they control the file system 
and the database, they've been able to come up with a really cool system 
where the file system integrates into database transactions, so you kind 
of get the best of both worlds. Very cool. If reiser4 hadn't gone the 
way of the dodo such a thing might've become possible on Linux, but I'm 
not aware of any other Linux file systems that safely support transactions.

--
Craig Ringer


Re: column type for pdf file

From
Piotr Czekalski
Date:
Right!
The external binary file storage has another advantage (that may be 
considered as disadvantage as well) - it is usually easier to develop 
and test mechanism as you're able to browse uploaded file result using 
os / application or even "exchange" file contents for test purposes, 
while in case of iternal storage you need some extra code to do it for 
you and you newer know, if it works well. Another problem is caching / 
feeding files in case of heavy load condition - caching of huge objects 
is problematic while you never know the dimension of the problem, 
non-caching approach may cause bottlenecks on frequent database reads.
I was considering both scenarios and finally've choosen external storage 
for our invoice generation system.

Regards,

Piotr



Re: column type for pdf file

From
Emi Lu
Date:
Hello all,

All right, it seems that everyone thinks saving a pdf into postgresql is 
not a good idea.

My situation is:
=====================
. pdf file: 500kb
. One year I need to save around 65 files = 32M

As a summary, disadvantages are:
==================================
. Memory issue when read/save/retrieve the file
. Consume connections
. Increase load
. during transaction lo may be lost?
. file systems do better than DB   . storing a reference to a file stored outside the database is 
preferable

If I miss anything, please add them.

I am curious, for what circumstances, should the lo be used in postgresql?

Thanks a lot!
Emi







On 05/19/2011 02:00 AM, Piotr Czekalski wrote:
> Right!
> The external binary file storage has another advantage (that may be
> considered as disadvantage as well) - it is usually easier to develop
> and test mechanism as you're able to browse uploaded file result using
> os / application or even "exchange" file contents for test purposes,
> while in case of iternal storage you need some extra code to do it for
> you and you newer know, if it works well. Another problem is caching /
> feeding files in case of heavy load condition - caching of huge objects
> is problematic while you never know the dimension of the problem,
> non-caching approach may cause bottlenecks on frequent database reads.
> I was considering both scenarios and finally've choosen external storage
> for our invoice generation system.
>
> Regards,
>
> Piotr
>
>


-- 
Emi Lu, ENCS, Concordia University, Montreal H3G 1M8
emilu@encs.concordia.ca        +1 514 848-2424 x5884


Re: column type for pdf file

From
Karsten Hilbert
Date:
On Thu, May 19, 2011 at 09:39:54AM -0400, Emi Lu wrote:

> All right, it seems that everyone thinks saving a pdf into postgresql
> is not a good idea.

No.

> As a summary, disadvantages are:
> ==================================
> . Memory issue when read/save/retrieve the file
> . Increase load

Those can be a consideration, yes.

> . Consume connections

You will need a connection to the database anyway. There's
no need to use another one.

> . during transaction lo may be lost?

Huh ?

> . file systems do better than DB

That's not a fact but rather an assertion. And a partial one
at that.

You may want to search the archives a bit as this has been
discussed at length various times.

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: column type for pdf file

From
Craig Ringer
Date:
On 05/19/2011 09:53 PM, Karsten Hilbert wrote:
> On Thu, May 19, 2011 at 09:39:54AM -0400, Emi Lu wrote:
>
>> All right, it seems that everyone thinks saving a pdf into postgresql
>> is not a good idea.
>
> No.

Hardly everyone. You lose transaction safety when using file system 
storage outside the DB, you need another way to talk to the server than 
just the Pg connection, and most importantly your backups become more 
complicated because you have two things to back up.

It's not simple, and it depends a lot on how much the data changes, how 
big the files are, etc.

--
Craig Ringer


Re: column type for pdf file

From
Emi Lu
Date:
Craig, Karsten,

> Hardly everyone. You lose transaction safety when using file system
> storage outside the DB, you need another way to talk to the server than
> just the Pg connection, and most importantly your backups become more
> complicated because you have two things to back up.
>
> It's not simple, and it depends a lot on how much the data changes, how
> big the files are, etc.

The situation is:
======================
. pdf file size: 500kb
. 65 files per year = 32M
. operation: read/save/remove  but the total file number is around 65 files per year

How is the above case, saving pdf files into psql8.3 is an acceptable way?

Thank you,
Emi


Re: column type for pdf file

From
"Ross J. Reedstrom"
Date:
On Wed, May 18, 2011 at 05:06:36PM -0600, Eric McKeeth wrote:
> On Wed, May 18, 2011 at 2:20 PM, Emi Lu <emilu@encs.concordia.ca> wrote:
> 
> > Hello,
> >
> > To save pdf files into postgresql8.3, what is the best column type?
> >
> > bytea, blob, etc?
> >
> > Thank you,
> > Emi
> >
> 
> Everyone else has pointed out reasons for not doing this, and I agree with
> them that in the large majority of cases just storing a reference to a file
> stored outside the database is preferable. However, to answer the question
> you asked, my rule of thumb is that if you need to store binary data in the
> database is to use a bytea column, unless you need the random access
> capabilities that the large object interface provides. A bytea column is
> typically easier to use, and has proper transactional behavior, enforcement
> of referential integrity, etc.
> 

I'm with Eric on this one: for smaller use cases, the convenience of bytea
in the db is nice. As to random access, I wrote a client-side wrapper
for our middleware that implements a file iterator interface for python
on top of substr(bytea,position,blocksize). I was sort of surprised at
how well it performed. We're using it in production right now.

I actually store files in a leaf table w/ and id and hash, with
filenames in a separate linking table, so I'm even getting data
deduplication (all the rage in biz these days) for free.

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE