Thread: Re: 500KB PDF files for postgresql8.3, which is the most efficient way?

Hello,

I'd like to have more comments about the following case:

> . 500KB per PDF file; 30 files per year
> . PSQL8.3
   . struts2.2.3 + mybatis for sql operation
   . tomcat6

Added more info

  Solution:
  ====================
  (1) Save pdfs to file system, only point file name in psql8.3

  (2) Save oids of pdfs into table

  (3) Save pdf files as bytea column in psql8.3


Pros and cons for (1), (2), (3), which is the most efficient way?

Thanks a lot!
Emi


Re: Re: 500KB PDF files for postgresql8.3, which is the most efficient way?

From
Samuel Gendler
Date:


On Fri, May 27, 2011 at 9:15 AM, Emi Lu <emilu@encs.concordia.ca> wrote:

 Solution:
 ====================
 (1) Save pdfs to file system, only point file name in psql8.3

 (2) Save oids of pdfs into table

 (3) Save pdf files as bytea column in psql8.3


Pros and cons for (1), (2), (3), which is the most efficient way?

You asked about the "most efficient" but you didn't tell us what you are doing with the pdf files.  It really doesn't matter how you store it if you are simply inserting a 500KB object 30 times per year - then it becomes all about convenience.  I'd probably rule out solution #1 in that case purely from a maintenance perspective.  Storing the PDFs in the db eliminates the maintenance work of ensuring that the files are archived along with the db backups, that file paths stay correct whenever a new db host is provisioned, and storing the files in the db provides transaction-safe file storage.  If, on the other hand, you are selecting/updating the rows in question several hundred times per second or more, then you may well want to put some thought into efficiency and, assuming some kind of web access, providing your webserver/caching layer with direct access to the files on the filesystem for service efficiency.  But if you are only occasionally accessing the files in question, there's little reason not to put them in the db.  

If mybatis can't load the object that references the file without also loading the entire file from the db - and you are using that object for other things that will require frequent loading/storing - then you probably don't want the files in the db.  If it can load the object without loading the binary data, by lazily loading that column only when requested, then it doesn't matter.  Using BLOBs guarantees that you can access the entire row without loading the binary data if mybatis exposes the blob separately from the rest of the object, but mybatis may be capable of issuing a select without that column and then grabbing that column as needed in order to simulate that in the case of a bytea column, anyway.  Internally, a large bytea column is treated similarly to a blob, with any data over a certain size not stored in-line with the rest of the row for efficiency reasons.



Re: Re: 500KB PDF files for postgresql8.3, which is the most efficient way?

From
Piotr Czekalski
Date:
Hello Emi Lu,

The detailed discussion was just about two weeks ago on the plpgsql 
list, worth of glance.
Actually it is quite difficult to say which one is better in your case 
as it depends on utility (rapid file hostin or just storing but 
transaction safety is important, etc.).
Generally it seems that your requirements are rather small.

So briefly, my point of view on it:
(1) Pro:
- easy to implement,
- more efficient when storing and retrieving files than other solutions
- easy hosting over dedicated file server (with caching i.e.)
- DB size is small, files are separated - important on shared hosting 
where DB size limit is usually much smaller than file storage limit.
Cons:
- nontransactional - actually you may try to create double commit but it 
still may fail as it is not "atomic" operation, on two (FS and DB)
- more complicated to backup and restore.

(2) no opinion on mine

(3)
Pro:
- consistent,
- smart solution ;-)
- easy to backup / restore
- transactional
Cons:
- harder to implement storing and retrieving files
- may be non efficient (usually requires app side caching)
- DB size is bigger

Regards,

P.

-- 

--------------------------------------------------------------
"TECHBAZA.PL" Sp. z o.o.
Technologie WEB, eDB&  eCommerce
tel. (+4832) 7186081
fax. (+4832) 7003289
email: biuro@techbaza.pl
web: http://www.techbaza.pl
web: partner.techbaza.pl
web: ekspert.techbaza.pl