Re: Large objects [BLOB] again - general howto - Mailing list pgsql-general

From Jeremiah Jahn
Subject Re: Large objects [BLOB] again - general howto
Date
Msg-id 1069800564.5827.1508.camel@bluejay.goodinassociates.com
Whole thread Raw
In response to Re: Large objects [BLOB] again - general howto  ("Randolf Richardson, DevNet SysOp 29" <rr@8x.ca>)
List pgsql-general
Here's a quick list of my experiences with BLOB's and such.

Performance is just fine, I get about 1M hits a month and haven't had
any problems. Use a BLOB if you don't need to search though the data.
The main reason being that bytea and text types are parsed. To explain,
your entire SQL statement has to be parsed to make sure it's valid, this
includes you 100 meg jpeg. Slow, REAL slow.

I have found that it is best to have a separate connection for BLOB's
and one for everything else. Mind you, this is with Java, but the
autocommit settings on the connection don't appear to be thread safe, so
in high traffic you can accidentally cut off a transfer, or stop one
before it ever starts.

-jj-

On Sun, 2003-11-23 at 10:48, Randolf Richardson, DevNet SysOp 29 wrote:
> > Someone asks about performance of Large objects [LO] in PgSql [PG].  It
> > was interesting for me, because I didn't work yet with them yet and I
> > will have to soon. I tried search the web, doc and mailinglists, but I
> > didn't found any adequate reply. I would be happy, of someone, who have
> > experimence with them (or who know PG internals :) can make it clean. I
> > think I am not alone, who is interested in this topic.
>
>         You're certainly not alone.  I haven't done this yet, but will need to
> in the near future with a web-based database project I'm in the midst of
> planning at the moment which will involve allowing users to upload PNG, GIF
> and JPeG images and having them displayed in a sort of "picture album" type
> layout.
>
> > In past time, where row data was limited by single tupple (8K-32K), LO
> > was really needed, if you wanted to store more than 8K per row. Older
> > implementation issues are described in documentation.
>
>         I'm somewhat concerned about this, but not because of the large object
> support (which I understand is handled by not storing the object data
> directly in the row, thus circumventing this limitation altogether), rather
> I think about scalability with large rows that have a lot of columns that
> sum up to more than 32,768 bytes in size.
>
> > Today, there is 1G limit per row. Is there reason to use LO instead of
> > storing simple 'text' into database table ? (let's ignore 2G limit on
>
>         That depends on the needs of your application.  If all you're storing
> is text data, then the choice is yours.  If you're storing binary data,
> then simple text probably won't be appropriate.
>
> > LO, which can make LO interesting in some cases :)Documentation says it
> > is obsolvete. But documentation also doesn't tell anything about
> > performance, suggestion about using, etc.
> >
> >     By 'access' to binary file, I understand:
> >
> >         - reading (and then working with its contents on server side)
> >         - sending (read and send as-is to client, without workaround)
> >         - updating (from contents which I get from somewhere)
> >
> >     LO means storing file using large objects,
> >     column means storing file as 'text' column (standard way).
>
>         I don't know about this.  Hopefully someone else who has experience in
> this area will jump in here and clear this matter up.
>
> > Let's say I am having 'file' - smaller (tenhts of Kb) - let's say some
> > JPG image (photo of beer :) The file is 'readed' as is and passed to
> > client (browser, app, whatever). In this case, would be probably better
> > to store file as normal column, because it is easier to access it. This
> > is one of typical use. Every time I will write to that image, SQL parser
> > need to parse all datas, allocate proper memory, .... I think, real
> > differnece will not be too big, since picture having just tenths of Kb.
>
>         If you can ensure that the image size will never exceed the width of
> your column, then you'll be okay, but note that image sizes can vary widely
> depending on a number of factors in addition to image dimensions, one of
> which that is often overlooked is the possibility of poorly designed
> compression algorithms in the applications used to generate the JPeG image
> or the user selecting minimal compression because they want a better
> quality image (I do this sometimes in the web development I do, but only
> when it's appropriate).
>
>         Remember, however, that storing binary data in a column intended for
> text might cause some problems down the road (even if it doesn't now) since
> it will contain unexpected data.  You probably should look into the text
> encoding systems out there to see if there could be some potential problems
> with, for example, your data being misinterpreted as Chinese text.
>
> > But let's have the same picture, but in pretty fine resolution for
> > printing. File size will be megabytes and more [let's say 10Mb].
> > Ofcourse, file should be stored in DB because of data integrity. In this
> > case, we will need pretty much memory while reading/updating. All the
> > time. And if I will get many similar requests - and they will go
> > directly to DB, not throw sime 'local cache', it can raise problem.
>
>         If you can't trust your underlying file system to store files
> reliably, then you really shouldn't store your database their either.
>
>         If you're concerned about the transactional side of things, then your
> concern is definitely warranted.
>
> [sNip -- other questions I don't know the answers for, but would like to]
> > I wish new PgSql 7.4 documentation (as far I know it was not yet
> > released) contain small chapter / paragraph at LO about this. If someone
> > is new to PG (like me), there is no enough information about 'using' LO.
>
>         I'd like to see this too.
>
> > Thank you for corrections and comments about LO/columns binary storing.
> [sNip]
>
>         Thanks for asking.  I've been meaning to ask about some of this
> myself, but you've saved me the trouble.  =)
--
"You can't make a program without broken egos."
--
Jeremiah Jahn <jeremiah@cs.earlham.edu>


pgsql-general by date:

Previous
From: "D. Stimits"
Date:
Subject: performance versus order of fields in row
Next
From: Brian Hirt
Date:
Subject: plpgsql question