Thread: Large Objects
I need to do some investigation into the way Postgres handles large
objects for a major project involving large objects. My questions are:

* Can large objects be stored within a field value or must they be
  referenced by OID?
* Are large objects backed up in the normal way or does special action
  have to be taken?
* If the objects are restored or migrated, will they retain their OID?
* If not, is there another means of referencing them that would be
  persistent through migrations?
* Is it practical/desirable to store files MIME-encoded inside a text
  field?
  * The obvious advantages:
    * definitely portable across migrations and backups
    * based on universal technology
    * easy to code
    * easy to reference and version-control
  * The obvious disadvantages:
    * slow, Slow, SLOW
    * significant increase in per-file storage requirements

Any help would be greatly appreciated. Hope I'm posting to the right
list.

Dan
On Mon, Dec 27, 2004 at 10:39:48 -0600, Dan Boitnott <dan@mcneese.edu> wrote:
> I need to do some investigation into the way Postgres handles large
> objects for a major project involving large objects. My questions are:

I don't know the answer to all of your questions.

> * Is it practical/desirable to store files MIME-encoded inside a
>   text field?

This should be possible if the files aren't too large. bytea is another
type that might be better to use.

> * The obvious disadvantages:
>   * slow, Slow, SLOW

If you always need to access the whole file this might not be too bad.
But if you only need to access a small part, you are going to pay a big
cost, as the whole record will need to be retrieved before you can pick
out the part you want.

> * significant increase in per-file storage requirements

It might not be too bad, as large records can be compressed. That
should get back some of the bloat from the uuencoding.
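The encoding bloat discussed above is easy to quantify: base64, the
usual MIME transfer encoding, emits 4 output bytes for every 3 input
bytes, so storage grows by roughly a third. A quick stdlib-only sketch
(illustrative, not from the thread):

```python
import base64
import os

raw = os.urandom(300_000)          # ~300 kB of incompressible "file" data
encoded = base64.b64encode(raw)    # MIME-style base64 transfer encoding

# base64 maps every 3 input bytes to 4 output bytes: ~33% bloat.
overhead = len(encoded) / len(raw)
print(f"encoded size is {overhead:.2f}x the original")
```

Note that this bloat is exactly what row compression (TOAST) can claw
back, since base64 output is highly compressible even when the
underlying file is not.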
I'd advise use of BYTEA as well. It's much simpler to work with than
the OIDs, and has simpler semantics. You do need to escape data before
handing it to the query string, and handle escaped results (see the
docs), but overall it's much nicer than working with OIDs.

On Dec 31, 2004, at 1:21 AM, Bruno Wolff III wrote:
> This should be possible if the files aren't too large. bytea is
> another type that might be better to use.

-----------------------------------------------------------
Frank D. Engel, Jr. <fde101@fjrhome.net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$
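The escaping Frank mentions refers to PostgreSQL's classic bytea
"escape" input format, where backslashes and non-printable octets are
written as octal escapes. A rough stdlib sketch of that format (the
form the server sees after literal parsing; in practice a driver such
as libpq or psycopg handles all of this for you, so treat this as
illustration only):

```python
def bytea_escape(data: bytes) -> str:
    """Render raw bytes in PostgreSQL's classic bytea escape format:
    backslash becomes \\\\, and non-printable octets (plus the single
    quote, to stay literal-safe) become \\ooo octal escapes."""
    out = []
    for b in data:
        if b == 0x5C:                            # backslash
            out.append('\\\\')
        elif 0x20 <= b <= 0x7E and b != 0x27:    # printable, not quote
            out.append(chr(b))
        else:                                    # everything else -> \ooo
            out.append('\\{:03o}'.format(b))
    return ''.join(out)

print(bytea_escape(b"abc\x00\\"))   # -> abc\000\\
```

Getting this (and the extra layer of quoting inside a string literal)
right by hand is exactly why the advice in this thread is to lean on
the driver's escaping functions rather than building query strings
yourself.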
Frank D. Engel, Jr. wrote:
> I'd advise use of BYTEA as well. It's much simpler to work with than
> the OIDs, and has simpler semantics. You do need to escape data
> before handing it to the query string, and handle escaped results
> (see the docs), but overall much nicer than working with OIDs.

BYTEA is not always pragmatic. What if the file is 100 megs? 256 megs?

pg_largeobject is more efficient than BYTEA for larger binaries.

Sincerely,

Joshua D. Drake
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
Joshua D. Drake wrote:
> Frank D. Engel, Jr. wrote:
>> I'd advise use of BYTEA as well. It's much simpler to work with
>> than the OIDs, and has simpler semantics.
>
> BYTEA is not always pragmatic. What if the file is 100 megs? 256
> megs?

Interesting. At what size does bytea become ineffective?

Currently I keep all my product images in bytea records. Is that
practical? How much slower is it than accessing an image on a file
system (like ext3)?

Cheers
--
--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Cel: 972-52-8555757
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------
On Sat, Jan 01, 2005 at 01:28:04PM +0300, Michael Ben-Nes wrote:
> Interesting.
> At what size does bytea become ineffective?

I don't think it's so much a matter of effectiveness; it makes no
difference at all in storage space. The issue is that if you store it
in a field, accessing it becomes an all-or-nothing affair, which means
if it's a 100MB object, it's all going to be accessed whenever you ask
for it. OTOH, large objects have lo_read/write/seek, meaning you can
access small parts at a time.

So I imagine if you're storing large PDF files and all you're doing is
dumping them to a client when they ask, it doesn't matter. But if the
objects have structure and you might be interested in looking inside
them without pulling the whole object down, the LO interface is better
suited.

When you delete a row, the object contained in it goes away too. Large
objects have a lifecycle outside of normal table values, and so may
need separate managing...

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent
> is a tool for doing 5% of the work and then sitting around waiting
> for someone else to do the other 95% so you can sue them.
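The lo_read/lo_lseek access pattern described above is the same
random-access pattern an ordinary file handle gives you. By analogy, a
stdlib-only sketch (not the libpq API itself) of whole-value fetch
versus seek-and-read:

```python
import io

blob = bytes(range(256)) * 4096          # ~1 MB stand-in for a stored object

# bytea-style access: the whole value comes back before you can slice it.
whole = io.BytesIO(blob).read()
part_from_whole = whole[500_000:500_016]

# large-object-style access: seek to the region and read only what you
# need, the way lo_lseek()/lo_read() let a client touch a small slice
# of a big large object.
handle = io.BytesIO(blob)
handle.seek(500_000)
part = handle.read(16)

assert part == part_from_whole
print(len(whole), len(part))             # 1048576 16
```

The two reads return identical bytes; the difference is that the first
pattern materializes the entire megabyte to get 16 bytes out of it.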
Michael Ben-Nes wrote:
> Interesting. At what size does bytea become ineffective?
>
> Currently I keep all my product images in bytea records. Is that
> practical?

Well, I am going to make the assumption that your product images are
small... sub-100k or something. Bytea is just fine for that. The
problem is when the binary you want to store is 50 megs. When you
access that file you will be using 50 megs of RAM to do so.

Large objects don't work that way; you don't have the memory overhead.
So it really depends on what you want to store.

> How much slower is it than accessing an image on a file system (like
> ext3)?

Well, that would be an interesting test. Ext3 is very slow, but I
would assume it would still be faster, just because of the database
overhead. However, you gain flexibility and manageability from having
the images in the database.

Sincerely,

Joshua D. Drake
> > > BYTEA is not always pragmatic. What if the file is 100 megs?
> > > 256 megs?
> > At what size does bytea become ineffective?
> I don't think it's so much a matter of effectiveness; it makes no
> difference at all in storage space.

Ah, thanks, good to know. Something new to learn every day...

> The issue is that if you store it in a field, accessing it becomes
> an all-or-nothing affair, which means if it's a 100MB object, it's
> all going to be accessed whenever you ask for it.

At least for reads you are wrong. You can use substring() on bytea
quite nicely. Remember, however, that it operates on *bytes*, not
characters. Also be careful about encodings being set for the
connection. At least with PostgreSQL < 7.4 we found we had to "reset
client_encoding" despite the docs saying encodings won't affect bytea
field reads.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
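The bytes-versus-characters caveat above mirrors byte slicing in any
language: a byte offset can land in the middle of a multibyte
character. A quick Python analogy (not a Postgres call, just an
illustration of the hazard):

```python
text = "naïve"                 # 5 characters...
data = text.encode("utf-8")    # ...but 6 bytes: 'ï' is 2 bytes in UTF-8

print(len(text), len(data))    # 5 6

# Taking the first 3 *bytes* splits the 2-byte 'ï' in half, which is
# exactly what substring() on bytea can do if you treat it as text.
chunk = data[:3]
try:
    chunk.decode("utf-8")
except UnicodeDecodeError:
    print("byte slice split a character")
```

For binary files this is harmless (bytes are what you want); it only
bites when you expect character semantics from a byte-oriented type.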
On Jan 1, 2005, at 11:40 AM, Joshua D. Drake wrote:
> Large objects don't work that way; you don't have the memory
> overhead. So it really depends on what you want to store.

I prefer the _idea_ of using large objects but am worried about the
implications. Without them I can back up the database using pg_dump
and get a single tar file which can perfectly represent the database.
This gives me (and those on high) the warm fuzzies. If I store files
(PDFs of varying sizes, by the way, say from 500k to 50M) as large
objects, will I still be able to restore the _whole_ database from a
single pg_dump tar file?
On Sat, 2005-01-01 at 19:50 -0600, Dan Boitnott wrote:
> If I store files (PDFs of varying sizes, by the way, say from 500k
> to 50M) as large objects, will I still be able to restore the
> _whole_ database from a single pg_dump tar file?

Yes, you will be able to do this. See the pg_dump docs:
http://www.postgresql.org/docs/current/static/app-pgdump.html

    -b
    --blobs

        Include large objects in dump.

-Robby
--
/***************************************
 * Robby Russell | Owner.Developer.Geek
 * PLANET ARGON  | www.planetargon.com
 * Portland, OR  | robby@planetargon.com
 * 503.351.4730  | blog.planetargon.com
 * PHP/PostgreSQL Hosting & Development
 *   --- Now supporting PHP5 ---
 ****************************************/
> This gives me (and those on high) the warm fuzzies. If I store files
> (PDFs of varying sizes, by the way, say from 500k to 50M) as large
> objects, will I still be able to restore the _whole_ database from a
> single pg_dump tar file?

Don't forget one thing: if you put a webserver in front of this, you
will have to load the entire object in RAM before you can serve it to
the client, which means a small number of requests on large files will
kick the server into swap-a-lot mode. Using filesystem files solves
this in the easiest of ways, but creates backup problems, though those
aren't that complex to handle.