Thread: 4B row limit for CLOB tables
Hello. I see on this page a mention of basically a 4B row limit for
tables that have BLOBs:
https://wiki.postgresql.org/wiki/BinaryFilesInDB

Is this fact mentioned in the documentation anywhere? Is there an
official source for this? (If not, maybe consider this a feature
request to mention it in the documentation on BLOB.)

Cheers and thanks.
-roger-
On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:
> Hello. I see on this page a mention of basically a 4B row limit for
> tables that have BLOBs

Oops, I meant for BYTEA or TEXT columns, but it's possible the
reasoning is the same...

> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
> Is this fact mentioned in the documentation anywhere? Is there an
> official source for this? (If not, maybe consider this a feature
> request to mention it in the documentation on BLOB).
> Cheers and thanks.
> -roger-
On 01/29/2015 09:51 AM, Roger Pack wrote:
> Hello. I see on this page a mention of basically a 4B row limit for
> tables that have BLOBs
> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
> Is this fact mentioned in the documentation anywhere? Is there an
> official source for this? (If not, maybe consider this a feature
> request to mention it in the documentation on BLOB).

Take a look at:
http://www.postgresql.org/about/

Pretty sure it has to do with this:

Maximum Table Size: 32 TB

> Cheers and thanks.
> -roger-

--
Adrian Klaver
adrian.klaver@aklaver.com
On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2@gmail.com> wrote:

> On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:
>> Hello. I see on this page a mention of basically a 4B row limit for
>> tables that have BLOBs
>
> Oops I meant for BYTEA or TEXT columns, but it's possible the
> reasoning is the same...

It only applies to large objects, not bytea or text.

>> https://wiki.postgresql.org/wiki/BinaryFilesInDB

Some of that looks incorrect or out of date (e.g. large objects can be
a lot bigger than 2GB in 9.3+).

>>
>> Is this fact mentioned in the documentation anywhere? Is there an
>> official source for this? (If not, maybe consider this a feature
>> request to mention it in the documentation on BLOB).
>> Cheers and thanks.
>> -roger

I'm not sure whether it's mentioned explicitly, but large objects are
referenced by an OID, which is a 32-bit value (and a global resource).

If you had 4B BLOBs, though, running out of OIDs would probably be the
least of your worries.

Cheers,
  Steve
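[Editor's note: Steve's "32-bit value" point can be sketched numerically. The toy model below is an illustration of a wrapping 32-bit counter, not PostgreSQL's actual allocator; the real one also skips reserved low values (`FirstNormalObjectId`), which is an implementation detail not covered here.]

```python
# Toy model of a 32-bit OID counter that wraps around.
# Illustration only: PostgreSQL's real allocator additionally skips
# low, reserved OID values used by system catalogs.
OID_MAX = 2**32 - 1  # OIDs are unsigned 32-bit integers

def next_oid(current):
    """Return the next OID, wrapping past 2^32 and skipping 0 (InvalidOid)."""
    nxt = (current + 1) % (2**32)
    return nxt if nxt != 0 else 1

assert next_oid(41) == 42
assert next_oid(OID_MAX) == 1  # the counter wraps rather than failing
```

The wrap itself is silent; as discussed below, the failure only appears when a wrapped value collides with an OID already in use in the same context.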
On Thu, 29 Jan 2015 10:41:58 -0800
Steve Atkins <steve@blighty.com> wrote:

> >> Is this fact mentioned in the documentation anywhere? Is there an
> >> official source for this? (If not, maybe consider this a feature
> >> request to mention it in the documentation on BLOB).
> >> Cheers and thanks.
> >> -roger
>
> I'm not sure whether it's mentioned explicitly, but large objects are
> referenced by an OID, which is a 32 bit value (and a global resource).
>
> If you had 4B BLOBs, though, running out of OIDs would probably be
> the least of your worries.

Because of how other RDBMSs use BLOB-ish types, I think a lot of people
get confused about when to use bytea vs. a large object in
PostgreSQL ... and as a result, end up using large objects more often
than is really necessary.

Large objects are for LARGE data ... keep in mind that a bytea column
can store up to 1GB of data. While that seems excessive, it's perfectly
reasonable to use it to store images and other data that's frequently
in the "several megabytes" range.

In general, if you can transfer the entirety of the data in a single
shot, then bytea will work fine for you. Large objects are for
something more like streaming media, where it's impractical to hold the
entire file in memory, even for a short time. I.e. you'd read ~100k
from the DB into application memory, do processing on that data, then
discard it and read another 100k.

While large objects certainly fill a niche and for some uses are the
only way to make things work, I have never actually seen an
implementation where large objects were the right solution to the
problem. (Of course, I've never worked in the movie or music industry.)
But I've found that bytea is usually the correct storage method for
things like PDF files, word-processor files, images, etc ... anything
where the entire file needs to be delivered before it can be used.

--
Bill Moran
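[Editor's note: the chunked access pattern Bill describes ("read ~100k, process, discard") can be sketched in plain Python. `io.BytesIO` stands in here for a real large-object handle (e.g. successive `lo_read` calls); the chunk size and names are illustrative.]

```python
import io

CHUNK = 100 * 1024  # ~100k per read, as in the pattern described above

def process_stream(stream):
    """Read a large value in fixed-size chunks, process each chunk, then
    discard it -- the whole value is never held in memory at once."""
    total = 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            break
        total += len(chunk)  # stand-in for real per-chunk processing
    return total

# Pretend this in-memory buffer is a large object streamed from the DB:
data = io.BytesIO(b"x" * (CHUNK * 3 + 17))
assert process_stream(data) == CHUNK * 3 + 17
```

By contrast, a bytea value arrives in one shot: the driver hands the application the whole value, which is exactly why it fits the "several megabytes, delivered whole" cases Bill lists.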
Forgot to reply-all on this one; many thanks to Steve, Adrian and Bill
for their answers.

On Jan 29, 2015, at 12:32 PM, Roger Pack <rogerdpack2@gmail.com> wrote:

> On 1/29/15, Steve Atkins <steve@blighty.com> wrote:
>> On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2@gmail.com> wrote:
>>> On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:
>>>> Hello. I see on this page a mention of basically a 4B row limit for
>>>> tables that have BLOBs
>>>
>>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>>> reasoning is the same...
>>
>> It only applies to large objects, not bytea or text.
>>
>>>> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>>
>> Some of that looks incorrect or out of date (e.g. large objects can
>> be a lot bigger than 2GB in 9.3+).
>>
>>>> Is this fact mentioned in the documentation anywhere? Is there an
>>>> official source for this? (If not, maybe consider this a feature
>>>> request to mention it in the documentation on BLOB).
>>>> Cheers and thanks.
>>>> -roger
>>
>> I'm not sure whether it's mentioned explicitly, but large objects are
>> referenced by an OID, which is a 32 bit value (and a global resource).
>
> Thanks for the info, precisely what I was looking for.
> As a follow-up, could you elaborate on what you mean by "global
> resource"? I believe OIDs are generated from a global counter. Does
> this mean the maximum number of large objects in the database is 4B?

Well, OIDs are generated from a global counter, but there's nothing to
stop that wrapping around.

OIDs are used in a lot of places in the system - to identify tables,
functions, loaded modules and suchlike - and duplicates are prevented
by unique indexes or similar. But that means that if the OID counter
were to wrap around and return an OID that was already in use in the
same context, then the attempt to use it would fail.
For instance, you might try to create a table, and it would fail
because the "next" OID was already used to specify another table.
Wrapping the OID counter around will cause all sorts of things to
break.

Use of OIDs by the user (as opposed to by PostgreSQL itself for
internal bookkeeping) has been deprecated for years. That's one reason,
but not the only reason, that I don't believe anyone should ever use
the PostgreSQL large object infrastructure.

For small (megabytes rather than gigabytes) chunks of data that might
be processed in the database or might not, bytea or text types are the
right thing. For anything larger, or anything that's not actually
processed within the database (e.g. images or PDFs handled by a
webapp), leaving the file on the filesystem and just storing metadata
in the database is usually the right thing.

> If you actually had that many BLOBs (and the counter wrapped) I
> assume that lo_create would start failing [i.e. it has some kind of
> uniqueness constraint on the OID]? Or something like that?

lo_create() would fail, but so would the rest of the database. Nothing
would work. A billion large objects is too many. (Exactly a billion too
many, in fact.)

Cheers,
  Steve
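[Editor's note: Steve's "file on the filesystem, metadata in the database" approach boils down to recording a path, size, and content digest instead of the bytes themselves. A minimal sketch, with illustrative field names (the source does not prescribe a schema); sha256 is one reasonable digest choice, not mandated by the thread.]

```python
import hashlib
import os
import tempfile

def file_metadata(path):
    """Collect the metadata you'd store in the database for a file that
    stays on the filesystem: its path, size, and a content digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)  # hash incrementally, never loading it whole
    return {"path": path,
            "size": os.path.getsize(path),
            "sha256": h.hexdigest()}

# Example with a throwaway file standing in for an image or PDF:
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as f:
    f.write(b"%PDF-1.4 dummy")
    name = f.name
meta = file_metadata(name)
assert meta["size"] == 14
os.unlink(name)
```

The row stored in the database then carries only `meta`, so the 1GB bytea field limit and the large-object OID space never come into play for the payload itself.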
On 1/29/15, Steve Atkins <steve@blighty.com> wrote:
> On Jan 29, 2015, at 9:53 AM, Roger Pack <rogerdpack2@gmail.com> wrote:
>> On 1/29/15, Roger Pack <rogerdpack2@gmail.com> wrote:
>>> Hello. I see on this page a mention of basically a 4B row limit for
>>> tables that have BLOBs
>>
>> Oops I meant for BYTEA or TEXT columns, but it's possible the
>> reasoning is the same...
>
> It only applies to large objects, not bytea or text.

OK, I think I figured out possibly why the wiki says this. I guess
BYTEA entries > 8KB will be automatically stored via TOAST, which uses
an OID in its backend. So BYTEA has the same limitation. It appears
that disabling TOAST is not an option [1]?

So I guess if the number of BYTEA entries (summed over all tables?
partitioning doesn't help?) with size > 8KB is > 4 billion, then there
is actually no option there? (I'm not running into that case, just
conjecturing.)

Thanks!
-roger-

[1] http://www.postgresql.org/message-id/20130405140348.GC4326@awork2.anarazel.de
> I'm not sure whether it's mentioned explicitly, but large objects are
> referenced by an OID, which is a 32 bit value (and a global resource).

A large object does not have to take its OID from the global counter
since 8.1. You can assign arbitrary 32-bit integers as long as they are
unique in the pg_largeobject table.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp