Thread: 4 billion record limit?
PostgreSQL uses a hidden "oid" field to uniquely identify all records across all tables in all databases on a particular server. On my platform, this number is an unsigned 32-bit integer, meaning 0 to about 4 billion.

1. This implies a hard limit of 4 billion records on a server, right?

2. When I delete records, are those oids ever reused?

--
Paul Caskey paul@nmxs.com Software Engineer
New Mexico Software 5041 Indian School NE Albuquerque, NM 87110
--
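The range Paul describes is just that of an unsigned 32-bit integer. A quick sanity check of the arithmetic (illustrative Python, not anything from the PostgreSQL sources):

```python
# Range of an unsigned 32-bit integer, as used for OIDs on Paul's
# platform. Illustrative arithmetic only.
MAX_UINT32 = 2**32 - 1

print(MAX_UINT32)             # 4294967295, i.e. "about 4 billion"

# Unsigned overflow wraps modulo 2**32, so one past the maximum is 0:
wrapped = (MAX_UINT32 + 1) % 2**32
print(wrapped)                # 0
```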
I'm not too familiar with the postgres internals, but the oid tells postgres where the row is stored in the filesystem. So by the time you've run out of oid's you've probably run out of space on your filesystem. Paul Caskey wrote: > > PostgreSQL uses a hidden "oid" field to uniquely identify all records > across all tables in all databases on a particular server. On my > platform, this number is an unsigned 32-bit integer, meaning 0 to about 4 > billion. > > 1. This implies a hard limit of 4 billion records on a server, right? > > 2. When I delete records, are those oids ever reused? > > -- > Paul Caskey paul@nmxs.com Software Engineer > New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 > --
Paul Caskey wrote:
> PostgreSQL uses a hidden "oid" field to uniquely identify all records
> across all tables in all databases on a particular server. On my
> platform, this number is an unsigned 32-bit integer, meaning 0 to about 4
> billion.
>
> 1. This implies a hard limit of 4 billion records on a server, right?

Basically, yes.

> 2. When I delete records, are those oids ever reused?

No. The real solution is to support 64 bit oids, but that has not been done yet.
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> Paul Caskey wrote:
>> 1. This implies a hard limit of 4 billion records on a server, right?

> Basically, yes.

It's only a hard limit if your application assumes OIDs are unique. If you don't assume that, then I think it's not a big problem.

It's possible (though obviously not especially likely) that you might get OID collisions in the system tables after an OID-counter wraparound. I believe that in 7.0, any such collision will result in a "duplicate key" error, because we have unique indexes on OID for all the system tables where it matters. So worst case is you'd have to retry a table creation or whatever the command was. This could be annoying but it's certainly not catastrophic.

> The real solution is to support 64 bit oids, but that has not been done
> yet.

64-bit OIDs are probably the Final Solution here, but we're not in a big hurry to do that because of the space cost and portability issues. Another idea that's been discussed is to make OID column optional in user tables --- then, simply not using OIDs in your large tables would be enough to make the 32-bit OID supply last indefinitely.

As long as we're talking about outer limits, it might be worth pointing out that block numbers within tables are 32 bits. With the default blocksize of 8K, that sets a hard limit of either 16Tb or 32Tb per table, depending on whether block-number arithmetic is unsigned or not (I'm not sure). You could push that to 64/128Tb by using BLCKSZ=32K. If that won't do, maybe you need a different DBMS ;-)

regards, tom lane
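Tom's table-size figures follow directly from 32-bit block numbers. A quick check of the arithmetic (Python used purely as a calculator; sizes in binary terabytes):

```python
# Max table size = (number of addressable blocks) * (block size).
# Block numbers are 32 bits, so 2**32 blocks with unsigned arithmetic
# or 2**31 with signed -- hence Tom's "16Tb or 32Tb" uncertainty.
TB = 2**40

def max_table_size_tb(blcksz, signed):
    blocks = 2**31 if signed else 2**32
    return blocks * blcksz // TB

print(max_table_size_tb(8 * 1024, signed=True))    # 16  -> "16Tb"
print(max_table_size_tb(8 * 1024, signed=False))   # 32  -> "32Tb"
print(max_table_size_tb(32 * 1024, signed=True))   # 64  -> "64Tb" with BLCKSZ=32K
print(max_table_size_tb(32 * 1024, signed=False))  # 128 -> "128Tb"
```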
Tom Lane wrote:
> Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> > Paul Caskey wrote:
> >> 1. This implies a hard limit of 4 billion records on a server, right?
>
> > Basically, yes.
>
> It's only a hard limit if your application assumes OIDs are unique.
> If you don't assume that, then I think it's not a big problem.
>
> It's possible (though obviously not especially likely) that you might
> get OID collisions in the system tables after an OID-counter wraparound.

This implies they do wrap around. So they are reused? Chris said no, but you're saying yes.

(Maybe they wrap around "by accident", by adding one to MAXINT, which will give zero on an unsigned int, I believe. Will the system choke on zero? Has anyone tested this wraparound?)

I will not have 4 billion records in one table or even one database. But on a large server with many databases, it is conceivable to have 4 billion records on one machine. With a lot of insert/delete activity, over a few years, it is certainly conceivable to have 4 billion inserts. If the oids don't wrap, I have a problem. I can ignore it for a long time, but it will loom, like Y2K. :-)

Even if they do wrap, if I have some old records lying around with low OIDs, they will trip me up. Like you said, these are "the outer limits", but I'm thinking ahead.

Someone suggested in private that I pg_dump/restore all my data to "repack" the oids, which start around 1700 on a fresh database. Thanks for that idea. Also thanks, Tom, for the sanity check Re: terabytes of data with 4 billion records. It's still possible, especially in coming years.

It would be a big feather in PG's cap to "fully support" 64-bit platforms such as IRIX and Solaris (finally) and, coming soon to a theater near you, Linux on IA-64.

--
Paul Caskey paul@nmxs.com 505-255-1999
New Mexico Software 5041 Indian School NE Albuquerque, NM 87110
--
Joseph Shraibman writes: > I'm not too familiar with the postgres internals, but the oid tells > postgres where the row is stored in the filesystem. So by the time > you've run out of oid's you've probably run out of space on your > filesystem. The `tid' specifies the storage location. The `oid' is more or less an arbitrary number as far as user tables are concerned. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Paul Caskey <paul@nmxs.com> writes:
> Tom Lane wrote:
>> It's only a hard limit if your application assumes OIDs are unique.
>> If you don't assume that, then I think it's not a big problem.
>>
>> It's possible (though obviously not especially likely) that you might
>> get OID collisions in the system tables after an OID-counter wraparound.

> This implies they do wrap around. So they are reused? Chris said no, but
> you're saying yes.

> (Maybe they wrap around "by accident", by adding one to MAXINT, which will
> give zero on an unsigned int, I believe. Will the system choke on zero?
> Has anyone tested this wraparound?)

Yes, yes, and yes ;-). I just committed a patch to prevent the system from generating a zero OID during wraparound. I think that the old code would not have done so in any case, but correct behavior would've depended on the stored OID counter always being a multiple of 32, which seems a tad risky to assume over a long time span. There's a more direct check for wraparound in there now.

(No, I didn't test it the hard way ... just stopped the postmaster and hacked on the OID counter in the pg_variable file by hand ...)

> Even if they do wrap, if I have some old records lying around with a low
> OIDs, they will trip me up.

No doubt about it, you're likely to get a few "duplicate key" errors and stuff like that. I'm just observing that it's not likely to be a complete catastrophe, especially not if you don't rely on OIDs to be unique in your user tables.

We have talked about offering 8-byte OIDs as a compile-time option, and I think it'll happen eventually, but I'm not holding my breath. Lots of other stuff seems more pressing...

regards, tom lane
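The skip-zero behavior Tom describes can be sketched roughly as follows. This is a hypothetical Python model, not the committed patch; among other things, the real backend hands out OIDs in batches of 32 (the "multiple of 32" Tom mentions), which is glossed over here:

```python
# Hypothetical model of a wraparound-safe OID counter: increment modulo
# 2**32, but never hand out OID 0 (the invalid OID).
def next_oid(counter):
    counter = (counter + 1) % 2**32
    if counter == 0:          # explicit wraparound check
        counter = 1           # skip straight past zero
    return counter

print(next_oid(41))           # 42: the normal case
print(next_oid(2**32 - 1))    # 1: wraps without ever yielding 0
```

The key point is that correctness no longer depends on where in the cycle the stored counter happens to sit; the check catches the wrap directly.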
The implications of the OIDs not wrapping are immense. We have some extremely active databases that will easily reach this limit in two or three years. For some applications, such as ecommerce, dumping then reinserting the rows is not an option for large databases due to the 24-hour nature of their work.

This is a much more complex problem than it would at first seem, as the "tripping up" over old records with low OIDs still presents a problem, yet if the system is changed to try and manage a list of available OIDs, it will then hit performance problems.

Simply waiting for 64bit numbers is rather inelegant and also presumes usage parameters for the database... remember Bill Gates saying that he couldn't foresee any usage for more than 64MB of RAM? Besides which, PostgreSQL is the best DB around... there's a high standard to maintain!

Some initial ideas:

a) The system remains working the way that it does until a configurable cutoff point (% of OIDs remaining), at which point the DBA can either switch on some sort of OID "garbage collection" and take the associated performance hit, or dump the data and reinsert it to pack the OIDs.

GARBAGE COLLECTION:

b) The system could wrap around, keeping an internal pointer of where it is in the OID chain. It could scan the OIDs sequentially, finding the first free OID. It could then store that position as the new start point for the next time an OID is needed.

c) An OID compression utility could be written that doesn't require bringing the DB down (but will obviously have a performance hit). As running this utility would be a known impact, provision could be made, or the knock budgeted for and accounted for, while not bringing the entire DB to a halt.

d) OIDs themselves could form a list. The OID system as it stands now could be the "default mode", but if the OIDs run out, then a new OID list is started. This means the OID usage would then change to work along the same lines that the Intel memory addressing works.
Personally I hate it, but it does work. e) OIDs could be scrapped in favour of some other system. Well, that's my two pence worth. Brad Paul Caskey wrote: > Tom Lane wrote: > > > > Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes: > > > Paul Caskey wrote: > > >> 1. This implies a hard limit of 4 billion records on a server, right? > > > > > Basically, yes. > > > > It's only a hard limit if your application assumes OIDs are unique. > > If you don't assume that, then I think it's not a big problem. > > > > It's possible (though obviously not especially likely) that you might > > get OID collisions in the system tables after an OID-counter wraparound. > > This implies they do wrap around. So they are reused? Chris said no, but > you're saying yes. > > (Maybe they wrap around "by accident", by adding one to MAXINT, which will > give zero on an unsigned int, I believe. Will the system choke on zero? > Has anyone tested this wraparound?) > > I will not have 4 billion records in one table or even one database. But > on a large server with many databases, it is conceivable to have 4 billion > records on one machine. With a lot of insert/delete activity, over a few > years, it is certainly conceivable to have 4 billion inserts. If the oids > don't wrap, I have a problem. I can ignore it for a long time, but it > will loom, like Y2K. :-) > > Even if they do wrap, if I have some old records lying around with a low > OIDs, they will trip me up. Like you said, these are "the outer limits", > but I'm thinking ahead. > > Someone suggested in private that I pg_dump/restore all my data to > "repack" the oids which start around 1700 on a fresh database. Thanks for > that idea. Also thanks, Tom, for the sanity check Re: terabytes of data > with 4 billion records. It's still possible, especially in coming years. 
> It would be a big feather in PG's cap to "fully support" 64-bit platforms > such as IRIX and Solaris (finally) and, coming soon to a theater near you, > Linux on IA-64. > > -- > Paul Caskey paul@nmxs.com 505-255-1999 > New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 > --
> Simply waiting for 64bit numbers is rather inelegant and also
> presumes usage parameters for the database... remember Bill Gates
> saying that he couldn't foresee any usage for more than 64MB of RAM?

I've heard this before and I just don't agree. 64MB of RAM, perhaps, but who is going to need 64 * 2^32? The magnitude of increase is fairly vast! I probably need not mention that a 32 bit value can store up to 4,294,967,296 values, whereas a 64 bit number can store a number that is 4,294,967,296 times as big. If 32 bit wasn't enough to keep you happy for more than a few years, a 64 bit oid really should be enough for anyone for long enough that you won't be using the same software/hardware any more.

Similar to how a 32 bit unix time in seconds is only good for another ~40 years, but a 64 bit one will keep us going for billions of years.

I guess the real issue is rewriting the system so that the type of oid can be chosen at compile time, so you can use whatever damn one you feel like. I would also guess that the majority of systems out there using the latest versions of postgres already have compiler support for 64 bit integers. So when someone gets around to doing the necessary work, everything will be nice.

- Andrew

P.S. IMHO if you can't afford to do a drop and reload once in a while, Postgres isn't a good choice at the moment for your application.
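Andrew's orders of magnitude check out. A quick illustration (Python as a calculator, assuming a 365.25-day year):

```python
# A 64-bit counter holds 2**32 times as many values as a 32-bit one.
print(2**64 // 2**32)                     # 4294967296

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Signed 32-bit Unix time: ~68 years of range from 1970, i.e. roughly
# 40 more years as seen from 2000 (the famous 2038 rollover).
years_32 = 2**31 / SECONDS_PER_YEAR       # ~68 years

# Signed 64-bit Unix time: on the order of hundreds of billions of years.
years_64 = 2**63 / SECONDS_PER_YEAR       # ~2.9e11 years
```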
Tom Lane wrote: > > Paul Caskey <paul@nmxs.com> writes: > > Tom Lane wrote: > >> It's only a hard limit if your application assumes OIDs are unique. > >> If you don't assume that, then I think it's not a big problem. > >> > >> It's possible (though obviously not especially likely) that you might > >> get OID collisions in the system tables after an OID-counter wraparound. > > > This implies they do wrap around. So they are reused? Chris said no, but > > you're saying yes. > > > (Maybe they wrap around "by accident", by adding one to MAXINT, which will > > give zero on an unsigned int, I believe. Will the system choke on zero? > > Has anyone tested this wraparound?) > > Yes, yes, and yes ;-). Good answer. :-) Thanks for looking into it. > > Even if they do wrap, if I have some old records lying around with a low > > OIDs, they will trip me up. > > No doubt about it, you're likely to get a few "duplicate key" errors and > stuff like that. I'm just observing that it's not likely to be a > complete catastrophe, especially not if you don't rely on OIDs to be > unique in your user tables. I don't rely on OID uniqueness, but I assumed Postgres does! I don't use the OID for anything; I maintain my own "id" field with a sequence for every table. I can catch a "duplicate key" error in my code, but I'll treat it as if my own "id" caused a problem. It will totally confuse me and my code if there's a "duplicate key" error related to a hidden system field I never create or examine. I won't know if/how to re-insert with a different key to solve the problem. > We have talked about offering 8-byte OIDs as a compile-time option, > and I think it'll happen eventually, but I'm not holding my breath. > Lots of other stuff seems more pressing... Fair enough, and thanks for all your work so far. -- Paul Caskey paul@nmxs.com 505-255-1999 New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 --
Paul Caskey <paul@nmxs.com> writes: >> No doubt about it, you're likely to get a few "duplicate key" errors and >> stuff like that. I'm just observing that it's not likely to be a >> complete catastrophe, especially not if you don't rely on OIDs to be >> unique in your user tables. > I don't rely on OID uniqueness, but I assumed Postgres does! Only in the system tables, and not even in all of them. From the system's point of view, there's no real need to assign OIDs to user table rows at all --- so another possible answer is not to do that, unless the user requests it. regards, tom lane
brad <brad@kieser.net> writes:
<snip>
> Simply waiting for 64bit numbers is rather inelegant and also presumes
> usage parameters for the database... remember Bill Gates saying that he
> couldn't foresee any usage for more than 64MB of RAM? Besides which,
> PostgreSQL is the best DB around... there's a high standard to maintain!
<snip>

Actually, he was purported to have said that nobody would need more than 640KB of RAM, which was the limit of memory on MSDOS. Brings back memories... remember having to juggle the drivers on bootup to play that game which wanted nearly all of the base memory but you still needed the mouse, soundcard and cdrom access? ;-)

--
Prasanth Kumar
kumar1@home.com
Ummm ... I'm a newbie to this list, but hasn't this evolved into a hacker issue? Also, Bill Gates said something along the lines of nobody will ever need more than 640KB of RAM ... which was the usable limit on the old XT's (remember them :) in the early MS-DOS days :) Dave Burbidge Network Administrator -----Original Message----- <snip>
On Thu, 27 Jul 2000, Dave Burbidge wrote: > Also, Bill Gates said something along the lines of nobody will ever need > more than 640KB of RAM ... which was the usable limit on the old XT's > (remember them :) in the early MS-DOS days :) Actually, I think the quote is an urban legend. Brett W. McCoy http://www.chapelperilous.net/~bmccoy/ ------------------------------------------------------------------------------- A girl's conscience doesn't really keep her from doing anything wrong-- it merely keeps her from enjoying it.
The Versant ODBMS uses 48 bit oids, and if you do the math I think you'll find that should last you forever. (It uses an additional 16 bits to identify the database, but that's another story.). Any complex scheme to solve this seems like a waste of time. In a couple of years when you are likely to be running out, you'll probably be upgrading your computer to a 64bit one with a newer version of postgres, and then the problem will disappear. brad wrote: > > THe implications of the OIDs not wrapping are immense. We have some extremely > active databases that will easily reach this limit in two or three years. For > some applications, such as ecommerce, dumping then reinserting the rows is > not an option for large databases due to the 24 hours nature of their work. > > This is a much more complex problem than it would at first seem as the > "tripping up" over old records with low OIDs still presents a problem, yet if > the system is changed to try and manage a list of available OIDs, it will > then hit performance problems. > > Simply waiting for 64bit numbers is rather inelegant and also presumes usage > parameters for the database... remember Bill Gates saying that he couldn't > foresee any usage for more than 64MB of RAM? Besides which, PostgreSQL is the > best DB around... there's a high standard to maintain! > > Some initial ideas: > a) The system remains working the way that it does until a configurable > cutoff point (% of OIDs remaining), at which point the DBA can either switch > on some sort of OID "garbage collection" and take the associated performance > hit, or dump the data and reinsert it to pack the OIDs. > GARBAGE COLLECTION: > b) The system could wrap around, keeping an internal pointer of where it is > in the OID chain. It could scan the OIDs sequentially finding the first free > OID. It coudl then store that position as the new start point for the next > time an OID is needed. 
> c) An OID compression utility could be writen that doesn't require bringing > the DB down (but will obviously have a performance it). As running this > utilty would be a known impact, provision could be made, or the knock > budgeted for and accounted for while not bringing the entire DB to a halt. > d) OIDs themselves could form a list. The OID system as it stands now could > be the "default mode" but if the OIDs run out, then a new OID list is > started. This means the OID usage would then change to work along the same > lines that the Intel memory adressing works. Personally I hate it, but it > does work. > e) OIDs could be scrapped in favour of some other system. > > Well, that's my two pence worth. > > Brad > > Paul Caskey wrote: > > > Tom Lane wrote: > > > > > > Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes: > > > > Paul Caskey wrote: > > > >> 1. This implies a hard limit of 4 billion records on a server, right? > > > > > > > Basically, yes. > > > > > > It's only a hard limit if your application assumes OIDs are unique. > > > If you don't assume that, then I think it's not a big problem. > > > > > > It's possible (though obviously not especially likely) that you might > > > get OID collisions in the system tables after an OID-counter wraparound. > > > > This implies they do wrap around. So they are reused? Chris said no, but > > you're saying yes. > > > > (Maybe they wrap around "by accident", by adding one to MAXINT, which will > > give zero on an unsigned int, I believe. Will the system choke on zero? > > Has anyone tested this wraparound?) > > > > I will not have 4 billion records in one table or even one database. But > > on a large server with many databases, it is conceivable to have 4 billion > > records on one machine. With a lot of insert/delete activity, over a few > > years, it is certainly conceivable to have 4 billion inserts. If the oids > > don't wrap, I have a problem. I can ignore it for a long time, but it > > will loom, like Y2K. 
:-) > > > > Even if they do wrap, if I have some old records lying around with a low > > OIDs, they will trip me up. Like you said, these are "the outer limits", > > but I'm thinking ahead. > > > > Someone suggested in private that I pg_dump/restore all my data to > > "repack" the oids which start around 1700 on a fresh database. Thanks for > > that idea. Also thanks, Tom, for the sanity check Re: terabytes of data > > with 4 billion records. It's still possible, especially in coming years. > > It would be a big feather in PG's cap to "fully support" 64-bit platforms > > such as IRIX and Solaris (finally) and, coming soon to a theater near you, > > Linux on IA-64. > > > > -- > > Paul Caskey paul@nmxs.com 505-255-1999 > > New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 > > --
Chris Bitmead wrote:
> Any complex scheme to solve this seems like a waste of time. In a couple
> of years when you are likely to be running out, you'll probably be
> upgrading your computer to a 64bit one with a newer version of postgres,
> and then the problem will disappear.

That's the kind of thing people said about Y2K, isn't it?

--
Mathieu Arnold
Mathieu Arnold wrote: > Chris Bitmead wrote: > > > > > Any complex scheme to solve this seems like a waste of time. In a couple > > of years when you are likely to be running out, you'll probably be > > upgrading your computer to a 64bit one with a newer version of postgres, > > and then the problem will disappear. > > that's the kind of thing people said about y2k, isn't it ? > I don't want to start a war but I must agree here... I recoil when the argument is put forward for a "you will never use that up" approach. The best that I can offer is: Oh yeah? Seen some of the Beowulf clusters around recently? > > -- > Mathieu Arnold
True, it is a big number, and for most people I would agree with what you are saying. Computationally, the amount of activity on the database needed to use up that sort of number is immense. But then, two years ago the prospect of a 1GHz PC processor was also remote.

I can only say that OpenSource is exploding across the computing landscape more comprehensively and faster than anyone thought possible, and at the same time supercomputing is becoming the order of the day rather than that rare place that you hoped to be invited into. Linux has made this possible and OpenSource is at the very heart of this. I would personally like to see PostgreSQL being synonymous with OpenSource installations. To a large degree it has already achieved that. In my own experience, working with databases of several TB in size is now becoming normal rather than exceptional.

I guess that what I am trying to say is that if the OID limit reflected the maximum storage capacity of the system (more easily determined) rather than a transactional capacity (a lot less easily dealt with), it would be a major feather in Postgres' cap. However, my guess would be that were someone to write a mechanism for choosing the OID handling algorithm, the trade-off between performance and transaction limits would mean that probably most would opt for the performance gain and deal with the OID issue as a normal DBA procedure. I know that I would for the majority of databases that I administer.

Brad

Quoting Andrew Snow <als@fl.net.au>:
> > Simply waiting for 64bit numbers is rather inelegant and also
> > presumes usage parameters for the database... remember Bill Gates
> > saying that he couldn't foresee any usage for more than 64MB of RAM?
>
> I've heard this before and I just don't agree. 64MB ram, perhaps, but who
> is going to need 64 * 2^32? The magnitude of increase is fairly vast!
> > I probably need not mention that a 32 bit value can store up to > 4,294,967,296 where a 64 bit number can store a number that is > 4,294,967,296 > times as big. If 32 bit wasn't enough to keep you happy for more than a few > years, a 64 bit oid really should be enough for anyone for long enough that > you won't be using the same software/hardware any more. > > Similar to how a 32 bit unix time in seconds is only good for another ~40 > years, but a 64 bit one will keep us going for billions of years. > > I guess the real issue is rewriting the system so that the type of oid can > be chosen at compile time, so you can use whatever damn one you feel like. > I would also guess that the majority of systems out there using the latest > versions of postgres, already have compiler support for 64 bit integers. > So > when someone gets around to doing the necessary work, everything will be > nice. > > > - Andrew > > > P.S. IMHO if you can't afford to do a drop and reload once in a while, > Postgres isn't a good choice at the moment for your application. > > > > Bradley Kieser Director Kieser.net
My mistake! ;-)

I remember wondering who would ever need more than the 16K that the Sinclair Spectrum could give you!

Quoting "Prasanth A. Kumar" <kumar1@home.com>:
> brad <brad@kieser.net> writes:
>
> <snip>
> > Simply waiting for 64bit numbers is rather inelegant and also presumes
> > usage parameters for the database... remember Bill Gates saying that he
> > couldn't foresee any usage for more than 64MB of RAM? Besides which,
> > PostgreSQL is the best DB around... there's a high standard to maintain!
> <snip>
>
> Actually, he was purported to have said that nobody would need more
> than 640KB of RAM, which was the limit of memory on MSDOS. Brings back
> memories... remember having to juggle the drivers on bootup to play
> that game which wanted nearly all of the base memory but you still
> needed the mouse, soundcard and cdrom access? ;-)
>
> --
> Prasanth Kumar
> kumar1@home.com

Bradley Kieser
Director
Kieser.net
Quoting Tom Lane <tgl@sss.pgh.pa.us>: > Paul Caskey <paul@nmxs.com> writes: > >> No doubt about it, you're likely to get a few "duplicate key" errors and > >> stuff like that. I'm just observing that it's not likely to be a > >> complete catastrophe, especially not if you don't rely on OIDs to be > >> unique in your user tables. > > > I don't rely on OID uniqueness, but I assumed Postgres does! > > Only in the system tables, and not even in all of them. From the > system's point of view, there's no real need to assign OIDs to > user table rows at all --- so another possible answer is not to > do that, unless the user requests it. > This changes things a lot. If the rows don't have to have OIDs associated with them then the 4bn limit is not a transactional limit... in which case there shouldn't be a problem. > regards, tom lane > Bradley Kieser Director Kieser.net
> My mistake! ;-)
> I remember wondering who would ever need more than the 16K that
> the Sinclair Spectrum could give you!

To go back to my original point about putting things in perspective - increasing this by 2^32 would give you 68 terabytes of RAM.

But if we can get rid of oids altogether then I think that would be best. (At least we can make our own 64 bit serial types until they are officially implemented.)

- Andrew
Tom Lane wrote: > > Paul Caskey <paul@nmxs.com> writes: > > >> No doubt about it, you're likely to get a few "duplicate key" errors and > >> stuff like that. I'm just observing that it's not likely to be a > >> complete catastrophe, especially not if you don't rely on OIDs to be > >> unique in your user tables. > > > I don't rely on OID uniqueness, but I assumed Postgres does! > > Only in the system tables, and not even in all of them. From the > system's point of view, there's no real need to assign OIDs to > user table rows at all --- so another possible answer is not to > do that, unless the user requests it. That's interesting (almost shocking) to know, but it's probably too late to put the lid back on that bottle. I imagine a lot of software has been written especially for postgres, like pgaccess, which use the oid field along with cursors, etc. A lot of people have probably also relied on the oid as a convenient unique record identifier. I know I was tempted. If the system is doing it anyway, why maintain my own with a serial or sequence? So if we make the OID optional in user tables, they need to be left in by default, and only *omitted* with a compile-time option. Otherwise we'll break many more things than we solve with this Y2K-like problem. Dave Burbidge wrote: > > Ummm ... I'm a newbie to this list, but hasn't this evolved into a hacker > issue? Yes, we should move any further detailed discussion over there. I asked this question initially in the hacker list and got no response, so I came over here. But we did have a related discussion on 64-bit sequences in the hacker list. FWIW, I checked into MySQL, and as far as I can tell, they have nothing like this implicit 4 billion transactional "limit". So maybe competitive spirit will drive the postgres hackers to fix this problem sooner than later. ;-) If nothing else, this needs to be documented. 
Whether by design or not, people need to know there is a hard limit of 4 billion records on a server, and a strange fuzzy sorta-limit of 4 billion inserts on a server. Regardless of whether they use a 64-bit compiler. -- Paul Caskey paul@nmxs.com Software Engineer New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 --
> This implies they do wrap around. So they are reused? Chris said no, but
> you're saying yes.
>
> (Maybe they wrap around "by accident", by adding one to MAXINT, which will
> give zero on an unsigned int, I believe. Will the system choke on zero?
> Has anyone tested this wraparound?)
>
> I will not have 4 billion records in one table or even one database. But
> on a large server with many databases, it is conceivable to have 4 billion
> records on one machine. With a lot of insert/delete activity, over a few
> years, it is certainly conceivable to have 4 billion inserts. If the oids
> don't wrap, I have a problem. I can ignore it for a long time, but it
> will loom, like Y2K. :-)
>
> Even if they do wrap, if I have some old records lying around with a low
> OIDs, they will trip me up. Like you said, these are "the outer limits",
> but I'm thinking ahead.
>
> Someone suggested in private that I pg_dump/restore all my data to
> "repack" the oids which start around 1700 on a fresh database. Thanks for
> that idea. Also thanks, Tom, for the sanity check Re: terabytes of data
> with 4 billion records. It's still possible, especially in coming years.
> It would be a big feather in PG's cap to "fully support" 64-bit platforms
> such as IRIX and Solaris (finally) and, coming soon to a theater near you,
> Linux on IA-64.

I have added this to the FAQ. Seems people are concerned enough to warrant it:

    OIDs are stored as 4-byte integers, and will overflow at 4 billion. No
    one has reported this ever happening, and we plan to have the limit
    removed before anyone does.<P>

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
brad writes:
> I don't want to start a war but I must agree here... I recoil when the
> argument is put forward for a "you will never use that up" approach.
> The best that I can offer is: Oh yeah? Seen some of the Beowulf clusters
> around recently?

I don't want to participate in a war, either, but consider this:

* At a conservative estimate, the Universe is 3e+17 seconds old.
* 2^64 is 2e+19, about 100 times larger.

So, if you're going to be creating 100 objects a second, for the next 10 billion years, you'll hit that 64-bit OID limit. Or, if you create 5 billion objects a second, you'll hit the limit in only 100 years.

Now let's assume that Moore's Law holds true for the number of objects created per second in a PG database. Let's further assume that the current limit is 1000. So, in 18 months, the limit would be 2000 objects per second. In how many years would the limit actually reach the 5 billion I spoke of? I calculate 22 doublings, which works out to 33 years.

So, assuming these numbers are reasonable, and assuming that Moore's Law applies to PG usage, we could have trouble with 64-bit OIDs in maybe 40 years.

I'm not trying to be inflammatory here; just trying to add some relatively objective numbers to the discussion. Feel free to correct my numbers.

Chris

--
cjones@rightnowtech.com
Chris Jones
System Administrator, RightNow Technologies
"Is this going to be a stand-up programming session, sir, or another bug hunt?"
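Chris's doubling estimate is easy to reproduce (Python as a calculator, assuming a 365.25-day year):

```python
import math

# Starting at 1,000 objects/sec and doubling every 18 months, how long
# until the rate reaches the 5 billion/sec that would exhaust 2**64
# OIDs in about a century?
doublings = math.log2(5e9 / 1000)       # ~22.25 doublings
years_until_rate = doublings * 1.5      # ~33 years, matching Chris's figure

# Exhaustion time at a sustained 5e9 objects/sec:
years_to_exhaust = 2**64 / 5e9 / (365.25 * 24 * 3600)   # ~117 years

print(round(doublings), round(years_until_rate), round(years_to_exhaust))
```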
brad wrote: > > Mathieu Arnold wrote: > > > Chris Bitmead wrote: > > > > > > > > Any complex scheme to solve this seems like a waste of time. In a couple > > > of years when you are likely to be running out, you'll probably be > > > upgrading your computer to a 64bit one with a newer version of postgres, > > > and then the problem will disappear. > > > > that's the kind of thing people said about y2k, isn't it? > I don't want to start a war but I must agree here... I recoil when the > argument is put forward for a "you will never use that up" approach. > The best that I can offer is: Oh yeah? Seen some of the Beowulf clusters > around recently? Regardless, the solution is not to make a complex oid reuse scheme. The solution is 64bit oids, which is easily done on a 64bit computer but requires a bit of effort to make work on 32bit machines. If you want to make the effort - go for it!
> FWIW, I checked into MySQL, and as far as I can tell, they have nothing > like this implicit 4 billion transactional "limit". So maybe competitive > spirit will drive the postgres hackers to fix this problem sooner than > later. ;-) We have *never* had a report of someone pushing this 4GB limit, and theoretical problems usually go into the long-term development plan, not in the "OHMYGODITSBROKEN" list. - Thomas
Chris Jones <cjones@rightnowtech.com> writes: > So, assuming these numbers are reasonable, and assuming that Moore's > Law applies to PG usage, we could have trouble with 64-bit OIDs in > maybe 40 years. Interesting point. However, once we bite the bullet of not tying OID == int32, it should be relatively painless to equate OID to any compiler-supported integer type. So we should only need to fix this problem once. If your projection is accurate, no doubt C2020 will consider "long long long int" to be a required type ;-) regards, tom lane
Paul Caskey <paul@nmxs.com> writes: > Tom Lane wrote: >>>> I don't rely on OID uniqueness, but I assumed Postgres does! >> >> Only in the system tables, and not even in all of them. From the >> system's point of view, there's no real need to assign OIDs to >> user table rows at all --- so another possible answer is not to >> do that, unless the user requests it. > That's interesting (almost shocking) to know, but it's probably too late > to put the lid back on that bottle. I imagine a lot of software has been > written especially for postgres, like pgaccess, which use the oid field > along with cursors, etc. A lot of people have probably also relied on the > oid as a convenient unique record identifier. Indeed, I have written such applications myself (though only after estimating they'd not exceed 4G rows before my children's children's children are safely dead ;-)). This is why we'd have to continue to offer OIDs for user tables as an option. Whether it should be default or not is debatable. Personally I prefer the straight-ahead approach of supporting wider OIDs. Sooner or later we will need to be able to handle wider-than-32-bit transaction IDs and block numbers too, and there's no cute workaround for those. Might as well get some practice with OIDs. > FWIW, I checked into MySQL, and as far as I can tell, they have nothing > like this implicit 4 billion transactional "limit". Hmph. Has anyone ever come close to shoving 4G rows into MySQL? Maybe they just haven't documented their limits... regards, tom lane
Hi all. I know this has been discussed before. I've read through the replication thread referenced from the todo-list. I've also tried to do a further search in the pgsql mail archives, but those searches didn't give me anything concrete. Neither have searches on google.com turned up anything, nor has following the tentative URLs listed in the replication thread. So, my question is, has somebody come up with good code for asynchronous unidirectional replication? If so, I'd be very interested to hear about it. If this hasn't been done, I have three programmers (myself included) on standby, who will be paid for adding this to postgresql. We'd be very interested in getting feedback from the core hackers on this, so that we do it Right, even if it means spending a lot of time on it. Regards, Karl T
Good point! Okay... 64 bit OIDs will probably just about do the trick! But hey, NEVER underestimate the efficiency of Linux :-) Chris Jones wrote: > brad writes: > > > I don't want to start a war but I must agree here... I recoil when the > > argument is put forward for a "you will never use that up" approach. > > The best that I can offer is: Oh yeah? Seen some of the Beowulf clusters > > around recently? > > I don't want to participate in a war, either, but consider this: > > * At a conservative estimate, the Universe is 3e+17 seconds old. > > * 2^64 is 2e+19, about 100 times larger. > > So, if you're going to be creating 100 objects a second, for the next 10 > billion years, you'll hit that 64-bit OID limit. Or, if you create 5 > billion objects a second, you'll hit the limit in only 100 years. > > Now let's assume that Moore's Law holds true for number of objects > created per second, in a PG database. Let's further assume that the > current limit is 1000. So, in 18 months, the limit would be 2000 > objects per second. In how many years would the limit actually reach > the 5 billion I spoke of? I calculate 22 doublings, which works out > to 33 years. > > So, assuming these numbers are reasonable, and assuming that Moore's > Law applies to PG usage, we could have trouble with 64-bit OIDs in > maybe 40 years. > > I'm not trying to be inflammatory here; just trying to add some > relatively objective numbers to the discussion. Feel free to correct > my numbers. > > Chris > > -- > ---------------------------------------------------- cjones@rightnowtech.com > Chris Jones > System Administrator, RightNow Technologies > "Is this going to be a stand-up programming session, sir, or another bug hunt?"
On Fri, 28 Jul 2000, Thomas Lockhart wrote: > > FWIW, I checked into MySQL, and as far as I can tell, they have nothing > > like this implicit 4 billion transactional "limit". So maybe competitive > > spirit will drive the postgres hackers to fix this problem sooner than > > later. ;-) > > We have *never* had a report of someone pushing this 4GB limit, and > theoretical problems usually go into the long-term development plan, not > in the "OHMYGODITSBROKEN" list. I'm not sure about MySQL not having the 4 billion limit. As far as I know, Informix has (in its 7.x versions) the 4 billion limit with INTEGERs. Informix 9.x has added other data types, like an 8 byte INTEGER. I'm working on 64bit Solaris, but don't know what the real limit is. Maybe some day I'll make a small program to see if I can fill the column. ;-) Saludos... ;-) "And I'm happy, because you make me feel good, about me." - Melvin Udall ----------------------------------------------------------------- Martín Marqués email: martin@math.unl.edu.ar Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Administrador de sistemas en math.unl.edu.ar -----------------------------------------------------------------
On Fri, 28 Jul 2000, brad wrote: > Good point! > Okay... 64 bit OIDs will probably just about do the trick! > But hey, NEVER underestimate the efficiency of Linux :-) Never underestimated it!!! Linux for SPARC or ALPHA is 64 bits!!! The problem is Intel and the x86 platform!! Saludos... ;-) "And I'm happy, because you make me feel good, about me." - Melvin Udall ----------------------------------------------------------------- Martín Marqués email: martin@math.unl.edu.ar Santa Fe - Argentina http://math.unl.edu.ar/~martin/ Administrador de sistemas en math.unl.edu.ar -----------------------------------------------------------------
> So, my question is, has somebody come up with good code for asynchronous > unidirectional replication? If so, I'd be very interested to hear about > it. I had to do a replication job in a project for an online newspaper database by one of the biggest Swiss newspapers. Content is prepared on a development database. Publishers have the ability to publish the content on different levels (i.e. level article, level category, level product, level database). We made triggers on the development database which store every insert, update, and delete action on each table in a log table with a timestamp. The program is written in Perl and is based on our Db/Web framework. Presentation, business logic and data layers are completely separated in the framework. The program uses only the data layer of the framework. The program is easily configurable with a hash definition:

%blmAbgleich::ruleset = (
    Artikel => {
        before => [
            [ Rubriken => 'rubrikId' ],
            [ Ausgaben => 'ausgabeId' ],
        ],
        after => [
            [ DossierArtikel => 'artikelId' ],
            [ SlArtikel      => 'artikelId' ],
            [ ArtikelArtikel => 'oberId' ],
            [ ArtikelArtikel => 'unterId' ],
            [ BilderArtikel  => 'artikelId' ],
        ],
    },
    .....
);

That means when a record of the table Artikel must be replicated, the program first looks in the tables Rubriken and Ausgaben, referenced by the foreign key fields rubrikId and ausgabeId. After that the record itself is replicated, and then all records depending on it (e.g. DossierArtikel joined by artikelId) are replicated. Referential integrity is therefore always ensured, even in a recursive relationship. This has now been working for 3 months without any problems. It's running on an Informix ORDBMS, but the framework was initially made for Postgres and later ported to Informix, and should therefore run in a Postgres environment. If you are interested, please contact me.
Best regards Herbie -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Herbert Liechti E-Mail: Herbert.Liechti@thinx.ch ThinX networked business services Stahlrain 10, CH-5200 Brugg ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
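Herbert's before/after ruleset implies a simple recursive traversal. The following is a hypothetical Python sketch of that ordering; his actual implementation is Perl on the Db/Web framework, the table names are taken from his example, and everything else here is a simplified stand-in:

```python
# The before/after ruleset, transcribed from Herbert's Perl hash.
ruleset = {
    "Artikel": {
        "before": [("Rubriken", "rubrikId"), ("Ausgaben", "ausgabeId")],
        "after": [
            ("DossierArtikel", "artikelId"),
            ("SlArtikel", "artikelId"),
            ("ArtikelArtikel", "oberId"),
            ("ArtikelArtikel", "unterId"),
            ("BilderArtikel", "artikelId"),
        ],
    },
}

order = []  # tables in the order their rows would be copied

def replicate(table):
    rules = ruleset.get(table, {})
    for parent, fk in rules.get("before", []):
        replicate(parent)   # rows referenced via the foreign key go first
    order.append(table)     # then the record itself
    for child, fk in rules.get("after", []):
        replicate(child)    # then the rows that depend on it

# A production version would also track already-copied rows, so that a
# recursive relationship (e.g. ArtikelArtikel) terminates.
replicate("Artikel")
print(order)
```

The parents-first, dependents-last ordering is what preserves referential integrity on the target database.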
There are many PostgreSQL-supported operating systems that run on 64-bit platforms -- virtually every popular Unix except OpenBSD comes to mind :-) There is something else that many aren't considering. In every application I've ever written to use any database, I use ID numbers of my own making, and they're always integers. 4 billion is the limit on any unsigned integer field, not just the OID, so there are limitations everyone should realize when using any integer for any kind of record identification purposes. -Mitch ----- Original Message ----- From: "Martin A. Marques" <martin@math.unl.edu.ar> To: "brad" <brad@kieser.net> Cc: "Chris Jones" <cjones@rightnowtech.com>; <pgsql-general@postgresql.org> Sent: Friday, July 28, 2000 6:47 AM Subject: Re: [GENERAL] Re: 4 billion record limit? > On Fri, 28 Jul 2000, brad wrote: > > > Good point! > > Okay... 64 bit OIDs will probably just about do the trick! > > But hey, NEVER underestimate the efficiency of Linux :-) > > Never underestimated it!!! Linux for SPARC or ALPHA is 64 bits!!! > The problem is Intel and the x86 platform!! > > Saludos... ;-) > > > "And I'm happy, because you make me feel good, about me." - Melvin Udall > ----------------------------------------------------------------- > Martín Marqués email: martin@math.unl.edu.ar > Santa Fe - Argentina http://math.unl.edu.ar/~martin/ > Administrador de sistemas en math.unl.edu.ar > ----------------------------------------------------------------- > >
Mitch Vincent wrote: > > There is something else that many aren't considering. In every application > I've ever written to use any database I use ID numbers of my own making, > always they're integer. 4 billion is the limit on any integer field, not > just the OID so there are limitations everyone should realize when using any > integer for any kind of record identification purposes.. > That's an excellent point, especially considering that *sequences* use an integer to hold their max_value, which is by default 2,147,483,647. You cannot go larger than that, either. I guess it's constrained to be positive. So OIDs give you more potential unique values than sequences, far as I can tell.
Thomas Lockhart wrote: > > > FWIW, I checked into MySQL, and as far as I can tell, they have nothing > > like this implicit 4 billion transactional "limit". So maybe competitive > > spirit will drive the postgres hackers to fix this problem sooner than > > later. ;-) > > We have *never* had a report of someone pushing this 4GB limit, and > theoretical problems usually go into the long-term development plan, not > in the "OHMYGODITSBROKEN" list. That's absolutely true, and I'm sorry for pushing your MySQL button. I'm not in some sort of panic to get this fixed. I just raised this issue to (1) see if I was the only one who noticed/cared and (2) learn some details to see if/what limits were really there. > Tom Lane wrote: > > Hmph. Has anyone ever come close to shoving 4G rows into MySQL? Maybe > they just haven't documented their limits... The gauntlet has been dropped! Looks like I have my work cut out for me, this weekend. :-) Again, just to clarify, I'm not concerned about a database holding 4 billion records at once. I'm concerned about performing 4,000,000,000 inserts and 3,999,999,000 deletes over the course of a few years. I just ran the numbers and that's about 50 transactions/second on an international web site active 24 hours/day for 3 years. Put 5 such sites on one server and you're down to 10 trans/sec per site. Ramp up to 30 trans/sec/site and you're down to 1 year. Juggle these numbers however you want; it's not that far fetched. Sure, it has a lot to do with the application. Most databases are read much more than they're written, and of course a SELECT doesn't chew up an OID. But every INSERT does. Any application that updates information frequently can be vulnerable to this problem. With my table structure, it happens to be much easier to DELETE and INSERT inside a transaction than to UPDATE. And every "row" of customer-supplied data could turn into 100 INSERTs, so I'm accelerating toward that 4G limit 100 times faster.
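Paul's rate arithmetic checks out; spreading the full 32-bit OID space over three years of round-the-clock inserts gives:

```python
# 4 billion inserts spread over 3 years of 24-hour activity
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600
rate = 4_000_000_000 / SECONDS_IN_3_YEARS
print(round(rate))  # 42 -- roughly the 50 inserts/second he cites
```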
Bottom line: I can recycle my own IDs if I have to. But I need to know whether I should change my table structure and/or code to conserve OIDs, an internal system identifier particular to the RDBMS I chose. Considering the main reason I chose Postgres was for scalability, this seems a little ironic. -- Paul Caskey paul@nmxs.com Software Engineer New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 --
On Fri, Jul 28, 2000 at 11:48:10AM -0500, Keith G. Murphy wrote: > Mitch Vincent wrote: > > > > There is something else that many aren't considering. In every application > > I've ever written to use any database I use ID numbers of my own making, > > always they're integer. 4 billion is the limit on any integer field, not > > just the OID so there are limitations everyone should realize when using any > > integer for any kind of record identification purposes.. > > > That's an excellent point, especially considering that *sequences* use > an integer to hold their max_value, which is by default 2,147,483,647. > You cannot go larger than that, either. I guess it's constrained to be > positive. So OIDs give you more potential unique values than sequences, > far as I can tell. However, in each case, you've got a per-table (per-field, really) limit, not a per-database one. Not to mention that there are workarounds: two int fields form a nice 64-bit compound key, without any need for a 64-bit int custom type. Admittedly cumbersome to use, but standard SQL. The sequence is a bit more of a problem, but since it's non-standard SQL anyway, writing your own sequence that uses a numeric counter gives you potentially infinite serials. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
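To illustrate the two-int compound key Ross mentions, here is a minimal Python sketch: treat a (hi, lo) pair of 32-bit counters as one 64-bit key, carrying into the high word when the low word wraps. The function name and the carry-in-application-code design are illustrative assumptions, not anything from an actual system:

```python
INT32 = 2 ** 32  # number of distinct values in one unsigned 32-bit field

def next_key(hi, lo):
    """Advance a (hi, lo) pair of 32-bit counters as one 64-bit key."""
    lo += 1
    if lo == INT32:         # low word wrapped around
        hi, lo = hi + 1, 0  # carry into the high word
    return hi, lo

hi, lo = next_key(0, INT32 - 1)
print(hi, lo, hi * INT32 + lo)  # 1 0 4294967296
```

In SQL the pair would be two int4 columns forming the primary key, with the carry logic in the application or a trigger.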
> > That's an excellent point, especially considering that *sequences* use > > an integer to hold their max_value, which is by default 2,147,483,647. > > You cannot go larger than that, either. I guess it's constrained to be > > positive. So OIDs give you more potential unique values than sequences, > > far as I can tell. What about postgres' int8 type which holds up to +/- 9 billion billion or something. - Andrew
Andrew Snow wrote: > > > > That's an excellent point, especially considering that *sequences* use > > > an integer to hold their max_value, which is by default 2,147,483,647. > > > You cannot go larger than that, either. I guess it's constrained to be > > > positive. So OIDs give you more potential unique values than sequences, > > > far as I can tell. > > What about postgres' int8 type which holds up to +/- 9 billion billion or > something. > > - Andrew Yes, it is quite possible to store 8-byte integers in a Postgres table with the INT8 datatype, even on a 32-bit platform like Intel. Unfortunately, as stated above, sequences are locked in at 4 bytes. Still, that's my solution for now, regarding large unique record identifiers. I have my own INT8 "id" field, but tie it to an INT4 sequence until INT8 sequences are available. Then my table will never need to change; I can just drop the old sequence and start a new one where the old one left off. But for my purposes, this is irrelevant because the hidden INT4 oid will still get me. Since the oids span all tables and all databases on a machine, I will "run out" of oids long before any of my individual table id sequences. But per the discussion here, by the time this is a problem, we will have INT8 oids and/or optional oids altogether, so all is good. I believe you can use negative numbers in a sequence, but it's not recommended. One good reason is that if you ever hit 0, your code may break or misbehave. Consider a common test like "if (! $id) ..." in Perl, C, PHP, etc. There are probably other good reasons for staying positive [sic]. Some other databases [which shall remain nameless here] enforce positive numbers for auto-incrementing integers. There's a recent discussion in the pgsql-hackers list about 64-bit sequences, so paw through the archives if you're interested. Bottom line: same as 64-bit oids, they will come along, eventually, probably before 32-bit sequences are a problem for anyone. 
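The zero-id pitfall Paul describes is easy to demonstrate; here it is rendered in Python rather than Perl (the function names are made up for illustration):

```python
def looks_missing(record_id):
    # The Perl-style "if (! $id)" truthiness test Paul warns about:
    # a legitimate id of 0 is indistinguishable from no id at all.
    return not record_id

def is_missing(record_id):
    # An explicit test that leaves id 0 usable.
    return record_id is None

print(looks_missing(0), is_missing(0))  # True False
```

Keeping sequences positive sidesteps the problem entirely, which is one reason sticking with positive auto-increment values is the common convention.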
Over and out, -- Paul Caskey paul@nmxs.com Software Engineer New Mexico Software 5041 Indian School NE Albuquerque, NM 87110 --