Thread: maximum number of rows in table - what about oid limits?
maximum number of rows in table - what about oid limits?
From
jmscott@yahoo.com (jmscott@REMOVEMEyahoo.com)
Date:
postgresql docs claim an essentially unlimited number of rows per table. http://postgresql.crimelabs.net/users-lounge/limitations.html this doesn't make sense if each row has an oid. do more subtle side effects exist if the oid wraps? thanks - jmscottJUSTREMOVEME@yahoo.com
On 6 Jun 2001, jmscott@REMOVEMEyahoo.com wrote: > postgresql docs claim an essentially unlimited number of > rows per table. > > http://postgresql.crimelabs.net/users-lounge/limitations.html > > this doesn't make sense if each row has an oid. > do more subtle side effects exist if the oid wraps? In general, unless you're relying on unique oids, you should be fine. You probably don't want to use oid as a unique key in your tables for that reason. Of course, sequences aren't sufficient either (also being int4) but some kind of int8 "sequence" mechanism would do it if you expect more than the int4 number of rows. You might have problems with creating system table entries with unique oids after wraparound, but generally that can be fixed by trying again. (Some of the system tables have a unique index on oid).
On Wed, 6 Jun 2001, John Scott wrote: > well i wasn't interested in using oids in my application. > i was curious about the relationship oids > and the tuple/row limit. > > i guess if what you say is true, the oids are NOT used internally > by postgres. this seems odd. > so, i guess my question still stands ... what happens when oids wrap? > are oids nothing more than a sequence with an index, > not used at all internally? They are used for references between system tables. That's why you'll get intermittent failures on things like create statements after rollover (system tables may have unique index on oid). As far as I know the system doesn't do stuff with the oid on user rows.
well i wasn't interested in using oids in my application. i was curious about the relationship oids and the tuple/row limit. i guess if what you say is true, the oids are NOT used internally by postgres. this seems odd. so, i guess my question still stands ... what happens when oids wrap? are oids nothing more than a sequence with an index, not used at all internally? i --- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > > On 6 Jun 2001, jmscott@REMOVEMEyahoo.com wrote: > > > postgresql docs claim an essentially unlimited number of > > rows per table. > > > > http://postgresql.crimelabs.net/users-lounge/limitations.html > > > > this doesn't make sense if each row has an oid. > > do more subtle side effects exist if the oid wraps? > > In general, unless you're relying on unique oids, you should be fine. > You probably don't want to use oid as a unique key in your tables for that > reason. Of course, sequences aren't sufficient either (also being > int4) but some kind of int8 "sequence" mechanism would do it if you expect > more than the int4 number of rows. > > You might have problems with creating system table entries with unique > oids after wraparound, but generally that can be fixed by trying again. > (Some of the system tables have a unique index on oid). > ===== John Scott (john@august.com) Senior Partner August Associates email: john@august.com web: http://www.august.com/~jmscott __________________________________________________ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
--- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > On Wed, 6 Jun 2001, John Scott wrote: > > > well i wasn't interested in using oids in my application. > > i was curious about the relationship oids > > and the tuple/row limit. > > > > i guess if what you say is true, the oids are NOT used internally > > by postgres. this seems odd. > > > so, i guess my question still stands ... what happens when oids wrap? > > are oids nothing more than a sequence with an index, > > not used at all internally? > > They are used for references between system tables. That's why > you'll get intermittent failures on things like create statements after > rollover (system tables may have unique index on oid). > As far as I know the system doesn't do stuff with the oid on user rows. > > hmmm. guess i'm still not getting a clear picture on what happens when oid's wrap? just seems like an obvious question. if i wrap twice, things could get very wicked, indeed. if i'm not comfortable with oid wrapping, then doesn't this imply that postgres can only handle 2^32 tuples (or 2^31 - 1) - this number of rows per table is relatively small by today's PC standards. just converting/bulk loading an existing large commercial database into postgres would expose this issue ;-) what am i missing here? i am comfortable not using oids in application logic. that' pretty standard. my question is, since oid's are used by the postgres kernel, what effect does oid wrapping have upon the stability of the database. for example, does oid wrapping mean i might clash with oid's in the system tables. i'll poke around the source. just hoped somebody could shed some light on what seemed like an obvious question. thanks-john ===== John Scott (john@august.com) Senior Partner August Associates email: john@august.com web: http://www.august.com/~jmscott __________________________________________________ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
> my question is, since oid's are used by the > postgres kernel, what effect does oid wrapping have upon > the stability of the database. for example, does oid wrapping mean > i might clash with oid's in the system tables. Let me try again. I've been having problems trying to write it out I fear. AFAIK, the only real clashes you have to worry about are oid clashes between rows within the same system table that have unique constraints on oid. That means that your table rows shouldn't conflict, and you shouldn't even get conflicts between different system tables. So, you could get a conflict when say creating a table between an old pg_class row and the one that was going to be created and the create table would fail and need to be tried again, but if you had a pg_class row oid 10000, and 100 other rows with oid 10000 in your own tables there's no conflict.
John Scott <jmscott@yahoo.com> writes: > my question is, since oid's are used by the > postgres kernel, what effect does oid wrapping have upon > the stability of the database. It doesn't affect stability, if by that you mean potential crashes. What can happen after a wrap is that the OID generated for a newly-created object might conflict with some already-existing object's OID. If that happens, you get a duplicate-key-insertion error on the OID index of the relevant system catalog (pg_class, pg_type, etc). There is a unique index on OID for each system catalog wherein OID is used to identify objects. It doesn't really matter whether the same OID is reused in different catalogs or in user tables. The odds of this happening seem pretty low, if you've got reasonable numbers of objects (eg, with a couple thousand tables in a database, you'd have a 2000/4G = 1 in 2 million chance of collision on pg_class OID). If it does happen, you can just retry the failed object-creation command until you get a nonconflicting OID assignment. This is certainly not ideal, but it's not nearly as big a problem as transaction ID wraparound. You can live with it, whereas right now xact ID wraparound is catastrophic. That we gotta work on, soon. regards, tom lane
Tom, Stephan, > What can happen after a wrap is that the OID generated for a > newly-created object might conflict with some already-existing > object's > OID. If that happens, you get a duplicate-key-insertion error on the > OID index of the relevant system catalog (pg_class, pg_type, etc). > There is a unique index on OID for each system catalog wherein OID > is used to identify objects. It doesn't really matter whether the > same OID is reused in different catalogs or in user tables. Given this, why bother with system-generated OIDs on user rows at all? Why not simply reserve the OIDs for the system tables? Or are you planning to later increase the OIDs to INT8 and start using them for OODB-analogous behavior on individual records? > This is certainly not ideal, but it's not nearly as big a problem as > transaction ID wraparound. You can live with it, whereas right now > xact ID wraparound is catastrophic. That we gotta work on, soon. Nothing like reassuring us commercial DB users, Tom. :-P Can you describe what you're talking about? -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
--- Tom Lane <tgl@sss.pgh.pa.us> wrote: > John Scott <jmscott@yahoo.com> writes: > > my question is, since oid's are used by the > > postgres kernel, what effect does oid wrapping have upon > > the stability of the database. > > It doesn't affect stability, if by that you mean potential crashes. yes and know. an application workaround is ok, > > What can happen after a wrap is that the OID generated for a > newly-created object might conflict with some already-existing object's > OID. If that happens, you get a duplicate-key-insertion error on the > OID index of the relevant system catalog (pg_class, pg_type, etc). > There is a unique index on OID for each system catalog wherein OID > is used to identify objects. It doesn't really matter whether the > same OID is reused in different catalogs or in user tables. depends on how your are changing the catalog. but does retrying fix the problem? wouldn't the second try just often see the same oid again. does a failed attempt increment the oid???? > > The odds of this happening seem pretty low, if you've got reasonable > numbers of objects (eg, with a couple thousand tables in a database, > you'd have a 2000/4G = 1 in 2 million chance of collision on pg_class > OID). If it does happen, you can just retry the failed object-creation > command until you get a nonconflicting OID assignment. same argument as above. all this sounds like a reasonable halfway solution. i'm comfortable NOT using OID's at the application level for many reasons. that's not a problem. i just don't want my imported, test database of 20B tuples crashing for mysterious reasons, obviously, right out of the gate, with me mumbling and waving my hands at the wall. i can do that now with commercial systems. > > This is certainly not ideal, but it's not nearly as big a problem as > transaction ID wraparound. You can live with it, whereas right now > xact ID wraparound is catastrophic. That we gotta work on, soon. yep, 1000tx/sec ~~ 50 days before wrap. by the way, have you written anything up on the txid wrapping problem? we talked about this at osdn last year in san jose. i proposed 64 bits, and you discussed a clever wrapping algorithm involving 48 bits and, if memory serves me, backward compatiablility of the disk format. i was the one driving the car. probably time to take this thread private? cheers-john > > regards, tom lane __________________________________________________ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
John Scott <jmscott@yahoo.com> writes: > but does retrying fix the problem? wouldn't the second try just often > see the same oid again. does a failed attempt increment the oid???? Yes. > by the way, have you written anything up on the txid wrapping problem? > we talked about this at osdn last year in san jose. i proposed > 64 bits, and you discussed a clever wrapping algorithm involving 48 bits and, > if memory serves me, backward compatiablility of the disk format. The wraparound idea was discussed on pghackers; people didn't like it much, but I haven't heard a better near-term solution ... regards, tom lane
"Josh Berkus" <josh@agliodbs.com> writes: > Given this, why bother with system-generated OIDs on user rows at all? > Why not simply reserve the OIDs for the system tables? An option to not generate OIDs unless requested (on a table-by-table basis) has been discussed. It seems like a fine near-term solution to me. 8-byte OIDs are a longer-term solution, because they'll break a lot of things (including clients...) >> This is certainly not ideal, but it's not nearly as big a problem as >> transaction ID wraparound. You can live with it, whereas right now >> xact ID wraparound is catastrophic. That we gotta work on, soon. > Nothing like reassuring us commercial DB users, Tom. :-P > Can you describe what you're talking about? It's in the archives: after 4G transactions, your database curls up and dies. When your pg_log starts to approach 1Gbyte (2 bits per transaction) you'd better plan on dump/initdb/reload. regards, tom lane
please help me
Besides compatibility, what breaks when you make OIDs/Txn IDs INT8s? Maybe there should be a minor fork called Postgres64 which does this for those needing large tables. Jon johnnyb6@sdf.lonestar.org SDF Public Access UNIX System - http://sdf.lonestar.org On Wed, 6 Jun 2001, Tom Lane wrote: > "Josh Berkus" <josh@agliodbs.com> writes: > > Given this, why bother with system-generated OIDs on user rows at all? > > Why not simply reserve the OIDs for the system tables? > > An option to not generate OIDs unless requested (on a table-by-table > basis) has been discussed. It seems like a fine near-term solution > to me. 8-byte OIDs are a longer-term solution, because they'll break > a lot of things (including clients...) > > >> This is certainly not ideal, but it's not nearly as big a problem as > >> transaction ID wraparound. You can live with it, whereas right now > >> xact ID wraparound is catastrophic. That we gotta work on, soon. > > > Nothing like reassuring us commercial DB users, Tom. :-P > > Can you describe what you're talking about? > > It's in the archives: after 4G transactions, your database curls up > and dies. When your pg_log starts to approach 1Gbyte (2 bits per > transaction) you'd better plan on dump/initdb/reload. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
Jonathan Bartlett <johnnyb6@sdf.lonestar.org> writes: > Besides compatibility, what breaks when you make OIDs/Txn IDs > INT8s? OIDs are part of the on-the-wire protocol (column type data returned during a SELECT is in the form of OIDs). So extending OIDs to int8 would pose nontrivial client/server compatibility hazards. > Maybe there should be a minor fork called Postgres64 which does > this for those needing large tables. I've been thinking it should be a compile-time option. But the protocol issue is going to be a problem. regards, tom lane
> I've been thinking it should be a compile-time option. But the protocol > issue is going to be a problem. The protocol issues shouldn't be a problem if you require separate drivers. Force clients to use libpq64.so. The reason I'm concerned about this is that there are a _lot_ of great things you can do with OIDs if they are unique. For example, if you've worked with Oracle Applications, they have this tacked on to the end of all of their tables: LAST_UPDATED_BY Int, LAST_UPDATED_DATE Date, ... Which could be replaced by a single table Table UPDATE_HISTORY REFERRED_OBJECT OID, LAST_UPDATED_BY OID, LAST_UPDATED_DATE Date which applies to all tables. You could also have a notes table for everything - table NOTES REFERRED_OBJECT OID, Note Text And so on and so forth. However, with OID wrap-around, this would be problematic. Jon > > regards, tom lane >
Jon, > Which could be replaced by a single table > > Table UPDATE_HISTORY > REFERRED_OBJECT OID, > LAST_UPDATED_BY OID, > LAST_UPDATED_DATE Date > > which applies to all tables. You could also have a notes table for > everything - > > table NOTES > REFERRED_OBJECT OID, > Note Text Actually, I'm doing this with one of my applications. Rather than using the (problematic) OID, I simply established an independant sequence ('universal_sq') and used that as the primary key for all of my important data tables. So long as the total records in these tables stays < 2.4 billion, I'm doing fine. Nothing in PostgreSQL prevents you from using a single independent sequence as the key for multiple tables. If you are concerned about having > 2.4 billion recs, then perhaps it's time to hack an INT8 sequence functionality. I think that adding INT8 sequences to the PostgreSQL database would be a *lot* easier than modifying OID functionality. In fact, if it matters to you, why not pay for it to get done? -Josh Berkus P.S. A lot of these concerns affect only developers with high-traffic web applications and similar. For example, in my small business software, it will take <> 1 million days to exhaust the TXN register. Not something I need to worry about. Is there some good way that we can "vote with our pocketbooks" for various development issues in the PostgreSQL to-do list, short of hiring a C programmer ourselves? I, for one, am desperately eager for real stored procedures, and could get my clients to contribute toward the development, but not more than 4 figures ... ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
> Actually, I'm doing this with one of my applications. Rather than using > the (problematic) OID, I simply established an independant sequence > ('universal_sq') and used that as the primary key for all of my > important data tables. So long as the total records in these tables > stays < 2.4 billion, I'm doing fine. Nothing in PostgreSQL prevents you > from using a single independent sequence as the key for multiple tables. The nice thing about OIDs is that if you ever need to merge rows, you could make a function that searched all OID-type parameters and change the old reference to the new one. > If you are concerned about having > 2.4 billion recs, then perhaps it's > time to hack an INT8 sequence functionality. I think that adding INT8 > sequences to the PostgreSQL database would be a *lot* easier than > modifying OID functionality. In fact, if it matters to you, why not pay > for it to get done? Were I doing PostgreSQL for a business, I most assuredly would. However, I am currently just toying around with creating a common set of base tables that would be useful in almost any application, taking advantage of PostgreSQL's specific capabilities. However, I asked the question because if the only real problem is compatibility, I might be able to do it myself (or maybe not). > P.S. A lot of these concerns affect only developers with high-traffic > web applications and similar. For example, in my small business > software, it will take <> 1 million days to exhaust the TXN register. > Not something I need to worry about. Agreed, but if people are going to take PostgreSQL seriously, then some of the top limits need to be expanded. > Is there some good way that we can "vote with our pocketbooks" for > various development issues in the PostgreSQL to-do list, short of hiring > a C programmer ourselves? I, for one, am desperately eager for real > stored procedures, and could get my clients to contribute toward the > development, but not more than 4 figures ... You could probably find a college student to do whatever you wanted for a grand. Out of curiosity, what do you mean by "Real Stored Procedures"? Jon
Jon, > The nice thing about OIDs is that if you ever need to merge rows, you > could make a function that searched all OID-type parameters and > change the > old reference to the new one. The same thing can be done with the 'universal_sq' approach. > Were I doing PostgreSQL for a business, I most assuredly would. > However, > I am currently just toying around with creating a common set of base > tables that would be useful in almost any application, taking > advantage of > PostgreSQL's specific capabilities. However, I asked the question > because > if the only real problem is compatibility, I might be able to do it > myself (or maybe not). Great! Go for it. Personally, I lack the knowledge ... > Agreed, but if people are going to take PostgreSQL seriously, then > some > of the top limits need to be expanded. Expanding the top limits is obviously a Good Thing (tm). However, I point out that expanding the top limits may not be the highest priority for PostgreSQL development (of course, it may be). Row limits are mainly important to web apps. For the kind of app I do (accounting and reporting) things like stored procedures, nested transaction handling, full support for ALTER TABLE, etc., are a *lot* more important than OID/TXN limits. Plus some more administrative interface projects, and an updated version of PGAccess. It's my opinion that PostgreSQL has a lot more to gain in terms of market acceptance as an application database server than a web database server. In the web arena, PostgreSQL must compete against MySQL and Oracle which are quite good; in medium-sized database applications, the only competition is MS SQL Server. PostgreSQL is *already* better than MSSQL, but it's going to have to be *much* better before people will notice it. IMHO. > You could probably find a college student to do whatever you wanted > for a > grand. Hey, if you know anyone, please have them e-mail me. > Out of curiosity, what do you mean by "Real Stored Procedures"? Since you're undoubtedly familiar with PostgreSQL functions, I will list only how a stored procedure differes from a PgSQL function: 1. Is precompiled, with a saved plan of execution on the server.(not sure about the wisdom of this but it appears to be inthe SQL 99 standard) 2. Can return a rowset or multiple rowsets 3. Accepts an indefinite number of parameters 4. Returns an execution state and none to many return values 5. Supports full server control internally, depending on language; thus SQL and PL/pgSQL procedures should support cursors, locak handling, transactions and database control language. I'm not sure that all of the above is in the standard; the text on Stored Procedures in SQL 99 seems to be somewhat skimpy, the committee perhaps leaving it up to te vendors to implement (pfaugh!) but all of the above is consistent with Stored Procedure implementation in MS SQL Server 7 and Oracle 8. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
> Jon, > > > The nice thing about OIDs is that if you ever need to merge rows, you > > could make a function that searched all OID-type parameters and > > change the > > old reference to the new one. > > The same thing can be done with the 'universal_sq' approach. How exactly? The column types would just be INT8, right? > Since you're undoubtedly familiar with PostgreSQL functions, I will list > only how a stored procedure differes from a PgSQL function: > > 1. Is precompiled, with a saved plan of execution on the server. > (not sure about the wisdom of this but it appears to be in the SQL 99 > standard) > 2. Can return a rowset or multiple rowsets > 3. Accepts an indefinite number of parameters > 4. Returns an execution state and none to many return values > 5. Supports full server control internally, depending on language; thus > SQL and PL/pgSQL procedures should support cursors, locak handling, > transactions and database control language. > Aha! The only thing I don't get is what do you mean by "execution state"? (I'm guessing that you're not referring to Texas). Jon
Jon, > How exactly? The column types would just be INT8, right? Ah. I see what you mean. In my application, the columns are all *named* the same ... clients.usq, orders.usq, etc. Thus a search on "usq" through out the application will locate everything. If you needed and INT8 variable, you could define a custom type on INT8 ... and then search on that type. > Aha! The only thing I don't get is what do you mean by "execution > state"? > (I'm guessing that you're not referring to Texas). <Snicker> A variable indicating success or failure. e.g. In MSSQL, you get a ReturnValue of INT2 type ... with 0 indicating success, and <> 0 as an error number. Speaking of which, one thing I forgot to explicitly mention above is error trapping of some kind would be extremely useful ... -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
> "Josh Berkus" <josh@agliodbs.com> writes: > > Given this, why bother with system-generated OIDs on user rows at all? > > Why not simply reserve the OIDs for the system tables? > > An option to not generate OIDs unless requested (on a table-by-table > basis) has been discussed. It seems like a fine near-term solution > to me. 8-byte OIDs are a longer-term solution, because they'll break > a lot of things (including clients...) > > >> This is certainly not ideal, but it's not nearly as big a problem as > >> transaction ID wraparound. You can live with it, whereas right now > >> xact ID wraparound is catastrophic. That we gotta work on, soon. > > > Nothing like reassuring us commercial DB users, Tom. :-P > > Can you describe what you're talking about? > > It's in the archives: after 4G transactions, your database curls up > and dies. When your pg_log starts to approach 1Gbyte (2 bits per > transaction) you'd better plan on dump/initdb/reload. I wonder is we should check the size of pg_log on startup and warn the administrator? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Besides compatibility, what breaks when you make OIDs/Txn IDs > INT8s? Maybe there should be a minor fork called Postgres64 which does > this for those needing large tables. We have delayed the problem of oid/xid wraparound by putting out good releases so people upgrade every year or so. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026