Thread: Call for objections: put back OIDs in CREATE TABLE AS/SELECT INTO
We've gotten a couple of complaints now about the fact that 7.3 doesn't include an OID column in a table created via CREATE TABLE AS or SELECT INTO. Unless I hear objections, I'm going to revert it to including an OID, and back-patch the fix for 7.3.2 as well. See discussion a couple days ago on pgsql-general, starting at http://archives.postgresql.org/pgsql-general/2003-01/msg00669.php regards, tom lane
Re: Call for objections: put back OIDs in CREATE TABLE AS/SELECT INTO
From
"Christopher Kings-Lynne"
Date:
Why don't you just include them by default, otherwise if WITHOUT OIDS appears in the CREATE TABLE command, then don't include them ? Chris > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane > Sent: Wednesday, 22 January 2003 4:12 AM > To: pgsql-hackers@postgresql.org > Subject: [HACKERS] Call for objections: put back OIDs in CREATE TABLE > AS/SELECT INTO > > > We've gotten a couple of complaints now about the fact that 7.3 doesn't > include an OID column in a table created via CREATE TABLE AS or SELECT > INTO. Unless I hear objections, I'm going to revert it to including an > OID, and back-patch the fix for 7.3.2 as well. See discussion a couple > days ago on pgsql-general, starting at > http://archives.postgresql.org/pgsql-general/2003-01/msg00669.php > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > Why don't you just include them by default, otherwise if WITHOUT OIDS > appears in the CREATE TABLE command, then don't include them ? Well, adding a WITHOUT OIDS option to CREATE TABLE AS would be a new feature, which I don't have the time/interest for. I won't do anything to preclude someone else implementing that, though ;-) regards, tom lane
Re: Call for objections: put back OIDs in CREATE TABLE AS/SELECT INTO
From
"Christopher Kings-Lynne"
Date:
> "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > > Why don't you just include them by default, otherwise if WITHOUT OIDS > > appears in the CREATE TABLE command, then don't include them ? > > Well, adding a WITHOUT OIDS option to CREATE TABLE AS would be a new > feature, which I don't have the time/interest for. I won't do anything > to preclude someone else implementing that, though ;-) Oh, I thought it already had it from the CREATE TABLE bit... Does sound like it would be a good ultimate solution tho. Chris
On Tue, 21 Jan 2003, Tom Lane wrote: > We've gotten a couple of complaints now about the fact that 7.3 doesn't > include an OID column in a table created via CREATE TABLE AS or SELECT > INTO. Unless I hear objections, I'm going to revert it to including an > OID, and back-patch the fix for 7.3.2 as well. I object. I personally think we should be moving towards not using OIDs as the default behaviour, inasmuch as we can, for several reasons: 1. It's not a relational concept. 2. The OID wraparound problem can get you. 3. Other SQL databases don't do this. 4. It's hidden, rather than exposed, and hidden things are generally a bad idea. 5. We should default to what gives us better performance, rather than worse. > See discussion a couple days ago on pgsql-general, starting at > http://archives.postgresql.org/pgsql-general/2003-01/msg00669.php There didn't seem to be many people clamouring to have it back. The ideal sitaution for me would be to have WITHOUT OIDS be the default for all table creations, and but of course allow WITH OIDS for backward compatability. But yeah, I know that this can introduce problems with old dumps, and may not be entirely easy to implement. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
On Thursday 23 January 2003 06:34, Curt Sampson wrote: > The ideal sitaution for me would be to have WITHOUT OIDS be the default > for all table creations, and but of course allow WITH OIDS for backward Why not make it a configuration option? I can actually think of a third behaviour that would make sense for me. Have it so that OIDs are created by default if there is no primary key defined and not if there is. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
Curt Sampson <cjs@cynic.net> writes: > I object. I personally think we should be moving towards not using OIDs > as the default behaviour, inasmuch as we can, for several reasons: All these objections are global in nature, not specific to CREATE TABLE AS. The argument that persuaded me to do something here is that CREATE TABLE AS should not be different from CREATE TABLE's default behavior. I have no problem with moving towards lack-of-OIDs as the default behavior for both statements, in the long run, if we can get past the compatibility issues. But I don't think OIDs in user tables are costing us anything much, so I'm not prepared to take any big compatibility hit to change the default ... regards, tom lane
Re: Call for objections: put back OIDs in CREATE TABLE AS/SELECT INTO
From
"Ross J. Reedstrom"
Date:
On Thu, Jan 23, 2003 at 10:03:28AM -0500, Tom Lane wrote: > Curt Sampson <cjs@cynic.net> writes: > > I object. I personally think we should be moving towards not using OIDs > > as the default behaviour, inasmuch as we can, for several reasons: > > All these objections are global in nature, not specific to CREATE TABLE > AS. The argument that persuaded me to do something here is that CREATE > TABLE AS should not be different from CREATE TABLE's default behavior. > > I have no problem with moving towards lack-of-OIDs as the default > behavior for both statements, in the long run, if we can get past the > compatibility issues. But I don't think OIDs in user tables are costing > us anything much, so I'm not prepared to take any big compatibility hit > to change the default ... Agreed as to taking the compatability hit in the 7.3 branch (you _were_ talking about changing 7.3, weren't you?) But I think Curt and D'Arcy have a point: what OIDs are costing the DBAs and PostgreSQL developing community is the pain of having an 'almost' solution in place. OIDs have always been the unwanted child in PostgreSQL: the 'pure relational' people don't want them, and the Object people are misled into thinking we've got a _real_ object id. On the relational side, they've stood in for proper use of primary keys (as D'Arcy points out), partly because it's so _easy_ to misuse them that way: the wire protocol returns the OID for free in some cases, and the interface libraries make it easy to get at. So the immediate case, changing the default (in 7.3) to match the CREATE TABLE case makes sense. However, we need to wean developers off using OIDs. I've been working with Diedrich Vorberg on a thin python object relational mapping interface (his Object Relational Membrane - ORM) and this was a central problem: you _need_ a unique id for an object, and the oid seemed so natural ... So in the longer term, we need to provide a replacement. Arguably, the primary key for a table is the right replacement, but we don't _require_ a pkey, so what to do in cases where this isn't one? Also, the pkey can be _any_ column(s), of _any_ type, which could be inconvenient for returning as the result of an insert, for example (imagine a text field as pkey, with a _huge_ block of text just written into it ...) Ross
Curt Sampson kirjutas N, 23.01.2003 kell 13:34: > On Tue, 21 Jan 2003, Tom Lane wrote: > > > We've gotten a couple of complaints now about the fact that 7.3 doesn't > > include an OID column in a table created via CREATE TABLE AS or SELECT > > INTO. Unless I hear objections, I'm going to revert it to including an > > OID, and back-patch the fix for 7.3.2 as well. > > I object. I personally think we should be moving towards not using OIDs > as the default behaviour, inasmuch as we can, for several reasons: I re-object > 1. It's not a relational concept. so are other system tuples (cid, tid, tableiod, ...). It is an OO concept. > 2. The OID wraparound problem can get you. put an unique index on OID column. > 3. Other SQL databases don't do this. Ask Date, hell tell you that SQL is evil, i.e. not relational ;) > 4. It's hidden, rather than exposed, and hidden things are generally a > bad idea. AFAIK carrying hidden weapons is forbidden in most of USA, in Europe you usually are forbidden to carry hand-weapons _exposed_ ;) > 5. We should default to what gives us better performance, rather than > worse. Not if it breaks anything ;) > > See discussion a couple days ago on pgsql-general, starting at > > http://archives.postgresql.org/pgsql-general/2003-01/msg00669.php > > There didn't seem to be many people clamouring to have it back. > > The ideal sitaution for me would be to have WITHOUT OIDS be the default > for all table creations, and but of course allow WITH OIDS for backward > compatability. But yeah, I know that this can introduce problems with > old dumps, and may not be entirely easy to implement. If you need a no-OID table, and INSERT INTO it. > cjs -- Hannu Krosing <hannu@tm.ee>
On Thu, 23 Jan 2003, Ross J. Reedstrom wrote: > So in the longer term, we need to provide a replacement. Arguably, the > primary key for a table is the right replacement, but we don't _require_ > a pkey, so what to do in cases where this isn't one? You're stuck. SQL breaks with relational theory in this way; tables need not have candidate keys, and thus you can have duplicate rows in a table. (I.e., mathamatically, a table is not a set, it's a bag.) The implications of this start to go on and on, but let's not get into that here. > Also, the pkey can be _any_ column(s), of _any_ type, which could be > inconvenient for returning as the result of an insert, for example > (imagine a text field as pkey, with a _huge_ block of text just > written into it ...) Well, this could be worked around to some extent, with some hackery. But in the end I reckon it's much easier just to have the object system force you to declare specific a specific object-ID column, if that's what it takes. So long as you've got a candidate key, even if it's not the primary key, you're fine. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
On Fri, 23 Jan 2003, Hannu Krosing wrote: > > 1. [OIDs are] not a relational concept. > so are other system tuples (cid, tid, tableiod, ...). But there's a key difference here; nobody's advertising these others as any sort of row identifier: i.e., a candidate key. And besides, I wouldn't object at all to getting rid of these, except that they store essential system information and I can't figure out how to get rid of them. :-) > It is an OO concept. Well, it's not, because we have an OID wrap-around problem, so an OID is actually not an OID at all, but simply an arbitrary number tacked on to a row. Other rows, in the same or other tables can have the same OID. > > 2. The OID wraparound problem can get you. > put an unique index on OID column. That still doesn't make it a real OID, because you can't guarantee that two rows in different tables won't have the same OID. > > 3. Other SQL databases don't do this. > Ask Date, hell tell you that SQL is evil, i.e. not relational ;) I did, he said that, and I agreed with him. :-) So now we have something that's evil because it's not relational and also evil because it's not SQL. Double-yuck! > > 5. We should default to what gives us better performance, rather than > > worse. > Not if it breaks anything ;) I disagree. We have to weigh the cost of the breakage versus the benefits in each individual circumstance. We've broken plenty of things before because we felt it was better to do so than maintain backward compatability. Because of its history as a research tool, there's a lot of experimental stuff in postgres that, in hindsight, we can say didn't work so well. When we find something that's not working so well, we should at least consider making some sort of move toward the "right thing," rather than continuing to do the wrong thing forever just for the sake of backwards compatability. Summary: I don't want to hear absolutes like "we should never break backwards compatibility." I want to hear arguments that the cost of breaking backwards compatability is X, and the benefit of the new way of doing things is Y, and here is why you think X > Y. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
Ross, you make some powerful arguments here. Probably the most significant was the idea that you need a unique identifier for every row, and it should be of a consistent type, which primary key is not. We clearly need a GUC parameter to turn on/off oids. But it seems we will always need the ability to return something like OID to the user if the user wants it. What it seems we need is a 64-bit oid someday. As an aside, as Tom already said, the 7.3.X patch is just to make CREATE TABLE and CREATE TABLE AS behave the same for OIDs. It does not effect our defaults for future releases, though this little change in 7.3.0 did show use that some folks are using OID and did miss them. --------------------------------------------------------------------------- Ross J. Reedstrom wrote: > On Thu, Jan 23, 2003 at 10:03:28AM -0500, Tom Lane wrote: > > Curt Sampson <cjs@cynic.net> writes: > > > I object. I personally think we should be moving towards not using OIDs > > > as the default behaviour, inasmuch as we can, for several reasons: > > > > All these objections are global in nature, not specific to CREATE TABLE > > AS. The argument that persuaded me to do something here is that CREATE > > TABLE AS should not be different from CREATE TABLE's default behavior. > > > > I have no problem with moving towards lack-of-OIDs as the default > > behavior for both statements, in the long run, if we can get past the > > compatibility issues. But I don't think OIDs in user tables are costing > > us anything much, so I'm not prepared to take any big compatibility hit > > to change the default ... > > Agreed as to taking the compatability hit in the 7.3 branch (you _were_ > talking about changing 7.3, weren't you?) But I think Curt and D'Arcy > have a point: what OIDs are costing the DBAs and PostgreSQL developing > community is the pain of having an 'almost' solution in place. OIDs have > always been the unwanted child in PostgreSQL: the 'pure relational' people > don't want them, and the Object people are misled into thinking we've got > a _real_ object id. On the relational side, they've stood in for proper > use of primary keys (as D'Arcy points out), partly because it's so _easy_ > to misuse them that way: the wire protocol returns the OID for free in > some cases, and the interface libraries make it easy to get at. > > So the immediate case, changing the default (in 7.3) to match the CREATE > TABLE case makes sense. However, we need to wean developers off using > OIDs. I've been working with Diedrich Vorberg on a thin python object > relational mapping interface (his Object Relational Membrane - ORM) > and this was a central problem: you _need_ a unique id for an object, > and the oid seemed so natural ... > > So in the longer term, we need to provide a replacement. Arguably, the > primary key for a table is the right replacement, but we don't _require_ > a pkey, so what to do in cases where this isn't one? Also, the pkey > can be _any_ column(s), of _any_ type, which could be inconvenient for > returning as the result of an insert, for example (imagine a text field > as pkey, with a _huge_ block of text just written into it ...) > > Ross > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Sun, 26 Jan 2003, Bruce Momjian wrote: > Ross, you make some powerful arguments here. Probably the most > significant was the idea that you need a unique identifier for every > row, and it should be of a consistent type, which primary key is not. I don't see why you need a unqiue identifier per row, nor do I see why, if you are going to have one, it needs to be the same type across all tables. Having this may be very desirable, and even necessary, for many or all object-to-relational mapping frameworks, but that is certainly not the only thing that postgres is used for. And I still maintain that if something does need something like of OIDs, it should be declared explicitly in the database schema (as you have to do in other DBMSes) and not use a "hidden" feature. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
> > Ross, you make some powerful arguments here. Probably the most > > significant was the idea that you need a unique identifier for every > > row, and it should be of a consistent type, which primary key is not. > > I don't see why you need a unqiue identifier per row, nor do I see why, > if you are going to have one, it needs to be the same type across all > tables. If i had table with multi col primary key like... create table devices ( major int4, minor int4, primary key (major, minor)); ... and do this: insert into devices (major, minor values (224, find_free_minor_for(224)) should the database report something like INSERT '{<([\'224\', \'89\'])>}' 1 which I could then parse in my client program and try to recover my fresh brand new primary key from it? No thanks... Anyways, I've got an idea: what about having option that INSERTs return "oid_status" in form major = '224' and minor = '10' or state = 'ca' ? Then you could just throw this expression into a select query after where ;P And tables would never need row oids... -- Antti Haapala
On Mon, 27 Jan 2003, Antti Haapala wrote: > > I don't see why you need a unqiue identifier per row, nor do I see why, > > if you are going to have one, it needs to be the same type across all > > tables. (Note here: it may not have been quite clear, but I'm not asking for specific instances of where you might want to do this; I'm asking why it should be forced upon every single table in the world, unless people a) know that postgresql does this, and b) use special SQL extensions that are not compatable with any other DMBS in the world.) > If i had table with multi col primary key like... > > create table devices ( > major int4, > minor int4, > primary key (major, minor) > ); > > ... and do this: > > insert into devices (major, minor values (224, find_free_minor_for(224)) > > should the database report something like > > INSERT '{<([\'224\', \'89\'])>}' 1 > > which I could then parse in my client program and try to recover my > fresh brand new primary key from it? No thanks... It's up to you. It sounds like in this particular application, you want a single integer as the primary key. So I have no objection to you changing the table to be create table devices (id serial PRIMARY KEY,major int4,minor int4,CONSTRAINT major_minor_unique UNIQUE (major, minor) ); and then selecting currval('devices_id_seq') in order to find out what the id of that record is. But my first question here is, why do you want to do this with what is effectively a hidden column, rather than explicitly showing that you need this, as above? And why do you want to run the risk of OID wraparound when you don't have to? Next, other applications might not need to parse whatever the database reports, or may know in advance what they've inserted. So why do you want to, by default, impose the overhead of this special hidden column on these other applications? > Anyways, I've got an idea: what about having option that INSERTs return > "oid_status" in form... I don't understand exactly how an INSERT statement "returns" anything. An INSERT statement is not a function, is it? However, I have no objection to adding a function or other method to get the primary key of the most recent insertion, assuming it exists, for those folks with multi-column primary keys. Presumably it would generate a result set just like a regular SELECT.... cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
> > Anyways, I've got an idea: what about having option that INSERTs return > > "oid_status" in form... > > I don't understand exactly how an INSERT statement "returns" anything. > An INSERT statement is not a function, is it? I mean the backend message CompletedResponse (and s/oid_status/PQoidStatus/ (as it's written in libpq-fe.h)) (ok, it's deprecated now in favor of PQoidValue). From postgresql docs see section Frontend-Backend protocol: Byte1('C') Identifies the message as a completed response. String The command tag. This is usually a single word thatidentifies which SQL command was completed. For an INSERT command, the tag is INSERT oid rows, where rows is the number of rows inserted, and oidis the object ID of the inserted row if rows is 1, otherwise oid is 0. Wouldn't it be nice to add here If table doesn't contain row oids, in place of oid is the primary key of the newly inserted record (if any)as column= 'value' [ and column = 'value [ and ... ]] -- Antti Haapala
On Mon, 27 Jan 2003, Antti Haapala wrote: > For an INSERT command, the tag is INSERT oid rows, where rows > is the number of rows inserted, and oid is the object ID of the > inserted row if rows is 1, otherwise oid is 0. > > Wouldn't it be nice to add here > > If table doesn't contain row oids, in place of oid is the > primary key of the newly inserted record (if any) as column = > 'value' [ and column = 'value [ and ... ]] Well, as was pointed out, that's a lot of data to send back if your primary key is a huge text column, and you've still got a problem if you have a multi-column primary key. Since this sort of functionality is not all that frequently needed, I'd still say it would probably be cleaner to make a new query that selects the most recently inserted primary key. That means that a) you don't have to send back a potentially large amount of data unless the user asks for it, and b) multi-column primary keys work just fine. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC