Thread: Re: [NOVICE] Last ID Problem
Hi Tom and others, > I think the correct solution is not to mess with what's admittedly a legacy aspect of > our client API. Instead we should invent the "INSERT RETURNING" and "UPDATE RETURNING" > commands that have been discussed repeatedly (see the pghackers archives). That would > allow people to get what they want, and do so in only one network round trip, without > any artificial dependencies on OIDs or TIDs or anything else. It'd be unportable, but > surely no more so than relying on OIDs or TIDs ... Just off the top of my head, would it not be feasible to add a column to pg_class called lastinsert that points to the OID of the pg_attribute column to return after an insert? It could be changed using something similar to "ALTER TABLE x SET LASTINSERT TO y", but by default it would be set to the OID of the primary key of the table if the table specified WITHOUT OIDS at creation time, or the first column of the table otherwise. After the INSERT command, the value of the resulting is column is passed back to the client. I see that INSERT...RETURNING is a solution to the problem, but it seems somewhat strange to have to use an unportable command just to be able to return an identifier for the last inserted record... Kind regards, Mark. ------------------------ WebBased Ltd South West Technology Centre Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 791021 F: +44 (0)1752 791023 W: http://www.webbased.co.uk
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > Just off the top of my head, would it not be feasible to add a column to > pg_class called lastinsert that points to the OID of the pg_attribute column > to return after an insert? No. The thing everyone is ignoring here is that the INSERT command tag format is not something we can just go and change. You certainly could not put anything in it that wasn't an integer, and I'm not sure it would even be safe to put a bigint. So most of the cases you might actually want (timestamp, bigserial, etc) would be ruled out. Hardly worth inventing such a feature. > I see that INSERT...RETURNING is a solution to the problem, but it seems > somewhat strange to have to use an unportable command just to be able to > return an identifier for the last inserted record... How is what you're suggesting more portable? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > How is what you're suggesting more portable? Well, the driver would be free to implement $sth->last_insert_id() using whatever proprietary extensions it has available. The non-portableness would at least be hidden in the driver layer. Switch out the driver and the right thing would happen. "INSERT/UPDATE ... RETURNING" isn't something a driver can take advantage of. It would require it to modify your statements which it can't do safely. So your application would have such non-portable SQL code written into it. Switch databases and your application code needs to be ported. -- greg
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> How is what you're suggesting more portable? > Well, the driver would be free to implement $sth->last_insert_id() using > whatever proprietary extensions it has available. The non-portableness would > at least be hidden in the driver layer. Are you asserting that last_insert_id() is a portable function? I doubt it. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> How is what you're suggesting more portable? > > > Well, the driver would be free to implement $sth->last_insert_id() using > > whatever proprietary extensions it has available. The non-portableness would > > at least be hidden in the driver layer. > > Are you asserting that last_insert_id() is a portable function? I doubt > it. Well I'm not sure what you mean by "portable". It's part of the DBI driver definition, so in theory it is. Not all drivers will implement it though, or implement it properly, and for some it may be more efficient than others. For postgres it looks like currently it requires you to pass in the table and field might even need a "driver-specific hint" telling it the sequence name. At least an application using it has a hope of working on a new driver. An application using RETURNING will only work on Oracle and one day Postgres. So it would be nice if the Postgres driver could efficiently implement it without having to do a second SELECT and without having to know out of band info like a sequence name. This is from the DBI documentation -- that is, the non-driver-specific abstract interface documentation. "last_insert_id" $rv = $dbh->last_insert_id($catalog, $schema, $table, $field); $rv = $dbh->last_insert_id($catalog,$schema, $table, $field, \%attr); Returns a value 'identifying' the row just inserted, if possible. Typically this would be a value assignedby the database server to a column with an auto_increment or serial type. Returns undef if thedriver does not support the method or can't determine the value. The $catalog, $schema, $table, and $field parameters may be required for some drivers (see below). Ifyou don't know the parameter values and your driver does not need them, then use "undef" for each. There are several caveats to be aware of with this method if you want to use it for portable applications: * For some drivers the value may only available immediately after the insert statement has executed(e.g., mysql, Informix). * For some drivers the $catalog, $schema, $table, and $field parameters are required (e.g., Pg), forothers they are ignored (e.g., mysql). * Drivers may return an indeterminate value if no insert has been performed yet. * For some drivers the value may only be available if placeholders have not been used (e.g., Sybase,MS SQL). In this case the value returned would be from the last non-placeholder insert statement. * Some drivers may need driver-specific hints about how to get the value. For example, being told thename of the database 'sequence' object that holds the value. Any such hints are passed as driver-specificattributes in the \%attr parameter. * If the underlying database offers nothing better, then some drivers may attempt to implement thismethod by executing ""select max($field) from $table"". Drivers using any approach like this shouldissue a warning if "AutoCommit" is true because it is generally unsafe - another process may have modifiedthe table between your insert and the select. For situations where you know it is safe, suchas when you have locked the table, you can silence the warning by passing "Warn" => 0 in \%attr. * If no insert has been performed yet, or the last insert failed, then the value is implementation defined. Given all the caveats above, it's clear that this method must be used with care. The "last_insert_id" method was added in DBI 1.38. -- greg
Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > >>Tom Lane <tgl@sss.pgh.pa.us> writes: >> >>>How is what you're suggesting more portable? > > >>Well, the driver would be free to implement $sth->last_insert_id() using >>whatever proprietary extensions it has available. The non-portableness would >>at least be hidden in the driver layer. > > > Are you asserting that last_insert_id() is a portable function? I doubt > it. I'm not familiar with the Perl interface, but JDBC has a standardized interface for this: http://java.sun.com/j2se/1.5.0/docs/api/java/sql/Statement.html#executeUpdate(java.lang.String,%20int) http://java.sun.com/j2se/1.5.0/docs/api/java/sql/Statement.html#getGeneratedKeys() I tend to agree that a protocol-level change is easier to support in a driver. If it's done by extending INSERT/UPDATE, the driver will need to parse and modify queries which is hairy at the best of times. -O
Greg Stark <gsstark@mit.edu> writes: > This is from the DBI documentation -- that is, the non-driver-specific > abstract interface documentation. > Returns a value 'identifying' the row just inserted, if possible. > Typically this would be a value assigned by the database server to > a column with an auto_increment or serial type. Aside from the numerous serious problems pointed out in the documentation, this has an even more fatal objection, which is that it's unspecified what the result value is and thus there is no portable way of *using* the result after you have it. (If the PG driver returns an OID you certainly couldn't use that the same way as some other driver that returns a primary key ... especially a multicolumn primary key ...) This "portable" function is so unportable that I see no reason to accept it as precedent. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Merlin Moncur wrote: > That is a shortcoming of the DBD::pg driver which really should be > returning a key (comprised of columns, some or none of which may be > defaulted by the server). Actually, the spec comes from DBI, not DBD::Pg, and is inspired by MySQL's last_insert_id function. It is a poorly-speced function, but we've done our best in the upcoming version of DBD::Pg, which will support it. Greg Stark wrote: > For postgres it looks like currently it requires you to pass in > the table and field might even need a "driver-specific hint" telling > it the sequence name. For the record, the only required field for DBD::Pg will be the table name, although the name of the sequence is highly encouraged. Here's the docs for the next version, the first which supports lii: last_insert_id $rv = $dbh->last_insert_id($catalog, $schema, $table, $field); $rv = $dbh->last_insert_id($catalog, $schema,$table, $field, \%attr); Attempts to return the id of the last value to be inserted into a table. You can either provide a sequence name (preferred) or provide a table name with optional schema. The $catalog and $field arguments are always ignored. The current value of the sequence is returned by a call to the 'currval' PostgreSQL function. This will fail if the sequence has not yet been used in the current database connection. If you do not know the name of the sequence, you can provide a table name and DBD::Pg will attempt to return the correct value. To do this, there must be at least one column in the table with a C<NOT NULL> constraint, that has a unique constraint, and which uses a sequence as a default value. If more than one column meets these conditions, the primary key will be used. This involves some looking up of things in the system table, so DBD::Pg will cache the sequence name for susequent calls. If you need to disable this caching for some reason, you can control it via the 'pg_cache' attribute. Please keep in mind that this method is far from foolproof, so make your script use it properly. Specifically, make sure that it is called immediately after the insert, and that the insert does not add a value to the column that is using the sequence as a default value. Some examples: $dbh->do("CREATE SEQUENCE lii_seq START 1"); $dbh->do("CREATE TABLE lii ( foobar INTEGER NOT NULL UNIQUEDEFAULT nextval('lii_seq'), baz VARCHAR)"); $SQL = "INSERT INTO lii(baz) VALUES (?)"; $sth = $dbh->prepare($SQL);for (qw(uno dos tres quattro)) { $sth->execute($_); my $newid = $dbh->last_insert_id(C<undef>,undef,undef,undef,{sequence=>'lii_seq'}); print "Last insert id was $newid\n"; } If you did not want to worry about the sequence name: $dbh->do("CREATE TABLE lii2 ( foobar SERIAL UNIQUE, baz VARCHAR)");$SQL = "INSERT INTO lii2(baz) VALUES (?)"; $sth = $dbh->prepare($SQL); for (qw(uno dos tres quattro)) { $sth->execute($_); my $newid = $dbh->last_insert_id(undef,undef,"lii2",undef); print "Last insert id was $newid\n"; } - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200502022110 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFCAYgSvJuQZxSWSsgRAgg3AJ4id98pta0CQR2w3xgwkxnph7qW4wCeMAJH g/eXhtcmvXei9mESDDXg/s8= =QaUa -----END PGP SIGNATURE-----
> Attempts to return the id of the last value to be inserted into a table. > You can either provide a sequence name (preferred) or provide a table > name with optional schema. The $catalog and $field arguments are always ignored. > The current value of the sequence is returned by a call to the > 'currval' PostgreSQL function. This will fail if the sequence has not yet > been used in the current database connection. This suffers from the same problems that currval does when using connection pools tho. I previously suggested a function similar to last_insert_id in behaviour, and have attached it to this email for reference. Even so, this also suffers from the same problems when using a connection pool. The solution I proposed, namely having the tuple returned by inserts/updates (perhaps even deletes?) would only mean changing the client library to handle this, and as an example, libpg could easily figure out the OID of said tuple and return that if it's present for PQExec() (for backwards compatibility just as it does today,) and add a separate PQExecSelect() that instead returns the tuple(s) as if they had been SELECTed. -- John Hansen <john@geeknet.com.au> GeekNET
Attachment
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > This suffers from the same problems that currval does when using > connection pools tho. I still don't see this as much of a real world problem however, more of a "doctor, it hurts when I do this" variety. As the DBD::Pg docs point out, you should not separate the calls to insert and currval far apart, and certainly not across connections. I have a hard time visualizing a case where an app would ever need to worry about the problem anyway, unless they were using pooling in a strange way and had a very poorly written application. > The solution I proposed, namely having the tuple returned by > inserts/updates (perhaps even deletes?) would only mean changing the > client library to handle this, and as an example, libpg could easily > figure out the OID of said tuple and return that if it's present for > PQExec() (for backwards compatibility just as it does today,) and add a > separate PQExecSelect() that instead returns the tuple(s) as if they had > been SELECTed. There's a few issues with the above, however, the most important of which is that OIDs are going away, and then what do you use? Also, it does not handle cases where the insert necessarily happened with a direct INSERT via PQexec: the insert could have happened inside of a called function, or a trigger could have inserted into three different tables. Truth be told, I don't think the whole last_insert_id() in DBI is a very useful function anyway. It's mainly (at this point) a quick abstraction of a nextval call between Oracle and PostgreSQL. We do go out of our way to be more compatible to MySQL by accepting just a table name, but one should really use the sequence directly, IMO. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200502030012 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFCAbQRvJuQZxSWSsgRArYMAKC4Kgsv153HHbC05AtraAh4O7oL9wCgtDmR zoucziPs5cyC1at00M8MC9w= =PDUD -----END PGP SIGNATURE-----
Hi Tom, > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 02 February 2005 15:35 > To: Mark Cave-Ayland > Cc: 'Alvaro Herrera'; 'Michael Fuhr'; 'Mitch Pirtle'; 'Tatsuo > Ishii'; pgsql-hackers@postgresql.org; operationsengineer1@yahoo.com > Subject: Re: [NOVICE] Last ID Problem (cut) > No. The thing everyone is ignoring here is that the INSERT > command tag format is not something we can just go and > change. You certainly could not put anything in it that > wasn't an integer, and I'm not sure it would even be safe to > put a bigint. So most of the cases you might actually want > (timestamp, bigserial, etc) would be ruled out. Hardly worth > inventing such a feature. OK, I didn't realise it was the command tag that was the issue here. I took a look at the libpq source to see how the INSERT tag works and it looks like it currently assumes a string of numbers. So as a minimum it would need some form of protocol extension to get this work (which I see from the archives that you were not keen to pursue). > How is what you're suggesting more portable? I was hoping that it would only require minimal change (but obviously that is not the case). I think, out of the remaining options, that keeping with currval() is going to be the best approach - I can't really see the benefit of using a non-SQL standard command just for SQL insertion, mainly as you and others have suggested for portability reasons :( Kind regards, Mark. ------------------------ WebBased Ltd South West Technology Centre Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 791021 F: +44 (0)1752 791023 W: http://www.webbased.co.uk