Thread: portable DBAPI auto-increment
Hi, I have a portability issue that I'm not sure what to do with: I need an automatically unique column, but I don't see a portable way to ask for it in my code. For my example, I'm going to use column k as the key and v as the value, though in real life there are another 10 or so columns. In sqlite and mysql, I can do c.execute("insert into foo ( v ) values ( 'xyz' )") k = c.lastrowid In psycopg2, I find that lastrowid is always 0, even though it is making rows in the table with k=1, k=2, etc. I did some searching and found that the value of lastrowid is not taken from the autoincrementing column, but is some non-unique entity called an OID. The recommended workaround seems to be c.execute("insert into foo ( v ) values ( 'xyz' ) returning k") To do this, I would need to make a special case for inserting into tables that have an autoincrement key. For example, I might use k = my_db_hack.insert_with_autoinc(c, " insert into foo ( v ) values ('xyz')") where insert_with_autoinc would recognize the postgres db interface and add the "returning" and read the results out of the cursor. It seems kind of klunky, though. So, my question is: Is there a more portable way to do this that I'm not thinking of? Secondarily, I see the the postgres documentation recommends not using the OID in your own tables. What use is the OID? Why doesn't c.lastrowid to return the value of the serial column, or None if there isn't one? Mark S.
On 07/04/11 22:40, Mark Sienkiewicz wrote: > Hi, > > I have a portability issue that I'm not sure what to do with: I need an > automatically unique column, but I don't see a portable way to ask for > it in my code. > > For my example, I'm going to use column k as the key and v as the value, > though in real life there are another 10 or so columns. > > In sqlite and mysql, I can do > c.execute("insert into foo ( v ) values ( 'xyz' )") > k = c.lastrowid Declare k as "serial" and then do the following: c.execute("INSERT INTO foo (v) VALUES ('xyz') RETURNING k") k = c.fetchone()[0] Hope this helps, federico -- Federico Di Gregorio federico.digregorio@dndg.it Studio Associato Di Nunzio e Di Gregorio http://dndg.it We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. -- D.E.Knuth
On Thu, Apr 07, 2011 at 04:40:29PM -0400, Mark Sienkiewicz wrote: > I have a portability issue that I'm not sure what to do with: I need > an automatically unique column, but I don't see a portable way to ask > for it in my code. SELECT automatically_unique_column FROM table_in_question > For my example, I'm going to use column k as the key and v as the > value, though in real life there are another 10 or so columns. > > In sqlite and mysql, I can do > c.execute("insert into foo ( v ) values ( 'xyz' )") > k = c.lastrowid .lastrowid is an implementation detail depending on the database to keep a unique column on your rows behind your back. PostgreSQL used to do that (the row OID) but that's been deprecated because it serves no known purpose that cannot easily be served otherwise. > In psycopg2, I find that lastrowid is always 0, even though it is > making rows in the table with k=1, k=2, etc. > > I did some searching and found that the value of lastrowid is not > taken from the autoincrementing column, but is some non-unique entity > called an OID. OIDs *are* unique per table. > So, my question is: Is there a more portable way to do this that I'm > not thinking of? Maybe second thougth needs to be given to whether .lastrowid is the right tool for the job. > Secondarily, I see the the postgres documentation recommends not > using the OID in your own tables. What use is the OID? Why doesn't > c.lastrowid to return the value of the serial column, or None if > there isn't one? Consider this table: create table temp ( pk serial primary key, pk2 serial unique, pk3 integer unique not null ) with OIDS; Which value would you expect psycopg2 to return as .lastrowid ? They *all* fulfill the criteria: uniquely identify the row (.oid, .pk, .pk2, .pk3). Using RETURNING does not cut it either because one can return nearly arbitary data. Now, a case could be made to let the user *tell* psycopg2 what to use for .lastrowid on any given table. That's not as trivial as it sounds, however. Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert wrote: >> In sqlite and mysql, I can do >> c.execute("insert into foo ( v ) values ( 'xyz' )") >> k = c.lastrowid >> > > .lastrowid is an implementation detail depending on the > database to keep a unique column on your rows behind your back. > Yes. From the available documentation, this looks like the obvious purpose for .lastrowid existing at all. But, Joe Abbate wrote: > > As I recall, ROWID --which is only defined tautologically in PEP-249 as > "the 'Row ID' column"-- is an Oracle feature. There it's a pseudo column > that is normally hidden (does not appear in SELECT *). > ... > > If MySQL and SQLite have interpreted ROWID as the value of an > auto-increment column, that makes using it non-standard and therefore by > definition non-portable. That is totally non-obvious from the PEP, but _extremely_ useful to know. ( And, yes, both sqlite3 and MySQLdb return the value of the autoincrement column in lastrowid. ) So, Karsten Hilbert wrote: > Maybe second thougth needs to be given to whether .lastrowid > is the right tool for the job. > Evidently, you are right. Currently, I expect the database to create a unique identifier for the record on insert. Instead, I need to make a separate UID generator and insert the record with a previously generated UID. Federico Di Gregorio wrote: > Declare k as "serial" and then do the following: > > c.execute("INSERT INTO foo (v) VALUES ('xyz') RETURNING k") > k = c.fetchone()[0] > That basic code could be the core of the UID generation. It would also need to deal with possibly non-unique numbers after it wraps (in postgres, the value after 2147483647 is 1), but I probably have at least 5 years to figure that out. (For comparison, mysql uses an unsigned 64 bit value for auto increment and chokes when it runs out. sqlite wraps, but it automatically finds a new key value that is not used.) Thanks to all of you for the help! Mark S.
On Fri, Apr 8, 2011 at 5:23 PM, Mark Sienkiewicz <sienkiew@stsci.edu> wrote: > That basic code could be the core of the UID generation. It would also need > to deal with possibly non-unique numbers after it wraps (in postgres, the > value after 2147483647 is 1), but I probably have at least 5 years to figure > that out. This is plain wrong. Where did you get this idea? piro=> create table testser (id serial); NOTICE: CREATE TABLE will create implicit sequence "testser_id_seq" for serial column "testser.id" CREATE TABLE piro=> alter SEQUENCE testser_id_seq restart with 2147483647; ALTER SEQUENCE piro=> insert into testser values (default) returning id; id ------------ 2147483647 (1 row) INSERT 0 1 piro=> insert into testser values (default) returning id; ERROR: integer out of range > (For comparison, mysql uses an unsigned 64 bit value for auto increment and > chokes when it runs out. sqlite wraps, but it automatically finds a new key > value that is not used.) Please, read the fine manual: PostgreSQL has a 64 bit bigserial data type too. -- Daniele
On Fri, Apr 8, 2011 at 12:42 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > On Fri, Apr 8, 2011 at 5:23 PM, Mark Sienkiewicz <sienkiew@stsci.edu> wrote: >> (For comparison, mysql uses an unsigned 64 bit value for auto increment and >> chokes when it runs out. sqlite wraps, but it automatically finds a new key >> value that is not used.) > > Please, read the fine manual: PostgreSQL has a 64 bit bigserial data type too. In addition, if you absolutely have to have the full 64 bit range, you can set the starting value of the sequence to -9223372036854775808. -- Thanks, David Blewett
On Fri, Apr 8, 2011 at 12:52 PM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > On Thu, Apr 07, 2011 at 04:40:29PM -0400, Mark Sienkiewicz wrote: >> In psycopg2, I find that lastrowid is always 0, even though it is >> making rows in the table with k=1, k=2, etc. >> >> I did some searching and found that the value of lastrowid is not >> taken from the autoincrementing column, but is some non-unique entity >> called an OID. > > OIDs *are* unique per table. Actually they are not - you need unique index on oid column for that. That is also one of the reasons to avoid them... -- marko
On Thu, Apr 7, 2011 at 9:40 PM, Mark Sienkiewicz <sienkiew@stsci.edu> wrote: > I have a portability issue that I'm not sure what to do with: I need an > automatically unique column, but I don't see a portable way to ask for it in > my code. > > For my example, I'm going to use column k as the key and v as the value, > though in real life there are another 10 or so columns. > > In sqlite and mysql, I can do > c.execute("insert into foo ( v ) values ( 'xyz' )") > k = c.lastrowid > > In psycopg2, I find that lastrowid is always 0, even though it is making > rows in the table with k=1, k=2, etc. > > I did some searching and found that the value of lastrowid is not taken from > the autoincrementing column, but is some non-unique entity called an OID. > The recommended workaround seems to be > c.execute("insert into foo ( v ) values ( 'xyz' ) returning k") > > To do this, I would need to make a special case for inserting into tables > that have an autoincrement key. For example, I might use > k = my_db_hack.insert_with_autoinc(c, " insert into foo ( v ) values > ('xyz')") > > where insert_with_autoinc would recognize the postgres db interface and add > the "returning" and read the results out of the cursor. It seems kind of > klunky, though. > > So, my question is: Is there a more portable way to do this that I'm not > thinking of? I would say no, "returning" is the best option, "select currval(seq_name)" a second -vastly inferior- one. PostgreSQL doesn't give any special value to a field of type serial, nor to the primary key of a table. It's not possible for psycopg to add back such value without a) parsing the query, b) querying the system catalog, 3) add policies of its own on top of the database and 4) rewrite the query. None of these things are likely to happen in psycopg, albeit you may find some of those in some higher level tool using psycopg as its driver. Looks like you are hoping to write "portable SQL queries". I don't believe this approach will bring you very far: if you want to write portable _queries_ you need a tool that generates the _SQL_ for you, or you will have problems in any query but the simplest ones. Hard? Yes, it is. Take a look at SQLAlchemy if you want portable queries, or be prepared to work at a level higher than the SQL. > Secondarily, I see the the postgres documentation recommends not using the > OID in your own tables. What use is the OID? Why doesn't c.lastrowid to > return the value of the serial column, or None if there isn't one? The use of the OID is mostly historical and currently only limited to the system catalog for the limitations you have probably read in the docs. c.lastrowid is not set to the value of the serial column because PostgreSQL doesn't tell it. psycopg doesn't even know the query you have sent is an insert, let alone that this insert was performed on a table that happened to have exactly one serial column (that by the way is not even a type: it's only a value for the default and its type may actually be anything). -- Daniele
Daniele Varrazzo wrote: > On Fri, Apr 8, 2011 at 5:23 PM, Mark Sienkiewicz <sienkiew@stsci.edu> wrote: > > >> That basic code could be the core of the UID generation. It would also need >> to deal with possibly non-unique numbers after it wraps (in postgres, the >> value after 2147483647 is 1), but I probably have at least 5 years to figure >> that out. >> > > This is plain wrong. Where did you get this idea? > I did an experiment: test=# create table foo ( k serial, v varchar(10) ); NOTICE: CREATE TABLE will create implicit sequence "foo_k_seq" for serial column "foo.k" CREATE TABLE test=# insert into foo ( k, v ) values ( 2147483647, 'max' ); INSERT 0 1 test=# insert into foo ( v ) values ( 'a' ) ; INSERT 0 1 test=# select * from foo; 2147483647 | max 1 | a test=# So, for a "serial" column with 2147483647 in it, I got 1 for the next value inserted. I read your example to find out what you were doing different and then went back to the postgres manual. What I see is that "serial" is not the same thing as auto-increment -- it is different thing entirely that you can use instead of auto-increment in some cases. That is important to know, but was not immediately obvious to me from what I have read so far. >> (For comparison, mysql uses an unsigned 64 bit value for auto increment and >> chokes when it runs out. sqlite wraps, but it automatically finds a new key >> value that is not used.) >> > > Please, read the fine manual: PostgreSQL has a 64 bit bigserial data type too. > Yes, I do know about the manual and I have spent considerable time reading it. As you may know about manuals for a system of this size, it takes time to find everything you need to know and to understand all the implications. Mark S.
On 04/08/2011 10:43 AM, Mark Sienkiewicz wrote: > Daniele Varrazzo wrote: > > > I did an experiment: > > test=# create table foo ( k serial, v varchar(10) ); > NOTICE: CREATE TABLE will create implicit sequence "foo_k_seq" for > serial column "foo.k" > CREATE TABLE > test=# insert into foo ( k, v ) values ( 2147483647, 'max' ); > INSERT 0 1 > test=# insert into foo ( v ) values ( 'a' ) ; > INSERT 0 1 > test=# select * from foo; > 2147483647 | max > 1 | a > > test=# > > So, for a "serial" column with 2147483647 in it, I got 1 for the next > value inserted. I read your example to find out what you were doing > different and then went back to the postgres manual. What I see is that > "serial" is not the same thing as auto-increment -- it is different > thing entirely that you can use instead of auto-increment in some cases. > That is important to know, but was not immediately obvious to me from > what I have read so far. Actually, it is the same thing. A SERIAL or BIGSERIAL type is just a macro that sets up an auto-increment field. > > > > Mark S. > > -- Adrian Klaver adrian.klaver@gmail.com
On Fri, Apr 08, 2011 at 08:36:25PM +0300, Marko Kreen wrote: > > On Thu, Apr 07, 2011 at 04:40:29PM -0400, Mark Sienkiewicz wrote: > >> In psycopg2, I find that lastrowid is always 0, even though it is > >> making rows in the table with k=1, k=2, etc. > >> > >> I did some searching and found that the value of lastrowid is not > >> taken from the autoincrementing column, but is some non-unique entity > >> called an OID. > > > > OIDs *are* unique per table. > > Actually they are not - you need unique index on oid column for that. > > That is also one of the reasons to avoid them... Thanks for the correction. Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Daniele Varrazzo wrote: > I would say no, "returning" is the best option, "select > currval(seq_name)" a second -vastly inferior- one. > I can see the logic in that. "Returning" happens in one statement, but select currval followed by an insert means you have to go to the database twice. > Looks like you are hoping to write "portable SQL queries". I don't > believe this approach will bring you very far: if you want to write > portable _queries_ you need a tool that generates the _SQL_ for you, > or you will have problems in any query but the simplest ones. Hard? > Yes, it is. Take a look at SQLAlchemy if you want portable queries, or > be prepared to work at a level higher than the SQL. > I'm aware of the risk, but I've had reasonable enough success so far for routine queries that I don't want to get into the extra cost of using an sql generator. Most of what I do with a database is look things up in tables, and it is pretty easy to write even fairly complicated select statements that don't get into database-specifics. This auto-increment field is the most unusual thing I do in my current application, so working around the database differences there is still less work than converting to an sql generator. I appreciate the advice, though. In other circumstances, I would look into it. Mark S.