Getting sequence-generated IDs from multiple row insert - Mailing list pgsql-general

From Ben Hoyt
Subject Getting sequence-generated IDs from multiple row insert
Date
Msg-id CAL9jXCHVg0CVEke+eiMjYA_UBffnacGYEN9Psb1h-kFw8QLxrw@mail.gmail.com
Whole thread Raw
Responses Re: Getting sequence-generated IDs from multiple row insert  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-general
Hi folks,

I've just run into a subtle but fairly serious race condition while using web.py's SQL library to insert multiple rows into the database and return their IDs (a "serial primary key" column). Specifically I'm using the multiple_insert() function that web.py defines here:


This function runs a query like this:

    INSERT INTO table (col1, col2) VALUES (col1_1, col2_1), (col1_2, col2_2), ...; SELECT currval('table_id_seq');

Using the output of the currval(), web.py tries to build a list of the most recent IDs by creating a range from "currval - num_rows_inserted + 1" through "currval". In Python:

    out = range(out-len(values)+1, out+1)

This *looks* nice, and must have seemed fine to the developers who implemented it, but I've just hit a case where two sessions each doing a multiple insert don't use sequential IDs. For example, the range code above for the first insert gave 2117552...2117829. And the second insert gave 2117625...2117818. Which are obviously overlapping and is a nasty bug waiting to happen. Thankfully it caused an IntegrityError further down in my code so I didn't screw things up.

First of all, I presume this is expected, and is how the sequence with a multi-row insert is supposed to work? In other words, the sequence guarantees the IDs will be unique, but with multi-row insert, they won't necessarily be consecutive? If so, it's a fairly serious bug in web.py's multiple_insert(), which probably shouldn't return anything due to this issue.

Second, what's the right thing to do here? The first thing I found was PostgreSQL's "RETURNING" clause, but somewhat frustratingly for this use case, even that's not guaranteed to return the results in the order you specified. I need the IDs in insertion order so I can do further processing. Tom Lane and others in this thread indicate that this is not a guarantee of the RETURNING clause, for future optimization reasons and due to how SQL handles sets:


So currently I've changed my code to use RETURNING and then I'm ordering the results based on a secondary column that I know the order of. This works, but seems clunky, so I'm wondering if there's a nicer way.

Thanks,
Ben

pgsql-general by date:

Previous
From: Leonardo M. Ramé
Date:
Subject: Re: Complex query
Next
From: Edson Richter
Date:
Subject: Re: Why does "checkpointer" is consumig ~1.2Gb of RAM?