Thread: Increment a sequence by more than one
Hi,

I'm writing an import app in a third party language. It's going to use "copy to" to move data from STDIN to a postgres (8.2) table. There are some complexities though: it's going to copy the records to a "versioned" table first, and then at a later time the records will be copied by a different procedure to the "live" table. The live table and versioned table are identical in terms of their field definitions. But there is no sequence associated with the versioned table (whose primary key is "id" plus "import_group_id", whereas the live table's pk is just "id"). So all versioned table entries must already "know" what their id would be in the live table. (This makes sense for other business processes we have, but it's a bit of a problem in this instance.)

My problem: I'd like to be able to grab a block of id's from the live table's pk sequence. So let's say my importer has 5,000 new rows to import and the current max pk in the live table is 540,203. I'd like to be able to increment the primary key sequence in such a way that I get a block of ids all to myself and the sequence is reset to 545,203, with a guarantee that all the id's between 540203 and 545203 are unused.

I'm guessing this can be done with a stored procedure, but if possible I'd like to be able to run this command from my third party app without calling a stored procedure (we try not to use stored procedures here b/c we code entirely in this third party language - if we *had* to, it's possible we could install a stored procedure though).

But since I've seen so much magic on display from people on this list, I'm going to ask if it's possible to do this solely from PG SQL sent from a third party language? The main tricky bit seems to be ensuring that everything is locked so two such increment calls at the same time don't yield overlapping blocks of ids. Is there a way to "lock" the sequence generator for the duration of a "nextval" *and* "setval" call? Since pk sequence functions like nextval cannot be rolled back, I'm guessing that "begin/end" won't do the job?

I.e.:

-- need "magic lock" statement on pk sequence here
nextval
-- returns 540203
setval(545203)
-- now sequence is set to where I want it and I "own" 5000 id's
-- release magic lock here

My fallback is to just have a routine that calls "nextval" a bunch of times and stores all the id's it gets - they may or may not be sequential but they'll be unique. This is going to be a really slow way to get a large number of id's of course, and just seems plain wrongheaded in many ways.

Any insights? All help is appreciated, and input on a better way to solve the problem completely is of course welcome as well.

Sincerely,

Steve
On Aug 3, 2007, at 11:50 , Steve Midgley wrote:

> My problem: I'd like to be able to grab a block of id's from the
> live table's pk sequence. So let's say my importer has 5,000 new
> rows to import and the current max pk in the live table is 540,203.
> I'd like to be able to increment the primary key sequence in such a
> way that I get a block of ids all to myself and the sequence is
> reset to 545,203 with a guarantee that all the id's between 540203
> and 545203 are unused.

Setting the next number that will be generated is straightforward:

ALTER SEQUENCE foo_seq RESTART WITH 545203;

Perhaps doing something like (untested):

ALTER SEQUENCE foo_seq RESTART WITH (SELECT last_value + CASE WHEN is_called THEN 1 ELSE 0 END);

You'll need to manage the skipped values yourself though, of course. Perhaps set up a table to hold the current number used in the skipped range. Basically this would be the same approach as that used by people who need to guarantee gapless sequences: you can check the archives for details, but basically you need to make sure the table is properly locked when you're planning to use a new number.

> But since I've seen so much magic on display from people on this
> list, I'm going to ask if it's possible to do this solely from PG
> SQL sent from a third party language? The main tricky bit seems to
> be ensuring that everything is locked so two such increment calls
> at the same time don't yield overlapping blocks of ids. Is there a
> way to "lock" the sequence generator for the duration of a
> "nextval" and "setval" call? Since pk sequence functions like
> nextval cannot be rolled back, I'm guessing that "begin/end" won't
> do the job?

I'm not sure, but perhaps calling SELECT FOR UPDATE on the sequence itself (i.e., not using nextval/setval) would give the appropriate lock, though I wouldn't be surprised if it isn't possible. As you've noted, sequences are designed for performance to ignore transactions, so this may not be possible (though perhaps that's just for values returned via the nextval function).

Hope this gives you some additional ideas on how to handle this.

Michael Glaesemann
grzm seespotcode net
On 8/3/07, Steve Midgley <public@misuse.org> wrote:
>
> Hi,
>
> I'm writing an import app in a third party language. It's going to use
> "copy to" to move data from STDIN to a postgres (8.2) table. There are some
> complexities though: it's going to copy the records to a "versioned" table
> first, and then at a later time the records will be copied by a different
> procedure to the "live" table. The live table and versioned table are
> identical in terms of their field definitions. But there is no sequence
> associated with the versioned table (whose primary key is "id" plus
> "import_group_id", whereas the live table's pk is just "id"). So all
> versioned table entries must already "know" what their id would be in the
> live table. (This makes sense for other business process we have, but it's a
> bit of a problem in this instance).
>
> My problem: I'd like to be able to grab a block of id's from the live
> table's pk sequence. So let's say my importer has 5,000 new rows to import
> and the current max pk in the live table is 540,203. I'd like to be able to
> increment the primary key sequence in such a way that I get a block of ids
> all to myself and the sequence is reset to 545,203 with a guarantee that all
> the id's between 540203 and 545203 are unused.

The real danger in doing this is race conditions. Most anything you do involves a possible race condition. As long as the race condition doesn't result in an id getting used twice, you're safe. So:

test=# create sequence m;
CREATE SEQUENCE
test=# select nextval('m');
 nextval
---------
       1
(1 row)

test=# alter sequence m increment by 5000;
ALTER SEQUENCE
test=# select nextval('m');
 nextval
---------
    5001
(1 row)

test=# alter sequence m increment by 1;
ALTER SEQUENCE
test=# select nextval('m');
 nextval
---------
    5002
(1 row)

In this method, the only possible race condition is that someone might run a nextval('m') between the time you set the increment to 5000 and back to 1. If that happens, you'd have 5,000 missing ids, but since sequences are designed to prevent dupes, not holes, that's ok.

> But since I've seen so much magic on display from people on this list, I'm
> going to ask if it's possible to do this solely from PG SQL sent from a
> third party language? The main tricky bit seems to be ensuring that
> everything is locked so two such increment calls at the same time don't
> yield overlapping blocks of ids. Is there a way to "lock" the sequence
> generator for the duration of a "nextval" and "setval" call?

Avoiding the setval is the real key. It doesn't scale. Missing 5,000 ids is no big deal. Repeating them IS a big deal. Not using setval is the best answer.
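Scott's three-statement trick can be sketched as a toy model (an illustration only: `ToySequence` merely mimics the atomicity of nextval, and `grab_block` stands in for the three SQL statements above; none of this is a Postgres API). It shows why interleaved callers can produce holes but never overlapping blocks:

```python
import threading

# Toy model of a Postgres sequence: nextval() atomically adds the current
# increment to last_value and returns the result.  This only simulates the
# semantics Scott relies on; it is not real server behavior.
class ToySequence:
    def __init__(self):
        self.last_value = 0
        self.increment = 1
        self.lock = threading.Lock()

    def nextval(self):
        with self.lock:
            self.last_value += self.increment
            return self.last_value

    def set_increment(self, n):
        with self.lock:
            self.increment = n

def grab_block(seq, size):
    """Bump the increment, take one nextval, restore the increment.
    The returned value is the top of a block reserved for the caller."""
    seq.set_increment(size)
    top = seq.nextval()
    seq.set_increment(1)
    return range(top - size + 1, top + 1)

seq = ToySequence()
seq.nextval()                  # someone else already used id 1
block = grab_block(seq, 5000)  # we own ids 2..5001
other = seq.nextval()          # a later caller gets 5002
assert other not in block      # no overlap -- at worst there are gaps
```

An interloper calling nextval() between the two increment changes would leave a 5000-wide hole, exactly as Scott describes, but could never land inside the returned block.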
Hi Scott,

Thanks for this info (and Michael too!).

Let me see if I understand your suggestion. I would run these three commands in sequence:

# select nextval('[my_seq_name]');
returns => 52 [I believe that the sequence is at 52]
# alter sequence [my_seq_name] increment by 5000;
# select nextval('[my_seq_name]');
returns => 5052

If the third command doesn't return "5052" - i.e. it returns 5053, then I know that somewhere in this sequence another process grabbed an id out from under me. It doesn't matter where, but I would know that my 5000 id's are not unique and should be discarded? If the third command DOES return 5052, then I know that those 5000 id's are "locked" for my use and no other application could have grabbed one of them?

Can anyone see a flaw in that? It looks right to me..

Scott - it also seems to me that I need not waste all those id's if another application does grab one during my statement: If I detect a failure, I could just reset the pk sequence back to the max id of the underlying table before trying again. I think this code would do it (stolen from Ruby's postgres adaptor):

SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT min_value FROM [seq_name])) FROM [table_of_pk]), false)

So for table "property" with pk of "id" and sequence name "property_id_seq":

SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT min_value FROM property_id_seq)) FROM property), false)

Then I could just retry the first process - though on a table that is very busy, I might have to retry for a while..

Regarding Michael's suggestion - I tried messing around with LOCK and similar commands but they're only allowed to run against TABLES, not SEQUENCES - too bad - that would have been perfect.

I'm now starting to think that there's no way to solve this problem in an "elegant manner" even in a stored procedure? Your method seems to be as good as it's going to get? (Not that I'm complaining!)
Thanks again - any thoughts are welcome,

Steve

At 08:01 PM 8/3/2007, Scott Marlowe wrote:
>On 8/3/07, Steve Midgley <public@misuse.org> wrote:
> >
> > Hi,
> >
> > I'm writing an import app in a third party language. It's going to use
> > "copy to" to move data from STDIN to a postgres (8.2) table. There are
> > some complexities though: it's going to copy the records to a
> > "versioned" table first, and then at a later time the records will be
> > copied by a different procedure to the "live" table. The live table and
> > versioned table are identical in terms of their field definitions. But
> > there is no sequence associated with the versioned table (whose primary
> > key is "id" plus "import_group_id", whereas the live table's pk is just
> > "id"). So all versioned table entries must already "know" what their id
> > would be in the live table. (This makes sense for other business
> > process we have, but it's a bit of a problem in this instance).
> >
> > My problem: I'd like to be able to grab a block of id's from the live
> > table's pk sequence. So let's say my importer has 5,000 new rows to
> > import and the current max pk in the live table is 540,203. I'd like to
> > be able to increment the primary key sequence in such a way that I get
> > a block of ids all to myself and the sequence is reset to 545,203 with
> > a guarantee that all the id's between 540203 and 545203 are unused.
>
>The real danger in doing this is race conditions. Most anything you
>do involves a possible race condition. As long as the race condition
>doesn't result in an id getting used twice, you're safe.
>
>So:
>
>test=# create sequence m;
>CREATE SEQUENCE
>test=# select nextval('m');
> nextval
>---------
>       1
>(1 row)
>
>test=# alter sequence m increment by 5000;
>ALTER SEQUENCE
>test=# select nextval('m');
> nextval
>---------
>    5001
>(1 row)
>
>test=# alter sequence m increment by 1;
>ALTER SEQUENCE
>test=# select nextval('m');
> nextval
>---------
>    5002
>(1 row)
>
>In this method, the only possible race condition is that someone might
>run a nextval('m') between the time you set the increment to 5000 and
>1 again. If that happens, you'd have 5,000 missing ids, but since
>sequences are designed to prevent dupes, not holes, that's ok.
>
> > But since I've seen so much magic on display from people on this list,
> > I'm going to ask if it's possible to do this solely from PG SQL sent
> > from a third party language? The main tricky bit seems to be ensuring
> > that everything is locked so two such increment calls at the same time
> > don't yield overlapping blocks of ids. Is there a way to "lock" the
> > sequence generator for the duration of a "nextval" and "setval" call?
>
>Avoiding the setval is the real key. It doesn't scale. Missing 5,000
>ids is no big deal. Repeating them IS a big deal. Not using setval
>is the best answer.
On 8/3/07, Steve Midgley <public@misuse.org> wrote:
> Hi Scott,
>
> Thanks for this info (and Michael too!).
>
> Let me see if I understand your suggestion. I would run these three
> commands in sequence:
>
> # select nextval('[my_seq_name]');
> returns => 52 [I believe that the sequence is at 52]
> # alter sequence [my_seq_name] increment by 5000;
> # select nextval('[my_seq_name]');
> returns => 5052
>
> If the third command doesn't return "5052" - i.e. it returns 5053, then
> I know that somewhere in this sequence another process grabbed an id
> out from under me. It doesn't matter where, but I would know that my
> 5000 id's are not unique and should be discarded? If the third command
> DOES return 5052, then I know that those 5000 id's are "locked" for my
> use and no other application could have grabbed one of them?

No, that's not what would happen. If someone grabbed an id after the increment value was changed, then you'd get 10052, cause they would increment the sequence by 5,000. Since you're not using setval, and you're keeping the increment positive, there's no danger of collision, only of over-incrementing and leaving a giant hole in your sequence. Which is ok.

> Can anyone see a flaw in that? It looks right to me..
>
> Scott - it also seems to me that I need not waste all those id's if
> another application does grab one during my statement: If I detect a
> failure, I could just reset the pk sequence back to the max id of the
> underlying table before trying again. I think this code would do it
> (stolen from Ruby's postgres adaptor):

That is open to a race condition. The bad kind.

> SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT
> min_value FROM [seq_name])) FROM [table_of_pk]), false)

As long as you're using setval, you have a race condition. Please avoid it. Unless you can guarantee that no one else is using the database at the same time (during a maintenance window etc...)

> So for table "property" with pk of "id" and sequence name
> "property_id_seq":
>
> SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT
> min_value FROM property_id_seq)) FROM property), false)

You'd think that the select coalesce and the outer select setval would not have a race condition, but they still do. Just a much smaller one.

> I'm now starting to think that there's no way to solve this problem in
> an "elegant manner" even in a stored procedure? Your method seems to be
> as good as it's going to get? (Not that I'm complaining!)

Yep. Safe is better than pretty or elegant. :)
Oh, another point. You should run the

alter sequence m increment 5000;
select nextval('m');
alter sequence m increment 1;

one right after the other to reduce the number of 5000-wide holes in your sequence. Or, given the size of bigint, you could just set the increment to 5000 and leave it there, and then any insert could grab nextval('m') and safely insert up to 5000 more rows with monotonically increasing ids. Note you wouldn't use defaults or nextvals for the rest, you'd have to calculate them in your application.
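That last variant (leaving the increment at 5000 permanently) can be sketched as follows. This is a hedged illustration: `block_ids` and `BLOCK` are invented names, and the guarantee rests solely on the property that two nextval results under a fixed increment are always at least BLOCK apart:

```python
# Sketch of the "leave INCREMENT at 5000" variant: every nextval()
# implicitly reserves a whole block, and the application computes the
# remaining ids itself instead of relying on column defaults.
BLOCK = 5000

def block_ids(nextval_result):
    # With the sequence increment fixed at BLOCK, two nextval() calls can
    # never return values closer together than BLOCK, so this range is
    # exclusively ours to hand out.
    return range(nextval_result, nextval_result + BLOCK)

a = block_ids(5001)    # what one session's nextval() might return
b = block_ids(10001)   # the soonest value another session could see
assert set(a).isdisjoint(b)
```

Any unused tail of a block simply becomes a hole, which, as noted above, sequences are happy to tolerate.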
On Aug 3, 2007, at 11:50 AM, Steve Midgley wrote:

> Hi,
>
> I'm writing an import app in a third party language. It's going to
> use "copy to" to move data from STDIN to a postgres (8.2) table.
> There are some complexities though: it's going to copy the records
> to a "versioned" table first, and then at a later time the records
> will be copied by a different procedure to the "live" table. The
> live table and versioned table are identical in terms of their
> field definitions. But there is no sequence associated with the
> versioned table (whose primary key is "id" plus "import_group_id",
> whereas the live table's pk is just "id"). So all versioned table
> entries must already "know" what their id would be in the live
> table. (This makes sense for other business process we have, but
> it's a bit of a problem in this instance).
>
> My problem: I'd like to be able to grab a block of id's from the
> live table's pk sequence. So let's say my importer has 5,000 new
> rows to import and the current max pk in the live table is 540,203.
> I'd like to be able to increment the primary key sequence in such a
> way that I get a block of ids all to myself and the sequence is
> reset to 545,203 with a guarantee that all the id's between 540203
> and 545203 are unused.
>
> I'm guessing this can be done with a stored procedure, but if
> possible I'd like to be able to run this command from my third
> party app without calling a stored procedure (we try not to use
> stored procedures here b/c we code entirely in this third party
> language - if we had to, it's possible we could install a stored
> procedure though).
>
> But since I've seen so much magic on display from people on this
> list, I'm going to ask if it's possible to do this solely from PG
> SQL sent from a third party language? The main tricky bit seems to
> be ensuring that everything is locked so two such increment calls
> at the same time don't yield overlapping blocks of ids. Is there a
> way to "lock" the sequence generator for the duration of a
> "nextval" and "setval" call? Since pk sequence functions like
> nextval cannot be rolled back, I'm guessing that "begin/end" won't
> do the job?
>
> I.e:
>
> -- need "magic lock" statement on pk sequence here
> nextval
> -- returns 540203
> setval(545203)
> -- now sequence is set to where I want it and I "own" 5000 id's
> -- release magic lock here
>
> My fallback is to just have a routine that calls "nextval" a bunch
> of times and stores all the id's it gets - they may or may not be
> sequential but they'll be unique. This is going to be a really slow
> way to get a large number of id's of course and just seems plain
> wrongheaded in many ways.
>
> Any insights? All help is appreciated and input on a better way to
> solve the problem completely is of course welcome as well.

Is there actually a requirement that the block of 5000 values not have gaps? If not, why not make the versioned table's id column default to nextval from the same sequence? Then when the data is copied over to the live table, as long as you supply the id it won't generate a new id and you'll maintain your row-row relationships.

If you do require that the block not have gaps, check out the article on how to do this here: http://www.varlena.com/varlena/GeneralBits/130.php

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
On Aug 3, 2007, at 14:28 , Steve Midgley wrote:

AIUI, one difference between the solutions Scott and I proposed is that while INCREMENT is set at 5000, each time nextval is called the sequence is incremented by 5000. For example:

test=# select nextval('foos_foo_id_seq');
 nextval
---------
       1
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
       2
(1 row)

test=# alter sequence foos_foo_id_seq increment 5000;
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq');
 nextval
---------
    5002
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
   10002
(1 row)

The only issue with this is that it burns through sequence values faster. That may not be a concern, of course. I wonder if there isn't a way to use ALTER SEQUENCE ... CACHE to handle this:

-- Alice's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   15002
(1 row)

-- Bob's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   15003
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
   15004
(1 row)

-- Alice's session
test=# alter sequence foos_foo_id_seq cache 5000;
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   20003
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
   20004
(1 row)

Now you should be able to safely use the values from nextval to nextval + cache without worrying that the values in that range are going to be used by another backend.

It looks like you could even do:

-- Alice's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   30096
(1 row)

test=# alter sequence foos_foo_id_seq cache 10; -- set cache to preallocate
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq'); -- get nextval and preallocate the next 10
 nextval
---------
   30097
(1 row)

-- Bob's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   30107
(1 row)

-- Alice's session
test=# alter sequence foos_foo_id_seq cache 1; -- reset cache so other backends aren't burning sequence values unnecessarily
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq'); -- note that the previously cached values (for both Alice's and Bob's sessions) are discarded, and available for manual entry
 nextval
---------
   30117
(1 row)

Again, you can set up another table to keep track of the values that are going to be used manually, perhaps something like:

CREATE TABLE preallocated_foo_ids
( current_value INTEGER NOT NULL
, maximum_value INTEGER NOT NULL
, CHECK (current_value <= maximum_value)
);

Then, to use:

test=# ALTER SEQUENCE foos_foo_id_seq CACHE 10;
ALTER SEQUENCE
test=# DELETE FROM preallocated_foo_ids; -- clear old ones
DELETE 1
test=# INSERT INTO preallocated_foo_ids (current_value, maximum_value)
test-# SELECT val, val + 10
test-# FROM (SELECT nextval('foos_foo_id_seq')) AS seq(val);
INSERT 0 1
test=# ALTER SEQUENCE foos_foo_id_seq CACHE 1;
ALTER SEQUENCE
test=# SELECT * FROM preallocated_foo_ids;
 current_value | maximum_value
---------------+---------------
         30142 |         30152
(1 row)

You've now got a preallocated range in preallocated_foo_ids. Just use standard table locking on preallocated_foo_ids: there shouldn't be heavy contention on this table during your load, so I don't think performance should suffer too badly.

test=# BEGIN;
BEGIN
test=# SELECT current_value
test-# FROM preallocated_foo_ids
test-# FOR UPDATE;
 current_value
---------------
         30142
(1 row)

test=# INSERT INTO foos (foo_id) VALUES (30142);
INSERT 0 1
test=# UPDATE preallocated_foo_ids
test-# SET current_value = current_value + 1;
UPDATE 1
test=# COMMIT;
COMMIT
test=# SELECT * FROM preallocated_foo_ids;
 current_value | maximum_value
---------------+---------------
         30143 |         30152
(1 row)

When you run into an error because of the CHECK constraint, you know you've hit the end of your range (if you haven't been checking otherwise).

> Regarding Michael's suggestion - I tried messing around with LOCK
> and similar commands but they're only allowed to run against TABLES
> not SEQUENCES - too bad - that would have been perfect.

Yeah, I thought that might be the case.

Michael Glaesemann
grzm seespotcode net
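Michael's consumption transaction can be mimicked outside SQL. Here is a Python sketch (a simulation only: `PreallocatedRange` is an invented stand-in for the preallocated_foo_ids table, and real code would issue the SELECT ... FOR UPDATE / INSERT / UPDATE transaction he shows) illustrating how exhausting the range surfaces as an error, like the CHECK constraint firing:

```python
# Simulation of consuming a preallocated id range: current_value advances
# toward maximum_value, and exhausting the range raises, mirroring the
# CHECK (current_value <= maximum_value) constraint in Michael's table.
class PreallocatedRange:
    def __init__(self, current_value, maximum_value):
        self.current_value = current_value
        self.maximum_value = maximum_value

    def take(self):
        if self.current_value > self.maximum_value:
            # In the database this is where the CHECK constraint would fire.
            raise RuntimeError("preallocated range exhausted")
        value = self.current_value
        # Stands in for: UPDATE ... SET current_value = current_value + 1
        self.current_value += 1
        return value

r = PreallocatedRange(30142, 30152)
ids = [r.take() for _ in range(11)]  # consumes 30142 .. 30152 inclusive
```

A twelfth take() would raise, which is the sketch's analogue of hitting the end of the range.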
On Aug 3, 2007, at 15:27 , Erik Jones wrote:

> Is there actually a requirement that the block of 5000 values not
> have gaps?

Good point.

> If not, why not make the versioned table's id column default to
> nextval from the same sequence?

Of course, the ids of the two tables could be interleaved in this case. This might not be an issue, of course.

Michael Glaesemann
grzm seespotcode net
Hi Scott,

You've moved into more general territory, so I'm starting a new thread. The code I provided to reset a primary key sequence is actually part of Ruby on Rails' core library - actually they use something very similar to what I originally sent:

SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence})) FROM #{table}), false)

Where:
#{sequence} = sequence name
#{pk} = primary key of table under sequence
#{table} = table under sequence

Their code is a little different from what I provided before b/c it increments by one (times the increment #) above the max(pk). But essentially it's the same. (I think their method leaves small gaps in the sequence every time it runs.) Also I think their method is likely to be a little slower (one extra select statement) and therefore (perhaps) more vulnerable to a race?

You mentioned something more general though: "As long as you're using setval you have a race condition"? However the postgres manual states:

> The sequence functions, listed in Table 9-34, provide simple,
> multiuser-safe methods for obtaining successive sequence values
> from sequence objects.

(http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html)

Included in Table 9-34 is "setval" - so I'm not clear how it can have a race condition all by itself? Or maybe it only has a race condition when used in ways similar to how Ruby/Rails is using it? (i.e. with a compound select/coalesce statement as one of its parameters?) Would this command have a race condition:

select setval('my_pk_seq', 500)

This issue is reasonably important since Ruby on Rails is fairly widely used. As you say, the race window would be pretty small on a compound select -- and the Ruby function doesn't actually get called very often, but if you wouldn't mind explaining how the race condition would manifest, I'll post a warning on the RoR bug tracking site so that people can at least understand that there's a potential bug here..

Thanks again,

Steve

At 08:42 PM 8/3/2007, Scott Marlowe wrote:
>On 8/3/07, Steve Midgley <public@misuse.org> wrote:
> > Hi Scott,
> >
> > Thanks for this info (and Michael too!).
> >
> > Let me see if I understand your suggestion. I would run these three
> > commands in sequence:
> >
> > # select nextval('[my_seq_name]');
> > returns => 52 [I believe that the sequence is at 52]
> > # alter sequence [my_seq_name] increment by 5000;
> > # select nextval('[my_seq_name]');
> > returns => 5052
> >
> > If the third command doesn't return "5052" - i.e. it returns 5053, then
> > I know that somewhere in this sequence another process grabbed an id
> > out from under me. It doesn't matter where, but I would know that my
> > 5000 id's are not unique and should be discarded? If the third command
> > DOES return 5052, then I know that those 5000 id's are "locked" for my
> > use and no other application could have grabbed one of them?
>
>No, that's not what would happen. If someone grabbed an id after the
>increment value was changed, then you'd get 10052, cause they would
>increment the sequence by 5,000. Since you're not using setval, and
>you're keeping the increment positive, there's no danger of collision,
>only of over-incrementing and leaving a giant hole in your sequence.
>Which is ok.
>
> > Can anyone see a flaw in that? It looks right to me..
> >
> > Scott - it also seems to me that I need not waste all those id's if
> > another application does grab one during my statement: If I detect a
> > failure, I could just reset the pk sequence back to the max id of the
> > underlying table before trying again. I think this code would do it
> > (stolen from Ruby's postgres adaptor):
>
>That is open to a race condition. The bad kind.
>
> > SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT
> > min_value FROM [seq_name])) FROM [table_of_pk]), false)
>
>As long as you're using setval, you have a race condition. Please
>avoid it. Unless you can guarantee that no one else is using the
>database at the same time (during a maintenance window etc...)
>
> > So for table "property" with pk of "id" and sequence name
> > "property_id_seq":
> >
> > SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT
> > min_value FROM property_id_seq)) FROM property), false)
>
>You'd think that the select coalesce and the outer select setval would
>not have a race condition, but they still do. Just a much smaller one.
>
> > I'm now starting to think that there's no way to solve this problem in
> > an "elegant manner" even in a stored procedure? Your method seems to be
> > as good as it's going to get? (Not that I'm complaining!)
>
>Yep. Safe is better than pretty or elegant. :)
Steve Midgley <public@misuse.org> writes:
> The code I provided to reset a primary key sequence is actually part of
> Ruby on Rails core library - actually they use something very similar
> to what I originally sent:

> SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT
> increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence}))
> FROM #{table}), false)

Ugh. That's completely unsafe/broken, unless they also use locking that you didn't show.

> You mentioned something more general though: "As long as you're using
> setval you have a race condition"? However the postgres manual states:
>> The sequence functions, listed in Table 9-34, provide simple,
>> multiuser-safe methods for obtaining successive sequence values from
>> sequence objects.
> (http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html)
> Included in Table 9-34 is "setval" - so I'm not clear how it can have a
> race condition all by itself?

It doesn't have a race condition "all by itself": it will do what it's told. The problem with commands such as the above is that there's a time window between calculating the max() and executing the setval(), and that window is more than large enough to allow someone else to insert a row that invalidates your max() computation. (Because of MVCC snapshotting, the risk window is in fact as long as the entire calculation of the max --- it's not just a few instructions as some might naively think.)

Now it is possible to make this brute-force approach safe: you can lock the table against all other modifications until you've applied your own changes. But you pay a high price in loss of concurrency if you do that.

			regards, tom lane
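Tom's window can be replayed deterministically. This Python sketch (illustrative values only; it freezes the interleaving by hand rather than racing real threads) shows how two sessions that each compute max(id) and then setval end up claiming the same next id:

```python
# Deterministic replay of the max()-then-setval race: two sessions each
# compute max(id) over the same table snapshot before either applies its
# setval, so both derive the same "next" id.
table_ids = {540_201, 540_202, 540_203}

# Session A computes its max() ...
a_max = max(table_ids)

# ... and inside A's window, session B runs the identical computation:
b_max = max(table_ids)

a_next = a_max + 1   # A: setval(seq, a_max); nextval() -> a_max + 1
b_next = b_max + 1   # B: same inputs, same answer

assert a_next == b_next == 540_204   # duplicate id: the race in action
```

The window here is two statements wide for clarity, but as Tom notes, in Postgres it spans the entire max() calculation.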
On 03/08/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
> On Aug 3, 2007, at 15:27 , Erik Jones wrote:
>
> > Is there actually a requirement that the block of 5000 values not
> > have gaps?
>
> Good point.
>
> > If not, why not make the versioned table's id column default to
> > nextval from the same sequence?
>
> Of course, the ids of the two tables could be interleaved in this
> case. This might not be an issue, of course.
>
> Michael Glaesemann
> grzm seespotcode net
It seems to me that the cache feature of a sequence is there for just this purpose.

That way when you get the next value, your session caches a range and any other session will get a value after your cached range.
Peter
Hi Peter,<br /><br /> I struggled to implement Michael's suggestion to use CACHE in this regard when he made it but afteryour encouragement I've studied it more and you and he are both totally right - CACHE is designed to do exactly whatI want. Here's the sample code so as to put this issue to bed and to record what the solution is:<br /><br /> Scenario:<br/> Bob wants a block of 50 id's<br /> Alice just wants a single id but will accidentally "interlope" into Bob'ssequence obtainment.<br /> "property_id_seq" = 100<br /><br /> Bob:<br /> # alter sequence property_id_seq CACHE 50<br/> Alice:<br /> # select nextval('property_id_seq')<br /> => 101 (wastes ids up to 150)<br /> Bob:<br /> # selectnextval('propery_id_seq')<br /> => 151 (Bob now knows that 151-201 are locked permanently for his exclusive use)<br/> Bob:<br /> # alter sequence property_id_seq CACHE 1<br /> => Sequence will now return single ids to everyone<br/><br /> So in the worst case, there will be id "wastage" equal to the CACHE size times the number of "interlopers"who grab ids while Bob is obtaining his block. And Bob's time to grab a set of id's is fairly small since he'sonly issuing a couple of very fast sql statements..<br /><br /> NOTE: If all calling parties must agree to always usethe same CACHE number for obtaining blocks of id's, then this method seems bulletproof (if two parties use differing CACHE#'s then they could cause too few id's to be CACHED to one of the parties).<br /><br /> I hope this helps someone elseon the archives down the road. Thanks to everyone for putting their time and attention on this problem. 
I'm very grateful.<br /><br /> Sincerely,<br /><br /> Steve<br /><br /> At 08:00 AM 8/6/2007, Peter Childs wrote:<br /><br /><blockquote cite="" class="cite" type="cite">On 03/08/07, <b>Michael Glaesemann</b> <<a href="mailto:grzm@seespotcode.net">grzm@seespotcode.net</a>> wrote:<br /><dl><dd>On Aug 3, 2007, at 15:27, Erik Jones wrote:<br /><br /><dd>> Is there actually a requirement that the block of 5000 values not<br /><dd>> have gaps?<br /><br /><dd>Good point.<br /><br /><dd>> If not, why not make the versioned table's id column default to<br /><dd>> nextval from the same sequence?<br /><br /><dd>Of course, the ids of the two tables could be interleaved in this<br /><dd>case. This might not be an issue, of course.<br /><br /><dd>Michael Glaesemann<br /><dd>grzm seespotcode net<br /></dl><br /> It seems to me that the CACHE feature of a sequence is there for just this purpose.<br /><br /> That way, when you get the next value, your session caches a range of values, and any other session will get values after your cached range.<br /><br /> Peter</blockquote>
Steve Midgley writes:
>> The code I provided to reset a primary key sequence is actually part of
>> Ruby on Rails core library - actually they use something very similar
>> to what I originally sent:
...
>> SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT
>> increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence}))
>> FROM #{table}), false)

Tom Lane wrote:
> Ugh. That's completely unsafe/broken, unless they also use locking that
> you didn't show.
...
> It doesn't have a race condition "all by itself": it will do what it's
> told. The problem with commands such as the above is that there's a
> time window between calculating the max() and executing the setval(),
> and that window is more than large enough to allow someone else to
> insert a row that invalidates your max() computation. (Because of MVCC
> snapshotting, the risk window is in fact as long as the entire
> calculation of the max --- it's not just a few instructions as some
> might naively think.)
>
> Now it is possible to make this brute-force approach safe: you can lock
> the table against all other modifications until you've applied your own
> changes. But you pay a high price in loss of concurrency if you do
> that.

All this trouble over semantically-significant ID columns seems to support the camp that excoriates the use of artificial ID columns and autoincrementation altogether. The usual argument in their favor is that they speed up performance, but this epicyclic dance to accommodate FK references to autoincremented keys makes the case that there is also a performance penalty, and in the more critical performance area of code development and correctness rather than in the less critical search-speed area.

--
Lew
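Tom's "brute-force but safe" variant would look roughly like this (a sketch, not from the thread: the table name `property`, its `id` column, and the sequence `property_id_seq` are assumed for illustration; `EXCLUSIVE MODE` blocks all concurrent writers until commit while still allowing reads):

```sql
BEGIN;
-- Block all other modifications to the table until we commit,
-- so no one can insert a row that invalidates our max() computation.
LOCK TABLE property IN EXCLUSIVE MODE;
-- With writers excluded, the window between max() and setval() is harmless.
SELECT setval('property_id_seq',
              (SELECT COALESCE(MAX(id), 0) + 1 FROM property),
              false);
COMMIT;
```

As Tom notes, this is safe but pays a high price in lost concurrency: every other writer to the table queues up behind the lock for the duration of the transaction.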
On 8/4/07, Lew <lew@lewscanon.nospam> wrote (quoting the exchange above):
> All this trouble over semantically-significant ID columns seems to support the
> camp that excoriates use of artificial ID columns and autoincrementation
> altogether.

I don't think that it does that at all. If properly implemented, sequences provide a race-free way to assign unique ids to a table that otherwise would have no natural primary key. I think that having a religious preference one way or the other means you might use a suboptimal method in some cases.
In the db I work on, we have artificial pks from sequences, artificial pks made from a large list of pre-created ids (6-alphanumeric-character locator codes, common in the travel industry), and still other tables that have natural primary keys. Each has a reason for being the way it is. I can't see this episode saying that natural pks, which are often open to interpretation and change over time, are any better than artificial pks from a sequence at all. I can see it saying that the people who developed the RoR framework were NOT database specialists.

> The usual argument in their favor is that they speed up performance, but this
> epicyclic dance to accommodate FK references to autoincremented keys makes the
> case that there is also a performance penalty, and in the more critical
> performance area of code development and correctness than in the less critical
> search speed area.

Sorry, but that's only one of the two arguments. The other one is that natural pks are often impossible because the dataset cannot be made unique under the business constraints. In an airline reservation system, you might start with last and first name for a pk. Then add phone number, except some people don't want to give their phone numbers, so you use street address, only to find out that John Smith with no phone lives at 123 Acorn St in Portland while another John Smith with no phone lives at 123 Acorn St in Philadelphia. And what do you do when they call back about their ticket?

Seriously, the natural primary key method has as many problems as the artificial one. In the USAF, they used to use a system to keep track of your medical records: the first letter of your last name, followed by the last four digits of your SSN. So Ken Johnson might have an SSN of 123-45-6789, while Patricia Jackson might have an SSN of 453-89-6789 - the same "J6789" key. Imagine Ken's surprise when his doctor tells him he's pregnant, or Patricia's surprise at having a limb amputated when she went in for a problem with heart murmurs.
By the time you add enough fields together to get a reliable primary key, you might have a multi-column monster that kills performance. You can't just religiously say that one or the other is the only answer. Finally, the cost of software development is only a small part of the overall cost of creating and deploying a piece of code. The last place I worked had a transactional database, developed really quickly by non-db people, that used ISAM tables (no, not MySQL with MyISAM, just ISAM); it corrupted records dozens of times a day and required a team of support people to keep it running semi-smoothly, plus hours of downtime each night for cleanup. The initial savings on development costs were eaten up rather quickly by the ongoing support costs.