Thread: Increment a sequence by more than one
Hi,

I'm writing an import app in a third party language. It's going to use "copy to" to move data from STDIN to a postgres (8.2) table. There are some complexities though: it's going to copy the records to a "versioned" table first, and then at a later time the records will be copied by a different procedure to the "live" table. The live table and versioned table are identical in terms of their field definitions. But there is no sequence associated with the versioned table (whose primary key is "id" plus "import_group_id", whereas the live table's pk is just "id"). So all versioned table entries must already "know" what their id would be in the live table. (This makes sense for other business processes we have, but it's a bit of a problem in this instance.)

My problem: I'd like to be able to grab a block of id's from the live table's pk sequence. So let's say my importer has 5,000 new rows to import and the current max pk in the live table is 540,203. I'd like to be able to increment the primary key sequence in such a way that I get a block of ids all to myself and the sequence is reset to 545,203, with a guarantee that all the id's between 540203 and 545203 are unused.

I'm guessing this can be done with a stored procedure, but if possible I'd like to be able to run this command from my third party app without calling a stored procedure (we try not to use stored procedures here b/c we code entirely in this third party language - if we *had* to, it's possible we could install a stored procedure though).

But since I've seen so much magic on display from people on this list, I'm going to ask if it's possible to do this solely from PG SQL sent from a third party language? The main tricky bit seems to be ensuring that everything is locked so two such increment calls at the same time don't yield overlapping blocks of ids. Is there a way to "lock" the sequence generator for the duration of a "nextval" *and* "setval" call? Since pk sequence functions like nextval cannot be rolled back, I'm guessing that "begin/end" won't do the job?

I.e.:

-- need "magic lock" statement on pk sequence here
nextval
-- returns 540203
setval(545203)
-- now sequence is set to where I want it and I "own" 5000 id's
-- release magic lock here

My fallback is to just have a routine that calls "nextval" a bunch of times and stores all the id's it gets - they may or may not be sequential but they'll be unique. This is going to be a really slow way to get a large number of id's of course, and just seems plain wrongheaded in many ways.

Any insights? All help is appreciated, and input on a better way to solve the problem completely is of course welcome as well.

Sincerely,

Steve
On Aug 3, 2007, at 11:50 , Steve Midgley wrote:

> My problem: I'd like to be able to grab a block of id's from the
> live table's pk sequence. So let's say my importer has 5,000 new
> rows to import and the current max pk in the live table is 540,203.
> I'd like to be able to increment the primary key sequence in such a
> way that I get a block of ids all to myself and the sequence is
> reset to 545,203 with a guarantee that all the id's between 540203
> and 545203 are unused.

Setting the next number that will be generated is straightforward:

ALTER SEQUENCE foo_seq RESTART WITH 545203;

Perhaps doing something like (untested):

ALTER SEQUENCE foo_seq RESTART WITH (SELECT last_value + CASE WHEN is_called THEN 1 ELSE 0 END);

You'll need to manage the skipped values yourself though, of course. Perhaps set up a table to hold the current number used in the skipped range. Basically this would be the same approach as that used by people who need to guarantee gapless sequences: you can check the archives for details, but basically you need to make sure the table is properly locked when you're planning to use a new number.

> But since I've seen so much magic on display from people on this
> list, I'm going to ask if it's possible to do this solely from PG
> SQL sent from a third party language? The main tricky bit seems to
> be ensuring that everything is locked so two such increment calls
> at the same time don't yield overlapping blocks of ids. Is there a
> way to "lock" the sequence generator for the duration of a
> "nextval" and "setval" call? Since pk sequence functions like
> nextval cannot be rolled back, I'm guessing that "begin/end" won't
> do the job?

I'm not sure, but perhaps calling SELECT FOR UPDATE on the sequence itself (i.e., not using nextval/setval) would give the appropriate lock, though I wouldn't be surprised if it isn't possible. As you've noted, sequences are designed for performance to ignore transactions, so this may not be possible (though perhaps that's just for values returned via the nextval function).

Hope this gives you some additional ideas on how to handle this.

Michael Glaesemann
grzm seespotcode net
On 8/3/07, Steve Midgley <public@misuse.org> wrote:
>
> Hi,
>
> I'm writing an import app in a third party language. It's going to use
> "copy to" to move data from STDIN to a postgres (8.2) table. There are some
> complexities though: it's going to copy the records to a "versioned" table
> first, and then at a later time the records will be copied by a different
> procedure to the "live" table. The live table and versioned table are
> identical in terms of their field definitions. But there is no sequence
> associated with the versioned table (whose primary key is "id" plus
> "import_group_id", whereas the live table's pk is just "id"). So all
> versioned table entries must already "know" what their id would be in the
> live table. (This makes sense for other business process we have, but it's a
> bit of a problem in this instance).
>
> My problem: I'd like to be able to grab a block of id's from the live
> table's pk sequence. So let's say my importer has 5,000 new rows to import
> and the current max pk in the live table is 540,203. I'd like to be able to
> increment the primary key sequence in such a way that I get a block of ids
> all to myself and the sequence is reset to 545,203 with a guarantee that all
> the id's between 540203 and 545203 are unused.

The real danger in doing this is race conditions. Most anything you do involves a possible race condition. As long as the race condition doesn't result in an id getting used twice, you're safe. So:

test=# create sequence m;
CREATE SEQUENCE
test=# select nextval('m');
 nextval
---------
       1
(1 row)

test=# alter sequence m increment by 5000;
ALTER SEQUENCE
test=# select nextval('m');
 nextval
---------
    5001
(1 row)

test=# alter sequence m increment by 1;
ALTER SEQUENCE
test=# select nextval('m');
 nextval
---------
    5002
(1 row)

In this method, the only possible race condition is that someone might run a nextval('m') between the time you set the increment to 5000 and back to 1. If that happens, you'd have 5,000 missing ids, but since sequences are designed to prevent dupes, not holes, that's ok.

> But since I've seen so much magic on display from people on this list, I'm
> going to ask if it's possible to do this solely from PG SQL sent from a
> third party language? The main tricky bit seems to be ensuring that
> everything is locked so two such increment calls at the same time don't
> yield overlapping blocks of ids. Is there a way to "lock" the sequence
> generator for the duration of a "nextval" and "setval" call?

Avoiding the setval is the real key. It doesn't scale. Missing 5,000 ids is no big deal. Repeating them IS a big deal. Not using setval is the best answer.
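Scott's three-statement trick can be sketched as a toy model (an illustration only: `ToySequence` merely mimics the atomicity of nextval, and `grab_block` stands in for the three SQL statements above; none of this is a Postgres API). It shows why interleaved callers can produce holes but never overlapping blocks:

```python
import threading

# Toy model of a Postgres sequence: nextval() atomically adds the current
# increment to last_value and returns the result.  This only simulates the
# semantics Scott relies on; it is not real server behavior.
class ToySequence:
    def __init__(self):
        self.last_value = 0
        self.increment = 1
        self.lock = threading.Lock()

    def nextval(self):
        with self.lock:
            self.last_value += self.increment
            return self.last_value

    def set_increment(self, n):
        with self.lock:
            self.increment = n

def grab_block(seq, size):
    """Bump the increment, take one nextval, restore the increment.
    The returned value is the top of a block reserved for the caller."""
    seq.set_increment(size)
    top = seq.nextval()
    seq.set_increment(1)
    return range(top - size + 1, top + 1)

seq = ToySequence()
seq.nextval()                  # someone else already used id 1
block = grab_block(seq, 5000)  # we own ids 2..5001
other = seq.nextval()          # a later caller gets 5002
assert other not in block      # no overlap -- at worst there are gaps
```

An interloper calling nextval() between the two increment changes would leave a 5000-wide hole, exactly as Scott describes, but could never land inside the returned block.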
Hi Scott,

Thanks for this info (and Michael too!).

Let me see if I understand your suggestion. I would run these three commands in sequence:

# select nextval('[my_seq_name]');
returns => 52 [I believe that the sequence is at 52]
# alter sequence [my_seq_name] increment by 5000;
# select nextval('[my_seq_name]');
returns => 5052

If the third command doesn't return "5052" - i.e. it returns 5053, then I know that somewhere in this sequence another process grabbed an id out from under me. It doesn't matter where, but I would know that my 5000 id's are not unique and should be discarded? If the third command DOES return 5052, then I know that those 5000 id's are "locked" for my use and no other application could have grabbed one of them?

Can anyone see a flaw in that? It looks right to me..

Scott - it also seems to me that I need not waste all those id's if another application does grab one during my statement: If I detect a failure, I could just reset the pk sequence back to the max id of the underlying table before trying again. I think this code would do it (stolen from Ruby's postgres adaptor):

SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT min_value FROM [seq_name])) FROM [table_of_pk]), false)

So for table "property" with pk of "id" and sequence name "property_id_seq":

SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT min_value FROM property_id_seq)) FROM property), false)

Then I could just retry the first process - though on a table that is very busy, I might have to retry for a while..

Regarding Michael's suggestion - I tried messing around with LOCK and similar commands but they're only allowed to run against TABLES, not SEQUENCES - too bad - that would have been perfect.

I'm now starting to think that there's no way to solve this problem in an "elegant manner" even in a stored procedure? Your method seems to be as good as it's going to get? (Not that I'm complaining!)
Thanks again - any thoughts are welcome,

Steve

At 08:01 PM 8/3/2007, Scott Marlowe wrote:
>On 8/3/07, Steve Midgley <public@misuse.org> wrote:
> >
> > Hi,
> >
> > I'm writing an import app in a third party language. It's going to use
> > "copy to" to move data from STDIN to a postgres (8.2) table. There are
> > some complexities though: it's going to copy the records to a
> > "versioned" table first, and then at a later time the records will be
> > copied by a different procedure to the "live" table. The live table and
> > versioned table are identical in terms of their field definitions. But
> > there is no sequence associated with the versioned table (whose primary
> > key is "id" plus "import_group_id", whereas the live table's pk is just
> > "id"). So all versioned table entries must already "know" what their id
> > would be in the live table. (This makes sense for other business
> > process we have, but it's a bit of a problem in this instance).
> >
> > My problem: I'd like to be able to grab a block of id's from the live
> > table's pk sequence. So let's say my importer has 5,000 new rows to
> > import and the current max pk in the live table is 540,203. I'd like to
> > be able to increment the primary key sequence in such a way that I get
> > a block of ids all to myself and the sequence is reset to 545,203 with
> > a guarantee that all the id's between 540203 and 545203 are unused.
>
>The real danger in doing this is race conditions. Most anything you
>do involves a possible race condition. As long as the race condition
>doesn't result in an id getting used twice, you're safe.
>
>So:
>
>test=# create sequence m;
>CREATE SEQUENCE
>test=# select nextval('m');
> nextval
>---------
>       1
>(1 row)
>
>test=# alter sequence m increment by 5000;
>ALTER SEQUENCE
>test=# select nextval('m');
> nextval
>---------
>    5001
>(1 row)
>
>test=# alter sequence m increment by 1;
>ALTER SEQUENCE
>test=# select nextval('m');
> nextval
>---------
>    5002
>(1 row)
>
>In this method, the only possible race condition is that someone might
>run a nextval('m') between the time you set the increment to 5000 and
>1 again. If that happens, you'd have 5,000 missing ids, but since
>sequences are designed to prevent dupes, not holes, that's ok.
>
> > But since I've seen so much magic on display from people on this list,
> > I'm going to ask if it's possible to do this solely from PG SQL sent
> > from a third party language? The main tricky bit seems to be ensuring
> > that everything is locked so two such increment calls at the same time
> > don't yield overlapping blocks of ids. Is there a way to "lock" the
> > sequence generator for the duration of a "nextval" and "setval" call?
>
>Avoiding the setval is the real key. It doesn't scale. Missing 5,000
>ids is no big deal. Repeating them IS a big deal. Not using setval
>is the best answer.
On 8/3/07, Steve Midgley <public@misuse.org> wrote:
> Hi Scott,
>
> Thanks for this info (and Michael too!).
>
> Let me see if I understand your suggestion. I would run these three
> commands in sequence:
>
> # select nextval('[my_seq_name]');
> returns => 52 [I believe that the sequence is at 52]
> # alter sequence [my_seq_name] increment by 5000;
> # select nextval('[my_seq_name]');
> returns => 5052
>
> If the third command doesn't return "5052" - i.e. it returns 5053, then
> I know that somewhere in this sequence another process grabbed an id
> out from under me. It doesn't matter where, but I would know that my
> 5000 id's are not unique and should be discarded? If the third command
> DOES return 5052, then I know that those 5000 id's are "locked" for my
> use and no other application could have grabbed one of them?

No, that's not what would happen. If someone grabbed an id after the increment value was changed, then you'd get 10052, cause they would increment the sequence by 5,000. Since you're not using setval, and you're keeping the increment positive, there's no danger of collision, only of over-incrementing and leaving a giant hole in your sequence. Which is ok.

> Can anyone see a flaw in that? It looks right to me..
>
> Scott - it also seems to me that I need not waste all those id's if
> another application does grab one during my statement: If I detect a
> failure, I could just reset the pk sequence back to the max id of the
> underlying table before trying again. I think this code would do it
> (stolen from Ruby's postgres adaptor):

That is open to a race condition. The bad kind.

> SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT
> min_value FROM [seq_name])) FROM [table_of_pk]), false)

As long as you're using setval, you have a race condition. Please avoid it. Unless you can guarantee that no one else is using the database at the same time (during a maintenance window etc...)

> So for table "property" with pk of "id" and sequence name
> "property_id_seq":
>
> SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT
> min_value FROM property_id_seq)) FROM property), false)

You'd think that the select coalesce and the outer select setval would not have a race condition, but they still do. Just a much smaller one.

> I'm now starting to think that there's no way to solve this problem in
> an "elegant manner" even in a stored procedure? Your method seems to be
> as good as it's going to get? (Not that I'm complaining!)

Yep. Safe is better than pretty or elegant. :)
Oh, another point. You should run the

alter sequence m increment 5000;
select nextval('m');
alter sequence m increment 1;

one right after the other to reduce the number of 5000-wide holes in your sequence. Or, given the size of bigint, you could just set the increment to 5000 and leave it there, and then any insert could grab nextval('m') and safely insert up to 5000 more rows with monotonically increasing ids. Note you wouldn't use defaults or nextvals for the rest, you'd have to calculate them in your application.
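That last variant (leaving the increment at 5000 permanently) can be sketched as follows. This is a hedged illustration: `block_ids` and `BLOCK` are invented names, and the guarantee rests solely on the property that two nextval results under a fixed increment are always at least BLOCK apart:

```python
# Sketch of the "leave INCREMENT at 5000" variant: every nextval()
# implicitly reserves a whole block, and the application computes the
# remaining ids itself instead of relying on column defaults.
BLOCK = 5000

def block_ids(nextval_result):
    # With the sequence increment fixed at BLOCK, two nextval() calls can
    # never return values closer together than BLOCK, so this range is
    # exclusively ours to hand out.
    return range(nextval_result, nextval_result + BLOCK)

a = block_ids(5001)    # what one session's nextval() might return
b = block_ids(10001)   # the soonest value another session could see
assert set(a).isdisjoint(b)
```

Any unused tail of a block simply becomes a hole, which, as noted above, sequences are happy to tolerate.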
On Aug 3, 2007, at 11:50 AM, Steve Midgley wrote:

> Hi,
>
> I'm writing an import app in a third party language. It's going to
> use "copy to" to move data from STDIN to a postgres (8.2) table.
> There are some complexities though: it's going to copy the records
> to a "versioned" table first, and then at a later time the records
> will be copied by a different procedure to the "live" table. The
> live table and versioned table are identical in terms of their
> field definitions. But there is no sequence associated with the
> versioned table (whose primary key is "id" plus "import_group_id",
> whereas the live table's pk is just "id"). So all versioned table
> entries must already "know" what their id would be in the live
> table. (This makes sense for other business process we have, but
> it's a bit of a problem in this instance).
>
> My problem: I'd like to be able to grab a block of id's from the
> live table's pk sequence. So let's say my importer has 5,000 new
> rows to import and the current max pk in the live table is 540,203.
> I'd like to be able to increment the primary key sequence in such a
> way that I get a block of ids all to myself and the sequence is
> reset to 545,203 with a guarantee that all the id's between 540203
> and 545203 are unused.
>
> I'm guessing this can be done with a stored procedure, but if
> possible I'd like to be able to run this command from my third
> party app without calling a stored procedure (we try not to use
> stored procedures here b/c we code entirely in this third party
> language - if we had to, it's possible we could install a stored
> procedure though).
>
> But since I've seen so much magic on display from people on this
> list, I'm going to ask if it's possible to do this solely from PG
> SQL sent from a third party language? The main tricky bit seems to
> be ensuring that everything is locked so two such increment calls
> at the same time don't yield overlapping blocks of ids. Is there a
> way to "lock" the sequence generator for the duration of a
> "nextval" and "setval" call? Since pk sequence functions like
> nextval cannot be rolled back, I'm guessing that "begin/end" won't
> do the job?
>
> I.e:
>
> -- need "magic lock" statement on pk sequence here
> nextval
> -- returns 540203
> setval(545203)
> -- now sequence is set to where I want it and I "own" 5000 id's
> -- release magic lock here
>
> My fallback is to just have a routine that calls "nextval" a bunch
> of times and stores all the id's it gets - they may or may not be
> sequential but they'll be unique. This is going to be a really slow
> way to get a large number of id's of course and just seems plain
> wrongheaded in many ways.
>
> Any insights? All help is appreciated and input on a better way to
> solve the problem completely is of course welcome as well.

Is there actually a requirement that the block of 5000 values not have gaps? If not, why not make the versioned table's id column default to nextval from the same sequence? Then when the data is copied over to the live table, as long as you supply the id it won't generate a new id and you'll maintain your row-row relationships.

If you do require that the block not have gaps, check out the article on how to do this here: http://www.varlena.com/varlena/GeneralBits/130.php

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
On Aug 3, 2007, at 14:28 , Steve Midgley wrote:

AIUI, one difference between the solutions Scott and I proposed is that while INCREMENT is set at 5000, each time nextval is called the sequence is incremented by 5000. For example:

test=# select nextval('foos_foo_id_seq');
 nextval
---------
       1
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
       2
(1 row)

test=# alter sequence foos_foo_id_seq increment 5000;
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq');
 nextval
---------
    5002
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
   10002
(1 row)

The only issue with this is that it burns through sequence values faster. That may not be a concern, of course. I wonder if there isn't a way to use ALTER SEQUENCE ... CACHE to handle this:

-- Alice's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   15002
(1 row)

-- Bob's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   15003
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
   15004
(1 row)

-- Alice's session
test=# alter sequence foos_foo_id_seq cache 5000;
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   20003
(1 row)

test=# select nextval('foos_foo_id_seq');
 nextval
---------
   20004
(1 row)

Now you should be able to safely use the values from nextval to nextval + cache without worrying that the values in that range are going to be used by another backend.

It looks like you could even do:

-- Alice's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   30096
(1 row)

test=# alter sequence foos_foo_id_seq cache 10; -- set cache to preallocate
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq'); -- get nextval and preallocate the next 10
 nextval
---------
   30097
(1 row)

-- Bob's session
test=# select nextval('foos_foo_id_seq');
 nextval
---------
   30107
(1 row)

-- Alice's session
test=# alter sequence foos_foo_id_seq cache 1; -- reset cache so other backends aren't burning sequence values unnecessarily
ALTER SEQUENCE
test=# select nextval('foos_foo_id_seq'); -- note that the previously cached values (for both Alice's and Bob's sessions) are discarded, and available for manual entry
 nextval
---------
   30117
(1 row)

Again, you can set up another table to keep track of the values that are going to be used manually, perhaps something like:

CREATE TABLE preallocated_foo_ids
( current_value INTEGER NOT NULL
, maximum_value INTEGER NOT NULL
, CHECK (current_value <= maximum_value)
);

Then, to use:

test=# ALTER SEQUENCE foos_foo_id_seq CACHE 10;
ALTER SEQUENCE
test=# DELETE FROM preallocated_foo_ids; -- clear old ones
DELETE 1
test=# INSERT INTO preallocated_foo_ids (current_value, maximum_value)
test-# SELECT val, val + 10
test-# FROM (SELECT nextval('foos_foo_id_seq')) AS seq(val);
INSERT 0 1
test=# ALTER SEQUENCE foos_foo_id_seq CACHE 1;
ALTER SEQUENCE
test=# SELECT * FROM preallocated_foo_ids;
 current_value | maximum_value
---------------+---------------
         30142 |         30152
(1 row)

You've now got a preallocated range in preallocated_foo_ids. Just use standard table locking on preallocated_foo_ids: there shouldn't be heavy contention on this table during your load, so I don't think performance should suffer too badly.

test=# BEGIN;
BEGIN
test=# SELECT current_value
test-# FROM preallocated_foo_ids
test-# FOR UPDATE;
 current_value
---------------
         30142
(1 row)

test=# INSERT INTO foos (foo_id) VALUES (30142);
INSERT 0 1
test=# UPDATE preallocated_foo_ids
test-# SET current_value = current_value + 1;
UPDATE 1
test=# COMMIT;
COMMIT
test=# SELECT * FROM preallocated_foo_ids;
 current_value | maximum_value
---------------+---------------
         30143 |         30152
(1 row)

When you run into an error because of the CHECK constraint, you know you've hit the end of your range (if you haven't been checking otherwise).

> Regarding Michael's suggestion - I tried messing around with LOCK
> and similar commands but they're only allowed to run against TABLES
> not SEQUENCES - too bad - that would have been perfect.

Yeah, I thought that might be the case.

Michael Glaesemann
grzm seespotcode net
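Michael's consumption transaction can be mimicked outside SQL. Here is a Python sketch (a simulation only: `PreallocatedRange` is an invented stand-in for the preallocated_foo_ids table, and real code would issue the SELECT ... FOR UPDATE / INSERT / UPDATE transaction he shows) illustrating how exhausting the range surfaces as an error, like the CHECK constraint firing:

```python
# Simulation of consuming a preallocated id range: current_value advances
# toward maximum_value, and exhausting the range raises, mirroring the
# CHECK (current_value <= maximum_value) constraint in Michael's table.
class PreallocatedRange:
    def __init__(self, current_value, maximum_value):
        self.current_value = current_value
        self.maximum_value = maximum_value

    def take(self):
        if self.current_value > self.maximum_value:
            # In the database this is where the CHECK constraint would fire.
            raise RuntimeError("preallocated range exhausted")
        value = self.current_value
        # Stands in for: UPDATE ... SET current_value = current_value + 1
        self.current_value += 1
        return value

r = PreallocatedRange(30142, 30152)
ids = [r.take() for _ in range(11)]  # consumes 30142 .. 30152 inclusive
```

A twelfth take() would raise, which is the sketch's analogue of hitting the end of the range.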
On Aug 3, 2007, at 15:27 , Erik Jones wrote:

> Is there actually a requirement that the block of 5000 values not
> have gaps?

Good point.

> If not, why not make the versioned table's id column default to
> nextval from the same sequence?

Of course, the ids of the two tables could be interleaved in this case. This might not be an issue, of course.

Michael Glaesemann
grzm seespotcode net
Hi Scott,

You've moved into more general territory, so I'm starting a new thread. The code I provided to reset a primary key sequence is actually part of Ruby on Rails' core library - actually they use something very similar to what I originally sent:

SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence})) FROM #{table}), false)

Where:
#{sequence} = sequence name
#{pk} = primary key of table under sequence
#{table} = table under sequence

Their code is a little different from what I provided before b/c it increments by one (times the increment #) above the max(pk). But essentially it's the same. (I think their method leaves small gaps in the sequence every time it runs.) Also I think their method is likely to be a little slower (one extra select statement) and therefore (perhaps) more vulnerable to a race?

You mentioned something more general though: "As long as you're using setval you have a race condition"? However the postgres manual states:

> The sequence functions, listed in Table 9-34, provide simple,
> multiuser-safe methods for obtaining successive sequence values
> from sequence objects.

(http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html)

Included in Table 9-34 is "setval" - so I'm not clear how it can have a race condition all by itself? Or maybe it only has a race condition when used in ways similar to how Ruby/Rails is using it? (i.e. with a compound select/coalesce statement as one of its parameters?) Would this command have a race condition:

select setval('my_pk_seq', 500)

This issue is reasonably important since Ruby on Rails is fairly widely used. As you say, the race window would be pretty small on a compound select -- and the Ruby function doesn't actually get called very often, but if you wouldn't mind explaining how the race condition would manifest, I'll post a warning on the RoR bug tracking site so that people can at least understand that there's a potential bug here..

Thanks again,

Steve

At 08:42 PM 8/3/2007, Scott Marlowe wrote:
>On 8/3/07, Steve Midgley <public@misuse.org> wrote:
> > Hi Scott,
> >
> > Thanks for this info (and Michael too!).
> >
> > Let me see if I understand your suggestion. I would run these three
> > commands in sequence:
> >
> > # select nextval('[my_seq_name]');
> > returns => 52 [I believe that the sequence is at 52]
> > # alter sequence [my_seq_name] increment by 5000;
> > # select nextval('[my_seq_name]');
> > returns => 5052
> >
> > If the third command doesn't return "5052" - i.e. it returns 5053, then
> > I know that somewhere in this sequence another process grabbed an id
> > out from under me. It doesn't matter where, but I would know that my
> > 5000 id's are not unique and should be discarded? If the third command
> > DOES return 5052, then I know that those 5000 id's are "locked" for my
> > use and no other application could have grabbed one of them?
>
>No, that's not what would happen. If someone grabbed an id after the
>increment value was changed, then you'd get 10052, cause they would
>increment the sequence by 5,000. Since you're not using setval, and
>you're keeping the increment positive, there's no danger of collision,
>only of over-incrementing and leaving a giant hole in your sequence.
>Which is ok.
>
> > Can anyone see a flaw in that? It looks right to me..
> >
> > Scott - it also seems to me that I need not waste all those id's if
> > another application does grab one during my statement: If I detect a
> > failure, I could just reset the pk sequence back to the max id of the
> > underlying table before trying again. I think this code would do it
> > (stolen from Ruby's postgres adaptor):
>
>That is open to a race condition. The bad kind.
>
> > SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT
> > min_value FROM [seq_name])) FROM [table_of_pk]), false)
>
>As long as you're using setval, you have a race condition. Please
>avoid it. Unless you can guarantee that no one else is using the
>database at the same time (during a maintenance window etc...)
>
> > So for table "property" with pk of "id" and sequence name
> > "property_id_seq":
> >
> > SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT
> > min_value FROM property_id_seq)) FROM property), false)
>
>You'd think that the select coalesce and the outer select setval would
>not have a race condition, but they still do. Just a much smaller one.
>
> > I'm now starting to think that there's no way to solve this problem in
> > an "elegant manner" even in a stored procedure? Your method seems to be
> > as good as it's going to get? (Not that I'm complaining!)
>
>Yep. Safe is better than pretty or elegant. :)
Steve Midgley <public@misuse.org> writes:
> The code I provided to reset a primary key sequence is actually part of
> Ruby on Rails core library - actually they use something very similar
> to what I originally sent:

> SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT
> increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence}))
> FROM #{table}), false)

Ugh. That's completely unsafe/broken, unless they also use locking that you didn't show.

> You mentioned something more general though: "As long as you're using
> setval you have a race condition"? However the postgres manual states:
>> The sequence functions, listed in Table 9-34, provide simple,
>> multiuser-safe methods for obtaining successive sequence values from
>> sequence objects.
> (http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html)
> Included in Table 9-34 is "setval" - so I'm not clear how it can have a
> race condition all by itself?

It doesn't have a race condition "all by itself": it will do what it's told. The problem with commands such as the above is that there's a time window between calculating the max() and executing the setval(), and that window is more than large enough to allow someone else to insert a row that invalidates your max() computation. (Because of MVCC snapshotting, the risk window is in fact as long as the entire calculation of the max --- it's not just a few instructions as some might naively think.)

Now it is possible to make this brute-force approach safe: you can lock the table against all other modifications until you've applied your own changes. But you pay a high price in loss of concurrency if you do that.

			regards, tom lane
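Tom's window can be replayed deterministically. This Python sketch (illustrative values only; it freezes the interleaving by hand rather than racing real threads) shows how two sessions that each compute max(id) and then setval end up claiming the same next id:

```python
# Deterministic replay of the max()-then-setval race: two sessions each
# compute max(id) over the same table snapshot before either applies its
# setval, so both derive the same "next" id.
table_ids = {540_201, 540_202, 540_203}

# Session A computes its max() ...
a_max = max(table_ids)

# ... and inside A's window, session B runs the identical computation:
b_max = max(table_ids)

a_next = a_max + 1   # A: setval(seq, a_max); nextval() -> a_max + 1
b_next = b_max + 1   # B: same inputs, same answer

assert a_next == b_next == 540_204   # duplicate id: the race in action
```

The window here is two statements wide for clarity, but as Tom notes, in Postgres it spans the entire max() calculation.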
On 03/08/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
> On Aug 3, 2007, at 15:27 , Erik Jones wrote:
>
> > Is there actually a requirement that the block of 5000 values not
> > have gaps?
>
> Good point.
>
> > If not, why not make the versioned table's id column default to
> > nextval from the same sequence?
>
> Of course, the ids of the two tables could be interleaved in this
> case. This might not be an issue, of course.
>
> Michael Glaesemann
> grzm seespotcode net
It seems to me that the cache feature of a sequence is there for just this purpose.

That way when you get the next value, your session caches a range and any other session will get a value after your cached range.
Peter
Hi Peter,<br /><br /> I struggled to implement Michael's suggestion to use CACHE in this regard when he made it but afteryour encouragement I've studied it more and you and he are both totally right - CACHE is designed to do exactly whatI want. Here's the sample code so as to put this issue to bed and to record what the solution is:<br /><br /> Scenario:<br/> Bob wants a block of 50 id's<br /> Alice just wants a single id but will accidentally "interlope" into Bob'ssequence obtainment.<br /> "property_id_seq" = 100<br /><br /> Bob:<br /> # alter sequence property_id_seq CACHE 50<br/> Alice:<br /> # select nextval('property_id_seq')<br /> => 101 (wastes ids up to 150)<br /> Bob:<br /> # selectnextval('propery_id_seq')<br /> => 151 (Bob now knows that 151-201 are locked permanently for his exclusive use)<br/> Bob:<br /> # alter sequence property_id_seq CACHE 1<br /> => Sequence will now return single ids to everyone<br/><br /> So in the worst case, there will be id "wastage" equal to the CACHE size times the number of "interlopers"who grab ids while Bob is obtaining his block. And Bob's time to grab a set of id's is fairly small since he'sonly issuing a couple of very fast sql statements..<br /><br /> NOTE: If all calling parties must agree to always usethe same CACHE number for obtaining blocks of id's, then this method seems bulletproof (if two parties use differing CACHE#'s then they could cause too few id's to be CACHED to one of the parties).<br /><br /> I hope this helps someone elseon the archives down the road. Thanks to everyone for putting their time and attention on this problem. 
I'm very grateful.<br /><br /> Sincerely,<br /><br /> Steve<br /><br /> At 08:00 AM 8/6/2007, Peter Childs wrote:<br /><br /><blockquote cite="" class="cite" type="cite">On 03/08/07, <b>Michael Glaesemann</b> <<a href="mailto:grzm@seespotcode.net">grzm@seespotcode.net</a>> wrote:<br /><dl><dd>On Aug 3, 2007, at 15:27, Erik Jones wrote:<br /><br /><dd>> Is there actually a requirement that the block of 5000 values not<br /><dd>> have gaps?<br /><br /><dd>Good point.<br /><br /><dd>> If not, why not make the versioned table's id column default to<br /><dd>> nextval from the same sequence?<br /><br /><dd>Of course, the ids of the two tables could be interleaved in this<br /><dd>case. This might not be an issue, of course.<br /><br /><dd>Michael Glaesemann<br /><dd>grzm seespotcode net<br /></dl><br /> It seems to me that the CACHE feature of a sequence is there for just this purpose.<br /><br /> That way, when you get the next value, your session caches a range of values, and any other session will get values after your cached range.<br /><br /> Peter</blockquote>
Steve Midgley writes:
>> The code I provided to reset a primary key sequence is actually part of
>> Ruby on Rails core library - actually they use something very similar
>> to what I originally sent:
...
>> SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT
>> increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence}))
>> FROM #{table}), false)

Tom Lane wrote:
> Ugh. That's completely unsafe/broken, unless they also use locking that
> you didn't show.
...
> It doesn't have a race condition "all by itself": it will do what it's
> told. The problem with commands such as the above is that there's a
> time window between calculating the max() and executing the setval(),
> and that window is more than large enough to allow someone else to
> insert a row that invalidates your max() computation. (Because of MVCC
> snapshotting, the risk window is in fact as long as the entire
> calculation of the max --- it's not just a few instructions as some
> might naively think.)
>
> Now it is possible to make this brute-force approach safe: you can lock
> the table against all other modifications until you've applied your own
> changes. But you pay a high price in loss of concurrency if you do
> that.

All this trouble over semantically-significant ID columns seems to support the camp that excoriates the use of artificial ID columns and autoincrementation altogether. The usual argument in their favor is that they speed up performance, but this epicyclic dance to accommodate FK references to autoincremented keys makes the case that there is also a performance penalty, and in the more critical performance area of code development and correctness rather than in the less critical search-speed area.

--
Lew
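Tom's "brute-force but safe" variant would look roughly like this (a sketch, not from the thread: the table name `property`, its `id` column, and the sequence `property_id_seq` are assumed for illustration; `EXCLUSIVE MODE` blocks all concurrent writers until commit while still allowing reads):

```sql
BEGIN;
-- Block all other modifications to the table until we commit,
-- so no one can insert a row that invalidates our max() computation.
LOCK TABLE property IN EXCLUSIVE MODE;
-- With writers excluded, the window between max() and setval() is harmless.
SELECT setval('property_id_seq',
              (SELECT COALESCE(MAX(id), 0) + 1 FROM property),
              false);
COMMIT;
```

As Tom notes, this is safe but pays a high price in lost concurrency: every other writer to the table queues up behind the lock for the duration of the transaction.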
On 8/4/07, Lew <lew@lewscanon.nospam> wrote (quoting the exchange above):
> All this trouble over semantically-significant ID columns seems to support the
> camp that excoriates use of artificial ID columns and autoincrementation
> altogether.

I don't think that it does that at all. If properly implemented, sequences provide a race-free way to assign unique ids to a table that otherwise would have no natural primary key. I think that having a religious preference one way or the other means you might use a suboptimal method in some cases.
In the db I work on, we have artificial pks from sequences, artificial pks made from a large list of pre-created ids (6-alphanumeric-character locator codes, common in the travel industry), and still other tables that have natural primary keys. Each has a reason for being the way it is. I can't see this episode saying that natural pks, which are often open to interpretation and change over time, are any better than artificial pks from a sequence at all. I can see it saying that the people who developed the RoR framework were NOT database specialists.

> The usual argument in their favor is that they speed up performance, but this
> epicyclic dance to accommodate FK references to autoincremented keys makes the
> case that there is also a performance penalty, and in the more critical
> performance area of code development and correctness than in the less critical
> search speed area.

Sorry, but that's only one of the two arguments. The other one is that natural pks are often impossible because the dataset cannot be made unique under the business constraints. In an airline reservation system, you might start with last and first name for a pk. Then add phone number, except some people don't want to give their phone numbers, so you use street address, only to find out that John Smith with no phone lives at 123 Acorn St in Portland while another John Smith with no phone lives at 123 Acorn St in Philadelphia. And what do you do when they call back about their ticket?

Seriously, the natural primary key method has as many problems as the artificial one. In the USAF, they used to use a system to keep track of your medical records: the first letter of your last name, followed by the last four digits of your SSN. So Ken Johnson might have an SSN of 123-45-6789, while Patricia Jackson might have an SSN of 453-89-6789 - the same "J6789" key. Imagine Ken's surprise when his doctor tells him he's pregnant, or Patricia's surprise at having a limb amputated when she went in for a problem with heart murmurs.
By the time you add enough fields together to get a reliable primary key, you might have a multi-column monster that kills performance. You can't just religiously say that one or the other is the only answer. Finally, the cost of software development is only a small part of the overall cost of creating and deploying a piece of code. The last place I worked had a transactional database, developed really quickly by non-db people, that used ISAM tables (no, not MySQL with MyISAM, just ISAM); it corrupted records dozens of times a day and required a team of support people to keep it running semi-smoothly, plus hours of downtime each night for cleanup. The initial savings on development costs were eaten up rather quickly by the ongoing support costs.