Thread: allocate chunk of sequence

From: Gary Fu

Hello,

I'm trying to allocate a chunk of ids from a sequence with the following
proc.  However, if I don't use the 'lock lock_table', the proc may not
work correctly when it is run at the same time by different psql
sessions.  Is there a better way to do this without the 'lock
lock_table' ?

Thanks,
Gary

create or replace function proc_allocate_seq(int)
     returns int as $$

declare
     nNumberOfFiles          alias for $1;

     aFileId int;
     aNewFileId int;

begin

     -- without this lock, two concurrent sessions can interleave the
     -- nextval/setval pair below and be handed overlapping id ranges
     lock lock_table;

     aFileId := nextval('aa_seq');

     -- perform pg_sleep(3);  -- uncomment to make the race easy to reproduce

     aNewFileId := setval('aa_seq', aFileId + nNumberOfFiles - 1);

     -- ids aFileId .. aFileId + nNumberOfFiles - 1 now belong to the caller
     return aFileId;

end;
$$ language plpgsql;
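
For reference, assuming aa_seq and the dummy lock_table already exist, a
session grabs a chunk like this (the chunk size here is just an example):

     select proc_allocate_seq(10);  -- returns the first id of the chunk;
                                    -- the caller owns that id and the 9
                                    -- after it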

Re: allocate chunk of sequence

From: Scott Marlowe

Gary Fu wrote:
> Hello,
>
> I'm trying to allocate a chunk of ids from a sequence with the
> following proc.  However, if I don't use the 'lock lock_table', the
> proc may not work correctly when it is run at the same time by
> different psql sessions.  Is there a better way to do this without the
> 'lock lock_table' ?
>
>
>     aNewFileId := setval('aa_seq', aFileId + nNumberOfFiles - 1);
This is NOT the best way to use sequences.

Much better would be to use a loop to allocate them one after the other
and put them into a record type or something.

Do they HAVE to be contiguous?

If they're always the same size, then set the increment value of the
sequence on creation to reflect that.

i.e.:

create sequence abc increment by 20;

then just select nextval, and you have that value plus the 19 after it
all to yourself.  Lots of ways to handle this, but setval is generally
the worst way to handle anything in a highly parallel environment.
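
Concretely (the first value depends on where the sequence currently is):

     select nextval('abc');  -- returns, say, 1: ids 1 .. 20 are yours
     select nextval('abc');  -- the next caller gets 21: ids 21 .. 40,
                             -- with no explicit locking anywhere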

Re: allocate chunk of sequence

From: Gary Fu

Scott Marlowe wrote:
> Gary Fu wrote:
>
>> Hello,
>>
>> I'm trying to allocate a chunk of ids from a sequence with the
>> following proc.  However, if I don't use the 'lock lock_table', the
>> proc may not work correctly when it is run at the same time by
>> different psql sessions.  Is there a better way to do this without
>> the 'lock lock_table' ?
>>
>>
>>     aNewFileId := setval('aa_seq', aFileId + nNumberOfFiles - 1);
>
> This is NOT the best way to use sequences.
>
> Much better would be to use a loop to allocate them one after the
> other and put them into a record type or something.
>
> Do they HAVE to be contiguous?
>
> If they're always the same size, then set the increment value of the
> sequence on creation to reflect that.
>
> i.e.:
>
> create sequence abc increment by 20;
>
> then just select nextval, and you have that value plus the 19 after it
> all to yourself.  Lots of ways to handle this, but setval is generally
> the worst way to handle anything in a highly parallel environment.
>
>
Thanks for the reply.

The chunks to be allocated are not all the same size, so setting the
increment value will not help.

I'm not sure how the nextval function handles this internally, since it
has to read and update the sequence object.  Does it use some kind of
lock ?  Otherwise the problem mentioned here should happen to the
nextval function as well.

The lock command does not work on a sequence, so in my example I have to
use a dummy table for the lock to work.  Another thought is to have the
nextval function take an extra argument for the increment value (instead
of the default increment value).

Gary

Re: allocate chunk of sequence

From: PFC

> The chunks to be allocated are not all the same size, so setting the
> increment value will not help.

    I'm sometimes not that subtle, so I'd just use a BIGINT sequence. Think
about the largest chunk you'll ever get (probably less than 2^30 rows,
yes ?), set the sequence increment to that very high value (this still
leaves you at least 2^32 chunks), and give row N of each chunk the id
(big_sequence_value + N).
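
    A minimal sketch of that (the sequence name and the 2^30 chunk limit
are just examples):

select nextval('chunk_seq');  -- call the result chunk_base: row N of the
                              -- chunk gets id (chunk_base + N)

-- where the sequence was created as:
create sequence chunk_seq increment by 1073741824;  -- 2^30 = max chunk size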

> I'm not sure how the nextval function handles this internally, since
> it has to read and update the sequence object.  Does it use some kind
> of lock ?  Otherwise the problem mentioned here should happen to the
> nextval function as well.

    Yes it takes a lock, but the lock is held for a very short time (just
increment the value and release the lock).

    In InnoDB, the AUTO_INCREMENT lock is held for the entire duration of the
SQL statement, including any triggers that might fire. This means if you
have ON INSERT triggers which take some time to process, INSERT
concurrency is horrendous. Not so with Postgres sequences.

> The lock command does not work on a sequence, so in my example I have
> to use a dummy table for the lock to work.  Another thought is to have
> the nextval function take an extra argument for the increment value
> (instead of the default increment value).

    If you really really want your ids to be contiguous, you can't use the
bigint sequence above. So :
    If you do not want to spoil Postgres' good concurrency on inserts, you'll
want to avoid doing I/O while holding a lock. Burning CPU while holding a
lock is OK unless you want to use all your cores, but waiting for I/O while
locked is BAD.

    So, here's the deal :

- the hard way is to patch nextval() for an extra argument
- the easy way is to write a plpgsql function doing this :
    - you have a chunk of N rows to insert.
    - get lock
    - X := nextval()
    - call nextval() (N-1) times in a plpgsql FOR loop
    - release lock
    - do your INSERTs, using X ... (X+N-1) as ids (generate_series() is your
friend)

    Thus you only lock while burning CPU calling nextval(). Not that bad. The
I/O-intensive INSERT isn't under the lock, so it can be concurrent. A
minimal sketch of the easy way, reusing the aa_seq and lock_table from the
original post (the function name is just an example):
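
create or replace function allocate_chunk(n int)
    returns bigint as $$
declare
    first_id bigint;
begin
    -- serialize allocators; the lock is released at commit
    lock lock_table;

    first_id := nextval('aa_seq');

    -- burn through the remaining n-1 values while holding the lock
    for i in 2 .. n loop
        perform nextval('aa_seq');
    end loop;

    -- the caller owns first_id .. first_id + n - 1
    return first_id;
end;
$$ language plpgsql;

-- then, outside the lock, something like:
-- insert into files (id)
--     select g from generate_series(X, X + N - 1) g;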

Re: allocate chunk of sequence

From: Scott Marlowe

PFC wrote:
>
>> The chunks to be allocated are not all the same size, so setting the
>> increment value will not help.
>
>     I'm sometimes not that subtle, so I'd just use a BIGINT sequence.
> Think about the largest chunk you'll ever get (probably less than 2^30
> rows, yes ?), set the sequence increment to that very high value (this
> still leaves you at least 2^32 chunks), and give row N of each chunk
> the id (big_sequence_value + N).
And that's only if they need to be contiguous.  If they don't, then just
grab however many you need one after the other.
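
For the non-contiguous case that's one statement, with no lock at all
(using the aa_seq from upthread; the count is just an example):

select nextval('aa_seq') from generate_series(1, 10);
-- ten ids for this caller; concurrent callers may leave gaps between them
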
>
>> I'm not sure how the nextval function handles this internally, since
>> it has to read and update the sequence object.  Does it use some kind
>> of lock ?  Otherwise the problem mentioned here should happen to the
>> nextval function as well.
>
>     Yes it takes a lock, but the lock is held for a very short time
> (just increment the value and release the lock).
And that lock will be WAY shorter and less bothersome than the lock
you'll need if you do select setval('seqname', (select max(id)+1 from
sometable), false); in the middle of your UDF.

>     In InnoDB, the AUTO_INCREMENT lock is held for the entire duration
> of the SQL statement, including any triggers that might fire. This
> means if you have ON INSERT triggers which take some time to process,
> INSERT concurrency is horrendous. Not so with Postgres sequences.
Note that that hasn't been true for some time now, according to Heikki
Tuuri (sp?).  He told me they changed the way that worked about halfway
through the 5.0 dev cycle, so that they use a kind of internal sequence
much like postgresql's.  Except in postgresql the sequence is exposed.

Again, needing contiguous ids is gonna cause a problem.  It is going to
serialize all inserts to your database, and slow it down in any kind of
large parallel environment.

If you can get away without them, then do so.