Thread: Sequence question

Sequence question

From

Eric E

Date:

20 October 2004, 16:04:31

Hi,
    I have a question about sequences.  I need a field to have values
with no holes in the sequence.  However, the values do not need to be in
order.

My users will draw a number or numbers from the sequence and write to
the field.  Sometimes, however, these sequence numbers will be discarded
(after a transaction is complete), and thus available for use. During
the transaction, however, any drawn numbers need to be unavailable.
I would like the next user who draws a number to draw the lowest number
she can, starting with the holes in the sequence.

This continuous sequence is absolutely required by our company, as the
fact that the sequence has no holes is used to check for much more
serious problems.

So my question is:
what's the most effective way to get the next available number?

My present method is to do a query that finds the first and last number
in each of the holes, step through those holes, and then start
generating new numbers.  Unfortunately, this involves doing a table scan
each time - before I generate the number, and does not produce the
transaction-safety I want.

Does anyone have any better ideas?  Places I should look?

Thanks,

Eric

Re: Sequence question

From

David Ecker

Date:

20 October 2004, 16:25:58

Far from being a perfect idea but a faster solution than stepping through
all holes:

1) Create a second table containing only one field of type of your key.
2) When you delete an entry place the delete key value in your second table
3) If you insert a new entry into your old table and your new table contains
a value, take the minimum value in the new table as your new key and delete
that entry from the new table. If the new table is empty just use the
sequence to get the new key value.

Hope that helps
David Ecker

Eric E wrote:

> Hi,
>     I have a question about sequences.  I need a field to have values
> with no holes in the sequence.  However, the values do not need to be in
> order.
>
> My users will draw a number or numbers from the sequence and write to
> the field.  Sometimes, however, these sequence numbers will be discarded
> (after a transaction is complete), and thus available for use. During
> the transaction, however, any drawn numbers need to be unavailable.
> I would like the next user who draws a number to draw the lowest number
> she can, starting with the holes in the sequence.
>
> This continuous sequence is absolutely required by our company, as the
> fact that the sequence has no holes is used to check for much more
> serious problems.
>
> So my question is:
> what's the most effective way to get the next available number?
>
> My present method is to do a query that finds the first and last number
> in each of the holes, step through those holes, and then start
> generating new numbers.  Unfortunately, this involves doing a table scan
> each time - before I generate the number, and does not produce the
> transaction-safety I want.
>
> Does anyone have any better ideas?  Places I should look?
>
> Thanks,
>
> Eric

Re: Sequence question

From

Andrew Sullivan

Date:

20 October 2004, 16:58:06

On Tue, Oct 19, 2004 at 11:19:05AM -0400, Eric E wrote:

> My users will draw a number or numbers from the sequence and write to
> the field.  Sometimes, however, these sequence numbers will be discarded
> (after a transaction is complete), and thus available for use. During
> the transaction, however, any drawn numbers need to be unavailable.
> I would like the next user who draws a number to draw the lowest number
> she can, starting with the holes in the sequence.

There are two ways I've seen to do this.  One is the low-concurrency
way.  Another is a sort of clever approach that isn't theoretically
perfect, but which provides slightly better concurrency.

The low-concurrency approach is pretty much what you'd expect: keep
the value in a table which is locked by each transaction which is
incrementing it, and complete the incrementing in the transaction
scope.  That way, if it rolls back, the value hasn't been
incremented, and is ready for the next user.  The problem, of course,
is that this forces every transaction to stand in line.

An alternative approach I've heard is to pre-allocate numbers from a
sequence into a table:

create table seq_allocation (
    serialno int8 not null unique,
    grant_status int
        constraint status_limiter check (grant_status in
        (1,2,3)) );

The idea is that a grant_status of 1 means the serial number is
unallocated, a grant_status of 2 means it's pending, and 3 means it's
granted.  When you start, in one transaction you pick the next
available serialno with a status of 1.  Then you update that row to
set it to 2 (make sure you use "where grant_status = 1" to avoid a
race condition), and then commit.  Now you have your serial number.
Use it, and then at the end of your transaction where you are
committing, set the grant_status to 3, so you know it's really used.

Now, how do you handle the cases where either the transaction fails
so you can't set it to 3?  Simple: your client captures errors and
then sets the value back to 1 later.  For client errors, you need yet
another process which will go around periodically and check for
grant_status = 2, and make sure nobody's actually in the middle of
trying to use them.  (You could refine the seq_allocation table by
storing the pid of the allocating back end.  Then your maintenance
script could look for such a back end while cleaning up.)

The savepoints features of 8.0 will make some of this even easier for
you.

Note that this second method is not completely bulletproof, but it
might be good enough for the cases you want.  I have a feeling,
however, that you're creating a new problem for yourself by not being
able to skip sequence values.  My bet is that you actually need to
find a better way to solve the "other serious problems" you have
rather than banging on sequences to get them to fit your intended
use.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
        --J.D. Baldwin

Re: Sequence question

From

Alvaro Herrera

Date:

20 October 2004, 18:00:51

On Wed, Oct 20, 2004 at 11:57:42AM -0400, Andrew Sullivan wrote:

> Now, how do you handle the cases where either the transaction fails
> so you can't set it to 3?  Simple: your client captures errors and
> then sets the value back to 1 later.

Has anyone read "the Sagas paper" by Garcia-Molina?  They present a way
to handle "extended transaction models", trying to cope sort-of
automatically with this kind of situations.  More generally, AFAIU the
idea is to have multi-transaction recoverability and rollback-ability.
It seems interesting.

http://portal.acm.org/citation.cfm?doid=38713.38742

I have only skimmed through it, but it sounds somewhat interesting.
I'd love to know what do people think of this.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)

Re: Sequence question

From

Eric E

Date:

20 October 2004, 19:58:39

Hi Andrew,
   I had basically started working on an idea like the second approach,
but had not been able to put the status element so clearly.  I really
like the statuses of available, pending, and granted.

There's one more twist I think I can use to optimize this: once a number
is assigned, it cannot be reassigned.  So I think I can add:

  - have the sequence preallocation table hold only numbers with status
being available or pending, i.e., delete numbers once they have been
allocated.  This leaves on two possible statuses: available and pending.
  - push off getting new numbers into the preallocation table with a
full-table search until convenient times

I also liked your point about the atomicity of :
get number, change status to pending, commit

After that one can proceed with writing the number into my data table.
My thought was that the you could set the status ussing sessionID.  That
way a server-side job could look for expired sessions and remark those
numbers available.

Any thoughts?

This is point is definitely important
 > I have a feeling,
 > however, that you're creating a new problem for yourself by not being
 > able to skip sequence values.  My bet is that you actually need to
 > find a better way to solve the "other serious problems" you have
 > rather than banging on sequences to get them to fit your intended
 > use.

I just haven't really seen anyway around the need to use all of our
storage rows that doens't involve a complicated mapping to boxes.

Thanks,

Eric


Andrew Sullivan wrote:
> On Tue, Oct 19, 2004 at 11:19:05AM -0400, Eric E wrote:
>
>
>>My users will draw a number or numbers from the sequence and write to
>>the field.  Sometimes, however, these sequence numbers will be discarded
>>(after a transaction is complete), and thus available for use. During
>>the transaction, however, any drawn numbers need to be unavailable.
>>I would like the next user who draws a number to draw the lowest number
>>she can, starting with the holes in the sequence.
>
>
> There are two ways I've seen to do this.  One is the low-concurrency
> way.  Another is a sort of clever approach that isn't theoretically
> perfect, but which provides slightly better concurrency.
>
> The low-concurrency approach is pretty much what you'd expect: keep
> the value in a table which is locked by each transaction which is
> incrementing it, and complete the incrementing in the transaction
> scope.  That way, if it rolls back, the value hasn't been
> incremented, and is ready for the next user.  The problem, of course,
> is that this forces every transaction to stand in line.
>
> An alternative approach I've heard is to pre-allocate numbers from a
> sequence into a table:
>
> create table seq_allocation (
>     serialno int8 not null unique,
>     grant_status int
>         constraint status_limiter check (grant_status in
>         (1,2,3)) );
>
> The idea is that a grant_status of 1 means the serial number is
> unallocated, a grant_status of 2 means it's pending, and 3 means it's
> granted.  When you start, in one transaction you pick the next
> available serialno with a status of 1.  Then you update that row to
> set it to 2 (make sure you use "where grant_status = 1" to avoid a
> race condition), and then commit.  Now you have your serial number.
> Use it, and then at the end of your transaction where you are
> committing, set the grant_status to 3, so you know it's really used.
>
> Now, how do you handle the cases where either the transaction fails
> so you can't set it to 3?  Simple: your client captures errors and
> then sets the value back to 1 later.  For client errors, you need yet
> another process which will go around periodically and check for
> grant_status = 2, and make sure nobody's actually in the middle of
> trying to use them.  (You could refine the seq_allocation table by
> storing the pid of the allocating back end.  Then your maintenance
> script could look for such a back end while cleaning up.)
>
> The savepoints features of 8.0 will make some of this even easier for
> you.
>
> Note that this second method is not completely bulletproof, but it
> might be good enough for the cases you want.  I have a feeling,
> however, that you're creating a new problem for yourself by not being
> able to skip sequence values.  My bet is that you actually need to
> find a better way to solve the "other serious problems" you have
> rather than banging on sequences to get them to fit your intended
> use.
>
> A
>

Re: Sequence question

From

Andrew Sullivan

Date:

20 October 2004, 20:38:22

On Wed, Oct 20, 2004 at 02:59:27PM -0400, Eric E wrote:
>  - have the sequence preallocation table hold only numbers with status
> being available or pending, i.e., delete numbers once they have been
> allocated.  This leaves on two possible statuses: available and pending.

I would argue that you're best to record everything.  That is, your
proposal moves from a table which covers all the possible states --
granted, pending, and available -- to a table which entails ambiguous
answers to some questions.  Since you're not going to preallocate
every logically possible id, then if an id isn't in the table, you
can't tell if it simply has never been allocated, or if it has
already been used.  If you have that stored elsewhere, though, you
can get it from a join, so perhaps there's no need for this.  (I note
that a real normalisation freak, like the one I occasionally play on
TV, would require you to use a REFERENCES constraint on the status
value, and use the referenced table as a control for what status
values you can use.  This has the not inconsiderable benefit that if
you have a new status -- say, "storage burned down" or "impounded by
SCO for copyright violation" or something else -- you have a
completely trivial way to add it.  It's certainly the way I'd
actually do this.)

> I also liked your point about the atomicity of :
> get number, change status to pending, commit

The real problem with it is that you do have the possibility of
orphaned pending actions.

> My thought was that the you could set the status ussing sessionID.  That
> way a server-side job could look for expired sessions and remark those
> numbers available.

That's something like what I'd do, yes.  It mostly depends on what's
available to your application.  I tend to be very belt-and-suspenders
about this sort of thing.  Probably I'd put a wall-clock timestamp on
the field, too, to give me clues about when things might be going
wrong, &c.

> > find a better way to solve the "other serious problems" you have
> > rather than banging on sequences to get them to fit your intended
> > use.
>
> I just haven't really seen anyway around the need to use all of our
> storage rows that doens't involve a complicated mapping to boxes.

I was more concerned that you were trying to do this for invoice
numbers, another place where people often require serial numbers.  In
that case, I usually think they're wrong, because I can think of
plenty of better ways to solve that one (unless it's a legal
requirement, which is sometimes is).  But mapping data points to
places in space is one of those cases where you probably _do_ need
this sort of preallocation mechanism.  It's what hotels do, after
all.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.
        --Philip Greenspun