Thread: Best approach for a "gap-less" sequence

Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Hi!


I was trying to solve a problem on an old system and realized that there might
be some better approach for doing what I need.

We have some documents that need to be ordered sequentially and without gaps.
I could use a sequence, but if the transaction fails then when I rollback the
sequence will already have been incremented.

So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to
it, read the value, increase it, do what I need and then I COMMIT the
transaction, ensuring that the sequence has no gaps.

Is there a better way to guarantee that there will be no gaps in my sequence
if something goes wrong with my transaction?


--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
"chris smith"
Date:
On 8/12/06, Jorge Godoy <jgodoy@gmail.com> wrote:
>
> Hi!
>
>
> I was trying to solve a problem on an old system and realized that there might
> be some better approach for doing what I need.
>
> We have some documents that need to be ordered sequentially and without gaps.
> I could use a sequence, but if the transaction fails then when I rollback the
> sequence will already have been incremented.
>
> So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to
> it, read the value, increase it, do what I need and then I COMMIT the
> transaction, ensuring that the sequence has no gaps.
>
> Is there a better way to guarantee that there will be no gaps in my sequence
> if something goes wrong with my transaction?

Why does it matter?

I assume there is a reason you need it like this..

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Best approach for a "gap-less" sequence

From
Thomas Kellerer
Date:
Jorge Godoy wrote on 12.08.2006 01:33:
> I was trying to solve a problem on an old system and realized that there might
> be some better approach for doing what I need.
>
> We have some documents that need to be ordered sequentially and without gaps.
> I could use a sequence, but if the transaction fails then when I rollback the
> sequence will already have been incremented.
>
> So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to
> it, read the value, increase it, do what I need and then I COMMIT the
> transaction, ensuring that the sequence has no gaps.
>
> Is there a better way to guarantee that there will be no gaps in my sequence
> if something goes wrong with my transaction?

What do you do if a document gets deleted? Renumber the "following" documents so
that no gaps are present in the already used ids?

Thomas

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
"chris smith" <dmagick@gmail.com> writes:

> Why does it matter?
>
> I assume there is a reason you need it like this..

Of course there is.  It is a project requirement and also a law requirement
that there's no unused number and that they be chronologically ordered as
well.  This is also part of the documented procedure that existed in paper and
that got ISO 9001 certified (so a lot of money was spent here before).  The
law requirement is the strongest reason, though.

After a number is assigned, it can't be changed, reused or have anything
"newer" in a 'previous' (numerically-wise) entry.

Concurrency is a problem since there might be a lot of people using it.  I
wanted to see if there was something that could improve performance here or to
solve the problem in a better way without locking the table.


Thanks,
--
Jorge Godoy      <jgodoy@gmail.com>


Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:

> What do you do if a document gets deleted? Renumber the "following" documents
> so that no gaps are present in the already used ids?

There's no deletion possibility.  A RULE sets a column named "active" to
"False" instead (I can set it manually or let the RULE do that for me...).

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Jorge Godoy <jgodoy@gmail.com> writes:

> Is there a better way to guarantee that there will be no gaps in my sequence
> if something goes wrong with my transaction?

From the overwhelming feedback I assume there isn't a better way yet...
Thanks.  I'll see how I can improve the model then to separate these sequences
into different tables.

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Christian Kratzer
Date:
Hi,

On Sat, 12 Aug 2006, chris smith wrote:

> On 8/12/06, Jorge Godoy <jgodoy@gmail.com> wrote:
<snipp/>
>> Is there a better way to guarantee that there will be no gaps in my
>> sequence
>> if something goes wrong with my transaction?
>
> Why does it matter?
>
> I assume there is a reason you need it like this..

For example german tax law requires invoices to be numbered
sequentially without gaps.  This is supposed to make it harder
to cheat on VAT.

You cannot just drop an invoice as that would leave a gap.  Tax
inspectors will search for gaps and query to whatever invoice
is missing from records.

I could not care less about gaps in surrogate keys but this
kind of stuff is an external requirement.

Theres propably not much choice on how to implement something
like this but to just store the last assigned number in some row.

I would at least try to assign multiple such numbers in batches
to mimize contention on the row you store the counter in.

Greetings
Christian

--
Christian Kratzer                       ck@cksoft.de
CK Software GmbH                        http://www.cksoft.de/
Phone: +49 7452 889 135                 Fax: +49 7452 889 136

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Christian Kratzer <ck-lists@cksoft.de> writes:

> I would at least try to assign multiple such numbers in batches to mimize
> contention on the row you store the counter in.

What do you mean here?  How would you guarantee that on of the receiver
transactions didn't rollback and left a gap in the "sequence"?

I believe that for invoices it is less problematic.  At least here I don't
need the "time" part control, so if I leave one blank I can fill it later in
the same day without problems (except, of course, if the sequence number is
tied to some other physical evidence such as the paper counterpart of the
invoice and that is also chronologically assigned).

The whole problem appears because no matter how much we validate input and
relationships on the input interface, something might happen and make the
"INSERT" transaction fail.  Theoretically, all should go fine, but... :-)

--
Jorge Godoy      <jgodoy@gmail.com>


Re: Best approach for a "gap-less" sequence

From
Christian Kratzer
Date:
Hi,

On Sun, 13 Aug 2006, Jorge Godoy wrote:

> Christian Kratzer <ck-lists@cksoft.de> writes:
>
>> I would at least try to assign multiple such numbers in batches to mimize
>> contention on the row you store the counter in.
>
> What do you mean here?  How would you guarantee that on of the receiver
> transactions didn't rollback and left a gap in the "sequence"?

you would need to serialize the transactions assigning the
numbers and you would need to update the the counter in
the same transaction that assigns your numbers to your
documents or whatever.

Assigning a batch of 1000 numbers in one transaction would
propably be more efficient than assigning 1000 numbers in
1000 separate transactions that all need to be serialized.

> I believe that for invoices it is less problematic.  At least here I don't
> need the "time" part control, so if I leave one blank I can fill it later in
> the same day without problems (except, of course, if the sequence number is
> tied to some other physical evidence such as the paper counterpart of the
> invoice and that is also chronologically assigned).

Thats of course the idea. The numbers on the paper invoices
have to be gapless.   The tax people want to have a warm
fuzzy feeling that they are seeing all your invoices or they
will begin to speculate on how much vat they have not
received from you.

> The whole problem appears because no matter how much we validate input and
> relationships on the input interface, something might happen and make the
> "INSERT" transaction fail.  Theoretically, all should go fine, but... :-)

increment the counter in the same transaction that assigns
your values.

Of course I know little or nothing about your application
and what you need gaples sequences for.

I just pulled the invoice example out of my hat to show
that there are legitimate use cases for gapless sequences
of numbers.

Greetings
Christian

--
Christian Kratzer                       ck@cksoft.de
CK Software GmbH                        http://www.cksoft.de/
Phone: +49 7452 889 135                 Fax: +49 7452 889 136

Re: Best approach for a "gap-less" sequence

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jorge Godoy wrote:
> Jorge Godoy <jgodoy@gmail.com> writes:
>
>> Is there a better way to guarantee that there will be no gaps in my sequence
>> if something goes wrong with my transaction?
>
> From the overwhelming feedback I assume there isn't a better way yet...
> Thanks.  I'll see how I can improve the model then to separate these sequences
> into different tables.

Pre-allocate records.  The (primary key?) field would have the
numbers already filled in, but all the rest of the fields in each
record be NULL, blanks, zeros or indicator values ("~~~~~~~~~~",
- -999999999, etc).

Then create a single-field table called, for example, CUR_MAX_VALUE
that gets incremented as part of each transaction.  To serialize
access, transactions would need an EXCLUSIVE lock on the table.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE31J0S9HxQb37XmcRAkofAKCATXegeO6VRM8MW7AOkrFenMBtWgCgkksN
+7yKXTm3STQvLo7KTduUhsY=
=kxsK
-----END PGP SIGNATURE-----

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Ron Johnson <ron.l.johnson@cox.net> writes:

> Pre-allocate records.  The (primary key?) field would have the
> numbers already filled in, but all the rest of the fields in each
> record be NULL, blanks, zeros or indicator values ("~~~~~~~~~~",
> -999999999, etc).
>
> Then create a single-field table called, for example, CUR_MAX_VALUE
> that gets incremented as part of each transaction.  To serialize
> access, transactions would need an EXCLUSIVE lock on the table.

What's the difference to having just the table with the sequence where I make
an exclusive lock to get the value while inside the transaction?  This
approach seems more complicated since I'd have to exclude records that match
the "not-used" pattern.



--
Jorge Godoy      <jgodoy@gmail.com>

Attachment

Re: Best approach for a "gap-less" sequence

From
Ron Johnson
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jorge Godoy wrote:
> Ron Johnson <ron.l.johnson@cox.net> writes:
>
>> Pre-allocate records.  The (primary key?) field would have the
>> numbers already filled in, but all the rest of the fields in each
>> record be NULL, blanks, zeros or indicator values ("~~~~~~~~~~",
>> -999999999, etc).
>>
>> Then create a single-field table called, for example, CUR_MAX_VALUE
>> that gets incremented as part of each transaction.  To serialize
>> access, transactions would need an EXCLUSIVE lock on the table.
>
> What's the difference to having just the table with the sequence where I make
> an exclusive lock to get the value while inside the transaction?  This
> approach seems more complicated since I'd have to exclude records that match
> the "not-used" pattern.

The use of CUR_MAX_VALUE "should" ensure that you never have gaps,
since a rollback or process death would not update CUR_MAX_VALUE.

Your WHERE clauses would *not* have
    AND NAME <> "~~~~~~~~~~".
They would say
    AND SEQ_NO <= (SELECT CUR_MAX_VALUE FROM CUR_MAX_VALUE).

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE34jdS9HxQb37XmcRArBMAJ9ZS3/daUhhKu5f22nfo2m2AlXRfgCg7IfG
amkfOOnaJ1UzKRdlyZfJlvE=
=KCnM
-----END PGP SIGNATURE-----

Re: Best approach for a "gap-less" sequence

From
Chris
Date:
Jorge Godoy wrote:
> Jorge Godoy <jgodoy@gmail.com> writes:
>
>> Is there a better way to guarantee that there will be no gaps in my sequence
>> if something goes wrong with my transaction?
>
>From the overwhelming feedback I assume there isn't a better way yet...
> Thanks.  I'll see how I can improve the model then to separate these sequences
> into different tables.
>

I'm not sure what type of lock you'd need to make sure no other
transactions updated the table (see
http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in
theory" something like this should work:

begin;
select id from table order by id desc limit 1;
insert into table (id, blah) values (id+1, 'blah');
commit;


P.S. I'm sure in older versions this query wouldn't use an index:
select max(id) from table;

I'm not sure about 8.0+.. hence doing an order by the id desc limit 1.

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Chris <dmagick@gmail.com> writes:

> I'm not sure what type of lock you'd need to make sure no other transactions
> updated the table (see
> http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory"
> something like this should work:
>
> begin;
> select id from table order by id desc limit 1;
> insert into table (id, blah) values (id+1, 'blah');
> commit;

This is part of the solution, yes.  But I would still need locking this table
so that no other concurrent transaction gets another "id".  I don't want to
lock the main table -- as I believe you're suggesting -- because I want it to
be searchable and updatable while I'm inserting new data.  I just can't have
gaps in the sequence but I don't want to restrict everything else here.

> P.S. I'm sure in older versions this query wouldn't use an index:
> select max(id) from table;

It doesn't.  You'd have to do what you did: "order by <x> desc limit 1" to
have it using indexes...

> I'm not sure about 8.0+.. hence doing an order by the id desc limit 1.

I also have to test it...  But I still keep using the "order by desc" syntax
:-)



Thanks for your answer,
--
Jorge Godoy      <jgodoy@gmail.com>


Re: Best approach for a "gap-less" sequence

From
Michael Fuhr
Date:
On Mon, Aug 14, 2006 at 09:09:51AM -0300, Jorge Godoy wrote:
> Chris <dmagick@gmail.com> writes:
> > P.S. I'm sure in older versions this query wouldn't use an index:
> > select max(id) from table;
>
> It doesn't.  You'd have to do what you did: "order by <x> desc limit 1" to
> have it using indexes...
>
> > I'm not sure about 8.0+.. hence doing an order by the id desc limit 1.
>
> I also have to test it...  But I still keep using the "order by desc" syntax

Excerpt from the 8.1 Release Notes:

  Automatically use indexes for MIN() and MAX() (Tom)

      In previous releases, the only way to use an index for MIN()
      or MAX() was to rewrite the query as SELECT col FROM tab ORDER
      BY col LIMIT 1.  Index usage now happens automatically.

--
Michael Fuhr

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Michael Fuhr <mike@fuhr.org> writes:

>   Automatically use indexes for MIN() and MAX() (Tom)
>
>       In previous releases, the only way to use an index for MIN()
>       or MAX() was to rewrite the query as SELECT col FROM tab ORDER
>       BY col LIMIT 1.  Index usage now happens automatically.

Thanks Michael!  This is great.

--
Jorge Godoy      <jgodoy@gmail.com>


Re: Best approach for a "gap-less" sequence

From
Alvaro Herrera
Date:
Jorge Godoy wrote:
> Chris <dmagick@gmail.com> writes:
>
> > I'm not sure what type of lock you'd need to make sure no other transactions
> > updated the table (see
> > http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory"
> > something like this should work:
> >
> > begin;
> > select id from table order by id desc limit 1;
> > insert into table (id, blah) values (id+1, 'blah');
> > commit;
>
> This is part of the solution, yes.  But I would still need locking this table
> so that no other concurrent transaction gets another "id".  I don't want to
> lock the main table -- as I believe you're suggesting -- because I want it to
> be searchable and updatable while I'm inserting new data.

So you have to hold a lock that conflicts with itself, but not with
ACCESS SHARE which is the lock acquired by SELECT.  I think the first
one on the list with these two properties is SHARE UPDATE EXCLUSIVE.
Have a look at the list yourself:

http://www.postgresql.org/docs/8.1/static/explicit-locking.html

Note the tip at the end of the table:

Tip:  Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR
UPDATE/SHARE) statement.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Best approach for a "gap-less" sequence

From
AgentM
Date:
Since the gapless numbers are purely for the benefit of the tax
people, you could build your db with regular sequences as primary
keys and then regularly (or just before tax-time) insert into a table
which maps the gapless sequence to the real primary key.

-M



Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
AgentM <agentm@themactionfaction.com> writes:

> Since the gapless numbers are purely for the benefit of the tax people, you
> could build your db with regular sequences as primary  keys and then regularly
> (or just before tax-time) insert into a table  which maps the gapless sequence
> to the real primary key.

That's also an interesting approach.  An auxiliary table like

       transaction integer FK to the transactions table
       transaction_nb integer  gapless sequence

should do it.  A trigger inserting on this auxiliary table would also take
care of everything...  If I have an after trigger I believe I wouldn't need
any locking...  I have to think about this...

As simple as this might be, I haven't thought about it :-)  Thanks for your
suggestion.

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Harald Fuchs
Date:
In article <878xlrwcxm.fsf@gmail.com>,
Jorge Godoy <jgodoy@gmail.com> writes:

> AgentM <agentm@themactionfaction.com> writes:
>> Since the gapless numbers are purely for the benefit of the tax people, you
>> could build your db with regular sequences as primary  keys and then regularly
>> (or just before tax-time) insert into a table  which maps the gapless sequence
>> to the real primary key.

> That's also an interesting approach.  An auxiliary table like

>        transaction integer FK to the transactions table
>        transaction_nb integer  gapless sequence

> should do it.  A trigger inserting on this auxiliary table would also take
> care of everything...  If I have an after trigger I believe I wouldn't need
> any locking...  I have to think about this...

Why putting gapless numbers into the database at all?  Just calculate them at
query time.

Re: Best approach for a "gap-less" sequence

From
Richard Broersma Jr
Date:
> > AgentM <agentm@themactionfaction.com> writes:
> >> Since the gapless numbers are purely for the benefit of the tax people, you
> >> could build your db with regular sequences as primary  keys and then regularly
> >> (or just before tax-time) insert into a table  which maps the gapless sequence
> >> to the real primary key.
>
> > That's also an interesting approach.  An auxiliary table like
>
> >        transaction integer FK to the transactions table
> >        transaction_nb integer  gapless sequence
>
> > should do it.  A trigger inserting on this auxiliary table would also take
> > care of everything...  If I have an after trigger I believe I wouldn't need
> > any locking...  I have to think about this...
>
> Why putting gapless numbers into the database at all?  Just calculate them at
> query time.

I am curious, can you calculate something like this using only sql? Or you you need to employee a
procedural language like plpsgql?

Regards,

Richard Broersma Jr.

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Harald Fuchs <hf0731x@protecting.net> writes:

> Why putting gapless numbers into the database at all?  Just calculate them at
> query time.

And how would you retrieve the record that corresponds to invoice number
#16355, for example?  Recalculating few records is fine, but millions of them
everytime you need to recover some of those is something that doesn't look
efficient to me...

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Scott Ribe
Date:
> Why putting gapless numbers into the database at all?  Just calculate them at
> query time.

There is ABSOLUTELY NO WAY that would be acceptable for accounting or legal
purposes. It would be the same as fabricating the numbers during an audit.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Best approach for a "gap-less" sequence

From
Berend Tober
Date:
Jorge Godoy wrote:

> Chris <dmagick@gmail.com> writes:
>
>
>>I'm not sure what type of lock you'd need to make sure no other transactions
>>updated the table (see
>>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory"
>>something like this should work:
>>
>>begin;
>>select id from table order by id desc limit 1;
>>insert into table (id, blah) values (id+1, 'blah');
>>commit;
>
>
> This is part of the solution, yes.  But I would still need locking this table
> so that no other concurrent transaction gets another "id".  I don't want to
> lock the main table --

Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
locking the table?

 From "http://www.postgresql.org/docs/8.1/interactive/sql-select.html":

"FOR UPDATE/FOR SHARE Clause

...FOR UPDATE causes the rows retrieved by the SELECT statement to be
locked as though for update. This prevents them from being modified or
deleted by other transactions until the current transaction ends. That
is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE
of these rows will be blocked until the current transaction ends. Also,
if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has
already locked a selected row or rows, SELECT FOR UPDATE will wait for
the other transaction to complete, and will then lock and return the
updated row (or no row, if the row was deleted). ..."

Regards,
Berend Tober

Re: Best approach for a "gap-less" sequence

From
Harald Fuchs
Date:
In article <20060814162836.58963.qmail@web31811.mail.mud.yahoo.com>,
Richard Broersma Jr <rabroersma@yahoo.com> writes:

> I am curious, can you calculate something like this using only sql? Or you you need to employee a
> procedural language like plpsgql?

You could use something like

  SELECT (SELECT count(*) FROM tbl t2 WHERE t2.id < t1.id), whatever
  FROM tbl t1

but correlated subqueries are slow; thus incrementing the counter in
the application would be faster for huge reports.

Re: Best approach for a "gap-less" sequence

From
Harald Fuchs
Date:
In article <87zme7uvcn.fsf@gmail.com>,
Jorge Godoy <jgodoy@gmail.com> writes:

> Harald Fuchs <hf0731x@protecting.net> writes:
>> Why putting gapless numbers into the database at all?  Just calculate them at
>> query time.

> And how would you retrieve the record that corresponds to invoice number
> #16355, for example?  Recalculating few records is fine, but millions of them
> everytime you need to recover some of those is something that doesn't look
> efficient to me...

This would be

  SELECT whatever
  FROM tbl
  ORDER BY id
  LIMIT 1
  OFFSET 16355 -1

Since id is the primary key, this can use an index scan.

Re: Best approach for a "gap-less" sequence

From
Harald Fuchs
Date:
In article <C1061F94.52761%scott_ribe@killerbytes.com>,
Scott Ribe <scott_ribe@killerbytes.com> writes:

>> Why putting gapless numbers into the database at all?  Just
>> calculate them at query time.

> There is ABSOLUTELY NO WAY that would be acceptable for accounting or legal
> purposes. It would be the same as fabricating the numbers during an audit.

At some point in time those numbers "get fabricated" anyway.  As long
as you don't change the records inbetween, the technical effect would
be the same.  But you might be right that this is forbidden.  I don't
speak legalese.

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Harald Fuchs <hf0731x@protecting.net> writes:

> In article <87zme7uvcn.fsf@gmail.com>,
> Jorge Godoy <jgodoy@gmail.com> writes:
>
>> Harald Fuchs <hf0731x@protecting.net> writes:
>>> Why putting gapless numbers into the database at all?  Just calculate them at
>>> query time.
>
>> And how would you retrieve the record that corresponds to invoice number
>> #16355, for example?  Recalculating few records is fine, but millions of them
>> everytime you need to recover some of those is something that doesn't look
>> efficient to me...
>
> This would be
>
>   SELECT whatever
>   FROM tbl
>   ORDER BY id
>   LIMIT 1
>   OFFSET 16355 -1
>
> Since id is the primary key, this can use an index scan.

If the ID was the number yes.  I thought you were suggesting having the number
computed at "query time", not to insert the record on the table...

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Brad Nicholson
Date:
On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
> Jorge Godoy wrote:
>
> > Chris <dmagick@gmail.com> writes:
> >
> >
> >>I'm not sure what type of lock you'd need to make sure no other transactions
> >>updated the table (see
> >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory"
> >>something like this should work:
> >>
> >>begin;
> >>select id from table order by id desc limit 1;
> >>insert into table (id, blah) values (id+1, 'blah');
> >>commit;
> >
> >
> > This is part of the solution, yes.  But I would still need locking this table
> > so that no other concurrent transaction gets another "id".  I don't want to
> > lock the main table --
>
> Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
> locking the table?

Nope, concurrent transactions won't work.

Let current max id = x

Transaction 1 (t1) does a select max(id) for update, gets a lock on the
last tuple at the time of the select, and gets x as a value for max id

Transaction 2 (t2) does a select max(id) for update, has to wait for t1
to release its lock.

t1 inserts (x+1) as the new max id of the table.  t1 releases its lock

t2 is granted the lock on the tuple it has been waiting for, which
contains the max id of x

t2 tries to insert a value of x+1, insert fails (if it doesn't, you
really want to have a close look at your constraints :-)

Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




Re: Best approach for a "gap-less" sequence

From
Adrian Klaver
Date:
On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote:
> On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
> > Jorge Godoy wrote:
> > > Chris <dmagick@gmail.com> writes:
> > >>I'm not sure what type of lock you'd need to make sure no other
> > >> transactions updated the table (see
> > >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in
> > >> theory" something like this should work:
> > >>
> > >>begin;
> > >>select id from table order by id desc limit 1;
> > >>insert into table (id, blah) values (id+1, 'blah');
> > >>commit;
> > >
> > > This is part of the solution, yes.  But I would still need locking this
> > > table so that no other concurrent transaction gets another "id".  I
> > > don't want to lock the main table --
> >
> > Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
> > locking the table?
>
> Nope, concurrent transactions won't work.
>
> Let current max id = x
>
> Transaction 1 (t1) does a select max(id) for update, gets a lock on the
> last tuple at the time of the select, and gets x as a value for max id
>
> Transaction 2 (t2) does a select max(id) for update, has to wait for t1
> to release its lock.
>
> t1 inserts (x+1) as the new max id of the table.  t1 releases its lock
>
> t2 is granted the lock on the tuple it has been waiting for, which
> contains the max id of x
>
> t2 tries to insert a value of x+1, insert fails (if it doesn't, you
> really want to have a close look at your constraints :-)
>

I am still working through this stuff myself, but the following excerpt from
the documentation would seem to contradict what you are saying. See the part
marked with ***. t2 should see a new max(id) after t1 commits and therefore
insert(x+1) would succeed.

http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE

"FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as
though for update. This prevents them from being modified or deleted by other
transactions until the current transaction ends. That is, other transactions
that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be
blocked until the current transaction ends.*** Also, if an UPDATE, DELETE, or
SELECT FOR UPDATE from another transaction has already locked a selected row
or rows, SELECT FOR UPDATE will wait for the other transaction to complete,
and will then lock and return the updated row (or no row, if the row was
deleted).***"
--
Adrian Klaver
aklaver@comcast.net

Re: Best approach for a "gap-less" sequence

From
Berend Tober
Date:
Brad Nicholson wrote:

> On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
>
>>Jorge Godoy wrote:
>>
>>
>>>Chris <dmagick@gmail.com> writes:
>>>
>>>
>>>
>>>>I'm not sure what type of lock you'd need to make sure no other transactions
>>>>updated the table (see
>>>>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory"
>>>>something like this should work:
>>>>
>>>>begin;
>>>>select id from table order by id desc limit 1;
>>>>insert into table (id, blah) values (id+1, 'blah');
>>>>commit;
>>>
>>>
>>>This is part of the solution, yes.  But I would still need locking this table
>>>so that no other concurrent transaction gets another "id".  I don't want to
>>>lock the main table --
>>
>>Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
>>locking the table?
>
>
> Nope, concurrent transactions won't work.
>
> Let current max id = x
>
> Transaction 1 (t1) does a select max(id) for update, gets a lock on the
> last tuple at the time of the select, and gets x as a value for max id
>
> Transaction 2 (t2) does a select max(id) for update, has to wait for t1
> to release its lock.
>
> t1 inserts (x+1) as the new max id of the table.  t1 releases its lock
>
> t2 is granted the lock on the tuple it has been waiting for, which
> contains the max id of x
>
> t2 tries to insert a value of x+1, insert fails (if it doesn't, you
> really want to have a close look at your constraints :-)


I see. The FOR UPDATE form is not applicable with aggregates.

I was looking at this as if he uses a separate table to keep track of
the most-recently-issued sequence value, as the original post specified
was the case, whereas using the MAX aggregate, as suggested
subsequently, implies calculating the most-recently-used sequence value
from the table that has all the earlier sequence values in it.

The original question stated he "...acquire(d) a SHARE ROW EXCLUSIVE
lock", which locks the whole table, so my thought was to suggest that
locking the entire table was not necessary -- rather only the row. Of
course, if the control table has only one row, then this may not matter.

But since nothing beats empirical evidence, I tried it using

BEGIN;
SELECT column1 FROM test.table1 FOR UPDATE;
UPDATE test.table1 SET column1=column1+1;
COMMIT;

executed line-by-line alternating between two separate pgAdmin SQL
windows, and concurrency is properly accounted for.

Regards,
Berend Tober

Re: Best approach for a "gap-less" sequence

From
Adrian Klaver
Date:
On Monday 14 August 2006 02:46 pm, Adrian Klaver wrote:
> > Let current max id = x
> >
> > Transaction 1 (t1) does a select max(id) for update, gets a lock on the
> > last tuple at the time of the select, and gets x as a value for max id
> >
> > Transaction 2 (t2) does a select max(id) for update, has to wait for t1
> > to release its lock.
> >
> > t1 inserts (x+1) as the new max id of the table.  t1 releases its lock
> >
> > t2 is granted the lock on the tuple it has been waiting for, which
> > contains the max id of x
> >
> > t2 tries to insert a value of x+1, insert fails (if it doesn't, you
> > really want to have a close look at your constraints :-)
>
> I am still working through this stuff myself, but the following excerpt
> from the documentation would seem to contradict what you are saying. See
> the part marked with ***. t2 should see a new max(id) after t1 commits and
> therefore insert(x+1) would succeed.
>
> http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDA
>TE-SHARE
>
> "FOR UPDATE causes the rows retrieved by the SELECT statement to be locked
> as though for update. This prevents them from being modified or deleted by
> other transactions until the current transaction ends. That is, other
> transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these
> rows will be blocked until the current transaction ends.*** Also, if an
> UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already
> locked a selected row or rows, SELECT FOR UPDATE will wait for the other
> transaction to complete, and will then lock and return the updated row (or
> no row, if the row was deleted).***"
I spoke too soon. Actually trying this exposed the fact that FOR UPDATE does
not work with aggregates. Something I would have discovered earlier if I had
read the documentation all the way through.
--
Adrian Klaver
aklaver@comcast.net

Re: Best approach for a "gap-less" sequence

From
elein
Date:
On Mon, Aug 14, 2006 at 02:46:17PM -0700, Adrian Klaver wrote:
> On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote:
> > On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
> > > Jorge Godoy wrote:
> > > > Chris <dmagick@gmail.com> writes:
> > > >>I'm not sure what type of lock you'd need to make sure no other
> > > >> transactions updated the table (see
> > > >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in
> > > >> theory" something like this should work:
> > > >>
> > > >>begin;
> > > >>select id from table order by id desc limit 1;
> > > >>insert into table (id, blah) values (id+1, 'blah');
> > > >>commit;
> > > >
> > > > This is part of the solution, yes.  But I would still need locking this
> > > > table so that no other concurrent transaction gets another "id".  I
> > > > don't want to lock the main table --
> > >
> > > Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
> > > locking the table?
> >
> > Nope, concurrent transactions won't work.
> >
> > Let current max id = x
> >
> > Transaction 1 (t1) does a select max(id) for update, gets a lock on the
> > last tuple at the time of the select, and gets x as a value for max id
> >
> > Transaction 2 (t2) does a select max(id) for update, has to wait for t1
> > to release its lock.
> >
> > t1 inserts (x+1) as the new max id of the table.  t1 releases its lock
> >
> > t2 is granted the lock on the tuple it has been waiting for, which
> > contains the max id of x
> >
> > t2 tries to insert a value of x+1, insert fails (if it doesn't, you
> > really want to have a close look at your constraints :-)
> >
>
> I am still working through this stuff myself, but the following excerpt from
> the documentation would seem to contradict what you are saying. See the part
> marked with ***. t2 should see a new max(id) after t1 commits and therefore
> insert(x+1) would succeed.
>
> http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE
>
> "FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as
> though for update. This prevents them from being modified or deleted by other
> transactions until the current transaction ends. That is, other transactions
> that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be
> blocked until the current transaction ends.*** Also, if an UPDATE, DELETE, or
> SELECT FOR UPDATE from another transaction has already locked a selected row
> or rows, SELECT FOR UPDATE will wait for the other transaction to complete,
> and will then lock and return the updated row (or no row, if the row was
> deleted).***"

If this is true the solution for a transactional, gapless sequence is possible
for table.gl_id  where updated from count.gl_id.  It is simple.  However, it
*depends* on the fact that the second transaction getting the newly updated
record from the first transaction.  It seems pretty clear, not counting aggregates,
that this is true from this doc snippet.  Speak now, if someone doesn't read it
this way!  I'd like to understand why.

If it weren't true, there would also be a workaround which caught a duplicate
value and tried again, looping.

I may publish the gapless sequence technique on general bits if there is no
discrepancy in the understanding of the status of the second transaction's
row value (updated).

--elein
elein@varlena.com




> --
> Adrian Klaver
> aklaver@comcast.net
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: Best approach for a "gap-less" sequence

From
Berend Tober
Date:
elein wrote:

> On Mon, Aug 14, 2006 at 02:46:17PM -0700, Adrian Klaver wrote:
>
>>On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote:
>>
>>>On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
>>>>Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
>>>>locking the table?
>
> If this is true the solution for a transactional, gapless sequence ...
> I may publish the gapless sequence technique on general bits if there is no
> discrepancy in the understanding of the status of the second transaction's
> row value (updated).


/*
Hi Elein, I'm an avid reader of your General Bits column.

One of my favorite sayings is "nothing beats empirical evidence", so
regardless of what people interpret the documentation to say, here is a
simplified description of an actual working implementation of how it is
done:

The background:

A business requirement is to generate table rows that have uniformly
increasing, whole number sequences, i.e., the "gap-less" sequence. In
this particular case the situation requires multiple such sequences
within the same table -- for each employee, there is a
uniformly-sequenced set of expense reports. I use the term "compound
sequence" for this situation because the expense reports are sequenced
independently on a per-employee basis.

Specifically, I have employee data in
*/

CREATE SCHEMA test;
SET search_path = test, public, pg_catalog;

CREATE TABLE employee
(
    employee_pk SERIAL, -- Identifies the employee.
    /*
    ...lots of non-relevent columns omitted ...
    */
    expense_report_seq int4 DEFAULT 0, -- Compound sequence control.
       CONSTRAINT employee_pkey PRIMARY KEY (employee_pk)
);

/*
The expense_report_seq column stores the most-recently-used expense
report number for each employee, i.e., it is the control value for the
compound sequences that appear in
*/

CREATE TABLE expense
(
    employee_pk int4 NOT NULL,
    expense_report_pk int4 NOT NULL,
    /*
    ...lots of non-relevent columns omitted ...
    */
   CONSTRAINT expense_report_pkey PRIMARY KEY (employee_pk,
expense_report_pk),
    CONSTRAINT expense_fkey FOREIGN KEY (employee_pk)
        REFERENCES employee (employee_pk)
);


/*
A before-insert trigger handles the compound sequence:
*/

CREATE OR REPLACE FUNCTION expense_bit()
   RETURNS "trigger" AS
'
BEGIN
    UPDATE employee
        SET expense_report_seq = (expense_report_seq + 1)
        WHERE employee_pk = NEW.employee_pk;
    SELECT INTO NEW.expense_report_pk expense_report_seq
        FROM employee WHERE employee_pk = NEW.employee_pk;
   RETURN new;
END;
'
   LANGUAGE 'plpgsql' VOLATILE;

/*
Other triggers handle allowed deletion and correction of some expense
report data under certain circumstances.
*/

CREATE TRIGGER expense_bit
   BEFORE INSERT
   ON expense
   FOR EACH ROW
   EXECUTE PROCEDURE expense_bit();


/*
Turns out the SELECT ... FOR UPDATE syntax is not even required because
code inside functions, particularly trigger functions as illustrated
here, is treated as a transaction and the UPDATE statement locks the
effected row until the trigger completes.
*/

-- Then test it:

INSERT INTO employee DEFAULT VALUES;
INSERT INTO employee DEFAULT VALUES;

-- In two separate sessions, run many competing inserts:

SET search_path = test, public, pg_catalog;
INSERT INTO expense VALUES (1);
INSERT INTO expense VALUES (1);
/*
    ...
*/
INSERT INTO expense VALUES (1);


INSERT INTO expense VALUES (2);
INSERT INTO expense VALUES (2);
/*
    ...
*/
INSERT INTO expense VALUES (2);

-- And check your results:
SELECT * FROM expense order by 1,2;
/*
Regards,
Berend Tober
*/


Re: Best approach for a "gap-less" sequence

From
"Dawid Kuroczko"
Date:
On 8/12/06, Jorge Godoy <jgodoy@gmail.com> wrote:
> I was trying to solve a problem on an old system and realized that there might
> be some better approach for doing what I need.
>
> We have some documents that need to be ordered sequentially and without gaps.
> I could use a sequence, but if the transaction fails then when I rollback the
> sequence will already have been incremented.
>
> So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to
> it, read the value, increase it, do what I need and then I COMMIT the
> transaction, ensuring that the sequence has no gaps.
>
> Is there a better way to guarantee that there will be no gaps in my sequence
> if something goes wrong with my transaction?

Hmm, I would do it this way:

-- First prepare a table for keeping gapless sequence, say:
CREATE TABLE gapless_seq (
    gseq_name varchar(256) PRIMARY KEY,
    gseq_value integer NOT NULL
);
-- ...and populate it:
INSERT INTO gapless_seq VALUES('tax_id', '1');

-- then create a function to retrieve the values:
CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
    DECLARE
       n integer;
    BEGIN
       SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
FOR UPDATE;
       UPDATE gapless_seq SET gapless_value = n WHERE gseq_name = t;
       RETURN n;
    END;
$$ STABLE LANGUAGE PLpgsql;

-- ...and use it as default in table definiton
CREATE TABLE taxdata (
    tax_id integer PRIMARY KEY DEFAULT gseq_nextval('tax_id'),
    customer text,
    when timestamptz
);

...etc.  SELECT ... FOR UPDATE woud ensure a row lock on "gapless sequence",
a PLpgsql function would make a nice wrapper for it (so it would be usable more
or less similar to real sequences), and it should work.

I did not test the code right now, but I've written something similar to
it some time ago, and it worked fine.  Remember to vacuum gapless_seq
table frequently and don't expect stellar performance from it.

   Regards,
      Dawid

Re: Best approach for a "gap-less" sequence

From
"Dawid Kuroczko"
Date:
On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> -- then create a function to retrieve the values:
> CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
>     DECLARE
>        n integer;
>     BEGIN
>        SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
> FOR UPDATE;
>        UPDATE gapless_seq SET gapless_value = n WHERE gseq_name = t;
>        RETURN n;
>     END;
> $$ STABLE LANGUAGE PLpgsql;
       ^^^^^^^^^^^
VOLATILE of course!


Regards,
   Dawid

Re: Best approach for a "gap-less" sequence

From
Adrian Klaver
Date:
On Wednesday 16 August 2006 10:59 am, elein wrote:
> On Mon, Aug 14, 2006 at 02:46:17PM -0700, Adrian Klaver wrote:
> > On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote:
> > > On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
> > > > Jorge Godoy wrote:
> > > > > Chris <dmagick@gmail.com> writes:
> > > > >>I'm not sure what type of lock you'd need to make sure no other
> > > > >> transactions updated the table (see
> > > > >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but
> > > > >> "in theory" something like this should work:
> > > > >>
> > > > >>begin;
> > > > >>select id from table order by id desc limit 1;
> > > > >>insert into table (id, blah) values (id+1, 'blah');
> > > > >>commit;
> > > > >
> > > > > This is part of the solution, yes.  But I would still need locking
> > > > > this table so that no other concurrent transaction gets another
> > > > > "id".  I don't want to lock the main table --
> > > >
> > > > Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
> > > > locking the table?
> > >
> > > Nope, concurrent transactions won't work.
> > >
> > > Let current max id = x
> > >
> > > Transaction 1 (t1) does a select max(id) for update, gets a lock on the
> > > last tuple at the time of the select, and gets x as a value for max id
> > >
> > > Transaction 2 (t2) does a select max(id) for update, has to wait for t1
> > > to release its lock.
> > >
> > > t1 inserts (x+1) as the new max id of the table.  t1 releases its lock
> > >
> > > t2 is granted the lock on the tuple it has been waiting for, which
> > > contains the max id of x
> > >
> > > t2 tries to insert a value of x+1, insert fails (if it doesn't, you
> > > really want to have a close look at your constraints :-)
> >
> > I am still working through this stuff myself, but the following excerpt
> > from the documentation would seem to contradict what you are saying. See
> > the part marked with ***. t2 should see a new max(id) after t1 commits
> > and therefore insert(x+1) would succeed.
> >
> > http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UP
> >DATE-SHARE
> >
> > "FOR UPDATE causes the rows retrieved by the SELECT statement to be
> > locked as though for update. This prevents them from being modified or
> > deleted by other transactions until the current transaction ends. That
> > is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE
> > of these rows will be blocked until the current transaction ends.***
> > Also, if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction
> > has already locked a selected row or rows, SELECT FOR UPDATE will wait
> > for the other transaction to complete, and will then lock and return the
> > updated row (or no row, if the row was deleted).***"
>
> If this is true the solution for a transactional, gapless sequence is
> possible for table.gl_id  where updated from count.gl_id.  It is simple.
> However, it *depends* on the fact that the second transaction getting the
> newly updated record from the first transaction.  It seems pretty clear,
> not counting aggregates, that this is true from this doc snippet.  Speak
> now, if someone doesn't read it this way!  I'd like to understand why.
>
> If it weren't true, there would also be a workaround which caught a
> duplicate value and tried again, looping.
>
> I may publish the gapless sequence technique on general bits if there is no
> discrepancy in the understanding of the status of the second transaction's
> row value (updated).
>
> --elein
> elein@varlena.com

After I discovered that aggregates did not work I did some simple tests
updating a single row table. As I far as I could determine the docs hold
true :) I only ran three transactions at a time but each saw the incremented
value from the previous transaction.
--
Adrian Klaver
aklaver@comcast.net

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
"Dawid Kuroczko" <qnex42@gmail.com> writes:

> I did not test the code right now, but I've written something similar to
> it some time ago, and it worked fine.  Remember to vacuum gapless_seq
> table frequently and don't expect stellar performance from it.

Interesting approach...  And I don't expect too much performance for it.  The
restriction of the gapless sequence makes it expected that there's some minor
delay somewhere.  It would be bad on "common" sequences, but not on
gapless. :-)

Thanks for the code...  It is a bit different from mine -- better, in
fact... ;-) -- and I could give it a try.

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
elein <elein@varlena.com> writes:

> If this is true the solution for a transactional, gapless sequence is possible
> for table.gl_id  where updated from count.gl_id.  It is simple.  However, it
> *depends* on the fact that the second transaction getting the newly updated
> record from the first transaction.  It seems pretty clear, not counting aggregates,
> that this is true from this doc snippet.  Speak now, if someone doesn't read it
> this way!  I'd like to understand why.

I agre with your reading.

> If it weren't true, there would also be a workaround which caught a duplicate
> value and tried again, looping.

This is true, but really bad.  I believe a friend of mine had something like
that in another database server, but it really looked like an ugly hack...  Of
course, if it's the only way to solve the problem then we have to live with
it.

> I may publish the gapless sequence technique on general bits if there is no
> discrepancy in the understanding of the status of the second transaction's
> row value (updated).

I need more testing here.  But from what I tested, your reading looks right.


--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
Berend Tober <btober@seaworthysys.com> writes:

> A business requirement is to generate table rows that have uniformly
> increasing, whole number sequences, i.e., the "gap-less" sequence. In this
> particular case the situation requires multiple such sequences within the same
> table -- for each employee, there is a uniformly-sequenced set of expense
> reports. I use the term "compound sequence" for this situation because the
> expense reports are sequenced independently on a per-employee basis.

This is something that I'll also have to code ;-)  But the sequence for
"employees" would also be a sequential and gapless sequence here (yep, it's
composed of two gapless sequences, being one "fixed" and the other varying for
each new record inside the first sequence).

Is the performance degradation here too big?  I think that it might be lower
than with approaches that used some kind of locking...


I'm not sure, on this approach of yours, how's the probability of having
several records inserted on different transactions for one employee.  In my
cases I see that when I have one unique "filter" (employee, invoice, etc.) the
operation is done only by one person and consequently in one transaction
only, what makes it possible to adopt more complex -- and badly performant --
solutions (not that I want them, it's just that it wouldn't be noticeable in
the application as far as the user is concerned).  Do you have this
concurrence on the real application?

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Best approach for a "gap-less" sequence

From
"Dawid Kuroczko"
Date:
On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > -- then create a function to retrieve the values:
> > CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
> >     DECLARE
> >        n integer;
> >     BEGIN
> >        SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
> > FOR UPDATE;
> >        UPDATE gapless_seq SET gapless_value = n WHERE gseq_name = t;
> >        RETURN n;
> >     END;
> > $$ STABLE LANGUAGE PLpgsql;
> >
>
> the problem here is if you have two concurrent transactions which call
> this funtion, it is possible for them both to return the same sequence
> number in read comitted mode.  Using this funtion outside of
> transactions is no different that using a sequence except that it is
> slower.

Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
The first-to-obtain the gapless sequence transaction will establish
a lock onthe "tax_id" row.  The other transaction will block until
the first transaction finishes (and the row is updated) and will
establish the row lock on it.

"FOR UPDATE" effectively serializes access to this row for all
transactions wanting to update it, even in read commited
mode.  Your statement would be true if I didn't use "SELECT
... FOR UPDATE"; but just a plain SELECT there.

   Regards,
       Dawid


PS: http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE

Re: Best approach for a "gap-less" sequence

From
Berend Tober
Date:
Jorge Godoy wrote:
> Berend Tober <btober@seaworthysys.com> writes:
>
>>A business requirement is to generate table rows that have uniformly
>>increasing, whole number sequences, i.e., the "gap-less" sequence.
>
> This is something that I'll also have to code ;-)  But the sequence for
> "employees" would also be a sequential and gapless sequence here (yep, it's
> composed of two gapless sequences, being one "fixed" and the other varying for
> each new record inside the first sequence).

Shouldn't be a problem to implement the same approach one level deeper
like that.


> Is the performance degradation here too big?

That is where my empirical evidence is somewhat deficient. The
application from which that example was drawn is used routinely by less
than five persons, so any performance degradation is not evident.

 > I think that it might be lower
 > than with approaches that used some kind of locking...

Your comment is confusing. The example DOES use locking -- the UPDATE
statement inside the trigger locks the modified employee row until the
trigger function completes -- I'm pretty sure I pointed out that. This
is the minimally sufficient locking that has to happen to satisfy this
business requirement. The original poster (was that you?) was using
table-level locking, which is unnecessary in this approach.

> I'm not sure, on this approach of yours, how's the probability of having
> several records inserted on different transactions for one employee.  In my
> cases I see that when I have one unique "filter" (employee, invoice, etc.) the
> operation is done only by one person and consequently in one transaction
> only, what makes it possible to adopt more complex -- and badly performant --
> solutions (not that I want them, it's just that it wouldn't be noticeable in
> the application as far as the user is concerned).  Do you have this
> concurrence on the real application?

The design was intended to withstand concurrent use. As explained in

"http://www.postgresql.org/docs/7.4/static/explicit-locking.html#LOCKING-ROWS",

"A row-level lock on a specific row is automatically acquired when the
row is updated (or deleted or marked for update). The lock is held until
the transaction commits or rolls back. Row-level locks do not affect
data querying; they block writers to the same row only."

This is why you always UPDATE before SELECT. Since the trigger locks the
row first, a second transaction initiated before completion of the first
transaction is blocked until the first transaction commits.


Regards,
Berend Tober

Re: Best approach for a "gap-less" sequence

From
AgentM
Date:
Just in case no one else has brought it up- 8.1+ supports 2PC and
savepoints, so one alternative would be to run your standard
insertion operations in a prepared transaction or savepoint block. If
you get so far as being able to prepare the transaction/complete the
savepoint block, you should be able to snag a sequence id and commit
everything.

-M

Re: Best approach for a "gap-less" sequence

From
"Merlin Moncure"
Date:
On 8/17/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> > On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > > -- then create a function to retrieve the values:
> > > CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
> > >     DECLARE
> > >        n integer;
> > >     BEGIN
> > >        SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
> > > FOR UPDATE;
> > >        UPDATE gapless_seq SET gapless_value = n WHERE gseq_name = t;
> > >        RETURN n;
> > >     END;
> > > $$ STABLE LANGUAGE PLpgsql;
> > >
> >
> > the problem here is if you have two concurrent transactions which call
> > this funtion, it is possible for them both to return the same sequence
> > number in read comitted mode.  Using this funtion outside of
> > transactions is no different that using a sequence except that it is
> > slower.
>
> Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> The first-to-obtain the gapless sequence transaction will establish
> a lock onthe "tax_id" row.  The other transaction will block until
> the first transaction finishes (and the row is updated) and will
> establish the row lock on it.

yes, you are right...i didnt think the problem through properly.

merlin

Re: Best approach for a "gap-less" sequence

From
Brad Nicholson
Date:
On Thu, 2006-08-17 at 12:12 -0400, Merlin Moncure wrote:
> On 8/17/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> > > On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > > > -- then create a function to retrieve the values:
> > > > CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
> > > >     DECLARE
> > > >        n integer;
> > > >     BEGIN
> > > >        SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
> > > > FOR UPDATE;
> > > >        UPDATE gapless_seq SET gapless_value = n WHERE gseq_name = t;
> > > >        RETURN n;
> > > >     END;
> > > > $$ STABLE LANGUAGE PLpgsql;
> > > >
> > >
> > > the problem here is if you have two concurrent transactions which call
> > > this funtion, it is possible for them both to return the same sequence
> > > number in read comitted mode.  Using this funtion outside of
> > > transactions is no different that using a sequence except that it is
> > > slower.
> >
> > Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> > The first-to-obtain the gapless sequence transaction will establish
> > a lock onthe "tax_id" row.  The other transaction will block until
> > the first transaction finishes (and the row is updated) and will
> > establish the row lock on it.
>
> yes, you are right...i didnt think the problem through properly.

Lets just hope the performance on a concurrent system is not a
requirement of such a system...

Brad.


Re: Best approach for a "gap-less" sequence

From
"Merlin Moncure"
Date:
On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:

> > > Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> > > The first-to-obtain the gapless sequence transaction will establish
> > > a lock onthe "tax_id" row.  The other transaction will block until
> > > the first transaction finishes (and the row is updated) and will
> > > establish the row lock on it.
> >
> > yes, you are right...i didnt think the problem through properly.
>
> Lets just hope the performance on a concurrent system is not a
> requirement of such a system...
>

right, if the transations are long running, there is a big problem as
they are serialized around access to the sequence.  however this is
better than the control record approach because control record have
problems with mvcc bloat.  concurrent performance will of course be
awful.

a good compomise in some cases is to save off canceled transactions
ids' in a free list   you would still have to deal with transactions
that were not gracefully cancelled though.

merlin

Re: Best approach for a "gap-less" sequence

From
Scott Marlowe
Date:
On Thu, 2006-08-17 at 15:07, Merlin Moncure wrote:
> On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
>
> > > > Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> > > > The first-to-obtain the gapless sequence transaction will establish
> > > > a lock onthe "tax_id" row.  The other transaction will block until
> > > > the first transaction finishes (and the row is updated) and will
> > > > establish the row lock on it.
> > >
> > > yes, you are right...i didnt think the problem through properly.
> >
> > Lets just hope the performance on a concurrent system is not a
> > requirement of such a system...
> >
>
> right, if the transations are long running, there is a big problem as
> they are serialized around access to the sequence.  however this is
> better than the control record approach because control record have
> problems with mvcc bloat.  concurrent performance will of course be
> awful.
>
> a good compomise in some cases is to save off canceled transactions
> ids' in a free list   you would still have to deal with transactions
> that were not gracefully cancelled though.

Is it not possible in some circumstances to create the invoice first,
THEN assign a sequential ID after creation?

Re: Best approach for a "gap-less" sequence

From
Brad Nicholson
Date:
On Thu, 2006-08-17 at 16:07 -0400, Merlin Moncure wrote:
> On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
>
> > > > Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> > > > The first-to-obtain the gapless sequence transaction will establish
> > > > a lock onthe "tax_id" row.  The other transaction will block until
> > > > the first transaction finishes (and the row is updated) and will
> > > > establish the row lock on it.
> > >
> > > yes, you are right...i didnt think the problem through properly.
> >
> > Lets just hope the performance on a concurrent system is not a
> > requirement of such a system...
> >
>
> right, if the transations are long running, there is a big problem as
> they are serialized around access to the sequence.  however this is
> better than the control record approach because control record have
> problems with mvcc bloat.  concurrent performance will of course be
> awful.

This effect will be magnified if there other long running transactions
(pg_dump and pre 8.2 vacuum, I'm looking at you), as the dead tuples
from the updates will start to pile up, and reads to the table slow
down, locks persist for longer...


Re: Best approach for a "gap-less" sequence

From
Brad Nicholson
Date:
On Thu, 2006-08-17 at 15:13 -0500, Scott Marlowe wrote:
> On Thu, 2006-08-17 at 15:07, Merlin Moncure wrote:
> > On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> >
> > > > > Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> > > > > The first-to-obtain the gapless sequence transaction will establish
> > > > > a lock onthe "tax_id" row.  The other transaction will block until
> > > > > the first transaction finishes (and the row is updated) and will
> > > > > establish the row lock on it.
> > > >
> > > > yes, you are right...i didnt think the problem through properly.
> > >
> > > Lets just hope the performance on a concurrent system is not a
> > > requirement of such a system...
> > >
> >
> > right, if the transations are long running, there is a big problem as
> > they are serialized around access to the sequence.  however this is
> > better than the control record approach because control record have
> > problems with mvcc bloat.  concurrent performance will of course be
> > awful.
> >
> > a good compomise in some cases is to save off canceled transactions
> > ids' in a free list   you would still have to deal with transactions
> > that were not gracefully cancelled though.
>
> Is it not possible in some circumstances to create the invoice first,
> THEN assign a sequential ID after creation?

If speed of access was an issue, that's how I'd look at doing it - batch
assign them after the fact.

Brad.


Re: Best approach for a "gap-less" sequence

From
"Dawid Kuroczko"
Date:
On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> > > > Hmm, I think you are wrong.  There is a SELECT ... FOR UPDATE;
> > > > The first-to-obtain the gapless sequence transaction will establish
> > > > a lock onthe "tax_id" row.  The other transaction will block until
> > > > the first transaction finishes (and the row is updated) and will
> > > > establish the row lock on it.
> > >
> > > yes, you are right...i didnt think the problem through properly.
> >
> > Lets just hope the performance on a concurrent system is not a
> > requirement of such a system...
>
> right, if the transations are long running, there is a big problem as
> they are serialized around access to the sequence.  however this is
> better than the control record approach because control record have
> problems with mvcc bloat.  concurrent performance will of course be
> awful.
>
> a good compomise in some cases is to save off canceled transactions
> ids' in a free list   you would still have to deal with transactions
> that were not gracefully cancelled though.

I believe there is no ON ROLLBACK trigger...  One might periodicaly
check the table for "gaps" and put them into "reuse me" table,
say a left outer join between generate_series and a real table,
withing "last known continuous id" and "max(id) from table".

However, if "gapless sequence" table bloat while long running transactions
is a problem, the other approach might be minimalising the size of this
"gapless sequence" table, say:

CREATE TABLE gapless_seq (
  gseq_value int NOT NULL;
);
INSER INTO gapless_seq VALUES(1);

...and then SELECT gseq_vaule FROM gapless_seq FOR UPDATE and
UPDATE gapless_seq SET gseq_value = gseq_value+1;

...this saves a few bytes per each update.  If we keep only a single
row, there is no need for an index (there is no vilibility information
in the index, so index would be useless in our case), and no need
for unique constraint checking (assuming we are sure noone will
ever ever insert additional rows -- it would be a disaster -- CREATE RULE
to ignore any INSERTs/DELETEs just to be sure.

Of course this does not solve the concurency issue, but I'm assuming
that if someone wants a gapless sequence, she wants its generation
serialized (or would assign sequences in post-transaction batches).

   Regards,
       Dawid

Re: Best approach for a "gap-less" sequence

From
elein
Date:
The technique for a single part gapless sequence and a two-part gapless
sequence has been published today at http://www.varlena.com/GeneralBits.

I'd be interested to hear of high concurrency usage that would break it
or be notably slow.

--elein
elein@varlena.com

Re: Best approach for a "gap-less" sequence

From
Jorge Godoy
Date:
elein <elein@varlena.com> writes:

> The technique for a single part gapless sequence and a two-part gapless
> sequence has been published today at http://www.varlena.com/GeneralBits.
>
> I'd be interested to hear of high concurrency usage that would break it
> or be notably slow.
>
> --elein
> elein@varlena.com

Thanks for the article!  It will be really helpful since GeneralBits is a
reference point for PostgreSQL users! :-)


--
Jorge Godoy      <jgodoy@gmail.com>