Thread: How to create "auto-increment" field WITHOUT a sequence object?

How to create "auto-increment" field WITHOUT a sequence object?

From
Dmitry Koterov
Date:
Hello.

I need to create an auto-increment field on a table WITHOUT using sequences:

CREATE TABLE tbl(
  name TEXT,
  uniq_id INTEGER
);

Each INSERT to this table must generate a new uniq_id which is distinct from all others. 

The problem is that these INSERTs are rolled back oftenly (i.e. they are executed within a transaction block which is rolled back time to time), this is an existing design of the current architecture and unfortunately we have to live with it. And I need as compact uniq_id generation (with minimum "holes") as it possible - this is a VERY important requirement (to export these values into external systems which accepts only IDs limited from 1 to 100000). 

So I cannot use sequences: sequence value is obviously not rolled back, so if I insert nextval(...) as uniq_id, I will have large holes (because of often transaction rollbacks) and exhaust 100000 uniq_ids very fast. How to deal with all this without sequences?

I tried

BEGIN;
LOCK TABLE tbl;
INSERT INTO tbl(uniq_id) VALUES((SELECT max(uniq_id) FROM tbl) + 1);
COMMIT;

but seems it performs too hard locking - time to time this query is timed out (or sometimes deadlocks with other queries).

Is there any other, less hard, locking which allow me to guarantee that no INSERTs will be performed into tbl between max() calculation and UPDATE query itself, but does not lock the whole table?

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Dmitriy Igrishin
Date:
Hey Dmitry,

2011/6/30 Dmitry Koterov <dmitry.koterov@gmail.com>
Hello.

I need to create an auto-increment field on a table WITHOUT using sequences:

CREATE TABLE tbl(
  name TEXT,
  uniq_id INTEGER
);

Each INSERT to this table must generate a new uniq_id which is distinct from all others. 

The problem is that these INSERTs are rolled back oftenly (i.e. they are executed within a transaction block which is rolled back time to time), this is an existing design of the current architecture and unfortunately we have to live with it. And I need as compact uniq_id generation (with minimum "holes") as it possible - this is a VERY important requirement (to export these values into external systems which accepts only IDs limited from 1 to 100000). 

So I cannot use sequences: sequence value is obviously not rolled back, so if I insert nextval(...) as uniq_id, I will have large holes (because of often transaction rollbacks) and exhaust 100000 uniq_ids very fast. How to deal with all this without sequences?

I tried

BEGIN;
LOCK TABLE tbl;
INSERT INTO tbl(uniq_id) VALUES((SELECT max(uniq_id) FROM tbl) + 1);
COMMIT;

Consider to create table with column of type integer and
write a function which will perform SELECT FOR UPDATE ...
and returns the next value, i.e.
BEGIN;
INSERT INTO tbl(uniq_id) SELECT uniq_id_generator(); -- SELECT FOR UPDATE inside
COMMIT; -- or ROLLBACK

--
// Dmitriy.


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Craig Ringer
Date:
On 07/01/2011 02:20 AM, Dmitry Koterov wrote:
>
> The problem is that these INSERTs are rolled back oftenly (i.e. they
> are executed within a transaction block which is rolled back time to
> time), this is an existing design of the current architecture and
> unfortunately we have to live with it. And I need as compact uniq_id
> generation (with minimum "holes") as it possible - this is a VERY
> important requirement (to export these values into external systems
> which accepts only IDs limited from 1 to 100000).

What you want is often referred to as a "gapless sequence". Searching
the mailing list archives for that will find you lots of information.

> but seems it performs too hard locking - time to time this query is
> timed out (or sometimes deadlocks with other queries).
You'll have that problem with any gapless sequence approach. You'll have
to be prepared to re-try failed transactions after deadlocks, or be
*extremely* strict about the order in which you perform operations so
you avoid any deadlocks.

--
Craig Ringer

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Chris Travers
Date:
Having done gapless numbering for some users of accounting software,
there are two suggestions I would make.

The first is that any sort of gapless numbering inherently runs into
scalability.  You HAVE to lock relevant records, and this means that
only one insert can run at a time, and it must commit before the next
insert can run.  This means you have to keep your transactions short
and predictable in terms of table order.  The suggestion of using for
update is a good one, but it doesn't entirely get rid of the problem,
which is inherent in ensuring gapless numbering in a system with
concurrent transactions.

The second is that you absolutely should use this approach as rarely
as you can get away with.  If it isn't required, don't use it!

Best Wishes,
Chris Travers

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Dmitriy Igrishin
Date:
Hey Chris,

The suggestion of using for
update is a good one, but it doesn't entirely get rid of the problem,
which is inherent in ensuring gapless numbering in a system with
concurrent transactions.
Why not?

I mean the following solution:

CREATE TABLE myseq(tabnm text not null, lastid integer not null);

INSERT INTO myseq SELECT 'mytab', 0; -- initialization

CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text)
 RETURNS integer
 LANGUAGE sql
 STRICT
AS $function$
UPDATE myseq SET lastid = li + 1 FROM
  (SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo
  RETURNING lastid;
$function$

-- Test

dmitigr=> BEGIN;
BEGIN
dmitigr=> SELECT myseq_nextval('mytab');
 myseq_nextval
---------------
             1
(1 row)

dmitigr=> ROLLBACK;
ROLLBACK
dmitigr=> SELECT * FROM myseq;
 tabnm | lastid
-------+--------
 mytab |      0
(1 row)

So, with this approach you'll get a lock only on INSERT.

dmitigr=> CREATE TABLE mytab(id integer not null DEFAULT myseq_nextval('mytab'));
CREATE TABLE
dmitigr=> INSERT INTO mytab DEFAULT VALUES;
INSERT 0 1
dmitigr=> INSERT INTO mytab DEFAULT VALUES;
INSERT 0 1
dmitigr=> SELECT * FROM mytab;
 id
----
  1
  2
(2 rows)


--
// Dmitriy.


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Chris Travers
Date:
On Fri, Jul 1, 2011 at 1:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
> Hey Chris,
>
>> The suggestion of using for
>> update is a good one, but it doesn't entirely get rid of the problem,
>> which is inherent in ensuring gapless numbering in a system with
>> concurrent transactions.
>
> Why not?

Just because it locks less doesn't mean that it doesn't lock.

The point is:  if gaps are acceptable then the sequences which exist
outside of transactions are idea.  If gaps are not acceptable, you
have to lock and force transactions through the system serially which
means a possibility of deadlocks and performance issues.  These issues
are inherent in gapless numbering because you can't get a gapless
sequence when things roll back without such locks.
>
> I mean the following solution:
>
> CREATE TABLE myseq(tabnm text not null, lastid integer not null);
>
> INSERT INTO myseq SELECT 'mytab', 0; -- initialization
>
> CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text)
>  RETURNS integer
>  LANGUAGE sql
>  STRICT
> AS $function$
> UPDATE myseq SET lastid = li + 1 FROM
>   (SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo
>   RETURNING lastid;
> $function$
>
> -- Test
>
> dmitigr=> BEGIN;
> BEGIN
> dmitigr=> SELECT myseq_nextval('mytab');
>  myseq_nextval
> ---------------
>              1
> (1 row)
>
> dmitigr=> ROLLBACK;
> ROLLBACK
> dmitigr=> SELECT * FROM myseq;
>  tabnm | lastid
> -------+--------
>  mytab |      0
> (1 row)
>
> So, with this approach you'll get a lock only on INSERT.

True.  But the point us that you MUST lock on insert to get gapless
sequences, and this creates inherent problems in terms of performance
and concurrency, so that you should not use it unless you really have
no other choice (i.e. because the tax authorities demand it).

Best Wishes,
Chris Travers

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Dmitriy Igrishin
Date:


2011/7/1 Chris Travers <chris.travers@gmail.com>
On Fri, Jul 1, 2011 at 1:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
> Hey Chris,
>
>> The suggestion of using for
>> update is a good one, but it doesn't entirely get rid of the problem,
>> which is inherent in ensuring gapless numbering in a system with
>> concurrent transactions.
>
> Why not?

Just because it locks less doesn't mean that it doesn't lock.

The point is:  if gaps are acceptable then the sequences which exist
outside of transactions are idea.  If gaps are not acceptable, you
have to lock and force transactions through the system serially which
means a possibility of deadlocks and performance issues.  These issues
are inherent in gapless numbering because you can't get a gapless
sequence when things roll back without such locks.
Then I don't clearly understand the existence of locks (the LOCK
command, SELECT FOR UPDATE clause and so on) if the usage
of them gives only problems...

>
> I mean the following solution:
>
> CREATE TABLE myseq(tabnm text not null, lastid integer not null);
>
> INSERT INTO myseq SELECT 'mytab', 0; -- initialization
>
> CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text)
>  RETURNS integer
>  LANGUAGE sql
>  STRICT
> AS $function$
> UPDATE myseq SET lastid = li + 1 FROM
>   (SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo
>   RETURNING lastid;
> $function$
>
> -- Test
>
> dmitigr=> BEGIN;
> BEGIN
> dmitigr=> SELECT myseq_nextval('mytab');
>  myseq_nextval
> ---------------
>              1
> (1 row)
>
> dmitigr=> ROLLBACK;
> ROLLBACK
> dmitigr=> SELECT * FROM myseq;
>  tabnm | lastid
> -------+--------
>  mytab |      0
> (1 row)
>
> So, with this approach you'll get a lock only on INSERT.

True.  But the point us that you MUST lock on insert to get gapless
sequences, and this creates inherent problems in terms of performance
and concurrency, so that you should not use it unless you really have
no other choice (i.e. because the tax authorities demand it).
Sure, but, again, why LOCK and SELECT FOR UPDATE exists ?

Best Wishes,
Chris Travers



--
// Dmitriy.


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Craig Ringer
Date:
On 1/07/2011 4:21 PM, Chris Travers wrote:

> means a possibility of deadlocks and performance issues.  These issues
> are inherent in gapless numbering because you can't get a gapless
> sequence when things roll back without such locks.

Actually, another approach that allows parallel transactions is (at
least theoretically) possible. You can hand out IDs as transactions
request them, and if a transaction rolls back you can abort it *and*
*all* *transactions* *given* *higher* *IDs*, then re-issue the lot. I
guess that could be useful if transaction failure was extremely unlikely
and you needed to have lots of work in flight at once.

In practice I don't think it'd be very useful to do this, because
transactions would still have to commit in the order they obtained IDs
in, so a slow or blocked transaction would still stall the system just
like it does with lock-based gapless numbering. Also, once a later
transaction had obtained an ID prior transactions couldn't obtain any
more IDs.

I'm not sure this is possible in Pg without modifying the server,
either. It'd be kind of interesting in a useless-toy-project kind of way.

--
Craig Ringer

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Vincent Veyron
Date:
Le vendredi 01 juillet 2011 à 12:28 +0400, Dmitriy Igrishin a écrit :


> Then I don't clearly understand the existence of locks (the LOCK
> command, SELECT FOR UPDATE clause and so on) if the usage
> of them gives only problems...
>

Chris already explained why twice :

"you MUST lock on insert to get gapless sequences"

Can't you just :
-create the records with a regular sequence, that will have gaps
-when you want to export, number an additional column from 1 to 10 000
and use that as key
?
--
Vincent Veyron
http://marica.fr/
Logiciel de gestion des sinistres et des contentieux pour le service juridique


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Dmitriy Igrishin
Date:
Hey Vincent,

2011/7/3 Vincent Veyron <vv.lists@wanadoo.fr>
Le vendredi 01 juillet 2011 à 12:28 +0400, Dmitriy Igrishin a écrit :


> Then I don't clearly understand the existence of locks (the LOCK
> command, SELECT FOR UPDATE clause and so on) if the usage
> of them gives only problems...
>

Chris already explained why twice :

"you MUST lock on insert to get gapless sequences"
Not me :-). The OP must do it. So, what problem here? Deadlocks?
Again, if deadlocks are so dangerous, why the LOCK command exists?


Can't you just :
-create the records with a regular sequence, that will have gaps
-when you want to export, number an additional column from 1 to 10 000
and use that as key
?
I don't use any locks explicitly :-)
--
Vincent Veyron
http://marica.fr/
Logiciel de gestion des sinistres et des contentieux pour le service juridique




--
// Dmitriy.


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Ireneusz Pluta
Date:
W dniu 2011-06-30 20:20, Dmitry Koterov pisze:
> And I need as compact uniq_id generation (with minimum "holes") as it possible - this is a VERY
> important requirement (to export these values into external systems which accepts only IDs limited
> from 1 to 100000).
>
> So I cannot use sequences: sequence value is obviously not rolled back, so if I insert
> nextval(...) as uniq_id, I will have large holes (because of often transaction rollbacks) and
> exhaust 100000 uniq_ids very fast. How to deal with all this without sequences?

You may use dense_rank() (or even rank()) window function to map your sequence-with-gaps to a
no-gap-id which will be used for exports.

Consider this:

test=# select uniq_id_with_gaps, dense_rank() over (order by uniq_id_with_gaps) as
uniq_id_without_gaps from (select generate_series(1, 100, 7) as uniq_id_with_gaps) a;
  uniq_id_with_gaps | uniq_id_without_gaps
-------------------+----------------------
                  1 |                    1
                  8 |                    2
                 15 |                    3
                 22 |                    4
                 29 |                    5
                 36 |                    6
                 43 |                    7
                 50 |                    8
                 57 |                    9
                 64 |                   10
                 71 |                   11
                 78 |                   12
                 85 |                   13
                 92 |                   14
                 99 |                   15

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Chris Travers
Date:
On Sun, Jul 3, 2011 at 7:25 AM, Ireneusz Pluta <ipluta@wp.pl> wrote:

> You may use dense_rank() (or even rank()) window function to map your
> sequence-with-gaps to a no-gap-id which will be used for exports.
>

The typical case where gapless numbering comes up is something like this:

In Greece, you go get invoice paper from the tax office which is
numbered in sequence and the government gets a list of the invoice
forms you have purchased.  You then print the invoices on those paper
forms, and must number the invoices sequentially and without gaps.  In
the case of an audit, all paper forms obtained must be accounted for
as must all gaps in numbering.  You MUST be able to connect each
sequential invoice number (internally generated) to each invoice form
(numbered at the tax office).

In this case you really have no choice but to lock some records,
generate a new gapless id, and save/print it.  Naturally this causes
the sorts of problems mentioned.

Best Wishes,
Chris Travers

Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Vincent Veyron
Date:
Le dimanche 03 juillet 2011 à 18:10 +0400, Dmitriy Igrishin a écrit :

>
> Not me :-). The OP must do it.

Duh! sorry about that. Indeed, I confused you with Dmitry.

--
Vincent Veyron
http://marica.fr/
Logiciel de gestion des sinistres et des contentieux pour le service juridique


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Alban Hertroys
Date:
On 3 Jul 2011, at 16:10, Dmitriy Igrishin wrote:

> "you MUST lock on insert to get gapless sequences"
> Not me :-). The OP must do it. So, what problem here? Deadlocks?
> Again, if deadlocks are so dangerous, why the LOCK command exists?

It's not deadlocks, it's concurrent updates that are the trouble. If you don't lock, you run the risk for two records
beingassigned the same number concurrently. 

With a unique constraint added into the mix (and there should be one) that means that one of the transactions will fail
theunique constraint check on commit. 

It's possible to catch that in the client and redo the transaction with a new ID, but if that's not acceptable (for
examplebecause it matters which transaction got the ID first) then you need to lock records. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4e109f6e12092079216178!



Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Dmitriy Igrishin
Date:
Hey Alban,

2011/7/3 Alban Hertroys <dalroi@solfertje.student.utwente.nl>
On 3 Jul 2011, at 16:10, Dmitriy Igrishin wrote:

> "you MUST lock on insert to get gapless sequences"
> Not me :-). The OP must do it. So, what problem here? Deadlocks?
> Again, if deadlocks are so dangerous, why the LOCK command exists?

It's not deadlocks, it's concurrent updates that are the trouble. If you don't lock, you run the risk for two records being assigned the same number concurrently.
Thanks for clarify, but I know why resources must be locked when
they are used concurrently :-). See my previous post about SELECT FOR UPDATE ...
and I don't see the problem with it. As well as with the LOCK command.


With a unique constraint added into the mix (and there should be one) that means that one of the transactions will fail the unique constraint check on commit.

It's possible to catch that in the client and redo the transaction with a new ID, but if that's not acceptable (for example because it matters which transaction got the ID first) then you need to lock records.
Sure.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:1257,4e109f6512095013212184!





--
// Dmitriy.


Re: How to create "auto-increment" field WITHOUT a sequence object?

From
Vincent de Phily
Date:
On Sunday 03 July 2011 07:47:01 Chris Travers wrote:
> On Sun, Jul 3, 2011 at 7:25 AM, Ireneusz Pluta <ipluta@wp.pl> wrote:
> > You may use dense_rank() (or even rank()) window function to map your
> > sequence-with-gaps to a no-gap-id which will be used for exports.
>
> The typical case where gapless numbering comes up is something like this:
>
> In Greece, you go get invoice paper from the tax office which is
> numbered in sequence and the government gets a list of the invoice
> forms you have purchased.  You then print the invoices on those paper
> forms, and must number the invoices sequentially and without gaps.  In
> the case of an audit, all paper forms obtained must be accounted for
> as must all gaps in numbering.  You MUST be able to connect each
> sequential invoice number (internally generated) to each invoice form
> (numbered at the tax office).
>
> In this case you really have no choice but to lock some records,
> generate a new gapless id, and save/print it.  Naturally this causes
> the sorts of problems mentioned.

So the problem with Ireneuz's solution is that the mapping isn't stable
enough. Then how about creating this mapping on disk whenever an export is
done (assuming exports are much less frequent than inserts) ?

* create table idmap(realid integer references realdata(id), gaplessid
  integer);
* insert into realdata with the usual sequence
* whenever an export of new data is requested :
  * lock table idmap
  * select * from realdata where id > (select max(realid) from idmap) and
     realdata.timestamp < now() -
     'safety_margin_for_inserts_likely_to_rollback'::interval order by id for
     update;
  * insert into idmap
  * unlock table idmap
  * select gaplessid,data from idmap left join realdata


Depending on your insert/export ratio this might be more efficient. And of
course you can't delete rows 6 months later, but you knew that :p

--
Vincent de Phily
Mobile Devices
+33 (0) 142 119 325
+353 (0) 85 710 6320

Warning
This message (and any associated files) is intended only for the use of its
intended recipient and may contain information that is confidential, subject
to copyright or constitutes a trade secret. If you are not the intended
recipient you are hereby notified that any dissemination, copying or
distribution of this message, or files associated with this message, is
strictly prohibited. If you have received this message in error, please
notify us immediately by replying to the message and deleting it from your
computer. Any views or opinions presented are solely those of the author
vincent.dephily@mobile-devices.fr and do not necessarily represent those of
the
company. Although the company has taken reasonable precautions to ensure no
viruses are present in this email, the company cannot accept responsibility
for any loss or damage arising from the use of this email or attachments.