Thread: How to create "auto-increment" field WITHOUT a sequence object?
Hello.
I need to create an auto-increment field on a table WITHOUT using sequences:
CREATE TABLE tbl(
name TEXT,
uniq_id INTEGER
);
Each INSERT to this table must generate a new uniq_id which is distinct from all others.
The problem is that these INSERTs are rolled back oftenly (i.e. they are executed within a transaction block which is rolled back time to time), this is an existing design of the current architecture and unfortunately we have to live with it. And I need as compact uniq_id generation (with minimum "holes") as it possible - this is a VERY important requirement (to export these values into external systems which accepts only IDs limited from 1 to 100000).
So I cannot use sequences: sequence value is obviously not rolled back, so if I insert nextval(...) as uniq_id, I will have large holes (because of often transaction rollbacks) and exhaust 100000 uniq_ids very fast. How to deal with all this without sequences?
I tried
BEGIN;
LOCK TABLE tbl;
INSERT INTO tbl(uniq_id) VALUES((SELECT max(uniq_id) FROM tbl) + 1);
COMMIT;
but seems it performs too hard locking - time to time this query is timed out (or sometimes deadlocks with other queries).
Is there any other, less hard, locking which allow me to guarantee that no INSERTs will be performed into tbl between max() calculation and UPDATE query itself, but does not lock the whole table?
Hey Dmitry,
--
// Dmitriy.
2011/6/30 Dmitry Koterov <dmitry.koterov@gmail.com>
Consider to create table with column of type integer and
write a function which will perform SELECT FOR UPDATE ...
and returns the next value, i.e.
BEGIN;
INSERT INTO tbl(uniq_id) SELECT uniq_id_generator(); -- SELECT FOR UPDATE inside
COMMIT; -- or ROLLBACK
Hello.I need to create an auto-increment field on a table WITHOUT using sequences:CREATE TABLE tbl(name TEXT,uniq_id INTEGER);Each INSERT to this table must generate a new uniq_id which is distinct from all others.The problem is that these INSERTs are rolled back oftenly (i.e. they are executed within a transaction block which is rolled back time to time), this is an existing design of the current architecture and unfortunately we have to live with it. And I need as compact uniq_id generation (with minimum "holes") as it possible - this is a VERY important requirement (to export these values into external systems which accepts only IDs limited from 1 to 100000).So I cannot use sequences: sequence value is obviously not rolled back, so if I insert nextval(...) as uniq_id, I will have large holes (because of often transaction rollbacks) and exhaust 100000 uniq_ids very fast. How to deal with all this without sequences?I triedBEGIN;LOCK TABLE tbl;INSERT INTO tbl(uniq_id) VALUES((SELECT max(uniq_id) FROM tbl) + 1);COMMIT;
Consider to create table with column of type integer and
write a function which will perform SELECT FOR UPDATE ...
and returns the next value, i.e.
BEGIN;
INSERT INTO tbl(uniq_id) SELECT uniq_id_generator(); -- SELECT FOR UPDATE inside
COMMIT; -- or ROLLBACK
--
// Dmitriy.
On 07/01/2011 02:20 AM, Dmitry Koterov wrote: > > The problem is that these INSERTs are rolled back oftenly (i.e. they > are executed within a transaction block which is rolled back time to > time), this is an existing design of the current architecture and > unfortunately we have to live with it. And I need as compact uniq_id > generation (with minimum "holes") as it possible - this is a VERY > important requirement (to export these values into external systems > which accepts only IDs limited from 1 to 100000). What you want is often referred to as a "gapless sequence". Searching the mailing list archives for that will find you lots of information. > but seems it performs too hard locking - time to time this query is > timed out (or sometimes deadlocks with other queries). You'll have that problem with any gapless sequence approach. You'll have to be prepared to re-try failed transactions after deadlocks, or be *extremely* strict about the order in which you perform operations so you avoid any deadlocks. -- Craig Ringer
Having done gapless numbering for some users of accounting software, there are two suggestions I would make. The first is that any sort of gapless numbering inherently runs into scalability. You HAVE to lock relevant records, and this means that only one insert can run at a time, and it must commit before the next insert can run. This means you have to keep your transactions short and predictable in terms of table order. The suggestion of using for update is a good one, but it doesn't entirely get rid of the problem, which is inherent in ensuring gapless numbering in a system with concurrent transactions. The second is that you absolutely should use this approach as rarely as you can get away with. If it isn't required, don't use it! Best Wishes, Chris Travers
Hey Chris,
--
// Dmitriy.
The suggestion of using for
update is a good one, but it doesn't entirely get rid of the problem,
which is inherent in ensuring gapless numbering in a system with
concurrent transactions.
Why not?
I mean the following solution:
CREATE TABLE myseq(tabnm text not null, lastid integer not null);
INSERT INTO myseq SELECT 'mytab', 0; -- initialization
CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text)
RETURNS integer
LANGUAGE sql
STRICT
AS $function$
UPDATE myseq SET lastid = li + 1 FROM
(SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo
RETURNING lastid;
$function$
-- Test
dmitigr=> BEGIN;
BEGIN
dmitigr=> SELECT myseq_nextval('mytab');
myseq_nextval
---------------
1
(1 row)
dmitigr=> ROLLBACK;
ROLLBACK
dmitigr=> SELECT * FROM myseq;
tabnm | lastid
-------+--------
mytab | 0
(1 row)
So, with this approach you'll get a lock only on INSERT.
dmitigr=> CREATE TABLE mytab(id integer not null DEFAULT myseq_nextval('mytab'));
CREATE TABLE
dmitigr=> INSERT INTO mytab DEFAULT VALUES;
INSERT 0 1
dmitigr=> INSERT INTO mytab DEFAULT VALUES;
INSERT 0 1
dmitigr=> SELECT * FROM mytab;
id
----
1
2
(2 rows)
I mean the following solution:
CREATE TABLE myseq(tabnm text not null, lastid integer not null);
INSERT INTO myseq SELECT 'mytab', 0; -- initialization
CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text)
RETURNS integer
LANGUAGE sql
STRICT
AS $function$
UPDATE myseq SET lastid = li + 1 FROM
(SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo
RETURNING lastid;
$function$
-- Test
dmitigr=> BEGIN;
BEGIN
dmitigr=> SELECT myseq_nextval('mytab');
myseq_nextval
---------------
1
(1 row)
dmitigr=> ROLLBACK;
ROLLBACK
dmitigr=> SELECT * FROM myseq;
tabnm | lastid
-------+--------
mytab | 0
(1 row)
So, with this approach you'll get a lock only on INSERT.
dmitigr=> CREATE TABLE mytab(id integer not null DEFAULT myseq_nextval('mytab'));
CREATE TABLE
dmitigr=> INSERT INTO mytab DEFAULT VALUES;
INSERT 0 1
dmitigr=> INSERT INTO mytab DEFAULT VALUES;
INSERT 0 1
dmitigr=> SELECT * FROM mytab;
id
----
1
2
(2 rows)
--
// Dmitriy.
On Fri, Jul 1, 2011 at 1:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote: > Hey Chris, > >> The suggestion of using for >> update is a good one, but it doesn't entirely get rid of the problem, >> which is inherent in ensuring gapless numbering in a system with >> concurrent transactions. > > Why not? Just because it locks less doesn't mean that it doesn't lock. The point is: if gaps are acceptable then the sequences which exist outside of transactions are idea. If gaps are not acceptable, you have to lock and force transactions through the system serially which means a possibility of deadlocks and performance issues. These issues are inherent in gapless numbering because you can't get a gapless sequence when things roll back without such locks. > > I mean the following solution: > > CREATE TABLE myseq(tabnm text not null, lastid integer not null); > > INSERT INTO myseq SELECT 'mytab', 0; -- initialization > > CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text) > RETURNS integer > LANGUAGE sql > STRICT > AS $function$ > UPDATE myseq SET lastid = li + 1 FROM > (SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo > RETURNING lastid; > $function$ > > -- Test > > dmitigr=> BEGIN; > BEGIN > dmitigr=> SELECT myseq_nextval('mytab'); > myseq_nextval > --------------- > 1 > (1 row) > > dmitigr=> ROLLBACK; > ROLLBACK > dmitigr=> SELECT * FROM myseq; > tabnm | lastid > -------+-------- > mytab | 0 > (1 row) > > So, with this approach you'll get a lock only on INSERT. True. But the point us that you MUST lock on insert to get gapless sequences, and this creates inherent problems in terms of performance and concurrency, so that you should not use it unless you really have no other choice (i.e. because the tax authorities demand it). Best Wishes, Chris Travers
2011/7/1 Chris Travers <chris.travers@gmail.com>
On Fri, Jul 1, 2011 at 1:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:Just because it locks less doesn't mean that it doesn't lock.
> Hey Chris,
>
>> The suggestion of using for
>> update is a good one, but it doesn't entirely get rid of the problem,
>> which is inherent in ensuring gapless numbering in a system with
>> concurrent transactions.
>
> Why not?
The point is: if gaps are acceptable then the sequences which exist
outside of transactions are idea. If gaps are not acceptable, you
have to lock and force transactions through the system serially which
means a possibility of deadlocks and performance issues. These issues
are inherent in gapless numbering because you can't get a gapless
sequence when things roll back without such locks.
Then I don't clearly understand the existence of locks (the LOCK
command, SELECT FOR UPDATE clause and so on) if the usage
of them gives only problems...
command, SELECT FOR UPDATE clause and so on) if the usage
of them gives only problems...
True. But the point us that you MUST lock on insert to get gapless>
> I mean the following solution:
>
> CREATE TABLE myseq(tabnm text not null, lastid integer not null);
>
> INSERT INTO myseq SELECT 'mytab', 0; -- initialization
>
> CREATE OR REPLACE FUNCTION public.myseq_nextval(a_tabnm text)
> RETURNS integer
> LANGUAGE sql
> STRICT
> AS $function$
> UPDATE myseq SET lastid = li + 1 FROM
> (SELECT lastid li FROM myseq WHERE tabnm = $1 FOR UPDATE) foo
> RETURNING lastid;
> $function$
>
> -- Test
>
> dmitigr=> BEGIN;
> BEGIN
> dmitigr=> SELECT myseq_nextval('mytab');
> myseq_nextval
> ---------------
> 1
> (1 row)
>
> dmitigr=> ROLLBACK;
> ROLLBACK
> dmitigr=> SELECT * FROM myseq;
> tabnm | lastid
> -------+--------
> mytab | 0
> (1 row)
>
> So, with this approach you'll get a lock only on INSERT.
sequences, and this creates inherent problems in terms of performance
and concurrency, so that you should not use it unless you really have
no other choice (i.e. because the tax authorities demand it).
Sure, but, again, why LOCK and SELECT FOR UPDATE exists ?
Best Wishes,
Chris Travers
--
// Dmitriy.
On 1/07/2011 4:21 PM, Chris Travers wrote: > means a possibility of deadlocks and performance issues. These issues > are inherent in gapless numbering because you can't get a gapless > sequence when things roll back without such locks. Actually, another approach that allows parallel transactions is (at least theoretically) possible. You can hand out IDs as transactions request them, and if a transaction rolls back you can abort it *and* *all* *transactions* *given* *higher* *IDs*, then re-issue the lot. I guess that could be useful if transaction failure was extremely unlikely and you needed to have lots of work in flight at once. In practice I don't think it'd be very useful to do this, because transactions would still have to commit in the order they obtained IDs in, so a slow or blocked transaction would still stall the system just like it does with lock-based gapless numbering. Also, once a later transaction had obtained an ID prior transactions couldn't obtain any more IDs. I'm not sure this is possible in Pg without modifying the server, either. It'd be kind of interesting in a useless-toy-project kind of way. -- Craig Ringer
Le vendredi 01 juillet 2011 à 12:28 +0400, Dmitriy Igrishin a écrit : > Then I don't clearly understand the existence of locks (the LOCK > command, SELECT FOR UPDATE clause and so on) if the usage > of them gives only problems... > Chris already explained why twice : "you MUST lock on insert to get gapless sequences" Can't you just : -create the records with a regular sequence, that will have gaps -when you want to export, number an additional column from 1 to 10 000 and use that as key ? -- Vincent Veyron http://marica.fr/ Logiciel de gestion des sinistres et des contentieux pour le service juridique
Hey Vincent,
--
// Dmitriy.
2011/7/3 Vincent Veyron <vv.lists@wanadoo.fr>
Le vendredi 01 juillet 2011 à 12:28 +0400, Dmitriy Igrishin a écrit :Chris already explained why twice :
> Then I don't clearly understand the existence of locks (the LOCK
> command, SELECT FOR UPDATE clause and so on) if the usage
> of them gives only problems...
>
"you MUST lock on insert to get gapless sequences"
Not me :-). The OP must do it. So, what problem here? Deadlocks?
Again, if deadlocks are so dangerous, why the LOCK command exists?
Again, if deadlocks are so dangerous, why the LOCK command exists?
Can't you just :
-create the records with a regular sequence, that will have gaps
-when you want to export, number an additional column from 1 to 10 000
and use that as key
?
I don't use any locks explicitly :-)
--
Vincent Veyron
http://marica.fr/
Logiciel de gestion des sinistres et des contentieux pour le service juridique
--
// Dmitriy.
W dniu 2011-06-30 20:20, Dmitry Koterov pisze: > And I need as compact uniq_id generation (with minimum "holes") as it possible - this is a VERY > important requirement (to export these values into external systems which accepts only IDs limited > from 1 to 100000). > > So I cannot use sequences: sequence value is obviously not rolled back, so if I insert > nextval(...) as uniq_id, I will have large holes (because of often transaction rollbacks) and > exhaust 100000 uniq_ids very fast. How to deal with all this without sequences? You may use dense_rank() (or even rank()) window function to map your sequence-with-gaps to a no-gap-id which will be used for exports. Consider this: test=# select uniq_id_with_gaps, dense_rank() over (order by uniq_id_with_gaps) as uniq_id_without_gaps from (select generate_series(1, 100, 7) as uniq_id_with_gaps) a; uniq_id_with_gaps | uniq_id_without_gaps -------------------+---------------------- 1 | 1 8 | 2 15 | 3 22 | 4 29 | 5 36 | 6 43 | 7 50 | 8 57 | 9 64 | 10 71 | 11 78 | 12 85 | 13 92 | 14 99 | 15
On Sun, Jul 3, 2011 at 7:25 AM, Ireneusz Pluta <ipluta@wp.pl> wrote: > You may use dense_rank() (or even rank()) window function to map your > sequence-with-gaps to a no-gap-id which will be used for exports. > The typical case where gapless numbering comes up is something like this: In Greece, you go get invoice paper from the tax office which is numbered in sequence and the government gets a list of the invoice forms you have purchased. You then print the invoices on those paper forms, and must number the invoices sequentially and without gaps. In the case of an audit, all paper forms obtained must be accounted for as must all gaps in numbering. You MUST be able to connect each sequential invoice number (internally generated) to each invoice form (numbered at the tax office). In this case you really have no choice but to lock some records, generate a new gapless id, and save/print it. Naturally this causes the sorts of problems mentioned. Best Wishes, Chris Travers
Le dimanche 03 juillet 2011 à 18:10 +0400, Dmitriy Igrishin a écrit : > > Not me :-). The OP must do it. Duh! sorry about that. Indeed, I confused you with Dmitry. -- Vincent Veyron http://marica.fr/ Logiciel de gestion des sinistres et des contentieux pour le service juridique
On 3 Jul 2011, at 16:10, Dmitriy Igrishin wrote: > "you MUST lock on insert to get gapless sequences" > Not me :-). The OP must do it. So, what problem here? Deadlocks? > Again, if deadlocks are so dangerous, why the LOCK command exists? It's not deadlocks, it's concurrent updates that are the trouble. If you don't lock, you run the risk for two records beingassigned the same number concurrently. With a unique constraint added into the mix (and there should be one) that means that one of the transactions will fail theunique constraint check on commit. It's possible to catch that in the client and redo the transaction with a new ID, but if that's not acceptable (for examplebecause it matters which transaction got the ID first) then you need to lock records. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4e109f6e12092079216178!
Hey Alban,
--
// Dmitriy.
2011/7/3 Alban Hertroys <dalroi@solfertje.student.utwente.nl>
On 3 Jul 2011, at 16:10, Dmitriy Igrishin wrote:It's not deadlocks, it's concurrent updates that are the trouble. If you don't lock, you run the risk for two records being assigned the same number concurrently.
> "you MUST lock on insert to get gapless sequences"
> Not me :-). The OP must do it. So, what problem here? Deadlocks?
> Again, if deadlocks are so dangerous, why the LOCK command exists?
Thanks for clarify, but I know why resources must be locked when
they are used concurrently :-). See my previous post about SELECT FOR UPDATE ...
and I don't see the problem with it. As well as with the LOCK command.
they are used concurrently :-). See my previous post about SELECT FOR UPDATE ...
and I don't see the problem with it. As well as with the LOCK command.
With a unique constraint added into the mix (and there should be one) that means that one of the transactions will fail the unique constraint check on commit.
It's possible to catch that in the client and redo the transaction with a new ID, but if that's not acceptable (for example because it matters which transaction got the ID first) then you need to lock records.
Sure.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
!DSPAM:1257,4e109f6512095013212184!
--
// Dmitriy.
On Sunday 03 July 2011 07:47:01 Chris Travers wrote: > On Sun, Jul 3, 2011 at 7:25 AM, Ireneusz Pluta <ipluta@wp.pl> wrote: > > You may use dense_rank() (or even rank()) window function to map your > > sequence-with-gaps to a no-gap-id which will be used for exports. > > The typical case where gapless numbering comes up is something like this: > > In Greece, you go get invoice paper from the tax office which is > numbered in sequence and the government gets a list of the invoice > forms you have purchased. You then print the invoices on those paper > forms, and must number the invoices sequentially and without gaps. In > the case of an audit, all paper forms obtained must be accounted for > as must all gaps in numbering. You MUST be able to connect each > sequential invoice number (internally generated) to each invoice form > (numbered at the tax office). > > In this case you really have no choice but to lock some records, > generate a new gapless id, and save/print it. Naturally this causes > the sorts of problems mentioned. So the problem with Ireneuz's solution is that the mapping isn't stable enough. Then how about creating this mapping on disk whenever an export is done (assuming exports are much less frequent than inserts) ? * create table idmap(realid integer references realdata(id), gaplessid integer); * insert into realdata with the usual sequence * whenever an export of new data is requested : * lock table idmap * select * from realdata where id > (select max(realid) from idmap) and realdata.timestamp < now() - 'safety_margin_for_inserts_likely_to_rollback'::interval order by id for update; * insert into idmap * unlock table idmap * select gaplessid,data from idmap left join realdata Depending on your insert/export ratio this might be more efficient. And of course you can't delete rows 6 months later, but you knew that :p -- Vincent de Phily Mobile Devices +33 (0) 142 119 325 +353 (0) 85 710 6320 Warning This message (and any associated files) is intended only for the use of its intended recipient and may contain information that is confidential, subject to copyright or constitutes a trade secret. If you are not the intended recipient you are hereby notified that any dissemination, copying or distribution of this message, or files associated with this message, is strictly prohibited. If you have received this message in error, please notify us immediately by replying to the message and deleting it from your computer. Any views or opinions presented are solely those of the author vincent.dephily@mobile-devices.fr and do not necessarily represent those of the company. Although the company has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments.