Thread: Best approach for a "gap-less" sequence
Hi! I was trying to solve a problem on an old system and realized that there might be a better approach for doing what I need. We have some documents that need to be ordered sequentially and without gaps. I could use a sequence, but if the transaction fails and I roll back, the sequence will already have been incremented. So, today I have a control table: I acquire a SHARE ROW EXCLUSIVE lock on it, read the value, increase it, do what I need and then I COMMIT the transaction, ensuring that the sequence has no gaps. Is there a better way to guarantee that there will be no gaps in my sequence if something goes wrong with my transaction? -- Jorge Godoy <jgodoy@gmail.com>
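In outline, the control-table approach described above looks like this (a minimal sketch; the doc_counter table and last_value column are hypothetical names, not from the original system):

BEGIN;
-- SHARE ROW EXCLUSIVE conflicts with itself, so concurrent number
-- assignment is serialized on this table
LOCK TABLE doc_counter IN SHARE ROW EXCLUSIVE MODE;
SELECT last_value + 1 FROM doc_counter;   -- the number to assign
-- ... create the document using that number ...
UPDATE doc_counter SET last_value = last_value + 1;
COMMIT;  -- a ROLLBACK undoes the UPDATE too, so no gap appears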
On 8/12/06, Jorge Godoy <jgodoy@gmail.com> wrote: > > Hi! > > > I was trying to solve a problem on an old system and realized that there might > be some better approach for doing what I need. > > We have some documents that need to be ordered sequentially and without gaps. > I could use a sequence, but if the transaction fails then when I rollback the > sequence will already have been incremented. > > So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to > it, read the value, increase it, do what I need and then I COMMIT the > transaction, ensuring that the sequence has no gaps. > > Is there a better way to guarantee that there will be no gaps in my sequence > if something goes wrong with my transaction? Why does it matter? I assume there is a reason you need it like this.. -- Postgresql & php tutorials http://www.designmagick.com/
Jorge Godoy wrote on 12.08.2006 01:33: > I was trying to solve a problem on an old system and realized that there might > be some better approach for doing what I need. > > We have some documents that need to be ordered sequentially and without gaps. > I could use a sequence, but if the transaction fails then when I rollback the > sequence will already have been incremented. > > So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to > it, read the value, increase it, do what I need and then I COMMIT the > transaction, ensuring that the sequence has no gaps. > > Is there a better way to guarantee that there will be no gaps in my sequence > if something goes wrong with my transaction? What do you do if a document gets deleted? Renumber the "following" documents so that no gaps are present in the already used ids? Thomas
"chris smith" <dmagick@gmail.com> writes: > Why does it matter? > > I assume there is a reason you need it like this.. Of course there is. It is a project requirement and also a law requirement that there's no unused number and that they be chronologically ordered as well. This is also part of the documented procedure that existed in paper and that got ISO 9001 certified (so a lot of money was spent here before). The law requirement is the strongest reason, though. After a number is assigned, it can't be changed, reused or have anything "newer" in a 'previous' (numerically-wise) entry. Concurrency is a problem since there might be a lot of people using it. I wanted to see if there was something that could improve performance here or to solve the problem in a better way without locking the table. Thanks, -- Jorge Godoy <jgodoy@gmail.com>
Thomas Kellerer <spam_eater@gmx.net> writes: > What do you do if a document gets deleted? Renumber the "following" documents > so that no gaps are present in the already used ids? There's no deletion possibility. A RULE sets a column named "active" to "False" instead (I can set it manually or let the RULE do that for me...). -- Jorge Godoy <jgodoy@gmail.com>
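A sketch of such a rule, assuming a hypothetical "documents" table with an "id" key and an "active" flag:

CREATE RULE documents_no_delete AS
    ON DELETE TO documents DO INSTEAD
    UPDATE documents SET active = false WHERE id = OLD.id;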
Jorge Godoy <jgodoy@gmail.com> writes: > Is there a better way to guarantee that there will be no gaps in my sequence > if something goes wrong with my transaction? From the overwhelming feedback I assume there isn't a better way yet... Thanks. I'll see how I can improve the model then to separate these sequences into different tables. -- Jorge Godoy <jgodoy@gmail.com>
Hi, On Sat, 12 Aug 2006, chris smith wrote: > On 8/12/06, Jorge Godoy <jgodoy@gmail.com> wrote: <snipp/> >> Is there a better way to guarantee that there will be no gaps in my >> sequence >> if something goes wrong with my transaction? > > Why does it matter? > > I assume there is a reason you need it like this.. For example, German tax law requires invoices to be numbered sequentially without gaps. This is supposed to make it harder to cheat on VAT. You cannot just drop an invoice, as that would leave a gap. Tax inspectors will search for gaps and ask about whatever invoice is missing from the records. I could not care less about gaps in surrogate keys, but this kind of stuff is an external requirement. There's probably not much choice on how to implement something like this but to just store the last assigned number in some row. I would at least try to assign multiple such numbers in batches to minimize contention on the row you store the counter in. Greetings Christian -- Christian Kratzer ck@cksoft.de CK Software GmbH http://www.cksoft.de/ Phone: +49 7452 889 135 Fax: +49 7452 889 136
Christian Kratzer <ck-lists@cksoft.de> writes: > I would at least try to assign multiple such numbers in batches to minimize > contention on the row you store the counter in. What do you mean here? How would you guarantee that one of the receiving transactions didn't roll back and leave a gap in the "sequence"? I believe that for invoices it is less problematic. At least here I don't need the "time" part control, so if I leave one blank I can fill it later in the same day without problems (except, of course, if the sequence number is tied to some other physical evidence such as the paper counterpart of the invoice and that is also chronologically assigned). The whole problem appears because no matter how much we validate input and relationships on the input interface, something might happen and make the "INSERT" transaction fail. Theoretically, all should go fine, but... :-) -- Jorge Godoy <jgodoy@gmail.com>
Hi, On Sun, 13 Aug 2006, Jorge Godoy wrote: > Christian Kratzer <ck-lists@cksoft.de> writes: > >> I would at least try to assign multiple such numbers in batches to minimize >> contention on the row you store the counter in. > > What do you mean here? How would you guarantee that one of the receiving > transactions didn't roll back and leave a gap in the "sequence"? You would need to serialize the transactions assigning the numbers, and you would need to update the counter in the same transaction that assigns your numbers to your documents or whatever. Assigning a batch of 1000 numbers in one transaction would probably be more efficient than assigning 1000 numbers in 1000 separate transactions that all need to be serialized. > I believe that for invoices it is less problematic. At least here I don't > need the "time" part control, so if I leave one blank I can fill it later in > the same day without problems (except, of course, if the sequence number is > tied to some other physical evidence such as the paper counterpart of the > invoice and that is also chronologically assigned). That's of course the idea. The numbers on the paper invoices have to be gapless. The tax people want to have a warm fuzzy feeling that they are seeing all your invoices or they will begin to speculate on how much VAT they have not received from you. > The whole problem appears because no matter how much we validate input and > relationships on the input interface, something might happen and make the > "INSERT" transaction fail. Theoretically, all should go fine, but... :-) Increment the counter in the same transaction that assigns your values. Of course I know little or nothing about your application and what you need gapless sequences for. I just pulled the invoice example out of my hat to show that there are legitimate use cases for gapless sequences of numbers. Greetings Christian -- Christian Kratzer ck@cksoft.de CK Software GmbH http://www.cksoft.de/ Phone: +49 7452 889 135 Fax: +49 7452 889 136
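A sketch of that batch idea, reusing the hypothetical counter table from the first sketch above: one short transaction reserves a whole block of numbers, and the numbers are then handed out within that same transaction:

BEGIN;
-- reserve numbers last_value+1 .. last_value+1000 with a single row update
UPDATE doc_counter SET last_value = last_value + 1000;
-- the counter row is now locked against other writers; read back our block
SELECT last_value FROM doc_counter;
-- ... assign all 1000 numbers to documents in this same transaction ...
COMMIT;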
Jorge Godoy wrote: > Jorge Godoy <jgodoy@gmail.com> writes: > >> Is there a better way to guarantee that there will be no gaps in my sequence >> if something goes wrong with my transaction? > > From the overwhelming feedback I assume there isn't a better way yet... > Thanks. I'll see how I can improve the model then to separate these sequences > into different tables. Pre-allocate records. The (primary key?) field would have the numbers already filled in, but all the rest of the fields in each record be NULL, blanks, zeros or indicator values ("~~~~~~~~~~", -999999999, etc). Then create a single-field table called, for example, CUR_MAX_VALUE that gets incremented as part of each transaction. To serialize access, transactions would need an EXCLUSIVE lock on the table. -- Ron Johnson, Jr. Jefferson LA USA Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong.
Ron Johnson <ron.l.johnson@cox.net> writes: > Pre-allocate records. The (primary key?) field would have the > numbers already filled in, but all the rest of the fields in each > record be NULL, blanks, zeros or indicator values ("~~~~~~~~~~", > -999999999, etc). > > Then create a single-field table called, for example, CUR_MAX_VALUE > that gets incremented as part of each transaction. To serialize > access, transactions would need an EXCLUSIVE lock on the table. What's the difference from just having the table with the sequence, where I take an exclusive lock to get the value while inside the transaction? This approach seems more complicated since I'd have to exclude records that match the "not-used" pattern. -- Jorge Godoy <jgodoy@gmail.com>
Jorge Godoy wrote: > Ron Johnson <ron.l.johnson@cox.net> writes: > >> Pre-allocate records. The (primary key?) field would have the >> numbers already filled in, but all the rest of the fields in each >> record be NULL, blanks, zeros or indicator values ("~~~~~~~~~~", >> -999999999, etc). >> >> Then create a single-field table called, for example, CUR_MAX_VALUE >> that gets incremented as part of each transaction. To serialize >> access, transactions would need an EXCLUSIVE lock on the table. > > What's the difference from just having the table with the sequence, where I > take an exclusive lock to get the value while inside the transaction? This > approach seems more complicated since I'd have to exclude records that match > the "not-used" pattern. The use of CUR_MAX_VALUE "should" ensure that you never have gaps, since a rollback or process death would not update CUR_MAX_VALUE. Your WHERE clauses would *not* have AND NAME <> "~~~~~~~~~~". They would say AND SEQ_NO <= (SELECT CUR_MAX_VALUE FROM CUR_MAX_VALUE). -- Ron Johnson, Jr. Jefferson LA USA Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong.
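A sketch of this pre-allocation scheme (the invoice table and column names are hypothetical; the placeholder rows are created once, up front):

-- placeholder rows with the key values already filled in
INSERT INTO invoice (seq_no, name)
    SELECT n, '~~~~~~~~~~' FROM generate_series(1, 100000) AS s(n);

CREATE TABLE cur_max_value (cur_max_value integer NOT NULL);
INSERT INTO cur_max_value VALUES (0);

-- "inserting" a document claims the next placeholder
BEGIN;
LOCK TABLE cur_max_value IN EXCLUSIVE MODE;  -- serializes writers; SELECTs still run
UPDATE cur_max_value SET cur_max_value = cur_max_value + 1;
UPDATE invoice SET name = 'real data'
    WHERE seq_no = (SELECT cur_max_value FROM cur_max_value);
COMMIT;  -- a rollback also rolls back the counter, so no gap appears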
Jorge Godoy wrote: > Jorge Godoy <jgodoy@gmail.com> writes: > >> Is there a better way to guarantee that there will be no gaps in my sequence >> if something goes wrong with my transaction? > > From the overwhelming feedback I assume there isn't a better way yet... > Thanks. I'll see how I can improve the model then to separate these sequences > into different tables. > I'm not sure what type of lock you'd need to make sure no other transactions updated the table (see http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory" something like this should work:

begin;
select id from table order by id desc limit 1;
insert into table (id, blah) values (id+1, 'blah');
commit;

P.S. I'm sure in older versions this query wouldn't use an index: select max(id) from table; I'm not sure about 8.0+.. hence doing an order by the id desc limit 1. -- Postgresql & php tutorials http://www.designmagick.com/
Chris <dmagick@gmail.com> writes: > I'm not sure what type of lock you'd need to make sure no other transactions > updated the table (see > http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory" > something like this should work: > > begin; > select id from table order by id desc limit 1; > insert into table (id, blah) values (id+1, 'blah'); > commit; This is part of the solution, yes. But I would still need locking this table so that no other concurrent transaction gets another "id". I don't want to lock the main table -- as I believe you're suggesting -- because I want it to be searchable and updatable while I'm inserting new data. I just can't have gaps in the sequence but I don't want to restrict everything else here. > P.S. I'm sure in older versions this query wouldn't use an index: > select max(id) from table; It doesn't. You'd have to do what you did: "order by <x> desc limit 1" to have it using indexes... > I'm not sure about 8.0+.. hence doing an order by the id desc limit 1. I also have to test it... But I still keep using the "order by desc" syntax :-) Thanks for your answer, -- Jorge Godoy <jgodoy@gmail.com>
On Mon, Aug 14, 2006 at 09:09:51AM -0300, Jorge Godoy wrote: > Chris <dmagick@gmail.com> writes: > > P.S. I'm sure in older versions this query wouldn't use an index: > > select max(id) from table; > > It doesn't. You'd have to do what you did: "order by <x> desc limit 1" to > have it using indexes... > > > I'm not sure about 8.0+.. hence doing an order by the id desc limit 1. > > I also have to test it... But I still keep using the "order by desc" syntax Excerpt from the 8.1 Release Notes: Automatically use indexes for MIN() and MAX() (Tom) In previous releases, the only way to use an index for MIN() or MAX() was to rewrite the query as SELECT col FROM tab ORDER BY col LIMIT 1. Index usage now happens automatically. -- Michael Fuhr
Michael Fuhr <mike@fuhr.org> writes: > Automatically use indexes for MIN() and MAX() (Tom) > > In previous releases, the only way to use an index for MIN() > or MAX() was to rewrite the query as SELECT col FROM tab ORDER > BY col LIMIT 1. Index usage now happens automatically. Thanks Michael! This is great. -- Jorge Godoy <jgodoy@gmail.com>
Jorge Godoy wrote: > Chris <dmagick@gmail.com> writes: > > > I'm not sure what type of lock you'd need to make sure no other transactions > > updated the table (see > > http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory" > > something like this should work: > > > > begin; > > select id from table order by id desc limit 1; > > insert into table (id, blah) values (id+1, 'blah'); > > commit; > > This is part of the solution, yes. But I would still need locking this table > so that no other concurrent transaction gets another "id". I don't want to > lock the main table -- as I believe you're suggesting -- because I want it to > be searchable and updatable while I'm inserting new data. So you have to hold a lock that conflicts with itself, but not with ACCESS SHARE which is the lock acquired by SELECT. I think the first one on the list with these two properties is SHARE UPDATE EXCLUSIVE. Have a look at the list yourself: http://www.postgresql.org/docs/8.1/static/explicit-locking.html Note the tip at the end of the table: Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
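For example (a sketch; "control" stands for whatever counter table is used):

BEGIN;
LOCK TABLE control IN SHARE UPDATE EXCLUSIVE MODE;
-- read and bump the counter here; plain SELECTs elsewhere are not blocked,
-- but a second transaction requesting this lock waits for our COMMIT/ROLLBACK
COMMIT;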
Since the gapless numbers are purely for the benefit of the tax people, you could build your db with regular sequences as primary keys and then regularly (or just before tax-time) insert into a table which maps the gapless sequence to the real primary key. -M
AgentM <agentm@themactionfaction.com> writes: > Since the gapless numbers are purely for the benefit of the tax people, you > could build your db with regular sequences as primary keys and then regularly > (or just before tax-time) insert into a table which maps the gapless sequence > to the real primary key. That's also an interesting approach. An auxiliary table like

    transaction     integer    -- FK to the transactions table
    transaction_nb  integer    -- gapless sequence

should do it. A trigger inserting on this auxiliary table would also take care of everything... If I have an after trigger I believe I wouldn't need any locking... I have to think about this... As simple as this might be, I haven't thought about it :-) Thanks for your suggestion. -- Jorge Godoy <jgodoy@gmail.com>
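A sketch of that auxiliary table and trigger (names are hypothetical; note that without a lock the max()+1 is still exposed to the concurrency race discussed later in the thread, which is why the UNIQUE constraint is there as a safety net):

CREATE TABLE transaction_numbers (
    transaction     integer REFERENCES transactions,
    transaction_nb  integer UNIQUE      -- the gapless number
);

CREATE FUNCTION assign_number() RETURNS trigger AS '
BEGIN
    INSERT INTO transaction_numbers
        SELECT NEW.id, coalesce(max(transaction_nb), 0) + 1
          FROM transaction_numbers;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER transactions_ai AFTER INSERT ON transactions
    FOR EACH ROW EXECUTE PROCEDURE assign_number();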
In article <878xlrwcxm.fsf@gmail.com>, Jorge Godoy <jgodoy@gmail.com> writes: > AgentM <agentm@themactionfaction.com> writes: >> Since the gapless numbers are purely for the benefit of the tax people, you >> could build your db with regular sequences as primary keys and then regularly >> (or just before tax-time) insert into a table which maps the gapless sequence >> to the real primary key. > That's also an interesting approach. An auxiliary table like > transaction integer FK to the transactions table > transaction_nb integer gapless sequence > should do it. A trigger inserting on this auxiliary table would also take > care of everything... If I have an after trigger I believe I wouldn't need > any locking... I have to think about this... Why put gapless numbers into the database at all? Just calculate them at query time.
> > AgentM <agentm@themactionfaction.com> writes: > >> Since the gapless numbers are purely for the benefit of the tax people, you > >> could build your db with regular sequences as primary keys and then regularly > >> (or just before tax-time) insert into a table which maps the gapless sequence > >> to the real primary key. > > > That's also an interesting approach. An auxiliary table like > > > transaction integer FK to the transactions table > > transaction_nb integer gapless sequence > > > should do it. A trigger inserting on this auxiliary table would also take > > care of everything... If I have an after trigger I believe I wouldn't need > > any locking... I have to think about this... > > Why put gapless numbers into the database at all? Just calculate them at > query time. I am curious, can you calculate something like this using only SQL? Or do you need to employ a procedural language like plpgsql? Regards, Richard Broersma Jr.
Harald Fuchs <hf0731x@protecting.net> writes: > Why put gapless numbers into the database at all? Just calculate them at > query time. And how would you retrieve the record that corresponds to invoice number #16355, for example? Recalculating a few records is fine, but recalculating millions of them every time you need to retrieve some of those is something that doesn't look efficient to me... -- Jorge Godoy <jgodoy@gmail.com>
> Why put gapless numbers into the database at all? Just calculate them at > query time. There is ABSOLUTELY NO WAY that would be acceptable for accounting or legal purposes. It would be the same as fabricating the numbers during an audit. -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
Jorge Godoy wrote: > Chris <dmagick@gmail.com> writes: > > >>I'm not sure what type of lock you'd need to make sure no other transactions >>updated the table (see >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory" >>something like this should work: >> >>begin; >>select id from table order by id desc limit 1; >>insert into table (id, blah) values (id+1, 'blah'); >>commit; > > > This is part of the solution, yes. But I would still need locking this table > so that no other concurrent transaction gets another "id". I don't want to > lock the main table -- Wouldn't SELECT ... FOR UPDATE give you the row lock you need without locking the table? From "http://www.postgresql.org/docs/8.1/interactive/sql-select.html": "FOR UPDATE/FOR SHARE Clause ...FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be blocked until the current transaction ends. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already locked a selected row or rows, SELECT FOR UPDATE will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the row was deleted). ..." Regards, Berend Tober
In article <20060814162836.58963.qmail@web31811.mail.mud.yahoo.com>, Richard Broersma Jr <rabroersma@yahoo.com> writes: > I am curious, can you calculate something like this using only sql? Or you you need to employee a > procedural language like plpsgql? You could use something like SELECT (SELECT count(*) FROM tbl t2 WHERE t2.id < t1.id), whatever FROM tbl t1 but correlated subqueries are slow; thus incrementing the counter in the application would be faster for huge reports.
In article <87zme7uvcn.fsf@gmail.com>, Jorge Godoy <jgodoy@gmail.com> writes: > Harald Fuchs <hf0731x@protecting.net> writes: >> Why putting gapless numbers into the database at all? Just calculate them at >> query time. > And how would you retrieve the record that corresponds to invoice number > #16355, for example? Recalculating few records is fine, but millions of them > everytime you need to recover some of those is something that doesn't look > efficient to me... This would be SELECT whatever FROM tbl ORDER BY id LIMIT 1 OFFSET 16355 -1 Since id is the primary key, this can use an index scan.
In article <C1061F94.52761%scott_ribe@killerbytes.com>, Scott Ribe <scott_ribe@killerbytes.com> writes: >> Why put gapless numbers into the database at all? Just >> calculate them at query time. > There is ABSOLUTELY NO WAY that would be acceptable for accounting or legal > purposes. It would be the same as fabricating the numbers during an audit. At some point in time those numbers "get fabricated" anyway. As long as you don't change the records in between, the technical effect would be the same. But you might be right that this is forbidden. I don't speak legalese.
Harald Fuchs <hf0731x@protecting.net> writes: > In article <87zme7uvcn.fsf@gmail.com>, > Jorge Godoy <jgodoy@gmail.com> writes: > >> Harald Fuchs <hf0731x@protecting.net> writes: >>> Why putting gapless numbers into the database at all? Just calculate them at >>> query time. > >> And how would you retrieve the record that corresponds to invoice number >> #16355, for example? Recalculating few records is fine, but millions of them >> everytime you need to recover some of those is something that doesn't look >> efficient to me... > > This would be > > SELECT whatever > FROM tbl > ORDER BY id > LIMIT 1 > OFFSET 16355 -1 > > Since id is the primary key, this can use an index scan. If the ID was the number yes. I thought you were suggesting having the number computed at "query time", not to insert the record on the table... -- Jorge Godoy <jgodoy@gmail.com>
On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote: > Jorge Godoy wrote: > > > Chris <dmagick@gmail.com> writes: > > > > > >>I'm not sure what type of lock you'd need to make sure no other transactions > >>updated the table (see > >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory" > >>something like this should work: > >> > >>begin; > >>select id from table order by id desc limit 1; > >>insert into table (id, blah) values (id+1, 'blah'); > >>commit; > > > > > > This is part of the solution, yes. But I would still need locking this table > > so that no other concurrent transaction gets another "id". I don't want to > > lock the main table -- > > Wouldn't SELECT ... FOR UPDATE give you the row lock you need without > locking the table? Nope, concurrent transactions won't work. Let current max id = x Transaction 1 (t1) does a select max(id) for update, gets a lock on the last tuple at the time of the select, and gets x as a value for max id Transaction 2 (t2) does a select max(id) for update, has to wait for t1 to release its lock. t1 inserts (x+1) as the new max id of the table. t1 releases its lock t2 is granted the lock on the tuple it has been waiting for, which contains the max id of x t2 tries to insert a value of x+1, insert fails (if it doesn't, you really want to have a close look at your constraints :-) Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote: > On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote: > > Jorge Godoy wrote: > > > Chris <dmagick@gmail.com> writes: > > >>I'm not sure what type of lock you'd need to make sure no other > > >> transactions updated the table (see > > >>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in > > >> theory" something like this should work: > > >> > > >>begin; > > >>select id from table order by id desc limit 1; > > >>insert into table (id, blah) values (id+1, 'blah'); > > >>commit; > > > > > > This is part of the solution, yes. But I would still need locking this > > > table so that no other concurrent transaction gets another "id". I > > > don't want to lock the main table -- > > > > Wouldn't SELECT ... FOR UPDATE give you the row lock you need without > > locking the table? > > Nope, concurrent transactions won't work. > > Let current max id = x > > Transaction 1 (t1) does a select max(id) for update, gets a lock on the > last tuple at the time of the select, and gets x as a value for max id > > Transaction 2 (t2) does a select max(id) for update, has to wait for t1 > to release its lock. > > t1 inserts (x+1) as the new max id of the table. t1 releases its lock > > t2 is granted the lock on the tuple it has been waiting for, which > contains the max id of x > > t2 tries to insert a value of x+1, insert fails (if it doesn't, you > really want to have a close look at your constraints :-) > I am still working through this stuff myself, but the following excerpt from the documentation would seem to contradict what you are saying. See the part marked with ***. t2 should see a new max(id) after t1 commits and therefore insert(x+1) would succeed. http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE "FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be blocked until the current transaction ends.*** Also, if an UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already locked a selected row or rows, SELECT FOR UPDATE will wait for the other transaction to complete, and will then lock and return the updated row (or no row, if the row was deleted).***" -- Adrian Klaver aklaver@comcast.net
Brad Nicholson wrote: > On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote: > >>Jorge Godoy wrote: >> >> >>>Chris <dmagick@gmail.com> writes: >>> >>> >>> >>>>I'm not sure what type of lock you'd need to make sure no other transactions >>>>updated the table (see >>>>http://www.postgresql.org/docs/8.1/interactive/sql-lock.html) but "in theory" >>>>something like this should work: >>>> >>>>begin; >>>>select id from table order by id desc limit 1; >>>>insert into table (id, blah) values (id+1, 'blah'); >>>>commit; >>> >>> >>>This is part of the solution, yes. But I would still need locking this table >>>so that no other concurrent transaction gets another "id". I don't want to >>>lock the main table -- >> >>Wouldn't SELECT ... FOR UPDATE give you the row lock you need without >>locking the table? > > > Nope, concurrent transactions won't work. > > Let current max id = x > > Transaction 1 (t1) does a select max(id) for update, gets a lock on the > last tuple at the time of the select, and gets x as a value for max id > > Transaction 2 (t2) does a select max(id) for update, has to wait for t1 > to release its lock. > > t1 inserts (x+1) as the new max id of the table. t1 releases its lock > > t2 is granted the lock on the tuple it has been waiting for, which > contains the max id of x > > t2 tries to insert a value of x+1, insert fails (if it doesn't, you > really want to have a close look at your constraints :-) I see. The FOR UPDATE form is not applicable with aggregates. I was looking at this as if he uses a separate table to keep track of the most-recently-issued sequence value, as the original post specified was the case, whereas using the MAX aggregate, as suggested subsequently, implies calculating the most-recently-used sequence value from the table that has all the earlier sequence values in it. The original question stated he "...acquire(d) a SHARE ROW EXCLUSIVE lock", which locks the whole table, so my thought was to suggest that locking the entire table was not necessary -- rather only the row. Of course, if the control table has only one row, then this may not matter. But since nothing beats empirical evidence, I tried it using BEGIN; SELECT column1 FROM test.table1 FOR UPDATE; UPDATE test.table1 SET column1=column1+1; COMMIT; executed line-by-line alternating between two separate pgAdmin SQL windows, and concurrency is properly accounted for. Regards, Berend Tober
On Monday 14 August 2006 02:46 pm, Adrian Klaver wrote: > > Let current max id = x > > > > Transaction 1 (t1) does a select max(id) for update, gets a lock on the > > last tuple at the time of the select, and gets x as a value for max id > > > > Transaction 2 (t2) does a select max(id) for update, has to wait for t1 > > to release its lock. > > > > t1 inserts (x+1) as the new max id of the table. t1 releases its lock > > > > t2 is granted the lock on the tuple it has been waiting for, which > > contains the max id of x > > > > t2 tries to insert a value of x+1, insert fails (if it doesn't, you > > really want to have a close look at your constraints :-) > > I am still working through this stuff myself, but the following excerpt > from the documentation would seem to contradict what you are saying. See > the part marked with ***. t2 should see a new max(id) after t1 commits and > therefore insert(x+1) would succeed. > > http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDA >TE-SHARE > > "FOR UPDATE causes the rows retrieved by the SELECT statement to be locked > as though for update. This prevents them from being modified or deleted by > other transactions until the current transaction ends. That is, other > transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these > rows will be blocked until the current transaction ends.*** Also, if an > UPDATE, DELETE, or SELECT FOR UPDATE from another transaction has already > locked a selected row or rows, SELECT FOR UPDATE will wait for the other > transaction to complete, and will then lock and return the updated row (or > no row, if the row was deleted).***" I spoke too soon. Actually trying this exposed the fact that FOR UPDATE does not work with aggregates. Something I would have discovered earlier if I had read the documentation all the way through. -- Adrian Klaver aklaver@comcast.net
On Mon, Aug 14, 2006 at 02:46:17PM -0700, Adrian Klaver wrote:
> On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote:
<snip/>
> I am still working through this stuff myself, but the following excerpt from
> the documentation would seem to contradict what you are saying. See the part
> marked with ***. t2 should see a new max(id) after t1 commits and therefore
> insert(x+1) would succeed.
>
> http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE
>
> "FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as
> though for update. This prevents them from being modified or deleted by other
> transactions until the current transaction ends. That is, other transactions
> that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be
> blocked until the current transaction ends.*** Also, if an UPDATE, DELETE, or
> SELECT FOR UPDATE from another transaction has already locked a selected row
> or rows, SELECT FOR UPDATE will wait for the other transaction to complete,
> and will then lock and return the updated row (or no row, if the row was
> deleted).***"

If this is true, a transactional, gapless sequence is possible, where table.gl_id is updated from count.gl_id. It is simple. However, it *depends* on the second transaction getting the newly updated record from the first transaction. It seems pretty clear, not counting aggregates, that this is true from this doc snippet. Speak now if someone doesn't read it this way! I'd like to understand why. If it weren't true, there would also be a workaround which caught a duplicate value and tried again, looping. I may publish the gapless sequence technique on General Bits if there is no discrepancy in the understanding of the status of the second transaction's row value (updated). --elein elein@varlena.com
elein wrote:
> On Mon, Aug 14, 2006 at 02:46:17PM -0700, Adrian Klaver wrote:
>> On Monday 14 August 2006 01:59 pm, Brad Nicholson wrote:
>>> On Mon, 2006-08-14 at 16:08 -0400, Berend Tober wrote:
>>>> Wouldn't SELECT ... FOR UPDATE give you the row lock you need without
>>>> locking the table?
>
> If this is true, a transactional, gapless sequence ...
> I may publish the gapless sequence technique on General Bits if there is no
> discrepancy in the understanding of the status of the second transaction's
> row value (updated).

/*
Hi Elein,

I'm an avid reader of your General Bits column. One of my favorite sayings is
"nothing beats empirical evidence", so regardless of what people interpret the
documentation to say, here is a simplified description of an actual working
implementation of how it is done:

The background: A business requirement is to generate table rows that have
uniformly increasing, whole number sequences, i.e., the "gap-less" sequence.
In this particular case the situation requires multiple such sequences within
the same table -- for each employee, there is a uniformly-sequenced set of
expense reports. I use the term "compound sequence" for this situation
because the expense reports are sequenced independently on a per-employee
basis. Specifically, I have employee data in
*/

CREATE SCHEMA test;
SET search_path = test, public, pg_catalog;

CREATE TABLE employee (
    employee_pk SERIAL,  -- Identifies the employee.
    /* ...lots of non-relevant columns omitted ... */
    expense_report_seq int4 DEFAULT 0,  -- Compound sequence control.
    CONSTRAINT employee_pkey PRIMARY KEY (employee_pk)
);

/*
The expense_report_seq column stores the most-recently-used expense report
number for each employee, i.e., it is the control value for the compound
sequences that appear in
*/

CREATE TABLE expense (
    employee_pk int4 NOT NULL,
    expense_report_pk int4 NOT NULL,
    /* ...lots of non-relevant columns omitted ... */
    CONSTRAINT expense_report_pkey PRIMARY KEY (employee_pk, expense_report_pk),
    CONSTRAINT expense_fkey FOREIGN KEY (employee_pk)
        REFERENCES employee (employee_pk)
);

/* A before-insert trigger handles the compound sequence: */

CREATE OR REPLACE FUNCTION expense_bit() RETURNS "trigger" AS '
BEGIN
    UPDATE employee
       SET expense_report_seq = (expense_report_seq + 1)
     WHERE employee_pk = NEW.employee_pk;
    SELECT INTO NEW.expense_report_pk expense_report_seq
      FROM employee
     WHERE employee_pk = NEW.employee_pk;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql' VOLATILE;

/*
Other triggers handle allowed deletion and correction of some expense report
data under certain circumstances.
*/

CREATE TRIGGER expense_bit BEFORE INSERT ON expense
    FOR EACH ROW EXECUTE PROCEDURE expense_bit();

/*
Turns out the SELECT ... FOR UPDATE syntax is not even required because code
inside functions, particularly trigger functions as illustrated here, runs
within the calling transaction, and the UPDATE statement locks the affected
employee row until that transaction completes.
*/

-- Then test it:

INSERT INTO employee DEFAULT VALUES;
INSERT INTO employee DEFAULT VALUES;

-- In two separate sessions, run many competing inserts:

SET search_path = test, public, pg_catalog;
INSERT INTO expense (employee_pk) VALUES (1);
INSERT INTO expense (employee_pk) VALUES (1);
/* ... */
INSERT INTO expense (employee_pk) VALUES (1);
INSERT INTO expense (employee_pk) VALUES (2);
INSERT INTO expense (employee_pk) VALUES (2);
/* ... */
INSERT INTO expense (employee_pk) VALUES (2);

-- And check your results:

SELECT * FROM expense ORDER BY 1,2;

/*
Regards,
Berend Tober
*/
On 8/12/06, Jorge Godoy <jgodoy@gmail.com> wrote:
> I was trying to solve a problem on an old system and realized that there might
> be some better approach for doing what I need.
>
> We have some documents that need to be ordered sequentially and without gaps.
> I could use a sequence, but if the transaction fails then when I rollback the
> sequence will already have been incremented.
>
> So, today I have a control table and I acquire a SHARE ROW EXCLUSIVE lock to
> it, read the value, increase it, do what I need and then I COMMIT the
> transaction, ensuring that the sequence has no gaps.
>
> Is there a better way to guarantee that there will be no gaps in my sequence
> if something goes wrong with my transaction?

Hmm, I would do it this way:

-- First prepare a table for keeping the gapless sequence, say:
CREATE TABLE gapless_seq (
    gseq_name  varchar(256) PRIMARY KEY,
    gseq_value integer NOT NULL
);
-- ...and populate it:
INSERT INTO gapless_seq VALUES ('tax_id', 1);

-- then create a function to retrieve the values:
CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
DECLARE
    n integer;
BEGIN
    SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
        FOR UPDATE;
    UPDATE gapless_seq SET gseq_value = n WHERE gseq_name = t;
    RETURN n;
END;
$$ STABLE LANGUAGE PLpgsql;

-- ...and use it as default in the table definition
CREATE TABLE taxdata (
    tax_id integer PRIMARY KEY DEFAULT gseq_nextval('tax_id'),
    customer text,
    "when" timestamptz
);

...etc. SELECT ... FOR UPDATE would ensure a row lock on the "gapless sequence", a PLpgsql function would make a nice wrapper for it (so it would be usable more or less like real sequences), and it should work. I did not test the code just now, but I've written something similar some time ago, and it worked fine. Remember to vacuum the gapless_seq table frequently and don't expect stellar performance from it. Regards, Dawid
On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> -- then create a function to retrieve the values:
> CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
> DECLARE
>     n integer;
> BEGIN
>     SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
>         FOR UPDATE;
>     UPDATE gapless_seq SET gseq_value = n WHERE gseq_name = t;
>     RETURN n;
> END;
> $$ STABLE LANGUAGE PLpgsql;
     ^^^^^^
VOLATILE of course! Regards, Dawid
On Wednesday 16 August 2006 10:59 am, elein wrote:
> On Mon, Aug 14, 2006 at 02:46:17PM -0700, Adrian Klaver wrote:
<snip/>
> If this is true, a transactional, gapless sequence is possible, where
> table.gl_id is updated from count.gl_id. It is simple. However, it
> *depends* on the second transaction getting the newly updated record from
> the first transaction. It seems pretty clear, not counting aggregates, that
> this is true from this doc snippet. Speak now if someone doesn't read it
> this way! I'd like to understand why.
>
> If it weren't true, there would also be a workaround which caught a
> duplicate value and tried again, looping.
>
> I may publish the gapless sequence technique on General Bits if there is no
> discrepancy in the understanding of the status of the second transaction's
> row value (updated).
>
> --elein
> elein@varlena.com

After I discovered that aggregates did not work I did some simple tests updating a single row table. As far as I could determine the docs hold true :) I only ran three transactions at a time but each saw the incremented value from the previous transaction. -- Adrian Klaver aklaver@comcast.net
"Dawid Kuroczko" <qnex42@gmail.com> writes: > I did not test the code right now, but I've written something similar to > it some time ago, and it worked fine. Remember to vacuum gapless_seq > table frequently and don't expect stellar performance from it. Interesting approach... And I don't expect too much performance for it. The restriction of the gapless sequence makes it expected that there's some minor delay somewhere. It would be bad on "common" sequences, but not on gapless. :-) Thanks for the code... It is a bit different from mine -- better, in fact... ;-) -- and I could give it a try. -- Jorge Godoy <jgodoy@gmail.com>
elein <elein@varlena.com> writes: > If this is true, a transactional, gapless sequence is possible, where > table.gl_id is updated from count.gl_id. It is simple. However, it > *depends* on the second transaction getting the newly updated record from > the first transaction. It seems pretty clear, not counting aggregates, that > this is true from this doc snippet. Speak now if someone doesn't read it > this way! I'd like to understand why. I agree with your reading. > If it weren't true, there would also be a workaround which caught a > duplicate value and tried again, looping. This is true, but really bad. I believe a friend of mine had something like that in another database server, but it really looked like an ugly hack... Of course, if it's the only way to solve the problem then we have to live with it. > I may publish the gapless sequence technique on General Bits if there is no > discrepancy in the understanding of the status of the second transaction's > row value (updated). I need more testing here. But from what I tested, your reading looks right. -- Jorge Godoy <jgodoy@gmail.com>
Berend Tober <btober@seaworthysys.com> writes: > A business requirement is to generate table rows that have uniformly > increasing, whole number sequences, i.e., the "gap-less" sequence. In this > particular case the situation requires multiple such sequences within the same > table -- for each employee, there is a uniformly-sequenced set of expense > reports. I use the term "compound sequence" for this situation because the > expense reports are sequenced independently on a per-employee basis. This is something that I'll also have to code ;-) But the sequence for "employees" would also be a sequential and gapless sequence here (yep, it's composed of two gapless sequences, one "fixed" and the other varying for each new record inside the first sequence). Is the performance degradation here too big? I think it might be lower than with approaches that used some kind of locking... I'm not sure, in this approach of yours, what the probability is of having several records inserted in different transactions for one employee. In my case I see that when I have one unique "filter" (employee, invoice, etc.) the operation is done by only one person and consequently in one transaction only, which makes it possible to adopt more complex -- and poorly performing -- solutions (not that I want them, it's just that it wouldn't be noticeable in the application as far as the user is concerned). Do you have this concurrency in the real application? -- Jorge Godoy <jgodoy@gmail.com>
On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > -- then create a function to retrieve the values:
> > CREATE FUNCTION gseq_nextval(t text) RETURNS integer AS $$
> > DECLARE
> >     n integer;
> > BEGIN
> >     SELECT INTO n gseq_value+1 FROM gapless_seq WHERE gseq_name = t
> >         FOR UPDATE;
> >     UPDATE gapless_seq SET gseq_value = n WHERE gseq_name = t;
> >     RETURN n;
> > END;
> > $$ STABLE LANGUAGE PLpgsql;
>
> the problem here is if you have two concurrent transactions which call
> this function, it is possible for them both to return the same sequence
> number in read committed mode. Using this function outside of
> transactions is no different than using a sequence except that it is
> slower.

Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE; The first transaction to obtain the gapless sequence value will establish a lock on the "tax_id" row. The other transaction will block until the first transaction finishes (and the row is updated) and will then establish the row lock on it. "FOR UPDATE" effectively serializes access to this row for all transactions wanting to update it, even in read committed mode. Your statement would be true if I didn't use "SELECT ... FOR UPDATE" but just a plain SELECT there. Regards, Dawid PS: http://www.postgresql.org/docs/8.1/interactive/sql-select.html#SQL-FOR-UPDATE-SHARE
Jorge Godoy wrote: > Berend Tober <btober@seaworthysys.com> writes: > >>A business requirement is to generate table rows that have uniformly >>increasing, whole number sequences, i.e., the "gap-less" sequence. > > This is something that I'll also have to code ;-) But the sequence for > "employees" would also be a sequential and gapless sequence here (yep, it's > composed of two gapless sequences, one "fixed" and the other varying for > each new record inside the first sequence). Shouldn't be a problem to implement the same approach one level deeper like that. > Is the performance degradation here too big? That is where my empirical evidence is somewhat deficient. The application from which that example was drawn is used routinely by fewer than five persons, so any performance degradation is not evident. > I think it might be lower > than with approaches that used some kind of locking... Your comment is confusing. The example DOES use locking -- the UPDATE statement inside the trigger locks the modified employee row until the enclosing transaction completes -- I'm pretty sure I pointed that out. This is the minimally sufficient locking that has to happen to satisfy this business requirement. The original poster (was that you?) was using table-level locking, which is unnecessary in this approach. > I'm not sure, in this approach of yours, what the probability is of having > several records inserted in different transactions for one employee. In my > case I see that when I have one unique "filter" (employee, invoice, etc.) the > operation is done by only one person and consequently in one transaction > only, which makes it possible to adopt more complex -- and poorly performing -- > solutions (not that I want them, it's just that it wouldn't be noticeable in > the application as far as the user is concerned). Do you have this > concurrency in the real application? The design was intended to withstand concurrent use. As explained in "http://www.postgresql.org/docs/7.4/static/explicit-locking.html#LOCKING-ROWS", "A row-level lock on a specific row is automatically acquired when the row is updated (or deleted or marked for update). The lock is held until the transaction commits or rolls back. Row-level locks do not affect data querying; they block writers to the same row only." This is why you always UPDATE before SELECT. Since the trigger locks the row first, a second transaction initiated before completion of the first transaction is blocked until the first transaction commits. Regards, Berend Tober
Just in case no one else has brought it up- 8.1+ supports 2PC and savepoints, so one alternative would be to run your standard insertion operations in a prepared transaction or savepoint block. If you get so far as being able to prepare the transaction/complete the savepoint block, you should be able to snag a sequence id and commit everything. -M
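A sketch of that ordering, reusing the hypothetical gseq_nextval() function from earlier in the thread: do the failure-prone work first, and draw the gapless number only once it has succeeded:

BEGIN;
SAVEPOINT doc_work;
-- ... all the validation-heavy, failure-prone inserts for the document ...
-- on error: ROLLBACK TO SAVEPOINT doc_work and abort; no number was consumed
-- once the work has succeeded, drawing the number is the short, last step:
SELECT gseq_nextval('tax_id');
COMMIT;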
On 8/17/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> > On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > > -- then create a function to retrieve the values:
<snip/>
> > the problem here is if you have two concurrent transactions which call
> > this function, it is possible for them both to return the same sequence
> > number in read committed mode. Using this function outside of
> > transactions is no different than using a sequence except that it is
> > slower.
>
> Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE;
> The first transaction to obtain the gapless sequence value will establish
> a lock on the "tax_id" row. The other transaction will block until
> the first transaction finishes (and the row is updated) and will
> then establish the row lock on it.

yes, you are right...i didnt think the problem through properly. merlin
On Thu, 2006-08-17 at 12:12 -0400, Merlin Moncure wrote:
> On 8/17/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> > > On 8/16/06, Dawid Kuroczko <qnex42@gmail.com> wrote:
> > > > -- then create a function to retrieve the values:
<snip/>
> > Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE;
> > The first transaction to obtain the gapless sequence value will establish
> > a lock on the "tax_id" row. The other transaction will block until
> > the first transaction finishes (and the row is updated) and will
> > then establish the row lock on it.
>
> yes, you are right...i didnt think the problem through properly.

Let's just hope the performance on a concurrent system is not a requirement of such a system... Brad.
On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> > > Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE;
> > > The first transaction to obtain the gapless sequence value will establish
> > > a lock on the "tax_id" row. The other transaction will block until
> > > the first transaction finishes (and the row is updated) and will
> > > then establish the row lock on it.
> >
> > yes, you are right...i didnt think the problem through properly.
>
> Let's just hope the performance on a concurrent system is not a
> requirement of such a system...

right, if the transactions are long running, there is a big problem as they are serialized around access to the sequence. however this is better than the control record approach because control records have problems with mvcc bloat. concurrent performance will of course be awful. a good compromise in some cases is to save off canceled transaction ids in a free list. you would still have to deal with transactions that were not gracefully cancelled though. merlin
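A sketch of such a free list (hypothetical; it builds on the gseq_nextval() function from earlier in the thread, and note that reused numbers are no longer chronological, which would rule this out where, as for the original poster, chronological order is also required):

CREATE TABLE free_numbers (num integer PRIMARY KEY);

CREATE FUNCTION next_gapless(t text) RETURNS integer AS '
DECLARE
    n integer;
BEGIN
    -- prefer a number released by a cleanly cancelled transaction
    SELECT INTO n num FROM free_numbers ORDER BY num LIMIT 1 FOR UPDATE;
    IF FOUND THEN
        DELETE FROM free_numbers WHERE num = n;
        RETURN n;
    END IF;
    RETURN gseq_nextval(t);   -- otherwise fall back to the counter row
END;
' LANGUAGE plpgsql;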
On Thu, 2006-08-17 at 15:07, Merlin Moncure wrote:
> On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> >
> > > > Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE;
> > > > The first transaction to obtain the gapless sequence value will establish
> > > > a lock on the "tax_id" row. The other transaction will block until
> > > > the first transaction finishes (and the row is updated) and will
> > > > then establish the row lock on it.
> > >
> > > yes, you are right...i didnt think the problem through properly.
> >
> > Let's just hope the performance on a concurrent system is not a
> > requirement of such a system...
>
> right, if the transactions are long running, there is a big problem as
> they are serialized around access to the sequence. however this is
> better than the control record approach because control records have
> problems with mvcc bloat. concurrent performance will of course be
> awful.
>
> a good compromise in some cases is to save off canceled transaction
> ids in a free list. you would still have to deal with transactions
> that were not gracefully cancelled though.

Is it not possible in some circumstances to create the invoice first, THEN assign a sequential ID after creation?
On Thu, 2006-08-17 at 16:07 -0400, Merlin Moncure wrote:
> On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> >
> > > > Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE;
> > > > The first transaction to obtain the gapless sequence value will establish
> > > > a lock on the "tax_id" row. The other transaction will block until
> > > > the first transaction finishes (and the row is updated) and will
> > > > then establish the row lock on it.
> > >
> > > yes, you are right...i didnt think the problem through properly.
> >
> > Let's just hope the performance on a concurrent system is not a
> > requirement of such a system...
>
> right, if the transactions are long running, there is a big problem as
> they are serialized around access to the sequence. however this is
> better than the control record approach because control records have
> problems with mvcc bloat. concurrent performance will of course be
> awful.

This effect will be magnified if there are other long running transactions (pg_dump and pre 8.2 vacuum, I'm looking at you), as the dead tuples from the updates will start to pile up, reads to the table slow down, and locks persist for longer...
On Thu, 2006-08-17 at 15:13 -0500, Scott Marlowe wrote:
> On Thu, 2006-08-17 at 15:07, Merlin Moncure wrote:
<snip/>
> > a good compromise in some cases is to save off canceled transaction
> > ids in a free list. you would still have to deal with transactions
> > that were not gracefully cancelled though.
>
> Is it not possible in some circumstances to create the invoice first,
> THEN assign a sequential ID after creation?

If speed of access was an issue, that's how I'd look at doing it - batch assign them after the fact. Brad.
On 8/17/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 8/17/06, Brad Nicholson <bnichols@ca.afilias.info> wrote:
> > > > Hmm, I think you are wrong. There is a SELECT ... FOR UPDATE;
> > > > The first transaction to obtain the gapless sequence value will establish
> > > > a lock on the "tax_id" row. The other transaction will block until
> > > > the first transaction finishes (and the row is updated) and will
> > > > then establish the row lock on it.
> > >
> > > yes, you are right...i didnt think the problem through properly.
> >
> > Let's just hope the performance on a concurrent system is not a
> > requirement of such a system...
>
> right, if the transactions are long running, there is a big problem as
> they are serialized around access to the sequence. however this is
> better than the control record approach because control records have
> problems with mvcc bloat. concurrent performance will of course be
> awful.
>
> a good compromise in some cases is to save off canceled transaction
> ids in a free list. you would still have to deal with transactions
> that were not gracefully cancelled though.

I believe there is no ON ROLLBACK trigger... One might periodically check the table for "gaps" and put them into a "reuse me" table, say with a left outer join between generate_series and the real table, within the "last known continuous id" and "max(id) from table". However, if bloat of the "gapless sequence" table during long running transactions is a problem, the other approach might be minimizing the size of this "gapless sequence" table, say:

CREATE TABLE gapless_seq (
    gseq_value int NOT NULL
);
INSERT INTO gapless_seq VALUES (1);

...and then SELECT gseq_value FROM gapless_seq FOR UPDATE and UPDATE gapless_seq SET gseq_value = gseq_value+1; ...this saves a few bytes per update. If we keep only a single row, there is no need for an index (there is no visibility information in the index, so an index would be useless in our case), and no need for unique constraint checking (assuming we are sure no one will ever insert additional rows -- it would be a disaster -- CREATE RULE to ignore any INSERTs/DELETEs just to be sure). Of course this does not solve the concurrency issue, but I'm assuming that if someone wants a gapless sequence, she wants its generation serialized (or would assign sequences in post-transaction batches). Regards, Dawid
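A sketch of that gap check, for a hypothetical "invoice" table with a gapless "id" column:

SELECT g.n AS missing_id
  FROM generate_series(1, (SELECT max(id) FROM invoice)) AS g(n)
  LEFT OUTER JOIN invoice i ON i.id = g.n
 WHERE i.id IS NULL;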
The technique for a single part gapless sequence and a two-part gapless sequence has been published today at http://www.varlena.com/GeneralBits. I'd be interested to hear of high concurrency usage that would break it or be notably slow. --elein elein@varlena.com
elein <elein@varlena.com> writes: > The technique for a single part gapless sequence and a two-part gapless > sequence has been published today at http://www.varlena.com/GeneralBits. > > I'd be interested to hear of high concurrency usage that would break it > or be notably slow. > > --elein > elein@varlena.com Thanks for the article! It will be really helpful since GeneralBits is a reference point for PostgreSQL users! :-) -- Jorge Godoy <jgodoy@gmail.com>