Thread: Logical locking beyond pg_advisory
I need a mechanism of "logical locking" more flexible than the pg_advisory family.
I'm thinking of a table ("lock_table") that would be part of the database, with these columns:
* tablename varchar - name of the table "locked"
* rowid integer - id of the row "locked"
* ownerid varchar - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks after a certain time
and a group of functions:
1) lock_table(tablename varchar, ownerid varchar) bool - try to lock the entire table, setting rowid to zero
2) unlock_table(tablename varchar, ownerid varchar) bool - unlock the table, if the caller is the recorded owner
3) locked_table(tablename varchar, ownerid varchar) bool - ask whether the table is locked by some user other than the ownerid argument
4) lock_row(tablename varchar, rowid integer, ownerid varchar) bool - similar to pg_try_advisory_lock
5) unlock_row(tablename varchar, rowid integer, ownerid varchar) bool - similar to pg_advisory_unlock
6) unlock_all(ownerid varchar) bool - release all locks owned by ownerid
The timeout (default, maybe 15 minutes) is applied implicitly if the lock is held by another user (there will be no notification).
Redundant locks are not queued; they simply return true, possibly after an update of the acquired column.
A successful lock inserts a new row, except in the rare case of a timeout, which becomes an update (of ownerid and acquired).
Unlock operations delete the corresponding row.
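A minimal sketch of what the table and one of the functions might look like, under one possible reading of the spec above (the interaction between a whole-table lock at rowid = 0 and individual row locks is left out, and the 15-minute timeout is hard-coded):

```sql
-- Sketch only: the lock table and lock_row() as described above.
CREATE TABLE lock_table (
    tablename varchar     NOT NULL,
    rowid     integer     NOT NULL,                    -- 0 means "whole table"
    ownerid   varchar     NOT NULL,
    acquired  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tablename, rowid)
);

CREATE FUNCTION lock_row(p_table varchar, p_rowid integer, p_owner varchar)
RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO lock_table (tablename, rowid, ownerid)
    VALUES (p_table, p_rowid, p_owner)
    ON CONFLICT (tablename, rowid) DO UPDATE
        SET ownerid = EXCLUDED.ownerid, acquired = now()
        WHERE lock_table.ownerid = EXCLUDED.ownerid               -- redundant lock: refresh
           OR lock_table.acquired < now() - interval '15 minutes'; -- expired: take over
    -- FOUND is true if a row was inserted or updated, false if the
    -- lock is held by someone else and has not timed out.
    RETURN FOUND;
END;
$$;
```

The ON CONFLICT upsert makes the insert-or-steal decision atomic, so two sessions racing for the same lock cannot both succeed.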
My question is threefold:
a) What is your opinion of the project?
b) What are the consequences of the large number of inserts and deletions?
c) Performance. pg_advisory* implies a network round trip, but (I think) no table operations.
TIA
On Sun, 16 Sep 2018 at 17:53, marcelo <marcelo.nicolet@gmail.com> wrote:
> [snip]
> My question is double
> a) What is the opinion on the project?
It would be nice if you explained more about what kind of problem you want to solve.
> b) What are the consequences of the large number of inserts and deletions
The first thing came to my mind with this approach is table bloat.
> c) Performance. In fact, pg_advisory* implies a network roundtrip, but (I think) no table operations.
Yeap... no table operations.
Regards,
--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 17/09/2018 03:19, Fabrízio de Royes Mello wrote:
> Would be nice if you explain more about what kind of problem you want to solve.
There are two main problems to solve:
a) Generally speaking, I'd like to get some form of "exclusive access" to a row before updating or deleting it. None of the optimistic/pessimistic automatic variants of concurrency management satisfies me. Nor, of course, the "versioning" variants.
b) Some of the tables I'm working on have a "number" column (not the PK) whose values sometimes come from external sources, but sometimes must be assigned by the system. This could be solved in two main ways:
b.1) Use a trigger to get the famous "max(n) + 1". In at least one of the cases, the automatically generated number must fall into different ranges conditioned by the value of another column, and the ranges vary between versions of the database.
b.2) "Lock the entire table", get the next number for the correct range programmatically, assign it, and free the table lock.
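For case (b), one common alternative (not mentioned in the thread) avoids locking the whole table: keep a small counters table with one row per range, and bump it with a plain UPDATE. The ordinary row lock the UPDATE takes serializes concurrent assignments for that range only. A sketch, with a hypothetical doc_counters table and range key:

```sql
-- Hypothetical counters table, one row per number range.
CREATE TABLE doc_counters (
    range_key varchar PRIMARY KEY,
    last_n    integer NOT NULL
);

-- Run inside the same transaction as the INSERT that uses the number:
UPDATE doc_counters
   SET last_n = last_n + 1
 WHERE range_key = 'invoices-2018'   -- hypothetical range key
RETURNING last_n;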
Of course, and beforehand, all database manipulation is done through applications.
Until yesterday I was working with the "advisory" family. Having a bigint as the only "lock identifier", I was working with a hash of the table name XORed with the id of the row (or zero for the entire table). (All my tables have an auto-sequential integer id as PK.)
Even if I found a very robust hash algorithm for the table name, I cannot rule out a collision once the id is XORed in. I tested five or six table names, each across 20000 ids, without a collision. But...
Of course, I need the "full table lock" for inserts, so that is a concern quite separate from updates and deletions. But...
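One way to sidestep the XOR-collision worry: the advisory functions also come in a two-argument (int4, int4) form, so the table-name hash and the row id can occupy separate key spaces instead of being folded together. For example, using hashtext(), PostgreSQL's internal text-hash function (it returns an int4; it is not formally documented, but it is commonly used for exactly this):

```sql
-- Row lock: table-name hash as key1, row id as key2 (0 = whole table).
SELECT pg_try_advisory_lock(hashtext('mytable'), 42);

-- Release it again with the same pair:
SELECT pg_advisory_unlock(hashtext('mytable'), 42);
```

With this scheme a collision requires two table names hashing to the same int4; row ids can no longer collide with each other, since they are passed through exactly.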
TIA
On Sun, Sep 16, 2018 at 3:53 PM marcelo <marcelo.nicolet@gmail.com> wrote:
> [snip]

Why can't you use the advisory lock functions? The challenge with manually managed locks is that they are slow, and you lose the coordination the database provides you. For example, if your application crashes, you will have to clean up all held locks yourself. Building out that infrastructure will be difficult.

merlin
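The crash-cleanup point is where advisory locks shine: session-scoped advisory locks disappear when the connection drops, and the transaction-scoped variants release at commit or rollback, which is exactly the cleanup machinery a manual lock table would have to rebuild. A quick illustration:

```sql
-- Session-scoped: held until explicitly unlocked or the connection drops.
SELECT pg_try_advisory_lock(12345);
SELECT pg_advisory_unlock(12345);

-- Transaction-scoped: released automatically at COMMIT or ROLLBACK,
-- with no unlock call (and no orphaned lock if the client crashes mid-transaction).
BEGIN;
SELECT pg_try_advisory_xact_lock(12345);
COMMIT;
```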
On Mon, Sep 17, 2018 at 5:09 PM Merlin Moncure <mmoncure@gmail.com> wrote:
> [snip]
> Why can't you use the advisory lock functions? The challenge with
> manually managed locks is that they are slow and you will lose the
> coordination the database provides you. For example, if your
> application crashes you will have to clean up all held locks yourself.
> Building out that infrastructure will be difficult.
First, I think in an ideal world you wouldn't handle this problem with either approach, but sometimes you have to.
I have done both approaches, actually. LedgerSMB uses its own lock table because locks have to persist across multiple HTTP requests, and we have various automatic cleanup processes.
When I was working on the queue management stuff at Novozymes, we used advisory locks extensively.
These two approaches have serious downsides:
1. Lock tables are *slow* and require carefully thinking through cleanup scenarios. In LedgerSMB we tied locks to the application session, with an ON DELETE event that would unlock the row. We estimated that for every 2 seconds the db spent doing useful work, it spent 42 seconds managing the locks. Additionally, the fact that locks take effect only as the snapshot advances is a problem here.
2. In my talk "PostgreSQL at 10TB and Beyond" I talk about a problem we had using advisory locks for managing rows that were being processed for deletion. Since the deletion was a scan for items at the head of an index, under heavy load we could spend long enough checking dead rows that the locks could go away while our snapshot failed to advance. This would result in duplicate processing. So the fact that advisory locks don't really follow snapshot semantics is a really big problem here, since it means you can have race conditions with advisory locks that can't happen with other locking mechanisms. I still love advisory locks, but they are not a good tool for this.
The real solution, most of the time, is actually to lock the rows with SELECT ... FOR UPDATE, possibly with SKIP LOCKED. The way update/delete row locking works in PostgreSQL is usually good enough except in a few rare edge cases. Only in *very rare* cases do lock tables or advisory locks make sense for actual row processing.
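The queue-style pattern described above, sketched against a hypothetical jobs table (table and column names are illustrative, not from the thread):

```sql
-- Claim one pending job. Rows already locked by other workers are
-- skipped rather than waited on; the row lock disappears with the
-- transaction, so a crashed worker leaves nothing to clean up.
BEGIN;

SELECT id, payload
  FROM jobs
 WHERE status = 'pending'
 ORDER BY id
 LIMIT 1
   FOR UPDATE SKIP LOCKED;

-- ... process the claimed row, then mark it done using the id returned:
-- UPDATE jobs SET status = 'done' WHERE id = $1;

COMMIT;
```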
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
On 17/09/2018 12:21, Chris Travers wrote:
I'm using an ORM (Devart's) to access the database, so I cannot "SELECT ... FOR UPDATE". The application paradigm is that a user has a list of records (after a query) and she can update or delete any of them as the business rules allow. So, at least an advisory lock is a must.
I'm convinced by now: I'll stay with advisory locks... hoping no app crash occurs...
Thank you all.
Marcelo
On Mon, Sep 17, 2018 at 6:04 PM marcelo <marcelo.nicolet@gmail.com> wrote:
> I'm using an ORM (Devart's) to access the database, so I cannot "SELECT ... FOR UPDATE". The application paradigm is that a user has a list of records (after a query) and she can update or delete any of them as the business rules allow. So, at least an advisory lock is a must.
> I'm convinced by now: I'll stay with advisory locks... hoping no app crash occurs...
I would say to fix this in the ORM rather than reinvent in the application what the database already gives you.
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
On 17/09/2018 14:27, Chris Travers wrote:
> [snip]

You are right. But you know...