Re: pg_advisory_lock problem - Mailing list pgsql-general

From Merlin Moncure
Subject Re: pg_advisory_lock problem
Date
Msg-id CAHyXU0xoHSs2qQoe9zkXpMJUwxwvd053ohyscsvTWbfMEukPKg@mail.gmail.com
Whole thread Raw
In response to Re: pg_advisory_lock problem  (Rémi Cura <remi.cura@gmail.com>)
Responses Re: pg_advisory_lock problem  (Rémi Cura <remi.cura@gmail.com>)
List pgsql-general
On Tue, Aug 12, 2014 at 5:07 AM, Rémi Cura <remi.cura@gmail.com> wrote:
>
>
>
> 2014-08-11 22:48 GMT+02:00 Kevin Grittner <kgrittn@ymail.com>:
>
>> Rémi Cura <remi.cura@gmail.com> wrote:
>>
>> > as you (both?) suggested it works using advisory lock used at the
>> > beginning and end of the transaction. This way there is no upsert
>> > at all if the element is locked? (I used general advisory lockbut
>> > in the same way as transactionnal lock)
>>
>> This is too vague to comment on.
>>
>> > The issue in this case is simple : I have to use about 100k
>> > advisory locks, which is a big memory requirement for my hardware
>> > :-(
>>
>> ... and that doesn't seem to make any sense.  Either you are not
>> understanding advisory locks or you are doing something very, very
>> unusual.
>>
>> > Merlin I'm afraid I don't understand what is "vanilla" LOCK
>> > TABLE.
>>
>> See the LOCK TABLE command.
>>
>> http://www.postgresql.org/docs/current/interactive/sql-lock.html
>>
>> http://www.postgresql.org/docs/current/interactive/explicit-locking.html
>>
>> > I can't really use a lock table because each query upsert
>> > sequentially into 3 tables, doing lots of computing between.
>>
>> Now *that* I understand.  :-)  It's not an unusual requirement,
>> but can be a challenge when using snapshot isolation (where writes
>> don't block reads and reads don't block anything).  There are two
>> main approaches -- introduce blocking to serialize some of the
>> operations, or use the SERIALIZABLE transaction isolation level to
>> ensure that the behavior of all concurrent transactions is
>> consistent with the behavior you would see if they were run one at
>> a time.  The latter approach doesn't introduce any new blocking,
>> but it can cause transactions to get an ERROR with a SQLSTATE of
>> 40001 at just about any point, so you need to be prepared to
>> recognize that and retry those transactions from the beginning (not
>> just the last statement of the transaction), ignoring any data read
>> during the failed attempt.
>>
>> You may want to read the entire chapter on concurrency control:
>>
>> http://www.postgresql.org/docs/current/interactive/mvcc.html
>>
>> If you are considering using SERIALIZABLE transactions, you should
>> probably review the examples in the Wiki, to get an idea of how it
>> behaves in various cases:
>>
>> http://wiki.postgresql.org/wiki/SSI
>>
>> > I use parallel query to compute faster (load dividing). I guess
>> > it would be very slow with about 8 parallel queries with locks.
>>
>> Well, if you introduce blocking you reduce your parallelism, but if
>> you use serializable transactions and there are actually a lot of
>> conflicts you can see poor performance because of the errors
>> rolling back transactions and the need to retry them from the
>> start.  The techniques used to implement serializable transactions
>> in PostgreSQL are basically a refinement of the Optimistic
>> Concurrency Control (OCC) techniques, but generally with far fewer
>> retries needed -- the point being that it optimistically assumes
>> that there will not be a conflict so that concurrency is better,
>> but has to cancel things if that optimism proves to be unfounded.
>>
>> To make related to changes to multiple tables and maintain coherent
>> data, you probably will need to do one or the other.
>>
>> --
>> Kevin Grittner
>> EDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
>
>
> Hey, thanks for your detailed answer.
>
> The particularity here is that I use a big query with CTE instead of a more
> procedural way.
> I do sophisticated geometric computing using postGIS. I guess it is a hack
> of both postgres and postgis.
>
> I explain better the pg_advisory locks uses I have tried.
>
> First classic use of pg_advisory, not working :
> CTE_1 (find what rows will be upserted in table_1)
> CTE_2 (find what rows will be upserted in table_2)
> CTE_3 (find what rows will be upserted in table_3)
> CTE_4 (compute the result to be upserted into table_1)
> CTE_5 (upsert into table_1 using custom upsert_function)
> CTE_6 (compute the result to be upserted into table_2)
> CTE_7 (upsert into table_2 using custom upsert_function)
> CTE_8 (compute the result to be upserted into table_2)
> CTE_9 (upsert into table_2 using custom upsert_function)
> CTE_10 (end of query)
> each of the upserting function is plpgsql and do something like
> pg_advisory_lock(table_number, id of row to be upserted)
> with updating AS (update table)
> insert into table if not updated
> pg_advisory_unlock(table_number,id of row to be upserted)
>
> According to what the list said, it doesn't work because of visibility
> issues : the locking work, so we know each processes will upsert the same
> thing sequentially. However it will fail because each process has no
> visibility on the insert done by the others. So when the second process
> upsert the same thing, it will try to insert , and we get 2 inserts for the
> same row
>
> Second non classic use of pg_adivsory, working but too taxing on resources
> CTE_1 (find what rows will be upserted in table_1,
> pg_try_advisory_lock(1,id1)  )
> CTE_2 (find what rows will be upserted in table_2,
> pg_try_advisory_lock(2,id2)  )
> CTE_3 (find what rows will be upserted in table_3,
> pg_try_advisory_lock(3,id3)  )
> CTE_4 (compute the result to be upserted into table_1)
> CTE_5 (upsert into table_1 using custom upsert_function, only if
> pg_try_advisory_lock was true )
> CTE_6 (compute the result to be upserted into table_2)
> CTE_7 (upsert into table_2 using custom upsert_function, only if
> pg_try_advisory_lock was true)
> CTE_8 (compute the result to be upserted into table_2)
> CTE_9 (upsert into table_2 using custom upsert_function, only if
> pg_try_advisory_lock was true)
> CTE_10 (release locks for (1,id1) , (2,id2) , (3,id3)) )
> CTE_10 (end of query)
>
> This works because we don't even try to upsert a row that is already being
> taken care by another process. So there is no visibility issue;
> The problem is with ressources, I have to use 100k locks for several
> minutes. It grezatly increase memory use.
>
> The difference between the 2 is essentially : in first case we lock each
> individual upsert statement. In second case we lock at the query level.
>
> On the bright side, I don't need to wait for the insert in table_1 to
> compute table_2, same for table_2 and table_3.
> However I can't control when the CTE upserting into each table is executed
> (as far as I know, there is no guarantee of execution order when using
> multiple cte).
>
> Now about the lock of tables: I guess it wouldn't work for the same reasons
> at the individual upsert statement.
> If I use it at the whole query level, other parallel process will be waiting
> for the whole query to finish before being allowed to go on?
> Moreover, whatever LOCK I use will be within the same big transaction, so
> the only solution may be :
>
> lock table_1,table_2,table_3;
> execute big querry;
> unlock table_1, table_2, table_3;
>
> I didn't find any "vanilla" lock. Is "vanilla"  some kind of postgres slang?

'vanilla' is common english slang for 'regular'.   Basically I was
referring to the built in lock table statement.  LOCK has no
corresponding 'UNLOCK'; you lock a table or tables then locks are
released for you at the end of transaction.  What you are trying to
do, but isn't really possible, is to 'temporarily' lock an object
inside a transaction.  Transactions group related work and must be
handled as one unit.

The default lock of LOCK TABLE is very heavy but you can pass an
optional lockmode if you want to allow readers.   Definitely read this
chapter: http://www.postgresql.org/docs/9.3/static/explicit-locking.html.
The point I was making is that using advisory locks to block access to
a table usually isn't a good idea; just use the built in locks.

merlin


pgsql-general by date:

Previous
From: Rémi Cura
Date:
Subject: Re: pg_advisory_lock problem
Next
From: Jose Moreira
Date:
Subject: Inserting large binary data into lo type table