Thread: Avoiding deadlocks when performing bulk update and delete operations

Avoiding deadlocks when performing bulk update and delete operations

From
Sanjaya Vithanagama
Date:
Hi All,

We have a single table which does not have any foreign key references.

id_A (bigint)
id_B (bigint)
val_1 (varchar)
val_2 (varchar)

The primary key of the table is a composite of id_A and id_B.

Reads and writes of this table are highly concurrent and the table has millions of rows. We have several stored procedures which do mass updates and deletes. Those stored procedures are being called concurrently mainly by triggers and application code.

The operations usually look like the following where it could match thousands of records to update or delete:

DELETE FROM table_name t
USING (
   SELECT id_A, id_B
   FROM   table_name 
   WHERE  id_A = ANY(array_of_id_A)
   AND    id_B = ANY(array_of_id_B)
   ORDER  BY id_A, id_B
   FOR    UPDATE
   ) del
WHERE  t.id_A = del.id_A
AND    t.id_B = del.id_B;


UPDATE table_name t
SET    val_1 = 'some value'
     , val_2 = 'some value'
FROM (
   SELECT id_A, id_B
   FROM   table_name 
   WHERE  id_A = ANY(array_of_id_A)
   AND    id_B = ANY(array_of_id_B)
   ORDER  BY id_A, id_B
   FOR    UPDATE
   ) upd
WHERE  t.id_A = upd.id_A
AND    t.id_B = upd.id_B;

We are experiencing deadlocks and all our attempts to perform operations with locks (row level using SELECT FOR UPDATE as used in the above queries and table level locks) do not seem to solve these deadlock issues. (Note that we cannot in any way use access exclusive locking on this table because of the performance impact)

Is there another way that we could try to solve these deadlock situations? The reference manual says — "The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order."

Is there a guaranteed way to do bulk update/delete operations in a particular order so that we can ensure deadlocks won't occur? Or are there any other tricks to avoid deadlocks in this situation?

Thank you in advance,
Sanjaya

Re: Avoiding deadlocks when performing bulk update and delete operations

From
Gavin Flower
Date:
On 24/11/14 16:51, Sanjaya Vithanagama wrote:
> Hi All,
>
> We have a single table which does not have any foreign key references.
>
> id_A (bigint)
> id_B (bigint)
> val_1 (varchar)
> val_2 (varchar)
>
> The primary key of the table is a composite of id_A and id_B.
>
> Reads and writes of this table are highly concurrent and the table has
> millions of rows. We have several stored procedures which do mass
> updates and deletes. Those stored procedures are being called
> concurrently mainly by triggers and application code.
>
> The operations usually look like the following where it could match
> thousands of records to update or delete:
>
> DELETE FROM table_name t
> USING (
>    SELECT id_A, id_B
>    FROM   table_name
>    WHERE  id_A = ANY(array_of_id_A)
>    AND    id_B = ANY(array_of_id_B)
>    ORDER  BY id_A, id_B
>    FOR    UPDATE
>    ) del
> WHERE  t.id_A = del.id_A
> AND    t.id_B = del.id_B;
>
>
> UPDATE table_name t
> SET    val_1 = 'some value'
>      , val_2 = 'some value'
> FROM (
>    SELECT id_A, id_B
>    FROM   table_name
>    WHERE  id_A = ANY(array_of_id_A)
>    AND    id_B = ANY(array_of_id_B)
>    ORDER  BY id_A, id_B
>    FOR    UPDATE
>    ) upd
> WHERE  t.id_A = upd.id_A
> AND    t.id_B = upd.id_B;
>
> We are experiencing deadlocks and all our attempts to perform
> operations with locks (row level using SELECT FOR UPDATE as used in
> the above queries and table level locks) do not seem to solve these
> deadlock issues. (Note that we cannot in any way use access exclusive
> locking on this table because of the performance impact)
>
> Is there another way that we could try to solve these deadlock
> situations? The reference manual says — "The best defense against
> deadlocks is generally to avoid them by being certain that all
> applications using a database acquire locks on multiple objects in a
> consistent order."
>
> Is there a guaranteed way to do bulk update/delete operations in a
> particular order so that we can ensure deadlocks won't occur? Or are
> there any other tricks to avoid deadlocks in this situation?
>
> Thank you in advance,
> Sanjaya
Unless there is some sort of implied locking, or other nonsense like
different lock types, always acquiring locks in the same order
should work - as far as I can tell.

For purely locking problems, and assuming that all the relevant tables
are locked:

If process p1 acquires locks in the order A, B, & C
and process p2 acquires locks in the order A, C, & D,
then as soon as one process grabs A, the other process cannot grab
A nor the other locks - so deadlock is avoided.

Similarly:
p1 A, B, C, & D
p2 B & C
However, if p1 grabs A and then p2 grabs B, p1 will have to wait for p2
to finish before p1 continues - but still, neither is deadlocked.  Even
if there is p3 which locks B - at worst 2 processes will wait until the
lucky first process releases its locks.

You may have problems if there is some resource in contention,
where 2 processes each require the resource and grab it in several parts at
different times, and they both grab some, and then there is not enough left
to completely satisfy either - this is guesswork, I'm not sure what
resources (if any) would be a problem here.
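
As a concrete illustration of the ordering point (a hypothetical two-row table,
with the two sessions shown as an interleaved timeline):

-- CREATE TABLE t (id int PRIMARY KEY, v int);   -- hypothetical table
-- session 1: BEGIN;
-- session 1: UPDATE t SET v = 1 WHERE id = 1;   -- locks row 1
-- session 2: BEGIN;
-- session 2: UPDATE t SET v = 2 WHERE id = 2;   -- locks row 2
-- session 1: UPDATE t SET v = 1 WHERE id = 2;   -- blocks, waiting on session 2
-- session 2: UPDATE t SET v = 2 WHERE id = 1;   -- each now waits on the other:
--                                               -- ERROR: deadlock detected
-- Had both sessions updated id = 1 before id = 2, the second session would
-- simply have queued behind the first and both would have completed.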


Cheers,
Gavin



Re: Avoiding deadlocks when performing bulk update and delete operations

From
Igor Neyman
Date:

 

 

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sanjaya Vithanagama
Sent: Sunday, November 23, 2014 10:52 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations

 

<snip>

May be I’m missing something here, but it seems that you make the problem worse by using :

 

DELETE … USING (SELECT … FOR UPDATE)…

 

Can’t you just do:

 

DELETE FROM table_name

  WHERE id_A = ANY(array_of_id_A)

       AND id_B = ANY(array_of_id_B);

 

?

 

Regards,

Igor Neyman

 

How to individually list the DDL for all individual data base objects

From
Berend Tober
Date:
Is there a good way to individually list the DDL for all individual data
base objects?

Running a data base dump like:

  pg_dump mydatabase  > mydatabase-database.sql

produces one big file with all the DDL and DML to restore the data base,
which is very convenient for most cases.

Using that I have sometimes cut-and-pasted out of it the DDL for
individual items that I want to work on, and then fed that back in with

psql mydatabase < newstuff.sql


What I would like, though, is a set of output files, one for each
DDL/DML item currently represented in the one big file.

I could maybe attempt some convoluted automated parsing of the big file
with, for example, Perl, but that seems, well, convoluted, error-prone,
and probably fragile.

The directory dump output option for pg_dump is similar to, but not
really, what I want (as far as I can see, the individual files it produces
are only the ones for reloading data, so correct me if I am wrong ...
and please show me how to do it right!)

I have played around with the custom format dump followed by pg_restore
and various options, but did not get what I wanted, at least not as
elegantly as I wanted.

What I have come up with is fairly simple in appearance, but the way it
works (reading one line at a time from the list file associated with
the dump file, then running pg_restore with just that one line, rinse
and repeat for each piece of DDL/DML) also seems convoluted and
potentially fragile.


Something along the lines of (... if anyone thinks this a good idea, or
good starting point ...):


# listfile is the dump's table of contents, e.g.: pg_restore -l dumpfile > listfile
grep -v '^;' listfile | while read a b c n
     do
       a=${a/;}                                   # strip the trailing ';' from the TOC id
       echo $a > f                                # build a one-entry list file
       pg_restore -L f -f outputdir/$a dumpfile   # emit just that object's DDL/DML
     done

This, as it is, creates a set of files named according to the id number
that pg_dump uses to identify each element. Ideally, I would like the
files named after the schema+object it represents.

Thanks for your help!





Re: How to individually list the DDL for all individual data base objects

From
François Beausoleil
Date:
On 2014-11-24 at 10:14, Berend Tober <btober@broadstripe.net> wrote:

> Is there a good way to individually list the DDL for all individual data base objects?
>

<snip>

> grep -v '^;' listfile | while read a b c n
>    do
>      a=${a/;}
>      echo $a > f
>      pg_restore -L f -f outputdir/$a dumpfile
>    done
>
> This, as it is, creates a set of files named according to the id number that pg_dump uses to identify each element.
> Ideally, I would like the files named after the schema+object it represents.

Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump
file?

$ pg_restore --help
…

  -f, --file=FILENAME      output file name
…
  -t, --table=NAME         restore named table
…

Such that you could run:

$ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump

Unfortunately, this does not respect dependencies and you may have issues. The --disable-triggers option can alleviate
some of these problems. YMMV.

Hope that helps!
François



Re: How to individually list the DDL for all individual data base objects

From
Alvaro Herrera
Date:
Berend Tober wrote:
> Is there a good way to individually list the DDL for all individual data
> base objects?

These threads might interest you:

http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0YS78txHFmdtyxJfsrsRcLqN@mail.gmail.com
http://www.postgresql.org/message-id/CAASwCXdKROy2UHLof4Zvxx00goSccqXkg_g4Yf9vOuK8aXWh0g@mail.gmail.com

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: How to individually list the DDL for all individual data base objects

From
Berend Tober
Date:
François Beausoleil wrote:
>
> On 2014-11-24 at 10:14, Berend Tober <btober@broadstripe.net> wrote:
>
>> Is there a good way to individually list the DDL for all individual data base objects?
>>
>
> <snip>
>
>
> Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom
> dump file?

Yes sir, thank you. That is what the script does.

But that little script seems fragile, being dependent upon the format of
the list file, and it runs (starts a new process for) pg_restore for every
individual line in the list file ... which seems like poor practice,
generally. Additionally, I'd like stability in the output file names,
rather than the (likely changing) internal, automatically
numerically-named items.

And, btw, I don't want just a single table.

The dependency issue is potentially a problem, but (as I believe) it
looks like the output of pg_restore in a list file is in the correct
order to process dependencies, so I could walk backwards or forwards
through that if changes to one object were dependent on, or caused
cascading effects in, other objects.



>
> $ pg_restore --help
> …
>
>    -f, --file=FILENAME      output file name
> …
>    -t, --table=NAME         restore named table
> …
>
> Such that you could run:
>
> $ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump
>
> Unfortunately, this does not respect dependencies and you may have issues. The --disable-triggers option can
> alleviate some of these problems. YMMV.
>
> Hope that helps!
> François
>





Re: How to individually list the DDL for all individual data base objects

From
Melvin Davidson
Date:
You should probably look at the pg_extractor utility.

https://github.com/omniti-labs/pg_extractor

With it, you can dump individual or selected objects to separate directories.

On Mon, Nov 24, 2014 at 11:00 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Berend Tober wrote:
> Is there a good way to individually list the DDL for all individual data
> base objects?

These threads might interest you:

http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0YS78txHFmdtyxJfsrsRcLqN@mail.gmail.com
http://www.postgresql.org/message-id/CAASwCXdKROy2UHLof4Zvxx00goSccqXkg_g4Yf9vOuK8aXWh0g@mail.gmail.com

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services





--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: How to individually list the DDL for all individual data base objects

From
Adrian Klaver
Date:
On 11/24/2014 08:12 AM, Berend Tober wrote:
> François Beausoleil wrote:
>>
>> On 2014-11-24 at 10:14, Berend Tober <btober@broadstripe.net> wrote:
>>
>>> Is there a good way to individually list the DDL for all individual
>>> data base objects?
>>>
>>
>> <snip>
>>
>>
>> Were you aware that pg_restore can restore to STDOUT, and output DDL
>> for only a single named object from a custom dump file?
>
> Yes sir, thank you. That is what the script does.
>
> But that little script seems fragilly-dependent upon the format of the
> list file and runs (starts a new process for) pg_restore for every
> individual line in the list file. ... which seems like poor practise,
> generally. Additionally, I'd like stability in the output file names,
> rather than the (likely changing) internal, automatically
> numerically-named items.
>
> And, btw, I want not just a single table.
>
> The dependance issue is potentially a problem, but (as I believe) it
> looks like the output of pg_restore in a list file is in the correct
> order to process dependencies, so I could walk backwards of forwards
> through that if changes to one object were dependent on, or caused
> cascading effects in other objects.

At this point I have to ask:

What is the problem you are trying to solve?

Your last paragraph seems to be reinventing the -Fc TOC, so I am not
sure where you are going with this?

>
>
>
>>
>> $ pg_restore --help
>> …
>>
>>    -f, --file=FILENAME      output file name
>> …
>>    -t, --table=NAME         restore named table
>> …
>>
>> Such that you could run:
>>
>> $ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump
>>
>> Unfortunately, this does not respect dependencies and you may have
>> issues. The --disable-triggers option can alleviate some of these
>> problems. YMMV.
>>
>> Hope that helps!
>> François
>>
>
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: How to individually list the DDL for all individual data base objects

From
Berend Tober
Date:
Adrian Klaver wrote:
> On 11/24/2014 08:12 AM, Berend Tober wrote:
>> François Beausoleil wrote:
>>>
>>> On 2014-11-24 at 10:14, Berend Tober <btober@broadstripe.net> wrote:
>>>
>>>> Is there a good way to individually list the DDL for all individual
>>>> data base objects?
>>>>
>>>
>>> <snip>
>>>
>>>
>>> Were you aware that pg_restore can restore to STDOUT, and output DDL
>>> for only a single named object from a custom dump file?
>>
>>    <snip> <snip>
>>
>> The dependance issue is potentially a problem, but (as I believe) it
>> looks like the output of pg_restore in a list file is in the correct
>> order to process dependencies, so I could walk backwards of forwards
>> through that if changes to one object were dependent on, or caused
>> cascading effects in other objects.
>
> At this point I have to ask:
>
> What is the problem you are trying to solve?
>
> Your last paragraph seems to be reinventing the -Fc TOC, so I am not
> sure where you going with this?


On the small scale, I wanted to have an individual file listing the DDL
for each data base object in case I need to modify the object
... I could start with the script that created it as a basis for
modifications, like for views and functions, etc. On the larger scale, I
was thinking I would like to check in all of the individual modules to
revision control, retaining fine-grained control, rather than
submitting the one big comprehensive file.
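
(As an aside, for functions and views specifically the server can hand the
definitions back directly; a rough sketch, not a replacement for the
dump-based approach, and it only covers those object types:)

SELECT n.nspname AS schema,
       p.proname AS name,
       pg_get_functiondef(p.oid) AS ddl
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
WHERE  n.nspname NOT IN ('pg_catalog', 'information_schema')
AND    NOT p.proisagg;          -- pg_get_functiondef() errors on aggregates

-- and similarly pg_get_viewdef(oid) for views (it returns the view's SELECT,
-- not the full CREATE VIEW statement).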







Re: How to individually list the DDL for all individual data base objects

From
Berend Tober
Date:
Melvin Davidson wrote:
> You should probably look at the pg_extractor utility.
>
> https://github.com/omniti-labs/pg_extractor
>
> With it, you can dump individual or selected objects to separate
> directories.


That looks like what I'm looking for. (Note: I did do some Google searching,
but apparently did not use the right set of search terms, because this tool
really is described as doing exactly what I am looking to do!)

Thanks!






Re: Avoiding deadlocks when performing bulk update and delete operations

From
Bill Moran
Date:
On Mon, 24 Nov 2014 14:51:42 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:

> Hi All,
>
> We have a single table which does not have any foreign key references.
>
> id_A (bigint)
> id_B (bigint)
> val_1 (varchar)
> val_2 (varchar)
>
> The primary key of the table is a composite of id_A and id_B.
>
> Reads and writes of this table are highly concurrent and the table has
> millions of rows. We have several stored procedures which do mass updates
> and deletes. Those stored procedures are being called concurrently mainly
> by triggers and application code.
>
> The operations usually look like the following where it could match
> thousands of records to update or delete:
>
> DELETE FROM table_name t
> USING (
>    SELECT id_A, id_B
>    FROM   table_name
>    WHERE  id_A = ANY(array_of_id_A)
>    AND    id_B = ANY(array_of_id_B)
>    ORDER  BY id_A, id_B
>    FOR    UPDATE
>    ) del
> WHERE  t.id_A = del.id_A
> AND    t.id_B = del.id_B;
>
>
> UPDATE table_name t
> SET    val_1 = 'some value'
>      , val_2 = 'some value'
> FROM (
>    SELECT id_A, id_B
>    FROM   table_name
>    WHERE  id_A = ANY(array_of_id_A)
>    AND    id_B = ANY(array_of_id_B)
>    ORDER  BY id_A, id_B
>    FOR    UPDATE
>    ) upd
> WHERE  t.id_A = upd.id_A
> AND    t.id_B = upd.id_B;
>
> We are experiencing deadlocks and all our attempts to perform operations
> with locks (row level using SELECT FOR UPDATE as used in the above queries
> and table level locks) do not seem to solve these deadlock issues. (Note
> that we cannot in any way use access exclusive locking on this table
> because of the performance impact)
>
> Is there another way that we could try to solve these deadlock situations?
> The reference manual says — "The best defense against deadlocks is
> generally to avoid them by being certain that all applications using a
> database acquire locks on multiple objects in a consistent order."
>
> Is there a guaranteed way to do bulk update/delete operations in a
> particular order so that we can ensure deadlocks won't occur? Or are there
> any other tricks to avoid deadlocks in this situation?

Lots of stuff to say about this ...

First off, Igor has a valid point that the subselects are not helping any
and may be making the situation slightly worse. I can't see any reason not
to simplify the queries as he suggested.

Secondly, a lot of your comments are too vague for me to understand what's
happening, so I'm going to ask a bunch of questions to clarify:

* How many UPDATE/INSERT queries are you running per second?
* How many in parallel on average?
* What's the typical execution time for an UPDATE/INSERT that might cause
  this problem?
* How frequently do deadlocks occur?
* Are there other tables involved in the transactions ... i.e., have you
  confirmed that these are the _only_ tables causing the deadlock?

Since you didn't include any log output, I'm fuzzy on some of those things
above ... but I'm assuming that you're unable to post specific details of
the precise problem.

I have a lot of suggestions, but instead of bombing you with all of them, I
think it would be better if you answered those questions, which will tell
me which suggestions are most likely to help.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com


Re: Avoiding deadlocks when performing bulk update and delete operations

From
Sanjaya Vithanagama
Date:


On Tue, Nov 25, 2014 at 4:42 AM, Bill Moran <wmoran@potentialtech.com> wrote:
On Mon, 24 Nov 2014 14:51:42 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:

<snip>

Lots of stuff to say about this ...

First off, Igor has a valid point that the subselects are not helping any
and may be making the situation slightly worse. I can't see any reason not
to simplify the queries as he suggested.

We used sub-selects in the delete queries with the hope that we could lock all the rows that will be deleted before the actual deletion happens. (So that another transaction won't grab the lock for a row which will be deleted.)
 

Secondly, a lot of your comments are too vague for me to understand what's
happening, so I'm going to ask a bunch of questions to clarify:

* How many UPDATE/INSERT queries are you running per second?

In peak hours it could be anywhere from 100-250. The problem occurs when two triggers happen to update/delete the same rows at once.
 
* How many in parallel on average?
* What's the typical execution time for an UPDATE/INSERT that might cause
  this problem?

The updates are the most problematic with the execution time being in the range of 5-50 seconds. 
 
* How frequently do deadlocks occur?

We are seeing deadlocks about 2-3 times per day in the production server. To reproduce the problem easily we've written a simple Java class with multiple threads calling to the stored procedures running the above queries inside a loop. This way we can easily recreate a scenario that happens in the production.
 
* Are there other tables involved in the transactions ... i.e., have you
  confirmed that these are the _only_ tables causing the deadlock?

Yes, there are no other tables involved with the queries so we can eliminate any deadlock issues related to foreign key references. 
 

Since you didn't include any log output, I'm fuzzy on some of those things
above ... but I'm assuming that you're unable to post specific details of
the precise problem.

The log output looks like the following: (I have abstracted away the function names and real queries, but this represents the actual issue)

ERROR:  deadlock detected
DETAIL:  Process 54624 waits for ShareLock on transaction 14164828; blocked by process 54605.
Process 54605 waits for ShareLock on transaction 14164827; blocked by process 54624.
Process 54624: SELECT 1 FROM proc_delete()
Process 54605: SELECT 1 FROM proc_update()
HINT:  See server log for query details.
CONTEXT:  SQL statement "UPDATE table_name t
SET    val_1 = 'some value'
      , val_2 = 'some value'
FROM (
    SELECT id_A, id_B
    FROM   table_name 
    WHERE  id_A = ANY(array_of_id_A)
    AND    id_B = ANY(array_of_id_B)
    ORDER  BY id_A, id_B
    FOR    UPDATE
    ) upd
WHERE  t.id_A = upd.id_A
AND    t.id_B = upd.id_B"
PL/pgSQL function proc_delete() line 22 at SQL statement
SQL statement "SELECT proc_delete()"
PL/pgSQL function calling_function() line 6 at PERFORM
STATEMENT:  SELECT 1 FROM calling_function()
ERROR:  current transaction is aborted, commands ignored until end of transaction block

 

I have a lot of suggestions, but instead of bombing you with all of them, I
think it would be better if you answered those questions, which will tell
me which suggestions are most likely to help.

It'll be really good if we can get some ideas/alternative suggestions on how to solve this one. It's been affecting our production servers for weeks and we still haven't come across a concrete solution which fixes it.

Thank You.
 

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com



--
Sanjaya

Re: Avoiding deadlocks when performing bulk update and delete operations

From
Bill Moran
Date:
On Wed, 26 Nov 2014 10:41:56 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
>
> > * How frequently do deadlocks occur?
>
> We are seeing deadlocks about 2-3 times per day in the production server.
> To reproduce the problem easily we've written a simple Java class with
> multiple threads calling to the stored procedures running the above queries
> inside a loop. This way we can easily recreate a scenario that happens in
> the production.

Don't overcomplicate your solution. Adjust your code to detect the deadlock
and replay the transaction when it happens. At 2-3 deadlocks per day, it's
difficult to justify any other solution (as any other solution would be
more time-consuming to implement, AND would interfere with performance).
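
To sketch that replay idea concretely (a hypothetical wrapper function, names
made up to match the earlier example; it assumes the statement can safely be
re-run on its own and that the deadlock does not involve locks taken earlier
in the same transaction; the same retry loop can equally live in the
application code around the procedure call):

CREATE OR REPLACE FUNCTION proc_update_with_retry(ids_a bigint[], ids_b bigint[])
RETURNS void AS $$
DECLARE
    attempts int := 0;
BEGIN
    LOOP
        BEGIN
            UPDATE table_name t
            SET    val_1 = 'some value'
                 , val_2 = 'some value'
            WHERE  t.id_A = ANY(ids_a)
            AND    t.id_B = ANY(ids_b);
            RETURN;                             -- success, stop retrying
        EXCEPTION WHEN deadlock_detected THEN
            attempts := attempts + 1;
            IF attempts >= 3 THEN
                RAISE;                          -- give up; let the caller see the deadlock
            END IF;
            PERFORM pg_sleep(random());         -- short random backoff before retrying
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;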

I've worked with a number of write-heavy applications that experienced
deadlocks, some of them on the order of hundreds of deadlocks per day.
In some cases, you can adjust the queries to reduce the incidence of
deadlocks, or eliminate the possibility of deadlocks completely.  The
situation that you describe is not one of those cases, as the planner
can choose to lock rows in whatever order it thinks it most efficient
and you don't have direct control over that.

The performance hit you'll take 2-3 times a day when a statement has to
be replayed due to deadlock will hardly be noticed (although a statement
that takes 50 seconds will cause raised eyebrows if it runs 2x) but that
will only happen 2-3 times a day, and the solution I'm proposing won't
have any performance impact on the other 13000000 queries per day that
don't deadlock.

2-3 deadlocks per day is normal operation for a heavily contended table,
in my experience.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com


Re: Avoiding deadlocks when performing bulk update and delete operations

From
Sanjaya Vithanagama
Date:


On Wed, Nov 26, 2014 at 11:47 AM, Bill Moran <wmoran@potentialtech.com> wrote:
On Wed, 26 Nov 2014 10:41:56 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
>
> > * How frequently do deadlocks occur?
>
> We are seeing deadlocks about 2-3 times per day in the production server.
> To reproduce the problem easily we've written a simple Java class with
> multiple threads calling to the stored procedures running the above queries
> inside a loop. This way we can easily recreate a scenario that happens in
> the production.

Don't overcomplicate your solution. Adjust your code to detect the deadlock
and replay the transaction when it happens. At 2-3 deadlocks per day, it's
difficult to justify any other solution (as any other solution would be
more time-consuming to implement, AND would interfere with performance).

When you say replay the transaction, I believe that is to catch the exception inside the stored procedure? We've considered that option at one stage, but the problem with that is we don't have enough context information at the stored procedure where this deadlock occurs. 
 

I've worked with a number of write-heavy applications that experienced
deadlocks, some of them on the order of hundreds of deadlocks per day.
In some cases, you can adjust the queries to reduce the incidence of
deadlocks, or eliminate the possibility of deadlocks completely.  The
situation that you describe is not one of those cases, as the planner
can choose to lock rows in whatever order it thinks it most efficient
and you don't have direct control over that.

The performance hit you'll take 2-3 times a day when a statement has to
be replayed due to deadlock will hardly be noticed (although a statement
that takes 50 seconds will cause raised eyebrows if it runs 2x) but that
will only happen 2-3 times a day, and the solution I'm proposing won't
have any performance impact on the other 13000000 queries per day that
don't deadlock.

2-3 deadlocks per day is normal operation for a heavily contended table,
in my experience.

Given that we have no control over how Postgres performs delete and update operations, the only other possibility seems to be to partition this table by id_A (so that the individual tables will never be deadlocked). But that seems too extreme an option at this stage.

 

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com



--
Sanjaya

Re: Avoiding deadlocks when performing bulk update and delete operations

From
Bill Moran
Date:
On Thu, 27 Nov 2014 15:07:49 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:

> On Wed, Nov 26, 2014 at 11:47 AM, Bill Moran <wmoran@potentialtech.com>
> wrote:
>
> > On Wed, 26 Nov 2014 10:41:56 +1100
> > Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
> > >
> > > > * How frequently do deadlocks occur?
> > >
> > > We are seeing deadlocks about 2-3 times per day in the production server.
> > > To reproduce the problem easily we've written a simple Java class with
> > > multiple threads calling to the stored procedures running the above
> > queries
> > > inside a loop. This way we can easily recreate a scenario that happens in
> > > the production.
> >
> > Don't overcomplicate your solution. Adjust your code to detect the deadlock
> > and replay the transaction when it happens. At 2-3 deadlocks per day, it's
> > difficult to justify any other solution (as any other solution would be
> > more time-consuming to implement, AND would interfere with performance).
>
> When you say replay the transaction, I believe that is to catch the
> exception inside the stored procedure? We've considered that option at one
> state but, the problem with that is we don't have enough context
> information at the stored procedure where this deadlock occurs.

Why not catch it in the application calling the stored procedure?

I don't understand how you could not have enough context to run the command
you were just trying to run. Can you elaborate on what you mean by that?

> > I've worked with a number of write-heavy applications that experienced
> > deadlocks, some of them on the order of hundreds of deadlocks per day.
> > In some cases, you can adjust the queries to reduce the incidence of
> > deadlocks, or eliminate the possibility of deadlocks completely.  The
> > situation that you describe is not one of those cases, as the planner
> > can choose to lock rows in whatever order it thinks it most efficient
> > and you don't have direct control over that.
> >
> > The performance hit you'll take 2-3 times a day when a statement has to
> > be replayed due to deadlock will hardly be noticed (although a statement
> > that takes 50 seconds will cause raised eyebrows if it runs 2x) but that
> > will only happen 2-3 times a day, and the solution I'm proposing won't
> > have any performance impact on the other 13000000 queries per day that
> > don't deadlock.
> >
> > 2-3 deadlocks per day is normal operation for a heavily contented table,
> > in my experience.
>
> Given that we have no control over how Postgres performs delete and update
> operations, the only other possibility seems to be to partition this table
> by id_A (so that the individual tables will never be deadlocked). But that
> seems to be a too extreme end option at this stage.

That would be overcomplicating the solution, and almost certainly won't work
anyway. If you're getting deadlocks, it's because two processes are trying
to modify the same rows. Even if you partition, those same rows will be on
the same partition, so you'll still deadlock.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com


Re: Avoiding deadlocks when performing bulk update and delete operations

From
rob stone
Date:


On Thu, 2014-11-27 at 06:49 -0500, Bill Moran wrote:
> On Thu, 27 Nov 2014 15:07:49 +1100
> Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
>
> > On Wed, Nov 26, 2014 at 11:47 AM, Bill Moran <wmoran@potentialtech.com>
> > wrote:
> >
> > > On Wed, 26 Nov 2014 10:41:56 +1100
> > > Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
> > > >
> > > > > * How frequently do deadlocks occur?
> > > >
> > > > We are seeing deadlocks about 2-3 times per day in the production server.
> > > > To reproduce the problem easily we've written a simple Java class with
> > > > multiple threads calling to the stored procedures running the above
> > > queries
> > > > inside a loop. This way we can easily recreate a scenario that happens in
> > > > the production.
> > >
> > > Don't overcomplicate your solution. Adjust your code to detect the deadlock
> > > and replay the transaction when it happens. At 2-3 deadlocks per day, it's
> > > difficult to justify any other solution (as any other solution would be
> > > more time-consuming to implement, AND would interfere with performance).
> >
> > When you say replay the transaction, I believe that is to catch the
> > exception inside the stored procedure? We've considered that option at one
> > state but, the problem with that is we don't have enough context
> > information at the stored procedure where this deadlock occurs.
>
> Why not catch it in the application calling the stored procedure?
>
> I don't understand how you could not have enough context to run the command
> you were just trying to run. Can you elaborate on what you mean by that?
>
> > > I've worked with a number of write-heavy applications that experienced
> > > deadlocks, some of them on the order of hundreds of deadlocks per day.
> > > In some cases, you can adjust the queries to reduce the incidence of
> > > deadlocks, or eliminate the possibility of deadlocks completely.  The
> > > situation that you describe is not one of those cases, as the planner
> > > can choose to lock rows in whatever order it thinks it most efficient
> > > and you don't have direct control over that.
> > >
> > > The performance hit you'll take 2-3 times a day when a statement has to
> > > be replayed due to deadlock will hardly be noticed (although a statement
> > > that takes 50 seconds will cause raised eyebrows if it runs 2x) but that
> > > will only happen 2-3 times a day, and the solution I'm proposing won't
> > > have any performance impact on the other 13000000 queries per day that
> > > don't deadlock.
> > >
> > > 2-3 deadlocks per day is normal operation for a heavily contented table,
> > > in my experience.
> >
> > Given that we have no control over how Postgres performs delete and update
> > operations, the only other possibility seems to be to partition this table
> > by id_A (so that the individual tables will never be deadlocked). But that
> > seems to be a too extreme end option at this stage.
>
> That would be overcomplicating the solution, and almost certainly won't work
> anyway. If you're getting deadlocks, it's because two processes are trying
> to modify the same rows. Even if you partition, those same rows will be on
> the same partition, so you'll still deadlock.
>
> --
> Bill Moran
> I need your help to succeed:
> http://gamesbybill.com
>
>

I've been following this discussion for a few days.
In my experience, deadlocks are caused either by poor database design or
within the application.
As you are only talking about a single table, the problem has to be
with the application.
You mentioned that you are using Java and so can we assume there is a
class for the table with getters and setters?
The table should also have its action factory where there ought to be a
method to handle updates and deletes?
Instead of passing id_A and id_B values via an array, why not have a
method to which you pass an id_A value, an id_B value, a boolean to
indicate either update or delete, and the values for the update (or
nulls if deleting)?
Then you can go into transaction state, lock the row for update and
throw all the exceptions you need and gracefully handle any feedback to
your users.
If you cannot lock the row just display an alert "Please try again in
five minutes" or something similar.

HTH.

Rob