Thread: [GENERAL] Advisory lock deadlock issue

[GENERAL] Advisory lock deadlock issue

From

David Rosenstrauch

Date:

07 June 2017, 17:16:20

I'm running a Spark job that is writing to a postgres db (v9.6), using
the JDBC driver (v42.0.0), and running into a puzzling error:

2017-06-06 16:05:17.718 UTC [36661] dmx@dmx ERROR:  deadlock detected
2017-06-06 16:05:17.718 UTC [36661] dmx@dmx DETAIL:  Process 36661 waits
for ExclusiveLock on advisory lock [16649,0,102440,5]; blocked by
process 36662.
         Process 36662 waits for ExclusiveLock on advisory lock
[16649,0,102439,5]; blocked by process 36661.

However, I can't for the life of me figure out a) how the advisory locks
are getting created (as I'm not calling for them myself), and b) how to
fix this issue.


A bit of background:  My Spark job runs as multiple processes on
multiple machines.  Each process is performing the writes to pgsql using
the jdbc driver.  The writes are performed a) as PG UPSERTS, b) as JDBC
batches, and c) using JDBC prepared statements.  So each process, when
it's time to write to the db, creates several prepared statements, adds
a bunch of UPSERTs to each prepared statement (i.e., each prepared
statement contains a batch of a few hundred UPSERTs), and then performs
an executeBatch() on each statement to perform the write.  That
executeBatch() call is where I'm running into the error.

In theory, since there's multiple processes that are issuing these
batched DB writes, there could be a record locking problem if, say, 2
processes tried to perform updates to the same user record.  But in
reality this should be impossible.  Spark partitions everything based on
a key - in my case userID - so all DB writes for the same user should be
happening in the same process.  So at worst I could just have a batch
that contains multiple UPSERTs to the same user record, but I should
never be seeing updates to the same user from different processes.


So, I'm very puzzled by that deadlock error.  Specifically:

* How could it be possible that there are 2 PG processes trying to
acquire the same lock?  Spark's partitioning should ensure that all
updates to the same user record get routed to the same process, so this
situation shouldn't even be possible.

* How/why am I winding up acquiring advisory locks in the first place?
I'm never requesting them.  I looked at the PG JDBC driver code a bit,
thinking that it might automatically be creating them for some reason,
but that doesn't seem to be the case.  Maybe the PG database itself is?
(E.g., Does the PG automatically use advisory locks with UPSERTs?)

And, last but not least:

* How do I resolve this "waits for ExclusiveLock on advisory lock"
issue?  There's precious little info available regarding exactly what
that error message is and how to solve.


Any help or pointers greatly appreciated!

Thanks,

DR

Re: [GENERAL] Advisory lock deadlock issue

From

Merlin Moncure

Date:

07 June 2017, 17:32:36

On Wed, Jun 7, 2017 at 9:16 AM, David Rosenstrauch <darose@darose.net> wrote:
> I'm running a Spark job that is writing to a postgres db (v9.6), using the
> JDBC driver (v42.0.0), and running into a puzzling error:
>
> 2017-06-06 16:05:17.718 UTC [36661] dmx@dmx ERROR:  deadlock detected
> 2017-06-06 16:05:17.718 UTC [36661] dmx@dmx DETAIL:  Process 36661 waits for
> ExclusiveLock on advisory lock [16649,0,102440,5]; blocked by process 36662.
>         Process 36662 waits for ExclusiveLock on advisory lock
> [16649,0,102439,5]; blocked by process 36661.
>
> However, I can't for the life of me figure out a) how the advisory locks are
> getting created (as I'm not calling for them myself), and b) how to fix this
> issue.
>
>
> A bit of background:  My Spark job runs as multiple processes on multiple
> machines.  Each process is performing the writes to pgsql using the jdbc
> driver.  The writes are performed a) as PG UPSERTS, b) as JDBC batches, and
> c) using JDBC prepared statements.  So each process, when it's time to write
> to the db, creates several prepared statements, adds a bunch of UPSERTs to
> each prepared statement (i.e., each prepared statement contains a batch of a
> few hundred UPSERTs), and then performs an executeBatch() on each statement
> to perform the write.  That executeBatch() call is where I'm running into
> the error.
>
> In theory, since there's multiple processes that are issuing these batched
> DB writes, there could be a record locking problem if, say, 2 processes
> tried to perform updates to the same user record.  But in reality this
> should be impossible.  Spark partitions everything based on a key - in my
> case userID - so all DB writes for the same user should be happening in the
> same process.  So at worst I could just have a batch that contains multiple
> UPSERTs to the same user record, but I should never be seeing updates to the
> same user from different processes.
>
>
> So, I'm very puzzled by that deadlock error.  Specifically:
>
> * How could it be possible that there are 2 PG processes trying to acquire
> the same lock?  Spark's partitioning should ensure that all updates to the
> same user record get routed to the same process, so this situation shouldn't
> even be possible.

That's really a question for the Spark team.  Obviously they are --
advisory locks lay on top of the basic locking mechanics and are very
well tested and proven.   What I can tell you is that in the core
functions provided by postgres there are no advisory locks thrown --
you own the locking space (that is, code under your control).

> * How/why am I winding up acquiring advisory locks in the first place? I'm
> never requesting them.  I looked at the PG JDBC driver code a bit, thinking
> that it might automatically be creating them for some reason, but that
> doesn't seem to be the case.  Maybe the PG database itself is? (E.g., Does
> the PG automatically use advisory locks with UPSERTs?)

Some code under your control is. This could be an external module,
application code, or an sproc.

> And, last but not least:
>
> * How do I resolve this "waits for ExclusiveLock on advisory lock" issue?
> There's precious little info available regarding exactly what that error
> message is and how to solve.

Barring some reponse from Spark team, here is how I would narrow the
problem down:

*) lets search the contents of pg_proc for functions calling advisory locks:
SELECT * FROM pg_proc where prosrc ~* 'advisory';

that might turn up some 3rd party code hits

*) turn on statement level logging and in bash:
tail -f postgres_xx.log | grep -i advisory

*) repeatedly query pg_locks for locktype = 'advisory'
SELECT * FROM pg_locks where locktype = 'advisory'

also,
SELECT * FROM pg_locks WHERE locktype = 'advisory' AND NOT granted;

Advisory locks are a very blunt instrument and it's a significant risk
that two different locking systems are stepping on each other's toes.
I do not recommend using them (especially non-xact variant) unless you
have total control over all the code potentially throwing locks and
have a good understanding of interactions with connection poolers
between locking code and the database.

merlin

Re: [GENERAL] Advisory lock deadlock issue

From

David Rosenstrauch

Date:

07 June 2017, 18:11:53


On 06/07/2017 10:32 AM, Merlin Moncure wrote:
> On Wed, Jun 7, 2017 at 9:16 AM, David Rosenstrauch <darose@darose.net> wrote:
>> * How could it be possible that there are 2 PG processes trying to acquire
>> the same lock?  Spark's partitioning should ensure that all updates to the
>> same user record get routed to the same process, so this situation shouldn't
>> even be possible.
>
> That's really a question for the Spark team.  Obviously they are --
> advisory locks lay on top of the basic locking mechanics and are very
> well tested and proven.   What I can tell you is that in the core
> functions provided by postgres there are no advisory locks thrown --
> you own the locking space (that is, code under your control).

>> * How/why am I winding up acquiring advisory locks in the first place? I'm
>> never requesting them.  I looked at the PG JDBC driver code a bit, thinking
>> that it might automatically be creating them for some reason, but that
>> doesn't seem to be the case.  Maybe the PG database itself is? (E.g., Does
>> the PG automatically use advisory locks with UPSERTs?)
>
> Some code under your control is. This could be an external module,
> application code, or an sproc.

>> And, last but not least:
>>
>> * How do I resolve this "waits for ExclusiveLock on advisory lock" issue?
>> There's precious little info available regarding exactly what that error
>> message is and how to solve.
>
> Barring some reponse from Spark team, here is how I would narrow the
> problem down:

> merlin


Thanks much for the suggestions.  I'll look into them..

As far as the source of the advisory locks, I don't think they're coming
from Spark as I'm not using any Spark code to access PG.  (Just straight
JDBC.)

I'm actually using an offshoot of PG (CitusDB), so perhaps Citus is
somehow initiating them.  I'll try to pin this down a bit further.

Thanks,

DR

Re: [GENERAL] Advisory lock deadlock issue

From

Adrian Klaver

Date:

07 June 2017, 23:46:56

On 06/07/2017 08:11 AM, David Rosenstrauch wrote:
>
>
> On 06/07/2017 10:32 AM, Merlin Moncure wrote:
>> On Wed, Jun 7, 2017 at 9:16 AM, David Rosenstrauch <darose@darose.net>
>> wrote:
>>> * How could it be possible that there are 2 PG processes trying to
>>> acquire
>>> the same lock?  Spark's partitioning should ensure that all updates
>>> to the
>>> same user record get routed to the same process, so this situation
>>> shouldn't
>>> even be possible.
>>
>> That's really a question for the Spark team.  Obviously they are --
>> advisory locks lay on top of the basic locking mechanics and are very
>> well tested and proven.   What I can tell you is that in the core
>> functions provided by postgres there are no advisory locks thrown --
>> you own the locking space (that is, code under your control).
>
>>> * How/why am I winding up acquiring advisory locks in the first
>>> place? I'm
>>> never requesting them.  I looked at the PG JDBC driver code a bit,
>>> thinking
>>> that it might automatically be creating them for some reason, but that
>>> doesn't seem to be the case.  Maybe the PG database itself is? (E.g.,
>>> Does
>>> the PG automatically use advisory locks with UPSERTs?)
>>
>> Some code under your control is. This could be an external module,
>> application code, or an sproc.
>
>>> And, last but not least:
>>>
>>> * How do I resolve this "waits for ExclusiveLock on advisory lock"
>>> issue?
>>> There's precious little info available regarding exactly what that error
>>> message is and how to solve.
>>
>> Barring some reponse from Spark team, here is how I would narrow the
>> problem down:
>
>> merlin
>
>
> Thanks much for the suggestions.  I'll look into them..
>
> As far as the source of the advisory locks, I don't think they're coming
> from Spark as I'm not using any Spark code to access PG.  (Just straight
> JDBC.)
>
> I'm actually using an offshoot of PG (CitusDB), so perhaps Citus is
> somehow initiating them.  I'll try to pin this down a bit further.

Aah that is an important piece of info. A quick search found:

https://www.citusdata.com/blog/2017/04/11/rebalancing-your-database-with-citus/

"
...

While this move is happening it takes a standard Postgres advisory locks
..."



>
> Thanks,
>
> DR
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com