Thread: Let's invent a function to report lock-wait-blocking PIDs

Let's invent a function to report lock-wait-blocking PIDs

From

Tom Lane

Date:

20 March 2013, 18:02:41

I was just looking into why the -DCLOBBER_CACHE_ALWAYS buildfarm
critters aren't managing to run the new "timeouts" isolation test
successfully, despite very generous timeouts.  The answer is that
2 seconds isn't quite enough time to parse+plan+execute the query
that isolationtester uses to see if the current test session is
blocked on a lock, if CLOBBER_CACHE_ALWAYS is on.  Now, that query
is totally horrible:
   appendPQExpBufferStr(&wait_query,                        "SELECT 1 FROM pg_locks holder, pg_locks waiter "
            "WHERE NOT waiter.granted AND waiter.pid = $1 "                        "AND holder.granted "
       "AND holder.pid <> $1 AND holder.pid IN (");   /* The spec syntax requires at least one session; assume that
here.*/   appendPQExpBuffer(&wait_query, "%s", backend_pids[1]);   for (i = 2; i < nconns; i++)
appendPQExpBuffer(&wait_query,", %s", backend_pids[i]);   appendPQExpBufferStr(&wait_query,                        ")
"
                        "AND holder.mode = ANY (CASE waiter.mode "                        "WHEN 'AccessShareLock' THEN
ARRAY["                       "'AccessExclusiveLock'] "                        "WHEN 'RowShareLock' THEN ARRAY["
               "'ExclusiveLock',"                        "'AccessExclusiveLock'] "                        "WHEN
'RowExclusiveLock'THEN ARRAY["                        "'ShareLock',"                        "'ShareRowExclusiveLock',"
                     "'ExclusiveLock',"                        "'AccessExclusiveLock'] "                        "WHEN
'ShareUpdateExclusiveLock'THEN ARRAY["                        "'ShareUpdateExclusiveLock',"
"'ShareLock',"                       "'ShareRowExclusiveLock',"                        "'ExclusiveLock',"
        "'AccessExclusiveLock'] "                        "WHEN 'ShareLock' THEN ARRAY["
"'RowExclusiveLock',"                       "'ShareUpdateExclusiveLock',"
"'ShareRowExclusiveLock',"                       "'ExclusiveLock',"                        "'AccessExclusiveLock'] "
                   "WHEN 'ShareRowExclusiveLock' THEN ARRAY["                        "'RowExclusiveLock',"
         "'ShareUpdateExclusiveLock',"                        "'ShareLock',"
"'ShareRowExclusiveLock',"                       "'ExclusiveLock',"                        "'AccessExclusiveLock'] "
                   "WHEN 'ExclusiveLock' THEN ARRAY["                        "'RowShareLock',"
"'RowExclusiveLock',"                       "'ShareUpdateExclusiveLock',"                        "'ShareLock',"
              "'ShareRowExclusiveLock',"                        "'ExclusiveLock',"
"'AccessExclusiveLock']"                        "WHEN 'AccessExclusiveLock' THEN ARRAY["
"'AccessShareLock',"                       "'RowShareLock',"                        "'RowExclusiveLock',"
        "'ShareUpdateExclusiveLock',"                        "'ShareLock',"
"'ShareRowExclusiveLock',"                       "'ExclusiveLock',"                        "'AccessExclusiveLock'] END)
"
                 "AND holder.locktype IS NOT DISTINCT FROM waiter.locktype "                 "AND holder.database IS
NOTDISTINCT FROM waiter.database "                 "AND holder.relation IS NOT DISTINCT FROM waiter.relation "
             "AND holder.page IS NOT DISTINCT FROM waiter.page "                        "AND holder.tuple IS NOT
DISTINCTFROM waiter.tuple "             "AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid "       "AND
holder.transactionidIS NOT DISTINCT FROM waiter.transactionid "                   "AND holder.classid IS NOT DISTINCT
FROMwaiter.classid "                        "AND holder.objid IS NOT DISTINCT FROM waiter.objid "               "AND
holder.objsubidIS NOT DISTINCT FROM waiter.objsubid ");
 

This is way more knowledge than we (should) want a client to embed about
which lock types block which others.  What's worse, it's still wrong.
The query will find cases where one of the test sessions *directly*
blocks another one, but not cases where the blockage is indirect.
For example, consider that A holds AccessShareLock, B is waiting for
AccessExclusiveLock on the same object, and C is queued up behind B
for another AccessShareLock.  This query will not think that C is
blocked, not even if B is part of the set of sessions of interest
(because B will show the lock as not granted); but especially so if
B is not part of the set.

I think that such situations may not arise in the specific context that
isolationtester says it's worried about, which is to disregard waits for
locks held by autovacuum.  But in general, you can't reliably tell who's
blocking whom with a query like this.

If isolationtester were the only market for this type of information,
maybe it wouldn't be worth worrying about.  But I'm pretty sure that
there are a *lot* of monitoring applications out there that are trying
to extract who-blocks-whom information from pg_locks.  I hadn't realized
before quite how painful it is to do that, even incorrectly.

I propose that we should add a backend function that simplifies this
type of query.  The API that comes to mind is (name subject to
bikeshedding)
pg_blocking_pids(pid int) returns int[]

defined to return NULL if the argument isn't the PID of any backend or
that backend isn't waiting for a lock, and otherwise an array of the
PIDs of the backends that are blocking it from getting the lock.
I would compute the array as
PIDs of backends already holding conflicting locks,plus PIDs of backends requesting conflicting locks that areahead of
thisone in the lock's wait queue,plus PIDs of backends that block the latter group of PIDs(ie, are holding locks
conflictingwith their requests,or are awaiting such locks and are ahead of them in the queue)
 

There would be some cases where this definition would be too expansive,
ie we'd release the waiter after only some of the listed sessions had
released their lock or request.  (That could happen for instance if we
concluded we had to move up the waiter's request to escape a deadlock.)
But I think that it's better to err in that direction than to
underestimate the set of relevant PIDs.

In the isolationtester use-case, we'd get the right answer by testing
whether this function's result has any overlap with the set of PIDs of
test sessions, ie
select pg_blocking_pids($1) && array[pid1, pid2, pid3, ...]

Thoughts?
        regards, tom lane

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Robert Haas

Date:

20 March 2013, 18:29:11

On Wed, Mar 20, 2013 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I was just looking into why the -DCLOBBER_CACHE_ALWAYS buildfarm
> critters aren't managing to run the new "timeouts" isolation test
> successfully, despite very generous timeouts.  The answer is that
> 2 seconds isn't quite enough time to parse+plan+execute the query
> that isolationtester uses to see if the current test session is
> blocked on a lock, if CLOBBER_CACHE_ALWAYS is on.  Now, that query
> is totally horrible:
>
>     appendPQExpBufferStr(&wait_query,
>                          "SELECT 1 FROM pg_locks holder, pg_locks waiter "
>                          "WHERE NOT waiter.granted AND waiter.pid = $1 "
>                          "AND holder.granted "
>                          "AND holder.pid <> $1 AND holder.pid IN (");
>     /* The spec syntax requires at least one session; assume that here. */
>     appendPQExpBuffer(&wait_query, "%s", backend_pids[1]);
>     for (i = 2; i < nconns; i++)
>         appendPQExpBuffer(&wait_query, ", %s", backend_pids[i]);
>     appendPQExpBufferStr(&wait_query,
>                          ") "
>
>                          "AND holder.mode = ANY (CASE waiter.mode "
>                          "WHEN 'AccessShareLock' THEN ARRAY["
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'RowShareLock' THEN ARRAY["
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'RowExclusiveLock' THEN ARRAY["
>                          "'ShareLock',"
>                          "'ShareRowExclusiveLock',"
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'ShareUpdateExclusiveLock' THEN ARRAY["
>                          "'ShareUpdateExclusiveLock',"
>                          "'ShareLock',"
>                          "'ShareRowExclusiveLock',"
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'ShareLock' THEN ARRAY["
>                          "'RowExclusiveLock',"
>                          "'ShareUpdateExclusiveLock',"
>                          "'ShareRowExclusiveLock',"
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'ShareRowExclusiveLock' THEN ARRAY["
>                          "'RowExclusiveLock',"
>                          "'ShareUpdateExclusiveLock',"
>                          "'ShareLock',"
>                          "'ShareRowExclusiveLock',"
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'ExclusiveLock' THEN ARRAY["
>                          "'RowShareLock',"
>                          "'RowExclusiveLock',"
>                          "'ShareUpdateExclusiveLock',"
>                          "'ShareLock',"
>                          "'ShareRowExclusiveLock',"
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] "
>                          "WHEN 'AccessExclusiveLock' THEN ARRAY["
>                          "'AccessShareLock',"
>                          "'RowShareLock',"
>                          "'RowExclusiveLock',"
>                          "'ShareUpdateExclusiveLock',"
>                          "'ShareLock',"
>                          "'ShareRowExclusiveLock',"
>                          "'ExclusiveLock',"
>                          "'AccessExclusiveLock'] END) "
>
>                   "AND holder.locktype IS NOT DISTINCT FROM waiter.locktype "
>                   "AND holder.database IS NOT DISTINCT FROM waiter.database "
>                   "AND holder.relation IS NOT DISTINCT FROM waiter.relation "
>                          "AND holder.page IS NOT DISTINCT FROM waiter.page "
>                          "AND holder.tuple IS NOT DISTINCT FROM waiter.tuple "
>               "AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid "
>         "AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid "
>                     "AND holder.classid IS NOT DISTINCT FROM waiter.classid "
>                          "AND holder.objid IS NOT DISTINCT FROM waiter.objid "
>                 "AND holder.objsubid IS NOT DISTINCT FROM waiter.objsubid ");
>
> This is way more knowledge than we (should) want a client to embed about
> which lock types block which others.  What's worse, it's still wrong.
> The query will find cases where one of the test sessions *directly*
> blocks another one, but not cases where the blockage is indirect.
> For example, consider that A holds AccessShareLock, B is waiting for
> AccessExclusiveLock on the same object, and C is queued up behind B
> for another AccessShareLock.  This query will not think that C is
> blocked, not even if B is part of the set of sessions of interest
> (because B will show the lock as not granted); but especially so if
> B is not part of the set.
>
> I think that such situations may not arise in the specific context that
> isolationtester says it's worried about, which is to disregard waits for
> locks held by autovacuum.  But in general, you can't reliably tell who's
> blocking whom with a query like this.
>
> If isolationtester were the only market for this type of information,
> maybe it wouldn't be worth worrying about.  But I'm pretty sure that
> there are a *lot* of monitoring applications out there that are trying
> to extract who-blocks-whom information from pg_locks.  I hadn't realized
> before quite how painful it is to do that, even incorrectly.
>
> I propose that we should add a backend function that simplifies this
> type of query.  The API that comes to mind is (name subject to
> bikeshedding)
>
>         pg_blocking_pids(pid int) returns int[]
>
> defined to return NULL if the argument isn't the PID of any backend or
> that backend isn't waiting for a lock, and otherwise an array of the
> PIDs of the backends that are blocking it from getting the lock.
> I would compute the array as
>
>         PIDs of backends already holding conflicting locks,
>         plus PIDs of backends requesting conflicting locks that are
>         ahead of this one in the lock's wait queue,
>         plus PIDs of backends that block the latter group of PIDs
>         (ie, are holding locks conflicting with their requests,
>         or are awaiting such locks and are ahead of them in the queue)
>
> There would be some cases where this definition would be too expansive,
> ie we'd release the waiter after only some of the listed sessions had
> released their lock or request.  (That could happen for instance if we
> concluded we had to move up the waiter's request to escape a deadlock.)
> But I think that it's better to err in that direction than to
> underestimate the set of relevant PIDs.
>
> In the isolationtester use-case, we'd get the right answer by testing
> whether this function's result has any overlap with the set of PIDs of
> test sessions, ie
>
>         select pg_blocking_pids($1) && array[pid1, pid2, pid3, ...]

Sounds excellent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Alvaro Herrera

Date:

20 March 2013, 18:37:23

Robert Haas escribió:
> On Wed, Mar 20, 2013 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I was just looking into why the -DCLOBBER_CACHE_ALWAYS buildfarm
> > critters aren't managing to run the new "timeouts" isolation test
> > successfully, despite very generous timeouts.  The answer is that
> > 2 seconds isn't quite enough time to parse+plan+execute the query
> > that isolationtester uses to see if the current test session is
> > blocked on a lock, if CLOBBER_CACHE_ALWAYS is on.  Now, that query
> > is totally horrible:

> > In the isolationtester use-case, we'd get the right answer by testing
> > whether this function's result has any overlap with the set of PIDs of
> > test sessions, ie
> >
> >         select pg_blocking_pids($1) && array[pid1, pid2, pid3, ...]
>
> Sounds excellent.

Yeah, I have looked at that query a couple of times wondering how it
could be improved and came up blank.  Glad you had a reason to be in the
area.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Greg Stark

Date:

20 March 2013, 18:55:07

On Wed, Mar 20, 2013 at 6:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I propose that we should add a backend function that simplifies this
> type of query.  The API that comes to mind is (name subject to
> bikeshedding)
>
>         pg_blocking_pids(pid int) returns int[]

I've wanted to use pg_locks as a demonstration for recursive queries
many times and ran into the same problem. It's just too hard to figure
out which lock holders would be blocking which other locks.

I would like to be able to generate the full graph showing indirect
blocking. This seems to be not quite powerful enough to do it though.
I would have expected something that took whole pg_lock row values or
something like that.

-- 
greg

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Tom Lane

Date:

20 March 2013, 19:13:08

Greg Stark <stark@mit.edu> writes:
> On Wed, Mar 20, 2013 at 6:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I propose that we should add a backend function that simplifies this
>> type of query.  The API that comes to mind is (name subject to
>> bikeshedding)
>> 
>> pg_blocking_pids(pid int) returns int[]

> I've wanted to use pg_locks as a demonstration for recursive queries
> many times and ran into the same problem. It's just too hard to figure
> out which lock holders would be blocking which other locks.

> I would like to be able to generate the full graph showing indirect
> blocking. This seems to be not quite powerful enough to do it though.
> I would have expected something that took whole pg_lock row values or
> something like that.

I wanted to write the function so it would inspect the lock data
structures directly rather than reconstruct them from pg_locks output;
coercing those back from text to internal form and matching up the lock
identities is a very large part of the inefficiency of the
isolationtester query.  Moreover, the pg_locks output fails to capture
lock queue ordering at all, I believe, so the necessary info just isn't
there for determining who's blocking whom in the case of conflicting
ungranted requests.

Now a disadvantage of that approach is that successive calls to the
function won't necessarily see the same state.  So if we wanted to break
down the results into direct and indirect blockers, we couldn't do that
with separate functions; we'd have to think of some representation that
captures all the info in a single function's output.

Also, I intentionally proposed that this just return info relevant to a
single process, in hopes that that would make it cheap enough that we
could do the calculations while holding the lock data structure LWLocks.
(Not having written the code yet, I'm not totally sure that will fly.)
If we want a global view of the who-blocks-whom situation, I think we'll
need another approach.  But since this way solves isolationtester's
problem fairly neatly, I was hopeful that it would be useful for other
apps too.
        regards, tom lane

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Dimitri Fontaine

Date:

20 March 2013, 22:28:19

Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> I propose that we should add a backend function that simplifies this
>>> type of query.  The API that comes to mind is (name subject to
>>> bikeshedding)
>>>
>>> pg_blocking_pids(pid int) returns int[]

+1

> If we want a global view of the who-blocks-whom situation, I think we'll
> need another approach.  But since this way solves isolationtester's
> problem fairly neatly, I was hopeful that it would be useful for other
> apps too.

What about a function
 pg_is_lock_exclusive(lock, lock) returns boolean pg_is_lock_exclusive(lock[], lock[]) returns boolean

I suppose that the lock type would be text ('ExclusiveLock'), but we
could also expose a new ENUM type for that (pg_lock_mode). If we do
that, we can also provide operators such as the following… I did try to
search for some existing ones but failed to do so.
 pg_lock_mode & pg_lock_mode pg_lock_mode | pg_lock_mode

Equiped with that, it should be possible to come up with a recursive
query on pg_locks that displays the whole graph, and we should then
provide as one of our system views.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Tom Lane

Date:

20 March 2013, 22:46:27

Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> If we want a global view of the who-blocks-whom situation, I think we'll
>> need another approach.  But since this way solves isolationtester's
>> problem fairly neatly, I was hopeful that it would be useful for other
>> apps too.

> What about a function

>   pg_is_lock_exclusive(lock, lock) returns boolean
>   pg_is_lock_exclusive(lock[], lock[]) returns boolean

> I suppose that the lock type would be text ('ExclusiveLock'), but we
> could also expose a new ENUM type for that (pg_lock_mode).

I don't have an objection to providing such a function, but it doesn't
do anything for the problem beyond allowing getting rid of the hairy
case expression.  That's a good thing to do of course --- but what about
the indirect-blockage issue?
        regards, tom lane

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Dimitri Fontaine

Date:

20 March 2013, 23:03:30

Tom Lane <tgl@sss.pgh.pa.us> writes:
>>   pg_is_lock_exclusive(lock, lock) returns boolean
>>   pg_is_lock_exclusive(lock[], lock[]) returns boolean
>
>> I suppose that the lock type would be text ('ExclusiveLock'), but we
>> could also expose a new ENUM type for that (pg_lock_mode).
>
> I don't have an objection to providing such a function, but it doesn't
> do anything for the problem beyond allowing getting rid of the hairy
> case expression.  That's a good thing to do of course --- but what about
> the indirect-blockage issue?

It's too late for my brain to build the full answer, the idea is that we
have another way to build the dependency cycles in the pg_locks query
and then we can aggregate locks at each level and see about conflicts
once we accumulated the data.

Is that even possible? E_GOTOSLEEP.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Simon Riggs

Date:

21 March 2013, 00:20:28

On 20 March 2013 18:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The API that comes to mind is (name subject to
> bikeshedding)
>
>         pg_blocking_pids(pid int) returns int[]
>

Useful. Can we also have an SRF rather than an array?

Does the definition as an array imply anything about our ability to
join an SRF to an array?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Noah Misch

Date:

21 March 2013, 01:48:11

On Wed, Mar 20, 2013 at 02:02:32PM -0400, Tom Lane wrote:
[fun query for appraising lock contention]

> This is way more knowledge than we (should) want a client to embed about
> which lock types block which others.  What's worse, it's still wrong.
> The query will find cases where one of the test sessions *directly*
> blocks another one, but not cases where the blockage is indirect.
> For example, consider that A holds AccessShareLock, B is waiting for
> AccessExclusiveLock on the same object, and C is queued up behind B
> for another AccessShareLock.  This query will not think that C is
> blocked, not even if B is part of the set of sessions of interest
> (because B will show the lock as not granted); but especially so if
> B is not part of the set.
> 
> I think that such situations may not arise in the specific context that
> isolationtester says it's worried about, which is to disregard waits for
> locks held by autovacuum.  But in general, you can't reliably tell who's
> blocking whom with a query like this.

Indeed, isolationtester only uses the lock wait query when all but one session
is idle (typically idle-in-transaction).  But a more-general implementation of
the isolationtester concept would need the broader comprehension you describe.

> If isolationtester were the only market for this type of information,
> maybe it wouldn't be worth worrying about.  But I'm pretty sure that
> there are a *lot* of monitoring applications out there that are trying
> to extract who-blocks-whom information from pg_locks.

Agreed; such a feature would carry its own weight.  Unless the cost to
implement it is similar to the overall cost of just making the affected
timeout values high enough, I do think it's best delayed until 9.4.

> I propose that we should add a backend function that simplifies this
> type of query.  The API that comes to mind is (name subject to
> bikeshedding)
> 
>     pg_blocking_pids(pid int) returns int[]
> 
> defined to return NULL if the argument isn't the PID of any backend or
> that backend isn't waiting for a lock, and otherwise an array of the
> PIDs of the backends that are blocking it from getting the lock.
> I would compute the array as
> 
>     PIDs of backends already holding conflicting locks,
>     plus PIDs of backends requesting conflicting locks that are
>     ahead of this one in the lock's wait queue,
>     plus PIDs of backends that block the latter group of PIDs
>     (ie, are holding locks conflicting with their requests,
>     or are awaiting such locks and are ahead of them in the queue)
> 
> There would be some cases where this definition would be too expansive,
> ie we'd release the waiter after only some of the listed sessions had
> released their lock or request.  (That could happen for instance if we
> concluded we had to move up the waiter's request to escape a deadlock.)
> But I think that it's better to err in that direction than to
> underestimate the set of relevant PIDs.

That definition seems compatible with, albeit overkill for, the needs of
isolationtester.  However, I have an inkling that we should expose those
categories.  Perhaps one of these interfaces?
pg_blocking_pids(pid int, OUT blocker int, OUT waiting bool, OUT direct bool) returns setof recordpg_blocking_pids(pid
int,OUT blocker int, OUT how text) returns setof record
 

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Tom Lane

Date:

21 March 2013, 03:36:35

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 20 March 2013 18:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The API that comes to mind is (name subject to
>> bikeshedding)
>> 
>> pg_blocking_pids(pid int) returns int[]

> Useful. Can we also have an SRF rather than an array?

I thought about that, but at least for the isolationtester use-case,
the array result is clearly easier to use.  You can get from one to the
other with unnest() or array_agg(), so I don't really feel a need to
provide both.  Can you generate use-cases where the set-result approach
is superior?
        regards, tom lane

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Heikki Linnakangas

Date:

22 March 2013, 09:39:24

On 21.03.2013 05:36, Tom Lane wrote:
> Simon Riggs<simon@2ndQuadrant.com>  writes:
>> On 20 March 2013 18:02, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> The API that comes to mind is (name subject to
>>> bikeshedding)
>>>
>>> pg_blocking_pids(pid int) returns int[]
>
>> Useful. Can we also have an SRF rather than an array?
>
> I thought about that, but at least for the isolationtester use-case,
> the array result is clearly easier to use.  You can get from one to the
> other with unnest() or array_agg(), so I don't really feel a need to
> provide both.  Can you generate use-cases where the set-result approach
> is superior?

How about inverting the function into:

pg_pid_blocked_by(pid int) returns int

It would take as argument a pid, and return the pid of the process that 
is blocking the given process. That would feel more natural to me.

- Heikki

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Tom Lane

Date:

22 March 2013, 14:24:36

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 21.03.2013 05:36, Tom Lane wrote:
>>> The API that comes to mind is (name subject to bikeshedding)
>>> pg_blocking_pids(pid int) returns int[]

> How about inverting the function into:
> pg_pid_blocked_by(pid int) returns int
> It would take as argument a pid, and return the pid of the process that 
> is blocking the given process. That would feel more natural to me.

Hm, I'm not sure that's uniquely defined.  In the case I mentioned
before (A has AccessShare, B is blocked waiting for AccessExclusive,
C wants AccessShare and is queued behind B), which of A and B do
you think is blocking C?

Whichever answer you choose could be the wrong one for isolationtester:
I think it needs to consider that C is blocked if *either* A or B is
part of its set of test processes.  So that's why I thought an array
(or set) result including both A and B would be appropriate.  AFAICT,
what you're proposing isn't the "inverse" of what I said, it's the
same direction but you're assuming there's only one blocking process.
        regards, tom lane

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Jim Nasby

Date:

23 March 2013, 06:18:27

On 3/20/13 10:36 PM, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 20 March 2013 18:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The API that comes to mind is (name subject to
>>> bikeshedding)
>>>
>>> pg_blocking_pids(pid int) returns int[]
>
>> Useful. Can we also have an SRF rather than an array?
>
> I thought about that, but at least for the isolationtester use-case,
> the array result is clearly easier to use.  You can get from one to the
> other with unnest() or array_agg(), so I don't really feel a need to
> provide both.  Can you generate use-cases where the set-result approach
> is superior?

Unless pg_blocking_pids(..) RETURNS SETOF would be significantly faster than unnest(), not directly, BUT...

Anytime I'm looking at locks I almost always want to know not only who's blocking who, but what they're actually
blockingon. Related to that, I also wish we had a way to provide more info about why we're blocked on an XID, since
justpointing your finger at a backend often doesn't do much to tell you what caused the block in the first place.

So from that standpoint, I'd prefer that pg_blocking_pids returned enough info to tell me exactly which locks were
blocking.

*thinking*

Actually, is it possible for a backend to have more than one ungranted lock? If not then I suppose that would be good
enoughto tell you which lock had the problem.

On the performance side, I've also often wished for a way to pull data from pg_* tables/functions atomically; would it
bereasonable to have a separate function that would copy everything from the proc array into local memory so you could
queryit from there to your hearts content? Bonus if it could also copy all/parts of the statistics file.

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Greg Smith

Date:

24 March 2013, 14:26:18

On 3/20/13 2:02 PM, Tom Lane wrote:
> If isolationtester were the only market for this type of information,
> maybe it wouldn't be worth worrying about.  But I'm pretty sure that
> there are a *lot* of monitoring applications out there that are trying
> to extract who-blocks-whom information from pg_locks.  I hadn't realized
> before quite how painful it is to do that, even incorrectly.

As a FYI, the one Marco wrote here is over 100 lines of code, and while 
he did a great job I'd still never suggest we release it--because it's 
misleading in just enough cases to be dangerous.  We can run it 
usefully, but I'd never hand this over to a customer and expect them to 
do something with it.

> I propose that we should add a backend function that simplifies this
> type of query.  The API that comes to mind is (name subject to
> bikeshedding)
>
>     pg_blocking_pids(pid int) returns int[]

I think there's a whole family of functions like this needed.  This is 
one of them, so if it helps the isolation tester I'd be happy to see it 
added as a first one, whether or not more come along one day.

I'd rather get the data back as a SRF because I'd usually be joining it 
to pg_locks and/or pg_stat_activity to figure out what the blocking pids 
own or are doing.  You can obviously convert the array form to/from the 
SRF form.  The exposed function API that is easier for users to join 
with is my preference.  If the isolation tester is easier to write 
against the array form, it can play the appropriate nesting game to do 
so.  I see that as the unusual case though, and it is also the one being 
coded by people who know how to handle the conversion.

The longer list of views/functions I keep wanting includes things like:

-What processes are blocking P from running?  [This new function]

-What processes hold locks and are running usefully--they have some 
locks but all are granted?  [Easy to extract from pg_locks]

-For each running process, which processes are waiting on them? 
[Requires a long WITH RECURSIVE query that doesn't get trapped by 
circular locks]

-If I try to grab lock type L on object O, what existing locks will that 
conflict with?

One really magic thing I'd like in this area is EXPLAIN (ANALYZE ON, 
LOCKS ON) which pops out a list of all the locks acquired when running 
that statement.  We're never going to get fully correct documentation of 
what locks a given statement needs.  If I can figure that out in a test 
environment by running the statement there and seeing what locks it 
grabbed along the way, that would eliminate most of the need for 
documenting things.

Note that an EXPLAIN based approach doesn't solve all the problems in 
this area, because the trickiest ones I run into are ALTER TABLE 
changes--which you can't EXPLAIN.  Some API that dumps the locks an 
arbitrary statement acquired just before it exits would be ideal.  When 
a user can ask "what locks did an ALTER TABLE adding a foreign key take 
and what order were they grabbed in?", that would solve the hardest of 
the questions I see in the field.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Alvaro Herrera

Date:

25 March 2013, 15:30:52

Greg Smith wrote:

> Note that an EXPLAIN based approach doesn't solve all the problems
> in this area, because the trickiest ones I run into are ALTER TABLE
> changes--which you can't EXPLAIN.  Some API that dumps the locks an
> arbitrary statement acquired just before it exits would be ideal.
> When a user can ask "what locks did an ALTER TABLE adding a foreign
> key take and what order were they grabbed in?", that would solve the
> hardest of the questions I see in the field.

Hm, this sounds like something we could apply to event triggers -- at
ddl_command_end, you would run a SRF, say
pg_event_trigger_acquired_locks() to get what you want.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Let's invent a function to report lock-wait-blocking PIDs

From

Bruce Momjian

Date:

25 March 2013, 19:53:10

On Thu, Mar 21, 2013 at 12:03:21AM +0100, Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> >>   pg_is_lock_exclusive(lock, lock) returns boolean
> >>   pg_is_lock_exclusive(lock[], lock[]) returns boolean
> >
> >> I suppose that the lock type would be text ('ExclusiveLock'), but we
> >> could also expose a new ENUM type for that (pg_lock_mode).
> >
> > I don't have an objection to providing such a function, but it doesn't
> > do anything for the problem beyond allowing getting rid of the hairy
> > case expression.  That's a good thing to do of course --- but what about
> > the indirect-blockage issue?
> 
> It's too late for my brain to build the full answer, the idea is that we
> have another way to build the dependency cycles in the pg_locks query
> and then we can aggregate locks at each level and see about conflicts
> once we accumulated the data.
> 
> Is that even possible? E_GOTOSLEEP.

Should this be a TODO?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +