Thread: Signaling of waiting for a cleanup lock?

Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
Hi,

VACUUM sometimes waits synchronously for a cleanup lock on a heap
page. Sometimes for a long time. Without reporting it externally.
Rather confusing ;).

Since we only take cleanup locks around vacuum, how about we report at
least in pgstat that we're waiting? At the moment, there's really no way
to know if that's what's happening.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Robert Haas
Date:
On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> VACUUM sometimes waits synchronously for a cleanup lock on a heap
> page. Sometimes for a long time. Without reporting it externally.
> Rather confusing ;).
>
> Since we only take cleanup locks around vacuum, how about we report at
> least in pgstat that we're waiting? At the moment, there's really no way
> to know if that's what's happening.

That seems like a pretty good idea to me.  I think we've avoided doing
this for LWLocks for fear that there might be too much overhead, but
it's hard for me to imagine a workload where you're waiting for
cleanup locks often enough for the overhead to matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
> > page. Sometimes for a long time. Without reporting it externally.
> > Rather confusing ;).
> >
> > Since we only take cleanup locks around vacuum, how about we report at
> > least in pgstat that we're waiting? At the moment, there's really no way
> > to know if that's what's happening.
> 
> That seems like a pretty good idea to me.

What I am not sure about is how... It's trivial to set
pg_stat_activity.waiting = true, but without a corresponding description
what the backend is waiting for it's not exactly obvious what's
happening. I think that's better than nothing, but maybe somebody has a
glorious better idea.

Overwriting parts of the query/activity sounds like it'd be somewhat
expensive ugly.

> I think we've avoided doing
> this for LWLocks for fear that there might be too much overhead, but
> it's hard for me to imagine a workload where you're waiting for
> cleanup locks often enough for the overhead to matter.

Hm. I am not sure I see the cost as a very compelling thing here. Sure,
we can't list the acquired lwlocks and such, but it should be cheap
enough to export what lwlock we're waiting for if we're going to
sleep. I think it'd be worthwile making that visible somehow.
But that's a separate issue...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Amit Kapila
Date:
On Sun, Apr 13, 2014 at 9:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
>> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
>> > page. Sometimes for a long time. Without reporting it externally.
>> > Rather confusing ;).
>> >
>> > Since we only take cleanup locks around vacuum, how about we report at
>> > least in pgstat that we're waiting? At the moment, there's really no way
>> > to know if that's what's happening.
>>
>> That seems like a pretty good idea to me.
>
> What I am not sure about is how... It's trivial to set
> pg_stat_activity.waiting = true, but without a corresponding description
> what the backend is waiting for it's not exactly obvious what's
> happening. I think that's better than nothing, but maybe somebody has a
> glorious better idea.
>
> Overwriting parts of the query/activity sounds like it'd be somewhat
> expensive ugly.
>
>> I think we've avoided doing
>> this for LWLocks for fear that there might be too much overhead, but
>> it's hard for me to imagine a workload where you're waiting for
>> cleanup locks often enough for the overhead to matter.
>
> Hm. I am not sure I see the cost as a very compelling thing here. Sure,
> we can't list the acquired lwlocks and such, but it should be cheap
> enough to export what lwlock we're waiting for if we're going to
> sleep. I think it'd be worthwile making that visible somehow.
> But that's a separate issue...

How about having a view like pg_lwlocks similar to pg_locks which
can be used to capture and provide the useful information?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Signaling of waiting for a cleanup lock?

From
Simon Riggs
Date:
On 13 April 2014 16:44, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
>> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
>> > page. Sometimes for a long time. Without reporting it externally.
>> > Rather confusing ;).
>> >
>> > Since we only take cleanup locks around vacuum, how about we report at
>> > least in pgstat that we're waiting? At the moment, there's really no way
>> > to know if that's what's happening.
>>
>> That seems like a pretty good idea to me.
>
> What I am not sure about is how... It's trivial to set
> pg_stat_activity.waiting = true, but without a corresponding description
> what the backend is waiting for it's not exactly obvious what's
> happening. I think that's better than nothing, but maybe somebody has a
> glorious better idea.

pg_stat_activity.waiting = true

can be done in 9.4 easily enough. Any objections to doing this?

Easy to set the ps message also

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
> On 13 April 2014 16:44, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
> >> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
> >> > page. Sometimes for a long time. Without reporting it externally.
> >> > Rather confusing ;).
> >> >
> >> > Since we only take cleanup locks around vacuum, how about we report at
> >> > least in pgstat that we're waiting? At the moment, there's really no way
> >> > to know if that's what's happening.
> >>
> >> That seems like a pretty good idea to me.
> >
> > What I am not sure about is how... It's trivial to set
> > pg_stat_activity.waiting = true, but without a corresponding description
> > what the backend is waiting for it's not exactly obvious what's
> > happening. I think that's better than nothing, but maybe somebody has a
> > glorious better idea.
> 
> pg_stat_activity.waiting = true

Yes. That's what I suggested above. The patch for it is trivial, but:
Currently - I think - everything that sets waiting = true, also has
contents in pg_locks. Not sure if it will confuse users if that's not
the case anymore.

> Easy to set the ps message also

That actually makes it considerably more expensive since we'd need to
save the old string somewhere. I am not sure it will be relevant, but
it's not as easy a sell as just setting a single boolean.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Robert Haas
Date:
On Mon, Apr 14, 2014 at 10:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
>> On 13 April 2014 16:44, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On 2014-04-12 17:40:34 -0400, Robert Haas wrote:
>> >> On Fri, Apr 11, 2014 at 10:28 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> >> > VACUUM sometimes waits synchronously for a cleanup lock on a heap
>> >> > page. Sometimes for a long time. Without reporting it externally.
>> >> > Rather confusing ;).
>> >> >
>> >> > Since we only take cleanup locks around vacuum, how about we report at
>> >> > least in pgstat that we're waiting? At the moment, there's really no way
>> >> > to know if that's what's happening.
>> >>
>> >> That seems like a pretty good idea to me.
>> >
>> > What I am not sure about is how... It's trivial to set
>> > pg_stat_activity.waiting = true, but without a corresponding description
>> > what the backend is waiting for it's not exactly obvious what's
>> > happening. I think that's better than nothing, but maybe somebody has a
>> > glorious better idea.
>>
>> pg_stat_activity.waiting = true
>
> Yes. That's what I suggested above. The patch for it is trivial, but:
> Currently - I think - everything that sets waiting = true, also has
> contents in pg_locks. Not sure if it will confuse users if that's not
> the case anymore.

In my personal opinion, it would be OK to change that, provided that
we have some real good documentation for it.

Longer-term, I'm wondering if we shouldn't have something like
pg_stat_activity.wait_type instead of pg_stat_activity.waiting.  It
could be NULL when not waiting, or otherwise "lock", "lwlock", "buffer
cleanup", etc.

>> Easy to set the ps message also
>
> That actually makes it considerably more expensive since we'd need to
> save the old string somewhere. I am not sure it will be relevant, but
> it's not as easy a sell as just setting a single boolean.

Yeah, I'm not too sanguine about squeezing that part into 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Signaling of waiting for a cleanup lock?

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
>> On 13 April 2014 16:44, Andres Freund <andres@2ndquadrant.com> wrote:
>>> What I am not sure about is how... It's trivial to set
>>> pg_stat_activity.waiting = true, but without a corresponding description
>>> what the backend is waiting for it's not exactly obvious what's
>>> happening. I think that's better than nothing, but maybe somebody has a
>>> glorious better idea.

>> pg_stat_activity.waiting = true

> Yes. That's what I suggested above. The patch for it is trivial, but:
> Currently - I think - everything that sets waiting = true, also has
> contents in pg_locks. Not sure if it will confuse users if that's not
> the case anymore.

I think it will.  This is a case where a quick and dirty hack is nothing
but quick and dirty.

I wonder whether we should not try to fix this by making the process wait
on a heavyweight lock, if it has to wait.  That would also get us out of
the rather grotty business of using a special-purpose signal to wake it
up.  However, there's still a visibility problem, in that there'd be no
way to tell which other processes are blocking it (which is the thing
you *really* want to know).
        regards, tom lane



Re: Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-04-14 15:45:45 +0100, Simon Riggs wrote:
> >> On 13 April 2014 16:44, Andres Freund <andres@2ndquadrant.com> wrote:
> >>> What I am not sure about is how... It's trivial to set
> >>> pg_stat_activity.waiting = true, but without a corresponding description
> >>> what the backend is waiting for it's not exactly obvious what's
> >>> happening. I think that's better than nothing, but maybe somebody has a
> >>> glorious better idea.
> 
> >> pg_stat_activity.waiting = true
> 
> > Yes. That's what I suggested above. The patch for it is trivial, but:
> > Currently - I think - everything that sets waiting = true, also has
> > contents in pg_locks. Not sure if it will confuse users if that's not
> > the case anymore.
> 
> I think it will.  This is a case where a quick and dirty hack is nothing
> but quick and dirty.

Well, it's still better than the current situation of waiting and not
signalling it to anything externally visible.

I think Robert's suggestion of an additional
waiting_on=lock,lwlock,bufferpin might be a realistic way forward.

> I wonder whether we should not try to fix this by making the process wait
> on a heavyweight lock, if it has to wait.  That would also get us out of
> the rather grotty business of using a special-purpose signal to wake it
> up.  However, there's still a visibility problem, in that there'd be no
> way to tell which other processes are blocking it (which is the thing
> you *really* want to know).

That'd be neat, but I am not really sure how? Which lock would we be
waiting on? I don't really see heavyweight locks scaling up to buffer
pins?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
>> I wonder whether we should not try to fix this by making the process wait
>> on a heavyweight lock, if it has to wait.  That would also get us out of
>> the rather grotty business of using a special-purpose signal to wake it
>> up.  However, there's still a visibility problem, in that there'd be no
>> way to tell which other processes are blocking it (which is the thing
>> you *really* want to know).

> That'd be neat, but I am not really sure how? Which lock would we be
> waiting on?

Well, we already have locktags for relation pages, so that aspect isn't
that hard.  The bigger problem is what are we waiting *for*, that is,
what is it that blocks the lock request from being granted?

In an ideal world, when we needed to wait for a cleanup lock, we'd cause
the lock manager to set up pre-granted sharable page locks for all the
processes currently holding buffer pins, and then wait for an exclusive
page lock.  The current hack of signaling when you're the last one off the
page would be replaced by releasing your lock (if it exists) when you drop
your own pin.  I'm not sure it's really worth the trouble to try to do
this, but it would solve the visibility problem; and it might allow us to
be a bit smarter about the priority of a cleanup lock request versus
incoming regular pin requests.
        regards, tom lane



Re: Signaling of waiting for a cleanup lock?

From
Robert Haas
Date:
On Mon, Apr 14, 2014 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
>>> I wonder whether we should not try to fix this by making the process wait
>>> on a heavyweight lock, if it has to wait.  That would also get us out of
>>> the rather grotty business of using a special-purpose signal to wake it
>>> up.  However, there's still a visibility problem, in that there'd be no
>>> way to tell which other processes are blocking it (which is the thing
>>> you *really* want to know).
>
>> That'd be neat, but I am not really sure how? Which lock would we be
>> waiting on?
>
> Well, we already have locktags for relation pages, so that aspect isn't
> that hard.  The bigger problem is what are we waiting *for*, that is,
> what is it that blocks the lock request from being granted?
>
> In an ideal world, when we needed to wait for a cleanup lock, we'd cause
> the lock manager to set up pre-granted sharable page locks for all the
> processes currently holding buffer pins, and then wait for an exclusive
> page lock.  The current hack of signaling when you're the last one off the
> page would be replaced by releasing your lock (if it exists) when you drop
> your own pin.  I'm not sure it's really worth the trouble to try to do
> this, but it would solve the visibility problem; and it might allow us to
> be a bit smarter about the priority of a cleanup lock request versus
> incoming regular pin requests.

AFAICS, the big advantage of something like this is that we'd get
proper deadlock detection, and that's not a trivial point.  But other
than that I don't like it much.  The fast-path locking stuff was
basically this kind of thing in reverse: instead of trying to
implement something with a separate locking implementation into the
regular lock manager, we were trying to take something that was
handled by the regular lock manager in a completely generic way and
give it an optimized path that performs much better on the cases that
actually arise in practice.  And, it was worth it, because we got a
huge performance boost, but it was also messy and, from your recent
reports, apparently still not fully debugged.

Now, I don't really have a specific proposal in mind that I think
would be better than shoehorning more stuff into the lock manager, so
it would be hard for me to oppose that if someone worked out all of
the problems and proposed a patch.  But I have uneasy feelings about
it.  It's hard to see how we can have a reasonable deadlock detector
without some kind of very generic locking mechanism, and that's what
the lock manager is, but we also know from experience that it's really
slow in some circumstances and frustratingly easy to run out of shared
memory, so I can't help but wonder if we really ought to be giving the
whole way we do locking and deadlock detection a broader rethink
somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
On 2014-04-14 12:02:22 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-04-14 11:30:02 -0400, Tom Lane wrote:
> >> I wonder whether we should not try to fix this by making the process wait
> >> on a heavyweight lock, if it has to wait.  That would also get us out of
> >> the rather grotty business of using a special-purpose signal to wake it
> >> up.  However, there's still a visibility problem, in that there'd be no
> >> way to tell which other processes are blocking it (which is the thing
> >> you *really* want to know).
> 
> > That'd be neat, but I am not really sure how? Which lock would we be
> > waiting on?
> 
> Well, we already have locktags for relation pages, so that aspect isn't
> that hard.  The bigger problem is what are we waiting *for*, that is,
> what is it that blocks the lock request from being granted?

> In an ideal world, when we needed to wait for a cleanup lock, we'd cause
> the lock manager to set up pre-granted sharable page locks for all the
> processes currently holding buffer pins, and then wait for an exclusive
> page lock.

Well, wouldn't that imply every pin (well, unless local) goes through
the lock manager in some way because otherwise we'll end up with
basically the same kludges as today painted in a different color? I
can't believe that'll easily work out performancewise.

I think it might be worthwile to do this if we can figure out how to do
it performantly, but I won't hold my breath for it. And I think we need
to improve visibility of cleanup locks (and possibly lwlocks) on a much
shorter timescale.

> I'm not sure it's really worth the trouble to try to do
> this, but it would solve the visibility problem; and it might allow us to
> be a bit smarter about the priority of a cleanup lock request versus
> incoming regular pin requests.

I don't know how, but some smarter priorization here would be really
helpful. It pretty much sucks that some relation essentially can't be
vacuumed once there's tuples needing freezing.
I wonder if we could make the freezing part work without a cleanup lock...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
> AFAICS, the big advantage of something like this is that we'd get
> proper deadlock detection, and that's not a trivial point.

Hm. Is this actually something we need? I am not aware of deadlock prone
scenarios involving buffer pins during normal processing (HS is another
matter).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Alvaro Herrera
Date:
Tom Lane wrote:

> In an ideal world, when we needed to wait for a cleanup lock, we'd cause
> the lock manager to set up pre-granted sharable page locks for all the
> processes currently holding buffer pins, and then wait for an exclusive
> page lock.  The current hack of signaling when you're the last one off the
> page would be replaced by releasing your lock (if it exists) when you drop
> your own pin.  I'm not sure it's really worth the trouble to try to do
> this, but it would solve the visibility problem; and it might allow us to
> be a bit smarter about the priority of a cleanup lock request versus
> incoming regular pin requests.

AFAIU this would represent a behavioral change: right now, vacuum waits
until everybody is gone, and new pinners might arrive while vacuum is
waiting.  With this scheme, new pinners would have to wait behind
vacuum.  Maybe this change alone is enough to avoid vacuum blocking for
long periods waiting for cleanup lock.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Signaling of waiting for a cleanup lock?

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
>> AFAICS, the big advantage of something like this is that we'd get
>> proper deadlock detection, and that's not a trivial point.

> Hm. Is this actually something we need? I am not aware of deadlock prone
> scenarios involving buffer pins during normal processing (HS is another
> matter).

Ordinary buffer pinning isn't supposed to be subject to deadlocks (that's
why it's reasonable to use LWLocks for it), but it's less clear that
cleanup locks couldn't be subject to deadlocks.

The real issue here is that LWLocks support neither deadlock detection nor
state reporting, and that's more or less exactly why they're lightweight,
so we do not want to go in the direction of adding such features to them.
The cleanup-lock mechanism is sort of an ugly intermediate thing between
LWLocks and regular heavyweight locks.  That was okay, sort of, as long
as it was simple ... but once we start loading additional feature
requirements onto it, it gets (even) less attractive to have a
single-purpose mechanism.  In particular I'm not sold on the use-case
for being able to tell that a process is waiting without being able to
tell what it's waiting for.  I can figure that much out already.

One concrete reason not to do the proposed trivial hack is that the lock
readout views are asynchronous.  Right now, if someone sees a process that
claims to be waiting but they don't see any entry in pg_locks, they know
they saw inconsistent state.  If we add a valid state where waiting can be
true without a pg_locks entry, they won't know what to think.  I don't
want to go there.
        regards, tom lane



Re: Signaling of waiting for a cleanup lock?

From
Andres Freund
Date:
On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
> >> AFAICS, the big advantage of something like this is that we'd get
> >> proper deadlock detection, and that's not a trivial point.
> 
> > Hm. Is this actually something we need? I am not aware of deadlock prone
> > scenarios involving buffer pins during normal processing (HS is another
> > matter).
> 
> Ordinary buffer pinning isn't supposed to be subject to deadlocks (that's
> why it's reasonable to use LWLocks for it), but it's less clear that
> cleanup locks couldn't be subject to deadlocks.

We only acquire cleanup locks in a blocking fashion from vacuum - and
vacuum has a pretty clearly defined locking behaviour. Additionally both
in vacuum and in opportunistic pruning there's only a very small and
defined amount of work done once the buffer is successfully pinned.

> In particular I'm not sold on the use-case
> for being able to tell that a process is waiting without being able to
> tell what it's waiting for.  I can figure that much out already.

You can? How? It could also be io or something else that's problematic.

> One concrete reason not to do the proposed trivial hack is that the lock
> readout views are asynchronous.  Right now, if someone sees a process that
> claims to be waiting but they don't see any entry in pg_locks, they know
> they saw inconsistent state.  If we add a valid state where waiting can be
> true without a pg_locks entry, they won't know what to think.  I don't
> want to go there.

What's you opinion of the waiting = true combined with waiting_for =
'cleanup lock' or something similar?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Signaling of waiting for a cleanup lock?

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
>> In particular I'm not sold on the use-case
>> for being able to tell that a process is waiting without being able to
>> tell what it's waiting for.  I can figure that much out already.

> You can? How? It could also be io or something else that's problematic.

If the process is not consuming any CPU time at all, it's waiting on
something.  (Now admittedly, that might be hard to tell remotely ---
but Simon seems to be assuming you have access to "ps" output.)

>> One concrete reason not to do the proposed trivial hack is that the lock
>> readout views are asynchronous.  Right now, if someone sees a process that
>> claims to be waiting but they don't see any entry in pg_locks, they know
>> they saw inconsistent state.  If we add a valid state where waiting can be
>> true without a pg_locks entry, they won't know what to think.  I don't
>> want to go there.

> What's you opinion of the waiting = true combined with waiting_for =
> 'cleanup lock' or something similar?

It's better than the original proposal, but it still doesn't tell you
which other processes are blocking the waiter, which makes it not
terribly useful IMO.  Any actual gain in utility here would come from
being able to find that out.
        regards, tom lane



Re: Signaling of waiting for a cleanup lock?

From
Robert Haas
Date:
On Mon, Apr 14, 2014 at 1:26 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > On 2014-04-14 12:21:09 -0400, Robert Haas wrote:
>> >> AFAICS, the big advantage of something like this is that we'd get
>> >> proper deadlock detection, and that's not a trivial point.
>>
>> > Hm. Is this actually something we need? I am not aware of deadlock prone
>> > scenarios involving buffer pins during normal processing (HS is another
>> > matter).
>>
>> Ordinary buffer pinning isn't supposed to be subject to deadlocks (that's
>> why it's reasonable to use LWLocks for it), but it's less clear that
>> cleanup locks couldn't be subject to deadlocks.
>
> We only acquire cleanup locks in a blocking fashion from vacuum - and
> vacuum has a pretty clearly defined locking behaviour. Additionally both
> in vacuum and in opportunistic pruning there's only a very small and
> defined amount of work done once the buffer is successfully pinned.

Nevertheless, I'm pretty sure undetected deadlocks are possible; we've
discussed it before.  The TODO list contains a pointer to:

http://www.postgresql.org/message-id/21534.1200956464@sss.pgh.pa.us

I think the scenario is something like: vacuum is waiting for a buffer
pin that pertains to an open query in some other session, which then
tries to take a lock that conflicts with one already held by vacuum.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Signaling of waiting for a cleanup lock?

From
Jim Nasby
Date:
On 4/14/14, 12:06 PM, Tom Lane wrote:
> One concrete reason not to do the proposed trivial hack is that the lock
> readout views are asynchronous.  Right now, if someone sees a process that
> claims to be waiting but they don't see any entry in pg_locks, they know
> they saw inconsistent state.  If we add a valid state where waiting can be
> true without a pg_locks entry, they won't know what to think.  I don't
> want to go there.

FWIW, I really wish we had a way to eliminate that inconsistency. It makes already difficult to debug problems even
harderto deal with.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Signaling of waiting for a cleanup lock?

From
Jim Nasby
Date:
On 4/14/14, 12:44 PM, Tom Lane wrote:
> Andres Freund<andres@2ndquadrant.com>  writes:
>> >On 2014-04-14 13:06:21 -0400, Tom Lane wrote:
>>> >>In particular I'm not sold on the use-case
>>> >>for being able to tell that a process is waiting without being able to
>>> >>tell what it's waiting for.  I can figure that much out already.

>> >You can? How? It could also be io or something else that's problematic.

> If the process is not consuming any CPU time at all, it's waiting on
> something.  (Now admittedly, that might be hard to tell remotely ---
> but Simon seems to be assuming you have access to "ps" output.)

Right... and then I always find myself wondering what it's actually waiting on. IO? lwlock? Something else?
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net