Thread: Lock Wait Statistics (next commitfest)
Where I work they make extensive use of Postgresql. One of the things they typically want to know about is lock waits. Out of the box there is not much in the way of tracking for these, particularly in older versions. The view pg_stats is fine for stuff happening *now*, but typically I find I'm being asked about something that happened last night...

Now, for those platforms with dtrace there is a lock wait probe, useful - but not for those of us on Linux! There is the conf option to log lock waits > deadlock timeout (8.3 onwards), and this is great, but I wondered if having something like this available as part of the stats module would be useful.

So here is my initial attempt at this, at this point merely to spark discussion (attached patch). I have followed some of the ideas from the function execution stats, but locks required some special treatment:

- new parameter track_locks (maybe should be track_lock_waits?)
- new hash of locks attached to the stats dbentry
- several new stat*lock C functions
- new view pg_stat_lock_waits
- new attributes for pg_stat_database

This version has simply exposed the locktag structure in the view along with the corresponding number of waits and wait time. This should probably get reworked to look a little more like pg_locks. I figured this is easily doable along with the no doubt many changes coming from review comments.

Also, I did not do any clever amalgamation of transaction lock waits - there are probably going to be a lot of those, and keeping the detail is unlikely to be useful. It would be easy to collect them all together in one transaction entry.

regards

Mark
Attachment
On Sun, Jan 25, 2009 at 6:57 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> So here is my initial attempt at this, at this point merely to spark
> discussion (attached patch)

this patch doesn't apply cleanly to head... can you update it, please?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Fri, Jul 10, 2009 at 12:23 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> On Sun, Jan 25, 2009 at 6:57 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
>> So here is my initial attempt at this, at this point merely to spark
>> discussion (attached patch)
>
> this patch doesn't apply cleanly to head... can you update it, please?

i did it myself; i think this is something we need...

this compiles and seems to work... something i was wondering is that having only the total time of lock waits is not very accurate, because we can have 9 lock waits waiting 1 sec each and one waiting for 1 minute... simply summing them all will give a bad statistic, or am i missing something?

--
Jaime Casanova
Attachment
Jaime Casanova wrote:
> i did it myself, i think this is something we need...
>
> this compile and seems to work... something i was wondering is that
> having the total time of lock waits is not very accurate because we
> can have 9 lock waits awaiting 1 sec each and one awaiting for 1
> minute... simply sum them all will give a bad statistic or am i
> missing something?

Thank you Jaime - looks good. I seem to have been asleep at the wheel and missed *both* of your emails until now; belated apologies for that - especially the first one :-(

With respect to the sum of wait times not being very granular: yes, quite true. I was thinking it is useful to be able to answer the question 'where is my wait time being spent' - but it hides cases like the one you mention. What would you like to see? Would max and min wait times be a useful addition, or are you thinking along different lines?

Cheers

Mark
On Fri, Jul 17, 2009 at 3:38 AM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> With respect to the sum of wait times being not very granular, yes - quite
> true. I was thinking it is useful to be able to answer the question 'where
> is my wait time being spent' - but it hides cases like the one you mention.
> What would you like to see? would max and min wait times be a useful
> addition, or are you thinking along different lines?

track number of locks, sum of wait times, max(wait time).
but actually i started to think that the best is just to make use of log_lock_waits, send the logs to csvlog and analyze them there...

--
Jaime Casanova
Jaime Casanova wrote:
> track number of locks, sum of wait times, max(wait time).
> but actually i started to think that the best is just make use of
> log_lock_waits send the logs to csvlog and analyze there...

Right - I'll look at adding max (at least) early next week.

Yeah, enabling log_lock_waits is certainly another approach; however, you currently miss out on those waits that are < deadlock_timeout - and potentially they could be the source of your problem (i.e. millions of waits, all < deadlock_timeout, but taken together rather significant). This shortcoming could be overcome by decoupling the cutoff wait time from deadlock_timeout (e.g. a new parameter log_min_lock_wait_time or similar).

I'm thinking that having the lock waits easily analyzable via SQL may mean that most people won't need to collect and analyze their logs for this stuff (they can just examine the lock stats view from pgAdmin or similar).

Cheers

Mark
Mark Kirkwood <markir@paradise.net.nz> writes: > Yeah, enabling log_lock_waits is certainly another approach, however you > currently miss out on those that are < deadlock_timeout - and > potentially they could be the source of your problem (i.e millions of > waits all < deadlock_timeout but taken together rather significant). > This shortcoming could be overcome by making the cutoff wait time > decoupled from deadlock_timeout (e.g a new parameter > log_min_lock_wait_time or similar). The reason that they're tied together is to keep from creating unreasonable complexity (and an unreasonable number of extra kernel calls) in management of the timeout timers. You will find that you can't just wave your hand and decree that they are now decoupled. regards, tom lane
Tom Lane wrote: > Mark Kirkwood <markir@paradise.net.nz> writes: > >> Yeah, enabling log_lock_waits is certainly another approach, however you >> currently miss out on those that are < deadlock_timeout - and >> potentially they could be the source of your problem (i.e millions of >> waits all < deadlock_timeout but taken together rather significant). >> This shortcoming could be overcome by making the cutoff wait time >> decoupled from deadlock_timeout (e.g a new parameter >> log_min_lock_wait_time or similar). >> > > The reason that they're tied together is to keep from creating > unreasonable complexity (and an unreasonable number of extra kernel > calls) in management of the timeout timers. You will find that you > can't just wave your hand and decree that they are now decoupled. > > Thanks Tom - I did wonder if there was a deeper reason they were tied together! Cheers Mark
Mark Kirkwood wrote:
> Right - I'll look at adding max (at least) early next week.

I'm also thinking of taking a look at amalgamating transaction type lock waits. This seems like a good idea because:

- individually, and viewed at a later date, I don't think the individual detail is going to be useful
- there will be a lot of them
- I think the statistical data (count, sum elapsed, max elapsed) may be sufficiently interesting

Cheers

Mark
Mark Kirkwood wrote:
> Right - I'll look at adding max (at least) early next week.

Patch with max(wait time).

Still TODO:

- amalgamate individual transaction lock waits
- redo (rather ugly) temporary pg_stat_lock_waits in a form more like pg_locks
Attachment
Mark Kirkwood wrote:
> Patch with max(wait time).
>
> Still TODO
>
> - amalgamate individual transaction lock waits
> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
> pg_locks

This version has the individual transaction lock waits amalgamated.

Still TODO: redo pg_stat_lock_waits ...
Attachment
Mark,

Your last email on this patch, from August 9th, indicates that you've still got "TODO: redo pg_stat_lock_waits ...". Have you updated this patch since then?

Thanks!

Stephen
I have this patch, if you're interested.

LWLock Instrumentation Patch

- counts locks and waits in shared and exclusive mode
- for selected locks, measures wait and hold times
- for selected locks, displays a histogram of wait and hold times
- information is printed at backend exit

Configurable by #define's in lwlock.c

Regards,
pierre
Attachment
Pierre,

> Configurable by #define's in lwlock.c

Given that we already have dtrace/systemtap probes around the lwlocks, is there some way you could use those instead of extra #defines?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Stephen Frost wrote:
> Your last email on this patch, from August 9th, indicates that you've
> still got "TODO: redo pg_stat_lock_waits ...". Have you updated this
> patch since then?

Stephen,

No - that is still a TODO for me - real life getting in the way :-)

Cheers

Mark
On Sat, Aug 8, 2009 at 7:47 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> This version has the individual transaction lock waits amalgamated.
>
> Still TODO: redo pg_stat_lock_waits ...

it applies with some hunks, compiles fine and seems to work...
i'm still not sure this is what we need; some more comments could be helpful.

what kind of questions are we capable of answering with this, and what kind of questions are we still missing?

for example, now we know "number of locks that had to wait", "total time waiting" and "max time waiting for a single lock"... but we can still have an inaccurate understanding if we have lots of locks waiting a little time and a few waiting a huge amount of time...

something i have been asked when the system starts to slow down is "can we know if there was lock contention in that period"? for now the only way to answer that is logging locks.

about the patch itself: you haven't documented it either. what is the pg_stat_lock_waits view for? and what are those fields it has?

i'll leave this patch as "needs review" for more people to comment on it...

--
Jaime Casanova
On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.

I'm pretty sure the logic of this patch is not correct.

In pgstat_init_lock_wait(LOCKTAG *locktag) ...

+	l_curr = htabent->l_counts.l_tot_wait_time;
+	INSTR_TIME_SET_CURRENT(l_start);
+	INSTR_TIME_ADD(l_curr, l_start);
+	htabent->l_counts.l_tot_wait_time = l_curr;

In pgstat_end_lock_wait(LOCKTAG *locktag) ...

+	l_start = htabent->l_counts.l_tot_wait_time;
+	INSTR_TIME_SET_CURRENT(l_end);
+	INSTR_TIME_SUBTRACT(l_end, l_start);
+	htabent->l_counts.l_tot_wait_time = l_end;

So l_start = time cumulatively waited previously + time at start of this wait, and l_end - l_start is equal to:

= time at end of this wait - (time at start of this wait + time cumulatively waited previously)
= (time at end of this wait - time at start of this wait) - time cumulatively waited previously
= (duration of this wait) - time cumulatively waited previously

That minus sign in the last line can't be good, can it?

Also:

+	htabent->l_counts.l_tot_wait_time = l_end;
+
+	if (INSTR_TIME_GET_MICROSEC(l_end) > INSTR_TIME_GET_MICROSEC(htabent->l_counts.l_max_wait_time))
+		htabent->l_counts.l_max_wait_time = l_end;

The total wait time is equal to the max wait time (which are both equal to l_end)? One or both of those has to end up being wrong. At this stage, is l_end supposed to be the last wait time, or the cumulative wait time?
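For what it's worth, the bookkeeping I'd expect here looks something like the following. This is only a minimal sketch in plain microseconds, with illustrative struct and function names, not the patch's actual INSTR_TIME-based code: record just the raw timestamp at wait start, and fold the duration into the total and the max at wait end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lock counters, loosely modeled on the patch's
 * l_counts fields; names and the plain-microsecond clock are
 * illustrative only. */
typedef struct LockWaitStats
{
	uint64_t	n_waits;		/* number of waits observed */
	uint64_t	tot_wait_us;	/* cumulative wait time */
	uint64_t	max_wait_us;	/* longest single wait */
	uint64_t	start_us;		/* raw clock reading at start of current wait */
} LockWaitStats;

/* Wait start: just remember the raw timestamp; nothing is mixed
 * into the cumulative total yet. */
static void
wait_begin(LockWaitStats *s, uint64_t now_us)
{
	s->start_us = now_us;
}

/* Wait end: compute this wait's duration, then fold it into the
 * total and the max separately. */
static void
wait_end(LockWaitStats *s, uint64_t now_us)
{
	uint64_t	dur = now_us - s->start_us;

	s->n_waits++;
	s->tot_wait_us += dur;
	if (dur > s->max_wait_us)
		s->max_wait_us = dur;
}
```

That keeps the total cumulative and the max a per-wait high-water mark, which is what the view's column names suggest.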
One of the things in the patch review checklist is about compiler warnings, so I am reporting these:

lock.c: In function `LockAcquire':
lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards qualifiers from pointer target type
lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards qualifiers from pointer target type

Cheers,

Jeff
Jaime Casanova wrote:
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.

Yeah, that's the big question. Are the current capabilities (logging 'em for waits > deadlock timeout + dtrace hooks) enough? I'm thinking that if we had dtrace for Linux generally available, then the need for this patch would be lessened.

> what kind of questions are we capable of answer with this and and what
> kind of questions are we still missing?
>
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...
>
> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks

Right - there may still be other aggregates that need to be captured... it would be great to have some more feedback from the field about this. In my case, I was interested in seeing whether the elapsed time was being spent waiting for locks or actually executing (in fact it turned out to be the latter - but it was still very useful to be able to rule out locking issues). However, as you mention, there may be cases where the question is more about part of the system suffering a disproportionate number/time of lock waits...

> about the patch itself:
> you haven't documented either. what is the pg_stat_lock_waits view
> for? and what are those fieldx it has?

Yeah, those fields are straight from the LOCKTAG structure. I need to redo the view to be more like pg_locks, and also do the doco. However, I've been a bit hesitant to dive into these last two steps until I see that there is some real enthusiasm for this patch (or failing that, a feeling that it is needed...).
> i'll let this patch as "needs review" for more people to comment on it... > > Agreed, thanks for the comments. Mark
Jeff Janes wrote:
> The total wait time is equal to the max wait time (which are both
> equal to l_end)?
> One or both of those has to end up being wrong. At this stage, is
> l_end supposed to be the last wait time, or the cumulative wait time?

Hmm - I may well have fat-fingered the arithmetic; thanks, I'll take a look!

> One of the things in the patch review checklist is about compiler
> warnings, so I am reporting these:
>
> lock.c: In function `LockAcquire':
> lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards
> qualifiers from pointer target type
> lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards
> qualifiers from pointer target type

Right, will look at that too.

Cheers

Mark
On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...

Aren't the huge ones already loggable from the deadlock detector?

With the max, we can at least put an upper limit on how long the longest ones could have been. However, is there a way to reset the max? I tried deleting data/pg_stat_tmp, but that didn't work. With cumulative values, you can take snapshots and then take the difference of them; that won't work with max. If the max can't be reset except with an initdb, I think that makes it barely usable.

> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks

I was surprised to find that running with track_locks on did not cause a detectable difference in performance, so you could just routinely do regularly scheduled snapshots and go back and mine them over the time that a problem was occurring.
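(To spell out the snapshot point: the diff trick is valid for any counter that only ever increases, but a high-water mark doesn't compose that way. A toy sketch, with illustrative names rather than anything from the patch:)

```c
#include <assert.h>
#include <stdint.h>

/* Toy snapshot of the proposed counters (illustrative names only). */
typedef struct StatSnap
{
	uint64_t	tot_wait_us;	/* cumulative: only ever grows */
	uint64_t	max_wait_us;	/* high-water mark since last reset */
} StatSnap;

/* Wait time spent between two snapshots: well defined for any
 * monotonically increasing counter. */
static uint64_t
interval_wait(StatSnap before, StatSnap after)
{
	return after.tot_wait_us - before.tot_wait_us;
}

/* No such luck for max: if the mark didn't move, the interval could
 * have contained any number of waits up to that length, or none at
 * all.  Only an increase tells us anything about the interval. */
static int
interval_max_known(StatSnap before, StatSnap after)
{
	return after.max_wait_us > before.max_wait_us;
}
```

So without a reset mechanism, the max column only answers questions about the period since initdb, not about last night.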
I just checked with pgbench over various levels of concurrency and fsync settings. If potential slowness wouldn't show up there, I don't know how else to look for it. Cheers, Jeff
On Sun, Oct 4, 2009 at 4:14 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Aren't the huge ones already loggable from the deadlock detector?
>
> With the max, we can at least put an upper limit on how long the
> longest ones could have been. However, is there a way to reset the
> max? I tried deleting data/pg_stat_tmp, but that didn't work. With
> cumulative values, you can take snapshots and then take the
> difference of them; that won't work with max. If the max can't be
> reset except with an initdb, I think that makes it barely usable.
>
> I was surprised to find that running with track_locks on did not cause
> a detectable difference in performance, so you could just routinely do
> regularly scheduled snapshots and go back and mine them over the time
> that a problem was occurring. I just checked with pgbench over
> various levels of concurrency and fsync settings. If potential
> slowness wouldn't show up there, I don't know how else to look for it.

It seems that this patch had open TODO items at the beginning of the CommitFest (so perhaps we should have bounced it immediately), and I think that's still the case now, so I am going to mark this as Returned with Feedback. A lot of good reviewing has been done, though, so maybe this can be submitted for a future CommitFest in something close to a final form.

Thanks,

...Robert
What happened to this patch?

---------------------------------------------------------------------------

Mark Kirkwood wrote:
> So here is my initial attempt at this, at this point merely to spark
> discussion (attached patch)
>
> - new parameter track_locks (maybe should be track_lock_waits?)
> - new hash locks attached to stats dbentry
> - several new stat*lock c functions
> - new view pg_stat_lock_waits
> - new attributes for pg_stat_database
>
> [ application/x-gzip is not supported, skipping... ]

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com
PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do

+ If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian wrote:
> What happened to this patch?

Returned with feedback in October after receiving a lot of review; no updated version submitted since then:

https://commitfest.postgresql.org/action/patch_view?id=98

--
Greg Smith    2ndQuadrant US    Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
I am just adding my two cents; please ignore it if it's totally irrelevant.

While we do performance testing/tuning of any application, the important things - the standard monitoring requirements from a database - are:

a) Different types of wait events and the time spent in each of them
b) Top ten queries by Total Logical Reads & Average Logical Reads
c) Top ten queries by Total CPU Time & Average CPU Time

The monitoring methodology should not put so much overhead on the test that it invalidates the application response times captured during the performance test (let's not worry about Heisenberg uncertainty for now :)) ).

Of all the databases I have worked with, Oracle provides the best monitoring product, in the form of Statspack.

Statspack works in the following way: it takes a copy of the important catalog tables (pg_* tables, in Postgres terms) which store information like wait statistics against wait events, cumulative I/O statistics against each SQL hash (and SQL text), whether a particular plan went through a hard parse or a soft parse (because of plan caching), the status of different in-memory data structures, etc.

So we take a snapshot like this before and after the test and generate a Statspack report from the pair, which contains all the necessary information for database-level tuning. So we are never left in the dark from a database tuning perspective.

Recently I wrote a set of SQL statements which do the same for SQL Server from its system tables like wait_io_events, query_io_stats etc., and finally retrieve the information in the same format as Statspack.

But I think we lack functionality like that in Postgres. I think things like DTrace are more for developers than for users and, as already pointed out, it works only on Solaris. While we can expect it for Linux shortly, people on Windows do not have many options. (I maintain that DTrace is an absolutely wonderful hooking mechanism.) So we should aim to develop a monitoring mechanism like Statspack for Postgres.

Hope I have conveyed my concern.

Thanks,
Gokul.
On Sat, Feb 27, 2010 at 10:40 AM, Greg Smith <greg@2ndquadrant.com> wrote:
Bruce Momjian wrote:
> What happened to this patch?

Returned with feedback in October after receiving a lot of review, no updated version submitted since then:
https://commitfest.postgresql.org/action/patch_view?id=98
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote:
> Bruce Momjian wrote:
>> What happened to this patch?
>
> Returned with feedback in October after receiving a lot of review, no
> updated version submitted since then:
>
> https://commitfest.postgresql.org/action/patch_view?id=98

Hmm - I would say a bit of review rather than a lot :-)

The feeling I got from the various comments was a lukewarm level of interest at best, so I must confess that I let other things take precedence. Also, I was after some clear feedback about whether a separate stats utility is necessary at all given that we have dtrace support - despite this not being available for Linux... and the only comment addressing this concern is from Gokul just now!

I'd also like to take the opportunity to express a little frustration about the commitfest business - really, all I wanted was the patch *reviewed* as WIP. It seemed that in order to do that I needed to enter it into the various commitfests... then I was faced with comments to the effect that it was not ready for commit and so should not have been entered into a commitfest at all... sigh, a bit of an enthusiasm killer I'm afraid...

regards

Mark
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> I'd also like to take the opportunity to express a little frustration
> about the commitfest business - really all I wanted was the patch
> *reviewed* as WIP - it seemed that in order to do that I needed to enter
> it into the various commitfests... then I was faced with comments to the
> effect that it was not ready for commit so should not have been entered
> into a commitfest at all... sigh, a bit of an enthusiasm killer I'm afraid...

Well, entering a patch in a commitfest is certainly the best way to ensure that you'll get some feedback. If you just pop it up on the mailing lists, you may or may not draw much commentary, depending on how interested people are and how much time they have that day. (A day or so later there'll be other topics to distract them.) As long as the patch submission is clearly labeled WIP, you shouldn't get complaints about it not being ready to commit.

The other approach I'd suggest, if what you mainly want is design review, is to not post a patch at all. Post a design spec, or even just specific questions. It's less work for people to look at, and so you're more likely to get immediate feedback.

regards, tom lane
Mark Kirkwood wrote:
> Greg Smith wrote:
>> Returned with feedback in October after receiving a lot of review, no
>> updated version submitted since then:
>>
>> https://commitfest.postgresql.org/action/patch_view?id=98
>
> Hmm - I would say a bit of review rather than a lot :-)

It looks like you got useful feedback from at least three people, and people were regularly looking at your patch in some form for about three months. That's a lot of review. In many other open-source projects, your first patch would have been rejected after a quick look as unsuitable, and that would have been the end of things for you. I feel lucky every time I get a volunteer to spend time reading my work and suggesting how it could be better; your message here doesn't seem to share that perspective.

> I'd also like to take the opportunity to express a little frustration
> about the commitfest business - really all I wanted was the patch
> *reviewed* as WIP - it seemed that in order to do that I needed to
> enter it into the various commitfests... then I was faced with
> comments to the effect that it was not ready for commit so should not
> have been entered into a commitfest at all... sigh, a bit of an
> enthusiasm killer I'm afraid...

To lower your frustration level next time, make sure to label the e-mail and the entry on the CommitFest app with the magic abbreviation "WIP" and this shouldn't be so much of an issue. The assumption for patches is that someone submitted them as commit candidates, and therefore they should be reviewed to that standard, unless clearly labeled otherwise. You briefly disclaimed yours as not being in that category in the initial text of your first message, but it was easy to miss that, particularly once it had been >8 months from when that message showed up and it was still being discussed.
If you wanted to pick this back up again, I'd think that a look at what's been happening with the lock_timeout GUC patch would be informative - I'd think that has some overlap with the sort of thing you were trying to do.

FYI, I thought your patch was useful, but didn't spend time on it because it's not ambitious enough. I would like to see statistics on a lot more types of waiting than just locks, and I keep trying to find time to think about that big problem rather than worrying about the individual pieces of it.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Gokulakannan Somasundaram wrote:
> Statspack works by the following way - a) it takes a copy of important
> catalog tables (pg_ tables) which store information like wait
> statistics against wait events, i/o statistics cumulative against each
> SQL_Hash (and SQL_Text), whether a particular plan went for hard
> parse / soft parse (because of plan caching), and the status of different
> in-memory data structures etc.

This is actually missing the real work that went into building this feature into Oracle. Before it was possible to build statspack, first they went to the trouble of noting every source of this form of latency - lock waits, I/O statistics and waits, buffer pool waits - and instrumented every single one of them internally. Basically, every time something happens that might wait for a resource whose waits you later want to monitor, a start/end timestamp for that wait is noted, and ultimately the difference between them, recording how long the event took, is stored into the database. That's the data you must have collected at some point in order to produce the summary.

Meanwhile, PostgreSQL development is resistant to making any changes in this direction, under the theory that a) it adds a lot of code complexity and b) constant calls to get the current system time are too expensive on some platforms to do them all the time. Until those two things are sorted out, what the high-level interface to the direction you're suggesting looks like doesn't really matter. DTrace support has managed to clear both of those hurdles due to its optional nature, perceived low overhead, and moving the *storing* of all the events generated to something that happens outside of the database.

I agree with you that something like statspack is the direction PostgreSQL eventually needs to go, but it's going to be an uphill battle the whole time to get it built. The objections will be that it will add too much overhead at the lowest levels, where the data needed to support it is collected.
Once that is cleared, the high-level interface is easy to build.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Greg Smith wrote:
> Mark Kirkwood wrote:
>> Greg Smith wrote:
>>> Returned with feedback in October after receiving a lot of review,
>>> no updated version submitted since then:
>>>
>>> https://commitfest.postgresql.org/action/patch_view?id=98
>>
>> Hmm - I would say a bit of review rather than a lot :-)
>
> It looks like you got useful feedback from at least three people, and
> people were regularly looking at your patch in some form for about
> three months. That's a lot of review. In many other open-source
> projects, your first patch would have been rejected after a quick look
> as unsuitable and that would have been the end of things for you. I
> feel lucky every time I get a volunteer to spend time reading my work
> and suggesting how it could be better; your message here doesn't seem
> to share that perspective.

I don't mean to be ungrateful about the actual reviews at all - and I did value the feedback received (which I hope was reasonably clear in the various replies I sent). I sense a bit of attacking the messenger in your tone... I've submitted several patches to Postgres in the past, have previously always enjoyed the experience, and I do get the culture - being a volunteer myself.

> To lower your frustration level next time, make sure to label the
> e-mail and the entry on the CommitFest app with the magic abbreviation
> "WIP" and this shouldn't be so much of an issue. The assumption for
> patches is that someone submitted them as commit candidates, and
> therefore they should be reviewed to that standard, unless clearly
> labeled otherwise. You briefly disclaimed yours as not being in that
> category in the initial text of your first message, but it was easy to
> miss that, particularly once it had been >8 months from when that
> message showed up and it was still being discussed.

LOL - I said a bit - it was only a little, connected with the commit vs review confusion.
I think I just got caught in the bedding-in time for the new development processes: I was advised to add it to the commitfests, and then advised at a later time that it should not have been. Yes, a bit frustrating at the time, but not earth-shattering at all. I'm mentioning it now mainly to help clarify the situation for anyone else wanting a WIP patch reviewed!

In my case, while labeling it as WIP could well have helped, the (pretty short) message accompanying the patch is very clear that there is stuff to do - using the magic phrase "...merely to spark discussion..." - as were all the accompanying follow-up messages, with phrases like "still todo...". So yes, next time I'll label any patches more clearly, but reviewers need to read the text accompanying the patch they are about to review (note that most did).

> If you wanted to pick this back up again, I'd think that a look at
> what's been happening with the lock_timeout GUC patch would be
> informative - I'd think that has some overlap with the sort of thing
> you were trying to do.
>
> FYI, I thought your patch was useful, but didn't spend time on it
> because it's not ambitious enough. I would like to see statistics on
> a lot more types of waiting than just locks, and keep trying to find
> time to think about that big problem rather than worrying about the
> individual pieces of it.

Excellent, thanks - that is exactly the sort of comment that would have been very valuable to have made at the time (it echoes Gokul's recent one, interestingly enough). After all, if enough people share this view, then clearly I need to either forget about the current patch or design something more ambitious!

regards

Mark
On 2/27/10 1:04 PM, Mark Kirkwood wrote:
> LOL - I said a bit - it was only a little, connected with the commit vs
> review confusion. I think I just got caught in the bedding-in time for
> the new development processes: I was advised to add it to the
> commitfests, and then advised at a later time that it should not have been.

Yeah, this is only the 2nd year we have done CFs, and the first year we've had non-Core Team members managing them. So the cement on the procedure is still wet.

--Josh
Mark Kirkwood wrote:
> I don't mean to be ungrateful about the actual reviews at all - and I
> did value the feedback received (which I hope was reasonably clear in
> the various replies I sent). I sense a bit of attacking the messenger
> in your tone...

I thought there was a moderately big difference between the reality of the review you got and how you were characterizing it, and I was just trying to provide some perspective on how badly a true "bit of review" only would have worked out. Since I saw you disclaimed that wording with a smiley, I know it wasn't intended to be ungrateful, and I didn't intend to shoot the messenger. Apologies if my tone grazed you, though.

In any case, process feedback noted and assimilated into recommended practice: I just added a section about WIP patches to http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission

While I was in there, I also added some more notes on my personal top patch submission peeve: patches whose purpose in life is to improve performance that don't come with associated easy-to-run test cases, including a sample of that test running on a system that shows the speedup clearly. If I were in charge, I would just make it standard project policy to reject any performance patch without those characteristics immediately.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Greg Smith wrote:
> While I was in there I also added some more notes on my personal top
> patch submission peeve, patches whose purpose in life is to improve
> performance that don't come with associated easy to run test cases,
> including a sample of that test running on a system that shows the
> speedup clearly. If I were in charge I just would make it standard
> project policy to reject any performance patch without those
> characteristics immediately.

While I completely agree that the submitter should be required to supply a test case and their results, so the rest of us can try to reproduce said improvement, rejecting the patch out of hand is a bit harsh I feel - hey, they may just have forgotten to supply these things! The reviewer can always ask, can they not? I would prefer to see the wiki say something along the lines of "If you don't supply a test case you will be asked for one before any further review can proceed..."

Cheers

Mark
Mark Kirkwood wrote:
> While I completely agree that the submitter should be required to
> supply a test case and their results, so the rest of us can try to
> reproduce said improvement - rejecting the patch out of hand is a bit
> harsh I feel - Hey, they may just have forgotten to supply these things!

I didn't put any strong wording in the Wiki; I was just mentioning that my personal position is far less tolerant of this than the current project policy. What I added was:

"If the patch is intended to improve performance, it's a good idea to include some reproducible tests to demonstrate the improvement. If a reviewer cannot duplicate your claimed performance improvement in a short period of time, it's very likely your patch will be bounced. Do not expect that a reviewer is going to find your performance feature so interesting that they will build an entire test suite to prove it works. You should have done that as part of patch validation, and included the necessary framework for testing with the submission."

Finding a reviewer for a performance patch and getting them up to speed to evaluate any submitted patch is time intensive, and it really sucks from the perspective of the CF manager and any reviewer who is handed a messy one. The intention was not to cut people off without warning them. The position I would advocate as being a fair one is that if you don't provide a test case for a performance improvement patch, you can't then expect that you'll be assigned a reviewer by the CF manager either, until that's corrected. And if time for the CF runs out before you do that, you're automatically moved to "returned with feedback" - specifically, "write us a test case".

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
On Sat, Feb 27, 2010 at 5:40 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> I'd also like to take the opportunity to express a little frustration about
> the commitfest business - really all I wanted was the patch *reviewed* as
> WIP - it seemed that in order to do that I needed to enter it into the
> various commitfests... then I was faced with comments to the effect that it
> was not ready for commit so should not have been entered into a commitfest
> at all... sigh, a bit of an enthusiasm killer I'm afraid...

This might be my fault, so I apologize for killing your enthusiasm. I think when I get wrapped up in a CommitFest (and especially during the second half) I get wound up in determining whether or not things are going to get applied, and tend to give short shrift to things that seem like they won't. My bad.

Generally speaking, I am always in favor of adding things to the CommitFest, but I guess the one exception I would make is if there are outstanding comments already given that haven't been addressed yet. In that case it seems a little unfair to ask people to review it further, unless there are very specific questions you need answered. I think you were good about communicating that the patch wasn't ready to be applied yet, but I also think that it's to be expected that you'll get less feedback while it's in that state.

Anyway, my apologies for turning you off to the process... that wasn't my intent and I'm sorry that it had that effect.

...Robert
On Sat, Feb 27, 2010 at 6:22 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> Greg Smith wrote:
>> While I was in there I also added some more notes on my personal top patch
>> submission peeve, patches whose purpose in life is to improve performance
>> that don't come with associated easy to run test cases, including a sample
>> of that test running on a system that shows the speedup clearly. If I were
>> in charge I just would make it standard project policy to reject any
>> performance patch without those characteristics immediately.
>
> While I completely agree that the submitter should be required to supply a
> test case and their results, so the rest of us can try to reproduce said
> improvement - rejecting the patch out of hand is a bit harsh I feel - Hey,
> they may just have forgotten to supply these things! The reviewer can always
> ask, can they not? I would prefer to see the wiki say something along the
> lines of "If you don't supply a test case you will be asked for one before
> any further review can proceed..."

Agreed. Personally, I have no problem with giving a patch a brief once-over even if it lacks an appropriate test case, but serious review without a test case is really hard. That's one of the things that slowed down rbtree a lot this last CommitFest. We should probably make a point of pointing this problem out to patch submitters before the CommitFest even starts, so that they can address it in advance.

...Robert
Robert Haas wrote:
> This might be my fault, so I apologize for killing your enthusiasm. I
> think when I get wrapped up in a CommitFest (and especially during the
> second half) I get wound up in determining whether or not things are
> going to get applied and tend to give short shrift to things that seem
> like they won't. My bad.
>
> Generally speaking, I am always in favor of adding things to the
> CommitFest, but I guess the one exception I would make is if there are
> outstanding comments already given that haven't been addressed yet.
> In that case it seems a little unfair to ask people to review it
> further unless there are very specific questions you need answered. I
> think you were good about communicating that the patch wasn't ready to
> be applied yet, but I also think that it's to be expected that you'll
> get less feedback while it's in that state.

Yeah, makes sense, although perhaps there needs to be a way to get incremental progress reviewed?

> Anyway, my apologies for turning you off to the process... that wasn't
> my intent and I'm sorry that it had that effect.

I think there was a level of confusion on both sides, especially with a newish process for me to get my head around, so no apology is needed at all, as it is/was clear that there was no intent on your part to make things hard! (That is why I said nothing at the time.) But thank you for your kind words, much appreciated.

Best wishes

Mark
On Sun, Feb 28, 2010 at 1:06 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> Robert Haas wrote:
>> This might be my fault, so I apologize for killing your enthusiasm. I
>> think when I get wrapped up in a CommitFest (and especially during the
>> second half) I get wound up in determining whether or not things are
>> going to get applied and tend to give short shrift to things that seem
>> like they won't. My bad.
>>
>> Generally speaking, I am always in favor of adding things to the
>> CommitFest, but I guess the one exception I would make is if there are
>> outstanding comments already given that haven't been addressed yet.
>> In that case it seems a little unfair to ask people to review it
>> further unless there are very specific questions you need answered. I
>> think you were good about communicating that the patch wasn't ready to
>> be applied yet, but I also think that it's to be expected that you'll
>> get less feedback while it's in that state.
>
> Yeah, makes sense, although perhaps there needs to be a way to get
> incremental progress reviewed?

I think it's possible to get that, but there's a certain way you need to ask. As a general rule, anything that is of the form "here's my code, can you take a look" gets less attention - with the possible exception of a patch from a committer who is planning to commit it if no one writes back. And even then it often doesn't get looked at. Code dumps are just no fun.

Now if you write something like "here's my patch... I can't quite finish it because of X and I'm not sure whether the best solution is Y or Z", those tend to get answered a lot more often, at least IME. Reading a patch and trying to understand what it's doing and why it's doing it and whether it's really the best solution is a fairly time-consuming effort; giving the reader some context makes that a lot easier, and so people are more likely to help you if you do it, again IME.

...Robert