Thread: Lock Wait Statistics (next commitfest)
Where I work they make extensive use of Postgresql. One of the things they typically want to know about is lock waits. Out of the box there is not much in the way of tracking for these, particularly in older versions. The view pg_stats is fine for stuff happening *now*, but typically I find I'm being asked about something that happened last night...

Now, for those platforms with dtrace there is a lock wait probe, useful - but not for those of us on Linux! There is the conf option to log lock waits > deadlock timeout (8.3 onwards), and this is great, but I wondered if having something like this available as part of the stats module would be useful.

So here is my initial attempt at this, at this point merely to spark discussion (attached patch). I have followed some of the ideas from the function execution stats, but locks required some special treatment:

- new parameter track_locks (maybe should be track_lock_waits?)
- new hash of locks attached to the stats dbentry
- several new stat*lock C functions
- new view pg_stat_lock_waits
- new attributes for pg_stat_database

This version has simply exposed the locktag structure in the view along with the corresponding number of waits and wait time. This should probably get reworked to look a little more like pg_locks. I figured this is easily doable along with the no doubt many changes coming from review comments.

Also, I did not do any clever amalgamation of transaction lock waits - there are probably going to be a lot of those, and keeping the detail is unlikely to be useful. It would be easy to collect them all together in one transaction entry.

regards

Mark
Attachment
On Sun, Jan 25, 2009 at 6:57 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> So here is my initial attempt at this, at this point merely to spark
> discussion (attached patch)

this patch doesn't apply cleanly to head... can you update it, please?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Fri, Jul 10, 2009 at 12:23 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> On Sun, Jan 25, 2009 at 6:57 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
>> So here is my initial attempt at this, at this point merely to spark
>> discussion (attached patch)
>
> this patch doesn't apply cleanly to head... can you update it, please?

i did it myself; i think this is something we need...

this compiles and seems to work... something i was wondering is that having only the total time of lock waits is not very accurate, because we can have 9 lock waits waiting 1 sec each and one waiting for 1 minute... simply summing them all will give a bad statistic, or am i missing something?

--
Jaime Casanova
Attachment
Jaime Casanova wrote:
> i did it myself, i think this is something we need...
>
> this compile and seems to work... something i was wondering is that
> having the total time of lock waits is not very accurate because we
> can have 9 lock waits awaiting 1 sec each and one awaiting for 1
> minute... simply sum them all will give a bad statistic or am i
> missing something?

Thank you Jaime - looks good. I seem to have been asleep at the wheel and missed *both* of your emails until now; belated apologies for that - especially the first one :-(

With respect to the sum of wait times not being very granular: yes, quite true. I was thinking it is useful to be able to answer the question 'where is my wait time being spent' - but it hides cases like the one you mention. What would you like to see? Would max and min wait times be a useful addition, or are you thinking along different lines?

Cheers

Mark
On Fri, Jul 17, 2009 at 3:38 AM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> With respect to the sum of wait times being not very granular, yes - quite
> true. I was thinking it is useful to be able to answer the question 'where
> is my wait time being spent' - but it hides cases like the one you mention.
> What would you like to see? would max and min wait times be a useful
> addition, or are you thinking along different lines?

track number of locks, sum of wait times, max(wait time).
but actually i started to think that the best is just to make use of log_lock_waits, send the logs to csvlog and analyze them there...

--
Jaime Casanova
Jaime Casanova wrote:
> track number of locks, sum of wait times, max(wait time).
> but actually i started to think that the best is just make use of
> log_lock_waits send the logs to csvlog and analyze there...

Right - I'll look at adding max (at least) early next week.

Yeah, enabling log_lock_waits is certainly another approach; however, you currently miss out on those waits that are < deadlock_timeout - and potentially they could be the source of your problem (i.e. millions of waits, all < deadlock_timeout, but taken together rather significant). This shortcoming could be overcome by decoupling the cutoff wait time from deadlock_timeout (e.g. a new parameter log_min_lock_wait_time or similar).

I'm thinking that having the lock waits easily analyzable via SQL may mean that most people won't need to collect and analyze their logs for this stuff (they can just examine the lock stats view from pgAdmin or similar).

Cheers

Mark
Mark Kirkwood <markir@paradise.net.nz> writes: > Yeah, enabling log_lock_waits is certainly another approach, however you > currently miss out on those that are < deadlock_timeout - and > potentially they could be the source of your problem (i.e millions of > waits all < deadlock_timeout but taken together rather significant). > This shortcoming could be overcome by making the cutoff wait time > decoupled from deadlock_timeout (e.g a new parameter > log_min_lock_wait_time or similar). The reason that they're tied together is to keep from creating unreasonable complexity (and an unreasonable number of extra kernel calls) in management of the timeout timers. You will find that you can't just wave your hand and decree that they are now decoupled. regards, tom lane
Tom Lane wrote: > Mark Kirkwood <markir@paradise.net.nz> writes: > >> Yeah, enabling log_lock_waits is certainly another approach, however you >> currently miss out on those that are < deadlock_timeout - and >> potentially they could be the source of your problem (i.e millions of >> waits all < deadlock_timeout but taken together rather significant). >> This shortcoming could be overcome by making the cutoff wait time >> decoupled from deadlock_timeout (e.g a new parameter >> log_min_lock_wait_time or similar). >> > > The reason that they're tied together is to keep from creating > unreasonable complexity (and an unreasonable number of extra kernel > calls) in management of the timeout timers. You will find that you > can't just wave your hand and decree that they are now decoupled. > > Thanks Tom - I did wonder if there was a deeper reason they were tied together! Cheers Mark
Mark Kirkwood wrote:
> Right - I'll look at adding max (at least) early next week.

I'm also thinking of taking a look at amalgamating transaction type lock waits. This seems like a good idea because:

- individually, and viewed at a later date, I don't think the individual detail is going to be useful
- there will be a lot of them
- I think the statistical data (count, sum elapsed, max elapsed) may be sufficiently interesting

Cheers

Mark
Mark Kirkwood wrote:
> Right - I'll look at adding max (at least) early next week.

Patch with max(wait time).

Still TODO:

- amalgamate individual transaction lock waits
- redo (rather ugly) temporary pg_stat_lock_waits in a form more like pg_locks
Attachment
Mark Kirkwood wrote:
> Patch with max(wait time).
>
> Still TODO
>
> - amalgamate individual transaction lock waits
> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
> pg_locks

This version has the individual transaction lock waits amalgamated.

Still TODO: redo pg_stat_lock_waits ...
Attachment
Mark,

Your last email on this patch, from August 9th, indicates that you've still got "TODO: redo pg_stat_lock_waits ...". Have you updated this patch since then?

Thanks!

Stephen
I have this patch, if you're interested.

LWLock Instrumentation Patch

- counts locks and waits in shared and exclusive mode
- for selected locks, measures wait and hold times
- for selected locks, displays a histogram of wait and hold times
- information is printed at backend exit

Configurable by #define's in lwlock.c

Regards,
pierre
Attachment
Pierre,

> Configurable by #define's in lwlock.c

Given that we already have dtrace/systemtap probes around the lwlocks, is there some way you could use those instead of extra #defines?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Stephen Frost wrote:
> Your last email on this patch, from August 9th, indicates that you've
> still got "TODO: redo pg_stat_lock_waits ...". Have you updated this
> patch since then?

Stephen,

No - that is still a TODO for me - real life getting in the way :-)

Cheers

Mark
On Sat, Aug 8, 2009 at 7:47 PM, Mark Kirkwood <markir@paradise.net.nz> wrote:
> This version has the individual transaction lock waits amalgamated.
>
> Still TODO: redo pg_stat_lock_waits ...

it applies with some hunks, compiles fine and seems to work...
i'm still not sure this is what we need; some more comments could be helpful.

what kind of questions are we capable of answering with this, and what kind of questions are we still missing?

for example, now we know "number of locks that had to wait", "total time waiting" and "max time waiting for a single lock"... but we can still have an inaccurate understanding if we have lots of locks waiting a little time and a few waiting a huge amount of time...

something i have been asked when the system starts to slow down is "can we know if there was lock contention in that period"? for now the only way to answer that is logging locks.

about the patch itself: you haven't documented it either. what is the pg_stat_lock_waits view for? and what are those fields it has?

i'll leave this patch as "needs review" for more people to comment on it...

--
Jaime Casanova
On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.

I'm pretty sure the logic of this patch is not correct.

In pgstat_init_lock_wait(LOCKTAG *locktag) ...

+	l_curr = htabent->l_counts.l_tot_wait_time;
+	INSTR_TIME_SET_CURRENT(l_start);
+	INSTR_TIME_ADD(l_curr, l_start);
+	htabent->l_counts.l_tot_wait_time = l_curr;

In pgstat_end_lock_wait(LOCKTAG *locktag) ...

+	l_start = htabent->l_counts.l_tot_wait_time;
+	INSTR_TIME_SET_CURRENT(l_end);
+	INSTR_TIME_SUBTRACT(l_end, l_start);
+	htabent->l_counts.l_tot_wait_time = l_end;

So l_start = time cumulatively waited previously + time at start of this wait, and l_end - l_start is equal to:

= time at end of this wait - (time at start of this wait + time cumulatively waited previously)
= (time at end of this wait - time at start of this wait) - time cumulatively waited previously
= (duration of this wait) - time cumulatively waited previously

That minus sign in the last line can't be good, can it?

Also:

+	htabent->l_counts.l_tot_wait_time = l_end;
+
+	if (INSTR_TIME_GET_MICROSEC(l_end) > INSTR_TIME_GET_MICROSEC(htabent->l_counts.l_max_wait_time))
+		htabent->l_counts.l_max_wait_time = l_end;

The total wait time is equal to the max wait time (which are both equal to l_end)? One or both of those has to end up being wrong. At this stage, is l_end supposed to be the last wait time, or the cumulative wait time?
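For what it's worth, the bookkeeping I'd expect here looks something like the following. This is only a minimal sketch in plain microseconds, with illustrative struct and function names, not the patch's actual INSTR_TIME-based code: record just the raw timestamp at wait start, and fold the duration into the total and the max at wait end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lock counters, loosely modeled on the patch's
 * l_counts fields; names and the plain-microsecond clock are
 * illustrative only. */
typedef struct LockWaitStats
{
	uint64_t	n_waits;		/* number of waits observed */
	uint64_t	tot_wait_us;	/* cumulative wait time */
	uint64_t	max_wait_us;	/* longest single wait */
	uint64_t	start_us;		/* raw clock reading at start of current wait */
} LockWaitStats;

/* Wait start: just remember the raw timestamp; nothing is mixed
 * into the cumulative total yet. */
static void
wait_begin(LockWaitStats *s, uint64_t now_us)
{
	s->start_us = now_us;
}

/* Wait end: compute this wait's duration, then fold it into the
 * total and the max separately. */
static void
wait_end(LockWaitStats *s, uint64_t now_us)
{
	uint64_t	dur = now_us - s->start_us;

	s->n_waits++;
	s->tot_wait_us += dur;
	if (dur > s->max_wait_us)
		s->max_wait_us = dur;
}
```

That keeps the total cumulative and the max a per-wait high-water mark, which is what the view's column names suggest.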
One of the things in the patch review checklist is about compiler warnings, so I am reporting these:

lock.c: In function `LockAcquire':
lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards qualifiers from pointer target type
lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards qualifiers from pointer target type

Cheers,

Jeff
Jaime Casanova wrote:
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.

Yeah, that's the big question. Are the current capabilities (logging 'em for waits > deadlock timeout + dtrace hooks) enough? I'm thinking that if we had dtrace for Linux generally available, then the need for this patch would be lessened.

> what kind of questions are we capable of answer with this and and what
> kind of questions are we still missing?
>
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...
>
> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks

Right - there may still be other aggregates that need to be captured... it would be great to have some more feedback from the field about this. In my case, I was interested in seeing whether the elapsed time was being spent waiting for locks or actually executing (in fact it turned out to be the latter - but it was still very useful to be able to rule out locking issues). However, as you mention, there may be cases where the question is more about part of the system suffering a disproportionate number/time of lock waits...

> about the patch itself:
> you haven't documented either. what is the pg_stat_lock_waits view
> for? and what are those fieldx it has?

Yeah, those fields are straight from the LOCKTAG structure. I need to redo the view to be more like pg_locks, and also do the doco. However, I've been a bit hesitant to dive into these last two steps until I see that there is some real enthusiasm for this patch (or failing that, a feeling that it is needed...).
> i'll let this patch as "needs review" for more people to comment on it... > > Agreed, thanks for the comments. Mark
Jeff Janes wrote:
> The total wait time is equal to the max wait time (which are both
> equal to l_end)?
> One or both of those has to end up being wrong. At this stage, is
> l_end supposed to be the last wait time, or the cumulative wait time?

Hmm - I may well have fat-fingered the arithmetic; thanks, I'll take a look!

> One of the things in the patch review checklist is about compiler
> warnings, so I am reporting these:
>
> lock.c: In function `LockAcquire':
> lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards
> qualifiers from pointer target type
> lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards
> qualifiers from pointer target type

Right, will look at that too.

Cheers

Mark
On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...

Aren't the huge ones already loggable from the deadlock detector?

With the max, we can at least put an upper limit on how long the longest ones could have been. However, is there a way to reset the max? I tried deleting data/pg_stat_tmp, but that didn't work. With cumulative values, you can take snapshots and then take the difference of them; that won't work with max. If the max can't be reset except with an initdb, I think that makes it barely usable.

> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks

I was surprised to find that running with track_locks on did not cause a detectable difference in performance, so you could just routinely do regularly scheduled snapshots and go back and mine them over the time that a problem was occurring.
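(To spell out the snapshot point: the diff trick is valid for any counter that only ever increases, but a high-water mark doesn't compose that way. A toy sketch, with illustrative names rather than anything from the patch:)

```c
#include <assert.h>
#include <stdint.h>

/* Toy snapshot of the proposed counters (illustrative names only). */
typedef struct StatSnap
{
	uint64_t	tot_wait_us;	/* cumulative: only ever grows */
	uint64_t	max_wait_us;	/* high-water mark since last reset */
} StatSnap;

/* Wait time spent between two snapshots: well defined for any
 * monotonically increasing counter. */
static uint64_t
interval_wait(StatSnap before, StatSnap after)
{
	return after.tot_wait_us - before.tot_wait_us;
}

/* No such luck for max: if the mark didn't move, the interval could
 * have contained any number of waits up to that length, or none at
 * all.  Only an increase tells us anything about the interval. */
static int
interval_max_known(StatSnap before, StatSnap after)
{
	return after.max_wait_us > before.max_wait_us;
}
```

So without a reset mechanism, the max column only answers questions about the period since initdb, not about last night.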
I just checked with pgbench over various levels of concurrency and fsync settings. If potential slowness wouldn't show up there, I don't know how else to look for it. Cheers, Jeff
On Sun, Oct 4, 2009 at 4:14 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Aren't the huge ones already loggable from the deadlock detector?
>
> With the max, we can at least put an upper limit on how long the
> longest ones could have been. However, is there a way to reset the
> max? I tried deleting data/pg_stat_tmp, but that didn't work. With
> cumulative values, you can take snapshots and then take the
> difference of them; that won't work with max. If the max can't be
> reset except with an initdb, I think that makes it barely usable.
>
> I was surprised to find that running with track_locks on did not cause
> a detectable difference in performance, so you could just routinely do
> regularly scheduled snapshots and go back and mine them over the time
> that a problem was occurring. I just checked with pgbench over
> various levels of concurrency and fsync settings. If potential
> slowness wouldn't show up there, I don't know how else to look for it.

It seems that this patch had open TODO items at the beginning of the CommitFest (so perhaps we should have bounced it immediately), and I think that's still the case now, so I am going to mark this as Returned with Feedback. A lot of good reviewing has been done, though, so maybe this can be submitted for a future CommitFest in something close to a final form.

Thanks,

...Robert
What happened to this patch?

---------------------------------------------------------------------------

Mark Kirkwood wrote:
> So here is my initial attempt at this, at this point merely to spark
> discussion (attached patch)
>
> - new parameter track_locks (maybe should be track_lock_waits?)
> - new hash locks attached to stats dbentry
> - several new stat*lock c functions
> - new view pg_stat_lock_waits
> - new attributes for pg_stat_database
>
> [ application/x-gzip is not supported, skipping... ]

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB                      http://enterprisedb.com
PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do

+ If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian wrote:
> What happened to this patch?

Returned with feedback in October after receiving a lot of review; no updated version submitted since then:

https://commitfest.postgresql.org/action/patch_view?id=98

--
Greg Smith    2ndQuadrant US    Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
I am just adding my two cents; please ignore it if it's totally irrelevant.

While we do performance testing/tuning of any application, the important things - the standard monitoring requirements from a database - are:

a) Different types of wait events and the time spent in each of them
b) Top ten queries by Total Logical Reads & Average Logical Reads
c) Top ten queries by Total CPU Time & Average CPU Time

The monitoring methodology should not put so much overhead on the test that it invalidates the application response times captured during the performance test (let's not worry about Heisenberg uncertainty for now :)) ).

Of all the databases I have worked with, Oracle provides the best monitoring product, in the form of Statspack.

Statspack works in the following way: it takes a copy of the important catalog tables (pg_* tables, in Postgres terms) which store information like wait statistics against wait events, cumulative I/O statistics against each SQL hash (and SQL text), whether a particular plan went through a hard parse or a soft parse (because of plan caching), the status of different in-memory data structures, etc.

So we take a snapshot like this before and after the test and generate a Statspack report from the pair, which contains all the necessary information for database-level tuning. So we are never left in the dark from a database tuning perspective.

Recently I wrote a set of SQL statements which do the same for SQL Server from its system tables like wait_io_events, query_io_stats etc., and finally retrieve the information in the same format as Statspack.

But I think we lack functionality like that in Postgres. I think things like DTrace are more for developers than for users and, as already pointed out, it works only on Solaris. While we can expect it for Linux shortly, people on Windows do not have many options. (I maintain that DTrace is an absolutely wonderful hooking mechanism.) So we should aim to develop a monitoring mechanism like Statspack for Postgres.

Hope I have conveyed my concern.

Thanks,
Gokul.
On Sat, Feb 27, 2010 at 10:40 AM, Greg Smith <greg@2ndquadrant.com> wrote:
Bruce Momjian wrote:
> What happened to this patch?

Returned with feedback in October after receiving a lot of review, no updated version submitted since then:
https://commitfest.postgresql.org/action/patch_view?id=98
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote:
> Bruce Momjian wrote:
>> What happened to this patch?
>
> Returned with feedback in October after receiving a lot of review, no
> updated version submitted since then:
>
> https://commitfest.postgresql.org/action/patch_view?id=98

Hmm - I would say a bit of review rather than a lot :-)

The feeling I got from the various comments was a lukewarm level of interest at best, so I must confess that I let other things take precedence. Also, I was after some clear feedback about whether a separate stats utility is necessary at all given that we have dtrace support - despite this not being available for Linux... and the only comment addressing this concern is from Gokul just now!

I'd also like to take the opportunity to express a little frustration about the commitfest business - really, all I wanted was the patch *reviewed* as WIP. It seemed that in order to do that I needed to enter it into the various commitfests... then I was faced with comments to the effect that it was not ready for commit and so should not have been entered into a commitfest at all... sigh, a bit of an enthusiasm killer I'm afraid...

regards

Mark
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> I'd also like to take the opportunity to express a little frustration
> about the commitfest business - really all I wanted was the patch
> *reviewed* as WIP - it seemed that in order to do that I needed to enter
> it into the various commitfests... then I was faced with comments to the
> effect that it was not ready for commit so should not have been entered
> into a commitfest at all... sigh, a bit of an enthusiasm killer I'm afraid...

Well, entering a patch in a commitfest is certainly the best way to ensure that you'll get some feedback. If you just pop it up on the mailing lists, you may or may not draw much commentary, depending on how interested people are and how much time they have that day. (A day or so later there'll be other topics to distract them.) As long as the patch submission is clearly labeled WIP, you shouldn't get complaints about it not being ready to commit.

The other approach I'd suggest, if what you mainly want is design review, is to not post a patch at all. Post a design spec, or even just specific questions. It's less work for people to look at, and so you're more likely to get immediate feedback.

regards, tom lane
Mark Kirkwood wrote:
> Greg Smith wrote:
>> Returned with feedback in October after receiving a lot of review, no
>> updated version submitted since then:
>>
>> https://commitfest.postgresql.org/action/patch_view?id=98
>
> Hmm - I would say a bit of review rather than a lot :-)

It looks like you got useful feedback from at least three people, and people were regularly looking at your patch in some form for about three months. That's a lot of review. In many other open-source projects, your first patch would have been rejected after a quick look as unsuitable, and that would have been the end of things for you. I feel lucky every time I get a volunteer to spend time reading my work and suggesting how it could be better; your message here doesn't seem to share that perspective.

> I'd also like to take the opportunity to express a little frustration
> about the commitfest business - really all I wanted was the patch
> *reviewed* as WIP - it seemed that in order to do that I needed to
> enter it into the various commitfests... then I was faced with
> comments to the effect that it was not ready for commit so should not
> have been entered into a commitfest at all... sigh, a bit of an
> enthusiasm killer I'm afraid...

To lower your frustration level next time, make sure to label the e-mail and the entry on the CommitFest app with the magic abbreviation "WIP" and this shouldn't be so much of an issue. The assumption for patches is that someone submitted them as commit candidates, and therefore they should be reviewed to that standard, unless clearly labeled otherwise. You briefly disclaimed yours as not being in that category in the initial text of your first message, but it was easy to miss that, particularly once it had been >8 months from when that message showed up and it was still being discussed.
If you wanted to pick this back up again, I'd think that a look at what's been happening with the lock_timeout GUC patch would be informative - I'd think that has some overlap with the sort of thing you were trying to do.

FYI, I thought your patch was useful, but didn't spend time on it because it's not ambitious enough. I would like to see statistics on a lot more types of waiting than just locks, and I keep trying to find time to think about that big problem rather than worrying about the individual pieces of it.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Gokulakannan Somasundaram wrote:
> Statspack works by the following way - a) it takes a copy of important
> catalog tables (pg_ tables) which store information like wait
> statistics against wait events, i/o statistics cumulative against each
> SQL_Hash (and SQL_Text), whether a particular plan went for hard
> parse / soft parse (because of plan caching), and the status of different
> in-memory data structures etc.

This is actually missing the real work that went into building this feature into Oracle. Before it was possible to build statspack, first they went to the trouble of noting every source of this form of latency - lock waits, I/O statistics and waits, buffer pool waits - and instrumented every single one of them internally. Basically, every time something happens that might wait for a resource whose waits you later want to monitor, a start/end timestamp for that wait is noted, and ultimately the difference between them, recording how long the event took, is stored into the database. That's the data you must have collected at some point in order to produce the summary.

Meanwhile, PostgreSQL development is resistant to making any changes in this direction, under the theory that a) it adds a lot of code complexity and b) constant calls to get the current system time are too expensive on some platforms to do them all the time. Until those two things are sorted out, what the high-level interface to the direction you're suggesting looks like doesn't really matter. DTrace support has managed to clear both of those hurdles due to its optional nature, perceived low overhead, and moving the *storing* of all the events generated to something that happens outside of the database.

I agree with you that something like statspack is the direction PostgreSQL eventually needs to go, but it's going to be an uphill battle the whole time to get it built. The objections will be that it will add too much overhead at the lowest levels, where the data needed to support it is collected.
Once that is cleared, the high-level interface is easy to build.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Greg Smith wrote:
> Mark Kirkwood wrote:
>> Greg Smith wrote:
>>> Returned with feedback in October after receiving a lot of review,
>>> no updated version submitted since then:
>>>
>>> https://commitfest.postgresql.org/action/patch_view?id=98
>>
>> Hmm - I would say a bit of review rather than a lot :-)
>
> It looks like you got useful feedback from at least three people, and
> people were regularly looking at your patch in some form for about
> three months. That's a lot of review. In many other open-source
> projects, your first patch would have been rejected after a quick look
> as unsuitable and that would have been the end of things for you. I
> feel lucky every time I get a volunteer to spend time reading my work
> and suggesting how it could be better; your message here doesn't seem
> to share that perspective.

I don't mean to be ungrateful about the actual reviews at all - and I did value the feedback received (which I hope was reasonably clear in the various replies I sent). I sense a bit of attacking the messenger in your tone... I've submitted several patches to Postgres in the past, have previously always enjoyed the experience, and I do get the culture - being a volunteer myself.

> To lower your frustration level next time, make sure to label the
> e-mail and the entry on the CommitFest app with the magic abbreviation
> "WIP" and this shouldn't be so much of an issue. The assumption for
> patches is that someone submitted them as commit candidates, and
> therefore they should be reviewed to that standard, unless clearly
> labeled otherwise. You briefly disclaimed yours as not being in that
> category in the initial text of your first message, but it was easy to
> miss that, particularly once it had been >8 months from when that
> message showed up and it was still being discussed.

LOL - I said a bit - it was only a little, connected with the commit vs review confusion.
I think I just got caught in the bedding-in time for the new development processes: I was advised to add it to the commitfests, and then advised at a later time that it should not have been. Yes, a bit frustrating at the time, but not earth-shattering at all. I'm mentioning it now mainly to help clarify the situation for anyone else wanting a WIP patch reviewed!

In my case, while labeling it as WIP could well have helped, the (pretty short) message accompanying the patch is very clear that there is stuff to do - using the magic phrase "...merely to spark discussion..." - as were all the accompanying follow-up messages, with phrases like "still todo...". So yes, next time I'll label any patches more clearly, but reviewers need to read the text accompanying the patch they are about to review (note that most did).

> If you wanted to pick this back up again, I'd think that a look at
> what's been happening with the lock_timeout GUC patch would be
> informative - I'd think that has some overlap with the sort of thing
> you were trying to do.
>
> FYI, I thought your patch was useful, but didn't spend time on it
> because it's not ambitious enough. I would like to see statistics on
> a lot more types of waiting than just locks, and keep trying to find
> time to think about that big problem rather than worrying about the
> individual pieces of it.

Excellent, thanks - that is exactly the sort of comment that would have been very valuable to have made at the time (it echoes Gokul's recent one, interestingly enough). After all, if enough people share this view, then clearly I need to either forget about the current patch or design something more ambitious!

regards

Mark
On 2/27/10 1:04 PM, Mark Kirkwood wrote:
> LOL - I said a bit - it was only a little, connected with the commit vs
> review confusion. I think I just got caught in the bedding-in time for
> the new development processes: I was advised to add it to the
> commitfests, and then advised at a later time that it should not have been.

Yeah, this is only the 2nd year we have done CFs, and the first year we've had non-Core Team members managing them. So the cement on the procedure is still wet.

--Josh
Mark Kirkwood wrote:
> I don't mean to be ungrateful about the actual reviews at all - and I
> did value the feedback received (which I hope was reasonably clear in
> the various replies I sent). I sense a bit of attacking the messenger
> in your tone...

I thought there was a moderately big difference between the reality of the review you got and how you were characterizing it, and I was just trying to provide some perspective on how badly a true "bit of review" only would have worked out. Since I saw you disclaimed that wording with a smiley, I know it wasn't intended to be ungrateful, and I didn't intend to shoot the messenger. Apologies if my tone grazed you, though.

In any case, process feedback noted and assimilated into recommended practice: I just added a section about WIP patches to http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission

While I was in there, I also added some more notes on my personal top patch submission peeve: patches whose purpose in life is to improve performance that don't come with associated easy-to-run test cases, including a sample of that test running on a system that shows the speedup clearly. If I were in charge, I would just make it standard project policy to reject any performance patch without those characteristics immediately.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
Greg Smith wrote:
> While I was in there I also added some more notes on my personal top
> patch submission peeve, patches whose purpose in life is to improve
> performance that don't come with associated easy to run test cases,
> including a sample of that test running on a system that shows the
> speedup clearly. If I were in charge I just would make it standard
> project policy to reject any performance patch without those
> characteristics immediately.

While I completely agree that the submitter should be required to supply a test case and their results, so the rest of us can try to reproduce said improvement, rejecting the patch out of hand is a bit harsh I feel - hey, they may just have forgotten to supply these things! The reviewer can always ask, can they not? I would prefer to see the wiki say something along the lines of "If you don't supply a test case you will be asked for one before any further review can proceed..."

Cheers

Mark
Mark Kirkwood wrote:
> While I completely agree that the submitter should be required to
> supply a test case and their results, so the rest of us can try to
> reproduce said improvement - rejecting the patch out of hand is a bit
> harsh I feel - Hey, they may just have forgotten to supply these things!

I didn't put any strong wording in the Wiki; I was just mentioning that my personal position is far less tolerant of this than the current project policy. What I added was:

"If the patch is intended to improve performance, it's a good idea to include some reproducible tests to demonstrate the improvement. If a reviewer cannot duplicate your claimed performance improvement in a short period of time, it's very likely your patch will be bounced. Do not expect that a reviewer is going to find your performance feature so interesting that they will build an entire test suite to prove it works. You should have done that as part of patch validation, and included the necessary framework for testing with the submission."

Finding a reviewer for a performance patch and getting them up to speed to evaluate any submitted patch is time intensive, and it really sucks from the perspective of the CF manager and any reviewer who is handed a messy one. The intention was not to cut people off without warning them. The position I would advocate as being a fair one is that if you don't provide a test case for a performance improvement patch, you can't then expect that you'll be assigned a reviewer by the CF manager either, until that's corrected. And if time for the CF runs out before you do that, you're automatically moved to "returned with feedback" - specifically, "write us a test case".

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.us
On Sat, Feb 27, 2010 at 5:40 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> I'd also like to take the opportunity to express a little frustration about
> the commitfest business - really all I wanted was the patch *reviewed* as
> WIP - it seemed that in order to do that I needed to enter it into the
> various commitfests... then I was faced with comments to the effect that it
> was not ready for commit so should not have been entered into a commitfest
> at all... sigh, a bit of an enthusiasm killer I'm afraid...

This might be my fault, so I apologize for killing your enthusiasm. I think when I get wrapped up in a CommitFest (and especially during the second half) I get wound up in determining whether or not things are going to get applied, and tend to give short shrift to things that seem like they won't. My bad.

Generally speaking, I am always in favor of adding things to the CommitFest, but I guess the one exception I would make is if there are outstanding comments already given that haven't been addressed yet. In that case it seems a little unfair to ask people to review it further, unless there are very specific questions you need answered. I think you were good about communicating that the patch wasn't ready to be applied yet, but I also think that it's to be expected that you'll get less feedback while it's in that state.

Anyway, my apologies for turning you off to the process... that wasn't my intent and I'm sorry that it had that effect.

...Robert
On Sat, Feb 27, 2010 at 6:22 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> Greg Smith wrote:
>> While I was in there I also added some more notes on my personal top patch
>> submission peeve, patches whose purpose in life is to improve performance
>> that don't come with associated easy to run test cases, including a sample
>> of that test running on a system that shows the speedup clearly. If I were
>> in charge I just would make it standard project policy to reject any
>> performance patch without those characteristics immediately.
>
> While I completely agree that the submitter should be required to supply a
> test case and their results, so the rest of us can try to reproduce said
> improvement - rejecting the patch out of hand is a bit harsh I feel - Hey,
> they may just have forgotten to supply these things! The reviewer can always
> ask, can they not? I would prefer to see the wiki say something along the
> lines of "If you don't supply a test case you will be asked for one before
> any further review can proceed..."

Agreed. Personally, I have no problem with giving a patch a brief once-over even if it lacks an appropriate test case, but serious review without a test case is really hard. That's one of the things that slowed down rbtree a lot this last CommitFest. We should probably make a point of pointing this problem out to patch submitters before the CommitFest even starts, so that they can address it in advance.

...Robert
Robert Haas wrote:
> This might be my fault, so I apologize for killing your enthusiasm. I
> think when I get wrapped up in a CommitFest (and especially during the
> second half) I get wound up in determining whether or not things are
> going to get applied and tend to give short shrift to things that seem
> like they won't. My bad.
>
> Generally speaking, I am always in favor of adding things to the
> CommitFest, but I guess the one exception I would make is if there are
> outstanding comments already given that haven't been addressed yet.
> In that case it seems a little unfair to ask people to review it
> further unless there are very specific questions you need answered. I
> think you were good about communicating that the patch wasn't ready to
> be applied yet, but I also think that it's to be expected that you'll
> get less feedback while it's in that state.

Yeah, makes sense, although perhaps there needs to be a way to get incremental progress reviewed?

> Anyway, my apologies for turning you off to the process... that wasn't
> my intent and I'm sorry that it had that effect.

I think there was a level of confusion on both sides, especially with a newish process for me to get my head around, so no apology is needed at all, as it is/was clear that there was no intent on your part to make things hard! (That is why I said nothing at the time.) But thank you for your kind words, much appreciated.

Best wishes

Mark
On Sun, Feb 28, 2010 at 1:06 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> Robert Haas wrote:
>> This might be my fault, so I apologize for killing your enthusiasm. I
>> think when I get wrapped up in a CommitFest (and especially during the
>> second half) I get wound up in determining whether or not things are
>> going to get applied and tend to give short shrift to things that seem
>> like they won't. My bad.
>>
>> Generally speaking, I am always in favor of adding things to the
>> CommitFest, but I guess the one exception I would make is if there are
>> outstanding comments already given that haven't been addressed yet.
>> In that case it seems a little unfair to ask people to review it
>> further unless there are very specific questions you need answered. I
>> think you were good about communicating that the patch wasn't ready to
>> be applied yet, but I also think that it's to be expected that you'll
>> get less feedback while it's in that state.
>
> Yeah, makes sense, although perhaps there needs to be a way to get
> incremental progress reviewed?

I think it's possible to get that, but there's a certain way you need to ask. As a general rule, anything that is of the form "here's my code, can you take a look" gets less attention - with the possible exception of a patch from a committer who is planning to commit it if no one writes back. And even then it often doesn't get looked at. Code dumps are just no fun.

Now if you write something like "here's my patch... I can't quite finish it because of X and I'm not sure whether the best solution is Y or Z", those tend to get answered a lot more often, at least IME. Reading a patch and trying to understand what it's doing and why it's doing it and whether it's really the best solution is a fairly time-consuming effort; giving the reader some context makes that a lot easier, and so people are more likely to help you if you do it, again IME.

...Robert