Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02) - Mailing list pgsql-hackers
From: Julian Markwort
Subject: Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
Msg-id: 1bd396a9-4573-55ad-7ce8-fe7adffa1bd9@uni-muenster.de
In response to: Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (Julian Markwort <julian.markwort@uni-muenster.de>)
Responses:
  Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
  Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
List: pgsql-hackers
Hello hackers,

I'd like to follow up on my previous proposal of tracking (some) best and worst plans for different queries in the pg_stat_statements extension. Based on the comments and suggestions made towards my last endeavour, I've taken the path of computing the interquartile distance (by means of an adapted z-test, under the assumption of normal distribution, based on the mean_time and stddev_time already tracked by the extension).

A bad plan is recorded if there is no previously recorded plan, or if the current execution time is greater than the maximum of the previously recorded plan's time and the query's mean + 1.5 * interquartile_distance. A good plan is recorded under a similar condition: the execution time needs to be shorter than the minimum of the previously recorded good plan's time and the query's mean - 1.5 * interquartile_distance. These boundaries are chosen to resemble the whisker boundaries in boxplots.

Using these boundaries, plans will be updated very seldom, as long as execution times are more or less normally distributed. Changes in the plans used for each kind of query (for example, the use of indexes) will most likely result in execution times exceeding these boundaries, so such changes are (very probably) recorded. The ideal solution would be to compare the current plan with the last plan and only update when there is a difference between them; however, I think this is unreasonably complex and a rather expensive task to compute on the completion of every query.

The intent of this patch is to provide a quick insight into the plans currently used by the database for the execution of certain queries. The tracked plans only represent instances of queries with very good or very poor performance.

I've (re)submitted this patch for the next commitfest as well.
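To make the decision rule concrete, here is a minimal sketch in C of the two record conditions described above. All identifiers are illustrative, not the patch's actual names; the only assumption baked in is the normality-based conversion from stddev to interquartile distance (the quartiles of a normal distribution sit at mean +/- 0.6745 * sigma, so IQR is approximately 1.349 * sigma):

```c
#include <stdbool.h>

/* Under a normality assumption, the interquartile distance can be
 * derived from the stddev that pg_stat_statements already tracks:
 * IQR ~= 1.349 * sigma. */
#define IQR_FACTOR 1.349

/* Record a "bad" plan if none exists yet, or if this execution time
 * exceeds both the previously recorded bad plan's time and the upper
 * boxplot whisker (mean + 1.5 * IQR). */
static bool
should_record_bad_plan(double exec_time, double mean_time,
                       double stddev_time, double prev_bad_time,
                       bool have_bad_plan)
{
    double upper = mean_time + 1.5 * IQR_FACTOR * stddev_time;

    if (!have_bad_plan)
        return true;
    return exec_time > upper && exec_time > prev_bad_time;
}

/* Symmetric condition for a "good" plan: the execution time must fall
 * below both the previous good plan's time and the lower whisker. */
static bool
should_record_good_plan(double exec_time, double mean_time,
                        double stddev_time, double prev_good_time,
                        bool have_good_plan)
{
    double lower = mean_time - 1.5 * IQR_FACTOR * stddev_time;

    if (!have_good_plan)
        return true;
    return exec_time < lower && exec_time < prev_good_time;
}
```

Because both thresholds depend only on the running mean and stddev, the check is O(1) per query completion, which is what makes it cheap compared to diffing plan texts.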
Kind regards
Julian

On 03/04/2017 02:56 PM, Julian Markwort wrote:
> Alright, for the next version of this patch I'll look into standard
> deviation (an implementation of Welford's algorithm already exists in
> pg_stat_statements).
>
> On 3/4/17 14:18, Peter Eisentraut wrote:
>
>> The other problem is that this measures execution time, which can vary
>> for reasons other than plan. I would have expected that the cost
>> numbers are tracked somehow.
> I've already thought of tracking specific parts of the explanation,
> like the cost numbers, instead of the whole string. I'll think of
> something, but if anybody has any bright ideas in the meantime, I'd
> gladly listen to them.
>
>> There is also the issue of generic vs specific plans, which this
>> approach might be papering over.
> Would you be so kind as to elaborate a little on this? I'm not sure
> if I understand this correctly. This patch only tracks specific plans,
> yes. The initial idea was that there might be some edge cases that are
> not apparent when looking at generalized plans or queries.
>
> kind regards
> Julian