Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02) - Mailing list pgsql-hackers

From Julian Markwort
Subject Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
Date
Msg-id 1bd396a9-4573-55ad-7ce8-fe7adffa1bd9@uni-muenster.de
Whole thread Raw
In response to Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans  (Julian Markwort <julian.markwort@uni-muenster.de>)
Responses Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
List pgsql-hackers
Hello hackers,

I'd like to follow up to my previous proposition of tracking (some) best
and worst plans for different queries in the pg_stat_statements extension.

Based on the comments and suggestions made towards my last endeavour,
I've taken the path of computing the interquartile distance (by means of
an adapted z-test, under the assumption of normal distribution, based on
the mean_time and stddev_time already used by the extension).

A bad plan is recorded, if there is no previously recorded plan, or if
the current execution time is greater than the maximum of the previously
recorded plan's time and the query's mean+1.5*interquartile_distance.
A good plan is recorded on a similar condition; The execution time needs
to be shorter than the minimum of the previously recorded good plan's
time and the query's mean-1.5*interquartile_distance.

The boundaries are chosen to resemble the boundaries for whiskers in
boxplots.
Using these boundaries, plans will be updated very seldomly, as long as
they are more or less normally distributed.
Changes in the plans (for example the use of indices) used for each kind
of query will most likely result in execution times exceeding these
boundaries, so such changes are (very probably) recorded.

The ideal solution would be to compare the current plan with the last
plan and only update when there is a difference between them, however I
think this is unreasonably complex and a rather expensive task to
compute on the completion of every query.

The intent of this patch is to provide a quick insight into the plans
currently used by the database for the execution of certain queries. The
tracked plans only represent instances of queries with very good or very
poor performance.

I've (re)submitted this patch for the next commitfest as well.

Kind regards
Julian


On 03/04/2017 02:56 PM, Julian Markwort wrote:
> Alright, for the next version of this patch I'll look into standard
> deviation (an implementation of Welfords' algorithm already exists in
> pg_stat_statements).
>
> On 3/4/17 14:18, Peter Eisentraut wrote:
>
>> The other problem is that this measures execution time, which can vary
>> for reasons other than plan.  I would have expected that the cost
>> numbers are tracked somehow.
> I've already thought of tracking specific parts of the explanation,
> like the cost numbers, instead of the whole string, I'll think of
> something, but if anybody has any bright ideas in the meantime, I'd
> gladly listen to them.
>
>> There is also the issue of generic vs specific plans, which this
>> approach might be papering over.
> Would you be so kind and elaborate a little bit on this? I'm not sure
> if I understand this correctly. This patch only tracks specific plans,
> yes. The inital idea was that there might be some edge-cases that are
> not apparent when looking at generalized plans or queries.
>
> kind regards
> Julian


Attachment

pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: AS OF queries
Next
From: Vik Fearing
Date:
Subject: Re: [HACKERS] PATCH: psql tab completion for SELECT