Re: Add min and max execute statement time in pg_stat_statement - Mailing list pgsql-hackers

From: Andrew Dunstan
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date:
Msg-id: 52E124B2.6020905@dunslane.net
In response to: Re: Add min and max execute statement time in pg_stat_statement (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Responses: Re: Add min and max execute statement time in pg_stat_statement
List: pgsql-hackers
On 01/22/2014 11:33 PM, KONDO Mitsumasa wrote:
> (2014/01/23 12:00), Andrew Dunstan wrote:
>>
>> On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote:
>>> (2014/01/22 22:26), Robert Haas wrote:
>>>> On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
>>>> <kondo.mitsumasa@lab.ntt.co.jp> wrote:
>>>>>> OK, Kondo, please demonstrate benchmarks that show we have <1%
>>>>>> impact from this change. Otherwise we may need a config parameter
>>>>>> to allow the calculation.
>>>>>
>>>>> OK, I am testing DBT-2 now. However, the benchmark's error range
>>>>> might be more than 1%, so I will show you the detailed HTML
>>>>> results.
>>>>
>>>> To see any impact from spinlock contention, I think you're pretty
>>>> much going to need a machine with >32 cores and lots of
>>>> concurrency.  pgbench -S is probably a better test than DBT-2,
>>>> because it leaves out all the writing, so percentage-wise more time
>>>> will be spent doing things like updating the pgss hash table.
>>> Oh, thanks for letting me know. I think the essential bottleneck in
>>> my patch is the sqrt() function and other division calculations. I
>>> will replace the sqrt() function from math.h with a faster
>>> algorithm, and move the unnecessary parts of the calculation out of
>>> LWLocks and other locks. The improvement might take some time, so
>>> please wait a while.
>>>
>>
>> Umm, I have not read the patch, but are you not using Welford's
>> method? Its per-statement overhead should be absolutely tiny (and it
>> should not compute a square root at all per statement - the square
>> root should only be computed when the standard deviation is actually
>> wanted, e.g. when a user examines pg_stat_statements). See for example
>> <http://www.johndcook.com/standard_deviation.html>
> Thanks for your advice. I read your example briefly; however, I don't
> think calculating the variance is the heavy part of my patch. To my
> mind, the double-based sqrt calculation is the heaviest part. I found
> a fast square root algorithm that is used in 3D games:
> http://en.wikipedia.org/wiki/Fast_inverse_square_root
>
> This page describes an inverse square root algorithm, but it can also
> compute a normal square root, and it is faster than the general
> algorithm at the price of precision. I think we want a fast algorithm,
> so it would be suitable.


According to the link I gave above:
   The most obvious way to compute variance then would be to have two
   sums: one to accumulate the sum of the x's and another to accumulate
   the sums of the squares of the x's. If the x's are large and the
   differences between them small, direct evaluation of the equation
   above would require computing a small number as the difference of
   two large numbers, a red flag for numerical computing. The loss of
   precision can be so bad that the expression above evaluates to a
   /negative/ number even though variance is always positive.
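
To make that concrete, here's a toy demonstration of the failure mode
(hypothetical numbers, nothing taken from the patch):

   #include <stdio.h>

   /* Large x's with tiny differences: the two-sums formula returns
    * rounding noise instead of the true variance of 2/3, and for some
    * inputs it even goes negative. */
   int
   main(void)
   {
       double xs[] = {1e8, 1e8 + 1, 1e8 + 2};
       double sum = 0.0, sumsq = 0.0;
       int    n = 3;

       for (int i = 0; i < n; i++)
       {
           sum += xs[i];
           sumsq += xs[i] * xs[i];
       }

       double mean = sum / n;

       printf("variance = %g\n", sumsq / n - mean * mean);
       return 0;
   }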
 

As I read your patch, that's what it seems to be doing.
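
To be concrete, here's a minimal sketch of the Welford update I mean
(the struct and function names are hypothetical, not from your patch or
from pg_stat_statements):

   #include <math.h>
   #include <stdint.h>

   /* Hypothetical illustration of Welford's online algorithm.  The hot
    * path (stddev_accum) is one division and two multiply-adds per
    * statement -- no sqrt anywhere. */
   typedef struct StddevCounters
   {
       int64_t n;      /* number of samples seen so far */
       double  mean;   /* running mean of execution times */
       double  m2;     /* running sum of squared deviations from the mean */
   } StddevCounters;

   static void
   stddev_accum(StddevCounters *c, double x)
   {
       double delta = x - c->mean;

       c->n++;
       c->mean += delta / c->n;
       c->m2 += delta * (x - c->mean);    /* uses the updated mean */
   }

   /* The square root is taken here and only here, i.e. when someone
    * actually reads the view. */
   static double
   stddev_read(const StddevCounters *c)
   {
       return (c->n > 1) ? sqrt(c->m2 / (c->n - 1)) : 0.0;
   }

This also sidesteps the cancellation problem, because it never subtracts
two large, nearly equal sums.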

What is more, if the square root calculation is affecting your
benchmarks, I suspect you are benchmarking the wrong thing. The
benchmarks should not call for a single square root calculation. What we
really want to know is the overhead of keeping these stats. But your
total runtime code (i.e. code NOT from calling pg_stat_statements())
for stddev appears to be just this:
   e->counters.total_sqtime += total_time * total_time;
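
Even if you keep the sum-of-squares counter, the division and the sqrt
belong purely on the read side, something like this (a sketch;
total_sqtime is your patch's counter, calls and total_time are the
existing ones):

   /* Read side only, evaluated when pg_stat_statements() is queried.
    * This is the naive formula the quote above warns about; clamping
    * hides the cancellation, it does not cure it. */
   double mean   = e->counters.total_time / e->counters.calls;
   double var    = e->counters.total_sqtime / e->counters.calls - mean * mean;
   double stddev = (var > 0.0) ? sqrt(var) : 0.0;

If a sqrt shows up in the write path you are benchmarking at all,
something is misplaced.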


cheers

andrew


