Re: Abbreviated keys for Numeric - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Abbreviated keys for Numeric
Date	February 23, 2015 19:56:35
Msg-id	54EB5BA8.8050700@2ndquadrant.com
In response to	Re: Abbreviated keys for Numeric (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses	Re: Abbreviated keys for Numeric
List	pgsql-hackers

Hi,

On 23.2.2015 11:59, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> 
>  Tomas> Interesting, but I think Gavin was asking about how much
>  Tomas> variation was there for each tested case (e.g. query executed on
>  Tomas> the same code / dataset). And in those cases the padding /
>  Tomas> alignment won't change, and thus the effects you describe won't
>  Tomas> influence the results, no?
> 
> My point is exactly the fact that since the result is not affected,
> this variation between runs of the same code is of no real relevance
> to the question of whether a given change in performance can properly
> be attributed to a patch.
> 
> Put it this way: suppose I have a test that when run repeatedly with no
> code changes takes 6.10s (s=0.025s), and I apply a patch that changes
> that to 6.26s (s=0.025s). Did the patch have an impact on performance?
> 
> Now suppose that instead of applying the patch I insert random amounts
> of padding in an unused function and find that my same test now takes a
> mean of 6.20s (s=0.058s) when I take the best timing for each padding
> size and calculate stats across sizes. Now it looks obvious that the
> actual code of the patch probably wasn't responsible for any change...
> 
> The numbers used here aren't theoretical; they are obtained by testing a
> single query - "select * from d_flt order by v offset 10000000" where
> d_flt contains 5 million float8 values - over 990 times with 33
> different random padding sizes (uniform in 0-32767). Here's a scatter
> plot, with 3 runs of each padding size so you can see the repeatability:
> http://tinyurl.com/op9qg8a

I think we're talking about slightly different things, then.

I believe Gavin was asking about run-to-run variability for one
particular build (i.e. with a fixed amount of padding), to decide
whether it even makes sense to compare results for different patches,
or whether the differences are just random noise.

Interpreting those differences, i.e. deciding whether they are due to
changes in the algorithm or to some padding somewhere else in the code,
is of course important too.
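
To put rough numbers on those two questions, here is a minimal sketch
using only the figures quoted above. It assumes "s" is a sample standard
deviation, and the helper name shift_in_sds is made up for this
illustration. It is a back-of-the-envelope ratio, not a formal
significance test.

```python
def shift_in_sds(reference_mean, observed_mean, spread):
    """Distance between two mean timings, in units of the given std deviation."""
    return (observed_mean - reference_mean) / spread


# Timings in seconds, as quoted upthread.
unpatched_mean = 6.10   # repeated runs, no code changes
patched_mean = 6.26     # repeated runs, patch applied
run_noise = 0.025       # s between runs of one fixed build
padding_mean = 6.20     # mean of best timings across padding sizes
padding_spread = 0.058  # s across padding sizes

# Question 1: is the shift larger than the noise of a single build?
vs_run_noise = shift_in_sds(unpatched_mean, patched_mean, run_noise)

# Question 2: is the patched timing unusual among timings that padding
# alone can produce?
vs_padding = shift_in_sds(padding_mean, patched_mean, padding_spread)

print(f"shift vs run-to-run noise:     {vs_run_noise:.1f} sd")
print(f"patched vs padding-only range: {vs_padding:.1f} sd")
```

The first ratio comes out at about 6.4 sd, so the shift is clearly not
run-to-run noise. The second is only about 1.0 sd, so the patched timing
sits well inside what padding alone produces.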

I believe the small regressions (1-10%) for small data sets might be
caused by this 'random padding' effect, because that's probably where
the L1/L2 caches matter most. For large data sets the caches are
probably not as effective anyway, so the random padding makes no
difference, and the speedup is just as good as for the other queries.
See for example this:
 http://www.postgresql.org/message-id/54EB580C.2000904@2ndquadrant.com

But I'm speculating here ... time for a profiler, I guess.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
