Re: Abbreviated keys for Numeric - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Abbreviated keys for Numeric
Msg-id: 54EB5BA8.8050700@2ndquadrant.com
In response to: Re: Abbreviated keys for Numeric (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses: Re: Abbreviated keys for Numeric
List: pgsql-hackers
Hi,

On 23.2.2015 11:59, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>
> Tomas> Interesting, but I think Gavin was asking about how much
> Tomas> variation was there for each tested case (e.g. query executed on
> Tomas> the same code / dataset). And in those cases the padding /
> Tomas> alignment won't change, and thus the effects you describe won't
> Tomas> influence the results, no?
>
> My point is exactly the fact that since the result is not affected,
> this variation between runs of the same code is of no real relevance
> to the question of whether a given change in performance can properly
> be attributed to a patch.
>
> Put it this way: suppose I have a test that when run repeatedly with no
> code changes takes 6.10s (s=0.025s), and I apply a patch that changes
> that to 6.26s (s=0.025s). Did the patch have an impact on performance?
>
> Now suppose that instead of applying the patch I insert random amounts
> of padding in an unused function and find that my same test now takes a
> mean of 6.20s (s=0.058s) when I take the best timing for each padding
> size and calculate stats across sizes. Now it looks obvious that the
> actual code of the patch probably wasn't responsible for any change...
>
> The numbers used here aren't theoretical; they are obtained by testing a
> single query - "select * from d_flt order by v offset 10000000" where
> d_flt contains 5 million float8 values - over 990 times with 33
> different random padding sizes (uniform in 0-32767). Here's a scatter
> plot, with 3 runs of each padding size so you can see the repeatability:
> http://tinyurl.com/op9qg8a

I think we're talking about slightly different things, then. I believe
Gavin was asking about the variability of executions with a particular
build (i.e. with a fixed amount of padding), to decide whether it even
makes sense to compare results for different patches, or whether the
differences are just random noise.
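To make that distinction concrete, here is a quick back-of-the-envelope sketch (plain Python, using only the figures quoted above; it is not part of the original mail) of how the same 0.16s difference looks against each of the two yardsticks:

```python
# Figures quoted in the thread (all in seconds):
baseline = 6.10          # unpatched build; within-build sd = 0.025
patched  = 6.26          # patched build;   within-build sd = 0.025
within_sd = 0.025

# Padding experiment: same unpatched code, 33 random padding sizes in an
# unused function, best-of-3 timing per size, stats taken across sizes.
padding_mean, padding_sd = 6.20, 0.058

# Against within-build repeatability the patch looks like a clear
# regression (many standard deviations).  Against the spread that padding
# alone produces, it is roughly one standard deviation from the mean,
# i.e. plausibly just code-layout noise rather than the patch's logic.
z_within  = (patched - baseline) / within_sd
z_padding = (patched - padding_mean) / padding_sd
print(f"vs within-build sd: {z_within:.1f} sd; vs padding spread: {z_padding:.2f} sd")
```

The point of the sketch is only that the denominator matters: the within-build standard deviation understates the variance one should compare a patch against.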
Interpreting those differences - whether they are due to changes in the
algorithm or a result of padding somewhere else in the code - is of
course important too.

I believe the small regressions (1-10%) for small data sets might be
caused by this 'random padding' effect, because that's probably where
the L1/L2 caches matter most. For large datasets the caches are probably
not as effective anyway, so the random padding makes no difference, and
the speedup is just as good as for the other queries. See for example
this:

http://www.postgresql.org/message-id/54EB580C.2000904@2ndquadrant.com

But I'm speculating here ... time for a profiler, I guess.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services