Re: Abbreviated keys for Numeric - Mailing list pgsql-hackers

From Gavin Flower
Subject Re: Abbreviated keys for Numeric
Date
Msg-id 54E81926.8000505@archidevsys.co.nz
Whole thread Raw
In response to Re: Abbreviated keys for Numeric  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Abbreviated keys for Numeric  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Abbreviated keys for Numeric  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
On 21/02/15 18:18, Tomas Vondra wrote:
> Hi,
>
> On 21.2.2015 02:06, Tomas Vondra wrote:
>> On 21.2.2015 02:00, Andrew Gierth wrote:
>>>>>>>> "Tomas" == Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>   >> Right...so don't test a datum sort case, since that isn't supported
>>>   >> at all in the master branch. Your test case is invalid for that
>>>   >> reason.
>>>
>>>   Tomas> What do you mean by 'Datum sort case'?
>>>
>>> A case where the code path goes via tuplesort_begin_datum rather than
>>> tuplesort_begin_heap.
>>>
>>>   Tomas> The test I was using is this:
>>>
>>>   Tomas>    select percentile_disc(0) within group (order by randnum) from stuff;
>>>
>>> Sorting single columns in aggregate calls uses the Datum sort path (in
>>> fact I think it's currently the only place that does).
>>>
>>> Do that test with _both_ the Datum and Numeric sort patches in place,
>>> and you will see the effect. With only the Numeric patch, the numeric
>>> abbrev code is not called.
>> D'oh! Thanks for the explanation.
> OK, so I've repeated the benchmarks with both patches applied, and I
> think the results are interesting. I extended the benchmark a bit - see
> the SQL script attached.
>
>    1) multiple queries
>
>       select percentile_disc(0) within group (order by val) from stuff
>
>       select count(distinct val) from stuff
>
>       select * from
>         (select * from stuff order by val offset 100000000000) foo
>
>    2) multiple data types - int, float, text and numeric
>
>    3) multiple scales - 1M, 2M, 3M, 4M and 5M rows
>
> Each query was executed 10x, the timings were averaged. I do know some
> of the data types don't benefit from the patches, but I included them to
> get a sense of how noisy the results are.
>
> I did the measurements for
>
>    1) master
>    2) master + datum_sort_abbrev.patch
>    3) master + datum_sort_abbrev.patch + numeric_sortsup.patch
>
> and then computed the speedup for each type/scale combination (the
> impact on all the queries is almost exactly the same).
>
> Complete results are available here: http://bit.ly/1EA4mR9
>
> I'll post all the summary here, although some of the numbers are about
> the other abbreviated keys patch.
>
>
> 1) datum_sort_abbrev.patch vs. master
>
>      scale      float      int    numeric     text
>      ---------------------------------------------
>      1          101%       99%       105%     404%
>      2          101%       98%        96%      98%
>      3          101%      101%        99%      97%
>      4          100%      101%        98%      95%
>      5           99%       98%        93%      95%
>
> 2) numeric_sortsup.patch vs. master
>
>      scale     float       int    numeric     text
>      ---------------------------------------------
>      1           97%       98%       374%     396%
>      2          100%      101%       407%      96%
>      3           99%      102%       407%      95%
>      4           99%      101%       423%      92%
>      5           95%       99%       411%      92%
>
>
> I think the gains are pretty awesome - I mean, 400% speedup for Numeric
> accross the board? Yes please!
>
> The gains for text are also very nice, although in this case that only
> happens for the smallest scale (1M rows), and for larger scales it's
> actually slower than current master :-(
>
> It's not just rainbows and unicorns, though. With both patches applied,
> text sorts get even slower (up to ~8% slower than master), It also seems
> to impact float (which gets ~5% slower, for some reason), but I don't
> see how that could happen ... but I suspect this might be noise.
>
> I'll repeat the tests on another machine after the weekend, and post an
> update whether the results are the same or significantly different.
>
> regards
>
>
>
What are the standard deviations?

Do the arithmetic means change much if you exclude the 2 fastest & 2 
slowest?

How do the arithmetic means compare to their respective medians?

Essentially, how consistent are the results, or how great is the noise?  
There may be better indicators than the ones I've suggested above.



Cheers,
Gavin




pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Abbreviated keys for Numeric
Next
From: Pavel Stehule
Date:
Subject: Re: POLA violation with \c service=