On 21.2.2015 20:33, Peter Geoghegan wrote:
> On Sat, Feb 21, 2015 at 10:57 AM, Peter Geoghegan <pg@heroku.com> wrote:
>
>> That's odd. I have a hard time thinking of why the datum sort
>> patch could be at fault, though.
>
> Oh, wait. For queries like this, which I now see in your
> spreadsheet:
>
> select * from (select * from stuff_text order by randtxt offset
> 100000000000) foo
>
> There is no reason to think that either patch will improve things
> over master branch tip's performance. This doesn't use a datum
> tuplesort. So that explains it, I think.
Really? Because those are the queries that you posted on 26/1 to
demonstrate that this patch makes sorting Numeric even faster than
sorting float8.
And for the Numeric data type, these queries actually get a significant
speedup (~4x) with the numeric_sortsupp.patch.
But maybe text works differently?
> Although I cannot easily explain the disparity in performance between
> 1M and 5M sized sets for this query:
>
> select count(distinct randtxt) from stuff_text
>
> You did make sure that the queries didn't spill to disk, right? Or
> that they did so consistently, at least.
All the queries were running with work_mem=1GB, and I don't think they
were spilling to disk. Actually, I don't have the 'merge' patch applied,
so a query that spilled would probably crash with a SIGSEGV.
> There is also no reason to think that integer or float datum sorts
> will be accelerated, because they could always use sortsupport - the
> datum sort patch is only about making it possible to also use
> abbreviation for opclasses that support it, like text (and,
> eventually, numeric).
Yes, I'm aware of that. I used those as a control group, to get an idea
of how noisy the results are, and to check that the patches don't
affect them for some unexpected reason.
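To make the "abbreviation" point above concrete, here is a minimal sketch
of the idea, assuming byte-wise ("C" collation) ordering. Each string gets
an integer key built from its first 8 bytes. The sort compares those
integers, and only when two keys are equal does it fall back to the full
string comparison. The names abbrev_key, SortItem, cmp_items and
sort_strings are hypothetical and do not come from PostgreSQL. The real
sortsupport code is more involved: it builds locale-aware keys from
strxfrm() output and uses a heuristic to abandon abbreviation when it is
not paying off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not PostgreSQL source: abbreviated keys for
 * byte-wise ("C" collation) string sorting. */

/* Pack the first (up to) 8 bytes of s into an integer, first byte most
 * significant and shorter strings zero-padded, so that unsigned integer
 * order agrees with strcmp() order on that prefix. */
static uint64_t abbrev_key(const char *s)
{
    uint64_t key = 0;
    size_t len = strlen(s);

    for (size_t i = 0; i < 8; i++) {
        key <<= 8;
        if (i < len)
            key |= (unsigned char) s[i];
    }
    return key;
}

typedef struct {
    uint64_t abbrev;   /* cheap proxy for the full value */
    const char *str;   /* the authoritative value */
} SortItem;

/* Decide on the abbreviated keys when they differ; only ties fall back
 * to the full (in general, expensive) comparison. */
static int cmp_items(const void *pa, const void *pb)
{
    const SortItem *a = pa;
    const SortItem *b = pb;

    if (a->abbrev != b->abbrev)
        return a->abbrev < b->abbrev ? -1 : 1;
    return strcmp(a->str, b->str);
}

/* Sort strs in place; returns 0 on success, -1 if allocation fails. */
static int sort_strings(const char **strs, size_t n)
{
    if (n == 0)
        return 0;   /* avoid malloc(0), whose result is implementation-defined */

    SortItem *items = malloc(n * sizeof *items);
    if (items == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        items[i].abbrev = abbrev_key(strs[i]);
        items[i].str = strs[i];
    }
    qsort(items, n, sizeof *items, cmp_items);

    for (size_t i = 0; i < n; i++) {
        strs[i] = items[i].str;
        /* The abbreviated order must never contradict the full comparison. */
        assert(i == 0 || strcmp(strs[i - 1], strs[i]) <= 0);
    }
    free(items);
    return 0;
}
```

For example, "abbreviated" and "abbreviation" share the same 8-byte key,
so only that pair needs the full strcmp(), while "abc" vs. "numeric" is
settled by a single integer comparison. Integer and float8 datums already
compare as single machine words, so there is nothing comparable to gain
for them, which is why they only work as a control group here.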
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services