Re: Re: Abbreviated keys for Datum tuplesort - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Re: Abbreviated keys for Datum tuplesort
Date
Msg-id CAM3SWZTiZt+dfvkYArHnYvjmc6ewuQyXawp4ZWfghhVaXMjP-Q@mail.gmail.com
Whole thread Raw
In response to Re: Re: Abbreviated keys for Datum tuplesort  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Apr 2, 2015 at 11:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 2, 2015 at 5:34 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I think it's explained by the pre-check for sorted input making the
>> number of comparisons exactly n -1. As I pointed out to Tomas, if you
>> put a single, solitary unsorted element at the end, the abbreviated
>> version is then 8x faster (maybe that was in relation to a slightly
>> different case, but the pattern is the same). So this case isn't an
>> argument against datum abbreviation, or even abbreviation in general,
>> but rather an argument against using strxfrm() in general (which for
>> example the GCC docs strongly recommend for sorting lists of strings).
>> It's a bad argument, IMV. This sort is already extremely fast.
>
> OK, I see.

Wait. It's also due to the fact that some cases are spuriously
aborted, because the cost model for text needs to be fixed, I believe.
It's not clear that this is actually such a case from Tomas' remarks,
because the smaller scale tested *did* win significantly, and that
appeared to be otherwise comparable to the regressed cases. So
although what I describe here about perfectly pre-sorted input is
something I've seen (and something that we discussed on another thread
IIRC), this is not an example of it. Again, this is just an example of
why we need to fix the cost model for text.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Re: Abbreviated keys for Datum tuplesort
Next
From: Petr Jelinek
Date:
Subject: Unused variable in hashpage.c