Re: Abbreviated keys for Numeric - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Abbreviated keys for Numeric
Date
Msg-id CAM3SWZQgEAG0ij2JzAJtf-GjAkXa6Q5OWdEijJiNC6rxq3YkpQ@mail.gmail.com
In response to Re: Abbreviated keys for Numeric  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Abbreviated keys for Numeric  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Feb 21, 2015 at 10:57 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Feb 20, 2015 at 9:18 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The gains for text are also very nice, although in this case that only
>> happens for the smallest scale (1M rows), and for larger scales it's
>> actually slower than current master :-(
>
> That's odd. I have a hard time thinking of why the datum sort patch
> could be at fault, though.

Oh, wait. For queries like this, which I now see in your spreadsheet:

select * from (select * from stuff_text order by randtxt offset
100000000000) foo

There is no reason to think that either patch will improve things over
master branch tip's performance. This doesn't use a datum tuplesort.
So that explains it, I think. Although I cannot easily explain the
disparity in performance between 1M and 5M sized sets for this query:

select count(distinct randtxt) from stuff_text

You did make sure that the queries didn't spill to disk, right? Or
that they did so consistently, at least.

There is also no reason to think that integer or float datum sorts
will be accelerated, because they could always use sortsupport. The
datum sort patch only makes it possible to additionally use
abbreviation for opclasses that support it, like text (and,
eventually, numeric).
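
To illustrate what "abbreviation" means here, the following is only a
rough Python sketch of the general idea, not the actual tuplesort or
sortsupport code. The names abbreviate, abbrev_sort and ABBREV_LEN are
made up for illustration, and it assumes plain bytewise ordering (as in
the C locale) rather than the strxfrm()-based keys a real collation
would need. Each value gets a fixed-size key built from its leading
bytes; the cheap key comparison decides most cases, and only ties fall
back to the full comparison.

```python
from functools import cmp_to_key

# Assumption: the abbreviated key fits in 8 bytes, roughly one 64-bit Datum.
ABBREV_LEN = 8


def abbreviate(s: str) -> int:
    """Pack the first ABBREV_LEN bytes of s into an integer, zero-padded."""
    prefix = s.encode("utf-8")[:ABBREV_LEN].ljust(ABBREV_LEN, b"\x00")
    return int.from_bytes(prefix, "big")


def abbrev_sort(values):
    """Sort strings bytewise, comparing abbreviated keys first."""
    keyed = [(abbreviate(v), v) for v in values]

    def compare(a, b):
        # Cheap path: the integer keys differ, so they decide the order.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Tie on the abbreviated key: fall back to the full comparison.
        fa, fb = a[1].encode("utf-8"), b[1].encode("utf-8")
        return (fa > fb) - (fa < fb)

    keyed.sort(key=cmp_to_key(compare))
    return [v for _, v in keyed]
```

Values that share a long common prefix all tie on the key and pay for
the full comparison anyway, so the benefit depends on how well the
leading bytes distinguish the data.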
-- 
Peter Geoghegan


