Re: Re: Abbreviated keys for Datum tuplesort - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Re: Abbreviated keys for Datum tuplesort
Msg-id 54E79F9C.4090208@2ndquadrant.com
In response to Re: Re: Abbreviated keys for Datum tuplesort  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: Re: Abbreviated keys for Datum tuplesort  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 25.1.2015 12:15, Andrew Gierth wrote:
>
> So given some suitable test data, such as
> 
> create table stuff as select random()::text as randtext
>   from generate_series(1,1000000);  -- or however many rows
> 
> you can do
> 
> select percentile_disc(0) within group (order by randtext) from stuff;
> 
> or
> 
> select count(distinct randtext) from stuff;
> 
> The performance improvements I saw were pretty much exactly as
> expected from the improvement in the ORDER BY and CREATE INDEX cases.

I've spent a fair amount of time testing this today, and when using the
simple percentile_disc example mentioned above, I see this pattern:
                               master   patched   speedup
---------------------------------------------------------
generate_series(1,1000000)        4.2       0.7      6
generate_series(1,2000000)        9.2       9.8      0.93
generate_series(1,3000000)       14.5      15.3      0.95


So for a small dataset the speedup is very nice, but for larger sets
there's a ~5% slowdown. Is this expected?
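For reference, a minimal sketch in plain SQL that recomputes the speedup column from the timings in the table above. It assumes speedup means master time divided by patched time; the post does not define it or state the timing units, and the column names n_rows, master, patched and pct_change_vs_master are illustrative only.

```sql
-- Sketch: recompute speedup = master / patched from the benchmark table.
-- Assumption: "speedup" is master time divided by patched time; the units
-- of the timings are whatever the table uses (not stated in the post).
select n_rows,
       master,
       patched,
       round(master / patched, 2)                  as speedup,
       -- positive value: patched build took longer than master
       round(100 * (patched - master) / master, 1) as pct_change_vs_master
from (values (1000000,  4.2,  0.7),
             (2000000,  9.2,  9.8),
             (3000000, 14.5, 15.3)) as t(n_rows, master, patched)
order by n_rows;
```

A positive pct_change_vs_master means the patched build was slower than master for that data size.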


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


