
Re: Re: Abbreviated keys for Datum tuplesort - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Re: Abbreviated keys for Datum tuplesort
Date	February 20, 2015 20:57:22
Msg-id	54E79F9C.4090208@2ndquadrant.com
In response to	Re: Re: Abbreviated keys for Datum tuplesort (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses	Re: Re: Abbreviated keys for Datum tuplesort
List	pgsql-hackers


On 25.1.2015 12:15, Andrew Gierth wrote:
>
> So given some suitable test data, such as
> 
> create table stuff as select random()::text as randtext
>   from generate_series(1,1000000);  -- or however many rows
> 
> you can do
> 
> select percentile_disc(0) within group (order by randtext) from stuff;
> 
> or
> 
> select count(distinct randtext) from stuff;
> 
> The performance improvements I saw were pretty much exactly as
> expected from the improvement in the ORDER BY and CREATE INDEX cases.

I've spent a fair amount of time testing this today, and when using the
simple percentile_disc example mentioned above, I see this pattern:
                              master   patched   speedup
-----------------------------------------------------
generate_series(1,1000000)       4.2       0.7         6
generate_series(1,2000000)       9.2       9.8      0.93
generate_series(1,3000000)      14.5      15.3      0.95

So for the smallest dataset (1M rows) the speedup is very nice, but for
the larger sets (2M and 3M rows) there's a ~5% slowdown. Is this expected?
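The size of that slowdown can be checked directly against the timings in the table. A minimal Python sketch: the `TIMINGS` dictionary and the `relative_change_pct` helper are illustrative names, not anything from the thread, and the unit of the timings is not stated in the message (ratios do not depend on it).

```python
# Timings from the percentile_disc benchmark table; the unit is not stated
# in the message, but relative changes are unit-independent.
TIMINGS = {
    1_000_000: (4.2, 0.7),   # rows: (master, patched)
    2_000_000: (9.2, 9.8),
    3_000_000: (14.5, 15.3),
}


def relative_change_pct(master: float, patched: float) -> float:
    """Runtime change of the patched build relative to master, in percent.

    Negative values mean the patched build is faster, positive values slower.
    """
    return (patched - master) / master * 100.0


for rows, (master, patched) in TIMINGS.items():
    print(f"{rows:>9,} rows: {relative_change_pct(master, patched):+.1f}%")
```

This prints -83.3% for 1M rows, +6.5% for 2M rows and +5.5% for 3M rows, i.e. the patched build takes roughly 5.5 to 6.5% longer on the two larger inputs.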


-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
