>>>>> "Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> Is there a performance test case where this patch should shineMark> brightest? I'd like to load a schema with
lotsof data, and runMark> a grouping sets query, both before and after applying the patch,Mark> to see what the
performanceadvantage is.
The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.
So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):
create table sales ( id serial, product_id integer, store_id integer, customer_id integer, qty integer);
-- random integer function
create function d(integer) returns integer language sqlas $f$ select floor(random()*$1)::integer + 1; $f$;
-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty) select d(20), d(6), d(10), d(100) from
generate_series(1,10000000);
-- example 2d cube:
select product_id, store_id, count(*), sum(qty) from salesgroup by cube(product_id, store_id);
-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty) from salesgroup by cube(product_id, store_id,
customer_id);
-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty) from salesgroup by cube(product_id, store_id,
customer_id,(qty / 10));
On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).
--
Andrew (irc:RhodiumToad)