Re: [HACKERS] Hash support for grouping sets - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: [HACKERS] Hash support for grouping sets
Date
Msg-id 87tw6j94y7.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: [HACKERS] Hash support for grouping sets  (Mark Dilger <hornschnorter@gmail.com>)
List pgsql-hackers
>>>>> "Mark" == Mark Dilger <hornschnorter@gmail.com> writes:
Mark> Is there a performance test case where this patch should shineMark> brightest?  I'd like to load a schema with
lotsof data, and runMark> a grouping sets query, both before and after applying the patch,Mark> to see what the
performanceadvantage is.
 

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales ( id serial, product_id integer, store_id integer, customer_id integer, qty integer);

-- random integer function
create function d(integer) returns integer language sqlas $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty) select d(20), d(6), d(10), d(100) from
generate_series(1,10000000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty) from salesgroup by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty) from salesgroup by cube(product_id, store_id,
customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty) from salesgroup by cube(product_id, store_id,
customer_id,(qty / 10));
 

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: Mithun Cy
Date:
Subject: Re: [HACKERS] [POC] A better way to expand hash indexes.
Next
From: Teodor Sigaev
Date:
Subject: Re: [HACKERS] Potential data loss of 2PC files