Re: Disk-based hash aggregate's cost model - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Disk-based hash aggregate's cost model
Date
Msg-id f8848998e8890149747b0ad6cf2f4244d6f841b8.camel@j-davis.com
Whole thread Raw
In response to Re: Disk-based hash aggregate's cost model  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Disk-based hash aggregate's cost model  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
> for the spilling case. In particular, the initial section computing
> startup_cost seems to ignore that we may need to so some of the stuff
> repeatedly - for example we'll repeat hash lookups for spilled
> tuples,
> and so on.

I tried to analyze this using a slightly different approach: cost units
per second of runtime. Obviously this will vary based on the machine,
but it's interesting anyway.

All of the following fit in system memory. Schema, data, and queries
are at the end of this email.

A low value of cost-units/second-runtime means "more likely to be
chosen incorrectly" and a high value means "more likely to be missed
incorrectly".

Plan    | work_mem | 10M  | 100M INT4    | 100M | 10M
        |          | INT4 | (10M groups) | INT4 | TEXT
--------+----------+------+--------------+------+-----
HashAgg | 4MB      |  88  |  63          |  82  | 78
HashAgg | 1TB      |  41  |  37          |  33  | 38
Sort    | 4MB      | 182  | 188          | 174  | 37
Sort    | 1TB      | 184  | 231          | 189  | 30


Sort is consistent for a given datatype, but it seems that the
comparison cost is far from proportionate between data types.

HashAgg is consistent between the data types, but the disk costs play a
larger role (in this case, there is no actual disk access to worry
about, because it fits in system memory).

At least for these simple queries, Sort is punished unfairly for INT4,
but gets an unfair boost for TEXT.

It seems that we should make a change here, but to be conservative for
13, we should limit changes to the new plans, which are the first line
(HashAgg that spills).

The attached patch makes two adjustments: a 2X disk penalty for
HashAgg, and I also add:

  spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost

The new results:

Plan    | work_mem | 10M  | 100M INT4    | 100M | 10M
        |          | INT4 | (10M groups) | INT4 | TEXT
--------+----------+------+--------------+------+-----
HashAgg | 4MB      | 192  | 131          | 178  | 166


That's much more comparable to Sort for INT4, but makes the gap wider
for TEXT. Fortunately, at least for my simple queries, it just barely
avoids a plan change to Sort for the TEXT case (which is nearly 5X
slower than HashAgg).

I think this approach is reasonable for 13: it only changes the costing
for HashAgg plans that are expected to spill, it's fairly conservative
so we will not get lots of new bad plans, and it still seems to use
HashAgg for cases where it clearly should.

Note: in-memory HashAgg is still unfairly favored over Sort, at least
for these cases.

Regards,
    Jeff Davis


create table t10m(i int);
insert into t10m select (random()*1000000000)::int from
generate_series(1,10000000);
explain analyze select distinct i from t10m;

create table t100m(i int);
insert into t100m select (random()*2000000000)::int from
generate_series(1,100000000);
explain analyze select distinct i from t100m;

-- 100m tuples, 10m groups, 10tuples/group
create table t100m_10(i int);
insert into t100m_10 select (random()*10000000)::int from
generate_series(1,100000000);
explain analyze select distinct i from t100m_10;

create table text10m(t text collate "C.UTF-8", i int, n numeric);
insert into text10m select s.g::text, s.g, s.g::numeric from (select
(random()*1000000000)::int as g from generate_series(1,10000000)) s;
explain analyze select distinct t from text10m;



Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Two fsync related performance issues?
Next
From: Peter Geoghegan
Date:
Subject: Re: Disk-based hash aggregate's cost model