Home > mailing lists

cost_sort vs cost_agg - Mailing list pgsql-hackers

From	Andy Fan
Subject	cost_sort vs cost_agg
Date	January 14, 2021 13:42:36
Msg-id	CAKU4AWqx2TeXyhmr1TNBt0u+WNCr=dpTTpNcwudE=kNJbFVGtQ@mail.gmail.com Whole thread
Responses	Re: cost_sort vs cost_agg
List	pgsql-hackers

Tree view

Currently the cost_sort doesn't consider the number of columns to sort, which
means the cost of SELECT * FROM t ORDER BY a; equals with the SELECT *
FROM t ORDER BY a, b; which is obviously wrong. The impact of this is when we
choose the plan for SELECT DISTINCT * FROM t ORDER BY c between:

Sort
Sort Key: c
-> HashAggregate
Group Key: c, a, b, d, e, f, g, h, i, j, k, l, m, n

and

Unique
-> Sort
Sort Key: c, a, b, d, e, f, g, h, i, j, k, l, m, n

Since "Sort (c)" has the same cost as "Sort (c, a, b, d, e, f, g, h, i, j, k,
l, m, n)", and Unique node on a sorted input is usually cheaper than
HashAggregate, so the later one will win usually which might bad at many
places.

My patch v1 did a simple improvement for cost_sort, which will consider the
number of cols to sort. The main part is below:

cost_sort:
Assert(numSortCols);
/* Include the default cost-per-comparison */
+ comparison_cost += (2.0 * cpu_operator_cost * numSortCols);

However it still chooses a wrong plan in the simple case below.

create table wcols (a int , b int, c int, d int, e int, f int, g int, h int, i
int, j int, k int, l int, m int, n int);

insert into wcols select i, i , i, i , i, i , i, i, i, i, i, i, i, i from
generate_series(1, 1000000)i;

select distinct * from wcols order by c;

Optimizer chose HashAggregate with my patch, but it takes 6s. after set
enable_hashagg = off, it takes 2s.

The main reason is both cost_sort and cost_agg doesn't consider the real hash
function or real sort function, they use cpu_operator_cost instead. If we really
want to fix this issue, shall we a). figure the real pg_proc.oid for sort and
hash during planning stage and costing with that? b). in cost_sort, we may
consider the nature order of input data for the ordered column as well? c).

store the Oids in SortPath and AggPath to avoid the double calculation during
createPlan stage? or any better suggestion?

Thanks

Best Regards

Andy Fan (https://www.aliyun.com/)

Attachment

v1-0001-Improve-the-cost_sort-v1.patch

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 14 January 2021, 13:34:56
Subject: Re: ResourceOwner refactoring

From: Amit Langote
Date: 14 January 2021, 13:57:10
Subject: Re: POC: postgres_fdw insert batching

cost_sort vs cost_agg - Mailing list pgsql-hackers

Attachment

Previous

Next