Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Eager aggregation, take 3
Date
Msg-id CAMbWs4-2cVfBk1HNGtqV1QFo2yKnzdxLy0BAqQaJHBt+8+kspw@mail.gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
I've run TPC-DS again to compare planning times with and without eager
aggregation.  Out of 99 queries, only one query (query 64) shows a
noticeable increase in planning time.  This query performs inner joins
across 38 tables.  This is a very large search space.  (I'm talking
about the standard join search method, not the GEQO.)

If my math doesn't fail me, the maximum number of different join
orders when joining n tables is: Catalan(n − 1) x n!.  For n = 38,
this number is astronomically large.  In practice, query 64 joins 19
tables twice (due to a CTE), which still results in about 3.4E28
different join orders.

Of course, in practice, with the help of join_collapse_limit and other
heuristics, the effective search space is reduced a lot, but even
then, it remains very large.  Given this, I'm not too surprised that
query 64 shows an increase in planning time when eager aggregation is
applied -- exploring the best join order in such a space is inherently
expensive.

That said, I've identified a few performance hotspots that can be
optimized to help reduce planning time:

1) the exprs_known_equal() call in get_expression_sortgroupref(),
which is used to check if a given expression is known equal to a
grouping expression due to ECs.  We can optimize this by storing the
EC of each grouping expression, and then get_expression_sortgroupref()
would only need to search the relevant EC, rather than scanning all of
them.

2) the estimate_num_groups() call in create_rel_agg_info().  We can
optimize this by avoiding unnecessary calls to estimate_num_groups()
where possible.

Attached is an updated version of the patch with these optimizations
applied.  With this patch, the planning times for query 64, with and
without eager aggregation, are:

-- with eager aggregation
 Planning Time: 9432.042 ms
-- without eager aggregation
 Planning Time: 7196.999 ms

I think the increase in planning time is acceptable given the large
search space involved, though I may be biased.

- Richard

Attachment

pgsql-hackers by date:

Previous
From: Paul A Jungwirth
Date:
Subject: Re: SQL:2011 Application Time Update & Delete
Next
From: Ashutosh Bapat
Date:
Subject: Re: Report bytes and transactions actually sent downtream