Home > mailing lists

Re: DML and column cound in aggregated subqueries - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: DML and column cound in aggregated subqueries
Date	October 31, 2016 13:36:04
Msg-id	17969.1477920957@sss.pgh.pa.us Whole thread Raw
In response to	DML and column cound in aggregated subqueries (Andres Freund <andres@anarazel.de>)
Responses	Re: DML and column cound in aggregated subqueries
List	pgsql-hackers

Tree view

Andres Freund <andres@anarazel.de> writes:
> this doesn't look right. The ctid shouldn't be in the aggregate output -
> after all it's pretty much meaningless here.

I suspect it's being added to support EvalPlanQual row re-fetches.

> Casting a wider net: find_hash_columns() and it's subroutines seem like
> pretty much dead code, including outdated comments (look for "SQL99
> semantics").

Hm, maybe.  In principle the planner could do that instead, but it doesn't
look like it actually does at the moment; I'm not seeing any distinction
in tlist-building for AGG_HASHED vs other cases in create_agg_plan().
As an example:

regression=# explain verbose select avg(ten), hundred from tenk1 group by 2;
                        QUERY PLAN                                                                       
--------------------------------------------------------------------------------
-----------------------------------------------------------------------HashAggregate  (cost=508.00..509.25 rows=100
width=36) Output: avg(ten), hundred  Group Key: tenk1.hundred  ->  Seq Scan on public.tenk1  (cost=0.00..458.00
rows=10000width=8)        Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, tw 
othousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
(5 rows)

If you wanted to forgo find_hash_columns() then it would be important
for the seqscan to output a minimal tlist rather than try to optimize
away its projection step.

I'm not that excited about making such a change in terms of performance:
you'd essentially be skipping a handmade projection step inside nodeAgg
at the cost of probably adding one at the input node, as in this example.
But it might be worth doing anyway just on the grounds that this ought to
be the planner's domain not a custom hack in the executor.
        regards, tom lane

pgsql-hackers by date:

From: Tomas Vondra
Date: 31 October 2016, 13:32:27
Subject: Re: Speed up Clog Access by increasing CLOG buffers

From: Robert Haas
Date: 31 October 2016, 13:44:23
Subject: Re: Dumb mistakes in WalSndWriteData()

Re: DML and column cound in aggregated subqueries - Mailing list pgsql-hackers

Previous

Next