Home > mailing lists

Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: [HACKERS] WIP: Aggregation push-down
Date	February 28, 2019 16:02:54
Msg-id	23468.1551369774@localhost Whole thread
In response to	Re: [HACKERS] WIP: Aggregation push-down (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [HACKERS] WIP: Aggregation push-down
List	pgsql-hackers

Tree view

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Antonin Houska <ah@cybertec.at> writes:
> > Michael Paquier <michael@paquier.xyz> wrote:
> >> Latest patch set fails to apply, so moved to next CF, waiting on
> >> author.
> 
> > Rebased.
> 
> This is in need of rebasing again :-(.  I went ahead and pushed the 001
> part, since that seemed fairly uncontroversial.

ok, thanks.

> I did not spend a whole lot of time looking at the patch today, but
> I'm still pretty distressed at the data structures you've chosen.
> I remain of the opinion that a grouped relation and a base relation
> are, er, unrelated, even if they happen to share the same relid set.
> So I do not see the value of the RelOptInfoSet struct you propose here,

ok. As you suggested upthread, I try now to reuse the join_rel_list /
join_rel_hash structures, see v11-001-Introduce_RelInfoList.patch.

> and I definitely don't think there's any value in having, eg,
> create_seqscan_path or create_index_path dealing with this stuff.

Originally I tried to aggregate any path that ever gets passed to agg_path(),
but that's probably not worth the code complexity. Now the partial aggregation
is only applied to paths that survived agg_path() on the plain relation.

> I also don't like changing create_nestloop_path et al to take a PathTarget
> rather than using the RelOptInfo's pathtarget; IMO, it's flat out wrong
> for a path to generate a tlist different from what its parent RelOptInfo
> says that the relation produces.

Likewise, the current patch version is less invasive, so create_nestloop_path
et al are not touched at all.

> I think basically the way this ought to work is that we generate baserel
> paths pretty much the same as today, and then we run through the baserels
> and see which ones are potentially worth doing partial aggregation on,
> and for each one that is, we create a separate "upper relation" RelOptInfo
> describing that.  The paths for this RelOptInfo would be
> partial-aggregation paths using paths from the corresponding baserel as
> input.  Then we'd run a join search that only considers joining grouped
> rels with plain rels (I concur that joining two grouped rels is not worth
> coping with, at least for now).

make_join_rel() is the core of my implementation: besides joining two plain
relations it tries to join plain relation to grouped one, and also to
aggregate the join of the two plain relations. I consider this approach less
invasive and more efficient than running the whole standard_join_search again
for the grouped rels.

The problem of numeric-like data types (i.e. types for wich equality of two
values of the grouping key does not justify putting them into the same group
because information like scale would be discarded this way) remains open. My
last idea was to add a boolean flag to operator class which tells that
equality implies "bitwise equality", and to disallow aggregate push-down if
SortGroupClause.eqop is in an opclass which has this field FALSE. I'd like to
hear your opinion before I do any coding.

-- 
Antonin Houska
https://www.cybertec-postgresql.com

Attachment

pgsql-hackers by date:

From: Robert Haas
Date: 28 February 2019, 15:58:49
Subject: Re: Why don't we have a small reserved OID range for patch revisions?

From: Joshua Brindle
Date: 28 February 2019, 16:03:28
Subject: Re: Re: Row Level Security − leakproof-ness and performance implications

Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers

Attachment

Previous

Next