Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Partial aggregates pushdown
Date
Msg-id CA+TgmoaiH_Na3y17PankZH59EgJWbgdPM-szb1eP6DQoOAMgYg@mail.gmail.com
Whole thread Raw
In response to Re: Partial aggregates pushdown  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Partial aggregates pushdown
Re: Partial aggregates pushdown
List pgsql-hackers
On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian <bruce@momjian.us> wrote:
> > I do have a concern about this, though. It adds a lot of bloat. It
> > adds a whole lot of additional entries to pg_aggregate, and every new
> > aggregate we add in the future will require a bonus entry for this,
> > and it needs a bunch of new pg_proc entries as well. One idea that
> > I've had in the past is to instead introduce syntax that just does
> > this, without requiring a separate aggregate definition in each case.
> > For example, maybe instead of changing string_agg(whatever) to
> > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > something. Then all aggregates could be treated in a generic way. I'm
> > not completely sure that's better, but I think it's worth considering.
>
> So use an SQL keyword to indicates a pushdown call?  We could then
> automate the behavior rather than requiring special catalog functions?

Right. It would require more infrastructure in the parser, planner,
and executor, but it would be infinitely reusable instead of needing a
new thing for every aggregate. I think that might be better, but to be
honest I'm not totally sure.

> > I don't think the patch does a good job explaining why HAVING,
> > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING
> > shouldn't really be a problem, because HAVING is basically a WHERE
> > clause that occurs after aggregation is complete, and whether or not
> > the aggregation is safe shouldn't depend on what we're going to do
> > with the value afterward. The HAVING clause can't necessarily be
> > pushed to the remote side, but I don't see how or why it could make
> > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a
> > little trickier: if we pushed down DISTINCT, we'd still have to
> > re-DISTINCT-ify when combining locally, and if we pushed down ORDER
> > BY, we'd have to do a merge pass to combine the returned values unless
> > we could prove that the partitions were non-overlapping ranges that
> > would be visited in the correct order. Although that all sounds
> > doable, I think it's probably a good thing that the current patch
> > doesn't try to handle it -- this is complicated already. But it should
> > explain why it's not handling it and maybe even a bit about how it
> > could be handling in the future, rather than just saying "well, this
> > kind of thing is not safe." The trouble with that explanation is that
> > it does nothing to help the reader understand whether the thing in
> > question is *fundamentally* unsafe or whether we just don't have the
> > right code to make it work.
>
> Makes sense.

Actually, I think I was wrong about this. We can't handle ORDER BY or
DISTINCT because we can't distinct-ify or order after we've already
partially aggregated. At least not in general, and not without
additional aggregate support functions. So what I said above was wrong
with respect to those. Or so I believe, anyway. But I still don't see
why HAVING should be a problem.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Change GUC hashtable to use simplehash?
Next
From: David Steele
Date:
Subject: Re: Add recovery to pg_control and remove backup_label