Re: Partial aggregates pushdown - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Partial aggregates pushdown |
Date | |
Msg-id | ZV0US_1taS8Utkxk@momjian.us Whole thread Raw |
In response to | Re: Partial aggregates pushdown (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Tue, Nov 21, 2023 at 12:16:41PM -0500, Robert Haas wrote: > On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian <bruce@momjian.us> wrote: > > > I do have a concern about this, though. It adds a lot of bloat. It > > > adds a whole lot of additional entries to pg_aggregate, and every new > > > aggregate we add in the future will require a bonus entry for this, > > > and it needs a bunch of new pg_proc entries as well. One idea that > > > I've had in the past is to instead introduce syntax that just does > > > this, without requiring a separate aggregate definition in each case. > > > For example, maybe instead of changing string_agg(whatever) to > > > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE > > > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or > > > something. Then all aggregates could be treated in a generic way. I'm > > > not completely sure that's better, but I think it's worth considering. > > > > So use an SQL keyword to indicates a pushdown call? We could then > > automate the behavior rather than requiring special catalog functions? > > Right. It would require more infrastructure in the parser, planner, > and executor, but it would be infinitely reusable instead of needing a > new thing for every aggregate. I think that might be better, but to be > honest I'm not totally sure. It would make it automatic. I guess we need to look at how big the patch is to do it. > > > I don't think the patch does a good job explaining why HAVING, > > > DISTINCT, and ORDER BY are a problem. It seems to me that HAVING > > > shouldn't really be a problem, because HAVING is basically a WHERE > > > clause that occurs after aggregation is complete, and whether or not > > > the aggregation is safe shouldn't depend on what we're going to do > > > with the value afterward. The HAVING clause can't necessarily be > > > pushed to the remote side, but I don't see how or why it could make > > > the aggregate itself unsafe to push down. DISTINCT and ORDER BY are a > > > little trickier: if we pushed down DISTINCT, we'd still have to > > > re-DISTINCT-ify when combining locally, and if we pushed down ORDER > > > BY, we'd have to do a merge pass to combine the returned values unless > > > we could prove that the partitions were non-overlapping ranges that > > > would be visited in the correct order. Although that all sounds > > > doable, I think it's probably a good thing that the current patch > > > doesn't try to handle it -- this is complicated already. But it should > > > explain why it's not handling it and maybe even a bit about how it > > > could be handling in the future, rather than just saying "well, this > > > kind of thing is not safe." The trouble with that explanation is that > > > it does nothing to help the reader understand whether the thing in > > > question is *fundamentally* unsafe or whether we just don't have the > > > right code to make it work. > > > > Makes sense. > > Actually, I think I was wrong about this. We can't handle ORDER BY or > DISTINCT because we can't distinct-ify or order after we've already > partially aggregated. At least not in general, and not without > additional aggregate support functions. So what I said above was wrong > with respect to those. Or so I believe, anyway. But I still don't see > why HAVING should be a problem. This should probably be documented in the patch. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
pgsql-hackers by date: