Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] WIP: Aggregation push-down
Date
Msg-id 20200111011124.efacqhdsptc36pm2@development
Whole thread Raw
In response to Re: [HACKERS] WIP: Aggregation push-down  (Antonin Houska <ah@cybertec.at>)
Responses Re: [HACKERS] WIP: Aggregation push-down
List pgsql-hackers
Hi,

I've been looking at the last version (v14) of this patch series,
submitted way back in July and unfortunately quiet since then. Antonin
is probably right one of the reasons for the lack of reviews is that it
requires interest/experience with planner.

Initially it was also a bit hard to understand what are the benefits
(and the patch shifted a bit), which is now mostly addressed by the
README in the last patch. The trouble is that's hidden in the patch and
so not immediately obvious to people considering reviewing it :-( Tom
posted a nice summary in November 2018, but it was perhaps focused on
the internals.

So my first suggestion it to write a short summary with example how it
affects practical queries (plan change, speedup) etc.

My second suggestion is to have meaningful commit messages for each part
of the patch series, explaining why we need that particular part. It
might have been explained somewhere in the thread, but as a reviewer I
really don't want to reverse-engineer the whole thread.


Now, regarding individual parts of the patch:


1) v14-0001-Introduce-RelInfoList-structure.patch
-------------------------------------------------

- I'm not entirely sure why we need this change. We had the list+hash
   before, so I assume we do this because we need the output functions?

- The RelInfoList naming is pretty confusing, because it's actually not
   a list but a combination of list+hash. And the embedded list is called
   items, to make it yet a bit more confusing. I suggest we call this
   just RelInfo or RelInfoLookup or something else not including List.

- RelInfoEntry seems severely under-documented, particularly the data
   part (which is void* making it even harder to understand what's it for
   without having to read the functions). AFAICS it's just simply a link
   to the embedded list, but maybe I'm wrong.

- I suggest we move find_join_rel/add_rel_info/add_join_rel in relnode.c
   right before build_join_rel. This makes diff clearer/smaller and
   visual diffs nicer.


2) v14-0002-Introduce-make_join_rel_common-function.patch
---------------------------------------------------------

I see no point in keeping this change in a separate patch, as it prety
much just renames make_join_rel to make_join_rel_common and then adds 

   make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
   {
       return make_join_rel_common(root, rel1, rel2);
   }

which seems rather trivial and useless on it's own. I'd just merge it
into 0003 where we use the _common function more extensively.


3) v14-0003-Aggregate-push-down-basic-functionality.patch
---------------------------------------------------------

I haven't done much review on this yet, but I've done some testing and
benchmarking so let me share at least that. The queries I used were of
the type I mentioned earlier in this thread - essentially a star schema,
i.e. fact table referencing dimensions, with aggregation per columns in
the dimension. So something like

   SELECT d.c, sum(f) FROM fact JOIN dimension d ON (d.id = f.d_id)
   GROUP BY d.c;

where "fact" table is much much larger than the dimension.

Attached is a script I used for testing with a bunch of queries and a
number of parameters (parallelism, size of dimension, size of fact, ...)
and a spreadsheed summarizing the results.

Overall, the results seem pretty good - in many cases the queries get
about 2x faster and I haven't seen any regressions. That's pretty nice.

One annoying thing is that this only works for non-parallel queries.
That is, I saw no improvements with parallel queries. It was still
faster than the non-parallel query with aggregate pushdown, but the
parallel query did not improve.

An example of timing looks like this:

   non-parallel (pushdown = off): 918 ms
   non-parallel (pushdown = on):  561 ms
   parallel (pushdown = off):     313 ms
   parallel (pushdown = on):      313 ms

The non-parallel query gets faster because after enabling push-down the
plan changes (and we get partial aggregate below the join). With
parallel query that does not happen, the plans stay the same.

I'm not claiming this is a bug - we end up picking the fastest execution
plan (i.e. we don't make a mistake by e.g. switching to the non-parallel
one with pushdown).

My understanding is that the aggregate pushdown can't (currently?) be
used with queries that use partial aggregate because of parallelism.
That is we can either end up with a plan like this:

   Finalize Aggregate
     -> Join
       -> Partial Aggregate
       -> ...

or a parallel plan like this:

   Finalize Aggregate
     -> Gather
         -> Partial Aggregate
             -> Join
                 -> ...
                 -> ...

but we currently don't support a combination of both

   Finalize Aggregate
     -> Gather
         -> Join
             -> Partial Aggregate
             -> ...

I wonder if that's actually even possible and what would it take to make
it work?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: nuko yokohama
Date:
Subject: Re: Implementing Incremental View Maintenance
Next
From: Michael Paquier
Date:
Subject: Re: [Logical Replication] TRAP:FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX"