Home > mailing lists
Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: [HACKERS] WIP: Aggregation push-down
Date	January 16, 2020 15:25:50
Msg-id	72558.1579188350@antos Whole thread
In response to	Re: [HACKERS] WIP: Aggregation push-down (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: WIP: Aggregation push-down Re: [HACKERS] WIP: Aggregation push-down
List	pgsql-hackers
Tree view
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> Hi,
> 
> I've been looking at the last version (v14) of this patch series,
> submitted way back in July and unfortunately quiet since then. Antonin
> is probably right one of the reasons for the lack of reviews is that it
> requires interest/experience with planner.
> 
> Initially it was also a bit hard to understand what are the benefits
> (and the patch shifted a bit), which is now mostly addressed by the
> README in the last patch. The trouble is that's hidden in the patch and
> so not immediately obvious to people considering reviewing it :-( Tom
> posted a nice summary in November 2018, but it was perhaps focused on
> the internals.
> 
> So my first suggestion it to write a short summary with example how it
> affects practical queries (plan change, speedup) etc.

I think README plus regression test output should be enough for someone who is
about to review patch as complex as this.

> My second suggestion is to have meaningful commit messages for each part
> of the patch series, explaining why we need that particular part. It
> might have been explained somewhere in the thread, but as a reviewer I
> really don't want to reverse-engineer the whole thread.

ok, done.

> Now, regarding individual parts of the patch:
> 
> 
> 1) v14-0001-Introduce-RelInfoList-structure.patch
> -------------------------------------------------
> 
> - I'm not entirely sure why we need this change. We had the list+hash
>   before, so I assume we do this because we need the output functions?

I believe that this is what Tom proposed in [1], see "Maybe an appropriate
preliminary patch is to refactor the support code ..." near the end of that
message. The point is that now we need two lists: one for "plain" relations
and one for grouped ones.

> - The RelInfoList naming is pretty confusing, because it's actually not
>   a list but a combination of list+hash. And the embedded list is called
>   items, to make it yet a bit more confusing. I suggest we call this
>   just RelInfo or RelInfoLookup or something else not including List.

I think it can be considered a list by the caller of add_join_rel() etc. The
hash table is hidden from user and is empty until the list becomes long
enough. The word "list" in the structure name may just indicate that new
elements can be added to the end, which shouldn't be expected if the structure
was an array.

I searched a bit in tools/pgindent/typedefs.list and found at least a few
structures that also do have "list" in the name but are not lists internally:
CatCList, FuncCandidateList, MCVList.

Actually I'm not opposed to renaming the structure, but don't have better idea
right now. As for *Lookup: following is the only use of such a structure name
in PG code and it's not something to store data in:

typedef enum
{
    IDENTIFIER_LOOKUP_NORMAL,    /* normal processing of var names */
    IDENTIFIER_LOOKUP_DECLARE,    /* In DECLARE --- don't look up names */
    IDENTIFIER_LOOKUP_EXPR        /* In SQL expression --- special case */
} IdentifierLookup;

> - RelInfoEntry seems severely under-documented, particularly the data
>   part (which is void* making it even harder to understand what's it for
>   without having to read the functions). AFAICS it's just simply a link
>   to the embedded list, but maybe I'm wrong.

This is just JoinHashEntry (which was also undocumented) renamed. I've added a
short comment now.

> - I suggest we move find_join_rel/add_rel_info/add_join_rel in relnode.c
>   right before build_join_rel. This makes diff clearer/smaller and
>   visual diffs nicer.

Hm, it might improve readability of the diff, but this API is exactly where I
still need feedback from Tom. I'm not eager to optimize the diff as long as
there's a risk that these functions will be removed or renamed.

> 2) v14-0002-Introduce-make_join_rel_common-function.patch
> ---------------------------------------------------------
> 
> I see no point in keeping this change in a separate patch, as it prety
> much just renames make_join_rel to make_join_rel_common and then adds 
> 
>   make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
>   {
>       return make_join_rel_common(root, rel1, rel2);
>   }
> 
> which seems rather trivial and useless on it's own. I'd just merge it
> into 0003 where we use the _common function more extensively.

ok. I thought that this improves readability of the diffs, but it doesn't look
that bad if this is included in 0003.

> 
> 3) v14-0003-Aggregate-push-down-basic-functionality.patch
> ---------------------------------------------------------
> 
> I haven't done much review on this yet, but I've done some testing and
> benchmarking so let me share at least that. The queries I used were of
> the type I mentioned earlier in this thread - essentially a star schema,
> i.e. fact table referencing dimensions, with aggregation per columns in
> the dimension. So something like
> 
>   SELECT d.c, sum(f) FROM fact JOIN dimension d ON (d.id = f.d_id)
>   GROUP BY d.c;
> 
> where "fact" table is much much larger than the dimension.
> 
> Attached is a script I used for testing with a bunch of queries and a
> number of parameters (parallelism, size of dimension, size of fact, ...)
> and a spreadsheed summarizing the results.
> 
> Overall, the results seem pretty good - in many cases the queries get
> about 2x faster and I haven't seen any regressions. That's pretty nice.

Thanks for such an elaborate testing! The ultimate goal of this patch is to
improve sharding (using postgres_fdw), but it's nice to see significant
improvement even for local queries.

> One annoying thing is that this only works for non-parallel queries.
> That is, I saw no improvements with parallel queries. It was still
> faster than the non-parallel query with aggregate pushdown, but the
> parallel query did not improve.

> An example of timing looks like this:
> 
>   non-parallel (pushdown = off): 918 ms
>   non-parallel (pushdown = on):  561 ms
>   parallel (pushdown = off):     313 ms
>   parallel (pushdown = on):      313 ms
> 
> The non-parallel query gets faster because after enabling push-down the
> plan changes (and we get partial aggregate below the join). With
> parallel query that does not happen, the plans stay the same.
> 
> I'm not claiming this is a bug - we end up picking the fastest execution
> plan (i.e. we don't make a mistake by e.g. switching to the non-parallel
> one with pushdown).
> 
> My understanding is that the aggregate pushdown can't (currently?) be
> used with queries that use partial aggregate because of parallelism.
> That is we can either end up with a plan like this:
> 
>   Finalize Aggregate
>     -> Join
>       -> Partial Aggregate
>       -> ...
> 
> or a parallel plan like this:
> 
>   Finalize Aggregate
>     -> Gather
>         -> Partial Aggregate
>             -> Join
>                 -> ...
>                 -> ...
> 
> but we currently don't support a combination of both
> 
>   Finalize Aggregate
>     -> Gather
>         -> Join
>             -> Partial Aggregate
>             -> ...
> 
> I wonder if that's actually even possible and what would it take to make
> it work?

I had a prototype of the feature that does affect parallel queries (as well as
partitioned tables and postgres_fdw), but didn't include these parts in the
recent patch versions. The point is that the API to store RelOptInfos can
still change and in such a case I'd have to rebase too much code.

v15 is attached. Actually there are no significant changes relative to v14,
but v14 does not apply to the current master:

error: patch failed: src/test/regress/serial_schedule:140
error: src/test/regress/serial_schedule: patch does not apply

This is interesting because no problem is currently reported at
http://commitfest.cputube.org/

[1] https://www.postgresql.org/message-id/9726.1542577439@sss.pgh.pa.us

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachment

pgsql-hackers by date:
From: Tom Lane
Date: 16 January 2020, 15:21:02
Subject: Re: empty range
From: Tom Lane
Date: 16 January 2020, 15:27:01
Subject: Re: SlabCheck leaks memory into TopMemoryContext
Re: [HACKERS] WIP: Aggregation push-down - Mailing list pgsql-hackers

Attachment

Previous

Next