Re: parallel append vs. simple UNION ALL - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: parallel append vs. simple UNION ALL
Date
Msg-id CAFjFpRfWAYmHfCOzKg+46y1tCfDkg0jpFahigSMroc9GZQe3Rw@mail.gmail.com
Whole thread Raw
In response to Re: parallel append vs. simple UNION ALL  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: parallel append vs. simple UNION ALL  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Feb 24, 2018 at 2:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:

>
> Here's an extended series of patches that now handles both the simple
> UNION ALL case (where we flatten it) and the unflattened case:
>

The patches look clean. I particularly looked at 0003.

patch 0001
+    /*
+     * Generate partial paths for final_rel, too, if outer query levels might
+     * be able to make use of them.
+     */
I am not able to understand the construct esp. the if clause. Did you want to
say "... if there are outer query levels. Those might ..." or something like
that?

0002
                (op->all == top_union->all || op->all) &&
This isn't really your change.  Checking
op->all is cheaper than checking equality, so may be we should check that first
and take advantage of short-circuit condition evaluation. If we do that above
condition reduces to (op->all || !top_union->all) which is two boolean
conditions, even cheaper. But may be the second optimization is not worth the
loss of readability.

"identically-propertied UNIONs" may be "UNIONs with identical properties".

0003
Probably we want to rename generate_union_path() as generate_union_rel() or
generate_union_paths() since the function doesn't return a path anymore.
Similarly for generate_nonunion_path().

In recurse_set_operations()
-        return NULL;            /* keep compiler quiet */
This line is deleted and instead rel is initialized to NULL. That way we loose
any chance to detect a future bug because of a block leaving rel uninitialized
through compiler warning. May be we should replace "return NULL" with "rel =
NULL", which will not be executed because of the error.

+    /* Build path list and relid set. */
+    foreach(lc, rellist)
+    {
With the changes in this patch, we could actually use add_paths_to_append_rel()
to create an append path. That function builds paths with different pathkeys,
parameterization (doesn't matter here) and also handles parallel append. So we
can avoid code duplication and also leverage more optimizations like using
MergeAppend instead of overall sort etc. But that function doesn't have ability
to add a final node like make_union_unique(). A similar requirement has arisen
in partition-wise join where we need to add a final node for finalising
aggregate on top of paths created by add_paths_to_append_rel().  May be we can
change that function to return a list of paths, which are then finalized by the
caller and added to "append" rel. But I don't think doing all that is in the
scope of this patch set.

0004
+        if (!op->all)
+            ppath = make_union_unique(op, ppath, tlist, root);
We could probably push the grouping/sorting down to the parallel workers. But
again not part of this patchset.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: [PoC PATCH] Parallel dump to /dev/null
Next
From: David Steele
Date:
Subject: Re: 2018-03 Commitfest starts tomorrow