parallel append vs. simple UNION ALL - Mailing list pgsql-hackers

From: Robert Haas
Subject: parallel append vs. simple UNION ALL
Msg-id: CA+Tgmoa6L9A1nNCk3aTDVZLZ4KkHDn1+tm7mFyFvP+uQPS7bAg@mail.gmail.com
Responses: Re: parallel append vs. simple UNION ALL (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
As I mentioned in the commit message for the Parallel Append commit
(ab72716778128fb63d54ac256adf7fe6820a1185), it's kind of sad that this
doesn't work with UNION ALL queries, which are an obvious candidate
for such parallelization.  It turns out that it actually does work to
a limited degree: assuming that the UNION ALL query can be converted
to a simple appendrel, it can consider a parallel append of
non-partial paths only.  The attached patch lets it consider a
parallel append of partial paths by doing the following things:

1. Teaching set_subquery_pathlist to create *partial* SubqueryScan
paths as well as non-partial ones (see the sketch after this list).
2. Teaching grouping_planner to create partial paths for the final rel
if not at the outermost query level.
3. Modifying finalize_plan to allow the gather_param to be passed
across subquery boundaries.

#3 is the only part I'm really unsure about; the other stuff looks
pretty cut and dried.
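To make item #1 a little more concrete, here is a minimal sketch of the
idea, assuming set_subquery_pathlist has the subquery's final RelOptInfo
available as sub_final_rel and the usual rel/root/required_outer variables
in scope; the names and placement are assumptions for illustration, not
the actual patch:

    /*
     * Sketch: after the non-partial SubqueryScan paths have been added,
     * also wrap each partial path of the subquery in a SubqueryScan and
     * add it as a partial path of the outer rel, so that Parallel Append
     * can consider it.
     */
    ListCell   *lc;

    foreach(lc, sub_final_rel->partial_pathlist)
    {
        Path       *subpath = (Path *) lfirst(lc);

        add_partial_path(rel, (Path *)
                         create_subqueryscan_path(root, rel, subpath,
                                                  NIL, required_outer));
    }

Items #2 and #3 are the corresponding adjustments in grouping_planner and
finalize_plan and aren't sketched here.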

I have a draft patch that handles the case where the union can't be
converted to a simple appendrel, too, but that's not quite baked
enough to post yet.

For those for whom the above may be too technical to follow, here's an example:

pgbench -i -s 40

explain (costs off)
select a.bid from pgbench_accounts a, pgbench_branches b
  where a.bid = b.bid and aid % 1000 = 0
union all
select a.bid from pgbench_accounts a where aid % 1000 = 0;

Unpatched:

 Append
   ->  Gather
         Workers Planned: 2
         ->  Hash Join
               Hash Cond: (a.bid = b.bid)
               ->  Parallel Seq Scan on pgbench_accounts a
                     Filter: ((aid % 1000) = 0)
               ->  Hash
                     ->  Seq Scan on pgbench_branches b
   ->  Gather
         Workers Planned: 2
         ->  Parallel Seq Scan on pgbench_accounts a_1
               Filter: ((aid % 1000) = 0)

Patched:

 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Hash Join
               Hash Cond: (a.bid = b.bid)
               ->  Parallel Seq Scan on pgbench_accounts a
                     Filter: ((aid % 1000) = 0)
               ->  Hash
                     ->  Seq Scan on pgbench_branches b
         ->  Parallel Seq Scan on pgbench_accounts a_1
               Filter: ((aid % 1000) = 0)

In this particular case the change doesn't buy very much, but the
second plan is better because it avoids shutting down one set of
workers and starting up a new set.  That's more efficient, and it also
allows the two branches of the UNION ALL to be worked on in parallel
rather than serially.  At a small enough scale factor, even without
the patch, you get this...

 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Nested Loop
               Join Filter: (a.bid = b.bid)
               ->  Seq Scan on pgbench_branches b
               ->  Seq Scan on pgbench_accounts a
                     Filter: ((aid % 1000) = 0)
         ->  Seq Scan on pgbench_accounts a_1
               Filter: ((aid % 1000) = 0)

...but that's not good, because now we have regular sequential scans
instead of partial sequential scans: each branch has to be scanned in
its entirety by a single process, rather than having its blocks divided
up among the workers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
