Re: UNION ALL - Mailing list pgsql-hackers

From Tom Lane
Subject Re: UNION ALL
Date
Msg-id 21588.1565969450@sss.pgh.pa.us
Whole thread Raw
In response to Re: UNION ALL  (Mark Pasterkamp <markpasterkamp1994@gmail.com>)
List pgsql-hackers
Mark Pasterkamp <markpasterkamp1994@gmail.com> writes:
> I am comparing two queries, q1 and q2 respectively.
> Query q1 is the original query and q2 is an attempt to reduce the cost of
> execution via leveraging the materialized view ci_t_15.
> ...
> Running explain analyze on both queries I get the following execution plans.

Huh ... I wonder why the planner decided to try to parallelize Q2 (and not
Q1)?  That seems like a pretty stupid choice, because if I'm reading the
plan correctly (I might not be) each worker process has to read all of
the "title" table and build its own copy of the hash table.  That seems
likely to swamp whatever performance gain might come from parallelizing
the scan of cast_info --- which is likely to be not much, anyway, on a
laptop with probably-not-great disk I/O bandwidth.

In any case, whether that decision was good or bad, making it differently
renders the performance of Q1 and Q2 not very comparable.  It'd be worth
disabling parallelism (SET max_parallel_workers_per_gather = 0) and
retrying Q2 to get a more apples-to-apples comparison.

Another bit of advice is to increase work_mem, so the hashes don't
have to be split into quite so many batches.

I'm noting also that your queries aren't giving the same results ---
Q2 reports returning fewer rows overall.  Do you have rows where
title.production_year is null, perhaps?

            regards, tom lane



pgsql-hackers by date:

Previous
From: Ibrar Ahmed
Date:
Subject: Re: block-level incremental backup
Next
From: Anastasia Lubennikova
Date:
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.