Home > mailing lists

Re: UNION ALL - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: UNION ALL
Date	August 16, 2019 18:30:50
Msg-id	21588.1565969450@sss.pgh.pa.us Whole thread Raw
In response to	Re: UNION ALL (Mark Pasterkamp <markpasterkamp1994@gmail.com>)
List	pgsql-hackers

Tree view

Mark Pasterkamp <markpasterkamp1994@gmail.com> writes:
> I am comparing two queries, q1 and q2 respectively.
> Query q1 is the original query and q2 is an attempt to reduce the cost of
> execution via leveraging the materialized view ci_t_15.
> ...
> Running explain analyze on both queries I get the following execution plans.

Huh ... I wonder why the planner decided to try to parallelize Q2 (and not
Q1)?  That seems like a pretty stupid choice, because if I'm reading the
plan correctly (I might not be) each worker process has to read all of
the "title" table and build its own copy of the hash table.  That seems
likely to swamp whatever performance gain might come from parallelizing
the scan of cast_info --- which is likely to be not much, anyway, on a
laptop with probably-not-great disk I/O bandwidth.

In any case, whether that decision was good or bad, making it differently
renders the performance of Q1 and Q2 not very comparable.  It'd be worth
disabling parallelism (SET max_parallel_workers_per_gather = 0) and
retrying Q2 to get a more apples-to-apples comparison.

Another bit of advice is to increase work_mem, so the hashes don't
have to be split into quite so many batches.

I'm noting also that your queries aren't giving the same results ---
Q2 reports returning fewer rows overall.  Do you have rows where
title.production_year is null, perhaps?

            regards, tom lane

pgsql-hackers by date:

From: Ibrar Ahmed
Date: 16 August 2019, 17:36:50
Subject: Re: block-level incremental backup

From: Anastasia Lubennikova
Date: 16 August 2019, 18:56:12
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

Re: UNION ALL - Mailing list pgsql-hackers

Previous

Next