[GENERAL] Query plan for Merge Semi Join - Mailing list pgsql-general
From | Peter J. Holzer |
---|---|
Subject | [GENERAL] Query plan for Merge Semi Join |
Date | |
Msg-id | 20171101092425.4m2wttlnf7tsv4qw@hjp.at Whole thread Raw |
Responses |
Re: [GENERAL] Query plan for Merge Semi Join
|
List | pgsql-general |
[PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit] While investigating a performance issue, I found this query plan: wds=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register, c.id, canonicalname,description, parent, type, r.sortorder from term t, concept c, relation r where facttablename='facttable_kon_eh'and columnname='arbeitsvolumen' and exists (select 1 from facttable_kon_eh f wheref.arbeitsvolumen=t.term and thema in (values (E'E'))) and c.id=concept_id and r.child=concept_id; ╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗ ║ QUERY PLAN ║ ╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢ ║ Nested Loop (cost=22984.13..241822.98 rows=1 width=169) (actual time=667.608..3275.037 rows=6 loops=1) ║ ║ -> Nested Loop (cost=22983.70..241822.50 rows=1 width=154) (actual time=667.587..3274.972 rows=6 loops=1) ║ ║ -> Merge Semi Join (cost=22983.27..241819.04 rows=1 width=87) (actual time=667.559..3274.877 rows=6 loops=1) ║ ║ Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text) ║ ║ -> Index Scan using term_term_idx on term t (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430rows=7 loops=1) ║ ║ Filter: (((facttablename)::text = 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text)) ║ ║ Rows Removed by Filter: 3874190 ║ ║ -> Materialize (cost=0.43..399167.10 rows=3548798 width=7) (actual time=0.086..295.708 rows=184791 loops=1) ║ ║ -> Nested Loop Semi Join (cost=0.43..390295.10 rows=3548798 width=7) (actual time=0.076..278.006rows=184791 loops=1) ║ ║ Join Filter: ((f.thema)::text = ('E'::text)) ║ ║ -> Index Scan using facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f (cost=0.43..337063.11rows=3548798 width=9) (actual time=0.052..199.733 rows=184791 loops=1) ║ ║ -> Materialize (cost=0.00..0.03 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=184791) ║ ║ -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1) ║ ║ -> Index Scan using concept_pkey on concept c (cost=0.43..3.45 rows=1 width=67) (actual time=0.011..0.012 rows=1loops=6) ║ ║ Index Cond: (id = t.concept_id) ║ ║ -> Index Scan using relation_child_idx on relation r (cost=0.43..0.47 rows=1 width=19) (actual time=0.008..0.009 rows=1loops=6) ║ ║ Index Cond: (child = c.id) ║ ║ Planning time: 15.624 ms ║ ║ Execution time: 3275.341 ms ║ ╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝ That is almost certainly not ideal, but this is not my question. My question is what does that merge semi join actually do? In general a merge join needs two inputs sorted by the merge key. It walks both in parallel and joins matching lines. Correct? The first input is the index scan. The second is the output of the materialize. Since we need only the column arbeitsvolumen this would be something like select arbeitsvolumen from facttable_kon_eh where thema='E' orderby arbeitsvolumen; So far so good. But there are a few things I don't understand: Where does Rows Removed by Filter: 3874190 come from? The number doesn't match any count I can come up with: It is a bit larger than the total number of rows where term is not null but smaller than the total number of rows where the filter doesn't match. And it is much larger than the number of rows I would expect if the merge stopped once there could not be a possible match any more. And does it really check the filter condition even for rows that don't satisfy the merge condition? Of course it makes sense from a modularization point of view, but that's a lot of random accesses, most of which are unneccessary. The materialize returns 184791 rows. This one I understand: There are 6 non-null distinct values of arbeitsvolumen in facttable_kon_eh, and each appears 36958 times. 36958 * 5 + 1 = 184791. So it stops once it reaches the largest value. Although now I'm wondering how it knows that this is the largest value without scanning to the end). hp - - _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
pgsql-general by date: