Merge Joins and Views - Mailing list pgsql-general

From Chris Mayfield
Subject Merge Joins and Views
Date
Msg-id fsgv8t$1c5p$1@news.hub.org
Whole thread Raw
Responses Re: Merge Joins and Views
List pgsql-general
Hello,

I have a scenario with two tables, one with 5M rows and the other with
about 3.7M (a subset taken from the first table).  Each is clustered
using its primary key (a single bigint column), and pg_stats shows that
the id's correlation is 1 for both tables.  In addition, I have a view
over the 3.7M table that coalesces some columns that allow nulls.

So I'm running a simple query that does a left outer join from the 5M to
the 3.7M, which basically combines the information between the two (and
results in 5M rows, of course).  It seems to me that the best plan
should involve two index scans and a merge join.  However, I get
different plans depending on whether I use the view or the underlying
table directly, and even the use of ORDER BY -- see examples below for
details.

I don't know if this is a bug (I'm using version 8.3.0), but can anyone
please explain why the optimizer (or rule system?) behaves this way?

Thank you,
--Chris


-----Example 1:-----
SELECT * FROM a LEFT OUTER JOIN b ON (a.id = b.id);

Merge Left Join  (cost=43.99..353657.21 rows=5001671 width=106) (actual
time=0.653..32529.319 rows=5000000 loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..173752.86 rows=5001671
width=81) (actual time=0.353..9754.375 rows=5000000 loops=1)
   ->  Index Scan using b_pkey on b  (cost=0.00..120980.85 rows=3713546
width=25) (actual time=0.279..8120.104 rows=3711523 loops=1) Total
runtime: 33836.167 ms


-----Example 2:-----
-- v is a view that does SELECT ... FROM b;
SELECT * FROM a LEFT OUTER JOIN v ON (a.id = v.id);

Merge Left Join  (cost=580217.86..822178.09 rows=5001671 width=100)
(actual time=34260.004..67869.059 rows=5000000 loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..173752.86 rows=5001671
width=81) (actual time=0.270..10104.528 rows=5000000 loops=1)
   ->  Materialize  (cost=580217.86..626637.18 rows=3713546 width=19)
(actual time=34259.696..43199.389 rows=3711523 loops=1)
         ->  Sort  (cost=580217.86..589501.72 rows=3713546 width=19)
(actual time=34259.679..39448.310 rows=3711523 loops=1)
               Sort Key: b.id
               Sort Method:  external sort  Disk: 136632kB
               ->  Seq Scan on b  (cost=0.00..61693.46 rows=3713546
width=25) (actual time=0.094..10224.402 rows=3711523 loops=1) Total
runtime: 69202.529 ms


-----Example 3:-----
SELECT * FROM a LEFT OUTER JOIN (
   SELECT * FROM v ORDER BY id
) sub ON (a.id = sub.id);

Merge Right Join  (cost=0.00..390792.67 rows=5001671 width=100) (actual
time=0.497..38120.694 rows=5000000 loops=1)
   Merge Cond: (b.id = a.id)
   ->  Index Scan using b_pkey on b  (cost=0.00..120980.85 rows=3713546
width=25) (actual time=0.262..13686.064 rows=3711523 loops=1)
   ->  Index Scan using a_pkey on a  (cost=0.00..173752.86 rows=5001671
width=81) (actual time=0.219..11233.746 rows=5000000 loops=1) Total
runtime: 39467.843 ms

pgsql-general by date:

Previous
From: David T
Date:
Subject: Persistent user-defined functions
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Persistent user-defined functions