Thread: Merge Joins and Views

Merge Joins and Views

From
Chris Mayfield
Date:
Hello,

I have a scenario with two tables, one with 5M rows and the other with
about 3.7M (a subset taken from the first table).  Each is clustered
using its primary key (a single bigint column), and pg_stats shows that
the id's correlation is 1 for both tables.  In addition, I have a view
over the 3.7M table that coalesces some columns that allow nulls.

So I'm running a simple query that does a left outer join from the 5M to
the 3.7M, which basically combines the information between the two (and
results in 5M rows, of course).  It seems to me that the best plan
should involve two index scans and a merge join.  However, I get
different plans depending on whether I use the view or the underlying
table directly, and even the use of ORDER BY -- see examples below for
details.

I don't know if this is a bug (I'm using version 8.3.0), but can anyone
please explain why the optimizer (or rule system?) behaves this way?

Thank you,
--Chris


-----Example 1:-----
SELECT * FROM a LEFT OUTER JOIN b ON (a.id = b.id);

Merge Left Join  (cost=43.99..353657.21 rows=5001671 width=106) (actual
time=0.653..32529.319 rows=5000000 loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..173752.86 rows=5001671
width=81) (actual time=0.353..9754.375 rows=5000000 loops=1)
   ->  Index Scan using b_pkey on b  (cost=0.00..120980.85 rows=3713546
width=25) (actual time=0.279..8120.104 rows=3711523 loops=1) Total
runtime: 33836.167 ms


-----Example 2:-----
-- v is a view that does SELECT ... FROM b;
SELECT * FROM a LEFT OUTER JOIN v ON (a.id = v.id);

Merge Left Join  (cost=580217.86..822178.09 rows=5001671 width=100)
(actual time=34260.004..67869.059 rows=5000000 loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..173752.86 rows=5001671
width=81) (actual time=0.270..10104.528 rows=5000000 loops=1)
   ->  Materialize  (cost=580217.86..626637.18 rows=3713546 width=19)
(actual time=34259.696..43199.389 rows=3711523 loops=1)
         ->  Sort  (cost=580217.86..589501.72 rows=3713546 width=19)
(actual time=34259.679..39448.310 rows=3711523 loops=1)
               Sort Key: b.id
               Sort Method:  external sort  Disk: 136632kB
               ->  Seq Scan on b  (cost=0.00..61693.46 rows=3713546
width=25) (actual time=0.094..10224.402 rows=3711523 loops=1) Total
runtime: 69202.529 ms


-----Example 3:-----
SELECT * FROM a LEFT OUTER JOIN (
   SELECT * FROM v ORDER BY id
) sub ON (a.id = sub.id);

Merge Right Join  (cost=0.00..390792.67 rows=5001671 width=100) (actual
time=0.497..38120.694 rows=5000000 loops=1)
   Merge Cond: (b.id = a.id)
   ->  Index Scan using b_pkey on b  (cost=0.00..120980.85 rows=3713546
width=25) (actual time=0.262..13686.064 rows=3711523 loops=1)
   ->  Index Scan using a_pkey on a  (cost=0.00..173752.86 rows=5001671
width=81) (actual time=0.219..11233.746 rows=5000000 loops=1) Total
runtime: 39467.843 ms

Re: Merge Joins and Views

From
Tom Lane
Date:
Chris Mayfield <cmayfiel@cs.purdue.edu> writes:
> [ planner finds better plan with a forced ORDER BY ]

That shouldn't happen.  Can you show the details of your case?
It may be something specific to the particular view definition...

            regards, tom lane

Re: Merge Joins and Views

From
Chris Mayfield
Date:
See attached -- I've simplified my actual database quite a bit, but this
example shows the same results.

Thanks,
--Chris
                             version
------------------------------------------------------------------
 PostgreSQL 8.3.0 on sparc-sun-solaris2.8, compiled by GCC 2.95.2
(1 row)

DROP VIEW
DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
CREATE VIEW
INSERT 0 5000000
INSERT 0 3711523
ANALYZE
ANALYZE
 schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |
                                                                                    histogram_bounds
                                                                         | correlation  

------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 public     | a         | id      |         0 |         4 |         -1 |                  |                   |
{226,480817,946403,1463901,1905168,2486162,2964834,3411486,3947522,4446167,4996780}
                                                                                 |           1 
 public     | a         | val     |         0 |         8 |         -1 |                  |                   |
{0.00023875804618001,0.0914572249166667,0.189253146760166,0.282982839271426,0.393971057608724,0.491479988675565,0.592469296883792,0.693580291699618,0.803486418910325,0.899317930918187,0.999949590768665}
| -0.0345742 
 public     | b         | id      |         0 |         4 |         -1 |                  |                   |
{2380,409226,804058,1186283,1525765,1874817,2199262,2566896,2939230,3316455,3709638}
                                                                                 |           1 
 public     | b         | opt     |  0.503667 |         8 |         -1 |                  |                   |
{0.000438648741692305,0.0946335387416184,0.194745551329106,0.308890894055367,0.403113955631852,0.50895657017827,0.62006954383105,0.724281970411539,0.805469979997724,0.907830006908625,0.999940330628306}
|    0.034033 
(4 rows)

                                                              QUERY PLAN
              

--------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..330371.44 rows=5000180 width=24) (actual time=0.319..30850.276 rows=5000000 loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..156882.50 rows=5000180 width=12) (actual time=0.244..12665.648
rows=5000000loops=1) 
   ->  Index Scan using b_pkey on b  (cost=0.00..114600.84 rows=3711012 width=12) (actual time=0.061..7336.846
rows=3711523loops=1) 
 Total runtime: 32191.735 ms
(5 rows)

                                                              QUERY PLAN
              

--------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=560793.89..785842.02 rows=5000180 width=24) (actual time=23542.157..55124.203 rows=5000000
loops=1)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..156882.50 rows=5000180 width=12) (actual time=0.282..12397.933
rows=5000000loops=1) 
   ->  Materialize  (cost=560793.89..607181.54 rows=3711012 width=12) (actual time=23541.845..31825.216 rows=3711523
loops=1)
         ->  Sort  (cost=560793.89..570071.42 rows=3711012 width=12) (actual time=23541.833..28215.551 rows=3711523
loops=1)
               Sort Key: b.id
               Sort Method:  external sort  Disk: 116056kB
               ->  Seq Scan on b  (cost=0.00..55326.12 rows=3711012 width=12) (actual time=0.073..4694.892 rows=3711523
loops=1)
 Total runtime: 56409.946 ms
(9 rows)

                                                             QUERY PLAN
             

-------------------------------------------------------------------------------------------------------------------------------------
 Merge Right Join  (cost=0.00..367481.56 rows=5000180 width=24) (actual time=1.794..28075.029 rows=5000000 loops=1)
   Merge Cond: (b.id = a.id)
   ->  Index Scan using b_pkey on b  (cost=0.00..114600.84 rows=3711012 width=12) (actual time=0.322..8132.872
rows=3711523loops=1) 
   ->  Index Scan using a_pkey on a  (cost=0.00..156882.50 rows=5000180 width=12) (actual time=1.457..9714.089
rows=5000000loops=1) 
 Total runtime: 29366.349 ms
(5 rows)


Attachment

Re: Merge Joins and Views

From
Tom Lane
Date:
Chris Mayfield <cmayfiel@cs.purdue.edu> writes:
> See attached -- I've simplified my actual database quite a bit, but this
> example shows the same results.

OK, here's the problem:

> CREATE VIEW v AS
>   SELECT id, COALESCE(opt, 0) AS opt FROM b;

You're using this inside the nullable side of an outer join, and that
means the COALESCE() creates a problem: its output won't go to null
just because "opt" does.  So the COALESCE has to be evaluated below
the outer join, which means that the view can't be "flattened" into
the upper query.  You end up with a dumb seqscan that corresponds to
planning the view in isolation, and then the best way of joining that
with the other table is going to be the sort and merge join.

In the case where you introduce the intermediate sub-select, the
view *can* be flattened into that, producing
    SELECT id, COALESCE(opt, 0) AS opt FROM b ORDER BY id
Again, that can't be flattened into the top query, but looking at
it in isolation the planner chooses an indexscan as the best plan
(by no means a sure thing, but it will do it if the index correlation
is high).  And then the mergejoin without sort falls out from that.

So the long and the short of it is that the COALESCE acts as an
optimization fence in the presence of outer joins.  We've seen this
before and there are some rough ideas about fixing it.  (In fact,
I thought it was on the TODO list, but I can't find an entry now.)
Don't hold your breath though --- it'll take major planner surgery.

            regards, tom lane

Re: Merge Joins and Views

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> In the case where you introduce the intermediate sub-select, the
> view *can* be flattened into that, producing
>     SELECT id, COALESCE(opt, 0) AS opt FROM b ORDER BY id
> Again, that can't be flattened into the top query, but looking at
> it in isolation the planner chooses an indexscan as the best plan
> (by no means a sure thing, but it will do it if the index correlation
> is high).  And then the mergejoin without sort falls out from that.
>
> So the long and the short of it is that the COALESCE acts as an
> optimization fence in the presence of outer joins.  We've seen this
> before and there are some rough ideas about fixing it.  (In fact,
> I thought it was on the TODO list, but I can't find an entry now.)
> Don't hold your breath though --- it'll take major planner surgery.

In this case isn't all the planner needs the pathkey list to give it a hint
that that ordering might be useful? It can't re-order the join but it can
still try to produce those rows in any order which can be useful for the upper
joins.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: Merge Joins and Views

From
Chris Mayfield
Date:
Thank you for your prompt reply, I appreciate your insight on this.

 > So the COALESCE has to be evaluated below the outer join, which means
 > that the view can't be "flattened" into the upper query.
 > ...
 > So the long and the short of it is that the COALESCE acts as an
 > optimization fence in the presence of outer joins.  We've seen this
 > before and there are some rough ideas about fixing it.

You may already have this rough idea somewhere, but it seems to me that
the view could be flattened into the upper query as long as the join
predicates don't depend on coalesced columns.  In the examples I sent,
even if the COALESCE is evaluated at the very end of the query, the
merge join (on the id columns) would still be correct.

--Chris

Re: Merge Joins and Views

From
Gregory Stark
Date:
"Chris Mayfield" <cmayfiel@cs.purdue.edu> writes:

> You may already have this rough idea somewhere, but it seems to me that the
> view could be flattened into the upper query as long as the join predicates
> don't depend on coalesced columns.  In the examples I sent, even if the
> COALESCE is evaluated at the very end of the query, the merge join (on the id
> columns) would still be correct.

I don't remember the example but that's not true in general, consider
something like

select *
  from a
left outer join (
  select *, coalesce(b.x,'foo') from b
  ) as b
 using (a.y=b.y)

The coalesce() doesn't depend on any join predicates, but if it's evaluated
late it will be 'foo' for all the non-matching records instead of NULL.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

Re: Merge Joins and Views

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Don't hold your breath though --- it'll take major planner surgery.

> In this case isn't all the planner needs the pathkey list to give it a hint
> that that ordering might be useful?

You could maybe make that work if you were willing to speculatively
re-plan the entire subquery for each potentially useful ordering.
I think that would chew an unacceptable number of cycles though.
The upper planner doesn't have any clue about what indexes are available
in the lower query, so it would end up requesting a lot of useless
re-plans.

In any case the nullable-targetlist restriction causes a whole lot
of other problems that this wouldn't address.  I'd rather spend my
time on solving the more general problem.

            regards, tom lane

Re: Merge Joins and Views

From
Tom Lane
Date:
Chris Mayfield <cmayfiel@cs.purdue.edu> writes:
>>> So the long and the short of it is that the COALESCE acts as an
>>> optimization fence in the presence of outer joins.  We've seen this
>>> before and there are some rough ideas about fixing it.

> You may already have this rough idea somewhere, but it seems to me that
> the view could be flattened into the upper query as long as the join
> predicates don't depend on coalesced columns.  In the examples I sent,
> even if the COALESCE is evaluated at the very end of the query, the
> merge join (on the id columns) would still be correct.

But the output would not be: the join column would fail to go to null
when it was supposed to.  See the example that made us put in that
restriction in the first place:
http://archives.postgresql.org/pgsql-bugs/2001-04/msg00223.php

            regards, tom lane