Thread: Merge Joins and Views
Hello, I have a scenario with two tables, one with 5M rows and the other with about 3.7M (a subset taken from the first table). Each is clustered using its primary key (a single bigint column), and pg_stats shows that the id's correlation is 1 for both tables. In addition, I have a view over the 3.7M table that coalesces some columns that allow nulls. So I'm running a simple query that does a left outer join from the 5M to the 3.7M, which basically combines the information between the two (and results in 5M rows, of course). It seems to me that the best plan should involve two index scans and a merge join. However, I get different plans depending on whether I use the view or the underlying table directly, and even the use of ORDER BY -- see examples below for details. I don't know if this is a bug (I'm using version 8.3.0), but can anyone please explain why the optimizer (or rule system?) behaves this way? Thank you, --Chris -----Example 1:----- SELECT * FROM a LEFT OUTER JOIN b ON (a.id = b.id); Merge Left Join (cost=43.99..353657.21 rows=5001671 width=106) (actual time=0.653..32529.319 rows=5000000 loops=1) Merge Cond: (a.id = b.id) -> Index Scan using a_pkey on a (cost=0.00..173752.86 rows=5001671 width=81) (actual time=0.353..9754.375 rows=5000000 loops=1) -> Index Scan using b_pkey on b (cost=0.00..120980.85 rows=3713546 width=25) (actual time=0.279..8120.104 rows=3711523 loops=1) Total runtime: 33836.167 ms -----Example 2:----- -- v is a view that does SELECT ... FROM b; SELECT * FROM a LEFT OUTER JOIN v ON (a.id = v.id); Merge Left Join (cost=580217.86..822178.09 rows=5001671 width=100) (actual time=34260.004..67869.059 rows=5000000 loops=1) Merge Cond: (a.id = b.id) -> Index Scan using a_pkey on a (cost=0.00..173752.86 rows=5001671 width=81) (actual time=0.270..10104.528 rows=5000000 loops=1) -> Materialize (cost=580217.86..626637.18 rows=3713546 width=19) (actual time=34259.696..43199.389 rows=3711523 loops=1) -> Sort (cost=580217.86..589501.72 rows=3713546 width=19) (actual time=34259.679..39448.310 rows=3711523 loops=1) Sort Key: b.id Sort Method: external sort Disk: 136632kB -> Seq Scan on b (cost=0.00..61693.46 rows=3713546 width=25) (actual time=0.094..10224.402 rows=3711523 loops=1) Total runtime: 69202.529 ms -----Example 3:----- SELECT * FROM a LEFT OUTER JOIN ( SELECT * FROM v ORDER BY id ) sub ON (a.id = sub.id); Merge Right Join (cost=0.00..390792.67 rows=5001671 width=100) (actual time=0.497..38120.694 rows=5000000 loops=1) Merge Cond: (b.id = a.id) -> Index Scan using b_pkey on b (cost=0.00..120980.85 rows=3713546 width=25) (actual time=0.262..13686.064 rows=3711523 loops=1) -> Index Scan using a_pkey on a (cost=0.00..173752.86 rows=5001671 width=81) (actual time=0.219..11233.746 rows=5000000 loops=1) Total runtime: 39467.843 ms
Chris Mayfield <cmayfiel@cs.purdue.edu> writes: > [ planner finds better plan with a forced ORDER BY ] That shouldn't happen. Can you show the details of your case? It may be something specific to the particular view definition... regards, tom lane
See attached -- I've simplified my actual database quite a bit, but this example shows the same results. Thanks, --Chris version ------------------------------------------------------------------ PostgreSQL 8.3.0 on sparc-sun-solaris2.8, compiled by GCC 2.95.2 (1 row) DROP VIEW DROP TABLE DROP TABLE CREATE TABLE CREATE TABLE CREATE VIEW INSERT 0 5000000 INSERT 0 3711523 ANALYZE ANALYZE schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------- public | a | id | 0 | 4 | -1 | | | {226,480817,946403,1463901,1905168,2486162,2964834,3411486,3947522,4446167,4996780} | 1 public | a | val | 0 | 8 | -1 | | | {0.00023875804618001,0.0914572249166667,0.189253146760166,0.282982839271426,0.393971057608724,0.491479988675565,0.592469296883792,0.693580291699618,0.803486418910325,0.899317930918187,0.999949590768665} | -0.0345742 public | b | id | 0 | 4 | -1 | | | {2380,409226,804058,1186283,1525765,1874817,2199262,2566896,2939230,3316455,3709638} | 1 public | b | opt | 0.503667 | 8 | -1 | | | {0.000438648741692305,0.0946335387416184,0.194745551329106,0.308890894055367,0.403113955631852,0.50895657017827,0.62006954383105,0.724281970411539,0.805469979997724,0.907830006908625,0.999940330628306} | 0.034033 (4 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=0.00..330371.44 rows=5000180 width=24) (actual time=0.319..30850.276 rows=5000000 loops=1) Merge Cond: (a.id = b.id) -> Index Scan using a_pkey on a (cost=0.00..156882.50 rows=5000180 width=12) (actual time=0.244..12665.648 rows=5000000loops=1) -> Index Scan using b_pkey on b (cost=0.00..114600.84 rows=3711012 width=12) (actual time=0.061..7336.846 rows=3711523loops=1) Total runtime: 32191.735 ms (5 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=560793.89..785842.02 rows=5000180 width=24) (actual time=23542.157..55124.203 rows=5000000 loops=1) Merge Cond: (a.id = b.id) -> Index Scan using a_pkey on a (cost=0.00..156882.50 rows=5000180 width=12) (actual time=0.282..12397.933 rows=5000000loops=1) -> Materialize (cost=560793.89..607181.54 rows=3711012 width=12) (actual time=23541.845..31825.216 rows=3711523 loops=1) -> Sort (cost=560793.89..570071.42 rows=3711012 width=12) (actual time=23541.833..28215.551 rows=3711523 loops=1) Sort Key: b.id Sort Method: external sort Disk: 116056kB -> Seq Scan on b (cost=0.00..55326.12 rows=3711012 width=12) (actual time=0.073..4694.892 rows=3711523 loops=1) Total runtime: 56409.946 ms (9 rows) QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Merge Right Join (cost=0.00..367481.56 rows=5000180 width=24) (actual time=1.794..28075.029 rows=5000000 loops=1) Merge Cond: (b.id = a.id) -> Index Scan using b_pkey on b (cost=0.00..114600.84 rows=3711012 width=12) (actual time=0.322..8132.872 rows=3711523loops=1) -> Index Scan using a_pkey on a (cost=0.00..156882.50 rows=5000180 width=12) (actual time=1.457..9714.089 rows=5000000loops=1) Total runtime: 29366.349 ms (5 rows)
Attachment
Chris Mayfield <cmayfiel@cs.purdue.edu> writes: > See attached -- I've simplified my actual database quite a bit, but this > example shows the same results. OK, here's the problem: > CREATE VIEW v AS > SELECT id, COALESCE(opt, 0) AS opt FROM b; You're using this inside the nullable side of an outer join, and that means the COALESCE() creates a problem: its output won't go to null just because "opt" does. So the COALESCE has to be evaluated below the outer join, which means that the view can't be "flattened" into the upper query. You end up with a dumb seqscan that corresponds to planning the view in isolation, and then the best way of joining that with the other table is going to be the sort and merge join. In the case where you introduce the intermediate sub-select, the view *can* be flattened into that, producing SELECT id, COALESCE(opt, 0) AS opt FROM b ORDER BY id Again, that can't be flattened into the top query, but looking at it in isolation the planner chooses an indexscan as the best plan (by no means a sure thing, but it will do it if the index correlation is high). And then the mergejoin without sort falls out from that. So the long and the short of it is that the COALESCE acts as an optimization fence in the presence of outer joins. We've seen this before and there are some rough ideas about fixing it. (In fact, I thought it was on the TODO list, but I can't find an entry now.) Don't hold your breath though --- it'll take major planner surgery. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > In the case where you introduce the intermediate sub-select, the > view *can* be flattened into that, producing > SELECT id, COALESCE(opt, 0) AS opt FROM b ORDER BY id > Again, that can't be flattened into the top query, but looking at > it in isolation the planner chooses an indexscan as the best plan > (by no means a sure thing, but it will do it if the index correlation > is high). And then the mergejoin without sort falls out from that. > > So the long and the short of it is that the COALESCE acts as an > optimization fence in the presence of outer joins. We've seen this > before and there are some rough ideas about fixing it. (In fact, > I thought it was on the TODO list, but I can't find an entry now.) > Don't hold your breath though --- it'll take major planner surgery. In this case isn't all the planner needs the pathkey list to give it a hint that that ordering might be useful? It can't re-order the join but it can still try to produce those rows in any order which can be useful for the upper joins. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Thank you for your prompt reply, I appreciate your insight on this. > So the COALESCE has to be evaluated below the outer join, which means > that the view can't be "flattened" into the upper query. > ... > So the long and the short of it is that the COALESCE acts as an > optimization fence in the presence of outer joins. We've seen this > before and there are some rough ideas about fixing it. You may already have this rough idea somewhere, but it seems to me that the view could be flattened into the upper query as long as the join predicates don't depend on coalesced columns. In the examples I sent, even if the COALESCE is evaluated at the very end of the query, the merge join (on the id columns) would still be correct. --Chris
"Chris Mayfield" <cmayfiel@cs.purdue.edu> writes: > You may already have this rough idea somewhere, but it seems to me that the > view could be flattened into the upper query as long as the join predicates > don't depend on coalesced columns. In the examples I sent, even if the > COALESCE is evaluated at the very end of the query, the merge join (on the id > columns) would still be correct. I don't remember the example but that's not true in general, consider something like select * from a left outer join ( select *, coalesce(b.x,'foo') from b ) as b using (a.y=b.y) The coalesce() doesn't depend on any join predicates, but if it's evaluated late it will be 'foo' for all the non-matching records instead of NULL. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> Don't hold your breath though --- it'll take major planner surgery. > In this case isn't all the planner needs the pathkey list to give it a hint > that that ordering might be useful? You could maybe make that work if you were willing to speculatively re-plan the entire subquery for each potentially useful ordering. I think that would chew an unacceptable number of cycles though. The upper planner doesn't have any clue about what indexes are available in the lower query, so it would end up requesting a lot of useless re-plans. In any case the nullable-targetlist restriction causes a whole lot of other problems that this wouldn't address. I'd rather spend my time on solving the more general problem. regards, tom lane
Chris Mayfield <cmayfiel@cs.purdue.edu> writes: >>> So the long and the short of it is that the COALESCE acts as an >>> optimization fence in the presence of outer joins. We've seen this >>> before and there are some rough ideas about fixing it. > You may already have this rough idea somewhere, but it seems to me that > the view could be flattened into the upper query as long as the join > predicates don't depend on coalesced columns. In the examples I sent, > even if the COALESCE is evaluated at the very end of the query, the > merge join (on the id columns) would still be correct. But the output would not be: the join column would fail to go to null when it was supposed to. See the example that made us put in that restriction in the first place: http://archives.postgresql.org/pgsql-bugs/2001-04/msg00223.php regards, tom lane