Re: An Idea for planner hints - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: An Idea for planner hints |
Date | |
Msg-id | 44EDF39B.5040407@markdilger.com Whole thread Raw |
In response to | Re: An Idea for planner hints ("Jim C. Nasby" <jnasby@pervasive.com>) |
List | pgsql-hackers |
> Is there actually evidence that there's a lot of problems with bad join > orders? ISTM that's one of the areas where the planner actually does a > pretty good job. I put together a quick demonstration using AxBxC where AxB is empty but AxC is not. Sure enough, postgres chooses AxC first, then xB, which results in extra work. This is a contrived example, but it would be a pain to try to post a real example with all the data and analysis. I think it is fair to say that if it is making the wrong choice in this example, it is sometimes making the wrong choice in practice. Cross-table statistics are supposed to help avoid this, right? But I think it would only help if the system had the statistics for AxB. I think I have been hearing other people propose systems which would track which joins the system is actually using and then recommend to the user that those statistics be gathered. I think we need to go beyond that to recommending statistics (or automatically gathering statistics, or whatever) for joins that *might* be used given different plans than the one currently chosen by the planner. test=# create table A (a integer); CREATE TABLE Time: 60.151 ms test=# create table B (b integer); CREATE TABLE Time: 3.270 ms test=# create table C (c integer); CREATE TABLE Time: 2.421 ms test=# insert into A (a) (select * from generate_series(1,10000,2)); INSERT 0 5000 Time: 67.829 ms test=# insert into B (b) (select * from generate_series(2,10000,2)); INSERT 0 5000 Time: 60.031 ms test=# insert into C (c) (select * from generate_series(1,1000,2)); INSERT 0 500 Time: 6.303 ms test=# analyze A; ANALYZE Time: 69.669 ms test=# analyze B; ANALYZE Time: 24.548 ms test=# analyze C; ANALYZE Time: 2.936 ms test=# explain select * from A, B, C where A.a = B.b and A.a = C.c; QUERY PLAN --------------------------------------------------------------------------- Hash Join (cost=113.50..216.50 rows=500 width=12) Hash Cond: ("outer".b = "inner".a) -> Seq Scan on b (cost=0.00..73.00 rows=5000 width=4) -> Hash (cost=112.25..112.25rows=500 width=8) -> Hash Join (cost=9.25..112.25 rows=500 width=8) Hash Cond:("outer".a = "inner".c) -> Seq Scan on a (cost=0.00..73.00 rows=5000 width=4) -> Hash (cost=8.00..8.00 rows=500 width=4) -> Seq Scan on c (cost=0.00..8.00 rows=500 width=4) (9 rows) Time: 4.807 ms test=# select * from A, B, C where A.a = B.b and A.a = C.c; a | b | c ---+---+--- (0 rows) Time: 34.561 ms test=# select count(*) from A, C where A.a = C.c; count ------- 500 (1 row) Time: 8.450 ms test=# select count(*) from A, B where A.a = B.b; count ------- 0 (1 row) Time: 33.757 ms
pgsql-hackers by date: