Re: An Idea for planner hints - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: An Idea for planner hints
Date
Msg-id 44EDF39B.5040407@markdilger.com
Whole thread Raw
In response to Re: An Idea for planner hints  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
> Is there actually evidence that there's a lot of problems with bad join
> orders? ISTM that's one of the areas where the planner actually does a
> pretty good job.

I put together a quick demonstration using AxBxC where AxB is empty but AxC is 
not.  Sure enough, postgres chooses AxC first, then xB, which results in extra 
work.  This is a contrived example, but it would be a pain to try to post a real 
example with all the data and analysis.  I think it is fair to say that if it is 
making the wrong choice in this example, it is sometimes making the wrong choice 
in practice.  Cross-table statistics are supposed to help avoid this, right? 
But I think it would only help if the system had the statistics for AxB.  I 
think I have been hearing other people propose systems which would track which 
joins the system is actually using and then recommend to the user that those 
statistics be gathered.  I think we need to go beyond that to recommending 
statistics (or automatically gathering statistics, or whatever) for joins that 
*might* be used given different plans than the one currently chosen by the planner.

test=# create table A (a integer);
CREATE TABLE
Time: 60.151 ms
test=# create table B (b integer);
CREATE TABLE
Time: 3.270 ms
test=# create table C (c integer);
CREATE TABLE
Time: 2.421 ms
test=# insert into A (a) (select * from generate_series(1,10000,2));
INSERT 0 5000
Time: 67.829 ms
test=# insert into B (b) (select * from generate_series(2,10000,2));
INSERT 0 5000
Time: 60.031 ms
test=# insert into C (c) (select * from generate_series(1,1000,2));
INSERT 0 500
Time: 6.303 ms
test=# analyze A;
ANALYZE
Time: 69.669 ms
test=# analyze B;
ANALYZE
Time: 24.548 ms
test=# analyze C;
ANALYZE
Time: 2.936 ms
test=# explain select * from A, B, C where A.a = B.b and A.a = C.c;                                QUERY PLAN
--------------------------------------------------------------------------- Hash Join  (cost=113.50..216.50 rows=500
width=12)  Hash Cond: ("outer".b = "inner".a)   ->  Seq Scan on b  (cost=0.00..73.00 rows=5000 width=4)   ->  Hash
(cost=112.25..112.25rows=500 width=8)         ->  Hash Join  (cost=9.25..112.25 rows=500 width=8)               Hash
Cond:("outer".a = "inner".c)               ->  Seq Scan on a  (cost=0.00..73.00 rows=5000 width=4)               ->
Hash (cost=8.00..8.00 rows=500 width=4)                     ->  Seq Scan on c  (cost=0.00..8.00 rows=500 width=4)
 
(9 rows)

Time: 4.807 ms
test=# select * from A, B, C where A.a = B.b and A.a = C.c; a | b | c
---+---+---
(0 rows)

Time: 34.561 ms
test=# select count(*) from A, C where A.a = C.c; count
-------   500
(1 row)

Time: 8.450 ms
test=# select count(*) from A, B where A.a = B.b; count
-------     0
(1 row)

Time: 33.757 ms



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: tsvector/tsearch equality and/or portability issue
Next
From: Bernd Helmle
Date:
Subject: Re: [PATCHES] Updatable views