Re: Wrong query plan when using a left outer join - Mailing list pgsql-sql

From Feike Steenbergen
Subject Re: Wrong query plan when using a left outer join
Date
Msg-id CAK_s-G3tj7gv0WhVWCP5K6bzF874KYWH87_fbAEk3no9jbUzbw@mail.gmail.com
Whole thread Raw
In response to Re: Wrong query plan when using a left outer join  (Filip Rembiałkowski <plk.zuber@gmail.com>)
Responses Re: Wrong query plan when using a left outer join  (Feike Steenbergen <feikesteenbergen@gmail.com>)
List pgsql-sql
> BTW, add a foreign key and index on handhistory_plain.hand_id (unless> you have it already).
It's there already:

feiketracker=# \d+ handhistory_plain;           Table "stage.handhistory_plain"Column  |  Type   | Modifiers | Storage
|Description
 
---------+---------+-----------+----------+-------------hand_id | integer | not null  | plain    |history | text    |
notnull  | extended |
 
Indexes:   "handhistory_plain_pkey" PRIMARY KEY, btree (hand_id) CLUSTER
Foreign-key constraints:   "handhistory_plain_hand_id_fkey" FOREIGN KEY (hand_id) REFERENCES
hand_meta(hand_id)

> BTW2, if you really don't care on handhistory you can just use
> original query with no join.

Well, sometimes I do, sometimes I don't. For easier application access
I wanted to create a view that joins both these tables together:
easier application design and better performance, as the analyzer
should know best when not to use the handhistory_plain table.


The design is as follows:

hand_meta - holds all metadata for a pokerhand
handhistory_plain holds the history for a pokerhand

hand_meta is going to be used the most, it is around 165 bytes per tuple
handhistory_plain is not going to be used often (it is there as a
reference); it is around 5000 bytes per tuple.

They both hold the same column as primary key, handhistory_plain holds
a fraction of the tuples of hand_meta, the split was only made to make
sure the processed data (hand_meta) is smaller in size and should
therefore require less I/O and thus increase performance.

I'm not sure what to make of:
> imagine that the view on the right side of join has some side effects.
I can see some side effects may occur, but as it is a left join, the
left hand side will always be part of the returning set (there is no
where clause), so the index should be used.
Even though I don't understand, you seem to be right, a natural join
is 30 times faster:

feiketracker=# explain analyze select max(hand_id) from hand_meta left
join handhistory_plain using(hand_id);                                                                QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=10000049261.00..10000049261.01 rows=1 width=4)
 
(actual time=31179.238..31179.241 rows=1 loops=1)  ->  Seq Scan on hand_meta  (cost=10000000000.00..10000043062.40
rows=2479440 width=4) (actual time=0.131..16039.886 rows=2479440
loops=1)Total runtime: 31179.725 ms
(3 rows)

Time: 31185.088 ms

feiketracker=# explain analyze select max(hand_id) from hand_meta join
handhistory_plain using(hand_id);
    QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=53043.61..53043.62 rows=1 width=4) (actual
 
time=962.242..962.245 rows=1 loops=1)  ->  Nested Loop  (cost=0.00..53029.93 rows=5470 width=4) (actual
time=0.400..920.582 rows=5470 loops=1)        ->  Index Scan using handhistory_plain_pkey on
handhistory_plain  (cost=0.00..14494.27 rows=5470 width=4) (actual
time=0.215..101.177 rows=5470 loops=1)        ->  Index Scan using hand_meta_pkey on hand_meta
(cost=0.00..7.03 rows=1 width=4) (actual time=0.100..0.115 rows=1
loops=5470)              Index Cond: (hand_meta.hand_id = handhistory_plain.hand_id)Total runtime: 962.968 ms


> try to experiment with SET enable_seqscan TO false; - and see what happens.
Didn't make a difference; therefore I think postgres determines it is
unable to use the index, is that correct?


Thank you for now: I'll use the inner join (or natural join in this
case) for this specific view


pgsql-sql by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: UPDATE COMPATIBILITY
Next
From: Feike Steenbergen
Date:
Subject: Re: Wrong query plan when using a left outer join