Thread: explicit joins wrong planning

explicit joins wrong planning

From
Tomasz Myrta
Date:
Hi
SELECT version();
PostgreSQL 7.3.4 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc 
(GCC) 3.3.2 20031005 (Debian prerelease)

Let's say I have 3 tables:

groups (  groupid     integer primary key,  name          varchar,  begindate   date
);

offsets (  offset_id     integer,  groupid    integer references groups,  offset_value  integer
);

events (  offset_id integer   references offsets,  event_date        date,  primary key (offset_id,event_date)
);



explain analyze select *
from  groups g  join offsets o using (groupid)  join events e on (e.offsetid=o.offset_id and
e.event_date=g.begindate+o.offset_value)
where g.name='some_name';

Postgres doesn't use join on these both fields and doesn't use index 
scan properly.
I get:  Hash Cond: ("outer".offset_id = "inner".offset_id)   Join Filter: ("outer".event_date = ("inner".begindate + 
"inner".offset_value))

Why?
I lost few hours trying to fix it and I found, that copying one of these 
conditions into where clause solved my problem:

explain analyze select *
from  groups g  join offsets o using (groupid)  join events e on (e.offsetid=o.offset_id and
e.event_date=g.begindate+o.offset_value)
where g.name='some_name' and e.offsetid=o.offset_id;

 Join Filter: ("outer".event_date = ("inner".begindate + 
"inner".offset_value))   Nested Loop...     Join Filter: ("outer".offset_id = "inner".offset_id)

Why? What was I doing wrong?

Regards,
Tomasz Myrta






Re: explicit joins wrong planning

From
Tom Lane
Date:
Tomasz Myrta <jasiek@klaster.net> writes:
> Postgres doesn't use join on these both fields and doesn't use index 
> scan properly.

Hard to say much when you didn't actually show us the output of EXPLAIN
ANALYZE.
        regards, tom lane


Re: explicit joins wrong planning

From
Tomasz Myrta
Date:
Dnia 2003-11-28 20:18, Użytkownik Tom Lane napisał:
> Hard to say much when you didn't actually show us the output of EXPLAIN
> ANALYZE.

OK, Here you are:

explain analyze select *
from plany pl  join linia_trasy lt using (id_linii)  join kursy k on (k.id_trasy=lt.id_trasy and 
k.event_date=pl.begindate+lt.offset)
where pl.id_planu=508;

event_date - date
begindate - date
id_linii - integer
id_trasy - integer
offset - integer
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
HashJoin  (cost=5.82..1210.68 rows=2 width=40) (actual 
 
time=718.11..759.27 rows=5 loops=1)   Hash Cond: ("outer".id_trasy = "inner".id_trasy)   Join Filter:
("outer".event_date= ("inner".begindate + "inner".offset))   ->  Seq Scan on kursy k  (cost=0.00..876.07 rows=58707
width=11)
 
(actual time=0.02..547.65 rows=58707 loops=1)   ->  Hash  (cost=5.81..5.81 rows=7 width=29) (actual time=0.43..0.43 
rows=0 loops=1)         ->  Nested Loop  (cost=0.00..5.81 rows=7 width=29) (actual 
time=0.16..0.37 rows=7 loops=1)               Join Filter: ("outer".id_linii = "inner".id_linii)               ->
IndexScan using plany_pkey on plany pl 
 
(cost=0.00..4.49 rows=1 width=17) (actual time=0.09..0.11 rows=1 loops=1)                     Index Cond: (id_planu =
508)              ->  Seq Scan on linia_trasy lt  (cost=0.00..1.14 rows=14 
 
width=12) (actual time=0.02..0.12 rows=14 loops=1)


If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the 
query works fine.

Regards,
Tomasz Myrta



Re: explicit joins wrong planning

From
Tom Lane
Date:
Tomasz Myrta <jasiek@klaster.net> writes:
> If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the 
> query works fine.

Define "works fine", please (again, EXPLAIN ANALYZE would be a nice
concrete description).
        regards, tom lane


Re: explicit joins wrong planning

From
Tomasz Myrta
Date:
Dnia 2003-11-28 20:52, Użytkownik Tom Lane napisał:

> Tomasz Myrta <jasiek@klaster.net> writes:
> 
>>If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the 
>>query works fine.
> 
> 
> Define "works fine", please (again, EXPLAIN ANALYZE would be a nice
> concrete description).

Sorry, I thought the wrong case would be enough.
In first case (the worse one) I had: Total runtime: 678.31 msec


After my changes I got:

explain analyze select *
from plany pl  join linia_trasy lt using (id_linii)  join kursy k on (k.event_date=pl.begindate+lt.offset and 
k.id_trasy=lt.id_trasy)
where pl.id_planu=508
and k.id_trasy=lt.id_trasy;
^^^^^^^^^^^^^^^^^^^^^^^^^^^                 QUERY PLAN
-------------------------------------------------------- Nested Loop  (cost=0.00..94.43 rows=2 width=40) (actual 
time=2.97..77.55 rows=5 loops=1)   Join Filter: ("inner".event_date = ("outer".begindate + "outer".offset))   ->
NestedLoop  (cost=0.00..5.81 rows=7 width=29) (actual 
 
time=0.15..0.41 rows=7 loops=1)         Join Filter: ("outer".id_linii = "inner".id_linii)         ->  Index Scan using
plany_pkeyon plany pl  (cost=0.00..4.49 
 
rows=1 width=17) (actual time=0.09..0.10 rows=1 loops=1)               Index Cond: (id_planu = 508)         ->  Seq
Scanon linia_trasy lt  (cost=0.00..1.14 rows=14 
 
width=12) (actual time=0.02..0.15 rows=14 loops=1)   ->  Index Scan using kursy_pkey on kursy k  (cost=0.00..7.62 
rows=288 width=11) (actual time=0.05..8.01 rows=533 loops=7)         Index Cond: ((k.id_trasy = "outer".id_trasy) AND
(k.id_trasy= 
 
"outer".id_trasy)) Total runtime: 78.01 msec

It's much better now (10x faster), but I've just found this plan still 
isn't as I want to have. I wish I could have index usage on both fields, 
it means:
Index Cond: ((k.id_trasy = "outer".id_trasy) AND (("inner".event_date = 
("outer".begindate + "outer".offset)



Regards,
Tomasz Myrta



Re: explicit joins wrong planning

From
Tom Lane
Date:
Tomasz Myrta <jasiek@klaster.net> writes:
> It's much better now (10x faster), but I've just found this plan still 
> isn't as I want to have. I wish I could have index usage on both fields, 

FWIW, I see this plan in 7.4 using your original test case:
Nested Loop  (cost=22.51..71.79 rows=1 width=56)  ->  Hash Join  (cost=22.51..47.56 rows=5 width=48)        Hash Cond:
("outer".groupid= "inner".groupid)        ->  Seq Scan on offsets o  (cost=0.00..20.00 rows=1000 width=12)        ->
Hash (cost=22.50..22.50 rows=5 width=40)              ->  Seq Scan on groups g  (cost=0.00..22.50 rows=5 width=40)
             Filter: ((name)::text = 'some_name'::text)  ->  Index Scan using events_pkey on events e  (cost=0.00..4.83
rows=1width=8)        Index Cond: ((e.offset_id = "outer".offset_id) AND (e.event_date = ("outer".begindate +
"outer".offset_value)))

I believe the difficulty in 7.3 is because begindate and offset_value
come from different relations, and specifically from different relations
than the first index condition uses.  This was fixed here:

2002-11-24 16:52  tgl
* src/: backend/nodes/copyfuncs.c, backend/nodes/equalfuncs.c,backend/nodes/list.c,
backend/nodes/outfuncs.c,backend/nodes/readfuncs.c,
backend/optimizer/path/indxpath.c,backend/optimizer/path/joinpath.c,backend/optimizer/path/orindxpath.c,backend/optimizer/path/tidpath.c,backend/optimizer/plan/initsplan.c,backend/optimizer/util/pathnode.c,backend/optimizer/util/plancat.c,
backend/optimizer/util/relnode.c,backend/optimizer/util/restrictinfo.c,include/nodes/nodes.h,include/nodes/pg_list.h,
include/nodes/relation.h,include/optimizer/paths.h,include/optimizer/restrictinfo.h:Restructure planning of nestloop
innerindexscans so that the setof usable joinclauses is determined accurately for each join. Formerly, the code only
consideredjoinclauses that used all of therels from the outer side of the join; thus for example       FROM (aCROSS
JOINb) JOIN c ON (c.f1 = a.x AND c.f2 = b.y) could notexploit a two-column index on c(f1,f2), since neither of the
qualclauseswould be in the joininfo list it looked in.  The new codedoes this correctly, and also is able to eliminate
redundantclauses,thus fixing the problem noted 24-Oct-02 by Hans-J�rgenSch�nig.
 
        regards, tom lane


Re: explicit joins wrong planning

From
Tomasz Myrta
Date:
Dnia 2003-11-28 22:57, Użytkownik Tom Lane napisał:
>     Restructure planning of nestloop inner indexscans so that the set
>     of usable joinclauses is determined accurately for each join. 
>     Formerly, the code only considered joinclauses that used all of the
>     rels from the outer side of the join; thus for example       FROM (a
>     CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y) could not
>     exploit a two-column index on c(f1,f2), since neither of the qual
>     clauses would be in the joininfo list it looked in.  The new code
>     does this correctly, and also is able to eliminate redundant
>     clauses, thus fixing the problem noted 24-Oct-02 by Hans-Jürgen
>     Schönig.

Yes, this is a description of my problem. Thanks a lot.

Regards,
Tomasz Myrta