Thread: explicit joins wrong planning
Hi SELECT version(); PostgreSQL 7.3.4 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.2 20031005 (Debian prerelease) Let's say I have 3 tables: groups ( groupid integer primary key, name varchar, begindate date ); offsets ( offset_id integer, groupid integer references groups, offset_value integer ); events ( offset_id integer references offsets, event_date date, primary key (offset_id,event_date) ); explain analyze select * from groups g join offsets o using (groupid) join events e on (e.offsetid=o.offset_id and e.event_date=g.begindate+o.offset_value) where g.name='some_name'; Postgres doesn't use join on these both fields and doesn't use index scan properly. I get: Hash Cond: ("outer".offset_id = "inner".offset_id) Join Filter: ("outer".event_date = ("inner".begindate + "inner".offset_value)) Why? I lost few hours trying to fix it and I found, that copying one of these conditions into where clause solved my problem: explain analyze select * from groups g join offsets o using (groupid) join events e on (e.offsetid=o.offset_id and e.event_date=g.begindate+o.offset_value) where g.name='some_name' and e.offsetid=o.offset_id; Join Filter: ("outer".event_date = ("inner".begindate + "inner".offset_value)) Nested Loop... Join Filter: ("outer".offset_id = "inner".offset_id) Why? What was I doing wrong? Regards, Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes: > Postgres doesn't use join on these both fields and doesn't use index > scan properly. Hard to say much when you didn't actually show us the output of EXPLAIN ANALYZE. regards, tom lane
Dnia 2003-11-28 20:18, Użytkownik Tom Lane napisał: > Hard to say much when you didn't actually show us the output of EXPLAIN > ANALYZE. OK, Here you are: explain analyze select * from plany pl join linia_trasy lt using (id_linii) join kursy k on (k.id_trasy=lt.id_trasy and k.event_date=pl.begindate+lt.offset) where pl.id_planu=508; event_date - date begindate - date id_linii - integer id_trasy - integer offset - integer QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- HashJoin (cost=5.82..1210.68 rows=2 width=40) (actual time=718.11..759.27 rows=5 loops=1) Hash Cond: ("outer".id_trasy = "inner".id_trasy) Join Filter: ("outer".event_date= ("inner".begindate + "inner".offset)) -> Seq Scan on kursy k (cost=0.00..876.07 rows=58707 width=11) (actual time=0.02..547.65 rows=58707 loops=1) -> Hash (cost=5.81..5.81 rows=7 width=29) (actual time=0.43..0.43 rows=0 loops=1) -> Nested Loop (cost=0.00..5.81 rows=7 width=29) (actual time=0.16..0.37 rows=7 loops=1) Join Filter: ("outer".id_linii = "inner".id_linii) -> IndexScan using plany_pkey on plany pl (cost=0.00..4.49 rows=1 width=17) (actual time=0.09..0.11 rows=1 loops=1) Index Cond: (id_planu = 508) -> Seq Scan on linia_trasy lt (cost=0.00..1.14 rows=14 width=12) (actual time=0.02..0.12 rows=14 loops=1) If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the query works fine. Regards, Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes: > If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the > query works fine. Define "works fine", please (again, EXPLAIN ANALYZE would be a nice concrete description). regards, tom lane
Dnia 2003-11-28 20:52, Użytkownik Tom Lane napisał: > Tomasz Myrta <jasiek@klaster.net> writes: > >>If I add "and k.id_trasy=lt.id_trasy" into where clause (duplicate), the >>query works fine. > > > Define "works fine", please (again, EXPLAIN ANALYZE would be a nice > concrete description). Sorry, I thought the wrong case would be enough. In first case (the worse one) I had: Total runtime: 678.31 msec After my changes I got: explain analyze select * from plany pl join linia_trasy lt using (id_linii) join kursy k on (k.event_date=pl.begindate+lt.offset and k.id_trasy=lt.id_trasy) where pl.id_planu=508 and k.id_trasy=lt.id_trasy; ^^^^^^^^^^^^^^^^^^^^^^^^^^^ QUERY PLAN -------------------------------------------------------- Nested Loop (cost=0.00..94.43 rows=2 width=40) (actual time=2.97..77.55 rows=5 loops=1) Join Filter: ("inner".event_date = ("outer".begindate + "outer".offset)) -> NestedLoop (cost=0.00..5.81 rows=7 width=29) (actual time=0.15..0.41 rows=7 loops=1) Join Filter: ("outer".id_linii = "inner".id_linii) -> Index Scan using plany_pkeyon plany pl (cost=0.00..4.49 rows=1 width=17) (actual time=0.09..0.10 rows=1 loops=1) Index Cond: (id_planu = 508) -> Seq Scanon linia_trasy lt (cost=0.00..1.14 rows=14 width=12) (actual time=0.02..0.15 rows=14 loops=1) -> Index Scan using kursy_pkey on kursy k (cost=0.00..7.62 rows=288 width=11) (actual time=0.05..8.01 rows=533 loops=7) Index Cond: ((k.id_trasy = "outer".id_trasy) AND (k.id_trasy= "outer".id_trasy)) Total runtime: 78.01 msec It's much better now (10x faster), but I've just found this plan still isn't as I want to have. I wish I could have index usage on both fields, it means: Index Cond: ((k.id_trasy = "outer".id_trasy) AND (("inner".event_date = ("outer".begindate + "outer".offset) Regards, Tomasz Myrta
Tomasz Myrta <jasiek@klaster.net> writes: > It's much better now (10x faster), but I've just found this plan still > isn't as I want to have. I wish I could have index usage on both fields, FWIW, I see this plan in 7.4 using your original test case: Nested Loop (cost=22.51..71.79 rows=1 width=56) -> Hash Join (cost=22.51..47.56 rows=5 width=48) Hash Cond: ("outer".groupid= "inner".groupid) -> Seq Scan on offsets o (cost=0.00..20.00 rows=1000 width=12) -> Hash (cost=22.50..22.50 rows=5 width=40) -> Seq Scan on groups g (cost=0.00..22.50 rows=5 width=40) Filter: ((name)::text = 'some_name'::text) -> Index Scan using events_pkey on events e (cost=0.00..4.83 rows=1width=8) Index Cond: ((e.offset_id = "outer".offset_id) AND (e.event_date = ("outer".begindate + "outer".offset_value))) I believe the difficulty in 7.3 is because begindate and offset_value come from different relations, and specifically from different relations than the first index condition uses. This was fixed here: 2002-11-24 16:52 tgl * src/: backend/nodes/copyfuncs.c, backend/nodes/equalfuncs.c,backend/nodes/list.c, backend/nodes/outfuncs.c,backend/nodes/readfuncs.c, backend/optimizer/path/indxpath.c,backend/optimizer/path/joinpath.c,backend/optimizer/path/orindxpath.c,backend/optimizer/path/tidpath.c,backend/optimizer/plan/initsplan.c,backend/optimizer/util/pathnode.c,backend/optimizer/util/plancat.c, backend/optimizer/util/relnode.c,backend/optimizer/util/restrictinfo.c,include/nodes/nodes.h,include/nodes/pg_list.h, include/nodes/relation.h,include/optimizer/paths.h,include/optimizer/restrictinfo.h:Restructure planning of nestloop innerindexscans so that the setof usable joinclauses is determined accurately for each join. Formerly, the code only consideredjoinclauses that used all of therels from the outer side of the join; thus for example FROM (aCROSS JOINb) JOIN c ON (c.f1 = a.x AND c.f2 = b.y) could notexploit a two-column index on c(f1,f2), since neither of the qualclauseswould be in the joininfo list it looked in. The new codedoes this correctly, and also is able to eliminate redundantclauses,thus fixing the problem noted 24-Oct-02 by Hans-J�rgenSch�nig. regards, tom lane
Dnia 2003-11-28 22:57, Użytkownik Tom Lane napisał: > Restructure planning of nestloop inner indexscans so that the set > of usable joinclauses is determined accurately for each join. > Formerly, the code only considered joinclauses that used all of the > rels from the outer side of the join; thus for example FROM (a > CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y) could not > exploit a two-column index on c(f1,f2), since neither of the qual > clauses would be in the joininfo list it looked in. The new code > does this correctly, and also is able to eliminate redundant > clauses, thus fixing the problem noted 24-Oct-02 by Hans-Jürgen > Schönig. Yes, this is a description of my problem. Thanks a lot. Regards, Tomasz Myrta