Re: planner chooses unoptimal plan on joins with complex key - Mailing list pgsql-performance
From | Dmitry Potapov |
---|---|
Subject | Re: planner chooses unoptimal plan on joins with complex key |
Date | |
Msg-id | 878c83960801290725h6b5346d2h896f45e984da2dcd@mail.gmail.com Whole thread Raw |
In response to | Re: planner chooses unoptimal plan on joins with complex key (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: planner chooses unoptimal plan on joins with complex key
|
List | pgsql-performance |
Hello, (Tom, sorry if you receive this letter twice. The first copy was unintentionally sent with 'reply to sender only', I resend it to the list, reply this one to keep the thread, please.) 2008/1/25, Tom Lane <tgl@sss.pgh.pa.us>: > Well, there's our problem: an estimate of 1 row for a join that's > actually 30805113 rows is uncool :-(. > > It's hard to tell whether the planner is just being overoptimistic > about the results of ANDing the three join conditions, or if one or > more of the basic condition selectivities were misestimated. Could > you try > > explain analyze select 1 from t1, t2 where t1.m1 = t2.m1; > explain analyze select 1 from t1, t2 where t1.m2 = t2.m2; > explain analyze select 1 from t1, t2 where t1.m3 = t2.m3; > > and show the results? This will probably be slow too, but we don't > care --- we just need to see the estimated and actual rowcounts. I've indexed m1, m2, m3 in each table individually, to speed things up. The first query is too slow. In fact, it's still running, for 4 days now: =# select procpid, current_query, now()-query_start from pg_stat_activity; procpid | current_query | ?column? ---------+-------------------------------------------------------------------------+----------------------- 11403 | explain analyze select 1 from t1, t2 where t1.m1 = t2.m1; | 4 days 06:11:52.18082 I wonder if it will ever finish the work :( So, for now, the only thing I can show is: =# explain select 1 from t1, t2 where t1.m1=t2.m1 ; QUERY PLAN -------------------------------------------------------------------------------- Nested Loop (cost=0.00..162742993234.25 rows=57462242912003 width=0) -> Seq Scan on t2 (cost=0.00..690784.54 rows=30820054 width=4) -> Index Scan using i_t1_m1 on t1 (cost=0.00..3080.74 rows=175973 width=4) Index Cond: (t1.m1 = t2.m1) (4 rows) I'll post explain analyze result as soon as it'll finish up. Second and third queries are less work: Result for m2 join: =# explain analyze select 1 from t1, t2 where t1.m2=t2.m2 ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=0.00..2390772.31 rows=32274433 width=0) (actual time=44.460..376892.633 rows=32466668 loops=1) Merge Cond: (t1.m2 = t2.m2) -> Index Scan using i_t1_m2 on t1 (cost=0.00..861938.04 rows=21292688 width=8) (actual time=22.178..54862.030 rows=21292689 loops=1) -> Index Scan using i_t2_m2 on t2 (cost=0.00..1023944.35 rows=30820054 width=8) (actual time=22.263..245649.669 rows=32481348 loops=1) Total runtime: 389871.753 ms (5 rows) Results for m3 join: =# explain analyze select 1 from t1, t2 where t1.m3=t2.m3 ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=3460701.40..6971662.01 rows=148454021 width=0) (actual time=292433.263..1269127.008 rows=170051982 loops=1) Merge Cond: (t2.m3 = t1.m3) -> Index Scan using i_t2_m3 on t2 (cost=0.00..1207227.84 rows=30820054 width=8) (actual time=28.996..622876.390 rows=30820054 loops=1) -> Sort (cost=3460701.40..3513933.12 rows=21292688 width=8) (actual time=292404.240..426620.070 rows=170051989 loops=1) Sort Key: t1.m3 -> Seq Scan on t1 (cost=0.00..635040.88 rows=21292688 width=8) (actual time=0.031..65919.482 rows=21292689 loops=1) Total runtime: 1333669.966 ms (7 rows) -- Regards, Dmitry
pgsql-performance by date: