Re: planner chooses unoptimal plan on joins with complex key - Mailing list pgsql-performance

From Dmitry Potapov
Subject Re: planner chooses unoptimal plan on joins with complex key
Date
Msg-id 878c83960801290725h6b5346d2h896f45e984da2dcd@mail.gmail.com
Whole thread Raw
In response to Re: planner chooses unoptimal plan on joins with complex key  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: planner chooses unoptimal plan on joins with complex key
List pgsql-performance
Hello,

(Tom, sorry if you receive this letter twice. The first copy was
unintentionally sent with 'reply to sender only', I resend it to the
list, reply this one to keep the thread, please.)

2008/1/25, Tom Lane <tgl@sss.pgh.pa.us>:
> Well, there's our problem: an estimate of 1 row for a join that's
> actually 30805113 rows is uncool :-(.
>
> It's hard to tell whether the planner is just being overoptimistic
> about the results of ANDing the three join conditions, or if one or
> more of the basic condition selectivities were misestimated.  Could
> you try
>
>         explain analyze select 1 from t1, t2 where t1.m1 = t2.m1;
>         explain analyze select 1 from t1, t2 where t1.m2 = t2.m2;
>         explain analyze select 1 from t1, t2 where t1.m3 = t2.m3;
>
> and show the results?  This will probably be slow too, but we don't
> care --- we just need to see the estimated and actual rowcounts.

I've indexed m1, m2, m3 in each table individually, to speed things up.

The first query is too slow. In fact, it's still running, for 4 days now:
=# select procpid, current_query, now()-query_start from pg_stat_activity;
 procpid |                              current_query
            |       ?column?
---------+-------------------------------------------------------------------------+-----------------------
   11403 | explain analyze select 1 from t1, t2 where t1.m1 = t2.m1;
            | 4 days 06:11:52.18082

I wonder if it will ever finish the work :( So, for now, the only
thing I can show is:

=# explain select 1 from t1, t2 where t1.m1=t2.m1 ;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..162742993234.25 rows=57462242912003 width=0)
   ->  Seq Scan on t2  (cost=0.00..690784.54 rows=30820054 width=4)
   ->  Index Scan using i_t1_m1 on t1  (cost=0.00..3080.74 rows=175973 width=4)
         Index Cond: (t1.m1 = t2.m1)
(4 rows)

I'll post explain analyze result as soon as it'll finish up. Second
and third queries are less work:

Result for m2 join:

=# explain analyze select 1 from t1, t2 where t1.m2=t2.m2 ;
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2390772.31 rows=32274433 width=0) (actual
time=44.460..376892.633 rows=32466668 loops=1)
   Merge Cond: (t1.m2 = t2.m2)
   ->  Index Scan using i_t1_m2 on t1  (cost=0.00..861938.04
rows=21292688 width=8) (actual time=22.178..54862.030 rows=21292689
loops=1)
   ->  Index Scan using i_t2_m2 on t2  (cost=0.00..1023944.35
rows=30820054 width=8) (actual time=22.263..245649.669 rows=32481348
loops=1)
 Total runtime: 389871.753 ms
(5 rows)

Results for m3 join:

=# explain analyze select 1 from t1, t2 where t1.m3=t2.m3 ;
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3460701.40..6971662.01 rows=148454021 width=0)
(actual time=292433.263..1269127.008 rows=170051982 loops=1)
   Merge Cond: (t2.m3 = t1.m3)
   ->  Index Scan using i_t2_m3 on t2  (cost=0.00..1207227.84
rows=30820054 width=8) (actual time=28.996..622876.390 rows=30820054
loops=1)
   ->  Sort  (cost=3460701.40..3513933.12 rows=21292688 width=8)
(actual time=292404.240..426620.070 rows=170051989 loops=1)
         Sort Key: t1.m3
         ->  Seq Scan on t1  (cost=0.00..635040.88 rows=21292688
width=8) (actual time=0.031..65919.482 rows=21292689 loops=1)
 Total runtime: 1333669.966 ms
(7 rows)


--
Regards,
            Dmitry

pgsql-performance by date:

Previous
From: Matthew
Date:
Subject: Re: RAID arrays and performance
Next
From: Gregory Stark
Date:
Subject: Re: RAID arrays and performance