Re: slow joining very large table to smaller ones - Mailing list pgsql-performance
From | Tom Lane
---|---
Subject | Re: slow joining very large table to smaller ones
Date |
Msg-id | 5545.1121382523@sss.pgh.pa.us
In response to | Re: slow joining very large table to smaller ones (Dan Harris <fbsd@drivefaster.net>)
List | pgsql-performance
Dan Harris <fbsd@drivefaster.net> writes:
> Here's the explain select for that one, since that's the best I can get.

> explain select recordtext from eventactivity,k_r,k_b where
> eventactivity.incidentid = k_r.incidentid and
> eventactivity.incidentid = k_b.incidentid and k_r.id = 94 and k_b.id = 107;
>                                          QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..4186753.16 rows=46029271 width=49)
>          ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b  (cost=0.00..4668.31 rows=1821 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52 rows=1879 width=68)
>                Index Cond: (id = 94)
> (13 rows)

There's something awfully fishy here. The 8.0 planner is definitely capable of figuring out that it ought to join the smaller relations first. As an example, using 8.0.3+ (CVS branch tip) I did

    regression=# create table eventactivity(incidentid varchar, recordtext text);
    CREATE TABLE
    regression=# create table k_r(incidentid varchar);
    CREATE TABLE
    regression=# create table k_b(incidentid varchar);
    CREATE TABLE
    regression=# explain select recordtext from eventactivity
      inner join (select incidentid from k_r) a using (incidentid)
      inner join (select incidentid from k_b) b using (incidentid);

(Being too impatient to actually fill the eventactivity table with 36M rows of data, I just did some debugger magic to make the planner think that that was the table size...)

The default plan looks like

     Merge Join  (cost=16137814.70..36563453.23 rows=1361700000 width=32)
       Merge Cond: (("outer".incidentid)::text = "inner"."?column3?")
       ->  Merge Join  (cost=170.85..290.48 rows=7565 width=64)
             Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
             ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
                   Sort Key: (k_r.incidentid)::text
                   ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
             ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
                   Sort Key: (k_b.incidentid)::text
                   ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)
       ->  Sort  (cost=16137643.84..16227643.84 rows=36000000 width=64)
             Sort Key: (eventactivity.incidentid)::text
             ->  Seq Scan on eventactivity  (cost=0.00..1080000.00 rows=36000000 width=64)

and if I "set enable_mergejoin TO 0;" I get

     Hash Join  (cost=612.54..83761451.54 rows=1361700000 width=32)
       Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
       ->  Seq Scan on eventactivity  (cost=0.00..1080000.00 rows=36000000 width=64)
       ->  Hash  (cost=504.62..504.62 rows=7565 width=64)
             ->  Hash Join  (cost=25.38..504.62 rows=7565 width=64)
                   Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
                   ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
                   ->  Hash  (cost=22.30..22.30 rows=1230 width=32)
                         ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)

which is the plan I would judge Most Likely To Succeed based on what we know about Dan's problem.
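For anyone who wants to reproduce the comparison above without debugger access, here is a rough sketch. The pg_class update is a stand-in for Tom's "debugger magic", not what he actually ran; the row and page counts are illustrative, it requires superuser privileges, and the next VACUUM or ANALYZE of the table will overwrite them:

    -- same empty tables as in Tom's session
    create table eventactivity(incidentid varchar, recordtext text);
    create table k_r(incidentid varchar);
    create table k_b(incidentid varchar);

    -- fake the planner's idea of eventactivity's size
    -- (illustrative numbers; superuser only)
    update pg_class set reltuples = 36000000, relpages = 720000
     where relname = 'eventactivity';

    -- default plan (a merge join on 8.0 with these fake stats)
    explain select recordtext from eventactivity
      inner join (select incidentid from k_r) a using (incidentid)
      inner join (select incidentid from k_b) b using (incidentid);

    -- steer the planner away from merge joins to see the alternative
    set enable_mergejoin to 0;
    explain select recordtext from eventactivity
      inner join (select incidentid from k_r) a using (incidentid)
      inner join (select incidentid from k_b) b using (incidentid);
    reset enable_mergejoin;

The enable_mergejoin setting is session-local, so the RESET (or simply reconnecting) restores normal planner behavior.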
The planner estimates the hash join as roughly twice as expensive as the merge join only because, having no statistics about the join keys, it deliberately prices hash joins pessimistically: they can be pretty awful in the presence of many equal keys.

So the planner is certainly capable of finding the desired plan, even without any tweaking of the query text. That means what we have here is mainly a statistics problem. Have you ANALYZEd these tables recently? If so, may we see the pg_stats rows for incidentid in all three tables?

			regards, tom lane
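For anyone running the same check on their own tables, a minimal sketch of the diagnostics Tom asks for (the column list here is illustrative; posting the complete pg_stats rows gives the list the most to work with):

    analyze eventactivity;
    analyze k_r;
    analyze k_b;

    -- pg_stats exposes the per-column statistics the planner works from;
    -- n_distinct and the most-common-value data drive join-size estimates.
    select tablename, attname, null_frac, n_distinct,
           most_common_vals, most_common_freqs
      from pg_stats
     where attname = 'incidentid'
       and tablename in ('eventactivity', 'k_r', 'k_b');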