Re: JOINs ... how I hate them ... - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: JOINs ... how I hate them ...
Date
Msg-id 1013108872.6992.29.camel@taru.tm.ee
Whole thread Raw
In response to JOINs ... how I hate them ...  ("Marc G. Fournier" <scrappy@hub.org>)
List pgsql-hackers
On Thu, 2002-02-07 at 16:23, Marc G. Fournier wrote:
> 
> 
> 
> The MergeJoin between poc/pgf will only return 12000 records, and since it
> is a 1:1 relationship between each of those tables, there will *only* be
> 12000 records pulled from profiles ... yet its doing a SeqScan through all
> 485k records for each of those UIDs?
> 
> This is after I've performed a VACUUM ANALYZE ...
> 
> The final query itself is:
> 
> SELECT p.uid, p.profiles_handle
>     FROM  ( ( profiles_orientation_c poc JOIN profiles_gender_f pgf USING ( uid ) )>             JOIN iwantu_profiles
pUSING (uid ) ) LEFT JOIN iwantu_last_login ll USING ( uid );
 
> 
> Which explains as:
> 
> Hash Join  (cost=31636.40..78239.34 rows=75664 width=43)
>   ->  Hash Join  (cost=6023.92..47537.10 rows=75664 width=35)
>         ->  Seq Scan on iwantu_profiles p  (cost=0.00..35707.69 rows=485969 width=19)
>         ->  Hash  (cost=5834.76..5834.76 rows=75664 width=16)
>               ->  Merge Join  (cost=0.00..5834.76 rows=75664 width=16)
>                     ->  Index Scan using poc_uid on profiles_orientation_c poc  (cost=0.00..2807.82 rows=126477
width=8)
>                     ->  Index Scan using pgf_uid on profiles_gender_f pgf  (cost=0.00..1575.79 rows=75664 width=8)
>   ->  Hash  (cost=7955.64..7955.64 rows=485964 width=8)
>         ->  Seq Scan on iwantu_last_login ll  (cost=0.00..7955.64 rows=485964 width=8)
> 
> EXPLAIN
> 
> So, poc&pgf are MergeJoin's, leaving me with 12000 records again ... then
> there is the SeqScan/HashJoin wiht profiles, which will leave me with
> 12000 records, but with more information ... but, again, for each of
> *those* 12000 records, its doing a SeqScan on last_login's 485k records,
> instead of using the index ... again, like pgf and poc, there is only one
> record for every uid, so we aren't dealing with duplicates ...

I recently sped up a somewhat similar query from 15 sec to < 1 sec by
rewriting it to use a subselect:

SELECT p.uid, p.profiles_handle FROM profiles_orientation_c poc,      profiles_gender_f pgf      (select uid,
profiles_handle        from iwantu_profiles ip        where ip.uid = pgf.uid      ) p WHERE poc.uid = pgf.uid
 

If you need something from iwantu_last_login it should go into that
subselect as well

That tricked my case to do the small join first.

-----------------
Hannu





pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Summary of new configuration file and data directory locations
Next
From: David Terrell
Date:
Subject: Re: DRDA, network protocol, and documentation