JOIN between three *simple* tables ... - Mailing list pgsql-hackers

From Marc G. Fournier
Subject JOIN between three *simple* tables ...
Date
Msg-id 20020206162412.F57607-100000@earth.hub.org
Whole thread Raw
Responses Re: JOIN between three *simple* tables ...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Morning all ...
First off, this is using v7.2 release ...
Okay, this is going to drive me up the proverbial wall ... very
simple query:

SELECT p.uid, p.handle   FROM orientation_c poc, profiles p, gender_f pgf  WHERE (p.uid = pgf.uid )    AND (pgf.uid =
poc.uid) ;
 

profiles contains:

iwantu=# select count(1) from profiles;count
--------485969
(1 row)

and is everyone in the system ... no problems there ...

gender_f contains:

iwantu=# select count(1) from gender_f;count
-------75664
(1 row)

And is *just* the uid's of those in profiles that are female ...

finally, orientation_c:

iwantu=# select count(1) from orientation_c;count
--------126477
(1 row)

Is again *just* the uid's of those in profiles that have a 'c'
orientiation ...

Now, the above wquery has an explain of:

Hash Join  (cost=6363.90..47877.08 rows=19692 width=35) ->  Seq Scan on profiles p  (cost=0.00..35707.69 rows=485969
width=19)->  Hash  (cost=6174.74..6174.74 rows=75664 width=16)       ->  Hash Join  (cost=2928.34..6174.74 rows=75664
width=16)            ->  Seq Scan on gender_f pgf  (cost=0.00..1165.64 rows=75664 width=8)             ->  Hash
(cost=1948.77..1948.77rows=126477 width=8)                   ->  Seq Scan on orientation_c poc  (cost=0.00..1948.77
rows=126477width=8)
 

Now, a join between poc and pgf alone comes out to:

iwantu=# select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;count
-------12703
(1 row)

iwantu=# explain select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;
NOTICE:  QUERY PLAN:

Aggregate  (cost=6363.90..6363.90 rows=1 width=16) ->  Hash Join  (cost=2928.34..6174.74 rows=75664 width=16)       ->
SeqScan on gender_f pgf  (cost=0.00..1165.64 rows=75664 width=8)       ->  Hash  (cost=1948.77..1948.77 rows=126477
width=8)            ->  Seq Scan on orientation_c poc  (cost=0.00..1948.77 rows=126477 width=8)
 

EXPLAIN


Now, what I'd like to have happen is a SEQ SCAN through the smaller table
(gender_f), and grab everything in orientation_c that matches (both tables
have zero duplicates of uid, its purely a one of, so I would think that I
should be able to take 1 uid from pgf, and use the index on poc to
determine if it exists, and do that 75664 times ...

That would live me with 12703 UIDs to match up with apropriate records in
the almost 500+k records in profiles itself, instead of having to scan
through each of thoose 500+k records themselves ...

Then again, let's go one simpler:

iwantu=# \d orientation_c
Table "orientation_c"Column |  Type  | Modifiers
--------+--------+-----------uid    | bigint |
Indexes: poc_uid

iwantu=# \d poc_uidIndex "poc_uid"Column |  Type
--------+--------uid    | bigint
btree

iwantu=# explain select count(1) from orientation_c poc where uid = 1;
NOTICE:  QUERY PLAN:

Aggregate  (cost=2264.97..2264.97 rows=1 width=0) ->  Seq Scan on orientation_c poc  (cost=0.00..2264.96 rows=1
width=0)

EXPLAIN

if all varlues in orientation_c are unique, and there are 127k
records ... shouldn't it use the index instead of scanning through all 127k records ?  Or am I missing something
totallyobvious here?
 



pgsql-hackers by date:

Previous
From: mlw
Date:
Subject: Re: [GENERAL] PostgreSQL v7.2 Final Release
Next
From: Haroldo Stenger
Date:
Subject: Re: Threaded PosgreSQL server