Re: query speed joining tables - Mailing list pgsql-sql

From Tomasz Myrta
Subject Re: query speed joining tables
Date
Msg-id 3E233743.2080100@klaster.net
Whole thread Raw
In response to query speed joining tables  (Christopher Smith <christopherl_smith@yahoo.com>)
Responses Re: query speed joining tables
List pgsql-sql
Christopher Smith wrote:

> I have 4 tables that I need to query... 3 of the tables are links by the
> field userid.
>
> below are table sql fragments. as well as the query.  The subquery
> causes very high
>
> CPU usages.  It typically returns ~3000 matches. Is there another way to
> rewrite this?
>
> SELECT user_login.userid FROM user_login,user_details_p,user_match_details
>  WHERE user_login.userid = user_details_p.userid AND
> user_details_p.userid = user_match_details.userid AND
> user_details_p.gender ='W' AND
> user_details_p.seekgender ='M' AND
> user_details_p.age >=18 AND
> user_details_p.age <=50 AND
> user_match_details.min_age <= 30 AND
> user_match_details.max_age >= 30 AND
> user_details_p.ethnictype = 'Caucasian (White)' AND
> strpos(user_match_details.ethnicity,'Asian') !=0 AND
> user_details_p.zipcode in (select zips_max.destination from zips_max
> where zips_max.origin='90210' )
> order by user_login.last_login desc;


explicit joins show better idea of your query and helps postgres 
choosing indexing.

select userid
from user_login join user_details using (userid) join user_match_details using (userid)
where user_details_p.gender ='W' AND user_details_p.seekgender ='M' AND user_details_p.age between 18 and 50 and
user_match_details.min_age<= 30 AND user_match_details.max_age >= 30 AND user_details_p.ethnictype = 'Caucasian
(White)'AND strpos(user_match_details.ethnicity,'Asian') !=0 AND user_details_p.zipcode in (select zips_max.destination
fromzips_max where zips_max.origin='90210' ) order by user_login.last_login desc;
 


How can I help you with subquery if you didn't write even zips_max 
definition?!?

If origin is unique value in that table, you can change subquery into 
join on "from" list.

>                Table "public.user_login"
>    Column   |           Type           |   Modifiers
> ------------+--------------------------+---------------
>  userid     | character varying(30)    | not null
>  password   | character varying(30)    | not null
>  email      | character varying(50)    | not null
>  last_login | timestamp with time zone | not null
>  Indexes: user_login_pkey primary key btree (userid),

Do you really need userid as varchar?
indexing on int4 or int8 would be much faster than varchar

Why do you have 3 tables? It looks like only one table would be enough. 
Remember, that null values don't take too much space.

>
>
>
>
>                Table "public.user_details_p"
>      Column      |           Type           |   Modifiers
> -----------------+--------------------------+---------------
>  userid          | character varying(30)    | not null
>  gender          | character varying(1)     |
>  age             | integer                  |
>  height          | character varying(10)    |
>  ethnicty pe      | character varying(30)    |
>  education       | character varying(30)    |
>  createdate      | timestamp with time zone | default now()
>  zipcode         | character varying(5)     |
>  birthdate       | date                     | default now()
>  zodiac          | character varying(40)    |
>  seekgender      | character varying(2)     |
> Indexes: user_details_p_pkey primary key btree (userid),
>          user_details_p_age_idx btree (age),
>       &nb sp;  user_details_p_ethnic_idx btree (ethnictype),
>          user_details_p_gender_idx btree (gender),

>
>          user_details_p_last_login_idx btree (last_login),
>          user_details_p_seekgender_idx btree (seekgender),
>          user_details_p_state_idx btree (state)

There is too many indexes -
index on gender (2 possible values) is useless,
index on ethnic (how many values - I think not too many?) is possibly 
useless

Consider creating single index on several fields for queries like this:
select
...
where user_details_p.gender ='W' AND user_details_p.seekgender ='M' AND user_details_p.age between 18 and 50 and

index on (age,seekgender,gender)

>
> Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
> user_login(userid) ON UPDATE NO ACTI
> ON ON DELETE CASCADE
>
>
>            Table "public.user_match_details"
>       Column      |          Type          | Modifiers
> ------------------+------------------------+-----------
>  userid           | character varying(30)  | not null
>  soughtmate       | character varying(200) |
>  ethnicity        | character varying(200) |
>  marital_status   | character varying(200) |
>  min_age          | integer                |
>  max_age          | integer   &nbs p;            |
>  city             | character varying(50)  |
>  state            | character varying(2)   |
>  zipcode          | integer                |
>  match_distance   | integer                |
> Indexes: user_match_details_pkey primary key btree (userid)
> Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
> user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE
>
>             Table "public.zips_300"
>    Column    |         Type         | Modifiers
> -------------+----------------------+-----------
>  origin      | character varying(5) |
>  destination | character varying(5) |
> Indexes: zips_300_origin_idx btree (origin)


If you need more specific answer, you have to add more information - how 
many records do you have in your tables and how many possible values do 
you use for example for zipcodes, ethnicity etc.

Regards,
Tomasz Myrta



pgsql-sql by date:

Previous
From: Christopher Smith
Date:
Subject: Re: query speed joining tables
Next
From: Christopher Smith
Date:
Subject: Re: query speed joining tables