Re: Help understanding indexes, explain, and optimizing - Mailing list pgsql-performance

From Chris
Subject Re: Help understanding indexes, explain, and optimizing
Date
Msg-id 440CD673.8080208@gmail.com
Whole thread Raw
In response to Help understanding indexes, explain, and optimizing a query  ("i.v.r." <ivanvega@gmail.com>)
Responses Re: Help understanding indexes, explain, and optimizing
Re: Help understanding indexes, explain, and optimizing
List pgsql-performance
i.v.r. wrote:
> Hi everyone,
>
> I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm
> experiencing some performance issues.
>
> Please take a look at the following query:
>
> SELECT
>  /*groups."name" AS t2_r1,
>  groups."id" AS t2_r3,
>  groups."user_id" AS t2_r0,
>  groups."pretty_url" AS t2_r2,
>  locations."postal_code" AS t0_r6,
>  locations."pretty_url" AS t0_r7,
>  locations."id" AS t0_r8,
>  locations."colony_id" AS t0_r0,
>  locations."user_id" AS t0_r1,
>  locations."group_id" AS t0_r2,
>  locations."distinction" AS t0_r3,
>  locations."street" AS t0_r4,
>  locations."street_2" AS t0_r5,
>  schools."updated" AS t1_r10,
>  schools."level_id" AS t1_r4,
>  schools."pretty_url" AS t1_r11,
>  schools."user_id" AS t1_r5,
>  schools."id" AS t1_r12,
>  schools."type_id" AS t1_r6,
>  schools."distinction" AS t1_r7,
>  schools."cct" AS t1_r8,
>  schools."created_on" AS t1_r9,
>  schools."location_id" AS t1_r0,
>  schools."service_id" AS t1_r1,
>  schools."sustentation_id" AS t1_r2,
>  schools."dependency_id" AS t1_r3*/
>  groups.*,
>  locations.*,
>  schools.*
> FROM locations
> LEFT OUTER JOIN groups ON groups.id = locations.group_id
> LEFT OUTER JOIN schools ON schools.location_id = locations.id
> WHERE (colony_id = 71501)
> ORDER BY groups.name, locations.distinction, schools.distinction
>
> As you can see, I've commented out some parts. I did that as an
> experiment, and it improved the query by 2x. I really don't understand
> how is that possible... I also tried changing the second join to an
> INNER join, and that improves it a little bit also.
>
> Anyway, the main culprit seems to be that second join. Here's the output
> from EXPLAIN:
>
> Sort  (cost=94315.15..94318.02 rows=1149 width=852)
>   Sort Key: groups.name, locations.distinction, schools.distinction
>   ->  Merge Left Join  (cost=93091.96..94256.74 rows=1149 width=852)
>         Merge Cond: ("outer".id = "inner".location_id)
>         ->  Sort  (cost=4058.07..4060.94 rows=1148 width=646)
>               Sort Key: locations.id
>               ->  Hash Left Join  (cost=1.01..3999.72 rows=1148 width=646)
>                     Hash Cond: ("outer".group_id = "inner".id)
>                     ->  Index Scan using locations_colony_id on
> locations  (cost=0.00..3992.91 rows=1148 width=452)
>                           Index Cond: (colony_id = 71501)
>                     ->  Hash  (cost=1.01..1.01 rows=1 width=194)
>                           ->  Seq Scan on groups  (cost=0.00..1.01
> rows=1 width=194)
>         ->  Sort  (cost=89033.90..89607.67 rows=229510 width=206)
>               Sort Key: schools.location_id
>               ->  Seq Scan on schools  (cost=0.00..5478.10 rows=229510
> width=206)
>
> I don't completely understand what that output means, but it would seem
> that the first join costs about 4000, but if I remove that join from the
> query, the performance difference is negligible. So as I said, it seems
> the problem is the join on the schools table.
>
> I hope it's ok for me to post the relevant tables here, so here they are
> (I removed some constraints and indexes that aren't relevant to the
> query above):
>
> CREATE TABLE groups
> (
>  user_id int4 NOT NULL,
>  name varchar(50) NOT NULL,
>  pretty_url varchar(50) NOT NULL,
>  id serial NOT NULL,
>  CONSTRAINT groups_pk PRIMARY KEY (id),
> )
>
> CREATE TABLE locations
> (
>  colony_id int4 NOT NULL,
>  user_id int4 NOT NULL,
>  group_id int4 NOT NULL,
>  distinction varchar(60) NOT NULL,
>  street varchar(60) NOT NULL,
>  street_2 varchar(50) NOT NULL,
>  postal_code varchar(5) NOT NULL,
>  pretty_url varchar(60) NOT NULL,
>  id serial NOT NULL,
>  CONSTRAINT locations_pk PRIMARY KEY (id),
>  CONSTRAINT colony FOREIGN KEY (colony_id)
>      REFERENCES colonies (id) MATCH SIMPLE
>      ON UPDATE NO ACTION ON DELETE NO ACTION,
>  CONSTRAINT "group" FOREIGN KEY (group_id)
>      REFERENCES groups (id) MATCH SIMPLE
>      ON UPDATE NO ACTION ON DELETE NO ACTION,
> )
> CREATE INDEX locations_fki_colony
>  ON locations
>  USING btree
>  (colony_id);
> CREATE INDEX locations_fki_group
>  ON locations
>  USING btree
>  (group_id);
>
> CREATE TABLE schools
> (
>  location_id int4 NOT NULL,
>  service_id int4 NOT NULL,
>  sustentation_id int4 NOT NULL,
>  dependency_id int4 NOT NULL,
>  level_id int4 NOT NULL,
>  user_id int4 NOT NULL,
>  type_id int4 NOT NULL,
>  distinction varchar(25) NOT NULL,
>  cct varchar(20) NOT NULL,
>  created_on timestamp(0) NOT NULL,
>  updated timestamp(0),
>  pretty_url varchar(25) NOT NULL,
>  id serial NOT NULL,
>  CONSTRAINT schools_pk PRIMARY KEY (id),
>  CONSTRAINT "location" FOREIGN KEY (location_id)
>      REFERENCES locations (id) MATCH SIMPLE
>      ON UPDATE NO ACTION ON DELETE NO ACTION,
> )
> CREATE INDEX schools_fki_location
>  ON schools
>  USING btree
>  (location_id);
>
> So I'm wondering what I'm doing wrong. I migrated this database from
> MySQL, and on there it ran pretty fast.

Have you done an 'analyze' or 'vacuum analyze' over these tables?

A left outer join gets *everything* from the second table:

 > LEFT OUTER JOIN groups ON groups.id = locations.group_id
 > LEFT OUTER JOIN schools ON schools.location_id = locations.id

So they will load everything from groups and schools. Maybe they should
be left join's not left outer joins?


--
Postgresql & php tutorials
http://www.designmagick.com/

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Planner enhancement suggestion.
Next
From: "i.v.r."
Date:
Subject: Re: Help understanding indexes, explain, and optimizing