Re: Help understanding indexes, explain, and optimizing - Mailing list pgsql-performance
From | Chris |
---|---|
Subject | Re: Help understanding indexes, explain, and optimizing |
Date | |
Msg-id | 440CD673.8080208@gmail.com Whole thread Raw |
In response to | Help understanding indexes, explain, and optimizing a query ("i.v.r." <ivanvega@gmail.com>) |
Responses |
Re: Help understanding indexes, explain, and optimizing
Re: Help understanding indexes, explain, and optimizing |
List | pgsql-performance |
i.v.r. wrote: > Hi everyone, > > I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm > experiencing some performance issues. > > Please take a look at the following query: > > SELECT > /*groups."name" AS t2_r1, > groups."id" AS t2_r3, > groups."user_id" AS t2_r0, > groups."pretty_url" AS t2_r2, > locations."postal_code" AS t0_r6, > locations."pretty_url" AS t0_r7, > locations."id" AS t0_r8, > locations."colony_id" AS t0_r0, > locations."user_id" AS t0_r1, > locations."group_id" AS t0_r2, > locations."distinction" AS t0_r3, > locations."street" AS t0_r4, > locations."street_2" AS t0_r5, > schools."updated" AS t1_r10, > schools."level_id" AS t1_r4, > schools."pretty_url" AS t1_r11, > schools."user_id" AS t1_r5, > schools."id" AS t1_r12, > schools."type_id" AS t1_r6, > schools."distinction" AS t1_r7, > schools."cct" AS t1_r8, > schools."created_on" AS t1_r9, > schools."location_id" AS t1_r0, > schools."service_id" AS t1_r1, > schools."sustentation_id" AS t1_r2, > schools."dependency_id" AS t1_r3*/ > groups.*, > locations.*, > schools.* > FROM locations > LEFT OUTER JOIN groups ON groups.id = locations.group_id > LEFT OUTER JOIN schools ON schools.location_id = locations.id > WHERE (colony_id = 71501) > ORDER BY groups.name, locations.distinction, schools.distinction > > As you can see, I've commented out some parts. I did that as an > experiment, and it improved the query by 2x. I really don't understand > how is that possible... I also tried changing the second join to an > INNER join, and that improves it a little bit also. > > Anyway, the main culprit seems to be that second join. Here's the output > from EXPLAIN: > > Sort (cost=94315.15..94318.02 rows=1149 width=852) > Sort Key: groups.name, locations.distinction, schools.distinction > -> Merge Left Join (cost=93091.96..94256.74 rows=1149 width=852) > Merge Cond: ("outer".id = "inner".location_id) > -> Sort (cost=4058.07..4060.94 rows=1148 width=646) > Sort Key: locations.id > -> Hash Left Join (cost=1.01..3999.72 rows=1148 width=646) > Hash Cond: ("outer".group_id = "inner".id) > -> Index Scan using locations_colony_id on > locations (cost=0.00..3992.91 rows=1148 width=452) > Index Cond: (colony_id = 71501) > -> Hash (cost=1.01..1.01 rows=1 width=194) > -> Seq Scan on groups (cost=0.00..1.01 > rows=1 width=194) > -> Sort (cost=89033.90..89607.67 rows=229510 width=206) > Sort Key: schools.location_id > -> Seq Scan on schools (cost=0.00..5478.10 rows=229510 > width=206) > > I don't completely understand what that output means, but it would seem > that the first join costs about 4000, but if I remove that join from the > query, the performance difference is negligible. So as I said, it seems > the problem is the join on the schools table. > > I hope it's ok for me to post the relevant tables here, so here they are > (I removed some constraints and indexes that aren't relevant to the > query above): > > CREATE TABLE groups > ( > user_id int4 NOT NULL, > name varchar(50) NOT NULL, > pretty_url varchar(50) NOT NULL, > id serial NOT NULL, > CONSTRAINT groups_pk PRIMARY KEY (id), > ) > > CREATE TABLE locations > ( > colony_id int4 NOT NULL, > user_id int4 NOT NULL, > group_id int4 NOT NULL, > distinction varchar(60) NOT NULL, > street varchar(60) NOT NULL, > street_2 varchar(50) NOT NULL, > postal_code varchar(5) NOT NULL, > pretty_url varchar(60) NOT NULL, > id serial NOT NULL, > CONSTRAINT locations_pk PRIMARY KEY (id), > CONSTRAINT colony FOREIGN KEY (colony_id) > REFERENCES colonies (id) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION, > CONSTRAINT "group" FOREIGN KEY (group_id) > REFERENCES groups (id) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION, > ) > CREATE INDEX locations_fki_colony > ON locations > USING btree > (colony_id); > CREATE INDEX locations_fki_group > ON locations > USING btree > (group_id); > > CREATE TABLE schools > ( > location_id int4 NOT NULL, > service_id int4 NOT NULL, > sustentation_id int4 NOT NULL, > dependency_id int4 NOT NULL, > level_id int4 NOT NULL, > user_id int4 NOT NULL, > type_id int4 NOT NULL, > distinction varchar(25) NOT NULL, > cct varchar(20) NOT NULL, > created_on timestamp(0) NOT NULL, > updated timestamp(0), > pretty_url varchar(25) NOT NULL, > id serial NOT NULL, > CONSTRAINT schools_pk PRIMARY KEY (id), > CONSTRAINT "location" FOREIGN KEY (location_id) > REFERENCES locations (id) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION, > ) > CREATE INDEX schools_fki_location > ON schools > USING btree > (location_id); > > So I'm wondering what I'm doing wrong. I migrated this database from > MySQL, and on there it ran pretty fast. Have you done an 'analyze' or 'vacuum analyze' over these tables? A left outer join gets *everything* from the second table: > LEFT OUTER JOIN groups ON groups.id = locations.group_id > LEFT OUTER JOIN schools ON schools.location_id = locations.id So they will load everything from groups and schools. Maybe they should be left join's not left outer joins? -- Postgresql & php tutorials http://www.designmagick.com/
pgsql-performance by date: