Help understanding indexes, explain, and optimizing a query - Mailing list pgsql-performance
From | i.v.r. |
---|---|
Subject | Help understanding indexes, explain, and optimizing a query |
Date | |
Msg-id | 440CD0BB.2020800@gmail.com Whole thread Raw |
Responses |
Re: Help understanding indexes, explain, and optimizing
|
List | pgsql-performance |
Hi everyone, I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm experiencing some performance issues. Please take a look at the following query: SELECT /*groups."name" AS t2_r1, groups."id" AS t2_r3, groups."user_id" AS t2_r0, groups."pretty_url" AS t2_r2, locations."postal_code" AS t0_r6, locations."pretty_url" AS t0_r7, locations."id" AS t0_r8, locations."colony_id" AS t0_r0, locations."user_id" AS t0_r1, locations."group_id" AS t0_r2, locations."distinction" AS t0_r3, locations."street" AS t0_r4, locations."street_2" AS t0_r5, schools."updated" AS t1_r10, schools."level_id" AS t1_r4, schools."pretty_url" AS t1_r11, schools."user_id" AS t1_r5, schools."id" AS t1_r12, schools."type_id" AS t1_r6, schools."distinction" AS t1_r7, schools."cct" AS t1_r8, schools."created_on" AS t1_r9, schools."location_id" AS t1_r0, schools."service_id" AS t1_r1, schools."sustentation_id" AS t1_r2, schools."dependency_id" AS t1_r3*/ groups.*, locations.*, schools.* FROM locations LEFT OUTER JOIN groups ON groups.id = locations.group_id LEFT OUTER JOIN schools ON schools.location_id = locations.id WHERE (colony_id = 71501) ORDER BY groups.name, locations.distinction, schools.distinction As you can see, I've commented out some parts. I did that as an experiment, and it improved the query by 2x. I really don't understand how is that possible... I also tried changing the second join to an INNER join, and that improves it a little bit also. Anyway, the main culprit seems to be that second join. Here's the output from EXPLAIN: Sort (cost=94315.15..94318.02 rows=1149 width=852) Sort Key: groups.name, locations.distinction, schools.distinction -> Merge Left Join (cost=93091.96..94256.74 rows=1149 width=852) Merge Cond: ("outer".id = "inner".location_id) -> Sort (cost=4058.07..4060.94 rows=1148 width=646) Sort Key: locations.id -> Hash Left Join (cost=1.01..3999.72 rows=1148 width=646) Hash Cond: ("outer".group_id = "inner".id) -> Index Scan using locations_colony_id on locations (cost=0.00..3992.91 rows=1148 width=452) Index Cond: (colony_id = 71501) -> Hash (cost=1.01..1.01 rows=1 width=194) -> Seq Scan on groups (cost=0.00..1.01 rows=1 width=194) -> Sort (cost=89033.90..89607.67 rows=229510 width=206) Sort Key: schools.location_id -> Seq Scan on schools (cost=0.00..5478.10 rows=229510 width=206) I don't completely understand what that output means, but it would seem that the first join costs about 4000, but if I remove that join from the query, the performance difference is negligible. So as I said, it seems the problem is the join on the schools table. I hope it's ok for me to post the relevant tables here, so here they are (I removed some constraints and indexes that aren't relevant to the query above): CREATE TABLE groups ( user_id int4 NOT NULL, name varchar(50) NOT NULL, pretty_url varchar(50) NOT NULL, id serial NOT NULL, CONSTRAINT groups_pk PRIMARY KEY (id), ) CREATE TABLE locations ( colony_id int4 NOT NULL, user_id int4 NOT NULL, group_id int4 NOT NULL, distinction varchar(60) NOT NULL, street varchar(60) NOT NULL, street_2 varchar(50) NOT NULL, postal_code varchar(5) NOT NULL, pretty_url varchar(60) NOT NULL, id serial NOT NULL, CONSTRAINT locations_pk PRIMARY KEY (id), CONSTRAINT colony FOREIGN KEY (colony_id) REFERENCES colonies (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT "group" FOREIGN KEY (group_id) REFERENCES groups (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, ) CREATE INDEX locations_fki_colony ON locations USING btree (colony_id); CREATE INDEX locations_fki_group ON locations USING btree (group_id); CREATE TABLE schools ( location_id int4 NOT NULL, service_id int4 NOT NULL, sustentation_id int4 NOT NULL, dependency_id int4 NOT NULL, level_id int4 NOT NULL, user_id int4 NOT NULL, type_id int4 NOT NULL, distinction varchar(25) NOT NULL, cct varchar(20) NOT NULL, created_on timestamp(0) NOT NULL, updated timestamp(0), pretty_url varchar(25) NOT NULL, id serial NOT NULL, CONSTRAINT schools_pk PRIMARY KEY (id), CONSTRAINT "location" FOREIGN KEY (location_id) REFERENCES locations (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, ) CREATE INDEX schools_fki_location ON schools USING btree (location_id); So I'm wondering what I'm doing wrong. I migrated this database from MySQL, and on there it ran pretty fast. Kind regards, Ivan V.
pgsql-performance by date: