Help understanding indexes, explain, and optimizing a query - Mailing list pgsql-performance

From i.v.r.
Subject Help understanding indexes, explain, and optimizing a query
Date
Msg-id 440CD0BB.2020800@gmail.com
Whole thread Raw
Responses Re: Help understanding indexes, explain, and optimizing
List pgsql-performance
Hi everyone,

I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm
experiencing some performance issues.

Please take a look at the following query:

SELECT
  /*groups."name" AS t2_r1,
  groups."id" AS t2_r3,
  groups."user_id" AS t2_r0,
  groups."pretty_url" AS t2_r2,
  locations."postal_code" AS t0_r6,
  locations."pretty_url" AS t0_r7,
  locations."id" AS t0_r8,
  locations."colony_id" AS t0_r0,
  locations."user_id" AS t0_r1,
  locations."group_id" AS t0_r2,
  locations."distinction" AS t0_r3,
  locations."street" AS t0_r4,
  locations."street_2" AS t0_r5,
  schools."updated" AS t1_r10,
  schools."level_id" AS t1_r4,
  schools."pretty_url" AS t1_r11,
  schools."user_id" AS t1_r5,
  schools."id" AS t1_r12,
  schools."type_id" AS t1_r6,
  schools."distinction" AS t1_r7,
  schools."cct" AS t1_r8,
  schools."created_on" AS t1_r9,
  schools."location_id" AS t1_r0,
  schools."service_id" AS t1_r1,
  schools."sustentation_id" AS t1_r2,
  schools."dependency_id" AS t1_r3*/
  groups.*,
  locations.*,
  schools.*
FROM locations
LEFT OUTER JOIN groups ON groups.id = locations.group_id
LEFT OUTER JOIN schools ON schools.location_id = locations.id
WHERE (colony_id = 71501)
ORDER BY groups.name, locations.distinction, schools.distinction

As you can see, I've commented out some parts. I did that as an
experiment, and it improved the query by 2x. I really don't understand
how is that possible... I also tried changing the second join to an
INNER join, and that improves it a little bit also.

Anyway, the main culprit seems to be that second join. Here's the output
from EXPLAIN:

Sort  (cost=94315.15..94318.02 rows=1149 width=852)
   Sort Key: groups.name, locations.distinction, schools.distinction
   ->  Merge Left Join  (cost=93091.96..94256.74 rows=1149 width=852)
         Merge Cond: ("outer".id = "inner".location_id)
         ->  Sort  (cost=4058.07..4060.94 rows=1148 width=646)
               Sort Key: locations.id
               ->  Hash Left Join  (cost=1.01..3999.72 rows=1148 width=646)
                     Hash Cond: ("outer".group_id = "inner".id)
                     ->  Index Scan using locations_colony_id on
locations  (cost=0.00..3992.91 rows=1148 width=452)
                           Index Cond: (colony_id = 71501)
                     ->  Hash  (cost=1.01..1.01 rows=1 width=194)
                           ->  Seq Scan on groups  (cost=0.00..1.01
rows=1 width=194)
         ->  Sort  (cost=89033.90..89607.67 rows=229510 width=206)
               Sort Key: schools.location_id
               ->  Seq Scan on schools  (cost=0.00..5478.10 rows=229510
width=206)

I don't completely understand what that output means, but it would seem
that the first join costs about 4000, but if I remove that join from the
query, the performance difference is negligible. So as I said, it seems
the problem is the join on the schools table.

I hope it's ok for me to post the relevant tables here, so here they are
(I removed some constraints and indexes that aren't relevant to the
query above):

CREATE TABLE groups
(
  user_id int4 NOT NULL,
  name varchar(50) NOT NULL,
  pretty_url varchar(50) NOT NULL,
  id serial NOT NULL,
  CONSTRAINT groups_pk PRIMARY KEY (id),
)

CREATE TABLE locations
(
  colony_id int4 NOT NULL,
  user_id int4 NOT NULL,
  group_id int4 NOT NULL,
  distinction varchar(60) NOT NULL,
  street varchar(60) NOT NULL,
  street_2 varchar(50) NOT NULL,
  postal_code varchar(5) NOT NULL,
  pretty_url varchar(60) NOT NULL,
  id serial NOT NULL,
  CONSTRAINT locations_pk PRIMARY KEY (id),
  CONSTRAINT colony FOREIGN KEY (colony_id)
      REFERENCES colonies (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT "group" FOREIGN KEY (group_id)
      REFERENCES groups (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
)
CREATE INDEX locations_fki_colony
  ON locations
  USING btree
  (colony_id);
CREATE INDEX locations_fki_group
  ON locations
  USING btree
  (group_id);

CREATE TABLE schools
(
  location_id int4 NOT NULL,
  service_id int4 NOT NULL,
  sustentation_id int4 NOT NULL,
  dependency_id int4 NOT NULL,
  level_id int4 NOT NULL,
  user_id int4 NOT NULL,
  type_id int4 NOT NULL,
  distinction varchar(25) NOT NULL,
  cct varchar(20) NOT NULL,
  created_on timestamp(0) NOT NULL,
  updated timestamp(0),
  pretty_url varchar(25) NOT NULL,
  id serial NOT NULL,
  CONSTRAINT schools_pk PRIMARY KEY (id),
  CONSTRAINT "location" FOREIGN KEY (location_id)
      REFERENCES locations (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
)
CREATE INDEX schools_fki_location
  ON schools
  USING btree
  (location_id);

So I'm wondering what I'm doing wrong. I migrated this database from
MySQL, and on there it ran pretty fast.

Kind regards,
Ivan V.


pgsql-performance by date:

Previous
From: "Jignesh K. Shah"
Date:
Subject: Re: t1000/t2000 sun-servers
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Planner enhancement suggestion.