Re: Performance woes - Mailing list pgsql-general

From Benjamin Smith
Subject Re: Performance woes
Date
Msg-id 200512121557.20818.lists@benjamindsmith.com
Whole thread Raw
In response to Re: Performance woes  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
Responses Re: Performance woes
List pgsql-general
The example that I gave was a small one to illustrate my understanding of
multiple foreign keys, indexes and how they work together. (or don't) The
actual query is quite a bit bigger and nastier. I've considered breaking it
up into smaller pieces, but this query has been extensively tested and
debugged. It's on a complex schema carefully designed to meet some very
demanding requirements.

Summary: It provides a list of assignments (stmoduleobjass) for a student,
within the dates of their registering for classes, said assignments being
grouped by "modules" (stmodules - think: chapters) that are assigned with a
unique due date. This pulls information about the class (lcclasses),
transcript, (tr_summary) module, curriculum, (lccourses, lccourses2classes),
and grades (stgrades) given on the assignments if given, constrained by the
students school enrollment date (enrollments), as well as the date the
student was officially enrolled in the class (lcregistrations).

Some values are dynamically filled in by the underlying PHP script with a form
of prepared statement.  (not the PG prepared statement)
    1) students_id = the id# of the student.
    2) scope_id = the id# of the school.
    3) schoolyear. "2005 - 2006".

Would it be needed to provide the schema as well?

explain analyze SELECT
    lcclasses.name AS title,
    stmoduleobjass.lcclasses_id,
    lcclasses.transcriptcode AS lcclasses_transcriptcode,
    lcclasses.coursecode AS lcclasses_coursecode,
    lcclasses.credits AS lcclasses_credits,
    lcclasses.credittype AS lcclasses_credittype,
    lccourses2classes.value AS lccourses_value,
    tr_summary.title AS classcategory,
    stmodules.lccourses_id AS lccourses_id,
    stmodules.lccourses_currcode AS lccourses_currcode,
    stmodules.lccourses_title AS name,
    stmodules.module AS module,
    stmodules.title AS mod_title,
    stmoduleobjass.title AS ass_title,
    stmoduleobjass.due,
    stmoduleobjass.weight,
    stmoduleobjass.isobj,
    stmoduleobjass.lcclassweighttypes_key AS key,
    stgrades.grade,
    lccourses.text as lccourses_text,
    lccourses.category AS category
FROM enrollments, stmoduleobjass
LEFT OUTER JOIN stmodules ON
    (
    stmodules.module=stmoduleobjass.module
    AND stmodules.id=stmoduleobjass.stmodules_id
    AND stmodules.lcclasses_id=stmoduleobjass.lcclasses_id
    )
LEFT OUTER JOIN stgrades ON
    (
    stmoduleobjass.due =stgrades.due
    AND stmoduleobjass.lcclasses_id =stgrades.lcclasses_id
    AND stmoduleobjass.stmodules_id  =stgrades.stmodules_id
    AND stmoduleobjass.title     =stgrades.title
    AND stgrades.students_id=2019
    AND (1=0 OR stgrades.approved=TRUE)
    )
LEFT OUTER JOIN lccourses2classes ON
    (
    stmoduleobjass.lcclasses_id=lccourses2classes.lcclasses_id
    AND lccourses2classes.scope_id=18
    AND lccourses2classes.lccourses_id=stmodules.lccourses_id
    )
LEFT OUTER JOIN lccourses ON
    (
    stmodules.lccourses_id=lccourses.id
    ),
lcregistrations, lcclasses
LEFT OUTER JOIN tr_summary ON
    (
    lcclasses.tr_summary_id=tr_summary.id
    AND tr_summary.scope_id=18
    )
WHERE enrollments.students_id=2019
AND enrollments.start<=stmoduleobjass.due
AND        (
    enrollments.finish>=stmoduleobjass.due
    OR enrollments.finish=0
    )
AND      (
    stmoduleobjass.approved=TRUE
    OR 1=0
    )
AND stmoduleobjass.lcclasses_id=lcregistrations.lcclasses_id
AND lcregistrations.students_id=2019
AND lcregistrations.startdayt <= stmoduleobjass.due
AND    (
    lcregistrations.finishdayt >=stmoduleobjass.due
    OR lcregistrations.finishdayt=0
    )
AND stmoduleobjass.lcclasses_id=lcclasses.id
AND lcclasses.scope_id=18
AND lcclasses.schoolyear='2005 - 2006'
AND lcclasses.id=stmodules.lcclasses_id
ORDER BY lcclasses_id, due ASC;

Here's the output:
**************************************************************
 Sort  (cost=998.26..998.27 rows=1 width=276) (actual time=2218.759..2219.133
rows=251 loops=1)
   Sort Key: stmoduleobjass.lcclasses_id, stmoduleobjass.due
   ->  Nested Loop  (cost=778.06..998.25 rows=1 width=276) (actual
time=1230.066..2216.758 rows=251 loops=1)
         Join Filter: ("outer".lcclasses_id = "inner".id)
         ->  Nested Loop  (cost=772.83..802.04 rows=1 width=216) (actual
time=1227.643..1552.699 rows=251 loops=1)
               Join Filter: (("inner".startdayt <= "outer".due) AND
(("inner".finishdayt >= "outer".due) OR ("inner".finishdayt = 0)))
               ->  Nested Loop  (cost=772.83..796.42 rows=1 width=212) (actual
time=1227.501..1477.948 rows=10334 loops=1)
                     Join Filter: (("outer"."start" <= "inner".due) AND
(("outer".finish >= "inner".due) OR ("outer".finish = 0)))
                     ->  Index Scan using e_valid_students2start on
enrollments  (cost=0.00..5.71 rows=1 width=8) (actual time=0.053..0.060
rows=1 loops=1)
                           Index Cond: (students_id = 2019)
                     ->  Nested Loop Left Join  (cost=772.83..790.64 rows=4
width=212) (actual time=1227.413..1442.024 rows=10334 loops=1)
                           ->  Merge Left Join  (cost=772.83..774.70 rows=4
width=178) (actual time=1227.361..1320.527 rows=10334 loops=1)
                                 Merge Cond: (("outer".lcclasses_id =
"inner".lcclasses_id) AND ("outer".lccourses_id = "inner".lccourses_id))
                                 ->  Sort  (cost=754.15..754.16 rows=4
width=174) (actual time=1226.011..1248.426 rows=10334 loops=1)
                                       Sort Key: stmoduleobjass.lcclasses_id,
stmodules.lccourses_id
                                       ->  Nested Loop Left Join
(cost=58.13..754.11 rows=4 width=174) (actual time=8.530..1125.515 rows=10334
loops=1)
                                             ->  Hash Join
(cost=58.13..730.01 rows=4 width=173) (actual time=8.490..81.181 rows=10334
loops=1)
                                                   Hash Cond: (("outer".module
= "inner".module) AND ("outer".stmodules_id = "inner".id) AND
("outer".lcclasses_id = "inner".lcclasses_id))
                                                   ->  Seq Scan on
stmoduleobjass  (cost=0.00..441.14 rows=10253 width=92) (actual
time=0.015..25.313 rows=10334 loops=1)
                                                         Filter: approved
                                                   ->  Hash
(cost=45.22..45.22 rows=1722 width=89) (actual time=8.437..8.437 rows=1722
loops=1)
                                                         ->  Seq Scan on
stmodules  (cost=0.00..45.22 rows=1722 width=89) (actual time=0.009..3.921
rows=1722 loops=1)
                                             ->  Index Scan using
get_stgrades_to_work2 on stgrades  (cost=0.00..6.01 rows=1 width=83) (actual
time=0.096..0.096 rows=0 loops=10334)
                                                   Index Cond:
(("outer".lcclasses_id = stgrades.lcclasses_id) AND ("outer".stmodules_id =
stgrades.stmodules_id) AND (("outer".title)::text = (stgrades.title)::text)
AND ("outer".due = stgrades.due))
                                                   Filter: ((students_id =
2019) AND approved)
                                 ->  Sort  (cost=18.68..19.31 rows=251
width=12) (actual time=1.317..16.207 rows=9728 loops=1)
                                       Sort Key:
lccourses2classes.lcclasses_id, lccourses2classes.lccourses_id
                                       ->  Seq Scan on lccourses2classes
(cost=0.00..8.68 rows=251 width=12) (actual time=0.087..0.646 rows=251
loops=1)
                                             Filter: (scope_id = 18)
                           ->  Index Scan using lccourses_id_key on lccourses
(cost=0.00..3.97 rows=1 width=38) (actual time=0.003..0.005 rows=1
loops=10334)
                                 Index Cond: ("outer".lccourses_id =
lccourses.id)
               ->  Index Scan using unique_lcclasses_students2 on
lcregistrations  (cost=0.00..5.61 rows=1 width=12) (actual time=0.004..0.004
rows=0 loops=10334)
                     Index Cond: (("outer".lcclasses_id =
lcregistrations.lcclasses_id) AND (lcregistrations.students_id = 2019))
         ->  Hash Left Join  (cost=5.22..193.08 rows=250 width=72) (actual
time=0.040..2.240 rows=250 loops=251)
               Hash Cond: ("outer".tr_summary_id = "inner".id)
               ->  Seq Scan on lcclasses  (cost=0.00..186.50 rows=250
width=60) (actual time=0.036..1.296 rows=250 loops=251)
                     Filter: ((scope_id = 18) AND ((schoolyear)::text = '2005
- 2006'::text))
               ->  Hash  (cost=5.19..5.19 rows=12 width=20) (actual
time=0.102..0.102 rows=10 loops=1)
                     ->  Bitmap Heap Scan on tr_summary  (cost=2.04..5.19
rows=12 width=20) (actual time=0.043..0.063 rows=10 loops=1)
                           Recheck Cond: (scope_id = 18)
                           ->  Bitmap Index Scan on unique_rollover
(cost=0.00..2.04 rows=12 width=0) (actual time=0.027..0.027 rows=10 loops=1)
                                 Index Cond: (scope_id = 18)
 Total runtime: 2222.063 ms
**************************************************************

Configuration: I tried tweaking shared_buffers, but adding more/less did
nothing to improve performance. Current values:

Dual proc Opteron 2.0 Ghz,
4 GB ECC RAM.
10k SCSI drives, software RAID 1
Centos 4.2 (Redhat ES clone)
PostgreSQL 8.1, 64 bit, loaded with RPMs from the PG website for Redhat ES.

max_connections 64
shared_buffers 250000
temp_buffers 10000
max_prepared_transactions = 0
work_mem 1024
maintenance_work_mem = 16384
max_stack_depth 9240
autovacuum on
autovacuum_naptime 600

Side note: When I add indexes or change table definitions to try to get PG to
use indexes, performance tanks instantly to very, poor. (> 30 seconds query
time) But, when I run the query a few times, and then run vacuum analyze, it
snaps back down to the 2-ish second range.

-Ben

On Saturday 10 December 2005 11:50, Stephan Szabo wrote:
>
> On Sat, 10 Dec 2005, Benjamin Smith wrote:
>
> > A few questions:
> >
> > 1) Let's assume that I have some multipile foreign keys, and I join on
three
> > values. For example:
> >
> > Create table gangsters (
> >     name varchar not null,
> >     birthdate integer not null,
> >     shirtnumber integer not null,
> >     primary key (name, birthdate, shirtnumber);
> >
> > create table children (
> >     father_name varchar not null,
> >     father_bd integer not null,
> >     father_shirtnumber integer not null,
> >     birthdate integer not null,
> >     name varchar not null,
> >     foreign key (father_name, father_bd, father_shirtnumber) REFERENCES
> >         gangsters(name, birthdate, shirtnumber)
> >     );
> >
> > We have two table declarations, each with implicit indexes:
> >     1) table gangsters has a primary_key index on name, birthdate,
shirtnumber.
> >     2) children has an implicit index on father_name, father_bd,
> > father_shirtnumber. (right?)
>
> AFAIK, not unless you create one.
>
> Explain analyze output for the query would probably be useful as well.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

--
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: File access problem access(), stat()
Next
From: Tom Lane
Date:
Subject: Re: Performance woes