Re: Performance woes - Mailing list pgsql-general
From | Benjamin Smith |
---|---|
Subject | Re: Performance woes |
Date | |
Msg-id | 200512121557.20818.lists@benjamindsmith.com Whole thread Raw |
In response to | Re: Performance woes (Stephan Szabo <sszabo@megazone.bigpanda.com>) |
Responses |
Re: Performance woes
|
List | pgsql-general |
The example that I gave was a small one to illustrate my understanding of multiple foreign keys, indexes and how they work together. (or don't) The actual query is quite a bit bigger and nastier. I've considered breaking it up into smaller pieces, but this query has been extensively tested and debugged. It's on a complex schema carefully designed to meet some very demanding requirements. Summary: It provides a list of assignments (stmoduleobjass) for a student, within the dates of their registering for classes, said assignments being grouped by "modules" (stmodules - think: chapters) that are assigned with a unique due date. This pulls information about the class (lcclasses), transcript, (tr_summary) module, curriculum, (lccourses, lccourses2classes), and grades (stgrades) given on the assignments if given, constrained by the students school enrollment date (enrollments), as well as the date the student was officially enrolled in the class (lcregistrations). Some values are dynamically filled in by the underlying PHP script with a form of prepared statement. (not the PG prepared statement) 1) students_id = the id# of the student. 2) scope_id = the id# of the school. 3) schoolyear. "2005 - 2006". Would it be needed to provide the schema as well? explain analyze SELECT lcclasses.name AS title, stmoduleobjass.lcclasses_id, lcclasses.transcriptcode AS lcclasses_transcriptcode, lcclasses.coursecode AS lcclasses_coursecode, lcclasses.credits AS lcclasses_credits, lcclasses.credittype AS lcclasses_credittype, lccourses2classes.value AS lccourses_value, tr_summary.title AS classcategory, stmodules.lccourses_id AS lccourses_id, stmodules.lccourses_currcode AS lccourses_currcode, stmodules.lccourses_title AS name, stmodules.module AS module, stmodules.title AS mod_title, stmoduleobjass.title AS ass_title, stmoduleobjass.due, stmoduleobjass.weight, stmoduleobjass.isobj, stmoduleobjass.lcclassweighttypes_key AS key, stgrades.grade, lccourses.text as lccourses_text, lccourses.category AS category FROM enrollments, stmoduleobjass LEFT OUTER JOIN stmodules ON ( stmodules.module=stmoduleobjass.module AND stmodules.id=stmoduleobjass.stmodules_id AND stmodules.lcclasses_id=stmoduleobjass.lcclasses_id ) LEFT OUTER JOIN stgrades ON ( stmoduleobjass.due =stgrades.due AND stmoduleobjass.lcclasses_id =stgrades.lcclasses_id AND stmoduleobjass.stmodules_id =stgrades.stmodules_id AND stmoduleobjass.title =stgrades.title AND stgrades.students_id=2019 AND (1=0 OR stgrades.approved=TRUE) ) LEFT OUTER JOIN lccourses2classes ON ( stmoduleobjass.lcclasses_id=lccourses2classes.lcclasses_id AND lccourses2classes.scope_id=18 AND lccourses2classes.lccourses_id=stmodules.lccourses_id ) LEFT OUTER JOIN lccourses ON ( stmodules.lccourses_id=lccourses.id ), lcregistrations, lcclasses LEFT OUTER JOIN tr_summary ON ( lcclasses.tr_summary_id=tr_summary.id AND tr_summary.scope_id=18 ) WHERE enrollments.students_id=2019 AND enrollments.start<=stmoduleobjass.due AND ( enrollments.finish>=stmoduleobjass.due OR enrollments.finish=0 ) AND ( stmoduleobjass.approved=TRUE OR 1=0 ) AND stmoduleobjass.lcclasses_id=lcregistrations.lcclasses_id AND lcregistrations.students_id=2019 AND lcregistrations.startdayt <= stmoduleobjass.due AND ( lcregistrations.finishdayt >=stmoduleobjass.due OR lcregistrations.finishdayt=0 ) AND stmoduleobjass.lcclasses_id=lcclasses.id AND lcclasses.scope_id=18 AND lcclasses.schoolyear='2005 - 2006' AND lcclasses.id=stmodules.lcclasses_id ORDER BY lcclasses_id, due ASC; Here's the output: ************************************************************** Sort (cost=998.26..998.27 rows=1 width=276) (actual time=2218.759..2219.133 rows=251 loops=1) Sort Key: stmoduleobjass.lcclasses_id, stmoduleobjass.due -> Nested Loop (cost=778.06..998.25 rows=1 width=276) (actual time=1230.066..2216.758 rows=251 loops=1) Join Filter: ("outer".lcclasses_id = "inner".id) -> Nested Loop (cost=772.83..802.04 rows=1 width=216) (actual time=1227.643..1552.699 rows=251 loops=1) Join Filter: (("inner".startdayt <= "outer".due) AND (("inner".finishdayt >= "outer".due) OR ("inner".finishdayt = 0))) -> Nested Loop (cost=772.83..796.42 rows=1 width=212) (actual time=1227.501..1477.948 rows=10334 loops=1) Join Filter: (("outer"."start" <= "inner".due) AND (("outer".finish >= "inner".due) OR ("outer".finish = 0))) -> Index Scan using e_valid_students2start on enrollments (cost=0.00..5.71 rows=1 width=8) (actual time=0.053..0.060 rows=1 loops=1) Index Cond: (students_id = 2019) -> Nested Loop Left Join (cost=772.83..790.64 rows=4 width=212) (actual time=1227.413..1442.024 rows=10334 loops=1) -> Merge Left Join (cost=772.83..774.70 rows=4 width=178) (actual time=1227.361..1320.527 rows=10334 loops=1) Merge Cond: (("outer".lcclasses_id = "inner".lcclasses_id) AND ("outer".lccourses_id = "inner".lccourses_id)) -> Sort (cost=754.15..754.16 rows=4 width=174) (actual time=1226.011..1248.426 rows=10334 loops=1) Sort Key: stmoduleobjass.lcclasses_id, stmodules.lccourses_id -> Nested Loop Left Join (cost=58.13..754.11 rows=4 width=174) (actual time=8.530..1125.515 rows=10334 loops=1) -> Hash Join (cost=58.13..730.01 rows=4 width=173) (actual time=8.490..81.181 rows=10334 loops=1) Hash Cond: (("outer".module = "inner".module) AND ("outer".stmodules_id = "inner".id) AND ("outer".lcclasses_id = "inner".lcclasses_id)) -> Seq Scan on stmoduleobjass (cost=0.00..441.14 rows=10253 width=92) (actual time=0.015..25.313 rows=10334 loops=1) Filter: approved -> Hash (cost=45.22..45.22 rows=1722 width=89) (actual time=8.437..8.437 rows=1722 loops=1) -> Seq Scan on stmodules (cost=0.00..45.22 rows=1722 width=89) (actual time=0.009..3.921 rows=1722 loops=1) -> Index Scan using get_stgrades_to_work2 on stgrades (cost=0.00..6.01 rows=1 width=83) (actual time=0.096..0.096 rows=0 loops=10334) Index Cond: (("outer".lcclasses_id = stgrades.lcclasses_id) AND ("outer".stmodules_id = stgrades.stmodules_id) AND (("outer".title)::text = (stgrades.title)::text) AND ("outer".due = stgrades.due)) Filter: ((students_id = 2019) AND approved) -> Sort (cost=18.68..19.31 rows=251 width=12) (actual time=1.317..16.207 rows=9728 loops=1) Sort Key: lccourses2classes.lcclasses_id, lccourses2classes.lccourses_id -> Seq Scan on lccourses2classes (cost=0.00..8.68 rows=251 width=12) (actual time=0.087..0.646 rows=251 loops=1) Filter: (scope_id = 18) -> Index Scan using lccourses_id_key on lccourses (cost=0.00..3.97 rows=1 width=38) (actual time=0.003..0.005 rows=1 loops=10334) Index Cond: ("outer".lccourses_id = lccourses.id) -> Index Scan using unique_lcclasses_students2 on lcregistrations (cost=0.00..5.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=10334) Index Cond: (("outer".lcclasses_id = lcregistrations.lcclasses_id) AND (lcregistrations.students_id = 2019)) -> Hash Left Join (cost=5.22..193.08 rows=250 width=72) (actual time=0.040..2.240 rows=250 loops=251) Hash Cond: ("outer".tr_summary_id = "inner".id) -> Seq Scan on lcclasses (cost=0.00..186.50 rows=250 width=60) (actual time=0.036..1.296 rows=250 loops=251) Filter: ((scope_id = 18) AND ((schoolyear)::text = '2005 - 2006'::text)) -> Hash (cost=5.19..5.19 rows=12 width=20) (actual time=0.102..0.102 rows=10 loops=1) -> Bitmap Heap Scan on tr_summary (cost=2.04..5.19 rows=12 width=20) (actual time=0.043..0.063 rows=10 loops=1) Recheck Cond: (scope_id = 18) -> Bitmap Index Scan on unique_rollover (cost=0.00..2.04 rows=12 width=0) (actual time=0.027..0.027 rows=10 loops=1) Index Cond: (scope_id = 18) Total runtime: 2222.063 ms ************************************************************** Configuration: I tried tweaking shared_buffers, but adding more/less did nothing to improve performance. Current values: Dual proc Opteron 2.0 Ghz, 4 GB ECC RAM. 10k SCSI drives, software RAID 1 Centos 4.2 (Redhat ES clone) PostgreSQL 8.1, 64 bit, loaded with RPMs from the PG website for Redhat ES. max_connections 64 shared_buffers 250000 temp_buffers 10000 max_prepared_transactions = 0 work_mem 1024 maintenance_work_mem = 16384 max_stack_depth 9240 autovacuum on autovacuum_naptime 600 Side note: When I add indexes or change table definitions to try to get PG to use indexes, performance tanks instantly to very, poor. (> 30 seconds query time) But, when I run the query a few times, and then run vacuum analyze, it snaps back down to the 2-ish second range. -Ben On Saturday 10 December 2005 11:50, Stephan Szabo wrote: > > On Sat, 10 Dec 2005, Benjamin Smith wrote: > > > A few questions: > > > > 1) Let's assume that I have some multipile foreign keys, and I join on three > > values. For example: > > > > Create table gangsters ( > > name varchar not null, > > birthdate integer not null, > > shirtnumber integer not null, > > primary key (name, birthdate, shirtnumber); > > > > create table children ( > > father_name varchar not null, > > father_bd integer not null, > > father_shirtnumber integer not null, > > birthdate integer not null, > > name varchar not null, > > foreign key (father_name, father_bd, father_shirtnumber) REFERENCES > > gangsters(name, birthdate, shirtnumber) > > ); > > > > We have two table declarations, each with implicit indexes: > > 1) table gangsters has a primary_key index on name, birthdate, shirtnumber. > > 2) children has an implicit index on father_name, father_bd, > > father_shirtnumber. (right?) > > AFAIK, not unless you create one. > > Explain analyze output for the query would probably be useful as well. > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- "The best way to predict the future is to invent it." - XEROX PARC slogan, circa 1978
pgsql-general by date: