Re: enable_XXX options - Mailing list pgsql-performance

From Peterson, Bjorn
Subject Re: enable_XXX options
Date
Msg-id 42E7C401A4CBCD49829BB53DDD538BA402114414@chmpgexgmail3.ic.ncs.com
Whole thread Raw
In response to enable_XXX options  ("Peterson, Bjorn" <Bjorn.Peterson@pearson.com>)
Responses Re: enable_XXX options
List pgsql-performance
-----Original Message-----
>From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>Sent: Friday, April 01, 2005 10:37 AM
>To: Peterson, Bjorn
>Cc: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] enable_XXX options
>
>"Peterson, Bjorn" <Bjorn.Peterson@pearson.com> writes:
>> I have a query in my application that takes an unreasonable amount of
time
>> to complete (>4.5 hours execution time).   After reviewing the EXPLAIN
and
>> EXPLAIN ANALYZE output for that and similar queries, my colleagues and I
>> determined that turning off the enable_nestloop option might help - we
>> noticed dramatic speed improvements for that specific query after doing
so
>> (<2 minutes execution time).  I was warned not to mess with the
enable_XXX
>> options in a production environment, but does anyone see any problem with
>> turning off the enable_nestloop option right before executing my query
and
>> turning it back on afterwards?
>
>That's what it's there for ... but it would be useful to look into why
>the planner gets it so wrong without that hint.  Could we see EXPLAIN
>ANALYZE both ways?
>
>            regards, tom lane
>


Below is my query and the output of EXPLAIN - I was not able to run EXPLAIN
ANALYZE, as the query never completes unless we turn enable_nestloop off:

SELECT t.term_id AS term_id, a.user_id AS user_id, a.time_slot AS course_id,
a.attendance_status AS status, SUM(CASE WHEN a.attendance_date>=t.start_date
THEN 1 ELSE 0 END) AS cur_total, COUNT(a.attendance_date) AS ytd_total FROM
"Attendance" a, "Terms" t, "Terms" ytd, "CoursesUsers" cu, "Courses" c,
"CoursesOffered" co, "Schools" s WHERE a.attendance_type=1 AND
a.attendance_status IN(3,4,1,2) AND a.attendance_date>=ytd.start_date AND
a.attendance_date<=t.end_date AND a.attendance_date<=now() AND
a.user_id=cu.user_id AND a.time_slot=cu.course_id AND
cu.course_id=c.course_id AND co.course_offered_id=c.course_offered_id AND
co.school_id=s.school_id AND s.district_id=2 AND ytd.term_id=t.top_term_id
AND c.course_id IN
(221,395,244,394,366,370,400,11,373,369,392,406,398,381,391,393,403,376,220,
846,440,935,910,431,428,904,905,222,201,453,913,1794,408,901,856,424,443,175
0,452,461,462,471,463,911,489,821,916,501,223) GROUP BY a.user_id,
a.time_slot, t.term_id, a.attendance_status ORDER BY a.user_id, a.time_slot,
t.term_id, a.attendance_status

The Attendance table is the largest (about 2 million records), Terms has
about 50 records, CoursesUsers has about 30,000 records, Courses has about
2000 records, CoursesOffered has about 1000 records, and Schools has 3
records.  The purpose of this query is to retrieve the number of absences
for each student/course/term combination - we need separate totals for
year-to-date (from the start of the school year), and for absences only
within the current term.  Every field referenced in the WHERE clause has an
appropriate single or multi-column index.

We are using the standard PostgreSQL JDBC driver and the only parameter
being set in this query is the district_id (s.district_id=2).  We are
running Postgres 8.0.1 on a Windows 2000 server.


With enable_nestloop on (default):

QUERY PLAN
GroupAggregate  (cost=4674.63..4677.13 rows=100 width=22)
  ->  Sort  (cost=4674.63..4674.88 rows=100 width=22)
        Sort Key: a.user_id
        ->  Nested Loop  (cost=276.69..4671.30 rows=100 width=22)
              Join Filter: (("outer".attendance_date <= "inner".end_date)
AND ("outer".attendance_date >= "inner".start_date))
              ->  Hash Join  (cost=273.30..4649.92 rows=20 width=14)
                    Hash Cond: ("outer".school_id = "inner".school_id)
                    ->  Nested Loop  (cost=272.26..4648.50 rows=25 width=18)
                          ->  Hash Join  (cost=272.26..986.69 rows=836
width=16)
                                Hash Cond: ("outer".course_offered_id =
"inner".course_offered_id)
                                ->  Hash Join  (cost=246.81..948.70 rows=836
width=16)
                                      Hash Cond: ("outer".course_id =
"inner".course_id)
                                      ->  Seq Scan on "CoursesUsers" cu
(cost=0.00..545.02 rows=29702 width=8)
                                      ->  Hash  (cost=246.68..246.68 rows=49
width=8)
                                            ->  Seq Scan on "Courses" c
(cost=0.00..246.68 rows=49 width=8)
                                                  Filter: ((course_id = 221)
OR (course_id = 395) OR (course_id = 244) OR (course_id = 394) OR (course_id
= 366) OR (course_id = 370) OR (course_id = 400) OR (course_id = 11) OR
(course_id = 373) OR (course_i (..)
                                ->  Hash  (cost=23.36..23.36 rows=836
width=8)
                                      ->  Seq Scan on "CoursesOffered" co
(cost=0.00..23.36 rows=836 width=8)
                          ->  Index Scan using "Attendance_pkey" on
"Attendance" a  (cost=0.00..4.37 rows=1 width=14)
                                Index Cond: ((a.attendance_date <= now())
AND (a.attendance_type = 1) AND ("outer".course_id = a.time_slot) AND
(a.user_id = "outer".user_id))
                                Filter: ((attendance_status = 3) OR
(attendance_status = 4) OR (attendance_status = 1) OR (attendance_status =
2))
                    ->  Hash  (cost=1.04..1.04 rows=3 width=4)
                          ->  Seq Scan on "Schools" s  (cost=0.00..1.04
rows=3 width=4)
                                Filter: (district_id = 2)
              ->  Materialize  (cost=3.39..3.75 rows=36 width=16)
                    ->  Hash Join  (cost=1.45..3.35 rows=36 width=16)
                          Hash Cond: ("outer".top_term_id = "inner".term_id)
                          ->  Seq Scan on "Terms" t  (cost=0.00..1.36
rows=36 width=16)
                          ->  Hash  (cost=1.36..1.36 rows=36 width=8)
                                ->  Seq Scan on "Terms" ytd
(cost=0.00..1.36 rows=36 width=8)

-------------------------------------------------------------------------

After turning enable_nestloop off:

QUERY PLAN
GroupAggregate  (cost=100078595.13..100078597.63 rows=100 width=22)
  ->  Sort  (cost=100078595.13..100078595.38 rows=100 width=22)
        Sort Key: a.user_id
        ->  Nested Loop  (cost=100078571.91..100078591.81 rows=100 width=22)
              Join Filter: (("inner".attendance_date <= "outer".end_date)
AND ("inner".attendance_date >= "outer".start_date))
              ->  Hash Join  (cost=1.45..3.35 rows=36 width=16)
                    Hash Cond: ("outer".top_term_id = "inner".term_id)
                    ->  Seq Scan on "Terms" t  (cost=0.00..1.36 rows=36
width=16)
                    ->  Hash  (cost=1.36..1.36 rows=36 width=8)
                          ->  Seq Scan on "Terms" ytd  (cost=0.00..1.36
rows=36 width=8)
              ->  Materialize  (cost=78570.46..78570.66 rows=20 width=14)
                    ->  Hash Join  (cost=991.91..78570.44 rows=20 width=14)
                          Hash Cond: ("outer".school_id = "inner".school_id)
                          ->  Hash Join  (cost=990.87..78569.02 rows=25
width=18)
                                Hash Cond: (("outer".time_slot =
"inner".course_id) AND ("outer".user_id = "inner".user_id))
                                ->  Seq Scan on "Attendance" a
(cost=0.00..75599.26 rows=79148 width=14)
                                      Filter: ((attendance_type = 1) AND
((attendance_status = 3) OR (attendance_status = 4) OR (attendance_status =
1) OR (attendance_status = 2)) AND (attendance_date <= now()))
                                ->  Hash  (cost=986.69..986.69 rows=836
width=16)
                                      ->  Hash Join  (cost=272.26..986.69
rows=836 width=16)
                                            Hash Cond:
("outer".course_offered_id = "inner".course_offered_id)
                                            ->  Hash Join
(cost=246.81..948.70 rows=836 width=16)
                                                  Hash Cond:
("outer".course_id = "inner".course_id)
                                                  ->  Seq Scan on
"CoursesUsers" cu  (cost=0.00..545.02 rows=29702 width=8)
                                                  ->  Hash
(cost=246.68..246.68 rows=49 width=8)
                                                        ->  Seq Scan on
"Courses" c  (cost=0.00..246.68 rows=49 width=8)
                                                              Filter:
((course_id = 221) OR (course_id = 395) OR (course_id = 244) OR (course_id =
394) OR (course_id = 366) OR (course_id = 370) OR (course_id = 400) OR
(course_id = 11) OR (course_id = 373)  (..)
                                            ->  Hash  (cost=23.36..23.36
rows=836 width=8)
                                                  ->  Seq Scan on
"CoursesOffered" co  (cost=0.00..23.36 rows=836 width=8)
                          ->  Hash  (cost=1.04..1.04 rows=3 width=4)
                                ->  Seq Scan on "Schools" s
(cost=0.00..1.04 rows=3 width=4)
                                      Filter: (district_id = 2)




****************************************************************************
This email may contain confidential material.
If you were not an intended recipient,
Please notify the sender and delete all copies.
We may monitor email to and from our network.
****************************************************************************

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: enable_XXX options
Next
From: Tom Lane
Date:
Subject: Re: enable_XXX options