Mysterious performance of query because of plsql function in where condition - Mailing list pgsql-performance
From | Peter Alberer |
---|---|
Subject | Mysterious performance of query because of plsql function in where condition |
Date | |
Msg-id | 000001c46009$062e8260$5be0d089@ekelhardt Whole thread Raw |
Responses |
Re: Mysterious performance of query because of plsql function in where condition
Re: Mysterious performance of query because of plsql function in where condition Re: Mysterious performance of query because of plsql function in |
List | pgsql-performance |
Hi there, i have a problem with a query that uses the result of a plsql function In the where clause: SELECT assignments.assignment_id, assignments.package_id AS package_id, assignments.title AS title, COUNT(*) AS Count FROM assignments INNER JOIN submissions ON (assignments.assignment_id=submissions.assignment_id) WHERE package_id=949589 AND submission_status(submissions.submission_id)='closed' GROUP BY assignments.assignment_id, assignments.package_id, assignments.title ORDER BY assignments.title; Postgres seems to execute the function "submission_status" for every row of the submissions table (~1500 rows). The query therefore takes quite a lot time, although in fact no row is returned from the assignments table when the condition package_id=949589 is used. QUERY PLAN ------------------------------------------------------------------------ --- --------------------------------------------------- Sort (cost=41.21..41.21 rows=1 width=35) (actual time=4276.978..4276.978 rows=0 loops=1) Sort Key: assignments.title -> HashAggregate (cost=41.19..41.20 rows=1 width=35) (actual time=4276.970..4276.970 rows=0 loops=1) -> Hash Join (cost=2.40..41.18 rows=1 width=35) (actual time=4276.966..4276.966 rows=0 loops=1) Hash Cond: ("outer".assignment_id = "inner".assignment_id) -> Seq Scan on submissions (cost=0.00..38.73 rows=9 width=4) (actual time=10.902..4276.745 rows=38 loops=1) Filter: (submission_status(submission_id) = 'closed'::text) -> Hash (cost=2.40..2.40 rows=2 width=35) (actual time=0.058..0.058 rows=0 loops=1) -> Seq Scan on assignments (cost=0.00..2.40 rows=2 width=35) (actual time=0.015..0.052 rows=2 loops=1) Filter: (package_id = 949589) Total runtime: 4277.078 ms (11 rows) I therefore tried to rephrase the query, to make sure that the function is only used for the rows returned by the join but not even the following does help (the subselect t1 does not return a single row): select * from ( SELECT a.assignment_id, a.package_id, a.title, s.submission_id, COUNT(*) AS Count FROM assignments a INNER JOIN submissions s ON (a.assignment_id=s.assignment_id) WHERE a.package_id=949589 GROUP BY a.assignment_id, a.package_id, a.title, s.submission_id ) t1 where submission_status(t1.submission_id)='closed' order by title; QUERY PLAN ------------------------------------------------------------------------ --- ----------------------------------------------------------- Sort (cost=41.21..41.22 rows=1 width=188) (actual time=4114.251..4114.251 rows=0 loops=1) Sort Key: title -> Subquery Scan t1 (cost=41.20..41.20 rows=1 width=188) (actual time=4114.242..4114.242 rows=0 loops=1) -> HashAggregate (cost=41.20..41.20 rows=1 width=39) (actual time=4114.238..4114.238 rows=0 loops=1) -> Hash Join (cost=2.40..41.18 rows=1 width=39) (actual time=4114.235..4114.235 rows=0 loops=1) Hash Cond: ("outer".assignment_id = "inner".assignment_id) -> Seq Scan on submissions s (cost=0.00..38.73 rows=9 width=8) (actual time=7.179..4113.984 rows=38 loops=1) Filter: (submission_status(submission_id) = 'closed'::text) -> Hash (cost=2.40..2.40 rows=2 width=35) (actual time=0.100..0.100 rows=0 loops=1) -> Seq Scan on assignments a (cost=0.00..2.40 rows=2 width=35) (actual time=0.045..0.094 rows=2 loops=1) Filter: (package_id = 949589) Total runtime: 4114.356 ms (12 rows) The function is nevertheless executed for every row in the submissions table. A simple "select *, submission_status(submission_id) from submissions" takes about the same time as the 2 queries stated above. The whole database has been vacuum analysed right before the explain analyse output has been captured. What can I do to reduce the time this query takes? And why is the function executed although there is no row in the result set of t1 in my rephrased query? TIA, peter Ps: table definitions: Table "public.assignments" Column | Type | Modifiers ---------------+-----------------------------+------------------------ assignment_id | integer | not null title | character varying(100) | not null max_grade | smallint | not null start_date | timestamp without time zone | not null default now() end_date | timestamp without time zone | not null over_due_date | timestamp without time zone | score_release | smallint | not null default 1 package_id | integer | not null cal_item_id | integer | Indexes: "assignments_pk" primary key, btree (assignment_id) Check constraints: "assignments_sr_ck" CHECK (score_release = 1 OR score_release = 2 OR score_release = 3) Foreign-key constraints: "cal_item_id" FOREIGN KEY (cal_item_id) REFERENCES cal_items(cal_item_id) ON DELETE SET NULL "package_id_fk" FOREIGN KEY (package_id) REFERENCES apm_packages(package_id) "assignment_id_fk" FOREIGN KEY (assignment_id) REFERENCES acs_objects(object_id) ON DELETE CASCADE Table "public.submissions" Column | Type | Modifiers ---------------+-----------------------------+----------- submission_id | integer | not null person_id | integer | not null assignment_id | integer | not null last_modified | timestamp without time zone | not null recovery_date | timestamp without time zone | grading | smallint | grading_date | timestamp without time zone | Indexes: "submissions_pk" primary key, btree (submission_id) "submissions_person_ass_un" unique, btree (person_id, assignment_id) Foreign-key constraints: "assignment_id_fk" FOREIGN KEY (assignment_id) REFERENCES assignments(assignment_id) "person_id_fk" FOREIGN KEY (person_id) REFERENCES persons(person_id) -- peter.alberer@wu-wien.ac.at Tel: +43/1/31336/4341 Abteilung für Wirtschaftsinformatik, Wirtschaftsuniversitaet Wien, Austria
pgsql-performance by date: