Thread: oddly slow query
Hello. I'm trying to figure out why a query I'm doing is incredibly slow (~10 minutes.) The incredibly slow query is something like: SELECT count(*) from registration LEFT JOIN person USING (person_id) WHERE x_program(registration.x_type_code) = 'blah'; The person view is quite big (~69000 rows). I don't actually want to return the count, but I've been using that for testing purposes as it has a similar response time to returning the fields needed. Queries related to this, like: A) SELECT count(*) from registration LEFT JOIN person USING (person_id); and B) SELECT count(*) from registration WHERE x_program(registration.x_type_code) = 'blah'; are nearly instantaneous. I've run EXPLAIN, EXPLAIN VERBOSE, and EXPLAIN ANALYZE on the query, and that hasn't helped me. According to EXPLAIN ANALYZE, the slow query takes 709704 ms to execute, and query A takes 1554 ms, but looking at the execution plans, query A is predicted to take longer than the slow query. The hold-up seems to be in a 'Nested Loop Left Join', which is only in the plan for the slow query. Here are the first two lines of EXPLAIN ANALYZE on the slow query: Aggregate (cost=8969.16..8969.17 rows=1 width=0) (actual time=709703.985..709703.987 rows=1 loops=1) -> Nested Loop Left Join (cost=755.61..8968.29 rows=346 width=0) (actual time=147.667..709700.553 rows=684 loops=1) Does anybody have ideas why this is so slow? Thank you, Jessi -- Jessi Berkelhammer Downtown Emergency Service Center Computer Programming Specialist
Jessi Berkelhammer <jberkelhammer@desc.org> writes: > The hold-up seems to be in a 'Nested Loop Left Join', which is only in > the plan for the slow query. > Here are the first two lines of EXPLAIN ANALYZE on the slow query: So you've left out all the information that would let anyone guess what the problem is ... Let's see the *whole* EXPLAIN ANALYZE output for all three of the queries you mentioned. It would also be appropriate to mention exactly which PG version you're using. regards, tom lane
On 11/01/2008, Jessi Berkelhammer <jberkelhammer@desc.org> wrote: > Hello. > > I'm trying to figure out why a query I'm doing is incredibly slow (~10 > minutes.) The incredibly slow query is something like: > > SELECT count(*) from registration LEFT JOIN person USING (person_id) > WHERE x_program(registration.x_type_code) = 'blah'; Hello what do you do in x_program function? Are you sure so it is fast? Regards Pavel Stehule
Hello. Thanks for the replies. Pavel Stehule wrote: > what do you do in x_program function? Are you sure so it is fast? > I do not think the function is the problem, as running the slow query without it is just as slow, and similar queries using that function are quick. Tom Lane wrote: > > Let's see the *whole* EXPLAIN ANALYZE output for all three of the > queries you mentioned. Sorry for not including more. Here are the 3 EXPLAIN ANALYZE commands followed by the output: explain analyze select count(*) from clinical_reg_current LEFT JOIN client using (client_id) WHERE tier_program(clinical_reg_current.benefit_type_code) = 'SAGE'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=8969.16..8969.17 rows=1 width=0) (actual time=738140.818..738140.820 rows=1 loops=1) -> Nested Loop Left Join (cost=755.61..8968.29 rows=346 width=0) (actual time=150.918..738137.244 rows=684 loops=1) Join Filter: (clinical_reg_current.client_id = client.client_id) -> Subquery Scan clinical_reg_current (cost=754.36..758.23 rows=1 width=4) (actual time=57.359..146.717 rows=684 loops=1) Filter: (tier_program(benefit_type_code) = 'SAGE'::text) -> Unique (cost=754.36..756.47 rows=117 width=211) (actual time=56.427..67.998 rows=1000 loops=1) -> Sort (cost=754.36..755.42 rows=422 width=211) (actual time=56.419..60.020 rows=1003 loops=1) Sort Key: tbl_clinical_reg.client_id, tbl_clinical_reg.clinical_reg_date -> Seq Scan on tbl_clinical_reg (cost=0.00..735.96 rows=422 width=211) (actual time=7.447..52.914 rows=1003 loops=1) Filter: ((NOT is_deleted) AND (clinical_reg_date <= ('now'::text)::date) AND ((clinical_reg_date_end >= ('now'::text)::date) OR (clinical_reg_date_end IS NULL)) AND ((kc_authorization_status_code)::text <> ALL (('{CX,TM}'::character varying[])::text[])) AND ((benefit_type_code)::text <> ALL (('{75,98,99,00}'::character varying[])::text[]))) -> Hash Left Join (cost=1.25..6653.69 rows=69172 width=632) (actual time=0.025..925.160 rows=69179 loops=684) Hash Cond: (tbl_client.client_id = prot.client_id) -> Seq Scan on tbl_client (cost=0.00..3453.20 rows=69172 width=427) (actual time=0.012..221.612 rows=69179 loops=684) Filter: (NOT is_deleted) -> Hash (cost=1.20..1.20 rows=4 width=209) (actual time=0.172..0.172 rows=4 loops=1) -> Subquery Scan prot (cost=1.14..1.20 rows=4 width=209) (actual time=0.111..0.153 rows=4 loops=1) -> Unique (cost=1.14..1.16 rows=4 width=162) (actual time=0.103..0.128 rows=4 loops=1) -> Sort (cost=1.14..1.15 rows=4 width=162) (actual time=0.099..0.106 rows=4 loops=1) Sort Key: tbl_client_protected.client_id, tbl_client_protected.client_protected_date -> Seq Scan on tbl_client_protected (cost=0.00..1.10 rows=4 width=162) (actual time=0.039..0.061 rows=4 loops=1) Filter: ((NOT is_deleted) AND (client_protected_date <= ('now'::text)::date) AND ((client_protected_date_end > ('now'::text)::date) OR (client_protected_date_end IS NULL))) Total runtime: 738142.119 ms (22 rows) --------------------------------------------------------------- explain analyze select count(*) from clinical_reg_current LEFT JOIN client using (client_id); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=10485.44..10485.45 rows=1 width=0) (actual time=1497.870..1497.872 rows=1 loops=1) -> Hash Left Join (cost=8964.43..10384.27 rows=40466 width=0) (actual time=1482.614..1495.678 rows=1000 loops=1) Hash Cond: (clinical_reg_current.client_id = client.client_id) -> Unique (cost=754.36..756.47 rows=117 width=211) (actual time=59.971..66.819 rows=1000 loops=1) -> Sort (cost=754.36..755.42 rows=422 width=211) (actual time=59.963..62.183 rows=1003 loops=1) Sort Key: tbl_clinical_reg.client_id, tbl_clinical_reg.clinical_reg_date -> Seq Scan on tbl_clinical_reg (cost=0.00..735.96 rows=422 width=211) (actual time=6.604..56.440 rows=1003 loops=1) Filter: ((NOT is_deleted) AND (clinical_reg_date <= ('now'::text)::date) AND ((clinical_reg_date_end >= ('now'::text)::date) OR (clinical_reg_date_end IS NULL)) AND ((kc_authorization_status_code)::text <> ALL (('{CX,TM}'::character varying[])::text[])) AND ((benefit_type_code)::text <> ALL (('{75,98,99,00}'::character varying[])::text[]))) -> Hash (cost=7345.41..7345.41 rows=69172 width=4) (actual time=1422.563..1422.563 rows=69179 loops=1) -> Subquery Scan client (cost=1.25..7345.41 rows=69172 width=4) (actual time=0.282..1240.906 rows=69179 loops=1) -> Hash Left Join (cost=1.25..6653.69 rows=69172 width=632) (actual time=0.275..940.975 rows=69179 loops=1) Hash Cond: (tbl_client.client_id = prot.client_id) -> Seq Scan on tbl_client (cost=0.00..3453.20 rows=69172 width=427) (actual time=0.019..227.016 rows=69179 loops=1) Filter: (NOT is_deleted) -> Hash (cost=1.20..1.20 rows=4 width=209) (actual time=0.179..0.179 rows=4 loops=1) -> Subquery Scan prot (cost=1.14..1.20 rows=4 width=209) (actual time=0.100..0.152 rows=4 loops=1) -> Unique (cost=1.14..1.16 rows=4 width=162) (actual time=0.089..0.119 rows=4 loops=1) -> Sort (cost=1.14..1.15 rows=4 width=162) (actual time=0.085..0.093 rows=4 loops=1) Sort Key: tbl_client_protected.client_id, tbl_client_protected.client_protected_date -> Seq Scan on tbl_client_protected (cost=0.00..1.10 rows=4 width=162) (actual time=0.031..0.058 rows=4 loops=1) Filter: ((NOT is_deleted) AND (client_protected_date <= ('now'::text)::date) AND ((client_protected_date_end > ('now'::text)::date) OR (client_protected_date_end IS NULL))) Total runtime: 1498.442 ms (22 rows) --------------------------------------------------- explain analyze select count(*) from clinical_reg_current WHERE tier_program(clinical_reg_current.benefit_type_code) = 'SAGE' QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=758.23..758.24 rows=1 width=0) (actual time=90.335..90.337 rows=1 loops=1) -> Subquery Scan clinical_reg_current (cost=754.36..758.23 rows=1 width=0) (actual time=51.892..88.528 rows=684 loops=1) Filter: (tier_program(benefit_type_code) = 'SAGE'::text) -> Unique (cost=754.36..756.47 rows=117 width=211) (actual time=51.355..58.708 rows=1000 loops=1) -> Sort (cost=754.36..755.42 rows=422 width=211) (actual time=51.349..53.521 rows=1003 loops=1) Sort Key: tbl_clinical_reg.client_id, tbl_clinical_reg.clinical_reg_date -> Seq Scan on tbl_clinical_reg (cost=0.00..735.96 rows=422 width=211) (actual time=6.361..48.347 rows=1003 loops=1) Filter: ((NOT is_deleted) AND (clinical_reg_date <= ('now'::text)::date) AND ((clinical_reg_date_end >= ('now'::text)::date) OR (clinical_reg_date_end IS NULL)) AND ((kc_authorization_status_code)::text <> ALL (('{CX,TM}'::character varying[])::text[])) AND ((benefit_type_code)::text <> ALL (('{75,98,99,00}'::character varying[])::text[]))) Total runtime: 90.452 ms (9 rows) > It would also be appropriate to mention > exactly which PG version you're using. > Version is 8.2.5. Thanks, jessi -- Jessi Berkelhammer Downtown Emergency Service Center Computer Programming Specialist
Jessi Berkelhammer <jberkelhammer@desc.org> writes: > Here are the 3 EXPLAIN ANALYZE commands followed by the output: Well, here's the problem: > Join Filter: (clinical_reg_current.client_id = client.client_id) > -> Subquery Scan clinical_reg_current (cost=754.36..758.23 > rows=1 width=4) (actual time=57.359..146.717 rows=684 loops=1) > Filter: (tier_program(benefit_type_code) = 'SAGE'::text) > -> Unique (cost=754.36..756.47 rows=117 width=211) > (actual time=56.427..67.998 rows=1000 loops=1) For some reason it's estimating only one row out of the clinical_reg_current view will satisfy the tier_program(benefit_type_code) = 'SAGE' constraint. This causes it to think a nestloop join to the client view would be a good idea. The same estimation error is present in your example with the function and no join, but it doesn't hurt anything because there are no planning decisions that depend on the estimate in that case. The estimate of the view's rowcount without the filter isn't that great either (117 vs 1000 actual) but it's not wrong enough to prompt selection of a bad plan choice. There's something funny going on with the estimation of the function's selectivity --- does the expression "tier_program(benefit_type_code)" match an index, perhaps? If so, have you updated stats for that table lately? I'm also wondering why the function call isn't getting pushed down further into the plan --- what's the definition of that view look like? regards, tom lane
Hello. Thanks for the help. Tom Lane wrote: > Jessi Berkelhammer <jberkelhammer@desc.org> writes: >> Here are the 3 EXPLAIN ANALYZE commands followed by the output: > > Well, here's the problem: > >> Join Filter: (clinical_reg_current.client_id = client.client_id) >> -> Subquery Scan clinical_reg_current (cost=754.36..758.23 >> rows=1 width=4) (actual time=57.359..146.717 rows=684 loops=1) >> Filter: (tier_program(benefit_type_code) = 'SAGE'::text) >> -> Unique (cost=754.36..756.47 rows=117 width=211) >> (actual time=56.427..67.998 rows=1000 loops=1) > > For some reason it's estimating only one row out of the > clinical_reg_current view will satisfy the > tier_program(benefit_type_code) = 'SAGE' constraint. This causes it to > think a nestloop join to the client view would be a good idea. The same > estimation error is present in your example with the function and no > join, but it doesn't hurt anything because there are no planning > decisions that depend on the estimate in that case. > > The estimate of the view's rowcount without the filter isn't that great > either (117 vs 1000 actual) but it's not wrong enough to prompt selection > of a bad plan choice. There's something funny going on with the > estimation of the function's selectivity --- does the expression > "tier_program(benefit_type_code)" match an index, perhaps? If so, have > you updated stats for that table lately? > Our database is analyzed & vacuumed nightly. > I'm also wondering why the function call isn't getting pushed down > further into the plan --- what's the definition of that view look like? > Here is the definition of the function: Schema | Name | Result data type | Argument data types | Owner | Language | Source code | Description --------+--------------+------------------+---------------------+---------+----------+-----------------------------------------------------------------------------------------------------------+------------- public | tier_program | text | character varying | Chasers | sql | | : : SELECT COALESCE(clinical_project_code,description) FROM l_benefit_type WHERE benefit_type_code = $1; : : Here is the definition of clinical_reg_current view: View "public.clinical_reg_current" Column | Type | Modifiers | Description ------------------------------+--------------------------------+-----------+------------- clinical_reg_id | integer | | client_id | integer | | clinical_reg_date | date | | benefit_type_code | character varying(10) | | funding_source_code | character varying(10) | | clinical_reg_date_end | date | | clinical_exit_reason_code | character varying(10) | | kc_authorization_id | integer | | kc_authorization_status_code | character varying(10) | | current_case_rate | numeric(8,2) | | case_rate_reason_code | character varying(10) | | kc_exit_type_code | character varying(10) | | added_by | integer | | added_at | timestamp(0) without time zone | | changed_by | integer | | changed_at | timestamp(0) without time zone | | is_deleted | boolean | | deleted_at | timestamp(0) without time zone | | deleted_by | integer | | deleted_comment | text | | sys_log | text | | View definition: SELECT DISTINCT ON (clinical_reg.client_id) clinical_reg.clinical_reg_id, clinical_reg.client_id, clinical_reg.clinical_reg_date, clinical_reg.benefit_type_code, clinical_reg.funding_source_code, clinical_reg.clinical_reg_date_end, clinical_reg.clinical_exit_reason_code, clinical_reg.kc_authorization_id, clinical_reg.kc_authorization_status_code, clinical_reg.current_case_rate, clinical_reg.case_rate_reason_code, clinical_reg.kc_exit_type_code, clinical_reg.added_by, clinical_reg.added_at, clinical_reg.changed_by, clinical_reg.changed_at, clinical_reg.is_deleted, clinical_reg.deleted_at, clinical_reg.deleted_by, clinical_reg.deleted_comment, clinical_reg.sys_log FROM clinical_reg WHERE clinical_reg.clinical_reg_date <= 'now'::text::date AND (clinical_reg.clinical_reg_date_end >= 'now'::text::date OR clinical_reg.clinical_reg_date_end IS NULL) AND (clinical_reg.kc_authorization_status_code::text <> ALL (ARRAY['CX'::character varying, 'TM'::character varying]::text[])) AND (clinical_reg.benefit_type_code::text <> ALL (ARRAY['75'::character varying, '98'::character varying, '99'::character varying, '00'::character varying]::text[])) ORDER BY clinical_reg.client_id, clinical_reg.clinical_reg_date DESC; The clinical_reg view is everything from tbl_clinical_reg, where is_deleted is false. tbl_clinical_reg, the underlying table has the same columns as clinical_reg_current and clinical_reg. The other information from its definition is here: Indexes: "tbl_clinical_reg_pkey" PRIMARY KEY, btree (clinical_reg_id) "tbl_clinical_reg_kc_authorization_id_key" UNIQUE, btree (kc_authorization_id) "index_tbl_clinical_reg_benefit_type_code" btree (benefit_type_code) WHERE NOT is_deleted "index_tbl_clinical_reg_benefit_type_status" btree (benefit_type_code, kc_authorization_status_code) WHERE NOT is_deleted "index_tbl_clinical_reg_client_id" btree (client_id) WHERE NOT is_deleted "index_tbl_clinical_reg_client_id_clinical_reg_date" btree (client_id, clinical_reg_date) WHERE NOT is_deleted "index_tbl_clinical_reg_client_id_dates" btree (client_id, clinical_reg_date, clinical_reg_date_end) WHERE NOT is_deleted "index_tbl_clinical_reg_clinical_reg_date" btree (clinical_reg_date) WHERE NOT is_deleted "index_tbl_clinical_reg_clinical_reg_dates" btree (clinical_reg_date, clinical_reg_date_end) WHERE NOT is_deleted "index_tbl_clinical_reg_dates_client_id" btree (clinical_reg_date_end, clinical_reg_date, client_id) WHERE NOT is_deleted "index_tbl_clinical_reg_dates_client_id_deleted" btree (clinical_reg_date_end, clinical_reg_date, client_id) WHERE NOT is_deleted "index_tbl_clinical_reg_status_benefit_type" btree (kc_authorization_status_code, benefit_type_code) WHERE NOT is_deleted "index_tbl_clinical_reg_status_dates" btree (kc_authorization_status_code, clinical_reg_date, clinical_reg_date_end) WHERE NOT is_deleted "index_tbl_clinical_reg_status_dates_client_id_deleted" btree (kc_authorization_status_code, clinical_reg_date_end, clinical_reg_date, client_id) WHERE NOT is_deleted Check constraints: "na_only_for_parke" CHECK (benefit_type_code::text <> 'NA'::text OR funding_source_code::text = 'PARKE'::text) "tbl_clinical_reg_check" CHECK (NOT is_deleted AND deleted_at IS NULL OR is_deleted AND deleted_at IS NOT NULL) "tbl_clinical_reg_check1" CHECK (NOT is_deleted AND deleted_by IS NULL OR is_deleted AND deleted_by IS NOT NULL) "tbl_clinical_reg_current_case_rate_check" CHECK (current_case_rate >= 0::numeric) Foreign-key constraints: "tbl_clinical_reg_added_by_fkey" FOREIGN KEY (added_by) REFERENCES tbl_staff(staff_id) "tbl_clinical_reg_benefit_type_code_fkey" FOREIGN KEY (benefit_type_code) REFERENCES l_benefit_type(benefit_type_code) "tbl_clinical_reg_case_rate_reason_code_fkey" FOREIGN KEY (case_rate_reason_code) REFERENCES l_kc_case_rate_reason(kc_case_rate_reason_code) "tbl_clinical_reg_changed_by_fkey" FOREIGN KEY (changed_by) REFERENCES tbl_staff(staff_id) "tbl_clinical_reg_client_id_fkey" FOREIGN KEY (client_id) REFERENCES tbl_client(client_id) "tbl_clinical_reg_clinical_exit_reason_code_fkey" FOREIGN KEY (clinical_exit_reason_code) REFERENCES l_clinical_exit_reason(clinical_exit_reason_code) "tbl_clinical_reg_deleted_by_fkey" FOREIGN KEY (deleted_by) REFERENCES tbl_staff(staff_id) "tbl_clinical_reg_funding_source_code_fkey" FOREIGN KEY (funding_source_code) REFERENCES l_funding_source(funding_source_code) "tbl_clinical_reg_kc_authorization_status_code_fkey" FOREIGN KEY (kc_authorization_status_code) REFERENCES l_kc_authorization_status(kc_authorization_status_code) "tbl_clinical_reg_kc_exit_type_code_fkey" FOREIGN KEY (kc_exit_type_code) REFERENCES l_kc_exit_type(kc_exit_type_code) Triggers: tbl_clinical_reg_changed_at_update BEFORE UPDATE ON tbl_clinical_reg FOR EACH ROW EXECUTE PROCEDURE auto_changed_at_update() tbl_clinical_reg_log_chg AFTER INSERT OR DELETE OR UPDATE ON tbl_clinical_reg FOR EACH ROW EXECUTE PROCEDURE table_log() Has OIDs: no Thank you. take care, jessi -- Jessi Berkelhammer Downtown Emergency Service Center Computer Programming Specialist
Jessi Berkelhammer <jberkelhammer@desc.org> writes: > Tom Lane wrote: >> For some reason it's estimating only one row out of the >> clinical_reg_current view will satisfy the >> tier_program(benefit_type_code) = 'SAGE' constraint. My math was off the other day --- actually, that's exactly what you'd expect for the default estimate on an equality condition it has no stats for, when the underlying scan is estimated to have only 117 rows. So either you should do something about getting that underlying estimate up closer to reality (perhaps increasing default_statistics_target would improve matters?), or you need to fix things so that the planner can apply its statistics to estimating what is happening with the tier_program constraint. Expressed as a function this way, it's just a black box to the planner so you get a default estimate. Given that the function is just extracting from a table, I think you could remove the function call and express the condition with a join instead, and that might result in a better estimate. >> I'm also wondering why the function call isn't getting pushed down >> further into the plan --- what's the definition of that view look like? > View definition: > SELECT DISTINCT ON (clinical_reg.client_id) Ah, it's the DISTINCT ON that's preventing any better optimization. Not much to be done about that, unless you can recast things to not need DISTINCT ON, which looks a bit hard. regards, tom lane