Thread: query planning different in plpgsql?
Hi, Is there any way to get the query plan of the query run in the stored procedure? I am running the following one and it takes 10 minutes in the procedure when it is pretty fast standalone. Any ideas would be welcome! # EXPLAIN ANALYZE SELECT m.domain_id, nsr_id FROM nsr_meta m, last_snapshot l WHERE m.domain_id = l.domain_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..562432.32 rows=12227848 width=16) (actual time=1430.034..7476.081 rows=294418 loops=1) -> Seq Scan on last_snapshot l (cost=0.00..3983.68 rows=60768 width=8) (actual time=0.010..57.304 rows=60641 loops=1) -> Index Scan using idx_nsr_meta_domain_id on nsr_meta m (cost=0.00..6.68 rows=201 width=16) (actual time=0.111..0.115 rows=5 loops=60641) Index Cond: (m.domain_id = l.domain_id) Total runtime: 7635.625 ms (5 rows) Time: 7646.243 ms Many thanks, Michal -- I hear and I forget. I see and I believe. I do and I understand. (Confucius)
On Fri, Oct 23, 2009 at 11:38 AM, Michal J. Kubski <michal.kubski@cdt.pl> wrote:
Do you not have an index on last_snapshot.domain_id?
--Scott
Hi,
Is there any way to get the query plan of the query run in the stored
procedure?
I am running the following one and it takes 10 minutes in the procedure
when it is pretty fast standalone.
Any ideas would be welcome!
# EXPLAIN ANALYZE SELECT m.domain_id, nsr_id FROM nsr_meta m, last_snapshot
l WHERE m.domain_id = l.domain_id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..562432.32 rows=12227848 width=16) (actual
time=1430.034..7476.081 rows=294418 loops=1)
-> Seq Scan on last_snapshot l (cost=0.00..3983.68 rows=60768 width=8)
(actual time=0.010..57.304 rows=60641 loops=1)
-> Index Scan using idx_nsr_meta_domain_id on nsr_meta m
(cost=0.00..6.68 rows=201 width=16) (actual time=0.111..0.115 rows=5
loops=60641)
Index Cond: (m.domain_id = l.domain_id)
Total runtime: 7635.625 ms
(5 rows)
Time: 7646.243 ms
Do you not have an index on last_snapshot.domain_id?
--Scott
On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead <scott.lists@enterprisedb.com> wrote:
that, and also try rewriting a query as JOIN. There might be difference in performance/plan.
Do you not have an index on last_snapshot.domain_id?
that, and also try rewriting a query as JOIN. There might be difference in performance/plan.
--
GJ
On Fri, Oct 23, 2009 at 11:38 AM, Michal J. Kubski <michal.kubski@cdt.pl>wrote: >> I am running the following one and it takes 10 minutes in the procedure >> when it is pretty fast standalone. >> >> # EXPLAIN ANALYZE SELECT m.domain_id, nsr_id FROM nsr_meta m, last_snapshot >> l WHERE m.domain_id = l.domain_id; Is it *really* just like that inside the stored procedure? Usually the reason for a difference in plan is that the procedure's query references some variables of the procedure, which people think act like constants but they don't. Also, if you're executing the SELECT as a plpgsql FOR-loop, it will be planned like a cursor, so the thing to compare against is explain [analyze] declare x cursor for select ... regards, tom lane
Hi, On Fri, 23 Oct 2009 16:56:36 +0100, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote: > On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead > <scott.lists@enterprisedb.com>wrote: > >> >> >> Do you not have an index on last_snapshot.domain_id? >> > > that, and also try rewriting a query as JOIN. There might be difference in > performance/plan. > Thanks, it runs better (average 240s, not 700s) with the index. Rewriting queries as JOINs does not make any difference. The last_snapshot is a temp table created earlier in the procedure and the query in question is preceded with CREATE TEMPORARY TABLE as well, not a cursor. I still do not get why it performs differently inside the procedure. Is there any way to see what planning decisions were made? Best regards, Michal -- I hear and I forget. I see and I believe. I do and I understand. (Confucius)
On Mon, Oct 26, 2009 at 6:05 AM, Michal J. Kubski <michal.kubski@cdt.pl> wrote: > On Fri, 23 Oct 2009 16:56:36 +0100, Grzegorz Jaśkiewicz > <gryzman@gmail.com> wrote: >> On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead >> <scott.lists@enterprisedb.com>wrote: >> >>> >>> >>> Do you not have an index on last_snapshot.domain_id? >>> >> >> that, and also try rewriting a query as JOIN. There might be difference > in >> performance/plan. >> > Thanks, it runs better (average 240s, not 700s) with the index. Rewriting > queries > as JOINs does not make any difference. > The last_snapshot is a temp table created earlier in the procedure > and the query in question is preceded with CREATE TEMPORARY TABLE as well, > not a cursor. > I still do not get why it performs differently inside the procedure. > Is there any way to see what planning decisions were made? not directly....can we see the function? merlin
On Mon, 26 Oct 2009 09:19:26 -0400, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Oct 26, 2009 at 6:05 AM, Michal J. Kubski <michal.kubski@cdt.pl> > wrote: >> On Fri, 23 Oct 2009 16:56:36 +0100, Grzegorz Jaśkiewicz >> <gryzman@gmail.com> wrote: >>> On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead >>> <scott.lists@enterprisedb.com>wrote: >>> >>>> >>>> >>>> Do you not have an index on last_snapshot.domain_id? >>>> >>> >>> that, and also try rewriting a query as JOIN. There might be difference >> in >>> performance/plan. >>> >> Thanks, it runs better (average 240s, not 700s) with the index. > Rewriting >> queries >> as JOINs does not make any difference. >> The last_snapshot is a temp table created earlier in the procedure >> and the query in question is preceded with CREATE TEMPORARY TABLE as > well, >> not a cursor. >> I still do not get why it performs differently inside the procedure. >> Is there any way to see what planning decisions were made? > > not directly....can we see the function? > > merlin It looks like that (I stripped off some fields in result_rs record, to make it more brief and leave the relevant part) CREATE OR REPLACE FUNCTION build_list() RETURNS SETOF result_rs AS $$ DECLARE start_time TIMESTAMP; rec result_rs; BEGIN start_time := timeofday()::timestamp; CREATE TEMPORARY TABLE last_snapshot AS SELECT * FROM last_take; -- last_take is a view CREATE INDEX last_snapshot_idx ON last_snapshot USING btree(domain_id) WITH (fillfactor=100); CREATE TEMPORARY TABLE tmp_lm AS SELECT m.domain_id, nsr_id FROM nsrs_meta m JOIN last_snapshot l ON m.domain_id = l.domain_id; CREATE INDEX tmp_lm_idx ON tmp_lm USING btree(nsr_id) WITH (fillfactor=100); CREATE TEMPORARY TABLE tmp_ns_bl_matching_domains AS SELECT DISTINCT lm.domain_id FROM tmp_lm lm JOIN nsrs n ON lm.nsr_id = n.id JOIN ns_bl b ON n.ip_id = b.ip_id; CREATE INDEX tmp_bls_0 ON tmp_ns_bl_matching_domains USING btree(domain_id) WITH (fillfactor=100); DROP TABLE tmp_lm; CREATE TEMPORARY TABLE temp_result AS SELECT t.domain_id, t.name, (CASE WHEN b.domain_id IS NULL THEN 0 ELSE 1 END) AS is_bl, (CASE WHEN f.is_s IS NULL THEN 0 ELSE f.is_s::INTEGER END) AS is_s, FROM last_snapshot t LEFT JOIN tmp_ns_bl_matching_domains b ON b.domain_id=t.domain_id LEFT JOIN (SELECT DISTINCT ON (domain_id) * FROM domain_flags f) f ON t.domain_id=f.domain_id; FOR rec IN SELECT UTC_NOW(), name, is_bl, is_s FROM temp_result t LOOP RETURN NEXT rec; END LOOP; DROP TABLE temp_result; DROP TABLE tmp_ns_bl_matching_domains; PERFORM time_log('BUILD', get_elapsed_time(start_time)); END; $$ LANGUAGE plpgsql; Thanks, Michal -- I hear and I forget. I see and I believe. I do and I understand. (Confucius)
"Michal J. Kubski" <michal.kubski@cdt.pl> writes: > [ function that creates a bunch of temporary tables and immediately > joins them ] It'd probably be a good idea to insert an ANALYZE on the temp tables after you fill them. The way you've got this set up, there is no chance of auto-analyze correcting that oversight for you, so the planner will be planning the join "blind" without any stats. Good results would only come by pure luck. regards, tom lane
Try to force a unique plan, like that:<br /><br /> SELECT field, field2 ...<br /> FROM table1<br /> WHERE field3 = 'xxx'<br/> AND field4 = 'yyy'<br /> AND field5 = 'zzz'<br /><br /> so, in that example, I need the planner to use my field4index, but the planner insists to use the field5, so I rewrite the query like this:<br /><br /> SELECT field, field2...<br /> FROM table1<br /> WHERE trim(field3) = 'xxx'<br /> AND field4 = 'yyy'<br /> AND trim(field5) = 'zzz'<br /><br/> I didn´t give any option to the planner, so I get what plan I want.<br /><br /> Waldomiro<br /><br /><br /> TomLane escreveu: <blockquote cite="mid:13574.1256580589@sss.pgh.pa.us" type="cite"><pre wrap="">"Michal J. Kubski" <a class="moz-txt-link-rfc2396E"href="mailto:michal.kubski@cdt.pl"><michal.kubski@cdt.pl></a> writes: </pre><blockquotetype="cite"><pre wrap="">[ function that creates a bunch of temporary tables and immediately joins them ] </pre></blockquote><pre wrap=""> It'd probably be a good idea to insert an ANALYZE on the temp tables after you fill them. The way you've got this set up, there is no chance of auto-analyze correcting that oversight for you, so the planner will be planning the join "blind" without any stats. Good results would only come by pure luck. regards, tom lane </pre></blockquote><br />
On Mon, 26 Oct 2009 11:52:22 -0400, Merlin Moncure <mmoncure@gmail.com> wrote: >>>>>> Do you not have an index on last_snapshot.domain_id? >>>>>> >>>>> that, and also try rewriting a query as JOIN. There might be >>>>> difference in performance/plan. >>>>> >>>> Thanks, it runs better (average 240s, not 700s) with the index. >>> Rewriting >>>> queries >>>> as JOINs does not make any difference. >>>> The last_snapshot is a temp table created earlier in the procedure >>>> and the query in question is preceded with CREATE TEMPORARY TABLE as well, >>>> not a cursor. >>>> I still do not get why it performs differently inside the procedure. >>>> Is there any way to see what planning decisions were made? >>> >>> not directly....can we see the function? >>> >>> merlin >> >> It looks like that (I stripped off some fields in result_rs record, to >> make >> it more brief >> and leave the relevant part) >> >> [..function cut off..] > hm. what version of postgres are you using? I have some version > dependent suggestions. Also, is it ok to respond to the list quoting > any/all of your function? (I'd perfer to keep the discussion public if > possible). > Hi, Apologies for late response. It is 8.3.7. Tom Lane's suggestion to add ANALYZE seem to help it, though I still sometimes get long query runs. Thanks, Michal -- I hear and I forget. I see and I believe. I do and I understand. (Confucius)
On Mon, 26 Oct 2009 14:09:49 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Michal J. Kubski" <michal.kubski@cdt.pl> writes: >> [ function that creates a bunch of temporary tables and immediately >> joins them ] > > It'd probably be a good idea to insert an ANALYZE on the temp tables > after you fill them. The way you've got this set up, there is no chance > of auto-analyze correcting that oversight for you, so the planner will > be planning the join "blind" without any stats. Good results would only > come by pure luck. > > regards, tom lane Hi, Apologies for late response. Thanks a lot: ANALYZE seem to help it! I still sometimes get long query runs though. As far as I understand using index over sequential scan on joins should be faster. Could it be possible that the query planner decides to use seqscan instead of index scan on some random occasions? Thanks, Michal -- I hear and I forget. I see and I believe. I do and I understand. (Confucius)
On 10/23/09 8:38 AM, "Michal J.Kubski" <michal.kubski@cdt.pl> wrote: > > > > > Hi, > > > > Is there any way to get the query plan of the query run in the stored > > procedure? > > I am running the following one and it takes 10 minutes in the procedure > > when it is pretty fast standalone. > > > > Any ideas would be welcome! > > If your query is SELECT field, field2 FROM table1 WHERE field3 = 'xxx' AND field4 = 'yyy' And you want to test what the planner will do without the knowledge of the exact values 'xxx' and 'yyy', you can prepare a statement: #PREPARE foo() AS SELECT field, field2 FROM table1 WHERE field3 = $1 AND field4 = $2; #EXPLAIN execute foo('xxx', 'yyy'); If field3 and field4 don't have unique indexes, the plan might differ. It will most likely differ if 'xxx' or 'yyy' is a very common value in the table and the table is not tiny.