Thread: query planning different in plpgsql?

query planning different in plpgsql?

From

Michal J. Kubski

Date:

23 October 2009, 12:44:42


Hi,

Is there any way to get the query plan of the query run in the stored
procedure?
I am running the following one and it takes 10 minutes in the procedure
when it is pretty fast standalone.

Any ideas would be welcome!

# EXPLAIN ANALYZE SELECT m.domain_id, nsr_id FROM nsr_meta m, last_snapshot
l WHERE m.domain_id = l.domain_id;
                                                                        
QUERY PLAN                                                                 
       

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..562432.32 rows=12227848 width=16) (actual
time=1430.034..7476.081 rows=294418 loops=1)
   ->  Seq Scan on last_snapshot l  (cost=0.00..3983.68 rows=60768 width=8)
(actual time=0.010..57.304 rows=60641 loops=1)
   ->  Index Scan using idx_nsr_meta_domain_id on nsr_meta m 
(cost=0.00..6.68 rows=201 width=16) (actual time=0.111..0.115 rows=5
loops=60641)
         Index Cond: (m.domain_id = l.domain_id)
 Total runtime: 7635.625 ms
(5 rows)

Time: 7646.243 ms

Many thanks,
Michal

-- 
I hear and I forget. I see and I believe. I do and I understand.
(Confucius)

Re: query planning different in plpgsql?

From

Scott Mead

Date:

23 October 2009, 12:50:03

On Fri, Oct 23, 2009 at 11:38 AM, Michal J. Kubski <michal.kubski@cdt.pl> wrote:

Hi,

Is there any way to get the query plan of the query run in the stored
procedure?
I am running the following one and it takes 10 minutes in the procedure
when it is pretty fast standalone.

Any ideas would be welcome!

# EXPLAIN ANALYZE SELECT m.domain_id, nsr_id FROM nsr_meta m, last_snapshot
l WHERE m.domain_id = l.domain_id;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..562432.32 rows=12227848 width=16) (actual
time=1430.034..7476.081 rows=294418 loops=1)
-> Seq Scan on last_snapshot l (cost=0.00..3983.68 rows=60768 width=8)
(actual time=0.010..57.304 rows=60641 loops=1)
-> Index Scan using idx_nsr_meta_domain_id on nsr_meta m
(cost=0.00..6.68 rows=201 width=16) (actual time=0.111..0.115 rows=5
loops=60641)
Index Cond: (m.domain_id = l.domain_id)
Total runtime: 7635.625 ms
(5 rows)

Time: 7646.243 ms

Do you not have an index on last_snapshot.domain_id?

--Scott

Re: query planning different in plpgsql?

From

Grzegorz Jaśkiewicz

Date:

23 October 2009, 12:56:45

On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead <scott.lists@enterprisedb.com> wrote:

Do you not have an index on last_snapshot.domain_id?

that, and also try rewriting a query as JOIN. There might be difference in performance/plan.

--
GJ

Re: query planning different in plpgsql?

From

Tom Lane

Date:

23 October 2009, 13:20:33

On Fri, Oct 23, 2009 at 11:38 AM, Michal J. Kubski <michal.kubski@cdt.pl>wrote:
>> I am running the following one and it takes 10 minutes in the procedure
>> when it is pretty fast standalone.
>>
>> # EXPLAIN ANALYZE SELECT m.domain_id, nsr_id FROM nsr_meta m, last_snapshot
>> l WHERE m.domain_id = l.domain_id;

Is it *really* just like that inside the stored procedure?  Usually
the reason for a difference in plan is that the procedure's query
references some variables of the procedure, which people think act
like constants but they don't.

Also, if you're executing the SELECT as a plpgsql FOR-loop, it will be
planned like a cursor, so the thing to compare against is
    explain [analyze] declare x cursor for select ...

            regards, tom lane

Re: query planning different in plpgsql?

From

Michal J. Kubski

Date:

26 October 2009, 07:05:46


Hi,

On Fri, 23 Oct 2009 16:56:36 +0100, Grzegorz Jaśkiewicz
<gryzman@gmail.com> wrote:
> On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead
> <scott.lists@enterprisedb.com>wrote:
> 
>>
>>
>>   Do you not have an index on last_snapshot.domain_id?
>>
> 
> that, and also try rewriting a query as JOIN. There might be difference
in
> performance/plan.
> 
Thanks, it runs better (average 240s, not 700s) with the index. Rewriting
queries
as JOINs does not make any difference.
The last_snapshot is a temp table created earlier in the procedure
and the query in question is preceded with CREATE TEMPORARY TABLE as well,
not a cursor. 
I still do not get why it performs differently inside the procedure. 
Is there any way to see what planning decisions were made?

Best regards,
Michal

-- 
I hear and I forget. I see and I believe. I do and I understand.
(Confucius)

Re: query planning different in plpgsql?

From

Merlin Moncure

Date:

26 October 2009, 10:45:12

On Mon, Oct 26, 2009 at 6:05 AM, Michal J. Kubski <michal.kubski@cdt.pl> wrote:
> On Fri, 23 Oct 2009 16:56:36 +0100, Grzegorz Jaśkiewicz
> <gryzman@gmail.com> wrote:
>> On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead
>> <scott.lists@enterprisedb.com>wrote:
>>
>>>
>>>
>>>   Do you not have an index on last_snapshot.domain_id?
>>>
>>
>> that, and also try rewriting a query as JOIN. There might be difference
> in
>> performance/plan.
>>
> Thanks, it runs better (average 240s, not 700s) with the index. Rewriting
> queries
> as JOINs does not make any difference.
> The last_snapshot is a temp table created earlier in the procedure
> and the query in question is preceded with CREATE TEMPORARY TABLE as well,
> not a cursor.
> I still do not get why it performs differently inside the procedure.
> Is there any way to see what planning decisions were made?

not directly....can we see the function?

merlin

Re: query planning different in plpgsql?

From

Michal J. Kubski

Date:

26 October 2009, 15:02:38


On Mon, 26 Oct 2009 09:19:26 -0400, Merlin Moncure <mmoncure@gmail.com>
wrote:
> On Mon, Oct 26, 2009 at 6:05 AM, Michal J. Kubski <michal.kubski@cdt.pl>
> wrote:
>> On Fri, 23 Oct 2009 16:56:36 +0100, Grzegorz Jaśkiewicz
>> <gryzman@gmail.com> wrote:
>>> On Fri, Oct 23, 2009 at 4:49 PM, Scott Mead
>>> <scott.lists@enterprisedb.com>wrote:
>>>
>>>>
>>>>
>>>>   Do you not have an index on last_snapshot.domain_id?
>>>>
>>>
>>> that, and also try rewriting a query as JOIN. There might be difference
>> in
>>> performance/plan.
>>>
>> Thanks, it runs better (average 240s, not 700s) with the index.
> Rewriting
>> queries
>> as JOINs does not make any difference.
>> The last_snapshot is a temp table created earlier in the procedure
>> and the query in question is preceded with CREATE TEMPORARY TABLE as
> well,
>> not a cursor.
>> I still do not get why it performs differently inside the procedure.
>> Is there any way to see what planning decisions were made?
> 
> not directly....can we see the function?
> 
> merlin

It looks like that (I stripped off some fields in result_rs record, to make
it more brief
and leave the relevant part) 

CREATE OR REPLACE FUNCTION build_list() RETURNS SETOF result_rs AS $$
DECLARE 
    start_time TIMESTAMP;
    rec result_rs;
BEGIN
    start_time := timeofday()::timestamp;

    CREATE TEMPORARY TABLE last_snapshot AS SELECT * FROM last_take; --
last_take is a view
    CREATE INDEX last_snapshot_idx ON last_snapshot USING btree(domain_id)
WITH (fillfactor=100);

    CREATE TEMPORARY TABLE tmp_lm AS SELECT m.domain_id, nsr_id FROM
nsrs_meta m JOIN last_snapshot l ON m.domain_id = l.domain_id;
    CREATE INDEX tmp_lm_idx ON tmp_lm USING btree(nsr_id) WITH
(fillfactor=100);

    CREATE TEMPORARY TABLE tmp_ns_bl_matching_domains AS SELECT DISTINCT
lm.domain_id FROM tmp_lm lm JOIN nsrs n ON lm.nsr_id = n.id JOIN ns_bl b ON
n.ip_id = b.ip_id;
    CREATE INDEX tmp_bls_0 ON tmp_ns_bl_matching_domains USING
btree(domain_id) WITH (fillfactor=100);
    DROP TABLE tmp_lm;

    CREATE TEMPORARY TABLE temp_result AS
        SELECT
            t.domain_id,
            t.name,
            (CASE WHEN b.domain_id IS NULL THEN 0 ELSE 1 END) AS is_bl,
            (CASE WHEN f.is_s IS NULL THEN 0 ELSE f.is_s::INTEGER END) AS
is_s,
        FROM last_snapshot t  
            LEFT JOIN tmp_ns_bl_matching_domains b ON
b.domain_id=t.domain_id
            LEFT JOIN (SELECT DISTINCT ON (domain_id) * FROM domain_flags
f) f ON t.domain_id=f.domain_id;

    FOR rec IN SELECT
            UTC_NOW(),
            name,
            is_bl,
            is_s
        FROM temp_result t
    LOOP
        RETURN NEXT rec;
    END LOOP;

    DROP TABLE temp_result;
    DROP TABLE tmp_ns_bl_matching_domains;

    PERFORM time_log('BUILD', get_elapsed_time(start_time));

END;
$$ LANGUAGE plpgsql;

Thanks,
Michal


-- 
I hear and I forget. I see and I believe. I do and I understand.
(Confucius)

Re: query planning different in plpgsql?

From

Tom Lane

Date:

26 October 2009, 16:34:32

"Michal J. Kubski" <michal.kubski@cdt.pl> writes:
> [ function that creates a bunch of temporary tables and immediately
> joins them ]

It'd probably be a good idea to insert an ANALYZE on the temp tables
after you fill them.  The way you've got this set up, there is no chance
of auto-analyze correcting that oversight for you, so the planner will
be planning the join "blind" without any stats.  Good results would only
come by pure luck.

            regards, tom lane

Re: query planning different in plpgsql?

From

Waldomiro

Date:

26 October 2009, 17:11:10

Try to force a unique plan, like that:<br /><br /> SELECT field, field2 ...<br /> FROM table1<br /> WHERE field3 =
'xxx'<br/> AND field4 = 'yyy'<br /> AND field5 = 'zzz'<br /><br /> so, in that example, I need the planner to use my
field4index, but the planner insists to use the field5, so I rewrite the query like this:<br /><br /> SELECT field,
field2...<br /> FROM table1<br /> WHERE trim(field3) = 'xxx'<br /> AND field4 = 'yyy'<br /> AND trim(field5) = 'zzz'<br
/><br/> I  didn´t give any option to the planner, so I get what plan I want.<br /><br /> Waldomiro<br /><br /><br />
TomLane escreveu: <blockquote cite="mid:13574.1256580589@sss.pgh.pa.us" type="cite"><pre wrap="">"Michal J. Kubski" <a
class="moz-txt-link-rfc2396E"href="mailto:michal.kubski@cdt.pl"><michal.kubski@cdt.pl></a> writes:
</pre><blockquotetype="cite"><pre wrap="">[ function that creates a bunch of temporary tables and immediately
 
joins them ]   </pre></blockquote><pre wrap="">
It'd probably be a good idea to insert an ANALYZE on the temp tables
after you fill them.  The way you've got this set up, there is no chance
of auto-analyze correcting that oversight for you, so the planner will
be planning the join "blind" without any stats.  Good results would only
come by pure luck.
        regards, tom lane
 </pre></blockquote><br />

Re: query planning different in plpgsql?

From

"Michal J. Kubski"

Date:

29 October 2009, 11:28:29

On Mon, 26 Oct 2009 11:52:22 -0400, Merlin Moncure <mmoncure@gmail.com>
wrote:
>>>>>>   Do you not have an index on last_snapshot.domain_id?
>>>>>>
>>>>> that, and also try rewriting a query as JOIN. There might be
>>>>> difference in performance/plan.
>>>>>
>>>> Thanks, it runs better (average 240s, not 700s) with the index.
>>> Rewriting
>>>> queries
>>>> as JOINs does not make any difference.
>>>> The last_snapshot is a temp table created earlier in the procedure
>>>> and the query in question is preceded with CREATE TEMPORARY TABLE as
well,
>>>> not a cursor.
>>>> I still do not get why it performs differently inside the procedure.
>>>> Is there any way to see what planning decisions were made?
>>>
>>> not directly....can we see the function?
>>>
>>> merlin
>>
>> It looks like that (I stripped off some fields in result_rs record, to
>> make
>> it more brief
>> and leave the relevant part)
>>

>> [..function cut off..]

> hm.  what version of postgres are you using?  I have some version
> dependent suggestions.   Also, is it ok to respond to the list quoting
> any/all of your function? (I'd perfer to keep the discussion public if
> possible).
> 

Hi,

Apologies for late response. It is 8.3.7. Tom Lane's suggestion to add
ANALYZE seem to help it,
though I still sometimes get long query runs.

Thanks,
Michal

-- 
I hear and I forget. I see and I believe. I do and I understand.
(Confucius)

Re: query planning different in plpgsql?

From

"Michal J. Kubski"

Date:

29 October 2009, 11:32:52

On Mon, 26 Oct 2009 14:09:49 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Michal J. Kubski" <michal.kubski@cdt.pl> writes:
>> [ function that creates a bunch of temporary tables and immediately
>> joins them ]
> 
> It'd probably be a good idea to insert an ANALYZE on the temp tables
> after you fill them.  The way you've got this set up, there is no chance
> of auto-analyze correcting that oversight for you, so the planner will
> be planning the join "blind" without any stats.  Good results would only
> come by pure luck.
> 
>             regards, tom lane

Hi,

Apologies for late response. Thanks a lot: ANALYZE seem to help it! I
still sometimes
get long query runs though. As far as I understand using index over
sequential scan
on joins should be faster. Could it be possible that the query planner
decides
to use seqscan instead of index scan on some random occasions? 

Thanks,
Michal

-- 
I hear and I forget. I see and I believe. I do and I understand.
(Confucius)

Re: query planning different in plpgsql?

From

Scott Carey

Date:

29 October 2009, 13:44:32

On 10/23/09 8:38 AM, "Michal J.Kubski" <michal.kubski@cdt.pl> wrote:

>
>
>
>
> Hi,
>
>
>
> Is there any way to get the query plan of the query run in the stored
>
> procedure?
>
> I am running the following one and it takes 10 minutes in the procedure
>
> when it is pretty fast standalone.
>
>
>
> Any ideas would be welcome!
>
>

If your query is
SELECT field, field2 FROM table1 WHERE field3 = 'xxx' AND field4 = 'yyy'

And you want to test what the planner will do without the knowledge of the
exact values 'xxx' and 'yyy', you can prepare a statement:

#PREPARE foo() AS SELECT field, field2 FROM table1 WHERE field3 = $1 AND
field4 = $2;

#EXPLAIN execute foo('xxx', 'yyy');

If field3 and field4 don't have unique indexes, the plan might differ.  It
will most likely differ if 'xxx' or 'yyy' is a very common value in the
table and the table is not tiny.