Re: Slow queries in PL/PGSQL function - Mailing list pgsql-general

From Gary Doades
Subject Re: Slow queries in PL/PGSQL function
Date
Msg-id rptc3057jerjnq50tl8lqfijmdr7l7qvsv@4ax.com
Whole thread Raw
In response to Slow queries in PL/PGSQL function  (Jim Crate <jcrate@deepskytech.com>)
Responses Re: Slow queries in PL/PGSQL function
List pgsql-general
I have a very similar problem. I put the following SQL into a
function:

SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS,
SC.MIN_HOURS,
 (SELECT COUNT(*) FROM TIMESHEET_DETAIL
JOIN MAIN_ORDER ON (MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID
AND MAIN_ORDER.CLIENT_ID = $3)
 WHERE TIMESHEET_DETAIL.CONTRACT_ID = SC.CONTRACT_ID) AS VISITS,
 (SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R
 JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID)
 JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID)
 WHERE B.CONTRACT_ID = SC.CONTRACT_ID
 AND BP.BOOKING_DATE BETWEEN $1 AND $2) AS RHOURS
 FROM VSTAFF VS
  JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID)
 JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
 JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID)
WHERE SR.SEARCH_ID = $4
AND SC.CONTRACT_ID IN
(SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P,STAFF_CONTRACT C
 WHERE P.CONTRACT_ID=C.CONTRACT_ID AND C.STAFF_ID = VS.STAFF_ID AND
P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE
SEARCH_ID = $4)
 AND C.AVAIL_DATE_FROM <= $1 AND C.AVAIL_DATE_TO >= $2  GROUP BY
C.CONTRACT_ID
 HAVING (COUNT(C.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID)
FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = $4)));

When executed from the client and substituting constants for the
variables ($1,$2,$3) it takes 1 second. When executed from within a
function using exactly the same values it takes 30 seconds! Using your
method to get the execution plan with variables I can see it has
switched to a sequential scan on the BOOKING_PLAN table (250K records)
instead of the index it uses otherwise. I have spent a while
optimising this and tested with quite a range of date values. It
always uses the indexes when executed from the client. The date ranges
given are usually quite small.

I am trying to migrate from MS SQLSever to PG and this has now stopped
me for a while. I could transfer all the stored procedures into client
code, but that seems very inefficient.

Is there someway to force the use of an index. Or at least get the
backend to substitue the parameters in a function before doing the
first query plan so it has more typical values to work with?

Regards,
Gary.

On Fri, 20 Feb 2004 09:20:24 -0500, tgl@sss.pgh.pa.us (Tom Lane)
wrote:

>Richard Huxton <dev@archonet.com> writes:
>> On Thursday 19 February 2004 23:00, Jim Crate wrote:
>>> explain analyze SELECT DISTINCT i_ip
>>> FROM x_rbl_ips
>>> LEFT JOIN filter_ips ON x_rbl_ips.i_ip = filter_ips.i_filter_ip
>>> WHERE x_rbl_ips.dts_last_modified > '2004-02-18 22:24:15.901689+00'
>>> AND filter_ips.i_filter_ip IS NOT NULL
>>> AND (i_filter_type_flags & X'02000000'::integer) <> X'02000000'::integer
>
>> I'm guessing that the values in your query are variables/parameters in the
>> plpgsql function? The problem is that the plan is compiled when the function
>> is first run, so it doesn't know what values you will use. You might tend to
>> use values that make sense to index, but it can't tell.
>
>Specifically, the only part of that that looks indexable is the
>dts_last_modified constraint.  If it's always "dts_last_modified >
>some-time-in-the-recent-past" then the range of values scanned is going
>to be small enough to make an indexscan worthwhile.  But if what the
>planner sees is "dts_last_modified > variable" then it isn't gonna risk
>an indexscan, because in the general case that could mean scanning a
>large part of the table, and the indexscan would be tremendously slow.
>
>What you can do to work around this (I'm assuming dts_last_modified
>never contains values in the future) is add a dummy constraint:
>
>    WHERE x_rbl_ips.dts_last_modified > variable
>      AND x_rbl_ips.dts_last_modified <= now()
>      AND other-stuff
>
>Now what the planner sees is dts_last_modified being constrained to a
>range, rather than an open-ended interval.  It still has to guess about
>how much of the index will be scanned, but its guess in this scenario
>is much smaller and it should usually pick the indexscan instead.
>
>BTW, in recent releases you can investigate planner choices involving
>queries with variables by using PREPARE and EXPLAIN EXECUTE.  For
>example
>
>    PREPARE myq(timestamptz) AS
>    ...
>    WHERE x_rbl_ips.dts_last_modified > $1
>    ...
>
>    EXPLAIN EXECUTE myq('2004-02-18 22:24:15.901689+00');
>
>This allows you to exactly reproduce the conditions that the planner has
>to work under when planning a query from a plpgsql function.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 8: explain analyze is your friend


pgsql-general by date:

Previous
From: Christopher Browne
Date:
Subject: Re: embeding postgre
Next
From: Chris Browne
Date:
Subject: Re: embeding postgreSQL