Thread: spurious function execution in prepared statements.
OK, I have a situation that might be a performance problem, a bug, or an unavoidable consequence of using prepared statements. The short version is that I am getting function executions for rows not returned in a result set when they are in a prepared statement. In other words, I have a query: select f(t.c) from t where [boolean expr on t] limit 1; because of the limit phrase, obviously, at most one record is returned and f executes at most once regardless of the plan used (in practice, sometimes index, sometimes seq_scan. Now, if the same query is executed as a prepared statement, prepare ps(...) as select f(t.c) from t where [expr] limit 1; execute ps; now, if ps ends up using a index scan on t, everything is ok. However, if ps does a seqscan, f executes for every row on t examined until the [expr] criteria is met. Is this a bug? If necessary I should be able to set up a reproducible example. The easy workaround is to not use prepared statements in these situations, but I need to be able to guarantee that f only executes once (even if that means exploring subqueries). Merlin
On Thu, Sep 30, 2004 at 09:45:51AM -0400, Merlin Moncure wrote: > Now, if the same query is executed as a prepared statement, > prepare ps(...) as select f(t.c) from t where [expr] limit 1; > execute ps; > > now, if ps ends up using a index scan on t, everything is ok. However, > if ps does a seqscan, f executes for every row on t examined until the > [expr] criteria is met. Is this a bug? If necessary I should be able > to set up a reproducible example. The easy workaround is to not use > prepared statements in these situations, but I need to be able to > guarantee that f only executes once (even if that means exploring > subqueries). Here's another workaround that may let you use a prepared statement: prepare ps(...) as select f(c) from (select c from t where [expr] limit 1) as t1 -Mike
On Thu, 30 Sep 2004, Merlin Moncure wrote: > OK, I have a situation that might be a performance problem, a bug, or an > unavoidable consequence of using prepared statements. The short version > is that I am getting function executions for rows not returned in a > result set when they are in a prepared statement. > > In other words, I have a query: > select f(t.c) from t where [boolean expr on t] limit 1; An actual boolean expr on t? Or on a column in t? > because of the limit phrase, obviously, at most one record is returned > and f executes at most once regardless of the plan used (in practice, > sometimes index, sometimes seq_scan. > > Now, if the same query is executed as a prepared statement, > prepare ps(...) as select f(t.c) from t where [expr] limit 1; > execute ps; > > now, if ps ends up using a index scan on t, everything is ok. However, > if ps does a seqscan, f executes for every row on t examined until the > [expr] criteria is met. Is this a bug? If necessary I should be able > to set up a reproducible example. The easy workaround is to not use > prepared statements in these situations, but I need to be able to > guarantee that f only executes once (even if that means exploring > subqueries). I think a reproducible example would be good. Simple attempts to duplicate this on 8.0b2 have failed for me, unless I'm using order by.
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > now, if ps ends up using a index scan on t, everything is ok. However, > if ps does a seqscan, f executes for every row on t examined until the > [expr] criteria is met. Is this a bug? Works for me. regression=# create function f(int) returns int as ' regression'# begin regression'# raise notice ''f(%)'', $1; regression'# return $1; regression'# end' language plpgsql; CREATE FUNCTION regression=# select f(unique2) from tenk1 where unique2%2 = 1 limit 2; NOTICE: f(1) NOTICE: f(3) f --- 1 3 (2 rows) regression=# prepare ps as regression-# select f(unique2) from tenk1 where unique2%2 = 1 limit 2; PREPARE regression=# execute ps; NOTICE: f(1) NOTICE: f(3) f --- 1 3 (2 rows) regression=# You sure you aren't using f() in the WHERE clause? regards, tom lane
Stephan Szabo wrote: > On Thu, 30 Sep 2004, Merlin Moncure wrote: > > > OK, I have a situation that might be a performance problem, a bug, or an > > unavoidable consequence of using prepared statements. The short version > > is that I am getting function executions for rows not returned in a > > result set when they are in a prepared statement. > An actual boolean expr on t? Or on a column in t? [...] > I think a reproducible example would be good. Simple attempts to duplicate > this on 8.0b2 have failed for me, unless I'm using order by. Note: I confirmed that breaking out the 'where' part of the query into subquery suppresses the behavior. Here is the actual query: select lock_cuid(id), * from data3.wclaim_line_file where wcl_vin_no >= '32-MHAB-C-X-7243' and (wcl_vin_no > '32-MHAB-C-X-7243' or wcl_claim_no >= 001) and (wcl_vin_no > '32-MHAB-C-X-7243' or wcl_claim_no > 001 or id > 2671212) order by wcl_vin_no, wcl_claim_no, id limit 1 Here is the prepared statement declaration: prepare data3_read_next_wclaim_line_file_1_lock (character varying, numeric, int8, numeric) as select lock_cuid(id), * from data3.wclaim_line_file where wcl_vin_no >= $1 and (wcl_vin_no > $1 or wcl_claim_no >= $2) and (wcl_vin_no > $1 or wcl_claim_no > $2 or id > $3) order by wcl_vin_no, wcl_claim_no, id limit $4 Here is the plan when it runs lock_cuid repeatedly (aside: disabling seqscans causes an index plan, but that's not the point): esp=# explain execute data3_read_next_wclaim_line_file_1_lock ('32-MHAB-C-X-7243', 001, 2671212, 1); QUERY PLAN ------------------------------------------------------------------------ ---------------------------- ------------------------------------------------------------------------ ---------------------------- -------------------------------- Limit (cost=13108.95..13162.93 rows=21592 width=260) -> Sort (cost=13108.95..13162.93 rows=21592 width=260) Sort Key: wcl_vin_no, wcl_claim_no, id -> Seq Scan on wclaim_line_file (cost=0.00..11554.52 rows=21592 width=260) Filter: (((wcl_vin_no)::text >= ($1)::text) AND (((wcl_vin_no)::text > ($1)::text) OR ((wcl_claim_no)::numeric >= $2)) AND (((wcl_vin_no)::text > ($1)::text) OR ((wcl_claim_no)::numeric > $2) OR ((id)::bigint > $3))) (5 rows)
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > Here is the actual query: > select lock_cuid(id), * > ... > order by wcl_vin_no, wcl_claim_no, id > limit 1 Looks like Stephan made the right guess. Logically the LIMIT executes after the ORDER BY, so the sorted result has to be formed completely. The fact that we are able to optimize this in some cases does not represent a promise that we can do it in all cases. Ergo, it's not a bug. regards, tom lane