Thread: Bug with STABLE function using the wrong snapshot (probably during planning)
Bug with STABLE function using the wrong snapshot (probably during planning)
From
Matthijs Bomhoff
Date:
Hi, The bit of SQL below does not behave the way it should on postgres 8.4.4 (t= ested by me) and 9.0.3 (verified independently on #postgresql). The third statement in the quux() function calls the a_bar() function that = should find a single row in the 'bar' table and return its value. This sing= le row is INSERTed into the 'bar' table on the previous line. However, the = SELECT statement in the a_bar() function throws the following error: "ERROR= : query returned no rows". It thus appears not to see the INSERTed value i= n the 'bar' table. (The expected behavior is that the a_bar() function retu= rns the value 500 instead of throwing an error.) Removing the STABLE attribute from a_bar() works around the problem, as doe= s moving the "INSERT INTO bar ..." statement out of the quux() function and= executing it before calling the quux() function itself. Some initial debugging by RhodiumToad on #postgresql led to the following o= bservation: The error occurs only when the "SELECT ... WHERE i =3D a_bar();= " is being planned, not when it is being executed, with the snapshot being = used to plan the query apparently being too old to see the result of the pr= eceding insert. By the way: the EXECUTE around the SELECT in the a_bar() function is probab= ly not required to trigger the bug, but this is the version we tested. Regards, Matthijs Bomhoff BEGIN; CREATE TABLE foo(i INTEGER); CREATE TABLE bar(i INTEGER); CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$ DECLARE result INTEGER; BEGIN EXECUTE 'SELECT i FROM bar' INTO STRICT result; RETURN result; END $EOF$ LANGUAGE plpgsql STABLE; CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$ DECLARE result INTEGER; BEGIN INSERT INTO foo(i) SELECT s.a FROM generate_series(1,1000,1) s(a); INSERT INTO bar(i) VALUES(500); SELECT INTO STRICT result COUNT(*) FROM foo WHERE i =3D a_bar(); RETURN result; END $EOF$ LANGUAGE plpgsql; SELECT quux(); ROLLBACK;
Re: Bug with STABLE function using the wrong snapshot (probably during planning)
From
Noah Misch
Date:
Hi Matthijs, Thanks for the report. On Tue, Mar 22, 2011 at 04:31:47PM +0100, Matthijs Bomhoff wrote: > The bit of SQL below does not behave the way it should on postgres 8.4.4 (tested by me) and 9.0.3 (verified independentlyon #postgresql). On git master, too. > The third statement in the quux() function calls the a_bar() function that should find a single row in the 'bar' tableand return its value. This single row is INSERTed into the 'bar' table on the previous line. However, the SELECT statementin the a_bar() function throws the following error: "ERROR: query returned no rows". It thus appears not to seethe INSERTed value in the 'bar' table. (The expected behavior is that the a_bar() function returns the value 500 insteadof throwing an error.) > > Removing the STABLE attribute from a_bar() works around the problem, as does moving the "INSERT INTO bar ..." statementout of the quux() function and executing it before calling the quux() function itself. > > Some initial debugging by RhodiumToad on #postgresql led to the following observation: The error occurs only when the "SELECT... WHERE i = a_bar();" is being planned, not when it is being executed, with the snapshot being used to plan thequery apparently being too old to see the result of the preceding insert. Quite so. All the core procedural languages have _SPI_execute_plan() manage CommandCounterIncrement() and PushActiveSnapshot()/PopActiveSnapshot() for the SQL statements they execute. Many statements use a snapshot during planning, but _SPI_prepare_plan() never pushes one. Therefore, in this example, planning uses the snapshot pushed in PortalRunSelect(). Expressions evaluated at plan time miss any changes from earlier in the volatile function. This is fine when they merely give "wrong" answers: we might get an inferior selectivity estimate. In your example, a function that actually needs to see the latest data to avoid throwing an error, we do have a problem. The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new snapshot when analyze_requires_snapshot() returns true on the raw parse tree. That strategy can break down in the other direction if the caller is STABLE; consider this example: CREATE TABLE foo(i INTEGER); CREATE TABLE bar(i INTEGER); INSERT INTO foo(i) SELECT s.a FROM generate_series(1,2) s(a); INSERT INTO bar(i) VALUES(500); BEGIN; CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$ DECLARE result INTEGER; BEGIN EXECUTE 'SELECT i FROM bar' INTO STRICT result; RETURN result; END $EOF$ LANGUAGE plpgsql STABLE; CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$ BEGIN LOOP RAISE NOTICE 'iteration'; EXECUTE 'SELECT COUNT(*) FROM foo WHERE i = a_bar()'; PERFORM pg_sleep(3); END LOOP; END $EOF$ LANGUAGE plpgsql STABLE; SELECT quux(); -- concurrently: -- INSERT INTO bar VALUES (501); ROLLBACK; With the current code, the function call runs indefinitely. With the _SPI_prepare_plan() change, it fails during planning on the next iteration after the concurrent change. This seems less severe than the current bug, but it's still not great. We could preserve the behavior of that example by instead adding a "read_only" parameter to SPI_prepare* (or defining new functions with the parameter) and having that parameter control snapshot acquisition as it does for SPI_execute*. Opinions? Better ideas? > BEGIN; > > CREATE TABLE foo(i INTEGER); > CREATE TABLE bar(i INTEGER); > > CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$ > DECLARE > result INTEGER; > BEGIN > EXECUTE 'SELECT i FROM bar' INTO STRICT result; > RETURN result; > END > $EOF$ LANGUAGE plpgsql STABLE; > > CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$ > DECLARE > result INTEGER; > BEGIN > INSERT INTO foo(i) SELECT s.a FROM generate_series(1,1000,1) s(a); > INSERT INTO bar(i) VALUES(500); > SELECT INTO STRICT result COUNT(*) FROM foo WHERE i = a_bar(); > RETURN result; > END > $EOF$ LANGUAGE plpgsql; > > SELECT quux(); > > ROLLBACK;
Re: Re: Bug with STABLE function using the wrong snapshot (probably during planning)
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: >> Some initial debugging by RhodiumToad on #postgresql led to the following observation: The error occurs only when the"SELECT ... WHERE i = a_bar();" is being planned, not when it is being executed, with the snapshot being used to planthe query apparently being too old to see the result of the preceding insert. > The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new > snapshot when analyze_requires_snapshot() returns true on the raw parse tree. > That strategy can break down in the other direction if the caller is STABLE; > consider this example: Yes. I'm of the opinion that we should not change this. In general, marking functions STABLE that have major side effects (such as throwing errors) is not a good idea, and putting such things into WHERE clauses is a worse one. We explicitly do not guarantee anything about timing or order of evaluation in WHERE clauses, because to do so would cripple the planner's ability to optimize them. So I think this is a "don't do that" case rather than "we should try to make planning happen with the same snapshot that will be used at execution" case. regards, tom lane
Re: Re: Bug with STABLE function using the wrong snapshot (probably during planning)
From
Matthijs Bomhoff
Date:
On May 13, 2011, at 12:46 AM, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: >>> Some initial debugging by RhodiumToad on #postgresql led to the followi= ng observation: The error occurs only when the "SELECT ... WHERE i =3D a_ba= r();" is being planned, not when it is being executed, with the snapshot be= ing used to plan the query apparently being too old to see the result of th= e preceding insert. >=20 >> The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new >> snapshot when analyze_requires_snapshot() returns true on the raw parse = tree. >> That strategy can break down in the other direction if the caller is STA= BLE; >> consider this example: >=20 > Yes. I'm of the opinion that we should not change this. In general, > marking functions STABLE that have major side effects (such as throwing > errors) is not a good idea, and putting such things into WHERE clauses > is a worse one. We explicitly do not guarantee anything about timing or > order of evaluation in WHERE clauses, because to do so would cripple the > planner's ability to optimize them. So I think this is a "don't do > that" case rather than "we should try to make planning happen with the > same snapshot that will be used at execution" case. First of all, thanks to both of you for your replies to my email; at least = now I understand a bit more of what went wrong. As I know almost nothing about the internals, I am not one to argue about w= hether or not to change the current behavior. It would however perhaps be n= ice to have the documentation of the volatility categories mention explicit= ly that throwing an error is also considered a (major) side-effect. In part= icular: apparently not only are modifications to the database itself consid= ered side-effects, but so is "irregular" control flow within a procedural l= anguage... Based on the current documentation, I thought that as my functio= n made no changes to the database and returns the same results given the sa= me arguments for all rows within a single statement, it could safely be mar= ked as STABLE. Regards, Matthijs