Thread: [HACKERS] Bug in prepared statement cache invalidation?

[HACKERS] Bug in prepared statement cache invalidation?

From
Konstantin Knizhnik
Date:
Hi hackers,

I find out that now Postgres correctly invalidates prepared plans which 
directly depend on altered relation, but doesn't invalidate plans having 
transitive (indirect) dependencies.
Is it a bug or feature?

postgres=# create table foo(x integer);
CREATE TABLE
postgres=# select * from foo; x
---
(0 rows)

postgres=# create function returnqueryf()returns setof foo  as $$ begin 
return query select * from foo; end; $$ language plpgsql;
CREATE FUNCTION
postgres=# select * from returnqueryf(); x
---
(0 rows)

postgres=# create function returnqueryff()returns setof foo  as $$ begin 
return query select * from returnqueryf(); end; $$ language plpgsql;
CREATE FUNCTION
postgres=# select * from returnqueryff(); x
---
(0 rows)

postgres=# alter table foo add column y integer;
ALTER TABLE
postgres=# select * from foo; x | y
---+---
(0 rows)

postgres=# select * from returnqueryf(); x | y
---+---
(0 rows)

postgres=# select * from returnqueryff();
ERROR:  structure of query does not match function result type
DETAIL:  Number of returned columns (1) does not match expected column 
count (2).
CONTEXT:  PL/pgSQL function returnqueryff() line 1 at RETURN QUERY
p

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Robert Haas
Date:
On Fri, Apr 28, 2017 at 3:01 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> I find out that now Postgres correctly invalidates prepared plans which
> directly depend on altered relation, but doesn't invalidate plans having
> transitive (indirect) dependencies.

I think the problem here is that replanning and recompiling are two
different things.  Here's a slightly different example:

rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create table bar (a text, b int);
CREATE TABLE
rhaas=# create function quux() returns void as $$declare f foo; begin
select * from foo into f limit 1; raise notice '%', f; end$$ language
plpgsql;
CREATE FUNCTION
rhaas=# insert into foo values (1, 'one');
INSERT 0 1
rhaas=# insert into bar values ('two', 2);
INSERT 0 1
rhaas=# select quux();
NOTICE:  (1,one)quux
------

(1 row)

rhaas=# begin;
BEGIN
rhaas=# alter table foo rename to snazzy;
ALTER TABLE
rhaas=# alter table bar rename to foo;
ALTER TABLE
rhaas=# select quux();
ERROR:  invalid input syntax for integer: "two"
CONTEXT:  PL/pgSQL function quux() line 1 at SQL statement

So what has happened here is that the query "SELECT * FROM foo" inside
of quux() got replanned, but the variable f declared by quux() as
being type foo did not get changed to match the new definition of type
foo.  The plancache is working just fine here; it correctly identifies
each statement that needs re-planning and performs that re-planning as
required.  The problem is that this only affects the statements
themselves, not other PL-created data structures like local variables.
The PL's cache of compiled functions is the issue, not the plancache.
I think that's also the problem in your test case.  Your test case
doesn't explicitly involve a variable but I think there must be
something like that under the hood.

This problem has been discussed before but nobody's done anything
about it.  The problem is a bit tricky because the core system doesn't
know anything about the function caches maintained by individual PLs.
I suppose ideally there'd be a way for a PL to say "if the definition
of X changes, please tell me to recompile function Y".  That probably
wouldn't be perfect because the PL might not be able to figure out
everything on which they actually depend; that might be
Turing-complete in some cases.  But even a partial solution would
probably be welcomed by users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Konstantin Knizhnik
Date:

On 01.05.2017 16:09, Robert Haas wrote:
>
> This problem has been discussed before but nobody's done anything
> about it.  The problem is a bit tricky because the core system doesn't
> know anything about the function caches maintained by individual PLs.
> I suppose ideally there'd be a way for a PL to say "if the definition
> of X changes, please tell me to recompile function Y".  That probably
> wouldn't be perfect because the PL might not be able to figure out
> everything on which they actually depend; that might be
> Turing-complete in some cases.  But even a partial solution would
> probably be welcomed by users.
>

Thank you for explanation.
May be I am missing something, but what is the problem with keeping 
dependencies for  PL functions?
As you wrote, PL can inform core that functions depends on some set of 
relations/types/functions and so has to be recompiled if some of them 
are changed.
It is not necessary to build closure of dependency graph - instead of it 
cascade invalidation can be used.
So it is not clear to me where you see here the source of complexity and 
why this task may be "Turing-complete in some cases"?

The problem can be with overloaded functions and PL languages without 
static type checking.
In this case  resolving has to be performed at runtime during function 
evaluation. But there should be no such problem with PLpgSQL.

But definitely introducing such dependency tracking mechanism for PL 
will require a lot of changes, in all PL implementations. Looks like no 
easy fix is possible here.
I am not sure how critical is this problem. Definitely it rarely 
happens, but lack of normal workarounds (restart backend, recreate 
function?) seems to be  disappointing.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Robert Haas
Date:
On Tue, May 2, 2017 at 6:07 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Thank you for explanation.
> May be I am missing something, but what is the problem with keeping
> dependencies for  PL functions?
> As you wrote, PL can inform core that functions depends on some set of
> relations/types/functions and so has to be recompiled if some of them are
> changed.
> It is not necessary to build closure of dependency graph - instead of it
> cascade invalidation can be used.
> So it is not clear to me where you see here the source of complexity and why
> this task may be "Turing-complete in some cases"?

Well, if the user's using PL/python or similar, they can do an
arbitrary computation and use the result as a table name.  There's no
way for the core system to know what table name that will end up
referencing.

> The problem can be with overloaded functions and PL languages without static
> type checking.
> In this case  resolving has to be performed at runtime during function
> evaluation. But there should be no such problem with PLpgSQL.

Yeah, maybe.  But what if the PL/pgsql function calls some other
function that creates a table, and then tries to access that table
from the original function, or something like that?  There are weird
cases, I think, where even in PL/pgSQL it's not easy to figure out for
sure what the dependencies are.

> I am not sure how critical is this problem. Definitely it rarely happens,
> but lack of normal workarounds (restart backend, recreate function?) seems
> to be  disappointing.

The problem goes away if you reconnect.  The problematic cache is only
backend-lifetime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Konstantin Knizhnik
Date:
On 05/02/2017 09:30 PM, Robert Haas wrote:
>> I am not sure how critical is this problem. Definitely it rarely happens,
>> but lack of normal workarounds (restart backend, recreate function?) seems
>> to be  disappointing.
> The problem goes away if you reconnect.  The problematic cache is only
> backend-lifetime.
>
Most of clients are not connected to the Postgres directly, them are using some kind of connection pooling.
It means that backends are never restarted. And it will be necessary to restart the whole service just because we do
nothave
 
dependency tracking mechanism for PL code. Even invalidation of all functions in case of DDL seems to be more
acceptablesolution.
 

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Robert Haas
Date:
On Tue, May 2, 2017 at 3:10 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> On 05/02/2017 09:30 PM, Robert Haas wrote:
>>> I am not sure how critical is this problem. Definitely it rarely happens,
>>> but lack of normal workarounds (restart backend, recreate function?)
>>> seems
>>> to be  disappointing.
>>
>> The problem goes away if you reconnect.  The problematic cache is only
>> backend-lifetime.
>>
> Most of clients are not connected to the Postgres directly, them are using
> some kind of connection pooling.
> It means that backends are never restarted. And it will be necessary to
> restart the whole service just because we do not have
> dependency tracking mechanism for PL code. Even invalidation of all
> functions in case of DDL seems to be more acceptable solution.

Yeah.  I think there should be a way to tell a PL to flush any
internal caches it is maintaining, some variant of DISCARD.  But that
would require a bunch of code that nobody's written yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Yeah.  I think there should be a way to tell a PL to flush any
> internal caches it is maintaining, some variant of DISCARD.  But that
> would require a bunch of code that nobody's written yet.

That mechanism already exists, so far as the core code is concerned:
register a syscache inval callback.  But you're right, getting plpgsql
to actually do anything about changes in composite types would require
a bunch of code that nobody's written yet.

If you'll pardon my bashing on a long-deceased horse, this would be
noticeably easier if we stopped using the PLPGSQL_DTYPE_ROW code
paths for composite-type variables.  That mechanism was really
designed for cases like "SELECT ... INTO a,b,c" where the row
contents are fully determined by the function text.  It's practically
impossible to make it cope with dynamic changes; at the very least
you have to recompile the whole function.
        regards, tom lane



Re: [HACKERS] Bug in prepared statement cache invalidation?

From
Robert Haas
Date:
On Tue, May 2, 2017 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yeah.  I think there should be a way to tell a PL to flush any
>> internal caches it is maintaining, some variant of DISCARD.  But that
>> would require a bunch of code that nobody's written yet.
>
> That mechanism already exists, so far as the core code is concerned:
> register a syscache inval callback.  But you're right, getting plpgsql
> to actually do anything about changes in composite types would require
> a bunch of code that nobody's written yet.

Well, that'd be a way of doing automatic invalidations, not manual
ones.  Making DISCARD PROCEDURAL LANGUAGE CRAP work would a different
pile of code.

> If you'll pardon my bashing on a long-deceased horse, this would be
> noticeably easier if we stopped using the PLPGSQL_DTYPE_ROW code
> paths for composite-type variables.  That mechanism was really
> designed for cases like "SELECT ... INTO a,b,c" where the row
> contents are fully determined by the function text.  It's practically
> impossible to make it cope with dynamic changes; at the very least
> you have to recompile the whole function.

I guess that's also a project in need of some round tuits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company