Thread: BUG #15122: can't import data if table has a constraint with afunction calling another function

The following bug has been logged on the website:

Bug reference:      15122
Logged by:          Asier Lostalé
Email address:      asier.lostale@openbravo.com
PostgreSQL version: 9.3.22
Operating system:   ubuntu
Description:

Using only public schema, data in tables that use in check constraints
functions that invoke other functions does not get imported with pg_restore
after it was dumped with pg_dump. But if functions in check constraints do
not invoke other functions, data is correctly imported.

I have noted this behavior after minor upgrading from 9.3.22 to 9.3.23 and
from 9.4.16 to 9.4.17; in 9.3.22 and 9.3.17 it worked fine.

I understand is due to changes in search_path
(https://bucardo.org/postgres_all_versions.html#version_9.3.22).

But it's unclear to me why having one level public functions is allowed but
it is not those functions to invoke other ones. It looks inconsistent.

For example, having this structure and data:

CREATE OR REPLACE FUNCTION is_even(n integer) RETURNS boolean AS $BODY$ 
BEGIN
  return n%2 = 0;  
END ; $BODY$  LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION is_even_positive(n integer) RETURNS boolean AS
$BODY$ 
BEGIN
  return is_even(n) and n > 0;  
END ; $BODY$  LANGUAGE plpgsql;


CREATE TABLE test_check (
  n integer
  CONSTRAINT even_chk CHECK (is_even(n)));

CREATE TABLE test_check2(
  n integer
  CONSTRAINT even_positive_chk CHECK (is_even_positive(n)));

insert into test_check values (2);
insert into test_check values (-2);
insert into test_check2 values (2);
Exporting it with:

pg_dump -h localhost -p 5432  -F c -b -v -f test.dmp test -U test
And importing it in a new database:

$ pg_restore -d test2 -U test -v test.dmp -h localhost
pg_restore: connecting to database for restore
pg_restore: creating SCHEMA "public"
pg_restore: creating COMMENT "SCHEMA public"
pg_restore: creating EXTENSION "plpgsql"
pg_restore: creating COMMENT "EXTENSION plpgsql"
pg_restore: creating FUNCTION "public.is_even(integer)"
pg_restore: creating FUNCTION "public.is_even_positive(integer)"
pg_restore: creating TABLE "public.test_check"
pg_restore: creating TABLE "public.test_check2"
pg_restore: processing data for table "public.test_check"
pg_restore: processing data for table "public.test_check2"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2035; 0 7784774 TABLE DATA
test_check2 tad
pg_restore: [archiver (db)] COPY failed for table "test_check2": ERROR:
function is_even(integer) does not exist
LINE 1: SELECT is_even(n) and n > 0
               ^
HINT:  No function matches the given name and argument types. You might need
to add explicit type casts.
QUERY:  SELECT is_even(n) and n > 0
CONTEXT:  PL/pgSQL function public.is_even_positive(integer) line 3 at
RETURN
COPY test_check2, line 1: "2"
pg_restore: creating ACL "SCHEMA public"
WARNING: errors ignored on restore: 1





>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> Using only public schema, data in tables that use in check
 PG> constraints functions that invoke other functions does not get
 PG> imported with pg_restore after it was dumped with pg_dump. But if
 PG> functions in check constraints do not invoke other functions, data
 PG> is correctly imported.

 PG> I have noted this behavior after minor upgrading from 9.3.22 to
 PG> 9.3.23 and from 9.4.16 to 9.4.17; in 9.3.22 and 9.3.17 it worked
 PG> fine.

This is definitely fallout from the security fixes related to
search_path.

 PG> But it's unclear to me why having one level public functions is
 PG> allowed but it is not those functions to invoke other ones. It
 PG> looks inconsistent.

Yes, the reason for this is that function bodies are for the most part
treated as opaque strings everywhere except when actually executing the
function. This means that when a function makes a non-schema-qualified
reference to another function in its body, the search_path lookup is
performed at runtime, and so it depends on the runtime setting of
search_path.

In contrast, a CHECK constraint stores a pre-parsed expression tree
which refers to the function by its oid, not name, so that when pg_dump
dumps it out as SQL it can use a schema-qualified name in the output.

Since pg_dump now does the restore with only pg_catalog in the
search_path, the first function is successfully called because it is
schema-qualified in the CHECK constraint definition, but the second
function is not found because it is referenced only by an unqualified
name. You could do this:

 PG>   return is_even(n) and n > 0;  

  return public.is_even(n) and n > 0;

-- 
Andrew (irc:RhodiumToad)


Thanks Andrew for your quick response and clear explanation.

Can I understand from your explanation this is not considered as a bug?

Although the adding a qualified reference workarounds the problem, it forces to write pl code that is aware of the schema it is going to be imported in. How could I write this code to be schema agnostic, so I can import it in any schema without modifying it?

On Tue, Mar 20, 2018 at 11:13 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> Using only public schema, data in tables that use in check
 PG> constraints functions that invoke other functions does not get
 PG> imported with pg_restore after it was dumped with pg_dump. But if
 PG> functions in check constraints do not invoke other functions, data
 PG> is correctly imported.

 PG> I have noted this behavior after minor upgrading from 9.3.22 to
 PG> 9.3.23 and from 9.4.16 to 9.4.17; in 9.3.22 and 9.3.17 it worked
 PG> fine.

This is definitely fallout from the security fixes related to
search_path.

 PG> But it's unclear to me why having one level public functions is
 PG> allowed but it is not those functions to invoke other ones. It
 PG> looks inconsistent.

Yes, the reason for this is that function bodies are for the most part
treated as opaque strings everywhere except when actually executing the
function. This means that when a function makes a non-schema-qualified
reference to another function in its body, the search_path lookup is
performed at runtime, and so it depends on the runtime setting of
search_path.

In contrast, a CHECK constraint stores a pre-parsed expression tree
which refers to the function by its oid, not name, so that when pg_dump
dumps it out as SQL it can use a schema-qualified name in the output.

Since pg_dump now does the restore with only pg_catalog in the
search_path, the first function is successfully called because it is
schema-qualified in the CHECK constraint definition, but the second
function is not found because it is referenced only by an unqualified
name. You could do this:

 PG>   return is_even(n) and n > 0;

  return public.is_even(n) and n > 0;

--
Andrew (irc:RhodiumToad)

>>>>> "Asier" == Asier Lostalé <asier.lostale@openbravo.com> writes:

 Asier> Thanks Andrew for your quick response and clear explanation.

 Asier> Can I understand from your explanation this is not considered as
 Asier> a bug?

I would call it a misfeature rather than a bug.

 Asier> Although the adding a qualified reference workarounds the
 Asier> problem, it forces to write pl code that is aware of the schema
 Asier> it is going to be imported in. How could I write this code to be
 Asier> schema agnostic, so I can import it in any schema without
 Asier> modifying it?

For plpgsql (and other pl/* languages, but not LANGUAGE SQL) the best
way is probably to do this:

SET search_path = public;   -- or whatever schema

CREATE OR REPLACE FUNCTION is_even_positive(integer)
  RETURNS boolean
  LANGUAGE plpgsql
  IMMUTABLE
  SET SEARCH_PATH FROM CURRENT    -- ** this is the important bit
  AS $$
    begin
        return is_even($1) and $1 >= 0;
    end;
$$;

Some caveats:

1) The default search_path is "$user",public. Using SET SEARCH_PATH FROM
   CURRENT doesn't interact well with this (arguably this part _is_ a
   bug), so either ensure that the search_path is set to something that
   doesn't exclude $user, or (if you need something that works in a
   script) you can canonicalize it first using this query:

   SELECT set_config('search_path',
                     string_agg(quote_ident(s),','),
                     false)  -- change to true for equivalent of SET LOCAL
     FROM unnest(current_schemas(false)) s;

2) This doesn't work well for LANGUAGE SQL functions since it would
   block inlining, which is usually the primary reason for using
   LANGUAGE SQL in the first place. I don't know of any good workaround
   for those except to explicitly use the schema in the function body
   (possibly via text substitution).

--
Andrew (irc:RhodiumToad)


On Tue, Mar 20, 2018 at 12:11:45PM +0000, Andrew Gierth wrote:
> >>>>> "Asier" == Asier Lostalé <asier.lostale@openbravo.com> writes:
> 
>  Asier> Thanks Andrew for your quick response and clear explanation.
> 
>  Asier> Can I understand from your explanation this is not considered as
>  Asier> a bug?
> 
> I would call it a misfeature rather than a bug.
> 
>  Asier> Although the adding a qualified reference workarounds the
>  Asier> problem, it forces to write pl code that is aware of the schema
>  Asier> it is going to be imported in. How could I write this code to be
>  Asier> schema agnostic, so I can import it in any schema without
>  Asier> modifying it?
> 
> For plpgsql (and other pl/* languages, but not LANGUAGE SQL) the best
> way is probably to do this:
> 
> SET search_path = public;   -- or whatever schema
> 
> CREATE OR REPLACE FUNCTION is_even_positive(integer)
>   RETURNS boolean
>   LANGUAGE plpgsql
>   IMMUTABLE
>   SET SEARCH_PATH FROM CURRENT    -- ** this is the important bit
>   AS $$
>     begin
>         return is_even($1) and $1 >= 0;
>     end;
> $$;
> 
> Some caveats:
> 
> 1) The default search_path is "$user",public. Using SET SEARCH_PATH FROM
>    CURRENT doesn't interact well with this (arguably this part _is_ a
>    bug), so either ensure that the search_path is set to something that
>    doesn't exclude $user, or (if you need something that works in a
>    script) you can canonicalize it first using this query:
> 
>    SELECT set_config('search_path',
>                      string_agg(quote_ident(s),','),
>                      false)  -- change to true for equivalent of SET LOCAL
>      FROM unnest(current_schemas(false)) s;
> 
> 2) This doesn't work well for LANGUAGE SQL functions since it would
>    block inlining, which is usually the primary reason for using
>    LANGUAGE SQL in the first place. I don't know of any good workaround
>    for those except to explicitly use the schema in the function body
>    (possibly via text substitution).

[Hackers email list added.]

Uh, you might want to look at this thread too:


https://www.postgresql.org/message-id/flat/152106914669.1223.5104148605998271987%40wrigleys.postgresql.org#152106914669.1223.5104148605998271987@wrigleys.postgresql.org

In that thread there is discussion of how function body extensions can
reference other extension objects when the function doesn't know what
schema it is being installed in.  It uses SQL functions in an index, and
it is inlining that is causing the restore failure.

I had forgotten that SET SEARCH_PATH FROM CURRENT can be used during
function creation, but that forces the function to trust the schema it
is being installed in, which we don't want because it opens us up to the
same problems the removal of public was trying to avoid.

So, the crux of the problem is how do you install a function in an
non-predefined schema, e.g. public, and reference other extension
objects in a safe and transparent way.  This is true of non-extension
objects as well if they are assuming they can reference other objects
that were created earlier, and the schema is not predefined.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


On Tue, Mar 20, 2018 at 12:11:45PM +0000, Andrew Gierth wrote:
> >>>>> "Asier" == Asier Lostalé <asier.lostale@openbravo.com> writes:
> 
>  Asier> Thanks Andrew for your quick response and clear explanation.
> 
>  Asier> Can I understand from your explanation this is not considered as
>  Asier> a bug?
> 
> I would call it a misfeature rather than a bug.
> 
>  Asier> Although the adding a qualified reference workarounds the
>  Asier> problem, it forces to write pl code that is aware of the schema
>  Asier> it is going to be imported in. How could I write this code to be
>  Asier> schema agnostic, so I can import it in any schema without
>  Asier> modifying it?
> 
> For plpgsql (and other pl/* languages, but not LANGUAGE SQL) the best
> way is probably to do this:
> 
> SET search_path = public;   -- or whatever schema
> 
> CREATE OR REPLACE FUNCTION is_even_positive(integer)
>   RETURNS boolean
>   LANGUAGE plpgsql
>   IMMUTABLE
>   SET SEARCH_PATH FROM CURRENT    -- ** this is the important bit
>   AS $$
>     begin
>         return is_even($1) and $1 >= 0;
>     end;
> $$;
> 
> Some caveats:
> 
> 1) The default search_path is "$user",public. Using SET SEARCH_PATH FROM
>    CURRENT doesn't interact well with this (arguably this part _is_ a
>    bug), so either ensure that the search_path is set to something that
>    doesn't exclude $user, or (if you need something that works in a
>    script) you can canonicalize it first using this query:
> 
>    SELECT set_config('search_path',
>                      string_agg(quote_ident(s),','),
>                      false)  -- change to true for equivalent of SET LOCAL
>      FROM unnest(current_schemas(false)) s;
> 
> 2) This doesn't work well for LANGUAGE SQL functions since it would
>    block inlining, which is usually the primary reason for using
>    LANGUAGE SQL in the first place. I don't know of any good workaround
>    for those except to explicitly use the schema in the function body
>    (possibly via text substitution).

[Hackers email list added.]

Uh, you might want to look at this thread too:


https://www.postgresql.org/message-id/flat/152106914669.1223.5104148605998271987%40wrigleys.postgresql.org#152106914669.1223.5104148605998271987@wrigleys.postgresql.org

In that thread there is discussion of how function body extensions can
reference other extension objects when the function doesn't know what
schema it is being installed in.  It uses SQL functions in an index, and
it is inlining that is causing the restore failure.

I had forgotten that SET SEARCH_PATH FROM CURRENT can be used during
function creation, but that forces the function to trust the schema it
is being installed in, which we don't want because it opens us up to the
same problems the removal of public was trying to avoid.

So, the crux of the problem is how do you install a function in an
non-predefined schema, e.g. public, and reference other extension
objects in a safe and transparent way.  This is true of non-extension
objects as well if they are assuming they can reference other objects
that were created earlier, and the schema is not predefined.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +