Thread: Function search_path

Function search_path

From
"Heinemann, Manfred (IMS)"
Date:

We recently started upgrading to Postgres 10.3 and have run into trouble with the security change to search_path.

 

The release notes say:

In cases where user-provided functions are indirectly executed by these programs — for example, user-provided functions in index expressions — the tighter search_path may result in errors, which will need to be corrected by adjusting those user-provided functions to not assume anything about what search path they are invoked under. That has always been good practice, but now it will be necessary for correct behavior. (CVE-2018-1058)

 

We have this issue with a custom function that calls other custom functions and that function is used in a functional index. Now autoanalyze and autovacuum fail on tables with those indices.

 

The simplest fix seemed to be to set the functions search_path to ‘$user’ but that has caused large updates to the indexed table to run out of memory.

 

Hardcoding the schema in the function would mean having to have separate functions for each schema affected. It also seems to make the function less memory efficient.

 

Are there other options?

 

Thanks,

Manfred




Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.

Re: Function search_path

From
Fabio Pardi
Date:
Hi Manfred,


While at the moment I m not aware of other options (I m right now studying CVE-2018-1058 and the impact on the
installationsI maintain), I would focus on the 'out of memory' error.
 

If Postgres is correctly configured, no action you perform on the database should result in 'out of memory' errors. 

Is it maybe the case to review your memory settings to avoid such conditions?


Regards,

Fabio 


On 03/14/2018 08:18 PM, Heinemann, Manfred (IMS) wrote:
> We recently started upgrading to Postgres 10.3 and have run into trouble with the security change to search_path.
> 
>  
> 
> The release notes say:
> 
> In cases where user-provided functions are indirectly executed by these programs — for example, user-provided
functionsin index expressions — the tighter |search_path| may result in errors, which will need to be corrected by
adjustingthose user-provided functions to not assume anything about what search path they are invoked under. That has
alwaysbeen good practice, but now it will be necessary for correct behavior. (CVE-2018-1058)
 
> 
>  
> 
> We have this issue with a custom function that calls other custom functions and that function is used in a functional
index.Now autoanalyze and autovacuum fail on tables with those indices.
 
> 
>  
> 
> The simplest fix seemed to be to set the functions search_path to ‘$user’ but that has caused large updates to the
indexedtable to run out of memory.
 
> 
>  
> 
> Hardcoding the schema in the function would mean having to have separate functions for each schema affected. It also
seemsto make the function less memory efficient.
 
> 
>  
> 
> Are there other options?
> 
>  
> 
> Thanks,
> 
> Manfred
> 
> 
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
> Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error.
 

RE: Function search_path

From
"Heinemann, Manfred (IMS)"
Date:
I have played around with the postgres memory settings and setting search_path on a function causes a lot more memory
tobe used than if no search_path was set. 

Has anyone hit issues with system users not being able to autoanalyze or autovacuum tables that have functional indices
becauseof the search_path change? 

Thanks,
Manfred

>-----Original Message-----
>From: Fabio Pardi [mailto:f.pardi@portavita.eu]
>Sent: Thursday, March 15, 2018 5:30 AM
>To: Heinemann, Manfred (IMS) <HeinemannM@imsweb.com>; pgsql-admin@lists.postgresql.org
>Subject: Re: Function search_path
>
>Hi Manfred,
>
>
>While at the moment I m not aware of other options (I m right now studying CVE-2018-1058 and the impact on the
installationsI maintain), I would focus on the 'out of memory' error. 
>
>If Postgres is correctly configured, no action you perform on the database should result in 'out of memory' errors.
>
>Is it maybe the case to review your memory settings to avoid such conditions?
>
>
>Regards,
>
>Fabio
>
>
>On 03/14/2018 08:18 PM, Heinemann, Manfred (IMS) wrote:
>> We recently started upgrading to Postgres 10.3 and have run into trouble with the security change to search_path.
>>
>>
>>
>> The release notes say:
>>
>> In cases where user-provided functions are indirectly executed by these programs - for example, user-provided
functionsin index expressions - the tighter |search_path| may result in errors, which will need to be corrected by
adjustingthose user-provided functions to not assume anything about what search path they are invoked under. That has
alwaysbeen good practice, but now it will be necessary for correct behavior. (CVE-2018-1058) 
>>
>>
>>
>> We have this issue with a custom function that calls other custom functions and that function is used in a
functionalindex. Now autoanalyze and autovacuum fail on tables with those indices. 
>>
>>
>>
>> The simplest fix seemed to be to set the functions search_path to '$user' but that has caused large updates to the
indexedtable to run out of memory. 
>>
>>
>>
>> Hardcoding the schema in the function would mean having to have separate functions for each schema affected. It also
seemsto make the function less memory efficient. 
>>
>>
>>
>> Are there other options?
>>
>>
>>
>> Thanks,
>>
>> Manfred
>>
>>
>>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you
arenot the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution,
orcopying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 

________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 


Re: Function search_path

From
Tom Lane
Date:
"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
> I have played around with the postgres memory settings and setting search_path on a function causes a lot more memory
tobe used than if no search_path was set. 

That's a pretty broad claim with a pretty small amount of evidence
offered.

I can certainly believe that attaching a SET clause (whether for
search_path or any other GUC variable) would have an efficiency impact;
one non-obvious example is that it prevents inlining if the function is
a SQL function.  But I don't immediately see a reason for major memory
consumption from that.  I suspect what you're seeing is specific to a
particular use-case.  If you were to provide a concrete example, we
could look into what's happening.

            regards, tom lane


RE: Function search_path

From
"Heinemann, Manfred (IMS)"
Date:
>> I have played around with the postgres memory settings and setting search_path on a function causes a lot more
memoryto be used than if no search_path was set. 
>
>That's a pretty broad claim with a pretty small amount of evidence offered.
>
>I can certainly believe that attaching a SET clause (whether for search_path or any other GUC variable) would have an
efficiencyimpact; one non-obvious example is that it prevents inlining if the function is a SQL function.  But I don't
immediatelysee a reason for major memory consumption from that.  I suspect what you're seeing is specific to a
particularuse-case.  If you were to provide a concrete example, we could look into what's happening. 
>
>regards, tom lane

Here is an example where I can show significant extra memory consumption when setting search_path on a function:

CREATE TABLE test_search_path(date_last_modified timestamp, last_name varchar);

--populate 1,000,000 rows with random values from 1,000 surnames for 'SURNAME'
INSERT INTO test_search_path(date_last_modified, last_name) VALUES(clock_timestamp(), 'SURNAME');

CREATE OR REPLACE FUNCTION clean_name_upper(original_name varchar)
  RETURNS varchar
  LANGUAGE plpgsql
AS $$
BEGIN
  RETURN trim(upper(original_name));
END;
$$ IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION split_name_on_hyphen(original_name character varying)
  RETURNS text[]
  LANGUAGE plpgsql
AS $$
BEGIN
    RETURN string_to_array(clean_name_upper(original_name), '-');
END;
$$ IMMUTABLE STRICT SET search_path = '$user';

CREATE INDEX idx_test_search_path_clean_name_upper_last_name ON test_search_path (clean_name_upper(last_name));
CREATE INDEX idx_test_search_path_split_name_on_hyphen_last_name ON test_search_path USING
gin(split_name_on_hyphen(last_name));

UPDATE test_search_path SET date_last_modified = (date_last_modified - interval '7 days');

This only seems to be an issue when indices exist for both functions and when the inner called function does not have
search_pathset. 

Thanks,
Manfred

________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 


Re: Function search_path

From
Tom Lane
Date:
"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
> Here is an example where I can show significant extra memory consumption when setting search_path on a function:

I got around to testing this example today, and I don't see what you're
seeing --- the memory consumption seems stable, and about the same with
or without the "SET search_path" clause.

You didn't specify exactly how to do this bit:

--populate 1,000,000 rows with random values from 1,000 surnames for 'SURNAME'

so I did it like this:

INSERT INTO test_search_path(date_last_modified, last_name)
select clock_timestamp(), (random()*1000)::int::text
from generate_series(1,1000000);

The example doesn't seem like it'd be terribly data-dependent, but
maybe that's wrong?

Also, I assume you're running a reasonably up-to-date PG release,
otherwise I'd be wondering about leaks in the GIN index AM; we've
fixed some issues of that sort in GIN bulk updates in the past.
But again, it's not clear what the connection to a function SET
clause would be.  So I lack a plausible theory at the moment.

            regards, tom lane


RE: Function search_path

From
"Heinemann, Manfred (IMS)"
Date:
>> Here is an example where I can show significant extra memory consumption when setting search_path on a function:
>
>I got around to testing this example today, and I don't see what you're seeing --- the memory consumption seems
stable,and about the same with or without the "SET search_path" clause. 
>
>You didn't specify exactly how to do this bit:
>
>--populate 1,000,000 rows with random values from 1,000 surnames for 'SURNAME'
>
>so I did it like this:
>
>INSERT INTO test_search_path(date_last_modified, last_name) select clock_timestamp(), (random()*1000)::int::text from
generate_series(1,1000000);
>
>The example doesn't seem like it'd be terribly data-dependent, but maybe that's wrong?
>
>Also, I assume you're running a reasonably up-to-date PG release, otherwise I'd be wondering about leaks in the GIN
indexAM; we've fixed some issues of that sort in GIN bulk updates in the past. 
>But again, it's not clear what the connection to a function SET clause would be.  So I lack a plausible theory at the
moment.

Tom,
Thanks for your looking at this.

I tested this well on 9.6.7 but I saw the same issue on 10.3.
You are correct the issue is not data dependant and I saw the same issues with the random numeric inserts you used.

On a server with a lot of memory(100GB) and using the postgres default memory settings, the update was able to complete
butwhen watching top there was a clear spike in %MEM for the update when the function search_path was set. 
The issue becomes more noticeable as you add more rows, for example at 5,000,000 %MEM went over 35. Without the
search_pathset %MEM stays real low and the update was a lot quicker. 

Here is the more complete test case:

CREATE SCHEMA test_search_path AUTHORIZATION test_search_path;

--then logged in as the test_search_path user

CREATE TABLE test_search_path(date_last_modified timestamp, last_name varchar);

INSERT INTO test_search_path(date_last_modified, last_name) select clock_timestamp(), (random()*1000)::int::text from
generate_series(1,1000000);

CREATE OR REPLACE FUNCTION clean_name_upper(original_name varchar)
  RETURNS varchar
  LANGUAGE plpgsql
AS $$
BEGIN
  RETURN trim(upper(original_name));
END;
$$ IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION split_name_on_hyphen(original_name character varying)
  RETURNS text[]
  LANGUAGE plpgsql
AS $$
BEGIN
    RETURN string_to_array(clean_name_upper(original_name), '-');
END;
$$ IMMUTABLE STRICT SET search_path = '$user';

CREATE INDEX idx_test_search_path_clean_name_upper_last_name ON test_search_path (clean_name_upper(last_name));
CREATE INDEX idx_test_search_path_split_name_on_hyphen_last_name ON test_search_path USING
gin(split_name_on_hyphen(last_name));

UPDATE test_search_path SET date_last_modified = (date_last_modified - interval '7 days');

There seems to be something about the inner function (clean_name_upper) being used in an index of its own on the same
fieldas the GIN functional index (split_name_on_hyphen) that is causing this. 

Thanks,
Manfred

________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error.