Thread: Two different methods of sneaking non-immutable data into an index

Two different methods of sneaking non-immutable data into an index

From
Merlin Moncure
Date:
While chatting with Haas off-list regarding how the new array/string
functions should work (see the thread in its glory here:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg148865.html)
the debate  morphed into the relative pros and cons about the proposed
concat() being marked stable vs immutable.  I did some checking into
how things work now, and found some surprising cases.

*) No dependency check on user definable casts:
postgres=# create function hackdate(date) returns text as $$ select
'casted!'::text; $$ language sql;
CREATE FUNCTION
postgres=# create cast (date as text) with function hackdate(date);
CREATE CAST
postgres=# select now()::date || 'abc'::text;  -- you're right!?column?
------------casted!abc
postgres=# create table vtest(a date, b text);
CREATE TABLE
postgres=# create unique index vtest_idx on vtest((a || b));
CREATE INDEX
postgres=# insert into vtest values (now(), 'test');
INSERT 0 1
postgres=# insert into vtest values (now(), 'test'); -- should fail
ERROR:  duplicate key value violates unique constraint "vtest_idx"
DETAIL:  Key ((a || b))=(casted!test) already exists.
postgres=# drop cast (date as text);
DROP CAST
postgres=# insert into vtest values (now(), 'test');
INSERT 0 1
postgres=# select * from vtest;   a      |  b
------------+------2010-08-01 | test2010-08-01 | test
(2 rows)

*) textanycat is defined immutable and shouldn't be:
create table vtest(a date, b text);
create unique index vtest_idx on vtest((a::text || b)); -- fails on
immutable check
create unique index vtest_idx on vtest((a || b)); -- works??
insert into vtest values (now(), 'test');
set datestyle to 'SQL, DMY';
insert into vtest values (now(), 'test');
postgres=# select * from vtest;date   a      |  b
------------+------31/07/2010 | test31/07/2010 | test
(2 rows)
postgres=# select * from vtest where a|| b = now()::date || 'test';   a      |  b
------------+------31/07/2010 | test
(1 row)

*) also, isn't it possible to change text cast influencing GUCs 'n'
times per statement considering any query can call a function and any
function can say, change datestyle?  Shouldn't the related functions
be marked 'volatile', not stable?

merlin


Re: Two different methods of sneaking non-immutable data into an index

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> While chatting with Haas off-list regarding how the new array/string
> functions should work (see the thread in its glory here:
> http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg148865.html)
> the debate  morphed into the relative pros and cons about the proposed
> concat() being marked stable vs immutable.  I did some checking into
> how things work now, and found some surprising cases.

Er ... "now" being defined as what?  I can't replicate your results in
HEAD.  In particular, textanycat isn't immutable anymore.

The DROP CAST case is a bit interesting.  We don't record a dependency
on the cast as such, but on the underlying function --- if you'd tried
to drop the function you'd not have been allowed to.  It is a bit
peculiar that dropping the cast causes the meaning of a::text to change,
but I'm not sure there's much we can do about that.  In any case, it
seems like that's not nearly as much of a hazard as doing CREATE OR
REPLACE FUNCTION and changing the computation done by the function.
We could disallow that maybe, but that cure seems worse than the
disease.
        regards, tom lane


Re: Two different methods of sneaking non-immutable data into an index

From
Merlin Moncure
Date:
On Wed, Aug 4, 2010 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> While chatting with Haas off-list regarding how the new array/string
>> functions should work (see the thread in its glory here:
>> http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg148865.html)
>> the debate  morphed into the relative pros and cons about the proposed
>> concat() being marked stable vs immutable.  I did some checking into
>> how things work now, and found some surprising cases.
>
> Er ... "now" being defined as what?  I can't replicate your results in
> HEAD.  In particular, textanycat isn't immutable anymore.

ah, my mistake.  I'm using a fairly old build of 9.0.

> The DROP CAST case is a bit interesting.  We don't record a dependency
> on the cast as such, but on the underlying function --- if you'd tried
> to drop the function you'd not have been allowed to.  It is a bit
> peculiar that dropping the cast causes the meaning of a::text to change,
> but I'm not sure there's much we can do about that.  In any case, it
> seems like that's not nearly as much of a hazard as doing CREATE OR
> REPLACE FUNCTION and changing the computation done by the function.
> We could disallow that maybe, but that cure seems worse than the
> disease.

yep (the textanycat was much more interesting to me)

merlin


Re: Two different methods of sneaking non-immutable data into an index

From
Robert Haas
Date:
On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> *) also, isn't it possible to change text cast influencing GUCs 'n'
> times per statement considering any query can call a function and any
> function can say, change datestyle?  Shouldn't the related functions
> be marked 'volatile', not stable?

This is just evil.  It seems to me that we might want to instead
prevent functions from changing things for their callers, or
postponing any such changes until the end of the statement, or, uh,
something.  We can't afford to put ourselves in a situation of having
to make everything volatile; at least, not if "performance" is
anywhere in our top 50 goals.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Two different methods of sneaking non-immutable data into an index

From
Merlin Moncure
Date:
On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> *) also, isn't it possible to change text cast influencing GUCs 'n'
>> times per statement considering any query can call a function and any
>> function can say, change datestyle?  Shouldn't the related functions
>> be marked 'volatile', not stable?
>
> This is just evil.  It seems to me that we might want to instead
> prevent functions from changing things for their callers, or
> postponing any such changes until the end of the statement, or, uh,
> something.  We can't afford to put ourselves in a situation of having
> to make everything volatile; at least, not if "performance" is
> anywhere in our top 50 goals.


yeah -- perhaps you shouldn't be allowed set things like datestyle in
functions then.  I realize this is a corner (of the universe) case,
but I can't recall any other case of volatility being relaxed on
performance grounds... :-).  Maybe a documentation warning would
suffice?

merlin


Re: Two different methods of sneaking non-immutable data into an index

From
Chris Browne
Date:
mmoncure@gmail.com (Merlin Moncure) writes:
> On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> *) also, isn't it possible to change text cast influencing GUCs 'n'
>>> times per statement considering any query can call a function and any
>>> function can say, change datestyle?  Shouldn't the related functions
>>> be marked 'volatile', not stable?
>>
>> This is just evil.  It seems to me that we might want to instead
>> prevent functions from changing things for their callers, or
>> postponing any such changes until the end of the statement, or, uh,
>> something.  We can't afford to put ourselves in a situation of having
>> to make everything volatile; at least, not if "performance" is
>> anywhere in our top 50 goals.
>
> yeah -- perhaps you shouldn't be allowed set things like datestyle in
> functions then.  I realize this is a corner (of the universe) case,
> but I can't recall any other case of volatility being relaxed on
> performance grounds... :-).  Maybe a documentation warning would
> suffice?

That would cause grief for Slony-I, methinks, and probably other things
that behave somewhat similar.

The "logtrigger()" function coerces datestyle to ISO, so that when dates
get stored, they are stored in a canonical form, irrespective of an
individual connection's decisions on datestyle, so we don't have to
include datestyle information as part of the replicated data.
-- 
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/postgresql.html
Chaotic Evil means never having to say you're sorry.


Re: Two different methods of sneaking non-immutable data into an index

From
Merlin Moncure
Date:
On Thu, Aug 5, 2010 at 12:59 PM, Chris Browne <cbbrowne@acm.org> wrote:
> mmoncure@gmail.com (Merlin Moncure) writes:
>> On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>> *) also, isn't it possible to change text cast influencing GUCs 'n'
>>>> times per statement considering any query can call a function and any
>>>> function can say, change datestyle?  Shouldn't the related functions
>>>> be marked 'volatile', not stable?
>>>
>>> This is just evil.  It seems to me that we might want to instead
>>> prevent functions from changing things for their callers, or
>>> postponing any such changes until the end of the statement, or, uh,
>>> something.  We can't afford to put ourselves in a situation of having
>>> to make everything volatile; at least, not if "performance" is
>>> anywhere in our top 50 goals.
>>
>> yeah -- perhaps you shouldn't be allowed set things like datestyle in
>> functions then.  I realize this is a corner (of the universe) case,
>> but I can't recall any other case of volatility being relaxed on
>> performance grounds... :-).  Maybe a documentation warning would
>> suffice?
>
> That would cause grief for Slony-I, methinks, and probably other things
> that behave somewhat similar.
>
> The "logtrigger()" function coerces datestyle to ISO, so that when dates
> get stored, they are stored in a canonical form, irrespective of an
> individual connection's decisions on datestyle, so we don't have to
> include datestyle information as part of the replicated data.

hm -- interesting -- couldn't that cause exactly the sort of situation
though where stability of statement is violated?

merlin


Re: Two different methods of sneaking non-immutable data into an index

From
Tom Lane
Date:
Chris Browne <cbbrowne@acm.org> writes:
> mmoncure@gmail.com (Merlin Moncure) writes:
>> yeah -- perhaps you shouldn't be allowed set things like datestyle in
>> functions then.

> That would cause grief for Slony-I, methinks, and probably other things
> that behave somewhat similar.

Yeah, it's not really practical (or useful IMO) to try to lock this down
completely.
        regards, tom lane


Re: Two different methods of sneaking non-immutable data into an index

From
Robert Haas
Date:
On Thu, Aug 5, 2010 at 12:59 PM, Chris Browne <cbbrowne@acm.org> wrote:
> mmoncure@gmail.com (Merlin Moncure) writes:
>> On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>> *) also, isn't it possible to change text cast influencing GUCs 'n'
>>>> times per statement considering any query can call a function and any
>>>> function can say, change datestyle?  Shouldn't the related functions
>>>> be marked 'volatile', not stable?
>>>
>>> This is just evil.  It seems to me that we might want to instead
>>> prevent functions from changing things for their callers, or
>>> postponing any such changes until the end of the statement, or, uh,
>>> something.  We can't afford to put ourselves in a situation of having
>>> to make everything volatile; at least, not if "performance" is
>>> anywhere in our top 50 goals.
>>
>> yeah -- perhaps you shouldn't be allowed set things like datestyle in
>> functions then.  I realize this is a corner (of the universe) case,
>> but I can't recall any other case of volatility being relaxed on
>> performance grounds... :-).  Maybe a documentation warning would
>> suffice?
>
> That would cause grief for Slony-I, methinks, and probably other things
> that behave somewhat similar.
>
> The "logtrigger()" function coerces datestyle to ISO, so that when dates
> get stored, they are stored in a canonical form, irrespective of an
> individual connection's decisions on datestyle, so we don't have to
> include datestyle information as part of the replicated data.

I think functions should be allowed to change GUCs internally, but
maybe not for the context from which they were called.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: Two different methods of sneaking non-immutable data into an index

From
Chris Browne
Date:
mmoncure@gmail.com (Merlin Moncure) writes:
> On Thu, Aug 5, 2010 at 12:59 PM, Chris Browne <cbbrowne@acm.org> wrote:
>> mmoncure@gmail.com (Merlin Moncure) writes:
>>> On Wed, Aug 4, 2010 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Wed, Aug 4, 2010 at 6:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>>> *) also, isn't it possible to change text cast influencing GUCs 'n'
>>>>> times per statement considering any query can call a function and any
>>>>> function can say, change datestyle?  Shouldn't the related functions
>>>>> be marked 'volatile', not stable?
>>>>
>>>> This is just evil.  It seems to me that we might want to instead
>>>> prevent functions from changing things for their callers, or
>>>> postponing any such changes until the end of the statement, or, uh,
>>>> something.  We can't afford to put ourselves in a situation of having
>>>> to make everything volatile; at least, not if "performance" is
>>>> anywhere in our top 50 goals.
>>>
>>> yeah -- perhaps you shouldn't be allowed set things like datestyle in
>>> functions then.  I realize this is a corner (of the universe) case,
>>> but I can't recall any other case of volatility being relaxed on
>>> performance grounds... :-).  Maybe a documentation warning would
>>> suffice?
>>
>> That would cause grief for Slony-I, methinks, and probably other things
>> that behave somewhat similar.
>>
>> The "logtrigger()" function coerces datestyle to ISO, so that when dates
>> get stored, they are stored in a canonical form, irrespective of an
>> individual connection's decisions on datestyle, so we don't have to
>> include datestyle information as part of the replicated data.
>
> hm -- interesting -- couldn't that cause exactly the sort of situation
> though where stability of statement is violated?

It shouldn't...

The data gets stored physically, on disk, in a canonical form.

Why should it be "unstable" to capture data in a canonical form, when
that's what gets stored on disk?
-- 
(format nil "~S@~S" "cbbrowne" "gmail.com")
The statistics on  sanity are that one out of  every four Americans is
suffering from some  form of mental illness. Think  of your three best
friends. If they're okay, then it's you. -- Rita Mae Brown