Thread: PostgreSQL 8.3.4 reproducible crash

PostgreSQL 8.3.4 reproducible crash

From
"Dmitry Koterov"
Date:
Hello.<br /><br />Here is the SQL to reproduce the server crash:<br /><br /><br />CREATE SCHEMA bug1 AUTHORIZATION
postgres;<br/><br />SET search_path = bug1, pg_catalog;<br /><br />CREATE FUNCTION bug1.domain_check (integer) RETURNS
boolean<br/> AS <br />$body$<br />SELECT $1 <> 0<br />$body$<br />LANGUAGE sql IMMUTABLE STRICT;<br /><br
/>CREATEDOMAIN bug1."domain" AS integer<br />        CONSTRAINT "check" CHECK (bug1.domain_check(VALUE));<br /><br
/>CREATETYPE bug1.composite AS (<br />   id domain<br />);<br /><br />select '(1)'::bug1.composite;<br /><br /> 

Re: PostgreSQL 8.3.4 reproducible crash

From
Heikki Linnakangas
Date:
Dmitry Koterov wrote:
> Hello.
> 
> Here is the SQL to reproduce the server crash:
> 
> 
> CREATE SCHEMA bug1 AUTHORIZATION postgres;
> 
> SET search_path = bug1, pg_catalog;
> 
> CREATE FUNCTION bug1.domain_check (integer) RETURNS boolean
> AS
> $body$
> SELECT $1 <> 0
> $body$
> LANGUAGE sql IMMUTABLE STRICT;
> 
> CREATE DOMAIN bug1."domain" AS integer
>         CONSTRAINT "check" CHECK (bug1.domain_check(VALUE));
> 
> CREATE TYPE bug1.composite AS (
>   id domain
> );
> 
> select '(1)'::bug1.composite;

Reproducible on 8.2 as well.

It's crashing in execQual.c, on line 299 (on REL_8_3_STABLE):/* * In a read-only function, use the surrounding query's
snapshot;* otherwise take a new snapshot for each query.  The snapshot should * include a fresh command ID so that all
workto date in this transaction * is visible.    We copy in both cases so that postquel_end can * unconditionally do
FreeSnapshot.*/if (fcache->readonly_func)>>>        snapshot = CopySnapshot(ActiveSnapshot);else{
CommandCounterIncrement();   snapshot = CopySnapshot(GetTransactionSnapshot());}
 

because ActiveSnapshot is NULL. ActiveSnapshot has not yet been set, 
because the input function and domain checking is done in the parse 
analyze phase.

I propose that we simply add add a NULL-check there:

*** src/backend/executor/functions.c    1 Jan 2008 19:45:49 -0000 
1.120
--- src/backend/executor/functions.c    10 Dec 2008 13:13:25 -0000
***************
*** 295,301 ****         * is visible.  We copy in both cases so that postquel_end can         * unconditionally do
FreeSnapshot.        */
 
!       if (fcache->readonly_func)                snapshot = CopySnapshot(ActiveSnapshot);        else        {
--- 295,301 ----         * is visible.  We copy in both cases so that postquel_end can         * unconditionally do
FreeSnapshot.        */
 
!       if (fcache->readonly_func && ActiveSnapshot != NULL)                snapshot = CopySnapshot(ActiveSnapshot);
   else        {
 



8.1 and before are not vulnerable to this, but only because the domain 
check wasn't invoked at all in cases like this:

> postgres=# select '(0)'::bug1.composite;
>  composite 
> -----------
>  (0)
> (1 row)

That was a known issue that was fixed in 8.2.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: PostgreSQL 8.3.4 reproducible crash

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:
> Dmitry Koterov wrote:

>> select '(1)'::bug1.composite;
>
> Reproducible on 8.2 as well.

And HEAD too.

> because ActiveSnapshot is NULL. ActiveSnapshot has not yet been set,  
> because the input function and domain checking is done in the parse  
> analyze phase.

I think the right fix is to ensure that ActiveSnapshot is set.



-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL 8.3.4 reproducible crash

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think the right fix is to ensure that ActiveSnapshot is set.

The proximate cause is certainly lack of a snapshot, but I think there
might be some other interesting issues here too.  The crash comes when
the SQL-language domain check function demands a snapshot --- but if
you doselect '1'::bug1.domain;
it works fine!  The reason for the discrepancy is that parse_coerce.c
handles a coercion to a domain by coercing the literal to the underlying
type (int4 in this case) and then setting up a *runtime* domain coercion
using a CoerceToDomain expression node.  So in that case the domain
check doesn't occur till runtime when a snapshot will be available.
But in the crashing case we simply invoke record_in, which invokes
domain_in, which invokes the SQL function.

So there seem to be a couple of things we might want to do:

1. Ensure that a snapshot is set before doing parse analysis of any
non-utility command.  (We *must* not set a snap before LOCK or a
transaction control command, and it seems best to not do it for any
utility command.)  One issue here is that this would change the behavior
for mixed utility and non-utility commands in a single query string;
though I'm not sure how much that matters.

2. Treat coercion to a record type as being something that should occur
at runtime, ie build a RowExpr instead of a constant.  This would
produce more uniform behavior in terms of when domain constraints get
applied.

I think the core issue is really whether it is important to postpone
domain constraint checks.  Consider something like
create domain d as int;create view v as select '-1'::d;alter domain d add constraint "c" check (value > 0);select *
fromv;
 

Right now you get an error at the SELECT, but that seems a bit
surprising.  It's even more surprising that the CREATE still works if
you made the constraint first.  And a novice might reasonably wonder why
the domain check is postponed when the underlying type's checks occur
instantly --- for example, this fails outright:create view v as select 'z'::d;

So this is all a bit odd to start with, and then on top of that we have
the issue that the check timing changes if you put the domain inside a
record.

Comments?
        regards, tom lane


Re: PostgreSQL 8.3.4 reproducible crash

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I propose that we simply add add a NULL-check there:

Certainly not.  A read-only function has no business causing a
snapshot to become established.
        regards, tom lane


Re: PostgreSQL 8.3.4 reproducible crash

From
Jeff Davis
Date:
On Wed, 2008-12-10 at 14:12 -0500, Tom Lane wrote:
>     create domain d as int;
>     create view v as select '-1'::d;
>     alter domain d add constraint "c" check (value > 0);
>     select * from v;
> 
> Right now you get an error at the SELECT, but that seems a bit
> surprising.  It's even more surprising that the CREATE still works if
> you made the constraint first.  And a novice might reasonably wonder why
> the domain check is postponed when the underlying type's checks occur
> instantly --- for example, this fails outright:
>     create view v as select 'z'::d;
> 
> So this is all a bit odd to start with, and then on top of that we have
> the issue that the check timing changes if you put the domain inside a
> record.
> 
> Comments?
> 

Does the standard provide guidance here? I took a look, and it's
difficult to tell, because it uses words like "evaluation" (and I don't
think that a view is required to actually evaluate anything).

It also talks about deferrable and non-deferrable, which indicate that
the constraint should apply at insertion time.

Standard aside... To me, it seems reasonable that something like the
CREATE VIEW above should fail, because you're specifying a literal of
type "d" (invoking the type selector for "d" on a value representation
of unknown type), and it is invalid in the domain "d".

However, a similar construction:

create view v as select cast('-1'::int AS d);

seems slightly different to me, because the value already has a type,
and the exception is raised from the explicit cast. It's the same as if
it casted some variable "x" instead of '-1'::int, because variables
already have types.

Consider something like this:

create view v as select 1::int/0::int;

Here, I see division as a function that can raise an exception, similar
to how an explicit cast can raise an exception. There's no expectation
that the view will evaluate 1/0 at CREATE VIEW time, because 0 might be
some variable "x" (ranging over some underlying table) that can't
possibly be evaluated at view creation time.

In other words, I see casts as functions that may or may not raise an
exception during evaluation, and that should not be evaluated at view
creation time. However, I do not see type selectors
('representation'::type) as functions, because they do not have an
argument of a specific type. I think type selectors should be evaluated
at view creation time, because the value must become a variable at that
time.

Regards,Jeff Davis



Re: PostgreSQL 8.3.4 reproducible crash

From
Alvaro Herrera
Date:
Tom Lane wrote:

> 1. Ensure that a snapshot is set before doing parse analysis of any
> non-utility command.  (We *must* not set a snap before LOCK or a
> transaction control command, and it seems best to not do it for any
> utility command.)  One issue here is that this would change the behavior
> for mixed utility and non-utility commands in a single query string;
> though I'm not sure how much that matters.

I think this is the easiest way out, and the most robust -- we won't be
bitten by some other operation that the parser may think of doing.
(Note that utility commands have their snapshot set in
PortalRunUtility).  Also, perhaps this would let us clean the mess in
pg_plan_queries.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PostgreSQL 8.3.4 reproducible crash

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> 1. Ensure that a snapshot is set before doing parse analysis of any
>> non-utility command.

> I think this is the easiest way out, and the most robust -- we won't be
> bitten by some other operation that the parser may think of doing.

Yeah.  I think we probably have to do that in any case because we have
an assumption that datatype input routines are allowed to make use of
a snapshot (see comments in fastpath.c for instance).  The fact that
no one's noticed this crash before suggests that none of the common ones
actually do, but I don't think we want to back off that assumption.

There's still a question of whether we want to alter the treatment of
record-type input to make the handling of embedded domains more uniform,
but that's something for the future.
        regards, tom lane


Re: PostgreSQL 8.3.4 reproducible crash

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think this is the easiest way out, and the most robust -- we won't be
> bitten by some other operation that the parser may think of doing.
> (Note that utility commands have their snapshot set in
> PortalRunUtility).  Also, perhaps this would let us clean the mess in
> pg_plan_queries.

After looking at this some more, it seems the problem is that the mess
moves to the callers --- at least the ones that can't count on a snap
being already set.  Either that or we put the samw make-a-snap-if-needed
logic into pg_analyze_and_rewrite; which doesn't seem very attractive
because then we get two separate snapshot creation cycles for parse
analysis and planning.  It seems better to do it in the callers so that
only one cycle is needed.

Another possibility is to merge pg_analyze_and_rewrite and
pg_plan_queries into a single function, but this doesn't appear to
be convenient for all callers.

The good news is that my concern about changing the snapshotting
behavior of multiple-command query strings seems unfounded.  That
is only relevant to exec_simple_query and it's already set up so that
parse analysis, planning, and execution happen one command at a time.
So if a plannable statement comes before a utility statement, a
transaction snapshot would've been acquired before the utility command
executes anyway.
        regards, tom lane


Re: PostgreSQL 8.3.4 reproducible crash

From
Bruce Momjian
Date:
Dmitry Koterov wrote:
> Hello.
> 
> Here is the SQL to reproduce the server crash:
> 
> 
> CREATE SCHEMA bug1 AUTHORIZATION postgres;
> 
> SET search_path = bug1, pg_catalog;
> 
> CREATE FUNCTION bug1.domain_check (integer) RETURNS boolean
> AS
> $body$
> SELECT $1 <> 0
> $body$
> LANGUAGE sql IMMUTABLE STRICT;
> 
> CREATE DOMAIN bug1."domain" AS integer
>         CONSTRAINT "check" CHECK (bug1.domain_check(VALUE));
> 
> CREATE TYPE bug1.composite AS (
>   id domain
> );
> 
> select '(1)'::bug1.composite;

This has been fixed in CVS HEAD but I am unsure if and how far it was
backpatched.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +