Thread: Output functions with multiple arguments considered harmful

Output functions with multiple arguments considered harmful

From
Tom Lane
Date:
An example that Elein put up yesterday:
http://archives.postgresql.org/pgsql-general/2005-04/msg01384.php
caused me to realize that type output functions that depend on
additional arguments to determine what they are dealing with are
fundamentally security holes.  It is trivial to crash 8.0's record_out
by lying to it about the rowtype of its first argument.

I have fixed that in CVS tip (and I suppose we had better think about
a security update) ... but what I am now contemplating is taking steps
of various levels of draconianness to make sure we don't get burnt this
way again.  At a minimum I am going to change the type_sanity regression
test to complain about built-in output functions that have multiple
arguments.  I am also fairly strongly tempted to modify CREATE TYPE so
that it doesn't allow user-defined types to have multi-argument output
functions, either.  I doubt there is anyone out there trying to do that,
because the extra arguments that are passed don't seem useful for any
non-builtin types.  But I thought I should throw it out for discussion
--- has anyone got a problem with such a change?

BTW, the comparable arguments for input functions are not dangerous, and
in fact are necessary.  The reason they are not dangerous is that every
datatype input converter applies sufficient sanity checks to input text
strings to not accept invalid input.  We have an issue on the output
side because internal representations are generally taken on faith to be
valid (and would be hard to validate even if we tried).
        regards, tom lane


Re: Output functions with multiple arguments considered harmful

From
elein@varlena.com (elein)
Date:
On Sat, Apr 30, 2005 at 04:17:59PM -0400, Tom Lane wrote:
> An example that Elein put up yesterday:
> http://archives.postgresql.org/pgsql-general/2005-04/msg01384.php
> caused me to realize that type output functions that depend on
> additional arguments to determine what they are dealing with are
> fundamentally security holes.  It is trivial to crash 8.0's record_out
> by lying to it about the rowtype of its first argument.

Is it not as trivial to crash it if one passes bad data into it?
Why is the oid arg worse than the data arg?  Could it be possible
that those built-in generic type output functions that take OIDS
should check more carefully.  I believe this is an issue primarily
wrt generic types.

> 
> I have fixed that in CVS tip (and I suppose we had better think about
> a security update) ... but what I am now contemplating is taking steps
> of various levels of draconianness to make sure we don't get burnt this
> way again.  At a minimum I am going to change the type_sanity regression
> test to complain about built-in output functions that have multiple
> arguments.  I am also fairly strongly tempted to modify CREATE TYPE so
> that it doesn't allow user-defined types to have multi-argument output
> functions, either.  I doubt there is anyone out there trying to do that,
> because the extra arguments that are passed don't seem useful for any
> non-builtin types.  But I thought I should throw it out for discussion
> --- has anyone got a problem with such a change?

I think a distinction needs to be made with super types.  The generic types
are super types at this time. But remember the grand-parental issue just recently
discussed? http://archives.postgresql.org/pgsql-hackers/2005-04/msg00402.php
Theoretically any time can be a super type which does complicate things.

I believe the generic types' output functions need to be able to be told
what sort of thingy they are expecting to be.  Maybe we can bury this to
internal functions only by enabling casting of unnamed row types, generic arrays,
and other super types in SQL.  Which was my original issue. 

I'm not at a place to retest but shouldn't i be able to do 
myrowtype( ) around anything to cast it to a "myrowtype", successfully or not?

> 
> BTW, the comparable arguments for input functions are not dangerous, and
> in fact are necessary.  The reason they are not dangerous is that every
> datatype input converter applies sufficient sanity checks to input text
> strings to not accept invalid input.  We have an issue on the output
> side because internal representations are generally taken on faith to be
> valid (and would be hard to validate even if we tried).

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
> 

Elein
elein@varlena.com


Re: Output functions with multiple arguments considered harmful

From
Tom Lane
Date:
elein@varlena.com (elein) writes:
> On Sat, Apr 30, 2005 at 04:17:59PM -0400, Tom Lane wrote:
>> It is trivial to crash 8.0's record_out
>> by lying to it about the rowtype of its first argument.

> Is it not as trivial to crash it if one passes bad data into it?
> Why is the oid arg worse than the data arg?

The first argument is presumably valid in itself (if not, you've got
worse problems than this).  The problem is that record_out was coded
to believe the second argument correctly gives the rowtype of the
first, so it could be induced to try to interpret the tuple using the
wrong tuple descriptor.  I've not bothered to try to construct an
actual crash scenario, but I'm sure Bad Things are possible.

The whole thing is unnecessary anyway, because in the system as-released
every composite Datum is guaranteed to carry internal type labeling;
record_out should simply rely on that always, rather than believing a
type OID that comes from someplace else.

> Theoretically any time can be a super type which does complicate things.

Not really.  Every rowtype Datum will carry its own concrete type.

> I believe the generic types' output functions need to be able to be
> told what sort of thingy they are expecting to be.

That's exactly the thinking I think we need to get away from.  What I'm
really after here is enforcing the viewpoint that instances of generic
types (such as arrays and rows) have to be self-identifying as to just
which subtype they are.  If the info comes from outside the object
itself, it's unreliable.  We have already found ourselves forced to
adopt this position with respect to arrays and records, so I'm thinking
we want to make sure we don't make the same mistake yet again.
        regards, tom lane


Re: Output functions with multiple arguments considered

From
James William Pye
Date:
On Sat, 2005-04-30 at 16:17 -0400, Tom Lane wrote:
> An example that Elein put up yesterday:
> http://archives.postgresql.org/pgsql-general/2005-04/msg01384.php
> caused me to realize that type output functions that depend on
> additional arguments to determine what they are dealing with are
> fundamentally security holes.  It is trivial to crash 8.0's record_out
> by lying to it about the rowtype of its first argument.

I was bitten by this a little while ago where I was running an
OidFunctionCall1(yes, 1) on typoutput's. Andrew on IRC pointed out that
calls to recordout out normally used a FunctionCall3, thus showing the
reason for the issue. Sometimes junk data in the heap signalled the
function to use it instead of the datum tuple's typoid, normally causing
a failed cache lookup. I figured it was somehow my fault, but I just
couldn't put my finger on it.(thanks again Andrew if you're on here)

I imagine this change will also help save others from that mistake as
well.


Re: Output functions with multiple arguments considered harmful

From
elein@varlena.com (elein)
Date:
On Sat, Apr 30, 2005 at 05:31:28PM -0400, Tom Lane wrote:
> elein@varlena.com (elein) writes:
> > On Sat, Apr 30, 2005 at 04:17:59PM -0400, Tom Lane wrote:
> >> It is trivial to crash 8.0's record_out
> >> by lying to it about the rowtype of its first argument.
> 
> > Is it not as trivial to crash it if one passes bad data into it?
> > Why is the oid arg worse than the data arg?
> 
> The first argument is presumably valid in itself (if not, you've got
> worse problems than this).  The problem is that record_out was coded
> to believe the second argument correctly gives the rowtype of the
> first, so it could be induced to try to interpret the tuple using the
> wrong tuple descriptor.  I've not bothered to try to construct an
> actual crash scenario, but I'm sure Bad Things are possible.
> 
> The whole thing is unnecessary anyway, because in the system as-released
> every composite Datum is guaranteed to carry internal type labeling;
> record_out should simply rely on that always, rather than believing a
> type OID that comes from someplace else.
> 
> > Theoretically any time can be a super type which does complicate things.
> 
> Not really.  Every rowtype Datum will carry its own concrete type.
> 
> > I believe the generic types' output functions need to be able to be
> > told what sort of thingy they are expecting to be.
> 
> That's exactly the thinking I think we need to get away from.  What I'm
> really after here is enforcing the viewpoint that instances of generic
> types (such as arrays and rows) have to be self-identifying as to just
> which subtype they are.  If the info comes from outside the object
> itself, it's unreliable.  We have already found ourselves forced to
> adopt this position with respect to arrays and records, so I'm thinking
> we want to make sure we don't make the same mistake yet again.

I agree with you.  Now we just need to be able to cast unnamed row types
in SQL and/or access elements of unnamed row types somehow.

--elein


> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>