Thread: Proposal for resolving casting issues

Proposal for resolving casting issues

From
Tom Lane
Date:
We've been discussing this stuff in fits and starts for months now, but
nothing satisfactory has been arrived at.  I've concluded that part of
the problem is that we are trying to force the system's behavior into
a model that is too limiting: we need more than an implicit/explicit cast
distinction.  Accordingly, I suggest we bite the bullet and make it happen.
(Note that I've resigned myself to having to do an initdb for 7.3beta2.)

I think we must extend pg_cast's castimplicit column to a three-way value:
    * okay as implicit cast in expression (or in assignment)
    * okay as implicit cast in assignment only
    * okay only as explicit cast
 

"In expression" refers to cases where we have (or potentially have) multiple
possible interpretations; essentially, anytime a value is being fed to a
function or operator, there can be ambiguity due to overloading, and so we
need to restrict the set of possible implicit casts to limit ambiguity and
ensure a reasonable choice of function is made.

"In assignment only" actually means any case where the destination datatype
is known with certainty.  For example CoerceTargetExpr is currently used to
coerce an array subscript expression to integer, and I think it's okay to
treat that context like store assignment.

Question: what shall we call these alternatives in CREATE CAST?  The SQL99
phrase AS ASSIGNMENT looks like it should mean the second, but I think
the spec semantics require it to mean the first.  Ugh.  Perhaps AS
ASSIGNMENT ONLY for the second case?

Also, I think we should allow cast functions to take an optional boolean
second argument "isExplicit", so that explicit casts can be distinguished
from implicit at runtime.  We'll use this to get spec-compliant semantics
for char/varchar truncation (there shouldn't be an error if you explicitly
cast to a shorter length).
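For illustration, a hedged sketch of the intended spec-compliant behavior (the
table and column are hypothetical, and the comments describe the goal rather
than current behavior):

    -- explicit cast: silently truncates to the target length
    SELECT 'abcdef'::varchar(3);          -- yields 'abc', no error

    -- implicit assignment cast: losing non-blank characters is an error
    CREATE TABLE t (v varchar(3));
    INSERT INTO t VALUES ('abcdef');      -- should raise a "value too long" error
    INSERT INTO t VALUES ('abc   ');      -- ok: only trailing blanks are dropped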

We'll need to add fields to Func and RelabelType nodes so that we can tell
whether a node was generated due to an explicit function call, implicit
cast, or explicit cast; we'll use these for better reverse-listing.  (In
particular this will let us hide the boolean second argument from being
reverse-listed, when present.)

Now, as to just what to do with it --- Peter posted a list of questions
awhile back that weren't ever resolved, but I think we can make some
progress with this scheme in mind:

> From looking at the set of implicit or not casts, I think there are two
> major issues to discuss:
> 
> 1. Should truncating/rounding casts be implicit?  (e.g., float4 -> int4)
> 
> I think there's a good argument for "no", but for some reason SQL99 says
> "yes", at least for the family of numerical types.

We can make this work cleanly if "down" casts are assignment-only while
"up" casts are fully implicit.  I think that the spec requires implicit
casting only in the context of store assignment.
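A hedged sketch of the distinction (the table, its columns, and the function
name are hypothetical):

    -- "up" cast (int4 -> int8): implicit anywhere, so overloading still resolves sanely
    SELECT * FROM t WHERE int4col = int8col;

    -- "down" cast (float8 -> int4): implicit only where the target type is certain
    UPDATE t SET int4col = float8col;         -- assignment context: allowed
    SELECT int4only_func(float8col);          -- expression context: needs float8col::int4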

> 2. Should casts from non-character types to text be implicit?  (e.g., date
> -> text)
> 
> I think this should be "no", for the same reason that the other direction
> is already disallowed.  It's just sloppy programming.

I agree with this in principle, but in practice we probably have to allow
implicit casts to text, at least for awhile yet.  Seems that too many
people depend on stuff like
    SELECT 'Meeting time is ' || timestamp_var
Since this is an expression context we don't get any help from the notion
of store assignment :-(

> I also have a few individual cases that look worthy of consideration:
> 
> abstime <-> int4:  I think these should not be implicit because they
> represent different "kinds" of data.  (These are binary compatible casts,
> so changing them to not implicit probably won't have any effect.  I'd have
> to check this.)

I believe that as of current sources we can mark a binary cast non-implicit,
and I agree with marking these two explicit-only.

> date -> timestamp[tz]:  I'm suspicious of this one, but it's hard to
> explain.  The definition to fill in the time component with zeros is
> reasonable, but it's not the same thing as casting integers to floats
> because dates really represent a time span of 24 hours and timestamps an
> indivisible point in time.  I suggest making this non-implicit, for
> conformance with SQL and for general consistency between the date/time
> types.

I disagree here; promoting date to timestamp seems perfectly reasonable,
and I think it's something a lot of people rely on.

> time -> interval:  I'm not even sure this cast should exist at all.
> Proper arithmetic would be IntervalValue = TimeValue - TIME 'midnight'.
> At least make it non-implicit.

I'd go along with marking it assignment-only.

> timestamp -> abstime:  This can be implicit AFAICS.

This is lossy (abstime doesn't preserve fractional seconds) so I'd vote
for making it assignment-only.


In a later message Peter wrote:

> Since almost every cast to "text" is implicit, then I believe so should
> inet -> text
> macaddr -> text
> int4 -> varchar
> int8 -> varchar
> which are currently not.

I'd go along with making the inet->text and macaddr->text cases implicit,
since as you note all the other casts to text are.  However, those two
casts to varchar must not be implicit (or at most assignment-only) else
they will create ambiguity against the implicit casts to text for the same
source datatype.


In summary: I haven't yet gone through the existing casts in detail, but
I propose the following general rules for deciding how to mark casts:

* Casts across datatype categories should be explicit-only, with the
exception of casts to text, which we will allow implicitly for backward
compatibility's sake.

* Within a category, "up" (lossless) conversions are implicit, "down"
(potentially lossy) conversions should be assignment-only.

Comments?
        regards, tom lane


Re: Proposal for resolving casting issues

From
"Christopher Kings-Lynne"
Date:
> > abstime <-> int4:  I think these should not be implicit because they
> > represent different "kinds" of data.  (These are binary
> compatible casts,
> > so changing them to not implicit probably won't have any
> effect.  I'd have
> > to check this.)
>
> I believe that as of current sources we can mark a binary cast
> non-implicit,
> and I agree with marking these two explicit-only.

Everything in this proposal looks pretty good.  With regards to the above
abstime<->int4 thing - what about the 'magic' values in that conversion.
(eg. -infinity, etc.)

Chris



Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Everything in this proposal looks pretty good.  With regards to the above
> abstime<->int4 thing - what about the 'magic' values in that conversion.
> (eg. -infinity, etc.)

They map to some magic int4 values, same as it ever was.  I'm not
interested in trying to improve the semantics of any specific conversion
at the moment...
        regards, tom lane


Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> I think we must extend pg_cast's castimplicit column to a three-way value:
>     * okay as implicit cast in expression (or in assignment)
>     * okay as implicit cast in assignment only
>     * okay only as explicit cast

> In summary: I haven't yet gone through the existing casts in detail, but
> I propose the following general rules for deciding how to mark casts:
>
> * Casts across datatype categories should be explicit-only, with the
> exception of casts to text, which we will allow implicitly for backward
> compatibility's sake.
>
> * Within a category, "up" (lossless) conversions are implicit, "down"
> (potentially lossy) conversions should be assignment-only.

First of all, thank you for taking this on !

I think the following three states may enable a closer match to an actually
desired (Peter said mandated by SQL99) behavior.

1. okay as implicit cast in expression or assignment
2. okay as implicit cast in expression or assignment but needs runtime check (precision loss possible)
3. okay only as explicit cast (precision loss possible)

Now, since we probably can't do this all in beta, I think it would be okay to
map my state 2 onto yours for this release, and add expressions with
runtime checks later.

Regarding the "isExplicit": I think a closer match would be an output argument
"PrecisionInfo" enum(ok, precision loss, [conversion failed ?]). With this,
the caller can differentiate whether to raise a warning (note that char truncation
is actually supposed to raise a warning in sqlca.sqlwarn).
Maybe make it in/out so you can tell the function to not abort on conversion error,
since what I think we need for constants is a "try convert" that does not even abort
on wrong input.

For numbers there is probably only the solution to invent an "anynumber" generic type.

Examples that should imho succeed (and do succeed in other db's):

    select int2col = 1000000;
        conversion fails (loses precision?) --> return false
        this case could probably behave better if it were defined
        that is_a_number but doesn't convert is a precision loss;
        maybe with this, anynumeric would not be necessary

    select char6col = '123456789';
        conversion loses precision --> return false for eq

    select int2col = 10.0;
        conversion ok --> use index on int2col (same for '10', '10.0')

Andreas

PS: pg snapshot 09/11 does not compile on AIX; the culprits are large files
(we don't want _LARGE_FILES) and the mb conversions (pg_ascii2mic and
pg_mic2ascii not found in the postmaster and not included from elsewhere).
After removing the conversions from the Makefile and undefining _LARGE_FILES
to make it compile, make check hangs on without_oid's vacuum.
There are also tons of "unsigned char vs signed char" warnings in current mb sources :-(


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> I think the following three states may enable a closer match to an actually 
> desired (Peter said mandated by SQL99) behavior.

> 1. okay as implicit cast in expression or assignment
> 2. okay as implicit cast in expression or assignment but needs runtime check
>     (precision loss possible)
> 3. okay only as explicit cast (precision loss possible)

The runtime checks are there already, eg

regression=# select 123456789::int4::int2;
ERROR:  i4toi2: '123456789' causes int2 overflow

However this does not help us much; the critical point is that if we
want function overloading to work in a sane fashion, we have to prefer
up-conversions to down-conversions *at parse time*, at least for the
operands of functions and operators (which is what I meant by "in
expressions").  Runtime checks are irrelevant to this problem.

> Regarding the "isExplicit": I think a closer match would be an output
> argument "PrecisionInfo" enum(ok, precision loss, [conversion failed
> ?]).

I'm not planning to add output arguments to fix this problem ;-)

> For numbers there is probably only the solution to invent an
> "anynumber" generic type.

Actually, I had been toying with the notion of doing the following:

1. A numeric literal is initially typed as the smallest type that will
hold it in the series int2, int4, int8, numeric (notice NOT float8).

2. Allow implicit up-coercion int2->int4->int8->numeric->float4->float8,
but down-coercions aren't implicit except for assignment.

3. Eliminate most or all of the cross-numeric-type operators (eg, there
is no reason to support int2+int4 as a separate operator).

With this approach, an expression like "int4var = 42" would be initially
typed as int4 and int2, but then the constant would be coerced to int4
because int4=int4 is the closest-match operator.  (int2=int2 would not
be considered because down-coercion isn't implicitly invokable.)  Also
we get more nearly SQL-standard behavior in expressions that combine
numeric with float4/float8: the preferred type will be float, which
accords with the spec's notions of exact numeric vs. approximate numeric.
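A hedged sketch of how that resolution would go under this proposal (column
names are hypothetical; the comments describe the intended parse-time choices,
not current behavior):

    SELECT * FROM t WHERE int4col = 42;        -- 42 starts as int2, is up-coerced; int4 = int4 wins
    SELECT * FROM t WHERE int8col = 42;        -- likewise resolves to int8 = int8
    SELECT * FROM t WHERE numericcol = 4.5;    -- 4.5 starts as numeric; numeric = numeric
    SELECT * FROM t WHERE float8col = 4.5;     -- the numeric literal is up-coerced to float8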

I think this solves most or all of our problems with poor type choices
for numeric literals, but it still needs thought --- I'm not suggesting
we shoehorn it into 7.3.  In particular, I'm not sure whether the
current preferred-type arrangement (namely, numeric and float8 are both
preferred types for the numeric category) would need to change.

> There are also tons of "unsigned char vs signed char" warnings in
> current mb sources :-( 

Yeah, I know :-( ... I see 'em too when using HPUX' vendor compiler.
We ought to clean that up someday.
        regards, tom lane


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
I said:
> Also, I think we should allow cast functions to take an optional boolean
> second argument "isExplicit", so that explicit casts can be distinguished
> from implicit at runtime.  We'll use this to get spec-compliant semantics
> for char/varchar truncation (there shouldn't be an error if you explicitly
> cast to a shorter length).

After looking closely at SQL92 sections 6.10 (cast specification) and
9.2 (store assignment), it seems that the only places where the spec
demands different behavior for an explicit cast than for an implicit
assignment cast are for length coercions of char, varchar, bit, and
varbit types.

Accordingly, the places where we actually *need* the extra isExplicit
argument are not in the type-coercion functions per se, but in the
length-coercion functions associated with these four datatypes.

While we could still add the extra argument for the type-coercion
functions, I'm inclined not to do so; there is no need for it for spec
compliance of the standard types, and I don't think we should encourage
user-defined types to behave differently for explicit and implicit
casts.

What I will do instead is adjust parse_coerce.c so that a
length-coercion function can have either of the signatures
    foo(foo,int4) returns foo
or
    foo(foo,int4,bool) returns foo
and then modify the above-mentioned length coercion functions to provide
the desired behavior.  This has no direct impact on pg_cast because we
do not use pg_cast for length-coercion functions.
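As a hedged sketch, registering such a function for a hypothetical user-defined
type "mytext" might look like this (the type, the C symbol, and MODULE_PATHNAME
are illustrative only):

    -- parse_coerce.c would accept either
    --     mytext(mytext, int4)            -- old form
    --     mytext(mytext, int4, bool)      -- new form; third argument is true for explicit casts
    CREATE FUNCTION mytext(mytext, int4, bool) RETURNS mytext
        AS 'MODULE_PATHNAME', 'mytext_length_coerce'
        LANGUAGE C STRICT;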
        regards, tom lane


Re: Proposal for resolving casting issues

From
Peter Eisentraut
Date:
Tom Lane writes:

> I think we must extend pg_cast's castimplicit column to a three-way value:
>     * okay as implicit cast in expression (or in assignment)
>     * okay as implicit cast in assignment only
>     * okay only as explicit cast

Viewed in isolation this looks entirely reasonable, but I think we would
be adding a lot of infrastructure for the benefit of a relatively small
number of cases.

As the writer of a cast, this presents me with at least one more option
than I can really manage.

As the user of a cast, these options make the whole system nearly
unpredictable because in any non-trivial expression each of these
behaviors could take effect somehow (possibly even depending on how the
inner expressions turned out).

I am not aware of any programming language that has more than three
castability levels (never/explicit/implicit).

Finally, I believe this paints over the real problems, namely the
inadequate and hardcoded type category preferences and the inadequate
handling of numerical constants.  Both of these issues have had adequate
approaches proposed in the past and would solve this and a number of other
issues.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Proposal for resolving casting issues

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> I think we must extend pg_cast's castimplicit column to a three-way value:
>> * okay as implicit cast in expression (or in assignment)
>> * okay as implicit cast in assignment only
>> * okay only as explicit cast

> As the user of a cast, these options make the whole system nearly
> unpredictable because in any non-trivial expression each of these
> behaviors could take effect somehow (possibly even depending on how the
> inner expressions turned out).

How so?  Only the first set of casts applies inside an expression.

It seems to me that this proposal actually *reduces* the number of casts
that might apply in any given context, and thus makes the behavior more
predictable not less so.  Certainly it is more predictable than
any-cast-can-be-applied-implicitly, which I seem to remember you arguing
for (at least for the numeric types).

> I am not aware of any programming language that has more than three
> castability levels (never/explicit/implicit).

Actually I think that this scheme would allow us to model typical
programming-language behavior quite accurately.  C for example will let
you assign a float to an integer (with appropriate runtime behavior) ---
but if you add a float and an integer, you get a float addition; there's
no possibility that the system will choose to coerce the float to int
and do an int addition.  So the set of available implicit casts is
different in an assignment context than it is in an expression context.
Seems pretty close to what I'm suggesting.

> Finally, I believe this paints over the real problems, namely the
> inadequate and hardcoded type category preferences and the inadequate
> handling of numerical constants.  Both of these issues have had adequate
> approaches proposed in the past and would solve this and a number of other
> issues.

If they were adequate they would have gotten implemented; we had issues
with all the proposals so far.  See my later response to Andreas for a
possible solution to the numerical-constant issue based on this
mechanism.
        regards, tom lane


Re: Proposal for resolving casting issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> We've been discussing this stuff in fits and starts for months now, but
> nothing satisfactory has been arrived at.  I've concluded that part of
> the problem is that we are trying to force the system's behavior into
> a model that is too limiting: we need more than an implicit/explicit cast
> distinction.  Accordingly, I suggest we bite the bullet and make it happen.
> (Note that I've resigned myself to having to do an initdb for 7.3beta2.)

I was reading my backlog of email and thinking, "Oh, things are shaping
up well", then I hit this message.  Let me try to collect open items
tomorrow and get a plan together.  I have caught up on my email.  I am
heading to bed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Proposal for resolving casting issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > I think the following three states may enable a closer match to an actually 
> > desired (Peter said mandated by SQL99) behavior.
> 
> > 1. okay as implicit cast in expression or assignment
> > 2. okay as implicit cast in expression or assignment but needs runtime check
> >     (precision loss possible)
> > 3. okay only as explicit cast (precision loss possible)
> 
> The runtime checks are there already, eg
> 
> regression=# select 123456789::int4::int2;
> ERROR:  i4toi2: '123456789' causes int2 overflow
> 
> However this does not help us much; the critical point is that if we
> want function overloading to work in a sane fashion, we have to prefer
> up-conversions to down-conversions *at parse time*, at least for the
> operands of functions and operators (which is what I meant by "in
> expressions").  Runtime checks are irrelevant to this problem.

I think there is some confusion here.  The runtime checks Andreas was
talking about were allowing a double of 64.0 to be cast to an int4 while
disallowing 64.1 from being cast to an int4 because it is not a whole
number.

I am not sure doubles have enough precision to make such comparisons
functional (NUMERIC certainly does) but that was his proposal, and he
stated he thought the standard required it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I think there is some confusion here.  The runtime checks Andreas was
> talking about were allowing a double of 64.0 to be cast to an int4 while
> disallowing 64.1 from being cast to an int4 because it is not a whole
> number.

> I am not sure doubles have enough precision to make such comparisons
> functional (NUMERIC certainly does) but that was his proposal, and he
> stated he thought the standard required it.

It seems clear to me that the standard requires us NOT to reject that.

In the explicit-cast case, SQL92 6.10 <cast specification> saith:

        3) If TD is exact numeric, then
           Case:
           a) If SD is exact numeric or approximate numeric, then
             Case:
             i) If there is a representation of SV in the data type TD
                that does not lose any leading significant digits after
                rounding or truncating if necessary, then TV is that
                representation.  The choice of whether to round or
                truncate is implementation-defined.

            ii) Otherwise, an exception condition is raised: data
                exception - numeric value out of range.

So we are *only* allowed to throw an error for overflow; having to round
is not an error condition.

In the implicit-cast case, section 9.2 Store assignment has

           k) If the data type of T is numeric and there is an approximation
              obtained by rounding or truncation of the numerical value of V
              for the data type of T, then the value of T is set to such an
              approximation.

              If there is no such approximation, then an exception condition
              is raised: data exception - numeric value out of range.

              If the data type of T is exact numeric, then it is
              implementation-defined whether the approximation is obtained
              by rounding or by truncation.
 

which is different wording but seems to boil down to the same thing: the
only error condition is out-of-range.
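For a concrete illustration of that reading, in the explicit-cast case (the
spec leaves round-vs-truncate implementation-defined; we happen to round):

    SELECT 64.1::int4;             -- 64: having to round is not an error condition
    SELECT 2147483648.0::int4;     -- error: out of range is the only allowed failure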
        regards, tom lane


Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> What I will do instead is adjust parse_coerce.c so that a
> length-coercion function can have either of the signatures
>     foo(foo,int4) returns foo
> or
>     foo(foo,int4,bool) returns foo
> and then modify the above-mentioned length coercion functions to provide
> the desired behavior.  This has no direct impact on pg_cast because we
> do not use pg_cast for length-coercion functions.

Sounds good to me.

When those are really truncated ESQL/C needs to set a warning in sqlca.sqlwarn
though, thus I think the second signature should also have an output flag to tell
whether truncation actually occurred.
Maybe this should be kept for a protocol change though, since I would not think
a NOTICE would be suitable here.

Andreas


Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> I think there is some confusion here.  The runtime checks Andreas was
> talking about were allowing a double of 64.0 to be cast to an int4 while
> disallowing 64.1 from being cast to an int4 because it is not a whole
> number.

Yes, and Tom's proposal for numbers is sufficient for constants, since the 64.0
will initially be an int2 and thus do the correct thing together with an int4,
and the 64.1 constant will be a numeric, and thus also do the correct thing with
all other types.

It is not sufficient for the optimizer for joins though, since it cannot use the
int4 index when confronted with "where tab1.int4col = tab2.numericcol".
Here only a runtime (non aborting) check would help.
Maybe this could be overcome if the index access (or something inbetween) would allow
a "numeric" constant for an int4 index (If the "numeric" value does not cleanly convert
to int4, return no rows).

Andreas


Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> > For numbers there is probably only the solution to invent an
> > "anynumber" generic type.
>
> Actually, I had been toying with the notion of doing the following:
>
> 1. A numeric literal is initially typed as the smallest type that will
> hold it in the series int2, int4, int8, numeric (notice NOT float8).

Yes, that sounds like a good plan for all scenarios that can follow !

> 2. Allow implicit up-coercion int2->int4->int8->numeric->float4->float8,
> but down-coercions aren't implicit except for assignment.

How about int2->int4->int8->numeric->float4->float8->numeric ?
That would also allow an upward path from float8.

> 3. Eliminate most or all of the cross-numeric-type operators
> (eg, there is no reason to support int2+int4 as a separate operator).

Yes.

> With this approach, an expression like "int4var = 42" would be initially
> typed as int4 and int2, but then the constant would be coerced to int4
> because int4=int4 is the closest-match operator.  (int2=int2 would not
> be considered because down-coercion isn't implicitly invokable.)

It would fix the constants issue, yes.  How about "where int2col = int4col"
and its indexability of int2col though ?

> Also
> we get more nearly SQL-standard behavior in expressions that combine
> numeric with float4/float8: the preferred type will be float, which
> accords with the spec's notions of exact numeric vs.
> approximate numeric.

I do not understand the standard here.
Especially the following would seem awkward if that would switch to approximate:
set numericcol = numericcol * float4col;

Andreas


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> When those are really truncated ESQL/C needs to set a warning in sqlca.sqlwarn
> though, thus I think the second signature should also have an output flag to tell 
> whether truncation actually occurred.
> Maybe this should be kept for a protocol change though, since I would not think
> a NOTICE would be suitable here. 

Again, I don't want to invent output arguments for functions today ;-).

I agree that a NOTICE would be overkill, and that we need a protocol
change to implement completion conditions (sqlca.sqlwarn) properly.
When that happens, I think the explicit-cast paths in the coercion
routines can easily call the "set a completion condition" routine for
themselves; I see no reason to pass back the condition one level
before doing so.
        regards, tom lane


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> It is not sufficient for the optimizer for joins though, since it
> cannot use the int4 index when confronted with "where tab1.int4col =
> tab2.numericcol".

For cross-datatype joins, the proposal as I sketched it would result in
the parser producing, eg,
    where tab1.int4col::numeric = tab2.numericcol
that is, we'd have a single-datatype operator and a runtime cast in the
expression.

The optimizer is today capable of producing a nested loop with inner
indexscan join from this --- so long as the inner indexscan is on the
uncasted column (numericcol in this case).  It won't consider an int4
index on int4col for this.  This seems okay to me, actually.  It's
better than what you get now with a cross-datatype comparison operator
(neither side can be indexscanned since the operator matches neither
index opclass).

The major failing that needs to be rectified is that merge and hash
joins won't even be considered, because that code only works with
quals that are unadorned "Var = Var".  I don't believe there is any
fundamental reason for this restriction.  As long as the top operator
is merge/hashjoinable, any expression should work on either side.
It's just a matter of cleaning up a few unwarranted shortcuts in the
planner.

But that work does need to be done before we can rip out all the
cross-datatype operators ... so this is definitely not happening
for 7.3 ...
        regards, tom lane


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> 2. Allow implicit up-coercion int2->int4->int8->numeric->float4->float8,
>> but down-coercions aren't implicit except for assignment.

> How about int2->int4->int8->numeric->float4->float8->numeric ?
> That would also allow an upward path from float8.

Uh, what?  That seems logically impossible to me ... or at least it
would reintroduce exactly the problem we need to get away from: casts
between float4, float8, numeric would be considered equally good in
either direction, creating ambiguity about which operator to use.
How are you envisioning it would work exactly?

Perhaps I should clarify what I had in mind: because the parser only
considers one level of type coercion when choosing a function or
assigning to a result column, it's actually necessary to have all thirty
cast combinations between the six numeric types available in pg_cast.
My notation "int2->int4->int8->numeric->float4->float8" is intended to
imply that of the thirty, these would be marked as implicitly coercible:

int2->int4
int2->int8
int2->numeric
int2->float4
int2->float8
int4->int8
int4->numeric
int4->float4
int4->float8
int8->numeric
int8->float4
int8->float8
numeric->float4
numeric->float8
float4->float8

while the fifteen reverse coercions would be assignment-only.

If we allow any circularity then we will have pairs of types with both
cast pathways marked as implicit, which will leave the parser unable to
choose which operator to use.  This is exactly why "numeric = float8"
has failed in past versions: there are two alternatives that are equally
easy to reach.
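Once the three-way marking is in the catalog, the chosen directions are easy to
audit; a hedged sketch, assuming the column ends up as a one-character
castcontext ('i' implicit, 'a' assignment-only, 'e' explicit-only):

    SELECT castsource::regtype AS source,
           casttarget::regtype AS target,
           castcontext
    FROM pg_cast
    WHERE castsource::regtype::text IN ('smallint', 'integer', 'bigint',
                                        'numeric', 'real', 'double precision')
      AND casttarget::regtype::text IN ('smallint', 'integer', 'bigint',
                                        'numeric', 'real', 'double precision')
    ORDER BY 1, 2;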


> It would fix the constants issue, yes.  How about "where int2col = int4col"
> and its indexability of int2col though ?

See my other response.  The current scheme of using a cross-datatype
operator isn't helpful for indexing such cases anyway...

>> Also
>> we get more nearly SQL-standard behavior in expressions that combine
>> numeric with float4/float8: the preferred type will be float, which
>> accords with the spec's notions of exact numeric vs. 
>> approximate numeric.

> I do not understand the standard here.
> Especially the following would seem awkward if that would switch to
> approximate:
>    set numericcol = numericcol * float4col; 

Well, the spec's notion is that combining an "exact" number and an
"approximate" number must yield an "approximate" result.  This logic
is hard to argue with, even though in our implementation it would
seem to make more sense for numeric to be the top of the hierarchy
on range and precision grounds.

Note that if you write, say,
    set numericcol = numericcol * 3.14159;
my proposal would do the "right thing" since the constant would be typed
as numeric to start with and would stay that way.  To do what you want
with a float variable, it'd be necessary to write
    set numericcol = numericcol * float4col::numeric;
which is sort of ugly; but no uglier than
    set float4col = float4col * numericcol::float4;
which is what you'd have to write if the system preferred numeric and
you wanted the other behavior.

I too have been thinking for a long time that I didn't like following
the spec's lead on this point; but I am now starting to think that it's
not all that bad.  This approach to handling constants is *much* cleaner
than what we've done in the past, or even any of the unimplemented
proposals that I can recall.  The behavior you'd get with combinations
of float and numeric variables is, well, debatable; from an
implementor's point of view preferring a numeric result makes sense,
but it's much less clear that users would automatically think the same.
Given the spec's position, I am starting to think that preferring float
is the right thing to do.

BTW, I am thinking that we don't need the notion of "preferred type" at
all in the numeric category if we use this approach.  I have not worked
through the details for the other type categories, but perhaps if we
adopt similar systems of one-way implicit promotions in each category,
we could retire "preferred types" altogether --- which would let us get
rid of hardwired type categories, too.
        regards, tom lane


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
I wrote:
> I think we must extend pg_cast's castimplicit column to a three-way value:
>     * okay as implicit cast in expression (or in assignment)
>     * okay as implicit cast in assignment only
>     * okay only as explicit cast

> Question: what shall we call these alternatives in CREATE CAST?  The SQL99
> phrase AS ASSIGNMENT looks like it should mean the second, but I think
> the spec semantics require it to mean the first.  Ugh.  Perhaps AS
> ASSIGNMENT ONLY for the second case?

On looking more closely, SQL99 appears to define user-defined casts as
invocable *only* in explicit cast and assignment contexts.  Part 2 sez:
        4.13  Data conversions

        Explicit data conversions can be specified by a CAST operator.
        A CAST operator defines how values of a source data type are
        converted into a value of a target data type according to the
        Syntax Rules and General Rules of Subclause 6.22, "<cast
        specification>".  Data conversions between predefined data types
        and between constructed types are defined by the rules of this
        part of ISO/IEC 9075.  Data conversions between one or more
        user-defined types are defined by a user-defined cast.

        A user-defined cast identifies an SQL-invoked function, called
        the cast function, that has one SQL parameter whose declared
        type is the same as the source data type and a result data type
        that is the target data type.  A cast function may optionally be
        specified to be implicitly invoked whenever values are assigned
        to targets of its result data type.  Such a cast function is
        called an implicitly invocable cast function.

This seems to mean that we can get away with defining AS ASSIGNMENT to
mean my second category (implicit in assignment only), and then picking
some more natural term for my first category (implicit anywhere).

I favor using IMPLICIT, which would make the syntax of CREATE CAST be
   CREATE CAST (sourcetype AS targettype)
       WITH FUNCTION funcname (argtype)
       [ AS ASSIGNMENT | IMPLICIT ]

   CREATE CAST (sourcetype AS targettype)
       WITHOUT FUNCTION
       [ AS ASSIGNMENT | IMPLICIT ]

Or possibly it should be AS IMPLICIT?

Comments?
        regards, tom lane


Re: Proposal for resolving casting issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> I favor using IMPLICIT, which would make the syntax of CREATE CAST be
> 
>     CREATE CAST (sourcetype AS targettype)
>         WITH FUNCTION funcname (argtype)
>         [ AS ASSIGNMENT | IMPLICIT ]
>     
>     CREATE CAST (sourcetype AS targettype)
>         WITHOUT FUNCTION
>         [ AS ASSIGNMENT | IMPLICIT ]
> 
> Or possibly it should be AS IMPLICIT?

I think AS IMPLICIT would be better because we have other AS [var]
clauses.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Proposal for resolving casting issues

From
Bruce Momjian
Date:
Tom Lane wrote:
> Note that if you write, say,
>     set numericcol = numericcol * 3.14159;
> my proposal would do the "right thing" since the constant would be typed
> as numeric to start with and would stay that way.  To do what you want
> with a float variable, it'd be necessary to write
>     set numericcol = numericcol * float4col::numeric;
> which is sort of ugly; but no uglier than
>     set float4col = float4col * numericcol::float4;
> which is what you'd have to write if the system preferred numeric and
> you wanted the other behavior.

I need a clarification.  In the non-assignment case, does:
WHERE numericcol = numericcol * 3.14159

evaluate "numericcol * 3.14159" as a numeric?

And does:
WHERE 5.55 = numericcol * 3.14159

evaluate "numericcol * 3.14159" as a numeric too?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I need a clarification.  In the non-assignment case, does:
>     WHERE numericcol = numericcol * 3.14159
> evaluate "numericcol * 3.14159" as a numeric?

Yup (given my proposed changes that is).

> And does:
>     WHERE 5.55 = numericcol * 3.14159
> evaluate "numericcol * 3.14159" as a numeric too?

Yup.  The context does not matter: when we have foo * bar, we are going
to decide which kind of * operator is meant without regard to
surrounding context.  It's very much a bottom-up process, and has to be.
        regards, tom lane


Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> > Note that if you write, say,
> >     set numericcol = numericcol * 3.14159;
> > my proposal would do the "right thing" since the constant would be typed
> > as numeric to start with and would stay that way.  To do what you want
> > with a float variable, it'd be necessary to write
> >     set numericcol = numericcol * float4col::numeric;

Yes, that is the case where the new behavior would imho not be good (but you
say spec compliant). I lose precision even though there is room to hold it.

> > which is sort of ugly; but no uglier than
> >     set float4col = float4col * numericcol::float4;

Informix does the calculations in numeric, and then converts the result
if no casts are supplied (would do set float4col = float4(float4col::numeric * numericcol)).

Would be interesting what others do ?

Test script:
create table atab (a decimal(30), b smallfloat, c decimal(30), d smallfloat);
insert into atab values (1.000000000000001,100000.0,0, 0);
update atab set c=a*b-b, d=a*b-b where 1=1;
select a*b-b, b, c,d from atab;
   (expression)              b                c              d
          1e-10 100000.0000000            1e-10          1e-10

I hope this test is ok ?
It still seems to me that numeric should be the preferred type, and not float8.

Andreas


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Note that if you write, say,
> set numericcol = numericcol * 3.14159;
> my proposal would do the "right thing" since the constant would be typed
> as numeric to start with and would stay that way.  To do what you want
> with a float variable, it'd be necessary to write
> set numericcol = numericcol * float4col::numeric;

> Yes, that is the case where the new behavior would imho not be good (but you 
> say spec compliant). I lose precision even though there is room to hold it.

Lose what precision?  It seems silly to imagine that the product of
a numeric and a float4 is good to more digits than there are in the
float4.  This is exactly the spec's point: combining an exact and an
approximate input will give you an approximate result.

(Unless of course the value in the float4 happens to be exact, eg,
an integer of not very many digits.  But if you are relying on that
to be true, why aren't you using an exact format for storing it?)
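The effect is easy to see directly; a quick hedged illustration:

    SELECT 1.000000000000001::float4;    -- comes back as just 1: float4 keeps ~6 digits
    SELECT 100000.001::float4;           -- likewise, the fractional part is gone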

> Informix does the calculations in numeric, and then converts the result
> if no casts are supplied (would do set float4col = float4(float4col::numeric * numericcol)).

I am not sure what the argument is for following Informix's lead rather
than the standard's lead; especially when Informix evidently doesn't
understand numerical analysis ;-)
        regards, tom lane


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
I wrote:
> [Peter wrote:]
>> time -> interval:  I'm not even sure this cast should exist at all.
>> Proper arithmetic would be IntervalValue = TimeValue - TIME 'midnight'.
>> At least make it non-implicit.

> I'd go along with marking it assignment-only.

I started to make this change, but have momentarily backed off after
observing that it causes a failure in the regression tests:

*** ./expected/horology-no-DST-before-1970.out    Wed Sep 18 13:56:41 2002
--- ./results/horology.out    Wed Sep 18 15:45:54 2002
***************
*** 277,287 ****
  -- subtract time from date should not make sense; use interval instead
  SELECT date '1991-02-03' - time '04:05:06' AS "Subtract Time";
!       Subtract Time
! --------------------------
!  Sat Feb 02 19:54:54 1991
! (1 row)
!
  SELECT date '1991-02-03' - time with time zone '04:05:06 UTC' AS "Subtract Time UTC";
  ERROR:  Unable to identify an operator '-' for types 'date' and 'time with time zone'
        You will have to retype this query using an explicit cast
--- 277,284 ----
  -- subtract time from date should not make sense; use interval instead
  SELECT date '1991-02-03' - time '04:05:06' AS "Subtract Time";
! ERROR:  Unable to identify an operator '-' for types 'date' and 'time without time zone'
!       You will have to retype this query using an explicit cast
  SELECT date '1991-02-03' - time with time zone '04:05:06 UTC' AS "Subtract Time UTC";
  ERROR:  Unable to identify an operator '-' for types 'date' and 'time with time zone'
        You will have to retype this query using an explicit cast
 


The regression test is evidently relying on the implicit cast from time
to interval to allow the date - interval operator to be used for this
query.

Now, given that the regression test itself observes that 'date - time'
is wrong, and should be 'date - interval', maybe this behavioral change
is a Good Thing.  Or maybe it will just break applications.  Comments?
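If the cast does become assignment-only, queries like the regression test's
would have to spell the cast out; a hedged example:

    -- works regardless of how the time -> interval cast is marked
    SELECT date '1991-02-03' - time '04:05:06'::interval AS "Subtract Time";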

I'm going to commit my pg_cast changes without this change later today,
but we can still go back and add this change if we decide it's good.
        regards, tom lane


Re: Proposal for resolving casting issues

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> > Or possibly it should be AS IMPLICIT?
>
> I think AS IMPLICIT would be better because we have other AS [var]
> clauses.

But IMPLICIT is not a variable.

-- 
Peter Eisentraut   peter_e@gmx.net



AIX compilation problems (was Re: Proposal ...)

From
Peter Eisentraut
Date:
Zeugswetter Andreas SB SD writes:

> PS: pg snapshot 09/11 does not compile on AIX (large files (don't want
> _LARGE_FILES),

Please provide details.

> and mb conversions (pg_ascii2mic and pg_mic2ascii not
> found in the postmaster and not included from elsewhere)

And details here as well.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Proposal for resolving casting issues

From
Peter Eisentraut
Date:
Tom Lane writes:

> On looking more closely, SQL99 appears to define user-defined casts as
> invocable *only* in explicit cast and assignment contexts.

> This seems to mean that we can get away with defining AS ASSIGNMENT to
> mean my second category (implicit in assignment only), and then picking
> some more natural term for my first category (implicit anywhere).

Sounds good.

Have you seen 9.4 "Subject routine determination" and 9.5 "Type
precedence list determination"?  In essence, the SQL standard has a
hard-coded precedence list much like we have.  Since we support the
creation of non-structured user-defined types, the additional castability
level effectively gives us a way to override the built-in precedence
lists.  In fact, now that we have given up on the numeric/float8
precedence, the other hard-coded categories should be easy to eliminate.

>     CREATE CAST (sourcetype AS targettype)
>         WITH FUNCTION funcname (argtype)
>         [ AS ASSIGNMENT | IMPLICIT ]

Fine with me.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Proposal for resolving casting issues

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > > Or possibly it should be AS IMPLICIT?
> >
> > I think AS IMPLICIT would be better because we have other AS [var]
> > clauses.
> 
> But IMPLICIT is not a variable.

I meant we have cases where we do AS [ keyword1 | keyword2 ].
           CREATE OPERATOR CLASS any_name opt_default FOR TYPE_P Typename
               USING access_method AS opclass_item_list

What I am saying is that is better to do AS [ keyword | keyword ] rather
than [ AS keyword | keyword ].

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What I am saying is that is better to do AS [ keyword | keyword ] rather
> than [ AS keyword | keyword ].

Yeah, I thought the same after looking at it a little.  Committed that
way (of course it's still open to adjustment...)
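For reference, a hedged example of the committed form, using hypothetical type
and function names:

    CREATE CAST (mysource AS mytarget)
        WITH FUNCTION mytarget(mysource)
        AS IMPLICIT;        -- implicitly invocable anywhere

    -- or, for the assignment-only level:
    CREATE CAST (mysource AS mytarget)
        WITH FUNCTION mytarget(mysource)
        AS ASSIGNMENT;      -- implicitly invocable in assignment contexts only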
        regards, tom lane


Re: BLOB

From
Manuel Cabido
Date:
Hi PostgreSQL Folks,
   I would like to inquire: how is the BLOB support in PostgreSQL doing
now? Have there been some improvements? Can I have the blob support like in
this manner?

    create table myblob (
        blobid  serial not null primary key,
        name    varchar(50),
        image   blob
    );

    For some doc, xls, and ppt files, can I do these operations?

    insert into myblob (name,image) values ('personal data','personal.doc');
    insert into myblob (name,image) values ('business data','business.xls');
    insert into myblob (name,image) values ('presentation data','present.ppt');
 
   I would appreciate it very much for whatever comments you can give me 
on this.
   Thank you and MORE POWER TO THE BEST OPENSOURCE DBMS!

Mr. Manny Cabido
Philippines




Re: BLOB

From
Rod Taylor
Date:
On Wed, 2002-09-18 at 18:32, Manuel Cabido wrote:
> Hi PostgreSQL Folks,
> 
>     I would like to inquire: how is the BLOB support in PostgreSQL doing
> now? Have there been some improvements? Can I have the blob support like in

I'm unsure about blob (didn't know we had a blob type), but bytea works
perfectly fine for that.
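A hedged sketch of the original example using bytea (the table layout comes
from the question; actual file contents have to be read by the client and
supplied as a bytea value or bound parameter, so the literal below is just a
placeholder):

    CREATE TABLE myblob (
        blobid  serial NOT NULL PRIMARY KEY,
        name    varchar(50),
        image   bytea
    );

    -- bytea input uses escaped octets; real file data would come from the client
    INSERT INTO myblob (name, image) VALUES ('personal data', '\\000\\377\\001');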

--  Rod Taylor



Re: BLOB

From
"Christopher Kings-Lynne"
Date:
> >     I would like to inquire: how is the BLOB support in PostgreSQL doing
> > now? Have there been some improvements? Can I have the blob
> > support like in
>
> I'm unsure about blob (didn't know we had a blob type), but bytea works
> perfectly fine for that.

Is there some reason why we didn't call text 'clob' and bytea 'blob'?  or at
least add aliases?

Chris



Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> > Note that if you write, say,
> > set numericcol = numericcol * 3.14159;
> > my proposal would do the "right thing" since the constant would be typed
> > as numeric to start with and would stay that way.  To do what you want
> > with a float variable, it'd be necessary to write
> > set numericcol = numericcol * float4col::numeric;
>
> > Yes, that is the case where the new behavior would imho not be good (but you
> > say spec compliant). I lose precision even though there is room to hold it.
>
> Lose what precision?  It seems silly to imagine that the product of

Have you seen my example ? If calculated in float4 the result of
1.00000000000001*1000.0-1000.0 would be 0.0, no ?

> a numeric and a float4 is good to more digits than there are in the
> float4.  This is exactly the spec's point: combining an exact and an
> approximate input will give you an approximate result.

Does it actually say how approximate the result needs to be, or is it simply
approximate by nature that one part was only approximate ?
Do they really mean, that an approximate calculation with one float4 must be
calculated in float4 arithmetic ? If you e.g. calculate in float8 it would still
be an approximate result and thus imho conform.

> (Unless of course the value in the float4 happens to be exact, eg,
> an integer of not very many digits.  But if you are relying on that
> to be true, why aren't you using an exact format for storing it?)

Probably because the approximate is more efficient in storage size,
or the designer knew he only wants to store 6 significant digits ?

> > Informix does the calculations in numeric, and then converts the result
> > if no casts are supplied (would do set float4col = float4(float4col::numeric * numericcol)).
>
> I am not sure what the argument is for following Informix's lead rather
> than the standard's lead; especially when Informix evidently doesn't
> understand numerical analysis ;-)

It was only an example of how someone else does it and was why I asked what
other db's do. I would e.g. suspect Oracle does it similarily.
Please, someone check another db !

Andreas


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Yes, that is the case where the new behavior would imho not be good (but you 
> say spec compliant). I lose precision even though there is room to hold it.
>> 
>> Lose what precision?  It seems silly to imagine that the product of

> Have you seen my example ? If calculated in float4 the result of
> 1.00000000000001*1000.0-1000.0 would be 0.0, no ? 

So?  If you are storing one input as float4, then you cannot rationally
say that you know the result to better than 6 digits, because you don't
know the input to better than 6 digits.  Claiming that 1000.00000000001
is a more accurate answer for the product than 1000.0 is simply wishful
thinking on your part: nothing to the right of the sixth digit actually
means a darn thing, because you don't know whether the input was really
exactly 1000, or should have been perhaps 1000.001.

> Do they really mean, that an approximate calculation with one float4 must be 
> calculated in float4 arithmetic ? If you e.g. calculate in float8 it would still 
> be an approximate result and thus imho conform.

And still the output would be illusory: if you think you'd get 16 digits
of precision that way, then you are failing to grasp the problem.

>> (Unless of course the value in the float4 happens to be exact, eg,
>> an integer of not very many digits.  But if you are relying on that
>> to be true, why aren't you using an exact format for storing it?)

> Probably because the approximate is more efficient in storage size,
> or the designer knew he only wants to store 6 significant digits ?

Seems an exceedingly uncompelling scenario.  The only values that could
be expected to be stored exactly in a float4 (without very careful
analysis) are integers of up to 6 digits; you might as well store the
column as int4 if that's what you plan to keep in it.
        regards, tom lane


Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> > Yes, that is the case where the new behavior would imho not be good (but you
> > say spec compliant). I lose precision even though there is room to hold it.
> >>
> >> Lose what precision?  It seems silly to imagine that the product of
>
> > Have you seen my example ? If calculated in float4 the result of
> > 1.00000000000001*1000.0-1000.0 would be 0.0, no ?
>
> So?  If you are storing one input as float4, then you cannot rationally
> say that you know the result to better than 6 digits, because you don't
> know the input to better than 6 digits.  Claiming that 1000.00000000001
> is a more accurate answer for the product than 1000.0 is simply wishful
> thinking on your part: nothing to the right of the sixth digit actually
> means a darn thing, because you don't know whether the input was really
> exactly 1000, or should have been perhaps 1000.001.

I still see 1E-10 as a better answer to above calculation than your 0,
and my snapshot 9/11 does return that 1E-10.

For better understanding the test in pg:
create table atab (a decimal(30,20), b float4, c decimal(30,20), d float4);
insert into atab values (1.000000000000001,100000.0,0, 0);
update atab set c=a*b-b, d=a*b-b where 1=1;
create view av as select a*b-b, 1, b, c,d from atab;
\d av
View definition: SELECT ((atab.a * "numeric"(atab.b)) - "numeric"(atab.b)), atab.a, atab.b
, atab.c, atab.d FROM atab;

If I understood your proposal that would now change to:
View definition: SELECT (("float4"(atab.a) * atab.b) - atab.b), atab.a, atab.b
, atab.c, atab.d FROM atab;

>
> > Do they really mean, that an approximate calculation with one float4 must be
> > calculated in float4 arithmetic ? If you e.g. calculate in float8 it would still
> > be an approximate result and thus imho conform.
>
> And still the output would be illusory: if you think you'd get 16 digits
> of precision that way, then you are failing to grasp the problem.

I have not said 16 digits of exact precision. I was saying that an approximate
result calculated in numeric makes more sense than your float4-calculated result,
and does the correct thing more often than not in the db-centric cases I can think
of.

I do think I grasp the problem :-)

> >> (Unless of course the value in the float4 happens to be exact, eg,
> >> an integer of not very many digits.  But if you are relying on that
> >> to be true, why aren't you using an exact format for storing it?)
>
> > Probably because the approximate is more efficient in storage size,
> > or the designer knew he only wants to store 6 significant digits ?
>
> Seems an exceedingly uncompelling scenario.  The only values that could
> be expected to be stored exactly in a float4 (without very careful
> analysis) are integers of up to 6 digits; you might as well store the
> column as int4 if that's what you plan to keep in it.

You can store 6 significant digits and an exponent (iirc 10E+-38) !
e.g. 1.23456E-20; an int can't do that.

I give up now. I voiced my concern, and that is as far as my interest goes on this
actually. I still think fielding what other db's do in this area would be a good
thing before proceeding further.

Andreas


Re: Proposal for resolving casting issues

From
"Ross J. Reedstrom"
Date:
On Thu, Sep 19, 2002 at 04:57:30PM +0200, Zeugswetter Andreas SB SD wrote:
> 
> > 
> > > Have you seen my example ? If calculated in float4 the result of
> > > 1.00000000000001*1000.0-1000.0 would be 0.0, no ? 
> > 
> > So?  If you are storing one input as float4, then you cannot rationally
> > say that you know the result to better than 6 digits, because you don't
> > know the input to better than 6 digits.  Claiming that 1000.00000000001
> > is a more accurate answer for the product than 1000.0 is simply wishful
> > thinking on your part: nothing to the right of the sixth digit actually
> > means a darn thing, because you don't know whether the input was really
> > exactly 1000, or should have been perhaps 1000.001.
> 
> I still see 1E-10 as a better answer to above calculation than your 0,
> and my snapshot 9/11 does return that 1E-10.

Well, then you'd be wrong. Numerical analysis says you _can't_ get more
information out than went in to the _least_ precise part of a calculation.
What you're suggesting is the equivalent of wanting to put up a shelf, so
you estimate the length of the wall by eyeballing it, then measure the
wood for the shelf with a micrometer, to be sure it fits exactly right.

We teach this in intro science classes all the time: if you calculate with
3.14 as an approximation to pi, you better not report the circumference
of a circle as 2.45678932 cm, I'll take off points!

> 
> I do think I grasp the problem :-)

Hmm, I'm not so sure. ;-)

> 
> I give up now. I voiced my concern, and that is as far as my interest goes on this
> actually. I still think fielding what other db's do in this area would be a good 
> thing before proceeding further.

Ah, sorry to drag this on, then. But this is one of those clear cases
were we must fo the right thing, not follow the crowd. PostgreSQL gets
used by a lot of scientific projects (Have you noticed all the big
bioinformatics databases being mentioned on the lists?). Partly because
we're always underfunded, partly because we're academics who like to
have the code. If we start getting basic maths wrong, that'll be a huge
balck eye for the project.

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Executive Director                                  phone: 713-348-6166
Gulf Coast Consortium for Bioinformatics              fax: 713-348-6182
Rice University MS-39
Houston, TX 77005


Re: Proposal for resolving casting issues

From
"Ross J. Reedstrom"
Date:
On Thu, Sep 19, 2002 at 10:30:51AM -0500, Ross J. Reedstrom wrote:
> 
> Ah, sorry to drag this on, then. But this is one of those clear cases
> were we must fo the right thing, not follow the crowd. PostgreSQL gets
              do
> used by a lot of scientific projects (Have you noticed all the big
> bioinformatics databases being mentioned on the lists?). Partly because
> we're always underfunded, partly because we're academics who like to
  ^^(scientific projects)                  ^^
> have the code. If we start getting basic maths wrong, that'll be a huge
                   ^^(PostgreSQL)
> balck eye for the project.
  black

Clearly, it's time for an early lunch for me. I need sugar for my brain.

Ross



Re: Proposal for resolving casting issues

From
"Zeugswetter Andreas SB SD"
Date:
> > > > Have you seen my example ? If calculated in float4 the result of
> > > > 1.00000000000001*1000.0-1000.0 would be 0.0, no ?
> > >
> > > So?  If you are storing one input as float4, then you cannot rationally
> > > say that you know the result to better than 6 digits, because you don't
> > > know the input to better than 6 digits.  Claiming that 1000.00000000001
> > > is a more accurate answer for the product than 1000.0 is simply wishful
> > > thinking on your part: nothing to the right of the sixth digit actually
> > > means a darn thing, because you don't know whether the input was really
> > > exactly 1000, or should have been perhaps 1000.001.
> >
> > I still see 1E-10 as a better answer to above calculation than your 0,
> > and my snapshot 9/11 does return that 1E-10.
>
> Well, then you'd be wrong. Numerical analysis says you _can't_ get more
> information out than went in to the _least_ precise part of a calculation.
> What you're suggesting is the equivalent of wanting to put up a shelf, so
> you estimate the length of the wall by eyeballing it, then measure the
> wood for the shelf with a micrometer, to be sure it fits
> exactly right.
> We teach this in intro science classes all the time: if you calculate with
> 3.14 as an approximation to pi, you better not report the circumference
> of a circle as 2.45678932 cm, I'll take off points!

What if he must display 9 digits and says the result is approximately 2.45678932,
would that be worse than 2.46000000 ?
That is what I am trying to say.  Probably the standard is meant as a hint for db
users that such results will be approximate, not a rule about where the first digit
sits that is no longer exact.

For above calculation pg will in the future return 0.00000000000000000000 as an
answer to 1.00000000000001*1000.0-1000.0 when used in my example context, while
it currently returns 0.000000000010 ...
You both are saying, that 0.00000000000000000000 is a better answer.

Andreas


Re: Proposal for resolving casting issues

From
"Ross J. Reedstrom"
Date:
On Thu, Sep 19, 2002 at 06:00:37PM +0200, Zeugswetter Andreas SB SD wrote:
> 
> What if he must display 9 digits and says the result is approximately 2.45678932
> would that be worse than 2.46000000 ? 

Yup. Trailing zeros are not significant. That's why scientific notation is nice:
you don't fill in all those insignificant placeholders.

> 
> For above calculation pg will in the future return 0.00000000000000000000 as an
> answer to 1.00000000000001*1000.0-1000.0 when used in my example context, while
> it currently returns 0.000000000010 ... 
> You both are saying, that 0.00000000000000000000 is a better answer. 

That's right. And correct, as well.

Ross


Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> For above calculation pg will in the future return 0.00000000000000000000 as an
> answer to 1.00000000000001*1000.0-1000.0 when used in my example context, while
> it currently returns 0.000000000010 ... 
> You both are saying, that 0.00000000000000000000 is a better answer. 

Not exactly: we are saying it is not a worse answer.  There's no reason
to prefer one over the other, because they are both within the range
of uncertainty given the inherent uncertainty in the float4 input.

If you want exact results, you should be using exact datatypes.
        regards, tom lane


Re: BLOB

From
Peter Eisentraut
Date:
Christopher Kings-Lynne writes:

> Is there some reason why we didn't call text 'clob' and bytea 'blob'?

At the time our types were created there was no standard defining the
other types.

> or at least add aliases?

Mapping clob to text might be OK, but blob and bytea have totally
different input formats.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Proposal for resolving casting issues

From
"Christopher Kings-Lynne"
Date:
Will the new casting stuff address this kind of annoyance?

usa=# select average(octet_length(val)) from users_sessions;
ERROR:  Function 'average(int4)' does not exist
        Unable to identify a function that satisfies the given argument types
        You may need to add explicit typecasts

Chris



Re: Proposal for resolving casting issues

From
"Christopher Kings-Lynne"
Date:
Doh - I'm stupid.  Ignore my question :)

Helps if you spell 'average' as 'avg' :)

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Christopher
> Kings-Lynne
> Sent: Friday, 20 September 2002 10:03 AM
> To: Tom Lane; Zeugswetter Andreas SB SD
> Cc: Bruce Momjian; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Proposal for resolving casting issues 
> 
> 
> Will the new casting stuff address this kind of annoyance?
> 
> usa=# select average(octet_length(val)) from users_sessions;
> ERROR:  Function 'average(int4)' does not exist
>         Unable to identify a function that satisfies the given argument
> types
>         You may need to add explicit typecasts
> 
> Chris
> 
> 



Re: Proposal for resolving casting issues

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Will the new casting stuff address this kind of annoyance?
> usa=# select average(octet_length(val)) from users_sessions;
> ERROR:  Function 'average(int4)' does not exist

regression=# select * from pg_proc where proname = 'average';
 proname | pronamespace | proowner | prolang | proisagg | prosecdef | proisstrict | proretset | provolatile | pronargs | prorettype | proargtypes | prosrc | probin | proacl
---------+--------------+----------+---------+----------+-----------+-------------+-----------+-------------+----------+------------+-------------+--------+--------+--------
(0 rows)


No, I think you'll get the same error ...
        regards, tom lane