Thread: Problems with avg on interval data type

Problems with avg on interval data type

From
pgsql-bugs@postgresql.org
Date:
Jeremy Blumenfeld (jeremy@horizonlive.com) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
Problems with avg on interval data type

Long Description
We have recently upgraded from 7.0.3 to 7.1 and a query which used to work is no longer working.
The query does an avg on an interval column and now gets the error:

ERROR:  Bad interval external representation '0'

We've reproduced this problem with other tables using an interval type.  It can handle count, sum, but not avg.

Sample Code


No file was uploaded with this report

Re: Problems with avg on interval data type

From
Tom Lane
Date:
pgsql-bugs@postgresql.org writes:
> The query does an avg on an interval column and now gets the error:
> ERROR:  Bad interval external representation '0'

Sorry about that :-(.  A last-minute tightening of the allowed input
formats for interval broke avg(interval), but you're the first one to
notice.

I have corrected this in the sources for 7.1.2, but that will not help
you much unless you care to re-initdb with 7.1.2.  If you don't want to
initdb, you can manually correct the erroneous catalog entry with

update pg_aggregate set agginitval = '{0 second,0 second}' where
aggname = 'avg' and aggbasetype = 1186;

Note you will need to do this in each extant database including
template1 (or at least all the databases where you plan to use
avg(interval)).

            regards, tom lane

Re: Problems with avg on interval data type

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> We have recently upgraded from 7.0.3 to 7.1 and a query which used
>> to work is no longer working.
>> The query does an avg on an interval column and now gets the error:
>> ERROR:  Bad interval external representation '0'

> OK, there is one case of interval constant which is not handled
> correctly in the 7.1.x release -- the simplest interval specification
> having only an unadorned integer. That is a bug, for which I have a
> patch (or patches) available.

I have modified the declaration of avg(interval) to initialize its
accumulator as '0 second' instead of just '0', so as far as the
aggregate goes it's not necessary to consider this a bug.

> Currently, we interpret various forms as follows:

>   Value    Units
>   +8        hours
>   -8        hours
>   8.0        seconds
>   8        ?? seconds ??

> I would propose that the last example should be interpreted in units of
> seconds, but that could be perilously close to the conventions required
> for the signed examples. Comments?

Yipes.  I do not like the idea that '+8' and '8' yield radically
different results.  That's definitely going to create unhappiness.

I suggest that the current code is more correct than you think ;-).
ISTM it is a good idea to require a units field, or at least some
punctuation giving a clue about units --- for example I do not object to
'08:00' being interpreted as hours and minutes.  But I would be inclined
to reject all four of the forms '+8', '-8', '8.0', and '8' as ambiguous.
Is there something in the SQL spec that requires us to accept them?
        regards, tom lane


Re: Problems with avg on interval data type

From
Thomas Lockhart
Date:
> We have recently upgraded from 7.0.3 to 7.1 and a query which used
> to work is no longer working.
> The query does an avg on an interval column and now gets the error:
> ERROR:  Bad interval external representation '0'

OK, there is one case of interval constant which is not handled
correctly in the 7.1.x release -- the simplest interval specification
having only an unadorned integer. That is a bug, for which I have a
patch (or patches) available.

Before I post the patch (which should go into the 7.1.2 release as a bug
fix) I need feedback on a conventions dilemma, which led to the code
modifications which introduced the bug. Here it is:

Intervals usually indicate a time span, and can be specified with either
"# units" strings (e.g. '5 hours') or (as of 7.1) as "hh:mm:ss" (e.g.
'05:00').

A new construct, "a_expr AT TIME ZONE c_expr" is supported in 7.1, per
SQL99 spec. One of the possible arguments is
 a_expr AT TIME ZONE 'PST'

and
 a_expr AT TIME ZONE INTERVAL '-08:00'

It is this last style which leads to the problem of how to interpret
signed or unsigned integers as interval types. For example, in this
context
 INTERVAL '-8'

must be interpreted as having units of "hours", while our historical
usage has
 INTERVAL '8'

being interpreted as "seconds" (even with signed values). Currently, we
interpret various forms as follows:
 Value    Units +8    hours -8    hours 8.0    seconds 8    ?? seconds ??

I would propose that the last example should be interpreted in units of
seconds, but that could be perilously close to the conventions required
for the signed examples. Comments?
                    - Thomas


Re: Problems with avg on interval data type

From
Thomas Lockhart
Date:
> >   Value       Units
> >   +8          hours
> >   -8          hours
> >   8.0         seconds
> >   8           ?? seconds ??
> > I would propose that the last example should be interpreted in units of
> > seconds, but that could be perilously close to the conventions required
> > for the signed examples. Comments?
> Yipes.  I do not like the idea that '+8' and '8' yield radically
> different results.  That's definitely going to create unhappiness.

Yeah, I agree. The ugliness is that an unsigned integer has been
accepted in the past as "seconds", and would seem to be a reasonable
assumption.

> I suggest that the current code is more correct than you think ;-).
> ISTM it is a good idea to require a units field, or at least some
> punctuation giving a clue about units --- for example I do not object to
> '08:00' being interpreted as hours and minutes.  But I would be inclined
> to reject all four of the forms '+8', '-8', '8.0', and '8' as ambiguous.
> Is there something in the SQL spec that requires us to accept them?

Single-field signed integers (and unsigned integers?) must be acceptable
for a time zone specification (pretty sure this is covered in the SQL
spec). Remember that SQL is woefully inadequate for date, time, and time
zone support, but afaicr a signed integer is one of the few things they
do specify ;)
                     - Thomas


Re: Re: Problems with avg on interval data type

From
Peter Eisentraut
Date:
Tom Lane writes:

> I suggest that the current code is more correct than you think ;-).
> ISTM it is a good idea to require a units field, or at least some
> punctuation giving a clue about units --- for example I do not object to
> '08:00' being interpreted as hours and minutes.  But I would be inclined
> to reject all four of the forms '+8', '-8', '8.0', and '8' as ambiguous.
> Is there something in the SQL spec that requires us to accept them?

Our interval is quite a bit different from the SQL version.  In SQL, an
interval value looks like this:

INTERVAL -'5 12:30:15.3' DAY TO SECOND

The unit qualifier is required.  Consequentially, I would reject anything
without units, except '0' maybe.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Problems with avg on interval data type

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> I suggest that the current code is more correct than you think ;-).
>> ISTM it is a good idea to require a units field, or at least some
>> punctuation giving a clue about units --- for example I do not object to
>> '08:00' being interpreted as hours and minutes.  But I would be inclined
>> to reject all four of the forms '+8', '-8', '8.0', and '8' as ambiguous.
>> Is there something in the SQL spec that requires us to accept them?

> Single-field signed integers (and unsigned integers?) must be acceptable
> for a time zone specification (pretty sure this is covered in the SQL
> spec).

But surely there is other context cuing you that the number is a
timezone?  In any case, you weren't proposing that interval_in
should accept '8' as a timezone ...
        regards, tom lane


Re: Problems with avg on interval data type

From
Thomas Lockhart
Date:
> > Single-field signed integers (and unsigned integers?) must be acceptable
> > for a time zone specification (pretty sure this is covered in the SQL
> > spec).
> But surely there is other context cuing you that the number is a
> timezone?  In any case, you weren't proposing that interval_in
> should accept '8' as a timezone ...

In the particular case I mentioned, any context is gone by the time the
parser gets beyond gram.y (and that is well before the constant is
evaluated). The general point is that in this case (and perhaps some
other cases) there is a need to convert an interval into a time zone, so
the specification of either or both had better be self consistant.

We do not have an explicit timezone type which could make different
assumptions about units on unadorned integers (and I'd like to avoid
defining a new type for this purpose since SQL9x seems to think that
"interval" should be enough). I'd also like to avoid *requiring* the
full brain damage of SQL9x interval specifications such as Peter
mentioned; we may support it, but should not require it since it is
truely horrid.
                      - Thomas