Thread: Interval input: usec, msec

Interval input: usec, msec

From
Neil Conway
Date:
neilc=# select '1 day 1 millisecond'::interval;
      interval
--------------------
 1 day 00:00:00.001
(1 row)
neilc=# select '1 millisecond'::interval;
ERROR:  invalid input syntax for type interval: "1 millisecond"
neilc=# select '0.001 seconds'::interval;
   interval
--------------
 00:00:00.001
(1 row)

Similarly for "microsecond".

I think allowing the first input but not the second is pretty
inconsistent. Attached is a patch that fixes this for both milliseconds
and microseconds. The previous coding allowed "x seconds y milliseconds"
to be specified, so this patch preserves that behavior. It *also* allows
"x seconds y seconds" as valid input, although you could make a
consistency argument for that anyway: (with CVS HEAD, unpatched)

neilc=# select '5 days 10 days'::interval;
 interval
----------
 15 days
neilc=# select '1 week 2 week 3 weeks'::interval;
 interval
----------
 42 days
neilc=# select '1 second 2 second'::interval;
ERROR:  invalid input syntax for type interval: "1 second 2 second"

Is there any reason to why DecodeInterval() is willing to accept
multiple specifications for some time units but not others?

-Neil


Attachment

Re: Interval input: usec, msec

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Is there any reason to why DecodeInterval() is willing to accept
> multiple specifications for some time units but not others?

I'd argue that it's an oversight.  I don't have a problem with adding up
the values of units that really translate to the same thing (eg,
'1 week 1 day' -> '8 days'), but I think '1 second 2 second' should
be rejected because it's almost certainly user error.

Does your patch allow '1 millisec 2 microsec', which should be allowed
by this argument?  I suspect that to make it work unsurprisingly, we'd
need to allocate a distinct tmask bit to each logically distinct unit.
(Maybe shortage of tmask bits is why the code is like it is?)

            regards, tom lane

Re: Interval input: usec, msec

From
Neil Conway
Date:
On Mon, 2007-28-05 at 10:50 -0400, Tom Lane wrote:
> I'd argue that it's an oversight.  I don't have a problem with adding up
> the values of units that really translate to the same thing (eg,
> '1 week 1 day' -> '8 days'), but I think '1 second 2 second' should
> be rejected because it's almost certainly user error.

I don't see why "1 week 1 week" is any less likely to be user error than
"1 second 1 second".

> Does your patch allow '1 millisec 2 microsec', which should be allowed
> by this argument?

Yes: in order to allow the above input, the straightforward coding also
allows "1 second 2 second".

> I suspect that to make it work unsurprisingly, we'd
> need to allocate a distinct tmask bit to each logically distinct unit.

Okay, attached is a patch that does this. To summarize, the changes are:

   * add tmask bits for msec, usec (I reordered the #defines to keep
     them logically contiguous, but AFAICS this isn't necessary)
   * if the user specifies multiple instances of usec, msec, or sec,
     reject as invalid input
   * if the user specifies a fractional second ("5.5 seconds"), also
     consider that to be millisecond and microsecond input from the
     POV of rejecting duplicate units. So "5.5 seconds 1 millisecond"
     will be rejected.

Docs and regression tests updated. If people are happy with the above
behavior, I'll commit this to HEAD shortly (today/tomorrow), and perhaps
backport it: not accepting "1 msec" and similar inputs is a clear bug,
IMHO.

BTW, does anyone know why a few of the regression tests (tinterval,
point, geometry, etc.) explicitly disable and then re-enable GEQO?
AFAICS the regression tests are just doing fairly routine two-table
joins, so GEQO will not be invoked with a sane configuration anyway.

-Neil


Attachment

Re: Interval input: usec, msec

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Mon, 2007-28-05 at 10:50 -0400, Tom Lane wrote:
>> I'd argue that it's an oversight.  I don't have a problem with adding up
>> the values of units that really translate to the same thing (eg,
>> '1 week 1 day' -> '8 days'), but I think '1 second 2 second' should
>> be rejected because it's almost certainly user error.

> I don't see why "1 week 1 week" is any less likely to be user error than
> "1 second 1 second".

Right.  I guess you misunderstood me: I was arguing for rejecting double
occurrences of the same unit name, but not occurrences of different unit
names that we happen to map into the same interval field internally.
IOW the behavior ought to be predictable without knowing which unit
names map to the same field.

>    * add tmask bits for msec, usec (I reordered the #defines to keep
>      them logically contiguous, but AFAICS this isn't necessary)

I forget --- are the tmask bits used in stored typmod values for
intervals?  It'd probably be best not to change the meanings of typmod
bits, since those are visible to client code if it wants to look.

> BTW, does anyone know why a few of the regression tests (tinterval,
> point, geometry, etc.) explicitly disable and then re-enable GEQO?

Hmmm ... if you check the cvs history for those tests you might find
some evidence.

            regards, tom lane

Re: Interval input: usec, msec

From
Neil Conway
Date:
On Mon, 2007-28-05 at 15:05 -0400, Tom Lane wrote:
> Right.  I guess you misunderstood me: I was arguing for rejecting double
> occurrences of the same unit name, but not occurrences of different unit
> names that we happen to map into the same interval field internally.

Makes sense to me. I'll send  a patch for this shortly (but I'll commit
the patch for the usec/msec issue first -- this second change shouldn't
be backpatched).

> I forget --- are the tmask bits used in stored typmod values for
> intervals?  It'd probably be best not to change the meanings of typmod
> bits, since those are visible to client code if it wants to look.

According to datetime.h, only some of the tmask bits are used in stored
typmods: YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND, and the patch
doesn't modify any of those. So I think we're okay.

> Hmmm ... if you check the cvs history for those tests you might find
> some evidence.

"Modify regression tests to allow GEQ optimizer (order results).",
according to the 1997 CVS commit from Thomas Lockhart that added the
lines in question. Presumably that is no longer relevant. Nothing
unexpected happens if the disabling of GEQO is removed, so I'll apply
the attached patch shortly barring any objections.

-Neil


Attachment

Re: Interval input: usec, msec

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Mon, 2007-28-05 at 15:05 -0400, Tom Lane wrote:
>> Hmmm ... if you check the cvs history for those tests you might find
>> some evidence.

> "Modify regression tests to allow GEQ optimizer (order results).",
> according to the 1997 CVS commit from Thomas Lockhart that added the
> lines in question. Presumably that is no longer relevant.

Some digging in the CVS history shows that the parameter now known as
geqo_threshold didn't exist until here:

1997-05-31 22:55  momjian

    * src/: backend/optimizer/path/allpaths.c,
    include/optimizer/geqo.h: Enable GEQO for more than six tables,
    from Martin.

so at the time Tom was griping about this, GEQO would indeed have
randomized the results for even two-table joins.

> Nothing
> unexpected happens if the disabling of GEQO is removed, so I'll apply
> the attached patch shortly barring any objections.

No objection here.

            regards, tom lane

Re: Interval input: usec, msec

From
Neil Conway
Date:
On Mon, 2007-28-05 at 13:54 -0400, Neil Conway wrote:
> Okay, attached is a patch that does this. To summarize, the changes are:
>
>    * add tmask bits for msec, usec (I reordered the #defines to keep
>      them logically contiguous, but AFAICS this isn't necessary)
>    * if the user specifies multiple instances of usec, msec, or sec,
>      reject as invalid input
>    * if the user specifies a fractional second ("5.5 seconds"), also
>      consider that to be millisecond and microsecond input from the
>      POV of rejecting duplicate units. So "5.5 seconds 1 millisecond"
>      will be rejected.

Applied to HEAD, backported to 8.2 and 8.1

After I applied the patches, it occurred to me that this patch actually
slightly changes previous behavior: the previous coding accepted inputs
like "1 microsecond 1 microsecond 1 second", whereas the patch rejects
this input. I don't think this is a major problem (input of this kind is
likely a client bug), but I figured I'd mention it...

-Neil



Re: Interval input: usec, msec

From
Michael Glaesemann
Date:
On May 29, 2007, at 0:06 , Neil Conway wrote:

> Applied to HEAD, backported to 8.2 and 8.1

One thing I noticed when looking over the patch is that there are a
few bare numbers in datetime.c such as 100000, 1000, 1e-3, and 1e-6.
In timestamp.[hc] we've defined macros for conversions such as
#define #define USECS_PER_SEC    INT64CONST(1000000)

I'd like to work up a patch that would add similar macros for
datetime.c, in particular using the INT64CONST construction where
appropriate. Thoughts?

Michael Glaesemann
grzm seespotcode net