Re: FWD: overlaps() bug? - Mailing list pgsql-hackers

From Thomas Lockhart
Subject Re: FWD: overlaps() bug?
Date
Msg-id 3C6C7771.F01BBC9F@fourpalms.org
Whole thread Raw
List pgsql-hackers
(back on list; it is an interesting discussion imho)

> Thanks for the enlightening reply. It seems self-evident to me that if
> following a specification results in a mis-assertion (as demonstrated in my
> test case) then either the specification contains an error in logic or the
> application logic is in violation of a predicating assumption inherent in
> the specification. In this case, I think the latter applies. The "swap
> inputs if E1 < D1" logic is predicated on the assumption of an Euclidean
> space while time-of-day data points form an essentially cylindrical space.
> The correct
> logic for this type of space is:
>                          if E1 < D1
>                               return (D2, E2) not overlap (D1,E1)
>                          else
>                               return (D1,E1) overlap (D2,E2)
> where the overlap function above is the Euclidean overlap as currently
> defined.

Yup. But correct and intuitive may not be the same in this case, given
the guidance of the SQL99 spec as I understand it.

> "But in your case, choosing to record only times but then expecting the
> code to respect a day boundary ..."
> The problem I am trying to solve is completely generic. Periodic schedules
> are quite common and are usually _not_ associated with specific dates.
> Examples are television broadcast schedules, shipping and routing schedules,
> maintenance schedules and the like.

Sure, I agree. And changing the second argument to an interval does not
help, since the spec seems to call for some implicit math which turns it
into exactly the case you already see.

> The problem you allude to (intervals that exceed 24 hours) would seem to me
> to abuse of the time (of day) data type. As a programmer I would expect to
> have to handle such corner cases as exceptions (although the overlap
> function could easily handle this case since _any_ time interval overlaps an
> interval that exceeds 24 hours!) What I don't expect is for a built-in
> Boolean function to lie to me when used according to the published API!
> Violating a specification's underlying assumption is the same as violating
> the specification. One should either re-write the overlap function to
> prperly handle time/timetz data points or eliminate the overlap function for
> the time data altogether. As it stands, it is broken and dangerous.

Sorry, I haven't yet made the leap from taking the spec literally (as I
think we have done) to somehow violating the spec's underlying
assumption. Clearly the spec puts TIME and TIME WITH TIME ZONE into the
same "datetime data type" category discussed in the OVERLAPS definition.
What "underlying assumption" are you referring to? I *know* that this
particular case seems to lead to non-intuitive behavior, and I've made
the argument before that we should violate a spec if it is sufficiently
damaged, but I'm not sure that we should make that leap here. I'm not
actually arguing against it, other than we should be inclined by default
to follow the spec.

Comments?
                    - Thomas

> > Note the third row in the query result below is in error. The four hour
> > interval (2300UTC - 0300UTC) does not overlap the interval
> 1530UTC-1627UTC).
> > Is this a bug?
> 
> No. It conforms to (my reading of) the SQL99 spec. So it is a feature,
> even if I misread the spec. Which I think I didn't ;) But if I did, then
> we can change the implementation of course.
> 
> I've included the relevant part of the spec below. It seems clause (3)
> requires that we reorder the arguments to OVERLAPS, though perhaps
> someone would like to research whether TIME is allowed to be used with
> OVERLAPS at all (if not, then we could make up the rules ourselves).
> 
> > It would be cool if timetz (or time) datatypes were to wrap properly
> > across day boundaries (i.e. if start time < stop time then assume start
> time
> > is day before) but at the very least, the overlaps functions should not
> lie
> > to you!
> 
> Some parts of the spec aren't cool, or interfer with coolness. This may
> be one of them. If everything conforms to the standard, then we can
> start discussing whether that part of the standard is so brain-dead as
> to be useless or likely to directly cause damage.
> 
> But in your case, choosing to record only times but then expecting the
> code to respect a day boundary seems to be an assumption which could
> bite you in other ways later. What happens when an interval happens to
> be longer than a day??
> 
> hth
> 
>                     - Thomas
> 
> (omit some text defining the input as "(D1, E1) OVERLAPS (D2, E2)" as
> the input to the OVERLAPS operator)
> 
> 3) If D1 is the null value or if E1 < D1, then let S1 = E1 and let
>    T1 = D1. Otherwise, let S1 = D1 and let T1 = E1.
> 4) Case:
>    a) If the most specific type of the second field of <row value
>       expression 2> is a datetime data type, then let E2 be the
>       value of the second field of <row value expression 2>.
>     b) If the most specific type of the second field of <row value
>        expression 2> is INTERVAL, then let I2 be the value of the
>        second field of <row value expression 2>. Let E2 = D2 + I2.
> 5) If D2 is the null value or if E2 < D2, then let S2 = E2 and let
>    T2 = D2. Otherwise, let S2 = D2 and let T2 = E2.
> 6) The result of the <overlaps predicate> is the result of the
>    following expression:
>      ( S1 > S2 AND NOT ( S1 >= T2 AND T1 >= T2 ) )
>           OR
>      ( S2 > S1 AND NOT ( S2 >= T1 AND T2 >= T1 ) )
>           OR
> 
>      ( S1 = S2 AND ( T1 <> T2 OR T1 = T2 ) )


pgsql-hackers by date:

Previous
From: Brian Hirt
Date:
Subject: Re: Strange problem when upgrading to 7.2 with pg_upgrade.
Next
From: Tom Lane
Date:
Subject: Re: Ready to branch 7.2/7.3 ?