Thread: Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

25 February 2016, 01:21:26

On 2/2/16, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/2/16 6:39 PM, Tom Lane wrote:
>> I'm inclined to think that a good solution would be to create an
>> artificial restriction to not accept years beyond, say, 100000 AD.
>> That would leave us with a lot of daylight to not have to worry
>> about corner-case overflows in timestamp arithmetic.  I'm not sure
>> though where we'd need to enforce such a restriction; certainly in
>> timestamp[tz]_in, but where else?
>
> Probably some of the casts (I'd think at least timestamp->timestamptz).
> Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/

Please find attached a patch checks boundaries of date/timestamp[tz].
There are more functions: converting to/from timestamptz, truncating,
constructing from date and time etc.

I left the upper boundary as described[1] in the documentation
(294276-12-31 AD), lower - "as is" (4714-11-24 BC).
It is easy to change the lower boundary to 4713-01-01BC (as described
in the documentation) and it seems necessary because it allows to
simplify IS_VALID_JULIAN and IS_VALID_JULIAN4STAMPS and avoid the next
behavior:

postgres=# select
postgres-#      to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-#     ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-#     ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
  to_char  |  to_char  |  to_char
-----------+-----------+-----------
 monday    | monday    | thursday
(1 row)

since 4714-12-28 BC and to the past detection when a week is starting
is broken (because it is boundary of isoyears -4713 and -4712).
Is it worth to break undocumented range or leave it as is?

There is one more flaw: checking for a correctness begins from date
and if default TZ is not UTC, dump/restore of values of type
timestamptz which are close to allowed boundaries can be broken (and
such result can't be restored because date is not in allowed range):
postgres=# SET TIME ZONE 'GMT+1';
SET
postgres=# COPY (SELECT '4714-11-24 00:00:00.000000+00
BC'::timestamptz) TO STDOUT;
4714-11-23 23:00:00-01 BC

Also I'm asking for a help because the query (in default TZ='GMT+1'):
postgres=# SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz;

in psql gives a result "4714-11-23 23:00:00-01 BC",
but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC"
without TZ offset.

I don't see what can be added to the documentation with the applied patch.

More testings, finding bugs, uncovered functions, advice, comment
improvements are very appreciated.

[1]http://www.postgresql.org/docs/devel/static/datatype-datetime.html

--
Best regards,
Vitaly Burovoy

Attachment

fix_timestamps_overflow.v001.patch

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

25 February 2016, 01:34:03

On 2/24/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> On 2/2/16, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 2/2/16 6:39 PM, Tom Lane wrote:
>>> I'm inclined to think that a good solution would be to create an
>>> artificial restriction to not accept years beyond, say, 100000 AD.
>>> That would leave us with a lot of daylight to not have to worry
>>> about corner-case overflows in timestamp arithmetic.  I'm not sure
>>> though where we'd need to enforce such a restriction; certainly in
>>> timestamp[tz]_in, but where else?
>>
>> Probably some of the casts (I'd think at least timestamp->timestamptz).
>> Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/
>
> Please find attached a patch checks boundaries of date/timestamp[tz].
> There are more functions: converting to/from timestamptz, truncating,
> constructing from date and time etc.
>
> I left the upper boundary as described[1] in the documentation
> (294276-12-31 AD), lower - "as is" (4714-11-24 BC).
> It is easy to change the lower boundary to 4713-01-01BC (as described
> in the documentation) and it seems necessary because it allows to
> simplify IS_VALID_JULIAN and IS_VALID_JULIAN4STAMPS and avoid the next
> behavior:
>
> postgres=# select
> postgres-#      to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
> postgres-#     ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
> postgres-#     ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>   to_char  |  to_char  |  to_char
> -----------+-----------+-----------
>  monday    | monday    | thursday
> (1 row)
>
> since 4714-12-28 BC and to the past detection when a week is starting
> is broken (because it is boundary of isoyears -4713 and -4712).
> Is it worth to break undocumented range or leave it as is?
>
> There is one more flaw: checking for a correctness begins from date
> and if default TZ is not UTC, dump/restore of values of type
> timestamptz which are close to allowed boundaries can be broken (and
> such result can't be restored because date is not in allowed range):
> postgres=# SET TIME ZONE 'GMT+1';
> SET
> postgres=# COPY (SELECT '4714-11-24 00:00:00.000000+00
> BC'::timestamptz) TO STDOUT;
> 4714-11-23 23:00:00-01 BC
>
> Also I'm asking for a help because the query (in default TZ='GMT+1'):
> postgres=# SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz;
>
> in psql gives a result "4714-11-23 23:00:00-01 BC",
> but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC"
> without TZ offset.
>
>
> I don't see what can be added to the documentation with the applied patch.
>
> More testings, finding bugs, uncovered functions, advice, comment
> improvements are very appreciated.
>
> [1]http://www.postgresql.org/docs/devel/static/datatype-datetime.html

I'm sorry, gmail hasn't added a header "In-Reply-To" to the last email.
Previous thread is by the link[2].

http://www.postgresql.org/message-id/flat/CAKOSWNked3JOE++PEs49GMDNfR2ieu9A2jFCEZ6EW-+1c5_u9Q@mail.gmail.com

-- 
Best regards,
Vitaly Burovoy

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

25 February 2016, 21:44:32

> <overquoting>

Added to the commitfest 2016-03.

[CF] https://commitfest.postgresql.org/9/540/

-- 
Best regards,
Vitaly Burovoy

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

David Steele

Date:

14 March 2016, 13:23:44

On 2/25/16 4:44 PM, Vitaly Burovoy wrote:

> Added to the commitfest 2016-03.
>
> [CF] https://commitfest.postgresql.org/9/540/

This looks like a fairly straight-forward bug fix (the size of the patch 
is deceptive because there a lot of new tests included).  It applies 
cleanly.

Anastasia, I see you have signed up to review.  Do you have an idea when 
you will get the chance to do that?

Thanks,
-- 
-David
david@pgmasters.net

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Mark Dilger

Date:

14 March 2016, 16:10:16

> On Mar 14, 2016, at 6:23 AM, David Steele <david@pgmasters.net> wrote:
>
> On 2/25/16 4:44 PM, Vitaly Burovoy wrote:
>
>> Added to the commitfest 2016-03.
>>
>> [CF] https://commitfest.postgresql.org/9/540/
>
> This looks like a fairly straight-forward bug fix (the size of the patch is deceptive because there a lot of new
testsincluded).  It applies cleanly. 
>
> Anastasia, I see you have signed up to review.  Do you have an idea when you will get the chance to do that?

The first thing I notice about this patch is that src/include/datatype/timestamp.h
has some #defines that are brittle.  The #defines have comments explaining
their logic, but I'd rather embed that in the #define directly:

This:

+#ifdef HAVE_INT64_TIMESTAMP
+#define MIN_TIMESTAMP  INT64CONST(-211813488000000000)
+/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
+#define MAX_TIMESTAMP  INT64CONST(9223371331200000000)
+/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
+#else
+#define MIN_TIMESTAMP  -211813488000.0
+/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */
+#define MAX_TIMESTAMP  9223371331200.0
+/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */
+#endif

Could be written as:

#ifdef HAVE_INT64_TIMESTAMP
#define MIN_TIMESTAMP   ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
#define MAX_TIMESTAMP   ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
#else
#define MIN_TIMESTAMP   ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
#define MAX_TIMESTAMP   ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
#endif

I assume modern compilers would convert these to the same constants at compile-time,
rather than impose a runtime penalty.  The #defines would be less brittle in the event, for
example, that the postgres epoch were ever changed.

Mark Dilger

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Anastasia Lubennikova

Date:

14 March 2016, 16:12:32

14.03.2016 16:23, David Steele:
> On 2/25/16 4:44 PM, Vitaly Burovoy wrote:
>
>> Added to the commitfest 2016-03.
>>
>> [CF] https://commitfest.postgresql.org/9/540/
>
> This looks like a fairly straight-forward bug fix (the size of the 
> patch is deceptive because there a lot of new tests included).  It 
> applies cleanly.
>
> Anastasia, I see you have signed up to review.  Do you have an idea 
> when you will get the chance to do that?
>
> Thanks,

I've read the patch thoroughly and haven't found any problems. I think 
that the patch is in a very good shape.
It fixes a bug and has an excellent set of tests.

There is an issue, mentioned in the thread above:

>postgres=# select
>postgres-#      to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
>postgres-#     ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
>postgres-#     ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>  to_char  |  to_char  |  to_char
>-----------+-----------+-----------
> monday    | monday    | thursday
>(1 row)

>since 4714-12-28 BC and to the past detection when a week is starting
>is broken (because it is boundary of isoyears -4713 and -4712).
>Is it worth to break undocumented range or leave it as is?

But I suppose that behavior of undocumented dates is not essential.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

15 March 2016, 00:12:41

On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote:
> The first thing I notice about this patch is that
> src/include/datatype/timestamp.h
> has some #defines that are brittle.  The #defines have comments explaining
> their logic, but I'd rather embed that in the #define directly:
>
> This:
>
> +#ifdef HAVE_INT64_TIMESTAMP
> +#define MIN_TIMESTAMP  INT64CONST(-211813488000000000)
> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
> +#define MAX_TIMESTAMP  INT64CONST(9223371331200000000)
> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC
> */
> +#else
> +#define MIN_TIMESTAMP  -211813488000.0
> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */
> +#define MAX_TIMESTAMP  9223371331200.0
> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */
> +#endif
>
> Could be written as:
>
> #ifdef HAVE_INT64_TIMESTAMP
> #define MIN_TIMESTAMP
> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
> #define MAX_TIMESTAMP
> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
> #else
> #define MIN_TIMESTAMP
> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
> #define MAX_TIMESTAMP
> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
> #endif
>
> I assume modern compilers would convert these to the same constants at
> compile-time,

Firstly, Postgres is compiling not only by modern compilers.

> rather than impose a runtime penalty.

Secondary, It is hard to write it correctly obeying Postgres' coding
standard (e.g. 80-columns border) and making it clear: it is not so
visual difference between USECS_PER_DAY and SECS_PER_DAY in the
expressions above (but it is vivid in comments in the file). Also a
questions raise "Why is INT64CONST(0) necessary in `#else' block"
(whereas `float' is necessary there) or "Why is INT64CONST set for
MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_
int64).

The file is full of constants. For example, JULIAN_MAX and
SECS_PER_DAY are one of them.

I would leave it as is.

> The #defines would be less brittle in
> the event, for example, that the postgres epoch were ever changed.

I don't think it is real, and even in such case all constants are
collected together in the file and will be found and changed at once.

> Mark Dilger

-- 
Best regards,
Vitaly Burovoy

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

15 March 2016, 00:21:36

On 3/14/16, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
> 14.03.2016 16:23, David Steele:
>> On 2/25/16 4:44 PM, Vitaly Burovoy wrote:
>>
>>> Added to the commitfest 2016-03.
>>>
>>> [CF] https://commitfest.postgresql.org/9/540/
>>
>> This looks like a fairly straight-forward bug fix (the size of the
>> patch is deceptive because there a lot of new tests included).  It
>> applies cleanly.
>>
>> Anastasia, I see you have signed up to review.  Do you have an idea
>> when you will get the chance to do that?
>>
>> Thanks,
>
> I've read the patch thoroughly and haven't found any problems. I think
> that the patch is in a very good shape.
> It fixes a bug and has an excellent set of tests.
>
> There is an issue, mentioned in the thread above:
>
>>postgres=# select
>>postgres-#      to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
>>postgres-#     ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
>>postgres-#     ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>>  to_char  |  to_char  |  to_char
>>-----------+-----------+-----------
>> monday    | monday    | thursday
>>(1 row)
>
>>since 4714-12-28 BC and to the past detection when a week is starting
>>is broken (because it is boundary of isoyears -4713 and -4712).
>>Is it worth to break undocumented range or leave it as is?
>
> But I suppose that behavior of undocumented dates is not essential.

I'm sorry... What should I do with "Waiting on Author" state if you
don't have complaints?

> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Best regards,
Vitaly Burovoy

Re: Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Anastasia Lubennikova

Date:

15 March 2016, 10:46:28

15.03.2016 03:21, Vitaly Burovoy:
> On 3/14/16, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
>> 14.03.2016 16:23, David Steele:
>>> On 2/25/16 4:44 PM, Vitaly Burovoy wrote:
>>>
>>>> Added to the commitfest 2016-03.
>>>>
>>>> [CF] https://commitfest.postgresql.org/9/540/
>>> This looks like a fairly straight-forward bug fix (the size of the
>>> patch is deceptive because there a lot of new tests included).  It
>>> applies cleanly.
>>>
>>> Anastasia, I see you have signed up to review.  Do you have an idea
>>> when you will get the chance to do that?
>>>
>>> Thanks,
>> I've read the patch thoroughly and haven't found any problems. I think
>> that the patch is in a very good shape.
>> It fixes a bug and has an excellent set of tests.
>>
>> There is an issue, mentioned in the thread above:
>>
>>> postgres=# select
>>> postgres-#      to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
>>> postgres-#     ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
>>> postgres-#     ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>>>   to_char  |  to_char  |  to_char
>>> -----------+-----------+-----------
>>> monday    | monday    | thursday
>>> (1 row)
>>> since 4714-12-28 BC and to the past detection when a week is starting
>>> is broken (because it is boundary of isoyears -4713 and -4712).
>>> Is it worth to break undocumented range or leave it as is?
>> But I suppose that behavior of undocumented dates is not essential.
> I'm sorry... What should I do with "Waiting on Author" state if you
> don't have complaints?
>

I was going to set "Ready for Committer", but then I've noticed message 
from Mark Dilger and changed my mind.
Now, when you answered him, I have no objections.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Mark Dilger

Date:

15 March 2016, 15:36:04

> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>
> On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote:
>> The first thing I notice about this patch is that
>> src/include/datatype/timestamp.h
>> has some #defines that are brittle.  The #defines have comments explaining
>> their logic, but I'd rather embed that in the #define directly:
>>
>> This:
>>
>> +#ifdef HAVE_INT64_TIMESTAMP
>> +#define MIN_TIMESTAMP  INT64CONST(-211813488000000000)
>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
>> +#define MAX_TIMESTAMP  INT64CONST(9223371331200000000)
>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC
>> */
>> +#else
>> +#define MIN_TIMESTAMP  -211813488000.0
>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */
>> +#define MAX_TIMESTAMP  9223371331200.0
>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */
>> +#endif
>>
>> Could be written as:
>>
>> #ifdef HAVE_INT64_TIMESTAMP
>> #define MIN_TIMESTAMP
>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
>> #define MAX_TIMESTAMP
>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
>> #else
>> #define MIN_TIMESTAMP
>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
>> #define MAX_TIMESTAMP
>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
>> #endif
>>
>> I assume modern compilers would convert these to the same constants at
>> compile-time,
>
> Firstly, Postgres is compiling not only by modern compilers.

Do you have an example of a compiler that will not do this constant folding
at compile time?

>> rather than impose a runtime penalty.
>
> Secondary, It is hard to write it correctly obeying Postgres' coding
> standard (e.g. 80-columns border) and making it clear: it is not so
> visual difference between USECS_PER_DAY and SECS_PER_DAY in the
> expressions above (but it is vivid in comments in the file).

Hmm.  I think using USECS_PER_DAY is perfectly clear, but that is a personal
opinion.  I don't see any way to argue if you don't see it that way.

> Also a
> questions raise "Why is INT64CONST(0) necessary in `#else' block"
> (whereas `float' is necessary there)

You appear to be correct.  The second half should be defined in terms of
float.  I compiled on a system with int64 support, so I didn't notice. Thanks
for catching that.

> or "Why is INT64CONST set for
> MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_
> int64).

I was only casting the zero to int64.  That doesn't seem necessary, so it can
be removed.  Both MIN_TIMESTAMP and MAX_TIMESTAMP were defined
in terms of USECS_PER_DAY, which itself is defined in terms of INT64CONST,
so I believe they both work out to be an int64 constant.

> The file is full of constants. For example, JULIAN_MAX and
> SECS_PER_DAY are one of them.

JULIAN_MAXYEAR and JULIAN_MAXYEAR4STAMPS appear to be magic
numbers with no explanation.  I would not point to them as examples to be
followed.

> I would leave it as is.
>
>> The #defines would be less brittle in
>> the event, for example, that the postgres epoch were ever changed.
>
> I don't think it is real, and even in such case all constants are
> collected together in the file and will be found and changed at once.

I agree that they would be found at once.  I disagree that the example
is not real, as I have changed the postgres epoch myself in some builds,
to be able to use int32 timestamps on small devices.

Regards,
Mark Dilger

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Mark Dilger

Date:

15 March 2016, 15:43:35

> On Mar 15, 2016, at 8:35 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
>
>
>> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>>
>> On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote:
>>> The first thing I notice about this patch is that
>>> src/include/datatype/timestamp.h
>>> has some #defines that are brittle.  The #defines have comments explaining
>>> their logic, but I'd rather embed that in the #define directly:
>>>
>>> This:
>>>
>>> +#ifdef HAVE_INT64_TIMESTAMP
>>> +#define MIN_TIMESTAMP  INT64CONST(-211813488000000000)
>>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
>>> +#define MAX_TIMESTAMP  INT64CONST(9223371331200000000)
>>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC
>>> */
>>> +#else
>>> +#define MIN_TIMESTAMP  -211813488000.0
>>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */
>>> +#define MAX_TIMESTAMP  9223371331200.0
>>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */
>>> +#endif
>>>
>>> Could be written as:
>>>
>>> #ifdef HAVE_INT64_TIMESTAMP
>>> #define MIN_TIMESTAMP
>>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
>>> #define MAX_TIMESTAMP
>>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
>>> #else
>>> #define MIN_TIMESTAMP
>>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
>>> #define MAX_TIMESTAMP
>>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
>>> #endif
>>>
>>> I assume modern compilers would convert these to the same constants at
>>> compile-time,
>>
>> Firstly, Postgres is compiling not only by modern compilers.
>
> Do you have an example of a compiler that will not do this constant folding
> at compile time?
>
>>> rather than impose a runtime penalty.
>>
>> Secondary, It is hard to write it correctly obeying Postgres' coding
>> standard (e.g. 80-columns border) and making it clear: it is not so
>> visual difference between USECS_PER_DAY and SECS_PER_DAY in the
>> expressions above (but it is vivid in comments in the file).
>
> Hmm.  I think using USECS_PER_DAY is perfectly clear, but that is a personal
> opinion.  I don't see any way to argue if you don't see it that way.
>
>> Also a
>> questions raise "Why is INT64CONST(0) necessary in `#else' block"
>> (whereas `float' is necessary there)
>
> You appear to be correct.  The second half should be defined in terms of
> float.  I compiled on a system with int64 support, so I didn't notice. Thanks
> for catching that.
>
>> or "Why is INT64CONST set for
>> MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_
>> int64).
>
> I was only casting the zero to int64.  That doesn't seem necessary, so it can
> be removed.  Both MIN_TIMESTAMP and MAX_TIMESTAMP were defined
> in terms of USECS_PER_DAY, which itself is defined in terms of INT64CONST,
> so I believe they both work out to be an int64 constant.
>
>> The file is full of constants. For example, JULIAN_MAX and
>> SECS_PER_DAY are one of them.
>
> JULIAN_MAXYEAR and JULIAN_MAXYEAR4STAMPS appear to be magic
> numbers with no explanation.  I would not point to them as examples to be
> followed.
>
>> I would leave it as is.
>>
>>> The #defines would be less brittle in
>>> the event, for example, that the postgres epoch were ever changed.
>>
>> I don't think it is real, and even in such case all constants are
>> collected together in the file and will be found and changed at once.
>
> I agree that they would be found at once.  I disagree that the example
> is not real, as I have changed the postgres epoch myself in some builds,
> to be able to use int32 timestamps on small devices.
>
> Regards,
> Mark Dilger

I forgot to mention that I am not trying to block this commit.  These were just
my observations about the patch, for your consideration.

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Tom Lane

Date:

16 March 2016, 15:33:00

Mark Dilger <hornschnorter@gmail.com> writes:
>> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>> I don't think it is real, and even in such case all constants are
>> collected together in the file and will be found and changed at once.

> I agree that they would be found at once.  I disagree that the example
> is not real, as I have changed the postgres epoch myself in some builds,
> to be able to use int32 timestamps on small devices.

I concur with Vitaly that it's not this patch's job to make it easier to
change the epoch date.  If you want to submit a patch for that purpose,
you're welcome to.

I have a bigger problem though: I see that the patch enforces AD 294277
as the endpoint for both integer and floating-point datetimes.  This
contradicts the statement in the docs (section 8.5) that
   Note that using floating-point datetimes allows a larger range of   timestamp values to be represented than shown
above:from 4713 BC up   to 5874897 AD.

Since that is just about the only non-historical reason why somebody might
wish to use float timestamps, I'm rather reluctant to remove the feature,
especially without any discussion --- and I don't see any discussion of
this point upthread.

My feeling is we ought to preserve the old behavior here, which would
involve making JULIAN_MAXYEAR_FOR_TIMESTAMPS format-dependent and
adjusting the float values for the two derived constants; not much of a
problem code-wise.  I think though that it would break quite a number of
the proposed new regression tests for the float case.  TBH, I thought
the number of added test cases was rather excessive anyway, so I wouldn't
have a problem with just leaving out whichever ones don't pass with both
build options.

Comments?
        regards, tom lane

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Tom Lane

Date:

16 March 2016, 17:24:29

I wrote:
> My feeling is we ought to preserve the old behavior here, which would
> involve making JULIAN_MAXYEAR_FOR_TIMESTAMPS format-dependent and
> adjusting the float values for the two derived constants; not much of a
> problem code-wise.  I think though that it would break quite a number of
> the proposed new regression tests for the float case.  TBH, I thought
> the number of added test cases was rather excessive anyway, so I wouldn't
> have a problem with just leaving out whichever ones don't pass with both
> build options.

Actually, it seems a lot of the provided test cases fail for float
timestamps anyway; there's an assumption that294276-12-31 23:59:59.999999294277-01-01 00:00:00.000000
are distinct timestamps, which they are not in float mode.
        regards, tom lane

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Tom Lane

Date:

16 March 2016, 19:40:01

I had almost gotten to the point of being willing to commit this patch
when I noticed that it fails to fix the originally-complained-of-problem:

regression=# set time zone 'GMT+1';
SET
regression=# select '4714-11-24 00:00:00+00 BC'::timestamptz;       timestamptz        
---------------------------4714-11-23 23:00:00-01 BC
(1 row)

regression=# select '4714-11-23 23:00:00-01 BC'::timestamptz;
ERROR:  timestamp out of range: "4714-11-23 23:00:00-01 BC"
LINE 1: select '4714-11-23 23:00:00-01 BC'::timestamptz;              ^

The problem here is that the timestamp satisfies IS_VALID_TIMESTAMP just
fine, but its printed form contains a date that the Julian-day routines
can't handle.

AFAICS the only way that we can avoid a dump/reload hazard is to tighten
up the allowed range of timestamps by at least one day, so that any
timestamp that passes IS_VALID_TIMESTAMP() is guaranteed to print, in
any timezone, with a contained date that the Julian-day routines can
handle.  I'd be inclined to set the lower limit of timestamps as
'4713-01-01 00:00 GMT BC' just to keep things simple.  (The upper limit
can stay where it is.)

While dates don't have this timezone rotation problem, the documentation
says that they have the same lower-bound limit as timestamps, and there
are places in the code that assume that too.  Is it okay to move their
lower bound as well?
        regards, tom lane

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

16 March 2016, 21:27:52

On 2016-03-16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My feeling is we ought to preserve the old behavior here, which would
> involve making JULIAN_MAXYEAR_FOR_TIMESTAMPS format-dependent and
> adjusting the float values for the two derived constants; not much of a
> problem code-wise.  I think though that it would break quite a number of
> the proposed new regression tests for the float case.

I think I can be solved by adding new tests for new upper bound for
float values and creating a new version of expected file like
date_1.out, horology_1.out, timestamp_1.out and timestamptz.out (the
original files are for integer values; with prefix "_1" are for float
ones).

> TBH, I thought
> the number of added test cases was rather excessive anyway, so I wouldn't
> have a problem with just leaving out whichever ones don't pass with both
> build options.

The tests checks edge cases. In case of saving bigger interval of
allowed values for float values you'll remove checks when they should
fail. What's the left cases for?

On 2016-03-16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually, it seems a lot of the provided test cases fail for float
> timestamps anyway; there's an assumption that
>     294276-12-31 23:59:59.999999
>     294277-01-01 00:00:00.000000
> are distinct timestamps, which they are not in float mode.

I'm ready to change fractional part '.999999' to '.5' (to avoid issues
of different implementation of floating point on different
architectures) to emphasize "almost reached the threshold".

On 2016-03-16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> AFAICS the only way that we can avoid a dump/reload hazard is to tighten
> up the allowed range of timestamps by at least one day, so that any
> timestamp that passes IS_VALID_TIMESTAMP() is guaranteed to print, in
> any timezone, with a contained date that the Julian-day routines can
> handle.  I'd be inclined to set the lower limit of timestamps as
> '4713-01-01 00:00 GMT BC' just to keep things simple.  (The upper limit
> can stay where it is.)
>
> While dates don't have this timezone rotation problem, the documentation
> says that they have the same lower-bound limit as timestamps, and there
> are places in the code that assume that too.  Is it okay to move their
> lower bound as well?

I think it is important (see initial letter) since it is out of
supported interval (according to the documentation) and don't work as
expected in some cases (see "to_char(date_trunc('week', '4714-12-28
BC'::date),'day')"). It seems it leads to change
IS_VALID_JULIAN[_FOR_STAMPS] as well to something like:

#define IS_VALID_JULIAN(y,m,d) \((JULIAN_MINYEAR < (y) && (y) < JULIAN_MAXYEAR) ||(JULIAN_MINYEAR == (y) && (m) == 12
&&(d) == 31) ||(JULIAN_MAXYEAR == (y) && (m) == 01 && (d) == 01))

It can be correct if checks for IS_VALID_DATE is added in date.c to
places marked as "IS_VALID_JULIAN is enough for checking..."
All other places are (I'm sure) covered by IS_VALID_DATE/IS_VALID_TIMESTAMP.

What about the question:
On 2016-02-24, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> Also I'm asking for a help because the query (in default TZ='GMT+1'):
> postgres=# SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz;
>
> in psql gives a result "4714-11-23 23:00:00-01 BC",
> but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC"
> without TZ offset.

Why there is "GMT" instead of "GMT+01"? Is it bug? If so should it be
fixed in this patch or can be fixed later?

-- 
Best regards,
Vitaly Burovoy

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

16 March 2016, 21:47:58

On 2016-03-15, Mark Dilger <hornschnorter@gmail.com> wrote:
>
>> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com>
>> wrote:
>>
>> On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote:
>>> The first thing I notice about this patch is that
>>> src/include/datatype/timestamp.h
>>> has some #defines that are brittle.  The #defines have comments
>>> explaining
>>> their logic, but I'd rather embed that in the #define directly:
>>>
>>> This:
>>>
>>> +#ifdef HAVE_INT64_TIMESTAMP
>>> +#define MIN_TIMESTAMP  INT64CONST(-211813488000000000)
>>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */
>>> +#define MAX_TIMESTAMP  INT64CONST(9223371331200000000)
>>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 *
>>> USECS_PER_SEC
>>> */
>>> +#else
>>> +#define MIN_TIMESTAMP  -211813488000.0
>>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */
>>> +#define MAX_TIMESTAMP  9223371331200.0
>>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */
>>> +#endif
>>>
>>> Could be written as:
>>>
>>> #ifdef HAVE_INT64_TIMESTAMP
>>> #define MIN_TIMESTAMP
>>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
>>> #define MAX_TIMESTAMP
>>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY)
>>> #else
>>> #define MIN_TIMESTAMP
>>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
>>> #define MAX_TIMESTAMP
>>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY)
>>> #endif
>>>
>>> I assume modern compilers would convert these to the same constants at
>>> compile-time,
>>
>> Firstly, Postgres is compiling not only by modern compilers.
>
> Do you have an example of a compiler that will not do this constant folding
> at compile time?

No, I'm not good at knowing features of all versions and all kings of
compilers, but I'm sure constants are better than expressions for big
values. =)

>>> rather than impose a runtime penalty.
>>
>> Secondary, It is hard to write it correctly obeying Postgres' coding
>> standard (e.g. 80-columns border) and making it clear: it is not so
>> visual difference between USECS_PER_DAY and SECS_PER_DAY in the
>> expressions above (but it is vivid in comments in the file).
>
> Hmm.  I think using USECS_PER_DAY is perfectly clear, but that is a
> personal
> opinion.  I don't see any way to argue if you don't see it that way.

I'm talking about perception of the constants when they a very similar
but they are not justified by a single column (where difference
_in_expressions_ are clear). There was a difference by a single char
("U") only which is not so obvious without deep looking on it (be
honest I'd missed it until started to write an answer).

>> or "Why is INT64CONST set for
>> MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_
>> int64).
>
> I was only casting the zero to int64.  That doesn't seem necessary, so it
> can
> be removed.  Both MIN_TIMESTAMP and MAX_TIMESTAMP were defined
> in terms of USECS_PER_DAY, which itself is defined in terms of INT64CONST,
> so I believe they both work out to be an int64 constant.

I hope so. But in such cases I'd prefer to _begin_ calculations from
int64, not to _finish_ by it.
It is impossible to pass constants (like JULIAN_MAX4STAMPS) to
INT64CONST macros. Inserting "(int64)" makes rows larger by 7 chars...
;-)

>>> The #defines would be less brittle in
>>> the event, for example, that the postgres epoch were ever changed.
>>
>> I don't think it is real, and even in such case all constants are
>> collected together in the file and will be found and changed at once.
>
> I agree that they would be found at once.  I disagree that the example
> is not real, as I have changed the postgres epoch myself in some builds,
> to be able to use int32 timestamps on small devices.

Wow! I'm sorry, I didn't know about it.
But in such case (tighten to int32) there are more than two places
which should be changed. Two more (four with disabled
HAVE_INT64_TIMESTAMP) constants are not big deal with it.

-- 
Best regards,
Vitaly Burovoy

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Mark Dilger

Date:

16 March 2016, 22:02:00

> 
>>>> The #defines would be less brittle in
>>>> the event, for example, that the postgres epoch were ever changed.
>>> 
>>> I don't think it is real, and even in such case all constants are
>>> collected together in the file and will be found and changed at once.
>> 
>> I agree that they would be found at once.  I disagree that the example
>> is not real, as I have changed the postgres epoch myself in some builds,
>> to be able to use int32 timestamps on small devices.
> 
> Wow! I'm sorry, I didn't know about it.
> But in such case (tighten to int32) there are more than two places
> which should be changed. Two more (four with disabled
> HAVE_INT64_TIMESTAMP) constants are not big deal with it.

Please, do not worry about that.  I do not mean that your code needs
to be compatible with my fork.

Regards,
Mark Dilger

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Tom Lane

Date:

16 March 2016, 23:19:25

I wrote:
> I had almost gotten to the point of being willing to commit this patch
> when I noticed that it fails to fix the originally-complained-of-problem:
> ...
> AFAICS the only way that we can avoid a dump/reload hazard is to tighten
> up the allowed range of timestamps by at least one day, so that any
> timestamp that passes IS_VALID_TIMESTAMP() is guaranteed to print, in
> any timezone, with a contained date that the Julian-day routines can
> handle.

I realized that there's a way to do this without reducing the historical
limits of the datatypes.  It's already the case that date2j() can handle
dates a little beyond the stated upper limit of the date type, and if
you take a close look at it, you'll realize that it doesn't have a
problem with dates a little before 4714-11-24 BC, either.  (It looks
like it would work back to about 4800 BC, though I've not attempted
to verify that independently.)  All we really need is for it to return
-1 for 4714-11-23 BC, and it certainly does that.  Then, if we allow
such a date to pass IS_VALID_JULIAN, we can accept something like
'4714-11-23 23:00:00-01 BC'::timestamptz and apply the exact range
check only after doing the timestamp adjustment.  The same trick
works at the upper end of the range.  The key is simply that we have
to have an explicit range check at the end rather than relying entirely
on IS_VALID_JULIAN --- but this patch was about 80% of the way there
already.

So I fixed that up and committed it, with a very short set of new
regression test cases.  I seriously doubt that the other ones add
enough value to be worth trying to make them work in both float- and
int-timestamp cases; though if you want to submit a new patch to
add more test cases we could consider it.  My feeling about it is that
that kind of testing does nothing for errors of omission (ie functions
that fail to range-check their results), which is the most likely sort
of bug here, and so I doubt it's worth adding them to regression tests
that many people will run many times a day for a long time to come.
        regards, tom lane

Re: Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Tom Lane

Date:

17 March 2016, 00:13:24

Anastasia Lubennikova <a.lubennikova@postgrespro.ru> writes:
> There is an issue, mentioned in the thread above:

>> postgres=# select
>> postgres-#      to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
>> postgres-#     ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
>> postgres-#     ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
>> to_char  |  to_char  |  to_char
>> -----------+-----------+-----------
>> monday    | monday    | thursday
>> (1 row)

>> since 4714-12-28 BC and to the past detection when a week is starting
>> is broken (because it is boundary of isoyears -4713 and -4712).

BTW, I think the actual problem is that j2day() figured that coercing
its argument to unsigned int would be sufficient to produce a sane
answer for negative inputs.  It isn't.  Nobody sees this with inputs
after 4714BC, but when probing in 4714 the code considers the
reference point 4714-01-04, which has a negative Julian date and so
we end up passing a negative date to j2day().
        regards, tom lane

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Vitaly Burovoy

Date:

17 March 2016, 03:02:35

On 3/16/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I fixed that up and committed it, with a very short set of new
> regression test cases.  I seriously doubt that the other ones add
> enough value to be worth trying to make them work in both float- and
> int-timestamp cases; though if you want to submit a new patch to
> add more test cases we could consider it.  My feeling about it is that
> that kind of testing does nothing for errors of omission (ie functions
> that fail to range-check their results), which is the most likely sort
> of bug here, and so I doubt it's worth adding them to regression tests
> that many people will run many times a day for a long time to come.
>
>             regards, tom lane

Thank you very much! If you decide such kind of tests is not
necessary, it is enough for me.

Is there any reason to leave JULIAN_MINDAY and JULIAN_MAXDAY which are
not used now?
Also why JULIAN_MAXMONTH is set to "6" whereas
{DATE|TIMESTAMP}_END_JULIAN use "1" as month?

-- 
Best regards,
Vitaly Burovoy

Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

From

Tom Lane

Date:

17 March 2016, 13:58:15

Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> Is there any reason to leave JULIAN_MINDAY and JULIAN_MAXDAY which are
> not used now?

Those are just there to document what the limits really are.  Possibly
some code would need them in future.

> Also why JULIAN_MAXMONTH is set to "6" whereas
> {DATE|TIMESTAMP}_END_JULIAN use "1" as month?

Because we're intentionally allowing a wider range for IS_VALID_JULIAN
than for IS_VALID_DATE/TIMESTAMP.
        regards, tom lane