Thread: Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 2/2/16, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: > On 2/2/16 6:39 PM, Tom Lane wrote: >> I'm inclined to think that a good solution would be to create an >> artificial restriction to not accept years beyond, say, 100000 AD. >> That would leave us with a lot of daylight to not have to worry >> about corner-case overflows in timestamp arithmetic. I'm not sure >> though where we'd need to enforce such a restriction; certainly in >> timestamp[tz]_in, but where else? > > Probably some of the casts (I'd think at least timestamp->timestamptz). > Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/ Please find attached a patch checks boundaries of date/timestamp[tz]. There are more functions: converting to/from timestamptz, truncating, constructing from date and time etc. I left the upper boundary as described[1] in the documentation (294276-12-31 AD), lower - "as is" (4714-11-24 BC). It is easy to change the lower boundary to 4713-01-01BC (as described in the documentation) and it seems necessary because it allows to simplify IS_VALID_JULIAN and IS_VALID_JULIAN4STAMPS and avoid the next behavior: postgres=# select postgres-# to_char(date_trunc('week', '4713-01-01 BC'::date),'day') postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day') postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day'); to_char | to_char | to_char -----------+-----------+----------- monday | monday | thursday (1 row) since 4714-12-28 BC and to the past detection when a week is starting is broken (because it is boundary of isoyears -4713 and -4712). Is it worth to break undocumented range or leave it as is? There is one more flaw: checking for a correctness begins from date and if default TZ is not UTC, dump/restore of values of type timestamptz which are close to allowed boundaries can be broken (and such result can't be restored because date is not in allowed range): postgres=# SET TIME ZONE 'GMT+1'; SET postgres=# COPY (SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz) TO STDOUT; 4714-11-23 23:00:00-01 BC Also I'm asking for a help because the query (in default TZ='GMT+1'): postgres=# SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz; in psql gives a result "4714-11-23 23:00:00-01 BC", but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC" without TZ offset. I don't see what can be added to the documentation with the applied patch. More testings, finding bugs, uncovered functions, advice, comment improvements are very appreciated. [1]http://www.postgresql.org/docs/devel/static/datatype-datetime.html -- Best regards, Vitaly Burovoy
Attachment
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 2/24/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > On 2/2/16, Jim Nasby <Jim.Nasby@bluetreble.com> wrote: >> On 2/2/16 6:39 PM, Tom Lane wrote: >>> I'm inclined to think that a good solution would be to create an >>> artificial restriction to not accept years beyond, say, 100000 AD. >>> That would leave us with a lot of daylight to not have to worry >>> about corner-case overflows in timestamp arithmetic. I'm not sure >>> though where we'd need to enforce such a restriction; certainly in >>> timestamp[tz]_in, but where else? >> >> Probably some of the casts (I'd think at least timestamp->timestamptz). >> Maybe timestamp[tz]_recv. Most of the time*pl* functions. :/ > > Please find attached a patch checks boundaries of date/timestamp[tz]. > There are more functions: converting to/from timestamptz, truncating, > constructing from date and time etc. > > I left the upper boundary as described[1] in the documentation > (294276-12-31 AD), lower - "as is" (4714-11-24 BC). > It is easy to change the lower boundary to 4713-01-01BC (as described > in the documentation) and it seems necessary because it allows to > simplify IS_VALID_JULIAN and IS_VALID_JULIAN4STAMPS and avoid the next > behavior: > > postgres=# select > postgres-# to_char(date_trunc('week', '4713-01-01 BC'::date),'day') > postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day') > postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day'); > to_char | to_char | to_char > -----------+-----------+----------- > monday | monday | thursday > (1 row) > > since 4714-12-28 BC and to the past detection when a week is starting > is broken (because it is boundary of isoyears -4713 and -4712). > Is it worth to break undocumented range or leave it as is? > > There is one more flaw: checking for a correctness begins from date > and if default TZ is not UTC, dump/restore of values of type > timestamptz which are close to allowed boundaries can be broken (and > such result can't be restored because date is not in allowed range): > postgres=# SET TIME ZONE 'GMT+1'; > SET > postgres=# COPY (SELECT '4714-11-24 00:00:00.000000+00 > BC'::timestamptz) TO STDOUT; > 4714-11-23 23:00:00-01 BC > > Also I'm asking for a help because the query (in default TZ='GMT+1'): > postgres=# SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz; > > in psql gives a result "4714-11-23 23:00:00-01 BC", > but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC" > without TZ offset. > > > I don't see what can be added to the documentation with the applied patch. > > More testings, finding bugs, uncovered functions, advice, comment > improvements are very appreciated. > > [1]http://www.postgresql.org/docs/devel/static/datatype-datetime.html I'm sorry, gmail hasn't added a header "In-Reply-To" to the last email. Previous thread is by the link[2]. http://www.postgresql.org/message-id/flat/CAKOSWNked3JOE++PEs49GMDNfR2ieu9A2jFCEZ6EW-+1c5_u9Q@mail.gmail.com -- Best regards, Vitaly Burovoy
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
> <overquoting> Added to the commitfest 2016-03. [CF] https://commitfest.postgresql.org/9/540/ -- Best regards, Vitaly Burovoy
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
David Steele
Date:
On 2/25/16 4:44 PM, Vitaly Burovoy wrote: > Added to the commitfest 2016-03. > > [CF] https://commitfest.postgresql.org/9/540/ This looks like a fairly straight-forward bug fix (the size of the patch is deceptive because there a lot of new tests included). It applies cleanly. Anastasia, I see you have signed up to review. Do you have an idea when you will get the chance to do that? Thanks, -- -David david@pgmasters.net
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Mark Dilger
Date:
> On Mar 14, 2016, at 6:23 AM, David Steele <david@pgmasters.net> wrote: > > On 2/25/16 4:44 PM, Vitaly Burovoy wrote: > >> Added to the commitfest 2016-03. >> >> [CF] https://commitfest.postgresql.org/9/540/ > > This looks like a fairly straight-forward bug fix (the size of the patch is deceptive because there a lot of new testsincluded). It applies cleanly. > > Anastasia, I see you have signed up to review. Do you have an idea when you will get the chance to do that? The first thing I notice about this patch is that src/include/datatype/timestamp.h has some #defines that are brittle. The #defines have comments explaining their logic, but I'd rather embed that in the #define directly: This: +#ifdef HAVE_INT64_TIMESTAMP +#define MIN_TIMESTAMP INT64CONST(-211813488000000000) +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */ +#define MAX_TIMESTAMP INT64CONST(9223371331200000000) +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */ +#else +#define MIN_TIMESTAMP -211813488000.0 +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */ +#define MAX_TIMESTAMP 9223371331200.0 +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */ +#endif Could be written as: #ifdef HAVE_INT64_TIMESTAMP #define MIN_TIMESTAMP ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) #define MAX_TIMESTAMP ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) #else #define MIN_TIMESTAMP ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) #define MAX_TIMESTAMP ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) #endif I assume modern compilers would convert these to the same constants at compile-time, rather than impose a runtime penalty. The #defines would be less brittle in the event, for example, that the postgres epoch were ever changed. Mark Dilger
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Anastasia Lubennikova
Date:
14.03.2016 16:23, David Steele: > On 2/25/16 4:44 PM, Vitaly Burovoy wrote: > >> Added to the commitfest 2016-03. >> >> [CF] https://commitfest.postgresql.org/9/540/ > > This looks like a fairly straight-forward bug fix (the size of the > patch is deceptive because there a lot of new tests included). It > applies cleanly. > > Anastasia, I see you have signed up to review. Do you have an idea > when you will get the chance to do that? > > Thanks, I've read the patch thoroughly and haven't found any problems. I think that the patch is in a very good shape. It fixes a bug and has an excellent set of tests. There is an issue, mentioned in the thread above: >postgres=# select >postgres-# to_char(date_trunc('week', '4713-01-01 BC'::date),'day') >postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day') >postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day'); > to_char | to_char | to_char >-----------+-----------+----------- > monday | monday | thursday >(1 row) >since 4714-12-28 BC and to the past detection when a week is starting >is broken (because it is boundary of isoyears -4713 and -4712). >Is it worth to break undocumented range or leave it as is? But I suppose that behavior of undocumented dates is not essential. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote: > The first thing I notice about this patch is that > src/include/datatype/timestamp.h > has some #defines that are brittle. The #defines have comments explaining > their logic, but I'd rather embed that in the #define directly: > > This: > > +#ifdef HAVE_INT64_TIMESTAMP > +#define MIN_TIMESTAMP INT64CONST(-211813488000000000) > +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */ > +#define MAX_TIMESTAMP INT64CONST(9223371331200000000) > +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC > */ > +#else > +#define MIN_TIMESTAMP -211813488000.0 > +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */ > +#define MAX_TIMESTAMP 9223371331200.0 > +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */ > +#endif > > Could be written as: > > #ifdef HAVE_INT64_TIMESTAMP > #define MIN_TIMESTAMP > ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) > #define MAX_TIMESTAMP > ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) > #else > #define MIN_TIMESTAMP > ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) > #define MAX_TIMESTAMP > ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) > #endif > > I assume modern compilers would convert these to the same constants at > compile-time, Firstly, Postgres is compiling not only by modern compilers. > rather than impose a runtime penalty. Secondary, It is hard to write it correctly obeying Postgres' coding standard (e.g. 80-columns border) and making it clear: it is not so visual difference between USECS_PER_DAY and SECS_PER_DAY in the expressions above (but it is vivid in comments in the file). Also a questions raise "Why is INT64CONST(0) necessary in `#else' block" (whereas `float' is necessary there) or "Why is INT64CONST set for MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_ int64). The file is full of constants. For example, JULIAN_MAX and SECS_PER_DAY are one of them. I would leave it as is. > The #defines would be less brittle in > the event, for example, that the postgres epoch were ever changed. I don't think it is real, and even in such case all constants are collected together in the file and will be found and changed at once. > Mark Dilger -- Best regards, Vitaly Burovoy
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 3/14/16, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > 14.03.2016 16:23, David Steele: >> On 2/25/16 4:44 PM, Vitaly Burovoy wrote: >> >>> Added to the commitfest 2016-03. >>> >>> [CF] https://commitfest.postgresql.org/9/540/ >> >> This looks like a fairly straight-forward bug fix (the size of the >> patch is deceptive because there a lot of new tests included). It >> applies cleanly. >> >> Anastasia, I see you have signed up to review. Do you have an idea >> when you will get the chance to do that? >> >> Thanks, > > I've read the patch thoroughly and haven't found any problems. I think > that the patch is in a very good shape. > It fixes a bug and has an excellent set of tests. > > There is an issue, mentioned in the thread above: > >>postgres=# select >>postgres-# to_char(date_trunc('week', '4713-01-01 BC'::date),'day') >>postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day') >>postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day'); >> to_char | to_char | to_char >>-----------+-----------+----------- >> monday | monday | thursday >>(1 row) > >>since 4714-12-28 BC and to the past detection when a week is starting >>is broken (because it is boundary of isoyears -4713 and -4712). >>Is it worth to break undocumented range or leave it as is? > > But I suppose that behavior of undocumented dates is not essential. I'm sorry... What should I do with "Waiting on Author" state if you don't have complaints? > -- > Anastasia Lubennikova > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- Best regards, Vitaly Burovoy
Re: Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Anastasia Lubennikova
Date:
15.03.2016 03:21, Vitaly Burovoy: > On 3/14/16, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: >> 14.03.2016 16:23, David Steele: >>> On 2/25/16 4:44 PM, Vitaly Burovoy wrote: >>> >>>> Added to the commitfest 2016-03. >>>> >>>> [CF] https://commitfest.postgresql.org/9/540/ >>> This looks like a fairly straight-forward bug fix (the size of the >>> patch is deceptive because there a lot of new tests included). It >>> applies cleanly. >>> >>> Anastasia, I see you have signed up to review. Do you have an idea >>> when you will get the chance to do that? >>> >>> Thanks, >> I've read the patch thoroughly and haven't found any problems. I think >> that the patch is in a very good shape. >> It fixes a bug and has an excellent set of tests. >> >> There is an issue, mentioned in the thread above: >> >>> postgres=# select >>> postgres-# to_char(date_trunc('week', '4713-01-01 BC'::date),'day') >>> postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day') >>> postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day'); >>> to_char | to_char | to_char >>> -----------+-----------+----------- >>> monday | monday | thursday >>> (1 row) >>> since 4714-12-28 BC and to the past detection when a week is starting >>> is broken (because it is boundary of isoyears -4713 and -4712). >>> Is it worth to break undocumented range or leave it as is? >> But I suppose that behavior of undocumented dates is not essential. > I'm sorry... What should I do with "Waiting on Author" state if you > don't have complaints? > I was going to set "Ready for Committer", but then I've noticed message from Mark Dilger and changed my mind. Now, when you answered him, I have no objections. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Mark Dilger
Date:
> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > > On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote: >> The first thing I notice about this patch is that >> src/include/datatype/timestamp.h >> has some #defines that are brittle. The #defines have comments explaining >> their logic, but I'd rather embed that in the #define directly: >> >> This: >> >> +#ifdef HAVE_INT64_TIMESTAMP >> +#define MIN_TIMESTAMP INT64CONST(-211813488000000000) >> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */ >> +#define MAX_TIMESTAMP INT64CONST(9223371331200000000) >> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC >> */ >> +#else >> +#define MIN_TIMESTAMP -211813488000.0 >> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */ >> +#define MAX_TIMESTAMP 9223371331200.0 >> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */ >> +#endif >> >> Could be written as: >> >> #ifdef HAVE_INT64_TIMESTAMP >> #define MIN_TIMESTAMP >> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) >> #define MAX_TIMESTAMP >> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) >> #else >> #define MIN_TIMESTAMP >> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) >> #define MAX_TIMESTAMP >> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) >> #endif >> >> I assume modern compilers would convert these to the same constants at >> compile-time, > > Firstly, Postgres is compiling not only by modern compilers. Do you have an example of a compiler that will not do this constant folding at compile time? >> rather than impose a runtime penalty. > > Secondary, It is hard to write it correctly obeying Postgres' coding > standard (e.g. 80-columns border) and making it clear: it is not so > visual difference between USECS_PER_DAY and SECS_PER_DAY in the > expressions above (but it is vivid in comments in the file). Hmm. I think using USECS_PER_DAY is perfectly clear, but that is a personal opinion. I don't see any way to argue if you don't see it that way. > Also a > questions raise "Why is INT64CONST(0) necessary in `#else' block" > (whereas `float' is necessary there) You appear to be correct. The second half should be defined in terms of float. I compiled on a system with int64 support, so I didn't notice. Thanks for catching that. > or "Why is INT64CONST set for > MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_ > int64). I was only casting the zero to int64. That doesn't seem necessary, so it can be removed. Both MIN_TIMESTAMP and MAX_TIMESTAMP were defined in terms of USECS_PER_DAY, which itself is defined in terms of INT64CONST, so I believe they both work out to be an int64 constant. > The file is full of constants. For example, JULIAN_MAX and > SECS_PER_DAY are one of them. JULIAN_MAXYEAR and JULIAN_MAXYEAR4STAMPS appear to be magic numbers with no explanation. I would not point to them as examples to be followed. > I would leave it as is. > >> The #defines would be less brittle in >> the event, for example, that the postgres epoch were ever changed. > > I don't think it is real, and even in such case all constants are > collected together in the file and will be found and changed at once. I agree that they would be found at once. I disagree that the example is not real, as I have changed the postgres epoch myself in some builds, to be able to use int32 timestamps on small devices. Regards, Mark Dilger
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Mark Dilger
Date:
> On Mar 15, 2016, at 8:35 AM, Mark Dilger <hornschnorter@gmail.com> wrote: > > >> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: >> >> On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote: >>> The first thing I notice about this patch is that >>> src/include/datatype/timestamp.h >>> has some #defines that are brittle. The #defines have comments explaining >>> their logic, but I'd rather embed that in the #define directly: >>> >>> This: >>> >>> +#ifdef HAVE_INT64_TIMESTAMP >>> +#define MIN_TIMESTAMP INT64CONST(-211813488000000000) >>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */ >>> +#define MAX_TIMESTAMP INT64CONST(9223371331200000000) >>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC >>> */ >>> +#else >>> +#define MIN_TIMESTAMP -211813488000.0 >>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */ >>> +#define MAX_TIMESTAMP 9223371331200.0 >>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */ >>> +#endif >>> >>> Could be written as: >>> >>> #ifdef HAVE_INT64_TIMESTAMP >>> #define MIN_TIMESTAMP >>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) >>> #define MAX_TIMESTAMP >>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) >>> #else >>> #define MIN_TIMESTAMP >>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) >>> #define MAX_TIMESTAMP >>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) >>> #endif >>> >>> I assume modern compilers would convert these to the same constants at >>> compile-time, >> >> Firstly, Postgres is compiling not only by modern compilers. > > Do you have an example of a compiler that will not do this constant folding > at compile time? > >>> rather than impose a runtime penalty. >> >> Secondary, It is hard to write it correctly obeying Postgres' coding >> standard (e.g. 80-columns border) and making it clear: it is not so >> visual difference between USECS_PER_DAY and SECS_PER_DAY in the >> expressions above (but it is vivid in comments in the file). > > Hmm. I think using USECS_PER_DAY is perfectly clear, but that is a personal > opinion. I don't see any way to argue if you don't see it that way. > >> Also a >> questions raise "Why is INT64CONST(0) necessary in `#else' block" >> (whereas `float' is necessary there) > > You appear to be correct. The second half should be defined in terms of > float. I compiled on a system with int64 support, so I didn't notice. Thanks > for catching that. > >> or "Why is INT64CONST set for >> MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_ >> int64). > > I was only casting the zero to int64. That doesn't seem necessary, so it can > be removed. Both MIN_TIMESTAMP and MAX_TIMESTAMP were defined > in terms of USECS_PER_DAY, which itself is defined in terms of INT64CONST, > so I believe they both work out to be an int64 constant. > >> The file is full of constants. For example, JULIAN_MAX and >> SECS_PER_DAY are one of them. > > JULIAN_MAXYEAR and JULIAN_MAXYEAR4STAMPS appear to be magic > numbers with no explanation. I would not point to them as examples to be > followed. > >> I would leave it as is. >> >>> The #defines would be less brittle in >>> the event, for example, that the postgres epoch were ever changed. >> >> I don't think it is real, and even in such case all constants are >> collected together in the file and will be found and changed at once. > > I agree that they would be found at once. I disagree that the example > is not real, as I have changed the postgres epoch myself in some builds, > to be able to use int32 timestamps on small devices. > > Regards, > Mark Dilger I forgot to mention that I am not trying to block this commit. These were just my observations about the patch, for your consideration.
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes: >> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: >> I don't think it is real, and even in such case all constants are >> collected together in the file and will be found and changed at once. > I agree that they would be found at once. I disagree that the example > is not real, as I have changed the postgres epoch myself in some builds, > to be able to use int32 timestamps on small devices. I concur with Vitaly that it's not this patch's job to make it easier to change the epoch date. If you want to submit a patch for that purpose, you're welcome to. I have a bigger problem though: I see that the patch enforces AD 294277 as the endpoint for both integer and floating-point datetimes. This contradicts the statement in the docs (section 8.5) that Note that using floating-point datetimes allows a larger range of timestamp values to be represented than shown above:from 4713 BC up to 5874897 AD. Since that is just about the only non-historical reason why somebody might wish to use float timestamps, I'm rather reluctant to remove the feature, especially without any discussion --- and I don't see any discussion of this point upthread. My feeling is we ought to preserve the old behavior here, which would involve making JULIAN_MAXYEAR_FOR_TIMESTAMPS format-dependent and adjusting the float values for the two derived constants; not much of a problem code-wise. I think though that it would break quite a number of the proposed new regression tests for the float case. TBH, I thought the number of added test cases was rather excessive anyway, so I wouldn't have a problem with just leaving out whichever ones don't pass with both build options. Comments? regards, tom lane
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Tom Lane
Date:
I wrote: > My feeling is we ought to preserve the old behavior here, which would > involve making JULIAN_MAXYEAR_FOR_TIMESTAMPS format-dependent and > adjusting the float values for the two derived constants; not much of a > problem code-wise. I think though that it would break quite a number of > the proposed new regression tests for the float case. TBH, I thought > the number of added test cases was rather excessive anyway, so I wouldn't > have a problem with just leaving out whichever ones don't pass with both > build options. Actually, it seems a lot of the provided test cases fail for float timestamps anyway; there's an assumption that294276-12-31 23:59:59.999999294277-01-01 00:00:00.000000 are distinct timestamps, which they are not in float mode. regards, tom lane
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Tom Lane
Date:
I had almost gotten to the point of being willing to commit this patch when I noticed that it fails to fix the originally-complained-of-problem: regression=# set time zone 'GMT+1'; SET regression=# select '4714-11-24 00:00:00+00 BC'::timestamptz; timestamptz ---------------------------4714-11-23 23:00:00-01 BC (1 row) regression=# select '4714-11-23 23:00:00-01 BC'::timestamptz; ERROR: timestamp out of range: "4714-11-23 23:00:00-01 BC" LINE 1: select '4714-11-23 23:00:00-01 BC'::timestamptz; ^ The problem here is that the timestamp satisfies IS_VALID_TIMESTAMP just fine, but its printed form contains a date that the Julian-day routines can't handle. AFAICS the only way that we can avoid a dump/reload hazard is to tighten up the allowed range of timestamps by at least one day, so that any timestamp that passes IS_VALID_TIMESTAMP() is guaranteed to print, in any timezone, with a contained date that the Julian-day routines can handle. I'd be inclined to set the lower limit of timestamps as '4713-01-01 00:00 GMT BC' just to keep things simple. (The upper limit can stay where it is.) While dates don't have this timezone rotation problem, the documentation says that they have the same lower-bound limit as timestamps, and there are places in the code that assume that too. Is it okay to move their lower bound as well? regards, tom lane
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 2016-03-16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > My feeling is we ought to preserve the old behavior here, which would > involve making JULIAN_MAXYEAR_FOR_TIMESTAMPS format-dependent and > adjusting the float values for the two derived constants; not much of a > problem code-wise. I think though that it would break quite a number of > the proposed new regression tests for the float case. I think I can be solved by adding new tests for new upper bound for float values and creating a new version of expected file like date_1.out, horology_1.out, timestamp_1.out and timestamptz.out (the original files are for integer values; with prefix "_1" are for float ones). > TBH, I thought > the number of added test cases was rather excessive anyway, so I wouldn't > have a problem with just leaving out whichever ones don't pass with both > build options. The tests checks edge cases. In case of saving bigger interval of allowed values for float values you'll remove checks when they should fail. What's the left cases for? On 2016-03-16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Actually, it seems a lot of the provided test cases fail for float > timestamps anyway; there's an assumption that > 294276-12-31 23:59:59.999999 > 294277-01-01 00:00:00.000000 > are distinct timestamps, which they are not in float mode. I'm ready to change fractional part '.999999' to '.5' (to avoid issues of different implementation of floating point on different architectures) to emphasize "almost reached the threshold". On 2016-03-16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > AFAICS the only way that we can avoid a dump/reload hazard is to tighten > up the allowed range of timestamps by at least one day, so that any > timestamp that passes IS_VALID_TIMESTAMP() is guaranteed to print, in > any timezone, with a contained date that the Julian-day routines can > handle. I'd be inclined to set the lower limit of timestamps as > '4713-01-01 00:00 GMT BC' just to keep things simple. (The upper limit > can stay where it is.) > > While dates don't have this timezone rotation problem, the documentation > says that they have the same lower-bound limit as timestamps, and there > are places in the code that assume that too. Is it okay to move their > lower bound as well? I think it is important (see initial letter) since it is out of supported interval (according to the documentation) and don't work as expected in some cases (see "to_char(date_trunc('week', '4714-12-28 BC'::date),'day')"). It seems it leads to change IS_VALID_JULIAN[_FOR_STAMPS] as well to something like: #define IS_VALID_JULIAN(y,m,d) \((JULIAN_MINYEAR < (y) && (y) < JULIAN_MAXYEAR) ||(JULIAN_MINYEAR == (y) && (m) == 12 &&(d) == 31) ||(JULIAN_MAXYEAR == (y) && (m) == 01 && (d) == 01)) It can be correct if checks for IS_VALID_DATE is added in date.c to places marked as "IS_VALID_JULIAN is enough for checking..." All other places are (I'm sure) covered by IS_VALID_DATE/IS_VALID_TIMESTAMP. What about the question: On 2016-02-24, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > Also I'm asking for a help because the query (in default TZ='GMT+1'): > postgres=# SELECT '4714-11-24 00:00:00.000000+00 BC'::timestamptz; > > in psql gives a result "4714-11-23 23:00:00-01 BC", > but in a testing system gives "Sun Nov 23 23:00:00 4714 GMT BC" > without TZ offset. Why there is "GMT" instead of "GMT+01"? Is it bug? If so should it be fixed in this patch or can be fixed later? -- Best regards, Vitaly Burovoy
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 2016-03-15, Mark Dilger <hornschnorter@gmail.com> wrote: > >> On Mar 14, 2016, at 5:12 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> >> wrote: >> >> On 3/14/16, Mark Dilger <hornschnorter@gmail.com> wrote: >>> The first thing I notice about this patch is that >>> src/include/datatype/timestamp.h >>> has some #defines that are brittle. The #defines have comments >>> explaining >>> their logic, but I'd rather embed that in the #define directly: >>> >>> This: >>> >>> +#ifdef HAVE_INT64_TIMESTAMP >>> +#define MIN_TIMESTAMP INT64CONST(-211813488000000000) >>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 * USECS_PER_SEC */ >>> +#define MAX_TIMESTAMP INT64CONST(9223371331200000000) >>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 * >>> USECS_PER_SEC >>> */ >>> +#else >>> +#define MIN_TIMESTAMP -211813488000.0 >>> +/* == (0 - POSTGRES_EPOCH_JDATE) * 86400 */ >>> +#define MAX_TIMESTAMP 9223371331200.0 >>> +/* == (JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * 86400 */ >>> +#endif >>> >>> Could be written as: >>> >>> #ifdef HAVE_INT64_TIMESTAMP >>> #define MIN_TIMESTAMP >>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) >>> #define MAX_TIMESTAMP >>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * USECS_PER_DAY) >>> #else >>> #define MIN_TIMESTAMP >>> ((INT64CONST(0) - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) >>> #define MAX_TIMESTAMP >>> ((JULIAN_MAX4STAMPS - POSTGRES_EPOCH_JDATE) * SECS_PER_DAY) >>> #endif >>> >>> I assume modern compilers would convert these to the same constants at >>> compile-time, >> >> Firstly, Postgres is compiling not only by modern compilers. > > Do you have an example of a compiler that will not do this constant folding > at compile time? No, I'm not good at knowing features of all versions and all kings of compilers, but I'm sure constants are better than expressions for big values. =) >>> rather than impose a runtime penalty. >> >> Secondary, It is hard to write it correctly obeying Postgres' coding >> standard (e.g. 80-columns border) and making it clear: it is not so >> visual difference between USECS_PER_DAY and SECS_PER_DAY in the >> expressions above (but it is vivid in comments in the file). > > Hmm. I think using USECS_PER_DAY is perfectly clear, but that is a > personal > opinion. I don't see any way to argue if you don't see it that way. I'm talking about perception of the constants when they a very similar but they are not justified by a single column (where difference _in_expressions_ are clear). There was a difference by a single char ("U") only which is not so obvious without deep looking on it (be honest I'd missed it until started to write an answer). >> or "Why is INT64CONST set for >> MIN_TIMESTAMP, but not for MAX_TIMESTAMP?" (JULIAN_MAX4STAMPS is _not_ >> int64). > > I was only casting the zero to int64. That doesn't seem necessary, so it > can > be removed. Both MIN_TIMESTAMP and MAX_TIMESTAMP were defined > in terms of USECS_PER_DAY, which itself is defined in terms of INT64CONST, > so I believe they both work out to be an int64 constant. I hope so. But in such cases I'd prefer to _begin_ calculations from int64, not to _finish_ by it. It is impossible to pass constants (like JULIAN_MAX4STAMPS) to INT64CONST macros. Inserting "(int64)" makes rows larger by 7 chars... ;-) >>> The #defines would be less brittle in >>> the event, for example, that the postgres epoch were ever changed. >> >> I don't think it is real, and even in such case all constants are >> collected together in the file and will be found and changed at once. > > I agree that they would be found at once. I disagree that the example > is not real, as I have changed the postgres epoch myself in some builds, > to be able to use int32 timestamps on small devices. Wow! I'm sorry, I didn't know about it. But in such case (tighten to int32) there are more than two places which should be changed. Two more (four with disabled HAVE_INT64_TIMESTAMP) constants are not big deal with it. -- Best regards, Vitaly Burovoy
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Mark Dilger
Date:
> >>>> The #defines would be less brittle in >>>> the event, for example, that the postgres epoch were ever changed. >>> >>> I don't think it is real, and even in such case all constants are >>> collected together in the file and will be found and changed at once. >> >> I agree that they would be found at once. I disagree that the example >> is not real, as I have changed the postgres epoch myself in some builds, >> to be able to use int32 timestamps on small devices. > > Wow! I'm sorry, I didn't know about it. > But in such case (tighten to int32) there are more than two places > which should be changed. Two more (four with disabled > HAVE_INT64_TIMESTAMP) constants are not big deal with it. Please, do not worry about that. I do not mean that your code needs to be compatible with my fork. Regards, Mark Dilger
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Tom Lane
Date:
I wrote: > I had almost gotten to the point of being willing to commit this patch > when I noticed that it fails to fix the originally-complained-of-problem: > ... > AFAICS the only way that we can avoid a dump/reload hazard is to tighten > up the allowed range of timestamps by at least one day, so that any > timestamp that passes IS_VALID_TIMESTAMP() is guaranteed to print, in > any timezone, with a contained date that the Julian-day routines can > handle. I realized that there's a way to do this without reducing the historical limits of the datatypes. It's already the case that date2j() can handle dates a little beyond the stated upper limit of the date type, and if you take a close look at it, you'll realize that it doesn't have a problem with dates a little before 4714-11-24 BC, either. (It looks like it would work back to about 4800 BC, though I've not attempted to verify that independently.) All we really need is for it to return -1 for 4714-11-23 BC, and it certainly does that. Then, if we allow such a date to pass IS_VALID_JULIAN, we can accept something like '4714-11-23 23:00:00-01 BC'::timestamptz and apply the exact range check only after doing the timestamp adjustment. The same trick works at the upper end of the range. The key is simply that we have to have an explicit range check at the end rather than relying entirely on IS_VALID_JULIAN --- but this patch was about 80% of the way there already. So I fixed that up and committed it, with a very short set of new regression test cases. I seriously doubt that the other ones add enough value to be worth trying to make them work in both float- and int-timestamp cases; though if you want to submit a new patch to add more test cases we could consider it. My feeling about it is that that kind of testing does nothing for errors of omission (ie functions that fail to range-check their results), which is the most likely sort of bug here, and so I doubt it's worth adding them to regression tests that many people will run many times a day for a long time to come. regards, tom lane
Re: Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Tom Lane
Date:
Anastasia Lubennikova <a.lubennikova@postgrespro.ru> writes: > There is an issue, mentioned in the thread above: >> postgres=# select >> postgres-# to_char(date_trunc('week', '4713-01-01 BC'::date),'day') >> postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day') >> postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day'); >> to_char | to_char | to_char >> -----------+-----------+----------- >> monday | monday | thursday >> (1 row) >> since 4714-12-28 BC and to the past detection when a week is starting >> is broken (because it is boundary of isoyears -4713 and -4712). BTW, I think the actual problem is that j2day() figured that coercing its argument to unsigned int would be sufficient to produce a sane answer for negative inputs. It isn't. Nobody sees this with inputs after 4714BC, but when probing in 4714 the code considers the reference point 4714-01-04, which has a negative Julian date and so we end up passing a negative date to j2day(). regards, tom lane
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Vitaly Burovoy
Date:
On 3/16/16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So I fixed that up and committed it, with a very short set of new > regression test cases. I seriously doubt that the other ones add > enough value to be worth trying to make them work in both float- and > int-timestamp cases; though if you want to submit a new patch to > add more test cases we could consider it. My feeling about it is that > that kind of testing does nothing for errors of omission (ie functions > that fail to range-check their results), which is the most likely sort > of bug here, and so I doubt it's worth adding them to regression tests > that many people will run many times a day for a long time to come. > > regards, tom lane Thank you very much! If you decide such kind of tests is not necessary, it is enough for me. Is there any reason to leave JULIAN_MINDAY and JULIAN_MAXDAY which are not used now? Also why JULIAN_MAXMONTH is set to "6" whereas {DATE|TIMESTAMP}_END_JULIAN use "1" as month? -- Best regards, Vitaly Burovoy
Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check
From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: > Is there any reason to leave JULIAN_MINDAY and JULIAN_MAXDAY which are > not used now? Those are just there to document what the limits really are. Possibly some code would need them in future. > Also why JULIAN_MAXMONTH is set to "6" whereas > {DATE|TIMESTAMP}_END_JULIAN use "1" as month? Because we're intentionally allowing a wider range for IS_VALID_JULIAN than for IS_VALID_DATE/TIMESTAMP. regards, tom lane