Thread: Extract epoch from Interval weird behavior
Hi all, I noticed something odd when going through some of the Interval code. The DAYS_PER_YEAR constant is defined in src/include/datatype/timestamp.h. > #define DAYS_PER_YEAR 365.25 /* assumes leap year every four years */ We execute the EXTRACT and date_part functions in src/backend/utils/adt/timestamp.c in > static Datum > interval_part_common(PG_FUNCTION_ARGS, bool retnumeric) When executing date_part we multiply the total years in the Interval by DAYS_PER_YEAR > result += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR); > result += ((double) DAYS_PER_MONTH * SECS_PER_DAY) * (interval->month % MONTHS_PER_YEAR); > result += ((double) SECS_PER_DAY) * interval->day; However when executing EXTRACT we first truncate DAYS_PER_YEAR to an integer, and then multiply it by the total years in the Interval /* this always fits into int64 */ > secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) + > (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) + > interval->day) * SECS_PER_DAY; Is this truncation on purpose? It seems like EXTRACT is not accounting for leap years in it's calculation. - Joe Koshakow
On Wed, Feb 23, 2022 at 7:42 PM Joseph Koshakow <koshy44@gmail.com> wrote: > > Hi all, > > I noticed something odd when going through some > of the Interval code. The DAYS_PER_YEAR constant > is defined in src/include/datatype/timestamp.h. > > #define DAYS_PER_YEAR 365.25 /* assumes leap year every four years */ > > We execute the EXTRACT and date_part functions in > src/backend/utils/adt/timestamp.c in > > static Datum > > interval_part_common(PG_FUNCTION_ARGS, bool retnumeric) > > When executing date_part we multiply the total > years in the Interval by DAYS_PER_YEAR > > result += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR); > > result += ((double) DAYS_PER_MONTH * SECS_PER_DAY) * (interval->month % MONTHS_PER_YEAR); > > result += ((double) SECS_PER_DAY) * interval->day; > > > However when executing EXTRACT we first truncate > DAYS_PER_YEAR to an integer, and then multiply it > by the total years in the Interval > /* this always fits into int64 */ > > secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) + > > (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) + > > interval->day) * SECS_PER_DAY; > > Is this truncation on purpose? It seems like > EXTRACT is not accounting for leap years in > it's calculation. > > - Joe Koshakow Oops I sent that to the wrong email. If this isn't intented I've created a patch that fixes it, with the following two open questions * DAYS_PER_YEAR_NUM is recalculated every time. Is there anyway to convert a float directly to a numeric to avoid this? * For some reason the change adds a lot of trailing zeros to the result. I'm not sure why that is. - Joe Koshakow
Attachment
Hi Joseph,
> > Is this truncation on purpose? It seems like
> > EXTRACT is not accounting for leap years in
> > it's calculation.
Extracting an epoch from an interval is quite a strange case since intervals are not connected to any specific dates.
For instance:
select extract('epoch' from interval '1 month')
.. returns 2592000 = 30*24*60*60. But what if the month is February? Should we account for the different number of days for intervals like 6 months or 24 months?
Also, leap years don't just happen every 4 years. Here is the actual logic:
bool is_leap_year(int Y) {
if(Y % 400 == 0) return true;
else if(Y % 100 == 0) return false;
else if(Y % 4 == 0) return true;
else return false;
}
And what about leap seconds?
All in all, I don't think that the benefit of the proposed change outweighs the fact that it will break the previous behavior for the users who may rely on it. I suggest keeping it simple, i.e. the way it is now. What I think we could do instead is explicitly document this behavior in [1].
[1]: https://www.postgresql.org/docs/current/functions-datetime.html
--
Best regards,
Aleksander Alekseev
> > Is this truncation on purpose? It seems like
> > EXTRACT is not accounting for leap years in
> > it's calculation.
Extracting an epoch from an interval is quite a strange case since intervals are not connected to any specific dates.
For instance:
select extract('epoch' from interval '1 month')
.. returns 2592000 = 30*24*60*60. But what if the month is February? Should we account for the different number of days for intervals like 6 months or 24 months?
Also, leap years don't just happen every 4 years. Here is the actual logic:
bool is_leap_year(int Y) {
if(Y % 400 == 0) return true;
else if(Y % 100 == 0) return false;
else if(Y % 4 == 0) return true;
else return false;
}
And what about leap seconds?
All in all, I don't think that the benefit of the proposed change outweighs the fact that it will break the previous behavior for the users who may rely on it. I suggest keeping it simple, i.e. the way it is now. What I think we could do instead is explicitly document this behavior in [1].
[1]: https://www.postgresql.org/docs/current/functions-datetime.html
--
Best regards,
Aleksander Alekseev
On Thu, Feb 24, 2022 at 4:47 AM Aleksander Alekseev <aleksander@timescale.com> wrote: > Extracting an epoch from an interval is quite a strange case since intervals are not connected to any specific dates. I agree, I think it's a weird use case and that it's probably not worth fixing. Though it was fun for me to try. > > All in all, I don't think that the benefit of the proposed change outweighs the fact that it will break the previous behaviorfor the users who may rely on it. I suggest keeping it simple, i.e. the way it is now. What I think we could do insteadis explicitly document this behavior in [1]. > > [1]: https://www.postgresql.org/docs/current/functions-datetime.html I do want to briefly mention, if I'm understanding the history of EXTRACT correctly, that the previous behavior actually was to multiply by 365.25, not 365. However The commit that changed the return type from numeric [1] changed that behavior. Looking through the discussions [2], I don't see any mention of it, which makes me think it was a mistake. However there is a lot of discussion around numeric performance and being able to optimize numeric division because every divisor was a power of 10. Fixing this issue would break that assumption and cause some performance degradations which probably isn't worth it. [1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2da77cdb4661826482ebf2ddba1f953bc74afe4 [2]: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu - Joe Koshakow
Joseph Koshakow <koshy44@gmail.com> writes: > I do want to briefly mention, if I'm understanding the history of > EXTRACT correctly, that the previous behavior > actually was to multiply by 365.25, not 365. However The commit that > changed the return type from numeric [1] > changed that behavior. Looking through the discussions [2], I don't > see any mention of it, which makes me think > it was a mistake. Indeed ... Peter? regards, tom lane
On 24.02.22 03:35, Joseph Koshakow wrote: >> However when executing EXTRACT we first truncate >> DAYS_PER_YEAR to an integer, and then multiply it >> by the total years in the Interval >> /* this always fits into int64 */ >>> secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) + >>> (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) + >>> interval->day) * SECS_PER_DAY; >> Is this truncation on purpose? It seems like >> EXTRACT is not accounting for leap years in >> it's calculation. This was not intentional. The cast is only to make the multiplication happen in int64; it didn't mean to drop any fractional parts. > Oops I sent that to the wrong email. If this isn't intented I've created a patch > that fixes it, with the following two open questions > * DAYS_PER_YEAR_NUM is recalculated every time. Is there anyway > to convert a float directly to a numeric to avoid this? We really wanted to avoid doing calculations in numeric as much as possible. So we should figure out a different way to write this. The attached patch works for me. It's a bit ugly since it hardcodes some factors. Maybe we can rephrase it a bit more elegantly.
Attachment
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > We really wanted to avoid doing calculations in numeric as much as > possible. So we should figure out a different way to write this. The > attached patch works for me. It's a bit ugly since it hardcodes some > factors. Maybe we can rephrase it a bit more elegantly. I think it's fine but needs some commentary. Maybe about like "To do this calculation in integer arithmetic even though DAYS_PER_YEAR is fractional, multiply everything by 4 and then divide by 4 again at the end. This relies on DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY being a multiple of 4." BTW, it might be good to parenthesize as (... big calculation ...) * (SECS_PER_DAY/4) to eliminate any question of whether the value could overflow before the final division by 4. regards, tom lane
On 08.04.22 15:10, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> We really wanted to avoid doing calculations in numeric as much as >> possible. So we should figure out a different way to write this. The >> attached patch works for me. It's a bit ugly since it hardcodes some >> factors. Maybe we can rephrase it a bit more elegantly. > > I think it's fine but needs some commentary. Maybe about like > "To do this calculation in integer arithmetic even though > DAYS_PER_YEAR is fractional, multiply everything by 4 > and then divide by 4 again at the end. This relies on > DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY > being a multiple of 4." > > BTW, it might be good to parenthesize as > > (... big calculation ...) * (SECS_PER_DAY/4) > > to eliminate any question of whether the value could overflow > before the final division by 4. done that way