Thread: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?
Hi hackers, There are several overloaded versions of timezone() function. One version accepts timezone name and timestamptz and returns timestamp: =# show time zone; TimeZone --------------- Europe/Moscow =# select timezone('MSK', '2021-08-30 12:34:56 MSK' :: timestamptz); timezone --------------------- 2021-08-30 12:34:56 This function is marked as IMMUTABLE and it's possible to use it in functional indexes. I believe it's a bug. Since the function accepts the name of the time zone, and the rules of time zones change, this function may return different results for the same arguments in the future. This makes it STABLE, or at least definitely not IMMUTABLE [1]. timezone(text, timestamp), which returns timestamptz should be STABLE as well for the same reasons. The proposed patch (v1) fixes this. Other versions of timezone() seem to be fine, except: =# \df+ timezone ... -[ RECORD 4 ]-------+--------------------------------------- Schema | pg_catalog Name | timezone Result data type | time with time zone Argument data types | text, time with time zone Type | func Volatility | volatile Parallel | safe Owner | eax Security | invoker Access privileges | Language | internal Source code | timetz_zone Description | adjust time with time zone to new zone ... Does anyone know the reason why, unlike other versions, it's marked VOLATILE? I attached an alternative version of the patch (v2), which fixes this too. None of the patches includes any regression tests. As I understand there is little reason to re-check the volatility stated in pg_proc.dat in runtime. [1]: https://www.postgresql.org/docs/current/xfunc-volatility.html -- Best regards, Aleksander Alekseev
Attachment
Aleksander Alekseev <aleksander@timescale.com> writes: > There are several overloaded versions of timezone() function. One > version accepts timezone name and timestamptz and returns timestamp: > This function is marked as IMMUTABLE and it's possible to use it in > functional indexes. I believe it's a bug. That's a deliberate choice IIRC. I agree that the behavior could change after a tzdata update, but if the standard is that "immutable" means "no conceivable future code or configuration change could alter the results", then there's not a lot of functions that will pass that test :-(. As a pretty relevant example, we're not going to stop marking text comparison operators as immutable, even though we know all too well that the OS' sort order might change underneath us. The loss of functionality and performance that would result from downgrading those to stable is just not acceptable. It's better to treat them as immutable and accept the risk of sometimes having to rebuild indexes. I don't see a lot of argument for treating tzdata changes differently from OS locale changes. > Other versions of timezone() seem to be fine, except: > Source code | timetz_zone > Does anyone know the reason why, unlike other versions, it's marked > VOLATILE? Looking at the code, it decides whether to use DST or not based on the current time ... which it gets using time(NULL). So the volatile marking is correct for this implementation, because it could change intra-query. This seems like a pretty dumb choice though: I'd think it'd make more sense to use the value of now() as the referent. Then it could be stable, and it'd also be faster because it wouldn't need its own kernel call. regards, tom lane
I wrote: > Aleksander Alekseev <aleksander@timescale.com> writes: >> [ why is timetz_zone volatile? ] Ah ... after a bit of digging in the git history, I found this [1]: Author: Tom Lane <tgl@sss.pgh.pa.us> Branch: master Release: REL8_1_BR [35979e6c3] 2005-09-09 06:51:12 +0000 Given its current definition that depends on time(NULL), timetz_zone is certainly no longer immutable, but must indeed be marked volatile. I wonder if it should use the value of now() (that is, transaction start time) so that it could be marked stable. But it's probably not important enough to be worth changing the code for ... indeed, I'm not even going to force an initdb for this catalog change, seeing that we just did one a few hours ago. I wasn't excited enough about it personally to change it, and I'm still not --- but if you want to, send a patch. regards, tom lane [1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=35979e6c3
Hi Tom,
Thanks for the quick reply.
> I don't see a lot of argument for treating tzdata changes differently
> from OS locale changes.
--
Best regards,
Aleksander Alekseev
Thanks for the quick reply.
> I don't see a lot of argument for treating tzdata changes differently
> from OS locale changes.
Got it. But in this case, what's your opinion on the differences between date_trunc() and timezone()? Shouldn't date_trunc() be always IMMUTABLE as well?
I can see pros and cons to be IMMUTABLE _or_ STABLE when dealing with time zones, but at least PostgreSQL should be consistent in this, right?
--
Best regards,
Aleksander Alekseev
Aleksander Alekseev <aleksander@timescale.com> writes: > Got it. But in this case, what's your opinion on the differences between > date_trunc() and timezone()? Shouldn't date_trunc() be always IMMUTABLE as > well? No, because date_trunc depends on the current timezone setting, or at least its stable variants do. regards, tom lane
On Mon, Aug 30, 2021 at 12:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Aleksander Alekseev <aleksander@timescale.com> writes:
> > Got it. But in this case, what's your opinion on the differences between
> > date_trunc() and timezone()? Shouldn't date_trunc() be always IMMUTABLE as
> > well?
>
> No, because date_trunc depends on the current timezone setting,
> or at least its stable variants do.
A light bulb went off in my head just now, because I modeled date_bin() in part on date_trunc(), but apparently it didn't get the memo that the variant with timezone should have been marked stable.
I believe it's been discussed before that it'd be safer if pg_proc.dat had the same defaults as CREATE FUNCTION, and this is further evidence for that.
--
John Naylor
EDB: http://www.enterprisedb.com
--
John Naylor
EDB: http://www.enterprisedb.com
John Naylor <john.naylor@enterprisedb.com> writes: > I believe it's been discussed before that it'd be safer if pg_proc.dat had > the same defaults as CREATE FUNCTION, and this is further evidence for that. Yeah, maybe so. It'd make the .dat file quite a bit bigger, but maybe less mistake-prone. regards, tom lane
Hi Tom, > No, because date_trunc depends on the current timezone setting, > or at least its stable variants do. Once again, many thanks for your answers! > I wasn't excited enough about it personally to change it, and I'm > still not --- but if you want to, send a patch. Here is the patch. -- Best regards, Aleksander Alekseev
Attachment
Aleksander Alekseev <aleksander@timescale.com> writes: >>> [ timetz_zone is VOLATILE ] >> I wasn't excited enough about it personally to change it, and I'm >> still not --- but if you want to, send a patch. > Here is the patch. I looked at this patch, and felt unhappy about the fact that it left timetz_zone() still depending on pg_time_t and pg_localtime, which nothing else in date.c does. Poking at it closer, I realized that the DYNTZ code path is actually completely broken, and has been for years. Observe: regression=# select timezone('America/Santiago', '12:34 -02'::timetz); timezone ------------- 11:34:00-03 (1 row) That's fine. But CLT, which should be entirely equivalent to America/Santiago, produces seeming garbage: regression=# select timezone('CLT', '12:34 -02'::timetz); timezone ------------------- 09:51:14-04:42:46 (1 row) <digression> What's happening there is that pg_localtime produces a struct tm containing POSIX-style values, in particular tm_year is relative to 1900. But DetermineTimeZoneAbbrevOffset expects a struct using the PG convention that tm_year is relative to "AD 0". So it sees a date in the second century AD, decides that that's way out of range, and falls back to the "LMT" offset provided by the tzdb database. That lines up with what you'd get from regression=# set timezone = 'America/Santiago'; SET regression=# select '0121-09-03 12:34'::timestamptz; timestamptz ------------------------------ 0121-09-03 12:34:00-04:42:46 (1 row) </digression> Basically the problem here is that this is incredibly hoary code that's never been touched or tested as we revised datetime-related APIs elsewhere. I'm fairly unhappy now that we don't have any regression test coverage for this function. However, I see no very good way to make that happen, because the interesting code paths will (by definition) produce different results at different times of year. I suppose we could carry two variant expected-files, but ick. The DYNTZ path is particularly problematic here, because that's only used for timezones that have changed definitions over time, meaning they're particularly likely to change again. Anyway, attached is a revised patch that gets rid of the antique code, and it produces correct results AFAICT. BTW, it's customary to *not* include catversion bumps in submitted patches, because that accomplishes little except to ensure that your patch will soon fail to apply. (This one already is failing.) If you feel a need to remind the committer that a catversion bump is needed, just comment to that effect in the submission email. I'm not entirely sure what to do about the discovery that the DYNTZ path has pre-existing breakage. Perhaps it'd be sensible to back-patch this patch, minus the catalog change. I doubt that anyone would have a problem with the nominal change of behavior near DST boundaries. Or we could just ignore the bug in the back branches, since it's fairly clear that basically no one uses this function. regards, tom lane diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c index 0bea16cb67..cf85a61778 100644 --- a/src/backend/utils/adt/date.c +++ b/src/backend/utils/adt/date.c @@ -3029,7 +3029,7 @@ extract_timetz(PG_FUNCTION_ARGS) /* timetz_zone() * Encode time with time zone type with specified time zone. - * Applies DST rules as of the current date. + * Applies DST rules as of the transaction start time. */ Datum timetz_zone(PG_FUNCTION_ARGS) @@ -3068,12 +3068,11 @@ timetz_zone(PG_FUNCTION_ARGS) } else if (type == DYNTZ) { - /* dynamic-offset abbreviation, resolve using current time */ - pg_time_t now = (pg_time_t) time(NULL); - struct pg_tm *tm; + /* dynamic-offset abbreviation, resolve using transaction start time */ + TimestampTz now = GetCurrentTransactionStartTimestamp(); + int isdst; - tm = pg_localtime(&now, tzp); - tz = DetermineTimeZoneAbbrevOffset(tm, tzname, tzp); + tz = DetermineTimeZoneAbbrevOffsetTS(now, tzname, tzp, &isdst); } else { @@ -3081,12 +3080,15 @@ timetz_zone(PG_FUNCTION_ARGS) tzp = pg_tzset(tzname); if (tzp) { - /* Get the offset-from-GMT that is valid today for the zone */ - pg_time_t now = (pg_time_t) time(NULL); - struct pg_tm *tm; + /* Get the offset-from-GMT that is valid now for the zone */ + TimestampTz now = GetCurrentTransactionStartTimestamp(); + struct pg_tm tm; + fsec_t fsec; - tm = pg_localtime(&now, tzp); - tz = -tm->tm_gmtoff; + if (timestamp2tm(now, &tz, &tm, &fsec, NULL, tzp) != 0) + ereport(ERROR, + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), + errmsg("timestamp out of range"))); } else { diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c07b2c6a55..d068d6532e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5953,7 +5953,7 @@ proname => 'timestamp_larger', prorettype => 'timestamp', proargtypes => 'timestamp timestamp', prosrc => 'timestamp_larger' }, { oid => '2037', descr => 'adjust time with time zone to new zone', - proname => 'timezone', provolatile => 'v', prorettype => 'timetz', + proname => 'timezone', provolatile => 's', prorettype => 'timetz', proargtypes => 'text timetz', prosrc => 'timetz_zone' }, { oid => '2038', descr => 'adjust time with time zone to new zone', proname => 'timezone', prorettype => 'timetz',
> BTW, it's customary to *not* include catversion bumps in submitted > patches Thanks, Tom. > Anyway, attached is a revised patch that gets rid of the antique > code, and it produces correct results AFAICT. I tested your patch against the current master branch 78aa616b on MacOS Catalina. I have nothing to add to the patch. > I'm fairly unhappy now that we don't have any > regression test coverage for this function. Yep, that's unfortunate. I see several tests for `AT TIME ZONE` syntax, which is a syntax sugar to timezone() with timestamp[tz] arguments. But considering how `timetz` type is broken in the first place [1], I'm not surprised few people feel motivated to do anything related to it. Do you think there is a possibility that one day we may be brave enough to get rid of this type? [1]: https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timetz -- Best regards, Aleksander Alekseev
Aleksander Alekseev <aleksander@timescale.com> writes: >> Anyway, attached is a revised patch that gets rid of the antique >> code, and it produces correct results AFAICT. > I tested your patch against the current master branch 78aa616b on > MacOS Catalina. I have nothing to add to the patch. Thanks. Pushed, along with a quick-and-dirty patch to resolve the DYNTZ problem in the back branches. >> I'm fairly unhappy now that we don't have any >> regression test coverage for this function. > Yep, that's unfortunate. I see several tests for `AT TIME ZONE` > syntax, which is a syntax sugar to timezone() with timestamp[tz] > arguments. But considering how `timetz` type is broken in the first > place [1], I'm not surprised few people feel motivated to do anything > related to it. Do you think there is a possibility that one day we may > be brave enough to get rid of this type? I'm afraid not, seeing that it's required by the SQL standard. I thought about adding tests based on the CLT example I showed upthread, and just accepting the need for two variant result files. Maybe we should do that. However, it still wouldn't be a great test, because it would not prove that the DST switchover happens at the right time of year, or indeed at all. So for the moment I didn't. regards, tom lane