Thread: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Aleksander Alekseev
Date:
Hi hackers,

There are several overloaded versions of timezone() function. One
version accepts timezone name and timestamptz and returns timestamp:

=# show time zone;
   TimeZone
---------------
 Europe/Moscow

=# select timezone('MSK', '2021-08-30 12:34:56 MSK' :: timestamptz);
      timezone
---------------------
 2021-08-30 12:34:56

This function is marked as IMMUTABLE and it's possible to use it in
functional indexes. I believe it's a bug. Since the function accepts
the name of the time zone, and the rules of time zones change, this
function may return different results for the same arguments in the
future. This makes it STABLE, or at least definitely not IMMUTABLE
[1]. timezone(text, timestamp), which returns timestamptz should be
STABLE as well for the same reasons.

The proposed patch (v1) fixes this.

Other versions of timezone() seem to be fine, except:

=# \df+ timezone
...
-[ RECORD 4 ]-------+---------------------------------------
Schema              | pg_catalog
Name                | timezone
Result data type    | time with time zone
Argument data types | text, time with time zone
Type                | func
Volatility          | volatile
Parallel            | safe
Owner               | eax
Security            | invoker
Access privileges   |
Language            | internal
Source code         | timetz_zone
Description         | adjust time with time zone to new zone
...


Does anyone know the reason why, unlike other versions, it's marked
VOLATILE? I attached an alternative version of the patch (v2), which
fixes this too. None of the patches includes any regression tests. As
I understand there is little reason to re-check the volatility stated
in pg_proc.dat in runtime.

[1]: https://www.postgresql.org/docs/current/xfunc-volatility.html

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Tom Lane
Date:
Aleksander Alekseev <aleksander@timescale.com> writes:
> There are several overloaded versions of timezone() function. One
> version accepts timezone name and timestamptz and returns timestamp:
> This function is marked as IMMUTABLE and it's possible to use it in
> functional indexes. I believe it's a bug.

That's a deliberate choice IIRC.  I agree that the behavior could
change after a tzdata update, but if the standard is that "immutable"
means "no conceivable future code or configuration change could alter
the results", then there's not a lot of functions that will pass that
test :-(.

As a pretty relevant example, we're not going to stop marking text
comparison operators as immutable, even though we know all too well
that the OS' sort order might change underneath us.  The loss of
functionality and performance that would result from downgrading
those to stable is just not acceptable.  It's better to treat them
as immutable and accept the risk of sometimes having to rebuild
indexes.

I don't see a lot of argument for treating tzdata changes differently
from OS locale changes.

> Other versions of timezone() seem to be fine, except:
> Source code         | timetz_zone
> Does anyone know the reason why, unlike other versions, it's marked
> VOLATILE?

Looking at the code, it decides whether to use DST or not based on
the current time ... which it gets using time(NULL).  So the volatile
marking is correct for this implementation, because it could change
intra-query.  This seems like a pretty dumb choice though: I'd think
it'd make more sense to use the value of now() as the referent.
Then it could be stable, and it'd also be faster because it wouldn't
need its own kernel call.

            regards, tom lane



Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Tom Lane
Date:
I wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
>> [ why is timetz_zone volatile? ]

Ah ... after a bit of digging in the git history, I found this [1]:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL8_1_BR [35979e6c3] 2005-09-09 06:51:12 +0000

    Given its current definition that depends on time(NULL), timetz_zone
    is certainly no longer immutable, but must indeed be marked volatile.
    I wonder if it should use the value of now() (that is, transaction
    start time) so that it could be marked stable.  But it's probably not
    important enough to be worth changing the code for ... indeed, I'm not
    even going to force an initdb for this catalog change, seeing that we
    just did one a few hours ago.

I wasn't excited enough about it personally to change it, and I'm
still not --- but if you want to, send a patch.

            regards, tom lane

[1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=35979e6c3



Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Aleksander Alekseev
Date:
Hi Tom,

Thanks for the quick reply.

> I don't see a lot of argument for treating tzdata changes differently
> from OS locale changes.

Got it. But in this case, what's your opinion on the differences between date_trunc() and timezone()? Shouldn't date_trunc() be always IMMUTABLE as well?

I can see pros and cons to be IMMUTABLE _or_ STABLE when dealing with time zones, but at least PostgreSQL should be consistent in this, right?

--
Best regards,
Aleksander Alekseev

Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Tom Lane
Date:
Aleksander Alekseev <aleksander@timescale.com> writes:
> Got it. But in this case, what's your opinion on the differences between
> date_trunc() and timezone()? Shouldn't date_trunc() be always IMMUTABLE as
> well?

No, because date_trunc depends on the current timezone setting,
or at least its stable variants do.

            regards, tom lane



Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
John Naylor
Date:

On Mon, Aug 30, 2021 at 12:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Aleksander Alekseev <aleksander@timescale.com> writes:
> > Got it. But in this case, what's your opinion on the differences between
> > date_trunc() and timezone()? Shouldn't date_trunc() be always IMMUTABLE as
> > well?
>
> No, because date_trunc depends on the current timezone setting,
> or at least its stable variants do.

A light bulb went off in my head just now, because I modeled date_bin() in part on date_trunc(), but apparently it didn't get the memo that the variant with timezone should have been marked stable.

I believe it's been discussed before that it'd be safer if pg_proc.dat had the same defaults as CREATE FUNCTION, and this is further evidence for that.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Tom Lane
Date:
John Naylor <john.naylor@enterprisedb.com> writes:
> I believe it's been discussed before that it'd be safer if pg_proc.dat had
> the same defaults as CREATE FUNCTION, and this is further evidence for that.

Yeah, maybe so.  It'd make the .dat file quite a bit bigger, but maybe
less mistake-prone.

            regards, tom lane



Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Aleksander Alekseev
Date:
Hi Tom,

> No, because date_trunc depends on the current timezone setting,
> or at least its stable variants do.

Once again, many thanks for your answers!

> I wasn't excited enough about it personally to change it, and I'm
> still not --- but if you want to, send a patch.

Here is the patch.

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Tom Lane
Date:
Aleksander Alekseev <aleksander@timescale.com> writes:
>>> [ timetz_zone is VOLATILE ]
>> I wasn't excited enough about it personally to change it, and I'm
>> still not --- but if you want to, send a patch.

> Here is the patch.

I looked at this patch, and felt unhappy about the fact that it left
timetz_zone() still depending on pg_time_t and pg_localtime, which
nothing else in date.c does.  Poking at it closer, I realized that
the DYNTZ code path is actually completely broken, and has been
for years.  Observe:

regression=# select timezone('America/Santiago', '12:34 -02'::timetz);
  timezone   
-------------
 11:34:00-03
(1 row)

That's fine.  But CLT, which should be entirely equivalent
to America/Santiago, produces seeming garbage:

regression=# select timezone('CLT', '12:34 -02'::timetz);
     timezone      
-------------------
 09:51:14-04:42:46
(1 row)

<digression>
What's happening there is that pg_localtime produces a struct tm
containing POSIX-style values, in particular tm_year is relative
to 1900.  But DetermineTimeZoneAbbrevOffset expects a struct using
the PG convention that tm_year is relative to "AD 0".  So it sees
a date in the second century AD, decides that that's way out of
range, and falls back to the "LMT" offset provided by the tzdb
database.  That lines up with what you'd get from

regression=# set timezone = 'America/Santiago';
SET
regression=# select '0121-09-03 12:34'::timestamptz;
         timestamptz          
------------------------------
 0121-09-03 12:34:00-04:42:46
(1 row)

</digression>

Basically the problem here is that this is incredibly hoary code
that's never been touched or tested as we revised datetime-related
APIs elsewhere.  I'm fairly unhappy now that we don't have any
regression test coverage for this function.  However, I see no
very good way to make that happen, because the interesting code
paths will (by definition) produce different results at different
times of year.  I suppose we could carry two variant expected-files,
but ick.  The DYNTZ path is particularly problematic here, because
that's only used for timezones that have changed definitions over
time, meaning they're particularly likely to change again.

Anyway, attached is a revised patch that gets rid of the antique
code, and it produces correct results AFAICT.

BTW, it's customary to *not* include catversion bumps in submitted
patches, because that accomplishes little except to ensure that
your patch will soon fail to apply.  (This one already is failing.)
If you feel a need to remind the committer that a catversion bump
is needed, just comment to that effect in the submission email.

I'm not entirely sure what to do about the discovery that the
DYNTZ path has pre-existing breakage.  Perhaps it'd be sensible
to back-patch this patch, minus the catalog change.  I doubt that
anyone would have a problem with the nominal change of behavior
near DST boundaries.  Or we could just ignore the bug in the back
branches, since it's fairly clear that basically no one uses this
function.

            regards, tom lane

diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 0bea16cb67..cf85a61778 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -3029,7 +3029,7 @@ extract_timetz(PG_FUNCTION_ARGS)
 
 /* timetz_zone()
  * Encode time with time zone type with specified time zone.
- * Applies DST rules as of the current date.
+ * Applies DST rules as of the transaction start time.
  */
 Datum
 timetz_zone(PG_FUNCTION_ARGS)
@@ -3068,12 +3068,11 @@ timetz_zone(PG_FUNCTION_ARGS)
     }
     else if (type == DYNTZ)
     {
-        /* dynamic-offset abbreviation, resolve using current time */
-        pg_time_t    now = (pg_time_t) time(NULL);
-        struct pg_tm *tm;
+        /* dynamic-offset abbreviation, resolve using transaction start time */
+        TimestampTz now = GetCurrentTransactionStartTimestamp();
+        int            isdst;
 
-        tm = pg_localtime(&now, tzp);
-        tz = DetermineTimeZoneAbbrevOffset(tm, tzname, tzp);
+        tz = DetermineTimeZoneAbbrevOffsetTS(now, tzname, tzp, &isdst);
     }
     else
     {
@@ -3081,12 +3080,15 @@ timetz_zone(PG_FUNCTION_ARGS)
         tzp = pg_tzset(tzname);
         if (tzp)
         {
-            /* Get the offset-from-GMT that is valid today for the zone */
-            pg_time_t    now = (pg_time_t) time(NULL);
-            struct pg_tm *tm;
+            /* Get the offset-from-GMT that is valid now for the zone */
+            TimestampTz now = GetCurrentTransactionStartTimestamp();
+            struct pg_tm tm;
+            fsec_t        fsec;
 
-            tm = pg_localtime(&now, tzp);
-            tz = -tm->tm_gmtoff;
+            if (timestamp2tm(now, &tz, &tm, &fsec, NULL, tzp) != 0)
+                ereport(ERROR,
+                        (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+                         errmsg("timestamp out of range")));
         }
         else
         {
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c07b2c6a55..d068d6532e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5953,7 +5953,7 @@
   proname => 'timestamp_larger', prorettype => 'timestamp',
   proargtypes => 'timestamp timestamp', prosrc => 'timestamp_larger' },
 { oid => '2037', descr => 'adjust time with time zone to new zone',
-  proname => 'timezone', provolatile => 'v', prorettype => 'timetz',
+  proname => 'timezone', provolatile => 's', prorettype => 'timetz',
   proargtypes => 'text timetz', prosrc => 'timetz_zone' },
 { oid => '2038', descr => 'adjust time with time zone to new zone',
   proname => 'timezone', prorettype => 'timetz',

Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Aleksander Alekseev
Date:
> BTW, it's customary to *not* include catversion bumps in submitted
> patches

Thanks, Tom.

> Anyway, attached is a revised patch that gets rid of the antique
> code, and it produces correct results AFAICT.

I tested your patch against the current master branch 78aa616b on
MacOS Catalina. I have nothing to add to the patch.

> I'm fairly unhappy now that we don't have any
> regression test coverage for this function.

Yep, that's unfortunate. I see several tests for `AT TIME ZONE`
syntax, which is a syntax sugar to timezone() with timestamp[tz]
arguments. But considering how `timetz` type is broken in the first
place [1], I'm not surprised few people feel motivated to do anything
related to it. Do you think there is a possibility that one day we may
be brave enough to get rid of this type?


[1]: https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timetz

-- 
Best regards,
Aleksander Alekseev



Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

From
Tom Lane
Date:
Aleksander Alekseev <aleksander@timescale.com> writes:
>> Anyway, attached is a revised patch that gets rid of the antique
>> code, and it produces correct results AFAICT.

> I tested your patch against the current master branch 78aa616b on
> MacOS Catalina. I have nothing to add to the patch.

Thanks.  Pushed, along with a quick-and-dirty patch to resolve the
DYNTZ problem in the back branches.

>> I'm fairly unhappy now that we don't have any
>> regression test coverage for this function.

> Yep, that's unfortunate. I see several tests for `AT TIME ZONE`
> syntax, which is a syntax sugar to timezone() with timestamp[tz]
> arguments. But considering how `timetz` type is broken in the first
> place [1], I'm not surprised few people feel motivated to do anything
> related to it. Do you think there is a possibility that one day we may
> be brave enough to get rid of this type?

I'm afraid not, seeing that it's required by the SQL standard.

I thought about adding tests based on the CLT example I showed upthread,
and just accepting the need for two variant result files.  Maybe we
should do that.  However, it still wouldn't be a great test, because
it would not prove that the DST switchover happens at the right time of
year, or indeed at all.  So for the moment I didn't.

            regards, tom lane