Thread: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

From
Aleksander Alekseev
Date:
Hi hackers,

We discovered one strange edge case with TimestampTz:

```
=# set datestyle to 'Postgres';
SET
=# SELECT '1000-01-01'::timestamptz::text;
             text
------------------------------
 Wed Jan 01 00:00:00 1000 LMT

=# SELECT '1000-01-01'::timestamptz::text::timestamptz;
ERROR:  invalid input syntax for type timestamp with time zone: "Wed
Jan 01 00:00:00 1000 LMT"
```

When DateStyle is set to 'ISO' everything works fine:

```
=# set datestyle to 'ISO';
SET
=# SELECT '1000-01-01'::timestamptz::text;
             text
------------------------------
 1000-01-01 00:00:00+02:30:17

=# SELECT '1000-01-01'::timestamptz::text::timestamptz;
         timestamptz
------------------------------
 1000-01-01 00:00:00+02:30:17
```

If I understand correctly, text->timestamptz doesn't understand the
'LMT' timezone. Until 1879 it doesn't work, but in 1880 we switch to
'MMT' and then it works:

```
eax=# SELECT '1879-01-01'::timestamptz::text::timestamptz;
ERROR:  invalid input syntax for type timestamp with time zone: "Wed
Jan 01 00:00:00 1879 LMT"
eax=# SELECT '1880-01-01'::timestamptz::text::timestamptz;
         timestamptz
------------------------------
 Wed Dec 31 20:00:17 1879 LMT
(1 row)

eax=# SELECT '1880-01-01'::timestamptz::text;
             text
------------------------------
 Thu Jan 01 00:00:00 1880 MMT
```

It seems to me that in the first case we should either accept "Wed Jan
01 00:00:00 1000 LMT" (since we just generated it) or alternatively
produce something else when casting timestamptz to text.

Thoughts?

-- 
Best regards,
Aleksander Alekseev



Aleksander Alekseev <aleksander@timescale.com> writes:
> =# set datestyle to 'Postgres';
> SET
> =# SELECT '1000-01-01'::timestamptz::text::timestamptz;
> ERROR:  invalid input syntax for type timestamp with time zone: "Wed
> Jan 01 00:00:00 1000 LMT"

> If I understand correctly, text->timestamptz doesn't understand the
> 'LMT' timezone.

So it seems.  In a quick experiment, it seemed like just treating LMT
as a noise word on input might be enough, but I don't have time today
to poke at it further.

            regards, tom lane



Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

From
Aleksander Alekseev
Date:
Hi Tom,

> > If I understand correctly, text->timestamptz doesn't understand the
> > 'LMT' timezone.
>
> So it seems.  In a quick experiment, it seemed like just treating LMT
> as a noise word on input might be enough, but I don't have time today
> to poke at it further.

Thanks for the hint. I've found the corresponding piece of code and
will submit a proper patch shortly.

-- 
Best regards,
Aleksander Alekseev



Michael Paquier <michael@paquier.xyz> writes:
> FWIW, I have a hard time understanding why ignoring "LMT", which is
> a timezone abbreviation meaning "Local Mean Time", in datetktbl[],
> which is a table for date/time keywords, is the right thing to do.

I agree.  We should be mapping it to whatever GMT offset "LMT" means
in the selected zone (assuming there is an entry, which it seems like
there is in most of them).  I've not gotten around to looking at what
that would take, but it's probably not entirely trivial, since it
would mean different GMT offsets in different zones.

One thing that might be interesting to look at is whether the same
mechanism could be used for other TZ abbreviations defined by the tzdb
data, instead of relying on our hard-wired lists.

> Worth noting that we may want to do something in ECPG's dt_common.c as
> well, that holds a similar table with TZ abbreviation data and LMT is
> missing there?

ECPG's datetime support is so far behind the server's that it's kind
of sad.  But bringing it up to speed would be a large investment of
effort, something I for one am not prepared to put into it.  I sure
don't think we should hold up fixing this on the server side pending
making that happen.

            regards, tom lane



I wrote:
> I agree.  We should be mapping it to whatever GMT offset "LMT" means
> in the selected zone (assuming there is an entry, which it seems like
> there is in most of them).  I've not gotten around to looking at what
> that would take, but it's probably not entirely trivial, since it
> would mean different GMT offsets in different zones.

Here's a draft patch for that.  It turns out to be simpler than I
thought, because we can mostly piggy-back on the existing DYNTZ
support.  The difference between LMT and one of our existing dynamic
abbreviations is that LMT means whatever it means in the prevailing
session timezone, while a dynamic abbreviation specifies which TZDB
timezone it refers to.

> One thing that might be interesting to look at is whether the same
> mechanism could be used for other TZ abbreviations defined by the tzdb
> data, instead of relying on our hard-wired lists.

As I set it up here, we first check the timezone abbreviation list,
then look into the session timezone to see if it has an entry.
I don't expect that this lookup will succeed for anything except LMT,
because every other abbreviation that TZDB knows about is already
listed in our standard abbreviation list.  But in the future we
could imagine removing entries from the abbreviation list so that
this code path takes more of the burden.  That'd be particularly
attractive for abbreviations that have conflicts, because then
our interpretation of the abbreviation would automatically adapt
based on the timezone setting.  That's something to pursue another
day though.  We might need to work a bit harder on optimizing
this code path before we let it take anything more common than
LMT, too.

This is not committable because I didn't think about documentation
yet.  We probably want to explain this in Appendix B, and also
there are assorted comments about what a "dynamic abbreviation"
is that will need some adjustment.

            regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 0b19cddf54..6a10488d9a 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1812,9 +1812,35 @@ DetermineTimeZoneAbbrevOffsetTS(TimestampTz ts, const char *abbr,
 }


+/* TimeZoneAbbrevIsKnown()
+ *
+ * Detect whether the given string is a time zone abbreviation that's known
+ * in the specified TZDB timezone.  The match is not case-sensitive.
+ */
+static bool
+TimeZoneAbbrevIsKnown(const char *abbr, pg_tz *tzp)
+{
+    int            offset,
+                isdst;
+
+    /*
+     * DetermineTimeZoneAbbrevOffsetInternal does more than is really needed
+     * here, but it's not clear that it's worth optimizing further.
+     *
+     * We can use the epoch as the probe time, since we don't care here about
+     * exactly what the abbreviation resolves as.
+     */
+    return DetermineTimeZoneAbbrevOffsetInternal((pg_time_t) 0,
+                                                 abbr,
+                                                 tzp,
+                                                 &offset,
+                                                 &isdst);
+}
+
+
 /* DetermineTimeZoneAbbrevOffsetInternal()
  *
- * Workhorse for above two functions: work from a pg_time_t probe instant.
+ * Workhorse for above three functions: work from a pg_time_t probe instant.
  * On success, return GMT offset and DST status into *offset and *isdst.
  */
 static bool
@@ -3106,9 +3132,25 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken,
     }
     if (tp == NULL)
     {
-        *ftype = UNKNOWN_FIELD;
-        *offset = 0;
-        *tz = NULL;
+        /*
+         * See if the current session_timezone recognizes it.  This is mainly
+         * meant to handle "LMT", whose meaning varies across timezones, so we
+         * don't try to cache the result.
+         */
+        if (session_timezone &&
+            TimeZoneAbbrevIsKnown(lowtoken, session_timezone))
+        {
+            /* we can treat it like a dynamic-offset abbreviation */
+            *ftype = DYNTZ;
+            *offset = 0;
+            *tz = session_timezone;
+        }
+        else
+        {
+            *ftype = UNKNOWN_FIELD;
+            *offset = 0;
+            *tz = NULL;
+        }
     }
     else
     {
@@ -3278,9 +3320,6 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
     *offset = 0;                /* avoid uninitialized vars on failure */
     *tz = NULL;

-    if (!zoneabbrevtbl)
-        return -1;                /* no abbrevs known, so fail immediately */
-
     /* Downcase as much of the string as we could need */
     for (len = 0; len < TOKMAXLEN; len++)
     {
@@ -3299,9 +3338,12 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
      */
     while (len > 0)
     {
-        const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
-                                        zoneabbrevtbl->numabbrevs);
+        const datetkn *tp = NULL;

+        /* Known in zoneabbrevtbl? */
+        if (zoneabbrevtbl)
+            tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
+                             zoneabbrevtbl->numabbrevs);
         if (tp != NULL)
         {
             if (tp->type == DYNTZ)
@@ -3324,6 +3366,17 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
                 return len;
             }
         }
+
+        /* See if the current session_timezone recognizes it. */
+        if (session_timezone &&
+            TimeZoneAbbrevIsKnown(lowtoken, session_timezone))
+        {
+            /* we can treat it like a dynamic-offset abbreviation */
+            *tz = session_timezone;
+            return len;
+        }
+
+        /* Nope, try the next shorter string. */
         lowtoken[--len] = '\0';
     }

diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index cb28dfbaee..b90bfcd794 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3332,6 +3332,12 @@ SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
  Sat Dec 17 23:38:00 2011 PST
 (1 row)

+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
+         to_timestamp
+------------------------------
+ Sat Dec 17 23:52:58 2011 PST
+(1 row)
+
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
          to_timestamp
 ------------------------------
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index b437613ac8..b5f0e0b8db 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -176,6 +176,45 @@ SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST
  Fri Jan 10 07:32:01 205000 PST
 (1 row)

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Sun Dec 31 23:52:58 2023 PST
+(1 row)
+
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Wed Jan 01 00:00:00 1000 LMT
+(1 row)
+
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+         timestamptz
+------------------------------
+ Mon Jan 01 00:01:15 2024 GMT
+(1 row)
+
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+ERROR:  invalid input syntax for type timestamp with time zone: "Jan 01 00:00:00 2024 LMT"
+LINE 1: SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+               ^
+RESET timezone;
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
  pg_input_is_valid
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 4aa88b4ba9..1310b43277 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -538,6 +538,7 @@ SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ');
 SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ');  -- dyntz
+SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ');  -- dyntz
 SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS');
 SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ');  -- error
diff --git a/src/test/regress/sql/timestamptz.sql b/src/test/regress/sql/timestamptz.sql
index 6b91e7eddc..ea0ffca310 100644
--- a/src/test/regress/sql/timestamptz.sql
+++ b/src/test/regress/sql/timestamptz.sql
@@ -109,6 +109,18 @@ SELECT '20500110 173201 Europe/Helsinki'::timestamptz; -- non-DST
 SELECT '205000-07-10 17:32:01 Europe/Helsinki'::timestamptz; -- DST
 SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST

+-- Recognize "LMT" as whatever it means in the current zone
+SELECT 'Jan 01 00:00:00 1000'::timestamptz;
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+SET timezone = 'Europe/London';
+SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz;
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;
+-- which might be nothing
+SET timezone = 'UTC';
+SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz;  -- fail
+RESET timezone;
+
 -- Test non-error-throwing API
 SELECT pg_input_is_valid('now', 'timestamptz');
 SELECT pg_input_is_valid('garbage', 'timestamptz');

I wrote:
> As I set it up here, we first check the timezone abbreviation list,
> then look into the session timezone to see if it has an entry.
> I don't expect that this lookup will succeed for anything except LMT,
> because every other abbreviation that TZDB knows about is already
> listed in our standard abbreviation list.  But in the future we
> could imagine removing entries from the abbreviation list so that
> this code path takes more of the burden.

I had a second thought here.  The original problem is not really
restricted to "LMT", though that case tends to give an obvious
"no such timezone abbreviation" failure.  There is a more insidious
possibility that the cast-back-and-forth succeeds but yields a changed
timestamp value, because the abbreviation emitted by timestamptz_out
is recognized but interpreted differently by timestamptz_in.
I believe that that's at least theoretically possible today, because
what timestamptz_out prints for the abbreviation comes out of the
prevailing zone's TZDB entry, but what timestamptz_in consults is the
timezone_abbreviations list.  If our list is out of sync with TZDB,
even for just part of the history of a zone, we've got problems.

It may well be that there are no live problems today, especially
since TZDB has gotten rid of a lot of abbreviations that they
decided were made-up rather than in real-world usage.  I don't
feel like that's a great bet though, and even if it's true today
it might not be in the future.

So that leads me to wonder if we should flip the lookup order
and consult the prevailing zone's TZDB entry first, falling back
to timezone_abbreviations only if the abbrev is not known in the
current zone.

There are certainly reasons not to do that, one being that there
would be no way to override TZDB's opinion if you don't like it.
And I'd not propose it for back-patch.  But it's something to
consider.

            regards, tom lane



I wrote:
> I had a second thought here.  The original problem is not really
> restricted to "LMT", though that case tends to give an obvious
> "no such timezone abbreviation" failure.  There is a more insidious
> possibility that the cast-back-and-forth succeeds but yields a changed
> timestamp value, because the abbreviation emitted by timestamptz_out
> is recognized but interpreted differently by timestamptz_in.
> I believe that that's at least theoretically possible today, because
> what timestamptz_out prints for the abbreviation comes out of the
> prevailing zone's TZDB entry, but what timestamptz_in consults is the
> timezone_abbreviations list.  If our list is out of sync with TZDB,
> even for just part of the history of a zone, we've got problems.

I decided to troll through TZDB to see whether this is a live issue
or not, and it didn't take me more than a couple minutes to find
such a case:

regression=# set timezone to 'America/Montevideo';
SET
regression=# set datestyle to postgres;
SET
regression=# select '1912-01-01'::timestamptz;
         timestamptz          
------------------------------
 Mon Jan 01 00:00:00 1912 MMT
(1 row)

regression=# select '1912-01-01'::timestamptz::text::timestamptz;
         timestamptz          
------------------------------
 Sun Dec 31 13:45:09 1911 MMT
(1 row)

That's because our default timezone_abbreviations list thinks MMT
means

MMT     23400    # Myanmar Time (obsolete)

whereas TZDB has this for Montevideo:

Zone America/Montevideo    -3:44:51 -    LMT    1908 Jun 10
            -3:44:51 -    MMT    1920 May  1 # Montevideo MT
            -4:00    -    %z    1923 Oct  1
                        ...

Other examples are not hard to come by, eg the same date
in zone Europe/Athens.  So this shows that the problem is
real and it can result in sizable errors.

I now recall that our timezone abbreviation list was built by
considering *current* zone abbreviations that appeared in TZDB
whenever we made that list, a decade or two back.  We never
thought of trying to cope with historical values.  There are
many more conflicting abbreviations if you include the
historical entries.

So it feels like we'd better do something about this, if we
don't want to deprecate the Postgres datestyle entirely.

I find it scary that the solution involves changing the
behavior of timestamptz_in, because that's surely got hazards
of unexpected side-effects; but it's clear that this is a mess.

            regards, tom lane



On Wed, 2024-12-11 at 17:40 -0500, Tom Lane wrote:
> So it feels like we'd better do something about this, if we
> don't want to deprecate the Postgres datestyle entirely.

I agree that something should be done, but I'd have no problem
with deprecating the Postgres datestyle.  People who depend on
it can switch to using to_date() and to_timestamp(), which
shouldn't impose too much hardship if we give them enough time.

Yours,
Laurenz Albe



Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I agree that something should be done, but I'd have no problem
> with deprecating the Postgres datestyle.  People who depend on
> it can switch to using to_date() and to_timestamp(), which
> shouldn't impose too much hardship if we give them enough time.

It's not like to_timestamp doesn't have the same problem...

Anyway, I've started a -hackers thread about this issue at [1].
Whatever we do probably deserves wider discussion than it will
get on pgsql-bugs.

            regards, tom lane

[1] https://www.postgresql.org/message-id/957280.1734046384%40sss.pgh.pa.us