Thread: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'
TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'
From
Aleksander Alekseev
Date:
Hi hackers, We discovered one strange edge case with TimestampTz: ``` =# set datestyle to 'Postgres'; SET =# SELECT '1000-01-01'::timestamptz::text; text ------------------------------ Wed Jan 01 00:00:00 1000 LMT =# SELECT '1000-01-01'::timestamptz::text::timestamptz; ERROR: invalid input syntax for type timestamp with time zone: "Wed Jan 01 00:00:00 1000 LMT" ``` When DateStyle is set to 'ISO' everything works fine: ``` =# set datestyle to 'ISO'; SET =# SELECT '1000-01-01'::timestamptz::text; text ------------------------------ 1000-01-01 00:00:00+02:30:17 =# SELECT '1000-01-01'::timestamptz::text::timestamptz; timestamptz ------------------------------ 1000-01-01 00:00:00+02:30:17 ``` If I understand correctly, text->timestamptz doesn't understand the 'LMT' timezone. Until 1879 it doesn't work, but in 1880 we switch to 'MMT' and then it works: ``` eax=# SELECT '1879-01-01'::timestamptz::text::timestamptz; ERROR: invalid input syntax for type timestamp with time zone: "Wed Jan 01 00:00:00 1879 LMT" eax=# SELECT '1880-01-01'::timestamptz::text::timestamptz; timestamptz ------------------------------ Wed Dec 31 20:00:17 1879 LMT (1 row) eax=# SELECT '1880-01-01'::timestamptz::text; text ------------------------------ Thu Jan 01 00:00:00 1880 MMT ``` It seems to me that in the first case we should either accept "Wed Jan 01 00:00:00 1000 LMT" (since we just generated it) or alternatively produce something else when casting timestamptz to text. Thoughts? -- Best regards, Aleksander Alekseev
Aleksander Alekseev <aleksander@timescale.com> writes: > =# set datestyle to 'Postgres'; > SET > =# SELECT '1000-01-01'::timestamptz::text::timestamptz; > ERROR: invalid input syntax for type timestamp with time zone: "Wed > Jan 01 00:00:00 1000 LMT" > If I understand correctly, text->timestamptz doesn't understand the > 'LMT' timezone. So it seems. In a quick experiment, it seemed like just treating LMT as a noise word on input might be enough, but I don't have time today to poke at it further. regards, tom lane
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'
From
Aleksander Alekseev
Date:
Hi Tom, > > If I understand correctly, text->timestamptz doesn't understand the > > 'LMT' timezone. > > So it seems. In a quick experiment, it seemed like just treating LMT > as a noise word on input might be enough, but I don't have time today > to poke at it further. Thanks for the hint. I've found the corresponding piece of code and will submit a proper patch shortly. -- Best regards, Aleksander Alekseev
Michael Paquier <michael@paquier.xyz> writes: > FWIW, I have a hard time understanding why ignoring "LMT", which is > a timezone abbreviation meaning "Local Mean Time", in datetktbl[], > which is a table for date/time keywords, is the right thing to do. I agree. We should be mapping it to whatever GMT offset "LMT" means in the selected zone (assuming there is an entry, which it seems like there is in most of them). I've not gotten around to looking at what that would take, but it's probably not entirely trivial, since it would mean different GMT offsets in different zones. One thing that might be interesting to look at is whether the same mechanism could be used for other TZ abbreviations defined by the tzdb data, instead of relying on our hard-wired lists. > Worth noting that we may want to do something in ECPG's dt_common.c as > well, that holds a similar table with TZ abbreviation data and LMT is > missing there? ECPG's datetime support is so far behind the server's that it's kind of sad. But bringing it up to speed would be a large investment of effort, something I for one am not prepared to put into it. I sure don't think we should hold up fixing this on the server side pending making that happen. regards, tom lane
I wrote: > I agree. We should be mapping it to whatever GMT offset "LMT" means > in the selected zone (assuming there is an entry, which it seems like > there is in most of them). I've not gotten around to looking at what > that would take, but it's probably not entirely trivial, since it > would mean different GMT offsets in different zones. Here's a draft patch for that. It turns out to be simpler than I thought, because we can mostly piggy-back on the existing DYNTZ support. The difference between LMT and one of our existing dynamic abbreviations is that LMT means whatever it means in the prevailing session timezone, while a dynamic abbreviation specifies which TZDB timezone it refers to. > One thing that might be interesting to look at is whether the same > mechanism could be used for other TZ abbreviations defined by the tzdb > data, instead of relying on our hard-wired lists. As I set it up here, we first check the timezone abbreviation list, then look into the session timezone to see if it has an entry. I don't expect that this lookup will succeed for anything except LMT, because every other abbreviation that TZDB knows about is already listed in our standard abbreviation list. But in the future we could imagine removing entries from the abbreviation list so that this code path takes more of the burden. That'd be particularly attractive for abbreviations that have conflicts, because then our interpretation of the abbreviation would automatically adapt based on the timezone setting. That's something to pursue another day though. We might need to work a bit harder on optimizing this code path before we let it take anything more common than LMT, too. This is not committable because I didn't think about documentation yet. We probably want to explain this in Appendix B, and also there are assorted comments about what a "dynamic abbreviation" is that will need some adjustment. regards, tom lane diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 0b19cddf54..6a10488d9a 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -1812,9 +1812,35 @@ DetermineTimeZoneAbbrevOffsetTS(TimestampTz ts, const char *abbr, } +/* TimeZoneAbbrevIsKnown() + * + * Detect whether the given string is a time zone abbreviation that's known + * in the specified TZDB timezone. The match is not case-sensitive. + */ +static bool +TimeZoneAbbrevIsKnown(const char *abbr, pg_tz *tzp) +{ + int offset, + isdst; + + /* + * DetermineTimeZoneAbbrevOffsetInternal does more than is really needed + * here, but it's not clear that it's worth optimizing further. + * + * We can use the epoch as the probe time, since we don't care here about + * exactly what the abbreviation resolves as. + */ + return DetermineTimeZoneAbbrevOffsetInternal((pg_time_t) 0, + abbr, + tzp, + &offset, + &isdst); +} + + /* DetermineTimeZoneAbbrevOffsetInternal() * - * Workhorse for above two functions: work from a pg_time_t probe instant. + * Workhorse for above three functions: work from a pg_time_t probe instant. * On success, return GMT offset and DST status into *offset and *isdst. */ static bool @@ -3106,9 +3132,25 @@ DecodeTimezoneAbbrev(int field, const char *lowtoken, } if (tp == NULL) { - *ftype = UNKNOWN_FIELD; - *offset = 0; - *tz = NULL; + /* + * See if the current session_timezone recognizes it. This is mainly + * meant to handle "LMT", whose meaning varies across timezones, so we + * don't try to cache the result. + */ + if (session_timezone && + TimeZoneAbbrevIsKnown(lowtoken, session_timezone)) + { + /* we can treat it like a dynamic-offset abbreviation */ + *ftype = DYNTZ; + *offset = 0; + *tz = session_timezone; + } + else + { + *ftype = UNKNOWN_FIELD; + *offset = 0; + *tz = NULL; + } } else { @@ -3278,9 +3320,6 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz) *offset = 0; /* avoid uninitialized vars on failure */ *tz = NULL; - if (!zoneabbrevtbl) - return -1; /* no abbrevs known, so fail immediately */ - /* Downcase as much of the string as we could need */ for (len = 0; len < TOKMAXLEN; len++) { @@ -3299,9 +3338,12 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz) */ while (len > 0) { - const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, - zoneabbrevtbl->numabbrevs); + const datetkn *tp = NULL; + /* Known in zoneabbrevtbl? */ + if (zoneabbrevtbl) + tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, + zoneabbrevtbl->numabbrevs); if (tp != NULL) { if (tp->type == DYNTZ) @@ -3324,6 +3366,17 @@ DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz) return len; } } + + /* See if the current session_timezone recognizes it. */ + if (session_timezone && + TimeZoneAbbrevIsKnown(lowtoken, session_timezone)) + { + /* we can treat it like a dynamic-offset abbreviation */ + *tz = session_timezone; + return len; + } + + /* Nope, try the next shorter string. */ lowtoken[--len] = '\0'; } diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out index cb28dfbaee..b90bfcd794 100644 --- a/src/test/regress/expected/horology.out +++ b/src/test/regress/expected/horology.out @@ -3332,6 +3332,12 @@ SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz Sat Dec 17 23:38:00 2011 PST (1 row) +SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ'); -- dyntz + to_timestamp +------------------------------ + Sat Dec 17 23:52:58 2011 PST +(1 row) + SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); to_timestamp ------------------------------ diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out index b437613ac8..b5f0e0b8db 100644 --- a/src/test/regress/expected/timestamptz.out +++ b/src/test/regress/expected/timestamptz.out @@ -176,6 +176,45 @@ SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST Fri Jan 10 07:32:01 205000 PST (1 row) +-- Recognize "LMT" as whatever it means in the current zone +SELECT 'Jan 01 00:00:00 1000'::timestamptz; + timestamptz +------------------------------ + Wed Jan 01 00:00:00 1000 LMT +(1 row) + +SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz; + timestamptz +------------------------------ + Wed Jan 01 00:00:00 1000 LMT +(1 row) + +SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; + timestamptz +------------------------------ + Sun Dec 31 23:52:58 2023 PST +(1 row) + +SET timezone = 'Europe/London'; +SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz; + timestamptz +------------------------------ + Wed Jan 01 00:00:00 1000 LMT +(1 row) + +SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; + timestamptz +------------------------------ + Mon Jan 01 00:01:15 2024 GMT +(1 row) + +-- which might be nothing +SET timezone = 'UTC'; +SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; -- fail +ERROR: invalid input syntax for type timestamp with time zone: "Jan 01 00:00:00 2024 LMT" +LINE 1: SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; + ^ +RESET timezone; -- Test non-error-throwing API SELECT pg_input_is_valid('now', 'timestamptz'); pg_input_is_valid diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql index 4aa88b4ba9..1310b43277 100644 --- a/src/test/regress/sql/horology.sql +++ b/src/test/regress/sql/horology.sql @@ -538,6 +538,7 @@ SELECT to_timestamp('2011-12-18 11:38 EST', 'YYYY-MM-DD HH12:MI TZ'); SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZ'); SELECT to_timestamp('2011-12-18 11:38 +01:30', 'YYYY-MM-DD HH12:MI TZ'); SELECT to_timestamp('2011-12-18 11:38 MSK', 'YYYY-MM-DD HH12:MI TZ'); -- dyntz +SELECT to_timestamp('2011-12-18 00:00 LMT', 'YYYY-MM-DD HH24:MI TZ'); -- dyntz SELECT to_timestamp('2011-12-18 11:38ESTFOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); SELECT to_timestamp('2011-12-18 11:38-05FOO24', 'YYYY-MM-DD HH12:MITZFOOSS'); SELECT to_timestamp('2011-12-18 11:38 JUNK', 'YYYY-MM-DD HH12:MI TZ'); -- error diff --git a/src/test/regress/sql/timestamptz.sql b/src/test/regress/sql/timestamptz.sql index 6b91e7eddc..ea0ffca310 100644 --- a/src/test/regress/sql/timestamptz.sql +++ b/src/test/regress/sql/timestamptz.sql @@ -109,6 +109,18 @@ SELECT '20500110 173201 Europe/Helsinki'::timestamptz; -- non-DST SELECT '205000-07-10 17:32:01 Europe/Helsinki'::timestamptz; -- DST SELECT '205000-01-10 17:32:01 Europe/Helsinki'::timestamptz; -- non-DST +-- Recognize "LMT" as whatever it means in the current zone +SELECT 'Jan 01 00:00:00 1000'::timestamptz; +SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz; +SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; +SET timezone = 'Europe/London'; +SELECT 'Jan 01 00:00:00 1000 LMT'::timestamptz; +SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; +-- which might be nothing +SET timezone = 'UTC'; +SELECT 'Jan 01 00:00:00 2024 LMT'::timestamptz; -- fail +RESET timezone; + -- Test non-error-throwing API SELECT pg_input_is_valid('now', 'timestamptz'); SELECT pg_input_is_valid('garbage', 'timestamptz');
I wrote: > As I set it up here, we first check the timezone abbreviation list, > then look into the session timezone to see if it has an entry. > I don't expect that this lookup will succeed for anything except LMT, > because every other abbreviation that TZDB knows about is already > listed in our standard abbreviation list. But in the future we > could imagine removing entries from the abbreviation list so that > this code path takes more of the burden. I had a second thought here. The original problem is not really restricted to "LMT", though that case tends to give an obvious "no such timezone abbreviation" failure. There is a more insidious possibility that the cast-back-and-forth succeeds but yields a changed timestamp value, because the abbreviation emitted by timestamptz_out is recognized but interpreted differently by timestamptz_in. I believe that that's at least theoretically possible today, because what timestamptz_out prints for the abbreviation comes out of the prevailing zone's TZDB entry, but what timestamptz_in consults is the timezone_abbreviations list. If our list is out of sync with TZDB, even for just part of the history of a zone, we've got problems. It may well be that there are no live problems today, especially since TZDB has gotten rid of a lot of abbreviations that they decided were made-up rather than in real-world usage. I don't feel like that's a great bet though, and even if it's true today it might not be in the future. So that leads me to wonder if we should flip the lookup order and consult the prevailing zone's TZDB entry first, falling back to timezone_abbreviations only if the abbrev is not known in the current zone. There are certainly reasons not to do that, one being that there would be no way to override TZDB's opinion if you don't like it. And I'd not propose it for back-patch. But it's something to consider. regards, tom lane
I wrote: > I had a second thought here. The original problem is not really > restricted to "LMT", though that case tends to give an obvious > "no such timezone abbreviation" failure. There is a more insidious > possibility that the cast-back-and-forth succeeds but yields a changed > timestamp value, because the abbreviation emitted by timestamptz_out > is recognized but interpreted differently by timestamptz_in. > I believe that that's at least theoretically possible today, because > what timestamptz_out prints for the abbreviation comes out of the > prevailing zone's TZDB entry, but what timestamptz_in consults is the > timezone_abbreviations list. If our list is out of sync with TZDB, > even for just part of the history of a zone, we've got problems. I decided to troll through TZDB to see whether this is a live issue or not, and it didn't take me more than a couple minutes to find such a case: regression=# set timezone to 'America/Montevideo'; SET regression=# set datestyle to postgres; SET regression=# select '1912-01-01'::timestamptz; timestamptz ------------------------------ Mon Jan 01 00:00:00 1912 MMT (1 row) regression=# select '1912-01-01'::timestamptz::text::timestamptz; timestamptz ------------------------------ Sun Dec 31 13:45:09 1911 MMT (1 row) That's because our default timezone_abbreviations list thinks MMT means MMT 23400 # Myanmar Time (obsolete) whereas TZDB has this for Montevideo: Zone America/Montevideo -3:44:51 - LMT 1908 Jun 10 -3:44:51 - MMT 1920 May 1 # Montevideo MT -4:00 - %z 1923 Oct 1 ... Other examples are not hard to come by, eg the same date in zone Europe/Athens. So this shows that the problem is real and it can result in sizable errors. I now recall that our timezone abbreviation list was built by considering *current* zone abbreviations that appeared in TZDB whenever we made that list, a decade or two back. We never thought of trying to cope with historical values. There are many more conflicting abbreviations if you include the historical entries. So it feels like we'd better do something about this, if we don't want to deprecate the Postgres datestyle entirely. I find it scary that the solution involves changing the behavior of timestamptz_in, because that's surely got hazards of unexpected side-effects; but it's clear that this is a mess. regards, tom lane
On Wed, 2024-12-11 at 17:40 -0500, Tom Lane wrote: > So it feels like we'd better do something about this, if we > don't want to deprecate the Postgres datestyle entirely. I agree that something should be done, but I'd have no problem with deprecating the Postgres datestyle. People who depend on it can switch to using to_date() and to_timestamp(), which shouldn't impose too much hardship if we give them enough time. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > I agree that something should be done, but I'd have no problem > with deprecating the Postgres datestyle. People who depend on > it can switch to using to_date() and to_timestamp(), which > shouldn't impose too much hardship if we give them enough time. It's not like to_timestamp doesn't have the same problem... Anyway, I've started a -hackers thread about this issue at [1]. Whatever we do probably deserves wider discussion than it will get on pgsql-bugs. regards, tom lane [1] https://www.postgresql.org/message-id/957280.1734046384%40sss.pgh.pa.us