Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Date
Msg-id 3961.1564086915@sss.pgh.pa.us
Whole thread Raw
In response to Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)  (Shay Rojansky <roji@roji.org>)
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> FWIW this is now fixed for FreeBSD 13-CURRENT, with a good chance of
> back-patch.  I don't know if there are any other operating systems
> that are shipping zoneinfo but failing to install zone1970.tab, but if
> there are it's a mistake IMHO and they'll probably fix that if someone
> complains, considering that zone.tab literally tells you to go and use
> the newer version, and Paul Eggert has implied that zone1970.tab is
> the "full" and "canonical" list[1].

I'm not sure we're any closer to a meeting of the minds on whether
consulting zone[1970].tab is a good thing to do, but we got an actual
user complaint[1] about how "localtime" should not be a preferred
spelling.  So I want to go ahead and insert the discussed anti-preference
against "localtime" and "posixrules", as per 0001 below.  If we do do
something with zone[1970].tab, we'd still need these special rules,
so I don't think this is blocking anything.

Also, I poked into the question of the "Factory" zone a bit more,
and was disappointed to find that not only does FreeBSD still install
the "Factory" zone, but they are apparently hacking the data so that
it emits the two-changes-back abbreviation "Local time zone must be
set--use tzsetup".  This bypasses the filter in pg_timezone_names that
is expressly trying to prevent showing such silly "abbreviations".
So I now feel that not only can we not remove initdb's discrimination
against "Factory", but we indeed need to make the pg_timezone_names
filter more aggressive.  Hence, I now propose 0002 below to tweak
what we're doing with "Factory".  I did remove our special cases for
it in zic.c, as we don't need them anymore with modern tzdb data, and
there's no reason to support running "zic -P" with hacked-up data.

            regards, tom lane

[1] https://www.postgresql.org/message-id/CADT4RqCCnj6FKLisvT8tTPfTP4azPhhDFJqDF1JfBbOH5w4oyQ@mail.gmail.com

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index a5c9c9e..786e787 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -608,22 +608,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 /*
  * Given a timezone name, determine whether it should be preferred over other
  * names which are equally good matches. The output is arbitrary but we will
- * use 0 for "neutral" default preference.
- *
- * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
- * are the ones offered to the user to select from. But for the moment, to
- * minimize changes in behaviour, simply prefer UTC over alternative spellings
- * such as UCT that otherwise cause confusion. The existing "shortest first"
- * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
- * still preferring Etc/UTC over Etc/UCT).
+ * use 0 for "neutral" default preference; larger values are more preferred.
  */
 static int
 zone_name_pref(const char *zonename)
 {
+    /*
+     * Prefer UTC over alternatives such as UCT.  Also prefer Etc/UTC over
+     * Etc/UCT; but UTC is preferred to Etc/UTC.
+     */
     if (strcmp(zonename, "UTC") == 0)
         return 50;
     if (strcmp(zonename, "Etc/UTC") == 0)
         return 40;
+
+    /*
+     * We don't want to pick "localtime" or "posixrules", unless we can find
+     * no other name for the prevailing zone.  Those aren't real zone names.
+     */
+    if (strcmp(zonename, "localtime") == 0 ||
+        strcmp(zonename, "posixrules") == 0)
+        return -50;
+
     return 0;
 }

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 4d8db1a..972fcd2 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4826,16 +4826,15 @@ pg_timezone_names(PG_FUNCTION_ARGS)
             continue;            /* ignore if conversion fails */

         /*
-         * Ignore zic's rather silly "Factory" time zone.  The long string
-         * about "see zic manual page" is used in tzdata versions before
-         * 2016g; we can drop it someday when we're pretty sure no such data
-         * exists in the wild on platforms using --with-system-tzdata.  In
-         * 2016g and later, the time zone abbreviation "-00" is used for
-         * "Factory" as well as some invalid cases, all of which we can
-         * reasonably omit from the pg_timezone_names view.
+         * IANA's rather silly "Factory" time zone used to emit ridiculously
+         * long "abbreviations" such as "Local time zone must be set--see zic
+         * manual page" or "Local time zone must be set--use tzsetup".  While
+         * modern versions of tzdb emit the much saner "-00", it seems some
+         * benighted packagers are hacking the IANA data so that it continues
+         * to produce these strings.  To prevent producing a weirdly wide
+         * abbrev column, reject ridiculously long abbreviations.
          */
-        if (tzn && (strcmp(tzn, "-00") == 0 ||
-                    strcmp(tzn, "Local time zone must be set--see zic manual page") == 0))
+        if (tzn && strlen(tzn) > 31)
             continue;

         /* Found a displayable zone */
diff --git a/src/timezone/zic.c b/src/timezone/zic.c
index 95ab854..c27fb45 100644
--- a/src/timezone/zic.c
+++ b/src/timezone/zic.c
@@ -2443,13 +2443,10 @@ writezone(const char *const name, const char *const string, char version,
                     unsigned char tm = types[i];
                     char       *thisabbrev = &thischars[indmap[desigidx[tm]]];

-                    /* filter out assorted junk entries */
-                    if (strcmp(thisabbrev, GRANDPARENTED) != 0 &&
-                        strcmp(thisabbrev, "zzz") != 0)
-                        fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
-                                thisabbrev,
-                                utoffs[tm],
-                                isdsts[tm] ? "\tD" : "");
+                    fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
+                            thisabbrev,
+                            utoffs[tm],
+                            isdsts[tm] ? "\tD" : "");
                 }
             }
             /* Print the default type if we have no transitions at all */
@@ -2458,13 +2455,10 @@ writezone(const char *const name, const char *const string, char version,
                 unsigned char tm = defaulttype;
                 char       *thisabbrev = &thischars[indmap[desigidx[tm]]];

-                /* filter out assorted junk entries */
-                if (strcmp(thisabbrev, GRANDPARENTED) != 0 &&
-                    strcmp(thisabbrev, "zzz") != 0)
-                    fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
-                            thisabbrev,
-                            utoffs[tm],
-                            isdsts[tm] ? "\tD" : "");
+                fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
+                        thisabbrev,
+                        utoffs[tm],
+                        isdsts[tm] ? "\tD" : "");
             }
         }


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Question about MemoryContexts / possible memory leak inCachedPlanSource usage
Next
From: Tom Lane
Date:
Subject: Re: "localtime" value in TimeZone