Thread: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Re: Tom Lane 2019-04-26 <E1hK8qL-0005yH-VX@gemulon.postgresql.org>
> Update time zone data files to tzdata release 2019a.
>
> DST law changes in Palestine and Metlakatla.
> Historical corrections for Israel.
>
> Etc/UCT is now a backward-compatibility link to Etc/UTC, instead
> of being a separate zone that generates the abbreviation "UCT",
> which nowadays is typically a typo. Postgres will still accept
> "UCT" as an input zone name, but it won't output it.

There is something wrong here. On Debian Buster/unstable, using
system tzdata (2019a-1), if /etc/timezone is "Etc/UTC":

11.3's initdb adds timezone = 'UCT' to postgresql.conf
12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

Is that expected behavior?

Docker users are complaining that "UCT" messes up their testsuites.

https://github.com/docker-library/postgres/issues/577

Christoph
>>>>> "Christoph" == Christoph Berg <myon@debian.org> writes:

>> Etc/UCT is now a backward-compatibility link to Etc/UTC, instead of
>> being a separate zone that generates the abbreviation "UCT", which
>> nowadays is typically a typo. Postgres will still accept "UCT" as an
>> input zone name, but it won't output it.

Christoph> There is something wrong here. On Debian Buster/unstable,
Christoph> using system tzdata (2019a-1), if /etc/timezone is
Christoph> "Etc/UTC":

Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
Christoph> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

Christoph> Is that expected behavior?

It's clearly not what users expect and it's clearly the wrong thing to
do, though it's the expected behavior of the current code:

 * On most systems, we rely on trying to match the observable behavior of
 * the C library's localtime() function. The database zone that matches
 * furthest into the past is the one to use. Often there will be several
 * zones with identical rankings (since the IANA database assigns multiple
 * names to many zones). We break ties arbitrarily by preferring shorter,
 * then alphabetically earlier zone names.

I believe I pointed out a long, long time ago that this tie-breaking
strategy was insane, and that the rule should be to prefer canonical
names and use something else only in the case of a strictly better
match.

If TZ is set or if /etc/localtime is a symlink rather than a hardlink or
copy of the zone file, then PG can get the zone name directly rather
than having to do the comparisons, so the above comment doesn't apply;
that gives you a workaround.

--
Andrew (irc:RhodiumToad)
Christoph Berg <myon@debian.org> writes:
> There is something wrong here. On Debian Buster/unstable, using
> system tzdata (2019a-1), if /etc/timezone is "Etc/UTC":
> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

Hm, I don't have a Debian machine at hand, but I'm unable to
reproduce this using macOS or RHEL. I tried things like

$ TZ=UTC initdb
...
selecting default timezone ... UTC
...

Is your build using --with-system-tzdata? If so, which tzdb
release is the system on, and is it a completely stock copy
of that release?

Given the tie-breaking behavior in findtimezone.c,

 * ... Often there will be several
 * zones with identical rankings (since the IANA database assigns multiple
 * names to many zones). We break ties arbitrarily by preferring shorter,
 * then alphabetically earlier zone names.

it's not so surprising that UCT might be chosen, but I don't
understand how Etc/UCT would be.

BTW, does Debian set up /etc/timezone as a symlink, by any chance,
rather than a copy or hard link? If it's a symlink, we could improve
matters by teaching identify_system_timezone() to inspect it.

regards, tom lane
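To make the quoted tie-break rule concrete, here is a minimal C sketch. It is not the findtimezone.c code; old_tiebreak_prefers() and old_pick_best() are hypothetical names. It applies "shorter, then alphabetically earlier" to names that all received the same score, such as the UTC-equivalent names shown scoring 5200 in Andres's debug output later in this thread. Because "UCT" and "UTC" have the same length and 'C' sorts before 'T', "UCT" wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pre-patch tie-break as described in the findtimezone.c comment:
 * prefer the shorter name, then the alphabetically earlier one.
 * Returns nonzero if candidate should replace best.
 */
static int
old_tiebreak_prefers(const char *candidate, const char *best)
{
    size_t      clen = strlen(candidate);
    size_t      blen = strlen(best);

    return clen < blen || (clen == blen && strcmp(candidate, best) < 0);
}

/* Pick the winner among n names that all received the same score. */
static const char *
old_pick_best(const char *const *names, size_t n)
{
    const char *best;

    assert(n > 0);
    best = names[0];
    for (size_t i = 1; i < n; i++)
    {
        if (old_tiebreak_prefers(names[i], best))
            best = names[i];
    }
    return best;
}
```

Since the rule is a strict ordering (length, then strcmp()), the winner does not depend on the order in which the directory scan visits the names; it is simply the minimum under that ordering.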
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> I believe I pointed out a long, long time ago that this tie-breaking
> strategy was insane, and that the rule should be to prefer canonical
> names and use something else only in the case of a strictly better
> match.

This is assuming that the tzdb data has a concept of a canonical name
for a zone, which unfortunately it does not. UTC, UCT, Etc/UTC, and
about four other strings are equivalent names for the same zone so far
as one can tell from the installed data. We could imagine layering
some additional data on top of tzdb, but I don't much want to go there
from a maintenance standpoint.

regards, tom lane
Hi,

On 2019-06-04 11:27:31 -0400, Tom Lane wrote:
> Hm, I don't have a Debian machine at hand, but I'm unable to
> reproduce this using macOS or RHEL. I tried things like
>
> $ TZ=UTC initdb
> ...
> selecting default timezone ... UTC
> ...

On debian unstable that's what I get too, both with system and PG tzdata.

> BTW, does Debian set up /etc/timezone as a symlink, by any chance,
> rather than a copy or hard link? If it's a symlink, we could improve
> matters by teaching identify_system_timezone() to inspect it.

On my system it's a copy (link count 1, not a symlink). Or did you
mean /etc/localtime? Because that's indeed a symlink.

If I set the system-wide default, using dpkg-reconfigure -plow tzdata,
to UTC I *do* get Etc/UTC.

root@alap4:/home/andres/src/postgresql# cat /etc/timezone
Etc/UTC
root@alap4:/home/andres/src/postgresql# ls -l /etc/timezone
-rw-r--r-- 1 root root 8 Jun 4 15:44 /etc/timezone

selecting default timezone ... Etc/UTC

This is independent of being built with system or non-system tzdata.

Enabling debugging shows:

selecting default timezone ... symbolic link "/etc/localtime" contains "/usr/share/zoneinfo/Etc/UCT"
TZ "Etc/UCT" gets max score 5200
Etc/UCT

Greetings,

Andres Freund
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> I believe I pointed out a long, long time ago that this tie-breaking
>> strategy was insane, and that the rule should be to prefer canonical
>> names and use something else only in the case of a strictly better
>> match.

Tom> This is assuming that the tzdb data has a concept of a canonical
Tom> name for a zone, which unfortunately it does not. UTC, UCT,
Tom> Etc/UTC, and about four other strings are equivalent names for the
Tom> same zone so far as one can tell from the installed data.

The simplest definition is that the names listed in zone.tab (or
zone1970.tab, if you prefer that one) are canonical, and Etc/UTC and
the Etc/GMT[offset] names could be regarded as canonical too.
Everything else is either an alias or a backward-compatibility hack.

--
Andrew (irc:RhodiumToad)
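A minimal sketch of Andrew's suggested definition, assuming the caller has already read zone.tab into a string: a name counts as canonical if it appears in the TZ column (the third tab-separated field) of a non-comment line, or if it is Etc/UTC or one of the Etc/GMT[offset] names. The helpers zone_tab_lists() and is_canonical_zone() are hypothetical, not PostgreSQL functions. The Etc names are hard-coded because, as Tom points out later in the thread, the tab files carry no UTC entry and may not be installed at all.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: does name appear in the TZ column (the third
 * tab-separated field) of zone.tab-style text?  Lines starting with '#'
 * are comments.
 */
static int
zone_tab_lists(const char *zonetab, const char *name)
{
    size_t      namelen = strlen(name);
    const char *line = zonetab;

    while (line != NULL && *line != '\0')
    {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);

        if (*line != '#')
        {
            const char *field = line;

            /* Skip the country-code and coordinates fields. */
            for (int tabs = 0; tabs < 2 && field != NULL; tabs++)
            {
                field = memchr(field, '\t', (size_t) (end - field));
                if (field != NULL)
                    field++;
            }
            if (field != NULL)
            {
                const char *fend = memchr(field, '\t', (size_t) (end - field));
                size_t      flen = (size_t) ((fend ? fend : end) - field);

                if (flen == namelen && strncmp(field, name, namelen) == 0)
                    return 1;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}

/*
 * Rule suggested in the thread: zone.tab names are canonical, and so
 * are Etc/UTC and the Etc/GMT[offset] names.
 */
static int
is_canonical_zone(const char *zonetab, const char *name)
{
    assert(zonetab != NULL && name != NULL);
    if (strcmp(name, "Etc/UTC") == 0 || strncmp(name, "Etc/GMT", 7) == 0)
        return 1;
    return zone_tab_lists(zonetab, name);
}
```

Under this rule, "UCT", "Universal" and "Zulu" are all non-canonical aliases, so a tie-break could prefer canonical names first and fall back to the existing ordering only among equals.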
Hi,

On 2019-06-04 11:27:31 -0400, Tom Lane wrote:
> $ TZ=UTC initdb
> ...
> selecting default timezone ... UTC
> ...

Btw, if the input is Etc/UTC, do you also get UTC or Etc/UTC? Because
it seems that debian only configures Etc/UTC on a system-wide basis
now. Which seems not insane, given that it's a backward compat thing
now.

- Andres
>>>>> "Christoph" == Christoph Berg <myon@debian.org> writes:

Christoph> There is something wrong here. On Debian Buster/unstable,
Christoph> using system tzdata (2019a-1), if /etc/timezone is
Christoph> "Etc/UTC":

Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
Christoph> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

fwiw on FreeBSD with no /etc/localtime and no TZ in the environment (and
hence running on UTC), I get "UCT" on both 11.3 and HEAD.

--
Andrew (irc:RhodiumToad)
Hi,

On 2019-06-04 08:53:30 -0700, Andres Freund wrote:
> If I set the system-wide default, using dpkg-reconfigure -plow tzdata,
> to UTC I *do* get Etc/UTC.
>
> root@alap4:/home/andres/src/postgresql# cat /etc/timezone
> Etc/UTC
> root@alap4:/home/andres/src/postgresql# ls -l /etc/timezone
> -rw-r--r-- 1 root root 8 Jun 4 15:44 /etc/timezone
>
> selecting default timezone ... Etc/UTC
>
> This is independent of being built with system or non-system tzdata.
>
> Enabling debugging shows:

Sorry, I was not awake enough while reading the thread (and UCT looks so
similar to UTC). I do indeed see the behaviour of choosing UCT in 11,
but not in 12. Independent of system/non-system tzdata.

With system tzdata, I get the following debug output (after filtering
out lots of lines with |grep -v 'scores 0'|grep -v 'uses leap seconds'):

TZ "Zulu" gets max score 5200
TZ "UCT" gets max score 5200
TZ "Universal" gets max score 5200
TZ "UTC" gets max score 5200
TZ "Etc/Zulu" gets max score 5200
TZ "Etc/UCT" gets max score 5200
TZ "Etc/Universal" gets max score 5200
TZ "Etc/UTC" gets max score 5200
TZ "localtime" gets max score 5200
TZ "posix/Zulu" gets max score 5200
TZ "posix/UCT" gets max score 5200
TZ "posix/Universal" gets max score 5200
TZ "posix/UTC" gets max score 5200
TZ "posix/Etc/Zulu" gets max score 5200
TZ "posix/Etc/UCT" gets max score 5200
TZ "posix/Etc/Universal" gets max score 5200
TZ "posix/Etc/UTC" gets max score 5200
ok

whereas master only does:

selecting default timezone ... symbolic link "/etc/localtime" contains "/usr/share/zoneinfo/Etc/UTC"
TZ "Etc/UTC" gets max score 5200
Etc/UTC

The reason for the behaviour difference between v12 and 11 is that 12
does:

    /*
     * Try to avoid the brute-force search by seeing if we can recognize the
     * system's timezone setting directly.
     *
     * Currently we just check /etc/localtime; there are other conventions for
     * this, but that seems to be the only one used on enough platforms to be
     * worth troubling over.
     */
    if (check_system_link_file("/etc/localtime", &tt, resultbuf))
        return resultbuf;

which prevents having to iterate through all of these files, and ending
up with a lot of equivalently scored timezones.

Greetings,

Andres Freund
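A rough sketch of the idea behind the v12 shortcut, assuming POSIX readlink(2): if /etc/localtime is a symlink, take the path after "zoneinfo/" as the zone name. zone_name_from_target() and zone_name_from_link() are hypothetical simplifications, not the real check_system_link_file(). According to Tom's later message, the real code only adopts that spelling if the zone's behavior also agrees with localtime(3). A plain copy of the zone file yields no name at all, so such systems fall back to the brute-force search.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: given a symlink target such as
 * "/usr/share/zoneinfo/Etc/UTC", copy everything after the first
 * "zoneinfo/" into out.  Returns 1 on success, 0 if no zone name is
 * found or it does not fit in outlen bytes.
 */
static int
zone_name_from_target(const char *target, char *out, size_t outlen)
{
    const char *marker = "zoneinfo/";
    const char *p;

    assert(target != NULL && out != NULL);
    p = strstr(target, marker);
    if (p == NULL)
        return 0;
    p += strlen(marker);
    if (*p == '\0' || strlen(p) >= outlen)
        return 0;
    strcpy(out, p);
    return 1;
}

/*
 * Read a symlink such as /etc/localtime and derive a zone name from its
 * target.  Returns 0 if linkname is missing or is not a symlink (for
 * example, a plain copy of a zone file).
 */
static int
zone_name_from_link(const char *linkname, char *out, size_t outlen)
{
    char        target[4096];
    ssize_t     len = readlink(linkname, target, sizeof(target) - 1);

    if (len < 0)
        return 0;
    target[len] = '\0';         /* readlink() does not NUL-terminate */
    return zone_name_from_target(target, out, outlen);
}
```

This mirrors why v12 reports "Etc/UTC" for a symlink to .../zoneinfo/Etc/UTC: the spelling comes straight from the link target, so no tie-break is needed.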
Hi,

On 2019-06-04 17:20:42 +0100, Andrew Gierth wrote:
> fwiw on FreeBSD with no /etc/localtime and no TZ in the environment (and
> hence running on UTC), I get "UCT" on both 11.3 and HEAD.

That makes sense. As far as I can tell the reason that 12 sometimes ends
up with the proper timezone is that we shortcut the search by:

    /*
     * Try to avoid the brute-force search by seeing if we can recognize the
     * system's timezone setting directly.
     *
     * Currently we just check /etc/localtime; there are other conventions for
     * this, but that seems to be the only one used on enough platforms to be
     * worth troubling over.
     */
    if (check_system_link_file("/etc/localtime", &tt, resultbuf))
        return resultbuf;

which is actually a behaviour change, rather than just an optimization,
when there's a lot of equivalently scoring timezones.

Greetings,

Andres Freund
Re: Tom Lane 2019-06-04 <65800.1559662051@sss.pgh.pa.us>
> > There is something wrong here. On Debian Buster/unstable, using
> > system tzdata (2019a-1), if /etc/timezone is "Etc/UTC":
>
> Is your build using --with-system-tzdata? If so, which tzdb
> release is the system on, and is it a completely stock copy
> of that release?

It's using system tzdata (2019a-1). There's one single patch on top
of that:
https://sources.debian.org/src/tzdata/2019a-1/debian/patches/

> BTW, does Debian set up /etc/timezone as a symlink, by any chance,
> rather than a copy or hard link? If it's a symlink, we could improve
> matters by teaching identify_system_timezone() to inspect it.

In the meantime I realized that I was only testing /etc/timezone
(which is a plain file with just the zone name), while not touching
/etc/localtime at all. In this environment, it's a symlink:

lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

... but the name still gets canonicalized to Etc/UCT or UCT.

Christoph
[ sorry for slow response, I'm on vacation ]

Andres Freund <andres@anarazel.de> writes:
> That makes sense. As far as I can tell the reason that 12 sometimes ends
> up with the proper timezone is that we shortcut the search by:

> /*
>  * Try to avoid the brute-force search by seeing if we can recognize the
>  * system's timezone setting directly.
>  *
>  * Currently we just check /etc/localtime; there are other conventions for
>  * this, but that seems to be the only one used on enough platforms to be
>  * worth troubling over.
>  */
> if (check_system_link_file("/etc/localtime", &tt, resultbuf))
>     return resultbuf;

> which is actually a behaviour change, rather than just an
> optimization, when there's a lot of equivalently scoring timezones.

Sure, that is intentionally a behavior change in this situation.
The theory is that if "Etc/UCT" is what the user put in /etc/localtime,
then that's the spelling she wants. See 23bd3cec6.

But it seems to me that this code is *not* determining the result in
Christoph's case, because if it were, it'd be settling on Etc/UTC,
according to his followup report that

>> lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

I'm not too familiar with what actually determines glibc's behavior
on Debian, but I'm suspicious that there's an inconsistency between
/etc/localtime and /etc/timezone. We won't adopt the spelling we see
in /etc/localtime unless it agrees with the observed behavior of
localtime(3).

regards, tom lane
Hi,

On 2019-06-06 12:51:30 -0400, Tom Lane wrote:
> [ sorry for slow response, I'm on vacation ]

Good.

> Andres Freund <andres@anarazel.de> writes:
> > That makes sense. As far as I can tell the reason that 12 sometimes ends
> > up with the proper timezone is that we shortcut the search by:
>
> > /*
> >  * Try to avoid the brute-force search by seeing if we can recognize the
> >  * system's timezone setting directly.
> >  *
> >  * Currently we just check /etc/localtime; there are other conventions for
> >  * this, but that seems to be the only one used on enough platforms to be
> >  * worth troubling over.
> >  */
> > if (check_system_link_file("/etc/localtime", &tt, resultbuf))
> >     return resultbuf;
>
> > which is actually a behaviour change, rather than just an
> > optimization, when there's a lot of equivalently scoring timezones.
>
> Sure, that is intentionally a behavior change in this situation.
> The theory is that if "Etc/UCT" is what the user put in /etc/localtime,
> then that's the spelling she wants. See 23bd3cec6.

Right, I'm not complaining about that. I'm just noting that that
explains the cross-version divergence.

Note that on 11 I *do* end up with some *other* timezone with the newer
timezone data:

$ cat /etc/timezone;ls -l /etc/localtime
Etc/UTC
lrwxrwxrwx 1 root root 27 Jun 6 17:02 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

$ rm -rf /tmp/tztest;~/build/postgres/11-assert/install/bin/initdb /tmp/tztest 2>&1|grep -v 'scores 0'|grep -v 'uses leapseconds';grep timezone /tmp/tztest/postgresql.conf
...
TZ "Zulu" gets max score 5200
TZ "UCT" gets max score 5200
TZ "Universal" gets max score 5200
TZ "UTC" gets max score 5200
TZ "Etc/Zulu" gets max score 5200
TZ "Etc/UCT" gets max score 5200
TZ "Etc/Universal" gets max score 5200
TZ "Etc/UTC" gets max score 5200
TZ "localtime" gets max score 5200
TZ "posix/Zulu" gets max score 5200
TZ "posix/UCT" gets max score 5200
TZ "posix/Universal" gets max score 5200
TZ "posix/UTC" gets max score 5200
TZ "posix/Etc/Zulu" gets max score 5200
TZ "posix/Etc/UCT" gets max score 5200
TZ "posix/Etc/Universal" gets max score 5200
TZ "posix/Etc/UTC" gets max score 5200
ok
...
log_timezone = 'UCT'
timezone = 'UCT'
#timezone_abbreviations = 'Default' # Select the set of available time zone
# share/timezonesets/.

As you can see the switch from Etc/UTC to UCT does happen here
(presumably in any branch before 12). Which did not happen before the
import of 2019a / when using a system tzdata that's before that. There
you get:

TZ "Zulu" gets max score 5200
TZ "Universal" gets max score 5200
TZ "UTC" gets max score 5200
TZ "Etc/Zulu" gets max score 5200
TZ "Etc/Universal" gets max score 5200
TZ "Etc/UTC" gets max score 5200
ok

and end up with UTC as the selection.

I do think that < 12 clearly regressed here, although it's only exposing
previous behaviour further.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2019-06-06 12:51:30 -0400, Tom Lane wrote:
>> Sure, that is intentionally a behavior change in this situation.
>> The theory is that if "Etc/UCT" is what the user put in /etc/localtime,
>> then that's the spelling she wants. See 23bd3cec6.

> Right, I'm not complaining about that. I'm just noting that that
> explains the cross-version divergence.

It explains some cross-version divergence for sure. What I'm still not
clear about is whether Christoph's report is entirely that, or whether
there's some additional factor we don't understand yet.

> As you can see the switch from Etc/UTC to UCT does happen here
> (presumably in any branch before 12). Which did not happen before the
> import of 2019a / when using a system tzdata that's before
> that.

Right. Before 2019a, UCT would not have been a match to a system
setting of UTC because the zone abbreviation reported by localtime()
was different. Now it's the same abbreviation.

Maybe we should consider back-patching 23bd3cec6.

regards, tom lane
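Tom's point about the abbreviation can be illustrated with a simplified model of the scoring step. In this hypothetical sketch, struct probe and score_zone() are illustrative names; real probing calls localtime() at a series of timestamps. Each candidate zone is compared against the system's observed behavior from the most recent probe backwards, and the score is how many probes match before the first difference. A pre-2019a UCT zone, which reported the abbreviation "UCT", scores zero against a system that reports "UTC". The 2019a UCT link scores the same as UTC, so only the tie-break can choose between them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One observation of localtime()-style behavior at a single probe instant. */
struct probe
{
    long        gmtoff;         /* UTC offset in seconds */
    int         isdst;          /* daylight-saving flag */
    const char *abbrev;         /* zone abbreviation, e.g. "UTC" */
};

static int
probe_equal(const struct probe *a, const struct probe *b)
{
    return a->gmtoff == b->gmtoff &&
        a->isdst == b->isdst &&
        strcmp(a->abbrev, b->abbrev) == 0;
}

/*
 * Hypothetical scoring: probes are ordered from most recent to oldest,
 * and the score is the number of consecutive matching probes before the
 * first mismatch, i.e. how far into the past the candidate zone agrees
 * with what the system reports.
 */
static size_t
score_zone(const struct probe *sys, const struct probe *cand, size_t n)
{
    size_t      i = 0;

    assert(sys != NULL && cand != NULL);
    while (i < n && probe_equal(&sys[i], &cand[i]))
        i++;
    return i;
}
```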
Christoph Berg <myon@debian.org> writes:
> In the meantime I realized that I was only testing /etc/timezone
> (which is a plain file with just the zone name), while not touching
> /etc/localtime at all. In this environment, it's a symlink:
> lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC
> ... but the name still gets canonicalized to Etc/UCT or UCT.

Now that I'm home again, I tried to replicate this behavior. I don't
have Debian Buster installed, but I do have an up-to-date Stretch
install, and I can't get it to do this. What I see is that

1. HEAD will follow the spelling appearing in /etc/localtime, if that's
a symlink. It will not pay any attention to /etc/timezone --- but as
far as I can tell, glibc doesn't either. (For instance, if I remove
/etc/localtime, then date(1) starts reporting UTC, independently of
what /etc/timezone might say.)

2. Pre-v12, or if we can't get a valid zone name out of /etc/localtime,
the identify_system_timezone() search settles on "UCT" as being the
shortest and alphabetically first of the various equivalent names for
the zone.

The only way I can get it to pick "Etc/UCT" is if that's what I put
into /etc/localtime. (In which case I maintain that that's not a bug,
or at least not our bug.)

So I'm still mystified by Christoph's report, and am forced to suspect
pilot error -- specifically, /etc/localtime not containing what he said.

Anyway, moving on to the question of what should we do about this,
I don't really have anything better to offer than back-patching 23bd3cec6.
I'm fairly hesitant to do that given the small amount of testing it's
gotten ... but given that it's been in the tree since September, maybe
we can feel like we'd have noticed any really bad problems.
I don't have any use for Andrew's suggestion of looking into
zone1970.tab: in the first place I'm unconvinced that the tzdb guys
intend that file to offer canonical zone names, and in the second place
I doubt we can rely on the file to be present (it's not installed by
zic itself), and in the third place it definitely won't fix this
particular issue because it has no entries for UTC/UCT/GMT etc., only
for geographical locations.

Thoughts?

regards, tom lane

PS: As a side note, I do notice an interesting difference between the
timezone database files as they appear on Debian versus what I see on
RHEL or in a PG-generated timezone tree. Debian seems to use symlinks
for multiple equivalent zones:

$ ls -l /usr/share/zoneinfo/U??
-rw-r--r-- 1 root root 127 Mar 27 16:34 /usr/share/zoneinfo/UCT
lrwxrwxrwx 1 root root 3 Mar 27 16:34 /usr/share/zoneinfo/UTC -> UCT
$ ls -l /usr/share/zoneinfo/Etc/U??
lrwxrwxrwx 1 root root 6 Mar 27 16:34 /usr/share/zoneinfo/Etc/UCT -> ../UCT
lrwxrwxrwx 1 root root 6 Mar 27 16:34 /usr/share/zoneinfo/Etc/UTC -> ../UCT

but elsewhere these are hard links:

$ ls -l /usr/share/zoneinfo/U??
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/UCT
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/UTC
$ ls -l /usr/share/zoneinfo/Etc/U??
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/Etc/UCT
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/Etc/UTC

However, identify_system_timezone() doesn't treat symlinks differently
from regular files, so this doesn't explain anything about the problem
at hand, AFAICS.
Re: Tom Lane 2019-06-11 <24452.1560285699@sss.pgh.pa.us>
> The only way I can get it to pick "Etc/UCT" is if that's what I put
> into /etc/localtime. (In which case I maintain that that's not a bug,
> or at least not our bug.)

Did you try a symlink or a plain file for /etc/localtime?

> So I'm still mystified by Christoph's report, and am forced to suspect
> pilot error -- specifically, /etc/localtime not containing what he said.

On Debian unstable, deleting /etc/timezone, $TZ not set, and with this symlink:

lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

/usr/lib/postgresql/11/bin/initdb -D pgdata
$ grep timezone pgdata/postgresql.conf
log_timezone = 'UCT'
timezone = 'UCT'

/usr/lib/postgresql/12/bin/initdb -D pgdata
$ grep timezone pgdata/postgresql.conf
log_timezone = 'Etc/UTC'
timezone = 'Etc/UTC'

Same behavior on Debian Stretch (stable):

lrwxrwxrwx 1 root root 27 Mai 7 11:14 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

$ grep timezone pgdata/postgresql.conf
log_timezone = 'UCT'
timezone = 'UCT'

$ grep timezone pgdata/postgresql.conf
log_timezone = 'Etc/UTC'
timezone = 'Etc/UTC'

> Anyway, moving on to the question of what should we do about this,
> I don't really have anything better to offer than back-patching 23bd3cec6.

The PG12 behavior seems sane, so +1.

Christoph
Christoph Berg <myon@debian.org> writes:
> Re: Tom Lane 2019-06-11 <24452.1560285699@sss.pgh.pa.us>
>> The only way I can get it to pick "Etc/UCT" is if that's what I put
>> into /etc/localtime. (In which case I maintain that that's not a bug,
>> or at least not our bug.)

> Did you try a symlink or a plain file for /etc/localtime?

Symlink --- if it's a plain file, our code can't learn anything from it.

> On Debian unstable, deleting /etc/timezone, $TZ not set, and with this symlink:
> lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC
> /usr/lib/postgresql/11/bin/initdb -D pgdata
> $ grep timezone pgdata/postgresql.conf
> log_timezone = 'UCT'
> timezone = 'UCT'
> /usr/lib/postgresql/12/bin/initdb -D pgdata
> $ grep timezone pgdata/postgresql.conf
> log_timezone = 'Etc/UTC'
> timezone = 'Etc/UTC'

That's what I'd expect. Do you think your upthread report of HEAD
picking "Etc/UCT" was a typo? Or maybe you actually had /etc/localtime
set that way?

>> Anyway, moving on to the question of what should we do about this,
>> I don't really have anything better to offer than back-patching 23bd3cec6.

> The PG12 behavior seems sane, so +1.

OK, I'll make that happen.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>>> Anyway, moving on to the question of what should we do about this,
>>> I don't really have anything better to offer than back-patching
>>> 23bd3cec6.

>> The PG12 behavior seems sane, so +1.

Tom> OK, I'll make that happen.

This isn't good enough, because it still picks "UCT" on a system with no
/etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
FreeBSD, but it'll be the same anywhere else):

% ls -l /etc/*time*
ls: /etc/*time*: No such file or directory

% env -u TZ bin/initdb -D data -E UTF8 --no-locale
[...]
selecting default timezone ... UCT

We need to absolutely prefer UTC over UCT if both match.

--
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> This isn't good enough, because it still picks "UCT" on a system with no
> /etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
> FreeBSD, but it'll be the same anywhere else):

[ shrug... ] Too bad. I doubt that that's a common situation anyway.

> We need to absolutely prefer UTC over UCT if both match.

I don't see a reason why that's a hard requirement. There are at least
two ways for a user to override initdb's decision (/etc/localtime or TZ),
or she could just change the GUC setting after the fact, and for that
matter it's not obvious that it matters to most people how TimeZone is
spelled as long as it delivers the right external behavior. We had the
business with "Navajo" being preferred for US Mountain time for quite a
few years, with not very many complaints.

I don't see any way that we could "fix" this except with a hardwired
special case to prefer UTC over other spellings, and I definitely do not
want to go there. If we start putting in magic special cases to make
particular zone names be preferred over other ones, where will we stop?
(I've been lurking on the tzdb mailing list for long enough now to know
that that's a fine recipe for opening ourselves up to politically-motivated
demands that name X be preferred over name Y.)

A possibly better idea is to push back on tzdb's choice to unify
these zones. Don't know if they'd listen, but we could try. The
UCT symlink hasn't been out there so long that it's got much inertia.

regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> This isn't good enough, because it still picks "UCT" on a system with no
>> /etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
>> FreeBSD, but it'll be the same anywhere else):

Tom> [ shrug... ] Too bad. I doubt that that's a common situation anyway.

Literally every server I have set up is like this...

>> We need to absolutely prefer UTC over UCT if both match.

Tom> I don't see a reason why that's a hard requirement.

Because the reverse is clearly insane.

--
Andrew (irc:RhodiumToad)
From: Christopher Browne

On Fri, Jun 14, 2019, 3:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A possibly better idea is to push back on tzdb's choice to unify
> these zones. Don't know if they'd listen, but we could try. The
> UCT symlink hasn't been out there so long that it's got much inertia.
One oddity: AIX had a preference for CUT, with fallbacks to CUT0 and UCT, back when we had AIX boxes (5.2 or 5.3, if my memory still works on this).
We wound up setting PGTZ explicitly to UTC to overrule any such fighting between time zones.
There may therefore be more history (and some inertia) in AIX land than meets the eye elsewhere.
That doesn't prevent it from being a good idea to talk to tzdb maintainers, of course.
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>> This isn't good enough, because it still picks "UCT" on a system
>> with no /etc/localtime and no TZ variable. Testing on HEAD as of
>> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

Tom> [ shrug... ] Too bad. I doubt that that's a common situation anyway.

I'm also reminded that this applies if the /etc/localtime file is a
_copy_ of the UTC zonefile rather than a symlink, which is possibly even
more common.

--
Andrew (irc:RhodiumToad)
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

>>> This isn't good enough, because it still picks "UCT" on a system
>>> with no /etc/localtime and no TZ variable. Testing on HEAD as of
>>> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

Tom> [ shrug... ] Too bad. I doubt that that's a common situation anyway.

Andrew> I'm also reminded that this applies if the /etc/localtime
Andrew> file is a _copy_ of the UTC zonefile rather than a symlink,
Andrew> which is possibly even more common.

And testing shows that if you select "UTC" when installing FreeBSD, you
indeed get /etc/localtime as a copy not a symlink, and I've confirmed
that initdb picks "UCT" in that case.

So here is my current proposed fix.

--
Andrew (irc:RhodiumToad)

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 3477a08efd..f7c199a006 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -128,8 +128,11 @@ pg_load_tz(const char *name)
  * the C library's localtime() function. The database zone that matches
  * furthest into the past is the one to use. Often there will be several
  * zones with identical rankings (since the IANA database assigns multiple
- * names to many zones). We break ties arbitrarily by preferring shorter,
- * then alphabetically earlier zone names.
+ * names to many zones). We break ties by first checking for "preferred"
+ * names (such as "UTC"), and then arbitrarily by preferring shorter, then
+ * alphabetically earlier zone names. (If we did not explicitly prefer
+ * "UTC", we would get the alias name "UCT" instead due to alphabetic
+ * ordering.)
  *
  * Many modern systems use the IANA database, so if we can determine the
  * system's idea of which zone it is using and its behavior matches our zone
@@ -602,6 +605,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 #endif
 }
 
+/*
+ * Given a timezone name, determine whether it should be preferred over other
+ * names which are equally good matches. The output is arbitrary but we will
+ * use 0 for "neutral" default preference.
+ *
+ * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
+ * are the ones offered to the user to select from. But for the moment, to
+ * minimize changes in behaviour, simply prefer UTC over alternative spellings
+ * such as UCT that otherwise cause confusion. The existing "shortest first"
+ * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
+ * still preferring Etc/UTC over Etc/UCT).
+ */
+static int
+zone_name_pref(const char *zonename)
+{
+	if (strcmp(zonename, "UTC") == 0)
+		return 50;
+	if (strcmp(zonename, "Etc/UTC") == 0)
+		return 40;
+	return 0;
+}
+
 /*
  * Recursively scan the timezone database looking for the best match to
  * the system timezone behavior.
@@ -674,7 +699,8 @@ scan_available_timezones(char *tzdir, char *tzdirsub, struct tztry *tt,
 			else if (score == *bestscore)
 			{
 				/* Consider how to break a tie */
-				if (strlen(tzdirsub) < strlen(bestzonename) ||
+				if (zone_name_pref(tzdirsub) > zone_name_pref(bestzonename) ||
+					strlen(tzdirsub) < strlen(bestzonename) ||
 					(strlen(tzdirsub) == strlen(bestzonename) &&
 					 strcmp(tzdirsub, bestzonename) < 0))
 					strlcpy(bestzonename, tzdirsub, TZ_STRLEN_MAX + 1);
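One detail worth checking in the proposed tie-break: as posted, the three clauses are joined with ||, so the length and alphabetical clauses are not gated on the two names having equal preference. If the directory scan happens to reach "UTC" before "UCT", the last clause (equal lengths and strcmp("UCT", "UTC") < 0) would still let "UCT" replace it. The standalone sketch below keeps zone_name_pref() from the patch. tiebreak_prefers() and pick_best() are hypothetical names. It compares preference first and falls back to length and alphabetical order only when preferences are equal, so the result does not depend on scan order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same preference values as in the proposed patch. */
static int
zone_name_pref(const char *zonename)
{
    if (strcmp(zonename, "UTC") == 0)
        return 50;
    if (strcmp(zonename, "Etc/UTC") == 0)
        return 40;
    return 0;
}

/*
 * Nonzero if candidate should replace best among equally scored names.
 * Preference decides first; length and alphabetical order only break
 * ties between names of equal preference.
 */
static int
tiebreak_prefers(const char *candidate, const char *best)
{
    int         namepref = zone_name_pref(candidate) - zone_name_pref(best);
    size_t      clen = strlen(candidate);
    size_t      blen = strlen(best);

    if (namepref != 0)
        return namepref > 0;
    return clen < blen || (clen == blen && strcmp(candidate, best) < 0);
}

/* Pick the winner among n names that all received the same score. */
static const char *
pick_best(const char *const *names, size_t n)
{
    const char *best;

    assert(n > 0);
    best = names[0];
    for (size_t i = 1; i < n; i++)
    {
        if (tiebreak_prefers(names[i], best))
            best = names[i];
    }
    return best;
}
```

Computing the preference difference once and branching on it keeps the comparison a strict ordering (preference, then length, then strcmp()), which is what makes the winner independent of readdir() order.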
Re: Tom Lane 2019-06-14 <26948.1560517875@sss.pgh.pa.us>
> > /usr/lib/postgresql/12/bin/initdb -D pgdata
> > $ grep timezone pgdata/postgresql.conf
> > log_timezone = 'Etc/UTC'
> > timezone = 'Etc/UTC'
>
> That's what I'd expect. Do you think your upthread report of HEAD
> picking "Etc/UCT" was a typo? Or maybe you actually had /etc/localtime
> set that way?

That was likely a typo, yes. Sorry for the confusion, there's many
variables...

Christoph
On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> So here is my current proposed fix.

Before pushing a commit that's controversial - and this clearly seems to
somewhat be - it'd be good to give others a heads up that you intend to
do so, so they can object. Rather than just pushing less than 24h later,
without a warning.

- Andres
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > So here is my current proposed fix.
>
> Before pushing a commit that's controversial - and this clearly seems to
> somewhat be - it'd be good to give others a heads up that you intend to
> do so, so they can object. Rather than just pushing less than 24h later,
> without a warning.

Seems like that would have meant a potentially very late commit to avoid
having a broken (for some value of broken anyway) point release (either
with new code, or with reverting the timezone changes previously
committed), which isn't great either.

In general, I agree with you, and we should try to give everyone time to
discuss when something is controversial, but this seems like it was at
least a bit of a tough call.

Thanks,

Stephen
Hi,

On 2019-06-17 14:34:58 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > > So here is my current proposed fix.
> >
> > Before pushing a commit that's controversial - and this clearly seems to
> > somewhat be - it'd be good to give others a heads up that you intend to
> > do so, so they can object. Rather than just pushing less than 24h later,
> > without a warning.
>
> Seems like that would have meant a potentially very late commit to avoid
> having a broken (for some value of broken anyway) point release (either
> with new code, or with reverting the timezone changes previously
> committed), which isn't great either.

> In general, I agree with you, and we should try to give everyone time to
> discuss when something is controversial, but this seems like it was at
> least a bit of a tough call.

Hm? All I'm saying is that Andrew's email should have included something
to the effect of "Due to the upcoming release, I'm intending to push and
backpatch the attached fix in ~20h".

Greetings,

Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2019-06-17 14:34:58 -0400, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote: > > > > So here is my current proposed fix. > > > > > > Before pushing a commit that's controversial - and this clearly seems to > > > somewhat be - it'd be good to give others a heads up that you intend to > > > do so, so they can object. Rather than just pushing less than 24h later, > > > without a warning. > > > > Seems like that would have meant a potentially very late commit to avoid > > having a broken (for some value of broken anyway) point release (either > > with new code, or with reverting the timezone changes previously > > committed), which isn't great either. > > > In general, I agree with you, and we should try to give everyone time to > > discuss when something is controversial, but this seems like it was at > > least a bit of a tough call. > > Hm? All I'm saying is that Andrew's email should have included something > to the effect of "Due to the upcoming release, I'm intending to push and > backpatch the attached fix in ~20h". Ah, ok, I agree that would have been good to do. Of course, hindsight being 20/20 and all that. Something to keep in mind for the future though. Thanks, Stephen
On Mon, Jun 17, 2019 at 2:41 PM Stephen Frost <sfrost@snowman.net> wrote: > Ah, ok, I agree that would have been good to do. Of course, hindsight > being 20/20 and all that. Something to keep in mind for the future > though. I think it was inappropriate to commit this at all. You can't just say "some other committer objects, but I think I'm right so I'll just ignore them and commit anyway." If we all do that it'll be chaos. I don't know exactly how many concurring votes it takes to override somebody else's -1, but it's got to be more than zero. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jun 17, 2019 at 2:41 PM Stephen Frost <sfrost@snowman.net> wrote: >> Ah, ok, I agree that would have been good to do. Of course, hindsight >> being 20/20 and all that. Something to keep in mind for the future >> though. > I think it was inappropriate to commit this at all. You can't just > say "some other committer objects, but I think I'm right so I'll just > ignore them and commit anyway." If we all do that it'll be chaos. FWIW, that was my concern about this. > I don't know exactly how many concurring votes it takes to override > somebody else's -1, but it's got to be more than zero. If even one other person had +1'd Andrew's proposal, I'd have yielded to the consensus --- this was certainly an issue on which it's not totally clear what to do. But unless I missed some traffic, the vote was exactly 1 to 1. There is no way that that represents consensus to commit. Also on the topic of process: 48 hours before a wrap deadline is *particularly* not the time to play fast and loose with this sort of thing. It'd have been better to wait till after this week's releases, so there'd at least be time to reconsider if the patch turned out to have unexpected side-effects. regards, tom lane
BTW ... now that that patch has been in long enough to collect some actual data on what it's doing, I set out to scrape the buildfarm logs to see what is happening in the farm. Here are the popularities of various timezone settings, as of the end of May:

      3 America/Los_Angeles
      9 America/New_York
      3 America/Sao_Paulo
      2 Asia/Tokyo
      2 CET
     24 Etc/UTC
      3 Europe/Amsterdam
     11 Europe/Berlin
      1 Europe/Brussels
      1 Europe/Helsinki
      1 Europe/Isle_of_Man
      2 Europe/London
      7 Europe/Paris
      6 Europe/Prague
      5 Europe/Stockholm
      1 ROK
      7 UCT
      1 US/Central
      7 US/Eastern
      2 US/Pacific
     15 UTC
      1 localtime

(These are the zone choices reported in the initdb-C step for the animal's last successful run before 06-01. I excluded animals for which the configuration summary shows that their choice is being forced by a TZ environment variable.)

As of now, six of the seven UCT-reporting members have switched to UTC; the lone holdout is elver which hasn't run in ten days. (Perhaps it needs unwedged.) There are no other changes, so it seems like Andrew's patch is doing what it says on the tin.

However, that one entry for 'localtime' disturbs me. (It's from snapper.) That seems like a particularly useless choice of representation: it's not informative, it's not portable, and it would lead to postmaster startup failure if someone were to remove the machine's localtime file, which I assume is a nonstandard insertion into /usr/share/zoneinfo. Very likely the only reason we don't see this behavior more is that sticking a "localtime" file into /usr/share/zoneinfo is an obsolescent practice. On machines that have such a file, it has a good chance of winning on the grounds of being a short name.

So I'm toying with the idea of extending Andrew's patch to put a negative preference on "localtime", ensuring we'll use some other name for the zone if one is available.
Also, now that we have this mechanism, maybe we should charge it with de-preferencing the old "Factory" zone, removing the hard-wired kluge that we currently have for rejecting that. (Modern tzdb doesn't install "Factory" at all, but some installations might still do so in the service of blind backwards compatibility.) Thoughts? regards, tom lane
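To make the tie-break concrete, here is a minimal Python sketch (not initdb's actual C code) of the existing rule among equally good matches, "prefer shorter, then alphabetically earlier names", extended with the proposed negative preference for "localtime" and "Factory". The names DEPREFERRED, zone_sort_key and pick_zone are hypothetical, for illustration only.

```python
# Hypothetical sketch of initdb's tie-break among zones that match the C
# library's localtime() equally far into the past.

# Names that should lose to any other equally good match.
DEPREFERRED = {"localtime", "Factory"}


def zone_sort_key(name: str) -> tuple:
    """Smaller keys win: de-preferred names sort after all others
    (False < True); otherwise shorter names win, then alphabetical order."""
    return (name in DEPREFERRED, len(name), name)


def pick_zone(tied_names):
    """Choose one name from a set of equally ranked candidate zones."""
    return min(tied_names, key=zone_sort_key)
```

With only the length/alphabet part of the key, "UCT" beats "UTC" (same length, "C" sorts before "T"), which is the 2019a symptom, and "localtime" (9 characters) would beat "America/New_York". The first element of the key is what pushes "localtime" behind every real zone name.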
On Thu, Jun 20, 2019 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As of now, six of the seven UCT-reporting members have switched to UTC; > the lone holdout is elver which hasn't run in ten days. (Perhaps it > needs unwedged.) There are no other changes, so it seems like Andrew's > patch is doing what it says on the tin. Oops. Apparently REL_10 of the build farm scripts lost the ability to find "buildroot" in the current working directory automatically. I have updated eelpout and elver's .conf file to have an explicit path, and they are now busily building stuff. -- Thomas Munro https://enterprisedb.com
I wrote: > So I'm toying with the idea of extending Andrew's patch to put a negative > preference on "localtime", ensuring we'll use some other name for the zone > if one is available. Oh ... after further review it seems like "posixrules" should be de-preferred on the same basis: it's uninformative and unportable, and it's short enough to have a good chance of capturing initdb's attention. I recall having seen at least one machine picking it recently. Moreover, while I think most tzdb installations have that file (ours certainly do), the handwriting is on the wall for it to go away, leaving only postmaster startup failures behind: http://mm.icann.org/pipermail/tz/2019-June/028172.html regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> So I'm toying with the idea of extending Andrew's patch to put a
Tom> negative preference on "localtime", ensuring we'll use some other
Tom> name for the zone if one is available.

Tom> Also, now that we have this mechanism, maybe we should charge it
Tom> with de-preferencing the old "Factory" zone, removing the
Tom> hard-wired kluge that we currently have for rejecting that.
Tom> (Modern tzdb doesn't install "Factory" at all, but some
Tom> installations might still do so in the service of blind backwards
Tom> compatibility.)

I was planning on submitting a follow-up myself (for pg13+) for discussion of further improvements. My suggestion would be that we should have the following order of preference, from highest to lowest:

- UTC (justified by being an international standard)
- Etc/UTC
- zones in zone.tab/zone1970.tab:
  These are the zone names that are intended to be presented to the user to select from. Dispute the exact meaning as you will, but I think it makes sense that these names should be chosen over equivalently good matches just on that basis.
- zones in Africa/ America/ Antarctica/ Asia/ Atlantic/ Australia/ Europe/ Indian/ Pacific/ Arctic/
  These subdirs are the ones generated by the "primary" zone data files, including both Zone and Link statements but not counting the "backward" and "etcetera" files.
- GMT (justified on the basis of its presence as a default in the code)
- Etc/*
- any other zone name with a /
- any zone name without a /, excluding 'localtime' and 'Factory'
- 'localtime'
- 'Factory'

Choosing names with / over ones without is a change from our existing preference for shorter names, but it's more robust in the face of the various crap that gets dumped in the top level of the zoneinfo dir. It could be argued that we should reverse the relative order of UTC vs. Etc/UTC and likewise for GMT for the same reason, but I think that's less important.

-- Andrew (irc:RhodiumToad)
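The proposed ordering can be read as a tiered ranking. Below is a minimal Python sketch of it, assuming (this is not stated in the proposal) that the old shorter-then-alphabetical rule still breaks ties inside a tier. `tab_zones` stands for the set of names listed in zone.tab/zone1970.tab; `preference_rank` and `pick_zone_ranked` are hypothetical names.

```python
# Hypothetical sketch of the proposed tiered preference; lower tier wins.
PRIMARY_DIRS = ("Africa/", "America/", "Antarctica/", "Asia/", "Atlantic/",
                "Australia/", "Europe/", "Indian/", "Pacific/", "Arctic/")


def preference_rank(name: str, tab_zones: set) -> int:
    """Map a zone name to its tier in the proposed order."""
    if name == "UTC":
        return 0
    if name == "Etc/UTC":
        return 1
    if name in tab_zones:             # listed in zone.tab/zone1970.tab
        return 2
    if name.startswith(PRIMARY_DIRS):
        return 3
    if name == "GMT":
        return 4
    if name.startswith("Etc/"):
        return 5
    if "/" in name:
        return 6
    if name == "localtime":
        return 8
    if name == "Factory":
        return 9
    return 7                          # other names without a "/"


def pick_zone_ranked(tied_names, tab_zones):
    # Assumption: within a tier, fall back to shorter-then-alphabetical.
    return min(tied_names,
               key=lambda n: (preference_rank(n, tab_zones), len(n), n))
```

For example, among the equivalent names "UCT", "UTC", "Etc/UTC" and "Zulu", this picks "UTC"; among "GB", "Europe/Belfast" and "Europe/London", it picks "Europe/London" if that name appears in `tab_zones`.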
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> 1 Europe/Isle_of_Man Is this from HEAD and therefore possibly getting the value from an /etc/localtime symlink? I can't see any other way that Europe/Isle_of_Man could ever be chosen over Europe/London... -- Andrew (irc:RhodiumToad)
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Also on the topic of process: 48 hours before a wrap deadline is > *particularly* not the time to play fast and loose with this sort of > thing. It'd have been better to wait till after this week's releases, > so there'd at least be time to reconsider if the patch turned out to > have unexpected side-effects. Our typical process for changes that actually end up breaking other things is to put things back the way they were and come up with a better answer. Should we have reverted the code change that caused the issue in the first place, namely, as I understand it at least, the tz code update, to give us time to come up with a better solution and to fix it properly? I'll admit that I wasn't following the thread very closely initially, but I don't recall seeing that even discussed as an option, even though we do it routinely and even had another such case for this set of releases. Possibly a bad assumption on my part, but I did assume that the lack of such a discussion meant that reverting wasn't really an option due to the nature of the changes, leading us into an atypical case already where our usual processes weren't able to be followed. That doesn't mean we should throw the whole thing out the window either, certainly, but I'm not sure that between the 3 options of 'revert', 'live with things being arguably broken', and 'push a contentious commit' that I'd have seen a better option either. I do agree that it would have been better if intentions had been made clearer, such as announcing the plan to push the changes so that we didn't end up with an issue during this patch set (either from out of date zone information, or from having the wrong timezone alias be used), but also with feelings on both sides- if there had been a more explicit "hey, we really need input from someone else on which way they think this should go" ideally with the options spelled out, it would have helped. 
I don't want to come across as implying that I'm saying what was done was 'fine', or that we shouldn't be having this conversation, I'm just trying to figure out how we can frame it in a way that we learn from it and work to improve on it for the future, should something like this happen again. Thanks, Stephen
On Thu, Jun 20, 2019 at 8:52 AM Stephen Frost <sfrost@snowman.net> wrote: > I don't want to come across as implying that I'm saying what was done > was 'fine', or that we shouldn't be having this conversation, I'm just > trying to figure out how we can frame it in a way that we learn from it > and work to improve on it for the future, should something like this > happen again. I agree that it's a difficult situation. I do kind of wonder whether we were altogether overreacting. If we had shipped it as it was, what's the worst thing that would have happened? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2019-06-20 12:02:30 -0400, Robert Haas wrote: > On Thu, Jun 20, 2019 at 8:52 AM Stephen Frost <sfrost@snowman.net> wrote: > > I don't want to come across as implying that I'm saying what was done > > was 'fine', or that we shouldn't be having this conversation, I'm just > > trying to figure out how we can frame it in a way that we learn from it > > and work to improve on it for the future, should something like this > > happen again. > > I agree that it's a difficult situation. I do kind of wonder whether > we were altogether overreacting. If we had shipped it as it was, > what's the worst thing that would have happened? I think it's not good, but also nothing particularly bad came out of it. I don't think we should try to set up procedures for future occurrences, and rather work/plan on that not happening very often. Greetings, Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2019-06-20 12:02:30 -0400, Robert Haas wrote: > > On Thu, Jun 20, 2019 at 8:52 AM Stephen Frost <sfrost@snowman.net> wrote: > > > I don't want to come across as implying that I'm saying what was done > > > was 'fine', or that we shouldn't be having this conversation, I'm just > > > trying to figure out how we can frame it in a way that we learn from it > > > and work to improve on it for the future, should something like this > > > happen again. > > > > I agree that it's a difficult situation. I do kind of wonder whether > > we were altogether overreacting. If we had shipped it as it was, > > what's the worst thing that would have happened? > > I think it's not good, but also nothing particularly bad came out of > it. I don't think we should try to set up procedures for future > occurrences, and rather work/plan on that not happening very often. Agreed. Thanks, Stephen
On 2019-Jun-20, Andres Freund wrote: > On 2019-06-20 12:02:30 -0400, Robert Haas wrote: > > I agree that it's a difficult situation. I do kind of wonder whether > > we were altogether overreacting. If we had shipped it as it was, > > what's the worst thing that would have happened? > > I think it's not good, but also nothing particularly bad came out of > it. I don't think we should try to set up procedures for future > occurrences, and rather work/plan on that not happening very often. I suppose we could have a moratorium on commits starting from (say) EOB Wednesday of the week prior to the release; patches can only be committed after that if they have ample support (where "ample support" might be defined as having +1 from, say, two other committers). That way there's time to discuss/revert/fix anything that is deemed controversial. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 20, 2019 at 1:28 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I suppose we could have a moratorium on commits starting from (say) EOB > Wednesday of the week prior to the release; patches can only be > committed after that if they have ample support (where "ample support" > might be defined as having +1 from, say, two other committers). That > way there's time to discuss/revert/fix anything that is deemed > controversial. Or we could have a moratorium on any change at any time that has a -1 from a committer and a +1 from nobody. I mean, your idea is not bad either. I'm just saying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > On Thu, Jun 20, 2019 at 1:28 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > I suppose we could have a moratorium on commits starting from (say) EOB > > Wednesday of the week prior to the release; patches can only be > > committed after that if they have ample support (where "ample support" > > might be defined as having +1 from, say, two other committers). That > > way there's time to discuss/revert/fix anything that is deemed > > controversial. > > Or we could have a moratorium on any change at any time that has a -1 > from a committer and a +1 from nobody. What about a change that's already been committed but another committer feels caused a regression? If that gets a -1, does it get reverted until things are sorted out, or...? In the situation that started this discussion, a change had already been made and it was only later realized that it caused a regression. Piling on to that, the regression was entwined with other important changes that we wanted to include in the release. Having a system where when the commit was made is a driving factor seems like it would potentially reward people who pushed a change early by giving them the upper hand in such a discussion as this. Ultimately though, I still agree with Andres that we should act to avoid these situations and shouldn't try to make a policy to fit what's been a very rare occurrence. If nothing else, I feel like we'd probably re-litigate the policy every time since it would likely have been a long time since the last discussion of it and the specific circumstances will always be at least somewhat different. Thanks, Stephen
>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: Stephen> In the situation that started this discussion, a change had Stephen> already been made and it was only later realized that it Stephen> caused a regression. Just to keep the facts straight: The regression was introduced by importing tzdb 2019a (in late April) into the previous round of point releases; the change in UTC behaviour was not mentioned in the commit and presumably didn't show up on anyone's radar until there were field complaints (which didn't reach our mailing lists until Jun 4 as far as I know). Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th) addressed only a subset of cases, as far as I know working only on Linux (the historical convention has always been for /etc/localtime to be a copy of a zonefile, not a symlink to one). I only decided to write (and if need be commit) my own followup fix after confirming that the bug was unfixed in a default FreeBSD install when set to UTC, and there was a good chance that a number of other less-popular platforms were affected too. Stephen> Piling on to that, the regression was entwined with other Stephen> important changes that we wanted to include in the release. I'm not sure what you're referring to here? -- Andrew (irc:RhodiumToad)
Greetings, * Andrew Gierth (andrew@tao11.riddles.org.uk) wrote: > >>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: > > Stephen> In the situation that started this discussion, a change had > Stephen> already been made and it was only later realized that it > Stephen> caused a regression. > > Just to keep the facts straight: > > The regression was introduced by importing tzdb 2019a (in late April) Ah, thanks, I had misunderstood when that was committed then. > into the previous round of point releases; the change in UTC behaviour > was not mentioned in the commit and presumably didn't show up on > anyone's radar until there were field complaints (which didn't reach our > mailing lists until Jun 4 as far as I know). Ok. > Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th) > addressed only a subset of cases, as far as I know working only on Linux > (the historical convention has always been for /etc/localtime to be a > copy of a zonefile, not a symlink to one). I only decided to write (and > if need be commit) my own followup fix after confirming that the bug was > unfixed in a default FreeBSD install when set to UTC, and there was a > good chance that a number of other less-popular platforms were affected > too. > > Stephen> Piling on to that, the regression was entwined with other > Stephen> important changes that we wanted to include in the release. > > I'm not sure what you're referring to here? I was referring to the fact that the regression was introduced by a, presumably important, tzdb update (2019a, as mentioned above). At least, I made the assumption that the commit of the import of 2019a had more than just the change that introduced the regression, but I'm happy to admit I'm nowhere near as close to the code as you/Tom are. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Andrew Gierth (andrew@tao11.riddles.org.uk) wrote: > "Stephen" == Stephen Frost <sfrost@snowman.net> writes: >> Stephen> Piling on to that, the regression was entwined with other >> Stephen> important changes that we wanted to include in the release. >> >> I'm not sure what you're referring to here? I was confused by that too. > I was referring to the fact that the regression was introduced by a, > presumably important, tzdb update (2019a, as mentioned above). At > least, I made the assumption that the commit of the import of 2019a had > more than just the change that introduced the regression, but I'm happy > to admit I'm nowhere near as close to the code as you/Tom are. Keep in mind that dealing with whatever tzdb chooses to ship is not optional from our standpoint. Even if we'd refused to import 2019a, every installation using --with-system-tzdata (which, I sincerely hope, includes most production installs) is going to have to deal with it as soon as the respective platform vendor gets around to shipping the tzdata update. So reverting that commit was never on the table. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> I was referring to the fact that the regression was introduced by a, >> presumably important, tzdb update (2019a, as mentioned above). At >> least, I made the assumption that the commit of the import of 2019a >> had more than just the change that introduced the regression, but >> I'm happy to admit I'm nowhere near as close to the code as >> you/Tom are. Tom> Keep in mind that dealing with whatever tzdb chooses to ship is Tom> not optional from our standpoint. Even if we'd refused to import Tom> 2019a, every installation using --with-system-tzdata (which, I Tom> sincerely hope, includes most production installs) is going to Tom> have to deal with it as soon as the respective platform vendor Tom> gets around to shipping the tzdata update. So reverting that Tom> commit was never on the table. Exactly. But that means that if the combination of our arbitrary rules and the data in the tzdb results in an undesirable result, then we have no real option but to fix our rules (we can't reasonably expect the tzdb upstream to choose zone names to make our alphabetical-order preference come out right). My commit was intended to be the minimum fix that would restore the pre-2019a behavior on all systems. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> Keep in mind that dealing with whatever tzdb chooses to ship is > Tom> not optional from our standpoint. Even if we'd refused to import > Tom> 2019a, every installation using --with-system-tzdata (which, I > Tom> sincerely hope, includes most production installs) is going to > Tom> have to deal with it as soon as the respective platform vendor > Tom> gets around to shipping the tzdata update. So reverting that > Tom> commit was never on the table. > Exactly. But that means that if the combination of our arbitrary rules > and the data in the tzdb results in an undesirable result, then we have > no real option but to fix our rules (we can't reasonably expect the tzdb > upstream to choose zone names to make our alphabetical-order preference > come out right). My position is basically that having TimeZone come out as 'UCT' rather than 'UTC' (affecting no visible behavior of the timestamp types, AFAIK) was not such a grave problem as to require violating community norms to get it fixed in this week's releases rather than the next batch. I hadn't had time to consider your patch last week because I was (a) busy with release prep and (b) sick as a dog. I figured we could let it slide and discuss it after the release work died down. I imagine the reason you got zero other responses was that nobody else thought it was of life-and-death urgency either. Anyway, as I said already, my beef is not with the substance of the patch but with failing to follow community process. One "yes" vote and one "no" vote do not constitute consensus. You had no business assuming that I would reverse the "no" vote. regards, tom lane
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th) > addressed only a subset of cases, as far as I know working only on Linux > (the historical convention has always been for /etc/localtime to be a > copy of a zonefile, not a symlink to one). I only decided to write (and > if need be commit) my own followup fix after confirming that the bug was > unfixed in a default FreeBSD install when set to UTC, and there was a > good chance that a number of other less-popular platforms were affected > too. I think your info is out of date on that. NetBSD uses a symlink, and has done for at least 5 years: see set_timezone in http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.sbin/sysinst/util.c?only_with_tag=MAIN macOS seems to have done it like that for at least 10 years, too. I didn't bother digging into their source repo, as it's likely that System Preferences isn't open-source; but *all* of my macOS machines have symlinks there, and some of those link files are > 10 years old. I could not easily find OpenBSD's logic to set the zone during install, if they have any; but at least their admin-facing documentation says to create the file as a symlink: https://www.openbsd.org/faq/faq8.html#TimeZone and there are plenty of similar recommendations found by Mr. Google. In short, I think FreeBSD are holdouts not the norm. I note that even their code will preserve /etc/localtime's symlink status if it was a symlink to start with: see install_zoneinfo_file in https://github.com/freebsd/freebsd/blob/master/usr.sbin/tzsetup/tzsetup.c regards, tom lane
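The symlink-versus-copy distinction matters because a symlink's target already spells out the zone name, so no behaviour matching is needed. A minimal Python sketch of that idea (not initdb's actual C code); `zone_from_symlink` is a hypothetical helper, and it assumes the target path contains a "zoneinfo/" component:

```python
import os


def zone_from_symlink(path="/etc/localtime"):
    """Return the zone name if `path` is a symlink into a zoneinfo tree,
    else None (a copy or hardlink carries no name, so the caller would
    have to fall back to comparing behaviour against every zone file)."""
    if not os.path.islink(path):
        return None
    target = os.readlink(path)        # e.g. /usr/share/zoneinfo/Europe/London
    marker = "zoneinfo/"
    idx = target.find(marker)
    if idx < 0:
        return None                   # link points somewhere unexpected
    return target[idx + len(marker):]
```

On a system where /etc/localtime links to /usr/share/zoneinfo/Etc/UTC this yields "Etc/UTC" directly. A FreeBSD-style copy of the zone file yields None, which is the case where the tie-break rules decide the outcome.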
[ starting to come up for air again after a truly nasty sinus infection... fortunately, once I stopped thinking it was "a cold" and went to the doctor, antibiotics seem to be working ] Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> 1 Europe/Isle_of_Man > Is this from HEAD and therefore possibly getting the value from an > /etc/localtime symlink? I can't see any other way that > Europe/Isle_of_Man could ever be chosen over Europe/London... All of the results I quoted there are HEAD-only, since we did not put the code to make initdb print its timezone selection into the back branches until 14-June. regards, tom lane
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> I was planning on submitting a follow-up myself (for pg13+) for
> discussion of further improvements. My suggestion would be that we
> should have the following order of preference, from highest to lowest:
> - UTC (justified by being an international standard)
> - Etc/UTC
> - zones in zone.tab/zone1970.tab:
>   These are the zone names that are intended to be presented to the
>   user to select from. Dispute the exact meaning as you will, but I
>   think it makes sense that these names should be chosen over
>   equivalently good matches just on that basis.
> - zones in Africa/ America/ Antarctica/ Asia/ Atlantic/ Australia/
>   Europe/ Indian/ Pacific/ Arctic/
>   These subdirs are the ones generated by the "primary" zone data
>   files, including both Zone and Link statements but not counting
>   the "backward" and "etcetera" files.
> - GMT (justified on the basis of its presence as a default in the code)
> - Etc/*
> - any other zone name with a /
> - any zone name without a /, excluding 'localtime' and 'Factory'
> - 'localtime'
> - 'Factory'

TBH, I find this borderline insane: it's taking a problem we did not have and moving the goalposts to the next county. Not just any old county, either, but one where there's a shooting war going on. As soon as you do something like putting detailed preferences into the zone name selection rules, you are going to be up against problems like "should Europe/ have priority over Asia/, or vice versa?" This is not academic; see for example

    Link  Asia/Nicosia     Europe/Nicosia
    Link  Europe/Istanbul  Asia/Istanbul  # Istanbul is in both continents.

These choices affect exactly the people who are going to get bent out of shape because you picked the "wrong" name for their zone. Doesn't matter that both names are "wrong" to different subsets.
As long as we have a trivial and obviously apolitical rule like alphabetical order, I think we can skate over such things; but the minute we have any sort of human choices involved there, we're going to be getting politically driven requests to do-it-like-this-because-I-think-the-default-should-be-that. Again, trawl the tzdb list archives for a while if you think this might not be a problem: http://mm.icann.org/pipermail/tz/ I think we can get away with fixing simple cases that are directly caused by tzdb's own idiosyncrasies, ie "localtime" and "posixrules" and "Factory". If we go further than that, we *will* regret it. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> TBH, I find this borderline insane: it's taking a problem we did Tom> not have and moving the goalposts to the next county. Not just any Tom> old county, either, but one where there's a shooting war going on. Tom> As soon as you do something like putting detailed preferences into Tom> the zone name selection rules, you are going to be up against Tom> problems like "should Europe/ have priority over Asia/, or vice Tom> versa?" I would say that this problem exists with arbitrary preferences too. Tom> As long as we have a trivial and obviously apolitical rule like Tom> alphabetical order, I think we can skate over such things; but the Tom> minute we have any sort of human choices involved there, we're Tom> going to be getting politically driven requests to Tom> do-it-like-this-because-I-think-the-default-should-be-that. The actual content of the rules I suggested all comes from the tzdb distribution; anyone complaining can be told to take it up with them. For the record, this is the list of zones (91 out of 348, or about 26%) that we currently deduce wrongly, as obtained by trying each zone name listed in zone1970.tab and seeing which zone we deduce when that zone's file is copied to /etc/localtime. Note in particular that our arbitrary rules heavily prefer the deprecated backward-compatibility aliases, which are the most likely to disappear in future versions.
(not all of these are fixable, of course)

Africa/Abidjan -> GMT
Africa/Cairo -> Egypt
Africa/Johannesburg -> Africa/Maseru
Africa/Maputo -> Africa/Harare
Africa/Nairobi -> Africa/Asmara
Africa/Tripoli -> Libya
America/Adak -> US/Aleutian
America/Anchorage -> US/Alaska
America/Argentina/Buenos_Aires -> America/Buenos_Aires
America/Argentina/Catamarca -> America/Catamarca
America/Argentina/Cordoba -> America/Cordoba
America/Argentina/Jujuy -> America/Jujuy
America/Argentina/Mendoza -> America/Mendoza
America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
America/Chicago -> US/Central
America/Creston -> MST
America/Curacao -> America/Aruba
America/Denver -> Navajo
America/Detroit -> US/Michigan
America/Edmonton -> Canada/Mountain
America/Havana -> Cuba
America/Indiana/Indianapolis -> US/East-Indiana
America/Indiana/Knox -> America/Knox_IN
America/Jamaica -> Jamaica
America/Kentucky/Louisville -> America/Louisville
America/Los_Angeles -> US/Pacific
America/Manaus -> Brazil/West
America/Mazatlan -> Mexico/BajaSur
America/Mexico_City -> Mexico/General
America/New_York -> US/Eastern
America/Panama -> EST
America/Phoenix -> US/Arizona
America/Port_of_Spain -> America/Virgin
America/Rio_Branco -> Brazil/Acre
America/Sao_Paulo -> Brazil/East
America/Toronto -> Canada/Eastern
America/Vancouver -> Canada/Pacific
America/Whitehorse -> Canada/Yukon
America/Winnipeg -> Canada/Central
Asia/Dhaka -> Asia/Dacca
Asia/Ho_Chi_Minh -> Asia/Saigon
Asia/Hong_Kong -> Hongkong
Asia/Jerusalem -> Israel
Asia/Kathmandu -> Asia/Katmandu
Asia/Kuala_Lumpur -> Singapore
Asia/Macau -> Asia/Macao
Asia/Riyadh -> Asia/Aden
Asia/Seoul -> ROK
Asia/Shanghai -> PRC
Asia/Singapore -> Singapore
Asia/Taipei -> ROC
Asia/Tehran -> Iran
Asia/Thimphu -> Asia/Thimbu
Asia/Tokyo -> Japan
Asia/Ulaanbaatar -> Asia/Ulan_Bator
Atlantic/Reykjavik -> Iceland
Atlantic/South_Georgia -> Etc/GMT+2
Australia/Adelaide -> Australia/South
Australia/Broken_Hill -> Australia/Yancowinna
Australia/Darwin -> Australia/North
Australia/Lord_Howe -> Australia/LHI
Australia/Melbourne -> Australia/Victoria
Australia/Perth -> Australia/West
Australia/Sydney -> Australia/ACT
Europe/Belgrade -> Europe/Skopje
Europe/Dublin -> Eire
Europe/Istanbul -> Turkey
Europe/Lisbon -> Portugal
Europe/London -> GB
Europe/Moscow -> W-SU
Europe/Warsaw -> Poland
Europe/Zurich -> Europe/Vaduz
Indian/Christmas -> Etc/GMT-7
Indian/Mahe -> Etc/GMT-4
Indian/Reunion -> Etc/GMT-4
Pacific/Auckland -> NZ
Pacific/Chatham -> NZ-CHAT
Pacific/Chuuk -> Pacific/Yap
Pacific/Funafuti -> Etc/GMT-12
Pacific/Gambier -> Etc/GMT+9
Pacific/Guadalcanal -> Etc/GMT-11
Pacific/Honolulu -> US/Hawaii
Pacific/Kwajalein -> Kwajalein
Pacific/Pago_Pago -> US/Samoa
Pacific/Palau -> Etc/GMT-9
Pacific/Pohnpei -> Pacific/Ponape
Pacific/Port_Moresby -> Etc/GMT-10
Pacific/Tahiti -> Etc/GMT+10
Pacific/Tarawa -> Etc/GMT-12
Pacific/Wake -> Etc/GMT-12
Pacific/Wallis -> Etc/GMT-12

-- Andrew (irc:RhodiumToad)
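Andrew's survey walks the names listed in zone1970.tab. As an illustration only, here is a minimal C sketch of extracting the zone name from one line of that file, assuming the tab-separated layout shared by zone1970.tab and zone.tab (country code or codes, coordinates, zone name, then an optional comment, with "#" marking comment lines). The helper name tab_line_zone_name is hypothetical; it is not taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the zone name (third tab-separated field) of one
 * zone1970.tab / zone.tab line into out.  Returns 1 on success, or 0 if the
 * line is a comment, is blank, is malformed, or the name does not fit.
 */
static int
tab_line_zone_name(const char *line, char *out, size_t outlen)
{
	const char *p = line;
	size_t		len;
	int			field;

	assert(line != NULL && out != NULL && outlen > 0);

	/* Comment and empty lines carry no zone name. */
	if (*p == '#' || *p == '\0' || *p == '\n')
		return 0;

	/* Skip the country-code field and the coordinates field. */
	for (field = 0; field < 2; field++)
	{
		p = strchr(p, '\t');
		if (p == NULL)
			return 0;
		p++;
	}

	/* The zone name runs up to the next tab, newline, or end of string. */
	len = strcspn(p, "\t\n");
	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, p, len);
	out[len] = '\0';
	return 1;
}
```

A caller would read the file line by line with fgets() and collect every name for which the helper returns 1; that collection is the set of names Andrew's experiment iterates over.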
On Wed, Jun 26, 2019 at 6:32 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > Pacific/Auckland -> NZ Right. On a FreeBSD system here in New Zealand you get "NZ" with default configure options (ie using PostgreSQL's tzdata). But if you build with --with-system-tzdata=/usr/share/zoneinfo you get "Pacific/Auckland", and that's because the FreeBSD zoneinfo directory doesn't include the old non-city names like "NZ", "GB", "Japan", "US/Eastern" etc. (Unfortunately the FreeBSD packages for PostgreSQL are not being built with that option so initdb chooses the old names. Something to take up with the maintainers.) -- Thomas Munro https://enterprisedb.com
>>>>> "Thomas" == Thomas Munro <thomas.munro@gmail.com> writes: >> Pacific/Auckland -> NZ Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ" Thomas> with default configure options (ie using PostgreSQL's tzdata). Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo Thomas> you get "Pacific/Auckland", and that's because the FreeBSD Thomas> zoneinfo directory doesn't include the old non-city names like Thomas> "NZ", "GB", "Japan", "US/Eastern" etc. (Unfortunately the Thomas> FreeBSD packages for PostgreSQL are not being built with that Thomas> option so initdb chooses the old names. Something to take up Thomas> with the maintainers.) Same issue here with Europe/London getting "GB". -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Thomas" == Thomas Munro <thomas.munro@gmail.com> writes: > Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ" > Thomas> with default configure options (ie using PostgreSQL's tzdata). > Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo > Thomas> you get "Pacific/Auckland", and that's because the FreeBSD > Thomas> zoneinfo directory doesn't include the old non-city names like > Thomas> "NZ", "GB", "Japan", "US/Eastern" etc. > Same issue here with Europe/London getting "GB". FreeBSD offers yet another obstacle to Andrew's proposal:

$ uname -a
FreeBSD rpi3.sss.pgh.pa.us 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC arm64
$ ls /usr/share/zoneinfo/
Africa/      Australia/   Etc/         MST          WET
America/     CET          Europe/      MST7MDT      posixrules
Antarctica/  CST6CDT      Factory      PST8PDT      zone.tab
Arctic/      EET          HST          Pacific/
Asia/        EST          Indian/      SystemV/
Atlantic/    EST5EDT      MET          UTC

No zone1970.tab. I do not think we can rely on that file being there, since zic itself doesn't install it; it's up to packagers whether or where to install the "*.tab" files. In general, the point I'm trying to make is that our policy should be "Ties are broken arbitrarily, and if you don't like the choice that initdb makes, here's how to fix it". As soon as we try to break some ties in favor of somebody's idea of what is "right", we are in for neverending problems with different people disagreeing about what is "right", and insisting that their preference should be the one the code enforces. Let's *please* not go there, or even within hailing distance of it. (By this light, even preferring UTC over UCT is a dangerous precedent. I won't argue for reverting that, but I don't want to go further.) regards, tom lane
Further on this --- I now remember that the reason we used to want to reject the "Factory" timezone is that it used to report this as the zone abbreviation:

Local time zone must be set--see zic manual page

which (a) resulted in syntactically invalid timestamp output from the timeofday() function and (b) completely screwed up the column width in the pg_timezone_names view. But since 2016g, it's reported the much-less-insane string "-00". I propose therefore that it's time to just drop the discrimination against "Factory", as per attached. There doesn't seem to be any reason anymore to forbid people from seeing it in pg_timezone_names or selecting it as the timezone if they're so inclined. We would only have a problem if somebody is using --with-system-tzdata on a machine where they've not updated the system tzdata since 2016, and I'm no longer willing to consider that a valid use-case.

regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 9def318..91b1847 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4845,19 +4845,6 @@ pg_timezone_names(PG_FUNCTION_ARGS)
 								&tzoff, &tm, &fsec, &tzn, tz) != 0)
 			continue;			/* ignore if conversion fails */
 
-		/*
-		 * Ignore zic's rather silly "Factory" time zone. The long string
-		 * about "see zic manual page" is used in tzdata versions before
-		 * 2016g; we can drop it someday when we're pretty sure no such data
-		 * exists in the wild on platforms using --with-system-tzdata. In
-		 * 2016g and later, the time zone abbreviation "-00" is used for
-		 * "Factory" as well as some invalid cases, all of which we can
-		 * reasonably omit from the pg_timezone_names view.
-		 */
-		if (tzn && (strcmp(tzn, "-00") == 0 ||
-					strcmp(tzn, "Local time zone must be set--see zic manual page") == 0))
-			continue;
-
 		/* Found a displayable zone */
 		break;
 	}
diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index f91fd31..fc86ff0 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -413,12 +413,7 @@ identify_system_timezone(void)
 							&tt,
 							&bestscore, resultbuf);
 	if (bestscore > 0)
-	{
-		/* Ignore IANA's rather silly "Factory" zone; use GMT instead */
-		if (strcmp(resultbuf, "Factory") == 0)
-			return NULL;
 		return resultbuf;
-	}
 
 	/*
 	 * Couldn't find a match in the database, so next we try constructed zone
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> No zone1970.tab. zone.tab is an adequate substitute - a fact which I thought was sufficiently obvious as to not be worth mentioning. (also see https://reviews.freebsd.org/D20646 ) Tom> I do not think we can rely on that file being there, since zic Tom> itself doesn't install it; it's up to packagers whether or where Tom> to install the "*.tab" files. The proposed rules I suggested do work almost as well if zone[1970].tab is absent, though obviously that's not the optimal situation. But are there any systems which lack it? It's next to impossible to implement a sane "ask the user what timezone to use" procedure without it. Tom> In general, the point I'm trying to make is that our policy should Tom> be "Ties are broken arbitrarily, and if you don't like the choice Tom> that initdb makes, here's how to fix it". Yes, you've repeated that point at some length, and I am not convinced. Is anyone else? -- Andrew (irc:RhodiumToad)
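Andrew's point that zone.tab can stand in for zone1970.tab suggests a simple fallback order. The sketch below is hypothetical and not part of any posted patch: open_zone_tab() looks under a caller-supplied zoneinfo directory for zone1970.tab first, then zone.tab, and returns NULL if neither can be opened, in which case a caller would simply keep the existing tie-break rules.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: open the zone-selection table under tzdir, preferring
 * zone1970.tab and falling back to the older zone.tab.  Both files keep the
 * zone name in the third tab-separated column, so callers can parse either
 * one the same way.  If path is non-NULL, the chosen file's path is copied
 * there.  Returns NULL if neither file can be opened.
 */
static FILE *
open_zone_tab(const char *tzdir, char *path, size_t pathlen)
{
	static const char *const candidates[] = {"zone1970.tab", "zone.tab"};
	char		buf[1024];
	size_t		i;

	assert(tzdir != NULL);

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
	{
		FILE	   *fp;
		int			n = snprintf(buf, sizeof buf, "%s/%s", tzdir, candidates[i]);

		if (n < 0 || (size_t) n >= sizeof buf)
			continue;			/* path too long; try the next candidate */

		fp = fopen(buf, "r");
		if (fp != NULL)
		{
			if (path != NULL && pathlen > 0)
				snprintf(path, pathlen, "%s", buf);
			return fp;
		}
	}
	return NULL;				/* no table available: keep the old rules */
}
```

The design choice here is only about ordering: the newer file is tried first because it is the fuller list, and the older one is accepted as the "adequate substitute" described above.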
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
From: Daniel Gustafsson
> On 27 Jun 2019, at 00:48, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > Tom> In general, the point I'm trying to make is that our policy should > Tom> be "Ties are broken arbitrarily, and if you don't like the choice > Tom> that initdb makes, here's how to fix it". > > Yes, you've repeated that point at some length, and I am not convinced. > Is anyone else? I don’t have any insights into the patches committed or proposed. However, having been lurking on the tz mailing list for a long time, I totally see where Tom is coming from with this. cheers ./daniel
Greetings, * Daniel Gustafsson (daniel@yesql.se) wrote: > > On 27 Jun 2019, at 00:48, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > > > Tom> In general, the point I'm trying to make is that our policy should > > Tom> be "Ties are broken arbitrarily, and if you don't like the choice > > Tom> that initdb makes, here's how to fix it". > > > > Yes, you've repeated that point at some length, and I am not convinced. > > Is anyone else? > > I don’t have any insights into the patches committed or proposed. However, > having been lurking on the tz mailing list for a long time, I totally see where > Tom is coming from with this. I understand this concern, but I have to admit that I'm not entirely thrilled with having the way we pick defaults be based on the concern that people will complain. If anything, this community, at least in my experience, has thankfully been relatively reasonable and I have some pretty serious doubts that a change like this will suddenly invite the masses to argue with us or that, should someone try, they'd end up getting much traction. On the other hand, picking deprecated spellings is clearly a poor choice, and we don't prevent people from picking whatever they want to. I also don't see what Andrew's suggesting as being terribly controversial, though that's likely because I'm looking through rose-colored glasses, as the saying goes. Even with that understanding though, I tend to side with Andrew on this. Thanks, Stephen
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> In general, the point I'm trying to make is that our policy should > Tom> be "Ties are broken arbitrarily, and if you don't like the choice > Tom> that initdb makes, here's how to fix it". > Yes, you've repeated that point at some length, and I am not convinced. [ shrug... ] You haven't convinced me, either. By my count we each have about 0.5 other votes in favor of our positions, so barring more opinions there's no consensus here for the sort of behavioral change you suggest. However, not to let the perfect be the enemy of the good, it seems like nobody has spoken against the ideas of (a) installing negative preferences for the "localtime" and "posixrules" pseudo-zones, and (b) getting rid of our now-unnecessary special treatment for "Factory". How about we do that much and leave any more-extensive change for another day? regards, tom lane
On Tue, Jun 25, 2019 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As long as we have a trivial and obviously apolitical rule like > alphabetical order, I think we can skate over such things; but the minute > we have any sort of human choices involved there, we're going to be > getting politically driven requests to do-it-like-this-because-I-think-the-default-should-be-that. Again, trawl the tzdb list archives for > awhile if you think this might not be a problem: > http://mm.icann.org/pipermail/tz/ I'm kind of unsure what to think about this whole debate substantively. If Andrew is correct that zone.tab or zone1970.tab is a list of time zone names to be preferred over alternatives, then it seems like we ought to prefer them. He remarks that we are preferring "deprecated backward-compatibility aliases" and to the extent that this is true, it seems like a bad thing. We can't claim to be altogether apolitical here, because when those deprecated backward-compatibility names are altogether removed, we are going to remove them and they're going to stop working. If we know which ones are likely to suffer that fate eventually, we ought to stop spitting them out. It's no more political to de-prefer them when upstream does than it is to remove them when upstream does. However, I don't know whether Andrew is right about those things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I'm kind of unsure what to think about this whole debate > substantively. If Andrew is correct that zone.tab or zone1970.tab is a > list of time zone names to be preferred over alternatives, then it > seems like we ought to prefer them. It's not really clear to me that the IANA folk intend those files to be read as a list of preferred zone names. If they do, what are we to make of the fact that no variant of "UTC" appears in them? > He remarks that we are preferring > "deprecated backward-compatibility aliases" and to the extent that > this is true, it seems like a bad thing. We can't claim to be > altogether apolitical here, because when those deprecated > backward-compatibility names are altogether removed, we are going to > remove them and they're going to stop working. If we know which ones > are likely to suffer that fate eventually, we ought to stop spitting > them out. It's no more political to de-prefer them when upstream does > than it is to remove them when upstream does. I think that predicting what IANA will do in the future is a fool's errand. Our contract is to select some one of the aliases that the tzdb database presents, not to guess about whether it might present a different set in the future. (Also note that a lot of the observed variation here has to do with whether individual platforms choose to install backward-compatibility zone names. I think the odds that IANA proper will remove those links are near zero; TTBOMK they never have removed one yet.) More generally, my unhappiness about Andrew's proposal is: 1. It's solving a problem that just about nobody cares about, as evidenced by the very tiny number of complaints we've had to date. As long as the "timezone" setting has the correct external behavior (UTC offset, DST rules, and abbreviations), very few people notice it at all.
With the addition of the code to resolve /etc/localtime when it's a symlink, the population of people who might care has taken a further huge drop. 2. Changing this behavior might create more problems than it solves. In particular, it seemed to me that a lot of the complaints in the UCT/UTC kerfuffle were less about "UCT is a silly name for my zone" than about "this change broke my regression test that expected timezone to be set to X in this environment". Rearranging the tiebreak rules is just going to make different sets of such people unhappy. (Admittedly, the symlink-lookup addition has already created some risk of this ilk. Maybe we should wait for that to be in the field for more than a week before we judge whether further hacking is advisable.) 3. The proposal has technical issues, in particular I'm not nearly as sanguine as Andrew is about whether we can rely on zone[1970].tab to be available. So I'm very unexcited about writing a bunch of new code or opening ourselves to politically-driven complaints in order to change this. It seems like a net loss almost independently of the details. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Robert Haas <robertmhaas@gmail.com> writes: >> I'm kind of unsure what to think about this whole debate >> substantively. If Andrew is correct that zone.tab or zone1970.tab is >> a list of time zone names to be preferred over alternatives, then it >> seems like we ought to prefer them. Tom> It's not really clear to me that the IANA folk intend those files Tom> to be read as a list of preferred zone names. The files exist to support user selection of zone names. That is, it is intended that you can use them to allow the user to choose their country and then timezone within that country, rather than offering them a flat regional list (which can be large and the choices non-obvious). The zone*.tab files therefore include only geographic names, and not either Posix-style abbreviations or special cases like Etc/UTC. Programs that use zone*.tab to allow user selection handle cases like that separately (for example, FreeBSD's tzsetup offers "UTC" at the "regional" menu). It's quite possible that people have implemented time zone selection interfaces that use some other presentation of the list, but that doesn't particularly diminish the value of zone*.tab. In particular, the current zone1970.tab has:

- at least one entry for every iso3166 country code that's not an uninhabited remote island;
- an entry for every distinct "Zone" in the primary data files, with the exception of entries that are specifically commented as being for backward compatibility (e.g. CET, CST6CDT, etc.; see the comments in the europe and northamerica data files for why these exist).

The zonefiles that get installed in addition to the ones in zone1970.tab fall into these categories:

- they are "Link" entries in the primary data files
- they are from the "backward" data file, which is omitted in some system tzdb installations because it exists only for backward compatibility (but we install it because it's still listed in tzdata.zi by default)
- they are from the "etcetera" file, which lists special cases such as UTC and fixed UTC offsets

Tom> If they do, what are we to make of the fact that no variant of Tom> "UTC" appears in them? That "UTC" is not a geographic timezone name? >> He remarks that we are preferring "deprecated backward-compatibility >> aliases" and to the extent that this is true, it seems like a bad >> thing. We can't claim to be altogether apolitical here, because when >> those deprecated backward-compatibility names are altogether >> removed, we are going to remove them and they're going to stop >> working. If we know which ones are likely to suffer that fate >> eventually, we ought to stop spitting them out. It's no more >> political to de-prefer them when upstream does than it is to remove >> them when upstream does. Tom> I think that predicting what IANA will do in the future is a Tom> fool's errand. Maybe so, but when something is explicitly in a file called "backward", and the upstream-provided Makefile has specific options for omitting it (even though it is included by default), and all the comments about it are explicit about it being for backward compatibility, I think it's reasonable to avoid _preferring_ the names in it. The list of backward-compatibility zones is in any case extremely arbitrary and nonsensical: for example "GB", "Eire", "Iceland", "Poland", "Portugal" are aliases for their respective countries, but there are no comparable aliases for any other European country.
The "Navajo" entry (an alias for America/Denver) has already been mentioned in this thread; our arbitrary rule prefers it (due to shortness) for all US zones that use Mountain time with DST. And so on. Tom> Our contract is to select some one of the aliases that the tzdb Tom> database presents, not to guess about whether it might present a Tom> different set in the future. (Also note that a lot of the observed Tom> variation here has to do with whether individual platforms choose Tom> to install backward-compatibility zone names. I think the odds Tom> that IANA proper will remove those links are near zero; TTBOMK Tom> they never have removed one yet.) Well, we should also consider the possibility that we might be using the system tzdata and that the upstream OS or distro packager may choose to remove the "backward" data or split it to a separate package. Tom> More generally, my unhappiness about Andrew's proposal is: [...] Tom> 3. The proposal has technical issues, in particular I'm not nearly Tom> as sanguine as Andrew is about whether we can rely on Tom> zone[1970].tab to be available. My proposal works even if it's not, though I don't expect that to be an issue in practice. -- Andrew (irc:RhodiumToad)
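Andrew's argument rests on the alias status of names like "Navajo" being recorded upstream: in the tzdb source files an alias is declared by a line of the form "Link TARGET LINK-NAME" (written with the abbreviated keyword "L" in tzdata.zi). A minimal sketch, assuming that line format, of recognizing such lines so a tool could collect the set of alias names to de-prefer. parse_link_line and next_token are hypothetical helpers, this sketch only accepts the two keyword spellings shown (zic's own parser is more permissive), and it only helps where the tzdb source text is available rather than just the compiled zoneinfo tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the next whitespace-delimited token from *pp into out, stopping at a
 * '#' comment marker, and advance *pp past it.  Returns 0 if there is no
 * token or it does not fit in out.
 */
static int
next_token(const char **pp, char *out, size_t outlen)
{
	const char *p = *pp;
	size_t		len = 0;

	while (*p != '\0' && isspace((unsigned char) *p))
		p++;
	while (p[len] != '\0' && p[len] != '#' && !isspace((unsigned char) p[len]))
		len++;
	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, p, len);
	out[len] = '\0';
	*pp = p + len;
	return 1;
}

/*
 * Hypothetical helper: if line is "Link TARGET LINK-NAME" (or "L TARGET
 * LINK-NAME"), store both names and return 1; otherwise return 0.
 */
static int
parse_link_line(const char *line, char *target, size_t tlen,
				char *alias, size_t alen)
{
	const char *p = line;
	char		keyword[8];

	assert(line != NULL && target != NULL && alias != NULL);

	if (!next_token(&p, keyword, sizeof keyword))
		return 0;				/* blank or comment-only line */
	if (strcmp(keyword, "Link") != 0 && strcmp(keyword, "L") != 0)
		return 0;				/* Zone, Rule, etc. */
	return next_token(&p, target, tlen) && next_token(&p, alias, alen);
}
```

For example, the backward-file line declaring Navajo as an alias of America/Denver yields target "America/Denver" and alias "Navajo"; a chooser could then rank any name collected this way below its target.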
On Thu, Jun 27, 2019 at 1:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's not really clear to me that the IANA folk intend those files to > be read as a list of preferred zone names. If they do, what are we > to make of the fact that no variant of "UTC" appears in them? I think their intent is key. We can't make reasonable decisions about what to do with some data if we don't know what the data is intended to mean. > I think that predicting what IANA will do in the future is a fool's > errand. Our contract is to select some one of the aliases that the > tzdb database presents, not to guess about whether it might present > a different set in the future. (Also note that a lot of the observed > variation here has to do with whether individual platforms choose to > install backward-compatibility zone names. I think the odds that > IANA proper will remove those links are near zero; TTBOMK they > never have removed one yet.) That doesn't make it a good idea to call Mountain time "Navajo," as Andrew alleges we are doing. Then again, the MacBook upon which I am writing this email thinks that my time zone is "America/New_York," whereas I think it is "US/Eastern," which I suppose reinforces your point about all of this being political. But on the third hand, if somebody tells me that my time zone is America/New_York, I can say to myself "oh, they mean Eastern time," whereas if they say that I'm on "Navajo" time, I'm going to have to sit down with 'diff' and the zoneinfo files to figure out what that actually means. I note that https://github.com/eggert/tz/blob/master/backward seems pretty clear about which things are backward compatibility aliases, which seems to imply that we would not be taking a political position separate from the upstream position if we tried to de-prioritize those. Also, https://github.com/eggert/tz/blob/master/theory.html says... 
Names normally have the form AREA/LOCATION, where AREA is a continent or ocean, and LOCATION is a specific location within the area. ...which seems to imply that AREA/LOCATION is the "normal" and thus preferred form, and also that... The file 'zone1970.tab' lists geographical locations used to name timezones. It is intended to be an exhaustive list of names for geographic regions as described above; this is a subset of the timezones in the data. ...which seems to support Andrew's idea that you can identify AREA/LOCATION time zones by looking in that file. Long story short, I agree with you that most people probably don't care about this very much, but I also agree with Andrew that some of the current choices we're making are pretty strange, and I'm not convinced, as you are, that it's impossible to make a principled choice between alternatives in all cases. The upstream data appears to contain some information about intent; it's not just a jumble of exactly-equally-preferred alternatives. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
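The AREA/LOCATION shape quoted from theory.html can be checked mechanically. This is a heuristic sketch only: the area list below is an assumption (the continent and ocean directories found in a typical tzdb install, with "Etc" deliberately left out), and is_area_location is a hypothetical name. Shape alone is not sufficient, since backward-compatibility names from the list earlier in the thread, such as "Australia/ACT" or "America/Buenos_Aires", also pass; membership in zone1970.tab remains the stronger signal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Heuristic: does name look like AREA/LOCATION with AREA a continent or
 * ocean?  The area list is an assumption, not read from tzdb itself.
 */
static int
is_area_location(const char *name)
{
	static const char *const areas[] = {
		"Africa", "America", "Antarctica", "Arctic", "Asia",
		"Atlantic", "Australia", "Europe", "Indian", "Pacific"
	};
	size_t		i;

	assert(name != NULL);

	for (i = 0; i < sizeof(areas) / sizeof(areas[0]); i++)
	{
		size_t		len = strlen(areas[i]);

		/* Require "AREA/" followed by a non-empty LOCATION. */
		if (strncmp(name, areas[i], len) == 0 && name[len] == '/' &&
			name[len + 1] != '\0')
			return 1;
	}
	return 0;
}
```

Under this check "Pacific/Auckland" qualifies while "NZ", "US/Eastern" and "Etc/GMT+2" do not.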
Robert Haas <robertmhaas@gmail.com> writes: > Long story short, I agree with you that most people probably don't > care about this very much, but I also agree with Andrew that some of > the current choices we're making are pretty strange, and I'm not > convinced, as you are, that it's impossible to make a principled choice > between alternatives in all cases. The upstream data appears to > contain some information about intent; it's not just a jumble of > exactly-equally-preferred alternatives. I agree that if there were an easy way to discount the IANA "backward compatibility" zone names, that'd likely be a reasonable thing to do. The problem is that those names aren't distinguished from others in the representation we have available to us (ie, the actual /usr/share/zoneinfo file tree). I'm dubious that relying on zone[1970].tab would improve matters substantially; it would fix some cases, but I don't think it would fix all of them. Resolving all ambiguous zone-name choices is not the charter of those files. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> I'm dubious that relying on zone[1970].tab would improve matters Tom> substantially; it would fix some cases, but I don't think it would Tom> fix all of them. Resolving all ambiguous zone-name choices is not Tom> the charter of those files. Allowing zone matching by _content_ (as we do) rather than by name does not seem to be supported in any respect whatever by the upstream data; we've always been basically on our own with that. [tl;dr for what follows: my proposal reduces the number of discrepancies from 91 (see previously posted list) to 16 or 7, none of which are new] So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT even down to the abbreviation, and we don't want to guess Africa/Abidjan for all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes distinctly-named zones which are distinct only for times in the far past (before 1920 or 1905 respectively). They are otherwise identical by content. We therefore end up choosing arbitrarily.

In addition, the following collection of random islands have timezones which lack local abbreviation names, recent offset changes, or DST, and are therefore indistinguishable by content from fixed-offset zones like Etc/GMT+2:

Etc/GMT-4 == Indian/Mahe Indian/Reunion
Etc/GMT-7 == Indian/Christmas
Etc/GMT-9 == Pacific/Palau
Etc/GMT-10 == Pacific/Port_Moresby
Etc/GMT-11 == Pacific/Guadalcanal
Etc/GMT-12 == Pacific/Funafuti Pacific/Tarawa Pacific/Wake Pacific/Wallis
Etc/GMT+10 == Pacific/Tahiti
Etc/GMT+9 == Pacific/Gambier
Etc/GMT+2 == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names on the grounds of length.
If we chose to prefer zone.tab names over Etc/* names for all of these, we'd be ambiguous only for a handful of relatively small islands. -- Andrew (irc:RhodiumToad)
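Andrew's suggestion for the island cases amounts to one extra tie-break ahead of the existing one. A minimal sketch, assuming the candidates have already tied on content and that the names read from zone.tab or zone1970.tab are available as an array; pick_zone_name and in_tab_list are hypothetical helpers, not the findtimezone.c logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Is name one of the ntab names read from zone1970.tab / zone.tab? */
static int
in_tab_list(const char *name, const char *const *tab, size_t ntab)
{
	size_t		i;

	for (i = 0; i < ntab; i++)
		if (strcmp(name, tab[i]) == 0)
			return 1;
	return 0;
}

/*
 * Among ncand (>= 1) content-equivalent names, return the index of the
 * preferred one: names listed in the tab file win; remaining ties fall back
 * to "shorter, then alphabetically earlier".
 */
static size_t
pick_zone_name(const char *const *cand, size_t ncand,
			   const char *const *tab, size_t ntab)
{
	size_t		best = 0;
	size_t		i;

	assert(cand != NULL && ncand > 0);

	for (i = 1; i < ncand; i++)
	{
		int			ti = in_tab_list(cand[i], tab, ntab);
		int			tb = in_tab_list(cand[best], tab, ntab);
		size_t		li = strlen(cand[i]);
		size_t		lb = strlen(cand[best]);

		if (ti != tb)
		{
			if (ti)
				best = i;		/* tab-listed name beats unlisted one */
		}
		else if (li != lb)
		{
			if (li < lb)
				best = i;		/* existing rule: shorter wins */
		}
		else if (strcmp(cand[i], cand[best]) < 0)
			best = i;			/* existing rule: alphabetically earlier */
	}
	return best;
}
```

With a tab list containing Indian/Mahe and Indian/Reunion, the candidates Etc/GMT-4, Indian/Mahe and Indian/Reunion resolve to Indian/Mahe: both island names beat Etc/GMT-4, and the remaining tie falls through to the shorter name, which is exactly the residual ambiguity described above.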
On Thu, Jun 27, 2019 at 10:48 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> No zone1970.tab. > > zone.tab is an adequate substitute - a fact which I thought was > sufficiently obvious as to not be worth mentioning. > > (also see https://reviews.freebsd.org/D20646 ) FWIW this is now fixed for FreeBSD 13-CURRENT, with a good chance of back-patch. I don't know if there are any other operating systems that are shipping zoneinfo but failing to install zone1970.tab, but if there are it's a mistake IMHO and they'll probably fix that if someone complains, considering that zone.tab literally tells you to go and use the newer version, and Paul Eggert has implied that zone1970.tab is the "full" and "canonical" list[1]. [1] http://mm.icann.org/pipermail/tz/2014-October/021760.html -- Thomas Munro https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes: > FWIW this is now fixed for FreeBSD 13-CURRENT, with a good chance of > back-patch. I don't know if there are any other operating systems > that are shipping zoneinfo but failing to install zone1970.tab, but if > there are it's a mistake IMHO and they'll probably fix that if someone > complains, considering that zone.tab literally tells you to go and use > the newer version, and Paul Eggert has implied that zone1970.tab is > the "full" and "canonical" list[1]. I'm not sure we're any closer to a meeting of the minds on whether consulting zone[1970].tab is a good thing to do, but we got an actual user complaint[1] about how "localtime" should not be a preferred spelling. So I want to go ahead and insert the discussed anti-preference against "localtime" and "posixrules", as per 0001 below. If we do do something with zone[1970].tab, we'd still need these special rules, so I don't think this is blocking anything. Also, I poked into the question of the "Factory" zone a bit more, and was disappointed to find that not only does FreeBSD still install the "Factory" zone, but they are apparently hacking the data so that it emits the two-changes-back abbreviation "Local time zone must be set--use tzsetup". This bypasses the filter in pg_timezone_names that is expressly trying to prevent showing such silly "abbreviations". So I now feel that not only can we not remove initdb's discrimination against "Factory", but we indeed need to make the pg_timezone_names filter more aggressive. Hence, I now propose 0002 below to tweak what we're doing with "Factory". I did remove our special cases for it in zic.c, as we don't need them anymore with modern tzdb data, and there's no reason to support running "zic -P" with hacked-up data. 
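The more aggressive pg_timezone_names filter proposed here (in the 0002 diff that follows) swaps string matching for a length check. Restated as a standalone predicate for illustration only: abbrev_is_displayable and MAX_DISPLAY_ABBREV_LEN are hypothetical names, while the 31-byte limit and the NULL guard mirror the patch's `if (tzn && strlen(tzn) > 31)` test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limit taken from the proposed patch; the macro name is hypothetical. */
#define MAX_DISPLAY_ABBREV_LEN 31

/*
 * Sketch: an abbreviation is displayable unless it is longer than the limit.
 * As in the patch, a NULL abbreviation does not trigger the filter.
 */
static bool
abbrev_is_displayable(const char *tzn)
{
	return tzn == NULL || strlen(tzn) <= MAX_DISPLAY_ABBREV_LEN;
}
```

Under this rule the modern "-00" abbreviation is shown, while both "Local time zone must be set--..." variants (48 and 40 bytes long) are filtered out, whichever packager produced them.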
regards, tom lane

[1] https://www.postgresql.org/message-id/CADT4RqCCnj6FKLisvT8tTPfTP4azPhhDFJqDF1JfBbOH5w4oyQ@mail.gmail.com

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index a5c9c9e..786e787 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -608,22 +608,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 /*
  * Given a timezone name, determine whether it should be preferred over other
  * names which are equally good matches. The output is arbitrary but we will
- * use 0 for "neutral" default preference.
- *
- * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
- * are the ones offered to the user to select from. But for the moment, to
- * minimize changes in behaviour, simply prefer UTC over alternative spellings
- * such as UCT that otherwise cause confusion. The existing "shortest first"
- * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
- * still preferring Etc/UTC over Etc/UCT).
+ * use 0 for "neutral" default preference; larger values are more preferred.
  */
 static int
 zone_name_pref(const char *zonename)
 {
+	/*
+	 * Prefer UTC over alternatives such as UCT. Also prefer Etc/UTC over
+	 * Etc/UCT; but UTC is preferred to Etc/UTC.
+	 */
 	if (strcmp(zonename, "UTC") == 0)
 		return 50;
 	if (strcmp(zonename, "Etc/UTC") == 0)
 		return 40;
+
+	/*
+	 * We don't want to pick "localtime" or "posixrules", unless we can find
+	 * no other name for the prevailing zone. Those aren't real zone names.
+	 */
+	if (strcmp(zonename, "localtime") == 0 ||
+		strcmp(zonename, "posixrules") == 0)
+		return -50;
+
 	return 0;
 }
 
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 4d8db1a..972fcd2 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4826,16 +4826,15 @@ pg_timezone_names(PG_FUNCTION_ARGS)
 			continue;			/* ignore if conversion fails */
 
 		/*
-		 * Ignore zic's rather silly "Factory" time zone. The long string
-		 * about "see zic manual page" is used in tzdata versions before
-		 * 2016g; we can drop it someday when we're pretty sure no such data
-		 * exists in the wild on platforms using --with-system-tzdata. In
-		 * 2016g and later, the time zone abbreviation "-00" is used for
-		 * "Factory" as well as some invalid cases, all of which we can
-		 * reasonably omit from the pg_timezone_names view.
+		 * IANA's rather silly "Factory" time zone used to emit ridiculously
+		 * long "abbreviations" such as "Local time zone must be set--see zic
+		 * manual page" or "Local time zone must be set--use tzsetup". While
+		 * modern versions of tzdb emit the much saner "-00", it seems some
+		 * benighted packagers are hacking the IANA data so that it continues
+		 * to produce these strings. To prevent producing a weirdly wide
+		 * abbrev column, reject ridiculously long abbreviations.
 		 */
-		if (tzn && (strcmp(tzn, "-00") == 0 ||
-					strcmp(tzn, "Local time zone must be set--see zic manual page") == 0))
+		if (tzn && strlen(tzn) > 31)
 			continue;
 
 		/* Found a displayable zone */
diff --git a/src/timezone/zic.c b/src/timezone/zic.c
index 95ab854..c27fb45 100644
--- a/src/timezone/zic.c
+++ b/src/timezone/zic.c
@@ -2443,13 +2443,10 @@ writezone(const char *const name, const char *const string, char version,
 			unsigned char tm = types[i];
 			char *thisabbrev = &thischars[indmap[desigidx[tm]]];
 
-			/* filter out assorted junk entries */
-			if (strcmp(thisabbrev, GRANDPARENTED) != 0 &&
-				strcmp(thisabbrev, "zzz") != 0)
-				fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
-						thisabbrev,
-						utoffs[tm],
-						isdsts[tm] ? "\tD" : "");
+			fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
+					thisabbrev,
+					utoffs[tm],
+					isdsts[tm] ? "\tD" : "");
 		}
 	}
 	/* Print the default type if we have no transitions at all */
@@ -2458,13 +2455,10 @@ writezone(const char *const name, const char *const string, char version,
 		unsigned char tm = defaulttype;
 		char *thisabbrev = &thischars[indmap[desigidx[tm]]];
 
-		/* filter out assorted junk entries */
-		if (strcmp(thisabbrev, GRANDPARENTED) != 0 &&
-			strcmp(thisabbrev, "zzz") != 0)
-			fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
-					thisabbrev,
-					utoffs[tm],
-					isdsts[tm] ? "\tD" : "");
+		fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
+				thisabbrev,
+				utoffs[tm],
+				isdsts[tm] ? "\tD" : "");
 	}
 }
> I'm not sure we're any closer to a meeting of the minds on whether
> consulting zone[1970].tab is a good thing to do, but we got an actual
> user complaint[1] about how "localtime" should not be a preferred
> spelling. So I want to go ahead and insert the discussed anti-preference
> against "localtime" and "posixrules", as per 0001 below. If we do do
> something with zone[1970].tab, we'd still need these special rules,
> so I don't think this is blocking anything.
Just want to stress this point from a PostgreSQL driver maintainer perspective (see here[1] for the full details). Having "localtime" as the PostgreSQL timezone basically means that the timezone is completely opaque from a client point of view - there is no way for clients to know what actual timezone the server is in, and react to that. This is a limiting factor in client development; I hope a consensus on this specific point can be reached.
Shay Rojansky <roji@roji.org> writes:
>> I'm not sure we're any closer to a meeting of the minds on whether
>> consulting zone[1970].tab is a good thing to do, but we got an actual
>> user complaint[1] about how "localtime" should not be a preferred
>> spelling. So I want to go ahead and insert the discussed anti-preference
>> against "localtime" and "posixrules", as per 0001 below. If we do do
>> something with zone[1970].tab, we'd still need these special rules,
>> so I don't think this is blocking anything.

> Just want to stress this point from a PostgreSQL driver maintainer
> perspective (see here[1] for the full details). Having "localtime" as the
> PostgreSQL timezone basically means that the timezone is completely opaque
> from a client point of view - there is no way for clients to know what
> actual timezone the server is in, and react to that. This is a limiting
> factor in client development; I hope a consensus on this specific point can
> be reached.

I have in fact committed that patch. It won't do anything for your
problem with respect to existing installations that may have picked
"localtime", but it'll at least prevent new initdb runs from picking
that.

regards, tom lane

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [3754113f3] 2019-07-26 12:45:32 -0400
Branch: REL_12_STABLE [e31dfe99c] 2019-07-26 12:45:52 -0400
Branch: REL_11_STABLE [4459266bf] 2019-07-26 12:45:57 -0400
Branch: REL_10_STABLE [ae9b91be7] 2019-07-26 12:46:03 -0400
Branch: REL9_6_STABLE [51b47471f] 2019-07-26 12:46:10 -0400
Branch: REL9_5_STABLE [9ef811742] 2019-07-26 12:46:15 -0400
Branch: REL9_4_STABLE [6c4ffab76] 2019-07-26 12:46:20 -0400

    Avoid choosing "localtime" or "posixrules" as TimeZone during initdb.

    Some platforms create a file named "localtime" in the system
    timezone directory, making it a copy or link to the active time
    zone file. If Postgres is built with --with-system-tzdata, initdb
    will see that file as an exact match to localtime(3)'s behavior,
    and it may decide that "localtime" is the most preferred spelling of
    the active zone. That's a very bad choice though, because it's
    neither informative, nor portable, nor stable if someone changes
    the system timezone setting. Extend the preference logic added by
    commit e3846a00c so that we will prefer any other zone file that
    matches localtime's behavior over "localtime".

    On the same logic, also discriminate against "posixrules", which is
    another not-really-a-zone file that is often present in the timezone
    directory. (Since we install "posixrules" but not "localtime", this
    change can affect the behavior of Postgres with or without
    --with-system-tzdata.)

    Note that this change doesn't prevent anyone from choosing these
    pseudo-zones if they really want to (i.e., by setting TZ for initdb,
    or modifying the timezone GUC later on). It just prevents initdb
    from preferring these zone names when there are multiple matches to
    localtime's behavior.

    Since we generally prefer to keep timezone-related behavior the
    same in all branches, and since this is arguably a bug fix,
    back-patch to all supported branches.

    Discussion: https://postgr.es/m/CADT4RqCCnj6FKLisvT8tTPfTP4azPhhDFJqDF1JfBbOH5w4oyQ@mail.gmail.com
    Discussion: https://postgr.es/m/27991.1560984458@sss.pgh.pa.us
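The preference logic the commit message describes can be sketched in standalone C. This sketch assumes (it is not copied from findtimezone.c) that, among zones matching localtime(3) equally well, `zone_name_pref` is consulted first and the older "shorter, then alphabetically earlier" rule only decides among names of equal preference; that ordering is what lets "Etc/UTC" beat the shorter "UCT". `zone_name_pref` mirrors the patched function; `prefer_candidate` and `pick_zone_name` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* zone_name_pref() as it reads after the patch: larger is more preferred. */
static int
zone_name_pref(const char *zonename)
{
	if (strcmp(zonename, "UTC") == 0)
		return 50;
	if (strcmp(zonename, "Etc/UTC") == 0)
		return 40;
	if (strcmp(zonename, "localtime") == 0 ||
		strcmp(zonename, "posixrules") == 0)
		return -50;
	return 0;
}

/*
 * Hypothetical tie-breaker for two names that match localtime(3) equally
 * well: true if "cand" should replace "best". Name preference decides
 * first, then the shorter name, then the alphabetically earlier one.
 */
static bool
prefer_candidate(const char *cand, const char *best)
{
	int			pref = zone_name_pref(cand) - zone_name_pref(best);
	size_t		cand_len = strlen(cand);
	size_t		best_len = strlen(best);

	if (pref != 0)
		return pref > 0;
	if (cand_len != best_len)
		return cand_len < best_len;
	return strcmp(cand, best) < 0;
}

/* Hypothetical driver: pick one name from a set of equally good matches. */
static const char *
pick_zone_name(const char *const *names, size_t n)
{
	const char *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (best == NULL || prefer_candidate(names[i], best))
			best = names[i];
	return best;
}
```

Under these rules, the candidates {"localtime", "America/Los_Angeles", "US/Pacific", "posixrules"} would yield "US/Pacific": the pseudo-zones lose, but the shorter backward-compatibility alias still beats the zone.tab name, which is the part the zone[1970].tab discussion leaves open. "localtime" is picked only when it is the sole candidate.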
Tom,
> I have in fact committed that patch. It won't do anything for your
> problem with respect to existing installations that may have picked
>"localtime", but it'll at least prevent new initdb runs from picking
> that.
Thanks! At least over time the problem will hopefully diminish.
Hi,

On 2019-08-01 10:08:01 -0400, Tom Lane wrote:
> I have in fact committed that patch. It won't do anything for your
> problem with respect to existing installations that may have picked
> "localtime", but it'll at least prevent new initdb runs from picking
> that.

> Avoid choosing "localtime" or "posixrules" as TimeZone during initdb.
>
> Some platforms create a file named "localtime" in the system
> timezone directory, making it a copy or link to the active time
> zone file. If Postgres is built with --with-system-tzdata, initdb
> will see that file as an exact match to localtime(3)'s behavior,
> and it may decide that "localtime" is the most preferred spelling of
> the active zone. That's a very bad choice though, because it's
> neither informative, nor portable, nor stable if someone changes
> the system timezone setting. Extend the preference logic added by
> commit e3846a00c so that we will prefer any other zone file that
> matches localtime's behavior over "localtime".

When used and a symlink, could we resolve the symlink when determining
the timezone? When loading a timezone in the backend, not during
initdb. While that'd leave us with the instability, it would at least
help clients etc. understand what the setting actually means?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> When used and a symlink, could we resolve the symlink when determining
> the timezone? When loading a timezone in the backend, not during
> initdb. While that'd leave us with the instability, it would at least
> help clients etc. understand what the setting actually means?

The question here is what the string "localtime" means when it's in
the timezone variable.

I guess yes, we could install some show_hook for timezone that goes
and looks to see if it can resolve what that means. But that sure
seems to me to be in you've-got-to-be-kidding territory.

Especially since the platforms I've seen that do this tend to use hard
links, so that it's questionable whether the pushups would accomplish
anything at all.

regards, tom lane
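The symlink-versus-hard-link distinction Tom raises can be checked with POSIX lstat(2). The sketch below is hypothetical (`classify_localtime` and its enum are not PostgreSQL code): a symlink's target path can be read back, whereas a hard link only shows up as a link count above 1 with no zone name attached, and a plain copy has a link count of 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical categories for how a "localtime" path relates to a zone file. */
typedef enum
{
	LOCALTIME_MISSING,			/* lstat() failed */
	LOCALTIME_SYMLINK,			/* the target path can be read back */
	LOCALTIME_HARDLINK,			/* shares an inode; no zone name to read back */
	LOCALTIME_COPY,				/* standalone regular file, link count 1 */
	LOCALTIME_OTHER				/* directory, device, etc. */
} localtime_kind;

static localtime_kind
classify_localtime(const char *path)
{
	struct stat st;

	/* lstat() reports on the link itself instead of following it */
	if (lstat(path, &st) != 0)
		return LOCALTIME_MISSING;
	if (S_ISLNK(st.st_mode))
		return LOCALTIME_SYMLINK;
	if (S_ISREG(st.st_mode))
		return st.st_nlink > 1 ? LOCALTIME_HARDLINK : LOCALTIME_COPY;
	return LOCALTIME_OTHER;
}
```

Only the symlink case gives anything for a show_hook-style lookup to follow; for the other two, finding a zone name would still mean comparing file contents, as initdb already does.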
Hi,

On 2019-08-01 13:59:11 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > When used and a symlink, could we resolve the symlink when determining
> > the timezone? When loading a timezone in the backend, not during
> > initdb. While that'd leave us with the instability, it would at least
> > help clients etc. understand what the setting actually means?
>
> The question here is what the string "localtime" means when it's in
> the timezone variable.

Right.

> I guess yes, we could install some show_hook for timezone that goes
> and looks to see if it can resolve what that means. But that sure
> seems to me to be in you've-got-to-be-kidding territory.

Fair enough. I'm mildly worried that people will just carry their
timezone setting from one version's postgresql.conf to the next as they
upgrade.

> Especially since the platforms I've seen that do this tend to use hard
> links, so that it's questionable whether the pushups would accomplish
> anything at all.

Hm, debian's is a symlink (or rather a chain of):

$ ls -l /usr/share/zoneinfo/localtime
lrwxrwxrwx 1 root root 14 Jul 4 14:04 /usr/share/zoneinfo/localtime -> /etc/localtime
$ ls -l /etc/localtime
lrwxrwxrwx 1 root root 39 Jul 15 15:40 /etc/localtime -> /usr/share/zoneinfo/America/Los_Angeles

The system-installed versions of postgres I have available all ended up
with timezone=localtime. Not sure how long they've been symlinks. I
randomly accessed a backup of an older debian installation, from 2014,
and there it's a file (with link count 1). But presumably upgrading
would yield a postgresql.conf that still had localtime, with localtime
itself having become a symlink.

Greetings,

Andres Freund
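The symlink resolution Andres suggests amounts to following the whole link chain and taking the path below the zoneinfo directory. Below is a hypothetical sketch using POSIX realpath(3), which resolves every symlink in a chain like the Debian one above; `zone_name_from_link` and its "last /zoneinfo/ component" heuristic are assumptions for illustration, not how PostgreSQL resolves zone names. A hard link has nothing to follow, so in that case the sketch just returns the link's own name.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: resolve every symlink in "path" with realpath(3)
 * and copy whatever follows the last "/zoneinfo/" component into "out".
 * Returns false if resolution fails, the result is not under a zoneinfo
 * directory, or "out" is too small.
 */
static bool
zone_name_from_link(const char *path, char *out, size_t outlen)
{
	static const char marker[] = "/zoneinfo/";
	char		resolved[PATH_MAX];
	const char *hit = NULL;
	size_t		len;

	if (realpath(path, resolved) == NULL)
		return false;

	/* remember the last occurrence of the marker */
	for (const char *p = strstr(resolved, marker); p != NULL;
		 p = strstr(p + 1, marker))
		hit = p;
	if (hit == NULL)
		return false;

	hit += sizeof(marker) - 1;
	len = strlen(hit);
	if (len == 0 || len >= outlen)
		return false;
	memcpy(out, hit, len + 1);
	return true;
}
```

Applied to the chain shown above, this would report America/Los_Angeles, but it does nothing for the question Tom raises about what the stored string "localtime" should mean once the system zone changes.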
Andres Freund <andres@anarazel.de> writes:
> Fair enough. I'm mildly worried that people will just carry their
> timezone setting from one version's postgresql.conf to the next as they
> upgrade.

Maybe. I don't believe pg_upgrade copies over the old postgresql.conf,
and I doubt we should consider it good practice in any case.

regards, tom lane