Thread: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Christoph Berg
Date:
Re: Tom Lane 2019-04-26 <E1hK8qL-0005yH-VX@gemulon.postgresql.org>
> Update time zone data files to tzdata release 2019a.
> 
> DST law changes in Palestine and Metlakatla.
> Historical corrections for Israel.
> 
> Etc/UCT is now a backward-compatibility link to Etc/UTC, instead
> of being a separate zone that generates the abbreviation "UCT",
> which nowadays is typically a typo.  Postgres will still accept
> "UCT" as an input zone name, but it won't output it.

There is something wrong here. On Debian Buster/unstable, using
system tzdata (2019a-1), if /etc/timezone is "Etc/UTC":

11.3's initdb adds timezone = 'UCT' to postgresql.conf
12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

Is that expected behavior? Docker users are complaining that "UCT"
messes up their testsuites. https://github.com/docker-library/postgres/issues/577

Christoph



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Christoph" == Christoph Berg <myon@debian.org> writes:

 >> Etc/UCT is now a backward-compatibility link to Etc/UTC, instead of
 >> being a separate zone that generates the abbreviation "UCT", which
 >> nowadays is typically a typo. Postgres will still accept "UCT" as an
 >> input zone name, but it won't output it.

 Christoph> There is something wrong here. On Debian Buster/unstable,
 Christoph> using system tzdata (2019a-1), if /etc/timezone is
 Christoph> "Etc/UTC":

 Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
 Christoph> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

 Christoph> Is that expected behavior?

It's clearly not what users expect and it's clearly the wrong thing to
do, though it's the expected behavior of the current code:

 * On most systems, we rely on trying to match the observable behavior of
 * the C library's localtime() function.  The database zone that matches
 * furthest into the past is the one to use.  Often there will be several
 * zones with identical rankings (since the IANA database assigns multiple
 * names to many zones).  We break ties arbitrarily by preferring shorter,
 * then alphabetically earlier zone names.

I believe I pointed out a long, long time ago that this tie-breaking
strategy was insane, and that the rule should be to prefer canonical
names and use something else only in the case of a strictly better
match.

If TZ is set or if /etc/localtime is a symlink rather than a hardlink or
copy of the zone file, then PG can get the zone name directly rather
than having to do the comparisons, so the above comment doesn't apply;
that gives you a workaround.
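
To make the effect of that comment concrete, here is a minimal sketch of the "shorter, then alphabetically earlier" rule applied to a list of equally scored names. The function names tie_break_prefers and pick_zone are hypothetical and are not taken from findtimezone.c; only the rule itself comes from the comment quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return nonzero if "candidate" should replace "best" under the quoted
 * rule: shorter name first, then alphabetically earlier.
 * (Hypothetical helper, for illustration only.)
 */
static int
tie_break_prefers(const char *candidate, const char *best)
{
    size_t clen;
    size_t blen;

    assert(candidate != NULL && best != NULL);
    clen = strlen(candidate);
    blen = strlen(best);
    if (clen != blen)
        return clen < blen;
    return strcmp(candidate, best) < 0;
}

/* Pick the winner from n equally scored zone names (n must be > 0). */
static const char *
pick_zone(const char *const *names, size_t n)
{
    const char *best;
    size_t i;

    assert(names != NULL && n > 0);
    best = names[0];
    for (i = 1; i < n; i++)
    {
        if (tie_break_prefers(names[i], best))
            best = names[i];
    }
    return best;
}
```

Fed the names UTC, UCT, Etc/UTC, Zulu and Universal, this rule settles on "UCT": it is as short as "UTC" and sorts before it. Without "UCT" in the list, the same rule picks "UTC".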

-- 
Andrew (irc:RhodiumToad)



Christoph Berg <myon@debian.org> writes:
> There is something wrong here. On Debian Buster/unstable, using
> system tzdata (2019a-1), if /etc/timezone is "Etc/UTC":

> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

Hm, I don't have a Debian machine at hand, but I'm unable to
reproduce this using macOS or RHEL.  I tried things like

$ TZ=UTC initdb
...
selecting default timezone ... UTC
...

Is your build using --with-system-tzdata?  If so, which tzdb
release is the system on, and is it a completely stock copy
of that release?

Given the tie-breaking behavior in findtimezone.c,

 * ... Often there will be several
 * zones with identical rankings (since the IANA database assigns multiple
 * names to many zones).  We break ties arbitrarily by preferring shorter,
 * then alphabetically earlier zone names.

it's not so surprising that UCT might be chosen, but I don't
understand how Etc/UCT would be.

BTW, does Debian set up /etc/timezone as a symlink, by any chance,
rather than a copy or hard link?  If it's a symlink, we could improve
matters by teaching identify_system_timezone() to inspect it.

            regards, tom lane



Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> I believe I pointed out a long, long time ago that this tie-breaking
> strategy was insane, and that the rule should be to prefer canonical
> names and use something else only in the case of a strictly better
> match.

This is assuming that the tzdb data has a concept of a canonical name
for a zone, which unfortunately it does not.  UTC, UCT, Etc/UTC,
and about four other strings are equivalent names for the same zone
so far as one can tell from the installed data.

We could imagine layering some additional data on top of tzdb,
but I don't much want to go there from a maintenance standpoint.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-04 11:27:31 -0400, Tom Lane wrote:
> Hm, I don't have a Debian machine at hand, but I'm unable to
> reproduce this using macOS or RHEL.  I tried things like
> 
> $ TZ=UTC initdb
> ...
> selecting default timezone ... UTC
> ...

On debian unstable that's what I get too, both with system and PG
tzdata.


> BTW, does Debian set up /etc/timezone as a symlink, by any chance,
> rather than a copy or hard link?  If it's a symlink, we could improve
> matters by teaching identify_system_timezone() to inspect it.

On my system it's a copy (link count 1, not a symlink). Or did you mean
/etc/localtime? Because that's indeed a symlink.

If I set the system-wide default, using dpkg-reconfigure -plow tzdata,
to UTC I *do* get Etc/UTC.

root@alap4:/home/andres/src/postgresql# cat /etc/timezone
Etc/UTC
root@alap4:/home/andres/src/postgresql# ls -l /etc/timezone
-rw-r--r-- 1 root root 8 Jun  4 15:44 /etc/timezone

selecting default timezone ... Etc/UTC

This is independent of being built with system or non-system tzdata.

Enabling debugging shows:

selecting default timezone ... symbolic link "/etc/localtime" contains "/usr/share/zoneinfo/Etc/UTC"
TZ "Etc/UTC" gets max score 5200
Etc/UTC

Greetings,

Andres Freund



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 > Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
 >> I believe I pointed out a long, long time ago that this tie-breaking
 >> strategy was insane, and that the rule should be to prefer canonical
 >> names and use something else only in the case of a strictly better
 >> match.

 Tom> This is assuming that the tzdb data has a concept of a canonical
 Tom> name for a zone, which unfortunately it does not. UTC, UCT,
 Tom> Etc/UTC, and about four other strings are equivalent names for the
 Tom> same zone so far as one can tell from the installed data.

The simplest definition is that the names listed in zone.tab or
zone1970.tab if you prefer that one are canonical, and Etc/UTC and the
Etc/GMT[offset] names could be regarded as canonical too. Everything
else is either an alias or a backward-compatibility hack.

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-04 11:27:31 -0400, Tom Lane wrote:
> $ TZ=UTC initdb
> ...
> selecting default timezone ... UTC
> ...

Btw, if the input is Etc/UTC, do you also get UTC or Etc/UTC? Because it
seems that Debian only configures Etc/UTC on a system-wide basis
now. Which seems not insane, given that it's a backward compat thing
now.

- Andres



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Christoph" == Christoph Berg <myon@debian.org> writes:

 Christoph> There is something wrong here. On Debian Buster/unstable,
 Christoph> using system tzdata (2019a-1), if /etc/timezone is
 Christoph> "Etc/UTC":

 Christoph> 11.3's initdb adds timezone = 'UCT' to postgresql.conf
 Christoph> 12beta1's initdb adds timezone = 'Etc/UCT' to postgresql.conf

fwiw on FreeBSD with no /etc/localtime and no TZ in the environment (and
hence running on UTC), I get "UCT" on both 11.3 and HEAD.

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-04 08:53:30 -0700, Andres Freund wrote:
> If I set the system-wide default, using dpkg-reconfigure -plow tzdata,
> to UTC I *do* get Etc/UTC.
> 
> root@alap4:/home/andres/src/postgresql# cat /etc/timezone
> Etc/UTC
> root@alap4:/home/andres/src/postgresql# ls -l /etc/timezone
> -rw-r--r-- 1 root root 8 Jun  4 15:44 /etc/timezone
> 
> selecting default timezone ... Etc/UTC
> 
> This is independent of being built with system or non-system tzdata.
>
> Enabling debugging shows:

Sorry, I was not awake enough while reading the thread (and UCT looks so
similar to UTC).

I do indeed see the behaviour of choosing UCT in 11, but not in
12. Independent of system/non-system tzdata. With system tzdata, I get
the following debug output (after filtering out lots of lines with |grep
-v 'scores 0'|grep -v 'uses leap seconds')

TZ "Zulu" gets max score 5200
TZ "UCT" gets max score 5200
TZ "Universal" gets max score 5200
TZ "UTC" gets max score 5200
TZ "Etc/Zulu" gets max score 5200
TZ "Etc/UCT" gets max score 5200
TZ "Etc/Universal" gets max score 5200
TZ "Etc/UTC" gets max score 5200
TZ "localtime" gets max score 5200
TZ "posix/Zulu" gets max score 5200
TZ "posix/UCT" gets max score 5200
TZ "posix/Universal" gets max score 5200
TZ "posix/UTC" gets max score 5200
TZ "posix/Etc/Zulu" gets max score 5200
TZ "posix/Etc/UCT" gets max score 5200
TZ "posix/Etc/Universal" gets max score 5200
TZ "posix/Etc/UTC" gets max score 5200
ok

whereas master only does:

selecting default timezone ... symbolic link "/etc/localtime" contains "/usr/share/zoneinfo/Etc/UTC"
TZ "Etc/UTC" gets max score 5200
Etc/UTC

The reason for the behaviour difference between v12 and 11 is that 12
does:

    /*
     * Try to avoid the brute-force search by seeing if we can recognize the
     * system's timezone setting directly.
     *
     * Currently we just check /etc/localtime; there are other conventions for
     * this, but that seems to be the only one used on enough platforms to be
     * worth troubling over.
     */
    if (check_system_link_file("/etc/localtime", &tt, resultbuf))
        return resultbuf;

which prevents having to iterate through all of these files, and ending
up with a lot of equivalently scored timezones.
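
As a rough illustration of how a zone name can be recovered from the symlink rather than from a search, here is a minimal sketch. It is not the code of check_system_link_file(): the helper names and the fixed /usr/share/zoneinfo/ prefix are assumptions made for the example, and it skips the step of checking that the name found actually matches the observed localtime() behavior. readlink() is POSIX-only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed location of the installed zone files; differs on some systems. */
#define ZONEINFO_PREFIX "/usr/share/zoneinfo/"

/*
 * Given a symlink target such as "/usr/share/zoneinfo/Etc/UTC", return a
 * pointer to the zone name part ("Etc/UTC"), or NULL if the target is not
 * under the assumed prefix.  (Hypothetical helper, for illustration only.)
 */
static const char *
zone_name_from_link_target(const char *target)
{
    size_t plen = strlen(ZONEINFO_PREFIX);

    assert(target != NULL);
    if (strncmp(target, ZONEINFO_PREFIX, plen) != 0 || target[plen] == '\0')
        return NULL;
    return target + plen;
}

/*
 * Read the symlink at "path" into "buf" and extract the zone name.
 * Returns NULL if "path" is not a symlink (for example a plain copy of the
 * zone file), cannot be read, or points outside the assumed prefix.
 */
static const char *
zone_name_from_symlink(const char *path, char *buf, size_t buflen)
{
    ssize_t len;

    assert(path != NULL && buf != NULL && buflen > 1);
    len = readlink(path, buf, buflen - 1);  /* does not NUL-terminate */
    if (len < 0)
        return NULL;
    buf[len] = '\0';
    return zone_name_from_link_target(buf);
}
```

Because the name is taken from the link text, a link to .../Etc/UTC yields "Etc/UTC" without ever comparing it against "UCT" or the other equivalent spellings.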

Greetings,

Andres Freund



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-04 17:20:42 +0100, Andrew Gierth wrote:
> fwiw on FreeBSD with no /etc/localtime and no TZ in the environment (and
> hence running on UTC), I get "UCT" on both 11.3 and HEAD.

That makes sense. As far as I can tell the reason that 12 sometimes ends
up with the proper timezone is that we shortcut the search by:

    /*
     * Try to avoid the brute-force search by seeing if we can recognize the
     * system's timezone setting directly.
     *
     * Currently we just check /etc/localtime; there are other conventions for
     * this, but that seems to be the only one used on enough platforms to be
     * worth troubling over.
     */
    if (check_system_link_file("/etc/localtime", &tt, resultbuf))
        return resultbuf;

which is actually a behaviour change, rather than just an
optimization, when there are a lot of equivalently scoring timezones.

Greetings,

Andres Freund



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Christoph Berg
Date:
Re: Tom Lane 2019-06-04 <65800.1559662051@sss.pgh.pa.us>
> > There is something wrong here. On Debian Buster/unstable, using
> > system tzdata (2019a-1), if /etc/timezone is "Etc/UTC":
> 
> Is your build using --with-system-tzdata?  If so, which tzdb
> release is the system on, and is it a completely stock copy
> of that release?

It's using system tzdata (2019a-1).

There's one single patch on top of that:

https://sources.debian.org/src/tzdata/2019a-1/debian/patches/

> BTW, does Debian set up /etc/timezone as a symlink, by any chance,
> rather than a copy or hard link?  If it's a symlink, we could improve
> matters by teaching identify_system_timezone() to inspect it.

In the meantime I realized that I was only testing /etc/timezone
(which is a plain file with just the zone name), while not touching
/etc/localtime at all. In this environment, it's a symlink:

lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

... but the name still gets canonicalized to Etc/UCT or UCT.

Christoph



[ sorry for slow response, I'm on vacation ]

Andres Freund <andres@anarazel.de> writes:
> That makes sense. As far as I can tell the reason that 12 sometimes ends
> up with the proper timezone is that we shortcut the search by:

>     /*
>      * Try to avoid the brute-force search by seeing if we can recognize the
>      * system's timezone setting directly.
>      *
>      * Currently we just check /etc/localtime; there are other conventions for
>      * this, but that seems to be the only one used on enough platforms to be
>      * worth troubling over.
>      */
>     if (check_system_link_file("/etc/localtime", &tt, resultbuf))
>         return resultbuf;

> which is actually a behaviour change, rather than just an
> optimization, when there are a lot of equivalently scoring timezones.

Sure, that is intentionally a behavior change in this situation.
The theory is that if "Etc/UCT" is what the user put in /etc/localtime,
then that's the spelling she wants.  See 23bd3cec6.

But it seems to me that this code is *not* determining the result in
Christoph's case, because if it were, it'd be settling on Etc/UTC,
according to his followup report that

>> lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

I'm not too familiar with what actually determines glibc's behavior
on Debian, but I'm suspicious that there's an inconsistency between
/etc/localtime and /etc/timezone.  We won't adopt the spelling we
see in /etc/localtime unless it agrees with the observed behavior of
localtime(3).

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-06 12:51:30 -0400, Tom Lane wrote:
> [ sorry for slow response, I'm on vacation ]

Good.


> Andres Freund <andres@anarazel.de> writes:
> > That makes sense. As far as I can tell the reason that 12 sometimes ends
> > up with the proper timezone is that we shortcut the search by:
>
> >     /*
> >      * Try to avoid the brute-force search by seeing if we can recognize the
> >      * system's timezone setting directly.
> >      *
> >      * Currently we just check /etc/localtime; there are other conventions for
> >      * this, but that seems to be the only one used on enough platforms to be
> >      * worth troubling over.
> >      */
> >     if (check_system_link_file("/etc/localtime", &tt, resultbuf))
> >         return resultbuf;
>
> > which is actually a behaviour change, rather than just an
> > optimization, when there are a lot of equivalently scoring timezones.
>
> Sure, that is intentionally a behavior change in this situation.
> The theory is that if "Etc/UCT" is what the user put in /etc/localtime,
> then that's the spelling she wants.  See 23bd3cec6.

Right, I'm not complaining about that. I'm just noting that that
explains the cross-version divergence.

Note that on 11 I *do* end up with some *other* timezone with the newer
timezone data:

$cat /etc/timezone;ls -l /etc/localtime
Etc/UTC
lrwxrwxrwx 1 root root 27 Jun  6 17:02 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

$ rm -rf /tmp/tztest;~/build/postgres/11-assert/install/bin/initdb /tmp/tztest 2>&1|grep -v 'scores 0'|grep -v 'uses leapseconds';grep timezone /tmp/tztest/postgresql.conf
 
...
TZ "Zulu" gets max score 5200
TZ "UCT" gets max score 5200
TZ "Universal" gets max score 5200
TZ "UTC" gets max score 5200
TZ "Etc/Zulu" gets max score 5200
TZ "Etc/UCT" gets max score 5200
TZ "Etc/Universal" gets max score 5200
TZ "Etc/UTC" gets max score 5200
TZ "localtime" gets max score 5200
TZ "posix/Zulu" gets max score 5200
TZ "posix/UCT" gets max score 5200
TZ "posix/Universal" gets max score 5200
TZ "posix/UTC" gets max score 5200
TZ "posix/Etc/Zulu" gets max score 5200
TZ "posix/Etc/UCT" gets max score 5200
TZ "posix/Etc/Universal" gets max score 5200
TZ "posix/Etc/UTC" gets max score 5200
ok
...

log_timezone = 'UCT'
timezone = 'UCT'
#timezone_abbreviations = 'Default'     # Select the set of available time zone
                    # share/timezonesets/.

As you can see the switch from Etc/UTC to UCT does happen here
(presumably in any branch before 12). Which did not happen before the
import of 2019a / when using a system tzdata that's before
that. There you get:

TZ "Zulu" gets max score 5200
TZ "Universal" gets max score 5200
TZ "UTC" gets max score 5200
TZ "Etc/Zulu" gets max score 5200
TZ "Etc/Universal" gets max score 5200
TZ "Etc/UTC" gets max score 5200
ok

and end up with UTC as the selection.

I do think that < 12 clearly regressed here, although it's only exposing
previous behaviour further.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2019-06-06 12:51:30 -0400, Tom Lane wrote:
>> Sure, that is intentionally a behavior change in this situation.
>> The theory is that if "Etc/UCT" is what the user put in /etc/localtime,
>> then that's the spelling she wants.  See 23bd3cec6.

> Right, I'm not complaining about that. I'm just noting that that
> explains the cross-version divergence.

It explains some cross-version divergence for sure.  What I'm still not
clear about is whether Christoph's report is entirely that, or whether
there's some additional factor we don't understand yet.

> As you can see the switch from Etc/UTC to UCT does happen here
> (presumably in any branch before 12). Which did not happen before the
> import of 2019a / when using a system tzdata that's before
> that.

Right.  Before 2019a, UCT would not have been a match to a system
setting of UTC because the zone abbreviation reported by localtime()
was different.  Now it's the same abbreviation.

Maybe we should consider back-patching 23bd3cec6.

            regards, tom lane



Christoph Berg <myon@debian.org> writes:
> In the meantime I realized that I was only testing /etc/timezone
> (which is a plain file with just the zone name), while not touching
> /etc/localtime at all. In this environment, it's a symlink:
> lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC
> ... but the name still gets canonicalized to Etc/UCT or UCT.

Now that I'm home again, I tried to replicate this behavior.  I don't
have Debian Buster installed, but I do have an up-to-date Stretch
install, and I can't get it to do this.  What I see is that

1. HEAD will follow the spelling appearing in /etc/localtime, if that's
a symlink.  It will not pay any attention to /etc/timezone --- but as
far as I can tell, glibc doesn't either.  (For instance, if I remove
/etc/localtime, then date(1) starts reporting UTC, independently of
what /etc/timezone might say.)

2. Pre-v12, or if we can't get a valid zone name out of /etc/localtime,
the identify_system_timezone() search settles on "UCT" as being the
shortest and alphabetically first of the various equivalent names for
the zone.

The only way I can get it to pick "Etc/UCT" is if that's what I put
into /etc/localtime.  (In which case I maintain that that's not a bug,
or at least not our bug.)

So I'm still mystified by Christoph's report, and am forced to suspect
pilot error -- specifically, /etc/localtime not containing what he said.

Anyway, moving on to the question of what should we do about this,
I don't really have anything better to offer than back-patching 23bd3cec6.
I'm fairly hesitant to do that given the small amount of testing it's
gotten ... but given that it's been in the tree since September, maybe
we can feel like we'd have noticed any really bad problems.  I don't have
any use for Andrew's suggestion of looking into zone1970.tab: in the
first place I'm unconvinced that the tzdb guys intend that file to offer
canonical zone names, and in the second place I doubt we can rely on the
file to be present (it's not installed by zic itself), and in the third
place it definitely won't fix this particular issue because it has no
entries for UTC/UCT/GMT etc, only for geographical locations.

Thoughts?

            regards, tom lane


PS: As a side note, I do notice an interesting difference between the
timezone database files as they appear on Debian versus what I see on
RHEL or in a PG-generated timezone tree.  Debian seems to use symlinks
for multiple equivalent zones:

$ ls -l /usr/share/zoneinfo/U??
-rw-r--r-- 1 root root 127 Mar 27 16:34 /usr/share/zoneinfo/UCT
lrwxrwxrwx 1 root root   3 Mar 27 16:34 /usr/share/zoneinfo/UTC -> UCT
$ ls -l /usr/share/zoneinfo/Etc/U??
lrwxrwxrwx 1 root root 6 Mar 27 16:34 /usr/share/zoneinfo/Etc/UCT -> ../UCT
lrwxrwxrwx 1 root root 6 Mar 27 16:34 /usr/share/zoneinfo/Etc/UTC -> ../UCT

but elsewhere these are hard links:

$ ls -l /usr/share/zoneinfo/U??
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/UCT
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/UTC
$ ls -l /usr/share/zoneinfo/Etc/U??
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/Etc/UCT
-rw-r--r--. 8 root root 118 Mar 26 11:37 /usr/share/zoneinfo/Etc/UTC

However, identify_system_timezone() doesn't treat symlinks differently
from regular files, so this doesn't explain anything about the problem
at hand, AFAICS.



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Christoph Berg
Date:
Re: Tom Lane 2019-06-11 <24452.1560285699@sss.pgh.pa.us>
> The only way I can get it to pick "Etc/UCT" is if that's what I put
> into /etc/localtime.  (In which case I maintain that that's not a bug,
> or at least not our bug.)

Did you try a symlink or a plain file for /etc/localtime?

> So I'm still mystified by Christoph's report, and am forced to suspect
> pilot error -- specifically, /etc/localtime not containing what he said.

On Debian unstable, deleting /etc/timezone, $TZ not set, and with this symlink:
lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

/usr/lib/postgresql/11/bin/initdb -D pgdata
$ grep timezone pgdata/postgresql.conf
log_timezone = 'UCT'
timezone = 'UCT'

/usr/lib/postgresql/12/bin/initdb -D pgdata
$ grep timezone pgdata/postgresql.conf
log_timezone = 'Etc/UTC'
timezone = 'Etc/UTC'

Same behavior on Debian Stretch (stable):
lrwxrwxrwx 1 root root 27 Mai  7 11:14 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

$ grep timezone pgdata/postgresql.conf
log_timezone = 'UCT'
timezone = 'UCT'

$ grep timezone pgdata/postgresql.conf
log_timezone = 'Etc/UTC'
timezone = 'Etc/UTC'

> Anyway, moving on to the question of what should we do about this,
> I don't really have anything better to offer than back-patching 23bd3cec6.

The PG12 behavior seems sane, so +1.

Christoph



Christoph Berg <myon@debian.org> writes:
> Re: Tom Lane 2019-06-11 <24452.1560285699@sss.pgh.pa.us>
>> The only way I can get it to pick "Etc/UCT" is if that's what I put
>> into /etc/localtime.  (In which case I maintain that that's not a bug,
>> or at least not our bug.)

> Did you try a symlink or a plain file for /etc/localtime?

Symlink --- if it's a plain file, our code can't learn anything from it.

> On Debian unstable, deleting /etc/timezone, $TZ not set, and with this symlink:
> lrwxrwxrwx 1 root root 27 Mär 28 14:49 /etc/localtime -> /usr/share/zoneinfo/Etc/UTC

> /usr/lib/postgresql/11/bin/initdb -D pgdata
> $ grep timezone pgdata/postgresql.conf
> log_timezone = 'UCT'
> timezone = 'UCT'

> /usr/lib/postgresql/12/bin/initdb -D pgdata
> $ grep timezone pgdata/postgresql.conf
> log_timezone = 'Etc/UTC'
> timezone = 'Etc/UTC'

That's what I'd expect.  Do you think your upthread report of HEAD
picking "Etc/UCT" was a typo?  Or maybe you actually had /etc/localtime
set that way?

>> Anyway, moving on to the question of what should we do about this,
>> I don't really have anything better to offer than back-patching 23bd3cec6.

> The PG12 behavior seems sane, so +1.

OK, I'll make that happen.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >>> Anyway, moving on to the question of what should we do about this,
 >>> I don't really have anything better to offer than back-patching
 >>> 23bd3cec6.

 >> The PG12 behavior seems sane, so +1.

 Tom> OK, I'll make that happen.

This isn't good enough, because it still picks "UCT" on a system with no
/etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
FreeBSD, but it'll be the same anywhere else):

% ls -l /etc/*time*
ls: /etc/*time*: No such file or directory

% env -u TZ bin/initdb -D data -E UTF8 --no-locale
[...]
selecting default timezone ... UCT

We need to absolutely prefer UTC over UCT if both match.

-- 
Andrew (irc:RhodiumToad)



Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> This isn't good enough, because it still picks "UCT" on a system with no
> /etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
> FreeBSD, but it'll be the same anywhere else):

[ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

> We need to absolutely prefer UTC over UCT if both match.

I don't see a reason why that's a hard requirement.  There are at least
two ways for a user to override initdb's decision (/etc/localtime or TZ),
or she could just change the GUC setting after the fact, and for that
matter it's not obvious that it matters to most people how TimeZone
is spelled as long as it delivers the right external behavior.  We had
the business with "Navajo" being preferred for US Mountain time for
quite a few years, with not very many complaints.

I don't see any way that we could "fix" this except with a hardwired
special case to prefer UTC over other spellings, and I definitely do
not want to go there.  If we start putting in magic special cases to make
particular zone names be preferred over other ones, where will we stop?
(I've been lurking on the tzdb mailing list for long enough now to know
that that's a fine recipe for opening ourselves up to politically-
motivated demands that name X be preferred over name Y.)

A possibly better idea is to push back on tzdb's choice to unify
these zones.   Don't know if they'd listen, but we could try.  The
UCT symlink hasn't been out there so long that it's got much inertia.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> This isn't good enough, because it still picks "UCT" on a system with no
 >> /etc/localtime and no TZ variable. Testing on HEAD as of 3da73d683 (on
 >> FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

Literally every server I have set up is like this...

 >> We need to absolutely prefer UTC over UCT if both match.

 Tom> I don't see a reason why that's a hard requirement.

Because the reverse is clearly insane.

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Christopher Browne
Date:


On Fri, Jun 14, 2019, 3:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A possibly better idea is to push back on tzdb's choice to unify
> these zones.   Don't know if they'd listen, but we could try.  The
> UCT symlink hasn't been out there so long that it's got much inertia.

One oddity; AIX had a preference for CUT with fallbacks to CUT0 and UCT back when we had AIX boxes (5.2 or 5.3, if my memory still works on this).

We wound up setting PGTZ explicitly to UTC to overrule any such fighting between time zones.

There may therefore be some older history (and some sort of inertia) in AIX land than meets the eye elsewhere.

That doesn't prevent it from being a good idea to talk to tzdb maintainers, of course.

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> This isn't good enough, because it still picks "UCT" on a system
 >> with no /etc/localtime and no TZ variable. Testing on HEAD as of
 >> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

I'm also reminded that this applies if the /etc/localtime file is a
_copy_ of the UTC zonefile rather than a symlink, which is possibly even
more common.

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >>> This isn't good enough, because it still picks "UCT" on a system
 >>> with no /etc/localtime and no TZ variable. Testing on HEAD as of
 >>> 3da73d683 (on FreeBSD, but it'll be the same anywhere else):

 Tom> [ shrug... ]  Too bad.  I doubt that that's a common situation anyway.

 Andrew> I'm also reminded that this applies also if the /etc/localtime
 Andrew> file is a _copy_ of the UTC zonefile rather than a symlink,
 Andrew> which is possibly even more common.

And testing shows that if you select "UTC" when installing FreeBSD, you
indeed get /etc/localtime as a copy not a symlink, and I've confirmed
that initdb picks "UCT" in that case.

So here is my current proposed fix.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 3477a08efd..f7c199a006 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -128,8 +128,11 @@ pg_load_tz(const char *name)
  * the C library's localtime() function.  The database zone that matches
  * furthest into the past is the one to use.  Often there will be several
  * zones with identical rankings (since the IANA database assigns multiple
- * names to many zones).  We break ties arbitrarily by preferring shorter,
- * then alphabetically earlier zone names.
+ * names to many zones).  We break ties by first checking for "preferred"
+ * names (such as "UTC"), and then arbitrarily by preferring shorter, then
+ * alphabetically earlier zone names.  (If we did not explicitly prefer
+ * "UTC", we would get the alias name "UCT" instead due to alphabetic
+ * ordering.)
  *
  * Many modern systems use the IANA database, so if we can determine the
  * system's idea of which zone it is using and its behavior matches our zone
@@ -602,6 +605,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 #endif
 }
 
+/*
+ * Given a timezone name, determine whether it should be preferred over other
+ * names which are equally good matches. The output is arbitrary but we will
+ * use 0 for "neutral" default preference.
+ *
+ * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
+ * are the ones offered to the user to select from. But for the moment, to
+ * minimize changes in behaviour, simply prefer UTC over alternative spellings
+ * such as UCT that otherwise cause confusion. The existing "shortest first"
+ * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
+ * still preferring Etc/UTC over Etc/UCT).
+ */
+static int
+zone_name_pref(const char *zonename)
+{
+    if (strcmp(zonename, "UTC") == 0)
+        return 50;
+    if (strcmp(zonename, "Etc/UTC") == 0)
+        return 40;
+    return 0;
+}
+
 /*
  * Recursively scan the timezone database looking for the best match to
  * the system timezone behavior.
@@ -674,7 +699,8 @@ scan_available_timezones(char *tzdir, char *tzdirsub, struct tztry *tt,
             else if (score == *bestscore)
             {
                 /* Consider how to break a tie */
-                if (strlen(tzdirsub) < strlen(bestzonename) ||
+                if (zone_name_pref(tzdirsub) > zone_name_pref(bestzonename) ||
+                    strlen(tzdirsub) < strlen(bestzonename) ||
                     (strlen(tzdirsub) == strlen(bestzonename) &&
                      strcmp(tzdirsub, bestzonename) < 0))
                     strlcpy(bestzonename, tzdirsub, TZ_STRLEN_MAX + 1);
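
One detail of the tie-break hunk above is worth spelling out. Read literally, the `||` chain still lets a shorter or alphabetically earlier name replace a preferred one (for example, "UCT" arriving after "UTC" has already been chosen), because the length and strcmp() tests run even when the current best has the higher preference. The following is a minimal sketch of a comparison that consults length and alphabetical order only when the preferences are equal. The helper name tie_break_prefers is hypothetical and not part of the patch; zone_name_pref is copied from the hunk above.

```c
#include <stdbool.h>
#include <string.h>

/* Same preference table as in the patch above. */
static int
zone_name_pref(const char *zonename)
{
    if (strcmp(zonename, "UTC") == 0)
        return 50;
    if (strcmp(zonename, "Etc/UTC") == 0)
        return 40;
    return 0;
}

/*
 * Hypothetical helper (not in the patch): returns true if "candidate" should
 * replace "best" when both names matched the system behavior equally well.
 * Preference is decisive; length and alphabetical order only break ties
 * between names of equal preference.
 */
static bool
tie_break_prefers(const char *candidate, const char *best)
{
    int         namepref = zone_name_pref(candidate) - zone_name_pref(best);

    if (namepref != 0)
        return namepref > 0;
    if (strlen(candidate) != strlen(best))
        return strlen(candidate) < strlen(best);
    return strcmp(candidate, best) < 0;
}
```

With this ordering, "UTC" wins over "UCT" regardless of the order in which the directory scan encounters the two names.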

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Christoph Berg
Date:
Re: Tom Lane 2019-06-14 <26948.1560517875@sss.pgh.pa.us>
> > /usr/lib/postgresql/12/bin/initdb -D pgdata
> > $ grep timezone pgdata/postgresql.conf
> > log_timezone = 'Etc/UTC'
> > timezone = 'Etc/UTC'
> 
> That's what I'd expect.  Do you think your upthread report of HEAD
> picking "Etc/UCT" was a typo?  Or maybe you actually had /etc/localtime
> set that way?

That was likely a typo, yes. Sorry for the confusion, there's many
variables...

Christoph



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> So here is my current proposed fix.

Before pushing a commit that's controversial - and this clearly seems to
somewhat be - it'd be good to give others a heads up that you intend to
do so, so they can object. Rather than just pushing less than 24h later,
without a warning.

- Andres



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > So here is my current proposed fix.
>
> Before pushing a commit that's controversial - and this clearly seems to
> somewhat be - it'd be good to give others a heads up that you intend to
> do so, so they can object. Rather than just pushing less than 24h later,
> without a warning.

Seems like that would have meant a potentially very late commit to avoid
having a broken (for some value of broken anyway) point release (either
with new code, or with reverting the timezone changes previously
committed), which isn't great either.

In general, I agree with you, and we should try to give everyone time to
discuss when something is controversial, but this seems like it was at
least a bit of a tough call.

Thanks,

Stephen


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-17 14:34:58 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > > So here is my current proposed fix.
> > 
> > Before pushing a commit that's controversial - and this clearly seems to
> > somewhat be - it'd be good to give others a heads up that you intend to
> > do so, so they can object. Rather than just pushing less than 24h later,
> > without a warning.
> 
> Seems like that would have meant a potentially very late commit to avoid
> having a broken (for some value of broken anyway) point release (either
> with new code, or with reverting the timezone changes previously
> committed), which isn't great either.

> In general, I agree with you, and we should try to give everyone time to
> discuss when something is controversial, but this seems like it was at
> least a bit of a tough call.

Hm? All I'm saying is that Andrew's email should have included something
to the effect of "Due to the upcoming release, I'm intending to push and
backpatch the attached fix in ~20h".

Greetings,

Andres Freund



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-06-17 14:34:58 -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > On 2019-06-14 23:14:09 +0100, Andrew Gierth wrote:
> > > > So here is my current proposed fix.
> > >
> > > Before pushing a commit that's controversial - and this clearly seems to
> > > somewhat be - it'd be good to give others a heads up that you intend to
> > > do so, so they can object. Rather than just pushing less than 24h later,
> > > without a warning.
> >
> > Seems like that would have meant a potentially very late commit to avoid
> > having a broken (for some value of broken anyway) point release (either
> > with new code, or with reverting the timezone changes previously
> > committed), which isn't great either.
>
> > In general, I agree with you, and we should try to give everyone time to
> > discuss when something is controversial, but this seems like it was at
> > least a bit of a tough call.
>
> Hm? All I'm saying is that Andrew's email should have included something
> to the effect of "Due to the upcoming release, I'm intending to push and
> backpatch the attached fix in ~20h".

Ah, ok, I agree that would have been good to do.  Of course, hindsight
being 20/20 and all that.  Something to keep in mind for the future
though.

Thanks,

Stephen


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Robert Haas
Date:
On Mon, Jun 17, 2019 at 2:41 PM Stephen Frost <sfrost@snowman.net> wrote:
> Ah, ok, I agree that would have been good to do.  Of course, hindsight
> being 20/20 and all that.  Something to keep in mind for the future
> though.

I think it was inappropriate to commit this at all.  You can't just
say "some other committer objects, but I think I'm right so I'll just
ignore them and commit anyway."  If we all do that it'll be chaos.

I don't know exactly how many concurring votes it takes to override
somebody else's -1, but it's got to be more than zero.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 17, 2019 at 2:41 PM Stephen Frost <sfrost@snowman.net> wrote:
>> Ah, ok, I agree that would have been good to do.  Of course, hindsight
>> being 20/20 and all that.  Something to keep in mind for the future
>> though.

> I think it was inappropriate to commit this at all.  You can't just
> say "some other committer objects, but I think I'm right so I'll just
> ignore them and commit anyway."  If we all do that it'll be chaos.

FWIW, that was my concern about this.

> I don't know exactly how many concurring votes it takes to override
> somebody else's -1, but it's got to be more than zero.

If even one other person had +1'd Andrew's proposal, I'd have yielded
to the consensus --- this was certainly an issue on which it's not
totally clear what to do.  But unless I missed some traffic, the vote
was exactly 1 to 1.  There is no way that that represents consensus to
commit.

Also on the topic of process: 48 hours before a wrap deadline is
*particularly* not the time to play fast and loose with this sort of
thing.  It'd have been better to wait till after this week's releases,
so there'd at least be time to reconsider if the patch turned out to
have unexpected side-effects.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
BTW ... now that that patch has been in long enough to collect some
actual data on what it's doing, I set out to scrape the buildfarm logs
to see what is happening in the farm.  Here are the popularities of
various timezone settings, as of the end of May:

      3 America/Los_Angeles
      9 America/New_York
      3 America/Sao_Paulo
      2 Asia/Tokyo
      2 CET
     24 Etc/UTC
      3 Europe/Amsterdam
     11 Europe/Berlin
      1 Europe/Brussels
      1 Europe/Helsinki
      1 Europe/Isle_of_Man
      2 Europe/London
      7 Europe/Paris
      6 Europe/Prague
      5 Europe/Stockholm
      1 ROK
      7 UCT
      1 US/Central
      7 US/Eastern
      2 US/Pacific
     15 UTC
      1 localtime

(These are the zone choices reported in the initdb-C step for the
animal's last successful run before 06-01.  I excluded animals for which
the configuration summary shows that their choice is being forced by a
TZ environment variable.)

As of now, six of the seven UCT-reporting members have switched to UTC;
the lone holdout is elver which hasn't run in ten days.  (Perhaps it
needs unwedged.)  There are no other changes, so it seems like Andrew's
patch is doing what it says on the tin.

However, that one entry for 'localtime' disturbs me. (It's from snapper.)
That seems like a particularly useless choice of representation: it's not
informative, it's not portable, and it would lead to postmaster startup
failure if someone were to remove the machine's localtime file, which
I assume is a nonstandard insertion into /usr/share/zoneinfo.  Very
likely the only reason we don't see this behavior more is that sticking
a "localtime" file into /usr/share/zoneinfo is an obsolescent practice.
On machines that have such a file, it has a good chance of winning on
the grounds of being a short name.

So I'm toying with the idea of extending Andrew's patch to put a negative
preference on "localtime", ensuring we'll use some other name for the zone
if one is available.

Also, now that we have this mechanism, maybe we should charge it with
de-preferencing the old "Factory" zone, removing the hard-wired kluge
that we currently have for rejecting that.  (Modern tzdb doesn't install
"Factory" at all, but some installations might still do so in the service
of blind backwards compatibility.)
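
The idea of a negative preference can be sketched as follows. This is only an illustration: the names zone_name_pref_ext and pick_best_name and the specific negative values are assumptions, not anything proposed in the thread, and the comparison assumes that preference is checked before name length.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical extension of zone_name_pref().  The negative values are
 * arbitrary; they only need to rank below the neutral default of 0.
 */
static int
zone_name_pref_ext(const char *zonename)
{
    if (strcmp(zonename, "UTC") == 0)
        return 50;
    if (strcmp(zonename, "Etc/UTC") == 0)
        return 40;
    if (strcmp(zonename, "localtime") == 0)
        return -50;
    if (strcmp(zonename, "Factory") == 0)
        return -100;
    return 0;
}

/*
 * Pick the winner among names that all matched the system behavior equally
 * well.  Returns NULL if the list is empty.
 */
static const char *
pick_best_name(const char *const *names, size_t n)
{
    const char *best = NULL;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        int         diff;

        if (best == NULL)
        {
            best = names[i];
            continue;
        }
        diff = zone_name_pref_ext(names[i]) - zone_name_pref_ext(best);
        if (diff > 0 ||
            (diff == 0 &&
             (strlen(names[i]) < strlen(best) ||
              (strlen(names[i]) == strlen(best) &&
               strcmp(names[i], best) < 0))))
            best = names[i];
    }
    return best;
}
```

Under these assumptions "localtime" loses to any other equally good name despite being short, but is still returned when it is the only candidate.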

Thoughts?

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Thomas Munro
Date:
On Thu, Jun 20, 2019 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As of now, six of the seven UCT-reporting members have switched to UTC;
> the lone holdout is elver which hasn't run in ten days.  (Perhaps it
> needs unwedged.)  There are no other changes, so it seems like Andrew's
> patch is doing what it says on the tin.

Oops.  Apparently REL_10 of the build farm scripts lost the ability
to find "buildroot" in the current working directory automatically.  I
have updated eelpout and elver's .conf file to have an explicit path,
and they are now busily building stuff.

-- 
Thomas Munro
https://enterprisedb.com



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
I wrote:
> So I'm toying with the idea of extending Andrew's patch to put a negative
> preference on "localtime", ensuring we'll use some other name for the zone
> if one is available.

Oh ... after further review it seems like "posixrules" should be
de-preferred on the same basis: it's uninformative and unportable,
and it's short enough to have a good chance of capturing initdb's
attention.  I recall having seen at least one machine picking it
recently.

Moreover, while I think most tzdb installations have that file (ours
certainly do), the handwriting is on the wall for it to go away,
leaving only postmaster startup failures behind:

http://mm.icann.org/pipermail/tz/2019-June/028172.html

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> So I'm toying with the idea of extending Andrew's patch to put a
 Tom> negative preference on "localtime", ensuring we'll use some other
 Tom> name for the zone if one is available.

 Tom> Also, now that we have this mechanism, maybe we should charge it
 Tom> with de-preferencing the old "Factory" zone, removing the
 Tom> hard-wired kluge that we currently have for rejecting that.
 Tom> (Modern tzdb doesn't install "Factory" at all, but some
 Tom> installations might still do so in the service of blind backwards
 Tom> compatibility.)

I was planning on submitting a follow-up myself (for pg13+) for
discussion of further improvements. My suggestion would be that we
should have the following order of preference, from highest to lowest:

 - UTC  (justified by being an international standard)
 
 - Etc/UTC

 - zones in zone.tab/zone1970.tab:

     These are the zone names that are intended to be presented to the
     user to select from. Dispute the exact meaning as you will, but I
     think it makes sense that these names should be chosen over
     equivalently good matches just on that basis.

 - zones in Africa/ America/ Antarctica/ Asia/ Atlantic/ Australia/
   Europe/ Indian/ Pacific/ Arctic/

     These subdirs are the ones generated by the "primary" zone data
     files, including both Zone and Link statements but not counting
     the "backward" and "etcetera" files.

 - GMT  (justified on the basis of its presence as a default in the code)

 - Etc/*

 - any other zone name with a /

 - any zone name without a /, excluding 'localtime' and 'Factory'

 - 'localtime'

 - 'Factory'

Choosing names with / over ones without is a change from our existing
preference for shorter names, but it's more robust in the face of the
various crap that gets dumped in the top level of the zoneinfo dir.
It could be argued that we should reverse the relative order of UTC vs.
Etc/UTC and likewise for GMT for the same reason, but I think that's
less important.
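
The ordering listed above could be expressed as a tier function along these lines. This is a rough sketch, not a proposed patch: zone_name_tier and has_prefix are hypothetical names, the numeric tiers are arbitrary (higher means more preferred), and membership in zone.tab/zone1970.tab is assumed to be determined by the caller, since the sketch does not parse those files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if s begins with prefix. */
static bool
has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical tier function for the list above; higher is more preferred.
 * in_zone_tab must be supplied by the caller (assumption: it has already
 * looked the name up in zone.tab/zone1970.tab).
 */
static int
zone_name_tier(const char *name, bool in_zone_tab)
{
    static const char *const regions[] = {
        "Africa/", "America/", "Antarctica/", "Asia/", "Atlantic/",
        "Australia/", "Europe/", "Indian/", "Pacific/", "Arctic/"
    };
    size_t      i;

    if (strcmp(name, "UTC") == 0)
        return 10;
    if (strcmp(name, "Etc/UTC") == 0)
        return 9;
    if (in_zone_tab)
        return 8;
    for (i = 0; i < sizeof(regions) / sizeof(regions[0]); i++)
        if (has_prefix(name, regions[i]))
            return 7;
    if (strcmp(name, "GMT") == 0)
        return 6;
    if (has_prefix(name, "Etc/"))
        return 5;
    if (strchr(name, '/') != NULL)
        return 4;               /* any other name with a slash */
    if (strcmp(name, "localtime") == 0)
        return 2;
    if (strcmp(name, "Factory") == 0)
        return 1;
    return 3;                   /* any other name without a slash */
}
```

Names that land in the same tier would still need the existing shorter-then-alphabetical rule to break the tie.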

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom>       1 Europe/Isle_of_Man

Is this from HEAD and therefore possibly getting the value from an
/etc/localtime symlink? I can't see any other way that
Europe/Isle_of_Man could ever be chosen over Europe/London...

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Also on the topic of process: 48 hours before a wrap deadline is
> *particularly* not the time to play fast and loose with this sort of
> thing.  It'd have been better to wait till after this week's releases,
> so there'd at least be time to reconsider if the patch turned out to
> have unexpected side-effects.

Our typical process for changes that actually end up breaking other
things is to put things back the way they were and come up with a
better answer.

Should we have reverted the code change that caused the issue in the
first place, namely, as I understand it at least, the tz code update, to
give us time to come up with a better solution and to fix it properly?

I'll admit that I wasn't following the thread very closely initially,
but I don't recall seeing that even discussed as an option, even though
we do it routinely and even had another such case for this set of
releases.  Possibly a bad assumption on my part, but I did assume that
the lack of such a discussion meant that reverting wasn't really an
option due to the nature of the changes, leading us into an atypical
case already where our usual processes weren't able to be followed.

That doesn't mean we should throw the whole thing out the window either,
certainly, but I'm not sure that between the 3 options of 'revert',
'live with things being arguably broken', and 'push a contentious
commit' that I'd have seen a better option either.

I do agree that it would have been better if intentions had been made
clearer, such as announcing the plan to push the changes so that we
didn't end up with an issue during this patch set (either from out of
date zone information, or from having the wrong timezone alias be used),
but also with feelings on both sides- if there had been a more explicit
"hey, we really need input from someone else on which way they think
this should go" ideally with the options spelled out, it would have
helped.

I don't want to come across as implying that I'm saying what was done
was 'fine', or that we shouldn't be having this conversation, I'm just
trying to figure out how we can frame it in a way that we learn from it
and work to improve on it for the future, should something like this
happen again.

Thanks,

Stephen


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Robert Haas
Date:
On Thu, Jun 20, 2019 at 8:52 AM Stephen Frost <sfrost@snowman.net> wrote:
> I don't want to come across as implying that I'm saying what was done
> was 'fine', or that we shouldn't be having this conversation, I'm just
> trying to figure out how we can frame it in a way that we learn from it
> and work to improve on it for the future, should something like this
> happen again.

I agree that it's a difficult situation.  I do kind of wonder whether
we were altogether overreacting.  If we had shipped it as it was,
what's the worst thing that would have happened?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-06-20 12:02:30 -0400, Robert Haas wrote:
> On Thu, Jun 20, 2019 at 8:52 AM Stephen Frost <sfrost@snowman.net> wrote:
> > I don't want to come across as implying that I'm saying what was done
> > was 'fine', or that we shouldn't be having this conversation, I'm just
> > trying to figure out how we can frame it in a way that we learn from it
> > and work to improve on it for the future, should something like this
> > happen again.
> 
> I agree that it's a difficult situation.  I do kind of wonder whether
> we were altogether overreacting.  If we had shipped it as it was,
> what's the worst thing that would have happened?

I think it's not good, but also nothing particularly bad came out of
it. I don't think we should try to set up procedures for future
occurrences, and rather work/plan on that not happening very often.

Greetings,

Andres Freund



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2019-06-20 12:02:30 -0400, Robert Haas wrote:
> > On Thu, Jun 20, 2019 at 8:52 AM Stephen Frost <sfrost@snowman.net> wrote:
> > > I don't want to come across as implying that I'm saying what was done
> > > was 'fine', or that we shouldn't be having this conversation, I'm just
> > > trying to figure out how we can frame it in a way that we learn from it
> > > and work to improve on it for the future, should something like this
> > > happen again.
> >
> > I agree that it's a difficult situation.  I do kind of wonder whether
> > we were altogether overreacting.  If we had shipped it as it was,
> > what's the worst thing that would have happened?
>
> I think it's not good, but also nothing particularly bad came out of
> it. I don't think we should try to set up procedures for future
> occurrences, and rather work/plan on that not happening very often.

Agreed.

Thanks,

Stephen


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Alvaro Herrera
Date:
On 2019-Jun-20, Andres Freund wrote:

> On 2019-06-20 12:02:30 -0400, Robert Haas wrote:

> > I agree that it's a difficult situation.  I do kind of wonder whether
> > we were altogether overreacting.  If we had shipped it as it was,
> > what's the worst thing that would have happened?
> 
> I think it's not good, but also nothing particularly bad came out of
> it. I don't think we should try to set up procedures for future
> occurrences, and rather work/plan on that not happening very often.

I suppose we could have a moratorium on commits starting from (say) EOB
Wednesday of the week prior to the release; patches can only be
committed after that if they have ample support (where "ample support"
might be defined as having +1 from, say, two other committers).  That
way there's time to discuss/revert/fix anything that is deemed
controversial.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Robert Haas
Date:
On Thu, Jun 20, 2019 at 1:28 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I suppose we could have a moratorium on commits starting from (say) EOB
> Wednesday of the week prior to the release; patches can only be
> committed after that if they have ample support (where "ample support"
> might be defined as having +1 from, say, two other committers).  That
> way there's time to discuss/revert/fix anything that is deemed
> controversial.

Or we could have a moratorium on any change at any time that has a -1
from a committer and a +1 from nobody.

I mean, your idea is not bad either.  I'm just saying.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Jun 20, 2019 at 1:28 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > I suppose we could have a moratorium on commits starting from (say) EOB
> > Wednesday of the week prior to the release; patches can only be
> > committed after that if they have ample support (where "ample support"
> > might be defined as having +1 from, say, two other committers).  That
> > way there's time to discuss/revert/fix anything that is deemed
> > controversial.
>
> Or we could have a moratorium on any change at any time that has a -1
> from a committer and a +1 from nobody.

What about a change that's already been committed but another committer
feels caused a regression?  If that gets a -1, does it get reverted
until things are sorted out, or...?

In the situation that started this discussion, a change had already been
made and it was only later realized that it caused a regression.  Piling
on to that, the regression was entwined with other important changes
that we wanted to include in the release.

Having a system where when the commit was made is a driving factor seems
like it would potentially reward people who pushed a change early by
giving them the upper hand in such a discussion as this.

Ultimately though, I still agree with Andres that this is something we
should act to avoid, and that we shouldn't try to make a policy to fit
what's been a very rare occurrence.  If nothing else, I
feel like we'd probably re-litigate the policy every time since it would
likely have been a long time since the last discussion of it and the
specific circumstances will always be at least somewhat different.

Thanks,

Stephen


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:

 Stephen> In the situation that started this discussion, a change had
 Stephen> already been made and it was only later realized that it
 Stephen> caused a regression.

Just to keep the facts straight:

The regression was introduced by importing tzdb 2019a (in late April)
into the previous round of point releases; the change in UTC behaviour
was not mentioned in the commit and presumably didn't show up on
anyone's radar until there were field complaints (which didn't reach our
mailing lists until Jun 4 as far as I know).

Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th)
addressed only a subset of cases, as far as I know working only on Linux
(the historical convention has always been for /etc/localtime to be a
copy of a zonefile, not a symlink to one). I only decided to write (and
if need be commit) my own followup fix after confirming that the bug was
unfixed in a default FreeBSD install when set to UTC, and there was a
good chance that a number of other less-popular platforms were affected
too.

 Stephen> Piling on to that, the regression was entwined with other
 Stephen> important changes that we wanted to include in the release.

I'm not sure what you're referring to here?

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:
> >>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:
>
>  Stephen> In the situation that started this discussion, a change had
>  Stephen> already been made and it was only later realized that it
>  Stephen> caused a regression.
>
> Just to keep the facts straight:
>
> The regression was introduced by importing tzdb 2019a (in late April)

Ah, thanks, I had misunderstood when that was committed then.

> into the previous round of point releases; the change in UTC behaviour
> was not mentioned in the commit and presumably didn't show up on
> anyone's radar until there were field complaints (which didn't reach our
> mailing lists until Jun 4 as far as I know).

Ok.

> Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th)
> addressed only a subset of cases, as far as I know working only on Linux
> (the historical convention has always been for /etc/localtime to be a
> copy of a zonefile, not a symlink to one). I only decided to write (and
> if need be commit) my own followup fix after confirming that the bug was
> unfixed in a default FreeBSD install when set to UTC, and there was a
> good chance that a number of other less-popular platforms were affected
> too.
>
>  Stephen> Piling on to that, the regression was entwined with other
>  Stephen> important changes that we wanted to include in the release.
>
> I'm not sure what you're referring to here?

I was referring to the fact that the regression was introduced by a,
presumably important, tzdb update (2019a, as mentioned above).  At
least, I made the assumption that the commit of the import of 2019a had
more than just the change that introduced the regression, but I'm happy
to admit I'm nowhere near as close to the code here as you/Tom.

Thanks,

Stephen



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:
> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:
>> Stephen> Piling on to that, the regression was entwined with other
>> Stephen> important changes that we wanted to include in the release.
>> 
>> I'm not sure what you're referring to here?

I was confused by that too.

> I was referring to the fact that the regression was introduced by a,
> presumably important, tzdb update (2019a, as mentioned above).  At
> least, I made the assumption that the commit of the import of 2019a had
> more than just the change that introduced the regression, but I'm happy
> to admit I'm nowhere near as close to the code here as you/Tom.

Keep in mind that dealing with whatever tzdb chooses to ship is not
optional from our standpoint.  Even if we'd refused to import 2019a,
every installation using --with-system-tzdata (which, I sincerely hope,
includes most production installs) is going to have to deal with it
as soon as the respective platform vendor gets around to shipping the
tzdata update.  So reverting that commit was never on the table.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> I was referring to the fact that the regression was introduced by a,
 >> presumably important, tzdb update (2019a, as mentioned above). At
 >> least, I made the assumption that the commit of the import of 2019a
 >> had more than just the change that introduced the regression, but
 >> I'm happy to admit I'm nowhere near as close to the code here as
 >> you/Tom.

 Tom> Keep in mind that dealing with whatever tzdb chooses to ship is
 Tom> not optional from our standpoint. Even if we'd refused to import
 Tom> 2019a, every installation using --with-system-tzdata (which, I
 Tom> sincerely hope, includes most production installs) is going to
 Tom> have to deal with it as soon as the respective platform vendor
 Tom> gets around to shipping the tzdata update. So reverting that
 Tom> commit was never on the table.

Exactly. But that means that if the combination of our arbitrary rules
and the data in the tzdb results in an undesirable result, then we have
no real option but to fix our rules (we can't reasonably expect the tzdb
upstream to choose zone names to make our alphabetical-order preference
come out right).

My commit was intended to be the minimum fix that would restore the
pre-2019a behavior on all systems.

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> Keep in mind that dealing with whatever tzdb chooses to ship is
>  Tom> not optional from our standpoint. Even if we'd refused to import
>  Tom> 2019a, every installation using --with-system-tzdata (which, I
>  Tom> sincerely hope, includes most production installs) is going to
>  Tom> have to deal with it as soon as the respective platform vendor
>  Tom> gets around to shipping the tzdata update. So reverting that
>  Tom> commit was never on the table.

> Exactly. But that means that if the combination of our arbitrary rules
> and the data in the tzdb results in an undesirable result, then we have
> no real option but to fix our rules (we can't reasonably expect the tzdb
> upstream to choose zone names to make our alphabetical-order preference
> come out right).

My position is basically that having TimeZone come out as 'UCT' rather
than 'UTC' (affecting no visible behavior of the timestamp types, AFAIK)
was not such a grave problem as to require violating community norms
to get it fixed in this week's releases rather than the next batch.

I hadn't had time to consider your patch last week because I was (a)
busy with release prep and (b) sick as a dog.  I figured we could let
it slide and discuss it after the release work died down.  I imagine
the reason you got zero other responses was that nobody else thought
it was of life-and-death urgency either.

Anyway, as I said already, my beef is not with the substance of the
patch but with failing to follow community process.  One "yes" vote
and one "no" vote do not constitute consensus.  You had no business
assuming that I would reverse the "no" vote.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Tom's "fix" of backpatching 23bd3cec6 (which happened on Friday 14th)
> addressed only a subset of cases, as far as I know working only on Linux
> (the historical convention has always been for /etc/localtime to be a
> copy of a zonefile, not a symlink to one). I only decided to write (and
> if need be commit) my own followup fix after confirming that the bug was
> unfixed in a default FreeBSD install when set to UTC, and there was a
> good chance that a number of other less-popular platforms were affected
> too.

I think your info is out of date on that.

NetBSD uses a symlink, and has done for at least 5 years: see
set_timezone in
http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.sbin/sysinst/util.c?only_with_tag=MAIN

macOS seems to have done it like that for at least 10 years, too.
I didn't bother digging into their source repo, as it's likely that
System Preferences isn't open-source; but *all* of my macOS machines
have symlinks there, and some of those link files are > 10 years old.

I could not easily find OpenBSD's logic to set the zone during install,
if they have any; but at least their admin-facing documentation says to
create the file as a symlink:
https://www.openbsd.org/faq/faq8.html#TimeZone
and there are plenty of similar recommendations found by Mr. Google.

In short, I think FreeBSD are holdouts not the norm.  I note that
even their code will preserve /etc/localtime's symlink status if
it was a symlink to start with: see install_zoneinfo_file in
https://github.com/freebsd/freebsd/blob/master/usr.sbin/tzsetup/tzsetup.c
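
Whether a given machine follows the symlink convention can be checked with readlink(2). The sketch below is illustrative only: read_link_target and zone_name_from_target are hypothetical helpers, and treating everything after "zoneinfo/" in the link target as the zone name is an assumption that need not hold on every platform.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: if "path" is a symlink, copy its target into buf
 * (NUL-terminated) and return the target length.  Returns -1 if path is not
 * a symlink, cannot be read, or the target might not fit in buf.
 */
static ssize_t
read_link_target(const char *path, char *buf, size_t buflen)
{
    ssize_t     len;

    if (buflen < 2)
        return -1;
    len = readlink(path, buf, buflen - 1);
    /* readlink() truncates silently, so a full buffer is treated as failure */
    if (len < 0 || (size_t) len >= buflen - 1)
        return -1;
    buf[len] = '\0';
    return len;
}

/*
 * Hypothetical helper: return the part of a link target that follows
 * "zoneinfo/", or NULL if the target does not contain that component.
 */
static const char *
zone_name_from_target(const char *target)
{
    const char *marker = "zoneinfo/";
    const char *p = strstr(target, marker);

    return p ? p + strlen(marker) : NULL;
}
```

Calling read_link_target("/etc/localtime", ...) returns -1 on a system where the file is a plain copy of a zone file, which is the case where only matching against observed behavior remains available.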

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Tom Lane
Date:
[ starting to come up for air again after a truly nasty sinus infection...
  fortunately, once I stopped thinking it was "a cold" and went to the
  doctor, antibiotics seem to be working ]

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom>       1 Europe/Isle_of_Man

> Is this from HEAD and therefore possibly getting the value from an
> /etc/localtime symlink? I can't see any other way that
> Europe/Isle_of_Man could ever be chosen over Europe/London...

All of the results I quoted there are HEAD-only, since we did not put
the code to make initdb print its timezone selection into the back
branches until 14-June.

            regards, tom lane



Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> I was planning on submitting a follow-up myself (for pg13+) for
> discussion of further improvements. My suggestion would be that we
> should have the following order of preference, from highest to lowest:

>  - UTC  (justified by being an international standard)
>  - Etc/UTC
>  - zones in zone.tab/zone1970.tab:
>      These are the zone names that are intended to be presented to the
>      user to select from. Dispute the exact meaning as you will, but I
>      think it makes sense that these names should be chosen over
>      equivalently good matches just on that basis.
>  - zones in Africa/ America/ Antarctica/ Asia/ Atlantic/ Australia/
>    Europe/ Indian/ Pacific/ Arctic/
>      These subdirs are the ones generated by the "primary" zone data
>      files, including both Zone and Link statements but not counting
>      the "backward" and "etcetera" files.
>  - GMT  (justified on the basis of its presence as a default in the code)
>  - Etc/*
>  - any other zone name with a /
>  - any zone name without a /, excluding 'localtime' and 'Factory'
>  - 'localtime'
>  - 'Factory'
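
The order of preference quoted above can be sketched as a ranking function. This is an illustration only, not a patch: the function name, the numeric ranks, and the "tab_zones" argument (the set of names read from zone.tab or zone1970.tab, empty if neither is installed) are assumptions made for the example.

```python
# Top-level directories generated by the "primary" zone data files.
AREAS = {"Africa", "America", "Antarctica", "Asia", "Atlantic",
         "Australia", "Europe", "Indian", "Pacific", "Arctic"}


def preference_rank(name, tab_zones=frozenset()):
    """Lower rank means more preferred, following the quoted list."""
    if name == "UTC":
        return 0
    if name == "Etc/UTC":
        return 1
    if name in tab_zones:
        return 2
    if "/" in name and name.split("/", 1)[0] in AREAS:
        return 3
    if name == "GMT":
        return 4
    if name.startswith("Etc/"):
        return 5
    if "/" in name:
        return 6
    if name == "localtime":
        return 8
    if name == "Factory":
        return 9
    return 7  # any other name without a slash
```

Among names that match the system zone equally well, the candidate with the lowest rank would win; names with equal rank still need some further tie-break.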

TBH, I find this borderline insane: it's taking a problem we did not
have and moving the goalposts to the next county.  Not just any
old county, either, but one where there's a shooting war going on.

As soon as you do something like putting detailed preferences into the
zone name selection rules, you are going to be up against problems like
"should Europe/ have priority over Asia/, or vice versa?"  This is not
academic; see for example

Link    Asia/Nicosia    Europe/Nicosia
Link    Europe/Istanbul    Asia/Istanbul    # Istanbul is in both continents.

These choices affect exactly the people who are going to get bent out of
shape because you picked the "wrong" name for their zone.  Doesn't matter
that both names are "wrong" to different subsets.

As long as we have a trivial and obviously apolitical rule like
alphabetical order, I think we can skate over such things; but the minute
we have any sort of human choices involved there, we're going to be
getting politically driven requests to do-it-like-this-because-I-think-
the-default-should-be-that.  Again, trawl the tzdb list archives for
awhile if you think this might not be a problem:
http://mm.icann.org/pipermail/tz/

I think we can get away with fixing simple cases that are directly
caused by tzdb's own idiosyncrasies, ie "localtime" and "posixrules"
and "Factory".  If we go further than that, we *will* regret it.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> TBH, I find this borderline insane: it's taking a problem we did
 Tom> not have and moving the goalposts to the next county. Not just any
 Tom> old county, either, but one where there's a shooting war going on.

 Tom> As soon as you do something like putting detailed preferences into
 Tom> the zone name selection rules, you are going to be up against
 Tom> problems like "should Europe/ have priority over Asia/, or vice
 Tom> versa?"

I would say that this problem exists with arbitrary preferences too.

 Tom> As long as we have a trivial and obviously apolitical rule like
 Tom> alphabetical order, I think we can skate over such things; but the
 Tom> minute we have any sort of human choices involved there, we're
 Tom> going to be getting politically driven requests to
 Tom> do-it-like-this-because-I-think-the-default-should-be-that.

The actual content of the rules I suggested all comes from the tzdb
distribution; anyone complaining can be told to take it up with them.

For the record, this is the list of zones (91 out of 348, or about 26%)
that we currently deduce wrongly, as obtained by trying each zone name
listed in zone1970.tab and seeing which zone we deduce when that zone's
file is copied to /etc/localtime. Note in particular that our arbitrary
rules heavily prefer the deprecated backward-compatibility aliases which
are the most likely to disappear in future versions.

(not all of these are fixable, of course)

Africa/Abidjan -> GMT
Africa/Cairo -> Egypt
Africa/Johannesburg -> Africa/Maseru
Africa/Maputo -> Africa/Harare
Africa/Nairobi -> Africa/Asmara
Africa/Tripoli -> Libya
America/Adak -> US/Aleutian
America/Anchorage -> US/Alaska
America/Argentina/Buenos_Aires -> America/Buenos_Aires
America/Argentina/Catamarca -> America/Catamarca
America/Argentina/Cordoba -> America/Cordoba
America/Argentina/Jujuy -> America/Jujuy
America/Argentina/Mendoza -> America/Mendoza
America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
America/Chicago -> US/Central
America/Creston -> MST
America/Curacao -> America/Aruba
America/Denver -> Navajo
America/Detroit -> US/Michigan
America/Edmonton -> Canada/Mountain
America/Havana -> Cuba
America/Indiana/Indianapolis -> US/East-Indiana
America/Indiana/Knox -> America/Knox_IN
America/Jamaica -> Jamaica
America/Kentucky/Louisville -> America/Louisville
America/Los_Angeles -> US/Pacific
America/Manaus -> Brazil/West
America/Mazatlan -> Mexico/BajaSur
America/Mexico_City -> Mexico/General
America/New_York -> US/Eastern
America/Panama -> EST
America/Phoenix -> US/Arizona
America/Port_of_Spain -> America/Virgin
America/Rio_Branco -> Brazil/Acre
America/Sao_Paulo -> Brazil/East
America/Toronto -> Canada/Eastern
America/Vancouver -> Canada/Pacific
America/Whitehorse -> Canada/Yukon
America/Winnipeg -> Canada/Central
Asia/Dhaka -> Asia/Dacca
Asia/Ho_Chi_Minh -> Asia/Saigon
Asia/Hong_Kong -> Hongkong
Asia/Jerusalem -> Israel
Asia/Kathmandu -> Asia/Katmandu
Asia/Kuala_Lumpur -> Singapore
Asia/Macau -> Asia/Macao
Asia/Riyadh -> Asia/Aden
Asia/Seoul -> ROK
Asia/Shanghai -> PRC
Asia/Singapore -> Singapore
Asia/Taipei -> ROC
Asia/Tehran -> Iran
Asia/Thimphu -> Asia/Thimbu
Asia/Tokyo -> Japan
Asia/Ulaanbaatar -> Asia/Ulan_Bator
Atlantic/Reykjavik -> Iceland
Atlantic/South_Georgia -> Etc/GMT+2
Australia/Adelaide -> Australia/South
Australia/Broken_Hill -> Australia/Yancowinna
Australia/Darwin -> Australia/North
Australia/Lord_Howe -> Australia/LHI
Australia/Melbourne -> Australia/Victoria
Australia/Perth -> Australia/West
Australia/Sydney -> Australia/ACT
Europe/Belgrade -> Europe/Skopje
Europe/Dublin -> Eire
Europe/Istanbul -> Turkey
Europe/Lisbon -> Portugal
Europe/London -> GB
Europe/Moscow -> W-SU
Europe/Warsaw -> Poland
Europe/Zurich -> Europe/Vaduz
Indian/Christmas -> Etc/GMT-7
Indian/Mahe -> Etc/GMT-4
Indian/Reunion -> Etc/GMT-4
Pacific/Auckland -> NZ
Pacific/Chatham -> NZ-CHAT
Pacific/Chuuk -> Pacific/Yap
Pacific/Funafuti -> Etc/GMT-12
Pacific/Gambier -> Etc/GMT+9
Pacific/Guadalcanal -> Etc/GMT-11
Pacific/Honolulu -> US/Hawaii
Pacific/Kwajalein -> Kwajalein
Pacific/Pago_Pago -> US/Samoa
Pacific/Palau -> Etc/GMT-9
Pacific/Pohnpei -> Pacific/Ponape
Pacific/Port_Moresby -> Etc/GMT-10
Pacific/Tahiti -> Etc/GMT+10
Pacific/Tarawa -> Etc/GMT-12
Pacific/Wake -> Etc/GMT-12
Pacific/Wallis -> Etc/GMT-12
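
A rough way to see which names are candidates for this kind of confusion is to group the installed zone files by content. The sketch below is an approximation under a stated assumption: it treats two names as interchangeable only when their files are byte-identical, which is stricter than the behavior-based matching initdb does. The function name is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def ambiguous_groups(zoneinfo_dir):
    """Return sorted lists of zone names whose files have identical bytes."""
    by_digest = defaultdict(list)
    for root, _dirs, files in os.walk(zoneinfo_dir):
        for fname in files:
            full = os.path.join(root, fname)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            rel = os.path.relpath(full, zoneinfo_dir).replace(os.sep, "/")
            by_digest[digest].append(rel)
    # Only digests shared by two or more names are ambiguous.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

Every group returned is a set of names that a copied /etc/localtime cannot distinguish, so whichever tie-break rule is in force decides the result.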

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Thomas Munro
Date:
On Wed, Jun 26, 2019 at 6:32 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Pacific/Auckland -> NZ

Right.  On a FreeBSD system here in New Zealand you get "NZ" with
default configure options (ie using PostgreSQL's tzdata).  But if you
build with --with-system-tzdata=/usr/share/zoneinfo you get
"Pacific/Auckland", and that's because the FreeBSD zoneinfo directory
doesn't include the old non-city names like "NZ", "GB", "Japan",
"US/Eastern" etc.  (Unfortunately the FreeBSD packages for PostgreSQL
are not being built with that option so initdb chooses the old names.
Something to take up with the maintainers.)

-- 
Thomas Munro
https://enterprisedb.com



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Thomas" == Thomas Munro <thomas.munro@gmail.com> writes:

 >> Pacific/Auckland -> NZ

 Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ"
 Thomas> with default configure options (ie using PostgreSQL's tzdata).
 Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo
 Thomas> you get "Pacific/Auckland", and that's because the FreeBSD
 Thomas> zoneinfo directory doesn't include the old non-city names like
 Thomas> "NZ", "GB", "Japan", "US/Eastern" etc. (Unfortunately the
 Thomas> FreeBSD packages for PostgreSQL are not being built with that
 Thomas> option so initdb chooses the old names. Something to take up
 Thomas> with the maintainers.)

Same issue here with Europe/London getting "GB".

-- 
Andrew (irc:RhodiumToad)



Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Thomas" == Thomas Munro <thomas.munro@gmail.com> writes:
>  Thomas> Right. On a FreeBSD system here in New Zealand you get "NZ"
>  Thomas> with default configure options (ie using PostgreSQL's tzdata).
>  Thomas> But if you build with --with-system-tzdata=/usr/share/zoneinfo
>  Thomas> you get "Pacific/Auckland", and that's because the FreeBSD
>  Thomas> zoneinfo directory doesn't include the old non-city names like
>  Thomas> "NZ", "GB", "Japan", "US/Eastern" etc.

> Same issue here with Europe/London getting "GB".

FreeBSD offers yet another obstacle to Andrew's proposal:

$ uname -a
FreeBSD rpi3.sss.pgh.pa.us 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC  arm64
$ ls /usr/share/zoneinfo/
Africa/         Australia/      Etc/            MST             WET
America/        CET             Europe/         MST7MDT         posixrules
Antarctica/     CST6CDT         Factory         PST8PDT         zone.tab
Arctic/         EET             HST             Pacific/
Asia/           EST             Indian/         SystemV/
Atlantic/       EST5EDT         MET             UTC

No zone1970.tab.  I do not think we can rely on that file being there,
since zic itself doesn't install it; it's up to packagers whether or
where to install the "*.tab" files.

In general, the point I'm trying to make is that our policy should be
"Ties are broken arbitrarily, and if you don't like the choice that initdb
makes, here's how to fix it".  As soon as we try to break some ties in
favor of somebody's idea of what is "right", we are in for neverending
problems with different people disagreeing about what is "right", and
insisting that their preference should be the one the code enforces.
Let's *please* not go there, or even within hailing distance of it.

(By this light, even preferring UTC over UCT is a dangerous precedent.
I won't argue for reverting that, but I don't want to go further.)

            regards, tom lane



Further on this --- I now remember that the reason we used to want to
reject the "Factory" timezone is that it used to report this as the
zone abbreviation:

    Local time zone must be set--see zic manual page

which (a) resulted in syntactically invalid timestamp output from the
timeofday() function and (b) completely screwed up the column width
in the pg_timezone_names view.

But since 2016g, it's reported the much-less-insane string "-00".
I propose therefore that it's time to just drop the discrimination
against "Factory", as per attached.  There doesn't seem to be any
reason anymore to forbid people from seeing it in pg_timezone_names
or selecting it as the timezone if they're so inclined.  We would
only have a problem if somebody is using --with-system-tzdata in
a machine where they've not updated the system tzdata since 2016,
and I'm no longer willing to consider that a valid use-case.

            regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 9def318..91b1847 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4845,19 +4845,6 @@ pg_timezone_names(PG_FUNCTION_ARGS)
                          &tzoff, &tm, &fsec, &tzn, tz) != 0)
             continue;            /* ignore if conversion fails */

-        /*
-         * Ignore zic's rather silly "Factory" time zone.  The long string
-         * about "see zic manual page" is used in tzdata versions before
-         * 2016g; we can drop it someday when we're pretty sure no such data
-         * exists in the wild on platforms using --with-system-tzdata.  In
-         * 2016g and later, the time zone abbreviation "-00" is used for
-         * "Factory" as well as some invalid cases, all of which we can
-         * reasonably omit from the pg_timezone_names view.
-         */
-        if (tzn && (strcmp(tzn, "-00") == 0 ||
-                    strcmp(tzn, "Local time zone must be set--see zic manual page") == 0))
-            continue;
-
         /* Found a displayable zone */
         break;
     }
diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index f91fd31..fc86ff0 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -413,12 +413,7 @@ identify_system_timezone(void)
                              &tt,
                              &bestscore, resultbuf);
     if (bestscore > 0)
-    {
-        /* Ignore IANA's rather silly "Factory" zone; use GMT instead */
-        if (strcmp(resultbuf, "Factory") == 0)
-            return NULL;
         return resultbuf;
-    }

     /*
      * Couldn't find a match in the database, so next we try constructed zone

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> No zone1970.tab.

zone.tab is an adequate substitute - a fact which I thought was
sufficiently obvious as to not be worth mentioning.

(also see https://reviews.freebsd.org/D20646 )

 Tom> I do not think we can rely on that file being there, since zic
 Tom> itself doesn't install it; it's up to packagers whether or where
 Tom> to install the "*.tab" files.

The rules I proposed do work almost as well if zone[1970].tab
is absent, though obviously that's not the optimal situation. But are
there any systems which lack it? It's next to impossible to implement a
sane "ask the user what timezone to use" procedure without it.
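
A sketch of the fallback described here, assuming the usual layout of both files: tab-separated columns with the zone name in the third column and "#" starting a comment line. The function name is hypothetical, and returning an empty set when neither file is installed is a design choice of this example, not something taken from a patch.

```python
import os


def load_tab_zones(zoneinfo_dir):
    """Read zone names from zone1970.tab, else zone.tab, else return an empty set."""
    for tab in ("zone1970.tab", "zone.tab"):
        path = os.path.join(zoneinfo_dir, tab)
        if not os.path.isfile(path):
            continue
        zones = set()
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.startswith("#") or not line.strip():
                    continue
                fields = line.rstrip("\n").split("\t")
                if len(fields) >= 3:
                    zones.add(fields[2])  # third column holds the zone name
        return zones
    return set()
```

With an empty set, any rule that prefers tab-listed names simply never fires, and the remaining rules apply unchanged.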

 Tom> In general, the point I'm trying to make is that our policy should
 Tom> be "Ties are broken arbitrarily, and if you don't like the choice
 Tom> that initdb makes, here's how to fix it".

Yes, you've repeated that point at some length, and I am not convinced.
Is anyone else?

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Daniel Gustafsson
Date:
> On 27 Jun 2019, at 00:48, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

> Tom> In general, the point I'm trying to make is that our policy should
> Tom> be "Ties are broken arbitrarily, and if you don't like the choice
> Tom> that initdb makes, here's how to fix it".
>
> Yes, you've repeated that point at some length, and I am not convinced.
> Is anyone else?

I don’t have any insights into the patches committed or proposed.  However,
having been lurking on the tz mailing list for a long time, I totally see where
Tom is coming from with this.

cheers ./daniel


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Stephen Frost
Date:
Greetings,

* Daniel Gustafsson (daniel@yesql.se) wrote:
> > On 27 Jun 2019, at 00:48, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
> > Tom> In general, the point I'm trying to make is that our policy should
> > Tom> be "Ties are broken arbitrarily, and if you don't like the choice
> > Tom> that initdb makes, here's how to fix it".
> >
> > Yes, you've repeated that point at some length, and I am not convinced.
> > Is anyone else?
>
> I don’t have any insights into the patches committed or proposed.  However,
> having been lurking on the tz mailing list for a long time, I totally see where
> Tom is coming from with this.

I understand this concern, but I have to admit that I'm not entirely
thrilled with having the way we pick defaults be based on the concern
that people will complain.  If anything, this community, at least in my
experience, has thankfully been relatively reasonable and I have some
pretty serious doubts that a change like this will suddenly invite the
masses to argue with us or that, should someone try, they'd end up
getting much traction.

On the other hand, picking deprecated spellings is clearly a poor
choice, and we don't prevent people from picking whatever they want to.
I also don't see what Andrew's suggesting as being terribly
controversial, though that's likely because I'm looking through
rose-colored glasses, as the saying goes.  Even with that understanding
though, I tend to side with Andrew on this.

Thanks,

Stephen

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> In general, the point I'm trying to make is that our policy should
>  Tom> be "Ties are broken arbitrarily, and if you don't like the choice
>  Tom> that initdb makes, here's how to fix it".

> Yes, you've repeated that point at some length, and I am not convinced.

[ shrug... ]  You haven't convinced me, either.  By my count we each have
about 0.5 other votes in favor of our positions, so barring more opinions
there's no consensus here for the sort of behavioral change you suggest.

However, not to let the perfect be the enemy of the good, it seems like
nobody has spoken against the ideas of (a) installing negative preferences
for the "localtime" and "posixrules" pseudo-zones, and (b) getting rid of
our now-unnecessary special treatment for "Factory".  How about we do that
much and leave any more-extensive change for another day?

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Robert Haas
Date:
On Tue, Jun 25, 2019 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As long as we have a trivial and obviously apolitical rule like
> alphabetical order, I think we can skate over such things; but the minute
> we have any sort of human choices involved there, we're going to be
> getting politically driven requests to do-it-like-this-because-I-think-
> the-default-should-be-that.  Again, trawl the tzdb list archives for
> awhile if you think this might not be a problem:
> http://mm.icann.org/pipermail/tz/

I'm kind of unsure what to think about this whole debate
substantively. If Andrew is correct that zone.tab or zone1970.tab is a
list of time zone names to be preferred over alternatives, then it
seems like we ought to prefer them. He remarks that we are preferring
"deprecated backward-compatibility aliases" and to the extent that
this is true, it seems like a bad thing. We can't claim to be
altogether apolitical here, because when those deprecated
backward-compatibility names are altogether removed, we are going to
remove them and they're going to stop working. If we know which ones
are likely to suffer that fate eventually, we ought to stop spitting
them out. It's no more political to de-prefer them when upstream does
than it is to remove them when upstream does.

However, I don't know whether Andrew is right about those things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> I'm kind of unsure what to think about this whole debate
> substantively. If Andrew is correct that zone.tab or zone1970.tab is a
> list of time zone names to be preferred over alternatives, then it
> seems like we ought to prefer them.

It's not really clear to me that the IANA folk intend those files to
be read as a list of preferred zone names.  If they do, what are we
to make of the fact that no variant of "UTC" appears in them?

> He remarks that we are preferring
> "deprecated backward-compatibility aliases" and to the extent that
> this is true, it seems like a bad thing. We can't claim to be
> altogether apolitical here, because when those deprecated
> backward-compatibility names are altogether removed, we are going to
> remove them and they're going to stop working. If we know which ones
> are likely to suffer that fate eventually, we ought to stop spitting
> them out. It's no more political to de-prefer them when upstream does
> than it is to remove them when upstream does.

I think that predicting what IANA will do in the future is a fool's
errand.  Our contract is to select some one of the aliases that the
tzdb database presents, not to guess about whether it might present
a different set in the future.  (Also note that a lot of the observed
variation here has to do with whether individual platforms choose to
install backward-compatibility zone names.  I think the odds that
IANA proper will remove those links are near zero; TTBOMK they
never have removed one yet.)

More generally, my unhappiness about Andrew's proposal is:

1. It's solving a problem that just about nobody cares about, as
evidenced by the very tiny number of complaints we've had to date.
As long as the "timezone" setting has the correct external behavior
(UTC offset, DST rules, and abbreviations), very few people notice
it at all.  With the addition of the code to resolve /etc/localtime
when it's a symlink, the population of people who might care has
taken a further huge drop.

2. Changing this behavior might create more problems than it solves.
In particular, it seemed to me that a lot of the complaints in the
UCT/UTC kerfuffle were less about "UCT is a silly name for my zone"
than about "this change broke my regression test that expected
timezone to be set to X in this environment".  Rearranging the tiebreak
rules is just going to make different sets of such people unhappy.
(Admittedly, the symlink-lookup addition has already created some
risk of this ilk.  Maybe we should wait for that to be in the field
for more than a week before we judge whether further hacking is
advisable.)

3. The proposal has technical issues, in particular I'm not nearly
as sanguine as Andrew is about whether we can rely on zone[1970].tab
to be available.

So I'm very unexcited about writing a bunch of new code or opening
ourselves to politically-driven complaints in order to change this.
It seems like a net loss almost independently of the details.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 > Robert Haas <robertmhaas@gmail.com> writes:
 >> I'm kind of unsure what to think about this whole debate
 >> substantively. If Andrew is correct that zone.tab or zone1970.tab is
 >> a list of time zone names to be preferred over alternatives, then it
 >> seems like we ought to prefer them.

 Tom> It's not really clear to me that the IANA folk intend those files
 Tom> to be read as a list of preferred zone names.

The files exist to support user selection of zone names. That is, it is
intended that you can use them to allow the user to choose their country
and then timezone within that country, rather than offering them a flat
regional list (which can be large and the choices non-obvious).

The zone*.tab files therefore include only geographic names, and not
either Posix-style abbreviations or special cases like Etc/UTC. Programs
that use zone*.tab to allow user selection handle cases like that
separately (for example, FreeBSD's tzsetup offers "UTC" at the
"regional" menu).

It's quite possible that people have implemented time zone selection
interfaces that use some other presentation of the list, but that
doesn't particularly diminish the value of zone*.tab. In particular, the
current zone1970.tab has:

  - at least one entry for every iso3166 country code that's not an
    uninhabited remote island;

  - an entry for every distinct "Zone" in the primary data files, with
    the exception of entries that are specifically commented as being
    for backward compatibility (e.g. CET, CST6CDT, etc. - see the
    comments in the europe and northamerica data files for why these
    exist)

The zonefiles that get installed in addition to the ones in zone1970.tab
fall into these categories:

  - they are "Link" entries in the primary data files

  - they are from the "backward" data file, which is omitted in some
    system tzdb installations because it exists only for backward
    compatibility (but we install it because it's still listed in
    tzdata.zi by default)

  - they are from the "etcetera" file, which lists special cases such as
    UTC and fixed UTC offsets
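
For the "backward" category, the alias relationships are visible in the tzdb source file itself. A minimal sketch, assuming the source file's text is available (it is not part of the installed zoneinfo tree) and that each relevant line has the form "Link TARGET LINK-NAME"; the function name is hypothetical.

```python
def parse_links(text):
    """Return {link_name: target} for every 'Link TARGET LINK-NAME' line."""
    links = {}
    for raw in text.splitlines():
        # Drop trailing comments, then split on any run of whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "Link":
            links[fields[2]] = fields[1]
    return links
```

The keys of the result for the "backward" file would be the names to de-prefer; the values say which primary zone each one aliases.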

 Tom> If they do, what are we to make of the fact that no variant of
 Tom> "UTC" appears in them?

That "UTC" is not a geographic timezone name?

 >> He remarks that we are preferring "deprecated backward-compatibility
 >> aliases" and to the extent that this is true, it seems like a bad
 >> thing. We can't claim to be altogether apolitical here, because when
 >> those deprecated backward-compatibility names are altogether
 >> removed, we are going to remove them and they're going to stop
 >> working. If we know which ones are likely to suffer that fate
 >> eventually, we ought to stop spitting them out. It's no more
 >> political to de-prefer them when upstream does than it is to remove
 >> them when upstream does.

 Tom> I think that predicting what IANA will do in the future is a
 Tom> fool's errand.

Maybe so, but when something is explicitly in a file called "backward",
and the upstream-provided Makefile has specific options for omitting it
(even though it is included by default), and all the comments about it
are explicit about it being for backward compatibility, I think it's
reasonable to avoid _preferring_ the names in it.

The list of backward-compatibility zones is in any case extremely
arbitrary and nonsensical: for example "GB", "Eire", "Iceland",
"Poland", "Portugal" are aliases for their respective countries, but
there are no comparable aliases for any other European country. The
"Navajo" entry (an alias for America/Denver) has already been mentioned
in this thread; our arbitrary rule prefers it (due to shortness) for all
US zones that use Mountain time with DST. And so on.
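
To make the "due to shortness" point concrete, here is a sketch of the arbitrary rule as it is described in this thread: among equally good matches, take the shortest name, then the alphabetically first. The function name is hypothetical, and the special-casing of UTC over UCT mentioned elsewhere in the thread is deliberately left out.

```python
def arbitrary_tiebreak(candidates):
    """Pick the shortest name; break remaining ties alphabetically."""
    return min(candidates, key=lambda name: (len(name), name))
```

Given America/Denver, America/Shiprock, US/Mountain and Navajo as equal matches, this picks Navajo; given UTC and UCT, it picks UCT, which is the behavior that started the thread.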

 Tom> Our contract is to select some one of the aliases that the tzdb
 Tom> database presents, not to guess about whether it might present a
 Tom> different set in the future. (Also note that a lot of the observed
 Tom> variation here has to do with whether individual platforms choose
 Tom> to install backward-compatibility zone names. I think the odds
 Tom> that IANA proper will remove those links are near zero; TTBOMK
 Tom> they never have removed one yet.)

Well, we should also consider the possibility that we might be using the
system tzdata and that the upstream OS or distro packager may choose to
remove the "backward" data or split it to a separate package.

 Tom> More generally, my unhappiness about Andrew's proposal is:

 [...]
 Tom> 3. The proposal has technical issues, in particular I'm not nearly
 Tom> as sanguine as Andrew is about whether we can rely on
 Tom> zone[1970].tab to be available.

My proposal works even if it's not, though I don't expect that to be an
issue in practice.

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Robert Haas
Date:
On Thu, Jun 27, 2019 at 1:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's not really clear to me that the IANA folk intend those files to
> be read as a list of preferred zone names.  If they do, what are we
> to make of the fact that no variant of "UTC" appears in them?

I think their intent is key.  We can't make reasonable decisions about
what to do with some data if we don't know what the data is intended
to mean.

> I think that predicting what IANA will do in the future is a fool's
> errand.  Our contract is to select some one of the aliases that the
> tzdb database presents, not to guess about whether it might present
> a different set in the future.  (Also note that a lot of the observed
> variation here has to do with whether individual platforms choose to
> install backward-compatibility zone names.  I think the odds that
> IANA proper will remove those links are near zero; TTBOMK they
> never have removed one yet.)

That doesn't make it a good idea to call Mountain time "Navajo," as
Andrew alleges we are doing.  Then again, the MacBook upon which I am
writing this email thinks that my time zone is "America/New_York,"
whereas I think it is "US/Eastern," which I suppose reinforces your
point about all of this being political. But on the third hand, if
somebody tells me that my time zone is America/New_York, I can say to
myself "oh, they mean Eastern time," whereas if they say that I'm on
"Navajo" time, I'm going to have to sit down with 'diff' and the
zoneinfo files to figure out what that actually means.

I note that https://github.com/eggert/tz/blob/master/backward seems
pretty clear about which things are backward compatibility aliases,
which seems to imply that we would not be taking a political position
separate from the upstream position if we tried to de-prioritize
those.

Also, https://github.com/eggert/tz/blob/master/theory.html says...

Names normally have the form
<var>AREA</var><code>/</code><var>LOCATION</var>, where
<var>AREA</var> is a continent or ocean, and
<var>LOCATION</var> is a specific location within the area.

...which seems to imply that AREA/LOCATION is the "normal" and thus
preferred form, and also that...

The file '<code>zone1970.tab</code>' lists geographical locations used
to name timezones.
It is intended to be an exhaustive list of names for geographic
regions as described above; this is a subset of the timezones in the data.

...which seems to support Andrew's idea that you can identify
AREA/LOCATION time zones by looking in that file.

Long story short, I agree with you that most people probably don't
care about this very much, but I also agree with Andrew that some of
the current choices we're making are pretty strange, and I'm not
convinced as you are that it's impossible to make a principled choice
between alternatives in all cases. The upstream data appears to
contain some information about intent; it's not just a jumble of
exactly-equally-preferred alternatives.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> Long story short, I agree with you that most people probably don't
> care about this very much, but I also agree with Andrew that some of
> the current choices we're making are pretty strange, and I'm not
> convinced as you are that it's impossible to make a principled choice
> between alternatives in all cases. The upstream data appears to
> contain some information about intent; it's not just a jumble of
> exactly-equally-preferred alternatives.

I agree that if there were an easy way to discount the IANA "backward
compatibility" zone names, that'd likely be a reasonable thing to do.
The problem is that those names aren't distinguished from others in
the representation we have available to us (ie, the actual
/usr/share/zoneinfo file tree).  I'm dubious that relying on
zone[1970].tab would improve matters substantially; it would fix
some cases, but I don't think it would fix all of them.  Resolving
all ambiguous zone-name choices is not the charter of those files.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> I'm dubious that relying on zone[1970].tab would improve matters
 Tom> substantially; it would fix some cases, but I don't think it would
 Tom> fix all of them. Resolving all ambiguous zone-name choices is not
 Tom> the charter of those files.

Allowing zone matching by _content_ (as we do) rather than by name does
not seem to be supported in any respect whatever by the upstream data;
we've always been basically on our own with that.

[tl;dr for what follows: my proposal reduces the number of discrepancies
from 91 (see previously posted list) to 16 or 7, none of which are new]

So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT even
down to the abbreviation, and we don't want to guess Africa/Abidjan for
all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes
distinctly-named zones which are distinct only for times in the far past
(before 1920 or 1905 respectively). They are otherwise identical by
content. We therefore end up choosing arbitrarily.

In addition, the following collection of random islands have timezones
which lack local abbreviation names, recent offset changes, or DST, and
are therefore indistinguishable by content from fixed-offset zones like
Etc/GMT+2:

Etc/GMT-4 ==
  Indian/Mahe
  Indian/Reunion

Etc/GMT-7 == Indian/Christmas
Etc/GMT-9 == Pacific/Palau
Etc/GMT-10 == Pacific/Port_Moresby
Etc/GMT-11 == Pacific/Guadalcanal

Etc/GMT-12 ==
  Pacific/Funafuti
  Pacific/Tarawa
  Pacific/Wake
  Pacific/Wallis

Etc/GMT+10 == Pacific/Tahiti
Etc/GMT+9 == Pacific/Gambier

Etc/GMT+2 == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names on the grounds of
length. If we chose to prefer zone.tab names over Etc/* names for all of
these, we'd be ambiguous only for a handful of relatively small islands.
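
As a rough illustration of that preference order (not code from any posted patch), the choice between two names with identical content could be sketched as below. The zone_tab_names array is a hypothetical hard-coded excerpt standing in for a real read of zone.tab, and in_zone_tab and prefer_zone_name are made-up helper names. Names listed in the table win first, then the existing shortest-name rule applies, then alphabetical order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical excerpt of names listed in zone.tab; a real check would
 * read the file shipped with the system's tzdata instead.
 */
static const char *const zone_tab_names[] = {
    "Indian/Mahe", "Indian/Reunion", "Indian/Christmas", "Pacific/Palau",
    "Pacific/Tahiti", "Atlantic/South_Georgia",
};

/* Linear search is enough for a sketch; the real table has a few hundred rows. */
static bool
in_zone_tab(const char *name)
{
    size_t      n = sizeof(zone_tab_names) / sizeof(zone_tab_names[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(zone_tab_names[i], name) == 0)
            return true;
    return false;
}

/* Return whichever of two equal-content zone names should be preferred. */
static const char *
prefer_zone_name(const char *a, const char *b)
{
    bool        a_listed = in_zone_tab(a);
    bool        b_listed = in_zone_tab(b);

    if (a_listed != b_listed)
        return a_listed ? a : b;    /* listed names beat unlisted ones */
    if (strlen(a) != strlen(b))
        return strlen(a) < strlen(b) ? a : b;   /* then shortest first */
    return strcmp(a, b) <= 0 ? a : b;   /* then alphabetical */
}
```

With this ordering Pacific/Palau would be chosen over Etc/GMT-9, while the choice between Indian/Mahe and Indian/Reunion still falls through to the length rule and so remains arbitrary, which is the residual ambiguity described above.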

-- 
Andrew (irc:RhodiumToad)



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Thomas Munro
Date:
On Thu, Jun 27, 2019 at 10:48 AM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> No zone1970.tab.
>
> zone.tab is an adequate substitute - a fact which I thought was
> sufficiently obvious as to not be worth mentioning.
>
> (also see https://reviews.freebsd.org/D20646 )

FWIW this is now fixed for FreeBSD 13-CURRENT, with a good chance of
back-patch.  I don't know if there are any other operating systems
that are shipping zoneinfo but failing to install zone1970.tab, but if
there are it's a mistake IMHO and they'll probably fix that if someone
complains, considering that zone.tab literally tells you to go and use
the newer version, and Paul Eggert has implied that zone1970.tab is
the "full" and "canonical" list[1].

[1] http://mm.icann.org/pipermail/tz/2014-October/021760.html

-- 
Thomas Munro
https://enterprisedb.com



Thomas Munro <thomas.munro@gmail.com> writes:
> FWIW this is now fixed for FreeBSD 13-CURRENT, with a good chance of
> back-patch.  I don't know if there are any other operating systems
> that are shipping zoneinfo but failing to install zone1970.tab, but if
> there are it's a mistake IMHO and they'll probably fix that if someone
> complains, considering that zone.tab literally tells you to go and use
> the newer version, and Paul Eggert has implied that zone1970.tab is
> the "full" and "canonical" list[1].

I'm not sure we're any closer to a meeting of the minds on whether
consulting zone[1970].tab is a good thing to do, but we got an actual
user complaint[1] about how "localtime" should not be a preferred
spelling.  So I want to go ahead and insert the discussed anti-preference
against "localtime" and "posixrules", as per 0001 below.  If we do do
something with zone[1970].tab, we'd still need these special rules,
so I don't think this is blocking anything.
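
For readers who want to see how such an anti-preference interacts with the existing "shortest first" rule, here is a minimal sketch. The zone_name_pref values are the ones in the 0001 patch; candidate_wins_tie is a hypothetical helper name, and the exact tie-break order (preference, then length, then strcmp) is an assumption about how initdb combines them rather than a copy of the real code.

```c
#include <stddef.h>
#include <string.h>

/* Preference values as in the 0001 patch; larger values are more preferred. */
static int
zone_name_pref(const char *zonename)
{
    if (strcmp(zonename, "UTC") == 0)
        return 50;
    if (strcmp(zonename, "Etc/UTC") == 0)
        return 40;
    if (strcmp(zonename, "localtime") == 0 ||
        strcmp(zonename, "posixrules") == 0)
        return -50;
    return 0;
}

/*
 * Hypothetical helper: returns nonzero if "candidate" should replace "best"
 * when both match the system's behavior equally well.
 */
static int
candidate_wins_tie(const char *candidate, const char *best)
{
    int         namepref = zone_name_pref(candidate) - zone_name_pref(best);
    size_t      clen = strlen(candidate);
    size_t      blen = strlen(best);

    if (namepref != 0)
        return namepref > 0;    /* explicit preference decides first */
    if (clen != blen)
        return clen < blen;     /* then the shorter name wins */
    return strcmp(candidate, best) < 0; /* then alphabetical order */
}
```

Under these assumptions any real zone name beats "localtime" even though "localtime" is shorter than most of them, and "localtime" is still chosen if it is the only match.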

Also, I poked into the question of the "Factory" zone a bit more,
and was disappointed to find that not only does FreeBSD still install
the "Factory" zone, but they are apparently hacking the data so that
it emits the two-changes-back abbreviation "Local time zone must be
set--use tzsetup".  This bypasses the filter in pg_timezone_names that
is expressly trying to prevent showing such silly "abbreviations".
So I now feel that not only can we not remove initdb's discrimination
against "Factory", but we indeed need to make the pg_timezone_names
filter more aggressive.  Hence, I now propose 0002 below to tweak
what we're doing with "Factory".  I did remove our special cases for
it in zic.c, as we don't need them anymore with modern tzdb data, and
there's no reason to support running "zic -P" with hacked-up data.
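
The effect of the length-based filter in 0002 can be sketched in isolation as follows. The 31-character limit is the one in the patch; the macro and function names are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; the 0002 patch hard-codes the limit as 31. */
#define MAX_DISPLAYED_ABBREV_LEN 31

/*
 * Mirrors the test "tzn && strlen(tzn) > 31": a NULL abbreviation is not
 * rejected here, only an overly long one.
 */
static bool
abbrev_is_too_long(const char *tzn)
{
    return tzn != NULL && strlen(tzn) > MAX_DISPLAYED_ABBREV_LEN;
}
```

Both of the long "Local time zone must be set" strings are rejected by this test, while ordinary abbreviations pass. Note that "-00" also passes, since the patch as posted no longer tests for it explicitly.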

            regards, tom lane

[1] https://www.postgresql.org/message-id/CADT4RqCCnj6FKLisvT8tTPfTP4azPhhDFJqDF1JfBbOH5w4oyQ@mail.gmail.com

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index a5c9c9e..786e787 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -608,22 +608,28 @@ check_system_link_file(const char *linkname, struct tztry *tt,
 /*
  * Given a timezone name, determine whether it should be preferred over other
  * names which are equally good matches. The output is arbitrary but we will
- * use 0 for "neutral" default preference.
- *
- * Ideally we'd prefer the zone.tab/zone1970.tab names, since in general those
- * are the ones offered to the user to select from. But for the moment, to
- * minimize changes in behaviour, simply prefer UTC over alternative spellings
- * such as UCT that otherwise cause confusion. The existing "shortest first"
- * rule would prefer "UTC" over "Etc/UTC" so keep that the same way (while
- * still preferring Etc/UTC over Etc/UCT).
+ * use 0 for "neutral" default preference; larger values are more preferred.
  */
 static int
 zone_name_pref(const char *zonename)
 {
+    /*
+     * Prefer UTC over alternatives such as UCT.  Also prefer Etc/UTC over
+     * Etc/UCT; but UTC is preferred to Etc/UTC.
+     */
     if (strcmp(zonename, "UTC") == 0)
         return 50;
     if (strcmp(zonename, "Etc/UTC") == 0)
         return 40;
+
+    /*
+     * We don't want to pick "localtime" or "posixrules", unless we can find
+     * no other name for the prevailing zone.  Those aren't real zone names.
+     */
+    if (strcmp(zonename, "localtime") == 0 ||
+        strcmp(zonename, "posixrules") == 0)
+        return -50;
+
     return 0;
 }

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 4d8db1a..972fcd2 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -4826,16 +4826,15 @@ pg_timezone_names(PG_FUNCTION_ARGS)
             continue;            /* ignore if conversion fails */

         /*
-         * Ignore zic's rather silly "Factory" time zone.  The long string
-         * about "see zic manual page" is used in tzdata versions before
-         * 2016g; we can drop it someday when we're pretty sure no such data
-         * exists in the wild on platforms using --with-system-tzdata.  In
-         * 2016g and later, the time zone abbreviation "-00" is used for
-         * "Factory" as well as some invalid cases, all of which we can
-         * reasonably omit from the pg_timezone_names view.
+         * IANA's rather silly "Factory" time zone used to emit ridiculously
+         * long "abbreviations" such as "Local time zone must be set--see zic
+         * manual page" or "Local time zone must be set--use tzsetup".  While
+         * modern versions of tzdb emit the much saner "-00", it seems some
+         * benighted packagers are hacking the IANA data so that it continues
+         * to produce these strings.  To prevent producing a weirdly wide
+         * abbrev column, reject ridiculously long abbreviations.
          */
-        if (tzn && (strcmp(tzn, "-00") == 0 ||
-                    strcmp(tzn, "Local time zone must be set--see zic manual page") == 0))
+        if (tzn && strlen(tzn) > 31)
             continue;

         /* Found a displayable zone */
diff --git a/src/timezone/zic.c b/src/timezone/zic.c
index 95ab854..c27fb45 100644
--- a/src/timezone/zic.c
+++ b/src/timezone/zic.c
@@ -2443,13 +2443,10 @@ writezone(const char *const name, const char *const string, char version,
                     unsigned char tm = types[i];
                     char       *thisabbrev = &thischars[indmap[desigidx[tm]]];

-                    /* filter out assorted junk entries */
-                    if (strcmp(thisabbrev, GRANDPARENTED) != 0 &&
-                        strcmp(thisabbrev, "zzz") != 0)
-                        fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
-                                thisabbrev,
-                                utoffs[tm],
-                                isdsts[tm] ? "\tD" : "");
+                    fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
+                            thisabbrev,
+                            utoffs[tm],
+                            isdsts[tm] ? "\tD" : "");
                 }
             }
             /* Print the default type if we have no transitions at all */
@@ -2458,13 +2455,10 @@ writezone(const char *const name, const char *const string, char version,
                 unsigned char tm = defaulttype;
                 char       *thisabbrev = &thischars[indmap[desigidx[tm]]];

-                /* filter out assorted junk entries */
-                if (strcmp(thisabbrev, GRANDPARENTED) != 0 &&
-                    strcmp(thisabbrev, "zzz") != 0)
-                    fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
-                            thisabbrev,
-                            utoffs[tm],
-                            isdsts[tm] ? "\tD" : "");
+                fprintf(stdout, "%s\t" INT64_FORMAT "%s\n",
+                        thisabbrev,
+                        utoffs[tm],
+                        isdsts[tm] ? "\tD" : "");
             }
         }


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Shay Rojansky
Date:
> I'm not sure we're any closer to a meeting of the minds on whether
> consulting zone[1970].tab is a good thing to do, but we got an actual
> user complaint[1] about how "localtime" should not be a preferred
> spelling.  So I want to go ahead and insert the discussed anti-preference
> against "localtime" and "posixrules", as per 0001 below.  If we do do
> something with zone[1970].tab, we'd still need these special rules,
> so I don't think this is blocking anything.

Just want to stress this point from a PostgreSQL driver maintainer
perspective (see here[1] for the full details). Having "localtime" as the
PostgreSQL timezone basically means that the timezone is completely opaque
from a client point of view - there is no way for clients to know what
actual timezone the server is in, and react to that. This is a limiting
factor in client development, I hope a consensus on this specific point can
be reached.

Shay Rojansky <roji@roji.org> writes:
>> I'm not sure we're any closer to a meeting of the minds on whether
>> consulting zone[1970].tab is a good thing to do, but we got an actual
>> user complaint[1] about how "localtime" should not be a preferred
>> spelling.  So I want to go ahead and insert the discussed anti-preference
>> against "localtime" and "posixrules", as per 0001 below.  If we do do
>> something with zone[1970].tab, we'd still need these special rules,
>> so I don't think this is blocking anything.

> Just want to stress this point from a PostgreSQL driver maintainer
> perspective (see here[1] for the full details). Having "localtime" as the
> PostgreSQL timezone basically means that the timezone is completely opaque
> from a client point of view - there is no way for clients to know what
> actual timezone the server is in, and react to that. This is a limiting
> factor in client development, I hope a consensus on this specific point can
> be reached.

I have in fact committed that patch.  It won't do anything for your
problem with respect to existing installations that may have picked
"localtime", but it'll at least prevent new initdb runs from picking
that.

            regards, tom lane


Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [3754113f3] 2019-07-26 12:45:32 -0400
Branch: REL_12_STABLE [e31dfe99c] 2019-07-26 12:45:52 -0400
Branch: REL_11_STABLE [4459266bf] 2019-07-26 12:45:57 -0400
Branch: REL_10_STABLE [ae9b91be7] 2019-07-26 12:46:03 -0400
Branch: REL9_6_STABLE [51b47471f] 2019-07-26 12:46:10 -0400
Branch: REL9_5_STABLE [9ef811742] 2019-07-26 12:46:15 -0400
Branch: REL9_4_STABLE [6c4ffab76] 2019-07-26 12:46:20 -0400

    Avoid choosing "localtime" or "posixrules" as TimeZone during initdb.

    Some platforms create a file named "localtime" in the system
    timezone directory, making it a copy or link to the active time
    zone file.  If Postgres is built with --with-system-tzdata, initdb
    will see that file as an exact match to localtime(3)'s behavior,
    and it may decide that "localtime" is the most preferred spelling of
    the active zone.  That's a very bad choice though, because it's
    neither informative, nor portable, nor stable if someone changes
    the system timezone setting.  Extend the preference logic added by
    commit e3846a00c so that we will prefer any other zone file that
    matches localtime's behavior over "localtime".

    On the same logic, also discriminate against "posixrules", which
    is another not-really-a-zone file that is often present in the
    timezone directory.  (Since we install "posixrules" but not
    "localtime", this change can affect the behavior of Postgres
    with or without --with-system-tzdata.)

    Note that this change doesn't prevent anyone from choosing these
    pseudo-zones if they really want to (i.e., by setting TZ for initdb,
    or modifying the timezone GUC later on).  It just prevents initdb
    from preferring these zone names when there are multiple matches to
    localtime's behavior.

    Since we generally prefer to keep timezone-related behavior the
    same in all branches, and since this is arguably a bug fix,
    back-patch to all supported branches.

    Discussion: https://postgr.es/m/CADT4RqCCnj6FKLisvT8tTPfTP4azPhhDFJqDF1JfBbOH5w4oyQ@mail.gmail.com
    Discussion: https://postgr.es/m/27991.1560984458@sss.pgh.pa.us



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Shay Rojansky
Date:
Tom,

> I have in fact committed that patch.  It won't do anything for your
> problem with respect to existing installations that may have picked
> "localtime", but it'll at least prevent new initdb runs from picking
> that.

Thanks! At least over time the problem will hopefully diminish.

Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-08-01 10:08:01 -0400, Tom Lane wrote:
> I have in fact committed that patch.  It won't do anything for your
> problem with respect to existing installations that may have picked
> "localtime", but it'll at least prevent new initdb runs from picking
> that.

>     Avoid choosing "localtime" or "posixrules" as TimeZone during initdb.
>     
>     Some platforms create a file named "localtime" in the system
>     timezone directory, making it a copy or link to the active time
>     zone file.  If Postgres is built with --with-system-tzdata, initdb
>     will see that file as an exact match to localtime(3)'s behavior,
>     and it may decide that "localtime" is the most preferred spelling of
>     the active zone.  That's a very bad choice though, because it's
>     neither informative, nor portable, nor stable if someone changes
>     the system timezone setting.  Extend the preference logic added by
>     commit e3846a00c so that we will prefer any other zone file that
>     matches localtime's behavior over "localtime".

When "localtime" is used and is a symlink, could we resolve the symlink
when determining the timezone? That is, when loading a timezone in the
backend, not during initdb. While that'd leave us with the instability,
it would at least help clients etc. understand what the setting actually
means?

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> When "localtime" is used and is a symlink, could we resolve the symlink
> when determining the timezone? That is, when loading a timezone in the
> backend, not during initdb. While that'd leave us with the instability,
> it would at least help clients etc. understand what the setting actually
> means?

The question here is what the string "localtime" means when it's in
the timezone variable.

I guess yes, we could install some show_hook for timezone
that goes and looks to see if it can resolve what that means.
But that sure seems to me to be in you've-got-to-be-kidding
territory.  Especially since the platforms I've seen that
do this tend to use hard links, so that it's questionable
whether the pushups would accomplish anything at all.

            regards, tom lane



Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

From
Andres Freund
Date:
Hi,

On 2019-08-01 13:59:11 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > When "localtime" is used and is a symlink, could we resolve the symlink
> > when determining the timezone? That is, when loading a timezone in the
> > backend, not during initdb. While that'd leave us with the instability,
> > it would at least help clients etc. understand what the setting actually
> > means?
> 
> The question here is what the string "localtime" means when it's in
> the timezone variable.

Right.


> I guess yes, we could install some show_hook for timezone that goes
> and looks to see if it can resolve what that means.  But that sure
> seems to me to be in you've-got-to-be-kidding territory.

Fair enough. I'm mildly worried that people will just carry their
timezone setting from one version's postgresql.conf to the next as they
upgrade.


> Especially since the platforms I've seen that do this tend to use hard
> links, so that it's questionable whether the pushups would accomplish
> anything at all.

Hm, Debian's is a symlink (or rather a chain of them):

$ ls -l /usr/share/zoneinfo/localtime
lrwxrwxrwx 1 root root 14 Jul  4 14:04 /usr/share/zoneinfo/localtime -> /etc/localtime

$ ls -l /etc/localtime
lrwxrwxrwx 1 root root 39 Jul 15 15:40 /etc/localtime -> /usr/share/zoneinfo/America/Los_Angeles

The system installed versions of postgres I have available all ended up
with timezone=localtime.

Not sure how long they've been symlinks. I randomly accessed a backup of
an older Debian installation, from 2014, and there it's a regular file
(with link count 1).

But presumably upgrading such a system would yield a postgresql.conf
that still had localtime, with localtime now being a symlink.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> Fair enough. I'm mildly worried that people will just carry their
> timezone setting from one version's postgresql.conf to the next as they
> upgrade.

Maybe.  I don't believe pg_upgrade copies over the old postgresql.conf,
and I doubt we should consider it good practice in any case.

            regards, tom lane