Re: Tracking timezone abbreviation removals in the IANA tz database - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Tracking timezone abbreviation removals in the IANA tz database
Date
Msg-id CA+TgmoaKHYR34JYo=VFhbj8xPXwGNx5OGYJOg+wguPgzt+Qriw@mail.gmail.com
Whole thread Raw
In response to Tracking timezone abbreviation removals in the IANA tz database  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Tracking timezone abbreviation removals in the IANA tz database  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Sep 2, 2016 at 8:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I had thought that this wouldn't really affect us because PG's
> interpretations of zone abbreviations are driven by the info in
> timezone/tznames/ rather than the IANA files in timezone/data/.
> I forgot however that the "dynamic abbreviation" code relies on
> being able to find matching abbreviations in the IANA data.
> For example, timezone/tznames/Default lists
>
> NOVST   Asia/Novosibirsk  # Novosibirsk Summer Time (obsolete)
>
> but that zone abbreviation now fails entirely:
>
> # select '2016-09-02 08:00:00 NOVST'::timestamptz;
> ERROR:  time zone abbreviation "novst" is not used in time zone "Asia/Novosibirsk"
>
> (This is also the cause of bug #14307.)
>
> So the question is what to do about it.
>
> An easy answer is to just delete such entries from the timezone/tznames/
> files altogether.  If we believe the IANA crew's assumption that nobody
> uses these abbreviations in the real world, then that seems like it
> removes useless maintenance overhead for little cost.  I'm a little
> worried though that somebody might have followed IANA's lead and actually
> started using these abbreviations, in which case we'd get complaints.
>
> Another possibility is to keep these entries but get rid of the dynamic
> zone aliases, reducing them to plain numeric UTC offsets.  Probably the
> value to use would be whatever was shown as the most recent active value
> in the IANA list ... although that's open to interpretation.  For
> instance, according to the above we'd likely define NOVT as UTC+7,
> but then logically NOVST ought to be UTC+8, which doesn't match up
> with the fact that when it actually last appeared in the IANA data
> it meant UTC+7.  So that sounds like it'd be a bit of a mess involving
> some judgment calls.
>
> In the end, part of the reason we've got these files is so that users
> can make their own decisions about abbreviation meanings.  So the
> ultimate answer to any complaints is going to be "if you think NOVT
> means thus-and-such then put in an entry that says so".
>
> So I'm leaning to the just-remove-it answer for any deleted abbreviation
> that relies on a dynamic definition.  Names that never had more than one
> UTC offset can remain in the tznames list.
>
> Comments?

I tried to understand these emails today and mostly failed. I think
this stuff needs to be better documented.
src/timezones/tznames/README pretty much presumes that you more or
less know what's going on:

# This directory contains files with timezone sets for PostgreSQL.  The problem
# is that time zone abbreviations are not unique throughout the world and you
# might find out that a time zone abbreviation in the `Default' set collides
# with the one you wanted to use.  This can be fixed by selecting a timezone
# set that defines the abbreviation the way you want it.  There might already
# be a file here that serves your needs.  If not, you can create your own.

Of course, it doesn't bother to define the term 'timezone set', a
phrase that is not used in any other file in the source tree.
Eventually, after some poking around, I figured out that this is
described in the documentation under "B.3. Date/Time Configuration
Files", but I didn't notice that when I searched the documentation for
it.  I only figured it out after I actually looked at the contents of
"src/timezone/tznames/Default" and saw that it recommended looking at
the "Date/Time Support", which I had done, but which I gave up on
doing pretty quickly because there was no section whose title included
the phrase "time zone".

Once I read that section, I ran across this:

# The offset is the equivalent offset in seconds from UTC, positive being east
# from Greenwich and negative being west. For example, -18000 would be five
# hours west of Greenwich, or North American east coast standard time. D
# indicates that the zone name represents local daylight-savings time rather
# than standard time. Alternatively, a time_zone_name can be given, in which
# case that time zone definition is consulted, and the abbreviation's meaning
# in that zone is used.

I dimly understand that "the abbreviation's meaning in that zone" is
referring to an EST vs. EDT type of distinction, but a novice reader
of this documentation will probably not understand that from this
language.

Similarly, your email mentions "dynamic zone aliases" but:

[rhaas pgsql]$ git grep -i 'dynamic zone alias'
[rhaas pgsql]$ git grep -i 'dynamic zone'
[rhaas pgsql]$ git grep -i 'zone alias'
[rhaas pgsql]$

I'm guessing that's referring to the same thing as the above
documentation paragraph, wherein a "time_zone_name" is given, but it
doesn't really refer to the time zone name, but rather the DST or
non-DST configuration of that time zone.  But I'm not sure I'm right
about that.

Another thing that's not very well-documented is the use of our own
tzdata vs. the system's tzdata.  There is a paragraph on it in the
installation instructions:

https://www.postgresql.org/docs/current/static/install-procedure.html

...but what happens if you use it?  Are dynamic zone aliases and time
zone sets and such still in the picture, or have you forfeited that
stuff by going over to the system's tzdata?

Searching the documentation for "time zone" through the web site
yields a positively uninspiring set of results:

https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=time+zone

I suggest that if you (or someone else who understands it, if there is
any such person) were willing to either improve the existing
documentation or maybe even write a whole new chapter on how we do
time zone handling, it would be most welcome.  I basically feel that
such documentation as we already have presumes expert-level knowledge
of what's in the IANA files, what's in our configuration files, and
how everything fits together; and I doubt many people have that
knowledge.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: Logical Replication WIP
Next
From: Tom Lane
Date:
Subject: Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)