Home > mailing lists

Thread: The contents of the pg_timezone_names view bring some surprises

The contents of the pg_timezone_names view bring some surprises

From

Bryn Llewellyn

Date:

19 May 2021, 06:31:57

Some time zones have abbreviations that are identical to their names. This query:

select name

from pg_timezone_names

where abbrev = name

order by name;

gets this result:

EST
GMT
HST
MST
UCT
UTC

This, in itself, doesn’t seem remarkable. I wondered if any time zones have names that occur as time zone abbreviations but where the name and its abbreviation differ.

select name
from
pg_timezone_names
where
name in (select abbrev from pg_timezone_names) and
name <> abbrev
order by name;

gets this result:

CET
EET

So I wondered what rows have CET or EET as either a name or an abbreviation and yet the name and the abbreviation differ.

select name, abbrev, utc_offset, is_dst
from pg_timezone_names
where
(
name in ('CET', 'EET') or
abbrev in ('CET', 'EET')
)
and name <> abbrev
order by name;

gets this result:

Africa/Algiers | CET | 01:00:00 | f
Africa/Cairo | EET | 02:00:00 | f
Africa/Tripoli | EET | 02:00:00 | f
Africa/Tunis | CET | 01:00:00 | f
CET | CEST | 02:00:00 | t
EET | EEST | 03:00:00 | t
Egypt | EET | 02:00:00 | f
Europe/Kaliningrad | EET | 02:00:00 | f
Libya | EET | 02:00:00 | f

This tells me that when CET is used as a timezone name, it denotes an offset of 02:00—at least at some times of the year. And when the same text is used as an abbrev, it denotes an offset of 01:00.

But you can use either a timezone name, or a timezone abbreviation in the `at time zone` clause (see below).

There’s a similar story for EET where it denotes respectively offsets of 03:00 and 02:00.

Here’s what seems to me to be a closely related dilemma. I’d thought that an abbrev uniquely specified the utc_offset. But this test shows that it doesn’t:

with
  v1 as (
  select distinct abbrev, utc_offset
  from pg_timezone_names),
  v2 as (
  select abbrev, count(*)
  from v1
  group by abbrev
  having count(*) > 1)
select name, abbrev, utc_offset, is_dst
from pg_timezone_names
where abbrev in (select abbrev from v2)
order by abbrev;

It gets 46 rows. Here’s an interesting subset:

America/Monterrey   | CDT   | -05:00:00  | t
America/Havana     | CDT   | -04:00:00  | t

ROC   | CST   | 08:00:00   | f
America/Costa_Rica     | CST   | -06:00:00  | f

Eire     | IST   | 01:00:00   | f
Asia/Kolkata     | IST   | 05:30:00   | f

So here, the same text, even when used as abbrev, can denote different utc_offset values. (But note that there seems to be no way, in the 'at time zone' clause, that I can say that I want a text value to be taken as a name and not as an abbreviation, or vice versa.)

This seems to be at odds with what section “8.5.3. Time Zones” at

https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES

says:

A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.91). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.

This claims (as I read it) that a time zone abbreviation uniquely determines an offset from UTC.

It seems that the result of this is therefore undefined because CDT denotes two different utc_offset values.:

select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'CDT';

The same goes for this:

select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'EET';

In summary, each of these texts, for two kinds of reason, can denote two different utc_offset values.:

CET

EET

CDT

CST

IST

Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?

Re: The contents of the pg_timezone_names view bring some surprises

From

"David G. Johnston"

Date:

19 May 2021, 07:03:22

On Tuesday, May 18, 2021, Bryn Llewellyn <bryn@yugabyte.com> wrote:

Some time zones have abbreviations that are identical to their names. This query:

Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?

You are missing the material in appendix B.4

https://www.postgresql.org/docs/current/datetime-config-files.html

I think that covers dealing with the situations you note.

David J.

Re: The contents of the pg_timezone_names view bring some surprises

From

Tom Lane

Date:

19 May 2021, 13:52:02

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, May 18, 2021, Bryn Llewellyn <bryn@yugabyte.com> wrote:
>> Am I missing an essential clue to resolving what seems to me to be a
>> paradox? Or am I seeing two kinds of bug?

> You are missing the material in appendix B.4
> https://www.postgresql.org/docs/current/datetime-config-files.html

The short answer is that the reason there are two views is that
there are two sources of truth involved.  pg_timezone_names reflects
the zone names defined in the IANA timezone database, while
pg_timezone_abbrevs reflects the abbreviations defined in our
user-customizable abbreviations table.  It'd be impossible to make
them match exactly, and we don't try exceedingly hard.  In particular,
the IANA list has some zones such as "CET" that don't follow their own
continent/city naming convention.  (AFAIK those are all legacy zones
that they'd get rid of if they weren't concerned with backwards
compatibility.)  If those look like abbreviations, which they mostly
do, then it's confusing.

Where the rubber meets the road is in timestamptz input, and there
we consult the abbreviations table first.  (Not sure if that's
documented, but you can easily prove it by experiment.)

As for the question about "abbreviations" like +09 --- those are
not abbreviations at all, they're just hard-coded numeric UTC
offsets.  So they don't appear in pg_timezone_abbrevs.  IANA
uses those as display offsets in zones where there's not any
widely-used-on-the-ground abbreviation.

            regards, tom lane

Re: The contents of the pg_timezone_names view bring some surprises

From

Adrian Klaver

Date:

19 May 2021, 13:57:13

On 5/18/21 11:31 PM, Bryn Llewellyn wrote:
> Some time zones have abbreviations that are identical to their names. 
> This query:
> 
> Here’s what seems to me to be a closely related dilemma. I’d thought 
> that an abbrev uniquely specified the utc_offset. But this test shows 
> that it doesn’t:
> 
> with
>    v1 as (
>      select distinct abbrev, utc_offset
>      from pg_timezone_names),
>    v2 as (
>      select abbrev, count(*)
>      from v1
>      group by abbrev
>      having count(*) > 1)
> select name, abbrev, utc_offset, is_dst
> from pg_timezone_names
> where abbrev in (select abbrev from v2)
> order by abbrev;
> 
> It gets 46 rows. Here’s an interesting subset:
> 
> America/Monterrey              | CDT    | -05:00:00  | t
> America/Havana                 | CDT    | -04:00:00  | t
> 
> ROC                            | CST    | 08:00:00   | f
> America/Costa_Rica             | CST    | -06:00:00  | f
> 
> Eire                           | IST    | 01:00:00   | f
> Asia/Kolkata                   | IST    | 05:30:00   | f
> 
> So here, the same text, even when used as abbrev, can denote different 
> utc_offset values. (But note that there seems to be no way, in the 'at 
> time zone' clause, that I can say that I want a text value to be taken 
> as a name and not as an abbreviation, or vice versa.)
> 
> This seems to be at odds with what section “8.5.3. Time Zones” at
> 
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES 
> <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES>
> 
> says:
> 
> «
> A time zone abbreviation, for example PST. Such a specification merely 
> defines a particular offset from UTC, in contrast to full time zone 
> names which can imply a set of daylight savings transition rules as 
> well. The recognized abbreviations are listed in 
> the pg_timezone_abbrevs view (see Section 51.91). You cannot set the 
> configuration parameters TimeZone or log_timezone to a time zone 
> abbreviation, but you can use abbreviations in date/time input values 
> and with the AT TIME ZONE operator.
> »
> 
> This claims (as I read it) that a time zone abbreviation uniquely 
> determines an offset from UTC.

It says no such thing and would be lying if it did. Take a look at this:

https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations

and see the abbreviations that share offsets.

What it is saying that, for example,  the timezone America/Los_Angeles 
has two timezone abbreviations PDT(what I'm currently in) and PST. If 
you use an abbreviation you don't get the DST transition rules that a 
full timezone name has.

> 
> It seems that the result of this is therefore undefined because CDT 
> denotes two different utc_offset values.:
> 
> select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'CDT';
> 
> The same goes for this:
> 
> select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'EET';
> 
> In summary, each of these texts, for two kinds of reason, can denote two 
> different utc_offset values.:
> 
> CET
> EET
> CDT
> CST
> IST
> 
> Am I missing an essential clue to resolving what seems to me to be a 
> paradox? Or am I seeing two kinds of bug?
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: The contents of the pg_timezone_names view bring some surprises

From

Bryn Llewellyn

Date:

20 May 2021, 00:50:07

tgl@sss.pgh.pa.us wrote:

david.g.johnston@gmail.com wrote:

bryn@yugabyte.com wrote:

Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?

You are missing the material in appendix B.4. Date/Time Configuration Files
https://www.postgresql.org/docs/current/datetime-config-files.html

The short answer is that the reason there are two views is that there are two sources of truth involved. pg_timezone_names reflects the zone names defined in the IANA timezone database, while pg_timezone_abbrevs reflects the abbreviations defined in our user-customizable abbreviations table. It'd be impossible to make them match exactly, and we don't try exceedingly hard. In particular, the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention. (AFAIK those are all legacy zones that they'd get rid of if they weren't concerned with backwards compatibility.) If those look like abbreviations, which they mostly do, then it's confusing.

Where the rubber meets the road is in timestamptz input, and there we consult the abbreviations table first. (Not sure if that's documented, but you can easily prove it by experiment.)

As for the question about "abbreviations" like +09 --- those are not abbreviations at all, they're just hard-coded numeric UTC offsets. So they don't appear in pg_timezone_abbrevs. IANA uses those as display offsets in zones where there's not any widely-used-on-the-ground abbreviation.

Thanks, as ever, David and Tom, for your quick responses. Thanks also to Adrian Klaver, who replied in a branched thread with this—in response to my comment about my reading of the information content of the pg_timezone_abbrevs view: « This claims (as I read it) that a time zone abbreviation uniquely determines an offset from UTC. »

adrian.klaver@aklaver.com wrote:

It says no such thing and would be lying if it did. Take a look at this:

https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations

and see the abbreviations that share offsets.

What it is saying that, for example, the timezone America/Los_Angeles has two timezone abbreviations PDT (what I'm currently in) and PST. If you use an abbreviation you don't get the DST transition rules that a full timezone name has.

______________________________________________________________________

Firstly, David's response.

I (re)read "B.4. Date/Time Configuration Files" and located this directory on my macOS Big Sur for my PG Version 13.2 installation:

/usr/local/share/postgresql/timezonesets

I saw that it has files with names like America.txt, Asia.txt, Europe.txt, and so on. I opened America.txt and read this:

# NOTE:
# This file is NOT loaded by the PostgreSQL database. It just serves as
# a template for timezones you could need. See the `Date/Time Support'
# appendix in the PostgreSQL documentation for more information.

The x-ref'd Appendix doesn't x-ref further to an explanation of the steps to follow in order actually to add timezons that I might need. It does have an entry for PST.

# CONFLICT! PST is not unique
# Other timezones:
# - PST: Philippine Standard Time
PST -28800 # Pacific Standard Time

Note the comment about a conflict. This suggests to naïve readers like me that conflicts are to be avoided. It also has a PDT entry.

I also looked in Asia.txt and found this:

# CONFLICT! PST is not unique
# Other timezones:
# - PST: Pacific Standard Time (America)
PST 28800 # Philippine Standard Time

I s'pose that this is what David wanted me to see. Conflicts are a bad thing—but not disallowed.

This does mean that the outcome of this:

select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'PST';

is undefined (unless I'm missing a statement of the priority rule for handling conflicts elsewhere in the PG doc) as I said in my email that started this thread, using a different example.

______________________________________________________________________

Secondly, Adrians's response.

Yes, the point that a timezone abbreviation does not uniquely determine the timezone offset is taken now. But notice this:

« In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC…»

from

"8.5.3. Time Zones"

https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES

This seems to me to be flat-out wrong. An abbreviation, in general, does not represent a specific offset from UTC. Rather, it can represent two or more different offsets.

This tells me that I must recommend never to use a timezone abbreviation in application code. It's anyway illegal as an argument for "set timezone". (If the same text happens to be a timezone name, well… that's what it is in that context.) And the result of using a timezone abbreviation in the "at time zone" clause is, in general, unspecified.

I'm convinced that the typical use case calls for using time zone names—exactly, as as been said, to benefit from the encoded DST transition rules that this brings. If I want to arrange a Zoom call with a colleague in Helsinki on, say, the Monday after the San Francisco "spring forward" weekend, and I don't remember when Finland springs forward, then I can do this to see what local time it is over there at my 09:00:

select '2021-03-20 09:00:00 America/Los_Angeles'::timestamptz at time zone 'Europe/Helsinki';

And if, for some strange reason, I want to find out what my local time in this example is in a +03:00 timezone (as this is shown in a ::text typecast of timestamptz value and not as POSIX has it) not caring about who does what with DST, I can spell it as I mean it:

select ('2021-03-20 09:00:00 America/Los_Angeles'::timestamptz at time zone 'UTC') + '3 hours'::interval;

This formulation is self-documenting. And it sidesteps all that rubbish about POSIZ saying "up" for what everybody else calls "down".

Because the purpose for the pg_timezone_abbrevs view (as it seems to me) is to control what's legal in the "at time zone" clause, I can forget it. Whatever it might say about utc_offset and is_dst is available, against the unique name, in pg_timezone_names.

This is an instructive example:

set timezone = 'America/Los_Angeles';
select
to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";

This is the result:

1st 1:30 | 2nd 1:30
--------------------+--------------------
01:30:00 PDT (-07) | 01:30:00 PST (-08)

The "(-07)" and "(-08)" convey real information. But the "PDT" and "PST" do not—except in the sense that one is culturally aware and can Google for things. The pg_timezone_names view never has PDT and PST at the same querying moment. Rather, it has only one according to the date. And this

select abbrev, utc_offset, is_dst::text

from pg_timezone_abbrevs

where abbrev in ('PST', 'PDT');

with this result:

PDT | -07:00:00 | true
PST | -08:00:00 | false

gives just an illusion of value because the two abbreviations happen each to be unique in that view. Each could easily have many rows that, for each abbreviation, mix "true" and "false".

______________________________________________________________________

Thirdly, Tom's response.

About « the reason there are two views is that there are two sources of truth involved », well yes… and who wants two sources of truth? All the more reason to forget the pg_timezone_abbrevs view and never to use an abbreviation in the "at time zone" clause.

About « the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention… If those look like abbreviations, which they mostly do, then it's confusing. », what about "UTC" itself. Sources like this"

The Difference Between GMT and UTC

https://www.timeanddate.com/time/gmt-utc-time.html

point out that UTC is a time standard and not a time zone. But it's still useful to have it listed in pg_timezone_names. It's a challenge to work out how to recommend what subset of what's listed there to use. This page:

List of tz database time zones

https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

is helpful because it shows the "cannonical" or not status of each entry. But consider this:

select name, abbrev, utc_offset, is_dst::text
from pg_timezone_names
where
  lower(name) like 'etc/gmt%0%' or
  lower(name) like 'etc/gmt%1%' or
  lower(name) like 'etc/gmt%2%' or
  lower(name) like 'etc/gmt%3%' or
  lower(name) like 'etc/gmt%4%' or
  lower(name) like 'etc/gmt%5%' or
  lower(name) like 'etc/gmt%6%' or
  lower(name) like 'etc/gmt%7%' or
  lower(name) like 'etc/gmt%8%' or
  lower(name) like 'etc/gmt%9%'
order by utc_offset;

with this result:

Etc/GMT+12 | -12   | -12:00:00  | false
Etc/GMT+11 | -11   | -11:00:00  | false
Etc/GMT+10 | -10   | -10:00:00  | false
...
Etc/GMT+1  | -01   | -01:00:00  | false
Etc/GMT+0  | GMT   | 00:00:00   | false
Etc/GMT-0  | GMT   | 00:00:00   | false
Etc/GMT0   | GMT   | 00:00:00   | false
Etc/GMT-1  | +01   | 01:00:00   | false
...
Etc/GMT-12 | +12   | 12:00:00   | false
Etc/GMT-13 | +13   | 13:00:00   | false
Etc/GMT-14 | +14   | 14:00:00   | false

(Notice the "up is down" POSIX silliness.) The Wikipedia "List of tz database time zones" has each of these as canonical. But I intend to recommend avoiding using these for two reasons. First reason: 'cos each name contains a number, it isn't controlled by the names in pg_timezone_names (like David pointed out in another email).

Try this (simulating a keystroke bounce typo):

set timezone = 'Etc/GMT+122';

show timezone;

It's silently accepted and gives "ETC/GMT+122" with "show". And then, in turn, this:

select '2021-03-20 09:00:00'::timestamptz;

gives this result:

2021-03-20 09:00:00-122

Nonsense, eh? As David said, it's an instance of the more general:

set timezone = 'Foo42Bar';

show timezone;

I wish there was a way to turn this off and accept only pg_timestamp_names.name values.

The second reason is that the abbreviations confuse ordinary readers who are slow to remember the "up is down" story.

Re: The contents of the pg_timezone_names view bring some surprises

From

Adrian Klaver

Date:

21 May 2021, 15:25:53

On 5/19/21 5:50 PM, Bryn Llewellyn wrote:

> Thanks, as ever, David and Tom, for your quick responses. Thanks also to 
> Adrian Klaver, who replied in a branched thread with this—in response to 
> my comment about my reading of the information content of the 
> pg_timezone_abbrevs view: « This claims (as I read it) that a time zone 
> abbreviation uniquely determines an offset from UTC. »
> 

> 
> *Secondly, Adrians's response.*
> 
> Yes, the point that a timezone abbreviation does not uniquely determine 
> the timezone offset is taken now. But notice this:
> 
> « In short, this is the difference between abbreviations and full names: 
> abbreviations represent a specific offset from UTC…»
> from
> 
> "8.5.3. Time Zones"
> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES 
> <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES>
> 
> This seems to me to be flat-out wrong. An abbreviation, in general, does 
> not represent a specific offset from UTC. Rather, it can represent two 
> or more different offsets.

It is not flat out wrong. An abbreviation, say the one I'm in now PDT, 
will only represent a specific offset(-07), whereas the timezone I'm in, 
America/Los_Angeles, represents two offsets(-08/-07) the value of which 
depends on the date. Now there maybe another abbreviation that uses that 
same offset, but again it only represents a single offset.

> Nonsense, eh? As David said, it's an instance of the more general:
> 
> set timezone = 'Foo42Bar';
> show timezone;
> 
> I wish there was a way to turn this off and accept only 
> pg_timestamp_names.name values.
> 
> The second reason is that the abbreviations confuse ordinary readers who 
> are slow to remember the "up is down" story.
> 

The issue is you are looking for logic in a system that is based on 
political decisions. For instance there is a brewing West Coast 
movement, whereby the states on the US West Coast are looking to drop 
the DST transition with or without the approval of Congress. COVID 
stalled it, but I expect it will appear again in the near future.

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: The contents of the pg_timezone_names view bring some surprises

From

"Peter J. Holzer"

Date:

22 May 2021, 09:53:48

On 2021-05-18 23:31:57 -0700, Bryn Llewellyn wrote:
> Africa/Algiers     | CET    | 01:00:00   | f
> Africa/Cairo       | EET    | 02:00:00   | f
> Africa/Tripoli     | EET    | 02:00:00   | f
> Africa/Tunis       | CET    | 01:00:00   | f
> CET                | CEST   | 02:00:00   | t
> EET                | EEST   | 03:00:00   | t
> Egypt              | EET    | 02:00:00   | f
> Europe/Kaliningrad | EET    | 02:00:00   | f
> Libya              | EET    | 02:00:00   | f
>
> This tells me that when CET is used as a timezone name, it denotes an offset
> of 02:00—at least at some times of the year. And when the same text is used as
> an abbrev, it denotes an offset of 01:00.

I think you are confused by the fact that it is currently summer (well,
spring) and that some countries in the Central European Time zone
observe DST and others don't.

So "Africa/Tunis" shows up with an abbreviation of "CET" and an offset
of 01:00:00 while "Europe/Vienna" has the abbreviation "CEST" and an
offset of 02:00:00. Two months ago Vienna would also had the
abbreviation "CET" and an offset of 01:00:00.
The time zone "CET" probably reflects what most countries in that zone
do, so it is currently also in DST.

Same for Eastern European Time.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: The contents of the pg_timezone_names view bring some surprises

From

"Peter J. Holzer"

Date:

22 May 2021, 10:09:23

On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote:
> On 5/18/21 11:31 PM, Bryn Llewellyn wrote:
> > This seems to be at odds with what section “8.5.3. Time Zones” at
> >
> > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
<https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES>
> >
> > says:
> >
> > «
> > A time zone abbreviation, for example PST. Such a specification merely
> > defines a particular offset from UTC, in contrast to full time zone
> > names which can imply a set of daylight savings transition rules as
> > well. The recognized abbreviations are listed in
> > the pg_timezone_abbrevs view (see Section 51.91). You cannot set the
> > configuration parameters TimeZone or log_timezone to a time zone
> > abbreviation, but you can use abbreviations in date/time input values
> > and with the AT TIME ZONE operator.
> > »
> >
> > This claims (as I read it) that a time zone abbreviation uniquely
> > determines an offset from UTC.
>
> It says no such thing

Maybe that's the inherent ambiguity of the English language, but to me
"Such a specification defines a particular offset from UTC" does imply a
one-to-one mapping from abbreviation to offset.

> and would be lying if it did.

As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is
unique in that view:

hjp=> select abbrev, count(*) from pg_timezone_abbrevs group by 1 having count(*) > 1;
╔════════╤═══════╗
║ abbrev │ count ║
╟────────┼───────╢
╚════════╧═══════╝
(0 rows)

(at least in PostgreSQL 11.12)

It is not true in "the real world", where different time zones may use
the same abbreviation. But that isn't what the paragraph is about.


> What it is saying that, for example,  the timezone America/Los_Angeles has
> two timezone abbreviations PDT(what I'm currently in) and PST. If you use an
> abbreviation you don't get the DST transition rules that a full timezone
> name has.

Yes, does also say that.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: The contents of the pg_timezone_names view bring some surprises

From

"Peter J. Holzer"

Date:

22 May 2021, 10:36:41

On 2021-05-22 12:09:23 +0200, Peter J. Holzer wrote:
> On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote:
> > On 5/18/21 11:31 PM, Bryn Llewellyn wrote:
> > > This seems to be at odds with what section “8.5.3. Time Zones” at
> > >
> > > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
<https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES>
> > >
> > > says:
> > >
> > > «
> > > A time zone abbreviation, for example PST. Such a specification merely
> > > defines a particular offset from UTC, in contrast to full time zone
> > > names which can imply a set of daylight savings transition rules as
> > > well. The recognized abbreviations are listed in
> > > the pg_timezone_abbrevs view (see Section 51.91). You cannot set the
> > > configuration parameters TimeZone or log_timezone to a time zone
> > > abbreviation, but you can use abbreviations in date/time input values
> > > and with the AT TIME ZONE operator.
> > > »
> > >
> > > This claims (as I read it) that a time zone abbreviation uniquely
> > > determines an offset from UTC.
> >
> > It says no such thing
>
> Maybe that's the inherent ambiguity of the English language, but to me
> "Such a specification defines a particular offset from UTC" does imply a
> one-to-one mapping from abbreviation to offset.

And I just realised that "one-to-one" isn't the right term.
Mathematically it would be "functional": There is exactly one offset for
each abbreviation (never two or more; there might be zero but in that
case one could argue that this isn't actually a time zone abbreviation),
but several abbreviations can map to the same offset.

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: The contents of the pg_timezone_names view bring some surprises

From

Adrian Klaver

Date:

22 May 2021, 15:26:27

On 5/22/21 3:09 AM, Peter J. Holzer wrote:
> On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote:
>> On 5/18/21 11:31 PM, Bryn Llewellyn wrote:
>>> This seems to be at odds with what section “8.5.3. Time Zones” at
>>>
>>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
<https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES>
>>>
>>> says:
>>>
>>> «
>>> A time zone abbreviation, for example PST. Such a specification merely
>>> defines a particular offset from UTC, in contrast to full time zone
>>> names which can imply a set of daylight savings transition rules as
>>> well. The recognized abbreviations are listed in
>>> the pg_timezone_abbrevs view (see Section 51.91). You cannot set the
>>> configuration parameters TimeZone or log_timezone to a time zone
>>> abbreviation, but you can use abbreviations in date/time input values
>>> and with the AT TIME ZONE operator.
>>> »
>>>
>>> This claims (as I read it) that a time zone abbreviation uniquely
>>> determines an offset from UTC.
>>
>> It says no such thing
> 
> Maybe that's the inherent ambiguity of the English language, but to me
> "Such a specification defines a particular offset from UTC" does imply a
> one-to-one mapping from abbreviation to offset.

As your later post points out the reality is a given abbreviation has 
only one offset, but an offset may have many abbreviations.

> 
>> and would be lying if it did.
> 
> As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is
> unique in that view:

1) Go to Postgres cluster and in ~/share/timezonesets search in 
Defaults.txt for the word C(c)onflict.

2)Then go here 
https://www.postgresql.org/docs/current/datetime-config-files.html and 
see how the below query could be made to change if someone is not aware 
of 1).

More below.
> 
> hjp=> select abbrev, count(*) from pg_timezone_abbrevs group by 1 having count(*) > 1;
> ╔════════╤═══════╗
> ║ abbrev │ count ║
> ╟────────┼───────╢
> ╚════════╧═══════╝
> (0 rows)
> 
> (at least in PostgreSQL 11.12)
> 
> It is not true in "the real world", where different time zones may use
> the same abbreviation. But that isn't what the paragraph is about.

~/share/timezonesets encompasses the 'real world'.

> 
> 
>> What it is saying that, for example,  the timezone America/Los_Angeles has
>> two timezone abbreviations PDT(what I'm currently in) and PST. If you use an
>> abbreviation you don't get the DST transition rules that a full timezone
>> name has.
> 
> Yes, does also say that.
> 
>          hp
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: The contents of the pg_timezone_names view bring some surprises

From

"Peter J. Holzer"

Date:

22 May 2021, 17:55:15

On 2021-05-22 08:26:27 -0700, Adrian Klaver wrote:
> On 5/22/21 3:09 AM, Peter J. Holzer wrote:
> > On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote:
> > > On 5/18/21 11:31 PM, Bryn Llewellyn wrote:
> > > > This claims (as I read it) that a time zone abbreviation uniquely
> > > > determines an offset from UTC.
> > >
> > > It says no such thing
> >
> > Maybe that's the inherent ambiguity of the English language, but to me
> > "Such a specification defines a particular offset from UTC" does imply a
> > one-to-one mapping from abbreviation to offset.
>
> As your later post points out the reality is a given abbreviation has only
> one offset, but an offset may have many abbreviations.

Which is what Bryn wrote. (But I realize I'm getting into an argument
about what another person meant - again. I should stop that. If Bryn
thinks it is important to be understood he can explain himself.)

> > > and would be lying if it did.
> >
> > As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is
> > unique in that view:
>
> 1) Go to Postgres cluster and in ~/share/timezonesets search in Defaults.txt
> for the word C(c)onflict.
>
> 2)Then go here
> https://www.postgresql.org/docs/current/datetime-config-files.html and see
> how the below query could be made to change if someone is not aware of 1).

Actually, no. If you introduce a conflict in a timezoneset file the
database doesn't even start:

2021-05-22 19:47:17 CEST [399103]: [1-1] user=,db=,pid=399103 FATAL:  time zone abbreviation "test" is multiply defined
2021-05-22 19:47:17 CEST [399103]: [2-1] user=,db=,pid=399103 DETAIL:  Entry in time zone file "Default", line 634,
conflictswith entry in file "Default", line 635. 
pg_ctl: could not start server

So abbrev is a unique key on pg_timezone_abbrevs and the documentation
is correct.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: The contents of the pg_timezone_names view bring some surprises

From

Adrian Klaver

Date:

22 May 2021, 18:46:35

On 5/22/21 10:55 AM, Peter J. Holzer wrote:
> On 2021-05-22 08:26:27 -0700, Adrian Klaver wrote:
>> On 5/22/21 3:09 AM, Peter J. Holzer wrote:
>>> On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote:
>>>> On 5/18/21 11:31 PM, Bryn Llewellyn wrote:
>>>>> This claims (as I read it) that a time zone abbreviation uniquely
>>>>> determines an offset from UTC.
>>>>
>>>> It says no such thing
>>>
>>> Maybe that's the inherent ambiguity of the English language, but to me
>>> "Such a specification defines a particular offset from UTC" does imply a
>>> one-to-one mapping from abbreviation to offset.
>>
>> As your later post points out the reality is a given abbreviation has only
>> one offset, but an offset may have many abbreviations.
> 
> Which is what Bryn wrote. (But I realize I'm getting into an argument
> about what another person meant - again. I should stop that. If Bryn
> thinks it is important to be understood he can explain himself.)
> 
> 
>>>> and would be lying if it did.
>>>
>>> As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is
>>> unique in that view:
>>
>> 1) Go to Postgres cluster and in ~/share/timezonesets search in Defaults.txt
>> for the word C(c)onflict.
>>
>> 2)Then go here
>> https://www.postgresql.org/docs/current/datetime-config-files.html and see
>> how the below query could be made to change if someone is not aware of 1).
> 
> Actually, no. If you introduce a conflict in a timezoneset file the
> database doesn't even start:
> 
> 2021-05-22 19:47:17 CEST [399103]: [1-1] user=,db=,pid=399103 FATAL:  time zone abbreviation "test" is multiply
defined
> 2021-05-22 19:47:17 CEST [399103]: [2-1] user=,db=,pid=399103 DETAIL:  Entry in time zone file "Default", line 634,
conflictswith entry in file "Default", line 635.
 
> pg_ctl: could not start server
> 
> So abbrev is a unique key on pg_timezone_abbrevs and the documentation
> is correct.

I see no unique key. That led me to the wrong assumption that duplicate 
abbreviations could exist in the view.

I do see this, which I had not looked at before:

https://www.postgresql.org/docs/12/datetime-config-files.html

"The @OVERRIDE syntax indicates that subsequent entries in the file can 
override previous entries (typically, entries obtained from included 
files). Without this, conflicting definitions of the same timezone 
abbreviation are considered an error."

"
Note

If an error occurs while reading the time zone abbreviation set, no new 
value is applied and the old set is kept. If the error occurs while 
starting the database, startup fails.
"

So the error I'm thinking, comes during the reading and processing of 
the file contents in pg_timezone_abbrevs(), which is what the view is 
built on.

I learned something, Postgres errors before allowing a conflict.



> 
>          hp
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: The contents of the pg_timezone_names view bring some surprises

From

Bryn Llewellyn

Date:

23 May 2021, 00:54:07

Tom, David, Adrian, and Peter—thank you all very much for taking an interest in my questions. Your responses have, collectively, been an enormous help. I deleted the text of the exchanges in this particular branch of the overall discussion because it's become rather difficult to work out who said what in response to what. It's all there in the "pgsql-general" email archive.

Here's my summary (in no particular order) of what I've learned. Please feel free to ignore it.

----------------------------------------------------------------------

(1) In the context of discussing timezones, the English word "abbreviation" maps to two distinct notions in PG: "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom earlier said « there are two views [because there are] two sources of truth involved ». This really means that these two columns list different classes of facts. It's therefore unfortunate that they both have the same name.

----------------------------------------------------------------------

(2) It's no wonder, therefore, that I was confused. Anybody with even a slight exposure to relational design would guess that "pg_timezone_names.abbrev" is a FK to the PK in "pg_timezone_abbrevs.abbrev". And they might wonder why "utc_offset" and "is_dst" seem to be denormalized. But they'd probably recall that such things are common, as a usability convenience, in views. Anyway, I'm beyond that confusion now.

----------------------------------------------------------------------

(3) It helps me to think of "pg_timezone_names.abbrev" as "nickname"—which notion is unique, just for each name. It's useful only informally, and only when a particular timezone observes DST, as a shorthand for disambiguation. I used the example earlier:

gets:

1st 1:30 | 2nd 1:30
--------------------+--------------------
01:30:00 PDT (-07) | 01:30:00 PST (-08)

The fact that you can find another nickname spelled "PST" in "pg_timezone_names" doesn't matter: This (as of this email's time stamp):

select name, abbrev, utc_offset from pg_timezone_names where abbrev = 'PST';

gets this:

name | abbrev | utc_offset
-------------+--------+------------
Asia/Manila | PST | 08:00:00

The fact that Asia/Manila happens not to observe DST makes my example a little less powerful. Never mind, their politicians could decide to introduce it presently and to give DST the nickname BST (or DOG). It's remarkable, in itself, that "ST" in "BST" means "Summer Time" but that it means "Standard Time" in "PST". But this nicely unerlines the point that there's no rhyme or reason in the design of these abbreviations.

Notably, the "nickname" in "pg_timezone_names" has nothing at all (formally) to do with "abbrev" in "pg_timezone_abbrevs".

----------------------------------------------------------------------

(4) My realization, as set out in #3, helps me now to understand the rule:

« PostgreSQL allows you to specify time zones in three different forms... A time zone abbreviation, for example PST [but ONLY] in date/time input values and with the AT TIME ZONE operator… Such a specification merely defines a particular offset from UTC... The recognized abbreviations are listed in the pg_timezone_abbrevs view »

This (to my embarrassment, now) does say that a "nickname" from "pg_timezone_names" is not allowed as the argument for the AT TIME ZONE operator. But it doesn't adumbrate that the English word "abbreviation" means what I now understand it to mean. You need to get the point from one terse sentence without the luxury of examples to strengthen the explanation.

I mentioned somewhere that I had discovered that this:

select * from pg_timezone_names where abbrev = 'CAT';

gets lots of rows (all with utc_offset = '02:00'::interval) but that this:

select * from pg_timezone_abbrevs where abbrev = 'CAT';

gets no rows. This informs the test that Tom suggested that I might do:

select '2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'CAT';

It causes this error:

22023: time zone "CAT" not recognized

Notice that this:

select '2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'CAT42';

silently succeeds. Sure enough, the doc does say « PostgreSQL will accept POSIX-style time zone specifications, as described in Section B.5. [as the argument for AT TIME ZONE] » But what a quagmire of confusion this is. I said elsewhere that the same aim (get the local time at -42 hours w.r.t. UTC) can be met with transparent syntax, thus:

select ('2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'UTC') - '42:00'::interval;

I know which syntax I prefer!

----------------------------------------------------------------------

(5) I was embarrassingly slow to find this in the doc for the SET command:

« Timezone settings given as numbers or intervals are internally translated to POSIX timezone syntax. For example, after SET TIME ZONE -7, SHOW TIME ZONE would report <-07>+07. »

Search tools are famously feeble for finding such things. This instructs the careful reader to beware of this trap:

-- OK.

set timezone = 'America/Los_Angeles';

-- Clear error 22023: invalid value for parameter "TimeZone".

set timezone = 'America/Los_Angelez';

-- Silently succeeds.

set timezone = 'America/Los_Angele9';

(David pointed this out.) This is confusing in itself. And it's compounded by this fact: POSIX uses positive offsets to denote what both the utc_offset column and to_char(<some timestamptz value>), with the format element TH:TM, shows as negative. But it is what it is.

----------------------------------------------------------------------

(6) To the extent that a catalog view can have a business unique key (and I believe that this must be the case for it to be useful), the unique key for "pg_timezone_names" is "name" and the unique key for "pg_timezone_abbrevs" is "abbrev". I'm going to assume that PG has code to maintain these invariants. This gives me the way to interpret the statement "an [abbreviation] specification merely defines a particular offset from UTC".

Peter showed us a query like this:

select count(*) from (
select abbrev from pg_timezone_abbrevs
group by abbrev
having count(*) > 1) as a;

In my un-tampered-with PG 13.2 env, it gets a count of zero. I like this.

Adrian pointed me to the "B.4. Date/Time Configuration Files" Appendix here:

https://www.postgresql.org/docs/current/datetime-config-files.html

and to this in particular:

«
The @OVERRIDE syntax indicates that subsequent entries in the file can override previous entries (typically, entries obtained from included files). Without this, conflicting definitions of the same timezone abbreviation are considered an error.

So even without "timezone_abbreviations" set to "default" at session level, "pg_timezone_abbrevs.abbrev" will always be unique.

----------------------------------------------------------------------

(7) There are two different syntaxes for setting the timezone session environment variable. This (or with TO instead of =):

set timezone =

and this:

set time zone

This, too, is hugely confusing. (Correspondingly, there's both "show timezone" and "show time zone".)

The first "set" (and "show") alternative is consistent with how you set all other session environment variables. Moreover, all are observable with the current_setting() built-in function. And there the argument "timezone" can only be spelled as one word. This makes me favor the "set timezone" spelling.

However, only the "set time zone" spelling allows an argument that's an explicit interval value like this:

set time zone interval '-08:00';

I said "explicit interval value" because this isn't true. For example, these two fail:

set time zone '-08:00'::interval;

and

set time zone make_interval(hours=>-8);

both fail, even though each uses a legal interval value. This is confusion on top of confusion. All the more reason to avoid it.

Re: The contents of the pg_timezone_names view bring some surprises

From

Adrian Klaver

Date:

23 May 2021, 15:44:58

On 5/22/21 5:52 PM, Bryn Llewellyn wrote:
> Tom, David, Adrian, and Peter—thank you all very much for taking an 
> interest in my questions. Your responses have, collectively, been an 
> enormous help. I deleted the text of the exchanges in this particular 
> branch of the overall discussion because it's become rather difficult to 
> work out who said what in response to what. It's all there in the 
> "pgsql-general" email archive.
> 
> Here's my summary (in no particular order) of what I've learned. Please 
> feel free to ignore it.
> 
> ----------------------------------------------------------------------
> 
> (1) In the context of discussing timezones, the English word 
> "abbreviation" maps to two distinct notions in PG: 
> "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom 
> earlier said « there are two views [because there are] two sources of 
> truth involved ». This really means that these two columns list 
> different classes of facts. It's therefore unfortunate that they both 
> have the same name.

They are reporting the same facts, just in different context. 
pg_timezone_abbrevs.abbrev is a list of un-duplicated, as I found out, 
abbreviations. pg_timezone_names.abbrev shows the abbreviations or 
offsets in effect at CURRENT_TIMESTAMP.

> 
> ----------------------------------------------------------------------
> 
> (2) It's no wonder, therefore, that I was confused. Anybody with even a 
> slight exposure to relational design would guess 
> that "pg_timezone_names.abbrev" is a FK to the PK in 
> "pg_timezone_abbrevs.abbrev". And they might wonder why "utc_offset" and 
> "is_dst" seem to be denormalized. But they'd probably recall that such 
> things are common, as a usability convenience, in views. Anyway, I'm 
> beyond that confusion now.

Since the views are based on functions that are displaying different 
context I don't find it surprising.

> 
> ----------------------------------------------------------------------
> 
> (3) It helps me to think of "pg_timezone_names.abbrev" as 
> "nickname"—which notion is unique, just for each name. It's useful only 
> informally, and only when a particular timezone observes DST, as a 
> shorthand for disambiguation. I used the example earlier:
> 

> 
> The fact that Asia/Manila happens not to observe DST makes my example a 
> little less powerful. Never mind, their politicians could decide to 
> introduce it presently and to give DST the nickname BST (or DOG). It's 
> remarkable, in itself, that "ST" in "BST" means "Summer Time" but that 
> it means "Standard Time" in "PST". But this nicely unerlines the point 
> that there's no rhyme or reason in the design of these abbreviations.
> 
> Notably, the "nickname" in "pg_timezone_names" has nothing at all 
> (formally) to do with "abbrev" in "pg_timezone_abbrevs".

Yes it does. It just there is not always a one-to-one correspondence 
between the two. For instance duplicate abbreviations do exist but they 
are filtered out of pg_timezone_abbrevs.

> 
> ----------------------------------------------------------------------
> 
> (4) My realization, as set out in #3, helps me now to understand the rule:
> 
> « PostgreSQL allows you to specify time zones in three different 
> forms... A time zone abbreviation, for example PST [but ONLY] in 
> date/time input values and with the AT TIME ZONE operator… Such a 
> specification merely defines a particular offset from UTC... The 
> recognized abbreviations are listed in the pg_timezone_abbrevs view »
> 
> This (to my embarrassment, now) does say that a "nickname" from 
> "pg_timezone_names" is not allowed as the argument for the AT TIME ZONE 
> operator. But it doesn't adumbrate that the English word "abbreviation" 
> means what I now understand it to mean. You need to get the point from 
> one terse sentence without the luxury of examples to strengthen the 
> explanation.
> 

> silently succeeds. Sure enough, the doc does say « PostgreSQL will 
> accept POSIX-style time zone specifications, as described in Section 
> B.5. [as the argument for AT TIME ZONE] » But what a quagmire of 
> confusion this is. I said elsewhere that the same aim (get the local 
> time at -42 hours w.r.t. UTC) can be met with transparent syntax, thus:
> 
> *select ('2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time 
> zone 'UTC') - '42:00'::interval;*
> 
> I know which syntax I prefer!

That is the root of the above, preference. There are many ways to 
express time zones and as a general purpose database Postgres needs to 
deal with all of them.

> 
> ----------------------------------------------------------------------
> 
> (5) I was embarrassingly slow to find this in the doc for the SET command:
> 
> « Timezone settings given as numbers or intervals are internally 
> translated to POSIX timezone syntax. For example, after SET TIME ZONE 
> -7, SHOW TIME ZONE would report <-07>+07. »
> 

> (David pointed this out.) This is confusing in itself. And it's 
> compounded by this fact: POSIX uses positive offsets to denote what both 
> the utc_offset column and to_char(<some timestamptz value>), with the 
> format element TH:TM, shows as negative. But it is what it is.

See my earlier post about trying to find logic in this.

> 
> ----------------------------------------------------------------------
> 
> (6) To the extent that a catalog view can have a business unique key 
> (and I believe that this must be the case for it to be useful), the 
> unique key for "pg_timezone_names" is "name" and the unique key for 
> "pg_timezone_abbrevs" is "abbrev". I'm going to assume that PG has code 
> to maintain these invariants. This gives me the way to interpret the 
> statement "an [abbreviation] specification merely defines a particular 
> offset from UTC".
> 
> 
> «
> The @OVERRIDE syntax indicates that subsequent entries in the file can 
> override previous entries (typically, entries obtained from included 
> files). Without this, conflicting definitions of the same timezone 
> abbreviation are considered an error.
> »
> 
> So even without "timezone_abbreviations" set to "default" at session 
> level, "pg_timezone_abbrevs.abbrev" will always be unique.

True.

> 
> ----------------------------------------------------------------------
> 
> (7) There are two different syntaxes for setting the timezone session 
> environment variable. This (or with TO instead of =):
> 
> *set timezone =*
> 
> and this:
> 
> *set time zone*
> 
> This, too, is hugely confusing. (Correspondingly, there's both "show 
> timezone" and "show time zone".)
> 
> The first "set" (and "show") alternative is consistent with how you set 
> all other session environment variables. Moreover, all are observable 
> with the current_setting() built-in function. And there the argument 
> "timezone" can only be spelled as one word. This makes me favor the "set 
> timezone" spelling.
> 
> However, only the "set time zone" spelling allows an argument that's an 
> explicit interval value like this:
> 
> *set time zone interval '-08:00';*
> 
> I said "explicit interval value" because this isn't true. For example, 
> these two fail:
> 
> *set time zone '-08:00'::interval;*

> 
> and
> 
> *set time zone make_interval(hours=>-8);*
> 
> both fail, even though each uses a legal interval value. This is 
> confusion on top of confusion. All the more reason to avoid it.

It is spelled out here:

https://www.postgresql.org/docs/12/sql-set.html

Basically set time zone is for customizing the timezone value, whereas 
set timezone is for using 'standard' timezone values. They both end up 
setting the same value.

> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: The contents of the pg_timezone_names view bring some surprises

From

Tom Lane

Date:

23 May 2021, 17:00:48

Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 5/22/21 5:52 PM, Bryn Llewellyn wrote:
>> (1) In the context of discussing timezones, the English word 
>> "abbreviation" maps to two distinct notions in PG: 
>> "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom 
>> earlier said « there are two views [because there are] two sources of 
>> truth involved ». This really means that these two columns list 
>> different classes of facts. It's therefore unfortunate that they both 
>> have the same name.

> They are reporting the same facts, just in different context. 

No, Bryn's right really.  pg_timezone_names.abbrev reports out what
the IANA (tzdb) database says is the abbreviation in use at the
current time in the indicated time zone.  pg_timezone_abbrevs.abbrev
reports about names defined by the timezone_abbreviations configuration
file.  While those ought to be generally in sync, there's no expectation
that they necessarily match one-to-one; they really can't given that
the same abbreviation can be in use to mean different things in different
parts of the world.  IANA will happily report such inconsistent
abbreviations, but timezone_abbreviations has to treat names as unique
keys, so at most one meaning of a given abbreviation will be available
there.  Indeed there might be *no* entry there for an abbreviation that
appears in some row of pg_timezone_names, particularly if an installation
has chosen to trim the timezone_abbreviations contents in order to reduce
the potential for bad data entry.  Conversely, there are quite a few
entries in our standard timezone_abbreviations list that match no entry
in tzdb.  (Most of them used to, but IANA has been on a kick lately
of removing abbreviations that they invented.)  And that's before you
even get into the question of dubious custom mods of
timezone_abbreviations.  Or custom mods of the tzdb files, for that
matter.  Or timezone_abbreviations files that aren't in sync with
the tzdb database that's in use, which is an extremely common
situation given that we recommend use of --with-system-tzdata.

            regards, tom lane

Re: The contents of the pg_timezone_names view bring some surprises

From

Bryn Llewellyn

Date:

23 May 2021, 19:59:05

tgl@sss.pgh.pa.us wrote:

adrian.klaver@aklaver.com wote:

Bryn wrote:

(1) In the context of discussing timezones, the English word "abbreviation" maps to two distinct notions in PG: "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom earlier said « there are two views [because there are] two sources of truth involved ». This really means that these two columns list different classes of facts. It's therefore unfortunate that they both have the same name.

They are reporting the same facts, just in different context.

No, Bryn's right really. pg_timezone_names.abbrev reports out what the IANA (tzdb) database says is the abbreviation in use at the current time in the indicated time zone. pg_timezone_abbrevs.abbrev reports about names defined by the timezone_abbreviations configuration file. While those ought to be generally in sync, there's no expectation that they necessarily match one-to-one; they really can't given that the same abbreviation can be in use to mean different things in different parts of the world. IANA will happily report such inconsistent abbreviations, but timezone_abbreviations has to treat names as unique keys, so at most one meaning of a given abbreviation will be available there. Indeed there might be *no* entry there for an abbreviation that appears in some row of pg_timezone_names, particularly if an installation has chosen to trim the timezone_abbreviations contents in order to reduce the potential for bad data entry. Conversely, there are quite a few entries in our standard timezone_abbreviations list that match no entry in tzdb. (Most of them used to, but IANA has been on a kick lately of removing abbreviations that they invented.) And that's before you even get into the question of dubious custom mods of timezone_abbreviations. Or custom mods of the tzdb files, for that matter. Or timezone_abbreviations files that aren't in sync with the tzdb database that's in use, which is an extremely common situation given that we recommend use of "--with-system-tzdata".

Thank you very much, Tom. I should have taken more care with my wording. You said it very well, of course, here.

It seems to me, against the background that we've been discussing, that the pg_timezone_abbrevs view serves just this purpose: to provide a list of text strings that can be used as the argument for "at time zone" and as the tz specification in a timestamptz literal. Unfortunately, but for reasons that I can only accept, the pg_timezone_abbrevs view translates abbrev key values that are found into an offset value. But it doesn't necessarily error when presented with a key that it doesn't have. This, too, has been discussed at length in this exchanges. These examples make the point.

set timezone = 'UTC';

select '2021-05-23 19:00:00 foo42bar'::timestamptz;

select '2021-05-23 19:00:00'::timestamptz at time zone 'bar99foo';

Sadly, there's no simple way to enforce a practice for applications that want to avoid this risk unless, maybe, every such plain timestamp expression is constructed programmatically.

There is hope for this:

set timezone = 'bar99foo';

brought by the possibility of writing one's own set_timezone() function that checks the input values against pg_timezone_names.name. But it would win only half the battle.

Re: The contents of the pg_timezone_names view bring some surprises

From

"Peter J. Holzer"

Date:

24 May 2021, 10:46:23

On 2021-05-23 12:55:52 -0700, Bryn Llewellyn wrote:
> But it doesn't necessarily error when presented with a key that it
> doesn't have. This, too, has been discussed at length in this
> exchanges. These examples make the point.
>
> set timezone = 'UTC';
> select '2021-05-23 19:00:00 foo42bar'::timestamptz;
> select '2021-05-23 19:00:00'::timestamptz at time zone 'bar99foo';

This is an unfortunate side effect of the flexibility of posix-style
timezone specifications. That flexibility was very useful when the
format was invented in the 1980's. The Olson database may or may not
have existed at the time (Wikipedia says its "origins go back to 1986 or
earlier"), but it wasn't well known and system administrators were
expected to set a TZ environment variable with the correct rules for
their time zone, so it had to be flexible enough for all time zones in
the world (not sure if it actually achieved that goal). When the Olson
(now IANA) timezone database spread that flexibility became mostly
obsolete but it might still be needed sometimes (for example Turkey has
in recent times changed the date for a DST switch at very short notice
so some administrators may have had to override their system's rules
when the vendor didn't issue a patch in time).

> Sadly, there's no simple way to enforce a practice for applications that want
> to avoid this risk unless, maybe, every such plain timestamp expression is
> constructed programmatically.

I think there are two possibilities:

1. Give the user a list of possible timezones to choose from (maybe a
simple dropdown menu, maybe a fancy map where they can click on their
location) and don't allow them to enter timezone names manually.

2. Ignore the problem. If a user enters a time zone of "foo42bar" either
they know what they are doing or they made a mistake. The latter case is
really not that different from entering a wrong but existing timezone or
making a typo in the date or time. In any case you now have a wrong
timestamp in your database which you may or may not be able to catch via
other QA measures.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc