Thread: The contents of the pg_timezone_names view bring some surprises
Some time zones have abbreviations that are identical to their names. This query:
select name
from pg_timezone_names
where abbrev = name
order by name;
gets this result:
EST
GMT
HST
MST
UCT
UTC
GMT
HST
MST
UCT
UTC
This, in itself, doesn’t seem remarkable. I wondered if any time zones have names that occur as time zone abbreviations but where the name and its abbreviation differ.
select name
from
pg_timezone_names
where
name in (select abbrev from pg_timezone_names) and
name <> abbrev
order by name;
from
pg_timezone_names
where
name in (select abbrev from pg_timezone_names) and
name <> abbrev
order by name;
gets this result:
CET
EET
EET
So I wondered what rows have CET or EET as either a name or an abbreviation and yet the name and the abbreviation differ.
select name, abbrev, utc_offset, is_dst
from pg_timezone_names
where
(
name in ('CET', 'EET') or
abbrev in ('CET', 'EET')
)
and name <> abbrev
order by name;
Africa/Algiers | CET | 01:00:00 | f
Africa/Cairo | EET | 02:00:00 | f
Africa/Tripoli | EET | 02:00:00 | f
Africa/Tunis | CET | 01:00:00 | f
CET | CEST | 02:00:00 | t
EET | EEST | 03:00:00 | t
Egypt | EET | 02:00:00 | f
Europe/Kaliningrad | EET | 02:00:00 | f
Libya | EET | 02:00:00 | f
from pg_timezone_names
where
(
name in ('CET', 'EET') or
abbrev in ('CET', 'EET')
)
and name <> abbrev
order by name;
gets this result:
Africa/Algiers | CET | 01:00:00 | f
Africa/Cairo | EET | 02:00:00 | f
Africa/Tripoli | EET | 02:00:00 | f
Africa/Tunis | CET | 01:00:00 | f
CET | CEST | 02:00:00 | t
EET | EEST | 03:00:00 | t
Egypt | EET | 02:00:00 | f
Europe/Kaliningrad | EET | 02:00:00 | f
Libya | EET | 02:00:00 | f
This tells me that when CET is used as a timezone name, it denotes an offset of 02:00—at least at some times of the year. And when the same text is used as an abbrev, it denotes an offset of 01:00.
But you can use either a timezone name, or a timezone abbreviation in the `at time zone` clause (see below).
There’s a similar story for EET where it denotes respectively offsets of 03:00 and 02:00.
Here’s what seems to me to be a closely related dilemma. I’d thought that an abbrev uniquely specified the utc_offset. But this test shows that it doesn’t:
with
v1 as (
select distinct abbrev, utc_offset
from pg_timezone_names),
v2 as (
select abbrev, count(*)
from v1
group by abbrev
having count(*) > 1)
select name, abbrev, utc_offset, is_dst
from pg_timezone_names
where abbrev in (select abbrev from v2)
order by abbrev;
v1 as (
select distinct abbrev, utc_offset
from pg_timezone_names),
v2 as (
select abbrev, count(*)
from v1
group by abbrev
having count(*) > 1)
select name, abbrev, utc_offset, is_dst
from pg_timezone_names
where abbrev in (select abbrev from v2)
order by abbrev;
It gets 46 rows. Here’s an interesting subset:
America/Monterrey | CDT | -05:00:00 | t
America/Havana | CDT | -04:00:00 | t
ROC | CST | 08:00:00 | f
America/Costa_Rica | CST | -06:00:00 | f
Eire | IST | 01:00:00 | f
Asia/Kolkata | IST | 05:30:00 | f
America/Havana | CDT | -04:00:00 | t
ROC | CST | 08:00:00 | f
America/Costa_Rica | CST | -06:00:00 | f
Eire | IST | 01:00:00 | f
Asia/Kolkata | IST | 05:30:00 | f
So here, the same text, even when used as abbrev, can denote different utc_offset values. (But note that there seems to be no way, in the 'at time zone' clause, that I can say that I want a text value to be taken as a name and not as an abbreviation, or vice versa.)
This seems to be at odds with what section “8.5.3. Time Zones” at
says:
«
A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition rules as well. The recognized abbreviations are listed in the pg_timezone_abbrevs view (see Section 51.91). You cannot set the configuration parameters TimeZone or log_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator.
»
This claims (as I read it) that a time zone abbreviation uniquely determines an offset from UTC.
It seems that the result of this is therefore undefined because CDT denotes two different utc_offset values.:
select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'CDT';
The same goes for this:
select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'EET';
In summary, each of these texts, for two kinds of reason, can denote two different utc_offset values.:
CET
EET
CDT
CST
IST
Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?
On Tuesday, May 18, 2021, Bryn Llewellyn <bryn@yugabyte.com> wrote:
Some time zones have abbreviations that are identical to their names. This query:Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?
You are missing the material in appendix B.4
I think that covers dealing with the situations you note.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Tuesday, May 18, 2021, Bryn Llewellyn <bryn@yugabyte.com> wrote: >> Am I missing an essential clue to resolving what seems to me to be a >> paradox? Or am I seeing two kinds of bug? > You are missing the material in appendix B.4 > https://www.postgresql.org/docs/current/datetime-config-files.html The short answer is that the reason there are two views is that there are two sources of truth involved. pg_timezone_names reflects the zone names defined in the IANA timezone database, while pg_timezone_abbrevs reflects the abbreviations defined in our user-customizable abbreviations table. It'd be impossible to make them match exactly, and we don't try exceedingly hard. In particular, the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention. (AFAIK those are all legacy zones that they'd get rid of if they weren't concerned with backwards compatibility.) If those look like abbreviations, which they mostly do, then it's confusing. Where the rubber meets the road is in timestamptz input, and there we consult the abbreviations table first. (Not sure if that's documented, but you can easily prove it by experiment.) As for the question about "abbreviations" like +09 --- those are not abbreviations at all, they're just hard-coded numeric UTC offsets. So they don't appear in pg_timezone_abbrevs. IANA uses those as display offsets in zones where there's not any widely-used-on-the-ground abbreviation. regards, tom lane
On 5/18/21 11:31 PM, Bryn Llewellyn wrote: > Some time zones have abbreviations that are identical to their names. > This query: > > Here’s what seems to me to be a closely related dilemma. I’d thought > that an abbrev uniquely specified the utc_offset. But this test shows > that it doesn’t: > > with > v1 as ( > select distinct abbrev, utc_offset > from pg_timezone_names), > v2 as ( > select abbrev, count(*) > from v1 > group by abbrev > having count(*) > 1) > select name, abbrev, utc_offset, is_dst > from pg_timezone_names > where abbrev in (select abbrev from v2) > order by abbrev; > > It gets 46 rows. Here’s an interesting subset: > > America/Monterrey | CDT | -05:00:00 | t > America/Havana | CDT | -04:00:00 | t > > ROC | CST | 08:00:00 | f > America/Costa_Rica | CST | -06:00:00 | f > > Eire | IST | 01:00:00 | f > Asia/Kolkata | IST | 05:30:00 | f > > So here, the same text, even when used as abbrev, can denote different > utc_offset values. (But note that there seems to be no way, in the 'at > time zone' clause, that I can say that I want a text value to be taken > as a name and not as an abbreviation, or vice versa.) > > This seems to be at odds with what section “8.5.3. Time Zones” at > > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES > <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES> > > says: > > « > A time zone abbreviation, for example PST. Such a specification merely > defines a particular offset from UTC, in contrast to full time zone > names which can imply a set of daylight savings transition rules as > well. The recognized abbreviations are listed in > the pg_timezone_abbrevs view (see Section 51.91). You cannot set the > configuration parameters TimeZone or log_timezone to a time zone > abbreviation, but you can use abbreviations in date/time input values > and with the AT TIME ZONE operator. > » > > This claims (as I read it) that a time zone abbreviation uniquely > determines an offset from UTC. It says no such thing and would be lying if it did. Take a look at this: https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations and see the abbreviations that share offsets. What it is saying that, for example, the timezone America/Los_Angeles has two timezone abbreviations PDT(what I'm currently in) and PST. If you use an abbreviation you don't get the DST transition rules that a full timezone name has. > > It seems that the result of this is therefore undefined because CDT > denotes two different utc_offset values.: > > select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'CDT'; > > The same goes for this: > > select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'EET'; > > In summary, each of these texts, for two kinds of reason, can denote two > different utc_offset values.: > > CET > EET > CDT > CST > IST > > Am I missing an essential clue to resolving what seems to me to be a > paradox? Or am I seeing two kinds of bug? > > > -- Adrian Klaver adrian.klaver@aklaver.com
bryn@yugabyte.com wrote:Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?You are missing the material in appendix B.4. Date/Time Configuration Files
https://www.postgresql.org/docs/current/datetime-config-files.html
The short answer is that the reason there are two views is that there are two sources of truth involved. pg_timezone_names reflects the zone names defined in the IANA timezone database, while pg_timezone_abbrevs reflects the abbreviations defined in our user-customizable abbreviations table. It'd be impossible to make them match exactly, and we don't try exceedingly hard. In particular, the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention. (AFAIK those are all legacy zones that they'd get rid of if they weren't concerned with backwards compatibility.) If those look like abbreviations, which they mostly do, then it's confusing.
Where the rubber meets the road is in timestamptz input, and there we consult the abbreviations table first. (Not sure if that's documented, but you can easily prove it by experiment.)
As for the question about "abbreviations" like +09 --- those are not abbreviations at all, they're just hard-coded numeric UTC offsets. So they don't appear in pg_timezone_abbrevs. IANA uses those as display offsets in zones where there's not any widely-used-on-the-ground abbreviation.
Thanks, as ever, David and Tom, for your quick responses. Thanks also to Adrian Klaver, who replied in a branched thread with this—in response to my comment about my reading of the information content of the pg_timezone_abbrevs view: « This claims (as I read it) that a time zone abbreviation uniquely determines an offset from UTC. »
adrian.klaver@aklaver.com wrote:It says no such thing and would be lying if it did. Take a look at this:
https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations
and see the abbreviations that share offsets.
What it is saying that, for example, the timezone America/Los_Angeles has two timezone abbreviations PDT (what I'm currently in) and PST. If you use an abbreviation you don't get the DST transition rules that a full timezone name has.
______________________________________________________________________
Firstly, David's response.
I (re)read "B.4. Date/Time Configuration Files" and located this directory on my macOS Big Sur for my PG Version 13.2 installation:
/usr/local/share/postgresql/timezonesets
I saw that it has files with names like America.txt, Asia.txt, Europe.txt, and so on. I opened America.txt and read this:
# NOTE:
# This file is NOT loaded by the PostgreSQL database. It just serves as
# a template for timezones you could need. See the `Date/Time Support'
# appendix in the PostgreSQL documentation for more information.
# This file is NOT loaded by the PostgreSQL database. It just serves as
# a template for timezones you could need. See the `Date/Time Support'
# appendix in the PostgreSQL documentation for more information.
The x-ref'd Appendix doesn't x-ref further to an explanation of the steps to follow in order actually to add timezons that I might need. It does have an entry for PST.
# CONFLICT! PST is not unique
# Other timezones:
# - PST: Philippine Standard Time
PST -28800 # Pacific Standard Time
# Other timezones:
# - PST: Philippine Standard Time
PST -28800 # Pacific Standard Time
Note the comment about a conflict. This suggests to naïve readers like me that conflicts are to be avoided. It also has a PDT entry.
I also looked in Asia.txt and found this:
# CONFLICT! PST is not unique
# Other timezones:
# - PST: Pacific Standard Time (America)
PST 28800 # Philippine Standard Time
# Other timezones:
# - PST: Pacific Standard Time (America)
PST 28800 # Philippine Standard Time
I s'pose that this is what David wanted me to see. Conflicts are a bad thing—but not disallowed.
This does mean that the outcome of this:
select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'PST';
is undefined (unless I'm missing a statement of the priority rule for handling conflicts elsewhere in the PG doc) as I said in my email that started this thread, using a different example.
______________________________________________________________________
Secondly, Adrians's response.
Yes, the point that a timezone abbreviation does not uniquely determine the timezone offset is taken now. But notice this:
« In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC…»
from
"8.5.3. Time Zones"
This seems to me to be flat-out wrong. An abbreviation, in general, does not represent a specific offset from UTC. Rather, it can represent two or more different offsets.
This tells me that I must recommend never to use a timezone abbreviation in application code. It's anyway illegal as an argument for "set timezone". (If the same text happens to be a timezone name, well… that's what it is in that context.) And the result of using a timezone abbreviation in the "at time zone" clause is, in general, unspecified.
I'm convinced that the typical use case calls for using time zone names—exactly, as as been said, to benefit from the encoded DST transition rules that this brings. If I want to arrange a Zoom call with a colleague in Helsinki on, say, the Monday after the San Francisco "spring forward" weekend, and I don't remember when Finland springs forward, then I can do this to see what local time it is over there at my 09:00:
select '2021-03-20 09:00:00 America/Los_Angeles'::timestamptz at time zone 'Europe/Helsinki';
And if, for some strange reason, I want to find out what my local time in this example is in a +03:00 timezone (as this is shown in a ::text typecast of timestamptz value and not as POSIX has it) not caring about who does what with DST, I can spell it as I mean it:
select ('2021-03-20 09:00:00 America/Los_Angeles'::timestamptz at time zone 'UTC') + '3 hours'::interval;
This formulation is self-documenting. And it sidesteps all that rubbish about POSIZ saying "up" for what everybody else calls "down".
Because the purpose for the pg_timezone_abbrevs view (as it seems to me) is to control what's legal in the "at time zone" clause, I can forget it. Whatever it might say about utc_offset and is_dst is available, against the unique name, in pg_timezone_names.
This is an instructive example:
set timezone = 'America/Los_Angeles';
select
to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";
select
to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";
This is the result:
1st 1:30 | 2nd 1:30
--------------------+--------------------
01:30:00 PDT (-07) | 01:30:00 PST (-08)
--------------------+--------------------
01:30:00 PDT (-07) | 01:30:00 PST (-08)
The "(-07)" and "(-08)" convey real information. But the "PDT" and "PST" do not—except in the sense that one is culturally aware and can Google for things. The pg_timezone_names view never has PDT and PST at the same querying moment. Rather, it has only one according to the date. And this
select abbrev, utc_offset, is_dst::text
from pg_timezone_abbrevs
where abbrev in ('PST', 'PDT');
with this result:
PDT | -07:00:00 | true
PST | -08:00:00 | false
PST | -08:00:00 | false
gives just an illusion of value because the two abbreviations happen each to be unique in that view. Each could easily have many rows that, for each abbreviation, mix "true" and "false".
______________________________________________________________________
Thirdly, Tom's response.
About « the reason there are two views is that there are two sources of truth involved », well yes… and who wants two sources of truth? All the more reason to forget the pg_timezone_abbrevs view and never to use an abbreviation in the "at time zone" clause.
About « the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention… If those look like abbreviations, which they mostly do, then it's confusing. », what about "UTC" itself. Sources like this"
The Difference Between GMT and UTC
point out that UTC is a time standard and not a time zone. But it's still useful to have it listed in pg_timezone_names. It's a challenge to work out how to recommend what subset of what's listed there to use. This page:
List of tz database time zones
is helpful because it shows the "cannonical" or not status of each entry. But consider this:
select name, abbrev, utc_offset, is_dst::text
from pg_timezone_names
where
lower(name) like 'etc/gmt%0%' or
lower(name) like 'etc/gmt%1%' or
lower(name) like 'etc/gmt%2%' or
lower(name) like 'etc/gmt%3%' or
lower(name) like 'etc/gmt%4%' or
lower(name) like 'etc/gmt%5%' or
lower(name) like 'etc/gmt%6%' or
lower(name) like 'etc/gmt%7%' or
lower(name) like 'etc/gmt%8%' or
lower(name) like 'etc/gmt%9%'
order by utc_offset;
from pg_timezone_names
where
lower(name) like 'etc/gmt%0%' or
lower(name) like 'etc/gmt%1%' or
lower(name) like 'etc/gmt%2%' or
lower(name) like 'etc/gmt%3%' or
lower(name) like 'etc/gmt%4%' or
lower(name) like 'etc/gmt%5%' or
lower(name) like 'etc/gmt%6%' or
lower(name) like 'etc/gmt%7%' or
lower(name) like 'etc/gmt%8%' or
lower(name) like 'etc/gmt%9%'
order by utc_offset;
with this result:
Etc/GMT+12 | -12 | -12:00:00 | false
Etc/GMT+11 | -11 | -11:00:00 | false
Etc/GMT+10 | -10 | -10:00:00 | false
...
Etc/GMT+1 | -01 | -01:00:00 | false
Etc/GMT+0 | GMT | 00:00:00 | false
Etc/GMT-0 | GMT | 00:00:00 | false
Etc/GMT0 | GMT | 00:00:00 | false
Etc/GMT-1 | +01 | 01:00:00 | false
...
Etc/GMT-12 | +12 | 12:00:00 | false
Etc/GMT-13 | +13 | 13:00:00 | false
Etc/GMT-14 | +14 | 14:00:00 | false
Etc/GMT+11 | -11 | -11:00:00 | false
Etc/GMT+10 | -10 | -10:00:00 | false
...
Etc/GMT+1 | -01 | -01:00:00 | false
Etc/GMT+0 | GMT | 00:00:00 | false
Etc/GMT-0 | GMT | 00:00:00 | false
Etc/GMT0 | GMT | 00:00:00 | false
Etc/GMT-1 | +01 | 01:00:00 | false
...
Etc/GMT-12 | +12 | 12:00:00 | false
Etc/GMT-13 | +13 | 13:00:00 | false
Etc/GMT-14 | +14 | 14:00:00 | false
(Notice the "up is down" POSIX silliness.) The Wikipedia "List of tz database time zones" has each of these as canonical. But I intend to recommend avoiding using these for two reasons. First reason: 'cos each name contains a number, it isn't controlled by the names in pg_timezone_names (like David pointed out in another email).
Try this (simulating a keystroke bounce typo):
set timezone = 'Etc/GMT+122';
show timezone;
It's silently accepted and gives "ETC/GMT+122" with "show". And then, in turn, this:
select '2021-03-20 09:00:00'::timestamptz;
gives this result:
2021-03-20 09:00:00-122
Nonsense, eh? As David said, it's an instance of the more general:
set timezone = 'Foo42Bar';
show timezone;
I wish there was a way to turn this off and accept only pg_timestamp_names.name values.
The second reason is that the abbreviations confuse ordinary readers who are slow to remember the "up is down" story.
On 5/19/21 5:50 PM, Bryn Llewellyn wrote: > Thanks, as ever, David and Tom, for your quick responses. Thanks also to > Adrian Klaver, who replied in a branched thread with this—in response to > my comment about my reading of the information content of the > pg_timezone_abbrevs view: « This claims (as I read it) that a time zone > abbreviation uniquely determines an offset from UTC. » > > > *Secondly, Adrians's response.* > > Yes, the point that a timezone abbreviation does not uniquely determine > the timezone offset is taken now. But notice this: > > « In short, this is the difference between abbreviations and full names: > abbreviations represent a specific offset from UTC…» > from > > "8.5.3. Time Zones" > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES > <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES> > > This seems to me to be flat-out wrong. An abbreviation, in general, does > not represent a specific offset from UTC. Rather, it can represent two > or more different offsets. It is not flat out wrong. An abbreviation, say the one I'm in now PDT, will only represent a specific offset(-07), whereas the timezone I'm in, America/Los_Angeles, represents two offsets(-08/-07) the value of which depends on the date. Now there maybe another abbreviation that uses that same offset, but again it only represents a single offset. > Nonsense, eh? As David said, it's an instance of the more general: > > set timezone = 'Foo42Bar'; > show timezone; > > I wish there was a way to turn this off and accept only > pg_timestamp_names.name values. > > The second reason is that the abbreviations confuse ordinary readers who > are slow to remember the "up is down" story. > The issue is you are looking for logic in a system that is based on political decisions. For instance there is a brewing West Coast movement, whereby the states on the US West Coast are looking to drop the DST transition with or without the approval of Congress. COVID stalled it, but I expect it will appear again in the near future. -- Adrian Klaver adrian.klaver@aklaver.com
On 2021-05-18 23:31:57 -0700, Bryn Llewellyn wrote: > Africa/Algiers | CET | 01:00:00 | f > Africa/Cairo | EET | 02:00:00 | f > Africa/Tripoli | EET | 02:00:00 | f > Africa/Tunis | CET | 01:00:00 | f > CET | CEST | 02:00:00 | t > EET | EEST | 03:00:00 | t > Egypt | EET | 02:00:00 | f > Europe/Kaliningrad | EET | 02:00:00 | f > Libya | EET | 02:00:00 | f > > This tells me that when CET is used as a timezone name, it denotes an offset > of 02:00—at least at some times of the year. And when the same text is used as > an abbrev, it denotes an offset of 01:00. I think you are confused by the fact that it is currently summer (well, spring) and that some countries in the Central European Time zone observe DST and others don't. So "Africa/Tunis" shows up with an abbreviation of "CET" and an offset of 01:00:00 while "Europe/Vienna" has the abbreviation "CEST" and an offset of 02:00:00. Two months ago Vienna would also had the abbreviation "CET" and an offset of 01:00:00. The time zone "CET" probably reflects what most countries in that zone do, so it is currently also in DST. Same for Eastern European Time. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote: > On 5/18/21 11:31 PM, Bryn Llewellyn wrote: > > This seems to be at odds with what section “8.5.3. Time Zones” at > > > > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES> > > > > says: > > > > « > > A time zone abbreviation, for example PST. Such a specification merely > > defines a particular offset from UTC, in contrast to full time zone > > names which can imply a set of daylight savings transition rules as > > well. The recognized abbreviations are listed in > > the pg_timezone_abbrevs view (see Section 51.91). You cannot set the > > configuration parameters TimeZone or log_timezone to a time zone > > abbreviation, but you can use abbreviations in date/time input values > > and with the AT TIME ZONE operator. > > » > > > > This claims (as I read it) that a time zone abbreviation uniquely > > determines an offset from UTC. > > It says no such thing Maybe that's the inherent ambiguity of the English language, but to me "Such a specification defines a particular offset from UTC" does imply a one-to-one mapping from abbreviation to offset. > and would be lying if it did. As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is unique in that view: hjp=> select abbrev, count(*) from pg_timezone_abbrevs group by 1 having count(*) > 1; ╔════════╤═══════╗ ║ abbrev │ count ║ ╟────────┼───────╢ ╚════════╧═══════╝ (0 rows) (at least in PostgreSQL 11.12) It is not true in "the real world", where different time zones may use the same abbreviation. But that isn't what the paragraph is about. > What it is saying that, for example, the timezone America/Los_Angeles has > two timezone abbreviations PDT(what I'm currently in) and PST. If you use an > abbreviation you don't get the DST transition rules that a full timezone > name has. Yes, does also say that. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 2021-05-22 12:09:23 +0200, Peter J. Holzer wrote: > On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote: > > On 5/18/21 11:31 PM, Bryn Llewellyn wrote: > > > This seems to be at odds with what section “8.5.3. Time Zones” at > > > > > > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES> > > > > > > says: > > > > > > « > > > A time zone abbreviation, for example PST. Such a specification merely > > > defines a particular offset from UTC, in contrast to full time zone > > > names which can imply a set of daylight savings transition rules as > > > well. The recognized abbreviations are listed in > > > the pg_timezone_abbrevs view (see Section 51.91). You cannot set the > > > configuration parameters TimeZone or log_timezone to a time zone > > > abbreviation, but you can use abbreviations in date/time input values > > > and with the AT TIME ZONE operator. > > > » > > > > > > This claims (as I read it) that a time zone abbreviation uniquely > > > determines an offset from UTC. > > > > It says no such thing > > Maybe that's the inherent ambiguity of the English language, but to me > "Such a specification defines a particular offset from UTC" does imply a > one-to-one mapping from abbreviation to offset. And I just realised that "one-to-one" isn't the right term. Mathematically it would be "functional": There is exactly one offset for each abbreviation (never two or more; there might be zero but in that case one could argue that this isn't actually a time zone abbreviation), but several abbreviations can map to the same offset. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 5/22/21 3:09 AM, Peter J. Holzer wrote: > On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote: >> On 5/18/21 11:31 PM, Bryn Llewellyn wrote: >>> This seems to be at odds with what section “8.5.3. Time Zones” at >>> >>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES <https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES> >>> >>> says: >>> >>> « >>> A time zone abbreviation, for example PST. Such a specification merely >>> defines a particular offset from UTC, in contrast to full time zone >>> names which can imply a set of daylight savings transition rules as >>> well. The recognized abbreviations are listed in >>> the pg_timezone_abbrevs view (see Section 51.91). You cannot set the >>> configuration parameters TimeZone or log_timezone to a time zone >>> abbreviation, but you can use abbreviations in date/time input values >>> and with the AT TIME ZONE operator. >>> » >>> >>> This claims (as I read it) that a time zone abbreviation uniquely >>> determines an offset from UTC. >> >> It says no such thing > > Maybe that's the inherent ambiguity of the English language, but to me > "Such a specification defines a particular offset from UTC" does imply a > one-to-one mapping from abbreviation to offset. As your later post points out the reality is a given abbreviation has only one offset, but an offset may have many abbreviations. > >> and would be lying if it did. > > As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is > unique in that view: 1) Go to Postgres cluster and in ~/share/timezonesets search in Defaults.txt for the word C(c)onflict. 2)Then go here https://www.postgresql.org/docs/current/datetime-config-files.html and see how the below query could be made to change if someone is not aware of 1). More below. > > hjp=> select abbrev, count(*) from pg_timezone_abbrevs group by 1 having count(*) > 1; > ╔════════╤═══════╗ > ║ abbrev │ count ║ > ╟────────┼───────╢ > ╚════════╧═══════╝ > (0 rows) > > (at least in PostgreSQL 11.12) > > It is not true in "the real world", where different time zones may use > the same abbreviation. But that isn't what the paragraph is about. ~/share/timezonesets encompasses the 'real world'. > > >> What it is saying that, for example, the timezone America/Los_Angeles has >> two timezone abbreviations PDT(what I'm currently in) and PST. If you use an >> abbreviation you don't get the DST transition rules that a full timezone >> name has. > > Yes, does also say that. > > hp > -- Adrian Klaver adrian.klaver@aklaver.com
On 2021-05-22 08:26:27 -0700, Adrian Klaver wrote: > On 5/22/21 3:09 AM, Peter J. Holzer wrote: > > On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote: > > > On 5/18/21 11:31 PM, Bryn Llewellyn wrote: > > > > This claims (as I read it) that a time zone abbreviation uniquely > > > > determines an offset from UTC. > > > > > > It says no such thing > > > > Maybe that's the inherent ambiguity of the English language, but to me > > "Such a specification defines a particular offset from UTC" does imply a > > one-to-one mapping from abbreviation to offset. > > As your later post points out the reality is a given abbreviation has only > one offset, but an offset may have many abbreviations. Which is what Bryn wrote. (But I realize I'm getting into an argument about what another person meant - again. I should stop that. If Bryn thinks it is important to be understood he can explain himself.) > > > and would be lying if it did. > > > > As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is > > unique in that view: > > 1) Go to Postgres cluster and in ~/share/timezonesets search in Defaults.txt > for the word C(c)onflict. > > 2)Then go here > https://www.postgresql.org/docs/current/datetime-config-files.html and see > how the below query could be made to change if someone is not aware of 1). Actually, no. If you introduce a conflict in a timezoneset file the database doesn't even start: 2021-05-22 19:47:17 CEST [399103]: [1-1] user=,db=,pid=399103 FATAL: time zone abbreviation "test" is multiply defined 2021-05-22 19:47:17 CEST [399103]: [2-1] user=,db=,pid=399103 DETAIL: Entry in time zone file "Default", line 634, conflictswith entry in file "Default", line 635. pg_ctl: could not start server So abbrev is a unique key on pg_timezone_abbrevs and the documentation is correct. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On 5/22/21 10:55 AM, Peter J. Holzer wrote: > On 2021-05-22 08:26:27 -0700, Adrian Klaver wrote: >> On 5/22/21 3:09 AM, Peter J. Holzer wrote: >>> On 2021-05-19 06:57:13 -0700, Adrian Klaver wrote: >>>> On 5/18/21 11:31 PM, Bryn Llewellyn wrote: >>>>> This claims (as I read it) that a time zone abbreviation uniquely >>>>> determines an offset from UTC. >>>> >>>> It says no such thing >>> >>> Maybe that's the inherent ambiguity of the English language, but to me >>> "Such a specification defines a particular offset from UTC" does imply a >>> one-to-one mapping from abbreviation to offset. >> >> As your later post points out the reality is a given abbreviation has only >> one offset, but an offset may have many abbreviations. > > Which is what Bryn wrote. (But I realize I'm getting into an argument > about what another person meant - again. I should stop that. If Bryn > thinks it is important to be understood he can explain himself.) > > >>>> and would be lying if it did. >>> >>> As far as pg_timezone_abbrevs is concerned, it is correct. abbrev is >>> unique in that view: >> >> 1) Go to Postgres cluster and in ~/share/timezonesets search in Defaults.txt >> for the word C(c)onflict. >> >> 2)Then go here >> https://www.postgresql.org/docs/current/datetime-config-files.html and see >> how the below query could be made to change if someone is not aware of 1). > > Actually, no. If you introduce a conflict in a timezoneset file the > database doesn't even start: > > 2021-05-22 19:47:17 CEST [399103]: [1-1] user=,db=,pid=399103 FATAL: time zone abbreviation "test" is multiply defined > 2021-05-22 19:47:17 CEST [399103]: [2-1] user=,db=,pid=399103 DETAIL: Entry in time zone file "Default", line 634, conflictswith entry in file "Default", line 635. > pg_ctl: could not start server > > So abbrev is a unique key on pg_timezone_abbrevs and the documentation > is correct. I see no unique key. That led me to the wrong assumption that duplicate abbreviations could exist in the view. I do see this, which I had not looked at before: https://www.postgresql.org/docs/12/datetime-config-files.html "The @OVERRIDE syntax indicates that subsequent entries in the file can override previous entries (typically, entries obtained from included files). Without this, conflicting definitions of the same timezone abbreviation are considered an error." " Note If an error occurs while reading the time zone abbreviation set, no new value is applied and the old set is kept. If the error occurs while starting the database, startup fails. " So the error I'm thinking, comes during the reading and processing of the file contents in pg_timezone_abbrevs(), which is what the view is built on. I learned something, Postgres errors before allowing a conflict. > > hp > -- Adrian Klaver adrian.klaver@aklaver.com
Tom, David, Adrian, and Peter—thank you all very much for taking an interest in my questions. Your responses have, collectively, been an enormous help. I deleted the text of the exchanges in this particular branch of the overall discussion because it's become rather difficult to work out who said what in response to what. It's all there in the "pgsql-general" email archive.
Here's my summary (in no particular order) of what I've learned. Please feel free to ignore it.
----------------------------------------------------------------------
(1) In the context of discussing timezones, the English word "abbreviation" maps to two distinct notions in PG: "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom earlier said « there are two views [because there are] two sources of truth involved ». This really means that these two columns list different classes of facts. It's therefore unfortunate that they both have the same name.
----------------------------------------------------------------------
(2) It's no wonder, therefore, that I was confused. Anybody with even a slight exposure to relational design would guess that "pg_timezone_names.abbrev" is a FK to the PK in "pg_timezone_abbrevs.abbrev". And they might wonder why "utc_offset" and "is_dst" seem to be denormalized. But they'd probably recall that such things are common, as a usability convenience, in views. Anyway, I'm beyond that confusion now.
----------------------------------------------------------------------
(3) It helps me to think of "pg_timezone_names.abbrev" as "nickname"—which notion is unique, just for each name. It's useful only informally, and only when a particular timezone observes DST, as a shorthand for disambiguation. I used the example earlier:
set timezone = 'America/Los_Angeles';
select
to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";
select
to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";
gets:
1st 1:30 | 2nd 1:30
--------------------+--------------------
01:30:00 PDT (-07) | 01:30:00 PST (-08)
--------------------+--------------------
01:30:00 PDT (-07) | 01:30:00 PST (-08)
The fact that you can find another nickname spelled "PST" in "pg_timezone_names" doesn't matter: This (as of this email's time stamp):
select name, abbrev, utc_offset from pg_timezone_names where abbrev = 'PST';
gets this:
name | abbrev | utc_offset
-------------+--------+------------
Asia/Manila | PST | 08:00:00
-------------+--------+------------
Asia/Manila | PST | 08:00:00
The fact that Asia/Manila happens not to observe DST makes my example a little less powerful. Never mind, their politicians could decide to introduce it presently and to give DST the nickname BST (or DOG). It's remarkable, in itself, that "ST" in "BST" means "Summer Time" but that it means "Standard Time" in "PST". But this nicely unerlines the point that there's no rhyme or reason in the design of these abbreviations.
Notably, the "nickname" in "pg_timezone_names" has nothing at all (formally) to do with "abbrev" in "pg_timezone_abbrevs".
----------------------------------------------------------------------
(4) My realization, as set out in #3, helps me now to understand the rule:
« PostgreSQL allows you to specify time zones in three different forms... A time zone abbreviation, for example PST [but ONLY] in date/time input values and with the AT TIME ZONE operator… Such a specification merely defines a particular offset from UTC... The recognized abbreviations are listed in the pg_timezone_abbrevs view »
This (to my embarrassment, now) does say that a "nickname" from "pg_timezone_names" is not allowed as the argument for the AT TIME ZONE operator. But it doesn't adumbrate that the English word "abbreviation" means what I now understand it to mean. You need to get the point from one terse sentence without the luxury of examples to strengthen the explanation.
I mentioned somewhere that I had discovered that this:
select * from pg_timezone_names where abbrev = 'CAT';
gets lots of rows (all with utc_offset = '02:00'::interval) but that this:
select * from pg_timezone_abbrevs where abbrev = 'CAT';
gets no rows. This informs the test that Tom suggested that I might do:
select '2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'CAT';
It causes this error:
22023: time zone "CAT" not recognized
Notice that this:
select '2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'CAT42';
silently succeeds. Sure enough, the doc does say « PostgreSQL will accept POSIX-style time zone specifications, as described in Section B.5. [as the argument for AT TIME ZONE] » But what a quagmire of confusion this is. I said elsewhere that the same aim (get the local time at -42 hours w.r.t. UTC) can be met with transparent syntax, thus:
select ('2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'UTC') - '42:00'::interval;
I know which syntax I prefer!
----------------------------------------------------------------------
(5) I was embarrassingly slow to find this in the doc for the SET command:
« Timezone settings given as numbers or intervals are internally translated to POSIX timezone syntax. For example, after SET TIME ZONE -7, SHOW TIME ZONE would report <-07>+07. »
Search tools are famously feeble for finding such things. This instructs the careful reader to beware of this trap:
-- OK.
set timezone = 'America/Los_Angeles';
-- Clear error 22023: invalid value for parameter "TimeZone".
set timezone = 'America/Los_Angelez';
-- Silently succeeds.
set timezone = 'America/Los_Angele9';
(David pointed this out.) This is confusing in itself. And it's compounded by this fact: POSIX uses positive offsets to denote what both the utc_offset column and to_char(<some timestamptz value>), with the format element TH:TM, shows as negative. But it is what it is.
----------------------------------------------------------------------
(6) To the extent that a catalog view can have a business unique key (and I believe that this must be the case for it to be useful), the unique key for "pg_timezone_names" is "name" and the unique key for "pg_timezone_abbrevs" is "abbrev". I'm going to assume that PG has code to maintain these invariants. This gives me the way to interpret the statement "an [abbreviation] specification merely defines a particular offset from UTC".
Peter showed us a query like this:
select count(*) from (
select abbrev from pg_timezone_abbrevs
group by abbrev
having count(*) > 1) as a;
In my un-tampered-with PG 13.2 env, it gets a count of zero. I like this.
select count(*) from (
select abbrev from pg_timezone_abbrevs
group by abbrev
having count(*) > 1) as a;
In my un-tampered-with PG 13.2 env, it gets a count of zero. I like this.
Adrian pointed me to the "B.4. Date/Time Configuration Files" Appendix here:
and to this in particular:
«
The @OVERRIDE syntax indicates that subsequent entries in the file can override previous entries (typically, entries obtained from included files). Without this, conflicting definitions of the same timezone abbreviation are considered an error.
«
The @OVERRIDE syntax indicates that subsequent entries in the file can override previous entries (typically, entries obtained from included files). Without this, conflicting definitions of the same timezone abbreviation are considered an error.
»
So even without "timezone_abbreviations" set to "default" at session level, "pg_timezone_abbrevs.abbrev" will always be unique.
----------------------------------------------------------------------
(7) There are two different syntaxes for setting the timezone session environment variable. This (or with TO instead of =):
set timezone =
and this:
set time zone
This, too, is hugely confusing. (Correspondingly, there's both "show timezone" and "show time zone".)
The first "set" (and "show") alternative is consistent with how you set all other session environment variables. Moreover, all are observable with the current_setting() built-in function. And there the argument "timezone" can only be spelled as one word. This makes me favor the "set timezone" spelling.
However, only the "set time zone" spelling allows an argument that's an explicit interval value like this:
set time zone interval '-08:00';
I said "explicit interval value" because this isn't true. For example, these two fail:
set time zone '-08:00'::interval;
and
set time zone make_interval(hours=>-8);
both fail, even though each uses a legal interval value. This is confusion on top of confusion. All the more reason to avoid it.
On 5/22/21 5:52 PM, Bryn Llewellyn wrote: > Tom, David, Adrian, and Peter—thank you all very much for taking an > interest in my questions. Your responses have, collectively, been an > enormous help. I deleted the text of the exchanges in this particular > branch of the overall discussion because it's become rather difficult to > work out who said what in response to what. It's all there in the > "pgsql-general" email archive. > > Here's my summary (in no particular order) of what I've learned. Please > feel free to ignore it. > > ---------------------------------------------------------------------- > > (1) In the context of discussing timezones, the English word > "abbreviation" maps to two distinct notions in PG: > "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom > earlier said « there are two views [because there are] two sources of > truth involved ». This really means that these two columns list > different classes of facts. It's therefore unfortunate that they both > have the same name. They are reporting the same facts, just in different context. pg_timezone_abbrevs.abbrev is a list of un-duplicated, as I found out, abbreviations. pg_timezone_names.abbrev shows the abbreviations or offsets in effect at CURRENT_TIMESTAMP. > > ---------------------------------------------------------------------- > > (2) It's no wonder, therefore, that I was confused. Anybody with even a > slight exposure to relational design would guess > that "pg_timezone_names.abbrev" is a FK to the PK in > "pg_timezone_abbrevs.abbrev". And they might wonder why "utc_offset" and > "is_dst" seem to be denormalized. But they'd probably recall that such > things are common, as a usability convenience, in views. Anyway, I'm > beyond that confusion now. Since the views are based on functions that are displaying different context I don't find it surprising. > > ---------------------------------------------------------------------- > > (3) It helps me to think of "pg_timezone_names.abbrev" as > "nickname"—which notion is unique, just for each name. It's useful only > informally, and only when a particular timezone observes DST, as a > shorthand for disambiguation. I used the example earlier: > > > The fact that Asia/Manila happens not to observe DST makes my example a > little less powerful. Never mind, their politicians could decide to > introduce it presently and to give DST the nickname BST (or DOG). It's > remarkable, in itself, that "ST" in "BST" means "Summer Time" but that > it means "Standard Time" in "PST". But this nicely unerlines the point > that there's no rhyme or reason in the design of these abbreviations. > > Notably, the "nickname" in "pg_timezone_names" has nothing at all > (formally) to do with "abbrev" in "pg_timezone_abbrevs". Yes it does. It just there is not always a one-to-one correspondence between the two. For instance duplicate abbreviations do exist but they are filtered out of pg_timezone_abbrevs. > > ---------------------------------------------------------------------- > > (4) My realization, as set out in #3, helps me now to understand the rule: > > « PostgreSQL allows you to specify time zones in three different > forms... A time zone abbreviation, for example PST [but ONLY] in > date/time input values and with the AT TIME ZONE operator… Such a > specification merely defines a particular offset from UTC... The > recognized abbreviations are listed in the pg_timezone_abbrevs view » > > This (to my embarrassment, now) does say that a "nickname" from > "pg_timezone_names" is not allowed as the argument for the AT TIME ZONE > operator. But it doesn't adumbrate that the English word "abbreviation" > means what I now understand it to mean. You need to get the point from > one terse sentence without the luxury of examples to strengthen the > explanation. > > silently succeeds. Sure enough, the doc does say « PostgreSQL will > accept POSIX-style time zone specifications, as described in Section > B.5. [as the argument for AT TIME ZONE] » But what a quagmire of > confusion this is. I said elsewhere that the same aim (get the local > time at -42 hours w.r.t. UTC) can be met with transparent syntax, thus: > > *select ('2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time > zone 'UTC') - '42:00'::interval;* > > I know which syntax I prefer! That is the root of the above, preference. There are many ways to express time zones and as a general purpose database Postgres needs to deal with all of them. > > ---------------------------------------------------------------------- > > (5) I was embarrassingly slow to find this in the doc for the SET command: > > « Timezone settings given as numbers or intervals are internally > translated to POSIX timezone syntax. For example, after SET TIME ZONE > -7, SHOW TIME ZONE would report <-07>+07. » > > (David pointed this out.) This is confusing in itself. And it's > compounded by this fact: POSIX uses positive offsets to denote what both > the utc_offset column and to_char(<some timestamptz value>), with the > format element TH:TM, shows as negative. But it is what it is. See my earlier post about trying to find logic in this. > > ---------------------------------------------------------------------- > > (6) To the extent that a catalog view can have a business unique key > (and I believe that this must be the case for it to be useful), the > unique key for "pg_timezone_names" is "name" and the unique key for > "pg_timezone_abbrevs" is "abbrev". I'm going to assume that PG has code > to maintain these invariants. This gives me the way to interpret the > statement "an [abbreviation] specification merely defines a particular > offset from UTC". > > > « > The @OVERRIDE syntax indicates that subsequent entries in the file can > override previous entries (typically, entries obtained from included > files). Without this, conflicting definitions of the same timezone > abbreviation are considered an error. > » > > So even without "timezone_abbreviations" set to "default" at session > level, "pg_timezone_abbrevs.abbrev" will always be unique. True. > > ---------------------------------------------------------------------- > > (7) There are two different syntaxes for setting the timezone session > environment variable. This (or with TO instead of =): > > *set timezone =* > > and this: > > *set time zone* > > This, too, is hugely confusing. (Correspondingly, there's both "show > timezone" and "show time zone".) > > The first "set" (and "show") alternative is consistent with how you set > all other session environment variables. Moreover, all are observable > with the current_setting() built-in function. And there the argument > "timezone" can only be spelled as one word. This makes me favor the "set > timezone" spelling. > > However, only the "set time zone" spelling allows an argument that's an > explicit interval value like this: > > *set time zone interval '-08:00';* > > I said "explicit interval value" because this isn't true. For example, > these two fail: > > *set time zone '-08:00'::interval;* > > and > > *set time zone make_interval(hours=>-8);* > > both fail, even though each uses a legal interval value. This is > confusion on top of confusion. All the more reason to avoid it. It is spelled out here: https://www.postgresql.org/docs/12/sql-set.html Basically set time zone is for customizing the timezone value, whereas set timezone is for using 'standard' timezone values. They both end up setting the same value. > > > > > -- Adrian Klaver adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes: > On 5/22/21 5:52 PM, Bryn Llewellyn wrote: >> (1) In the context of discussing timezones, the English word >> "abbreviation" maps to two distinct notions in PG: >> "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom >> earlier said « there are two views [because there are] two sources of >> truth involved ». This really means that these two columns list >> different classes of facts. It's therefore unfortunate that they both >> have the same name. > They are reporting the same facts, just in different context. No, Bryn's right really. pg_timezone_names.abbrev reports out what the IANA (tzdb) database says is the abbreviation in use at the current time in the indicated time zone. pg_timezone_abbrevs.abbrev reports about names defined by the timezone_abbreviations configuration file. While those ought to be generally in sync, there's no expectation that they necessarily match one-to-one; they really can't given that the same abbreviation can be in use to mean different things in different parts of the world. IANA will happily report such inconsistent abbreviations, but timezone_abbreviations has to treat names as unique keys, so at most one meaning of a given abbreviation will be available there. Indeed there might be *no* entry there for an abbreviation that appears in some row of pg_timezone_names, particularly if an installation has chosen to trim the timezone_abbreviations contents in order to reduce the potential for bad data entry. Conversely, there are quite a few entries in our standard timezone_abbreviations list that match no entry in tzdb. (Most of them used to, but IANA has been on a kick lately of removing abbreviations that they invented.) And that's before you even get into the question of dubious custom mods of timezone_abbreviations. Or custom mods of the tzdb files, for that matter. Or timezone_abbreviations files that aren't in sync with the tzdb database that's in use, which is an extremely common situation given that we recommend use of --with-system-tzdata. regards, tom lane
tgl@sss.pgh.pa.us wrote:adrian.klaver@aklaver.com wote:Bryn wrote:(1) In the context of discussing timezones, the English word "abbreviation" maps to two distinct notions in PG: "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom earlier said « there are two views [because there are] two sources of truth involved ». This really means that these two columns list different classes of facts. It's therefore unfortunate that they both have the same name.They are reporting the same facts, just in different context.No, Bryn's right really. pg_timezone_names.abbrev reports out what the IANA (tzdb) database says is the abbreviation in use at the current time in the indicated time zone. pg_timezone_abbrevs.abbrev reports about names defined by the timezone_abbreviations configuration file. While those ought to be generally in sync, there's no expectation that they necessarily match one-to-one; they really can't given that the same abbreviation can be in use to mean different things in different parts of the world. IANA will happily report such inconsistent abbreviations, but timezone_abbreviations has to treat names as unique keys, so at most one meaning of a given abbreviation will be available there. Indeed there might be *no* entry there for an abbreviation that appears in some row of pg_timezone_names, particularly if an installation has chosen to trim the timezone_abbreviations contents in order to reduce the potential for bad data entry. Conversely, there are quite a few entries in our standard timezone_abbreviations list that match no entry in tzdb. (Most of them used to, but IANA has been on a kick lately of removing abbreviations that they invented.) And that's before you even get into the question of dubious custom mods of timezone_abbreviations. Or custom mods of the tzdb files, for that matter. Or timezone_abbreviations files that aren't in sync with the tzdb database that's in use, which is an extremely common situation given that we recommend use of "--with-system-tzdata".
Thank you very much, Tom. I should have taken more care with my wording. You said it very well, of course, here.
It seems to me, against the background that we've been discussing, that the pg_timezone_abbrevs view serves just this purpose: to provide a list of text strings that can be used as the argument for "at time zone" and as the tz specification in a timestamptz literal. Unfortunately, but for reasons that I can only accept, the pg_timezone_abbrevs view translates abbrev key values that are found into an offset value. But it doesn't necessarily error when presented with a key that it doesn't have. This, too, has been discussed at length in this exchanges. These examples make the point.
set timezone = 'UTC';
select '2021-05-23 19:00:00 foo42bar'::timestamptz;
select '2021-05-23 19:00:00'::timestamptz at time zone 'bar99foo';
Sadly, there's no simple way to enforce a practice for applications that want to avoid this risk unless, maybe, every such plain timestamp expression is constructed programmatically.
There is hope for this:
set timezone = 'bar99foo';
brought by the possibility of writing one's own set_timezone() function that checks the input values against pg_timezone_names.name. But it would win only half the battle.
On 2021-05-23 12:55:52 -0700, Bryn Llewellyn wrote: > But it doesn't necessarily error when presented with a key that it > doesn't have. This, too, has been discussed at length in this > exchanges. These examples make the point. > > set timezone = 'UTC'; > select '2021-05-23 19:00:00 foo42bar'::timestamptz; > select '2021-05-23 19:00:00'::timestamptz at time zone 'bar99foo'; This is an unfortunate side effect of the flexibility of posix-style timezone specifications. That flexibility was very useful when the format was invented in the 1980's. The Olson database may or may not have existed at the time (Wikipedia says its "origins go back to 1986 or earlier"), but it wasn't well known and system administrators were expected to set a TZ environment variable with the correct rules for their time zone, so it had to be flexible enough for all time zones in the world (not sure if it actually achieved that goal). When the Olson (now IANA) timezone database spread that flexibility became mostly obsolete but it might still be needed sometimes (for example Turkey has in recent times changed the date for a DST switch at very short notice so some administrators may have had to override their system's rules when the vendor didn't issue a patch in time). > Sadly, there's no simple way to enforce a practice for applications that want > to avoid this risk unless, maybe, every such plain timestamp expression is > constructed programmatically. I think there are two possibilities: 1. Give the user a list of possible timezones to choose from (maybe a simple dropdown menu, maybe a fancy map where they can click on their location) and don't allow them to enter timezone names manually. 2. Ignore the problem. If a user enters a time zone of "foo42bar" either they know what they are doing or they made a mistake. The latter case is really not that different from entering a wrong but existing timezone or making a typo in the date or time. In any case you now have a wrong timestamp in your database which you may or may not be able to catch via other QA measures. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"