Thread: Domain based on TIMEZONE WITH TIME ZONE
Hi,
I'm using a domain to specialize the built in TIMEZONE WITH TIME ZONE type. I want to sanity check this approach before continuing to use this.
I want to prevent timestamps with non-UTC offsets from getting inserted into the database. Having a UTC-only database at the schema level means no app, proc, script or load operation can inadvertently supply a local time offset.
The domain is defined as:
CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
CREATE TYPE UTC_RANGE AS RANGE ( SUBTYPE = UTC_TIMESTAMP );
My potentially misguided assumption is that I am effectively re-using a check constraint across the schema.
Is this crazy?
Am I missing the point of how Postgres stores TIMEZONE WITH TIME ZONE internally?
Thanks in advance,
Ben
Sorry about the bug in the subject: the data type is TIMESTAMP WITH TIME ZONE, not TIMEZONE WITH TIME ZONE
On 10 May 2018, at 09:03, Ben Hood <ben@relops.com> wrote:Hi,I'm using a domain to specialize the built in TIMEZONE WITH TIME ZONE type. I want to sanity check this approach before continuing to use this.I want to prevent timestamps with non-UTC offsets from getting inserted into the database. Having a UTC-only database at the schema level means no app, proc, script or load operation can inadvertently supply a local time offset.The domain is defined as:CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);CREATE TYPE UTC_RANGE AS RANGE ( SUBTYPE = UTC_TIMESTAMP );My potentially misguided assumption is that I am effectively re-using a check constraint across the schema.Is this crazy?Am I missing the point of how Postgres stores TIMEZONE WITH TIME ZONE internally?Thanks in advance,Ben
On Thu, May 10, 2018 at 10:03 AM, Ben Hood <ben@relops.com> wrote: ... > Am I missing the point of how Postgres stores TIMEZONE WITH TIME ZONE > internally? After reading in the follow up TIMEZONE is a typo for TIMESTAMP, yes, you are. Aproximately, postgres stores ( virtually ) a point in the time line, for both with and without ts types, same format. Something like a real number. The difference is mainly for transforming from/to text ( bear in mind when you put a constant in a query you are trasnforming from text ). In the with time zone case it formats/expects it as a time string in the session configured time zone, in the without case it treats it ( aproximately ) as if it was in utc ( and then discards the "+00" after formating ). Maybe I'm confussing you more, its not too easy to explain. The point is TIMEZONE is not stored in either of them. Francisco Olarte.
> On 10 May 2018, at 09:59, Francisco Olarte <folarte@peoplecall.com> wrote: > > On Thu, May 10, 2018 at 10:03 AM, Ben Hood <ben@relops.com> wrote: > ... >> Am I missing the point of how Postgres stores TIMEZONE WITH TIME ZONE >> internally? > > After reading in the follow up TIMEZONE is a typo for TIMESTAMP, yes, you are. > > Aproximately, postgres stores ( virtually ) a point in the time line, > for both with and without ts types, same format. Something like a real > number. > > The difference is mainly for transforming from/to text ( bear in mind > when you put a constant in a query you are trasnforming from text ). > In the with time zone case it formats/expects it as a time string in > the session configured time zone, in the without case it treats it ( > aproximately ) as if it was in utc ( and then discards the "+00" after > formating ). > > Maybe I'm confussing you more, its not too easy to explain. > > The point is TIMEZONE is not stored in either of them. Many thanks for clarification, very much appreciated. Your point is consistent with the manual: “All timezone-aware dates and times are stored internally in UTC” The subtleties of how a TIMESTAMP WITH TIME ZONE is stored versus how it is rendered by a client or used in calculationsand queries have been discussed in numerous places. On reflection, maybe my question was phrased badly. The question should not be “how does Postgres store the timestamp internally”. Rather it should read “is enforcing the submission of UTC denominated timestamps in the server by using a domain a sensibleway to enforce a policy that will blow up when apps attempt to use non-UTC timestamps (when they shouldn’t be)”. So the question is not how does the timestamp get stored, rather, is it an anti-pattern to use Postgres as a linter for appsthat forget to use UTC exclusively?
On Thu, May 10, 2018 at 11:19:36AM +0100, Ben Hood wrote: > The question should not be “how does Postgres store the timestamp internally”. > > Rather it should read “is enforcing the submission of UTC denominated timestamps in the server by using a domain a sensibleway to enforce a policy that will blow up when apps attempt to use non-UTC timestamps (when they shouldn’t be)”. > > So the question is not how does the timestamp get stored, rather, is it an anti-pattern to use Postgres as a linter forapps that forget to use UTC exclusively? I dare say it is one of PG's strengths' to be usable as a "linter". However, maybe rephrase to: Is it an anti-pattern to use Postgres as a linter for apps that forget to use ... timezones *appropriately* ... ? As long as you can force apps to submit proper timestamp-with- timezone data is there _really_ a need to care whether apps do submit in UTC ? After all, it is always converted to UTC servside anyway ? In case you want to enforce only ever _handing out_ UTC data you could wrap the table in a view with forces the output timezone to UTC and only offers timestamp-withOUT-timezone to the outside. Then force read access via the view. Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
> On 10 May 2018, at 11:36, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > > On Thu, May 10, 2018 at 11:19:36AM +0100, Ben Hood wrote: > > I dare say it is one of PG's strengths' to be usable as a > "linter”. Interesting that you share this view, because after thinking about why I was doing this, using UTC domains is for PG to bea linter. > However, maybe rephrase to: > > Is it an anti-pattern to use Postgres as a linter for > apps that forget to use ... timezones *appropriately* ... ? > > As long as you can force apps to submit proper timestamp-with- > timezone data is there _really_ a need to care whether apps > do submit in UTC ? OK, so by using TIMESTAMP WITH TIME ZONE, you force all apps to submit timezone qualified timestamps in what language theyare written in. > After all, it is always converted to UTC > servside anyway ? And because of the internal UTC representation, there is no room for ambiguous timezones. > > In case you want to enforce only ever _handing out_ UTC data > you could wrap the table in a view with forces the output > timezone to UTC and only offers timestamp-withOUT-timezone to > the outside. Then force read access via the view. So on balance there is no need to use a domain for this? Or are we saying that domains are one way of achieving the timestamp hygiene, but equally, you can get the same result asdescribed above? Or is there a specific downside to using a domain for this purpose? …..maybe I’ve just got a subjective prejudice for using domains to refine and tighten built in data types….
On 05/10/2018 04:31 AM, Ben Hood wrote: > >> On 10 May 2018, at 11:36, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: >> >> On Thu, May 10, 2018 at 11:19:36AM +0100, Ben Hood wrote: >> >> I dare say it is one of PG's strengths' to be usable as a >> "linter”. > > > Interesting that you share this view, because after thinking about why I was doing this, using UTC domains is for PG tobe a linter. > > >> However, maybe rephrase to: >> >> Is it an anti-pattern to use Postgres as a linter for >> apps that forget to use ... timezones *appropriately* ... ? >> >> As long as you can force apps to submit proper timestamp-with- >> timezone data is there _really_ a need to care whether apps >> do submit in UTC ? > > OK, so by using TIMESTAMP WITH TIME ZONE, you force all apps to submit timezone qualified timestamps in what language theyare written in. Not really: https://www.postgresql.org/docs/10/static/datatype-datetime.html "For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone." > >> After all, it is always converted to UTC >> servside anyway ? > > And because of the internal UTC representation, there is no room for ambiguous timezones. Define ambiguous timezone? > >> >> In case you want to enforce only ever _handing out_ UTC data >> you could wrap the table in a view with forces the output >> timezone to UTC and only offers timestamp-withOUT-timezone to >> the outside. Then force read access via the view. > > So on balance there is no need to use a domain for this? > > Or are we saying that domains are one way of achieving the timestamp hygiene, but equally, you can get the same resultas described above? > > Or is there a specific downside to using a domain for this purpose? > > …..maybe I’ve just got a subjective prejudice for using domains to refine and tighten built in data types…. > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On Thu, May 10, 2018 at 7:31 AM, Ben Hood <ben@relops.com> wrote:
Or are we saying that domains are one way of achieving the timestamp hygiene, but equally, you can get the same result as described above?
The *only* way to have timestamp hygiene is to require them to have time zones at all times, even if that time zone is UTC. Any other representation of a time is ambiguous without context.
On Thu, May 10, 2018 at 06:41:04AM -0700, Adrian Klaver wrote: >> OK, so by using TIMESTAMP WITH TIME ZONE, you force all apps to submit timezone qualified timestamps in what languagethey are written in. > > Not really: > > https://www.postgresql.org/docs/10/static/datatype-datetime.html > > "For timestamp with time zone, the internally stored value is always in UTC > (Universal Coordinated Time, traditionally known as Greenwich Mean Time, > GMT). An input value that has an explicit time zone specified is converted > to UTC using the appropriate offset for that time zone. If no time zone is > stated in the input string, then it is assumed to be in the time zone > indicated by the system's TimeZone parameter, and is converted to UTC using > the offset for the timezone zone." True enough, I didn't remember that behaviour. And since a BEFORE UPDATE/INSERT trigger will see the data to be inserted PG will have already done that while parsing from on-the-wire data into in-memory ts-w-tz presentation so we can't simply use a trigger to enforce explicit specification of a timezone. Therefore, a domain could work but will require client language support for easy integration. > > And because of the internal UTC representation, there is no room for ambiguous timezones. > > Define ambiguous timezone? OP likely means underspecified for his use case (= not assuming "unspecified" to mean "TimeZone value"). But, then, OP could always force TimeZone to UTC on his servers :-) Karsten --
Ben Hood <ben@relops.com> writes: > So the question is not how does the timestamp get stored, rather, is it an anti-pattern to use Postgres as a linter forapps that forget to use UTC exclusively? Well, using a domain to enforce additional constraints on a field's value is certainly not an anti-pattern in itself. But you have to realize that the processing consists of first creating a value of the base type and then applying the constraint expressions of the domain to it. This means you cannot check any details that are lost in the input conversion, because you don't have access to the original input string, only the stored value. As others have explained, Postgres' TIMESTAMP WITH TIME ZONE type doesn't preserve the input's timezone specification (if any) but forcibly rotates to UTC and stores just a scalar UTC value. So you can't use a domain to check anything about whether the input had a timezone field and if so what it was. (This behavior is nonstandard --- the SQL spec evidently expects the timezone to be stored explicitly in some fashion --- but I don't foresee us changing it; we've accumulated too much backwards-compatibility baggage now.) If you're sufficiently intent on having checking of that sort, you could invent your own datatype with your own input function, and then make it binary-compatible with timestamptz so that you don't need to provide much else besides the I/O functions. varchar(n) has the same sort of relationship with text, so there's precedent ... regards, tom lane
On Thu, May 10, 2018 at 12:19 PM, Ben Hood <ben@relops.com> wrote: >> On 10 May 2018, at 09:59, Francisco Olarte <folarte@peoplecall.com> wrote: .... >> Maybe I'm confussing you more, its not too easy to explain. >> The point is TIMEZONE is not stored in either of them. > > Many thanks for clarification, very much appreciated. > > Your point is consistent with the manual: “All timezone-aware dates and times are stored internally in UTC” Tom lane, which is much more knowledgeable than me, points they are stored in UTC. Maybe, but, IIRC, they are stored in 8 bytes, I do not know exactly how, last time I read it that could be either a float8 number or a int8 ( maybe to int4 numbers ). I think they use the same type of storage as the unix timestamps ( unix timestamp is normally a signed number of integer seconds from an arbitrary start point, 19700101T000000 UTC, and they designated an instant in time. 7200 designates an instant, I can format it for the reader in many ways, 19700101T020000+0000, "tres de la mañana del primero de enero de mil novecientos ochenta, hora de Madrid", "1970-01-01 03:00:00 +01" ). But it is not UTC, it is the designation of an instant in time. Timestamps do not have time zones. BUT the postgres data types timestamptz is a way to store a timestamp. So is timestamp. And so is float8. The difference is how it is converted and interacts with other types. > Rather it should read “is enforcing the submission of UTC denominated timestamps in the server by using a domain a sensibleway to enforce a policy that will blow up when apps attempt to use non-UTC timestamps (when they shouldn’t be)”. > So the question is not how does the timestamp get stored, rather, is it an anti-pattern to use Postgres as a linter forapps that forget to use UTC exclusively? That poses a problem. You must check the input representation. I mean..., you were using extract on the value, but this happens ( YMMV ): cdrs=# show timezone; TimeZone --------------- Europe/Madrid (1 row) cdrs=# select extract(timezone from current_timestamp); date_part ----------- 7200 (1 row) cdrs=# set timezone TO 'UTC'; SET cdrs=# select extract(timezone from current_timestamp); date_part ----------- 0 (1 row) cdrs=# select extract(timezone from '2018-01-01 07:00:00+0100'::timestamptz), extract(timezone from '2018-01-01 00:00:00-0300'::timestamptz); date_part | date_part -----------+----------- 0 | 0 (1 row) cdrs=# set timezone to 'Europe/Madrid'; SET cdrs=# select extract(timezone from '2018-01-01 07:00:00+0100'::timestamptz), extract(timezone from '2018-01-01 00:00:00-0300'::timestamptz); date_part | date_part -----------+----------- 3600 | 3600 (1 row) cdrs=# select extract(timezone from '2018-01-01 07:00:00+0000'::timestamptz), extract(timezone from '2018-07-01 00:00:00+0000'::timestamptz); date_part | date_part -----------+----------- 3600 | 7200 (1 row) As you can see you are getting the seconds offset for the client, may be in a domain for the server, timezone at the instant in time designated by the value. Not what you originally typed to locate the instant in time. For what you want to do I think you'll have to parse the text value, maybe by definig a view with a text columns and using some rule/trigger magic for insert / updates. It seems you want to force your users to use an explcit time zone. This may be better handled above the database. Francisco Olarte.
> On 10 May 2018, at 14:41, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> OK, so by using TIMESTAMP WITH TIME ZONE, you force all apps to submit timezone qualified timestamps in what languagethey are written in. > > Not really: > > https://www.postgresql.org/docs/10/static/datatype-datetime.html > > "For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionallyknown as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted toUTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumedto be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset forthe timezone zone.” Many thanks for this clarification. So therefore you can’t rely on the TIMESTAMP WITH TIME ZONE forcing the app to explicitlyspecify the offset. This is is because if the app is not specific, then the server will default back to its configuredtimezone. So to get deterministic timestamps, you could either: a) make sure the server is always configured to run in UTC; b) issue SET TIME ZONE ‘UTC’; at the beginning of any application session > >>> After all, it is always converted to UTC >>> servside anyway ? >> And because of the internal UTC representation, there is no room for ambiguous timezones. > > Define ambiguous timezone? What I meant to say that is there should be no possibility for an effective timezone to arise implicitly. For example, if you (1) didn’t control the db server config and (2) and you forgot to enforce UTC at a client driver level and (3) didn’t set the offset in the app session Then the only way to know what the effective zone offset will be is to find out what the server default is. Is this plausible?
On 10 May 2018, at 15:12, Vick Khera <vivek@khera.org> wrote:On Thu, May 10, 2018 at 7:31 AM, Ben Hood <ben@relops.com> wrote:Or are we saying that domains are one way of achieving the timestamp hygiene, but equally, you can get the same result as described above?The *only* way to have timestamp hygiene is to require them to have time zones at all times, even if that time zone is UTC. Any other representation of a time is ambiguous without context.
That makes sense.
The motivation behind narrowing the built in TIMESTAMP WITH TIME ZONE down to a domain is to ensure the only permissible zone offset is UTC. This would be unambiguous.
> On 10 May 2018, at 15:17, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > >> Not really: >> >> https://www.postgresql.org/docs/10/static/datatype-datetime.html >> >> "For timestamp with time zone, the internally stored value is always in UTC >> (Universal Coordinated Time, traditionally known as Greenwich Mean Time, >> GMT). An input value that has an explicit time zone specified is converted >> to UTC using the appropriate offset for that time zone. If no time zone is >> stated in the input string, then it is assumed to be in the time zone >> indicated by the system's TimeZone parameter, and is converted to UTC using >> the offset for the timezone zone." > > True enough, I didn't remember that behaviour. > > And since a BEFORE UPDATE/INSERT trigger will see the data to > be inserted PG will have already done that while parsing from > on-the-wire data into in-memory ts-w-tz presentation so we > can't simply use a trigger to enforce explicit specification > of a timezone. > > Therefore, a domain could work but will require client > language support for easy integration. For reference, the domain based solution has been working for a while. It has fleshed out bugs in apps that failed to set the zone either on the session level at the field level. In those circumstances, the server raised an exception to say that a non-UTC timestamp was submitted. It could have been the timestamp had a specific zone other than UTC. In this case, the behavior would be that Postgres storesit as UTC and therefore all is good. But it also would be that a timestamp WITHOUT a zone was submitted, in which case, the server default would take precedence. > >>> And because of the internal UTC representation, there is no room for ambiguous timezones. >> >> Define ambiguous timezone? > > OP likely means underspecified for his use case (= not > assuming "unspecified" to mean "TimeZone value"). But, then, > OP could always force TimeZone to UTC on his servers :-) That is what the OP meant. That the zone value was not explicit in all cases. For example, if you fallback to the serverdefault. And yes, the OP could have set UTC both on the DB servers (assuming OP controls them) and within each server app (assumingOP controls them).
On 10 May 2018, at 15:12, Vick Khera <vivek@khera.org> wrote:On Thu, May 10, 2018 at 7:31 AM, Ben Hood <ben@relops.com> wrote:Or are we saying that domains are one way of achieving the timestamp hygiene, but equally, you can get the same result as described above?The *only* way to have timestamp hygiene is to require them to have time zones at all times, even if that time zone is UTC. Any other representation of a time is ambiguous without context.That makes sense.The motivation behind narrowing the built in TIMESTAMP WITH TIME ZONE down to a domain is to ensure the only permissible zone offset is UTC. This would be unambiguous.
'2018-05-10T15:23:00-07:00'::timestamptz is unambiguous
Allowing client applications to represent time in the user's timezone is a feature.
"""Ben
So to get deterministic timestamps, you could either:
a) make sure the server is always configured to run in UTC;
b) issue SET TIME ZONE ‘UTC’; at the beginning of any application session
a) make sure the server is always configured to run in UTC;
b) issue SET TIME ZONE ‘UTC’; at the beginning of any application session
"""
No
If I send 4pm ET to the server to be stored in a timestamptz field, and fail to tell the server that the timezone is ET in the value itself then I must instead set my session timezone to ET or the server is going to store the wrong value. There is nothing you can do in an default server to prevent this. Tom has described how you could possibly make the "fail to tell the server that the timezone is ET" impossible using a custom type. This seems to be what you want though I'd question whether it is worth the cost.
I'm not sure how binary timestamp values being sent to the server in a BIND command plays into this...
David J.
On 10 May 2018, at 15:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:Ben Hood <ben@relops.com> writes:So the question is not how does the timestamp get stored, rather, is it an anti-pattern to use Postgres as a linter for apps that forget to use UTC exclusively?
Well, using a domain to enforce additional constraints on a field's value
is certainly not an anti-pattern in itself. But you have to realize that
the processing consists of first creating a value of the base type and
then applying the constraint expressions of the domain to it. This means
you cannot check any details that are lost in the input conversion,
because you don't have access to the original input string, only the
stored value.
Fair point.
In the case of the domain based on TIMESTAMP WITH TIME ZONE, if you are mandating UTC, what further details could get lost that may have been in the original input string?
The semantics are that inserting into a column of this domain with a non-UTC or absent zone will be rejected and hence is data you don’t want in the database in any case.
As others have explained, Postgres' TIMESTAMP WITH TIME ZONE type doesn't
preserve the input's timezone specification (if any) but forcibly rotates
to UTC and stores just a scalar UTC value. So you can't use a domain to
check anything about whether the input had a timezone field and if so what
it was.
OK, point taken. The intention of the domain is prevent any non-UTC or absent zones, so do you need to check anything after you have inserted it?
(This behavior is nonstandard --- the SQL spec evidently expects the
timezone to be stored explicitly in some fashion --- but I don't foresee
us changing it; we've accumulated too much backwards-compatibility
baggage now.)
If you're sufficiently intent on having checking of that sort, you could
invent your own datatype with your own input function, and then make it
binary-compatible with timestamptz so that you don't need to provide much
else besides the I/O functions. varchar(n) has the same sort of
relationship with text, so there's precedent …
Many thanks for your feedback, much appreciated. What I hear you saying is that is possible and similar things have been done before. But by the same token, just because you can do it, doesn’t necssarily mean it buys you that much.
So if a person of your experience hasn’t come across something like this before, it might mean that on balance it’s not worth the effort and/or potential deviation from standard DB tooling. Be pragmatic.
As an aside, this whole discussion has been super valuable for me to get a better understanding of exactly what is going on. This is precisely the intention of the domain in the first place - to be explicit about timestamps everywhere.
So if I get timestamp UTC explicitness everywhere in my database by actually understanding what is going on as opposed to using a domain, then I have achieved my goal.
I appreciate everybody chiming in on this topic :-)
On 05/10/2018 09:09 AM, Ben Hood wrote: > >> On 10 May 2018, at 14:41, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >>> OK, so by using TIMESTAMP WITH TIME ZONE, you force all apps to submit timezone qualified timestamps in what languagethey are written in. >> >> Not really: >> >> https://www.postgresql.org/docs/10/static/datatype-datetime.html >> >> "For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionallyknown as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted toUTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumedto be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset forthe timezone zone.” > > > Many thanks for this clarification. So therefore you can’t rely on the TIMESTAMP WITH TIME ZONE forcing the app to explicitlyspecify the offset. This is is because if the app is not specific, then the server will default back to its configuredtimezone. > > So to get deterministic timestamps, you could either: > > a) make sure the server is always configured to run in UTC; > b) issue SET TIME ZONE ‘UTC’; at the beginning of any application session Well if you are using a timestamp with timezone field the value is always going to be stored as UTC. The TimeZone setting just determines the rotation from the input value to the stored value and the reverse. My previous point was just that Postgres will not enforce an offset on input data. > > > >> >>>> After all, it is always converted to UTC >>>> servside anyway ? >>> And because of the internal UTC representation, there is no room for ambiguous timezones. >> >> Define ambiguous timezone? > > What I meant to say that is there should be no possibility for an effective timezone to arise implicitly. > > For example, if you > > (1) didn’t control the db server config > > and > > (2) and you forgot to enforce UTC at a client driver level > > and > > (3) didn’t set the offset in the app session > > > Then the only way to know what the effective zone offset will be is to find out what the server default is. > > Is this plausible? If you mean find the server default then yes: test_(aklaver)> select current_setting('TimeZone'); current_setting ----------------- US/Pacific https://www.postgresql.org/docs/10/static/functions-admin.html#FUNCTIONS-ADMIN-SET > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On 05/10/2018 09:13 AM, Ben Hood wrote: > >> On 10 May 2018, at 15:12, Vick Khera <vivek@khera.org >> <mailto:vivek@khera.org>> wrote: >> >> On Thu, May 10, 2018 at 7:31 AM, Ben Hood <ben@relops.com >> <mailto:ben@relops.com>> wrote: >> >> Or are we saying that domains are one way of achieving the >> timestamp hygiene, but equally, you can get the same result as >> described above? >> >> >> The *only* way to have timestamp hygiene is to require them to have >> time zones at all times, even if that time zone is UTC. Any other >> representation of a time is ambiguous without context. > > That makes sense. > > The motivation behind narrowing the built in TIMESTAMP WITH TIME ZONE > down to a domain is to ensure the only permissible zone offset is UTC. > This would be unambiguous. Per my previous post a timestamp with timezone is going to be stored as UTC, so there is no ambiguity there. On reflection I realized your concern maybe with determining the original input timezone. That information is not stored by Postgres, so there can be ambiguity as to their value. Doing all timestamps in UTC is one one way to eliminate this. The potential issue I see is that you now push the ambiguity into the app. Namely just from looking at the database values you still do not know what the original timezone the app lives in is. -- Adrian Klaver adrian.klaver@aklaver.com
> On 10 May 2018, at 18:29, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > Per my previous post a timestamp with timezone is going to be stored as UTC, so there is no ambiguity there. On reflectionI realized your concern maybe with determining the original input timezone. That information is not stored by Postgres,so there can be ambiguity as to their value. Doing all timestamps in UTC is one one way to eliminate this. The potentialissue I see is that you now push the ambiguity into the app. Namely just from looking at the database values youstill do not know what the original timezone the app lives in is. That’s very true, I hadn’t thought of that. The use case I have is a schema that unites billing events from disparate telecoms systems from a bunch of different carriers.The source of the non-specific timestamps is party input data files that provide different local timestamps forsystems in different systems, but also different apps that submit billing events. So there are many inconsistent feedsinto the database. So when we push the ambiguity into the app, at least what is happening is the the transaction is rejected which means theapp breaks. When the app breaks, we can go in and fix the timestamp bug in the particular app. Often this is difficult,and we need to reason about the source data, but the breakage at least tells us that something is wrong. Otherwisewe ingress the data, do complex billing queries and then the only time we find out about a bug is when a customercomplains about a particular bill. When this happens, it is really difficult to determine whether there is a bugin the query logic or if the input is bogus. So in this sense, the database is linting the the source data.
On 10 May 2018, at 17:35, David G. Johnston <david.g.johnston@gmail.com> wrote:'2018-05-10T15:23:00-07:00'::timestamptz is unambiguous
That is true. Mandating UTC is not the only way to eliminate ambiguity. Apologies for appearing to suggest that this is case.
Allowing client applications to represent time in the user's timezone is a feature.
Fair point.
If I send 4pm ET to the server to be stored in a timestamptz field, and fail to tell the server that the timezone is ET in the value itself then I must instead set my session timezone to ET or the server is going to store the wrong value. There is nothing you can do in an default server to prevent this. Tom has described how you could possibly make the "fail to tell the server that the timezone is ET" impossible using a custom type. This seems to be what you want though I'd question whether it is worth the cost.
The domain approach is what has been implemented and has been running in production for some time without a problem (that we know of).
The cost of this appears to be quite trivial in Postgres - there are 2 lines of DDL in the OP that describe the implementation and it seems to work.
The reason why I started this thread is because somebody on a JDBC related list told me that I was crazy for trying to do this and that I had most likely misunderstood how Postgres works. So respecting their opinion, I decided to ask the Postgres experts.
It is heartening to hear Tom say that the idea is not completely insane, but questions the business value of mandating UTC, given there are other ways to ensure timezone explicitness across the app(s) and the database.
I’m also somewhat wiser about Postgres works now as well.
I'm not sure how binary timestamp values being sent to the server in a BIND command plays into this...
The bit I’m ignorant of is that the domain approach appears to work with the timestampz wire representation. Apps seem to bind using the platform timestampz representation and the server barfs if the app forgets to enforce UTC.
On 10 May 2018, at 17:38, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
Well if you are using a timestamp with timezone field the value is always going to be stored as UTC. The TimeZone setting just determines the rotation from the input value to the stored value and the reverse. My previous point was just that Postgres will not enforce an offset on input data.
Good point.
Then the only way to know what the effective zone offset will be is to find out what the server default is.
Is this plausible?
If you mean find the server default then yes:
test_(aklaver)> select current_setting('TimeZone');
current_setting
-----------------
US/Pacific
Thanks for the tip.
I think for the use case you describe, and given the fact that it does work in production, your solution simply shows The Power Of PostgreSQL. Java tries to be cross-platform, JDBC tries to be cross-database -- these goals don't seem to lend themselves to advocating or understanding what one particular database may have to offer over and above basic SQL for solving a particular problem. Karsten --
On 10 May 2018, at 16:33, Francisco Olarte <folarte@peoplecall.com> wrote:
For what you want to do I think you'll have to parse the text value,
maybe by definig a view with a text columns and using some
rule/trigger magic for insert / updates.
Sorry for being unclear - the solution I have in production appears to work with
CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
This raises an exception when an app doesn’t use UTC.
It seems you want to force your users to use an explcit time zone.
This may be better handled above the database.
This is what I wanted to achieve and the approach so far seems to work.
It is just that most people think the approach is weird. And this is the reason for the OP - a knowledgable person on a different list saw this approach, thought it was weird and suggested that I should educate myself. Which led me to asking on this list.
And the consensus appears to be that the approach is weird and that there are other ways to achieve timezone explicitness, one of which is to gain a deep understanding of how Postgres handles and stores timestamps.
On 2018-05-10 21:37:26 +0100, Ben Hood wrote: > On 10 May 2018, at 16:33, Francisco Olarte <folarte@peoplecall.com> wrote: > > For what you want to do I think you'll have to parse the text value, > maybe by definig a view with a text columns and using some > rule/trigger magic for insert / updates. > > > Sorry for being unclear - the solution I have in production appears to work > with > > CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE > FROM VALUE) = 0); > > This raises an exception when an app doesn’t use UTC. I don't understand how this can work. As Francisco demonstrated, EXTRACT(TIMEZONE FROM ts) doesn't extract the time zone from the value ts, it reports the offset of the client's time zone. So, if my time zone is set to Europe/Vienna, extract(timezone from '2018-05-10 23:17:44+00'::timestamptz) will still return 7200, even though I have explicitely specified a UTC timestamp. What your check probably does is to enforce that the client's time zone is set to UTC. hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment
On 05/10/2018 02:17 PM, Peter J. Holzer wrote: > On 2018-05-10 21:37:26 +0100, Ben Hood wrote: >> On 10 May 2018, at 16:33, Francisco Olarte <folarte@peoplecall.com> wrote: >> >> For what you want to do I think you'll have to parse the text value, >> maybe by definig a view with a text columns and using some >> rule/trigger magic for insert / updates. >> >> >> Sorry for being unclear - the solution I have in production appears to work >> with >> >> CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE >> FROM VALUE) = 0); >> >> This raises an exception when an app doesn’t use UTC. > > I don't understand how this can work. As Francisco demonstrated, > EXTRACT(TIMEZONE FROM ts) doesn't extract the time zone from the value > ts, it reports the offset of the client's time zone. > > So, if my time zone is set to Europe/Vienna, > extract(timezone from '2018-05-10 23:17:44+00'::timestamptz) > will still return 7200, even though I have explicitely specified a UTC > timestamp. It depends on the TimeZone setting in the conf file. So on my machine I have two instances of Postgres running. One is set to the timezone set by initdb to 'US/Pacific'. In the other I set TimeZone = 'UTC'. Using the same client(psql) I get: 'US/Pacific' instance test=> select now(); now ------------------------------- 2018-05-10 14:47:40.903274-07 test=> select extract(timezone from '05/10/18 14:45+00'::timestamptz); date_part ----------- -25200 (1 row) ---------------------------------------- 'UTC' instance postgres=> select now(); now ------------------------------- 2018-05-10 21:47:24.934913+00 postgres=> select extract(timezone from '05/10/18 14:45+00'::timestamptz); date_part ----------- 0 (1 row) - > > What your check probably does is to enforce that the client's time zone > is set to UTC. > > hp > -- Adrian Klaver adrian.klaver@aklaver.com
On 05/10/2018 01:37 PM, Ben Hood wrote: > > >> On 10 May 2018, at 16:33, Francisco Olarte <folarte@peoplecall.com >> <mailto:folarte@peoplecall.com>> wrote: >> >> For what you want to do I think you'll have to parse the text value, >> maybe by definig a view with a text columns and using some >> rule/trigger magic for insert / updates. > > Sorry for being unclear - the solution I have in production appears to > work with > > CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK > (EXTRACT(TIMEZONE FROM VALUE) = 0); > > This raises an exception when an app doesn’t use UTC. > >> >> It seems you want to force your users to use an explcit time zone. >> This may be better handled above the database. > > This is what I wanted to achieve and the approach so far seems to work. > > It is just that most people think the approach is weird. And this is the > reason for the OP - a knowledgable person on a different list saw this > approach, thought it was weird and suggested that I should educate > myself. Which led me to asking on this list. > > And the consensus appears to be that the approach is weird and that > there are other ways to achieve timezone explicitness, one of which is > to gain a deep understanding of how Postgres handles and stores timestamps. Trying to tame time and time zones is maybe quixotic, but not weird. While I was working on my response to Peter I realized that the below: CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0); might not work the way you expect if you have your server set to 'UTC'. show timezone; TimeZone ---------- UTC (1 row) CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0); create table ts_check(id integer, ts_fld utc_timestamp); insert into ts_check values (1, now()); insert into ts_check values (2, '05/10/18 15:23'); insert into ts_check values (3, '05/10/18 15:23+07'); test=> select * from ts_check ; id | ts_fld ----+------------------------------- 1 | 2018-05-10 22:37:58.745263+00 2 | 2018-05-10 15:23:00+00 3 | 2018-05-10 08:23:00+00 In fact I am not sure how it works:): set timezone = 'US/Pacific'; test=> insert into ts_check values (5, '05/10/18 15:23'); ERROR: value for domain utc_timestamp violates check constraint "utc_timestamp_check" test=> insert into ts_check values (6, '05/10/18 15:23+00'); ERROR: value for domain utc_timestamp violates check constraint "utc_timestamp_check" -- Adrian Klaver adrian.klaver@aklaver.com
> On 10 May 2018, at 22:17, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > I don't understand how this can work. As Francisco demonstrated, > EXTRACT(TIMEZONE FROM ts) doesn't extract the time zone from the value > ts, it reports the offset of the client's time zone. Yes, you and Francisco are right. If you do: set time zone 'UTC'; select '2018-05-10 23:17:44-02' :: utc_timestamp; This will work. But if you change the client zone, it will not: set time zone 'Asia/Pyongyang’; select '2018-05-10 23:17:44-02' :: utc_timestamp; [Code: , SQL State: 23514] ERROR: value for domain utc_timestamp violates check constraint "utc_timestamp_check" > What your check probably does is to enforce that the client's time zone > is set to UTC. Logically you must be right, given (a) you cannot extract the zone from the stored value, because the stored value does not contain a zone (b) the examples that only work when the session zone is set to UTC Therefore the semantics of EXTRACT(TIMEZONE FROM ts) must rely on the session zone only. Hence all the check does is to enforce the UTC zone.
On 10 May 2018, at 23:43, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
Trying to tame time and time zones is maybe quixotic, but not weird.
Quixotic is a very good description, I’d happily admit that using the UTC domain in this way is not as pragmatic as I thought it would when I introduced it.
While I was working on my response to Peter I realized that the below:
CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
might not work the way you expect if you have your server set to 'UTC’.
“Might not work the way you expect it to” is the underlying theme and the overall motivation for trying to master time zones from disparate sources.
show timezone;
TimeZone
----------
UTC
(1 row)
CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0);
create table ts_check(id integer, ts_fld utc_timestamp);
insert into ts_check values (1, now());
insert into ts_check values (2, '05/10/18 15:23');
insert into ts_check values (3, '05/10/18 15:23+07');
test=> select * from ts_check ;
id | ts_fld
----+-------------------------------
1 | 2018-05-10 22:37:58.745263+00
2 | 2018-05-10 15:23:00+00
3 | 2018-05-10 08:23:00+00
In fact I am not sure how it works:):
As Peter and Francisco has previously demonstrated, I believe that inserting a timestamp into the UTC_TIMESTAMP column:
(1) Checks the client time zone is UTC
(2) Converts the timestamp to UTC, whether it was presented as UTC or not.
Going to back to the original intention of enforcing UTC, the end result is correct, but for the wrong reason.
In this example, row 3 contains the UTC value of '05/10/18 15:23+07’, which is the end result I wanted (pure UTC everywhere).
But, the way that it is actually happens under the covers is masked by the use of the UTC_TIMESTAMP domain.
In the scenario, there is a belief that only UTC timestamps are being inserted, which is not true.
The way that Postgres stores any timestamp leads to the desired result, fooling me into believing that the UTC_TIMESTAMP domain is casual for the correct result.
I would have need check whether the behavior is the same using a network client driver (maybe there is some implicit coercion going on when psql parses the statement).
But I think the point still stands that Postgres timestamp handling is doing the real work and is casual for the desired outcome, not the domain.
On Thu, May 10, 2018 at 10:37 PM, Ben Hood <ben@relops.com> wrote: > On 10 May 2018, at 16:33, Francisco Olarte <folarte@peoplecall.com> wrote: > > For what you want to do I think you'll have to parse the text value, > maybe by definig a view with a text columns and using some > rule/trigger magic for insert / updates. > > > Sorry for being unclear - the solution I have in production appears to work > with > > CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK > (EXTRACT(TIMEZONE FROM VALUE) = 0); > > This raises an exception when an app doesn’t use UTC. .... > This is what I wanted to achieve and the approach so far seems to work. Can you post an example ( correct, error inputs, preferrably done with psql ) ? At this point I do not know if it is working or if it seems to working for you. I see some cases in which it would seem to work, but I would like to know what you mean by "sending non utc timestamps to the database". I mean something like the one attached below, which to me seems to prove it does not work, it just checks that the session timezone is utc, note the 1st transaction tests an explicit timestamp value, without conversions, failing in non-utc, the second one checks an explicit +0000 zone not working in non UTC and the third one checks anything goes , with or without timestamp, when the time zone is utc. cdrs=# show timezone; TimeZone --------------- Europe/Madrid (1 row) cdrs=# begin; BEGIN cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0); CREATE DOMAIN cdrs=# create table t(ts utc_timestamp); CREATE TABLE cdrs=# insert into t values ( current_timestamp ); ERROR: value for domain utc_timestamp violates check constraint "utc_timestamp_check" cdrs=# rollback; ROLLBACK cdrs=# select current_timestamp; now ------------------------------- 2018-05-12 12:58:03.616949+02 (1 row) cdrs=# begin; BEGIN cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0); CREATE DOMAIN cdrs=# create table t(ts utc_timestamp); CREATE TABLE cdrs=# insert into t values ('2015-05-11 10:20:30+0000'); ERROR: value for domain utc_timestamp violates check constraint "utc_timestamp_check" cdrs=# rollback; ROLLBACK cdrs=# set timezone TO 'UTC'; SET cdrs=# select current_timestamp; now ------------------------------- 2018-05-12 10:59:47.946338+00 (1 row) cdrs=# begin; BEGIN cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK (EXTRACT(TIMEZONE FROM VALUE) = 0); CREATE DOMAIN cdrs=# create table t(ts utc_timestamp); CREATE TABLE cdrs=# insert into t values ('2015-05-11 10:20:30+0000'); INSERT 0 1 cdrs=# insert into t values ('2015-05-11 10:20:30+0200'); INSERT 0 1 cdrs=# insert into t values ('2015-05-11 10:20:30'); INSERT 0 1 cdrs=# insert into t values (current_timestamp); INSERT 0 1 cdrs=# select * from t; ts ------------------------------- 2015-05-11 10:20:30+00 2015-05-11 08:20:30+00 2015-05-11 10:20:30+00 2018-05-12 10:59:54.289827+00 (4 rows) cdrs=# rollback; ROLLBACK Francisco Olarte.
On 05/12/2018 04:04 AM, Francisco Olarte wrote: > On Thu, May 10, 2018 at 10:37 PM, Ben Hood <ben@relops.com> wrote: >> On 10 May 2018, at 16:33, Francisco Olarte <folarte@peoplecall.com> wrote: >> >> For what you want to do I think you'll have to parse the text value, >> maybe by definig a view with a text columns and using some >> rule/trigger magic for insert / updates. >> >> >> Sorry for being unclear - the solution I have in production appears to work >> with >> >> CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK >> (EXTRACT(TIMEZONE FROM VALUE) = 0); >> >> This raises an exception when an app doesn’t use UTC. > .... >> This is what I wanted to achieve and the approach so far seems to work. > > Can you post an example ( correct, error inputs, preferrably done with psql ) ? > > At this point I do not know if it is working or if it seems to working > for you. I see some cases in which it would seem to work, but I would > like to know what you mean by "sending non utc timestamps to the > database". In the post below Ben explains that he realizes it is not working the way he thought: https://www.postgresql.org/message-id/1465CD01-C08A-49BF-BCF3-193E6509A703%40relops.com From what I can gather it comes down where UTC is being enforced. Ben was hoping that the domain would force timestamps to be only submitted with a UTC offset with the idea that this would force storage in the field as UTC only. The realization is that if you have a field of type timestamp with time zone the value is going to be stored as UTC regardless of the offset that is presented(even in cases where there is no offset presented, when an implicit one is assigned). That means there really is no need for the domain. > > I mean something like the one attached below, which to me seems to > prove it does not work, it just checks that the session timezone is > utc, note the 1st transaction tests an explicit timestamp value, > without conversions, failing in non-utc, the second one checks an > explicit +0000 zone not working in non UTC and the third one checks > anything goes , with or without timestamp, when the time zone is utc. > > cdrs=# show timezone; > TimeZone > --------------- > Europe/Madrid > (1 row) > > cdrs=# begin; > BEGIN > cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK > (EXTRACT(TIMEZONE FROM VALUE) = 0); > CREATE DOMAIN > cdrs=# create table t(ts utc_timestamp); > CREATE TABLE > cdrs=# insert into t values ( current_timestamp ); > ERROR: value for domain utc_timestamp violates check constraint > "utc_timestamp_check" > cdrs=# rollback; > ROLLBACK > cdrs=# select current_timestamp; > now > ------------------------------- > 2018-05-12 12:58:03.616949+02 > (1 row) > > cdrs=# begin; > BEGIN > cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK > (EXTRACT(TIMEZONE FROM VALUE) = 0); > CREATE DOMAIN > cdrs=# create table t(ts utc_timestamp); > CREATE TABLE > cdrs=# insert into t values ('2015-05-11 10:20:30+0000'); > ERROR: value for domain utc_timestamp violates check constraint > "utc_timestamp_check" > cdrs=# rollback; > ROLLBACK > cdrs=# set timezone TO 'UTC'; > SET > cdrs=# select current_timestamp; > now > ------------------------------- > 2018-05-12 10:59:47.946338+00 > (1 row) > > cdrs=# begin; > BEGIN > cdrs=# CREATE DOMAIN UTC_TIMESTAMP AS TIMESTAMP WITH TIME ZONE CHECK > (EXTRACT(TIMEZONE FROM VALUE) = 0); > CREATE DOMAIN > cdrs=# create table t(ts utc_timestamp); > CREATE TABLE > cdrs=# insert into t values ('2015-05-11 10:20:30+0000'); > INSERT 0 1 > cdrs=# insert into t values ('2015-05-11 10:20:30+0200'); > INSERT 0 1 > cdrs=# insert into t values ('2015-05-11 10:20:30'); > INSERT 0 1 > cdrs=# insert into t values (current_timestamp); > INSERT 0 1 > cdrs=# select * from t; > ts > ------------------------------- > 2015-05-11 10:20:30+00 > 2015-05-11 08:20:30+00 > 2015-05-11 10:20:30+00 > 2018-05-12 10:59:54.289827+00 > (4 rows) > > cdrs=# rollback; > ROLLBACK > > > Francisco Olarte. > > -- Adrian Klaver adrian.klaver@aklaver.com
Adrian: On Sat, May 12, 2018 at 6:42 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > On 05/12/2018 04:04 AM, Francisco Olarte wrote: ... >> Can you post an example ( correct, error inputs, preferrably done with >> psql ) ? >> >> At this point I do not know if it is working or if it seems to working >> for you. I see some cases in which it would seem to work, but I would >> like to know what you mean by "sending non utc timestamps to the >> database". > > In the post below Ben explains that he realizes it is not working the way he > thought: Yep, saw it after posting this. > From what I can gather it comes down where UTC is being enforced. Ben was > hoping that the domain would force timestamps to be only submitted with a > UTC offset with the idea that this would force storage in the field as UTC > only. The realization is that if you have a field of type timestamp with > time zone the value is going to be stored as UTC regardless of the offset > that is presented(even in cases where there is no offset presented, when an > implicit one is assigned). That means there really is no need for the > domain. I think this problem, and similar ones, come from the description in the doc, and in a lot of similar places, as "the timestamp with time zone value is stored as utc", or "timestamps with time zones are normalized to utc". If you look at the docs you realize they are all stored as numbers, and ocupy the same space as timestamp W/O TZ. The thing is you do not need UTC for anything to represent timestamps. Using the (simpler) unix timestamp as an example. If UTC ( and UAT and GMT ) did not exist I could defne it as, say, "number of seconds elapsed since eiight in the morning January the Second in Madrid/Spain local clocks plus one hundred and eight thousands", no UTC needed at all, they are just numbers, they do not have timezones. The only difference is timestamp uses gmtime/timegm for text conversion and timestamptz uses localtime/timelocal. In fact I've found the following in the sources: * Timestamps, as well as the h/m/s fields of intervals, are stored as * int64 values with units of microseconds. (Once upon a time they were * double values with units of seconds.) And from some reading it seems to be like the unix timestamp, but in 64 bits microseconds and referencing extended Julian ( or Gregorian ) calendar, not too sure about it. I've read the definition somewhere, but thinking of them as "just a number" has avoided me a lot of problems. Francisco Olarte.
On 05/12/2018 10:22 AM, Francisco Olarte wrote: > Adrian: > > On Sat, May 12, 2018 at 6:42 PM, Adrian Klaver > <adrian.klaver@aklaver.com> wrote: >> On 05/12/2018 04:04 AM, Francisco Olarte wrote: > ... >>> Can you post an example ( correct, error inputs, preferrably done with >>> psql ) ? >>> >>> At this point I do not know if it is working or if it seems to working >>> for you. I see some cases in which it would seem to work, but I would >>> like to know what you mean by "sending non utc timestamps to the >>> database". >> >> In the post below Ben explains that he realizes it is not working the way he >> thought: > > Yep, saw it after posting this. > > >> From what I can gather it comes down where UTC is being enforced. Ben was >> hoping that the domain would force timestamps to be only submitted with a >> UTC offset with the idea that this would force storage in the field as UTC >> only. The realization is that if you have a field of type timestamp with >> time zone the value is going to be stored as UTC regardless of the offset >> that is presented(even in cases where there is no offset presented, when an >> implicit one is assigned). That means there really is no need for the >> domain. > > I think this problem, and similar ones, come from the description in > the doc, and in a lot of similar places, as "the timestamp with time > zone value is stored as utc", or "timestamps with time zones are > normalized to utc". If you look at the docs you realize they are all > stored as numbers, and ocupy the same space as timestamp W/O TZ. The > thing is you do not need UTC for anything to represent timestamps. > Using the (simpler) unix timestamp as an example. If UTC ( and UAT and > GMT ) did not exist I could defne it as, say, "number of seconds > elapsed since eiight in the morning January the Second in > Madrid/Spain local clocks plus one hundred and eight thousands", no > UTC needed at all, they are just numbers, they do not have timezones. > The only difference is timestamp uses gmtime/timegm for text > conversion and timestamptz uses localtime/timelocal. I would agree that timestamp and timestamptz are both stored as numbers. I would not agree they do not have timezones: show timezone; TimeZone ------------ US/Pacific create table ts_comparison (id integer, ts_tz timestamptz, ts_naive timestamp); insert into ts_comparison values (1, now(), now()); insert into ts_comparison values (1, '05/12/18 11:05', '05/12/18 11:05'); insert into ts_comparison values (1, '05/12/18 11:05-07', '05/12/18 11:05-07'); select * from ts_comparison ; id | ts_tz | ts_naive ----+-------------------------------+---------------------------- 1 | 2018-05-12 11:04:44.161849-07 | 2018-05-12 11:04:44.161849 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 A timestamp with time zone anchors the entered value at a point in time(UTC timezone) and therefore allows you to recover that point in time. From there you can rotate it to whatever timezone you want and know that it represents the original point in time. A timestamp(without time zone) just records the date and time portions without reference to a timezone. This means you have an un-anchored timestamp and a future of trying to reconstruct the original point in time. This is a very important distinction and the reason why if you care about accurate date time's you store as timestamp with time zone. In other words timestamp with time zone is more then 'just a number'. > > In fact I've found the following in the sources: > > * Timestamps, as well as the h/m/s fields of intervals, are stored as > * int64 values with units of microseconds. (Once upon a time they were > * double values with units of seconds.) > > And from some reading it seems to be like the unix timestamp, but in > 64 bits microseconds and referencing extended Julian ( or Gregorian ) > calendar, not too sure about it. I've read the definition somewhere, > but thinking of them as "just a number" has avoided me a lot of > problems. > > Francisco Olarte. > -- Adrian Klaver adrian.klaver@aklaver.com
On Sat, May 12, 2018 at 8:19 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > I would agree that timestamp and timestamptz are both stored as numbers. Well, after reading source that is a fact. I was trying to say they are like real numbers, a point in a line. > I would not agree they do not have timezones: > show timezone; > TimeZone > ------------ > US/Pacific This proves a session has a timezone. > > create table ts_comparison (id integer, ts_tz timestamptz, ts_naive > timestamp); > > insert into ts_comparison values (1, now(), now()); > insert into ts_comparison values (1, '05/12/18 11:05', '05/12/18 11:05'); > insert into ts_comparison values (1, '05/12/18 11:05-07', '05/12/18 > > select * from ts_comparison ; > id | ts_tz | ts_naive > ----+-------------------------------+---------------------------- > 1 | 2018-05-12 11:04:44.161849-07 | 2018-05-12 11:04:44.161849 > 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 > 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 This proves timestampTz values get printed as a string which includes a timezone. It's got a cute "ts_naive" name to trick someone. And relies on some default conventions for convertions to/from strings to make it seem there is magic appearing. IIRC what you are doing is, given now() returns a timestamp with time one and the postgres default conversims: > insert into ts_comparison values (1, now(), now()); insert now(), now() at session_timezone <- do not remember the parameter name. > insert into ts_comparison values (1, '05/12/18 11:05', '05/12/18 11:05'); > insert into ts_comparison values (1, '05/12/18 11:05-07', '05/12/18 11:05-07'); Use string conversion, fully knowing that tstz conversion use session timestamp by default and ts ignore them, so for the second column you can use "+00", "-00", "-07" or +UnicornFarts", it's just line noise to the parser, and relies on a lot of hidden info, like the fact that your locale uses M/D/Y ( results would differ in mine which uses D/M/Y ). Those are examples of very complex string conversions. But timestamps do not have timezone. They are points in the time line. Points in earth surface have timezones, countries have timezones, but nor timestamp. > A timestamp with time zone anchors the entered value at a point in time(UTC > timezone) and therefore allows you to recover that point in time. The entered value is a string. The string representation of a point in time. This string, if it is given in the usual YMDhms or similar thing it needs a timezone ( either explicit or implied by the session ) to b converted into a timestamp. Once it's a timestamp you do not need the timezone. "Launch time of Apollo XI" is another string representation of a timestamp, which does not need timezones. You may need a timezone to convert it back to "YMDhms" form, but the timestamp does not have a time zone. If timestamps have time zones, then try to produce this for me "Launch time of Apollo XI", two values for this with different time zones ( Different values, that mean they must compare different as tstz, not two different string values ). Two different strings are not good, I agree you can produce strings with have a part which we call timezone and extract from it said part. You can extract any timezone from any timestamp, but it is of no use alone. It's just an string formatting artifact. > From there > you can rotate it to whatever timezone you want and know that it represents > the original point in time. I can represent it as a string, but I do not call it rotate. Of course, reprensenting a value in different ways does not change its meaning, an integer value equal to eighteen is not going to represent nothing different because I print it as "18", "022", "0x12" or "2*9" at different times. > A timestamp(without time zone) just records the > date and time portions without reference to a timezone. Nope. A ts records an instant in time. It is converted to string by default withtout printing a timezone, but this is becuase they are used this way. Also "just records" means tstz records more stuff. So it has more info. So I should not be able to build an isomorphism between then? > This means you have > an un-anchored timestamp and a future of trying to reconstruct the original > point in time. This is a very important distinction and the reason why if > you care about accurate date time's you store as timestamp with time zone. > In other words timestamp with time zone is more then 'just a number'. Nope. If you care about "easy input output" you store as tstz. It has nothing to do with accuracy ( for a correct manipulation, the thing is it is easier to wrtie correct code for tstz than for ts, but they are equally acurate ). In fact, if I store "2010-01-01 12:23:45"::timestamp I know I'm getting "2010-01-01 12:23:45" in the future, independently of where I read it, so normally I use it when I want to preserver "string" values but have some special "numeric" semantics. When I store instants in time I need to make sure my string input values accurately designate them, and I normally use tstz because its default input conversions process those correct formats I allow ( and screen before sending them for conversion ). Tell me one thing you can do with a timestamp with timezone that you cannot do with a int8 number or a with a timestamp without time zone. Given there is an isomorphism between them ( and, in postgres, the text version is the same for the inverse, IIRC "one=the_other at timezone UTC" ) and they have the same representation, I doubt it. They are just convenient ways to manipulate text values and apply functions based on type, but they do not have a time zone. You can not get a time zone from a timestamp with time zone value. What leads to countless confussion is it seems that a tstz-value should be composed of a ts-value plus a tz-value, but they are not. Is just a problem with naming. Francisco Olarte.
On 05/13/2018 01:55 AM, Francisco Olarte wrote: > On Sat, May 12, 2018 at 8:19 PM, Adrian Klaver > <adrian.klaver@aklaver.com> wrote: > >> I would agree that timestamp and timestamptz are both stored as numbers. > > Well, after reading source that is a fact. I was trying to say they > are like real numbers, a point in a line. > >> I would not agree they do not have timezones: >> show timezone; >> TimeZone >> ------------ >> US/Pacific > > This proves a session has a timezone. Agreed, it was included just to show what my time zone my client was configured in for the examples shown below. > >> >> create table ts_comparison (id integer, ts_tz timestamptz, ts_naive >> timestamp); >> >> insert into ts_comparison values (1, now(), now()); >> insert into ts_comparison values (1, '05/12/18 11:05', '05/12/18 11:05'); >> insert into ts_comparison values (1, '05/12/18 11:05-07', '05/12/18 >> >> select * from ts_comparison ; >> id | ts_tz | ts_naive >> ----+-------------------------------+---------------------------- >> 1 | 2018-05-12 11:04:44.161849-07 | 2018-05-12 11:04:44.161849 >> 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 >> 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 > > This proves timestampTz values get printed as a string which includes > a timezone. It's got a cute "ts_naive" name to trick someone. And > relies on some default conventions for convertions to/from strings to > make it seem there is magic appearing. IIRC what you are doing is, > given now() returns a timestamp with time one and the postgres default > conversims: Not trying to trick anyone and no magic. The difference in the represented values between ts_tz and ts_naive is the heart of my argument. Timestamptz values are stored in manner that allows you to have the output with a time zone offset. Timestamps w/notz are not. This is a big difference and very important. The simple case above does not really illustrate that though. Where it becomes important is if you have multiple clients in multiple sessions over multiple time zones. If you store timestamps in a timestamp(notz) field and get input from PDT and EDT of the timestamp string '05/13/18 11:53' you are not dealing with the same time even though they show up that way: set timezone = 'US/Eastern'; SET insert into ts_comparison values (2, '05/13/18 11:53', '05/13/18 11:53'); set timezone = 'US/Pacific'; SET insert into ts_comparison values (3, '05/13/18 11:53', '05/13/18 11:53'); select * from ts_comparison ; id | ts_tz | ts_naive ----+-------------------------------+---------------------------- 2 | 2018-05-13 08:53:00-07 | 2018-05-13 11:53:00 3 | 2018-05-13 11:53:00-07 | 2018-05-13 11:53:00 The above is my point. That is the ability to accurately distinguish points in time, because '2018-05-13 11:53:00' to me is just a timestamp string without the context of a tz offset to make it useful. > >> insert into ts_comparison values (1, now(), now()); > insert now(), now() at session_timezone <- do not remember the parameter name. > >> insert into ts_comparison values (1, '05/12/18 11:05', '05/12/18 11:05'); >> insert into ts_comparison values (1, '05/12/18 11:05-07', '05/12/18 11:05-07'); > > Use string conversion, fully knowing that tstz conversion use session > timestamp by default and ts ignore them, so for the second column you > can use "+00", "-00", "-07" or +UnicornFarts", it's just line noise to > the parser, and relies on a lot of hidden info, like the fact that > your locale uses M/D/Y ( results would differ in mine which uses > D/M/Y ). Those are examples of very complex string conversions. Not sure what the above is supposed to indicate. It is no surprise that there are a lot of ways to output timestamps in human readable form. > > > But timestamps do not have timezone. They are points in the time line. > Points in earth surface have timezones, countries have timezones, but > nor timestamp. I don't know about you but I am living on the earths surface:). That means when I deal with timestamps they are with reference to a location. There is the general notion of time marching on with out reference to Earth or humans. Then there is the time that humans deal with on a daily basis and that is very much anchored to time zones. For that we timestamps with timezones: https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-INPUT timestamp [ (p) ] with time zone > > >> A timestamp with time zone anchors the entered value at a point in time(UTC >> timezone) and therefore allows you to recover that point in time. > > The entered value is a string. The string representation of a point in > time. This string, if it is given in the usual YMDhms or similar thing > it needs a timezone ( either explicit or implied by the session ) to b > converted into a timestamp. Once it's a timestamp you do not need the > timezone. "Launch time of Apollo XI" is another string representation > of a timestamp, which does not need timezones. You may need a timezone > to convert it back to "YMDhms" form, but the timestamp does not have a > time zone. I will agree that timestamptz is stored as number only. However that number in Postgres has an implied time zone of UTC: https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-INPUT "For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT)" It is what allows timestamptz to deliver time zone offsets on output. > > If timestamps have time zones, then try to produce this for me "Launch > time of Apollo XI", two values for this with different time zones ( > Different values, that mean they must compare different as tstz, not > two different string values ). Two different strings are not good, I > agree you can produce strings with have a part which we call timezone > and extract from it said part. The point of time zones is that the above cannot be done. They all refer back to UTC via offsets so represent the same UTC time. Yet to the folks on the ground 07/16/1969 9:32:00-05 and 1969-07-16 06:32:00-08 are different times. It where the logical(universal time) and reality(perceived time) clash. > > You can extract any timezone from any timestamp, but it is of no use > alone. It's just an string formatting artifact. It provides context to the timestamp so it more then an artifact. > >> From there >> you can rotate it to whatever timezone you want and know that it represents >> the original point in time. > > I can represent it as a string, but I do not call it rotate. Of > course, reprensenting a value in different ways does not change its > meaning, an integer value equal to eighteen is not going to represent > nothing different because I print it as "18", "022", "0x12" or "2*9" > at different times. For time it does change its meaning. Per the Apollo example the same UTC time has different meanings in local time. On the US East coast it meant viewing at mid morning for the Pacific coast it meant viewing in the early morning. If you do not think that is different talk to someone who is not an early riser:) > >> A timestamp(without time zone) just records the >> date and time portions without reference to a timezone. > > Nope. A ts records an instant in time. It is converted to string by > default withtout printing a timezone, but this is becuase they are > used this way. > > Also "just records" means tstz records more stuff. So it has more > info. So I should not be able to build an isomorphism between then? > > >> This means you have >> an un-anchored timestamp and a future of trying to reconstruct the original >> point in time. This is a very important distinction and the reason why if >> you care about accurate date time's you store as timestamp with time zone. >> In other words timestamp with time zone is more then 'just a number'. > > Nope. If you care about "easy input output" you store as tstz. It has > nothing to do with accuracy ( for a correct manipulation, the thing is > it is easier to wrtie correct code for tstz than for ts, but they are > equally acurate ). In fact, if I store "2010-01-01 > 12:23:45"::timestamp I know I'm getting "2010-01-01 12:23:45" in the But you have lost all context. Unless you are independently recording where the timestamp originated you have lost the actual point in time it represents. > future, independently of where I read it, so normally I use it when I > want to preserver "string" values but have some special "numeric" > semantics. When I store instants in time I need to make sure my string > input values accurately designate them, and I normally use tstz > because its default input conversions process those correct formats I > allow ( and screen before sending them for conversion ). > > Tell me one thing you can do with a timestamp with timezone that you > cannot do with a int8 number or a with a timestamp without time zone. Retrieve a anchored timestamp without bringing in more info. As I said above: '2018-05-13 11:53:00' has no meaning to me '2018-05-12 11:05:00-07' does > > > Given there is an isomorphism between them ( and, in postgres, the > text version is the same for the inverse, IIRC "one=the_other at > timezone UTC" ) and they have the same representation, I doubt it. > They are just convenient ways to manipulate text values and apply > functions based on type, but they do not have a time zone. > > You can not get a time zone from a timestamp with time zone value. > What leads to countless confussion is it seems that a tstz-value > should be composed of a ts-value plus a tz-value, but they are not. Is > just a problem with naming. The stored value is value+00(implied). The returned values sure look like they have tz-value: select * from ts_comparison ; id | ts_tz | ts_naive ----+-------------------------------+---------------------------- 1 | 2018-05-12 11:04:44.161849-07 | 2018-05-12 11:04:44.161849 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 1 | 2018-05-12 11:05:00-07 | 2018-05-12 11:05:00 2 | 2018-05-13 08:53:00-07 | 2018-05-13 11:53:00 3 | 2018-05-13 11:53:00-07 | 2018-05-13 11:53:00 > > Francisco Olarte. > -- Adrian Klaver adrian.klaver@aklaver.com
On 2018-05-13 12:46:42 -0700, Adrian Klaver wrote: > Not trying to trick anyone and no magic. The difference in the represented > values between ts_tz and ts_naive is the heart of my argument. Timestamptz > values are stored in manner that allows you to have the output with a time > zone offset. Timestamps w/notz are not. I disagree. The difference isn't in how they are *stored*. We have already established that they are stored in the same format. The difference is in their *semantics*. A timestamptz denotes a unique and unambiguous point in time. This point in time can be represented in various time zones. So the point in time when Apollo 11 launched can be represented as '1969-07-16 09:32:00-04' (local time) or '1969-07-16 13:32:00+00' (UTC) or '1969-07-16 14:32:00+01' (CET). These are just different ways to denote the same point in time - and in fact all three are stored as the same timestamptz value (-14552880000000, I think). Only when displaying the value or doing certain operations on it is it converted to YMDhmsfz format. A timestamp without timezone does NOT denote an unambiguous point in time. It is just a compact form of representing a date and time. But without any additional context (the location or time zone) this doesn't tell you much. '2018-01-01 00:00' in Kiribati was 25 hours before '2018-01-01 00:00' in American Samoa. > > But timestamps do not have timezone. They are points in the time line. > > Points in earth surface have timezones, countries have timezones, but > > nor timestamp. > > I don't know about you but I am living on the earths surface:). That means > when I deal with timestamps they are with reference to a location. But when you store a timestamp as a timestamptz, you lose that reference to a location. All that is left is an abstract reference to a point in time. Only when you read that value again (and do certain operations with it) is that reference to a location added again - but the current location of the reader, not the the original locaton (that is lost forever, unless it was stored elsewhere). > I will agree that timestamptz is stored as number only. However that number > in Postgres has an implied time zone of UTC: > > https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-INPUT > > "For timestamp with time zone, the internally stored value is always in UTC > (Universal Coordinated Time, traditionally known as Greenwich Mean Time, > GMT)" This is not actually true. There is nothing in the storage format which depends on UTC (well, the epoch is at Midnight UTC, at if you say the epoch is at 08:00 Beijing time it is equally correct). hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment
On 05/13/2018 03:45 PM, Peter J. Holzer wrote: > On 2018-05-13 12:46:42 -0700, Adrian Klaver wrote: >> Not trying to trick anyone and no magic. The difference in the represented >> values between ts_tz and ts_naive is the heart of my argument. Timestamptz >> values are stored in manner that allows you to have the output with a time >> zone offset. Timestamps w/notz are not. > > I disagree. The difference isn't in how they are *stored*. We have > already established that they are stored in the same format. > > The difference is in their *semantics*. Exactly, timestamptz allows you to retrieve an unambiguous point in time, timestamp does not. We could argue all day about the details, but the previous sentence is the important distinction. > > A timestamptz denotes a unique and unambiguous point in time. This point > in time can be represented in various time zones. So the point in time > when Apollo 11 launched can be represented as '1969-07-16 09:32:00-04' > (local time) or '1969-07-16 13:32:00+00' (UTC) or '1969-07-16 > 14:32:00+01' (CET). These are just different ways to denote the same > point in time - and in fact all three are stored as the same timestamptz > value (-14552880000000, I think). Only when displaying the value or > doing certain operations on it is it converted to YMDhmsfz format. > > A timestamp without timezone does NOT denote an unambiguous point in > time. It is just a compact form of representing a date and time. But > without any additional context (the location or time zone) this doesn't > tell you much. '2018-01-01 00:00' in Kiribati was 25 hours before > '2018-01-01 00:00' in American Samoa. > > >>> But timestamps do not have timezone. They are points in the time line. >>> Points in earth surface have timezones, countries have timezones, but >>> nor timestamp. >> >> I don't know about you but I am living on the earths surface:). That means >> when I deal with timestamps they are with reference to a location. > > But when you store a timestamp as a timestamptz, you lose that reference > to a location. All that is left is an abstract reference to a point in > time. Only when you read that value again (and do certain operations A point in time anchored to location, the prime meridian. Now I agree you lose the original location information, but for many operations that is not important as you can reconstitute any location at a later date. For those operations where it is important, ideas have been floated on this list for dealing with this. I remember a proposal for a composite type that included the timestamp and the original timezone. That became an actual extension(?) at some point, but I cannot dig it up at the moment. > with it) is that reference to a location added again - but the current > location of the reader, not the the original locaton (that is lost > forever, unless it was stored elsewhere). > > >> I will agree that timestamptz is stored as number only. However that number >> in Postgres has an implied time zone of UTC: >> >> https://www.postgresql.org/docs/10/static/datatype-datetime.html#DATATYPE-DATETIME-INPUT >> >> "For timestamp with time zone, the internally stored value is always in UTC >> (Universal Coordinated Time, traditionally known as Greenwich Mean Time, >> GMT)" > > This is not actually true. There is nothing in the storage format which > depends on UTC (well, the epoch is at Midnight UTC, at if you say the Agreed it is a stored number. What makes timestamptz work is that the number is relative to Midnight UTC and that the Postgres datetime code knows this and uses that knowledge to create unambiguous points in time. If you where to throw that number at some other program without any context then you would be dealing with just a number. The point is that it is being dealt with inside Postgres from a known point of reference and so the stored number is more then just a number. > epoch is at 08:00 Beijing time it is equally correct). > > hp > -- Adrian Klaver adrian.klaver@aklaver.com