Re: to_timestamp() and timestamp without time zone - Mailing list pgsql-general

From hernan gonzalez
Subject Re: to_timestamp() and timestamp without time zone
Date
Msg-id BANLkTinoZpzHxd9tMAJFmnZfmUk05LV+5A@mail.gmail.com
Whole thread Raw
In response to Re: to_timestamp() and timestamp without time zone  (Steve Crawford <scrawford@pinpointresearch.com>)
Responses Re: to_timestamp() and timestamp without time zone  ("David Johnston" <polobo@yahoo.com>)
Re: to_timestamp() and timestamp without time zone  (Adrian Klaver <adrian.klaver@gmail.com>)
List pgsql-general
As I understand it, documentation patches are welcomed:)

I'd indeed wish some radical changes to the documentation. 

To start with, the fundamental data type names are rather misleading; SQL standard sucks here, true, but Postgresql also has its idiosincracies, and the docs do not help much:


The most confusing thing is that PG's "TIMESTAMP WITH TIMEZONE" is NOT... a timestamp with timezone! (not even in the crippled sense of the SQL standard, which thinks of a "timestamp with offset"). It actually has no relation with timezones. It's actually a plain "timestamp", as the world is commonly used in computing, the "INSTANT" of time in which (typically) an event happened. It's a physical concept, not related with civil things (even with calendars). Typical example: the "modification time" of a file in a unix filesystem: here the timestamp is stored internally as a Unix integer time (seconds, relative to the Unix epoch), and when doing a "ls" the OS displays it using the current timezone; but it's very clear that changing the timezone only changes the output representation. The same happens in Postgresql. A stored "TIMESTAMP WITH TIMEZONE" will be the same physical instant (say, the instant for the first plane crash at S/11) regardless of the timezone that is used (ONLY!) for output/input representation: '2011-09-11 08:46:40-04' and '2011-09-11 09:46:40-03' are the same INSTANT.


db=# set TIMEZONE='US/Eastern';        
db=# select inst from test_dates where rid=5;
 2011-09-11 08:46:40-04
db=# set TIMEZONE='America/Argentina/Buenos_Aires';
db=# select inst from test_dates where rid=5;
 2011-09-11 09:46:40-03

This is the type that should normally be used to record the time at which an event happened (typically a record modification - like MYSQL uses the world "TIMESTAMP").

On the other hand, "TIMESTAMP WITHOUT TIMEZONE" is a wholy different concept (neither 'wider' or narrow' type than the other). It's just the local calendar time, it's (conceptually) like a tuple of numbers{year,month,day,hour,min,sec, fracsecs}, the "date and time" as it's understood in the business-civil world, with the timezone information missing. 
This is the type that should be used for that concept, when it's critical for me that 'If I stored "2011,09,11 at 23:59:59", I want to get that precise calendar date, no matter what the server timezone is, no matter it changes, or the DST rules changes, or have bugs or anything': timezones should not be involved AT ALL when manipulating them(parsing, outputing, storing, retrieving, comparing, extracting date or time, etc). Of course, this is internally implemented (for mere convenience of space and arithmetic) using a UTC timestamp (but this should not leak - from the user point of view, everything should be as if the tuple of numbers was stored).

To resume: 

"TIMESTAMP WITH TIMEZONE" is an INSTANT
"TIMESTAMP WITHOUT TIMEZONE" is a LOCALDATETIME

(BTW, this mirrors the Joda-Time JAVA API terminology http://joda-time.sourceforge.net/ )

Neither of them has or implies a TIMEZONE. The first can accept/use one TZ for input/ouput representation, that's all.

Docs should make clear this, and the fact that Postgresql currently lacks a "FULL" datetime type. This would correspond conceptually to the tuple {INSTANT,TIMEZONE} (which implies univocally also a LOCALDATETIME). Nor does it implemented the SQL standard "TIMESTAMP WITH TIMEZONE" (actually a {INSTANT,OFFSET} tuple). I'm ok with that, I agree that the later is crippled, and the former is complex and not standard, and that both have little support from upper layers. 

The problem, for me, is that Postgresql does not emphasize the conceptual difference between this types, and that it tends too easily IMO to "assume" the missing information (the timezone) even when probably the user does not want to make that assumption. For example, I hope someday PG disables the implicit casting between these two types.

In general, I feel that sometimes Postgresql pretends to know too much about the data. Say, when I store a local datetime (TIMESTAMP WITHOUT TIMEZONE), I wish to tell him: "this a LOCAL datetime, please don't pretend EVER to know its timezone, you don't need it; if I asked you to select all LOCAL DATE TIMES before "2001/2/3 0:0:0", if you for doing that are invoking some timezone convertion or intelligence, you are messsing with my data - stop doing that". 

You can see in this SO question some (non PG specific) discussion about datetimes and DBs with potentially many timezones. 
The most upvoted recipe is: "Persist globally, display locally". I.e., don't pretend to store the full date time info, only the instant, let the upper layers fill the tz info (perhaps from other data, the user profile, or whatever) and do the display. I don't endorse this fully (sometimes the DB must know, eg for doing comparisons and arithmetic), but sometimes I feel that Postgresql lacks this perspective, and its mindset lies in the other extreme: "I know everything (timezones), I do everything (display)". 
When I read the PG docs I'm frequently uncomfortable to see many explanations of fundamentals tied to "how PG parses/displays" the data. See the docs for the TIMESTAMP types, where the difference between them is dealt only in this context. As most developers, I'm using some client interface (say, JDBC), why should I care about how PG converts data from/to human readable form? I'm not asking it to do that, I tend to think, I'm just using it for the persisting layer. Only gradually one (I) graps that human-readable strings are at the core of PG data handling, and that client interfaces must deal with that. It would be ludicrous for me to critize that, all that has surely some deep justification. But anyway, from the point of view of the app developer, it sounds strange sometimes. 

BTW, I was curious to see how the JDBC client code dealt wit this, if he was confortable and if all was clear and foolproof with these TIMESTAMP types at that level. Well, not very - it seems. See http://goo.gl/L2Pzi , search for "compromise". Probably JDBC is also to blame here, but anyway...

Regards

Hernan J Gonzalez

pgsql-general by date:

Previous
From: Steve Crawford
Date:
Subject: Re: to_timestamp() and timestamp without time zone
Next
From: Hiroshi Saito
Date:
Subject: Re: UUID-OSP contrib module