Thread: Alter timestamp without timezone to with timezone rewrites rows

Alter timestamp without timezone to with timezone rewrites rows

From
Dorian Hoxha
Date:
Hi team,

I have a table with a column of "timestamp without timezone" and I want to alter it to a column of "timestamp with timezone" without rewriting the rows.

Looking at the docs, and doing some quick tests, the data-on-disk is identical for both columns. But when doing an alter table, looks like it's rewriting the rows: (PostgreSQL 13.1)

guru=# create table tt(m timestamp without time zone default now());
guru=# insert into tt(m) values(now());
guru=# SELECT xmin, xmax, cmin, cmax, m from tt;
 xmin | xmax | cmin | cmax |             m              
------+------+------+------+----------------------------
  695 |    0 |    0 |    0 | 2021-01-13 11:47:59.146952
(1 row)

guru=# alter table tt alter column m type timestamp with time zone;
guru=# SELECT xmin, xmax, cmin, cmax, m from tt;
 xmin | xmax | cmin | cmax |               m              
------+------+------+------+-------------------------------
  696 |    0 |    4 |    4 | 2021-01-13 11:47:59.146952+01
(1 row)

Is there any reason to rewrite the rows or is it just an optimization that no one hasn't sent a patch ?

Regards,
Dorian

Re: Alter timestamp without timezone to with timezone rewrites rows

From
Ashutosh Bapat
Date:
On Wed, Jan 13, 2021 at 4:24 PM Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
>
> Hi team,
>
> I have a table with a column of "timestamp without timezone" and I want to alter it to a column of "timestamp with
timezone"without rewriting the rows.
 
>
> Looking at the docs, and doing some quick tests, the data-on-disk is identical for both columns. But when doing an
altertable, looks like it's rewriting the rows: (PostgreSQL 13.1)
 
>
> guru=# create table tt(m timestamp without time zone default now());
> guru=# insert into tt(m) values(now());
> guru=# SELECT xmin, xmax, cmin, cmax, m from tt;
>  xmin | xmax | cmin | cmax |             m
> ------+------+------+------+----------------------------
>   695 |    0 |    0 |    0 | 2021-01-13 11:47:59.146952
> (1 row)
>
> guru=# alter table tt alter column m type timestamp with time zone;
> guru=# SELECT xmin, xmax, cmin, cmax, m from tt;
>  xmin | xmax | cmin | cmax |               m
> ------+------+------+------+-------------------------------
>   696 |    0 |    4 |    4 | 2021-01-13 11:47:59.146952+01
> (1 row)

+01 indicates that there's timezone information added to the data, so
the rows aren't identical. Here's some more SQL run on my laptop which
shows that
postgres=# create table tt(m timestamp without time zone default now());
postgres=# insert into tt(m) values(now());
INSERT 0 1
postgres=# SELECT xmin, xmax, cmin, cmax, m from tt;
 xmin | xmax | cmin | cmax |             m
------+------+------+------+----------------------------
  509 |    0 |    0 |    0 | 2021-01-13 19:23:25.647806
(1 row)

postgres=# show timezone;
   TimeZone
--------------
 Asia/Kolkata
(1 row)

postgres=# set timezone TO 'Asia/Manila';
SET
postgres=# show timezone;
  TimeZone
-------------
 Asia/Manila
(1 row)

postgres=# SELECT xmin, xmax, cmin, cmax, m from tt;
 xmin | xmax | cmin | cmax |             m
------+------+------+------+----------------------------
  509 |    0 |    0 |    0 | 2021-01-13 19:23:25.647806
(1 row)

-- Note that this output is same as the above one even though I have
changed the timezone setting.

postgres=# reset timezone;
RESET
postgres=# show timezone;
   TimeZone
--------------
 Asia/Kolkata
(1 row)

postgres=# alter table tt alter column m type timestamp with time zone;
ALTER TABLE
postgres=# SELECT xmin, xmax, cmin, cmax, m from tt;
 xmin | xmax | cmin | cmax |                m
------+------+------+------+----------------------------------
  510 |    0 |    4 |    4 | 2021-01-13 19:23:25.647806+05:30
(1 row)

postgres=# set timezone TO 'Asia/Manila';
SET
postgres=# SELECT xmin, xmax, cmin, cmax, m from tt;
 xmin | xmax | cmin | cmax |               m
------+------+------+------+-------------------------------
  510 |    0 |    4 |    4 | 2021-01-13 21:53:25.647806+08
(1 row)

See the difference in the output when the timezone setting is changed.

-- 
Best Wishes,
Ashutosh Bapat



Re: Alter timestamp without timezone to with timezone rewrites rows

From
"David G. Johnston"
Date:
On Wed, Jan 13, 2021 at 6:59 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
+01 indicates that there's timezone information added to the data, so
the rows aren't identical. Here's some more SQL run on my laptop which
shows that

This is indeed true but examples that use the textual representation of the data don't support the claim.  Both types effectively store a point-in-time in UTC, the "with timezone" just does so explicitly - but the behavior for conversions from "without timezone" treat the stored time as being UTC as well.  It would be possible to simply store a timezone-less timestamp in both cases and assume UTC (or not) when displaying the value based upon whether the datatype of the value is determined to be "with timezone" or not (the former also undergoing rotation based upon the runtime value of the timezone setting).  The fact that pg_attribute is required to interpret the data suggests that the stored data doesn't care about its named datatype and that runtime interpretation of the value based upon datatype would be possible.

To be clear, I don't envision the current status changing - we were able to avoid a rewrite with varchar(n) -> text because the stored data was indeed identical.  Reading the documentation it does say, though maybe not as explicitly as it could, that the physical storage of a timestamptz includes an explicit UTC data component ("the internally stored value is always in UTC") while the storage of a timestamp does not (i.e., there is no such verbiage).

David J.



Re: Alter timestamp without timezone to with timezone rewrites rows

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wed, Jan 13, 2021 at 6:59 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
> wrote:
>> +01 indicates that there's timezone information added to the data, so
>> the rows aren't identical. Here's some more SQL run on my laptop which
>> shows that

> This is indeed true but examples that use the textual representation of the
> data don't support the claim.

The physical representation of the data is identical, yes, but the
*interpretation* is not.  In modern PG, timestamptz is int64 microseconds
counted from 2000-01-01 00:00:00 UTC, while timestamp is int64 microseconds
counted from 2000-01-01 00:00:00 local time (whatever you think local time
is).  Also, while the display (not storage) of timestamptz accounts for
local daylight-savings rules, display of timestamp values does not.
Thus, the offset between the stored values of timestamptz and timestamp
for the "same" date/time will vary over time.  Where I live, there's
a five-hour offset right now, but for much of the year it's a four-hour
offset.

So a non-rewriting conversion would only be possible if local time is
identical to UTC; which is true for few enough people that nobody has
bothered with attempting the optimization.  (From memory, the existing
method for deciding whether non-rewriting conversion is possible could not
cope with such an environment-dependent rule anyway, so some significant
trouble would be involved to figure out how to do it.)

            regards, tom lane



Re: Alter timestamp without timezone to with timezone rewrites rows

From
Noah Misch
Date:
On Wed, Jan 13, 2021 at 10:28:26AM -0500, Tom Lane wrote:
> So a non-rewriting conversion would only be possible if local time is
> identical to UTC; which is true for few enough people that nobody has
> bothered with attempting the optimization.

PostgreSQL 12 and later do have that optimization.  Example DDL:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c59263#patch4



Re: Alter timestamp without timezone to with timezone rewrites rows

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Wed, Jan 13, 2021 at 10:28:26AM -0500, Tom Lane wrote:
>> So a non-rewriting conversion would only be possible if local time is
>> identical to UTC; which is true for few enough people that nobody has
>> bothered with attempting the optimization.

> PostgreSQL 12 and later do have that optimization.  Example DDL:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c59263#patch4

Ah, I'd forgotten about that patch (obviously).  It is a kluge, without
a doubt, since it has to hard-wire knowledge about the behavior of two
specific conversions into what ought to be general-purpose code.  But
I guess it is useful often enough to justify its existence.

I wonder whether it'd be worth moving this logic into a planner support
function attached to those conversion functions?  I wouldn't bother
right now, but if somebody else comes along with a similar proposal,
we should think hard about that.

            regards, tom lane



Re: Alter timestamp without timezone to with timezone rewrites rows

From
Noah Misch
Date:
On Sat, Jan 16, 2021 at 12:03:19PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Jan 13, 2021 at 10:28:26AM -0500, Tom Lane wrote:
> >> So a non-rewriting conversion would only be possible if local time is
> >> identical to UTC; which is true for few enough people that nobody has
> >> bothered with attempting the optimization.
> 
> > PostgreSQL 12 and later do have that optimization.  Example DDL:
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3c59263#patch4
> 
> Ah, I'd forgotten about that patch (obviously).  It is a kluge, without
> a doubt, since it has to hard-wire knowledge about the behavior of two
> specific conversions into what ought to be general-purpose code.  But
> I guess it is useful often enough to justify its existence.
> 
> I wonder whether it'd be worth moving this logic into a planner support
> function attached to those conversion functions?  I wouldn't bother
> right now, but if somebody else comes along with a similar proposal,
> we should think hard about that.

Yes.