Thread: timestamp and timestamptz

timestamp and timestamptz

From
Niels Jespersen
Date:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;

 

We are on Postgres 12.

 

Regards Niels Jespersen

Re: timestamp and timestamptz

From
Magnus Hagander
Date:


On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <NJN@dst.dk> wrote:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.


Yes, you should.

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;



As long as you use option 1:

SELECT read_time
will return the time in CET (as a timestamptz) after you've set timezone to 'cet'. If you set timezone to 'utc' it will directly return utc.

SELECT read_time AT TIME ZONE 'utc'
will return the time in UTC (as a timestamp)


And just make sure you have done a "set time zone 'utc'" before you *load* the data, and everything should just work automatically.

--

Re: timestamp and timestamptz

From
Steve Crawford
Date:
What is the exact format of the timestamp in the CSV? As long as it is in a "fully qualified" format, i.e. includes the time-zone offset, then you will have no problem as the data represents a point in time.

It is easier to conceptualize "time stamp with time zone" (timestamptz) as actually representing a "point in time". If your client is set to CET then the data will display as desired.

-Steve

On Wed, Apr 15, 2020 at 10:50 AM Niels Jespersen <NJN@dst.dk> wrote:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;

 

We are on Postgres 12.

 

Regards Niels Jespersen

Re: timestamp and timestamptz

From
Steve Crawford
Date:


On Wed, Apr 15, 2020 at 11:06 AM Steve Crawford <scrawford@pinpointresearch.com> wrote:
What is the exact format of the timestamp in the CSV? As long as it is in a "fully qualified" format, i.e. includes the time-zone offset, then you will have no problem as the data represents a point in time.

It is easier to conceptualize "time stamp with time zone" (timestamptz) as actually representing a "point in time". If your client is set to CET then the data will display as desired.

-Steve

On Wed, Apr 15, 2020 at 10:50 AM Niels Jespersen <NJN@dst.dk> wrote:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;

 

We are on Postgres 12.

 

Regards Niels Jespersen


Oops - sorry for top-post. Darn GMail.

-Steve 

Re: timestamp and timestamptz

From
Tim Cross
Date:
Niels Jespersen <NJN@dst.dk> writes:

> Hello all
>
>  
>
> We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access
thedata in
 
> our native timezone (CET). 
>
>  
>
> I am considering a few alternatives: 
>
>  
>
> 1.       Early in the process, convert to timestamptz and keep this datatype. 
>
> 2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the
datain the timestamp
 
> column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future
keptin another
 
> country in another timezone. 
>
>  
>
> I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz. 
>
>  
>
> Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?
>
>  
>
> set timezone to 'cet';
>
> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone
'utc')::timestamptz
> read_time_tz from t limit 10;
>
>  
>
> We are on Postgres 12. 
>
>  
>

Keep your life simple - just go with option 1. Keep all timestamps in
UTC and let clients deal with it in whatever way they need to. This will
also help deal with issues associated with daylight savings time (DST
can be a real pain as different locations have it and others don't and
the 'switchover' date is subject to political whims and can change).
Your option 2 will cause all sorts of issues and keep in mind that most
larger countries have multiple timezones, so even if your data is all
associated with a single country, you can potentially have multiple
conversion routines required. On most *nix systems, clock time is UTC as
well, so having everything in UTC really helps when you want to do
diagnosis across your database and system log files etc. 

-- 
Tim Cross



Re: timestamp and timestamptz

From
raf
Date:
Tim Cross wrote:

> Niels Jespersen <NJN@dst.dk> writes:
> 
> > Hello all
> >
> > We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to
accessthe data in
 
> > our native timezone (CET). 
> >
> > I am considering a few alternatives: 
> >
> > 1.       Early in the process, convert to timestamptz and keep this datatype. 
> >
> > 2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the
datain the timestamp
 
> > column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future
keptin another
 
> > country in another timezone. 
> >
> > I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

> >
> > Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?
> >
> > set timezone to 'cet';
> >
> > select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone
'utc')::timestamptz
> > read_time_tz from t limit 10;
> >
> > We are on Postgres 12. 
> 
> Keep your life simple - just go with option 1. Keep all timestamps in
> UTC and let clients deal with it in whatever way they need to.

That suggestion (keep timestamps in UTC) doesn't sound like option 1 to me
(i.e. convert timestamps-that-are-in-UTC-but-don't-record-the-fact to timestamptz).
Option 1 could be done using any timezone. The point is to record what the
timezone is. It doesn't matter whether it's left as UTC or converted to CET
as long as that decision is recorded in the data by using timestamptz (and
as long as you get postgres to perform the time zone conversion because it
will get it right).

But presumably there is some reason why the data is already in UTC
and there's probably no reason to change that. But its timezone should
be recorded.

> This will
> also help deal with issues associated with daylight savings time (DST
> can be a real pain as different locations have it and others don't and
> the 'switchover' date is subject to political whims and can change).

That's a dim view of the tzdata database which gets updated regularly
to take such political decisions into account. As long as postgres uses
the same tzdata as all good UNIX-based operating systems do, and it's kept
up to date, it should be fine (as long as you never trust what a Windows
host says the timezone is for arbitrary timestamps).

This is required even if you leave data in UTC if it ever needs to be
displayed in any other time zone. Postgres still needs up to date tzdata
to perform the conversions later for users.

> Your option 2 will cause all sorts of issues and keep in mind that most
> larger countries have multiple timezones, so even if your data is all
> associated with a single country, you can potentially have multiple
> conversion routines required. On most *nix systems, clock time is UTC as
> well, so having everything in UTC really helps when you want to do
> diagnosis across your database and system log files etc. 

I don't see much difference in storing a timestamptz in UTC or a timestamptz
in CET. As long as the intended offset from UTC is recorded (which it is
in a timestamptz) it should be fine. If the timestamp is CET, then that fact
is in the data and the user doesn't need to be told it separately. It's obvious
when they see the data because the timezone is part of the data.

And you don't need conversion routines. Postgres can compare timestamptz
values and convert from one time zone to another for display purposes.

However, option 2 seems to be converting the data to CET but not using the
timestamptz datatype. That would be a bad idea. I think timestamptz should
always be preferred to timestamp. Not recording the timezone is where the
problems come from.

Although having a single timezone for log files is a great idea. They hardly
ever include timezone information so keeping everything in the same timezone
is important. Mind you, they hardly ever even include the year. What's with
that? (yes, I'm looking at you /var/log).

> -- 
> Tim Cross

cheers,
raf




Re: timestamp and timestamptz

From
"David G. Johnston"
Date:
On Wed, Apr 15, 2020 at 4:53 PM raf <raf@raf.org> wrote:
I don't see much difference in storing a timestamptz in UTC or a timestamptz
in CET. As long as the intended offset from UTC is recorded (which it is
in a timestamptz) it should be fine.

I only really skimmed the entire response but this framing of how timestamptz stores results is wrong.  Once you've stored a timestamptz in PostgreSQL you no longer have any knowledge of the timezone.  If you truly need that you need to record that in a different field.  What you do know is that PostgreSQL has a known point-in-time in UTC and can give you back the same value expressed in any other timezone according to the rules in the timezone database.

Or, as written verbatim in the documentation:
"""
For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone.
"""


David J.

SV: timestamp and timestamptz

From
Niels Jespersen
Date:

 

 

Fra: Magnus Hagander <magnus@hagander.net>
Sendt: 15. april 2020 20:05
Til: Niels Jespersen <NJN@dst.dk>
Cc: pgsql-general@lists.postgresql.org
Emne: Re: timestamp and timestamptz

 

 

 

On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <NJN@dst.dk> wrote:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

 

Yes, you should.

 

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;

 

 

As long as you use option 1:

 

SELECT read_time

will return the time in CET (as a timestamptz) after you've set timezone to 'cet'. If you set timezone to 'utc' it will directly return utc.

 

SELECT read_time AT TIME ZONE 'utc'

will return the time in UTC (as a timestamp)

 

 

And just make sure you have done a "set time zone 'utc'" before you *load* the data, and everything should just work automatically.

 

--

 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

 

 

Thank you Magnus (and others) for your replies.

 

The raw input data are in this, slightly strange format: 2019.05.01 00:00:00. No timezone indicator, just an informal guarantee from the supplier that it is indeed utc. And no real chance of changing the format. We know, from experience.

 

The data volume is a bit high, a few billion rows pr month. So, table partitioning is very helpful (aka really indispensable). Data will be aggregated in several ways for analytics. Time aggregations must be according to our local timezone (cet). We do not want data from one day being aggregated into the wrong date because of timezone issues. This means that partition boundaries (monthly pratitions most often, sometimes day partitions) must be on CET-boundaries so that partition pruning will pull data from the relevant cet-month not the utc-month.

 

Now, if I load data into a timestamptz with timezone set to utc, partition to cet-boundaries, query and aggredate with timezone set to cet, everything wil be ok, I think. My small testcase below shows that the row goes into the april-partition (as it should). The planner does the correct partition pruning according to specified filtering and set timezone. All good.

 

create table t (t_id bigserial, ts timestamptz) partition by range (ts);

create table t_2020_02 partition of t for values from ('2020-02-01 00:00:00+01') to ('2020-03-01 00:00:00+01');

create table t_2020_03 partition of t for values from ('2020-03-01 00:00:00+01') to ('2020-04-01 00:00:00+02');

create table t_2020_04 partition of t for values from ('2020-04-01 00:00:00+02') to ('2020-05-01 00:00:00+02');

 

set timezone to 'utc';

insert into t (ts) values('2020-03-31 23:30:00');

 

Once again, thank you for invaluable feedback.

 

Niels Jespersen

 

Re: timestamp and timestamptz

From
raf
Date:
David G. Johnston wrote:

> On Wed, Apr 15, 2020 at 4:53 PM raf <raf@raf.org> wrote:
> 
> > I don't see much difference in storing a timestamptz in UTC or a
> > timestamptz
> > in CET. As long as the intended offset from UTC is recorded (which it is
> > in a timestamptz) it should be fine.
> 
> I only really skimmed the entire response but this framing of how
> timestamptz stores results is wrong.  Once you've stored a timestamptz in
> PostgreSQL you no longer have any knowledge of the timezone.  If you truly
> need that you need to record that in a different field.  What you do know
> is that PostgreSQL has a known point-in-time in UTC and can give you back
> the same value expressed in any other timezone according to the rules in
> the timezone database.
> 
> Or, as written verbatim in the documentation:
> """
> For timestamp with time zone, the internally stored value is always in UTC
> (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> GMT). An input value that has an explicit time zone specified is converted
> to UTC using the appropriate offset for that time zone. If no time zone is
> stated in the input string, then it is assumed to be in the time zone
> indicated by the system's TimeZone parameter, and is converted to UTC using
> the offset for the timezone zone.
> """
> 
> https://www.postgresql.org/docs/12/datatype-datetime.html
> 
> David J.

You are misinterpreting the documentation, or the
documentation is incomplete/misleading at that
location. It doesn't just convert and store the time in
UTC. It stores the time in UTC and it also stores the
offset from UTC as determined by the time zone
specified on input (that section of the documentation
might not mention that fact but it is true
nonetheless).

I store localtimes in the database and always see the
offset when selecting it later. The timezone
information doesn't just disappear as you (or that
piece of documentation) might be suggesting.

If you don't believe me, try this:

  create table example (t timestamptz not null);
  insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 Australia/Sydney');
  select * from  example;
  drop table example;

When I run that, I see:

  CREATE TABLE
  INSERT 0 1
                t               
  ------------------------------
   2020-04-16 17:12:33.71768+10
  (1 row)
  
  DROP TABLE

So the timezone is stored (or the offset is stored if
you prefer). Had it been daylight savings on that date
in Sydney, the offset would have been "+11". It's all
fine.

cheers,
raf



Re: timestamp and timestamptz

From
Steve Baldwin
Date:
I'm pretty sure you are mistaken. Postgres doesn't store the 'creating' time zone in a timestamptz column.

Try doing this before re-running your test:

set timezone to 'utc';

What you are seeing in your test is an artifact of that timezone setting.

Steve

On Thu, Apr 16, 2020 at 5:19 PM raf <raf@raf.org> wrote:
David G. Johnston wrote:

> On Wed, Apr 15, 2020 at 4:53 PM raf <raf@raf.org> wrote:
>
> > I don't see much difference in storing a timestamptz in UTC or a
> > timestamptz
> > in CET. As long as the intended offset from UTC is recorded (which it is
> > in a timestamptz) it should be fine.
>
> I only really skimmed the entire response but this framing of how
> timestamptz stores results is wrong.  Once you've stored a timestamptz in
> PostgreSQL you no longer have any knowledge of the timezone.  If you truly
> need that you need to record that in a different field.  What you do know
> is that PostgreSQL has a known point-in-time in UTC and can give you back
> the same value expressed in any other timezone according to the rules in
> the timezone database.
>
> Or, as written verbatim in the documentation:
> """
> For timestamp with time zone, the internally stored value is always in UTC
> (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> GMT). An input value that has an explicit time zone specified is converted
> to UTC using the appropriate offset for that time zone. If no time zone is
> stated in the input string, then it is assumed to be in the time zone
> indicated by the system's TimeZone parameter, and is converted to UTC using
> the offset for the timezone zone.
> """
>
> https://www.postgresql.org/docs/12/datatype-datetime.html
>
> David J.

You are misinterpreting the documentation, or the
documentation is incomplete/misleading at that
location. It doesn't just convert and store the time in
UTC. It stores the time in UTC and it also stores the
offset from UTC as determined by the time zone
specified on input (that section of the documentation
might not mention that fact but it is true
nonetheless).

I store localtimes in the database and always see the
offset when selecting it later. The timezone
information doesn't just disappear as you (or that
piece of documentation) might be suggesting.

If you don't believe me, try this:

  create table example (t timestamptz not null);
  insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 Australia/Sydney');
  select * from  example;
  drop table example;

When I run that, I see:

  CREATE TABLE
  INSERT 0 1
                t               
  ------------------------------
   2020-04-16 17:12:33.71768+10
  (1 row)

  DROP TABLE

So the timezone is stored (or the offset is stored if
you prefer). Had it been daylight savings on that date
in Sydney, the offset would have been "+11". It's all
fine.

cheers,
raf


Re: timestamp and timestamptz

From
Pavel Stehule
Date:


čt 16. 4. 2020 v 9:19 odesílatel raf <raf@raf.org> napsal:
David G. Johnston wrote:

> On Wed, Apr 15, 2020 at 4:53 PM raf <raf@raf.org> wrote:
>
> > I don't see much difference in storing a timestamptz in UTC or a
> > timestamptz
> > in CET. As long as the intended offset from UTC is recorded (which it is
> > in a timestamptz) it should be fine.
>
> I only really skimmed the entire response but this framing of how
> timestamptz stores results is wrong.  Once you've stored a timestamptz in
> PostgreSQL you no longer have any knowledge of the timezone.  If you truly
> need that you need to record that in a different field.  What you do know
> is that PostgreSQL has a known point-in-time in UTC and can give you back
> the same value expressed in any other timezone according to the rules in
> the timezone database.
>
> Or, as written verbatim in the documentation:
> """
> For timestamp with time zone, the internally stored value is always in UTC
> (Universal Coordinated Time, traditionally known as Greenwich Mean Time,
> GMT). An input value that has an explicit time zone specified is converted
> to UTC using the appropriate offset for that time zone. If no time zone is
> stated in the input string, then it is assumed to be in the time zone
> indicated by the system's TimeZone parameter, and is converted to UTC using
> the offset for the timezone zone.
> """
>
> https://www.postgresql.org/docs/12/datatype-datetime.html
>
> David J.

You are misinterpreting the documentation, or the
documentation is incomplete/misleading at that
location. It doesn't just convert and store the time in
UTC. It stores the time in UTC and it also stores the
offset from UTC as determined by the time zone
specified on input (that section of the documentation
might not mention that fact but it is true
nonetheless).

I store localtimes in the database and always see the
offset when selecting it later. The timezone
information doesn't just disappear as you (or that
piece of documentation) might be suggesting.

If you don't believe me, try this:

  create table example (t timestamptz not null);
  insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 Australia/Sydney');
  select * from  example;
  drop table example;

When I run that, I see:

  CREATE TABLE
  INSERT 0 1
                t               
  ------------------------------
   2020-04-16 17:12:33.71768+10
  (1 row)

  DROP TABLE

So the timezone is stored (or the offset is stored if
you prefer). Had it been daylight savings on that date
in Sydney, the offset would have been "+11". It's all
fine.

surely not. Postgres doesn't store offset. Every timestamptz is transalated to UTC when it is parsed to binary form, and later translated to client time zone when it is displayed.

Regards

Pavel



cheers,
raf


Re: timestamp and timestamptz

From
raf
Date:
Steve Baldwin wrote:

> I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
> time zone in a timestamptz column.
> 
> Try doing this before re-running your test:
> 
> set timezone to 'utc';
> 
> What you are seeing in your test is an artifact of that timezone setting.
> 
> Steve

Thanks. You're right.

  create table example (t timestamptz not null);
  insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 Australia/Sydney');
  select * from  example;
  set timezone to 'utc';
  select * from  example;
  drop table example;

Does this:

  CREATE TABLE
  INSERT 0 1
              t               
  ------------------------------
   2020-04-16 17:12:33.71768+10
  (1 row)

  SET
                t               
  ------------------------------
   2020-04-16 07:12:33.71768+00
  (1 row)

  DROP TABLE

So it doesn't store the offset, but I've used postgres
for 12 years without knowing that and it hasn't been
a problem. Yay, postgres!

It doesn't store the offset but, by using timestamptz,
it knows that the timezone is UTC. That's what matters.
The fact that it knows the time zone is what makes everything
work. Timestamp without time zone is best avoided I think.

cheers,
raf






Re: timestamp and timestamptz

From
Magnus Hagander
Date:


On Thu, Apr 16, 2020 at 6:12 AM Niels Jespersen <NJN@dst.dk> wrote:

 

 

Fra: Magnus Hagander <magnus@hagander.net>
Sendt: 15. april 2020 20:05
Til: Niels Jespersen <NJN@dst.dk>
Cc: pgsql-general@lists.postgresql.org
Emne: Re: timestamp and timestamptz

 

 

 

On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <NJN@dst.dk> wrote:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

 

Yes, you should.

 

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;

 

 

As long as you use option 1:

 

SELECT read_time

will return the time in CET (as a timestamptz) after you've set timezone to 'cet'. If you set timezone to 'utc' it will directly return utc.

 

SELECT read_time AT TIME ZONE 'utc'

will return the time in UTC (as a timestamp)

 

 

And just make sure you have done a "set time zone 'utc'" before you *load* the data, and everything should just work automatically.

 

--

 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

 

 

Thank you Magnus (and others) for your replies.

 

The raw input data are in this, slightly strange format: 2019.05.01 00:00:00. No timezone indicator, just an informal guarantee from the supplier that it is indeed utc. And no real chance of changing the format. We know, from experience.

 

The data volume is a bit high, a few billion rows pr month. So, table partitioning is very helpful (aka really indispensable). Data will be aggregated in several ways for analytics. Time aggregations must be according to our local timezone (cet). We do not want data from one day being aggregated into the wrong date because of timezone issues. This means that partition boundaries (monthly pratitions most often, sometimes day partitions) must be on CET-boundaries so that partition pruning will pull data from the relevant cet-month not the utc-month.

 

Now, if I load data into a timestamptz with timezone set to utc, partition to cet-boundaries, query and aggredate with timezone set to cet, everything wil be ok, I think. My small testcase below shows that the row goes into the april-partition (as it should). The planner does the correct partition pruning according to specified filtering and set timezone. All good.

 

create table t (t_id bigserial, ts timestamptz) partition by range (ts);

create table t_2020_02 partition of t for values from ('2020-02-01 00:00:00+01') to ('2020-03-01 00:00:00+01');

create table t_2020_03 partition of t for values from ('2020-03-01 00:00:00+01') to ('2020-04-01 00:00:00+02');

create table t_2020_04 partition of t for values from ('2020-04-01 00:00:00+02') to ('2020-05-01 00:00:00+02');

 

set timezone to 'utc';

insert into t (ts) values('2020-03-31 23:30:00');

 

Once again, thank you for invaluable feedback.

 


Yes, this should work just fine. The internal representation of timestamptz is always UTC, and it's only converted on entry/exit.

You can see this clearly if you create your partitions like above, and then do a "set timezone to 'America/Los_Angeles'" followed by \d+ t in psql. This will now show you what the partition bounds are in that timezone.

You can also just specify the timestamps when you create your partition without including the timezone (+01) specifier. In this PostgreSQL will interpret it as whatever your current value for the timezone setting is, so as long as it's CET it should work fine, and you don't have to remember which months are in DST and which are not.


--

SV: timestamp and timestamptz

From
Niels Jespersen
Date:

 

 

Fra: Magnus Hagander <magnus@hagander.net>
Sendt: 16. april 2020 10:28
Til: Niels Jespersen <NJN@dst.dk>
Cc: pgsql-general@lists.postgresql.org
Emne: Re: timestamp and timestamptz

 

 

 

On Thu, Apr 16, 2020 at 6:12 AM Niels Jespersen <NJN@dst.dk> wrote:

 

 

Fra: Magnus Hagander <magnus@hagander.net>
Sendt: 15. april 2020 20:05
Til: Niels Jespersen <NJN@dst.dk>
Cc: pgsql-general@lists.postgresql.org
Emne: Re: timestamp and timestamptz

 

 

 

On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <NJN@dst.dk> wrote:

Hello all

 

We have some data that have entered a timestamp column from a csv. The data in the csv are in utc. We want to access the data in our native timezone (CET).

 

I am considering a few alternatives:

 

1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This will imply by convention that the data in the timestamp column represents CET. Users will need to be told that data represents CET, even if data is somwhere in the future kept in another country in another timezone.

 

I probably should choose 1 over 2. But I am a bit hesitant, probably because we almost never have used timestamptz.

 

Yes, you should.

 

 

Can we agree that the below query is selecting both the original utc timestamps and 2 and 1 (as decribed above)?

 

set timezone to 'cet';

select read_time read_time_utc, (read_time at time zone 'utc')::timestamp read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t limit 10;

 

 

As long as you use option 1:

 

SELECT read_time

will return the time in CET (as a timestamptz) after you've set timezone to 'cet'. If you set timezone to 'utc' it will directly return utc.

 

SELECT read_time AT TIME ZONE 'utc'

will return the time in UTC (as a timestamp)

 

 

And just make sure you have done a "set time zone 'utc'" before you *load* the data, and everything should just work automatically.

 

--

 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

 

 

Thank you Magnus (and others) for your replies.

 

The raw input data are in this, slightly strange format: 2019.05.01 00:00:00. No timezone indicator, just an informal guarantee from the supplier that it is indeed utc. And no real chance of changing the format. We know, from experience.

 

The data volume is a bit high, a few billion rows pr month. So, table partitioning is very helpful (aka really indispensable). Data will be aggregated in several ways for analytics. Time aggregations must be according to our local timezone (cet). We do not want data from one day being aggregated into the wrong date because of timezone issues. This means that partition boundaries (monthly pratitions most often, sometimes day partitions) must be on CET-boundaries so that partition pruning will pull data from the relevant cet-month not the utc-month.

 

Now, if I load data into a timestamptz with timezone set to utc, partition to cet-boundaries, query and aggredate with timezone set to cet, everything wil be ok, I think. My small testcase below shows that the row goes into the april-partition (as it should). The planner does the correct partition pruning according to specified filtering and set timezone. All good.

 

create table t (t_id bigserial, ts timestamptz) partition by range (ts);

create table t_2020_02 partition of t for values from ('2020-02-01 00:00:00+01') to ('2020-03-01 00:00:00+01');

create table t_2020_03 partition of t for values from ('2020-03-01 00:00:00+01') to ('2020-04-01 00:00:00+02');

create table t_2020_04 partition of t for values from ('2020-04-01 00:00:00+02') to ('2020-05-01 00:00:00+02');

 

set timezone to 'utc';

insert into t (ts) values('2020-03-31 23:30:00');

 

Once again, thank you for invaluable feedback.

 

 

Yes, this should work just fine. The internal representation of timestamptz is always UTC, and it's only converted on entry/exit.

 

You can see this clearly if you create your partitions like above, and then do a "set timezone to 'America/Los_Angeles'" followed by \d+ t in psql. This will now show you what the partition bounds are in that timezone.

 

You can also just specify the timestamps when you create your partition without including the timezone (+01) specifier. In this PostgreSQL will interpret it as whatever your current value for the timezone setting is, so as long as it's CET it should work fine, and you don't have to remember which months are in DST and which are not.


 

--

 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

 

Thank you, Magnus. Especially for the last hint.

Re: timestamp and timestamptz

From
Adrian Klaver
Date:
On 4/16/20 1:23 AM, raf wrote:
> Steve Baldwin wrote:
> 
>> I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
>> time zone in a timestamptz column.
>>
>> Try doing this before re-running your test:
>>
>> set timezone to 'utc';
>>
>> What you are seeing in your test is an artifact of that timezone setting.
>>
>> Steve
> 
> Thanks. You're right.
> 
>    create table example (t timestamptz not null);
>    insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 Australia/Sydney');
>    select * from  example;
>    set timezone to 'utc';
>    select * from  example;
>    drop table example;
> 
> Does this:
> 
>    CREATE TABLE
>    INSERT 0 1
>                t
>    ------------------------------
>     2020-04-16 17:12:33.71768+10
>    (1 row)
> 
>    SET
>                  t
>    ------------------------------
>     2020-04-16 07:12:33.71768+00
>    (1 row)
> 
>    DROP TABLE
> 
> So it doesn't store the offset, but I've used postgres
> for 12 years without knowing that and it hasn't been
> a problem. Yay, postgres!
> 
> It doesn't store the offset but, by using timestamptz,
> it knows that the timezone is UTC. That's what matters.

Well that is somewhat misleading. The value entered is stored as 
timestamp with an implicit tz of 'UTC'. The issue that trips people up 
is the format of the input value. If you enter an input with an offset 
or correct tz value then you have given Postgres an explicit value to 
work off for converting it to 'UTC'. For what is correct see here:

https://www.postgresql.org/docs/12/datatype-datetime.html#DATATYPE-TIMEZONES

If you input a value that does not have the above then Postgres uses the 
SET TimeZone value to implicitly set the input value's tz(as pointed out 
by David Johnson upstream). In other words Postgres does not assume an 
input value is in 'UTC'. For the OP's case that could cause issues if 
the timestamp in the CSV file does not have a proper offset/tz and the 
client is using a tz other then 'UTC'(as pointed by others upstream). 
The bottom line is that when dealing with timestamps explicit is better 
then implicit.

> The fact that it knows the time zone is what makes everything
> work. Timestamp without time zone is best avoided I think.
> 
> cheers,
> raf
> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: timestamp and timestamptz

From
raf
Date:
Adrian Klaver wrote:

> On 4/16/20 1:23 AM, raf wrote:
> > Steve Baldwin wrote:
> > 
> > > I'm pretty sure you are mistaken. Postgres doesn't store the 'creating'
> > > time zone in a timestamptz column.
> > > 
> > > Try doing this before re-running your test:
> > > 
> > > set timezone to 'utc';
> > > 
> > > What you are seeing in your test is an artifact of that timezone setting.
> > > 
> > > Steve
> > 
> > Thanks. You're right.
> > 
> >    create table example (t timestamptz not null);
> >    insert into example (t) values (timestamptz '2020-04-16 17:12:33.71768 Australia/Sydney');
> >    select * from  example;
> >    set timezone to 'utc';
> >    select * from  example;
> >    drop table example;
> > 
> > Does this:
> > 
> >    CREATE TABLE
> >    INSERT 0 1
> >                t
> >    ------------------------------
> >     2020-04-16 17:12:33.71768+10
> >    (1 row)
> > 
> >    SET
> >                  t
> >    ------------------------------
> >     2020-04-16 07:12:33.71768+00
> >    (1 row)
> > 
> >    DROP TABLE
> > 
> > So it doesn't store the offset, but I've used postgres
> > for 12 years without knowing that and it hasn't been
> > a problem. Yay, postgres!
> > 
> > It doesn't store the offset but, by using timestamptz,
> > it knows that the timezone is UTC. That's what matters.
> 
> Well that is somewhat misleading. The value entered is stored as timestamp
> with an implicit tz of 'UTC'. The issue that trips people up is the format
> of the input value. If you enter an input with an offset or correct tz value
> then you have given Postgres an explicit value to work off for converting it
> to 'UTC'.

When I said "it knows that the timezone is UTC", I was only referring
to the values once stored as a timestamptz, not the input. Sorry that
wasn't clear enough. I meant to say that once a value is stored in a
timestamptz (as opposed to a timestamp), postgres knows that it is
stored in UTC and will do the right things with it.

> For what is correct see here:
> 
> https://www.postgresql.org/docs/12/datatype-datetime.html#DATATYPE-TIMEZONES
> 
> If you input a value that does not have the above then Postgres uses the SET
> TimeZone value to implicitly set the input value's tz(as pointed out by
> David Johnson upstream). In other words Postgres does not assume an input
> value is in 'UTC'. For the OP's case that could cause issues if the
> timestamp in the CSV file does not have a proper offset/tz and the client is
> using a tz other then 'UTC'(as pointed by others upstream). The bottom line
> is that when dealing with timestamps explicit is better then implicit.

I couldn't agree more.

> > The fact that it knows the time zone is what makes everything
> > work. Timestamp without time zone is best avoided I think.
> > 
> > cheers,
> > raf
> 
> -- 
> Adrian Klaver
> adrian.klaver@aklaver.com
>