Thread: truncating timestamps on arbitrary intervals

truncating timestamps on arbitrary intervals

From
John Naylor
Date:
Hi,

When analyzing time-series data, it's useful to be able to bin
timestamps into equally spaced ranges. date_trunc() is only able to
bin on a specified whole unit. In the attached patch for the March
commitfest, I propose a new function date_trunc_interval(), which can
truncate to arbitrary intervals, e.g.:

select date_trunc_interval('15 minutes', timestamp '2020-02-16
20:48:40'); date_trunc_interval
---------------------
 2020-02-16 20:45:00
(1 row)

With this addition, it might be possible to turn the existing
date_trunc() functions into wrappers. I haven't done that here because
it didn't seem practical at this point. For one, the existing
functions have special treatment for weeks, centuries, and millennia.

Note: I've only written the implementation for the type timestamp
without timezone. Adding timezone support would be pretty simple, but
I wanted to get feedback on the basic idea first before making it
complete. I've also written tests and very basic documentation.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
David Fetter
Date:
On Wed, Feb 26, 2020 at 10:50:19AM +0800, John Naylor wrote:
> Hi,
> 
> When analyzing time-series data, it's useful to be able to bin
> timestamps into equally spaced ranges. date_trunc() is only able to
> bin on a specified whole unit.

Thanks for adding this very handy feature!

> In the attached patch for the March
> commitfest, I propose a new function date_trunc_interval(), which can
> truncate to arbitrary intervals, e.g.:
> 
> select date_trunc_interval('15 minutes', timestamp '2020-02-16
> 20:48:40'); date_trunc_interval
> ---------------------
>  2020-02-16 20:45:00
> (1 row)

I believe the following should error out, but doesn't.

# SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
 date_trunc_interval 
═════════════════════
 2001-01-01 00:00:00
(1 row)

> With this addition, it might be possible to turn the existing
> date_trunc() functions into wrappers. I haven't done that here because
> it didn't seem practical at this point. For one, the existing
> functions have special treatment for weeks, centuries, and millennia.

I agree that turning it into a wrapper would be separate work.

> Note: I've only written the implementation for the type timestamp
> without timezone. Adding timezone support would be pretty simple,
> but I wanted to get feedback on the basic idea first before making
> it complete. I've also written tests and very basic documentation.

Please find attached an update that I believe fixes the bug I found in
a principled way.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Wed, Feb 26, 2020 at 3:51 PM David Fetter <david@fetter.org> wrote:
>
> I believe the following should error out, but doesn't.
>
> # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
>  date_trunc_interval
> ═════════════════════
>  2001-01-01 00:00:00
> (1 row)

You're quite right. I forgot to add error checking for
second-and-below units. I've added your example to the tests. (I
neglected to mention in my first email that because I chose to convert
the interval to the pg_tm struct (seemed easiest), it's not
straightforward how to allow multiple unit types, and I imagine the
use case is small, so I had it throw an error.)

> Please find attached an update that I believe fixes the bug I found in
> a principled way.

Thanks for that! I made a couple adjustments and incorporated your fix
into v3: While working on v1, I noticed the DTK_FOO macros already had
an idiom for bitmasking (see utils/datetime.h), so I used that instead
of a bespoke enum. Also, since the bitmask is checked once, I removed
the individual member checks, allowing me to remove all the gotos.

There's another small wrinkle: Since we store microseconds internally,
it's neither convenient nor useful to try to error out for things like
'2 ms 500 us', since that is just as well written as '2500 us', and
stored exactly the same. I'm inclined to just skip the millisecond
check and just use microseconds, but I haven't done that yet.

Also, I noticed this bug in v1:

SELECT date_trunc_interval('17 days', TIMESTAMP '2001-02-16 20:38:40.123456');
 date_trunc_interval
---------------------
 2001-01-31 00:00:00
(1 row)

This is another consequence of month and day being 1-based. Fixed,
with new tests.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
Tom Lane
Date:
John Naylor <john.naylor@2ndquadrant.com> writes:
> [ v3-datetrunc_interval.patch ]

A few thoughts:

* In general, binning involves both an origin and a stride.  When
working with plain numbers it's almost always OK to set the origin
to zero, but it's less clear to me whether that's all right for
timestamps.  Do we need another optional argument?  Even if we
don't, "zero" for tm_year is 1900, which is going to give results
that surprise somebody.

* I'm still not convinced that the code does the right thing for
1-based months or days.  Shouldn't you need to subtract 1, then
do the modulus, then add back 1?

* Speaking of modulus, would it be clearer to express the
calculations like
    timestamp -= timestamp % interval;
(That's just a question, I'm not sure.)

* Code doesn't look to have thought carefully about what to do with
negative intervals, or BC timestamps.

* The comment 
     * Justify all lower timestamp units and throw an error if any
     * of the lower interval units are non-zero.
doesn't seem to have a lot to do with what the code after it actually
does.  Also, you need explicit /* FALLTHRU */-type comments in that
switch, or pickier buildfarm members will complain.

* Seems like you could jam all the unit-related error checking into
that switch's default: case, where it will cost nothing if the
call is valid:

    switch (unit)
    {
        ...
        default:
        if (unit == 0)
            // complain about zero interval
        else
            // complain about interval with multiple components
    }

* I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
contents of the interval.

            regards, tom lane



Re: truncating timestamps on arbitrary intervals

From
David Fetter
Date:
On Wed, Feb 26, 2020 at 06:38:57PM +0800, John Naylor wrote:
> On Wed, Feb 26, 2020 at 3:51 PM David Fetter <david@fetter.org> wrote:
> >
> > I believe the following should error out, but doesn't.
> >
> > # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
> >  date_trunc_interval
> > ═════════════════════
> >  2001-01-01 00:00:00
> > (1 row)
> 
> You're quite right. I forgot to add error checking for
> second-and-below units. I've added your example to the tests. (I
> neglected to mention in my first email that because I chose to convert
> the interval to the pg_tm struct (seemed easiest), it's not
> straightforward how to allow multiple unit types, and I imagine the
> use case is small, so I had it throw an error.)

I suspect that this could be sanely expanded to span some sets of
adjacent types in a future patch, e.g. year + month or hour + minute.

> > Please find attached an update that I believe fixes the bug I found in
> > a principled way.
> 
> Thanks for that! I made a couple adjustments and incorporated your fix
> into v3: While working on v1, I noticed the DTK_FOO macros already had
> an idiom for bitmasking (see utils/datetime.h),

Oops!  Sorry I missed that.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> John Naylor <john.naylor@2ndquadrant.com> writes:
> > [ v3-datetrunc_interval.patch ]
>
> A few thoughts:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

Not sure.

A surprise I foresee in general might be: '1 week' is just '7 days',
and not aligned on "WOY". Since the function is passed an interval and
not text, we can't raise a warning. But date_trunc() already covers
that, so probably not a big deal.

> * I'm still not convinced that the code does the right thing for
> 1-based months or days.  Shouldn't you need to subtract 1, then
> do the modulus, then add back 1?

Yes, brain fade on my part. Fixed in the attached v4.

> * Speaking of modulus, would it be clearer to express the
> calculations like
>         timestamp -= timestamp % interval;
> (That's just a question, I'm not sure.)

Seems nicer, so done that way.

> * Code doesn't look to have thought carefully about what to do with
> negative intervals, or BC timestamps.

By accident, negative intervals currently behave the same as positive
ones. We could make negative intervals round up rather than truncate
(or vice versa). I don't know the best thing to do here.

As for BC, changed so it goes in the correct direction at least, and added test.

> * The comment
>          * Justify all lower timestamp units and throw an error if any
>          * of the lower interval units are non-zero.
> doesn't seem to have a lot to do with what the code after it actually
> does.  Also, you need explicit /* FALLTHRU */-type comments in that
> switch, or pickier buildfarm members will complain.

Stale comment from an earlier version, fixed. Not sure if "justify" is
the right term, but "zero" is a bit misleading. Added fall thru's.

> * Seems like you could jam all the unit-related error checking into
> that switch's default: case, where it will cost nothing if the
> call is valid:
>
>         switch (unit)
>         {
>             ...
>             default:
>                 if (unit == 0)
>                         // complain about zero interval
>                 else
>                         // complain about interval with multiple components
>         }

Done.

> * I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
> contents of the interval.

Done.

Also removed the millisecond case, since it's impossible, or at least
not worth it, to distinguish from the microsecond case.

Note: I haven't done any additional docs changes in v4.

TODO: with timezone

possible TODO: origin parameter

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

I tried the simplest way in the attached v5. Examples (third param is origin):

-- same result as no origin:
select date_trunc_interval('5 min'::interval, TIMESTAMP '2020-02-01
01:01:01', TIMESTAMP '2020-02-01');
 date_trunc_interval
---------------------
 2020-02-01 01:00:00
(1 row)

-- shift bins by 2.5 min:
select date_trunc_interval('5 min'::interval, TIMESTAMP '2020-02-1
01:01:01', TIMESTAMP '2020-02-01 00:02:30');
 date_trunc_interval
---------------------
 2020-02-01 00:57:30
(1 row)

-- align weeks to start on Sunday
select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-09 00:00:00
(1 row)

I've put off adding documentation on the origin piece pending comments
about the approach.

I haven't thought seriously about timezone yet, but hopefully it's
just work and nothing to think too hard about.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
Isaac Morland
Date:
On Fri, 13 Mar 2020 at 03:13, John Naylor <john.naylor@2ndquadrant.com> wrote:
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

- align weeks to start on Sunday
select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-09 00:00:00
(1 row)

I'm confused by this. If my calendars are correct, both 1900-01-02 and 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are both Tuesday, shouldn't the day part be left alone when truncating to 7 days? Also, I'd like to confirm that the default starting point for 7 day periods (weeks) is Monday, per ISO. I know it's very fashionable in North America to split the weekend in half but it's not the international standard.

Perhaps the starting point for dates should be either 0001-01-01 (the proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the current 400-year repeating cycle of leap years and weeks, and a Monday, giving the appropriate ISO result for truncating to 7 day periods).

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland <isaac.morland@gmail.com> wrote:
>
> On Fri, 13 Mar 2020 at 03:13, John Naylor <john.naylor@2ndquadrant.com> wrote:
>
>> - align weeks to start on Sunday
>> select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
>> 01:01:01.0', TIMESTAMP '1900-01-02');
>>  date_trunc_interval
>> ---------------------
>>  2020-02-09 00:00:00
>> (1 row)
>
>
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 2020-02-11 are Tuesdays. So if the date being
adjustedand the origin are both Tuesday, shouldn't the day part be left alone when truncating to 7 days?
 

Thanks for taking a look! The non-intuitive behavior you found is
because the patch shifts the timestamp before converting to the
internal pg_tm type. The pg_tm type stores day of the month, which is
used for the calculation. It's not counting the days since the origin.
Then the result is shifted back.

To get more logical behavior, perhaps the optional parameter is better
as an offset instead of an origin. Alternatively (or additionally),
the function could do the math on int64 timestamps directly.

> Also, I'd like to confirm that the default starting point for 7 day periods (weeks) is Monday, per ISO.

That's currently the behavior in the existing date_trunc function,
when passed the string 'week'. Given that keyword, it calculates the
week of the year.

When using the proposed function with arbitrary intervals, it uses day
of the month, as found in the pg_tm struct. It doesn't treat 7 days
differently then 5 or 10 without user input (origin or offset), since
there is nothing special about 7 day intervals as such internally. To
show the difference between date_trunc, and date_trunc_interval as
implemented in v5 with no origin:

select date_trunc('week', d), count(*) from generate_series(
'2020-02-01'::timestamp, '2020-03-31', '1 day') d group by 1 order by
1;
     date_trunc      | count
---------------------+-------
 2020-01-27 00:00:00 |     2
 2020-02-03 00:00:00 |     7
 2020-02-10 00:00:00 |     7
 2020-02-17 00:00:00 |     7
 2020-02-24 00:00:00 |     7
 2020-03-02 00:00:00 |     7
 2020-03-09 00:00:00 |     7
 2020-03-16 00:00:00 |     7
 2020-03-23 00:00:00 |     7
 2020-03-30 00:00:00 |     2
(10 rows)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
---------------------+-------
 2020-02-01 00:00:00 |     7
 2020-02-08 00:00:00 |     7
 2020-02-15 00:00:00 |     7
 2020-02-22 00:00:00 |     7
 2020-02-29 00:00:00 |     1
 2020-03-01 00:00:00 |     7
 2020-03-08 00:00:00 |     7
 2020-03-15 00:00:00 |     7
 2020-03-22 00:00:00 |     7
 2020-03-29 00:00:00 |     3
(10 rows)

Resetting the day every month is counterintuitive if not broken, and
as I mentioned it might make more sense to use the int64 timestamp
directly, at least for intervals less than one month. I'll go look
into doing that.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: truncating timestamps on arbitrary intervals

From
Artur Zakirov
Date:
Hello,

On 3/13/2020 4:13 PM, John Naylor wrote:
> I've put off adding documentation on the origin piece pending comments
> about the approach.
> 
> I haven't thought seriously about timezone yet, but hopefully it's
> just work and nothing to think too hard about.

Thank you for the patch. I looked it and tested a bit.

There is one interesting case which might be mentioned in the 
documentation or in the tests is the following. The function has 
interesting behaviour with real numbers:

=# select date_trunc_interval('0.1 year'::interval, TIMESTAMP 
'2020-02-01 01:21:01');
  date_trunc_interval
---------------------
  2020-02-01 00:00:00

=# select date_trunc_interval('1.1 year'::interval, TIMESTAMP 
'2020-02-01 01:21:01');
ERROR:  only one interval unit allowed for truncation

It is because the second interval has two interval units:

=# select '0.1 year'::interval;
  interval
----------
  1 mon

=# select '1.1 year'::interval;
    interval
--------------
  1 year 1 mon

-- 
Artur



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Sun, Mar 15, 2020 at 2:26 PM I wrote:
>
> To get more logical behavior, perhaps the optional parameter is better
> as an offset instead of an origin. Alternatively (or additionally),
> the function could do the math on int64 timestamps directly.

For v6, I changed the algorithm to use pg_tm for months and years, and
int64 for all smaller units. Despite the split, I think it's easier to
read now, and certainly shorter. This has the advantage that now
mixing units is allowed, as long as you don't mix months/years with
days or smaller, which often doesn't make sense and is not very
practical. (not yet documented) One consequence of this is that when
operating on months/years, and the origin contains smaller units, the
smaller units are ignored. Example:

select date_trunc_interval('12 months'::interval, timestamp
'2012-03-01 01:21:01', timestamp '2011-03-22');
 date_trunc_interval
---------------------
 2012-03-01 00:00:00
(1 row)

Even though not quite a full year has passed, it ignores the days in
the origin time and detects a difference in 12 calendar months. That
might be fine, although we could also throw an error and say origins
must be in the form of 'YYYY-01-01 00:00:00' when truncating on months
and/or years.

I added a sketch of documentation for the origin parameter and more tests.

On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland <isaac.morland@gmail.com> wrote:
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 2020-02-11 are Tuesdays. So if the date being
adjustedand the origin are both Tuesday, shouldn't the day part be left alone when truncating to 7 days? Also, I'd like
toconfirm that the default starting point for 7 day periods (weeks) is Monday, per ISO. 

This is fixed.

select date_trunc_interval('7 days'::interval, timestamp '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-11 00:00:00
(1 row)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
---------------------+-------
 2020-01-27 00:00:00 |     2
 2020-02-03 00:00:00 |     7
 2020-02-10 00:00:00 |     7
 2020-02-17 00:00:00 |     7
 2020-02-24 00:00:00 |     7
 2020-03-02 00:00:00 |     7
 2020-03-09 00:00:00 |     7
 2020-03-16 00:00:00 |     7
 2020-03-23 00:00:00 |     7
 2020-03-30 00:00:00 |     2
(10 rows)

> Perhaps the starting point for dates should be either 0001-01-01 (the proleptic beginning of the CE calendar) or
2001-01-01(the beginning of the current 400-year repeating cycle of leap years and weeks, and a Monday, giving the
appropriateISO result for truncating to 7 day periods). 

I went ahead with 2001-01-01 for the time being.

On Thu, Mar 19, 2020 at 4:20 PM Artur Zakirov <zaartur@gmail.com> wrote:
>
> =# select date_trunc_interval('1.1 year'::interval, TIMESTAMP
> '2020-02-01 01:21:01');
> ERROR:  only one interval unit allowed for truncation

For any lingering cases like this (i don't see any), maybe an error
hint is in order. The following works now, as expected for 1 year 1
month:

select date_trunc_interval('1.1 year'::interval, timestamp '2002-05-01
01:21:01');
 date_trunc_interval
---------------------
 2002-02-01 00:00:0

I'm going to look into implementing timezone while awaiting comments on v6.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
Tels
Date:
Hello John,

this looks like a nice feature. I'm wondering how it relates to the 
following use-case:

When drawing charts, the user can select pre-defined widths on times 
(like "15 min", "1 hour").

The data for these slots is fitted always to intervalls that start in 0 
in the slot, e.g. if the user selects "15 min", the interval always 
starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the 
resulting data, and so that requesting the same chart at xx:00 and xx:01 
actually draws the same chart, and not some bar with only one minute 
data at at the end.

In PSQL, this works out to using this as GROUP BY and then summing up 
all values:

   SELECT FLOOR(EXTRACT(EPOCH from thetime) / 3600) * 3600, SUM(events) 
FROM mytable ... GROUP BY 1;

whereas here 3600 means "hourly".

It is of course easy for things like "1 hour", but for yearly I had to 
come up with things like:

   EXTRAC(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)

and its gets even more involved for weeks, weekdays or odd things like 
fortnights.

So would that mean one could replace this by:

  date_trunc_interval('3600 seconds'::interval, thetime)

and it would be easier, and (hopefully) faster?

Best regards,

Tels



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Tue, Mar 24, 2020 at 9:34 PM Tels <nospam-pg-abuse@bloodgate.com> wrote:
>
> Hello John,
>
> this looks like a nice feature. I'm wondering how it relates to the
> following use-case:
>
> When drawing charts, the user can select pre-defined widths on times
> (like "15 min", "1 hour").
>
> The data for these slots is fitted always to intervalls that start in 0
> in the slot, e.g. if the user selects "15 min", the interval always
> starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the
> resulting data, and so that requesting the same chart at xx:00 and xx:01
> actually draws the same chart, and not some bar with only one minute
> data at at the end.

Hi Tels, thanks for your interest! Sounds like the exact use case I had in mind.

> It is of course easy for things like "1 hour", but for yearly I had to
> come up with things like:
>
>    EXTRAC(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)
>
> and its gets even more involved for weeks, weekdays or odd things like
> fortnights.

To be clear, this particular case was already handled by the existing
date_trunc, but only single units and a few other special cases. I
understand if you have to write code to handle 15 minutes, you need to
use that structure for all cases.

Fortnight would be trivial:

date_trunc_interval('14 days'::interval, thetime [, optional start of
the fortnight])

...but weekdays would still need something extra.

> So would that mean one could replace this by:
>
>   date_trunc_interval('3600 seconds'::interval, thetime)
>
> and it would be easier, and (hopefully) faster?

Certainly easier and more flexible. It's hard to make guesses about
performance, but with your example above where you have two SQL
function calls plus some expression evaluation, I think a single
function would be faster in many cases.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
I wrote:

> I'm going to look into implementing timezone while awaiting comments on v6.

I attempted this in the attached v7. There are 4 new functions for
truncating timestamptz on an interval -- with and without origin, and
with and without time zone.

Parts of it are hackish, and need more work, but I think it's in
passable enough shape to get feedback on. The origin parameter logic
was designed with timestamps-without-time-zone in mind, and
retrofitting time zone on top of that was a bit messy. There might be
bugs.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
Artur Zakirov
Date:
On 3/30/2020 9:30 PM, John Naylor wrote:
> I attempted this in the attached v7. There are 4 new functions for
> truncating timestamptz on an interval -- with and without origin, and
> with and without time zone.

Thank you for new version of the patch.

I'm not sure that I fully understand the 'origin' parameter. Is it valid 
to have a value of 'origin' which is greater than a value of 'timestamp' 
parameter?

I get some different results in such case:

=# select date_trunc_interval('2 year', timestamp '2020-01-16 20:38:40', 
timestamp '2022-01-17 00:00:00');
  date_trunc_interval
---------------------
  2020-01-01 00:00:00

=# select date_trunc_interval('3 year', timestamp '2020-01-16 20:38:40', 
timestamp '2022-01-17 00:00:00');
  date_trunc_interval
---------------------
  2022-01-01 00:00:00

So here I'm not sure which result is correct.

It seems that the patch is still in progress, but I have some nitpicking.

> +        <entry><literal><function>date_trunc_interval(<type>interval</type>, <type>timestamptz</type>,
<type>text</type>)</function></literal></entry>
> +        <entry><type>timestamptz  </type></entry>

It seems that 'timestamptz' in both argument and result descriptions 
should be replaced by 'timestamp with time zone' (see other functions 
descriptions). Though it is okay to use 'timestamptz' in SQL examples.

timestamp_trunc_interval_internal() and 
timestamptz_trunc_interval_internal() have similar code. I think they 
can be rewritten to avoid code duplication.

-- 
Artur



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Tue, Mar 31, 2020 at 4:34 PM Artur Zakirov <zaartur@gmail.com> wrote:
> Thank you for new version of the patch.

Thanks for taking a look! Attached is v8, which addresses your points,
adds tests and fixes some bugs. There are still some WIP detritus in
the timezone code, so I'm not claiming it's ready, but it's much
closer. I'm fairly confident in the implementation of timestamp
without time zone, however.

> I'm not sure that I fully understand the 'origin' parameter. Is it valid
> to have a value of 'origin' which is greater than a value of 'timestamp'
> parameter?

That is the intention. The returned values should be

origin +/- (n * interval)

where n is an integer.

> I get some different results in such case:
>
> =# select date_trunc_interval('2 year', timestamp '2020-01-16 20:38:40',
> timestamp '2022-01-17 00:00:00');
>   date_trunc_interval
> ---------------------
>   2020-01-01 00:00:00

This was correct per how I coded it, but I have rethought where to
draw the bins for user-specified origins. I have decided that the
above is inconsistent with units smaller than a month. We shouldn't
"cross" the bin until the input has reached Jan. 17, in this case. In
v8, the answer to the above is

 date_trunc_interval
---------------------
 2018-01-17 00:00:00
(1 row)

(This could probably be better documented)

> =# select date_trunc_interval('3 year', timestamp '2020-01-16 20:38:40',
 timestamp '2022-01-17 00:00:00');
>   date_trunc_interval
> ---------------------
>   2022-01-01 00:00:00
>
> So here I'm not sure which result is correct.

This one is just plain broken. The result should always be equal or
earlier than the input. In v8 the result is now:

 date_trunc_interval
---------------------
 2019-01-17 00:00:00
(1 row)

> It seems that the patch is still in progress, but I have some nitpicking.
>
> > +        <entry><literal><function>date_trunc_interval(<type>interval</type>, <type>timestamptz</type>,
<type>text</type>)</function></literal></entry>
> > +        <entry><type>timestamptz  </type></entry>
>
> It seems that 'timestamptz' in both argument and result descriptions
> should be replaced by 'timestamp with time zone' (see other functions
> descriptions). Though it is okay to use 'timestamptz' in SQL examples.

Any and all nitpicks welcome! I have made these match the existing
date_trunc documentation more closely.

> timestamp_trunc_interval_internal() and
> timestamptz_trunc_interval_internal() have similar code. I think they
> can be rewritten to avoid code duplication.

I thought so too (and noticed the same about the existing date_trunc),
but it's more difficult than it looks.

Note: I copied some tests from timestamp to timestamptz with a few
tweaks. A few tz tests still don't pass. I'm not yet sure if the
problem is in the test, or my code. Some detailed review of the tests
and their results would be helpful.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
In v9, I've simplified the patch somewhat to make it easier for future
work to build on.

- When truncating on month-or-greater intervals, require the origin to
align on month. This removes the need to handle weird corner cases
that have no straightforward behavior.
- Remove hackish and possibly broken code to allow origin to be after
the input timestamp. The default origin is Jan 1, 1 AD, so only AD
dates will behave correctly by default. This is not enforced for now,
since it may be desirable to find a way to get this to work in a nicer
way.
- Rebase docs over PG13 formatting changes.



--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 2020-06-30 06:34, John Naylor wrote:
> In v9, I've simplified the patch somewhat to make it easier for future
> work to build on.
> 
> - When truncating on month-or-greater intervals, require the origin to
> align on month. This removes the need to handle weird corner cases
> that have no straightforward behavior.
> - Remove hackish and possibly broken code to allow origin to be after
> the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> dates will behave correctly by default. This is not enforced for now,
> since it may be desirable to find a way to get this to work in a nicer
> way.
> - Rebase docs over PG13 formatting changes.

This looks pretty solid now.  Are there any more corner cases and other 
areas with unclear behavior that you are aware of?

A couple of thoughts:

- After reading the discussion a few times, I'm not so sure anymore 
whether making this a cousin of date_trunc is the right way to go.  As 
you mentioned, there are some behaviors specific to date_trunc that 
don't really make sense in date_trunc_interval, and maybe we'll have 
more of those.  Also, date_trunc_interval isn't exactly a handy name. 
Maybe something to think about.  It's obviously fairly straightforward 
to change it.

- There were various issues with the stride interval having months and 
years.  I'm not sure we even need that.  It could be omitted unless you 
are confident that your implementation is now sufficient.

- Also, negative intervals could be prohibited, but I suppose that 
matters less.

- I'm curious about the origin being set to 0001-01-01.  This seems to 
work correctly in that it sets the origin to a Monday, which is what we 
wanted, but according to Google that day was a Saturday.  Something to 
do with Julian vs. Gregorian calendar?  Maybe we should choose a date 
that is a bit more recent and easier to reason with.

- Then again, I'm thinking that maybe we should make the origin 
mandatory.  Otherwise, the default answers when having strides larger 
than a day are entirely arbitrary, e.g.,

=> select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
0190-01-01 00:00:00 BC

=> select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
0191-01-01 00:00:00

Perhaps the origin could be defaulted if the interval is less than a day 
or something like that.

- I'm wondering whether we need the date_trunc_interval(interval, 
timestamptz, timezone) variant.  Isn't that the same as 
date_trunc_interval(foo AT ZONE xyz, value)?



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 2020-06-30 06:34, John Naylor wrote:
> > In v9, I've simplified the patch somewhat to make it easier for future
> > work to build on.
> >
> > - When truncating on month-or-greater intervals, require the origin to
> > align on month. This removes the need to handle weird corner cases
> > that have no straightforward behavior.
> > - Remove hackish and possibly broken code to allow origin to be after
> > the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> > dates will behave correctly by default. This is not enforced for now,
> > since it may be desirable to find a way to get this to work in a nicer
> > way.
> > - Rebase docs over PG13 formatting changes.
>
> This looks pretty solid now.  Are there any more corner cases and other
> areas with unclear behavior that you are aware of?

Hi Peter,

Thanks for taking a look!

I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want support.

> A couple of thoughts:
>
> - After reading the discussion a few times, I'm not so sure anymore
> whether making this a cousin of date_trunc is the right way to go.  As
> you mentioned, there are some behaviors specific to date_trunc that
> don't really make sense in date_trunc_interval, and maybe we'll have
> more of those.

As far as the behaviors, I'm not sure exactly what you what you were thinking of, but here are two issues off the top of my head:

- If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately.
- In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days.

> Also, date_trunc_interval isn't exactly a handy name.
> Maybe something to think about.  It's obviously fairly straightforward
> to change it.

Effectively, it puts timestamps into bins, so maybe date_bin() or something like that?

> - There were various issues with the stride interval having months and
> years.  I'm not sure we even need that.  It could be omitted unless you
> are confident that your implementation is now sufficient.

Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millenia.

> - Also, negative intervals could be prohibited, but I suppose that
> matters less.

Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that.

> - I'm curious about the origin being set to 0001-01-01.  This seems to
> work correctly in that it sets the origin to a Monday, which is what we
> wanted, but according to Google that day was a Saturday.  Something to
> do with Julian vs. Gregorian calendar?

Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations.

> Maybe we should choose a date
> that is a bit more recent and easier to reason with.

2001-01-01 would also be a Monday aligned with centuries and millenia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin.

> - Then again, I'm thinking that maybe we should make the origin
> mandatory.  Otherwise, the default answers when having strides larger
> than a day are entirely arbitrary, e.g.,
>
> => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
> 0190-01-01 00:00:00 BC
>
> => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
> 0191-01-01 00:00:00

Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right.

> Perhaps the origin could be defaulted if the interval is less than a day
> or something like that.

If we didn't allow months and years to be units, it seems the default would always make sense?

> - I'm wondering whether we need the date_trunc_interval(interval,
> timestamptz, timezone) variant.  Isn't that the same as
> date_trunc_interval(foo AT ZONE xyz, value)?

I based this on 600b04d6b5ef6 for date_trunc(), whose message states:

date_trunc(field, timestamptz, zone_name)

is the same as

date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name

so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Mon, Nov 23, 2020 at 1:44 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> > - After reading the discussion a few times, I'm not so sure anymore
> > whether making this a cousin of date_trunc is the right way to go.  As
> > you mentioned, there are some behaviors specific to date_trunc that
> > don't really make sense in date_trunc_interval, and maybe we'll have
> > more of those.

For v10, I simplified the behavior by decoupling the behavior from date_trunc() and putting in some restrictions as discussed earlier. It's much simpler now. It could be argued that it goes too far in that direction, but it's easy to reason about and we can put back some features as we see fit.

> > Also, date_trunc_interval isn't exactly a handy name.
> > Maybe something to think about.  It's obviously fairly straightforward
> > to change it.
>
> Effectively, it puts timestamps into bins, so maybe date_bin() or something like that?

For v10 I went with date_bin() so we can see how that looks.

> > - There were various issues with the stride interval having months and
> > years.  I'm not sure we even need that.  It could be omitted unless you
> > are confident that your implementation is now sufficient.
>
> Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millenia.

I removed months and years for this version, but that can be reconsidered of course. The logic is really simple now.

> > - Also, negative intervals could be prohibited, but I suppose that
> > matters less.

I didn't go this far, but probably should before long.

> > - Then again, I'm thinking that maybe we should make the origin
> > mandatory.  Otherwise, the default answers when having strides larger
> > than a day are entirely arbitrary, e.g.,

I've tried this and like the resulting simplification.

> > - I'm wondering whether we need the date_trunc_interval(interval,
> > timestamptz, timezone) variant.  Isn't that the same as
> > date_trunc_interval(foo AT ZONE xyz, value)?
>
> I based this on 600b04d6b5ef6 for date_trunc(), whose message states:
>
> date_trunc(field, timestamptz, zone_name)
>
> is the same as
>
> date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name
>
> so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output.

In light of making the origin mandatory, it no longer makes sense to have a time zone parameter, since we can specify the time zone on the origin; and if desired on the output as well.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

Re: truncating timestamps on arbitrary intervals

From
David Steele
Date:
On 1/18/21 3:54 PM, John Naylor wrote:
> On Mon, Nov 23, 2020 at 1:44 PM John Naylor 
> <john.naylor@enterprisedb.com <mailto:john.naylor@enterprisedb.com>> wrote:
>  >
>  > On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut 
> <peter.eisentraut@enterprisedb.com 
> <mailto:peter.eisentraut@enterprisedb.com>> wrote:
>  > > - After reading the discussion a few times, I'm not so sure anymore
>  > > whether making this a cousin of date_trunc is the right way to go.  As
>  > > you mentioned, there are some behaviors specific to date_trunc that
>  > > don't really make sense in date_trunc_interval, and maybe we'll have
>  > > more of those.
> 
> For v10, I simplified the behavior by decoupling the behavior from 
> date_trunc() and putting in some restrictions as discussed earlier. It's 
> much simpler now. It could be argued that it goes too far in that 
> direction, but it's easy to reason about and we can put back some 
> features as we see fit.

Peter, thoughts on the new patch?

Regards,
-- 
-David
david@pgmasters.net



Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 18.01.21 21:54, John Naylor wrote:
> On Mon, Nov 23, 2020 at 1:44 PM John Naylor 
> <john.naylor@enterprisedb.com <mailto:john.naylor@enterprisedb.com>> wrote:
>  >
>  > On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut 
> <peter.eisentraut@enterprisedb.com 
> <mailto:peter.eisentraut@enterprisedb.com>> wrote:
>  > > - After reading the discussion a few times, I'm not so sure anymore
>  > > whether making this a cousin of date_trunc is the right way to go.  As
>  > > you mentioned, there are some behaviors specific to date_trunc that
>  > > don't really make sense in date_trunc_interval, and maybe we'll have
>  > > more of those.
> 
> For v10, I simplified the behavior by decoupling the behavior from 
> date_trunc() and putting in some restrictions as discussed earlier. It's 
> much simpler now. It could be argued that it goes too far in that 
> direction, but it's easy to reason about and we can put back some 
> features as we see fit.

Committed.

I noticed that some of the documentation disappeared between v9 and v10. 
  So I put that back and updated it appropriately.  I also added a few 
more test cases to cover some things that have been discussed during the 
course of this thread.

As a potential follow-up, should we perhaps add named arguments?  That 
might make the invocations easier to read, depending on taste.



Re: truncating timestamps on arbitrary intervals

From
Erik Rijkers
Date:
> On 2021.03.24. 16:38 Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

> 
> Committed.
> 

'In cases full units' seems strange.

Not a native speaker but maybe the attached changes are improvements?


Erik Rijkers
Attachment

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Wed, Mar 24, 2021 at 11:38 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

> Committed.
>
> I noticed that some of the documentation disappeared between v9 and v10.
>   So I put that back and updated it appropriately.  I also added a few
> more test cases to cover some things that have been discussed during the
> course of this thread.

Thanks! I put off updating the documentation in case the latest approach was not feature-rich enough.

> As a potential follow-up, should we perhaps add named arguments?  That
> might make the invocations easier to read, depending on taste.

I think it's quite possible some users will prefer that. All we need is to add something like

proargnames => '{bin_width,input,origin}'

to the catalog, right?

Also, I noticed that I put in double semicolons in the new functions somehow. I'll fix that as well.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:


On Wed, Mar 24, 2021 at 1:25 PM Erik Rijkers <er@xs4all.nl> wrote:
>
> 'In cases full units' seems strange.
>
> Not a native speaker but maybe the attached changes are improvements?

-    In cases full units (1 minute, 1 hour, etc.), it gives the same result as
+    In case of full units (1 minute, 1 hour, etc.), it gives the same result as
     the analogous <function>date_trunc</function> call, but the difference is
     that <function>date_bin</function> can truncate to an arbitrary interval.
    </para>
 
I would say "In the case of"

    <para>
-    The <parameter>stride</parameter> interval cannot contain units of month
+    The <parameter>stride</parameter> interval cannot contain units of a month
     or larger.

The original seems fine to me here.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 24.03.21 18:25, Erik Rijkers wrote:
>> On 2021.03.24. 16:38 Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
> 
>>
>> Committed.
>>
> 
> 'In cases full units' seems strange.

fixed, thanks



Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 24.03.21 18:58, John Naylor wrote:
>  > As a potential follow-up, should we perhaps add named arguments?  That
>  > might make the invocations easier to read, depending on taste.
> 
> I think it's quite possible some users will prefer that. All we need is 
> to add something like
> 
> proargnames => '{bin_width,input,origin}'
> 
> to the catalog, right?

right, plus some documentation adjustments perhaps

> Also, I noticed that I put in double semicolons in the new functions 
> somehow. I'll fix that as well.

I have fixed that.



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
Currently, when the origin is after the input, the result is the timestamp at the end of the bin, rather than the beginning as expected. The attached puts the result consistently at the beginning of the bin.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The current docs seem to be missing a "synopsis", like
>
> +<synopsis>
> +date_trunc(<replaceable>stride</replaceable>, <replaceable>timestamp</replaceable>, <replaceable>origin</replaceable>)
> +</synopsis>

The attached 
- adds a synopsis 
- adds a bit more description to the parameters similar to those in date_trunc
- documents that negative intervals are treated the same as positive ones

Note on the last point: This just falls out of the math, so was not deliberate, but it seems fine to me. We could ban negative intervals, but that would possibly just inconvenience some people unnecessarily. We could also treat negative strides differently somehow, but I don't immediately see a useful and/or intuitive change in behavior to come of that.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

Re: truncating timestamps on arbitrary intervals

From
Salek Talangi
Date:
Hi all,

it might be a bit late now, but do you know that TimescaleDB already has a similar feature, named time_bucket?
Perhaps that can help with some design decisions.
I saw your feature on Depesz' "Waiting for PostgreSQL 14" and remembered reading about it just two days ago.

Best regards
Salek Talangi

Am Do., 1. Apr. 2021 um 13:31 Uhr schrieb John Naylor <john.naylor@enterprisedb.com>:
On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The current docs seem to be missing a "synopsis", like
>
> +<synopsis>
> +date_trunc(<replaceable>stride</replaceable>, <replaceable>timestamp</replaceable>, <replaceable>origin</replaceable>)
> +</synopsis>

The attached 
- adds a synopsis 
- adds a bit more description to the parameters similar to those in date_trunc
- documents that negative intervals are treated the same as positive ones

Note on the last point: This just falls out of the math, so was not deliberate, but it seems fine to me. We could ban negative intervals, but that would possibly just inconvenience some people unnecessarily. We could also treat negative strides differently somehow, but I don't immediately see a useful and/or intuitive change in behavior to come of that.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Thu, Apr 1, 2021 at 9:11 AM Salek Talangi <salek.talangi@googlemail.com> wrote:
>
> Hi all,
>
> it might be a bit late now, but do you know that TimescaleDB already has a similar feature, named time_bucket?
> https://docs.timescale.com/latest/api#time_bucket
> Perhaps that can help with some design decisions.

Yes, thanks I'm aware of it. It's a bit more feature-rich, and I wanted to have something basic that users can have available without installing an extension. 

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 30.03.21 18:50, John Naylor wrote:
> On Sat, Mar 27, 2021 at 1:06 PM Justin Pryzby <pryzby@telsasoft.com 
> <mailto:pryzby@telsasoft.com>> wrote:
>  >
>  > The current docs seem to be missing a "synopsis", like
>  >
>  > +<synopsis>
>  > +date_trunc(<replaceable>stride</replaceable>, 
> <replaceable>timestamp</replaceable>, <replaceable>origin</replaceable>)
>  > +</synopsis>
> 
> The attached
> - adds a synopsis
> - adds a bit more description to the parameters similar to those in 
> date_trunc
> - documents that negative intervals are treated the same as positive ones
> 
> Note on the last point: This just falls out of the math, so was not 
> deliberate, but it seems fine to me. We could ban negative intervals, 
> but that would possibly just inconvenience some people unnecessarily. We 
> could also treat negative strides differently somehow, but I don't 
> immediately see a useful and/or intuitive change in behavior to come of 
> that.

committed



Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 30.03.21 18:06, John Naylor wrote:
> Currently, when the origin is after the input, the result is the 
> timestamp at the end of the bin, rather than the beginning as expected. 
> The attached puts the result consistently at the beginning of the bin.

In the patch

+   if (origin > timestamp && stride_usecs > 1)
+       tm_delta -= stride_usecs;

is the condition stride_usecs > 1 really necessary?  My assessment is 
that it's not, in which case it would be better to omit it.



Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:

On Sat, Apr 10, 2021 at 7:43 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 30.03.21 18:06, John Naylor wrote:
> > Currently, when the origin is after the input, the result is the
> > timestamp at the end of the bin, rather than the beginning as expected.
> > The attached puts the result consistently at the beginning of the bin.
>
> In the patch
>
> +   if (origin > timestamp && stride_usecs > 1)
> +       tm_delta -= stride_usecs;
>
> is the condition stride_usecs > 1 really necessary?  My assessment is
> that it's not, in which case it would be better to omit it.

Without the condition, the case of 1 microsecond will fail to be a no-op. This case has no practical use, but it still must work correctly, just as date_trunc('microsecond', input) does.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 10.04.21 14:53, John Naylor wrote:
> 
> On Sat, Apr 10, 2021 at 7:43 AM Peter Eisentraut 
> <peter.eisentraut@enterprisedb.com 
> <mailto:peter.eisentraut@enterprisedb.com>> wrote:
>  >
>  > On 30.03.21 18:06, John Naylor wrote:
>  > > Currently, when the origin is after the input, the result is the
>  > > timestamp at the end of the bin, rather than the beginning as expected.
>  > > The attached puts the result consistently at the beginning of the bin.
>  >
>  > In the patch
>  >
>  > +   if (origin > timestamp && stride_usecs > 1)
>  > +       tm_delta -= stride_usecs;
>  >
>  > is the condition stride_usecs > 1 really necessary?  My assessment is
>  > that it's not, in which case it would be better to omit it.
> 
> Without the condition, the case of 1 microsecond will fail to be a 
> no-op. This case has no practical use, but it still must work correctly, 
> just as date_trunc('microsecond', input) does.

Ah yes, the tests cover that.  Committed.



Re: truncating timestamps on arbitrary intervals

From
Peter Eisentraut
Date:
On 22.04.21 11:16, Justin Pryzby wrote:
> It looks like we all missed that I misspelled "date_bin" as
> "date_trunc"...sorry.  I will include this with my next round of doc review, in
> case you don't want to make a separate commit for it.

fixed



Re: truncating timestamps on arbitrary intervals

From
Bauyrzhan Sakhariyev
Date:
Is date_bin supposed to return the beginning of the bin? And does the sign of an interval define the "direction" of the bin?
Judging by results of queries #1 and #2, sign of interval decides a direction timestamp gets shifted to (in both cases ts < origin)
but when ts >origin (queries #3 and #4) interval sign doesn't matter, specifically #4 doesn't return 6-th of January.

1. SELECT date_bin('-2 days'::interval, timestamp '2001-01-01 00:00:00', timestamp '2001-01-04 00:00:00'); -- 2001-01-02 00:00:00
2. SELECT date_bin('2 days'::interval, timestamp '2001-01-01 00:00:00', timestamp '2001-01-04 00:00:00'); -- 2000-12-31 00:00:00
3. SELECT date_bin('2 days'::interval, timestamp '2001-01-04 00:00:00', timestamp '2001-01-01 00:00:00'); -- 2001-01-03 00:00:00
4. SELECT date_bin('-2 days'::interval, timestamp '2001-01-04 00:00:00', timestamp '2001-01-01 00:00:00'); -- 2001-01-03 00:00:00

On Thu, Jul 22, 2021 at 6:21 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
Hi,

When analyzing time-series data, it's useful to be able to bin
timestamps into equally spaced ranges. date_trunc() is only able to
bin on a specified whole unit. In the attached patch for the March
commitfest, I propose a new function date_trunc_interval(), which can
truncate to arbitrary intervals, e.g.:

select date_trunc_interval('15 minutes', timestamp '2020-02-16
20:48:40'); date_trunc_interval
---------------------
 2020-02-16 20:45:00
(1 row)

With this addition, it might be possible to turn the existing
date_trunc() functions into wrappers. I haven't done that here because
it didn't seem practical at this point. For one, the existing
functions have special treatment for weeks, centuries, and millennia.

Note: I've only written the implementation for the type timestamp
without timezone. Adding timezone support would be pretty simple, but
I wanted to get feedback on the basic idea first before making it
complete. I've also written tests and very basic documentation.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:

On Thu, Jul 22, 2021 at 12:24 PM Bauyrzhan Sakhariyev <baurzhansahariev@gmail.com> wrote:
>
> Is date_bin supposed to return the beginning of the bin?

Thanks for testing! And yes.

> And does the sign of an interval define the "direction" of the bin?

No, the boundary is intentionally the earlier one:

/*
 * Make sure the returned timestamp is at the start of the bin, even if
 * the origin is in the future.
 */
if (origin > timestamp && stride_usecs > 1)
    tm_delta -= stride_usecs;

I wonder if we should just disallow negative intervals here.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
Bauyrzhan Sakhariyev
Date:
> No, the boundary is intentionally the earlier one:

I found that commit in GitHub, thanks for pointing it out.
When I test locally origin_in_the_future case I get different results for positive and negative intervals (see queries #1 and #2 from above, they have same timestamp, origin and interval magnitude, difference is only in interval sign) - can it be that the version I downloaded from https://www.enterprisedb.com/postgresql-early-experience doesn't include commit with that improvement?

>  I wonder if we should just disallow negative intervals here.

I cannot imagine somebody using negative as a constant argument but users can pass another column as a first argument date or some function(ts) - not likely but possible. A line in docs about the leftmost point of interval as start of the bin could be helpful.

Not related to negative interval - I created a PR for adding zero check for stride https://github.com/postgres/postgres/pull/67 and after getting it closed I stopped right there - 1 line check doesn't worth going through the patching process I'm not familiar with.

>In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call,
Was not obvious to me that we need to supply Monday origin to make date_bin(1 week, ts) produce same result with date_trunc

Sorry for the verbose report and thanks for the nice function -  I know it's not yet released, was just playing around with beta as I want to align CrateDB date_bin with Postgresql

On Thu, Jul 22, 2021 at 7:28 PM John Naylor <john.naylor@enterprisedb.com> wrote:

On Thu, Jul 22, 2021 at 12:24 PM Bauyrzhan Sakhariyev <baurzhansahariev@gmail.com> wrote:
>
> Is date_bin supposed to return the beginning of the bin?

Thanks for testing! And yes.

> And does the sign of an interval define the "direction" of the bin?

No, the boundary is intentionally the earlier one:

/*
 * Make sure the returned timestamp is at the start of the bin, even if
 * the origin is in the future.
 */
if (origin > timestamp && stride_usecs > 1)
    tm_delta -= stride_usecs;

I wonder if we should just disallow negative intervals here.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:

On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <baurzhansahariev@gmail.com> wrote:
> Not related to negative interval - I created a PR for adding zero check for stride https://github.com/postgres/postgres/pull/67 and after getting it closed I stopped right there - 1 line check doesn't worth going through the patching process I'm not familiar with.

Thanks for the pull request! I added tests and reworded the error message slightly to match current style, and pushed.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:


On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <baurzhansahariev@gmail.com> wrote:
>
> > No, the boundary is intentionally the earlier one:
>
> I found that commit in GitHub, thanks for pointing it out.
> When I test locally origin_in_the_future case I get different results for positive and negative intervals (see queries #1 and #2 from above, they have same timestamp, origin and interval magnitude, difference is only in interval sign) - can it be that the version I downloaded from https://www.enterprisedb.com/postgresql-early-experience doesn't include commit with that improvement?

Sorry, I wasn't clear. The intention is that the boundary is on the lower side, but query #1 doesn't follow that, so that's a bug in my view. I found while developing the feature that the sign of the stride didn't seem to matter, but evidently I didn't try with the origin in the future.

> >  I wonder if we should just disallow negative intervals here.
>
> I cannot imagine somebody using negative as a constant argument but users can pass another column as a first argument date or some function(ts) - not likely but possible. A line in docs about the leftmost point of interval as start of the bin could be helpful.

In recent years there have been at least two attempts to add an absolute value function for intervals, and both stalled over semantics, so I'd rather just side-step the issue, especially as we're in beta.

> >In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call,
> Was not obvious to me that we need to supply Monday origin to make date_bin(1 week, ts) produce same result with date_trunc

The docs for date_trunc() don't mention this explicitly, but it might be worth mentioning ISO weeks. There is a nearby mention for EXTRACT():

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

"The number of the ISO 8601 week-numbering week of the year. By definition, ISO weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year."

> Sorry for the verbose report and thanks for the nice function -  I know it's not yet released, was just playing around with beta as I want to align CrateDB date_bin with Postgresql

Thanks again for testing! This is good feedback.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
I wrote:

> On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <baurzhansahariev@gmail.com> wrote:
> >
> > > No, the boundary is intentionally the earlier one:
> >
> > I found that commit in GitHub, thanks for pointing it out.
> > When I test locally origin_in_the_future case I get different results for positive and negative intervals (see queries #1 and #2 from above, they have same timestamp, origin and interval magnitude, difference is only in interval sign) - can it be that the version I downloaded from https://www.enterprisedb.com/postgresql-early-experience doesn't include commit with that improvement?
>
> Sorry, I wasn't clear. The intention is that the boundary is on the lower side, but query #1 doesn't follow that, so that's a bug in my view. I found while developing the feature that the sign of the stride didn't seem to matter, but evidently I didn't try with the origin in the future.
>
> > >  I wonder if we should just disallow negative intervals here.
> >
> > I cannot imagine somebody using negative as a constant argument but users can pass another column as a first argument date or some function(ts) - not likely but possible. A line in docs about the leftmost point of interval as start of the bin could be helpful.
>
> In recent years there have been at least two attempts to add an absolute value function for intervals, and both stalled over semantics, so I'd rather just side-step the issue, especially as we're in beta.

Concretely, I propose to push the attached on master and v14. Since we're in beta 2 and this thread might not get much attention, I've CC'd the RMT.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

Re: truncating timestamps on arbitrary intervals

From
Tom Lane
Date:
John Naylor <john.naylor@enterprisedb.com> writes:
> Concretely, I propose to push the attached on master and v14. Since we're
> in beta 2 and this thread might not get much attention, I've CC'd the RMT.

+1, we can figure out whether that has a use some other time.

            regards, tom lane



Re: truncating timestamps on arbitrary intervals

From
Michael Paquier
Date:
On Tue, Jul 27, 2021 at 12:05:51PM -0400, John Naylor wrote:
> Concretely, I propose to push the attached on master and v14. Since we're
> in beta 2 and this thread might not get much attention, I've CC'd the RMT.

(It looks like gmail has messed up a bit the format of your last
message.)

Hmm.  The docs say also the following thing, but your patch does not
reflect that anymore:
"Negative intervals are allowed and are treated the same as positive
intervals."
So you may want to update that, at least.
--
Michael

Attachment

Re: truncating timestamps on arbitrary intervals

From
John Naylor
Date:
On Wed, Jul 28, 2021 at 12:15 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jul 27, 2021 at 12:05:51PM -0400, John Naylor wrote:
> > Concretely, I propose to push the attached on master and v14. Since we're
> > in beta 2 and this thread might not get much attention, I've CC'd the RMT.
>
> (It looks like gmail has messed up a bit the format of your last
> message.)

Hmm, it looks fine in the archives.

> Hmm.  The docs say also the following thing, but your patch does not
> reflect that anymore:
> "Negative intervals are allowed and are treated the same as positive
> intervals."

I'd forgotten that was documented based on incomplete information, thanks for looking! Pushed with that fixed.

--
John Naylor
EDB: http://www.enterprisedb.com