Re: Re: Add generate_series(date,date) and generate_series(date,date,integer) - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Date
Msg-id CADkLM=dY0aeS2QvoB7+QzE4_p=WMgR2A_Xexro-PY_u8UXo+3Q@mail.gmail.com
Whole thread Raw
In response to Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)  (David Steele <david@pgmasters.net>)
Responses Re: Re: Add generate_series(date,date) and generate_series(date,date,integer)
List pgsql-hackers
On Thu, Mar 17, 2016 at 10:00 AM, David Steele <david@pgmasters.net> wrote:
On 3/17/16 4:49 AM, Dean Rasheed wrote:

> On 16 March 2016 at 23:32, David Steele <david@pgmasters.net> wrote:
>
>>
>> I think in this case it comes down to a committer's judgement so I have
>> marked this "ready for committer" and passed the buck on to Álvaro.
>
> So I was pretty much "meh" on this patch too, because I'm not
> convinced it actually saves much typing, if any.
>
> However, I now realise that it introduces a backwards-compatibility
> breakage. Today it is possible to type
>
> SELECT * FROM generate_series('01-01-2000'::date, '01-04-2000', '7 days');

It can also be broken as below and this is even scarier to me:

postgres=# SELECT * FROM generate_series('01-01-2000'::date,
'01-04-2000'::date, '7 days');
ERROR:  invalid input syntax for integer: "7 days"
LINE 1: ...te_series('01-01-2000'::date, '01-04-2000'::date, '7 days');

And only works when:

postgres=# SELECT * FROM generate_series('01-01-2000'::date,
'01-04-2000'::date, '7 days'::interval);
    generate_series
------------------------
 2000-01-01 00:00:00+00
(1 row)

One might argue that string constants for dates in actual code might be
rare but I think it's safe to say that string constants for intervals
are pretty common.  I also think it unlikely that they are all
explicitly cast.

I marked this "waiting for author" so Corey can respond.  I actually
tested with the v3 patch since the v4 patch seems to be broken with git
apply or patch:

$ patch -p1 < ../other/generate_series_date.v4.diff
patching file doc/src/sgml/func.sgml
Hunk #1 succeeded at 14787 (offset 87 lines).
Hunk #2 succeeded at 14871 (offset 87 lines).
patching file src/backend/utils/adt/date.c
patch: **** malformed patch at line 163: diff --git
a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h

$ git apply -3 ../other/generate_series_date.v4.diff
fatal: corrupt patch at line 163

--
-David
david@pgmasters.net

Not sure what's wrong with the patch, but I get a clean one to you pending the outcome of the design discussion. v4 just ripped out the infinity tests, so v3 is valid for the issue you found..

So the function clobbers the situation where the user meant a timestamp range but instead gave two dates, and meant an interval but gave a string. I'm curious how generate_series() for numeric didn't have the same issue with floats.

I see two ways around this: 

1. Drop the step parameter entirely. My own use cases only ever require the step values 1 and -1, and which one the user wants can be inferred from (start < end). This would still change the output type where a person wanted timestamps, but instead input two dates.
2. Rename the function date_series() or generate_series_date()

I still think this is an important function because at the last several places I've worked, I've found myself manufacturing a query where some event data is likely to have date gaps, but we want to see the date gaps or at least have the 0 values contribute to a later average aggregate.

Like this:

select d.dt, sum(s.v1), sum(s2.v2), sum(t.v1), sum(t2.v2)
from   date_series_function(input1,input2) as d(dt),
       some_dimensional_characteristic c
left join sparse_table s on s.event_date = d.dt and s.c1 = c.cval
left join sparse_table s2 on s2.event_date = d.dt and s2.c2 = c.cval
left join other_sparse t on t.event_date = d.dt and t.c1 = c.cval
left join other_sparse t2 on t2.event_date = d.dt and t2.c2 = c.cval
group by d.dt
order by d.dt

This gets cumbersome when you have to cast every usage of your date column.

Furthermore, this is 90%+ of my usage of generate_series(), the remaining 10% being the integer case to populate sample data for tests.

Any thoughts on how to proceed?

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: silent data loss with ext4 / all current versions
Next
From: Teodor Sigaev
Date:
Subject: Re: WIP: Access method extendability