Re: Windowing Function Patch Review -> Standard Conformance - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Windowing Function Patch Review -> Standard Conformance
Date
Msg-id e08cc0400812010813u10eb395ewcc55881bce633ca5@mail.gmail.com
Whole thread Raw
In response to Re: Windowing Function Patch Review -> Standard Conformance  ("David Rowley" <dgrowley@gmail.com>)
Responses Re: Windowing Function Patch Review -> Standard Conformance  ("Hitoshi Harada" <umi.tanuki@gmail.com>)
Re: Windowing Function Patch Review -> Standard Conformance  ("Hitoshi Harada" <umi.tanuki@gmail.com>)
Re: Windowing Function Patch Review -> Standard Conformance  ("David Rowley" <dgrowley@gmail.com>)
List pgsql-hackers
2008/12/1 David Rowley <dgrowley@gmail.com>:
> I wrote:
>> I was also reading over the standard tonight. I've discovered that the
>> OFFSET in LEAD() and LAG() is optional. It should default to 1 if it is
>> not present. Oracle seems to support this.
>>
>> SQL2008 says:
>> > If <lead or lag function> is specified, then:
>> > i) Let VE1 be <lead or lag extent> and let DT be the declared type of
>> VE1.
>> > ii) Case:
>> > Scalar expressions 209
>> > WD 9075-2:200w(E)
>> > 6.10 <window function>
>> > If <offset> is specified, then let OFF be <offset>. The declared type of
>> >OFF shall be an
>> > exact numeric type with scale 0 (zero).
>> > 1)
>> > 2) Otherwise, let OFF be 1 (one).
>>
>> Yet another variant of LEAD() and LAG() but I think well worth it for both
>> compliance to the standard and compatibility to Oracle.
>
> I figured this was quite simple so I've created a patch to implement this.
> Can probably put this down to the fact that I'm starting to feel bad about
> pointing out the mistakes and having someone else fix them. Figured it was
> time to make some changes myself.
>
> I've got limited experience with diff so please let me know if there is
> something wrong with the patch. Same goes for my changes to the code.
>
> I re-sequenced the OIDs of other window functions so it will require initdb.
>
> Also I made some updates to the documentation. Wasn't 100% sure on the
> syntax for the optional arguments there. Hitoshi had: arg1 [,optional1].
> I've changed this to arg, [optional1], [optional2].
>
> One thing I didn't do was update the regression test:
> SELECT oid, proname FROM pg_proc WHERE proiswfunc;
>
> Hopefully this patch will apply after applying Heikki's latest patch
> (version 3).
>
> If you're happy with this Heikki can you merge to your patch?

I merged Heikki's patch with your lead/lag, then added a few tests for
those new comer functions. Also it contains some of those Tom pointed
including:

- Remove sgml keyword modifications as it will be generated automatically.
- Remove PartitionClause and OrderClause since they can be replaced
with SortGroupClause.
- Parallel test schedule changed to fit the parallel limit.
- equalfuncs/nodefuncs are now consistent in Query/WFunc
- Fix error code, which is now 42P36. But I'm still not sure it is appropriate.

And the patch is against HEAD. The git repository now points "spool"
branch for his approach, which I suppose will be merged to the master
(trunk) of the window functions repository.

I tested the spool performance with David's earlier bigtable:

CREATE TABLE bigtable (
 id SERIAL NOT NULL PRIMARY KEY,
 timestamp TIMESTAMP NOT NULL
);

-- about 383MB of data
INSERT INTO bigtable (timestamp)
SELECT NOW() + (CAST(RANDOM() * 10 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,10000000);

CREATE INDEX bigtable_timestamp_idx ON bigtable (timestamp);

VACUUM ANALYZE bigtable;

sample=# SELECT COUNT(*) FROM bigtable;
 count
----------
 10000000
(1 row)

sample=# SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1;
           lead
----------------------------
 2008-12-02 00:15:10.288461
(1 row)

sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id)
FROM bigtable LIMIT 1;

QUERY PLAN

----------------------------------------------------------------------------------------------
---------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039
rows=1 loops=1)
  ->  Window  (cost=0.00..386612.13 rows=10000000 width=12) (actual
time=0.036..0.036 rows=1
loops=1)
        ->  Index Scan using bigtable_pkey on bigtable
(cost=0.00..286612.13 rows=10000000 w
idth=12) (actual time=0.018..0.021 rows=2 loops=1)
 Total runtime: 0.071 ms
(4 rows)


shows quite good result. Great work.

The following query works on my build:

> SELECT depname,SUM(SUM(salary)) OVER (ORDER BY depname) FROM empsalary GROUP
> BY depname;
> ERROR:  variable not found in subplan target list


Now, I am thinking about refactoring around aggregate common code, and
renaming WFunc to WinFunc, which leads pg_proc.proiswfunc be
pg_proc.proiswinfunc and so on if no objections come.

P.S. seems hit attachment limitation. Sorry if you receive duplicated mail.

Regards,



--
Hitoshi Harada

Attachment

pgsql-hackers by date:

Previous
From: "Robert Haas"
Date:
Subject: Re: New to_timestamp implementation is pretty strict
Next
From: "Hitoshi Harada"
Date:
Subject: Re: Windowing Function Patch Review -> Standard Conformance