Thread: TODO items for window functions
The core window-functions patch is now committed and ready for wider testing. However, there are a number of unfinished items, at least some of which I'd like to see addressed before 8.4 release. In rough order of importance: * Support creation of user-defined window functions. I think this is a "must have" for 8.4 --- we are not in the habit of building nonextensible basic features. It doesn't seem that hard either. I think all we need do is to allow "WINDOW" as an attribute keyword in CREATE FUNCTION. Does anyone have an objection or a better idea? * Implement support for non-default window framing clauses. Most likely it's too late to consider getting the whole feature done for 8.4, but I wonder whether we could support the restricted case of allowing just these two combinationsBETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (which is default)BETWEEN UNBOUNDED PRECEDING AND UNBOUNDEDFOLLOWING I said earlier that we didn't really need to address this for 8.4, but my thinking was flawed. The case I think is really important is to allow last_value() to do something useful, and you can hardly argue that it's useful without an ORDER BY to define which row in the partition is "last". * Investigate whether we should prohibit window functions in recursive terms; check whether any of the committed prohibitions are unnecessary. * Look at tuplestore performance issues. The tuplestore_in_memory() thing is just a band-aid, we ought to try to solve it properly. tuplestore_advance seems like a weak spot as well. * Do we really need so much duplicated code between Agg and WindowAgg? regards, tom lane
Tom Lane Wrote: > The core window-functions patch is now committed and ready for wider > testing. However, there are a number of unfinished items, at least > some of which I'd like to see addressed before 8.4 release. In rough > order of importance: > > * Support creation of user-defined window functions. I think this is > a "must have" for 8.4 --- we are not in the habit of building > nonextensible basic features. It doesn't seem that hard either. > I think all we need do is to allow "WINDOW" as an attribute keyword > in CREATE FUNCTION. Does anyone have an objection or a better idea? > > * Implement support for non-default window framing clauses. Most likely > it's too late to consider getting the whole feature done for 8.4, but > I wonder whether we could support the restricted case of allowing just > these two combinations > BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (which is default) > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > I said earlier that we didn't really need to address this for 8.4, > but my thinking was flawed. The case I think is really important > is to allow last_value() to do something useful, and you can hardly > argue that it's useful without an ORDER BY to define which row in the > partition is "last". > > * Investigate whether we should prohibit window functions in recursive > terms; check whether any of the committed prohibitions are unnecessary. > > * Look at tuplestore performance issues. The tuplestore_in_memory() > thing is just a band-aid, we ought to try to solve it properly. > tuplestore_advance seems like a weak spot as well. > > * Do we really need so much duplicated code between Agg and WindowAgg? Hitoshi and I did briefly talk about these two a few months back, but there were other priorities at the time. Unsure how difficult it is, maybe another one for a TODO, 8.4 or 8.5 I'm not sure: * Minimise sorts in a query such as: david=# explain SELECT depname, david-# SUM(salary) OVER (ORDER BY salary), david-# SUM(salary) OVER (ORDER BY empno) david-# FROM empsalary david-# ORDER BY salary; QUERY PLAN ---------------------------------------------------------------------------- ----------------Sort (cost=213.15..215.75 rows=1040 width=44) Sort Key: salary -> WindowAgg (cost=142.83..161.03 rows=1040width=44) -> Sort (cost=142.83..145.43 rows=1040 width=44) Sort Key: empno -> WindowAgg (cost=72.52..90.72 rows=1040 width=44) -> Sort (cost=72.52..75.12 rows=1040 width=44) Sort Key: salary -> Seq Scan on empsalary (cost=0.00..20.40 rows=1040 width=44) In the above case it would be more efficient to evaluate windows with the same order by clause as the final results last to eliminate the final sort. In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3 sorts. We also perform 3. Also perhaps more difficult? Maybe 8.5... * Teach planner to decide which window to evaluate first based on costs. Currently the first window in the query is evaluated first, there may be no index to help sort the first window, but perhaps there are for other windows in the query. This may allow an index scan instead of a seqscan -> sort. I Oracle 10g seems to have the above capability but Sybase seems evaluate the first window first. I've yet to look at DB2. This would stop performance critical queries from looking like: SELECT id,sum_value2,sum_value FROM ( SELECT id, SUM(value) OVER (ORDER BY idxcol) AS sum_value, SUM(value2) OVER (ORDER BY no_idxcol) AS sum_value2 FROM some_table ) t; -- Allow index scan by putting the indexed column as the 1st window When they could look like: SELECT id, SUM(value2) OVER (ORDER BY no_idxcol) AS sum_value2, SUM(value) OVER (ORDER BY idxcol) AS sum_value FROM some_table; -- Planner has the option to eval 2nd window first due to index. David.
"David Rowley" <dgrowley@gmail.com> writes: > Unsure how difficult it is, maybe another one for a TODO, 8.4 or 8.5 I'm not > sure: > * Minimise sorts in a query such as: I'm not tremendously excited about improving that situation. As the code stands, the user can control what happens by ordering the WINDOW clause appropriately, so it's not really a show-stopper. Ideally we'd do better automatically, but it's not easy in the current planner structure --- we can only ask query_planner for one target sort order and there's no good way to determine beforehand which of the possible targets might be the best choice. So at best this is a "maybe TODO" for 8.5 or later. I do think the patch has probably left some low-hanging fruit on the simpler end of the difficulty spectrum, namely when the window stuff requires only one ordering that could be done either explicitly or by an indexscan. That choice should ideally be done with a proper cost comparison taking any LIMIT into account. I think right now the LIMIT might not be accounted for, or might be considered even when it shouldn't be because another sort is needed anyway. But in any case, the tuplestore internal issues are probably the more significant performance problems for the short term. regards, tom lane
2008/12/29 Tom Lane <tgl@sss.pgh.pa.us>: > The core window-functions patch is now committed and ready for wider > testing. However, there are a number of unfinished items, at least > some of which I'd like to see addressed before 8.4 release. In rough > order of importance: > > * Support creation of user-defined window functions. I think this is > a "must have" for 8.4 --- we are not in the habit of building > nonextensible basic features. It doesn't seem that hard either. > I think all we need do is to allow "WINDOW" as an attribute keyword > in CREATE FUNCTION. Does anyone have an objection or a better idea? The reason I and people decided window functions are not able to be defined by user is whether Window functions API is ready for exposure. If we agree with the current design is not changing for the long future, I don't have any objection. My only concern is for plpgsql. For c functions we can define user functions but what about other pls? > * Implement support for non-default window framing clauses. Most likely > it's too late to consider getting the whole feature done for 8.4, but > I wonder whether we could support the restricted case of allowing just > these two combinations > BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (which is default) > BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > I said earlier that we didn't really need to address this for 8.4, > but my thinking was flawed. The case I think is really important > is to allow last_value() to do something useful, and you can hardly > argue that it's useful without an ORDER BY to define which row in the > partition is "last". Frame clause concern is shrinking situation. For shrinking frame, we must do something to optimize aggregate not to recalculate from the first of the frame. So as far as we stay in UNBOUNDED PRECEDING, it is not so hard work, inclulding BETWEEN UNBOUNDED PRECEDING AND n FOLLOWING. And surveying sgml docs, I found this is not correct. http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/select.sgml?r1=1.112&r2=1.113 + default framing behavior, which is equivalent to the framing clause + <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</>. the default frame with ORDER BY clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, as aggregates perform rows peer to the current row. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > 2008/12/29 Tom Lane <tgl@sss.pgh.pa.us>: >> * Support creation of user-defined window functions. I think this is >> a "must have" for 8.4 --- we are not in the habit of building >> nonextensible basic features. It doesn't seem that hard either. > The reason I and people decided window functions are not able to be > defined by user is whether Window functions API is ready for exposure. Well, it seems about as stable as any other bit of new backend code ;-). We never promise that backend-internal APIs will not change across versions. > My only concern is for plpgsql. For c functions we can define user > functions but what about other pls? I'm not concerned about making this stuff available at the PL level (at least not yet). C-level capability will be enough to satisfy my concern for 8.4. > And surveying sgml docs, I found this is not correct. > http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/select.sgml?r1=1.112&r2=1.113 > + default framing behavior, which is equivalent to the framing clause > + <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</>. > the default frame with ORDER BY clause is RANGE BETWEEN UNBOUNDED > PRECEDING AND CURRENT ROW, as aggregates perform rows peer to the > current row. What is the difference? AFAICS the RANGE and ROWS keywords ought to be equivalent if you are not specifying "expression PRECEDING" or "expression FOLLOWING". regards, tom lane
I wrote: > * Support creation of user-defined window functions. I think this is > a "must have" for 8.4 --- we are not in the habit of building > nonextensible basic features. It doesn't seem that hard either. > I think all we need do is to allow "WINDOW" as an attribute keyword > in CREATE FUNCTION. Does anyone have an objection or a better idea? What I had in mind when I wrote that was something like create [or replace] function mywindow(...) returns ... as 'mymodule, 'mywindow' language c window; but on reflection there seems to be a case also for create [or replace] window function mywindow(...) returns ... as 'mymodule, 'mywindow' language c; The main argument in favor of the latter is that window-ness will need to be a fixed property of a function that you can't change except by dropping and recreating it, because any existing views calling the function will have its window-ness embedded in them in the form of whether they use a FuncExpr or WindowFunc node to call it. So it doesn't feel quite like an optional attribute. However, if we do that then for consistency we'd have to invent DROP WINDOW FUNCTION, ALTER WINDOW FUNCTION, RENAME WINDOW FUNCTION, COMMENT ON WINDOW FUNCTION, yadda yadda, and insist that you refer to a function properly (with or without WINDOW) in each one of these commands. Which would be a real PITA to implement and document, and I can't see that it's doing anything much for users either. So I'm still leaning to the first way. Comments? regards, tom lane
2008/12/29 Tom Lane <tgl@sss.pgh.pa.us>: > I wrote: >> * Support creation of user-defined window functions. I think this is >> a "must have" for 8.4 --- we are not in the habit of building >> nonextensible basic features. It doesn't seem that hard either. >> I think all we need do is to allow "WINDOW" as an attribute keyword >> in CREATE FUNCTION. Does anyone have an objection or a better idea? > > What I had in mind when I wrote that was something like > > create [or replace] function mywindow(...) returns ... > as 'mymodule, 'mywindow' > language c > window; > +1 regards Pavel Stehule > but on reflection there seems to be a case also for > > create [or replace] window function mywindow(...) returns ... > as 'mymodule, 'mywindow' > language c; > > The main argument in favor of the latter is that window-ness will need > to be a fixed property of a function that you can't change except by > dropping and recreating it, because any existing views calling the > function will have its window-ness embedded in them in the form of > whether they use a FuncExpr or WindowFunc node to call it. So it > doesn't feel quite like an optional attribute. > > However, if we do that then for consistency we'd have to invent > DROP WINDOW FUNCTION, ALTER WINDOW FUNCTION, RENAME WINDOW FUNCTION, > COMMENT ON WINDOW FUNCTION, yadda yadda, and insist that you refer > to a function properly (with or without WINDOW) in each one of these > commands. Which would be a real PITA to implement and document, > and I can't see that it's doing anything much for users either. > > So I'm still leaning to the first way. Comments? > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Tom Lane wrote: > However, if we do that then for consistency we'd have to invent > DROP WINDOW FUNCTION, ALTER WINDOW FUNCTION, RENAME WINDOW FUNCTION, > COMMENT ON WINDOW FUNCTION, yadda yadda, and insist that you refer > to a function properly (with or without WINDOW) in each one of these > commands. Which would be a real PITA to implement and document, > and I can't see that it's doing anything much for users either. > > So I'm still leaning to the first way. Comments? > > > I don't know that this matters so much unless you're going to have a seperate namespace for window functions. Otherwise, isn't WINDOW basically a noise word for these operations? cheers andrew
On Mon, Dec 29, 2008 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> * Support creation of user-defined window functions. I think this is >> a "must have" for 8.4 --- we are not in the habit of building >> nonextensible basic features. It doesn't seem that hard either. >> I think all we need do is to allow "WINDOW" as an attribute keyword >> in CREATE FUNCTION. Does anyone have an objection or a better idea? > > What I had in mind when I wrote that was something like > > create [or replace] function mywindow(...) returns ... > as 'mymodule, 'mywindow' > language c > window; > i don't understand this window function stuff well yet, but AFAIU it is like an aggregate function that shows grouped values without grouping rows (ok, maybe a very laizy or novice definition) but if that is correct or near correct maybe we need to follow the same pattern: create function -- without any decoration create aggregate maybe with a decoration of being window o create window aggregate or something similar... -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
I wrote: > * Investigate whether we should prohibit window functions in recursive > terms; check whether any of the committed prohibitions are unnecessary. I looked into these questions a bit. As for the first, there doesn't appear to be a compelling implementation reason to forbid it, and I can't find anything in the spec that says to disallow it. SQL:2008's prohibition on aggregates in recursive terms is in 7.13 <query expression> syntax rule 2) g) iii) 4), and there's nothing about window functions in the vicinity. The primary reason to forbid aggregates, so far as I can divine the intent of the SQL committee, is that incremental evaluation of an aggregate would give implementation-dependent results, ie you'd get a different aggregate result depending on how many and which rows the implementation chose to push through the recursion at a time. It seems like the same charge could be leveled against window functions. On the other hand it's at least possible to construct recursive queries in which all the rows of a given window partition should get pushed through together, so that you should get consistent answers despite the overall incremental evaluation. So I can't tell for sure if the committee thought about that and intentionally decided to allow window functions in recursive terms, or if their failure to forbid it was an oversight. (My confidence in the unerring accuracy of the spec is not high at the moment ;-).) Nonetheless, the spec does not forbid it, so I feel we shouldn't either. As for the second point, I looked at every place that the committed patch throws an error for queries or expressions containing window functions. Most are demonstrably per spec, or are necessary implementation restrictions arising from the fact that we know an expression isn't going to get fed through the full planner/executor machinery (an example of the latter is ALTER COLUMN TYPE USING). The only case that I think is debatable is that we are throwing error for window functions used in a SELECT FOR UPDATE/FOR SHARE query. The corresponding error for aggregate functions is necessary because the executor top level doesn't "see" the individual rows that went into the aggregate, so there's no way to lock them. In the case of window functions no aggregation occurs, and so in principle we could lock the rows. However, consider something like this: select x, lead(x) over() from table for update limit 1; Because of the LIMIT, we'd only lock the first-returned row ... but the values returned would also depend on the second row of the table, which wouldn't get locked. In general the results could depend on any or all rows of the table but we might lock only some. This seems to me to be at variance with how you'd expect SELECT FOR UPDATE to behave, so I'm inclined to leave the prohibition in there --- at least until someone comes up with a convincing use-case for SELECT FOR UPDATE together with a window function, and explains why he doesn't care about relevant rows possibly not getting locked. Comments? regards, tom lane
2008/12/30 Tom Lane <tgl@sss.pgh.pa.us>: > "Hitoshi Harada" <umi.tanuki@gmail.com> writes: >> And surveying sgml docs, I found this is not correct. > >> http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/select.sgml?r1=1.112&r2=1.113 > >> + default framing behavior, which is equivalent to the framing clause >> + <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</>. > >> the default frame with ORDER BY clause is RANGE BETWEEN UNBOUNDED >> PRECEDING AND CURRENT ROW, as aggregates perform rows peer to the >> current row. > > What is the difference? AFAICS the RANGE and ROWS keywords ought to be > equivalent if you are not specifying "expression PRECEDING" or > "expression FOLLOWING". The difference is that RANGE ... CURRENT ROW contains all peers of the current row, while ROWS ... CURRENT ROW doesn't contain them but the current row itself only. See 7.11 rule 5-b. Regards, -- Hitoshi Harada
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> However, if we do that then for consistency we'd have to invent >> DROP WINDOW FUNCTION, ALTER WINDOW FUNCTION, RENAME WINDOW FUNCTION, >> COMMENT ON WINDOW FUNCTION, yadda yadda, and insist that you refer >> to a function properly (with or without WINDOW) in each one of these >> commands. > I don't know that this matters so much unless you're going to have a > seperate namespace for window functions. Otherwise, isn't WINDOW > basically a noise word for these operations? Well, the question is whether window functions are so different from plain functions that we should treat them as a different kind of animal for SQL-command purposes. We do do that for aggregate functions, but aggregates have some really fundamental effects on query semantics. Consider select sin(x) from table; -- returns 1 row per table row select sum(x) from table; -- returns 1 row select lead(x) over () from table; -- returns 1 row per table row In this sense window functions aren't that different from regular ones. Window functions are also much more like regular functions than aggregates in terms of what you have to specify to define one. You could certainly argue the classification either way, but I think that we should make a hard decision now: either window functions are treated as a distinct object type (implying their own set of command names and nuisance errors if you use the wrong one), or they are not a distinct object type (implying that WINDOW is an attribute for CREATE FUNCTION and not part of the command name). If we are wishy-washy about it and treat WINDOW as just a noise word in some contexts then we will have user confusion. The precedent that is bothering me here is all the user confusion that has ensued over whether you can use ALTER TABLE to operate on sequences and views. regards, tom lane
2008/12/30 Jaime Casanova <jcasanov@systemguards.com.ec>: > On Mon, Dec 29, 2008 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>> * Support creation of user-defined window functions. I think this is >>> a "must have" for 8.4 --- we are not in the habit of building >>> nonextensible basic features. It doesn't seem that hard either. >>> I think all we need do is to allow "WINDOW" as an attribute keyword >>> in CREATE FUNCTION. Does anyone have an objection or a better idea? >> >> What I had in mind when I wrote that was something like >> >> create [or replace] function mywindow(...) returns ... >> as 'mymodule, 'mywindow' >> language c >> window; >> > > i don't understand this window function stuff well yet, but AFAIU it > is like an aggregate function that shows grouped values without > grouping rows (ok, maybe a very laizy or novice definition) but if > that is correct or near correct maybe we need to follow the same > pattern: > > create function -- without any decoration > create aggregate maybe with a decoration of being window o create > window aggregate or something similar... > I prefer "create window function" because it is semantically readable and window keyword is more similar to aggregate than immutable, or strict. And for drop/comment or so, I guess we don't need prepare window keyword. It's because window function is represented in pg_proc catalog only, whereas aggregate uses pg_proc and pg_aggregate. If there weren't window keyword in DROP FUNCTION, we won't be worried which function to be dropped? Regards, -- Hitoshi Harada
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes: > i don't understand this window function stuff well yet, but AFAIU it > is like an aggregate function that shows grouped values without > grouping rows (ok, maybe a very laizy or novice definition) but if > that is correct or near correct maybe we need to follow the same > pattern: > create function -- without any decoration > create aggregate maybe with a decoration of being window o create > window aggregate or something similar... No, we can't really manage this as a variant of CREATE AGGREGATE --- the information you need to specify to create a window function is not at all like what you need to specify to create an aggregate. regards, tom lane
2008/12/30 Tom Lane <tgl@sss.pgh.pa.us>: > Andrew Dunstan <andrew@dunslane.net> writes: >> Tom Lane wrote: >>> However, if we do that then for consistency we'd have to invent >>> DROP WINDOW FUNCTION, ALTER WINDOW FUNCTION, RENAME WINDOW FUNCTION, >>> COMMENT ON WINDOW FUNCTION, yadda yadda, and insist that you refer >>> to a function properly (with or without WINDOW) in each one of these >>> commands. > >> I don't know that this matters so much unless you're going to have a >> seperate namespace for window functions. Otherwise, isn't WINDOW >> basically a noise word for these operations? > > Well, the question is whether window functions are so different from > plain functions that we should treat them as a different kind of animal > for SQL-command purposes. We do do that for aggregate functions, but > aggregates have some really fundamental effects on query semantics. > Consider > > select sin(x) from table; -- returns 1 row per table row > select sum(x) from table; -- returns 1 row > select lead(x) over () from table; -- returns 1 row per table row > > In this sense window functions aren't that different from regular ones. > > Window functions are also much more like regular functions than > aggregates in terms of what you have to specify to define one. > > You could certainly argue the classification either way, but I think > that we should make a hard decision now: either window functions are > treated as a distinct object type (implying their own set of command > names and nuisance errors if you use the wrong one), or they are not a > distinct object type (implying that WINDOW is an attribute for CREATE > FUNCTION and not part of the command name). If we are wishy-washy about > it and treat WINDOW as just a noise word in some contexts then we will > have user confusion. The precedent that is bothering me here is all the > user confusion that has ensued over whether you can use ALTER TABLE to > operate on sequences and views. Hmm, sequences and views are created by CREATE SEQUENCE CREATE VIEW but CREATE WINDOW FUNCTION is not so. I mean, by this syntax user recognizes he creates one of the plain function with window attribute. And as I said in the previous mail, actually it is. The proiswindow attribute is used to declare that it can handle window function API and nothing more, everything is the same with plain ones. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > 2008/12/30 Tom Lane <tgl@sss.pgh.pa.us>: >> What is the difference? AFAICS the RANGE and ROWS keywords ought to be >> equivalent if you are not specifying "expression PRECEDING" or >> "expression FOLLOWING". > The difference is that RANGE ... CURRENT ROW contains all peers of the > current row, while ROWS ... CURRENT ROW doesn't contain them but the > current row itself only. See 7.11 rule 5-b. Hah, I had missed that fine point. Okay, doc is wrong and I will fix. Given that, I think that a suitable minimum implementation should cover both the RANGE/ROWS distinction and the CURRENT ROW/UNBOUNDED FOLLOWING distinction, ie I would like 8.4 to support RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGROWS BETWEEN UNBOUNDEDPRECEDING AND CURRENT ROWROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (1 is the default, 2 and 4 behave the same unless I'm still missing something.) This doesn't seem too difficult to consider adding now, and it will greatly increase the usefulness of frame-dependent window functions. Is this something you're interested in working on? I can tackle it if you don't have time now. regards, tom lane
2008/12/30 Tom Lane <tgl@sss.pgh.pa.us>: > Hah, I had missed that fine point. Okay, doc is wrong and I will fix. > > Given that, I think that a suitable minimum implementation should cover > both the RANGE/ROWS distinction and the CURRENT ROW/UNBOUNDED FOLLOWING > distinction, ie I would like 8.4 to support > > RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > > (1 is the default, 2 and 4 behave the same unless I'm still missing > something.) My understanding is as well. > Is this something you're interested in working on? I can tackle it > if you don't have time now. > > regards, tom lane > Sorry, over the new year days, I don't have time and will be remote. Maybe from 3th or 4th I can work on this, so if you have time during time I would like you to do it. Otherwise, I will. Regards, -- Hitoshi Harada
"Hitoshi Harada" <umi.tanuki@gmail.com> writes: > 2008/12/30 Tom Lane <tgl@sss.pgh.pa.us>: >> Is this something you're interested in working on? I can tackle it >> if you don't have time now. > Sorry, over the new year days, I don't have time and will be remote. > Maybe from 3th or 4th I can work on this, so if you have time during > time I would like you to do it. Otherwise, I will. I have nothing pressing during this week; I'll see what I can get done. regards, tom lane
Tom Lane escribió: > The core window-functions patch is now committed and ready for wider > testing. However, there are a number of unfinished items, at least > some of which I'd like to see addressed before 8.4 release. In rough > order of importance: [lots of discussion] Perhaps I was a bit hasty -- I added the initial items mentioned in this thread to the Todo page: http://wiki.postgresql.org/wiki/Todo#Window_Functions Perhaps people who has a clue should have a look for reworking the list. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, 2008-12-29 at 12:35 -0500, Tom Lane wrote: > we could lock the rows. However, consider something like this: > > select x, lead(x) over() from table for update limit 1; > > Because of the LIMIT, we'd only lock the first-returned row ... > but the values returned would also depend on the second row of the > table, which wouldn't get locked. In general the results could > depend on any or all rows of the table but we might lock only some. > This seems to me to be at variance with how you'd expect SELECT FOR > UPDATE to behave, so I'm inclined to leave the prohibition in there > --- at least until someone comes up with a convincing use-case for > SELECT FOR UPDATE together with a window function, and explains why > he doesn't care about relevant rows possibly not getting locked. > How is that different from a subselect? create table foo(a int, b int); create table bar(c int, d int); insert into foo values(1, 10); insert into foo values(2, 20); insert into bar values(100, 1000); -- connection1 BEGIN; select a, b, (select d from bar where c = 100) as d from foo where a = 1 for update; -- connection2 BEGIN; select a, b, (select d from bar where c = 100) as d from foo where a = 2 for update; The single tuple in bar affects both results, but the second connection is not blocked. Regards,Jeff Davis
I wrote: > You could certainly argue the classification either way, but I think > that we should make a hard decision now: either window functions are > treated as a distinct object type (implying their own set of command > names and nuisance errors if you use the wrong one), or they are not a > distinct object type (implying that WINDOW is an attribute for CREATE > FUNCTION and not part of the command name). If we are wishy-washy about > it and treat WINDOW as just a noise word in some contexts then we will > have user confusion. The precedent that is bothering me here is all the > user confusion that has ensued over whether you can use ALTER TABLE to > operate on sequences and views. Apparently that analogy didn't impress anyone but me. AFAICT the majority opinion is that we should use the syntax create [or replace] [window] function ... but just ignore the distinction between regular functions and window functions for all other function-related SQL commands. Barring further discussion, I'll make that happen in the next day or two. regards, tom lane
On Tue, Dec 30, 2008 at 11:59:22AM -0500, Tom Lane wrote: > I wrote: > > You could certainly argue the classification either way, but I > > think that we should make a hard decision now: either window > > functions are treated as a distinct object type (implying their > > own set of command names and nuisance errors if you use the wrong > > one), or they are not a distinct object type (implying that WINDOW > > is an attribute for CREATE FUNCTION and not part of the command > > name). If we are wishy-washy about it and treat WINDOW as just a > > noise word in some contexts then we will have user confusion. The > > precedent that is bothering me here is all the user confusion that > > has ensued over whether you can use ALTER TABLE to operate on > > sequences and views. > > Apparently that analogy didn't impress anyone but me. AFAICT the > majority opinion is that we should use the syntax > > create [or replace] [window] function ... > > but just ignore the distinction between regular functions and window > functions for all other function-related SQL commands. Barring further > discussion, I'll make that happen in the next day or two. Presumably psql should know about this change. Should \df now include windowing functions along with a boolean column that indicates whether a function is a windowing function? Should there be \dw[+] instead? In either case, should the S option indicating "include system functions only when S is present" (e.g. \dwS) apply? I'm thinking yes on that last one. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
> Apparently that analogy didn't impress anyone but me. AFAICT the > majority opinion is that we should use the syntax > > create [or replace] [window] function ... > > but just ignore the distinction between regular functions and window > functions for all other function-related SQL commands. Barring further > discussion, I'll make that happen in the next day or two. It impressed me. I liked making WINDOW a flag that occurs later in the statement a lot better. ...Robert
David Fetter <david@fetter.org> writes: > Presumably psql should know about this change. Should \df now include > windowing functions along with a boolean column that indicates whether > a function is a windowing function? Should there be \dw[+] instead? > In either case, should the S option indicating "include system > functions only when S is present" (e.g. \dwS) apply? If people are going to start proposing that, I'm going to switch back to the position that WINDOW should be treated as an attribute. The whole point of the discussion is that no one wanted to get into the game of treating window functions as a separate classification in general. regards, tom lane
"Robert Haas" <robertmhaas@gmail.com> writes: >> Apparently that analogy didn't impress anyone but me. > It impressed me. I liked making WINDOW a flag that occurs later in > the statement a lot better. I ended up going with the flag/attribute approach. The other would be only marginally more work now, but I remain convinced that we'd have to do more work later to deal with the issue that CREATE WINDOW FUNCTION looks like "window function" is a distinct kind of SQL object. And nobody seemed to want to propagate that distinction into all the places it would logically have to go. However ... having said that, there is more to David Fetter's gripe about \df than I realized at first. Consider regression=# \df nth_value List of functions Schema | Name | Result data type | Argument datatypes ------------+-----------+------------------+---------------------pg_catalog | nth_value | anyelement | anyelement,integer (1 row) Even without any consideration of user-defined window functions, this seems a bit lacking: the user of nth_value() needs to know that he has to write an OVER clause, and as things stand \df is not going to give him the slightest hint about that. So I can see the argument for reflecting window-ness into \df somehow. I am not thrilled about inventing a new column for this, but how about a display like so: regression=# \df nth_value List of functions Schema | Name | Result data type | Argumentdata types ------------+-----------+------------------+---------------------------------pg_catalog | nth_value | anyelement |anyelement, integer OVER window or some other addition that only shows up when needed. regards, tom lane
2008/12/31 Tom Lane <tgl@sss.pgh.pa.us>: > "Robert Haas" <robertmhaas@gmail.com> writes: >>> Apparently that analogy didn't impress anyone but me. > >> It impressed me. I liked making WINDOW a flag that occurs later in >> the statement a lot better. > > I ended up going with the flag/attribute approach. The other would be > only marginally more work now, but I remain convinced that we'd have to > do more work later to deal with the issue that CREATE WINDOW FUNCTION > looks like "window function" is a distinct kind of SQL object. And > nobody seemed to want to propagate that distinction into all the places > it would logically have to go. > > However ... having said that, there is more to David Fetter's gripe > about \df than I realized at first. Consider > > regression=# \df nth_value > List of functions > Schema | Name | Result data type | Argument data types > ------------+-----------+------------------+--------------------- > pg_catalog | nth_value | anyelement | anyelement, integer > (1 row) > > Even without any consideration of user-defined window functions, > this seems a bit lacking: the user of nth_value() needs to know that > he has to write an OVER clause, and as things stand \df is not going > to give him the slightest hint about that. So I can see the argument > for reflecting window-ness into \df somehow. > > I am not thrilled about inventing a new column for this, but how about > a display like so: > > regression=# \df nth_value > List of functions > Schema | Name | Result data type | Argument data types > ------------+-----------+------------------+--------------------------------- > pg_catalog | nth_value | anyelement | anyelement, integer OVER window > +1 regards Pavel Stehule > or some other addition that only shows up when needed. > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Tom Lane wrote: > I am not thrilled about inventing a new column for this, but how about > a display like so: > > regression=# \df nth_value > List of functions > Schema | Name | Result data type | Argument data types > ------------+-----------+------------------+--------------------------------- > pg_catalog | nth_value | anyelement | anyelement, integer OVER window > > or some other addition that only shows up when needed. That looks like "OVER window" is associated with the "integer", like DEFAULT. I don't have any better suggestions, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas escribió: > Tom Lane wrote: >> I am not thrilled about inventing a new column for this, but how about >> a display like so: >> >> regression=# \df nth_value >> List of functions >> Schema | Name | Result data type | Argument data types >> ------------+-----------+------------------+--------------------------------- >> pg_catalog | nth_value | anyelement | anyelement, integer OVER window >> >> or some other addition that only shows up when needed. > > That looks like "OVER window" is associated with the "integer", like > DEFAULT. I don't have any better suggestions, though. List of functions Schema | Name | Result data type | Argument data types------------+-----------+------------------+-----------------------------------pg_catalog | nth_value | anyelement | (anyelement, integer) OVER window -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Heikki Linnakangas escribi�: >> Tom Lane wrote: >>> pg_catalog | nth_value | anyelement | anyelement, integer OVER window >> >> That looks like "OVER window" is associated with the "integer", like >> DEFAULT. I don't have any better suggestions, though. > pg_catalog | nth_value | anyelement | (anyelement, integer) OVER window Yeah, I had considered that too, and it has a distinct advantage for parameterless functions like rank(): Schema | Name | Result data type | Argument data types ------------+------+------------------+---------------------pg_catalog | rank | bigint | OVER windowpg_catalog| rank | bigint | () OVER window The latter is definitely clearer about what you're supposed to do. However, it seems kind of inconsistent to do this for window functions unless we also make \df start putting parens around the argument lists for regular functions. Comments? regards, tom lane
On Wed, Dec 31, 2008 at 11:04:41AM -0500, Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Heikki Linnakangas escribi�: > >> Tom Lane wrote: > >>> pg_catalog | nth_value | anyelement | anyelement, integer OVER window > >> > >> That looks like "OVER window" is associated with the "integer", like > >> DEFAULT. I don't have any better suggestions, though. > > > pg_catalog | nth_value | anyelement | (anyelement, integer) OVER window > > Yeah, I had considered that too, and it has a distinct advantage for > parameterless functions like rank(): > > Schema | Name | Result data type | Argument data types > ------------+------+------------------+--------------------- > pg_catalog | rank | bigint | OVER window > pg_catalog | rank | bigint | () OVER window > > The latter is definitely clearer about what you're supposed to do. +1 on the latter. > However, it seems kind of inconsistent to do this for window functions > unless we also make \df start putting parens around the argument lists > for regular functions. Comments? Would parens around all the argument lists really be so bad? I'm thinking not. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi, Happy new year! Le 31 déc. 08 à 17:04, Tom Lane <tgl@sss.pgh.pa.us> a écrit : > However, it seems kind of inconsistent to do this for window functions > unless we also make \df start putting parens around the argument lists > for regular functions. Comments? A way to distinguish between window functions "seeing" frames vs. partitions, if possible, would sound quite useful after a docs reading session. AKA OVER(... order by ...) effect in some other thread. -- dim
> I am not thrilled about inventing a new column for this, but how about > a display like so: > > regression=# \df nth_value > List of functions > Schema | Name | Result data type | Argument data types > ------------+-----------+------------------+--------------------------------- > pg_catalog | nth_value | anyelement | anyelement, integer OVER window > > or some other addition that only shows up when needed. I think this whole idea is a bad one. In the current release, you can do DROP FUNCTION Name ( Argument data types ) ...and it will work. Maybe you will say that no one is doing this via a script (which I wouldn't bet on, but it's possible) but I'm sure people are doing it via cut and paste, because I have done exactly this thing. Any of the various proposals for hacking up Argument data types will make this no longer true, and somebody will get confused. I think you should bite the bullet and add a "type" column (f for regular function and w for window? could there be others in the future?). ...Robert
Robert Haas wrote: > > I am not thrilled about inventing a new column for this, but how about > > a display like so: > > > > regression=# \df nth_value > > List of functions > > Schema | Name | Result data type | Argument data types > > ------------+-----------+------------------+--------------------------------- > > pg_catalog | nth_value | anyelement | anyelement, integer OVER window > > > > or some other addition that only shows up when needed. > > I think this whole idea is a bad one. In the current release, you can do > > DROP FUNCTION Name ( Argument data types ) > > ...and it will work. Maybe you will say that no one is doing this via > a script (which I wouldn't bet on, but it's possible) but I'm sure > people are doing it via cut and paste, because I have done exactly > this thing. Any of the various proposals for hacking up Argument data > types will make this no longer true, and somebody will get confused. > I think you should bite the bullet and add a "type" column (f for > regular function and w for window? could there be others in the > future?). I assume this is still an open issue. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Robert Haas wrote: > > I am not thrilled about inventing a new column for this, but how about > > a display like so: > > > > regression=# \df nth_value > > List of functions > > Schema | Name | Result data type | Argument data types > > ------------+-----------+------------------+--------------------------------- > > pg_catalog | nth_value | anyelement | anyelement, integer OVER window > > > > or some other addition that only shows up when needed. > > I think this whole idea is a bad one. In the current release, you can do > > DROP FUNCTION Name ( Argument data types ) > > ...and it will work. Maybe you will say that no one is doing this via > a script (which I wouldn't bet on, but it's possible) but I'm sure > people are doing it via cut and paste, because I have done exactly > this thing. Any of the various proposals for hacking up Argument data > types will make this no longer true, and somebody will get confused. > I think you should bite the bullet and add a "type" column (f for > regular function and w for window? could there be others in the > future?). Are we doing anything for this for 8.4? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
2009/2/5 Bruce Momjian <bruce@momjian.us>: > Robert Haas wrote: >> > I am not thrilled about inventing a new column for this, but how about >> > a display like so: >> > >> > regression=# \df nth_value >> > List of functions >> > Schema | Name | Result data type | Argument data types >> > ------------+-----------+------------------+--------------------------------- >> > pg_catalog | nth_value | anyelement | anyelement, integer OVER window >> > >> > or some other addition that only shows up when needed. >> >> I think this whole idea is a bad one. In the current release, you can do >> >> DROP FUNCTION Name ( Argument data types ) >> >> ...and it will work. Maybe you will say that no one is doing this via >> a script (which I wouldn't bet on, but it's possible) but I'm sure >> people are doing it via cut and paste, because I have done exactly >> this thing. Any of the various proposals for hacking up Argument data >> types will make this no longer true, and somebody will get confused. >> I think you should bite the bullet and add a "type" column (f for >> regular function and w for window? could there be others in the >> future?). > > Are we doing anything for this for 8.4? I prefer adding column of type 'w'|'f' to attaching OVER keyword in argument column, because type column approach is more general if it may refer to not only window functions but also setof, trigger, and those coming in the future (hypothetical set function maybe?). It seems to me that the OVER keyword is not necessary to let him know he needs it. Regards, -- Hitoshi Harada