Thread: More frame options in window functions
Attached is the fix pointed out in the previous CommitFest plus RANGE offset support. *fix - move window regression test to another parallel group, but regarding the limitation of 20 per group the union test goes to the group the window test belonged to. - allow NULL iswindowagg as an argument of AggGetMemoryContext - change view name to longer one *RANGE offset - allow all of RANGE BETWEEN <value> PRECEDING/FOLLOWING AND <value> PRECEDING/FOLLOWING for any data types that support ORDER BY and additions/subtractions, which is extended design to the spec. The spec says data types allowed in RANGE offset are only numeric and temporal ones but we don't have such limitation. - add "+"/"-" operator search code in parsing, which is used to calculate frame bound, but I'm not sure if this is right approach. - add more regression tests Regards, -- Hitoshi Harada
Attachment
2009/12/31 Hitoshi Harada <umi.tanuki@gmail.com>: > Attached is the fix pointed out in the previous CommitFest plus RANGE > offset support. Improved version attached. In this revision I fixed type mismatch case like "ORDER BY int4_data RANGE BETWEEN int8_data PRECEDING ...". Update of comments and fix typos in documents are also included. Regards, -- Hitoshi Harada
Attachment
2010/1/5 Hitoshi Harada <umi.tanuki@gmail.com>: > 2009/12/31 Hitoshi Harada <umi.tanuki@gmail.com>: >> Attached is the fix pointed out in the previous CommitFest plus RANGE >> offset support. > > Improved version attached. In this revision I fixed type mismatch case > like "ORDER BY int4_data RANGE BETWEEN int8_data PRECEDING ...". > > Update of comments and fix typos in documents are also included. Fix some trivial things and synced with HEAD. I've came up with using "upper" and "lower" instead of "start" and "end" for window frame bounds. The upper/lower is more beautiful since two have same length but start/end is used since it was introduced. Comments? Regards, -- Hitoshi Harada
Attachment
> Thanks for the review. I've found another crash today and attached is > fixed version. The case is: > > SELECT four, sum(ten) over (PARTITION BY four ORDER BY four RANGE 1 > PRECEDING) FROM tenk1 WHERE unique1 < 10; > Hi, The patch (more_frame_options.20100115.patch.gz) applies cleanly, but the regression test gives: *** /var/data1/pg_stuff/pg_sandbox/pgsql.rows_frame_types/src/test/regress/expected/window.out 2010-01-15 22:36:01.000000000 +0100 --- /var/data1/pg_stuff/pg_sandbox/pgsql.rows_frame_types/src/test/regress/results/window.out 2010-01-15 22:37:01.000000000 +0100 *************** *** 934,953 **** SELECT four, ten, sum(ten) over (partition by four order by four range 1 preceding) FROM tenk1 WHERE unique1 < 10; ! four | ten | sum ! ------+-----+----- ! 0 | 0 | 12 ! 0 | 8 | 12 ! 0 | 4 | 12 ! 1 | 5 | 15 ! 1 | 9 | 15 ! 1 | 1 | 15 ! 2 | 6 | 8 ! 2 | 2 | 8 ! 3 | 3 | 10 ! 3 | 7 | 10 ! (10 rows) ! CREATE VIEW v_window AS SELECT i, sum(i) over (order by i rows between 1 preceding and 1 following) as sum_rows, sum(i) over (order by i / 3 range between 1 preceding and 1 following) as sum_range --- 934,940 ---- SELECT four, ten, sum(ten) over (partition by four order by four range 1 preceding) FROM tenk1 WHERE unique1 < 10; ! ERROR: cannot extract system attribute from minimal tuple CREATE VIEW v_window AS SELECT i, sum(i) over (order byi rows between 1 preceding and 1 following) as sum_rows, sum(i) over (order by i / 3 range between 1 preceding and1 following) as sum_range ====================================================================== regards, Erik Rijkers
2010/1/16 Erik Rijkers <er@xs4all.nl>: > >> Thanks for the review. I've found another crash today and attached is >> fixed version. The case is: >> >> SELECT four, sum(ten) over (PARTITION BY four ORDER BY four RANGE 1 >> PRECEDING) FROM tenk1 WHERE unique1 < 10; >> > > Hi, > > The patch (more_frame_options.20100115.patch.gz) applies cleanly, but the regression test gives: It doesn't happen here. Could you send more information like configure options and EXPLAIN output of that query? Regards, -- Hitoshi Harada
2010/1/16 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/1/16 Erik Rijkers <er@xs4all.nl>: >> >>> Thanks for the review. I've found another crash today and attached is >>> fixed version. The case is: >>> >>> SELECT four, sum(ten) over (PARTITION BY four ORDER BY four RANGE 1 >>> PRECEDING) FROM tenk1 WHERE unique1 < 10; >>> >> >> Hi, >> >> The patch (more_frame_options.20100115.patch.gz) applies cleanly, but the regression test gives: on fedora 32 bit is all ok. I'll check it on 64bit fedora on Monday. Regards Pavel > > It doesn't happen here. Could you send more information like configure > options and EXPLAIN output of that query? > > Regards, > > > -- > Hitoshi Harada > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Sat, January 16, 2010 09:29, Hitoshi Harada wrote: > 2010/1/16 Erik Rijkers <er@xs4all.nl>: >> >>> Thanks for the review. I've found another crash today and attached is >>> fixed version. The case is: >>> >>> SELECT four, sum(ten) over (PARTITION BY four ORDER BY four RANGE 1 >>> PRECEDING) FROM tenk1 WHERE unique1 < 10; >>> >> >> Hi, >> >> The patch (more_frame_options.20100115.patch.gz) applies cleanly, but the regression test gives: > > It doesn't happen here. Could you send more information like configure > options and EXPLAIN output of that query? > Sorry, I should have done that the first time. Here is more info: Linux 2.6.18-164.el5 x86_64 / CentOS release 5.4 (Final) Source is CVS as of today (which without the patch compiles and runs the regression test OK). With the patch, compiled + installed without 'make check', pg_config gives: CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.rows_frame_types' '--with-pgport=6547' CC = gcc CPPFLAGS = -D_GNU_SOURCE CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv CFLAGS_SL = -fpic LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.rows_frame_types/lib' LDFLAGS_SL = LIBS = -lpgport -lz -lreadline -ltermcap -lcrypt -ldl -lm VERSION = PostgreSQL 8.5devel-rows_frame_types psql -f $pgsql_regress_sql/create_table.sql; cat $pgsql_regress_data/tenk.data | psql -c "copy tenk1 from stdin csv delimiter E'\t'"; explain SELECT four, ten, sum(ten) over (partition by four order by four range 1 preceding) FROM tenk1 WHERE unique1 < 10; QUERY PLAN --------------------------------------------------------------------WindowAgg (cost=483.17..483.37 rows=10 width=8) -> Sort (cost=483.17..483.19 rows=10 width=8) Sort Key: four -> Seq Scan on tenk1 (cost=0.00..483.00 rows=10width=8) Filter: (unique1 < 10) (5 rows) explain analyze SELECT four, ten, sum(ten) over (partition by four order by four range 1 preceding) FROM tenk1 WHERE unique1< 10; ERROR: cannot extract system attribute from minimal tuple hth, Erik Rijkers
2010/1/17 Erik Rijkers <er@xs4all.nl>: > On Sat, January 16, 2010 09:29, Hitoshi Harada wrote: >> 2010/1/16 Erik Rijkers <er@xs4all.nl>: >>> >>>> Thanks for the review. I've found another crash today and attached is >>>> fixed version. The case is: >>>> >>>> SELECT four, sum(ten) over (PARTITION BY four ORDER BY four RANGE 1 >>>> PRECEDING) FROM tenk1 WHERE unique1 < 10; >>>> >>> >>> Hi, >>> >>> The patch (more_frame_options.20100115.patch.gz) applies cleanly, but the regression test gives: >> >> It doesn't happen here. Could you send more information like configure >> options and EXPLAIN output of that query? >> > > Sorry, I should have done that the first time. Here is more info: > > Linux 2.6.18-164.el5 x86_64 / CentOS release 5.4 (Final) Thanks, I confirmed it on my 64bit environment. My approach to solve this was completely wrong. The problem here is that when PARTITION BY clause equals to ORDER BY clause window_pathkeys is canonicalized in make_pathkeys_for_window() and so get_column_info_for_window() recognizes number of ORDER BY columns as 0, in other word, window->ordNumCols = 0 && window->ordColIdx[0] is invalid. This behavior is ok in 8.4 because in that case all rows are peer to each other and the partition = the frame in RANGE mode. This assumption is now broken since FRAMEOPTION_START_OFFSET and FRAMEOPTION_END_OFFSET are introduced, which means that the current row can be out of the frame. So with these options we must always evaluate if the current row is inside the frame or not by *sort key column*. However, we don't have it in the executor as the planner has removed it during canonicalization of pathkeys. So I previously added such code: *** 2819,2825 **** get_column_info_for_window(PlannerInfo *root, WindowClause *wc, List *tlist, int numPart = list_length(wc->partitionClause); int numOrder= list_length(wc->orderClause); ! if (numSortCols == numPart + numOrder) { /* easy case */ *partNumCols = numPart; --- 2836,2847 ---- int numPart = list_length(wc->partitionClause); int numOrder = list_length(wc->orderClause); ! /* ! * in RANGE offset mode, numOrder should be represented as-is. ! */ ! if (numSortCols == numPart + numOrder || ! (wc->frameOptions & FRAMEOPTION_RANGE && ! wc->frameOptions & (FRAMEOPTION_START_VALUE | FRAMEOPTION_END_VALUE))) { /* easy case */ *partNumCols= numPart; but it failed now, since the "sortColIdx" passed in has been canonicalized already. And I tried to change not to canonicalize the pathkeys in make_pathkeys_window() in such cases and succeeded then passed all regression tests. diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 6ba051d..fee1302 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1417,11 +1417,17 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) int ordNumCols; AttrNumber *ordColIdx; Oid *ordOperators; + bool canonicalize; + + canonicalize = !(wc->frameOptions & FRAMEOPTION_RANGE && + wc->frameOptions & + (FRAMEOPTION_START_VALUE | + FRAMEOPTION_END_VALUE)); window_pathkeys = make_pathkeys_for_window(root, wc, tlist, - true); + canonicalize); /* * This is a bit tricky: we build a sort node even if we don't I am not very sure if this is the correct answer. Thoughts? Regards, -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > ... I tried to change not to canonicalize the > pathkeys in make_pathkeys_window() in such cases and succeeded then > passed all regression tests. That's broken, whether it passes regression tests or not. Not canonicalizing will mean that you fail to recognize equality to canonicalized pathkeys, and thus for example execute unnecessary sorts. I haven't looked at the patch, but it sounds a bit like you are trying to put logic into the executor that needs to be in the planner. If the executor is guessing about what the planner did, that's a design failure. The planner should figure out what needs to happen and tell the executor exactly what to do, eg, which columns need to be compared to determine partition membership. regards, tom lane
2010/1/17 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> ... I tried to change not to canonicalize the >> pathkeys in make_pathkeys_window() in such cases and succeeded then >> passed all regression tests. > > That's broken, whether it passes regression tests or not. Not > canonicalizing will mean that you fail to recognize equality to > canonicalized pathkeys, and thus for example execute unnecessary > sorts. So why did you leave "canonicalize" argument in make_pathkeys_for_window()? I thought you'd thought it would be needed false in the future. > I haven't looked at the patch, but it sounds a bit like you are trying > to put logic into the executor that needs to be in the planner. If the > executor is guessing about what the planner did, that's a design > failure. The planner should figure out what needs to happen and tell > the executor exactly what to do, eg, which columns need to be compared > to determine partition membership. I'd think I'm putting logic into the planner that needs to be in the executor. Anyways, RANGE offset mode is quite different from existing framework. In a RANGE offset mode query, for example: SELECT sum(ten) over (PARTITION BY four ORDER BY four RANGE BETWEEN 2 PRECEDING AND 1 PRECEDING) FROM tenk1 the frame is determined as "from the first row which has <four> value - 2 to the last row which has <four> value - 1" and executor should know <four> value *is* the sort column even if the column is not actually significant. But the planner removes that information. It seems to me that what you say above is to put logic into the planner how to determine the frame, but I'd think the planner should leave that information about sort column for the executor, of course in another way than non-canonicalizing approach. I'll try it that way. If I'm missing something still, please point it out. Regards, -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2010/1/17 Tom Lane <tgl@sss.pgh.pa.us>: >> That's broken, whether it passes regression tests or not. �Not >> canonicalizing will mean that you fail to recognize equality to >> canonicalized pathkeys, and thus for example execute unnecessary >> sorts. > So why did you leave "canonicalize" argument in > make_pathkeys_for_window()? I thought you'd thought it would be needed > false in the future. No, it has to be false in calls made before query_planner runs and true afterwards. There's no flexibility there. > In a RANGE offset mode query, for example: > SELECT sum(ten) over (PARTITION BY four ORDER BY four RANGE BETWEEN 2 > PRECEDING AND 1 PRECEDING) FROM tenk1 > the frame is determined as "from the first row which has <four> value > - 2 to the last row which has <four> value - 1" and executor should > know <four> value *is* the sort column even if the column is not > actually significant. But the planner removes that information. Maybe we're just talking past each other. My point is that the planner should record the fact that four is the sort column someplace where the executor can find it easily. AFAICS that doesn't mean it can't be the canonicalized form of the sort key. If a column is dropped out of the canonical sort key then it's simply redundant, and hence not relevant to determining the range. regards, tom lane
2010/1/19 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> In a RANGE offset mode query, for example: > >> SELECT sum(ten) over (PARTITION BY four ORDER BY four RANGE BETWEEN 2 >> PRECEDING AND 1 PRECEDING) FROM tenk1 > >> the frame is determined as "from the first row which has <four> value >> - 2 to the last row which has <four> value - 1" and executor should >> know <four> value *is* the sort column even if the column is not >> actually significant. But the planner removes that information. > > Maybe we're just talking past each other. My point is that the planner > should record the fact that four is the sort column someplace where the > executor can find it easily. AFAICS that doesn't mean it can't be the > canonicalized form of the sort key. If a column is dropped out of the > canonical sort key then it's simply redundant, and hence not relevant to > determining the range. Yeah, that's my point, too. The planner has to distinguish "four" from sort pathkeys and to teach the executor the simple information which column should be used to determine frame. I was bit wrong because some of current executor code isn't like it, like using ordNumCols == 0 to know whether partition equals to frame, though.... Regards, -- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2010/1/19 Tom Lane <tgl@sss.pgh.pa.us>: >> AFAICS that doesn't mean it can't be the >> canonicalized form of the sort key. �If a column is dropped out of the >> canonical sort key then it's simply redundant, and hence not relevant to >> determining the range. > Yeah, that's my point, too. The planner has to distinguish "four" from > sort pathkeys and to teach the executor the simple information which > column should be used to determine frame. I was bit wrong because some > of current executor code isn't like it, like using ordNumCols == 0 to > know whether partition equals to frame, though.... BTW, watch out for the possibility that the canonicalized key is empty. This isn't an error case --- what it means is that the planner has proven that all the rows have equal sort key values, so there's no need to compare anything. regards, tom lane
2010/1/19 Hitoshi Harada <umi.tanuki@gmail.com>: > Yeah, that's my point, too. The planner has to distinguish "four" from > sort pathkeys and to teach the executor the simple information which > column should be used to determine frame. I was bit wrong because some > of current executor code isn't like it, like using ordNumCols == 0 to > know whether partition equals to frame, though.... And here's another version to fix this problem (I hope). Now the planner distinguish sort column from actual significant pathkeys. I tested it on both of 32bit and 64bit Linux. Regards, -- Hitoshi Harada
Attachment
2010/1/19 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/1/19 Hitoshi Harada <umi.tanuki@gmail.com>: >> Yeah, that's my point, too. The planner has to distinguish "four" from >> sort pathkeys and to teach the executor the simple information which >> column should be used to determine frame. I was bit wrong because some >> of current executor code isn't like it, like using ordNumCols == 0 to >> know whether partition equals to frame, though.... > > And here's another version to fix this problem (I hope). Now the > planner distinguish sort column from actual significant pathkeys. I > tested it on both of 32bit and 64bit Linux. > I tested it, and reported problems are fixed Thank you Pavel > Regards, > > > -- > Hitoshi Harada > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
On Tue, Jan 19, 2010 at 3:02 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/1/19 Hitoshi Harada <umi.tanuki@gmail.com>: >> Yeah, that's my point, too. The planner has to distinguish "four" from >> sort pathkeys and to teach the executor the simple information which >> column should be used to determine frame. I was bit wrong because some >> of current executor code isn't like it, like using ordNumCols == 0 to >> know whether partition equals to frame, though.... > > And here's another version to fix this problem (I hope). Now the > planner distinguish sort column from actual significant pathkeys. I > tested it on both of 32bit and 64bit Linux. Would it make sense to pull some of the infrastructure bits out of this patch and commit those bits separately, so as to reduce the size of the main patch? In particular, the AggGetMemoryContext() stuff looks like a good candidate for that treatment. Why did you change BETWEEN from a TYPE_FUNC_NAME_KEYWORD to a COL_NAME_KEYWORD? ...Robert
2010/1/23 Robert Haas <robertmhaas@gmail.com>: > On Tue, Jan 19, 2010 at 3:02 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> 2010/1/19 Hitoshi Harada <umi.tanuki@gmail.com>: >>> Yeah, that's my point, too. The planner has to distinguish "four" from >>> sort pathkeys and to teach the executor the simple information which >>> column should be used to determine frame. I was bit wrong because some >>> of current executor code isn't like it, like using ordNumCols == 0 to >>> know whether partition equals to frame, though.... >> >> And here's another version to fix this problem (I hope). Now the >> planner distinguish sort column from actual significant pathkeys. I >> tested it on both of 32bit and 64bit Linux. > > Would it make sense to pull some of the infrastructure bits out of > this patch and commit those bits separately, so as to reduce the size > of the main patch? In particular, the AggGetMemoryContext() stuff > looks like a good candidate for that treatment. Fair enough. Attached contains that part only. I'll search more parts like what you suggest, but there seems to be few parts because for example the change of parser affects all the road to the executor. > Why did you change BETWEEN from a TYPE_FUNC_NAME_KEYWORD to a COL_NAME_KEYWORD? Introducing new frame option syntax, shift/reduce conflict came happened. http://archives.postgresql.org/message-id/e08cc0400911240908s7efaea85wc8505d228220b980@mail.gmail.com http://archives.postgresql.org/message-id/6363.1229890896@sss.pgh.pa.us Tom suggested it as RESERVED_KEYWORD a year ago, but COL_NAME_KEYWORD works as well which is slightly more weak (ie more chances that you can use the word) than RESERVED_KEYWORD. I'm not completely sure which is better, though. Regards, -- Hitoshi Harada
Attachment
Hitoshi Harada <umi.tanuki@gmail.com> writes: > 2010/1/23 Robert Haas <robertmhaas@gmail.com>: >> Would it make sense to pull some of the infrastructure bits out of >> this patch and commit those bits separately, so as to reduce the size >> of the main patch? �In particular, the AggGetMemoryContext() stuff >> looks like a good candidate for that treatment. > Fair enough. Attached contains that part only. I started looking at this patch. I believe that we should commit the AggGetMemoryContext API function --- *not* the window context management changes that you included here, but only the API abstraction --- to be sure that that gets into 9.0 so that third-party aggregate functions can start relying on it instead of looking directly at the AggState or WindowAggState node. The rest of the patch might or might not make it, but we can at least help people future-proof their code. I'm fairly desperately unhappy with the RANGE PRECEDING/FOLLOWING parts of the patch. We have expended a great deal of sweat over the years to avoid hard-wiring assumptions about particular operator names into the code, but this patch is blithely ignoring that history and assuming that "+" and "-" will do the right thing. Also LookupOperName is probably not the right thing, since it insists on exact or binary-compatible match. To the extent that there is any justification at all for assuming that "+"/"-" are the right operators, it is that the spec speaks in terms of the range bounds being VSK+V2F etc --- but if someone were to actually write out such an expression, the parser would allow for implicit casts to happen, so this code is not implementing what that expression would produce. Plus the results are dependent on the current search path, which for example means it'll fail if the window sort column is a user-defined type whose operators happen to be out of path at the moment --- even though we would have found its default sort opclass despite that. And lastly, I'm totally unconvinced that it's a good idea to accept an operator that returns a type other than the type of the window sort column. That seems to eliminate whatever little protection we had against picking up an unsuitable operator; and it complicates the code as well. Given the lack of time remaining in this CF, I'm tempted to propose ripping out the RANGE support and just trying to get ROWS committed. That should be substantially less controversial from a semantic standpoint, and it still seems like a considerable improvement in functionality. Thoughts? regards, tom lane
On 2/8/10 11:17 AM, Tom Lane wrote: > Given the lack of time remaining in this CF, I'm tempted to propose > ripping out the RANGE support and just trying to get ROWS committed. > That should be substantially less controversial from a semantic > standpoint, and it still seems like a considerable improvement in > functionality. +1 --Josh Berkus
I wrote: > I started looking at this patch. I believe that we should commit the > AggGetMemoryContext API function --- *not* the window context > management changes that you included here, but only the API abstraction > --- to be sure that that gets into 9.0 so that third-party aggregate > functions can start relying on it instead of looking directly at the > AggState or WindowAggState node. The rest of the patch might or might > not make it, but we can at least help people future-proof their code. I have committed that little part. I revised the function API to be /* AggCheckCallContext can return one of the following codes, or 0: */ #define AGG_CONTEXT_AGGREGATE 1 /* regular aggregate */ #define AGG_CONTEXT_WINDOW 2 /* window function */ extern int AggCheckCallContext(FunctionCallInfo fcinfo, MemoryContext *aggcontext); so that it would be conveniently usable in places that just want to check aggregate-ness and don't need to fetch a memory context; and with the thought that maybe someday there would be more than two possible call contexts. regards, tom lane
2010/2/9 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> 2010/1/23 Robert Haas <robertmhaas@gmail.com>: >>> Would it make sense to pull some of the infrastructure bits out of >>> this patch and commit those bits separately, so as to reduce the size >>> of the main patch? In particular, the AggGetMemoryContext() stuff >>> looks like a good candidate for that treatment. > >> Fair enough. Attached contains that part only. > > I started looking at this patch. I believe that we should commit the > AggGetMemoryContext API function --- *not* the window context > management changes that you included here, but only the API abstraction > --- to be sure that that gets into 9.0 so that third-party aggregate > functions can start relying on it instead of looking directly at the > AggState or WindowAggState node. The rest of the patch might or might > not make it, but we can at least help people future-proof their code. > > I'm fairly desperately unhappy with the RANGE PRECEDING/FOLLOWING parts > of the patch. We have expended a great deal of sweat over the years > to avoid hard-wiring assumptions about particular operator names into > the code, but this patch is blithely ignoring that history and assuming > that "+" and "-" will do the right thing. Also LookupOperName is > probably not the right thing, since it insists on exact or > binary-compatible match. To the extent that there is any justification > at all for assuming that "+"/"-" are the right operators, it is that the > spec speaks in terms of the range bounds being VSK+V2F etc --- but if > someone were to actually write out such an expression, the parser would > allow for implicit casts to happen, so this code is not implementing > what that expression would produce. Plus the results are dependent on > the current search path, which for example means it'll fail if the > window sort column is a user-defined type whose operators happen to be > out of path at the moment --- even though we would have found its > default sort opclass despite that. And lastly, I'm totally unconvinced > that it's a good idea to accept an operator that returns a type other > than the type of the window sort column. That seems to eliminate > whatever little protection we had against picking up an unsuitable > operator; and it complicates the code as well. I know "+"/"-" part is an issue. But as far as I know there's been no infrastructure to handle such situation. My ideal plan is to introduce some mechanism to make "+"/"-" operation abstract enough such like sort opclass/opfamily, although I wasn't sure if that is to be introduced for this (ie RANGE frame) purpose only. Now that specialized hard-coding "+"/"-" in source seems unacceptable, I would like to hear how to handle this. Is there any better than introducing new mechanism such like opclass? > Given the lack of time remaining in this CF, I'm tempted to propose > ripping out the RANGE support and just trying to get ROWS committed. > That should be substantially less controversial from a semantic > standpoint, and it still seems like a considerable improvement in > functionality. > > Thoughts? As expected. I don't mind splitting patch to be committable if users who expected this feature don't mind. Regards, -- Hitoshi Harada
On Mon, Feb 8, 2010 at 10:37 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > 2010/2/9 Tom Lane <tgl@sss.pgh.pa.us>: >> Hitoshi Harada <umi.tanuki@gmail.com> writes: >>> 2010/1/23 Robert Haas <robertmhaas@gmail.com>: >>>> Would it make sense to pull some of the infrastructure bits out of >>>> this patch and commit those bits separately, so as to reduce the size >>>> of the main patch? In particular, the AggGetMemoryContext() stuff >>>> looks like a good candidate for that treatment. >> >>> Fair enough. Attached contains that part only. >> >> I started looking at this patch. I believe that we should commit the >> AggGetMemoryContext API function --- *not* the window context >> management changes that you included here, but only the API abstraction >> --- to be sure that that gets into 9.0 so that third-party aggregate >> functions can start relying on it instead of looking directly at the >> AggState or WindowAggState node. The rest of the patch might or might >> not make it, but we can at least help people future-proof their code. >> >> I'm fairly desperately unhappy with the RANGE PRECEDING/FOLLOWING parts >> of the patch. We have expended a great deal of sweat over the years >> to avoid hard-wiring assumptions about particular operator names into >> the code, but this patch is blithely ignoring that history and assuming >> that "+" and "-" will do the right thing. Also LookupOperName is >> probably not the right thing, since it insists on exact or >> binary-compatible match. To the extent that there is any justification >> at all for assuming that "+"/"-" are the right operators, it is that the >> spec speaks in terms of the range bounds being VSK+V2F etc --- but if >> someone were to actually write out such an expression, the parser would >> allow for implicit casts to happen, so this code is not implementing >> what that expression would produce. Plus the results are dependent on >> the current search path, which for example means it'll fail if the >> window sort column is a user-defined type whose operators happen to be >> out of path at the moment --- even though we would have found its >> default sort opclass despite that. And lastly, I'm totally unconvinced >> that it's a good idea to accept an operator that returns a type other >> than the type of the window sort column. That seems to eliminate >> whatever little protection we had against picking up an unsuitable >> operator; and it complicates the code as well. > > I know "+"/"-" part is an issue. But as far as I know there's been no > infrastructure to handle such situation. My ideal plan is to introduce > some mechanism to make "+"/"-" operation abstract enough such like > sort opclass/opfamily, although I wasn't sure if that is to be > introduced for this (ie RANGE frame) purpose only. > > Now that specialized hard-coding "+"/"-" in source seems unacceptable, > I would like to hear how to handle this. Is there any better than > introducing new mechanism such like opclass? > >> Given the lack of time remaining in this CF, I'm tempted to propose >> ripping out the RANGE support and just trying to get ROWS committed. >> That should be substantially less controversial from a semantic >> standpoint, and it still seems like a considerable improvement in >> functionality. >> >> Thoughts? > > As expected. I don't mind splitting patch to be committable if users > who expected this feature don't mind. Well, they'll likely be happier with a partial feature than no feature at all... I agree with Tom that there's no time time now to resolve the issue of how + and - should be handled. ...Robert
On Tue, Feb 09, 2010 at 12:37:32PM +0900, Hitoshi Harada wrote: > I know "+"/"-" part is an issue. But as far as I know there's been no > infrastructure to handle such situation. My ideal plan is to introduce > some mechanism to make "+"/"-" operation abstract enough such like > sort opclass/opfamily, although I wasn't sure if that is to be > introduced for this (ie RANGE frame) purpose only. > > Now that specialized hard-coding "+"/"-" in source seems unacceptable, > I would like to hear how to handle this. Is there any better than > introducing new mechanism such like opclass? I imagine it would be easiest to add these operators to the existing opclass infrastructure. Although it didn't start that way, the opclass structure is being more and more used to declare the semantics of a type. Btree operator classes are for *ordered* types, which seems to be the only case you can expect to work here. Currently the btree operator classes have only one support function, so it would seem than the easiest approach would be to declare two new support functions for add() and subtract(). A difficulty may be that the arguments to these functions are not obvious as in the (timestamp,interval) case, but if there is only one suitable possibility I think this can be made to work. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Tue, Feb 09, 2010 at 12:37:32PM +0900, Hitoshi Harada wrote: >> Now that specialized hard-coding "+"/"-" in source seems unacceptable, >> I would like to hear how to handle this. Is there any better than >> introducing new mechanism such like opclass? > I imagine it would be easiest to add these operators to the existing > opclass infrastructure. Although it didn't start that way, the opclass > structure is being more and more used to declare the semantics of a > type. Btree operator classes are for *ordered* types, which seems to be > the only case you can expect to work here. > Currently the btree operator classes have only one support function, so > it would seem than the easiest approach would be to declare two new > support functions for add() and subtract(). Yeah, that was pretty much the same line of thought I had. The main disadvantage seems to be two more catalog lookups per index to fill in support function entries that won't ever be used (at least not by the index machinery). However, I think we cache that stuff already inside relcache.c, so it might be insignificant. The real question is whether it's time to bite the bullet and stop overloading the opclass infrastructure for semantics-declaration purposes. Are there any other foreseeable cases where we are going to need to add operator knowledge of this sort? regards, tom lane
On Thu, Feb 11, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: >> On Tue, Feb 09, 2010 at 12:37:32PM +0900, Hitoshi Harada wrote: >>> Now that specialized hard-coding "+"/"-" in source seems unacceptable, >>> I would like to hear how to handle this. Is there any better than >>> introducing new mechanism such like opclass? > >> I imagine it would be easiest to add these operators to the existing >> opclass infrastructure. Although it didn't start that way, the opclass >> structure is being more and more used to declare the semantics of a >> type. Btree operator classes are for *ordered* types, which seems to be >> the only case you can expect to work here. > >> Currently the btree operator classes have only one support function, so >> it would seem than the easiest approach would be to declare two new >> support functions for add() and subtract(). > > Yeah, that was pretty much the same line of thought I had. The main > disadvantage seems to be two more catalog lookups per index to fill in > support function entries that won't ever be used (at least not by the > index machinery). However, I think we cache that stuff already inside > relcache.c, so it might be insignificant. > > The real question is whether it's time to bite the bullet and stop > overloading the opclass infrastructure for semantics-declaration > purposes. Are there any other foreseeable cases where we are going > to need to add operator knowledge of this sort? knngist wants to jimmy with the opclass semantics too, though the need there is a little different. The issue is that an amcanorder AM is good for optimizing ORDER BY <indexed-column-name>, but that's not what GIST indices can do: they can optimize ORDER BY <indexed-column-name> <op> <constant>, for suitable values of op. Teodor and Oleg handled this by adding a new flag to the AM (amcanorderbyop) and adding the operators to the opclass. But, unlike the comparison operators, these operators don't necessarily return a boolean: in fact they probably don't. It would be nice to come up with a general solution to this problem. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 11, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The real question is whether it's time to bite the bullet and stop >> overloading the opclass infrastructure for semantics-declaration >> purposes. �Are there any other foreseeable cases where we are going >> to need to add operator knowledge of this sort? > knngist wants to jimmy with the opclass semantics too, though the need > there is a little different. The issue is that an amcanorder AM is > good for optimizing ORDER BY <indexed-column-name>, but that's not > what GIST indices can do: they can optimize ORDER BY > <indexed-column-name> <op> <constant>, for suitable values of op. > Teodor and Oleg handled this by adding a new flag to the AM > (amcanorderbyop) and adding the operators to the opclass. But, unlike > the comparison operators, these operators don't necessarily return a > boolean: in fact they probably don't. Yeah, but in that case it is reasonable to have the information in opclasses, because it's directly tied to something you do with indexes, and it does depend on which opclass the index uses. I don't like the specific details of what they did, but associating knngist info with opclasses is sane in the abstract. What's bothering me about the window frame stuff is that it's not at all associated with an index. However, what it *is* associated with is a sort ordering, and the notion that btree opclasses are what define orderings is sufficiently deeply wired into the system that undoing it would be a huge PITA. So unless we can see a pretty clear future need for more information in this category, I'm not really inclined to invent some new structure altogether. I'm just wondering if anyone does see that... regards, tom lane
On Thu, 11 Feb 2010, Robert Haas wrote: > On Thu, Feb 11, 2010 at 3:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Martijn van Oosterhout <kleptog@svana.org> writes: >>> On Tue, Feb 09, 2010 at 12:37:32PM +0900, Hitoshi Harada wrote: >>>> Now that specialized hard-coding "+"/"-" in source seems unacceptable, >>>> I would like to hear how to handle this. Is there any better than >>>> introducing new mechanism such like opclass? >> >>> I imagine it would be easiest to add these operators to the existing >>> opclass infrastructure. Although it didn't start that way, the opclass >>> structure is being more and more used to declare the semantics of a >>> type. Btree operator classes are for *ordered* types, which seems to be >>> the only case you can expect to work here. >> >>> Currently the btree operator classes have only one support function, so >>> it would seem than the easiest approach would be to declare two new >>> support functions for add() and subtract(). >> >> Yeah, that was pretty much the same line of thought I had. The main >> disadvantage seems to be two more catalog lookups per index to fill in >> support function entries that won't ever be used (at least not by the >> index machinery). However, I think we cache that stuff already inside >> relcache.c, so it might be insignificant. >> >> The real question is whether it's time to bite the bullet and stop >> overloading the opclass infrastructure for semantics-declaration >> purposes. Are there any other foreseeable cases where we are going >> to need to add operator knowledge of this sort? > > knngist wants to jimmy with the opclass semantics too, though the need > there is a little different. The issue is that an amcanorder AM is > good for optimizing ORDER BY <indexed-column-name>, but that's not > what GIST indices can do: they can optimize ORDER BY > <indexed-column-name> <op> <constant>, for suitable values of op. > Teodor and Oleg handled this by adding a new flag to the AM > (amcanorderbyop) and adding the operators to the opclass. But, unlike > the comparison operators, these operators don't necessarily return a > boolean: in fact they probably don't. see http://archives.postgresql.org/pgsql-hackers/2009-11/msg01809.php in knngist we use distance semantic, distance < 0 means FALSE, so we compatible with old GIST. > > It would be nice to come up with a general solution to this problem. yeah Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 8, 2010 at 10:37 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >> 2010/2/9 Tom Lane <tgl@sss.pgh.pa.us>: >>> Given the lack of time remaining in this CF, I'm tempted to propose >>> ripping out the RANGE support and just trying to get ROWS committed. >>> That should be substantially less controversial from a semantic >>> standpoint, and it still seems like a considerable improvement in >>> functionality. >> >> As expected. I don't mind splitting patch to be committable if users >> who expected this feature don't mind. > Well, they'll likely be happier with a partial feature than no feature > at all... I agree with Tom that there's no time time now to resolve > the issue of how + and - should be handled. I've done that and am reviewing the rest of the patch, but I had more trouble than I expected with phrasing the "not implemented" message. Usually we try to word these things like "SQLCOMMAND is not implemented" but there's no one-word version of what it is that's been left out. "RANGE" isn't right since there are variants of RANGE that work. What I have at the moment is if (n->frameOptions & (FRAMEOPTION_START_VALUE_PRECEDING | FRAMEOPTION_END_VALUE_PRECEDING)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("RANGE value PRECEDING is not implemented yet"), parser_errposition(@1))); if (n->frameOptions & (FRAMEOPTION_START_VALUE_FOLLOWING | FRAMEOPTION_END_VALUE_FOLLOWING)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("RANGE value FOLLOWING is not implemented yet"), parser_errposition(@1))); but I wonder if anyone has a better idea. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > However, what it *is* associated with is a sort ordering, and the notion > that btree opclasses are what define orderings is sufficiently deeply > wired into the system that undoing it would be a huge PITA. So unless > we can see a pretty clear future need for more information in this > category, I'm not really inclined to invent some new structure > altogether. I'm just wondering if anyone does see that... I think there's the associativity property of operators that we might want to have someday, in order for the planner to know some more about joins on A = B then on B = C, or replace with < if you will. Regards, -- dim
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> However, what it *is* associated with is a sort ordering, and the notion >> that btree opclasses are what define orderings is sufficiently deeply >> wired into the system that undoing it would be a huge PITA. So unless >> we can see a pretty clear future need for more information in this >> category, I'm not really inclined to invent some new structure >> altogether. I'm just wondering if anyone does see that... > I think there's the associativity property of operators that we might > want to have someday, in order for the planner to know some more about > joins on A = B then on B = C, or replace with < if you will. We already do know about that, at least in the case of =. The reason it doesn't do transitive < deductions is not lack of information but doubt that it's worth the cycles to try. regards, tom lane
On Thu, Feb 11, 2010 at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 8, 2010 at 10:37 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: >>> 2010/2/9 Tom Lane <tgl@sss.pgh.pa.us>: >>>> Given the lack of time remaining in this CF, I'm tempted to propose >>>> ripping out the RANGE support and just trying to get ROWS committed. >>>> That should be substantially less controversial from a semantic >>>> standpoint, and it still seems like a considerable improvement in >>>> functionality. >>> >>> As expected. I don't mind splitting patch to be committable if users >>> who expected this feature don't mind. > >> Well, they'll likely be happier with a partial feature than no feature >> at all... I agree with Tom that there's no time time now to resolve >> the issue of how + and - should be handled. > > I've done that and am reviewing the rest of the patch, but I had more > trouble than I expected with phrasing the "not implemented" message. > Usually we try to word these things like "SQLCOMMAND is not implemented" > but there's no one-word version of what it is that's been left out. > "RANGE" isn't right since there are variants of RANGE that work. > What I have at the moment is > > if (n->frameOptions & (FRAMEOPTION_START_VALUE_PRECEDING | > FRAMEOPTION_END_VALUE_PRECEDING)) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("RANGE value PRECEDING is not implemented yet"), > parser_errposition(@1))); > if (n->frameOptions & (FRAMEOPTION_START_VALUE_FOLLOWING | > FRAMEOPTION_END_VALUE_FOLLOWING)) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("RANGE value FOLLOWING is not implemented yet"), > parser_errposition(@1))); > > but I wonder if anyone has a better idea. Maybe something like this? RANGE PRECEDING is only supported with UNBOUNDED ...Robert
2010/2/12 Tom Lane <tgl@sss.pgh.pa.us>: > The real question is whether it's time to bite the bullet and stop > overloading the opclass infrastructure for semantics-declaration > purposes. Are there any other foreseeable cases where we are going > to need to add operator knowledge of this sort? Table partitioning with RANGE option maybe? Regards, -- Hitoshi Harada
Tom Lane <tgl@sss.pgh.pa.us> writes: >> I think there's the associativity property of operators that we might >> want to have someday, in order for the planner to know some more about >> joins on A = B then on B = C, or replace with < if you will. > > We already do know about that, at least in the case of =. The reason it > doesn't do transitive < deductions is not lack of information but doubt > that it's worth the cycles to try. Ok. I just remember about some mails here about the problem of reordering [LEFT] JOINS when we can, but I can't remember if it's really tied to associativity or some other thing. Searching the archives ain't helping me refresh those memories. So it seems the case for an extended opclass infrastructure, or a new side one, is between thin an non-existent yet? Regards, -- dim
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Searching the archives ain't helping me refresh those memories. So it > seems the case for an extended opclass infrastructure, or a new side > one, is between thin an non-existent yet? Yeah, I don't immediately see anything that would justify going to that level of effort. Adding +/- as support functions for btree seems like the thing to do. regards, tom lane
Tom Lane escribió: > Dimitri Fontaine <dfontaine@hi-media.com> writes: > > Searching the archives ain't helping me refresh those memories. So it > > seems the case for an extended opclass infrastructure, or a new side > > one, is between thin an non-existent yet? > > Yeah, I don't immediately see anything that would justify going to > that level of effort. Adding +/- as support functions for btree > seems like the thing to do. Would it work to use a fake access method instead? If we add it to btree, will we be able to backtrack and move that to a separate catalog if we ever determine that it would have been better? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane escribi�: >> Yeah, I don't immediately see anything that would justify going to >> that level of effort. Adding +/- as support functions for btree >> seems like the thing to do. > Would it work to use a fake access method instead? Then you'd have to duplicate all the information in a btree opclass; *and* teach all the stuff that knows about btree to know about fakeam instead. Doesn't seem like there's any win there. > If we add it to > btree, will we be able to backtrack and move that to a separate catalog > if we ever determine that it would have been better? Backwards compatibility with existing user-type definitions is one big reason to *not* try to pull ORDER BY information out of btree. regards, tom lane
Hitoshi Harada <umi.tanuki@gmail.com> writes: > [ more_frame_options patch ] Committed after rather extensive revisions. I removed the RANGE value PRECEDING/FOLLOWING support as per discussion, which was probably a good thing anyway from an incremental-development standpoint; it made it easier to see what was going on in nodeWindowAgg. I'm not terribly happy with the changes you made in WinGetFuncArgInPartition and WinGetFuncArgInFrame to force the window function mark to not go past frame start in some modes. Not only is that pretty ugly, but I think it can mask bugs in window functions: it's an error for a window function to fetch a row before what it has set its mark to be, but in some cases that wouldn't be detected because of this change. I think it would be better to revert those changes and find another method of protecting fetches needed to determine the frame head. One idea is to create a separate read pointer that tracks the frame head whenever actual fetches of the frame head might be needed by update_frameheadpos. I committed it without changing that, but I think this should be revisited before trying to add the RANGE value PRECEDING/FOLLOWING options, because those will substantially expand the number of cases where that hack affects the behavior. I found one actual bug, which was indeed exposed by the submitted regression tests: SELECT sum(unique1) over (rows between 1 following and 3 following),unique1, four FROM tenk1 WHERE unique1 < 10;sum | unique1 | four -----+---------+------ 9 | 4 | 0 16 | 2 | 2 23 | 1 | 1 22 | 6 | 2 16 | 9 | 1 15 | 8 | 0 10 | 5 | 1 7 | 3 | 3 0 | 7 | 3 0 | 0 | 0 (10 rows) The last row's SUM ought to be null because there are no rows left in the frame. The cause of the bug was that update_frameheadpos was forcing frameheadpos to not go past the last partition row, but this has to be allowed so that the frame can be empty at need. regards, tom lane
2010/2/13 Tom Lane <tgl@sss.pgh.pa.us>: > Hitoshi Harada <umi.tanuki@gmail.com> writes: >> [ more_frame_options patch ] > > Committed after rather extensive revisions. Thanks a lot. > I'm not terribly happy with the changes you made in WinGetFuncArgInPartition > and WinGetFuncArgInFrame to force the window function mark to not go > past frame start in some modes. Not only is that pretty ugly, but I > think it can mask bugs in window functions: it's an error for a window > function to fetch a row before what it has set its mark to be, but in > some cases that wouldn't be detected because of this change. I think > it would be better to revert those changes and find another method of > protecting fetches needed to determine the frame head. One idea is > to create a separate read pointer that tracks the frame head whenever > actual fetches of the frame head might be needed by update_frameheadpos. > I committed it without changing that, but I think this should be > revisited before trying to add the RANGE value PRECEDING/FOLLOWING > options, because those will substantially expand the number of cases > where that hack affects the behavior. Well, you're right. In addition to this topic, I concern a little about changing row fetching in aggregate from spool_tuples() to window_gettupleslot(), for performance reason. It's needed to support extended frame options, but for original basic frame options it may get slow. Anyway, I agree to revisit and refactor to executor logic. Regards, -- Hitoshi Harada