Thread: RE: [HACKERS] views and group by (formerly: create view as selec t distinct)
RE: [HACKERS] views and group by (formerly: create view as selec t distinct)
From
Michael J Davis
Date:
This implies that the "group by" clause is not supported in views. I have created views that use the group by clause and they appear to work. I have not verified the content of the records. I would like to know more about what Jan means when he says that "group by" is not supported in views? Does it mean that the content of the results could be unexpected or are they conditions where they may work and other conditions where they don't work? More info would be greatly appreciated. Thanks, Michael -----Original Message-----From: jwieck@debis.com [SMTP:jwieck@debis.com]Sent: Monday, April 26, 1999 9:35 AMTo: rbrad@hpb50023.boi.hp.comCc: pgsql-hackers@postgreSQL.orgSubject: Re: [HACKERS] create view as select distinct (fwd) >> This is on the TODO list.>> I actually have a solution that seems to work fine, but I wanted to run it past> the backend guru's after we have finished the 6.5 beta.>> Sorry I din't get it finished before the beta started.>>-Ryan I wonder how it does! Have the following: CREATE TABLE t1 (a int4, b text); CREATE TABLE t2 (c int4, d text); CREATE VIEW v2 AS SELECT DISTINCT ONc * FROM t2; Populate them with: t1: 1 'one' 1 'ena' 2 'two' 2 'thio' 3 'three' 3 'tria' 4 'four' 4 'tessera' t2: 1 'I' 1 'eins' 2 'II' 2 'zwei' 3 'III' 3 'drei' Now you do SELECT t1.a, t1.b, v2.d FROM t1, v2 WHERE t1.a = v2.c; Does that work and produce the correct results? Note that there are more than one correct results. The DISTINCT SELECT from t2 already has. But in any case, the above SELECT should present 6 rows (all the rows of t1 from1 to 33 in english and greek) and column d must show either the roman or german number. To make it more complicated, add table t3 and populate it with more languages. Then setup CREATE VIEW v3 AS SELECT DISTINCT ON e * FROM t3; and expand the above SELECT to a join over t1, v2, v3. Finally, think about a view that is a DISTINCT SELECT over multiple tables. Now you build another view as SELECT from the first plus some other table and make the new view DISTINCT again. The same kind of problem causes that views currently cannot have ORDER BY or GROUP BY clauses. All these clausescan only appear once per query, so there is no room where the rewrite system can place multiple differentones. Implementing this requires first dramatic changes to the querytree layout and I think it needs subselectingRTE's too. Sorry - Jan -- #======================================================================## It's easier to get forgiveness for being wrongthan for being right. ## Let's break this rule - forgive me. ##======================================== jwieck@debis.com (Jan Wieck) #
Re: [HACKERS] views and group by (formerly: create view as selec
From
jwieck@debis.com (Jan Wieck)
Date:
Michael J Davis wrote: > > This implies that the "group by" clause is not supported in views. I have > created views that use the group by clause and they appear to work. I have > not verified the content of the records. I would like to know more about > what Jan means when he says that "group by" is not supported in views? Does > it mean that the content of the results could be unexpected or are they > conditions where they may work and other conditions where they don't work? > More info would be greatly appreciated. I tried to make it and it works partially. The problems arise if you have a view with a group by clause but do not select the attributes the group by clause uses: CREATE TABLE t1 (a int4, b int4); CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b; SELECT count FROM v1; SELECT count(*) FROM v1; Both selects crash the backend! If you have a view that uses GROUP BY and do a simple SELECT * from it, then it will work and return the correct results. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
jwieck@debis.com (Jan Wieck) writes: > I tried to make it and it works partially. The problems arise > if you have a view with a group by clause but do not select > the attributes the group by clause uses: > CREATE TABLE t1 (a int4, b int4); > CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b; > SELECT count FROM v1; > SELECT count(*) FROM v1; > Both selects crash the backend! Hmm, this sounds very similar to a problem I was looking at on Sunday: select sum(quantity), ID+1 from aggtest1 group by ID+1; ERROR: replace_agg_clause: variable not in target list The error message is new as of Sunday; with code older than that this will crash the backend. And, in fact, what I get from Jan's example above is: SELECT count FROM v1; ERROR: replace_agg_clause: variable not in target list In both situations, it's necessary to add variables to the target list that aren't in the list produced by the parser. We have code that does that sort of thing, but it's evidently not getting applied... regards, tom lane
Re: [HACKERS] views and group by (formerly: create view as selec
From
jwieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote: > > jwieck@debis.com (Jan Wieck) writes: > > I tried to make it and it works partially. The problems arise > > if you have a view with a group by clause but do not select > > the attributes the group by clause uses: > > > CREATE TABLE t1 (a int4, b int4); > > CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b; > > > SELECT count FROM v1; > > SELECT count(*) FROM v1; > > > Both selects crash the backend! > > Hmm, this sounds very similar to a problem I was looking at on Sunday: > > select sum(quantity), ID+1 from aggtest1 group by ID+1; > ERROR: replace_agg_clause: variable not in target list > > The error message is new as of Sunday; with code older than that this > will crash the backend. And, in fact, what I get from Jan's example > above is: > > SELECT count FROM v1; > ERROR: replace_agg_clause: variable not in target list > > In both situations, it's necessary to add variables to the target list > that aren't in the list produced by the parser. We have code that does > that sort of thing, but it's evidently not getting applied... Yes, and the attributes could be marked junk so they are taken out of the final result again later. But I wouldn't spend time on it because I think it's an incomplete solution. Let's have a view doing a sum() over a field with a group by. The values are measured in meters. And there is another table with factors to convert between meters and inches, feet, yards. CREATE TABLE t1 (id serial, owner text, len float8); CREATE TABLE t2 (quant text, factor float8); CREATE VIEW v1 AS SELECT owner, sum(len) FROM t1 GROUP BY owner; Now you want the sums converted to any quantity and do a: SELECT a.owner, a.sum as meter, b.quant, a.sum * b.factor as size FROM v1 a, t2 b; Ooops - there's only one row per owner left. And more OOOPS - it has sum()*count(* from t2) as meters! You must explicitly tell "GROUP BY a.owner, b.quant" to get the correct result. This is a case, where IMHO nothing else than a subselecting RTE could help. The problem in this case is that the rewrite system would have to add another attribute to the group by clause which is already there. But I see absolutely no way how it could decide which one. And there might be cases where totally no grouping could produce the correct result. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
jwieck@debis.com (Jan Wieck) wrote: > CREATE TABLE t1 (a int4, b int4); > CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b; > SELECT count FROM v1; > [ ... ker-boom ... ] and I said I thought it was the same planner bug I was chasing in non-VIEW-using examples of GROUP BY. It turns out it's not the same. In the above example, the breakage happens in rewrite before the planner ever sees it. When make_groupPlan is called, it sees a GroupClause like this: (gdb) p *((GroupClause *) 0x400b8690)->entry $107 = {type = T_TargetEntry, resdom = 0x400b86c0, fjoin = 0x0, expr = 0x400b8700} (gdb) p *((GroupClause *) 0x400b8690)->entry->resdom $108 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1, resname = 0x400b86e8 "b", reskey = 0, reskeyop = 0, resjunk= 0} (gdb) p *(Var*)((GroupClause *) 0x400b8690)->entry->expr $114 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1, varlevelsup = 0, varnoold = 4, varoattno = 2} and a target list like this: (gdb) p *(TargetEntry*)0x400b8a70 $118 = {type = T_TargetEntry, resdom = 0x400b8a88, fjoin = 0x0, expr = 0x400b8ac8} (gdb) p *((TargetEntry*)0x400b8a70)->resdom $119 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1, resname = 0x400b8ab0 "count", reskey = 0, reskeyop = 0,resjunk = 0} (gdb) p *(Aggref*)((TargetEntry*)0x400b8a70)->expr $121 = {type = T_Aggref, aggname = 0x400b8af0 "count", basetype = 0, aggtype = 23, target = 0x400b8b08, aggno = 0, usenulls= 0 '\000'} (gdb) p *(Var*)((Aggref*)((TargetEntry*)0x400b8a70)->expr)->target $123 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1, varlevelsup = 0, varnoold = 4, varoattno = 2} which is all fine except that the two different expressions have been given the same Resdom number (resno = 1 in both). That confuses make_groupPlan into thinking that they are the same expression, and trouble ensues. If I understand this stuff correctly, the rewriter should have been careful to assign different Resdom numbers to distinct expressions in the target and group-by lists of the rewritten query. That's how things look in an un-rewritten query, anyway. So I think this is a rewrite bug. If you don't like that answer, it might be possible to change the planner so that it doesn't put any faith in the Resdom numbers, but uses equal() on the expr fields to decide whether target and group-by entries are the same. That would be a slower but probably much more robust approach. Jan, what do you think? regards, tom lane
I now get. I am sure it instills confidence in our users:test=> CREATE TABLE t1 (a int4, b int4);CREATEtest=> test=> CREATEVIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;CREATEtest=> SELECT count FROM v1;ERROR: union_planner: query ismarked hasAggs, but I don't see any > jwieck@debis.com (Jan Wieck) wrote: > > CREATE TABLE t1 (a int4, b int4); > > CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b; > > SELECT count FROM v1; > > [ ... ker-boom ... ] > > and I said I thought it was the same planner bug I was chasing in > non-VIEW-using examples of GROUP BY. It turns out it's not the same. > In the above example, the breakage happens in rewrite before the > planner ever sees it. When make_groupPlan is called, it sees a > GroupClause like this: > > (gdb) p *((GroupClause *) 0x400b8690)->entry > $107 = {type = T_TargetEntry, resdom = 0x400b86c0, fjoin = 0x0, > expr = 0x400b8700} > (gdb) p *((GroupClause *) 0x400b8690)->entry->resdom > $108 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1, > resname = 0x400b86e8 "b", reskey = 0, reskeyop = 0, resjunk = 0} > (gdb) p *(Var*)((GroupClause *) 0x400b8690)->entry->expr > $114 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1, > varlevelsup = 0, varnoold = 4, varoattno = 2} > > and a target list like this: > > (gdb) p *(TargetEntry*)0x400b8a70 > $118 = {type = T_TargetEntry, resdom = 0x400b8a88, fjoin = 0x0, > expr = 0x400b8ac8} > (gdb) p *((TargetEntry*)0x400b8a70)->resdom > $119 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1, > resname = 0x400b8ab0 "count", reskey = 0, reskeyop = 0, resjunk = 0} > (gdb) p *(Aggref*)((TargetEntry*)0x400b8a70)->expr > $121 = {type = T_Aggref, aggname = 0x400b8af0 "count", basetype = 0, > aggtype = 23, target = 0x400b8b08, aggno = 0, usenulls = 0 '\000'} > (gdb) p *(Var*)((Aggref*)((TargetEntry*)0x400b8a70)->expr)->target > $123 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1, > varlevelsup = 0, varnoold = 4, varoattno = 2} > > which is all fine except that the two different expressions have been > given the same Resdom number (resno = 1 in both). That confuses > make_groupPlan into thinking that they are the same expression, and > trouble ensues. > > If I understand this stuff correctly, the rewriter should have been > careful to assign different Resdom numbers to distinct expressions > in the target and group-by lists of the rewritten query. That's how > things look in an un-rewritten query, anyway. So I think this is a > rewrite bug. > > If you don't like that answer, it might be possible to change the > planner so that it doesn't put any faith in the Resdom numbers, but > uses equal() on the expr fields to decide whether target and group-by > entries are the same. That would be a slower but probably much more > robust approach. Jan, what do you think? > > regards, tom lane > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > I now get. I am sure it instills confidence in our users: > ERROR: union_planner: query is marked hasAggs, but I don't see any Yes :-(. I've been waiting for Jan to respond to the issue --- I think this is a rewriter problem, so I wanted to know whether he could do anything with it. (See my message "GROUP BY fixes committed" dated 02 May 1999 20:54:30 -0400.) It'd be possible to work around this problem inside the planner, by not believing what the rewriter says about either resnos or hasAggs, but that seems like a kluge rather than a fix. regards, tom lane
Re: [HACKERS] views and group by (formerly: create view as selec
From
jwieck@debis.com (Jan Wieck)
Date:
> > Bruce Momjian <maillist@candle.pha.pa.us> writes: > > I now get. I am sure it instills confidence in our users: > > ERROR: union_planner: query is marked hasAggs, but I don't see any > > Yes :-(. I've been waiting for Jan to respond to the issue --- I think > this is a rewriter problem, so I wanted to know whether he could do > anything with it. (See my message "GROUP BY fixes committed" dated > 02 May 1999 20:54:30 -0400.) > > It'd be possible to work around this problem inside the planner, by not > believing what the rewriter says about either resnos or hasAggs, but > that seems like a kluge rather than a fix. Sorry - forgot about that one. I think the best place to check if the query has aggregates or not is at the beginning of the planner. The rewrite system is recursive, and thus the check at the end of one cycle doesn't guarantee that it will still be true at the end of all rewrites. OTOH views with aggregates are very buggy and introducing ton's of problems (as the other threads show). I'm not sure that it was a good idea to make it partially working :-( for v6.4. There was only the advice that they shouldn't be used. Now it's a released feature. More and more pressure for the subselecting RTE's which are IMHO the only way to solve that all cleanly. The other issue about the resno's is something that I must still search for in the rewriter. There are bad problems where it seems to mangle up not only resno's, it also get's lost of varlevelsup somehow and producing totally wrong varno's into subselects. Still problems coming from the EXCEPT/INTERSECT patch I haven't found so far >:-{ Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #