Thread: RE: [HACKERS] views and group by (formerly: create view as selec t distinct)

RE: [HACKERS] views and group by (formerly: create view as selec t distinct)

From
Michael J Davis
Date:
This implies that the "group by" clause is not supported in views.  I have
created views that use the group by clause and they appear to work.  I have
not verified the content of the records.  I would like to know more about
what Jan means when he says that "group by" is not supported in views?  Does
it mean that the content of the results could be unexpected or are they
conditions where they may work and other conditions where they don't work?
More info would be greatly appreciated.

Thanks, Michael
-----Original Message-----From:    jwieck@debis.com [SMTP:jwieck@debis.com]Sent:    Monday, April 26, 1999 9:35 AMTo:
rbrad@hpb50023.boi.hp.comCc:    pgsql-hackers@postgreSQL.orgSubject:    Re: [HACKERS] create view as select distinct
(fwd)
>> This is on the TODO list.>> I actually have a solution that seems to work fine, but I wanted
to run it past> the backend guru's after we have finished the 6.5 beta.>> Sorry I din't get it finished before the beta
started.>>-Ryan
 
    I wonder how it does!
    Have the following:
      CREATE TABLE t1 (a int4, b text);      CREATE TABLE t2 (c int4, d text);      CREATE VIEW v2 AS SELECT DISTINCT
ONc * FROM t2;
 
    Populate them with:
      t1:      1    'one'      1    'ena'      2    'two'      2    'thio'      3    'three'      3    'tria'      4
'four'     4    'tessera'
 
      t2:      1    'I'      1    'eins'      2    'II'      2    'zwei'      3    'III'      3    'drei'
    Now you do
      SELECT t1.a, t1.b, v2.d FROM t1, v2        WHERE t1.a = v2.c;
    Does  that  work  and  produce the correct results? Note that    there are more than one correct results. The
DISTINCT SELECT    from t2 already has. But in any case, the above SELECT should    present 6 rows (all the rows of t1
from1 to  33  in  english    and  greek) and column d must show either the roman or german    number.
 
    To make it more complicated, add table  t3  and  populate  it    with more languages.  Then setup
      CREATE VIEW v3 AS SELECT DISTINCT ON e * FROM t3;
    and expand the above SELECT to a join over t1, v2, v3.
    Finally,  think  about  a view that is a DISTINCT SELECT over    multiple tables.  Now you build another view as
SELECT from    the  first  plus  some  other  table  and  make  the new view    DISTINCT again.
 
    The same kind of problem causes that views  currently  cannot    have ORDER BY or GROUP BY clauses. All these
clausescan only    appear once per query, so there is no room where the  rewrite    system  can place multiple
differentones.  Implementing this    requires first dramatic changes to the querytree layout and I    think it needs
subselectingRTE's too.
 

Sorry - Jan
--

#======================================================================## It's easier to get forgiveness for being
wrongthan for being
 
right. ## Let's break this rule - forgive me.
##======================================== jwieck@debis.com (Jan
Wieck) #



Re: [HACKERS] views and group by (formerly: create view as selec

From
jwieck@debis.com (Jan Wieck)
Date:
Michael J Davis wrote:

>
> This implies that the "group by" clause is not supported in views.  I have
> created views that use the group by clause and they appear to work.  I have
> not verified the content of the records.  I would like to know more about
> what Jan means when he says that "group by" is not supported in views?  Does
> it mean that the content of the results could be unexpected or are they
> conditions where they may work and other conditions where they don't work?
> More info would be greatly appreciated.

    I tried to make it and it works partially. The problems arise
    if you have a view with a group by clause but do  not  select
    the attributes the group by clause uses:

      CREATE TABLE t1 (a int4, b int4);
      CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;

      SELECT count FROM v1;
      SELECT count(*) FROM v1;

    Both selects crash the backend!

    If  you have a view that uses GROUP BY and do a simple SELECT
    * from it, then it will work and return the correct  results.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] views and group by (formerly: create view as selec

From
Tom Lane
Date:
jwieck@debis.com (Jan Wieck) writes:
>     I tried to make it and it works partially. The problems arise
>     if you have a view with a group by clause but do  not  select
>     the attributes the group by clause uses:

>       CREATE TABLE t1 (a int4, b int4);
>       CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;

>       SELECT count FROM v1;
>       SELECT count(*) FROM v1;

>     Both selects crash the backend!

Hmm, this sounds very similar to a problem I was looking at on Sunday:

select sum(quantity), ID+1 from aggtest1 group by ID+1;
ERROR:  replace_agg_clause: variable not in target list

The error message is new as of Sunday; with code older than that this
will crash the backend.  And, in fact, what I get from Jan's example
above is:

SELECT count FROM v1;
ERROR:  replace_agg_clause: variable not in target list

In both situations, it's necessary to add variables to the target list
that aren't in the list produced by the parser.  We have code that does
that sort of thing, but it's evidently not getting applied...
        regards, tom lane


Re: [HACKERS] views and group by (formerly: create view as selec

From
jwieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

>
> jwieck@debis.com (Jan Wieck) writes:
> >     I tried to make it and it works partially. The problems arise
> >     if you have a view with a group by clause but do  not  select
> >     the attributes the group by clause uses:
>
> >       CREATE TABLE t1 (a int4, b int4);
> >       CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;
>
> >       SELECT count FROM v1;
> >       SELECT count(*) FROM v1;
>
> >     Both selects crash the backend!
>
> Hmm, this sounds very similar to a problem I was looking at on Sunday:
>
> select sum(quantity), ID+1 from aggtest1 group by ID+1;
> ERROR:  replace_agg_clause: variable not in target list
>
> The error message is new as of Sunday; with code older than that this
> will crash the backend.  And, in fact, what I get from Jan's example
> above is:
>
> SELECT count FROM v1;
> ERROR:  replace_agg_clause: variable not in target list
>
> In both situations, it's necessary to add variables to the target list
> that aren't in the list produced by the parser.  We have code that does
> that sort of thing, but it's evidently not getting applied...

    Yes,  and  the  attributes  could  be marked junk so they are
    taken out of the final result again  later.  But  I  wouldn't
    spend time on it because I think it's an incomplete solution.

    Let's have a view doing a sum() over a field with a group by.
    The values are measured in meters. And there is another table
    with factors to convert  between  meters  and  inches,  feet,
    yards.

      CREATE TABLE t1 (id serial, owner text, len float8);
      CREATE TABLE t2 (quant text, factor float8);
      CREATE VIEW v1 AS SELECT owner, sum(len) FROM t1 GROUP BY owner;

    Now you want the sums converted to any quantity and do a:

      SELECT a.owner, a.sum as meter, b.quant, a.sum * b.factor as size
        FROM v1 a, t2 b;

    Ooops - there's only one row per owner left. And more OOOPS -
    it has sum()*count(* from t2) as meters! You must  explicitly
    tell  "GROUP  BY a.owner, b.quant" to get the correct result.
    This is a case, where IMHO nothing else than  a  subselecting
    RTE  could help. The problem in this case is that the rewrite
    system would have to add another attribute to  the  group  by
    clause  which  is  already there. But I see absolutely no way
    how it could decide which one. And there might be cases where
    totally no grouping could produce the correct result.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] views and group by (formerly: create view as selec

From
Tom Lane
Date:
jwieck@debis.com (Jan Wieck) wrote:
>       CREATE TABLE t1 (a int4, b int4);
>       CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;
>       SELECT count FROM v1;
>       [ ... ker-boom ... ]

and I said I thought it was the same planner bug I was chasing in
non-VIEW-using examples of GROUP BY.  It turns out it's not the same.
In the above example, the breakage happens in rewrite before the
planner ever sees it.  When make_groupPlan is called, it sees a
GroupClause like this:

(gdb) p *((GroupClause *) 0x400b8690)->entry
$107 = {type = T_TargetEntry, resdom = 0x400b86c0, fjoin = 0x0, expr = 0x400b8700}
(gdb) p *((GroupClause *) 0x400b8690)->entry->resdom
$108 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1, resname = 0x400b86e8 "b", reskey = 0, reskeyop = 0,
resjunk= 0}
 
(gdb) p *(Var*)((GroupClause *) 0x400b8690)->entry->expr
$114 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1, varlevelsup = 0, varnoold = 4, varoattno =
2}

and a target list like this:

(gdb) p *(TargetEntry*)0x400b8a70
$118 = {type = T_TargetEntry, resdom = 0x400b8a88, fjoin = 0x0, expr = 0x400b8ac8}
(gdb) p *((TargetEntry*)0x400b8a70)->resdom
$119 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1, resname = 0x400b8ab0 "count", reskey = 0, reskeyop =
0,resjunk = 0}
 
(gdb) p *(Aggref*)((TargetEntry*)0x400b8a70)->expr
$121 = {type = T_Aggref, aggname = 0x400b8af0 "count", basetype = 0, aggtype = 23, target = 0x400b8b08, aggno = 0,
usenulls= 0 '\000'}
 
(gdb) p *(Var*)((Aggref*)((TargetEntry*)0x400b8a70)->expr)->target
$123 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1, varlevelsup = 0, varnoold = 4, varoattno =
2}

which is all fine except that the two different expressions have been
given the same Resdom number (resno = 1 in both).  That confuses
make_groupPlan into thinking that they are the same expression, and
trouble ensues.

If I understand this stuff correctly, the rewriter should have been
careful to assign different Resdom numbers to distinct expressions
in the target and group-by lists of the rewritten query.  That's how
things look in an un-rewritten query, anyway.  So I think this is a
rewrite bug.

If you don't like that answer, it might be possible to change the
planner so that it doesn't put any faith in the Resdom numbers, but
uses equal() on the expr fields to decide whether target and group-by
entries are the same.  That would be a slower but probably much more
robust approach.  Jan, what do you think?
        regards, tom lane


Re: [HACKERS] views and group by (formerly: create view as selec

From
Bruce Momjian
Date:
I now get.  I am sure it instills confidence in our users:test=> CREATE TABLE t1 (a int4, b int4);CREATEtest=> test=>
CREATEVIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;CREATEtest=> SELECT count FROM v1;ERROR:  union_planner: query
ismarked hasAggs, but I don't see any
 


> jwieck@debis.com (Jan Wieck) wrote:
> >       CREATE TABLE t1 (a int4, b int4);
> >       CREATE VIEW v1 AS SELECT b, count(b) FROM t1 GROUP BY b;
> >       SELECT count FROM v1;
> >       [ ... ker-boom ... ]
> 
> and I said I thought it was the same planner bug I was chasing in
> non-VIEW-using examples of GROUP BY.  It turns out it's not the same.
> In the above example, the breakage happens in rewrite before the
> planner ever sees it.  When make_groupPlan is called, it sees a
> GroupClause like this:
> 
> (gdb) p *((GroupClause *) 0x400b8690)->entry
> $107 = {type = T_TargetEntry, resdom = 0x400b86c0, fjoin = 0x0,
>   expr = 0x400b8700}
> (gdb) p *((GroupClause *) 0x400b8690)->entry->resdom
> $108 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1,
>   resname = 0x400b86e8 "b", reskey = 0, reskeyop = 0, resjunk = 0}
> (gdb) p *(Var*)((GroupClause *) 0x400b8690)->entry->expr
> $114 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1,
>   varlevelsup = 0, varnoold = 4, varoattno = 2}
> 
> and a target list like this:
> 
> (gdb) p *(TargetEntry*)0x400b8a70
> $118 = {type = T_TargetEntry, resdom = 0x400b8a88, fjoin = 0x0,
>   expr = 0x400b8ac8}
> (gdb) p *((TargetEntry*)0x400b8a70)->resdom
> $119 = {type = T_Resdom, resno = 1, restype = 23, restypmod = -1,
>   resname = 0x400b8ab0 "count", reskey = 0, reskeyop = 0, resjunk = 0}
> (gdb) p *(Aggref*)((TargetEntry*)0x400b8a70)->expr
> $121 = {type = T_Aggref, aggname = 0x400b8af0 "count", basetype = 0,
>   aggtype = 23, target = 0x400b8b08, aggno = 0, usenulls = 0 '\000'}
> (gdb) p *(Var*)((Aggref*)((TargetEntry*)0x400b8a70)->expr)->target
> $123 = {type = T_Var, varno = 4, varattno = 2, vartype = 23, vartypmod = -1,
>   varlevelsup = 0, varnoold = 4, varoattno = 2}
> 
> which is all fine except that the two different expressions have been
> given the same Resdom number (resno = 1 in both).  That confuses
> make_groupPlan into thinking that they are the same expression, and
> trouble ensues.
> 
> If I understand this stuff correctly, the rewriter should have been
> careful to assign different Resdom numbers to distinct expressions
> in the target and group-by lists of the rewritten query.  That's how
> things look in an un-rewritten query, anyway.  So I think this is a
> rewrite bug.
> 
> If you don't like that answer, it might be possible to change the
> planner so that it doesn't put any faith in the Resdom numbers, but
> uses equal() on the expr fields to decide whether target and group-by
> entries are the same.  That would be a slower but probably much more
> robust approach.  Jan, what do you think?
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] views and group by (formerly: create view as selec

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> I now get.  I am sure it instills confidence in our users:
>     ERROR:  union_planner: query is marked hasAggs, but I don't see any

Yes :-(.  I've been waiting for Jan to respond to the issue --- I think
this is a rewriter problem, so I wanted to know whether he could do
anything with it.  (See my message "GROUP BY fixes committed" dated
02 May 1999 20:54:30 -0400.)

It'd be possible to work around this problem inside the planner, by not
believing what the rewriter says about either resnos or hasAggs, but
that seems like a kluge rather than a fix.
        regards, tom lane


Re: [HACKERS] views and group by (formerly: create view as selec

From
jwieck@debis.com (Jan Wieck)
Date:
>
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > I now get.  I am sure it instills confidence in our users:
> >  ERROR:  union_planner: query is marked hasAggs, but I don't see any
>
> Yes :-(.  I've been waiting for Jan to respond to the issue --- I think
> this is a rewriter problem, so I wanted to know whether he could do
> anything with it.  (See my message "GROUP BY fixes committed" dated
> 02 May 1999 20:54:30 -0400.)
>
> It'd be possible to work around this problem inside the planner, by not
> believing what the rewriter says about either resnos or hasAggs, but
> that seems like a kluge rather than a fix.

    Sorry - forgot about that one.

    I  think  the best place to check if the query has aggregates
    or not is at the beginning of the planner. The rewrite system
    is  recursive,  and  thus  the  check at the end of one cycle
    doesn't guarantee that it will still be true at  the  end  of
    all rewrites.

    OTOH  views  with  aggregates  are very buggy and introducing
    ton's of problems (as the other threads show). I'm  not  sure
    that  it was a good idea to make it partially working :-( for
    v6.4. There was only the advice that they shouldn't be  used.
    Now  it's  a released feature. More and more pressure for the
    subselecting RTE's which are IMHO the only way to solve  that
    all cleanly.

    The  other  issue  about the resno's is something that I must
    still search for in the  rewriter.  There  are  bad  problems
    where  it  seems to mangle up not only resno's, it also get's
    lost of  varlevelsup  somehow  and  producing  totally  wrong
    varno's  into  subselects.  Still  problems  coming  from the
    EXCEPT/INTERSECT patch I haven't found so far >:-{


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #