Thread: GROUP BY: v6.1 vs. v6.5.2

GROUP BY: v6.1 vs. v6.5.2

From

Ray Plante

Date:

31 January 2000, 21:02:59

Hi,

I have a question regarding a change in the way GROUP BY has work
between PostgreSQL versions 6.1 and 6.5.2.  The latter version's man
page for SELECT says, 
 "When GROUP BY is present, it is not valid to refer to ungrouped  columns except within aggregate functions, since
therewould be   more than one possible value to return for an ungrouped column."
 

This seems sensible.  However, version 6.1 did not have this
restriction; for any ungrouped function not within an aggregate, the
first matching value was returned.  In effect, a default aggregate was
applied.  Unfortunately, my application took advantage of this
behavior.  My basic question is, what's the easiest way to duplicate
this behavior using v6.5.2?

Here's a simplified run down of what I am doing, starting with the
tables I'm operating on:
 create table projects (   yr int,   nimg int,   authors text[] ); create table docs (   yr int,   im int,   title text
);select p.yr,p.nimg,p.authors,d.im,d.title   into table joint   from projects p,docs d   where p.yr=d.yr;
 

Here's the sort of search I would using the older PG version:
 select yr,nimg,authors from joint    where title~'word'   group by yr;

There is a one-to-many mapping between projects and documents.  Thus,
the attributes from the projects table will be duplicated in joint for
each matching record from docs.  What I want from the above select are
the projects that contain any doc with a title matching 'word'; but, I
only want one record per project.  

The simple solution should be including the other project attributes
list in the select clause in my group-by clause:
 select yr,nimg,authors from joint   where title~'word'   group by yr,nimg,authors;

The problem is that authors is a text array.  When I do this, I get the
error message:

ERROR:  Unable to identify a binary operator '<' for types _text and
_text

I've also tried creating an aggregate function for text arrays that just
returns the last value encountered; however, the array nature gave me
trouble.  (I'll save the details for another posting if necessary.)

many thanks,
Ray Plante

Re: [SQL] GROUP BY: v6.1 vs. v6.5.2

From

Tom Lane

Date:

01 February 2000, 01:15:21

Ray Plante <rplante@ncsa.uiuc.edu> writes:
>   "When GROUP BY is present, it is not valid to refer to ungrouped
>    columns except within aggregate functions, since there would be 
>    more than one possible value to return for an ungrouped column."

> This seems sensible.  However, version 6.1 did not have this
> restriction; for any ungrouped function not within an aggregate, the
> first matching value was returned.  In effect, a default aggregate was
> applied.

Right.  But that was contrary to the SQL spec, and furthermore the
results were not very predictable (since there's no good way to know
which tuple the executor will find first).  So we changed it.

> Unfortunately, my application took advantage of this
> behavior.  My basic question is, what's the easiest way to duplicate
> this behavior using v6.5.2?

The usual advice is
select x, min(y) from table group by x;

but this depends on the existence of a min() function for the datatype
of y.  The approach you were trying depends on a '<' operator; neither
are normally provided for array types.  (Although if the array component
type has a '<', it seems like it shouldn't be that hard to provide an
ordering operator for the array type... another TODO list item...)

What you can do at the moment is
select distinct on x  x, y from table;

This is nonstandard, klugy, and just as unpredictable as the old GROUP
BY behavior, but you only need an ordering operator on x.

BTW, in 7.0 the syntax will be
select distinct on (x) x, y from table;

per recent discussion in this list.
        regards, tom lane

Re: [SQL] GROUP BY: v6.1 vs. v6.5.2

From

Ray Plante

Date:

01 February 2000, 15:08:29

Hi Tom,

> What you can do at the moment is
> 
>     select distinct on x  x, y from table;
> 
> This is nonstandard, klugy, and just as unpredictable as the old GROUP
> BY behavior, but you only need an ordering operator on x.

Excellent.  This is just the short-term solution I can use until I work
out a permanent one.  Thanks!

Ray