Re: Group By and wildcards... - Mailing list pgsql-general

From Greg Stark
Subject Re: Group By and wildcards...
Date
Msg-id 87650oweuz.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Group By and wildcards...  (Bruno Wolff III <bruno@wolff.to>)
List pgsql-general
Bruno Wolff III <bruno@wolff.to> writes:

> On Sat, Feb 19, 2005 at 12:07:12 -0200,
>   Jon Lapham <lapham@jandr.org> wrote:
> >
> > SELECT a.*, b.*, c.*, SUM(d.blah)
> > FROM a, b, c, d
> > WHERE <some join conditions>
> > GROUP BY a.*, b.*, c.*
> >
> > Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
> > explicit column names of all the column in a, b, and c.
> >
> > This becomes a maintenance nightmare as you add/drop column in these
> > tables...
>
> Don't those tables have primary keys? Grouping by the primay key of each
> table will produce the same result set as grouping by all of the columns.

Actually it would be kind of nice to have this as a feature. Or mysql's
feature of treating any unnamed columns as something like DISTINCT ON.

However there are a few approaches for dealing with it. None of which are
perfect but if they match your needs they work well.

In the query above you could turn SUM(d.blah) into a subquery expression. This
works well as long as you don't have multiple aggregate queries on the same
table.

SELECT a.*,b.*,c.*,
       (SELECT sum(blah) FROM d WHERE ...) AS d_sum
  FROM a,b,c


This doesn't require a GROUP BY step which means it'll probably be faster. On
the other hand it effectively forces a nested loop scan on d which is not
necessarily the fastest. And if you have multiple aggregates postgres it
forces separate lookups for the same data. It would be nice to have some
feature for breaking out subquery expressions that return multiple rows into
multiple output columns. Something like:

SELECT a.*,b.*,c.*,
       (SELECT sum(blah),avg(blah) FROM d WHERE ...) AS (d_sum,d_avg)
  FROM a,b,c


You could also turn the above into a more complex join like:

SELECT *
  FROM a,b,c,
       (SELECT groupname, SUM(blah) as d_sum FROM d GROUP BY groupname) AS d
 WHERE ...
   AND c.groupname = d.groupname

This works well as long as you didn't have the aggregate function applying to
overlapping subsets of d before. (eg, it won't work for sum(product.price) if
multiple invoices can contain the same product).

alternatively you can do something like

SELECT *
  FROM a,b,c,
       (select a.id as a_id, b.id as b_id, c.id as c_id,
               sum(blah) as d_sum
          from a,b,c,d
         where ...
         group by a.id,b.id,c.id
       ) AS sub
 WHERE a.id = a_id
   AND b.id = b_id
   AND c.id = c_id

But that's pretty silly and not usually necessary.

--
greg

pgsql-general by date:

Previous
From: Russ Brown
Date:
Subject: Re: Group By and wildcards...
Next
From: Bruno Wolff III
Date:
Subject: Re: Group By and wildcards...