Home > mailing lists

Re: Group By and wildcards... - Mailing list pgsql-general

From	Greg Stark
Subject	Re: Group By and wildcards...
Date	February 19, 2005 19:04:50
Msg-id	87650oweuz.fsf@stark.xeocode.com Whole thread Raw
In response to	Re: Group By and wildcards... (Bruno Wolff III <bruno@wolff.to>)
List	pgsql-general

Tree view

Bruno Wolff III <bruno@wolff.to> writes:

> On Sat, Feb 19, 2005 at 12:07:12 -0200,
>   Jon Lapham <lapham@jandr.org> wrote:
> >
> > SELECT a.*, b.*, c.*, SUM(d.blah)
> > FROM a, b, c, d
> > WHERE <some join conditions>
> > GROUP BY a.*, b.*, c.*
> >
> > Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
> > explicit column names of all the column in a, b, and c.
> >
> > This becomes a maintenance nightmare as you add/drop column in these
> > tables...
>
> Don't those tables have primary keys? Grouping by the primay key of each
> table will produce the same result set as grouping by all of the columns.

Actually it would be kind of nice to have this as a feature. Or mysql's
feature of treating any unnamed columns as something like DISTINCT ON.

However there are a few approaches for dealing with it. None of which are
perfect but if they match your needs they work well.

In the query above you could turn SUM(d.blah) into a subquery expression. This
works well as long as you don't have multiple aggregate queries on the same
table.

SELECT a.*,b.*,c.*,
       (SELECT sum(blah) FROM d WHERE ...) AS d_sum
  FROM a,b,c

This doesn't require a GROUP BY step which means it'll probably be faster. On
the other hand it effectively forces a nested loop scan on d which is not
necessarily the fastest. And if you have multiple aggregates postgres it
forces separate lookups for the same data. It would be nice to have some
feature for breaking out subquery expressions that return multiple rows into
multiple output columns. Something like:

SELECT a.*,b.*,c.*,
       (SELECT sum(blah),avg(blah) FROM d WHERE ...) AS (d_sum,d_avg)
  FROM a,b,c

You could also turn the above into a more complex join like:

SELECT *
  FROM a,b,c,
       (SELECT groupname, SUM(blah) as d_sum FROM d GROUP BY groupname) AS d
 WHERE ...
   AND c.groupname = d.groupname

This works well as long as you didn't have the aggregate function applying to
overlapping subsets of d before. (eg, it won't work for sum(product.price) if
multiple invoices can contain the same product).

alternatively you can do something like

SELECT *
  FROM a,b,c,
       (select a.id as a_id, b.id as b_id, c.id as c_id,
               sum(blah) as d_sum
          from a,b,c,d
         where ...
         group by a.id,b.id,c.id
       ) AS sub
 WHERE a.id = a_id
   AND b.id = b_id
   AND c.id = c_id

But that's pretty silly and not usually necessary.

--
greg

pgsql-general by date:

From: Russ Brown
Date: 19 February 2005, 18:34:09
Subject: Re: Group By and wildcards...

From: Bruno Wolff III
Date: 19 February 2005, 19:05:32
Subject: Re: Group By and wildcards...

Re: Group By and wildcards... - Mailing list pgsql-general

Previous

Next