Home > mailing lists

Re: aggregation question - Mailing list pgsql-sql

From	Samuel Gendler
Subject	Re: aggregation question
Date	December 1, 2010 05:06:51
Msg-id	AANLkTik3-kgfGdQx6iKTDW7te5fEFeVZtONQZLqJM-Ut@mail.gmail.com Whole thread
In response to	aggregation question (Samuel Gendler <sgendler@ideasculptor.com>)
List	pgsql-sql

Tree view

On Tue, Nov 30, 2010 at 7:47 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:

I have a fact table with a number of foreign keys to dimension data and some measure columns. Let's assume I've got dimension columns a,b,c,d,e, and f and measure columns x, and y.

I need to be able to find the value of f, x, and y for the row with min(x/y) when e in (1,2) and max(x/y) when e not in (1,2) with grouping by a,b,c, and d.

A query which would identify the row in question would look like this:

select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by a,b,c,d

But I don't know how to go about also extracting the value of x, y, and f for the row which is matched - I suppose I could join to a subquery, but that strikes me as probably not being the most efficient way to accomplish it.

Actually, it occurs to me that I've potentially got more than one row for any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f

Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as well as a,b,c,d,e) for the f with min(sum(x)/sum(y))

I solved this, but I'm not sure if my solution is ideal

    -- select distinct causes multiple rows that tie for rank = 1

    -- to condense down to a single row. Outer query also strips

    -- any rows from windows with less than 3 entries.

    select distinct count, total_ms, time, fk_e,
           fk_a, fk_b, fk_c
    from (        -- grab count and total_ms for each fk_g, grouped by        -- a,b,c, time, and e.  Use rank() window function        -- to rank each row in the window, ordered by total_ms/count        -- and use count(*) as window function to get total rows in        -- the window. 
        select sum(f.x) as count, sum(f.y) as total_ms,
               f.time as time,
               f.fk_a, f.fk_b, f.fk_c,
               f.fk_e, f.fk_f,
               rank() over (w_agg order by sum(f.x)/sum(f.y)) as rank,
               count(*) over (w_agg) as rows
        from fact_table f
        group by 3,4,5,6,7,8
        window w_agg as (partition by time,
                         fk_a, fk_b, fk_c,
                         fk_e)
    ) q
    where q.rank = 1 and q.rows >= 3

I will need to run this twice - once with rank() ordered descending and the inner query actually limited by fk_e in (4,14) and once with rank() ordered ascending for fk_e not in (4,14). It is possible I can put a case statement in the inner select in order to handle that. I haven't tried it yet.

Is there a better solution than the outer 'select distinct' in order to only see rows that have rank = 1 and to never have duplicate rows?

pgsql-sql by date:

From: Jeff Bland
Date: 01 December 2010, 02:48:04
Subject: DELETE WHERE EXISTS unexpected results

From: Lee Hachadoorian
Date: 01 December 2010, 05:14:03
Subject: Re: OT - load a shp file

Re: aggregation question - Mailing list pgsql-sql

Previous

Next