Re: Bucketing Row Data in columns - Mailing list pgsql-sql

From Rob Sargent
Subject Re: Bucketing Row Data in columns
Date
Msg-id 4A439E6E.3050202@gmail.com
Whole thread Raw
In response to Re: Bucketing Row Data in columns  ("Oliveiros Cristina" <oliveiros.cristina@marktest.pt>)
List pgsql-sql
I would be suspicious of this sort of solution of turning rows into 
columns by mean of a series of correlated sub-selects.  Once the data 
set gets large and the number of columns goes over 2 or 3 this will in 
all likelihood not perform well. 
I had the pleasure of re-writing a "report" which was based on count() 
(similar to sum()) per user_id with the counts going into various 
columns per user.  18000 users, a dozen columns from table of 2 million 
rows, report took >1,000,000 seconds (yes almost 12 days) to complete.  
Re-write runs in 5-10 minutes (now at 10M rows) by getting the counts as 
rows (user, item, count) into a temp table and making the columns from 
the temp table (pl/psql)  Getting the counts takes half the time, making 
the flattened report takes half the time.




Oliveiros Cristina wrote:
> Hello, Sandeep,
>  
> I am not sure if this is what you want.
>  
> I came up with this query
>  
> SELECT *
> FROM (SELECT user_id,SUM(amount) as day1 FROM t1 WHERE 
> create_timestamp = '2009-1-1' GROUP BY "user_id") a
> NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day2 FROM t1 WHERE 
> create_timestamp = '2009-1-2' GROUP BY "user_id") b
> NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day3 FROM t1 WHERE 
> create_timestamp = '2009-1-3' GROUP BY "user_id") c
>  
> The solution is not totally correct because it returns NULL in the 
> places you return 0.
> It seems the SUM() returns NULL when it gets an all NULL column...
> Is it problematic for your application ?
>  
> Also, i 'm not sure if I fully understand your last sentence
> /lets assume the buckets are fixed i.e 3 only. but I wish to get them 
> unlimited i.e day 1 to day 20./
>  
> You say that the buckets are fixed at 3. So, you mean the table output 
> will always have 4 columns? 3 days plus one for user_id ?
> If you want 20 buckets it must be a different query...
>  
> Could you please clarify what you mean when you say that you want to 
> get a bucket unlimited ?
>  
> Best,
> Oliveiros
>
>     ----- Original Message -----
>      
>     *From:* Sandeep <mailto:gibsosmat@gmail.com>
>     *To:* pgsql-sql@postgresql.org <mailto:pgsql-sql@postgresql.org>
>     *Sent:* Wednesday, June 24, 2009 5:39 PM
>     *Subject:* [SQL] Bucketing Row Data in columns
>
>     Hi all,
>     I need help on creating a sql, not a problem even if its pl/sql
>
>     I have orders table schema is as follow
>
>     orders(order_id,user_id, create_timestamp, amount)
>
>     and I want to generate a report like
>     for the past 3 days bucketing purchases i.e SUM(amount) every day
>     in columns
>     i.e result will be having these columns.
>
>     (user_id, amount_day1, amount_day2, amount_day3)
>
>     ex:
>     am leaving order_id assume they are auto incrementing and unique,
>     date format dd/mm/yyyy
>     (user_id, create_timestamp, amount)
>     (user1, 01/01/2009,100)
>     (user1, 01/01/2009,100)
>     (user2, 01/01/2009,100)
>     (user2, 02/01/2009,100)
>     (user2, 02/01/2009,100)
>     (user1, 02/01/2009,100)
>     (user2, 03/01/2009,100)
>     (user2, 03/01/2009,100)
>     (user3, 03/01/2009,100)
>
>
>     result
>
>     (user_id, amount_day1, amount_day2, amount_day3)
>     (user1, 200, 200, 0)
>     (user2, 100, 200, 200)
>     (user3, 0, 0, 100)
>
>
>     hope you guys got what I am trying to generate through sql.
>
>     I could get this data in each row, but I want it in columns.
>     Can anyone help me on this? lets assume the buckets are fixed i.e
>     3 only. but I wish to get them unlimited i.e day 1 to day 20.
>
>     Regards
>     Sandeep Bandela
>



pgsql-sql by date:

Previous
From: "Oliveiros Cristina"
Date:
Subject: Re: Bucketing Row Data in columns
Next
From: "Oliveiros Cristina"
Date:
Subject: Re: Bucketing Row Data in columns