Thread: Bucketing Row Data in columns
Hi all,<br /> I need help on creating a sql, not a problem even if its pl/sql<br /><br /> I have orders table schema isas follow<br /><br /> orders(order_id,user_id, create_timestamp, amount)<br /><br /> and I want to generate a report like<br/> for the past 3 days bucketing purchases i.e SUM(amount) every day in columns<br /> i.e result will be having thesecolumns.<br /><br /> (user_id, amount_day1, amount_day2, amount_day3)<br /><br /> ex:<br /> am leaving order_id assumethey are auto incrementing and unique, date format dd/mm/yyyy<br /> (user_id, create_timestamp, amount)<br /> (user1,01/01/2009,100)<br /> (user1, 01/01/2009,100)<br /> (user2, 01/01/2009,100)<br /> (user2, 02/01/2009,100)<br /> (user2,02/01/2009,100)<br /> (user1, 02/01/2009,100)<br /> (user2, 03/01/2009,100)<br /> (user2, 03/01/2009,100)<br /> (user3,03/01/2009,100)<br /><br /><br /> result<br /><br /> (user_id, amount_day1, amount_day2, amount_day3)<br /> (user1,200, 200, 0)<br /> (user2, 100, 200, 200)<br /> (user3, 0, 0, 100)<br /><br /><br /> hope you guys got what I am tryingto generate through sql.<br /><br /> I could get this data in each row, but I want it in columns.<br /> Can anyonehelp me on this? lets assume the buckets are fixed i.e 3 only. but I wish to get them unlimited i.e day 1 to day 20.<br/><br /><table cellpadding="0" cellspacing="0" width="100%"><tr><td> Regards<br /> Sandeep Bandela </td></tr></table>
Hello Mr. Sandeep Bandela, I have gone through your scenario and come up with the following solution. SELECT USER_ID, CREATE_TIMESTAMP::DATE, SUM(AMOUNT) FROM ORDERS GROUP BY USER_ID, CREATE_TIMESTAMP ORDER BY USER_ID, CREATE_TIMESTAMP; Maybe you need to do little modification on the query to get what you want. Best Regards James Kitambara Database Administrator -------------------------------------ORGINAL MESSAGE------------------------------------------------ --- On Wed, 24/6/09, Sandeep <gibsosmat@gmail.com> wrote:
|
Hello, Sandeep,
I am not sure if this is what you want.
I came up with this query
SELECT *
FROM (SELECT user_id,SUM(amount) as day1 FROM t1 WHERE create_timestamp = '2009-1-1' GROUP BY "user_id") a
NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day2 FROM t1 WHERE create_timestamp = '2009-1-2' GROUP BY "user_id") b
NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day3 FROM t1 WHERE create_timestamp = '2009-1-3' GROUP BY "user_id") c
FROM (SELECT user_id,SUM(amount) as day1 FROM t1 WHERE create_timestamp = '2009-1-1' GROUP BY "user_id") a
NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day2 FROM t1 WHERE create_timestamp = '2009-1-2' GROUP BY "user_id") b
NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day3 FROM t1 WHERE create_timestamp = '2009-1-3' GROUP BY "user_id") c
The solution is not totally correct because it returns NULL in the places you return 0.
It seems the SUM() returns NULL when it gets an all NULL column...
Is it problematic for your application ?
Also, i 'm not sure if I fully understand your last sentence
lets assume the buckets are fixed i.e 3 only. but I wish to get them unlimited i.e day 1 to day 20.
You say that the buckets are fixed at 3. So, you mean the table output will always have 4 columns? 3 days plus one for user_id ?
If you want 20 buckets it must be a different query...
Could you please clarify what you mean when you say that you want to get a bucket unlimited ?
Best,
Oliveiros
----- Original Message -----From: SandeepSent: Wednesday, June 24, 2009 5:39 PMSubject: [SQL] Bucketing Row Data in columnsHi all,
I need help on creating a sql, not a problem even if its pl/sql
I have orders table schema is as follow
orders(order_id,user_id, create_timestamp, amount)
and I want to generate a report like
for the past 3 days bucketing purchases i.e SUM(amount) every day in columns
i.e result will be having these columns.
(user_id, amount_day1, amount_day2, amount_day3)
ex:
am leaving order_id assume they are auto incrementing and unique, date format dd/mm/yyyy
(user_id, create_timestamp, amount)
(user1, 01/01/2009,100)
(user1, 01/01/2009,100)
(user2, 01/01/2009,100)
(user2, 02/01/2009,100)
(user2, 02/01/2009,100)
(user1, 02/01/2009,100)
(user2, 03/01/2009,100)
(user2, 03/01/2009,100)
(user3, 03/01/2009,100)
result
(user_id, amount_day1, amount_day2, amount_day3)
(user1, 200, 200, 0)
(user2, 100, 200, 200)
(user3, 0, 0, 100)
hope you guys got what I am trying to generate through sql.
I could get this data in each row, but I want it in columns.
Can anyone help me on this? lets assume the buckets are fixed i.e 3 only. but I wish to get them unlimited i.e day 1 to day 20.
Regards
Sandeep Bandela
I would be suspicious of this sort of solution of turning rows into columns by mean of a series of correlated sub-selects. Once the data set gets large and the number of columns goes over 2 or 3 this will in all likelihood not perform well. I had the pleasure of re-writing a "report" which was based on count() (similar to sum()) per user_id with the counts going into various columns per user. 18000 users, a dozen columns from table of 2 million rows, report took >1,000,000 seconds (yes almost 12 days) to complete. Re-write runs in 5-10 minutes (now at 10M rows) by getting the counts as rows (user, item, count) into a temp table and making the columns from the temp table (pl/psql) Getting the counts takes half the time, making the flattened report takes half the time. Oliveiros Cristina wrote: > Hello, Sandeep, > > I am not sure if this is what you want. > > I came up with this query > > SELECT * > FROM (SELECT user_id,SUM(amount) as day1 FROM t1 WHERE > create_timestamp = '2009-1-1' GROUP BY "user_id") a > NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day2 FROM t1 WHERE > create_timestamp = '2009-1-2' GROUP BY "user_id") b > NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day3 FROM t1 WHERE > create_timestamp = '2009-1-3' GROUP BY "user_id") c > > The solution is not totally correct because it returns NULL in the > places you return 0. > It seems the SUM() returns NULL when it gets an all NULL column... > Is it problematic for your application ? > > Also, i 'm not sure if I fully understand your last sentence > /lets assume the buckets are fixed i.e 3 only. but I wish to get them > unlimited i.e day 1 to day 20./ > > You say that the buckets are fixed at 3. So, you mean the table output > will always have 4 columns? 3 days plus one for user_id ? > If you want 20 buckets it must be a different query... > > Could you please clarify what you mean when you say that you want to > get a bucket unlimited ? > > Best, > Oliveiros > > ----- Original Message ----- > > *From:* Sandeep <mailto:gibsosmat@gmail.com> > *To:* pgsql-sql@postgresql.org <mailto:pgsql-sql@postgresql.org> > *Sent:* Wednesday, June 24, 2009 5:39 PM > *Subject:* [SQL] Bucketing Row Data in columns > > Hi all, > I need help on creating a sql, not a problem even if its pl/sql > > I have orders table schema is as follow > > orders(order_id,user_id, create_timestamp, amount) > > and I want to generate a report like > for the past 3 days bucketing purchases i.e SUM(amount) every day > in columns > i.e result will be having these columns. > > (user_id, amount_day1, amount_day2, amount_day3) > > ex: > am leaving order_id assume they are auto incrementing and unique, > date format dd/mm/yyyy > (user_id, create_timestamp, amount) > (user1, 01/01/2009,100) > (user1, 01/01/2009,100) > (user2, 01/01/2009,100) > (user2, 02/01/2009,100) > (user2, 02/01/2009,100) > (user1, 02/01/2009,100) > (user2, 03/01/2009,100) > (user2, 03/01/2009,100) > (user3, 03/01/2009,100) > > > result > > (user_id, amount_day1, amount_day2, amount_day3) > (user1, 200, 200, 0) > (user2, 100, 200, 200) > (user3, 0, 0, 100) > > > hope you guys got what I am trying to generate through sql. > > I could get this data in each row, but I want it in columns. > Can anyone help me on this? lets assume the buckets are fixed i.e > 3 only. but I wish to get them unlimited i.e day 1 to day 20. > > Regards > Sandeep Bandela >
I admit that must be a more elegant and faster solution with pl/psql (or whatever other languages) As I don't know nothing about pl/psql I tried with pure sql (if you don't have a hunting dog, hunt with a cat) But obviously this solution doesn't scale well if you have a giant table with lots of columns ----- Original Message ----- From: "Rob Sargent" <robjsargent@gmail.com> To: <pgsql-sql@postgresql.org> Sent: Thursday, June 25, 2009 4:57 PM Subject: Re: [SQL] Bucketing Row Data in columns >I would be suspicious of this sort of solution of turning rows into columns >by mean of a series of correlated sub-selects. Once the data set gets >large and the number of columns goes over 2 or 3 this will in all >likelihood not perform well. I had the pleasure of re-writing a "report" >which was based on count() (similar to sum()) per user_id with the counts >going into various columns per user. 18000 users, a dozen columns from >table of 2 million rows, report took >1,000,000 seconds (yes almost 12 >days) to complete. Re-write runs in 5-10 minutes (now at 10M rows) by >getting the counts as rows (user, item, count) into a temp table and making >the columns from the temp table (pl/psql) Getting the counts takes half >the time, making the flattened report takes half the time. > > > > > Oliveiros Cristina wrote: >> Hello, Sandeep, >> I am not sure if this is what you want. >> I came up with this query >> SELECT * >> FROM (SELECT user_id,SUM(amount) as day1 FROM t1 WHERE create_timestamp = >> '2009-1-1' GROUP BY "user_id") a >> NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day2 FROM t1 WHERE >> create_timestamp = '2009-1-2' GROUP BY "user_id") b >> NATURAL FULL JOIN (SELECT user_id,SUM(amount) as day3 FROM t1 WHERE >> create_timestamp = '2009-1-3' GROUP BY "user_id") c >> The solution is not totally correct because it returns NULL in the >> places you return 0. >> It seems the SUM() returns NULL when it gets an all NULL column... >> Is it problematic for your application ? >> Also, i 'm not sure if I fully understand your last sentence >> /lets assume the buckets are fixed i.e 3 only. but I wish to get them >> unlimited i.e day 1 to day 20./ >> You say that the buckets are fixed at 3. So, you mean the table output >> will always have 4 columns? 3 days plus one for user_id ? >> If you want 20 buckets it must be a different query... >> Could you please clarify what you mean when you say that you want to get >> a bucket unlimited ? >> Best, >> Oliveiros >> >> ----- Original Message ----- >> *From:* Sandeep <mailto:gibsosmat@gmail.com> >> *To:* pgsql-sql@postgresql.org <mailto:pgsql-sql@postgresql.org> >> *Sent:* Wednesday, June 24, 2009 5:39 PM >> *Subject:* [SQL] Bucketing Row Data in columns >> >> Hi all, >> I need help on creating a sql, not a problem even if its pl/sql >> >> I have orders table schema is as follow >> >> orders(order_id,user_id, create_timestamp, amount) >> >> and I want to generate a report like >> for the past 3 days bucketing purchases i.e SUM(amount) every day >> in columns >> i.e result will be having these columns. >> >> (user_id, amount_day1, amount_day2, amount_day3) >> >> ex: >> am leaving order_id assume they are auto incrementing and unique, >> date format dd/mm/yyyy >> (user_id, create_timestamp, amount) >> (user1, 01/01/2009,100) >> (user1, 01/01/2009,100) >> (user2, 01/01/2009,100) >> (user2, 02/01/2009,100) >> (user2, 02/01/2009,100) >> (user1, 02/01/2009,100) >> (user2, 03/01/2009,100) >> (user2, 03/01/2009,100) >> (user3, 03/01/2009,100) >> >> >> result >> >> (user_id, amount_day1, amount_day2, amount_day3) >> (user1, 200, 200, 0) >> (user2, 100, 200, 200) >> (user3, 0, 0, 100) >> >> >> hope you guys got what I am trying to generate through sql. >> >> I could get this data in each row, but I want it in columns. >> Can anyone help me on this? lets assume the buckets are fixed i.e >> 3 only. but I wish to get them unlimited i.e day 1 to day 20. >> >> Regards >> Sandeep Bandela >> > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql
2009/6/25 James Kitambara <jameskitambara@yahoo.co.uk>: > > Hello Mr. Sandeep Bandela, > > I have gone through your scenario and come up with the following solution. > > SELECT USER_ID, CREATE_TIMESTAMP::DATE, SUM(AMOUNT) > FROM ORDERS > GROUP BY USER_ID, CREATE_TIMESTAMP > ORDER BY USER_ID, CREATE_TIMESTAMP; > > Maybe you need to do little modification on the query to get what you want. > Contrib tablefunc/crosstab function may help you. http://www.postgresql.org/docs/current/interactive/tablefunc.html Osvaldo
Assuming you know your dates beforehand, you could try a CASE statement. Something like: select order_id, sum(case when timestamp::date = 01/01/2009'' then amount else 0 end) as amount_day1, sum(case when timestamp::date = '02/01/2009' then amount else 0 end) as amount_day2, sum(case when timestamp::date = '03/01/2009' then amount else 0 end) as amount_day3 from orders group by order_id On Wed, Jun 24, 2009 at 9:39 AM, Sandeep<gibsosmat@gmail.com> wrote: > Hi all, > I need help on creating a sql, not a problem even if its pl/sql > > I have orders table schema is as follow > > orders(order_id,user_id, create_timestamp, amount) > > and I want to generate a report like > for the past 3 days bucketing purchases i.e SUM(amount) every day in columns > i.e result will be having these columns. > > (user_id, amount_day1, amount_day2, amount_day3) > > ex: > am leaving order_id assume they are auto incrementing and unique, date > format dd/mm/yyyy > (user_id, create_timestamp, amount) > (user1, 01/01/2009,100) > (user1, 01/01/2009,100) > (user2, 01/01/2009,100) > (user2, 02/01/2009,100) > (user2, 02/01/2009,100) > (user1, 02/01/2009,100) > (user2, 03/01/2009,100) > (user2, 03/01/2009,100) > (user3, 03/01/2009,100) > > > result > > (user_id, amount_day1, amount_day2, amount_day3) > (user1, 200, 200, 0) > (user2, 100, 200, 200) > (user3, 0, 0, 100) > > > hope you guys got what I am trying to generate through sql. > > I could get this data in each row, but I want it in columns. > Can anyone help me on this? lets assume the buckets are fixed i.e 3 only. > but I wish to get them unlimited i.e day 1 to day 20. > > Regards > Sandeep Bandela