Thread: Aggregation question
Hi
I have a situation that I need flexible number columns to model the business requirements. It could go up to 1000 columns and will be a sparse matrix. One option I was considering was to have a table with col1......col1000
Other option I can consider is store all of them as name values in single column as a string
option 1
col1 | col2 |col 3 |........|col56|.....|col77|
10 | 2 | | | 4 | | |
2 | | 4 | | |.... |6 |
option 2
"col1=10,col2=2,col56=4"
"col1=2,col3=4,col77=6"
I will have about 50Million such records in a table.
I need to aggregate the column values for a given day by taking sum of all the rows
"col1=12,col2=2,col3=4,col56=4,col77=6"
Second option looks very elegant but aggregation is hard
Besides, second option may have performance penalty (Especially when you have 50 Million records)
Any one can give any pointers or comments on how to model this an how to aggregate it?
Thanks
Dilip
if you want to put 1000 columns into one table, your data structure needs some further investigation. you are trying to solve the wrong problem. Regards, Hans Dilip Angal wrote: > Hi > > I have a situation that I need flexible number columns to model the > business requirements. It could go up to 1000 columns and will be a > sparse matrix. One option I was considering was to have a table with > col1......col1000 > Other option I can consider is store all of them as name values in > single column as a string > > option 1 > col1 | col2 |col 3 |........|col56|.....|col77| > 10 | 2 | | | 4 | | | > 2 | | 4 | | |.... |6 | > > option 2 > > "col1=10,col2=2,col56=4" > "col1=2,col3=4,col77=6" > > I will have about 50Million such records in a table. > > I need to aggregate the column values for a given day by taking sum of > all the rows > > "col1=12,col2=2,col3=4,col56=4,col77=6" > > Second option looks very elegant but aggregation is hard > Besides, second option may have performance penalty (Especially when you > have 50 Million records) > > Any one can give any pointers or comments on how to model this an how to > aggregate it? > > Thanks > Dilip > > -- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/2952/30706 or +43/664/233 90 75 www.cybertec.at, www.postgresql.at, kernel.cybertec.at
On Tue, Feb 03, 2004 at 21:27:48 -0800, Dilip Angal <dilip_angal@yahoo.com> wrote: > Hi > > I have a situation that I need flexible number columns to model the business requirements. It could go up to 1000 columnsand will be a sparse matrix. One option I was considering was to have a table with col1......col1000 I don't think this is really a hackers' topic. Probably it belongs on general, though with the sizes you are talking about performance might be appropiate. I would think that the natural way to reprensent a sparse matrix would be to use a three column table with row number, column number, and cell value. Doing aggregates on a column number will be easy. If the data isn't really a matrix and the different cells are different data types, you may be able to store all the values as strings and convert to the appropiate type as needed. > Other option I can consider is store all of them as name values in single column as a string > > option 1 > col1 | col2 |col 3 |........|col56|.....|col77| > 10 | 2 | | | 4 | | | > 2 | | 4 | | |.... |6 | > > option 2 > > "col1=10,col2=2,col56=4" > "col1=2,col3=4,col77=6" > > I will have about 50Million such records in a table. > > I need to aggregate the column values for a given day by taking sum of all the rows > > "col1=12,col2=2,col3=4,col56=4,col77=6" > > Second option looks very elegant but aggregation is hard > Besides, second option may have performance penalty (Especially when you have 50 Million records) > > Any one can give any pointers or comments on how to model this an how to aggregate it? > > Thanks > Dilip >
On Sat, 7 Feb 2004, [ISO-8859-1] Hans-J?rgen Sch?nig wrote: > if you want to put 1000 columns into one table, your data structure > needs some further investigation. you are trying to solve the wrong problem. Dilip, you may try our contrib/hstore from http://www.sai.msu.su/~megera/postgres/gist/ which is sort of perl hash and could provide you some flexibility. But you may indeed need to think if you really needed such structure :) Oleg > > Regards, > > Hans > > > Dilip Angal wrote: > > Hi > > > > I have a situation that I need flexible number columns to model the > > business requirements. It could go up to 1000 columns and will be a > > sparse matrix. One option I was considering was to have a table with > > col1......col1000 > > Other option I can consider is store all of them as name values in > > single column as a string > > > > option 1 > > col1 | col2 |col 3 |........|col56|.....|col77| > > 10 | 2 | | | 4 | | | > > 2 | | 4 | | |.... |6 | > > > > option 2 > > > > "col1=10,col2=2,col56=4" > > "col1=2,col3=4,col77=6" > > > > I will have about 50Million such records in a table. > > > > I need to aggregate the column values for a given day by taking sum of > > all the rows > > > > "col1=12,col2=2,col3=4,col56=4,col77=6" > > > > Second option looks very elegant but aggregation is hard > > Besides, second option may have performance penalty (Especially when you > > have 50 Million records) > > > > Any one can give any pointers or comments on how to model this an how to > > aggregate it? > > > > Thanks > > Dilip > > > > > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83