Thread: Aggregation question

Aggregation question

From
"Dilip Angal"
Date:
Hi
 
I have a situation that I need flexible number columns to model the business requirements. It could go up to 1000 columns and will be a sparse matrix. One option I was considering was to have a table with col1......col1000
Other option I can consider is store all of them as name values  in single column as  a string
 
option 1
col1  | col2  |col 3 |........|col56|.....|col77|
10    | 2     |      |        | 4   |     |     |
2     |       |  4   |        |     |.... |6    |
 
option 2
 
"col1=10,col2=2,col56=4"
"col1=2,col3=4,col77=6"
 
I will have about 50Million such records in a table.
 
I need to aggregate the column values for a given day by taking sum of all the rows
 
"col1=12,col2=2,col3=4,col56=4,col77=6"
 
Second option looks very elegant but aggregation is hard
Besides, second option may have performance penalty (Especially when you have 50 Million records)
 
Any one can give any pointers or comments on how to model this an how to aggregate it?
 
Thanks
Dilip
 
 

Re: Aggregation question

From
Hans-Jürgen Schönig
Date:
if you want to put 1000 columns into one table, your data structure 
needs some further investigation. you are trying to solve the wrong problem.
Regards,
    Hans


Dilip Angal wrote:
> Hi
>  
> I have a situation that I need flexible number columns to model the 
> business requirements. It could go up to 1000 columns and will be a 
> sparse matrix. One option I was considering was to have a table with 
> col1......col1000
> Other option I can consider is store all of them as name values  in 
> single column as  a string
>  
> option 1
> col1  | col2  |col 3 |........|col56|.....|col77|
> 10    | 2     |      |        | 4   |     |     |
> 2     |       |  4   |        |     |.... |6    |
>  
> option 2
>  
> "col1=10,col2=2,col56=4"
> "col1=2,col3=4,col77=6"
>  
> I will have about 50Million such records in a table.
>  
> I need to aggregate the column values for a given day by taking sum of 
> all the rows
>  
> "col1=12,col2=2,col3=4,col56=4,col77=6"
>  
> Second option looks very elegant but aggregation is hard
> Besides, second option may have performance penalty (Especially when you 
> have 50 Million records)
>  
> Any one can give any pointers or comments on how to model this an how to 
> aggregate it?
>  
> Thanks
> Dilip
>  
>  


-- 
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at



Re: Aggregation question

From
Bruno Wolff III
Date:
On Tue, Feb 03, 2004 at 21:27:48 -0800,
  Dilip Angal <dilip_angal@yahoo.com> wrote:
> Hi
>
> I have a situation that I need flexible number columns to model the business requirements. It could go up to 1000
columnsand will be a sparse matrix. One option I was considering was to have a table with col1......col1000  

I don't think this is really a hackers' topic. Probably it belongs on general,
though with the sizes you are talking about performance might be appropiate.

I would think that the natural way to reprensent a sparse matrix would be
to use a three column table with row number, column number, and cell value.
Doing aggregates on a column number will be easy.

If the data isn't really a matrix and the different cells are different
data types, you may be able to store all the values as strings and convert
to the appropiate type as needed.

> Other option I can consider is store all of them as name values  in single column as  a string
>
> option 1
> col1  | col2  |col 3 |........|col56|.....|col77|
> 10    | 2     |      |        | 4   |     |     |
> 2     |       |  4   |        |     |.... |6    |
>
> option 2
>
> "col1=10,col2=2,col56=4"
> "col1=2,col3=4,col77=6"
>
> I will have about 50Million such records in a table.
>
> I need to aggregate the column values for a given day by taking sum of all the rows
>
> "col1=12,col2=2,col3=4,col56=4,col77=6"
>
> Second option looks very elegant but aggregation is hard
> Besides, second option may have performance penalty (Especially when you have 50 Million records)
>
> Any one can give any pointers or comments on how to model this an how to aggregate it?
>
> Thanks
> Dilip
>

Re: Aggregation question

From
Oleg Bartunov
Date:
On Sat, 7 Feb 2004, [ISO-8859-1] Hans-J?rgen Sch?nig wrote:

> if you want to put 1000 columns into one table, your data structure
> needs some further investigation. you are trying to solve the wrong problem.

Dilip,

you may try our contrib/hstore from http://www.sai.msu.su/~megera/postgres/gist/
which is sort of perl hash and could provide you some flexibility.
But you may indeed need to think if you really needed such structure :)
Oleg

>
>     Regards,
>
>         Hans
>
>
> Dilip Angal wrote:
> > Hi
> >
> > I have a situation that I need flexible number columns to model the
> > business requirements. It could go up to 1000 columns and will be a
> > sparse matrix. One option I was considering was to have a table with
> > col1......col1000
> > Other option I can consider is store all of them as name values  in
> > single column as  a string
> >
> > option 1
> > col1  | col2  |col 3 |........|col56|.....|col77|
> > 10    | 2     |      |        | 4   |     |     |
> > 2     |       |  4   |        |     |.... |6    |
> >
> > option 2
> >
> > "col1=10,col2=2,col56=4"
> > "col1=2,col3=4,col77=6"
> >
> > I will have about 50Million such records in a table.
> >
> > I need to aggregate the column values for a given day by taking sum of
> > all the rows
> >
> > "col1=12,col2=2,col3=4,col56=4,col77=6"
> >
> > Second option looks very elegant but aggregation is hard
> > Besides, second option may have performance penalty (Especially when you
> > have 50 Million records)
> >
> > Any one can give any pointers or comments on how to model this an how to
> > aggregate it?
> >
> > Thanks
> > Dilip
> >
> >
>
>
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83