Re: Approach to Data Summary and Analysis - Mailing list pgsql-general

From Rob Sargent
Subject Re: Approach to Data Summary and Analysis
Date
Msg-id 534D5800.1060400@gmail.com
Whole thread Raw
In response to Re: Approach to Data Summary and Analysis  (Robert DiFalco <robert.difalco@gmail.com>)
List pgsql-general
On 04/15/2014 09:53 AM, Robert DiFalco wrote:
> Actually that was exactly the initial table design. There were more
> fields because for my use case there were a lot more states and
> certain states have additional data (for example when a call goes from
> answered to connected it also gets the user_id of the person being
> connected to). So that one table started getting a LOT of columns
> which starting making it hard to reason about.
>
> The more normalized version has a couple of things going for it.
> COUNT, MIN, MAX, etc are very fast because I don't have to
> conditionally add null checks. Everything is inserted so for the
> millions of calls that get made the normalized schema was much more
> efficient for writing. It was also easier to understand. The answer
> table only has calls that were answered, the error table only has
> calls the resulted in an error after being connected, etc.
>
> I know this kind of gets into a religious area when discussing NULLs
> and what level of normalization is appropriate so I don't want to
> spark any of that on this thread. But only doing inserts and never
> doing updates or deletes performed very well for large data sets.
>
> That said, I could explore a compromise between the monolithic table
> approach and the completely normalized set of tables approach. Thanks
> for your input!
>
I wonder if the "LOT of columns" are the bits that need to be parcelled
off as specific to one condition of a call?


pgsql-general by date:

Previous
From: Robert DiFalco
Date:
Subject: Re: Approach to Data Summary and Analysis
Next
From: David G Johnston
Date:
Subject: Re: [GENARAL] round() bug