Improve DB Size / Performance with Table Refactoring - Mailing list pgsql-performance

From Anthony Presley
Subject Improve DB Size / Performance with Table Refactoring
Date
Msg-id CALL+CapPBoCBCQz6Cb9PfCj6EvRLp5g5Wkiay8zBiirO+NVLjQ@mail.gmail.com
Whole thread Raw
List pgsql-performance
Hi there!

We currently have a database table that's laid out something like this:
  id int
  date1 date
  belongs_to date
  type varchar(1)
  type_fk int
  start_time time
  end_time time
  location_fk int
  department_fk int
  value decimal

Where each row represents some data throughout the day (96 data points for each 15-minute period) - and each "type_fk", department, and location can have up to say, 3 rows for a given start / end time and date (based on the "type").

This table has rapidly grown - we're adding about 1 - 2 million rows per month - and almost all of our queries actually sum up the values based on the belongs_to date and the location_id, however, for other statistics we need to keep the values separate.  The db is now more than 60% of our database, and we want to come up with a better way to store it.  (To speed up other queries, we actually roll this table up into a daily table).

We're considering changing the structure of this table into one of the following structures:

Option [A]:
  id int
  date1 date
  belongs_to date
  type_fk int
  location_fk int
  department_fk int
  value_type1_0 decimal
  .... 
  value_type1_96 decimal
  value_type2_0 decimal
  .... 
  value_type2_96 decimal
  value_type3_0 decimal
  .... 
  value_type3_96 decimal

or, as an alternative:

Option [B]:
  id int
  date1 date
  belongs_to date
  type varchar(1)
  type_fk int
  location_fk int
  department_fk int
  value_type_0 decimal
  .... 
  value_type_96 decimal

We're having a hard time choosing between the two options.  We'll definitely partition either one by the date or belongs_to column to speed up the queries.

Option A would mean that any given date would only have a single row, with all three "types".  However, this table would have 6+96*3 columns, and in many cases at least 96 of those columns would be empty.  More often than not, however, at least half of the columns would be empty (most location's aren't open all day).

Option B would only create rows if the type had data in it, but the other 6 columns would be redundant.  Again, many of the columns might be empty.

... From a space / size perspective, which option is a better choice?

How does PostgreSQL handle storing empty columns?

Thanks!

--
Anthony

pgsql-performance by date:

Previous
From: Matheus de Oliveira
Date:
Subject: Re: Is drop/restore trigger transactional?
Next
From: Jeff Janes
Date:
Subject: Re: Postgres 9.1.4 - high stats collector IO usage