Home > mailing lists

Improve DB Size / Performance with Table Refactoring - Mailing list pgsql-performance

From	Anthony Presley
Subject	Improve DB Size / Performance with Table Refactoring
Date	August 11, 2012 14:23:52
Msg-id	CALL+CapPBoCBCQz6Cb9PfCj6EvRLp5g5Wkiay8zBiirO+NVLjQ@mail.gmail.com Whole thread Raw
List	pgsql-performance

Tree view

Hi there!

We currently have a database table that's laid out something like this:

id int

date1 date

belongs_to date

type varchar(1)

type_fk int

start_time time

end_time time

location_fk int

department_fk int

value decimal

Where each row represents some data throughout the day (96 data points for each 15-minute period) - and each "type_fk", department, and location can have up to say, 3 rows for a given start / end time and date (based on the "type").

This table has rapidly grown - we're adding about 1 - 2 million rows per month - and almost all of our queries actually sum up the values based on the belongs_to date and the location_id, however, for other statistics we need to keep the values separate. The db is now more than 60% of our database, and we want to come up with a better way to store it. (To speed up other queries, we actually roll this table up into a daily table).

We're considering changing the structure of this table into one of the following structures:

Option [A]:

id int

date1 date

belongs_to date

type_fk int

location_fk int

department_fk int

value_type1_0 decimal

....

value_type1_96 decimal

value_type2_0 decimal

....

value_type2_96 decimal

value_type3_0 decimal

....

value_type3_96 decimal

or, as an alternative:

Option [B]:

id int

date1 date

belongs_to date

type varchar(1)

type_fk int

location_fk int

department_fk int

value_type_0 decimal

....

value_type_96 decimal

We're having a hard time choosing between the two options. We'll definitely partition either one by the date or belongs_to column to speed up the queries.

Option A would mean that any given date would only have a single row, with all three "types". However, this table would have 6+96*3 columns, and in many cases at least 96 of those columns would be empty. More often than not, however, at least half of the columns would be empty (most location's aren't open all day).

Option B would only create rows if the type had data in it, but the other 6 columns would be redundant. Again, many of the columns might be empty.

... From a space / size perspective, which option is a better choice?

How does PostgreSQL handle storing empty columns?

Thanks!

Anthony

pgsql-performance by date:

From: Matheus de Oliveira
Date: 11 August 2012, 14:16:06
Subject: Re: Is drop/restore trigger transactional?

From: Jeff Janes
Date: 12 August 2012, 21:23:14
Subject: Re: Postgres 9.1.4 - high stats collector IO usage

Improve DB Size / Performance with Table Refactoring - Mailing list pgsql-performance

Previous

Next