Thread: What's the cost of a few extra columns?

What's the cost of a few extra columns?

From
"Announce"
Date:
What's goin on pg-people?

I have a table PRODUCTIONS that is central to the DB and ties a lot of other
information together:

PRODUCTIONS (table)
----------------------------------
prod_id        primary key
type_id        foreign key
level_id        foreign key
tour_id        foreign key
show_id        foreign key
venue_id        foreign key
title            varchar(255); not null indexed
version        char;
details        text
open_date        date
close_date        date
preview_open    date
preview_close    date
perform_tot        int
preview_tot        int
park_info        text
phone_nos        text
some_other_info    text
seating_info    text
this            text
that            text
create_tstmp    timestamptz; NOW()
mod_tstmp        timestamptz;triggered
delete_tstmp    timestamptz;default null
is_complete        bool


As it stands now, there are approximately 25-30 columns on the table. Since
this table is very central to the database, would it be more efficient to
break some of the columns (especially the TEXT ones) out into a separate
INFO table since some queries on the web will not care about all of these
text columns anyway? I know that pg can handle A LOT more columns and if
there IS no performance hit for keeping them all on the same table, I would
like to do that because the relation between PRODUCTIONS and the INFO will
always be 1-to-1.

My implementation of this INFO table would look a little somethin' like
this:

PROD_INFO (table)
-------------------------------
prod_id        pkey/fkey
open_date        date
close_date        date
preview_open    date
preview_close    date
perform_tot        int
preview_tot        int
park_info        text
phone_nos        text
some_other_info    text
seating_info    text
this            text
that            text
(the rest would stay in in the original PRODUCTIONS table)


I am open to ANY suggestions, criticisms, mockery, etc.

Thanks,

Aaron


Re: What's the cost of a few extra columns?

From
"Jim C. Nasby"
Date:
What you're describing is known as vertical partitioning (think of
splitting a table vertically), and can be a good technique for
increasing performance when used properly. The key is to try and get the
average row size down, since that means more rows per page which means
less I/O. Some things to consider:

First rule of performance tuning: don't. In other words, you should be
able to verify with benchmark numbers that a) you need to do this and b)
how much it's actually helping.

How will splitting the table affect *_tstmp, especially mod_tstmp?

How will you handle inserts and joining these two tables together? Will
you always do a left join (preferably via a view), or will you have a
trigger/rule that inserts into production_info whenever a row is
inserted into productions?

On Sun, Oct 09, 2005 at 10:03:33PM -0500, Announce wrote:
> What's goin on pg-people?
>
> I have a table PRODUCTIONS that is central to the DB and ties a lot of other
> information together:
>
> PRODUCTIONS (table)
> ----------------------------------
> prod_id        primary key
> type_id        foreign key
> level_id        foreign key
> tour_id        foreign key
> show_id        foreign key
> venue_id        foreign key
> title            varchar(255); not null indexed
> version        char;
> details        text
> open_date        date
> close_date        date
> preview_open    date
> preview_close    date
> perform_tot        int
> preview_tot        int
> park_info        text
> phone_nos        text
> some_other_info    text
> seating_info    text
> this            text
> that            text
> create_tstmp    timestamptz; NOW()
> mod_tstmp        timestamptz;triggered
> delete_tstmp    timestamptz;default null
> is_complete        bool
>
>
> As it stands now, there are approximately 25-30 columns on the table. Since
> this table is very central to the database, would it be more efficient to
> break some of the columns (especially the TEXT ones) out into a separate
> INFO table since some queries on the web will not care about all of these
> text columns anyway? I know that pg can handle A LOT more columns and if
> there IS no performance hit for keeping them all on the same table, I would
> like to do that because the relation between PRODUCTIONS and the INFO will
> always be 1-to-1.
>
> My implementation of this INFO table would look a little somethin' like
> this:
>
> PROD_INFO (table)
> -------------------------------
> prod_id        pkey/fkey
> open_date        date
> close_date        date
> preview_open    date
> preview_close    date
> perform_tot        int
> preview_tot        int
> park_info        text
> phone_nos        text
> some_other_info    text
> seating_info    text
> this            text
> that            text
> (the rest would stay in in the original PRODUCTIONS table)
>
>
> I am open to ANY suggestions, criticisms, mockery, etc.
>
> Thanks,
>
> Aaron
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: What's the cost of a few extra columns?

From
"Announce"
Date:
Thanks a lot.

Well, if I'm understanding you correctly, then doing the vertical splitting
for some of the text columns WOULD decrease the average row size returned in
my slimmer PRODUCTIONS table. I don't plan on using any of the "prod_info"
columns in a WHERE clause (except open_date and close_date now that I think
of it so they would stay in the original table).

There will be a lot of queries where I just want to return quick pri-key,
prod_name and prod_date results from a PRODUCTION search.  Then, there would
be a detail query that would then need all of the PRODUCTION and INFO data
for a single row.

Thanks again,

Aaron
-----Original Message-----
Subject: Re: [PERFORM] What's the cost of a few extra columns?

What you're describing is known as vertical partitioning (think of
splitting a table vertically), and can be a good technique for
increasing performance when used properly. The key is to try and get the
average row size down, since that means more rows per page which means
less I/O. Some things to consider:

First rule of performance tuning: don't. In other words, you should be
able to verify with benchmark numbers that a) you need to do this and b)
how much it's actually helping.

How will splitting the table affect *_tstmp, especially mod_tstmp?

How will you handle inserts and joining these two tables together? Will
you always do a left join (preferably via a view), or will you have a
trigger/rule that inserts into production_info whenever a row is
inserted into productions?

On Sun, Oct 09, 2005 at 10:03:33PM -0500, Announce wrote: