Thread: What's the cost of a few extra columns?
What's goin on pg-people? I have a table PRODUCTIONS that is central to the DB and ties a lot of other information together: PRODUCTIONS (table) ---------------------------------- prod_id primary key type_id foreign key level_id foreign key tour_id foreign key show_id foreign key venue_id foreign key title varchar(255); not null indexed version char; details text open_date date close_date date preview_open date preview_close date perform_tot int preview_tot int park_info text phone_nos text some_other_info text seating_info text this text that text create_tstmp timestamptz; NOW() mod_tstmp timestamptz;triggered delete_tstmp timestamptz;default null is_complete bool As it stands now, there are approximately 25-30 columns on the table. Since this table is very central to the database, would it be more efficient to break some of the columns (especially the TEXT ones) out into a separate INFO table since some queries on the web will not care about all of these text columns anyway? I know that pg can handle A LOT more columns and if there IS no performance hit for keeping them all on the same table, I would like to do that because the relation between PRODUCTIONS and the INFO will always be 1-to-1. My implementation of this INFO table would look a little somethin' like this: PROD_INFO (table) ------------------------------- prod_id pkey/fkey open_date date close_date date preview_open date preview_close date perform_tot int preview_tot int park_info text phone_nos text some_other_info text seating_info text this text that text (the rest would stay in in the original PRODUCTIONS table) I am open to ANY suggestions, criticisms, mockery, etc. Thanks, Aaron
What you're describing is known as vertical partitioning (think of splitting a table vertically), and can be a good technique for increasing performance when used properly. The key is to try and get the average row size down, since that means more rows per page which means less I/O. Some things to consider: First rule of performance tuning: don't. In other words, you should be able to verify with benchmark numbers that a) you need to do this and b) how much it's actually helping. How will splitting the table affect *_tstmp, especially mod_tstmp? How will you handle inserts and joining these two tables together? Will you always do a left join (preferably via a view), or will you have a trigger/rule that inserts into production_info whenever a row is inserted into productions? On Sun, Oct 09, 2005 at 10:03:33PM -0500, Announce wrote: > What's goin on pg-people? > > I have a table PRODUCTIONS that is central to the DB and ties a lot of other > information together: > > PRODUCTIONS (table) > ---------------------------------- > prod_id primary key > type_id foreign key > level_id foreign key > tour_id foreign key > show_id foreign key > venue_id foreign key > title varchar(255); not null indexed > version char; > details text > open_date date > close_date date > preview_open date > preview_close date > perform_tot int > preview_tot int > park_info text > phone_nos text > some_other_info text > seating_info text > this text > that text > create_tstmp timestamptz; NOW() > mod_tstmp timestamptz;triggered > delete_tstmp timestamptz;default null > is_complete bool > > > As it stands now, there are approximately 25-30 columns on the table. Since > this table is very central to the database, would it be more efficient to > break some of the columns (especially the TEXT ones) out into a separate > INFO table since some queries on the web will not care about all of these > text columns anyway? I know that pg can handle A LOT more columns and if > there IS no performance hit for keeping them all on the same table, I would > like to do that because the relation between PRODUCTIONS and the INFO will > always be 1-to-1. > > My implementation of this INFO table would look a little somethin' like > this: > > PROD_INFO (table) > ------------------------------- > prod_id pkey/fkey > open_date date > close_date date > preview_open date > preview_close date > perform_tot int > preview_tot int > park_info text > phone_nos text > some_other_info text > seating_info text > this text > that text > (the rest would stay in in the original PRODUCTIONS table) > > > I am open to ANY suggestions, criticisms, mockery, etc. > > Thanks, > > Aaron > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Thanks a lot. Well, if I'm understanding you correctly, then doing the vertical splitting for some of the text columns WOULD decrease the average row size returned in my slimmer PRODUCTIONS table. I don't plan on using any of the "prod_info" columns in a WHERE clause (except open_date and close_date now that I think of it so they would stay in the original table). There will be a lot of queries where I just want to return quick pri-key, prod_name and prod_date results from a PRODUCTION search. Then, there would be a detail query that would then need all of the PRODUCTION and INFO data for a single row. Thanks again, Aaron -----Original Message----- Subject: Re: [PERFORM] What's the cost of a few extra columns? What you're describing is known as vertical partitioning (think of splitting a table vertically), and can be a good technique for increasing performance when used properly. The key is to try and get the average row size down, since that means more rows per page which means less I/O. Some things to consider: First rule of performance tuning: don't. In other words, you should be able to verify with benchmark numbers that a) you need to do this and b) how much it's actually helping. How will splitting the table affect *_tstmp, especially mod_tstmp? How will you handle inserts and joining these two tables together? Will you always do a left join (preferably via a view), or will you have a trigger/rule that inserts into production_info whenever a row is inserted into productions? On Sun, Oct 09, 2005 at 10:03:33PM -0500, Announce wrote: