Thread: How number of columns affects performance
If a table which will be heavily used has numerous fields, yet only a handfull of them will be used heavily, would it make sense performance wise to split it? Example Table 1 Field 1 .... Field 100 Table 2 References Field 1 of table1 ..... Table n References Field 1 of table 1 So table 1 basically will be referenced by many tables and most of the time only a handfull of fields of table 1 are needed. Don't have exact numbers, but let's say that more than 60% of queries to table 1 queries only use 20 fields or less. If I split Table 1 then the second table will basically be a 1 to 1 to Table 1. I have this simmilar scenario for two tables. One is close to 1 Million records and the other is about 300,000 records. Programming wise it is much easier to only have one table, but I am just concerned about performance. Most access to these tables will be indexed with some occassional sequential scans. Number of concurrent users now is probably 10 or less. Expect to grow to 20+ concurrent connections. Will this be more of an issue if I had hundreds/thousands of users?
On Fri, 2003-08-01 at 11:08, Francisco Reyes wrote: > If a table which will be heavily used has numerous fields, yet only a > handfull of them will be used heavily, would it make sense performance wise to split it? > > Example > Table 1 > Field 1 > .... > Field 100 > > Table 2 > References Field 1 of table1 > ..... > > Table n > References Field 1 of table 1 > > So table 1 basically will be referenced by many tables and most of the > time only a handfull of fields of table 1 are needed. Don't have exact > numbers, but let's say that more than 60% of queries to table 1 queries > only use 20 fields or less. > > If I split Table 1 then the second table will basically be a 1 to 1 to > Table 1. Do all 100 fields *really* all refer to the same *one* entity, with no repeating values, etc? If not, then good design says to split the table. Also, if it's a high-activity table, but you only rarely need fields 60-90, then splitting them out to their own table might be useful (especially if some of those fields are large *CHAR or TEXT). > I have this simmilar scenario for two tables. One is close to 1 Million > records and the other is about 300,000 records. > > Programming wise it is much easier to only have one table, but I am just > concerned about performance. > > Most access to these tables will be indexed with some occassional > sequential scans. Number of concurrent users now is probably 10 or less. > Expect to grow to 20+ concurrent connections. Will this be more of an > issue if I had hundreds/thousands of users? > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
On Fri, 1 Aug 2003, Ron Johnson wrote: > Do all 100 fields *really* all refer to the same *one* entity, > with no repeating values, etc? Yes all fields belong to the same entity. I used 100 as an example it may be something like 60 to 80 fields (there are two tables in question). I don't formally do 3rd normal form, but for the most part I do most of the general concepts of normalization. > If not, then good design says to split the table. The original data was in Foxpro tables and I have made better normalized tables in PostgreSQL. > Also, if it's a high-activity table, but you only rarely need fields > 60-90, then splitting them out to their own table might be useful > (especially if some of those fields are large *CHAR or TEXT). Yes some of the fields are varchars. 5 fields are varchar(22) and 3 longer (35, 58, 70). The total row length is a little over 400 characters in Foxpro. In postgreSQL may be less than 300 (ie Foxpro uses ASCII representation for numbers so to store "1234567" it uses 7 bytes, whereas in PostgreSQL I can just make it an int and use 4 bytes)
On Fri, 2003-08-01 at 12:14, Francisco J Reyes wrote: > On Fri, 1 Aug 2003, Ron Johnson wrote: > > > Do all 100 fields *really* all refer to the same *one* entity, > > with no repeating values, etc? > > Yes all fields belong to the same entity. I used 100 as an example it may > be something like 60 to 80 fields (there are two tables in question). I > don't formally do 3rd normal form, but for the most part I do most of > the general concepts of normalization. Woo hoo!! > Yes some of the fields are varchars. 5 fields are varchar(22) and 3 longer > (35, 58, 70). The total row length is a little over 400 characters in > Foxpro. In postgreSQL may be less than 300 (ie Foxpro uses ASCII > representation for numbers so to store "1234567" it uses 7 bytes, whereas > in PostgreSQL I can just make it an int and use 4 bytes) But I'd only split if these big field are rarely used. Note that VARCHAR(xx) removes trailing spaces, so that also is a factor. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
Francisco, > Yes all fields belong to the same entity. I used 100 as an example it may > be something like 60 to 80 fields (there are two tables in question). I > don't formally do 3rd normal form, but for the most part I do most of > the general concepts of normalization. > > > If not, then good design says to split the table. Actually, no, it doesn't. If all 60-80 fields are unitary and required characteristics of the row-entity, normalization says keep them in one table. The only time NF would recommend splitting the table is for fields which are frequenly NULL for reasons other than missing data entry. For those, you'd create a child table. Although while this is good 4NF, it's impractical in PostgreSQL, where queries with several LEFT OUTER JOINs tend to be very slow indeed. My attitude toward these normalization vs. performance issues is consistenly the same: First, verify that you have a problem. That is, build the database with everything in one table (or with child tables for Nullable fields, as above) and try to run your application. If performance is appalling, *then* take denormalization steps to improve it. I'm frequently distressed by the number of developers who make questionable design decisions "for performance reasons" without every verifying that they were, in fact, improving performance ... -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, 2003-08-01 at 12:44, Josh Berkus wrote: > Francisco, > > > Yes all fields belong to the same entity. I used 100 as an example it may > > be something like 60 to 80 fields (there are two tables in question). I > > don't formally do 3rd normal form, but for the most part I do most of > > the general concepts of normalization. > > > > > If not, then good design says to split the table. > > Actually, no, it doesn't. If all 60-80 fields are unitary and required > characteristics of the row-entity, normalization says keep them in one table. You snipped out too much, because that's exactly what I said... Another way of writing it: only split the table if some of the fields are not unitary to the entity. > The only time NF would recommend splitting the table is for fields which are > frequenly NULL for reasons other than missing data entry. For those, you'd > create a child table. Although while this is good 4NF, it's impractical in > PostgreSQL, where queries with several LEFT OUTER JOINs tend to be very slow > indeed. Good to know. > My attitude toward these normalization vs. performance issues is consistenly > the same: First, verify that you have a problem. That is, build the > database with everything in one table (or with child tables for Nullable > fields, as above) and try to run your application. If performance is > appalling, *then* take denormalization steps to improve it. The OP was not talking about denormalizing ... It was: will vertically partitioning a table increase performance. And the answer is "sometimes", -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
Ron, > You snipped out too much, because that's exactly what I said... > Another way of writing it: only split the table if some of the fields > are not unitary to the entity. Sorry! No offense meant. -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, 1 Aug 2003, Josh Berkus wrote: > My attitude toward these normalization vs. performance issues is consistenly > the same: First, verify that you have a problem. That is, build the > database with everything in one table (or with child tables for Nullable > fields, as above) and try to run your application. If performance is > appalling, *then* take denormalization steps to improve it. I think I understand your point, however it would be very laborious after you do all development to find out you need to de-normalize. On your experience at which point it would actually help to do this de-normalization in PostgreSQL? I know there are numerous factors ,but any feedback based on previous experiences would help. Right now the work I am doing is only for company internal use. If I was to ever do work that outside users would have access to then I would be looking at 100+ concurrent users.
Francisco, > I think I understand your point, however it would be very laborious after > you do all development to find out you need to de-normalize. Not terribly. Views and Rules are good for this. > On your experience at which point it would actually help to do this > de-normalization in PostgreSQL? I know there are numerous factors ,but any > feedback based on previous experiences would help. My experience? If you're running on good hardware, it's completely unnecessary to vertically partition the table. The only thing I'd do would be to look for columns which are frequently NULL and can be grouped together, and spin those off into a sub-table. That is, if you have 4 columns which are generally either all null or all filled, and are all null for 70% of records then those 4 could make a nice child table. -- -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco