Thread: How number of columns affects performance

How number of columns affects performance

From
Francisco Reyes
Date:
If a table which will be heavily used has numerous fields, yet only a
handfull of them will be used heavily, would it make sense performance wise to split it?

Example
Table 1
Field 1
....
Field 100

Table 2
References Field 1 of table1
.....

Table n
References Field 1 of table 1

So table 1 basically will be referenced by many tables and most of the
time only a handfull of fields  of table 1 are needed. Don't have exact
numbers, but let's say that more than 60% of queries to table 1 queries
only use 20 fields or less.

If I split Table 1 then the second table will basically be a 1 to 1 to
Table 1.

I have this simmilar scenario for two tables. One is close to 1 Million
records and the other is about 300,000 records.

Programming wise it is much easier to only have one table, but I am just
concerned about performance.

Most access to these tables will be indexed with some occassional
sequential scans. Number of concurrent users now is probably 10 or less.
Expect to grow to 20+ concurrent connections. Will this be more of an
issue if I had hundreds/thousands of users?

Re: How number of columns affects performance

From
Ron Johnson
Date:
On Fri, 2003-08-01 at 11:08, Francisco Reyes wrote:
> If a table which will be heavily used has numerous fields, yet only a
> handfull of them will be used heavily, would it make sense performance wise to split it?
>
> Example
> Table 1
> Field 1
> ....
> Field 100
>
> Table 2
> References Field 1 of table1
> .....
>
> Table n
> References Field 1 of table 1
>
> So table 1 basically will be referenced by many tables and most of the
> time only a handfull of fields  of table 1 are needed. Don't have exact
> numbers, but let's say that more than 60% of queries to table 1 queries
> only use 20 fields or less.
>
> If I split Table 1 then the second table will basically be a 1 to 1 to
> Table 1.

Do all 100 fields *really* all refer to the same *one* entity,
with no repeating values, etc?
If not, then good design says to split the table.

Also, if it's a high-activity table, but you only rarely need fields
60-90, then splitting them out to their own table might be useful
(especially if some of those fields are large *CHAR or TEXT).

> I have this simmilar scenario for two tables. One is close to 1 Million
> records and the other is about 300,000 records.
>
> Programming wise it is much easier to only have one table, but I am just
> concerned about performance.
>
> Most access to these tables will be indexed with some occassional
> sequential scans. Number of concurrent users now is probably 10 or less.
> Expect to grow to 20+ concurrent connections. Will this be more of an
> issue if I had hundreds/thousands of users?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: How number of columns affects performance

From
Francisco J Reyes
Date:
On Fri, 1 Aug 2003, Ron Johnson wrote:

> Do all 100 fields *really* all refer to the same *one* entity,
> with no repeating values, etc?

Yes all fields belong to the same entity. I used 100 as an example it may
be something like 60 to 80 fields (there are two tables in question). I
don't formally do 3rd normal form, but for the most part I do most of
the general concepts of normalization.

> If not, then good design says to split the table.

The original data was in Foxpro tables and I have made better normalized
tables in PostgreSQL.


> Also, if it's a high-activity table, but you only rarely need fields
> 60-90, then splitting them out to their own table might be useful
> (especially if some of those fields are large *CHAR or TEXT).

Yes some of the fields are varchars. 5 fields are varchar(22) and 3 longer
(35, 58, 70). The total row length is a little over 400 characters in
Foxpro. In postgreSQL may be less than 300 (ie Foxpro uses ASCII
representation for numbers so to store "1234567" it uses 7 bytes, whereas
in PostgreSQL I can just make it an int and use 4 bytes)

Re: How number of columns affects performance

From
Ron Johnson
Date:
On Fri, 2003-08-01 at 12:14, Francisco J Reyes wrote:
> On Fri, 1 Aug 2003, Ron Johnson wrote:
>
> > Do all 100 fields *really* all refer to the same *one* entity,
> > with no repeating values, etc?
>
> Yes all fields belong to the same entity. I used 100 as an example it may
> be something like 60 to 80 fields (there are two tables in question). I
> don't formally do 3rd normal form, but for the most part I do most of
> the general concepts of normalization.

Woo hoo!!

> Yes some of the fields are varchars. 5 fields are varchar(22) and 3 longer
> (35, 58, 70). The total row length is a little over 400 characters in
> Foxpro. In postgreSQL may be less than 300 (ie Foxpro uses ASCII
> representation for numbers so to store "1234567" it uses 7 bytes, whereas
> in PostgreSQL I can just make it an int and use 4 bytes)

But I'd only split if these big field are rarely used.  Note that
VARCHAR(xx) removes trailing spaces, so that also is a factor.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: How number of columns affects performance

From
Josh Berkus
Date:
Francisco,

> Yes all fields belong to the same entity. I used 100 as an example it may
> be something like 60 to 80 fields (there are two tables in question). I
> don't formally do 3rd normal form, but for the most part I do most of
> the general concepts of normalization.
>
> > If not, then good design says to split the table.

Actually, no, it doesn't.   If all 60-80 fields are unitary and required
characteristics of the row-entity, normalization says keep them in one table.

The only time NF would recommend splitting the table is for fields which are
frequenly NULL for reasons other than missing data entry.  For those, you'd
create a child table.  Although while this is good 4NF, it's impractical in
PostgreSQL, where queries with several LEFT OUTER JOINs tend to be very slow
indeed.

My attitude toward these normalization vs. performance issues is consistenly
the same:  First, verify that you have a problem.   That is, build the
database with everything in one table (or with child tables for Nullable
fields, as above) and try to run your application.  If performance is
appalling, *then* take denormalization steps to improve it.

I'm frequently distressed by the number of developers who make questionable
design decisions "for performance reasons" without every verifying that they
were, in fact, improving performance ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: How number of columns affects performance

From
Ron Johnson
Date:
On Fri, 2003-08-01 at 12:44, Josh Berkus wrote:
> Francisco,
>
> > Yes all fields belong to the same entity. I used 100 as an example it may
> > be something like 60 to 80 fields (there are two tables in question). I
> > don't formally do 3rd normal form, but for the most part I do most of
> > the general concepts of normalization.
> >
> > > If not, then good design says to split the table.
>
> Actually, no, it doesn't.   If all 60-80 fields are unitary and required
> characteristics of the row-entity, normalization says keep them in one table.

You snipped out too much, because that's exactly what I said...
Another way of writing it: only split the table if some of the fields
are not unitary to the entity.

> The only time NF would recommend splitting the table is for fields which are
> frequenly NULL for reasons other than missing data entry.  For those, you'd
> create a child table.  Although while this is good 4NF, it's impractical in
> PostgreSQL, where queries with several LEFT OUTER JOINs tend to be very slow
> indeed.

Good to know.

> My attitude toward these normalization vs. performance issues is consistenly
> the same:  First, verify that you have a problem.   That is, build the
> database with everything in one table (or with child tables for Nullable
> fields, as above) and try to run your application.  If performance is
> appalling, *then* take denormalization steps to improve it.

The OP was not talking about denormalizing ...

It was: will vertically partitioning a table increase performance.
And the answer is "sometimes",

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: How number of columns affects performance

From
Josh Berkus
Date:
Ron,

> You snipped out too much, because that's exactly what I said...
> Another way of writing it: only split the table if some of the fields
> are not unitary to the entity.

Sorry!  No offense meant.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: How number of columns affects performance

From
Francisco J Reyes
Date:
On Fri, 1 Aug 2003, Josh Berkus wrote:

> My attitude toward these normalization vs. performance issues is consistenly
> the same:  First, verify that you have a problem.   That is, build the
> database with everything in one table (or with child tables for Nullable
> fields, as above) and try to run your application.  If performance is
> appalling, *then* take denormalization steps to improve it.

I think I understand your point, however it would be very laborious after
you do all development to find out you need to de-normalize.

On your experience at which point it would actually help to do this
de-normalization in PostgreSQL? I know there are numerous factors ,but any
feedback based on previous experiences would help.

Right now the work I am doing is only for company internal use. If I was
to ever do work that outside users would have access to then I would be
looking at 100+ concurrent users.

Re: How number of columns affects performance

From
Josh Berkus
Date:
Francisco,

> I think I understand your point, however it would be very laborious after
> you do all development to find out you need to de-normalize.

Not terribly.   Views and Rules are good for this.

> On your experience at which point it would actually help to do this
> de-normalization in PostgreSQL? I know there are numerous factors ,but any
> feedback based on previous experiences would help.

My experience?   If you're running on good hardware, it's completely
unnecessary to vertically partition the table.   The only thing I'd do would
be to look for columns which are frequently NULL and can be grouped together,
and spin those off into a sub-table.   That is, if you have 4 columns which
are generally either all null or all filled, and are all null for 70% of
records then those 4 could make a nice child table.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco