Re: Database design - best practice - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Database design - best practice
Date
Msg-id CAMkU=1yCWLchP-F2MvDjyjTJwJv4XASq_dU0Afnjm6-Gt04SNg@mail.gmail.com
In response to Database design - best practice  (Niels Kristian Schjødt <nielskristian@autouncle.com>)
List pgsql-performance
On Wed, Nov 28, 2012 at 4:41 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
>
> So my main concern is actually about the cars table, since this one currently has a lot of columns (151 - I expect
thatsquite a lot?), and a lot of data (4 mil. rows, and growing). Now you might start by thinking, this could sound
likea regular need for some normalization, but wait a second and let me explain :-) 

If you have 151 single-valued pieces of information, then that is what
you have.  You can't tell if something is normalized or not by
counting the columns.

> The columns in this table are for the most part very short strings, integers, decimals or booleans. So take for an example has_automatic_transmission (boolean) - I can't see why it would make sense to put that into a separate table and join in the values.

I can't see why that would make sense, either.  Nor do I think that
doing so would increase the level of normalization.  What rule of
normalization would be served by creating gratuitous joins?

> Or the mileage or the price as another example. The cars table used for search is indexed quite a lot.

How useful are the indices?
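One way to answer that question is to look at the statistics collector; a sketch (the table name is from the thread):

```sql
-- Sketch: find indexes on the cars table that are rarely used.
-- idx_scan counts how often each index was scanned since the
-- statistics were last reset; persistently low counts suggest
-- an index that costs write overhead without helping reads.
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'cars'
ORDER BY idx_scan ASC;
```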

> The questions:
> Having the above setup in mind, what impact on performance, in terms of read performance and write performance, does it have, whether I do the following:
>         1) In general would the read and/or the write on the database be faster, if I serialized some of the not searched columns in the table into a single text column instead of let's say 20 booleans?

Probably not. And could make it much worse, depending on how you
serialize it.  For example, if you use hstore or json, now the "column
names" for each of the 20 booleans are repeated in every row, rather
than being metadata stored only once.  But try it and see.
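To illustrate the key-repetition point (a hypothetical sketch; the column and key names are made up):

```sql
-- Sketch: serializing 20 booleans into one hstore column.
-- Every row now carries the full key names as data, e.g.
--   'has_automatic_transmission=>true, has_abs=>false, ...'
-- whereas ordinary boolean columns store their names only once,
-- in the system catalog, and the values in a compact form.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE cars_serialized (
    id    bigint PRIMARY KEY,
    flags hstore
);

INSERT INTO cars_serialized (id, flags)
VALUES (1, 'has_automatic_transmission=>true, has_abs=>false');
```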

>         2) Let's say I'm updating a timestamp in a single one of the 151 columns in the cars table. The update statement is using the id to find the car. Would the write performance of that UPDATE be affected, if the table had fewer columns?

Yes, but probably not by much.  The biggest effect will be on whether
the timestamp column is indexed.  If it is, then updating it means
that all other indexes on the table will also need to be updated.  If
it is not indexed, then the update can be a HOT update.
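Whether updates are actually coming out HOT can be checked from the statistics views (a sketch; the table name is from the thread, the timestamp column name is an assumption):

```sql
-- Sketch: update a non-indexed timestamp column, then check
-- whether the update was HOT (heap-only tuple): a HOT update
-- rewrites only the heap tuple and touches no indexes at all.
UPDATE cars SET last_seen_at = now() WHERE id = 42;

SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'cars';
```

If `n_tup_hot_upd` tracks `n_tup_upd` closely, the updates are avoiding index maintenance.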


>         3) When adding a new column to the table I know that it becomes slower the more rows are in the table, but what about the "width" of the table - does that affect the performance when adding new columns?

Adding a new column to a table is pretty much instantaneous if the
default value is NULL.
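For example (a sketch; the added column names are made up):

```sql
-- Sketch: fast, catalog-only change - no default, so existing
-- rows are not rewritten, regardless of how many there are.
ALTER TABLE cars ADD COLUMN notes text;

-- Much slower on a 4 mil. row table: in the PostgreSQL versions
-- current at the time of this thread, a non-null default forces
-- a rewrite of every existing row. (PostgreSQL 11 and later
-- avoid the rewrite for constant defaults.)
ALTER TABLE cars ADD COLUMN verified boolean DEFAULT false;
```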

>         4) In general what performance downsides do you get when adding a lot of columns to one table instead of having them in separate tables?

This question cannot be answered in general.  If every time you use
the main table you have to join it to the separate table, then
performance will be bad.  If you almost never have to join to the
separate table, then performance will be better.

>         5) Is it significantly faster to select * from a table with 20 columns, than selecting the same 20 in a table with 150 columns?

If the extra 130 columns are mostly null, the difference will be very
small.  Or, if the where clause is such that you only do a single-row
lookup on a primary key column, for example, the difference will also
be small.

Cheers,

Jeff

