Database design - best practice - Mailing list pgsql-performance

From Niels Kristian Schjødt
Subject Database design - best practice
Date
Msg-id 1101C331-4930-4C1C-BF4D-46BBA2717081@autouncle.com
Whole thread Raw
Responses Re: Database design - best practice  (Willem Leenen <willem_leenen@hotmail.com>)
Re: Database design - best practice  (Vitalii Tymchyshyn <tivv00@gmail.com>)
Re: Database design - best practice  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Hi,

I'm on the hunt for some solid knowledge on a theoretical level about the performance of postgresql. My question is
regardingbest practices, and how architectural decisions might influence the performance. First a little background: 

The setup:
I have a database which holds informations on used cars. The database has mainly 3 tables of interest for this case:
A cars table, an adverts table and a sellers table. One car has many adverts and one seller has many adverts. One
advertbelongs to one car and one seller. 
The database is powering a website for searching used cars. When searching for used cars, the cars table is mainly
used,and a lot of the columns should be directly available for searching e.g. color, milage, price,
has_automatic_transmissionetc. 

So my main concern is actually about the cars table, since this one currently has a lot of columns (151 - I expect
thatsquite a lot?), and a lot of data (4 mil. rows, and growing). Now you might start by thinking, this could sound
likea regular need for some normalization, but wait a second and let me explain :-) 
The columns in this table is for the most very short stings, integers, decimals or booleans. So take for an example
has_automatic_transmission(boolean) I can't see why it would make sense to put that into a separate table and join in
thevalues. Or the milage or the price as another example. The cars table used for search is indexed quite a lot. 

The questions:
Having the above setup in mind, what impact on performance, in terms of read performance and write performance, does it
have,whether I do the following: 
    1) In general would the read and/or the write on the database be faster, if I serialized some of the not searched
columnsin the table into a single text columns instead of let's say 20 booleans? 
    2) Lets say I'm updating a timestamp in a single one of the 151 columns in the cars table. The update statement is
usingthe id to find the car. Would the write performance of that UPDATE be affected, if the table had fewer columns? 
    3) When adding a new column to the table i know that it becomes slower the more rows is in the table, but what
aboutthe "width" of the table does that affect the performance when adding new columns? 
    4) In general what performance downsides do you get when adding a lot of columns to one table instead of having
themin separate tables? 
    5) Is it significantly faster to select * from a table with 20 columns, than selecting the same 20 in a table with
150columns? 

Hope there is some good answers out there :-)

pgsql-performance by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re: pgsql_tmp( Temporary tablespace)
Next
From: Niels Kristian Schjødt
Date:
Subject: Optimize update query