Re: Database design - best practice - Mailing list pgsql-performance
From | Jeff Janes |
---|---|
Subject | Re: Database design - best practice |
Date | |
Msg-id | CAMkU=1yCWLchP-F2MvDjyjTJwJv4XASq_dU0Afnjm6-Gt04SNg@mail.gmail.com Whole thread Raw |
In response to | Database design - best practice (Niels Kristian Schjødt <nielskristian@autouncle.com>) |
List | pgsql-performance |
On Wed, Nov 28, 2012 at 4:41 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote: > > So my main concern is actually about the cars table, since this one currently has a lot of columns (151 - I expect thatsquite a lot?), and a lot of data (4 mil. rows, and growing). Now you might start by thinking, this could sound likea regular need for some normalization, but wait a second and let me explain :-) If you have 151 single-valued pieces of information, than that is what you have. You can't tell if something is normalized or not by counting the columns. > The columns in this table is for the most very short stings, integers, decimals or booleans. So take for an example has_automatic_transmission(boolean) I can't see why it would make sense to put that into a separate table and join in thevalues. I can't see why that would make sense, either. Nor do I think that doing so would increase the level of normalization. What rule of normalization would be served by creating gratuitous joins? > Or the milage or the price as another example. The cars table used for search is indexed quite a lot. How useful are the indices? > The questions: > Having the above setup in mind, what impact on performance, in terms of read performance and write performance, does ithave, whether I do the following: > 1) In general would the read and/or the write on the database be faster, if I serialized some of the not searchedcolumns in the table into a single text columns instead of let's say 20 booleans? Probably not. And could make it much worse, depending on how you serialize it. For example, if you use hstore or json, now the "column names" for each of the 20 booleans are repeated in every row, rather than being metadata stored only once. But try it and see. > 2) Lets say I'm updating a timestamp in a single one of the 151 columns in the cars table. The update statementis using the id to find the car. Would the write performance of that UPDATE be affected, if the table had fewercolumns? Yes, but probably not by much. The biggest effect will be on whether the timestamp column is indexed. If it is, then updating it means that all other indexes on the table will also need to be updated. If it is not indexed, then the update can be a HOT update. > 3) When adding a new column to the table i know that it becomes slower the more rows is in the table, but whatabout the "width" of the table does that affect the performance when adding new columns? Adding a new column to a table is pretty much instantaneous if the default value is NULL. > 4) In general what performance downsides do you get when adding a lot of columns to one table instead of havingthem in separate tables? This question cannot be answered in general. If every time you use the main table you have to join it to the separate table, then performance will be bad. If you almost never have to join to the separate table, then performance will be better. > 5) Is it significantly faster to select * from a table with 20 columns, than selecting the same 20 in a table with150 columns? If the extra 130 columns are mostly null, the difference will be very small. Or, if the where clause is such that you only do a single-row lookup on a primary key column, for example, the difference will also be small. Cheers, Jeff
pgsql-performance by date: