Re: join tables vs. denormalization by trigger - Mailing list pgsql-performance

From Mark Lewis
Subject Re: join tables vs. denormalization by trigger
Date
Msg-id 1188933480.5367.40.camel@archimedes
Whole thread Raw
In response to join tables vs. denormalization by trigger  ("Walter Mauritz" <waltermauritz@gmx.at>)
List pgsql-performance
On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote:
> Hi,
>
> I wonder about differences in performance between two scenarios:
>
> Background:
> Table A, ~50,000 records
> Table B, ~3,000,000 records (~20 cols)
> Table C, ~30,000,000 records (~10 cols)
>
> a query every 3sec. with limit 10
>
> Table C depends on Table B wich depends on Table A, int8 foreign key, btree index
>
> * consider it a read only scenario (load data only in night, with time for vacuum analyze daily)
> * im required to show records from Table C, but also with some (~5cols) info from Table B
> * where clause always contains the foreign key to Table A
> * where clause may contain further 1-10 search parameter
>
>
> Scenario A)
> simply inner join Table B + C
>
> Scenario B)
> with use of trigger on insert/update I could push the required information from table B down to table C.
> -> so i would only require to select from table C.
>
>
> My question:
> 1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ?

You're assuming that B is always going to be faster than A, which
certainly isn't a foregone conclusion.  Let's say that you average 10
bytes per column.  In scenario A, the total data size is then roughly
3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GiB.  In scenario B due
to your denormalization, the total data size is more like 30,000,000 *
30 * 10 = 9 GiB, or 2.5 times more raw data.

That's a lot of extra disk IO, unless your database will always fit in
memory in both scenarios.

Although you didn't provide enough data to answer with certainty, I
would go on the assumption that A is going to be faster than B.  But
even if it weren't, remember that premature optimization is the root of
all evil.  If you try A and it doesn't perform fast enough, then you can
always try B later to see if it works any better.

-- Mark Lewis

pgsql-performance by date:

Previous
From: Bill Moran
Date:
Subject: Re: About autovacuum
Next
From: "Carlo Stonebanks"
Date:
Subject: Performance on 8CPU's and 32GB of RAM