Thread: join tables vs. denormalization by trigger

join tables vs. denormalization by trigger

From

"Walter Mauritz"

Date:

04 September 2007, 16:00:25

Hi,

I wonder about differences in performance between two scenarios:

Background:
Table A, ~50,000 records
Table B, ~3,000,000 records (~20 cols)
Table C, ~30,000,000 records (~10 cols)

a query every 3sec. with limit 10

Table C depends on Table B wich depends on Table A, int8 foreign key, btree index

* consider it a read only scenario (load data only in night, with time for vacuum analyze daily)
* im required to show records from Table C, but also with some (~5cols) info from Table B
* where clause always contains the foreign key to Table A
* where clause may contain further 1-10 search parameter


Scenario A)
simply inner join Table B + C

Scenario B)
with use of trigger on insert/update I could push the required information from table B down to table C.
-> so i would only require to select from table C.


My question:
1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ?

2) any other tips for such a read only scenario

Thx for any attention :-)
Walter
--
GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail

Re: join tables vs. denormalization by trigger

From

Mark Lewis

Date:

04 September 2007, 16:18:15

On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote:
> Hi,
>
> I wonder about differences in performance between two scenarios:
>
> Background:
> Table A, ~50,000 records
> Table B, ~3,000,000 records (~20 cols)
> Table C, ~30,000,000 records (~10 cols)
>
> a query every 3sec. with limit 10
>
> Table C depends on Table B wich depends on Table A, int8 foreign key, btree index
>
> * consider it a read only scenario (load data only in night, with time for vacuum analyze daily)
> * im required to show records from Table C, but also with some (~5cols) info from Table B
> * where clause always contains the foreign key to Table A
> * where clause may contain further 1-10 search parameter
>
>
> Scenario A)
> simply inner join Table B + C
>
> Scenario B)
> with use of trigger on insert/update I could push the required information from table B down to table C.
> -> so i would only require to select from table C.
>
>
> My question:
> 1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ?

You're assuming that B is always going to be faster than A, which
certainly isn't a foregone conclusion.  Let's say that you average 10
bytes per column.  In scenario A, the total data size is then roughly
3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GiB.  In scenario B due
to your denormalization, the total data size is more like 30,000,000 *
30 * 10 = 9 GiB, or 2.5 times more raw data.

That's a lot of extra disk IO, unless your database will always fit in
memory in both scenarios.

Although you didn't provide enough data to answer with certainty, I
would go on the assumption that A is going to be faster than B.  But
even if it weren't, remember that premature optimization is the root of
all evil.  If you try A and it doesn't perform fast enough, then you can
always try B later to see if it works any better.

-- Mark Lewis

Re: join tables vs. denormalization by trigger

From

"Marc Mamin"

Date:

04 September 2007, 19:48:50

Hello,

I had a similar issue and -atfer testing - decided to merge the tables
B and C into a single table.
In my case the resulting table contains a large proportion of nulls
which limits the size increase...
You'll have to do some testing with your data to evaluate the
performance gain.

Hope to help,

Marc

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Walter
Mauritz
Sent: Tuesday, September 04, 2007 8:53 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] join tables vs. denormalization by trigger

Hi,

I wonder about differences in performance between two scenarios:

Background:
Table A, ~50,000 records
Table B, ~3,000,000 records (~20 cols)
Table C, ~30,000,000 records (~10 cols)

a query every 3sec. with limit 10

Table C depends on Table B wich depends on Table A, int8 foreign key,
btree index

* consider it a read only scenario (load data only in night, with time
for vacuum analyze daily)
* im required to show records from Table C, but also with some (~5cols)
info from Table B
* where clause always contains the foreign key to Table A
* where clause may contain further 1-10 search parameter


Scenario A)
simply inner join Table B + C

Scenario B)
with use of trigger on insert/update I could push the required
information from table B down to table C.
-> so i would only require to select from table C.


My question:
1) From your experience ... how much faster (approximately) in percent
do you regard Scenario B faster than A ?

2) any other tips for such a read only scenario

Thx for any attention :-)
Walter
--
GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match