Thread: join tables vs. denormalization by trigger
Hi, I wonder about differences in performance between two scenarios: Background: Table A, ~50,000 records Table B, ~3,000,000 records (~20 cols) Table C, ~30,000,000 records (~10 cols) a query every 3sec. with limit 10 Table C depends on Table B wich depends on Table A, int8 foreign key, btree index * consider it a read only scenario (load data only in night, with time for vacuum analyze daily) * im required to show records from Table C, but also with some (~5cols) info from Table B * where clause always contains the foreign key to Table A * where clause may contain further 1-10 search parameter Scenario A) simply inner join Table B + C Scenario B) with use of trigger on insert/update I could push the required information from table B down to table C. -> so i would only require to select from table C. My question: 1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ? 2) any other tips for such a read only scenario Thx for any attention :-) Walter -- GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS. Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail
On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote: > Hi, > > I wonder about differences in performance between two scenarios: > > Background: > Table A, ~50,000 records > Table B, ~3,000,000 records (~20 cols) > Table C, ~30,000,000 records (~10 cols) > > a query every 3sec. with limit 10 > > Table C depends on Table B wich depends on Table A, int8 foreign key, btree index > > * consider it a read only scenario (load data only in night, with time for vacuum analyze daily) > * im required to show records from Table C, but also with some (~5cols) info from Table B > * where clause always contains the foreign key to Table A > * where clause may contain further 1-10 search parameter > > > Scenario A) > simply inner join Table B + C > > Scenario B) > with use of trigger on insert/update I could push the required information from table B down to table C. > -> so i would only require to select from table C. > > > My question: > 1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ? You're assuming that B is always going to be faster than A, which certainly isn't a foregone conclusion. Let's say that you average 10 bytes per column. In scenario A, the total data size is then roughly 3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GiB. In scenario B due to your denormalization, the total data size is more like 30,000,000 * 30 * 10 = 9 GiB, or 2.5 times more raw data. That's a lot of extra disk IO, unless your database will always fit in memory in both scenarios. Although you didn't provide enough data to answer with certainty, I would go on the assumption that A is going to be faster than B. But even if it weren't, remember that premature optimization is the root of all evil. If you try A and it doesn't perform fast enough, then you can always try B later to see if it works any better. -- Mark Lewis
Hello, I had a similar issue and -atfer testing - decided to merge the tables B and C into a single table. In my case the resulting table contains a large proportion of nulls which limits the size increase... You'll have to do some testing with your data to evaluate the performance gain. Hope to help, Marc -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Walter Mauritz Sent: Tuesday, September 04, 2007 8:53 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] join tables vs. denormalization by trigger Hi, I wonder about differences in performance between two scenarios: Background: Table A, ~50,000 records Table B, ~3,000,000 records (~20 cols) Table C, ~30,000,000 records (~10 cols) a query every 3sec. with limit 10 Table C depends on Table B wich depends on Table A, int8 foreign key, btree index * consider it a read only scenario (load data only in night, with time for vacuum analyze daily) * im required to show records from Table C, but also with some (~5cols) info from Table B * where clause always contains the foreign key to Table A * where clause may contain further 1-10 search parameter Scenario A) simply inner join Table B + C Scenario B) with use of trigger on insert/update I could push the required information from table B down to table C. -> so i would only require to select from table C. My question: 1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ? 2) any other tips for such a read only scenario Thx for any attention :-) Walter -- GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS. Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match