Re: Selecting across Multiple Tables - Mailing list pgsql-general
From | Tielman J de Villiers
---|---
Subject | Re: Selecting across Multiple Tables
Msg-id | E1F206EC93DCD4119A05009027A41359064AD3@sbssvr.bondnet.co.za
In response to | Selecting across Multiple Tables (Tielman J de Villiers <tjdevil@bondnet.co.za>)
List | pgsql-general
Thank you Antonio,

I have played with a couple of possibilities, but none really gives good enough
performance. The main change was creating indices on all the unique_id columns
of the 10 tables, which decreased the select time somewhat.

The main issue, however, is why postgres 7.1 works fine (and quickly) on "WHERE"
statements such as

>Where 1.id = (select last_value from sequence) and
>      2.id = (select last_value from sequence) and
>      3.id = (select last_value from sequence) and
>      4.id = (select last_value from sequence) and

while postgres 6.5 drops dead on it, and is slow on "WHERE" such as

>Where 1.id = (select last_value from sequence) and
>      2.id = 1.id and
>      3.id = 1.id and
>      4.id = 1.id and
>      5.id = 1.id and

Regards,
Tielman J de Villiers
BondNet Pty Ltd

-----Original Message-----
From: Antonio Fiol Bonnín [mailto:fiol@w3ping.com]
Sent: Wednesday, December 05, 2001 3:43 PM
To: Tielman J de Villiers
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Selecting across Multiple Tables

> I am now rewriting the structure to rather insert/update/delete the
> details in 10 little tables, all linked by a unique ID.

I see no reason to do so unless all 10 "attributes" (a, b, c, d, ...) are very
infrequently populated and the tables would otherwise get far too big.

> The problem is selecting ALL details from all 10 the tables in this
> kind of format:
>
> Select a,b,c,
>        d,e,
>        f,g,
>        ...
> From 1,2,3,4,5,6,7,8,9,10
> Where 1.id = (select last_value from sequence) and
>       2.id = 1.id and
>       3.id = 1.id and
>       4.id = 1.id and
>       5.id = 1.id and
>       ...
>
> When I developed the new application, it was on postgres 7.1.3, and
> initially it caused a heavy load on the postmaster. This load was
> substantially reduced by changing the where statement to:
>
> ...
> Where 1.id = (select last_value from sequence) and
>       2.id = (select last_value from sequence) and
>       3.id = (select last_value from sequence) and
>       4.id = (select last_value from sequence) and
>       ...
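The indexing change and the faster query form described above could be sketched as follows; the table, column, and sequence names (t1, t2, unique_id, detail_seq) are illustrative, not from the original schema:

```sql
-- Hypothetical sketch: one index per detail table on the shared unique ID.
-- Names (t1, t2, unique_id, detail_seq) are illustrative, not from the post.
CREATE INDEX t1_uid_idx ON t1 (unique_id);
CREATE INDEX t2_uid_idx ON t2 (unique_id);
-- ...repeat for each of the 10 detail tables...

-- The form that performed well on 7.1: each table is filtered directly
-- against the sequence value, so the planner can take an index-scan path
-- per table instead of joining everything through 1.id.
SELECT t1.a, t1.b, t2.c
FROM t1, t2
WHERE t1.unique_id = (SELECT last_value FROM detail_seq)
  AND t2.unique_id = (SELECT last_value FROM detail_seq);
```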
> When I now tried the same on the live server, running postgres 6.5.3,
> any which way I try it, I get an extremely heavy load on the postmaster
> -- with the last test I had a (cost=737.78 rows=11 width=40) when
> selecting only 1 column from table 1!
>
> Am I misunderstanding the "relational" data model completely, or
> selecting wrongly?

I'm afraid performance should be substantially better with a single table
holding all the attributes, except for situations like:

Contact_name, Company_name, Phone_number, Billing_address

In this case, the billing address is "company-specific", not
"contact-specific", so I'd see something like:

Table1: Contact_name, Phone_number, id_company
Table2: id_company, Company_name, Billing_address

...if and only if you intend to have multiple contacts for each company. If
you intend to have only one contact per company, performance (and file size)
will be better if you stick everything in the same table.

Another situation where splitting tables can give good results is the
following:

Original table: Report_number, magnitudeA, magnitudeB, magnitudeC

If reports usually contain only one magnitude (either A, B, or C), you may
consider splitting that into three tables:

Table A: Report_number, magnitudeA
Table B: Report_number, magnitudeB
Table C: Report_number, magnitudeC

You may substantially improve disk usage, but that depends mostly on how your
data is organised. In general, however, splitting tables is unlikely to
improve performance, as you will (again, in general) need to re-join the
information to access it. Each case is different, and much depends on your
numbers.

As general advice:

EXPLAIN SELECT ...

and then analyse precisely what is happening. A long (high-cost) seq scan is
bad, but an index scan can also be bad if it is not on the best possible
index.
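Antonio's contact/company split could be written as the following DDL; the column types and constraint syntax are assumptions, since the original message gives only column names:

```sql
-- Hedged sketch of the two-table split described above; types and the
-- REFERENCES constraint are assumptions not present in the original message.
CREATE TABLE company (
    id_company      serial PRIMARY KEY,
    company_name    text,
    billing_address text
);

CREATE TABLE contact (
    contact_name text,
    phone_number text,
    id_company   integer REFERENCES company (id_company)
);

-- The cost of the split: reading a contact together with its company's
-- billing address requires a re-join.
SELECT c.contact_name, co.billing_address
FROM contact c, company co
WHERE c.id_company = co.id_company;
```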
I recently optimised a query that used to take about 50 sec and is now
sub-second, just by finding (and creating) the missing index.

In your case, I probably need not say that if you keep the split tables, you
should index all of them on the ID you use to access the data.

I hope that helps, but I could be completely wrong. Good luck!

Antonio
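The index-hunting workflow Antonio describes might look like this; the table and column names are hypothetical, and the plan lines are the general shape of PostgreSQL EXPLAIN output rather than a captured run:

```sql
-- Illustrative EXPLAIN workflow; t1 and unique_id are hypothetical names.
EXPLAIN SELECT a FROM t1 WHERE unique_id = 42;
-- A plan line of the form "Seq Scan on t1 (cost=...)" -- like the
-- (cost=737.78 rows=11 width=40) seen on the live server -- suggests a
-- missing index. After creating one:
CREATE INDEX t1_uid_idx ON t1 (unique_id);
-- the same EXPLAIN should report an index scan on t1_uid_idx instead,
-- with a much lower cost estimate.
```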