Joins and links - Mailing list pgsql-general

From Leon
Subject Joins and links
Date
Msg-id 13740.990705@udmnet.ru
Whole thread Raw
Responses Re: [GENERAL] Joins and links
List pgsql-general
Hello all!

 You probably remember me - recently I complained about speed
 of joins in Postgres. After a short investigation the way was
 found in which Postgres's optimizer can do the right job. It
 was constructive discussion. Now I want to tell you what could
 make Postgres better and faster. And what will make us (our
 development group) happy. Maybe I am bothering someone, if
 I do - tell me that.

 Let me begin.

 First of all, some accounting reports need to be delivered
 very fast - within minutes or so. And what's bad is that
 quite a few of these reports are quite time-consuming and search
 intensive. In particular, internals of these reports include
 a lot of joins on tables.

 Secondly, almost all of accounting information naturally
 fits into network data model, which can be implemented very
 efficiently.

 This stuff described here is not accounting-specific, it
 can be found in every database which uses master-detail
 tables and other such types of relations.

 So. How is join being performed in such cases? Although I am
 not an expert, I can imagine the way: first there is an (index)
 scan on first table, and then an (index) scan on the second.
 It is the best way, reality could be much worse as we have seen.

 How can we radically improve performance in such cases? There
 is a simple and quite obvious way. (For you not to think that
 I am hallucinating I will tell you that there exist some
 real servers that offer such features I am talking about)
 We should make a real reference in one table to another! That
 means there could be special data type called, say, "link",
 which is a physical record number in the foreign table.

 Queries could look like this:

 table1:
 a int4
 b link (->table2)

 table2:
 c int4
 recnum (system auxiliary field, really a record number in the table)

 select * from table2 where table1.a > 5 and table1.b = table2.recnum

 Such joins can fly really fast, as practice shows :)
 Just consider: the thing table1.b = table2.recnum is a READY-MADE
 join, so server doesn't have to build anything on top of that. It
 can simply perform lookup through link, and since it is a physical
 record number, this is done with the efficiency of C pointers! Thus
 performance gain is ENORMOUS.

 And it simplifies the optimizer, because it doesn't have to decide
 anything about whether to use indices and such like. The join is
 performed always the same way, and it is the best way.

 This feature, being implemented, could bring Postgres ahead
 of most commercial servers, so proving creative abilities of
 free software community. Let us make a step in the future!

Best regards,
 Leon



pgsql-general by date:

Previous
From: Herouth Maoz
Date:
Subject: Re: [GENERAL] Curiousity about indexing, again
Next
From: "Ansley, Michael"
Date:
Subject: EXPLAIN