Joins and links - Mailing list pgsql-general
From | Leon |
---|---|
Subject | Joins and links |
Date | |
Msg-id | 13740.990705@udmnet.ru Whole thread Raw |
Responses |
Re: [GENERAL] Joins and links
|
List | pgsql-general |
Hello all! You probably remember me - recently I complained about speed of joins in Postgres. After a short investigation the way was found in which Postgres's optimizer can do the right job. It was constructive discussion. Now I want to tell you what could make Postgres better and faster. And what will make us (our development group) happy. Maybe I am bothering someone, if I do - tell me that. Let me begin. First of all, some accounting reports need to be delivered very fast - within minutes or so. And what's bad is that quite a few of these reports are quite time-consuming and search intensive. In particular, internals of these reports include a lot of joins on tables. Secondly, almost all of accounting information naturally fits into network data model, which can be implemented very efficiently. This stuff described here is not accounting-specific, it can be found in every database which uses master-detail tables and other such types of relations. So. How is join being performed in such cases? Although I am not an expert, I can imagine the way: first there is an (index) scan on first table, and then an (index) scan on the second. It is the best way, reality could be much worse as we have seen. How can we radically improve performance in such cases? There is a simple and quite obvious way. (For you not to think that I am hallucinating I will tell you that there exist some real servers that offer such features I am talking about) We should make a real reference in one table to another! That means there could be special data type called, say, "link", which is a physical record number in the foreign table. Queries could look like this: table1: a int4 b link (->table2) table2: c int4 recnum (system auxiliary field, really a record number in the table) select * from table2 where table1.a > 5 and table1.b = table2.recnum Such joins can fly really fast, as practice shows :) Just consider: the thing table1.b = table2.recnum is a READY-MADE join, so server doesn't have to build anything on top of that. It can simply perform lookup through link, and since it is a physical record number, this is done with the efficiency of C pointers! Thus performance gain is ENORMOUS. And it simplifies the optimizer, because it doesn't have to decide anything about whether to use indices and such like. The join is performed always the same way, and it is the best way. This feature, being implemented, could bring Postgres ahead of most commercial servers, so proving creative abilities of free software community. Let us make a step in the future! Best regards, Leon
pgsql-general by date: