Re: Research/Implementation of Nested Loop Join optimization - Mailing list pgsql-hackers
From | Manoel Henrique |
---|---|
Subject | Re: Research/Implementation of Nested Loop Join optimization |
Date | |
Msg-id | 5b27a1ad0807231347o53a08dccj26bea6b04e541448@mail.gmail.com Whole thread Raw |
In response to | Re: Research/Implementation of Nested Loop Join optimization ("Dann Corbit" <DCorbit@connx.com>) |
Responses |
Re: Research/Implementation of Nested Loop Join optimization
Re: Research/Implementation of Nested Loop Join optimization |
List | pgsql-hackers |
When you install the source tree (e.g. in folder \postgresql-8.3.x) you will want to examine nodeMergejoin.c typically found in a path similar to this:
\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c
Here are the comments from the version on my machine:
/*
* INTERFACE ROUTINES
* ExecMergeJoin mergejoin outer and inner relations.
* ExecInitMergeJoin creates and initializes run time states
* ExecEndMergeJoin cleans up the node.
*
* NOTES
*
* Merge-join is done by joining the inner and outer tuples satisfying
* join clauses of the form ((= outerKey innerKey) ...).
* The join clause list is provided by the query planner and may contain
* more than one (= outerKey innerKey) clause (for composite sort key).
*
* However, the query executor needs to know whether an outer
* tuple is "greater/smaller" than an inner tuple so that it can
* "synchronize" the two relations. For example, consider the following
* relations:
*
* outer: (0 ^1 1 2 5 5 5 6 6 7) current tuple: 1
* inner: (1 ^3 5 5 5 5 6) current tuple: 3
*
* To continue the merge-join, the executor needs to scan both inner
* and outer relations till the matching tuples 5. It needs to know
* that currently inner tuple 3 is "greater" than outer tuple 1 and
* therefore it should scan the outer relation first to find a
* matching tuple and so on.
*
* Therefore, rather than directly executing the merge join clauses,
* we evaluate the left and right key expressions separately and then
* compare the columns one at a time (see MJCompare). The planner
* passes us enough information about the sort ordering of the inputs
* to allow us to determine how to make the comparison. We may use the
* appropriate btree comparison function, since Postgres' only notion
* of ordering is specified by btree opfamilies.
*
*
* Consider the above relations and suppose that the executor has
* just joined the first outer "5" with the last inner "5". The
* next step is of course to join the second outer "5" with all
* the inner "5's". This requires repositioning the inner "cursor"
* to point at the first inner "5". This is done by "marking" the
* first inner 5 so we can restore the "cursor" to it before joining
* with the second outer 5. The access method interface provides
* routines to mark and restore to a tuple.
*
*
* Essential operation of the merge join algorithm is as follows:
*
* Join {
* get initial outer and inner tuples INITIALIZE
* do forever {
* while (outer != inner) { SKIP_TEST
* if (outer < inner)
* advance outer SKIPOUTER_ADVANCE
* else
* advance inner SKIPINNER_ADVANCE
* }
* mark inner position SKIP_TEST
* do forever {
* while (outer == inner) {
* join tuples JOINTUPLES
* advance inner position NEXTINNER
* }
* advance outer position NEXTOUTER
* if (outer == mark) TESTOUTER
* restore inner position to mark TESTOUTER
* else
* break // return to top of outer loop
* }
* }
* }
*
* The merge join operation is coded in the fashion
* of a state machine. At each state, we do something and then
* proceed to another state. This state is stored in the node's
* execution state information and is preserved across calls to
* ExecMergeJoin. -cim 10/31/89
*/
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Manoel Henrique
Sent: Wednesday, July 23, 2008 1:17 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Research/Implementation of Nested Loop Join optimization
Hi! I`m a researcher from PUC-Rio (Brazil) and we`re studying about an Joins, and we`d like to implement an optimization on the Nested Loop Join, this optimization consists on while scanning the inner table, the iteration would go from up-down then backwards(down-up) to take advantage of the buffer pages in memory.
We`d work with MaterialScan and only NestedLoop (we`re dropping all indexes and keys to make it this way). The research objective is to show some students how a DBMS works.
Does PostgreSQL already works this way?Is it possible to implement such thing? Is it easy? how hard?
Thank you in advance,
Manoel Henrique Souza.
pgsql-hackers by date: