Re: Research/Implementation of Nested Loop Join optimization - Mailing list pgsql-hackers

From Manoel Henrique
Subject Re: Research/Implementation of Nested Loop Join optimization
Date
Msg-id 5b27a1ad0807231347o53a08dccj26bea6b04e541448@mail.gmail.com
Whole thread Raw
In response to Re: Research/Implementation of Nested Loop Join optimization  ("Dann Corbit" <DCorbit@connx.com>)
Responses Re: Research/Implementation of Nested Loop Join optimization
Re: Research/Implementation of Nested Loop Join optimization
List pgsql-hackers
The nodeMergejoin.c is the code for the Merge Join isn`t it? I am trying to find a way to change the Nested Loop Join, It would be more like on nodeNestloop.c when rescanning the inner plan, (second time scanning the inner plan and so on) he`d change the scan direction, If the scan direction was from first tuple to last tuple it would go backwards, if it was from last to first it would go forward... The code I`m looking atm is from 8.3.1 , seems to have some kind of direction manager but doesn`t seems to be in use.
 
--Manoel

On Wed, Jul 23, 2008 at 5:23 PM, Dann Corbit <DCorbit@connx.com> wrote:

When you install the source tree (e.g. in folder \postgresql-8.3.x) you will want to examine nodeMergejoin.c typically found in a path similar to this:

\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c

 

Here are the comments from the version on my machine:

 

/*

 * INTERFACE ROUTINES

 *                            ExecMergeJoin                                 mergejoin outer and inner relations.

 *                            ExecInitMergeJoin                          creates and initializes run time states

 *                            ExecEndMergeJoin                         cleans up the node.

 *

 * NOTES

 *

 *                            Merge-join is done by joining the inner and outer tuples satisfying

 *                            join clauses of the form ((= outerKey innerKey) ...).

 *                            The join clause list is provided by the query planner and may contain

 *                            more than one (= outerKey innerKey) clause (for composite sort key).

 *

 *                            However, the query executor needs to know whether an outer

 *                            tuple is "greater/smaller" than an inner tuple so that it can

 *                            "synchronize" the two relations. For example, consider the following

 *                            relations:

 *

 *                                                            outer: (0 ^1 1 2 5 5 5 6 6 7)            current tuple: 1

 *                                                            inner: (1 ^3 5 5 5 5 6)                                       current tuple: 3

 *

 *                            To continue the merge-join, the executor needs to scan both inner

 *                            and outer relations till the matching tuples 5. It needs to know

 *                            that currently inner tuple 3 is "greater" than outer tuple 1 and

 *                            therefore it should scan the outer relation first to find a

 *                            matching tuple and so on.

 *

 *                            Therefore, rather than directly executing the merge join clauses,

 *                            we evaluate the left and right key expressions separately and then

 *                            compare the columns one at a time (see MJCompare).  The planner

 *                            passes us enough information about the sort ordering of the inputs

 *                            to allow us to determine how to make the comparison.  We may use the

 *                            appropriate btree comparison function, since Postgres' only notion

 *                            of ordering is specified by btree opfamilies.

 *

 *

 *                            Consider the above relations and suppose that the executor has

 *                            just joined the first outer "5" with the last inner "5". The

 *                            next step is of course to join the second outer "5" with all

 *                            the inner "5's". This requires repositioning the inner "cursor"

 *                            to point at the first inner "5". This is done by "marking" the

 *                            first inner 5 so we can restore the "cursor" to it before joining

 *                            with the second outer 5. The access method interface provides

 *                            routines to mark and restore to a tuple.

 *

 *

 *                            Essential operation of the merge join algorithm is as follows:

 *

 *                            Join {

 *                                            get initial outer and inner tuples                                                               INITIALIZE

 *                                            do forever {

 *                                                            while (outer != inner) {                                                                  SKIP_TEST

 *                                                                            if (outer < inner)

 *                                                                                            advance outer                                                                                   SKIPOUTER_ADVANCE

 *                                                                            else

 *                                                                                            advance inner                                                                                   SKIPINNER_ADVANCE

 *                                                            }

 *                                                            mark inner position                                                                                                         SKIP_TEST

 *                                                            do forever {

 *                                                                            while (outer == inner) {

 *                                                                                            join tuples                                                                                                           JOINTUPLES

 *                                                                                            advance inner position                                                  NEXTINNER

 *                                                                            }

 *                                                                            advance outer position                                                                 NEXTOUTER

 *                                                                            if (outer == mark)                                                                                            TESTOUTER

 *                                                                                            restore inner position to mark                   TESTOUTER

 *                                                                            else

 *                                                                                            break    // return to top of outer loop

 *                                                            }

 *                                            }

 *                            }

 *

 *                            The merge join operation is coded in the fashion

 *                            of a state machine.  At each state, we do something and then

 *                            proceed to another state.  This state is stored in the node's

 *                            execution state information and is preserved across calls to

 *                            ExecMergeJoin. -cim 10/31/89

 */

 

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Manoel Henrique
Sent: Wednesday, July 23, 2008 1:17 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Research/Implementation of Nested Loop Join optimization

 

Hi! I`m a researcher from PUC-Rio (Brazil) and we`re studying about an Joins, and we`d like to implement an optimization on the Nested Loop Join, this optimization consists on while scanning the inner table, the iteration would go from up-down then backwards(down-up) to take advantage of the buffer pages in memory.

We`d work with MaterialScan and only NestedLoop (we`re dropping all indexes and keys to make it this way). The research objective is to show some students how a DBMS works.


Does PostgreSQL already works this way?

Is it possible to implement such thing? Is it easy? how hard?

 

 

Thank you in advance,

Manoel Henrique Souza.


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [PATCHES] GIN improvements
Next
From: Tom Lane
Date:
Subject: Re: Postgres-R: internal messaging