Re: Research/Implementation of Nested Loop Join optimization - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: Research/Implementation of Nested Loop Join optimization
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154701000FA1@postal.corporate.connx.com
Whole thread Raw
In response to Research/Implementation of Nested Loop Join optimization  ("Manoel Henrique" <mhenriquesgbd@gmail.com>)
Responses Re: Research/Implementation of Nested Loop Join optimization  ("Manoel Henrique" <mhenriquesgbd@gmail.com>)
List pgsql-hackers

When you install the source tree (e.g. in folder \postgresql-8.3.x) you will want to examine nodeMergejoin.c typically found in a path similar to this:

\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c

 

Here are the comments from the version on my machine:

 

/*

 * INTERFACE ROUTINES

 *                            ExecMergeJoin                                 mergejoin outer and inner relations.

 *                            ExecInitMergeJoin                          creates and initializes run time states

 *                            ExecEndMergeJoin                         cleans up the node.

 *

 * NOTES

 *

 *                            Merge-join is done by joining the inner and outer tuples satisfying

 *                            join clauses of the form ((= outerKey innerKey) ...).

 *                            The join clause list is provided by the query planner and may contain

 *                            more than one (= outerKey innerKey) clause (for composite sort key).

 *

 *                            However, the query executor needs to know whether an outer

 *                            tuple is "greater/smaller" than an inner tuple so that it can

 *                            "synchronize" the two relations. For example, consider the following

 *                            relations:

 *

 *                                                            outer: (0 ^1 1 2 5 5 5 6 6 7)            current tuple: 1

 *                                                            inner: (1 ^3 5 5 5 5 6)                                       current tuple: 3

 *

 *                            To continue the merge-join, the executor needs to scan both inner

 *                            and outer relations till the matching tuples 5. It needs to know

 *                            that currently inner tuple 3 is "greater" than outer tuple 1 and

 *                            therefore it should scan the outer relation first to find a

 *                            matching tuple and so on.

 *

 *                            Therefore, rather than directly executing the merge join clauses,

 *                            we evaluate the left and right key expressions separately and then

 *                            compare the columns one at a time (see MJCompare).  The planner

 *                            passes us enough information about the sort ordering of the inputs

 *                            to allow us to determine how to make the comparison.  We may use the

 *                            appropriate btree comparison function, since Postgres' only notion

 *                            of ordering is specified by btree opfamilies.

 *

 *

 *                            Consider the above relations and suppose that the executor has

 *                            just joined the first outer "5" with the last inner "5". The

 *                            next step is of course to join the second outer "5" with all

 *                            the inner "5's". This requires repositioning the inner "cursor"

 *                            to point at the first inner "5". This is done by "marking" the

 *                            first inner 5 so we can restore the "cursor" to it before joining

 *                            with the second outer 5. The access method interface provides

 *                            routines to mark and restore to a tuple.

 *

 *

 *                            Essential operation of the merge join algorithm is as follows:

 *

 *                            Join {

 *                                            get initial outer and inner tuples                                                               INITIALIZE

 *                                            do forever {

 *                                                            while (outer != inner) {                                                                  SKIP_TEST

 *                                                                            if (outer < inner)

 *                                                                                            advance outer                                                                                   SKIPOUTER_ADVANCE

 *                                                                            else

 *                                                                                            advance inner                                                                                   SKIPINNER_ADVANCE

 *                                                            }

 *                                                            mark inner position                                                                                                         SKIP_TEST

 *                                                            do forever {

 *                                                                            while (outer == inner) {

 *                                                                                            join tuples                                                                                                           JOINTUPLES

 *                                                                                            advance inner position                                                  NEXTINNER

 *                                                                            }

 *                                                                            advance outer position                                                                 NEXTOUTER

 *                                                                            if (outer == mark)                                                                                            TESTOUTER

 *                                                                                            restore inner position to mark                   TESTOUTER

 *                                                                            else

 *                                                                                            break    // return to top of outer loop

 *                                                            }

 *                                            }

 *                            }

 *

 *                            The merge join operation is coded in the fashion

 *                            of a state machine.  At each state, we do something and then

 *                            proceed to another state.  This state is stored in the node's

 *                            execution state information and is preserved across calls to

 *                            ExecMergeJoin. -cim 10/31/89

 */

 

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Manoel Henrique
Sent: Wednesday, July 23, 2008 1:17 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Research/Implementation of Nested Loop Join optimization

 

Hi! I`m a researcher from PUC-Rio (Brazil) and we`re studying about an Joins, and we`d like to implement an optimization on the Nested Loop Join, this optimization consists on while scanning the inner table, the iteration would go from up-down then backwards(down-up) to take advantage of the buffer pages in memory.

We`d work with MaterialScan and only NestedLoop (we`re dropping all indexes and keys to make it this way). The research objective is to show some students how a DBMS works.


Does PostgreSQL already works this way?

Is it possible to implement such thing? Is it easy? how hard?

 

 

Thank you in advance,

Manoel Henrique Souza.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pltcl_*mod commands are broken on Solaris 10
Next
From: Markus Wanner
Date:
Subject: Re: Postgres-R: internal messaging