Re: [HACKERS] Implement custom join algorithm - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Implement custom join algorithm
Date
Msg-id 25367.1487303157@sss.pgh.pa.us
In response to [HACKERS] Implement custom join algorithm  (Amin Fallahi <amin.fallahi@gmail.com>)
List pgsql-hackers
Amin Fallahi <amin.fallahi@gmail.com> writes:
> I want to implement my custom join algorithm into postgres. Where should I
> start? Where in the code base and how the current algorithms are currently
> implemented?

TBH, if this is your first Postgres programming project, you might
want to set your sights a bit lower than a new join algorithm.
Tackling something a little less far-reaching would be a good way
of starting to learn your way around the codebase.

But in any case, start here:
https://www.postgresql.org/docs/devel/static/overview.html
and after you've read that, start poking around the codebase.
There are many README files worth looking at, and a significant
part of what you'll need to do will amount to copying-and-pasting
existing code, so first you want to get familiar with the existing
code that does something like what you want.

You might for instance want to model your code on merge join,
src/backend/executor/nodeMergejoin.c
or hash join,
src/backend/executor/nodeHash.c
src/backend/executor/nodeHashjoin.c
(the separation between Hash and HashJoin nodes is a bit artificial
IMO, but it's been there since Berkeley days), or good ol' nestloop,
src/backend/executor/nodeNestloop.c
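
For orientation, here is a minimal standalone sketch of the pull-style shape those executor nodes share: each call hands back one joined row, and the inner side is rescanned for every outer row. Everything named Toy* is hypothetical and uses plain int arrays rather than tuples; it is not the code in nodeNestloop.c, only an illustration of the control flow you would be imitating.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a child plan node: a scan over an int array. */
typedef struct ToyScan
{
    const int  *vals;
    size_t      n;
    size_t      pos;
} ToyScan;

/* Restart the scan from its first element. */
static void
toy_scan_rescan(ToyScan *s)
{
    s->pos = 0;
}

/* Fetch the next value; returns false once the scan is exhausted. */
static bool
toy_scan_next(ToyScan *s, int *out)
{
    if (s->pos >= s->n)
        return false;
    *out = s->vals[s->pos++];
    return true;
}

/* Hypothetical join state: remembers where it stopped between calls. */
typedef struct ToyNestLoop
{
    ToyScan     outer;
    ToyScan     inner;
    int         cur_outer;      /* current outer value */
    bool        need_outer;     /* must fetch a new outer value first? */
} ToyNestLoop;

static void
toy_nestloop_init(ToyNestLoop *j,
                  const int *outer_vals, size_t n_outer,
                  const int *inner_vals, size_t n_inner)
{
    j->outer = (ToyScan) {outer_vals, n_outer, 0};
    j->inner = (ToyScan) {inner_vals, n_inner, 0};
    j->cur_outer = 0;
    j->need_outer = true;
}

/* Return one equality-joined pair per call, or false when done. */
static bool
toy_nestloop_next(ToyNestLoop *j, int *outer_val, int *inner_val)
{
    for (;;)
    {
        int         iv;

        if (j->need_outer)
        {
            if (!toy_scan_next(&j->outer, &j->cur_outer))
                return false;   /* outer side exhausted: join is done */
            toy_scan_rescan(&j->inner);
            j->need_outer = false;
        }

        /* Resume the inner scan where the previous call left off. */
        while (toy_scan_next(&j->inner, &iv))
        {
            if (iv == j->cur_outer)
            {
                *outer_val = j->cur_outer;
                *inner_val = iv;
                return true;
            }
        }

        /* Inner side exhausted for this outer value; advance outer. */
        j->need_outer = true;
    }
}
```

The point of keeping state in a struct rather than in nested for-loops is that the caller decides when to ask for the next row, which is what lets a join node sit anywhere in a plan tree.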

You'll also need planner support, the core of which would be additions
in these files:
src/backend/optimizer/path/joinpath.c
src/backend/optimizer/path/costsize.c
although depending on how outré your algorithm is and how smart you
want the planner to be, you could need large amounts of additional
work in that area.  (As an example, the desire to avoid extra sorts
for mergejoin input has consequences throughout the planner, starting
with the choice to label every path node with its sort order.  For
that matter, it's hardly accidental that mergejoin depends on btree
operator classes and hashjoin depends on hash operator classes ---
you might have to go as far as developing new types of operator classes
to embody whatever datatype and operator knowledge you need.)
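
To make the sort-order point concrete, here is a toy sketch of why labeling each path with its ordering pays off: a merge join only has to pay for a sort on an input that is not already ordered on the join column. The ToyPath struct, the column numbering, and the cost formulas are all made up for illustration; the real estimates in costsize.c are considerably more involved.

```c
#include <stddef.h>

#define TOY_UNSORTED (-1)

/* Hypothetical path: estimated rows, cost so far, and its sort order. */
typedef struct ToyPath
{
    double      rows;
    double      cost;
    int         sort_col;   /* column the output is ordered by, or
                             * TOY_UNSORTED; real columns are >= 0 */
} ToyPath;

/* Made-up sort cost: rows * (number of halvings needed to reach 1). */
static double
toy_sort_cost(double rows)
{
    double      log2n = 0.0;
    double      n;

    for (n = rows; n > 1.0; n /= 2.0)
        log2n += 1.0;
    return rows * log2n;
}

/* Cost of delivering this input ordered on join_col. */
static double
toy_sorted_input_cost(const ToyPath *p, int join_col)
{
    if (p->sort_col == join_col)
        return p->cost;     /* already in the needed order: no sort */
    return p->cost + toy_sort_cost(p->rows);
}

/* Made-up merge join cost: both sorted inputs plus one pass over each. */
static double
toy_mergejoin_cost(const ToyPath *outer, int outer_col,
                   const ToyPath *inner, int inner_col)
{
    return toy_sorted_input_cost(outer, outer_col)
        + toy_sorted_input_cost(inner, inner_col)
        + outer->rows + inner->rows;
}
```

With this shape, a path that is more expensive on its own but happens to be sorted on the join column can still win, which is why the planner cannot simply keep the single cheapest path per relation.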

And you'll need a whole bunch of boilerplate support code, such as
copyfuncs.c support for your plan node type.  Grepping for references to
the path and plan node types for one of the existing join methods should
find the places where you need to add that, or at least most of them.
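
As a rough idea of what that boilerplate looks like, here is a standalone sketch of a tagged node with a deep-copy routine dispatched on the tag. The Toy* names and fields are hypothetical and this uses malloc rather than the backend's memory management; it only shows the pattern of "one more case per node type" that you would be extending.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node tags; a new join method adds one more entry. */
typedef enum ToyNodeTag
{
    T_ToyNestLoop,
    T_ToyMyJoin
} ToyNodeTag;

/* Every node struct starts with its tag, so it can be inspected generically. */
typedef struct ToyNode
{
    ToyNodeTag  tag;
} ToyNode;

/* Hypothetical plan node for the new join method. */
typedef struct ToyMyJoin
{
    ToyNode     node;       /* must be first */
    int         jointype;
    int         nkeys;
    int        *keycols;    /* array of nkeys entries, owned by the node */
} ToyMyJoin;

/* Deep-copy one ToyMyJoin, including the array it points to. */
static ToyMyJoin *
toy_copy_myjoin(const ToyMyJoin *from)
{
    ToyMyJoin  *to = malloc(sizeof(*to));

    if (to == NULL)
        return NULL;
    *to = *from;            /* copies the scalar fields */
    to->keycols = NULL;
    if (from->nkeys > 0)
    {
        size_t      len = (size_t) from->nkeys * sizeof(int);

        to->keycols = malloc(len);
        if (to->keycols == NULL)
        {
            free(to);
            return NULL;
        }
        memcpy(to->keycols, from->keycols, len);
    }
    return to;
}

/* Generic entry point: forgetting a case here is the classic omission. */
static ToyNode *
toy_copy_node(const ToyNode *from)
{
    switch (from->tag)
    {
        case T_ToyMyJoin:
            return (ToyNode *) toy_copy_myjoin((const ToyMyJoin *) from);
        default:
            return NULL;    /* node type not handled in this sketch */
    }
}
```
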
        regards, tom lane