Thread: [HACKERS] Implement custom join algorithm
Hi
I want to implement a custom join algorithm in Postgres. Where should I start? Where in the code base are the current algorithms implemented, and how?

Amin Fallahi <amin.fallahi@gmail.com> writes:
> I want to implement a custom join algorithm in Postgres. Where should I
> start? Where in the code base are the current algorithms implemented,
> and how?

TBH, if this is your first Postgres programming project, you might want to set your sights a bit lower than a new join algorithm. Tackling something a little less far-reaching would be a good way of starting to learn your way around the codebase.

But in any case, start here:
	https://www.postgresql.org/docs/devel/static/overview.html
and after you've read that, start poking around the codebase. There are many README files worth looking at, and a significant part of what you'll need to do will amount to copying-and-pasting existing code, so first you want to get familiar with the existing code that does something like what you want.

You might for instance want to model your code on merge join,
	src/backend/executor/nodeMergejoin.c
or hash join,
	src/backend/executor/nodeHash.c
	src/backend/executor/nodeHashjoin.c
(the separation between Hash and HashJoin nodes is a bit artificial IMO, but it's been there since Berkeley days), or good ol' nestloop,
	src/backend/executor/nodeNestloop.c

You'll also need planner support, the core of which would be additions in these files:
	src/backend/optimizer/path/joinpath.c
	src/backend/optimizer/path/costsize.c
although depending on how outré your algorithm is and how smart you want the planner to be, you could need large amounts of additional work in that area. (As an example, the desire to avoid extra sorts for mergejoin input has consequences throughout the planner, starting with the choice to label every path node with its sort order.
For that matter, it's hardly accidental that mergejoin depends on btree operator classes and hashjoin depends on hash operator classes --- you might have to go as far as developing new types of operator classes to embody whatever datatype and operator knowledge you need.)

And you'll need a whole bunch of boilerplate support code, such as copyfuncs.c support for your plan node type. Grepping for references to the path and plan node types for one of the existing join methods should find the places where you need to add that, or at least most of them.

			regards, tom lane
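
For readers who want a feel for the executor-side shape before opening nodeNestloop.c, here is a minimal standalone sketch of a nested-loop join written as a pull-style iterator: an init routine sets up state, and a "next" routine returns one matching pair per call. It only loosely mirrors the init/per-tuple structure of the executor node files mentioned above. Every name in it (ToyRel, ToyNestLoopState, toy_nestloop_init, toy_nestloop_next) is hypothetical and not part of PostgreSQL, and it joins plain integer arrays rather than tuples.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy relation: just an array of integer join keys. */
typedef struct ToyRel
{
	const int  *keys;
	size_t		n;
} ToyRel;

/* Hypothetical per-join state, kept between calls to the "next" routine. */
typedef struct ToyNestLoopState
{
	ToyRel		outer;
	ToyRel		inner;
	size_t		outer_pos;		/* current outer row */
	size_t		inner_pos;		/* next inner row to test */
} ToyNestLoopState;

/* Set up the join state; both scans start at their first row. */
void
toy_nestloop_init(ToyNestLoopState *s, ToyRel outer, ToyRel inner)
{
	s->outer = outer;
	s->inner = inner;
	s->outer_pos = 0;
	s->inner_pos = 0;
}

/*
 * Return the next matching (outer, inner) pair of row indexes.
 * Returns false once every pair has been examined.
 */
bool
toy_nestloop_next(ToyNestLoopState *s, size_t *outer_idx, size_t *inner_idx)
{
	while (s->outer_pos < s->outer.n)
	{
		/* Resume the inner scan where the previous call left off. */
		while (s->inner_pos < s->inner.n)
		{
			size_t		i = s->inner_pos++;

			if (s->outer.keys[s->outer_pos] == s->inner.keys[i])
			{
				*outer_idx = s->outer_pos;
				*inner_idx = i;
				return true;
			}
		}

		/* Inner side exhausted: rescan it for the next outer row. */
		s->inner_pos = 0;
		s->outer_pos++;
	}
	return false;
}
```

A caller would invoke toy_nestloop_init() once and then call toy_nestloop_next() in a loop until it returns false. The real node code additionally deals with tuple slots, join quals, outer-join variants, and rescans, none of which this sketch attempts.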