Thread: Implement hook for self-join simplification

Implement hook for self-join simplification

From
Leif Harald Karlsen
Date:

Hi,


I have a made a small deductive database on top of PostgreSQL for educational/research purposes. In this setting, due to certain VIEW-constructions, queries often end up being self-joins on primary keys, e.g.:


SELECT t1.id, t2.val

FROM t AS t1 JOIN t AS t2 USING (id);


where t(id) is a primary key. This query is equivalent to the much more efficient:


SELECT id, val

FROM t AS t1;


However, PostgreSQL currently does not seem to implement this simplification. Therefore, I have looked into writing an extension that performs this, but I am struggling a bit with finding out when this simplification should be done, i.e. which hook I should implement.


The simplification is not too different from those done in prep/prepjoin.c, i.e. doing the simplification on the query-tree directly. However, I think I then would need to implement a planner_hook, as it is the only hook giving me direct access to the query-tree. But I need to perform my simplification after view-definitions have been expanded into the query, and after the transformations in prepjoin.c (but before the rest of planning). But there seems to be no easy way to inject a function there, as this is buried deep in the middle of the planner-function.


I therefore looked into using a set_join_pathlist_hook, and try to do the simplification at path-level. I.e., doing something like:


static void self_join_optimize_hook(PlannerInfo *root, RelOptInfo* joinrel, RelOptInfo* outerrel, RelOptInfo* innerrel, JoinType jointype, JoinPathExtraData* extra)

{

    if (is_selfjoin_on_pk(root, joinrel, extra)) {

        ListCell *p;

        foreach(p, innerrel->pathlist) {

            add_path(joinrel, (Path *) p);

        }

    }


That is, if joinrel is a (binary) self-join on a primary key, the paths for evaluating the join is the same as the paths for evaluating the innerrel, However, this does not work, as the rest of the query may require values from the other table (e.g. t2 in the example above). I therefore need to replace all mentions of t2 with t1, but is this possible at a path-level?


If not, does anyone have a an idea on how this can be done in a different way? Thanks!



Kind regards,


Leif Harald Karlsen

Senior Lecturer

Department of Informatics

University of Oslo

Re: Implement hook for self-join simplification

From
Andrey Lepikhov
Date:
On 24/6/2022 18:58, Leif Harald Karlsen wrote:
> I have a made a small deductive database on top of PostgreSQL for 
> educational/research purposes. In this setting, due to certain 
> VIEW-constructions, queries often end up being self-joins on primary 
> keys, e.g.:
> SELECT t1.id, t2.val
> FROM t AS t1 JOIN t AS t2 USING (id);
> 
> where t(id) is a primary key. This query is equivalent to the much more 
> efficient:
> SELECT id, val FROM t AS t1;
> 
> However, PostgreSQL currently does not seem to implement this 
> simplification. Therefore, I have looked into writing an extension that 
> performs this, but I am struggling a bit with finding out when this 
> simplification should be done, i.e. which hook I should implement.
It is true, but you can use a proposed patch that adds such 
functionality [1].

I tried to reproduce your case:
CREATE TABLE t(id int PRIMARY KEY, val text);
explain verbose
SELECT t1.id, t2.val FROM t AS t1 JOIN t AS t2 USING (id);

With this patch you will get a plan:
  Seq Scan on public.t t2
    Output: t2.id, t2.val
    Filter: (t2.id IS NOT NULL)

The approach, implemented in this patch looks better because removes 
self-joins on earlier stage than the path generation stage. Feel free to 
use it in your research.

[1] 
https://www.postgresql.org/message-id/a1d6290c-44e0-0dfc-3fca-66a68b3109ef@postgrespro.ru

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: Implement hook for self-join simplification

From
Leif Harald Karlsen
Date:

Hi Andrey,


Thank you for the quick answer, and for the pointer to the patch! This looks like just the thing I need!


On a more general note: What would, in general, be the best way to implement such optimizations? Is there a good way to do this as an extension, or is a patch the preferred way?


Kind regards,
Leif Harald Karlsen
Senior Lecturer
Department of Informatics
University of Oslo

From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Sent: 24 June 2022 19:27:50
To: Leif Harald Karlsen; pgsql-hackers@lists.postgresql.org
Subject: Re: Implement hook for self-join simplification
 
On 24/6/2022 18:58, Leif Harald Karlsen wrote:
> I have a made a small deductive database on top of PostgreSQL for
> educational/research purposes. In this setting, due to certain
> VIEW-constructions, queries often end up being self-joins on primary
> keys, e.g.:
> SELECT t1.id, t2.val
> FROM t AS t1 JOIN t AS t2 USING (id);
>
> where t(id) is a primary key. This query is equivalent to the much more
> efficient:
> SELECT id, val FROM t AS t1;
>
> However, PostgreSQL currently does not seem to implement this
> simplification. Therefore, I have looked into writing an extension that
> performs this, but I am struggling a bit with finding out when this
> simplification should be done, i.e. which hook I should implement.
It is true, but you can use a proposed patch that adds such
functionality [1].

I tried to reproduce your case:
CREATE TABLE t(id int PRIMARY KEY, val text);
explain verbose
SELECT t1.id, t2.val FROM t AS t1 JOIN t AS t2 USING (id);

With this patch you will get a plan:
  Seq Scan on public.t t2
    Output: t2.id, t2.val
    Filter: (t2.id IS NOT NULL)

The approach, implemented in this patch looks better because removes
self-joins on earlier stage than the path generation stage. Feel free to
use it in your research.

[1]
https://www.postgresql.org/message-id/a1d6290c-44e0-0dfc-3fca-66a68b3109ef@postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional

Re: Implement hook for self-join simplification

From
Andrey Lepikhov
Date:
On 24/6/2022 23:43, Leif Harald Karlsen wrote:
> Thank you for the quick answer, and for the pointer to the patch! This 
> looks like just the thing I need! 
> On a more general note: What would, in general, be the best way to 
> implement such optimizations? Is there a good way to do this as an 
> extension, or is a patch the preferred way?
According to my experience, it depends on your needings.
For example, self-join-removal feature, or my current project - 
flattening of nested subqueries - is much more optimal to implement as a 
patch, because you can do it so early as possible and can generalize 
parts of the core code and thus, reduce size of your code a lot.
But if you want to use your code with many PG versions, even already 
working in production or you make just a research, without immediate 
practical result - your choice is an extension.

-- 
regards,
Andrey Lepikhov
Postgres Professional