Home > mailing lists

Foreign join search stops on the first try - Mailing list pgsql-hackers

From	Alexander Pyhalov
Subject	Foreign join search stops on the first try
Date	January 25, 2022 07:56:12
Msg-id	03ed1b1c28b151325a43facba16e201b@postgrespro.ru Whole thread Raw
Responses	Re: Foreign join search stops on the first try
List	pgsql-hackers

Tree view

Hi.

I was investigating why foreign join is not happening and found the 
following issue. When we search for foreign join path we stop on the 
first try. For example, in case
A JOIN B JOIN C where all of them are foreign tables on the same server 
firstly we estimate A JOIN B, A JOIN C and B JOIN C costs. Then we 
select one of them (let's say A JOIN B) and estimate (A JOIN B) JOIN C 
cost. On this stage we fill in joinrel->fdw_private in joinrel (A, B, 
C). Now we could look at A JOIN (B JOIN C), but we don't, as
in contrib/postgres_fdw/postgres_fdw.c:5962  (in 
postgresGetForeignJoinPaths()) we have:

    /*
     * Skip if this join combination has been considered already.
     */
    if (joinrel->fdw_private)
        return;

This can lead to situation when A JOIN B JOIN C cost is not correctly 
estimated (as we don't look at another join orders) and foreign join is 
not pushed down when it would be efficient.

Attached patch tries to fix this. Now we collect different join paths 
for joinrels, if join is possible at all. However, as we don't know what 
path is really selected, we can't precisely estimate fpinfo costs.

The following example shows the case, when we fail to push down foreign 
join due to mentioned issue.

create extension postgres_fdw;

DO $d$
     BEGIN
         EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER 
postgres_fdw
             OPTIONS (dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
END
$d$;

create user MAPPING FOR PUBLIC SERVER loopback ;

CREATE SCHEMA test;

CREATE TABLE test.district (
     d_id smallint NOT NULL PRIMARY KEY,
     d_next_o_id integer NOT NULL,
     d_name varchar(100) NOT NULL
);

CREATE TABLE test.stock (
     s_i_id integer NOT NULL PRIMARY KEY,
     s_quantity smallint NOT NULL,
     s_data varchar(100) NOT NULL
);

CREATE TABLE test.order_line (
     ol_o_id integer NOT NULL PRIMARY KEY,
     ol_i_id integer NOT NULL,
     ol_d_id smallint NOT NULL
);

import foreign schema test from server loopback into public ;

INSERT INTO district SELECT i, 20000 + 20*i  , 'test' FROM 
generate_series(1,10) i;
INSERT INTO stock SELECT i, round(random()*100) + 10, 'test data' FROM 
generate_series(1,10000) i;
INSERT INTO order_line SELECT i, i%10000, i%10   FROM 
generate_series(1,30000) i;

analyze test.order_line, test.district ,test.stock;
analyze order_line, district ,stock;


Without patch:

explain analyze verbose SELECT * FROM order_line, stock, district WHERE 
ol_d_id = 1 AND d_id = 1 AND (ol_o_id < d_next_o_id) AND ol_o_id >= 
(d_next_o_id - 20) AND s_i_id = ol_i_id AND s_quantity < 11;
                                                                          
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=382.31..966.86 rows=2 width=37) (actual 
time=5.459..5.466 rows=0 loops=1)
    Output: order_line.ol_o_id, order_line.ol_i_id, order_line.ol_d_id, 
stock.s_i_id, stock.s_quantity, stock.s_data, district.d_id, 
district.d_next_o_id, district.d_name
    Hash Cond: (order_line.ol_i_id = stock.s_i_id)
    ->  Foreign Scan  (cost=100.00..683.29 rows=333 width=21) (actual 
time=2.773..2.775 rows=2 loops=1)
          Output: order_line.ol_o_id, order_line.ol_i_id, 
order_line.ol_d_id, district.d_id, district.d_next_o_id, district.d_name
          Relations: (public.order_line) INNER JOIN (public.district)
          Remote SQL: SELECT r1.ol_o_id, r1.ol_i_id, r1.ol_d_id, r3.d_id, 
r3.d_next_o_id, r3.d_name FROM (test.order_line r1 INNER JOIN 
test.district r3 ON (((r1.ol_o_id < r3.d_next_o_id)) AND ((r1.ol_o_id >= 
(r3.d_next_o_id - 20))) AND ((r3.d_id = 1)) AND ((r1.ol_d_id = 1))))
    ->  Hash  (cost=281.42..281.42 rows=71 width=16) (actual 
time=2.665..2.665 rows=48 loops=1)
          Output: stock.s_i_id, stock.s_quantity, stock.s_data
          Buckets: 1024  Batches: 1  Memory Usage: 11kB
          ->  Foreign Scan on public.stock  (cost=100.00..281.42 rows=71 
width=16) (actual time=2.625..2.631 rows=48 loops=1)
                Output: stock.s_i_id, stock.s_quantity, stock.s_data
                Remote SQL: SELECT s_i_id, s_quantity, s_data FROM 
test.stock WHERE ((s_quantity < 11))
  Planning Time: 1.812 ms
  Execution Time: 8.534 ms
(15 rows)

With patch:

explain analyze verbose SELECT * FROM order_line, stock, district WHERE 
ol_d_id = 1 AND d_id = 1 AND (ol_o_id < d_next_o_id) AND ol_o_id >= 
(d_next_o_id - 20) AND s_i_id = ol_i_id AND s_quantity < 11;
                                                                          
                                                                          
                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan  (cost=100.00..974.88 rows=2 width=37) (actual 
time=3.399..3.400 rows=0 loops=1)
    Output: order_line.ol_o_id, order_line.ol_i_id, order_line.ol_d_id, 
stock.s_i_id, stock.s_quantity, stock.s_data, district.d_id, 
district.d_next_o_id, district.d_name
    Relations: ((public.order_line) INNER JOIN (public.district)) INNER 
JOIN (public.stock)
    Remote SQL: SELECT r1.ol_o_id, r1.ol_i_id, r1.ol_d_id, r2.s_i_id, 
r2.s_quantity, r2.s_data, r3.d_id, r3.d_next_o_id, r3.d_name FROM 
((test.order_line r1 INNER JOIN test.district r3 ON (((r1.ol_o_id < 
r3.d_next_o_id)) AND ((r1.ol_o_id >= (r3.d_next_o_id - 20))) AND 
((r3.d_id = 1)) AND ((r1.ol_d_id = 1)))) INNER JOIN test.stock r2 ON 
(((r1.ol_i_id = r2.s_i_id)) AND ((r2.s_quantity < 11))))
  Planning Time: 0.928 ms
  Execution Time: 4.511 ms



-- 
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment

v01-0001-Look-through-all-possible-foreign-join-orders.patch

pgsql-hackers by date:

From: David Rowley
Date: 25 January 2022, 07:03:07
Subject: Re: Fix BUG #17335: Duplicate result rows in Gather node

From: Dmitry Koval
Date: 25 January 2022, 07:56:53
Subject: Re: WIN32 pg_import_system_collations

Foreign join search stops on the first try - Mailing list pgsql-hackers

Attachment

Previous

Next