postgres_fdw chooses remote index scan too rarely - Mailing list pgsql-general

From Jeremy Finzel
Subject postgres_fdw chooses remote index scan too rarely
Date
Msg-id CAMa1XUhU_tQJc2eb1R5Dd+DsjGdyb=vbForGniaZy+wqbDOL+A@mail.gmail.com
Whole thread Raw
List pgsql-general
First of all, my local system which has postgres_fdw installed is this version:
PostgreSQL 9.6.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit

Here are my server configurations (I have tweaked these with no improvement to my issue):
  • fetch_size=100
  • fdw_startup_cost=1000
  • fdw_tuple_cost=100000
  • use_remote_estimate=true
I am seeing some really odd behavior and I'm not sure how I can most easily help our users to write queries that actually hit an index scan on the remote server.  In one example, I have:
  • a temp table with 33 rows
  • text instead of integer data type - which the remote table's data type is
  • it is freshly analyzed
Joining to the table, doing id IN (list), or doing EXISTS all yield a full remote table scan:

select *
from remote.customers c
inner join abc t on t.customer_id::int=c.customer_id;

select *
from remote.customers c
where c.customer_id in (SELECT customer_id::int FROM abc);

select *
from remote.customers c
where exists (SELECT 1 FROM abc WHERE customer_id::int = c.customer_id);

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash Join  (cost=1002.47..1170208178867.64 rows=5851034 width=3113)
   Hash Cond: (c.customer_id = (abc.customer_id)::integer)
   ->  Foreign Scan on customers c  (cost=1000.43..1170208089344.77 rows=11702069 width=902)
   ->  Hash  (cost=1.69..1.69 rows=28 width=7)
         ->  HashAggregate  (cost=1.41..1.69 rows=28 width=7)
               Group Key: (abc.customer_id)::integer
               ->  Seq Scan on abc  (cost=0.00..1.33 rows=33 width=7)
(7 rows)

However, I can get the index scan 2 ways:
  • converting the temp table data type to int
  • keeping the data type as is, but adding DISTINCT to the IN list
Neither of these make any sense to me because the planner knows there are only 33 rows in the temp table.  It should always do an index scan.  And why should converting the data type to int be any different than casting it to int?  I understand the planner stats on that field are not for an integer, but it's still only 33 rows.

So these 2 versions get a good plan and actually run very quickly:

select *
from remote.customers c
where c.customer_id in (SELECT DISTINCT customer_id::int FROM abc);

WITH distinctified AS (SELECT DISTINCT customer_id::int FROM abc)

select *
from remote.customers c
INNER JOIN distinctified t on t.customer_id::int=c.customer_id;

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=1002.03..5628140.32 rows=5851034 width=3113)
   ->  HashAggregate  (cost=1.58..2.00 rows=28 width=4)
         Group Key: (abc.customer_id)::integer
         ->  Seq Scan on abc  (cost=0.00..1.50 rows=33 width=4)
   ->  Foreign Scan on customers c  (cost=1000.45..201004.91 rows=2 width=902)
(5 rows)


Likewise if I alter the type it works without any special treatment:

ALTER TABLE abc ALTER COLUMN customer_id TYPE int USING customer_id::INT;
ANALYZE abc;

EXPLAIN
select *
from remote.customers c
inner join abc t using (customer_id);

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=1000.45..6633164.02 rows=1930841 width=3194)
   ->  Seq Scan on abc t  (cost=0.00..1.33 rows=33 width=85)
   ->  Foreign Scan on customers c  (cost=1000.45..201004.91 rows=2 width=902)
(3 rows)

Any insight appreciated!

Thanks,
Jeremy

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Converting to number with given format
Next
From: Gabriel Furstenheim Milerud
Date:
Subject: Re: Converting to number with given format