Re: SQL/MED - file_fdw - Mailing list pgsql-hackers

From Shigeru HANADA
Subject Re: SQL/MED - file_fdw
Date
Msg-id 20110114142019.82AD.6989961C@metrosystems.co.jp
In response to Re: SQL/MED - file_fdw  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses Re: SQL/MED - file_fdw  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Re: SQL/MED - file_fdw  (Shigeru HANADA <hanada@metrosystems.co.jp>)
List pgsql-hackers
On Fri, 14 Jan 2011 13:03:27 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:

> Good catch. I merged your fix into the attached patch.

Thanks, I'll rebase my patches.

> BTW, why didn't the planner choose a materialized plan for the inner loop?
> FDW scans are typically slower than heap scans or TupleTableSlot scans,
> so it seems reasonable to me to add a Materialize node at the top of the
> inner Foreign Scan, especially when we don't use indexes for the scan
> keys or join keys.

Maybe because foreign tables lack statistics, and file_fdw's estimate
isn't smart enough.

After copying the statistics of the pgbench_xxx tables into the csv_xxx
tables, the planner generates the same plans as for the local tables, but
the costs of the ForeignScan nodes are a little lower than those of the
SeqScan nodes.
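
For reference, copying the table-level counters could look something like
the sketch below (a rough illustration only; it assumes superuser access,
and the per-column pg_statistic rows would also have to be copied for the
plans to really match):

==============================
-- Copy the table-level row and page counts from the local table to the
-- foreign table so that the planner sees comparable size estimates.
-- Requires superuser; direct catalog updates are for experiments only.
UPDATE pg_class dst
   SET reltuples = src.reltuples,
       relpages  = src.relpages
  FROM pg_class src
 WHERE dst.relname = 'csv_accounts'
   AND src.relname = 'pgbench_accounts';
==============================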

==============================
postgres=# explain analyze select * from csv_accounts a, csv_branches b where a.bid = b.bid;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.33..45467.32 rows=1000000 width=197) (actual time=0.234..8044.077 rows=1000000 loops=1)
   Hash Cond: (a.bid = b.bid)
   ->  Foreign Scan on csv_accounts a  (cost=0.00..31717.00 rows=1000000 width=97) (actual time=0.107..4147.074 rows=1000000 loops=1)
   ->  Hash  (cost=0.20..0.20 rows=10 width=100) (actual time=0.085..0.085 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Foreign Scan on csv_branches b  (cost=0.00..0.20 rows=10 width=100) (actual time=0.027..0.056 rows=10 loops=1)
 Total runtime: 9690.686 ms
(7 rows)

postgres=# explain analyze select * from pgbench_accounts a, pgbench_branches b where a.bid = b.bid;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..40145.22 rows=1000000 width=197) (actual time=0.146..5693.883 rows=1000000 loops=1)
   Hash Cond: (a.bid = b.bid)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00 rows=1000000 width=97) (actual time=0.073..1884.018 rows=1000000 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=100) (actual time=0.048..0.048 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=100) (actual time=0.003..0.021 rows=10 loops=1)
 Total runtime: 7333.713 ms
(7 rows)
==============================

A forced Nested Loop plan uses a Materialize node as expected.

==============================
postgres=# set enable_hashjoin = false;
SET
postgres=# explain select * from csv_accounts a, csv_branches b where a.bid = b.bid;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..181717.23 rows=1000000 width=197)
   Join Filter: (a.bid = b.bid)
   ->  Foreign Scan on csv_accounts a  (cost=0.00..31717.00 rows=1000000 width=97)
   ->  Materialize  (cost=0.00..0.25 rows=10 width=100)
         ->  Foreign Scan on csv_branches b  (cost=0.00..0.20 rows=10 width=100)
(5 rows)

postgres=# explain select * from pgbench_accounts a, pgbench_branches b where a.bid = b.bid;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..176395.12 rows=1000000 width=197)
   Join Filter: (a.bid = b.bid)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00 rows=1000000 width=97)
   ->  Materialize  (cost=0.00..1.15 rows=10 width=100)
         ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=100)
(5 rows)
==============================

ISTM that a new interface called from ANALYZE would help to update
statistics of foreign tables.  If we could leave the sampling algorithm
to FDWs, acquire_sample_rows() might fit that purpose.

If an FDW doesn't provide an analyze handler, postgres might be able to
execute "SELECT * FROM foreign_table LIMIT sample_num" internally to
get sample rows.
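
As a rough illustration (the table name is taken from the example above,
and 30000 is just the usual ANALYZE sample size of
300 * default_statistics_target with the default target of 100), the
internally issued query could look like:

==============================
-- Hypothetical fallback: sample rows pulled through a plain query when
-- the FDW has no analyze handler of its own.
SELECT * FROM csv_accounts LIMIT 30000;
==============================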

Regards,
--
Shigeru Hanada



