Re: SQL/MED - file_fdw - Mailing list pgsql-hackers

From Shigeru HANADA
Subject Re: SQL/MED - file_fdw
Date
Msg-id 20110111182013.BD6F.6989961C@metrosystems.co.jp
Whole thread Raw
In response to Re: SQL/MED - file_fdw  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 10 Jan 2011 19:26:11 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shigeru HANADA <hanada@metrosystems.co.jp> writes:
> > For the purpose of file_fdw, additional ResetCopyFrom() would be
> > necessary. I'm planning to include such changes in file_fdw patch. 
> > Please find attached partial patch for ResetCopyFrom(). Is there
> > anything else which should be done at reset?
> 
> Seems like it would be smarter to close and re-open the copy operation.
> Adding a reset function is just creating an additional maintenance
> burden and point of failure, for what seems likely to be a negligible
> performance benefit.

Agreed. fileReScan can be implemented with close/re-open with storing
some additional information into FDW private area. I would withdraw
the proposal.

> If you think it's not negligible, please show some proof of that before
> asking us to support such code.

Anyway, I've measured overhead of re-open with executing query
including inner join between foreign tables copied from pgbench schema. 
I used SELECT statement below:
   EXPLAIN (ANALYZE) SELECT count(*) FROM csv_accounts a JOIN   csv_branches b ON (b.bid = a.bid);

On the average of (Nested Loop - (Foreign Scan * 2)), overhead of
re-open is round 0.048ms per tuple (average of 3 times measurement).

After the implementation of file_fdw, I'm going to measure again. If
ResetCopyFrom significantly improves performance of ReScan, I'll
propose it as a separate patch.

=========================================================================

The results of EXPLAIN ANALYZE are:

[using ResetCopyFrom]                                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=11717.02..11717.03 rows=1 width=0) (actual time=73357.655..73357.657 rows=1 loops=1)  ->  Nested Loop
(cost=0.00..11717.01rows=1 width=0) (actual time=0.209..71424.059 rows=1000000 loops=1)        ->  Foreign Scan on
public.csv_accountsa  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.144..6998.497 rows=1000000 loops=1)        ->
Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.008..0.037 rows=10
loops=1000000)Totalruntime: 73358.135 ms
 
(11 rows)

[using EndCopyFrom + BeginCopyFrom]                                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=11717.02..11717.03 rows=1 width=0) (actual time=120724.138..120724.140 rows=1 loops=1)  ->  Nested Loop
(cost=0.00..11717.01rows=1 width=0) (actual time=0.321..118583.681 rows=1000000 loops=1)        ->  Foreign Scan on
public.csv_accountsa  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.156..7208.968 rows=1000000 loops=1)        ->
Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.046 rows=10
loops=1000000)Totalruntime: 121118.792 ms
 
(11 rows)

Time: 121122.205 ms

=========================================================================

Regards,
--
Shigeru Hanada




pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: WIP: RangeTypes
Next
From: Simon Riggs
Date:
Subject: Re: ALTER TYPE 0: Introduction; test cases