Re: SQL/MED - file_fdw

From: Shigeru HANADA
Subject: Re: SQL/MED - file_fdw
Msg-id: 20110113190056.828D.6989961C@metrosystems.co.jp
In response to: Re: SQL/MED - file_fdw (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses: Re: SQL/MED - file_fdw (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
List: pgsql-hackers
On Fri, 7 Jan 2011 10:57:17 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Mon, Dec 20, 2010 at 20:42, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
> > I added comments and moved some setup code for COPY TO to BeginCopyTo()
> > for maintainability. CopyTo() still contains parts of the initialization,
> > but I haven't touched it yet because we don't need the rearrangement for now.
>
> I updated the COPY FROM API patch.
> - GetCopyExecutorState() is removed because FDWs will use their own context.

I rebased the file_fdw patch onto the recent copy_export patch, and have
some comments.

> The patch just rearranges the COPY FROM code to export those functions.
> It also modifies some of the COPY TO code internally for readability.
> - BeginCopyFrom(rel, filename, attnamelist, options)
> - EndCopyFrom(cstate)
> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid)
> - CopyFromErrorCallback(arg)

This API set seems to be sufficient to implement file_fdw on top of the
COPY routines.
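
For illustration, here is a minimal sketch of how the Iterate callback of
file_fdw can drive this API.  The FileFdwState struct and the callback
name are my shorthand, not necessarily what the posted patch uses:

    /* Hypothetical per-scan state, kept in node->fdw_state. */
    typedef struct FileFdwState
    {
        CopyState   cstate;     /* returned by BeginCopyFrom() */
        Relation    rel;        /* the foreign table being scanned */
        char       *filename;   /* value of the filename option */
        List       *options;    /* remaining COPY options */
    } FileFdwState;

    static TupleTableSlot *
    fileIterate(ForeignScanState *node)
    {
        FileFdwState   *festate = (FileFdwState *) node->fdw_state;
        TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
        Oid             tupleOid;

        ExecClearTuple(slot);

        /* NextCopyFrom() fills the slot's arrays; false means EOF. */
        if (NextCopyFrom(festate->cstate, slot->tts_values,
                         slot->tts_isnull, &tupleOid))
            ExecStoreVirtualTuple(slot);

        return slot;
    }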

However, EndCopyFrom() does not seem to release the memory allocated in
BeginCopy() and BeginCopyFrom().  I found this behavior by executing a
query which generates a nested loop plan (outer 1,000,000 rows * inner
10 rows); by the end of the query, the postgres backend had grown from
108MB to over 300MB (VIRT column of top).
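
The inner foreign scan is restarted once per outer row, so whatever
BeginCopyFrom() allocates but EndCopyFrom() fails to release piles up on
every rescan.  Roughly, with the hypothetical state above:

    /*
     * Hypothetical ReScan callback: in the nested loop plan below this
     * runs 1,000,000 times, leaking a little memory on each iteration.
     */
    static void
    fileReScan(ForeignScanState *node)
    {
        FileFdwState *festate = (FileFdwState *) node->fdw_state;

        EndCopyFrom(festate->cstate);   /* does not free everything */
        festate->cstate = BeginCopyFrom(festate->rel, festate->filename,
                                        NIL, festate->options);
    }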

The attached patch avoids the leak by adding a per-copy memory context
to CopyState.  That might be overkill, though; providing a
ResetCopyFrom() instead might be a more reasonable alternative.
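
In outline the patch does the following (the field name copycontext is
how I'd sketch it; see the attached patch for the details):

    /* In BeginCopy(): give each CopyState its own memory context. */
    MemoryContext oldcontext;

    cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
                                                "COPY",
                                                ALLOCSET_DEFAULT_MINSIZE,
                                                ALLOCSET_DEFAULT_INITSIZE,
                                                ALLOCSET_DEFAULT_MAXSIZE);
    oldcontext = MemoryContextSwitchTo(cstate->copycontext);
    /* ... all per-copy allocations now land in copycontext ... */
    MemoryContextSwitchTo(oldcontext);

    /* In EndCopyFrom(): deleting the context releases it all at once. */
    MemoryContextDelete(cstate->copycontext);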

Anyway, I couldn't find any performance degradation with this patch
(tested on my Linux box).

==============
# csv_accounts and csv_branches are generated by:
1) pgbench -i -s 10
2) COPY pgbench_accounts to '/path/to/accounts.csv' WITH CSV;
3) COPY pgbench_branches to '/path/to/branches.csv' WITH CSV;

<Original (there is no memory swap during the measurement)>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11717.01 rows=1 width=200) (actual time=0.300..100833.057 rows=1000000 loops=1)
   Join Filter: (b.bid = t.bid)
   ->  Foreign Scan on csv_accounts b  (cost=0.00..11717.00 rows=1 width=100) (actual time=0.148..4437.595 rows=1000000 loops=1)
   ->  Foreign Scan on csv_branches t  (cost=0.00..0.00 rows=1 width=100) (actual time=0.014..0.039 rows=10 loops=1000000)
 Total runtime: 102882.308 ms
(5 rows)

<Patched, using the per-copy context to release memory>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11717.01 rows=1 width=200) (actual time=0.226..100931.864 rows=1000000 loops=1)
   Join Filter: (b.bid = t.bid)
   ->  Foreign Scan on csv_accounts b  (cost=0.00..11717.00 rows=1 width=100) (actual time=0.085..4439.777 rows=1000000 loops=1)
   ->  Foreign Scan on csv_branches t  (cost=0.00..0.00 rows=1 width=100) (actual time=0.015..0.039 rows=10 loops=1000000)
 Total runtime: 102684.276 ms
(5 rows)
==============

This memory leak would not be a problem for the COPY command itself,
because COPY handles only one CopyState per query, and that state is
cleaned up along with its parent context.

> Some items to be considered:
> - BeginCopyFrom() could receive filename as an option instead of a separate
> argument. If we did so, file_fdw would be simpler, but it's a change only for
> file_fdw; COPY commands in the core won't be improved at all.

ISTM that the current design is better.

> - NextCopyFrom() returns values/nulls arrays rather than a HeapTuple. I expect
> the caller to store the result into a TupleTableSlot with ExecStoreVirtualTuple().
> It is designed for performance, but if the caller always needs a materialized
> HeapTuple, HeapTuple is better for the result type.

I tried to add tableoid to TupleTableSlot as tts_tableoid, but that
seems to make code such as slot_getattr() and other related stuff
tricky.

How about implementing it with materialized tuples, to avoid changes
that are unnecessary (at least for functionality)?  I would like to
defer this virtual-tuple optimization to the next development cycle,
because it would not affect the interface heavily.  I'll post a
materialized-tuple version of the foreign_scan patch soon.
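
Concretely, the materialized-tuple variant of the Iterate sketch above
would look something like this (again under the same hypothetical
state; with a HeapTuple in hand, tableoid can simply be set in
t_tableOid instead of extending TupleTableSlot):

    Datum      *values = slot->tts_values;
    bool       *nulls = slot->tts_isnull;
    HeapTuple   tuple;
    Oid         tupleOid;

    if (NextCopyFrom(festate->cstate, values, nulls, &tupleOid))
    {
        /* Materialize the row so system columns live on the tuple. */
        tuple = heap_form_tuple(slot->tts_tupleDescriptor, values, nulls);
        if (OidIsValid(tupleOid))
            HeapTupleSetOid(tuple, tupleOid);
        tuple->t_tableOid = RelationGetRelid(festate->rel);
        /* slot takes ownership and frees the tuple when cleared */
        ExecStoreTuple(tuple, slot, InvalidBuffer, true);
    }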

Regards,
--
Shigeru Hanada
