Thread: SQL/MED - file_fdw
Hi, hackers,

Attached is a patch that adds file_fdw, an FDW which reads records from files on the server side, as a contrib module. This patch is based on the "SQL/MED core functionality" patch.

[SQL/MED - core functionality]
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01698.php

File_fdw can be installed with steps similar to other contrib modules, and you can create the FDW with the script $SHAREDIR/contrib/file_fdw.sql. Note that you need to create file_fdw in each database.

Documentation for file_fdw is included in the patch, although the contents might not be enough.

Any comments and questions are welcome.

Regards,
--
Shigeru Hanada
Attachment
On Thu, 25 Nov 2010 17:12:44 +0900, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> Attached is a patch that adds file_fdw, an FDW which reads records from
> files on the server side, as a contrib module. This patch is based on
> the "SQL/MED core functionality" patch.
>
> [SQL/MED - core functionality]
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01698.php

I'm going to add new CommitFest items for this patch and the "SQL/MED - postgresql_fdw" patch, which have been split from the "SQL/MED" patch. Can I add them to CF 2010-11, which the original "SQL/MED" item is in? Or should I add them to CF 2011-01?

Regards,
--
Shigeru Hanada
On Thu, Nov 25, 2010 at 05:51:11PM +0900, Shigeru HANADA wrote: > On Thu, 25 Nov 2010 17:12:44 +0900 > Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > > Attached is a patch that adds file_fdw, FDW which reads records from > > files on the server side, as a contrib module. This patch is based on > > "SQL/MED core functionality" patch. > > > > [SQL/MED - core functionality] > > http://archives.postgresql.org/pgsql-hackers/2010-11/msg01698.php > > I'm going to add new CommitFest items for this patch and "SQL/MED - > postgresql_fdw" patch which have been split from "SQL/MED" patch. Can > I add them to CF 2010-11 which original "SQL/MED" item is in? Or > should I add them to CF 2011-01? The original. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, 25 Nov 2010 18:40:09 -0800 David Fetter <david@fetter.org> wrote: > On Thu, Nov 25, 2010 at 05:51:11PM +0900, Shigeru HANADA wrote: > > I'm going to add new CommitFest items for this patch and "SQL/MED - > > postgresql_fdw" patch which have been split from "SQL/MED" patch. Can > > I add them to CF 2010-11 which original "SQL/MED" item is in? Or > > should I add them to CF 2011-01? > > The original. Thanks, added them to CF 2010-11. -- Shigeru Hanada
On 11/25/2010 03:12 AM, Shigeru HANADA wrote: > Hi, hackers, > > Attached is a patch that adds file_fdw, FDW which reads records from > files on the server side, as a contrib module. This patch is based on > "SQL/MED core functionality" patch. > > [SQL/MED - core functionality] > http://archives.postgresql.org/pgsql-hackers/2010-11/msg01698.php > > File_fdw can be installed with the steps similar to other contrib > modules, and you can create FDW with the script: > $SHAREDIR/contrib/file_fdw.sql > Note that you need to create file_fdw for each database. > > Document for file_fdw is included in the patch, although the contents > might not be enough. > > Any comments and questions are welcome. Looking at file_parser.c, it seems to be largely taken from copy.c. Wouldn't it be better to call those functions, or refactor them so they are callable if necessary? cheers andrew
On Sun, Dec 5, 2010 at 07:24, Andrew Dunstan <andrew@dunslane.net> wrote: > Looking at file_parser.c, it seems to be largely taken from copy.c. Wouldn't > it be better to call those functions, or refactor them so they are callable > if necessary? We could export private functions and structs in copy.c, though details of the implementation should be kept in copy.c. How about splitting the file_fdw patch into two pieces? One exports the copy functions from the core, and another implements file_fdw using the infrastructure. -- Itagaki Takahiro
On 12/04/2010 11:11 PM, Itagaki Takahiro wrote: > On Sun, Dec 5, 2010 at 07:24, Andrew Dunstan<andrew@dunslane.net> wrote: >> Looking at file_parser.c, it seems to be largely taken from copy.c. Wouldn't >> it be better to call those functions, or refactor them so they are callable >> if necessary? > We could export private functions and structs in copy.c, > though details of the implementation should be kept in copy.c. > > How about splitting the file_fdw patch into two pieces? > One exports the copy functions from the core, and another > implements file_fdw using the infrastructure. > Yes please. cheers andrew
2010/11/25 Shigeru HANADA <hanada@metrosystems.co.jp>:
> Hi, hackers,
>
> Attached is a patch that adds file_fdw, an FDW which reads records from
> files on the server side, as a contrib module. This patch is based on
> the "SQL/MED core functionality" patch.
>
> [SQL/MED - core functionality]
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01698.php
>
> File_fdw can be installed with steps similar to other contrib modules,
> and you can create the FDW with the script $SHAREDIR/contrib/file_fdw.sql.
> Note that you need to create file_fdw in each database.
>
> Documentation for file_fdw is included in the patch, although the contents
> might not be enough.
>
> Any comments and questions are welcome.

I think it would be better to add an encoding option to FileFdwOption. In the patch, the encoding of the file is assumed to be client_encoding, but we may want to SELECT from a differently-encoded CSV file in a query.

Apart from the FDW issue, I've been thinking that client_encoding is not appropriate for COPY in any case: client_encoding is the encoding of the statement the client sends, not of the COPY target, which is on the server's filesystem. Adding an encoding option to COPY would also eliminate the allowEncodingChanges option from the JDBC driver.

Regards,
--
Hitoshi Harada
On Mon, Dec 6, 2010 at 5:48 AM, Hitoshi Harada <umi.tanuki@gmail.com> wrote: > I think it is better to add encoding option to FileFdwOption. In the > patch the encoding of file is assumed as client_encoding, but we may > want to SELECT from different-encoded csv in a query. > > Apart from the issue with fdw, I've been thinking client_encoding for > COPY is not appropriate in any way. client_encoding is the encoding of > the statement the client sends, not the COPY target which is on the > server's filesystem. Adding encoding option to COPY will eliminate > allowEncodingChanges option from JDBC driver. Yeah, this point has been raised before, and I agree with it. I haven't heard anyone who speaks a European language complain about this, but it seems to keep coming up for Japanese speakers. I am guessing that means that using multiple encodings is fairly common in Japan. I typically don't run into anything other than UTF-8 and Latin-1, which are mostly compatible especially if you're an English speaker, but if it weren't for that happy coincidence I think this would be quite annoying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/04/2010 11:11 PM, Itagaki Takahiro wrote: > On Sun, Dec 5, 2010 at 07:24, Andrew Dunstan<andrew@dunslane.net> wrote: >> Looking at file_parser.c, it seems to be largely taken from copy.c. Wouldn't >> it be better to call those functions, or refactor them so they are callable >> if necessary? > We could export private functions and structs in copy.c, > though details of the implementation should be kept in copy.c. > > How about splitting the file_fdw patch into two pieces? > One exports the copy functions from the core, and another > implements file_fdw using the infrastructure. > Who is actually going to do this split? cheers andrew
On Sat, Dec 11, 2010 at 05:30, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 12/04/2010 11:11 PM, Itagaki Takahiro wrote:
>> One exports the copy functions from the core, and another
>> implements file_fdw using the infrastructure.
>
> Who is actually going to do this split?

I'm working on it :-) I extracted these functions from copy.c:

- CopyState BeginCopyFrom(Relation rel, const char *filename,
                          List *attnamelist, List *options);
- void EndCopyFrom(CopyState cstate);
- bool NextCopyFrom(CopyState cstate,
                    Datum *values, bool *nulls, Oid *oid);

There was a Reset() in file_fdw, but it is not contained in this patch. It will be added again if required, but I wonder whether we might need not only reset but also mark/restore of a position in a file.

--
Itagaki Takahiro
Attachment
On 12/13/2010 01:31 AM, Itagaki Takahiro wrote:
> On Sat, Dec 11, 2010 at 05:30, Andrew Dunstan<andrew@dunslane.net> wrote:
>> On 12/04/2010 11:11 PM, Itagaki Takahiro wrote:
>>> One exports the copy functions from the core, and another
>>> implements file_fdw using the infrastructure.
>> Who is actually going to do this split?
> I'm working on it :-) I extracted these functions from copy.c:
>
> - CopyState BeginCopyFrom(Relation rel, const char *filename,
>                           List *attnamelist, List *options);
> - void EndCopyFrom(CopyState cstate);
> - bool NextCopyFrom(CopyState cstate,
>                     Datum *values, bool *nulls, Oid *oid);
>
> There was a Reset() in file_fdw, but it is not contained in this
> patch. It will be added again if required, but I wonder whether we might
> need not only reset but also mark/restore of a position in a file.

Hmm. I don't think that's going to expose enough for what I want to be able to do. I actually had in mind exposing lower-level routines like CopyReadAttributesCSV/CopyReadAttributesText and allowing the foreign data wrapper to manipulate the raw values read (for example, from an irregularly shaped CSV file).

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Hmm. I don't think that's going to expose enough for what I want to be
> able to do. I actually had in mind exposing lower-level routines like
> CopyReadAttributesCSV/CopyReadAttributesText and allowing the foreign
> data wrapper to manipulate the raw values read (for example, from an
> irregularly shaped CSV file).

I think that exposing the guts of COPY to the open air is a bad idea. We refactor that code for performance or other reasons every release or two. I don't want to find us tied down to the current implementation because we're afraid of breaking third-party FDWs.

regards, tom lane
On 12/13/2010 11:12 AM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net> writes:
>> Hmm. I don't think that's going to expose enough for what I want to be
>> able to do. I actually had in mind exposing lower-level routines like
>> CopyReadAttributesCSV/CopyReadAttributesText and allowing the foreign
>> data wrapper to manipulate the raw values read (for example, from an
>> irregularly shaped CSV file).
> I think that exposing the guts of COPY to the open air is a bad idea.
> We refactor that code for performance or other reasons every release or
> two. I don't want to find us tied down to the current implementation
> because we're afraid of breaking third-party FDWs.

In that case I guess I'll need to do what Shigeru-san has done, and copy large parts of copy.c.

cheers

andrew
On Tue, Dec 14, 2010 at 01:25, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 12/13/2010 11:12 AM, Tom Lane wrote:
>> I think that exposing the guts of COPY to the open air is a bad idea.

I don't want to export the details either.

> In that case I guess I'll need to do what Shigeru-san has done, and copy
> large parts of copy.c.

I found that file_fdw would require the executor state in CopyState and the error callback function, so I revised the patch to export them. Now 5 functions are exported from copy.c:

- BeginCopyFrom(rel, filename, attnamelist, options) : CopyState
- EndCopyFrom(cstate) : void
- NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid) : bool
- GetCopyExecutorState(cstate) : EState *
- CopyFromErrorCallback(arg)

Are they enough, Shigeru-san? Note that the internal CopyFrom() is now implemented only with them, so I think file_fdw is also possible.

BTW, we might have another choice instead of GetCopyExecutorState(), because the EState will be used only for ResetPerTupleExprContext() in file_fdw. If NextCopyFrom() returned a HeapTuple instead of values and nulls arrays, we could hide the EState inside NextCopyFrom().

--
Itagaki Takahiro
Attachment
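As a minimal sketch of how a caller might drive these functions (not taken from the patch; rel, filename, and options are hypothetical locals prepared by the caller):

    /* read every row of a server-side file via the exported COPY FROM API */
    CopyState   cstate;
    Datum      *values;
    bool       *nulls;
    Oid         tupleOid;
    TupleDesc   tupdesc = RelationGetDescr(rel);

    cstate = BeginCopyFrom(rel, filename, NIL, options);
    values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
    nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));

    while (NextCopyFrom(cstate, values, nulls, &tupleOid))
    {
        /* one parsed row is now in values[] / nulls[]; consume it here */
    }

    EndCopyFrom(cstate);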
On Tue, 14 Dec 2010 12:01:36 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Tue, Dec 14, 2010 at 01:25, Andrew Dunstan <andrew@dunslane.net> wrote:
> > On 12/13/2010 11:12 AM, Tom Lane wrote:
> > In that case I guess I'll need to do what Shigeru-san has done, and copy
> > large parts of copy.c.
>
> I found that file_fdw would require the executor state in CopyState and
> the error callback function, so I revised the patch to export them.
> Now 5 functions are exported from copy.c:
>
> - BeginCopyFrom(rel, filename, attnamelist, options) : CopyState
> - EndCopyFrom(cstate) : void
> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid) : bool
> - GetCopyExecutorState(cstate) : EState *
> - CopyFromErrorCallback(arg)
>
> Are they enough, Shigeru-san? Note that the internal CopyFrom() is
> now implemented only with them, so I think file_fdw is also possible.

In addition to the above, ResetCopyFrom() is necessary to support nested loops whose inner node is a ForeignScan. On the other hand, I think that MarkPos()/RestrPos() wouldn't be necessary until ForeignScan supports ordered output. A ForeignScan can't become a direct child of a MergeJoin because a ForeignScan is not an ordered scan, at least in the current SQL/MED implementation.

Regards,
--
Shigeru Hanada
On Tue, Dec 14, 2010 at 15:31, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
>> - BeginCopyFrom(rel, filename, attnamelist, options) : CopyState
>> - EndCopyFrom(cstate) : void
>> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid) : bool
>> - GetCopyExecutorState(cstate) : EState *
>> - CopyFromErrorCallback(arg)
>>
>> Are they enough, Shigeru-san? Note that the internal CopyFrom() is
>> now implemented only with them, so I think file_fdw is also possible.
>
> In addition to the above, ResetCopyFrom() is necessary to support nested
> loops whose inner node is a ForeignScan.

I think you can add ResetCopyFrom() to the core in your next file_fdw patch, because the function is not used by the COPY command. I'll note other differences between the API and your FileState:

- There are no superuser checks in the exported functions because the restriction should apply only at CREATE/ALTER FOREIGN TABLE. If the superuser grants SELECT privileges to normal users, they should be able to read the file contents. (But we might need to hide the file path.)
- errcontext and the values/nulls arrays are not included in CopyState. They will be kept separately in a holder of the CopyState.
- You need to pass a non-NULL filename. If it is NULL, the server tries to read data from the client.
- The framework supports reading dumped binary files and files with OIDs. If you don't want to support them, please check that the parameters do not include those options.

--
Itagaki Takahiro
Hi hackers,

Attached is the revised WIP version of the file_fdw patch. This patch should be applied after both the fdw_syntax and fdw_scan patches, which have been posted to another thread, "SQL/MED - core functionality".

In this version, file_fdw consists of two parts: the file_fdw core, and a copy of the COPY FROM code as it was in the last version. The reason for this form is to make it possible to test actual SELECT statements ASAP. I'll revise file_fdw again according to Itagaki-san's export-copy-routines patch.

Note that this version of file_fdw doesn't support the force_not_null option, because column-level generic options are not supported by the current fdw_syntax. It will be available once column-level generic options are implemented.

And, as a possible implementation of FDW-specific EXPLAIN information, EXPLAIN SELECT xxx FROM file shows the name and size of the file. It may be better to hide file information if the user is not a superuser, for security reasons. If so, the filename option should not appear in the output of psql's \det command either.

Regards,
--
Shigeru Hanada
Attachment
On Tue, 14 Dec 2010 15:51:18 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Tue, Dec 14, 2010 at 15:31, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> > In addition to the above, ResetCopyFrom() is necessary to support nested
> > loops whose inner node is a ForeignScan.
>
> I think you can add ResetCopyFrom() to the core in your next file_fdw
> patch because the function is not used by the COPY command.

Agreed. I tried your patch with ResetCopyFrom() added to copy.c, and found that the patch works fine, at least for superusers.

> I'll note other differences between the API and your FileState:
>
> - There are no superuser checks in the exported functions because
>   the restriction should apply only at CREATE/ALTER FOREIGN TABLE.
>   If the superuser grants SELECT privileges to normal users, they
>   should be able to read the file contents.
>   (But we might need to hide the file path.)
> - errcontext and the values/nulls arrays are not included in CopyState.
>   They will be kept separately in a holder of the CopyState.
> - You need to pass a non-NULL filename. If it is NULL, the server
>   tries to read data from the client.
> - The framework supports reading dumped binary files and files
>   with OIDs. If you don't want to support them, please check that the
>   parameters do not include those options.

None of the differences above would be a serious problem, but I worry about a difference between file_fdw and COPY FROM. "COPY FROM" is essentially a command which INSERTs data from a file, so it requires RowExclusiveLock on the target table. On the other hand, file_fdw is a feature which reads data from a file through a table, so it requires AccessShareLock on the source table. The current export_copy patch doesn't allow non-superusers to fetch data from files because BeginCopy() acquires RowExclusiveLock when SELECTing from a file_fdw table.

Using the COPY routines from file_fdw might need another kind of modularization, such as splitting the file operations out of the COPY module and using them from both COPY and file_fdw. But that would require more code work, and possibly performance tests.

Regards,
--
Shigeru Hanada
On Thu, Dec 16, 2010 at 18:45, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> "COPY FROM" is essentially a command which INSERTs data from a file,
> so it requires RowExclusiveLock on the target table. On the other
> hand, file_fdw is a feature which reads data from a file through a
> table, so it requires AccessShareLock on the source table.

Ah, I found my bug in BeginCopy(), but it's in the usage of ExecCheckRTPerms() rather than RowExclusiveLock, right? The target relation should have been opened and locked by the caller. I think we can move the check to DoCopy(), as we do for the superuser() check. In my understanding, we don't have to check permissions in each FDW because that was done in the parse and analyze phases. Could you fix it? Or shall I?

> Using the COPY routines from file_fdw might need another kind of
> modularization, such as splitting the file operations out of the COPY
> module and using them from both COPY and file_fdw. But that would
> require more code work, and possibly performance tests.

My plan was that the 'rel' argument for BeginCopyFrom() is a "template" for the CSV file, so we need only AccessShareLock (or NoLock?) on the relation. A TupleDesc might be enough for the purpose, but I've not changed the type because of DEFAULT columns.

OTOH, CopyFrom(cstate, rel) will require an additional 'rel' argument, which is the "target" to be inserted into, though it's always the same as the "template" above in COPY FROM. RowExclusiveLock is required only for the target relation.

--
Itagaki Takahiro
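Reading the plan above, the resulting declarations might look roughly like this (a sketch only; the actual patch may differ, and the CopyFrom return type here is a placeholder):

    /* "template" relation: supplies the TupleDesc and column defaults;
     * AccessShareLock (or NoLock) is sufficient for reading a file */
    extern CopyState BeginCopyFrom(Relation rel, const char *filename,
                                   List *attnamelist, List *options);

    /* "target" relation: the insert destination, locked with
     * RowExclusiveLock by the caller; in plain COPY FROM it is the
     * same relation as the template */
    extern void CopyFrom(CopyState cstate, Relation rel);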
On Thu, Dec 16, 2010 at 5:35 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > Ah, I found my bug in BeginCopy(), but it's in the usage of > ExecCheckRTPerms() rather than RowExclusiveLock, right? > The target relation should have been opened and locked by the caller. > I think we can move the check to DoCopy() as like as checking for > superuser(). In my understanding, we don't have to check permissions > in each FDW because it was done in parse and analyze phases. > Could you fix it? Or, shall I do? I believe that our project policy is that permissions checks must be done at execution time, not parse/plan time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 16, 2010 at 23:09, Robert Haas <robertmhaas@gmail.com> wrote:
> I believe that our project policy is that permissions checks must be
> done at execution time, not parse/plan time.

Oops, yes. I should have said "permission checks for foreign tables should be done in their own execution". So, additional checks in each FDW are not required after all.

In addition, we allow users to read the definition of the columns and default values even if they don't have SELECT permission. So, I still think permission checks for the template relation are not required in the file reader API. But we need the checks in the COPY FROM command, because there the relation is used not only as a template but also as a target.

=> SELECT * FROM tbl;
ERROR:  permission denied for relation tbl
=> \d+ tbl
                     Table "public.tbl"
 Column |  Type   | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
 i      | integer |           | plain   |
 j      | integer | default 5 | plain   |
Has OIDs: no

--
Itagaki Takahiro
On Thu, 16 Dec 2010 19:35:56 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Thu, Dec 16, 2010 at 18:45, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> > "COPY FROM" is essentially a command which INSERTs data from a file,
> > so it requires RowExclusiveLock on the target table. On the other
> > hand, file_fdw is a feature which reads data from a file through a
> > table, so it requires AccessShareLock on the source table.
>
> Ah, I found my bug in BeginCopy(), but it's in the usage of
> ExecCheckRTPerms() rather than RowExclusiveLock, right?
> The target relation should have been opened and locked by the caller.

The target foreign tables are locked in InitForeignScan() via ExecOpenScanRelation(), so the COPY routines don't need to lock them again.

> I think we can move the check to DoCopy(), as we do for the superuser()
> check. In my understanding, we don't have to check permissions
> in each FDW because that was done in the parse and analyze phases.

In addition, ISTM that the check for read-only transactions should be moved to DoCopy(), because a file_fdw scan can be used in a read-only transaction.

> Could you fix it? Or shall I?

I've just moved the permission check and the read-only check from BeginCopy() to DoCopy(). Please see the attached patch.

Regards,
--
Shigeru Hanada
Attachment
On Fri, Dec 17, 2010 at 11:49, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> I've just moved the permission check and the read-only check from BeginCopy()
> to DoCopy(). Please see the attached patch.

Thanks!

Are there any objections to the change? If it's acceptable, I'd like to apply it prior to the SQL/MED and file_fdw patches.

We could have better names for Begin/Next/EndCopyFrom() and the exported CopyState. As Shigeru mentioned, COPY FROM consists of "a file reader" and "a heap inserter", but file_fdw only uses the file reader. In addition, we could later split CopyState into two versions for COPY FROM and COPY TO. So, it might be better to export them as a "FileReader API" or some similar name rather than CopyFrom.

--
Itagaki Takahiro
On Fri, Dec 17, 2010 at 11:01 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Fri, Dec 17, 2010 at 11:49, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: >> I've just moved permission check and read-only check from BeginCopy() >> to DoCopy(). Please see attached patch. > > Thanks! > > Are there any objections for the change? If acceptable, > I'd like to apply it prior to SQL/MED and file_fdw patches. I think at a bare minimum the functions you're adding should have a comment explaining what they do, what their arguments mean, etc. I'm sort of suspicious of the fact that BeginCopyTo() is a shell around BeginCopy() while BeginCopyFrom() does a whole bunch of other stuff. I haven't grokked what the code is doing here well enough to have a concrete proposal though... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Dec 19, 2010 at 12:18, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm sort of suspicious of the fact that BeginCopyTo() is a shell
> around BeginCopy() while BeginCopyFrom() does a whole bunch of other
> stuff. I haven't grokked what the code is doing here well enough to
> have a concrete proposal though...

I added Begin/EndCopyTo() just so the internal code looks symmetric. The proposal doesn't change the behavior of COPY commands at all. It just exports a part of the COPY FROM code as a "File Reader" so that the external file_fdw module can reuse it. I believe we reached the conclusion in the prior discussion that we should avoid duplicating the file-reading code.

We could arrange the COPY TO code in the same way as the COPY FROM APIs, but I've not done so and I won't at this time, because it is not required by SQL/MED at all. If we did, it would be a "File Writer" API, like:

    cstate = BeginCopyTo(...);
    while ((tuple = ReadTupleFromSomewhere()) != NULL)
    {
        /* write the tuple into a TSV/CSV file */
        NextCopyTo(cstate, tuple);
    }
    EndCopyTo(cstate);

--
Itagaki Takahiro
On Sat, Dec 18, 2010 at 10:43 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Sun, Dec 19, 2010 at 12:18, Robert Haas <robertmhaas@gmail.com> wrote: >> I'm sort of suspicious of the fact that BeginCopyTo() is a shell >> around BeginCopy() while BeginCopyFrom() does a whole bunch of other >> stuff. I haven't grokked what the code is doing here well enough to >> have a concrete proposal though... > > I added Begin/EndCopyTo() just because the internal code looks > symmetric. The proposal doesn't change behaviors of COPY commands > at all. It just exports a part of COPY FROM codes as "File Reader" > so that the file_fdw external module can reuse the code. I believe > we have the conclusion that we should avoid code duplication > to read files in the prior discussion. I'm not questioning any of that. But I'd like the resulting code to be as maintainable as we can make it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Dec 19, 2010 at 12:45, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not questioning any of that. But I'd like the resulting code to
> be as maintainable as we can make it.

I added comments and moved some setup code for COPY TO into BeginCopyTo() for maintainability. CopyTo() still contains parts of the initialization, but I've not touched it yet because we don't need the rearrangement for now.

--
Itagaki Takahiro
Attachment
On Mon, 20 Dec 2010 20:42:38 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Sun, Dec 19, 2010 at 12:45, Robert Haas <robertmhaas@gmail.com> wrote:
> > I'm not questioning any of that. But I'd like the resulting code to
> > be as maintainable as we can make it.
>
> I added comments and moved some setup code for COPY TO into BeginCopyTo()
> for maintainability. CopyTo() still contains parts of the initialization,
> but I've not touched it yet because we don't need the rearrangement for now.

Attached is the revised version of the file_fdw patch. This patch is based on Itagaki-san's copy_export-20101220.diff patch. Changes from the previous version are:

* file_fdw uses CopyFromErrorCallback() as the error context callback routine in fileIterate() to report error context. The "CONTEXT" line in the example below is added by the callback.

postgres=# select * From csv_tellers_bad;
ERROR:  missing data for column "bid"
CONTEXT:  COPY csv_tellers_bad, line 10: "10"
postgres=#

* Only superusers can change table-level file_fdw options. A normal user can't change the options even if the user is the owner of the table. This is for security reasons.

Regards,
--
Shigeru Hanada
Attachment
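A sketch of how fileIterate() might install that callback around a fetch, assuming the exported CopyFromErrorCallback() and a hypothetical festate holding the CopyState:

    ErrorContextCallback errcallback;

    /* push the COPY error context so errors report the file and line */
    errcallback.callback = CopyFromErrorCallback;
    errcallback.arg = (void *) festate->cstate;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    found = NextCopyFrom(festate->cstate, values, nulls, &tupleOid);

    /* pop the callback so later errors don't mention this COPY */
    error_context_stack = errcallback.previous;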
On Tue, Dec 21, 2010 at 20:14, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> Attached is the revised version of the file_fdw patch. This patch is
> based on Itagaki-san's copy_export-20101220.diff patch.

#1. Don't you have a per-tuple memory leak? I added GetCopyExecutorState() because the caller needs to reset the per-tuple context periodically. Or, if you eventually make a HeapTuple from the values and nulls arrays, you could modify NextCopyFrom() to return a HeapTuple instead of values, nulls, and tupleOid. The reason I didn't use HeapTuple is that I've seen those arrays used in the proposed FDW APIs, but we don't have to use such arrays if you use HeapTuple-based APIs. IMHO, I prefer HeapTuple because we can simplify NextCopyFrom and keep EState private in copy.c.

#2. Can you avoid building the EXPLAIN text in fplan->explainInfo in non-EXPLAIN cases? It's a waste of CPU cycles in normal execution. I doubt whether the FdwPlan.explainInfo field is the best design. How do we use the EXPLAIN text for XML or JSON explain formats? Instead, we could have an additional routine for EXPLAIN.

#3. Why do you re-open a foreign table in estimate_costs()? Since the caller seems to have the options for it, you can pass them directly, no? In addition, passing a half-initialized fplan to estimate_costs() is a bad idea. If you think of it as an OUT parameter, the OUT params should be *startup_cost and *total_cost.

#4. It's a minor cosmetic point, but our coding conventions are that we don't need (void *) when we cast a pointer to void *, but we do need (Type *) when we cast a void pointer to another type.

--
Itagaki Takahiro
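The per-tuple discipline point #1 alludes to might look like this on the caller's side (a sketch, assuming GetCopyExecutorState() from the current patch; values, nulls, and tupleOid are the caller's buffers):

    EState     *estate = GetCopyExecutorState(cstate);
    MemoryContext oldcontext;

    for (;;)
    {
        bool    found;

        /* free row-lifetime allocations made for the previous tuple */
        ResetPerTupleExprContext(estate);

        oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
        found = NextCopyFrom(cstate, values, nulls, &tupleOid);
        MemoryContextSwitchTo(oldcontext);

        if (!found)
            break;

        /* pass-by-reference datums in values[] live in the per-tuple
         * context and vanish at the next reset */
    }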
On Mon, Dec 20, 2010 at 6:42 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Sun, Dec 19, 2010 at 12:45, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'm not questioning any of that. But I'd like the resulting code to
>> be as maintainable as we can make it.
>
> I added comments and moved some setup code for COPY TO into BeginCopyTo()
> for maintainability. CopyTo() still contains parts of the initialization,
> but I've not touched it yet because we don't need the rearrangement for now.

I haven't analyzed this enough to know whether I agree with it, but as a trivial matter you should certainly revert this hunk:

     /* field raw data pointers found by COPY FROM */
-
-    int max_fields;
-    char ** raw_fields;
+    int max_fields;
+    char **raw_fields;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 21, 2010 at 21:32, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Tue, Dec 21, 2010 at 20:14, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
>> Attached is the revised version of the file_fdw patch. This patch is
>> based on Itagaki-san's copy_export-20101220.diff patch.
>
> #1. Don't you have a per-tuple memory leak? I added GetCopyExecutorState()
> because the caller needs to reset the per-tuple context periodically.

Sorry, I found there is no memory leak here. The related comment is:

[execnodes.h]
 * CurrentMemoryContext should be set to ecxt_per_tuple_memory before
 * calling ExecEvalExpr() --- see ExecEvalExprSwitchContext().

I guess CurrentMemoryContext in the Iterate callback is a per-tuple context. So, we don't have to export cstate->estate via GetCopyExecutorState().

> Or, if you eventually make a HeapTuple from the values and nulls arrays,

ExecStoreVirtualTuple() seems better than the combination of heap_form_tuple() and ExecStoreTuple() for this purpose. Could you try to use slot->tts_values and slot->tts_isnull for NextCopyFrom() directly?

--
Itagaki Takahiro
On Fri, 24 Dec 2010 11:09:16 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Tue, Dec 21, 2010 at 21:32, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
> > On Tue, Dec 21, 2010 at 20:14, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> >> Attached is the revised version of the file_fdw patch. This patch is
> >> based on Itagaki-san's copy_export-20101220.diff patch.
> >
> > #1. Don't you have a per-tuple memory leak? I added GetCopyExecutorState()
> > because the caller needs to reset the per-tuple context periodically.
>
> Sorry, I found there is no memory leak here. The related comment is:
> [execnodes.h]
>  * CurrentMemoryContext should be set to ecxt_per_tuple_memory before
>  * calling ExecEvalExpr() --- see ExecEvalExprSwitchContext().
> I guess CurrentMemoryContext in the Iterate callback is a per-tuple context.
> So, we don't have to export cstate->estate via GetCopyExecutorState().

Iterate is called in the query context, so GetCopyExecutorState() needs to be exported to avoid the memory leak that happens in NextCopyFrom(). Or would it be better to enclose the context switching within NextCopyFrom()? Then CopyFrom() would need to create tuples in the Portal context and set the shouldFree flag of ExecStoreTuple() to true, to free the stored tuple on the next call. Please try the attached patch.

> > Or, if you eventually make a HeapTuple from the values and nulls arrays,
>
> ExecStoreVirtualTuple() seems better than the combination of
> heap_form_tuple() and ExecStoreTuple() for this purpose. Could you try
> to use slot->tts_values and slot->tts_isnull for NextCopyFrom() directly?

A virtual tuple would be enough to carry the column data, but a virtual tuple doesn't have system attributes, including tableoid...

Regards,
--
Shigeru Hanada
Attachment
On Fri, Dec 24, 2010 at 20:04, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> Iterate is called in the query context,

Is that an unavoidable requirement? If possible, I'd like to use a per-tuple memory context as the default. We use a per-tuple context in FunctionScan for SETOF functions, and I hope we can keep the difference between the SRF and FDW APIs small.

> A virtual tuple would be enough to carry the column data, but a virtual tuple
> doesn't have system attributes, including tableoid...

We could add tts_tableOid to TupleTableSlot. We'd better avoid materializing the slot only to support tableoid in foreign tables. Almost all foreign tables will have a data format different from HeapTuple, including pgsql_fdw.

--
Itagaki Takahiro
On Mon, Dec 20, 2010 at 20:42, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> I added comments and moved some setup code for COPY TO into BeginCopyTo()
> for maintainability. CopyTo() still contains parts of the initialization,
> but I've not touched it yet because we don't need the rearrangement for now.

I updated the COPY FROM API patch.
- GetCopyExecutorState() is removed because FDWs will use their own context.

The patch just rearranges the COPY FROM code to export these functions. It also modifies some of the COPY TO code internally for readability.
- BeginCopyFrom(rel, filename, attnamelist, options)
- EndCopyFrom(cstate)
- NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid)
- CopyFromErrorCallback(arg)

Some items to be considered:
- BeginCopyFrom() could receive the filename as an option instead of a separate argument. If we did so, file_fdw would be simpler, but it's a change only for file_fdw; COPY commands in the core wouldn't be improved at all.
- NextCopyFrom() returns values/nulls arrays rather than a HeapTuple. I expect the caller to store the result into a TupleTableSlot with ExecStoreVirtualTuple(). This is designed for performance, but if the caller always needs a materialized HeapTuple, HeapTuple is better for the result type.

--
Itagaki Takahiro
Attachment
On Fri, 7 Jan 2011 10:57:17 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> I updated the COPY FROM API patch.
> - GetCopyExecutorState() is removed because FDWs will use their own context.
>
> The patch just rearranges the COPY FROM code to export these functions.
> It also modifies some of the COPY TO code internally for readability.
> - BeginCopyFrom(rel, filename, attnamelist, options)
> - EndCopyFrom(cstate)
> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid)
> - CopyFromErrorCallback(arg)

For the purposes of file_fdw, an additional ResetCopyFrom() would be necessary. I'm planning to include such changes in the file_fdw patch. Please find attached a partial patch for ResetCopyFrom(). Is there anything else which should be done at reset?

> Some items to be considered:
> - BeginCopyFrom() could receive the filename as an option instead of a separate
>   argument. If we did so, file_fdw would be simpler, but it's a change only for
>   file_fdw; COPY commands in the core wouldn't be improved at all.
> - NextCopyFrom() returns values/nulls arrays rather than a HeapTuple. I expect
>   the caller to store the result into a TupleTableSlot with ExecStoreVirtualTuple().
>   This is designed for performance, but if the caller always needs a materialized
>   HeapTuple, HeapTuple is better for the result type.

IIUC, materializing is for the tableoid system column. If we could add tts_tableoid to TupleTableSlot, a virtual tuple would be enough. In that design, the caller can receive results in the tts_values/tts_isnull arrays.

Regards,
--
Shigeru Hanada
Attachment
Shigeru HANADA <hanada@metrosystems.co.jp> writes: > For the purpose of file_fdw, additional ResetCopyFrom() would be > necessary. I'm planning to include such changes in file_fdw patch. > Please find attached partial patch for ResetCopyFrom(). Is there > anything else which should be done at reset? Seems like it would be smarter to close and re-open the copy operation. Adding a reset function is just creating an additional maintenance burden and point of failure, for what seems likely to be a negligible performance benefit. If you think it's not negligible, please show some proof of that before asking us to support such code. regards, tom lane
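Following that suggestion, a rescan callback could simply close and re-open (a hypothetical sketch; FileFdwState and its fields are illustrative names, not from the patch):

    static void
    fileReScan(ForeignScanState *node)
    {
        FileFdwState *festate = (FileFdwState *) node->fdw_state;

        /* close and re-open instead of a dedicated reset function */
        EndCopyFrom(festate->cstate);
        festate->cstate = BeginCopyFrom(festate->rel,
                                        festate->filename,
                                        NIL,
                                        festate->options);
    }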
On Mon, 10 Jan 2011 19:26:11 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shigeru HANADA <hanada@metrosystems.co.jp> writes:
> > For the purposes of file_fdw, an additional ResetCopyFrom() would be
> > necessary. I'm planning to include such changes in the file_fdw patch.
> > Please find attached a partial patch for ResetCopyFrom(). Is there
> > anything else which should be done at reset?
>
> Seems like it would be smarter to close and re-open the copy operation.
> Adding a reset function is just creating an additional maintenance
> burden and point of failure, for what seems likely to be a negligible
> performance benefit.

Agreed. fileReScan can be implemented as close/re-open, storing some additional information in the FDW private area. I withdraw the proposal.

> If you think it's not negligible, please show some proof of that before
> asking us to support such code.

Anyway, I've measured the overhead of re-opening by executing a query with an inner join between foreign tables copied from the pgbench schema. I used this SELECT statement:

EXPLAIN (ANALYZE) SELECT count(*) FROM csv_accounts a JOIN csv_branches b ON (b.bid = a.bid);

Taking (Nested Loop - (Foreign Scan * 2)), the overhead of re-opening is around 0.048 ms per tuple (average of 3 measurements); from the runs below, that is roughly (121119 ms - 73358 ms) / 1,000,000 inner rescans. After the implementation of file_fdw, I'm going to measure again. If ResetCopyFrom significantly improves the performance of ReScan, I'll propose it as a separate patch.

=========================================================================
The results of EXPLAIN ANALYZE are:

[using ResetCopyFrom]
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11717.02..11717.03 rows=1 width=0) (actual time=73357.655..73357.657 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11717.01 rows=1 width=0) (actual time=0.209..71424.059 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_accounts a  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.144..6998.497 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.008..0.037 rows=10 loops=1000000)
 Total runtime: 73358.135 ms
(11 rows)

[using EndCopyFrom + BeginCopyFrom]
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11717.02..11717.03 rows=1 width=0) (actual time=120724.138..120724.140 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11717.01 rows=1 width=0) (actual time=0.321..118583.681 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_accounts a  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.156..7208.968 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.046 rows=10 loops=1000000)
 Total runtime: 121118.792 ms
(11 rows)

Time: 121122.205 ms
=========================================================================

Regards,
--
Shigeru Hanada
On Fri, 7 Jan 2011 10:57:17 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Mon, Dec 20, 2010 at 20:42, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
> > I added comments and moved some setup code for COPY TO into BeginCopyTo()
> > for maintainability. CopyTo() still contains parts of the initialization,
> > but I've not touched it yet because we don't need the rearrangement for now.
>
> I updated the COPY FROM API patch.
> - GetCopyExecutorState() is removed because FDWs will use their own context.

I rebased the file_fdw patch onto the recent copy_export patch, and have some comments.

> The patch just rearranges the COPY FROM code to export these functions.
> It also modifies some of the COPY TO code internally for readability.
> - BeginCopyFrom(rel, filename, attnamelist, options)
> - EndCopyFrom(cstate)
> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid)
> - CopyFromErrorCallback(arg)

This API set seems to be enough to implement file_fdw using the COPY routines. But EndCopyFrom() seems not to be able to release memory which is allocated in BeginCopy() and BeginCopyFrom(). I found this behavior by executing a query which generates a nested loop plan (outer 1,000,000 rows * inner 10 rows); by the end, postgres grows to 300MB+ from 108MB (VIRT in top).

The attached patch avoids this leak by adding a per-copy memory context to CopyState (see the sketch after this message). This might be overkill, and ResetCopyFrom() might be a reasonable alternative. Anyway, I couldn't find any performance degradation with this patch (tested on my Linux box).

==============
# csv_accounts and csv_branches are generated by:
1) pgbench -i -s 10
2) COPY pgbench_accounts to '/path/to/accounts.csv' WITH CSV;
3) COPY pgbench_branches to '/path/to/branches.csv' WITH CSV;

<Original (there is no memory swap during measurement)>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11717.01 rows=1 width=200) (actual time=0.300..100833.057 rows=1000000 loops=1)
   Join Filter: (b.bid = t.bid)
   ->  Foreign Scan on csv_accounts b  (cost=0.00..11717.00 rows=1 width=100) (actual time=0.148..4437.595 rows=1000000 loops=1)
   ->  Foreign Scan on csv_branches t  (cost=0.00..0.00 rows=1 width=100) (actual time=0.014..0.039 rows=10 loops=1000000)
 Total runtime: 102882.308 ms
(5 rows)

<Patched, using per-copy context to release memory>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11717.01 rows=1 width=200) (actual time=0.226..100931.864 rows=1000000 loops=1)
   Join Filter: (b.bid = t.bid)
   ->  Foreign Scan on csv_accounts b  (cost=0.00..11717.00 rows=1 width=100) (actual time=0.085..4439.777 rows=1000000 loops=1)
   ->  Foreign Scan on csv_branches t  (cost=0.00..0.00 rows=1 width=100) (actual time=0.015..0.039 rows=10 loops=1000000)
 Total runtime: 102684.276 ms
(5 rows)
==============

This memory leak would not be a problem when used from the COPY command, because COPY handles only one CopyState per query, and it will be cleaned up with its parent context.

> Some items to be considered:
> - BeginCopyFrom() could receive the filename as an option instead of a separate
>   argument. If we did so, file_fdw would be simpler, but it's a change only for
>   file_fdw;
>   COPY commands in the core wouldn't be improved at all.

ISTM that the current design would be better.

> - NextCopyFrom() returns values/nulls arrays rather than a HeapTuple. I expect
>   the caller to store the result into a TupleTableSlot with ExecStoreVirtualTuple().
>   This is designed for performance, but if the caller always needs a materialized
>   HeapTuple, HeapTuple is better for the result type.

I tried to add tableoid to TupleTableSlot as tts_tableoid, but it seems to make code such as slot_getattr() and other stuff tricky. How about implementing this using materialized tuples, to avoid changes that are unnecessary (at least for functionality)? I would like to defer this virtual-tuple optimization to the next development cycle, because it would not affect the interface heavily. I'll post a materialized-tuple version of the foreign_scan patch soon.

Regards,
--
Shigeru Hanada
Attachment
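The per-copy context fix described above amounts to something like this (a sketch; the field name copycontext is illustrative):

    /* in BeginCopy(): give each CopyState its own memory context */
    cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
                                                "COPY",
                                                ALLOCSET_DEFAULT_MINSIZE,
                                                ALLOCSET_DEFAULT_INITSIZE,
                                                ALLOCSET_DEFAULT_MAXSIZE);
    oldcontext = MemoryContextSwitchTo(cstate->copycontext);
    /* ... all per-copy allocations now land in copycontext ... */
    MemoryContextSwitchTo(oldcontext);

    /* in EndCopyFrom(): one deletion releases everything at once */
    MemoryContextDelete(cstate->copycontext);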
On Thu, Jan 13, 2011 at 19:00, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> But EndCopyFrom() seems not to be able to release memory which is
> allocated in BeginCopy() and BeginCopyFrom(). I found this behavior
> by executing a query which generates a nested loop plan (outer 1,000,000
> rows * inner 10 rows); by the end, postgres grows to 300MB+ from
> 108MB (VIRT in top).
>
> The attached patch avoids this leak by adding a per-copy memory context to
> CopyState. This might be overkill, and ResetCopyFrom() might be a
> reasonable alternative.

Good catch. I merged your fix into the attached patch.

BTW, why didn't the planner choose a materialized plan for the inner loop? FDW scans are typically slower than heap scans or TupleTableSlot scans, so it seems reasonable to me to add a Materialize node on top of the inner Foreign Scan, especially when we don't use indexes for the scan keys or join keys.

--
Itagaki Takahiro
Attachment
On Fri, 14 Jan 2011 13:03:27 +0900, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> Good catch. I merged your fix into the attached patch.

Thanks, I'll rebase my patches.

> BTW, why didn't the planner choose a materialized plan for the inner loop?
> FDW scans are typically slower than heap scans or TupleTableSlot scans,
> so it seems reasonable to me to add a Materialize node on top of the
> inner Foreign Scan, especially when we don't use indexes for the scan
> keys or join keys.

Maybe because foreign tables lack statistics, and file_fdw's estimate isn't smart enough. After copying the statistics of the pgbench_xxx tables into the csv_xxx tables, the planner generates the same plans as for local tables, but the costs of the ForeignScan nodes are a little lower than those of the SeqScan nodes.

==============================
postgres=# explain analyze select * from csv_accounts a, csv_branches b where a.bid = b.bid;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.33..45467.32 rows=1000000 width=197) (actual time=0.234..8044.077 rows=1000000 loops=1)
   Hash Cond: (a.bid = b.bid)
   ->  Foreign Scan on csv_accounts a  (cost=0.00..31717.00 rows=1000000 width=97) (actual time=0.107..4147.074 rows=1000000 loops=1)
   ->  Hash  (cost=0.20..0.20 rows=10 width=100) (actual time=0.085..0.085 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Foreign Scan on csv_branches b  (cost=0.00..0.20 rows=10 width=100) (actual time=0.027..0.056 rows=10 loops=1)
 Total runtime: 9690.686 ms
(7 rows)

postgres=# explain analyze select * from pgbench_accounts a, pgbench_branches b where a.bid = b.bid;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..40145.22 rows=1000000 width=197) (actual time=0.146..5693.883 rows=1000000 loops=1)
   Hash Cond: (a.bid = b.bid)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00 rows=1000000 width=97) (actual time=0.073..1884.018 rows=1000000 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=100) (actual time=0.048..0.048 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=100) (actual time=0.003..0.021 rows=10 loops=1)
 Total runtime: 7333.713 ms
(7 rows)
==============================

A forced Nested Loop uses a Materialize node as expected.
==============================
postgres=# set enable_hashjoin = false;
SET
postgres=# explain select * from csv_accounts a, csv_branches b where a.bid = b.bid;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..181717.23 rows=1000000 width=197)
   Join Filter: (a.bid = b.bid)
   ->  Foreign Scan on csv_accounts a  (cost=0.00..31717.00 rows=1000000 width=97)
   ->  Materialize  (cost=0.00..0.25 rows=10 width=100)
         ->  Foreign Scan on csv_branches b  (cost=0.00..0.20 rows=10 width=100)
(5 rows)

postgres=# explain select * from pgbench_accounts a, pgbench_branches b where a.bid = b.bid;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..176395.12 rows=1000000 width=197)
   Join Filter: (a.bid = b.bid)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..26394.00 rows=1000000 width=97)
   ->  Materialize  (cost=0.00..1.15 rows=10 width=100)
         ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=100)
(5 rows)
==============================

ISTM that a new interface called from ANALYZE would help to update statistics of foreign tables. If we could leave the sampling algorithm to FDWs, acquire_sample_rows() might fit that purpose. If an FDW doesn't provide an analyze handler, postgres might be able to execute "SELECT * FROM foreign_table LIMIT sample_num" internally to get sample rows.

Regards,
--
Shigeru Hanada
On Fri, Jan 14, 2011 at 14:20, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> After copying the statistics of the pgbench_xxx tables into the csv_xxx
> tables, the planner generates the same plans as for local tables, but the
> costs of the ForeignScan nodes are a little lower than those of the
> SeqScan nodes.
> A forced Nested Loop uses a Materialize node as expected.

Interesting. It means we need per-column statistics for foreign tables in addition to cost values.

> ISTM that a new interface called from ANALYZE would help to
> update statistics of foreign tables. If we could leave the sampling
> algorithm to FDWs, acquire_sample_rows() might fit that purpose.

We will discuss how to collect statistics from foreign tables in the next development cycle. I think we have two choices here:

#1. Retrieve sample rows from remote foreign tables and store stats in the local pg_statistic.
#2. Use remote statistics for each foreign table directly.

acquire_sample_rows() would be a method for #1. An approach for #2 is to use remote statistics directly: we provide hooks to generate virtual statistics with get_relation_stats_hook() and its family, and we could treat statistics for foreign tables in a similar way with that hook. file_fdw likes #1 because there is no external storage in which to keep statistics for CSV files, but pgsql_fdw might prefer #2 because the remote server already has stats for the underlying table.

--
Itagaki Takahiro
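For reference, the existing hook mentioned above is declared in src/include/utils/selfuncs.h roughly as follows; an implementation of approach #2 could fill vardata with synthesized statistics and return true to bypass the pg_statistic lookup:

    typedef bool (*get_relation_stats_hook_type) (PlannerInfo *root,
                                                  RangeTblEntry *rte,
                                                  AttrNumber attnum,
                                                  VariableStatData *vardata);
    extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;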
On Fri, 14 Jan 2011 14:20:20 +0900, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> On Fri, 14 Jan 2011 13:03:27 +0900
> Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
>
> > Good catch. I merged your fix into the attached patch.
>
> Thanks, I'll rebase my patches.

I've rebased the WIP patches for file_fdw onto Itagaki-san's recent copy_export patch. The interface of NextCopyFrom() is changed to return a HeapTuple, to support tableoid without any change to TupleTableSlot.

Please apply the patches in this order:

1) patches for FDW API (NOT attached to this message)
   These patches are attached and described in this message:
   http://archives.postgresql.org/pgsql-hackers/2011-01/msg01096.php
2) copy_export-20110114.patch (NOT attached to this message)
   This patch is attached and described in this message:
   http://archives.postgresql.org/pgsql-hackers/2011-01/msg01066.php
3) 20110114-copy_export_HeapTupe.patch
   This patch changes the interface of NextCopyFrom() to return results as a HeapTuple.
4) 20110114-foreign_scan.patch
   This patch adds the HANDLER option of FOREIGN DATA WRAPPER, and the ForeignScan executor node. Note that no wrapper is available in this patch.
5) 20110114-file_fdw.patch
   This patch adds contrib/file_fdw, a foreign-data wrapper for server-side files. Supported file formats are similar to the COPY command, and COPY options except "force_not_null" are accepted as generic options of the foreign table.

Regards,
--
Shigeru Hanada
Attachment
Itagaki Takahiro wrote:
> Shigeru HANADA wrote:
>> The attached patch avoids this leak by adding a per-copy memory context to
>> CopyState. This might be overkill, and ResetCopyFrom() might be a
>> reasonable alternative.
>
> Good catch. I merged your fix into the attached patch.

Review for CF:

While there is a post which suggests applying this patch in the middle of a string of fdw patches, it stands alone without depending on any of the other patches. I chose to focus on testing it alone, to isolate just the issues with this particular patch. In addition to the base patch, there was a patch-on-patch which was applied to change the signature of NextCopyFrom to take fewer params and return a HeapTuple.

This patch is in context diff format, applies cleanly, compiles without warning, and passes all tests in `make check` and `make installcheck-world`. From that and the testing I did, I think that this patch could be committed before any of the others, if desired. A few minor comments on format, though -- some parts of the patch came out much more readable (for me, at least) when I regenerated the diff using the git diff --patience switch. For example, see the end of the CopyStateData structure definition each way. Also, the whitespace has small differences from what pgindent uses. Most are pretty minor, except for a section where the indentation wasn't changed based on a change in depth of braces, to keep the patch size down.

It appears that the point of the patch is to provide a better API for accessing the implementation of COPY, to support other patches. This whole FDW effort is not an area of expertise for me, but the API looks reasonable to me, with a somewhat parallel structure to some of the other APIs used by the executor. From reading the FDW posts, it appears that other patches are successfully using this new API, and the reviewers of the other patches seem happy enough with it, which would tend to indicate that it is on-target. Other hackers seem to want it, and we didn't already have it. From my reading of the posts, the idea of creating an API at this level was agreed upon by the community.

The only two files touched by this patch are copy.h and copy.c. The copy.h changes consist entirely of new function prototypes and one declaration of a pointer type (CopyState) to a struct defined and used only inside copy.c (CopyStateData). That pointer is the return value from BeginCopyFrom and a required parameter to the other new functions. So the old API is still present exactly as it was, with additional functions added.

I tried to read through the code to look for problems, but there are so many structures and techniques involved that I haven't had to deal with yet that it would take me days to get up to speed enough to desk-check this adequately. Since this API is a prerequisite for other patches, and is already being used by them, I figured I should do what I could quickly and then worry about how to cover that. Since it doesn't appear to be intended to change any user-visible behavior, I don't see any need for docs or changes to the regression tests. I didn't see any portability problems. It seemed a little light on comments to me, but perhaps that's because of my ignorance of some of the techniques being used -- maybe things are obvious enough to anyone who would be mucking about in copy.c.

I spent a few hours doing ad hoc tests of various sorts to try to break things, without success. Among those tests were checks for correct behavior on temporary tables and read-only transactions -- separately and in combination.
I ran millions of COPY statements inside of a single database transaction (I tried both FROM and TO) without seeing memory usage increase by more than a few kB. No assertion failures. No segfaults. Nothing unusual in the log. I plan to redo these tests with the full fdw patch set unless someone has already covered that ground. So far everything I've done has been with asserts enabled, so I haven't tried to get serious benchmarks, but it seems fast. I will rebuild without asserts and do performance tests before I change the status on the CF page. I'm wondering if it would make more sense to do the benchmarking with just this patch or the full fdw patch set? Both? -Kevin
On Sat, Jan 15, 2011 at 08:35, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > Interface of NextCopyFrom() is fixed to return HeapTuple, to support > tableoid without any change to TupleTableSlot. > > 3) 20110114-copy_export_HeapTupe.patch > This patch fixes interface of NextCopyFrom() to return results as > HeapTuple. I think file_fdw can return tuples in virtual tuple form, and have ForeignNext() call ExecMaterializeSlot() to store tableoid. -- Itagaki Takahiro
On Tue, 18 Jan 2011 15:17:12 +0900 Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Sat, Jan 15, 2011 at 08:35, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > > Interface of NextCopyFrom() is fixed to return HeapTuple, to support > > tableoid without any change to TupleTableSlot. > > > > 3) 20110114-copy_export_HeapTupe.patch > > This patch fixes interface of NextCopyFrom() to return results as > > HeapTuple. > > I think file_fdw can return tuples in virtual tuple form, > and have ForeignNext() call ExecMaterializeSlot() to store tableoid. Thanks for the comment. I've fixed file_fdw to use tts_values and tts_isnull to receive results from NextCopyFrom, and to store a virtual tuple in the slot. The attached patch requires the FDW API patches and copy_export-20110114.patch. Please see also: http://archives.postgresql.org/message-id/20110119002615.8316.6989961C@metrosystems.co.jp The cost estimation in file_fdw has also been simplified along the lines of your comments below. On Tue, 21 Dec 2010 21:32:17 +0900 Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > #3. Why do you re-open a foreign table in estimate_costs() ? > Since the caller seems to have the options for them, you can > pass them directly, no? > > In addition, passing a half-initialized fplan to estimate_costs() > is a bad idea. If you think it is an OUT parameter, the OUT params > should be *startup_cost and *total_cost. In that message, you also pointed out that FDW must generate explainInfo in every PlanRelScan call even if the planning is not for EXPLAIN. I'll try to defer generating explainInfo until EXPLAIN VERBOSE really uses it. It might need a new hook point in explain.c, though. Regards, -- Shigeru Hanada
Attachment
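A rough sketch of what this suggestion amounts to, for anyone following along -- this is not the actual file_fdw code; the FileFdwPrivate struct and the fileIterate() name are assumptions for illustration, and the ForeignScanState plumbing comes from the still-pending FDW API patches:

/*
 * Sketch: fill the slot's arrays, store a virtual tuple, then
 * materialize it so t_tableOid can be set.  Assumes the bool-returning
 * NextCopyFrom() from the COPY API patch.
 */
typedef struct FileFdwPrivate   /* hypothetical private state */
{
    CopyState   cstate;
} FileFdwPrivate;

static TupleTableSlot *
fileIterate(ForeignScanState *node)
{
    FileFdwPrivate *priv = (FileFdwPrivate *) node->fdw_state;
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

    ExecClearTuple(slot);
    if (NextCopyFrom(priv->cstate, slot->tts_values, slot->tts_isnull, NULL))
    {
        HeapTuple   tuple;

        ExecStoreVirtualTuple(slot);
        /* materializing yields a HeapTuple whose tableoid we can set */
        tuple = ExecMaterializeSlot(slot);
        tuple->t_tableOid = RelationGetRelid(node->ss.ss_currentRelation);
    }
    return slot;                /* empty slot signals end of scan */
}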
On Wed, Jan 19, 2011 at 00:34, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > The attached patch requires the FDW API patches and copy_export-20110114.patch. Some minor comments: * Can you pass slot->tts_values and tts_isnull directly to NextCopyFrom()? It won't allocate the arrays; just fill the array buffers. * You can pass NULL for the 4th argument for NextCopyFrom(). | Oid tupleoid; /* just for required parameter */ * file_fdw_validator still duplicates code from BeginCopy, but I have no idea how to share the validation code in a clean way... * Try strVal() instead of DefElem->val.str * FdwEPrivate seems too abbreviated for me. How about FileFdwPrivate? * "private" is a bad identifier name because it's a C++ keyword. We should rename FdwExecutionState->private. > In that message, you also pointed out that FDW must generate > explainInfo in every PlanRelScan call even if the planning is not for > EXPLAIN. I'll try to defer generating explainInfo until EXPLAIN > VERBOSE really uses it. It might need a new hook point in explain.c, > though. I complained about the overhead, but it won't be a problem for file_fdw and pgsql_fdw. file_fdw can easily generate the text, and pgsql_fdw needs to generate a SQL query anyway. My concern is that the explainInfo interface is not ideal for the purpose and will therefore be an unstable interface. If we support nested plans in FDWs, each FDW should receive a tree writer used internally in explain.c. explainInfo, which is plain plan text, is not enough for complex FdwPlans. However, since we don't have any better solution for now, we could have the variable for 9.1. It's much better than nothing. -- Itagaki Takahiro
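On the strVal() point, the difference is small but worth showing; a hedged sketch of validator-style option handling follows (the "filename" option name and the absolute-path check are illustrative, not the final file_fdw behavior):

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "nodes/value.h"

/*
 * Sketch: walking a List of DefElem options and extracting a string
 * value with strVal() rather than poking at the Value node directly.
 */
static void
check_options(List *options)
{
    ListCell   *cell;

    foreach(cell, options)
    {
        DefElem    *def = (DefElem *) lfirst(cell);

        if (strcmp(def->defname, "filename") == 0)
        {
            /* strVal() hides the ((Value *) ...)->val.str indirection */
            char       *filename = strVal(def->arg);

            if (filename[0] != '/')
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("filename must be an absolute path")));
        }
    }
}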
On Tue, Jan 18, 2011 at 09:33, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Review for CF: Thank you for the review! > Since it doesn't appear to be intended to change any user-visible > behavior, I don't see any need for docs or changes to the regression > tests. There might be some user-visible changes in error messages because I rearranged some of the error-checking code, but we can see the difference only if there are two or more errors in a COPY command. They should not be serious issues. > So far everything I've done has been with asserts enabled, so I > haven't tried to get serious benchmarks, but it seems fast. I will > rebuild without asserts and do performance tests before I change the > status on the CF page. > > I'm wondering if it would make more sense to do the benchmarking with > just this patch or the full fdw patch set? Both? I tested the performance on my desktop PC, but I could not see any difference. Testing on high-performance servers would be welcome. A comparison with file_fdw would be more interesting. If they have similar performance, we could replace "COPY FROM" with "CREATE TABLE AS SELECT FROM foreign_table", which is more flexible. -- Itagaki Takahiro
On Thu, 20 Jan 2011 22:21:37 +0900 Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Wed, Jan 19, 2011 at 00:34, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > > The attached patch requires the FDW API patches and copy_export-20110114.patch. > > Some minor comments: Thanks for the comments. I'll post revised versions of the patches in new threads. > * Can you pass slot->tts_values and tts_isnull directly to NextCopyFrom()? > It won't allocate the arrays; just fill the array buffers. I think it's safe to pass them to NextCopyFrom() directly because those arrays are allocated in ExecSetSlotDescriptor() during ExecInitForeignScan(), and the size of the arrays is taken from the slot's tuple descriptor. Please let me know if I've missed your point. > * You can pass NULL for the 4th argument for NextCopyFrom(). > | Oid tupleoid; /* just for required parameter */ I didn't know that it's NULL-safe, thanks. > * file_fdw_validator still duplicates code from BeginCopy, > but I have no idea how to share the validation code in a clean way... It would be necessary to change a considerable part of BeginCopy() to separate out the validation so that file_fdw could use it... > * Try strVal() instead of DefElem->val.str > * FdwEPrivate seems too abbreviated for me. How about FileFdwPrivate? Thanks, fixed. > * "private" is a bad identifier name because it's a C++ keyword. > We should rename FdwExecutionState->private. Renamed to fdw_private, including another 'private' in FdwPlan. > > In that message, you also pointed out that FDW must generate > > explainInfo in every PlanRelScan call even if the planning is not for > > EXPLAIN. I'll try to defer generating explainInfo until EXPLAIN > > VERBOSE really uses it. It might need a new hook point in explain.c, > > though. > > I complained about the overhead, but it won't be a problem for > file_fdw and pgsql_fdw. file_fdw can easily generate the text, > and pgsql_fdw needs to generate a SQL query anyway. > > My concern is that the explainInfo interface is not ideal for the purpose > and will therefore be an unstable interface. If we support nested plans > in FDWs, each FDW should receive a tree writer used internally in > explain.c. explainInfo, which is plain plan text, is not enough for complex > FdwPlans. However, since we don't have any better solution for now, > we could have the variable for 9.1. It's much better than nothing. When I was writing file_fdw, I hoped to use static functions in explain.c such as ExplainProperty() to handle complex information. Even for a single plan node, I think that the filename and size (currently they are printed together in plain text) should be separated in the output of EXPLAIN, especially when the format is XML or JSON. Regards, -- Shigeru Hanada
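For reference, typed property helpers do already exist in explain.c (the generic ExplainProperty() itself is static, but the typed wrappers are exported in explain.h). If EXPLAIN grew a hook that handed an ExplainState to the FDW, separating the two fields could look as simple as this sketch; the property labels are illustrative:

#include "commands/explain.h"

/*
 * Sketch: emit filename and file size as separate EXPLAIN properties,
 * so XML/JSON output gets structured fields instead of one text blob.
 * ExplainPropertyText()/ExplainPropertyLong() are existing helpers;
 * exposing an ExplainState to FDWs is precisely the open question here.
 */
static void
explain_file_source(const char *filename, long file_size, ExplainState *es)
{
    ExplainPropertyText("Foreign File", filename, es);
    ExplainPropertyLong("Foreign File Size", file_size, es);
}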
On Fri, Jan 21, 2011 at 22:12, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: >> My concern is that the explainInfo interface is not ideal for the purpose >> and will therefore be an unstable interface. If we support nested plans >> in FDWs, each FDW should receive a tree writer used internally in >> explain.c. explainInfo, which is plain plan text, is not enough for complex >> FdwPlans. However, since we don't have any better solution for now, >> we could have the variable for 9.1. It's much better than nothing. > > When I was writing file_fdw, I hoped to use static functions in > explain.c such as ExplainProperty() to handle complex information. > Even for a single plan node, I think that the filename and size (currently > they are printed together in plain text) should be separated in the > output of EXPLAIN, especially when the format is XML or JSON. Just an idea -- we could return complex node trees with explainInfo if we use XML or JSON for the format. For example, pgsql_fdw can return the result from "EXPLAIN (FORMAT json)" without modification. It might be one of the reasons we should support JSON in the core :) -- Itagaki Takahiro
On Fri, Jan 21, 2011 at 8:59 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Fri, Jan 21, 2011 at 22:12, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: >>> My concern is that the explainInfo interface is not ideal for the purpose >>> and will therefore be an unstable interface. If we support nested plans >>> in FDWs, each FDW should receive a tree writer used internally in >>> explain.c. explainInfo, which is plain plan text, is not enough for complex >>> FdwPlans. However, since we don't have any better solution for now, >>> we could have the variable for 9.1. It's much better than nothing. >> >> When I was writing file_fdw, I hoped to use static functions in >> explain.c such as ExplainProperty() to handle complex information. >> Even for a single plan node, I think that the filename and size (currently >> they are printed together in plain text) should be separated in the >> output of EXPLAIN, especially when the format is XML or JSON. > > Just an idea -- we could return complex node trees with explainInfo > if we use XML or JSON for the format. For example, pgsql_fdw can > return the result from "EXPLAIN (FORMAT json)" without modification. > > It might be one of the reasons we should support JSON in the core :) Nice try, but I think that'd be a real drag. You wouldn't want to return JSON when the explain format is text, or XML. I think we probably need to modify the EXPLAIN code so that FDWs get a chance to inject their own custom properties into the output, but I don't know that we need to get that done right this minute. We can ship something really crude/basic for 9.1, if need be, and fix this up for 9.2. Of course if it turns out that getting EXPLAIN working the way we'd like is really easy, then we can just do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 21, 2011 at 8:12 AM, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: >> * Try strVal() instead of DefElem->val.str >> * FdwEPrivate seems too abbreviated for me. How about FileFdwPrivate? > > Thanks, fixed. Was there supposed to be a patch attached here? Or where is it? We are just about out of time to get this committed, and there hasn't been a new version in more than two weeks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 17, 2011 at 06:33:08PM -0600, Kevin Grittner wrote: > Itagaki Takahiro wrote: > > Shigeru HANADA wrote: > > >> Attached patch would avoid this leak by adding per-copy context to > >> CopyState. This would be overkill, and ResetCopyFrom() might be > >> reasonable though. > > > > Good catch. I merged your fix into the attached patch. > > Review for CF: ... > I tried to read through the code to look for problems, but there are > so many structures and techniques involved that I haven't had to deal > with yet, that it would take me days to get up to speed enough to > desk check this adequately. Since this API is a prerequisite for > other patches, and already being used by them, I figured I should do > what I could quickly and then worry about how to cover that.

I've studied the patch from this angle. My two substantive questions concern CopyFromErrorCallback() error strings and the use of the per-tuple memory context; see below. The rest either entail trivial fixes, or they do not indicate anything that clearly ought to change. The patch mostly moves code around; it's a bit difficult to grasp what's really changing by looking at the diff (not a criticism of the approach -- I see no way to avoid that). The final code structure is at least as easy to follow as the structure it replaces. Here is a summary of the changes:

file_fdw wishes to borrow the COPY FROM code for parsing structured text and building tuples fitting a given relation. file_fdw must bypass the parts that insert those tuples. Until now, copy.c has exposed a single API, DoCopy(). To meet file_fdw's needs, the patch adds four new APIs for use as a set:

BeginCopyFrom() - once-per-COPY initialization and validation
NextCopyFrom() - call N times to get N values/null arrays
EndCopyFrom() - once-per-COPY cleanup
CopyFromErrorCallback() - for caller use in ErrorContextCallback.callback

Implementing these entails breaking apart the existing code structure. For one, the patch separates from DoCopy the once-per-COPY-statement code, both initialization and cleanup. Secondly, it separates the CopyFrom code for assembling a tuple based on COPY input from the code that actually stores said tuples in the target relation. To avoid duplicating code, the patch then calls these new functions from DoCopy and CopyFrom. To further avoid duplicating code and to retain symmetry, it refactors COPY TO setup and teardown along similar lines. We have four new static functions:

BeginCopyTo() - parallel to BeginCopyFrom(), for DoCopy() alone
BeginCopy() - shared bits between BeginCopyTo() and BeginCopyFrom()
EndCopyTo() - parallel to EndCopyFrom(), for DoCopy() alone
EndCopy() - shared bits between EndCopyTo() and EndCopyFrom()

Most "parse analysis"-type bits of DoCopy() move to BeginCopy(). Several checks remain in DoCopy(): superuser for reading a server-local file, permission on the relation, and a read-only transaction check. The first stays where it does because a superuser may define a file_fdw foreign table and then grant access to others. The other two remain in DoCopy() because the new API uses the relation as a template without reading or writing it.

The catalog work (looking up defaults, I/O functions, etc) in CopyFrom() moves to BeginCopyFrom(), and applicable local variables become CopyState members. copy_in_error_callback changes name to CopyFromErrorCallback() to better fit with the rest of the new API. Its implementation does not change at all.
Since it's now possible for one SQL statement to call into the COPY machinery an arbitrary number of times, the patch introduces a per-COPY memory context.

The patch implements some added refactorings that make sense but are peripheral to the API change at hand. CopyState.fe_copy is gone, now calculated locally in DoCopyTo(). CopyState.processed is gone, with the row count now bubbling up through return values. We now initialize the CopyState buffers for COPY FROM only; they are unused in COPY TO. The purest of patches would defer all these, but I don't object to them tagging along.

For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo(). I'm skeptical about this one. It's not required for correctness, and the relkind checks for COPY FROM remain in CopyFrom().

file_fdw uses CopyFromErrorCallback() to give errors the proper context. The function uses template strings like "COPY %s, line %d", where %s is the name of the relation being copied. Presumably file_fdw and other features using this API would wish to customize that error message prefix, and the relation name might not be apropos at all. How about another argument to BeginCopyFrom, specifying an error prefix to be stashed in the CopyState?

We could easily regret requiring a Relation in BeginCopyFrom(); another user may wish to use a fabricated TupleDesc. Code paths in the new API use the Relation for a TupleDesc, relhasoids, column defaults, and error message strings. Removing the need for a full Relation is within reach. That being said, the API definitely loses cleanliness if we cater to that possibility now. It's probably best to keep the API as the patch has it, and let the hypothetical future use case change it.

There were some concerns upthread that exposing more of copy.c would attract use in third party FDWs, hamstringing future efforts to improve the implementation for the COPY command itself. On review, FDW modules are not likely users in general. file_fdw wants this because it parses structured text files, not because it's an FDW. Either way, the API also seems nicely-chosen to minimize the exposure of internals.

I have not tested the patched code in any detail, because Kevin's review covered that angle quite thoroughly. I have no cause to suspect that the patch will change user-visible function behavior, aside from the spurious error message changes noted below. It looks unlikely to affect performance, but that's probably worth a quick test. > Since it doesn't appear to be intended to change any user-visible > behavior, I don't see any need for docs or changes to the regression > tests. I didn't see any portability problems. It seemed a little > light on comments to me, but perhaps that's because of my ignorance > of some of the techniques being used -- maybe things are obvious > enough to anyone who would be mucking about in copy.c.

In some places, the new code does not add comments despite the surrounding code having many comments. However, some existing comments went a bit too far (/* Place tuple in tuple slot */, /* Reset the per-tuple exprcontext */). The degree of commenting is mostly good, but I've noted a few places below. The header comment for BeginCopyFrom should probably explain the arguments a bit more; it assumes familiarity with the COPY implementation. How about this:

/*
 * Setup to read tuples from a file for COPY FROM.
 *
 * 'rel': used as a template for the tuples
 * 'filename': name of server-local file to read
 * 'attnamelist': List of char *, columns to include. NIL selects all cols.
 * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
 *
 * Returns a CopyState, to be passed to NextCopyFrom().
 */
Thanks, nm

Comments on specific hunks:

> *** a/src/backend/commands/copy.c > --- b/src/backend/commands/copy.c > *************** > *** 167,181 **** typedef struct CopyStateData > char *raw_buf; > int raw_buf_index; /* next byte to process */ > int raw_buf_len; /* total # of bytes stored */ > } CopyStateData; > > - typedef CopyStateData *CopyState; > - > /* DestReceiver for COPY (SELECT) TO */ > typedef struct > { > DestReceiver pub; /* publicly-known function pointers */ > CopyState cstate; /* CopyStateData for the command */ > } DR_copy; > > > --- 170,197 ---- > char *raw_buf; > int raw_buf_index; /* next byte to process */ > int raw_buf_len; /* total # of bytes stored */ > + > + /* > + * The definition of input functions and default expressions are stored > + * in these variables. > + */ > + EState *estate; > + AttrNumber num_defaults; > + bool file_has_oids; > + FmgrInfo oid_in_function; > + Oid oid_typioparam; > + FmgrInfo *in_functions; > + Oid *typioparams; > + int *defmap; > + ExprState **defexprs; /* array of default att expressions */ > } CopyStateData; The leading comment only applies to a few of those fields. Each field needs its own comment, instead. > *************** > *** 1314,1339 **** DoCopyTo(CopyState cstate) > } > PG_END_TRY(); > > ! if (!pipe) > { > ! if (FreeFile(cstate->copy_file)) > ! ereport(ERROR, > ! (errcode_for_file_access(), > ! errmsg("could not write to file \"%s\": %m", > ! cstate->filename))); > } > } > > /* > * Copy from relation or query TO file. > */ > ! static void > CopyTo(CopyState cstate) > { > TupleDesc tupDesc; > int num_phys_attrs; > Form_pg_attribute *attr; > ListCell *cur; > > if (cstate->rel) > tupDesc = RelationGetDescr(cstate->rel); > --- 1396,1442 ---- > } > PG_END_TRY(); > > ! return processed; > ! } > ! > ! /* > ! * Clean up storage and release resources for COPY TO. > ! */ > ! static void > ! EndCopyTo(CopyState cstate) > ! { > ! if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > ! ereport(ERROR, > ! (errcode_for_file_access(), > ! errmsg("could not close file \"%s\": %m", > ! cstate->filename))); The old error message was "could not write to file \"%s\": %m". This might be a good change, but it belongs in a separate patch. > ! > ! /* > ! * Close the relation or query. We can release the AccessShareLock we got. > ! */ Separated from its former location, this comment is not applicable. > *************** > *** 1826,1832 **** CopyFrom(CopyState cstate) > slot = ExecInitExtraTupleSlot(estate); > ExecSetSlotDescriptor(slot, tupDesc); > > ! econtext = GetPerTupleExprContext(estate); > > /* > * Pick up the required catalog information for each attribute in the > --- 1889,2070 ---- > slot = ExecInitExtraTupleSlot(estate); > ExecSetSlotDescriptor(slot, tupDesc); > > ! /* Prepare to catch AFTER triggers. */ > ! AfterTriggerBeginQuery(); > ! > ! /* > ! * Check BEFORE STATEMENT insertion triggers. It's debateable whether we > ! * should do this for COPY, since it's not really an "INSERT" statement as > ! * such. However, executing these triggers maintains consistency with the > ! * EACH ROW triggers that we already fire on COPY. > ! */ > ! ExecBSInsertTriggers(estate, resultRelInfo); > ! > ! values = (Datum *) palloc(tupDesc->natts * sizeof(Datum)); > ! nulls = (bool *) palloc(tupDesc->natts * sizeof(bool)); > ! > ! bistate = GetBulkInsertState(); > ! > !
/* Set up callback to identify error line number */ > ! errcontext.callback = CopyFromErrorCallback; > ! errcontext.arg = (void *) cstate; > ! errcontext.previous = error_context_stack; > ! error_context_stack = &errcontext; > ! > ! while (!done) > ! { The loop may as well be unconditional ... > ! bool skip_tuple; > ! Oid loaded_oid = InvalidOid; > ! > ! CHECK_FOR_INTERRUPTS(); > ! > ! /* Reset the per-tuple exprcontext */ > ! ResetPerTupleExprContext(estate); > ! > ! /* Switch into its memory context */ > ! MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); Shouldn't a switch to this context happen inside NextCopyFrom(), then again for the calls to heap_form_tuple/HeapTupleSetOid? I found previous discussion on this matter, but no conclusion. > ! > ! done = !NextCopyFrom(cstate, values, nulls, &loaded_oid); > ! if (done) > ! break; ... since this is the only exit. No need to maintain "done". > ! /* > ! * Clean up storage and release resources for COPY FROM. > ! */ > ! void > ! EndCopyFrom(CopyState cstate) > ! { > ! FreeExecutorState(cstate->estate); > ! > ! if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > ! ereport(ERROR, > ! (errcode_for_file_access(), > ! errmsg("could not close file \"%s\": %m", > ! cstate->filename))); Likewise: this might be a good verbiage change, but it belongs in its own patch.
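To round out the picture of CopyFromErrorCallback(): a caller adopts it with the usual error-context-stack pattern, exactly as the hunks above show for CopyFrom() itself. A condensed caller-side sketch (an illustrative wrapper, not code from the patch):

/*
 * Sketch: push CopyFromErrorCallback() around the read loop so any
 * error raised while parsing reports "COPY <relname>, line <n>" context.
 */
static void
scan_with_error_context(CopyState cstate, Datum *values, bool *nulls)
{
    ErrorContextCallback errcallback;

    errcallback.callback = CopyFromErrorCallback;
    errcallback.arg = (void *) cstate;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    while (NextCopyFrom(cstate, values, nulls, NULL))
    {
        /* errors thrown in here get the COPY line-number context */
    }

    error_context_stack = errcallback.previous;
}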
Thank you for the detailed review! Revised patch attached, but the APIs for jagged csv are not included in it.

On Sun, Feb 6, 2011 at 09:01, Noah Misch <noah@leadboat.com> wrote: > Most "parse analysis"-type bits of DoCopy() move to BeginCopy(). It would be possible to move more FROM-only or TO-only members in BeginCopy() into their BeginCopyFrom/To(), but that is just a code refactoring that requires more code rearrangement. It should be done in another attempt, if needed at all.

> CopyState.processed is gone, with the row count now bubbling up > through return values. I removed it from CopyState because it represents "number of inserted rows" in COPY FROM. However, from the viewpoint of the API, the number of lines in the input file is more suitable. To avoid the confusion, I moved it from a common member to a COPY FROM-only variable.

> For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo(). I'm > skeptical about this one. It's not required for correctness, and the relkind > checks for COPY FROM remain in CopyFrom(). I think error checks should be done in the early phase, so I moved the check to BeginCopyTo(). However, the relkind check in COPY FROM is needed only for the COPY FROM command, because there the relation is the insertion target. For the APIs, the relation is used as a template for the input file. So, we cannot perform the check in BeginCopyFrom().

> file_fdw uses CopyFromErrorCallback() to give errors the proper context. The > function uses template strings like "COPY %s, line %d", where %s is the name of > the relation being copied. Presumably file_fdw and other features using this > API would wish to customize that error message prefix, and the relation name > might not be apropos at all. How about another argument to BeginCopyFrom, > specifying an error prefix to be stashed in the CopyState? I changed "COPY %s, .." to "relation %s, ..." because the first string is the relation name anyway. We could have another prefix argument, but I think it would add little information to the errors. We also have many "COPY" in other messages, but they won't actually be used, because they are messages for invalid arguments and file_fdw will have its own validator function. All invalid arguments will be filtered in CREATE commands.

> We could easily regret requiring a Relation in BeginCopyFrom(); another user may > wish to use a fabricated TupleDesc. Code paths in the new API use the Relation > for a TupleDesc, relhasoids, column defaults, and error message strings. > Removing the need for a full Relation is within reach. That being said, the API > definitely loses cleanliness if we cater to that possibility now. It's probably > best to keep the API as the patch has it, and let the hypothetical future use > case change it. Agreed. The current file_fdw has a template relation anyway.

> The header comment for BeginCopyFrom should probably explain the arguments a bit > more; it assumes familiarity with the COPY implementation. How about this: > (snip) > Comments on specific hunks: Thanks, included.

>> ! /* Reset the per-tuple exprcontext */ >> ! ResetPerTupleExprContext(estate); >> ! >> ! /* Switch into its memory context */ >> ! MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > Shouldn't a switch to this context happen inside NextCopyFrom(), then again for > the calls to heap_form_tuple/HeapTupleSetOid? I found previous discussion on > this matter, but no conclusion. In my understanding, NextCopyFrom() should use the CurrentMemoryContext provided by the caller. The file_fdw will use the executor's per-tuple context for it.
Another issue with the automatic context switch is that the next call would discard the previous values while the caller is unaware. -- Itagaki Takahiro
Attachment
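The per-tuple discipline being debated looks like this in caller terms; whether the switch lives in the caller (as here) or inside NextCopyFrom() is exactly the open question. A sketch using the standard executor macros (the wrapper function is illustrative):

/*
 * Sketch: caller-managed per-tuple memory.  Everything the input
 * functions allocate for one record lives in the per-tuple context and
 * is freed wholesale by the reset at the top of the next iteration.
 */
static void
read_loop(CopyState cstate, EState *estate, Datum *values, bool *nulls)
{
    for (;;)
    {
        MemoryContext oldcontext;
        bool        found;

        ResetPerTupleExprContext(estate);
        oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
        found = NextCopyFrom(cstate, values, nulls, NULL);
        MemoryContextSwitchTo(oldcontext);

        if (!found)
            break;

        /* form/use the tuple; pointers in values[] live until the reset */
    }
}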
On Fri, 4 Feb 2011 10:10:56 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > Was there supposed to be a patch attached here? Or where is it? We > are just about out of time to get this committed, and there hasn't been a > new version in more than two weeks. Sorry for being late to post patches. Attached is a revised version of the file_fdw patch. This patch is based on the latest FDW API patches, which are posted in another thread "SQL/MED FDW API", and on copy_export-20110104.patch, which was posted by Itagaki-san. Regards, -- Shigeru Hanada
Attachment
On Mon, Feb 7, 2011 at 16:01, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > This patch is based on the latest FDW API patches, which are posted in > another thread "SQL/MED FDW API", and on copy_export-20110104.patch, which > was posted by Itagaki-san. I have questions about estimate_costs(). * What value does baserel->tuples have? Foreign tables are never analyzed for now. Is the number correct? * Your previous measurement showed a much higher startup_cost. When you removed ReScan, it took a long time but the planner didn't choose materialized plans. It might come from lower startup costs. * Why do you use lstat() in it? Even if the file is a symlink, we will read the linked file in the succeeding copy. So, I think it should be stat() rather than lstat(). +estimate_costs(const char *filename, RelOptInfo *baserel, + double *startup_cost, double *total_cost) +{ ... + /* get size of the file */ + if (lstat(filename, &stat) == -1) + { + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not stat file \"%s\": %m", filename))); + } + + /* + * The way to estimate costs is almost same as cost_seqscan(), but there + * are some differences: + * - DISK costs are estimated from file size. + * - CPU costs are 10x of seq scan, for overhead of parsing records. + */ + pages = stat.st_size / BLCKSZ + (stat.st_size % BLCKSZ > 0 ? 1 : 0); + run_cost += seq_page_cost * pages; + + *startup_cost += baserel->baserestrictcost.startup; + cpu_per_tuple = cpu_tuple_cost + baserel->baserestrictcost.per_tuple; + run_cost += cpu_per_tuple * 10 * baserel->tuples; + *total_cost = *startup_cost + run_cost; + + return stat.st_size; +} -- Itagaki Takahiro
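On the lstat() point, the fix is a one-word change; and while looking at it, note that the page computation is just a ceiling division, which can be written without the branch. A sketch of that fragment (the helper name is mine, not the patch's):

#include <sys/stat.h>

/*
 * Sketch: size the file with stat() -- following symlinks, to match
 * what the subsequent COPY will actually read -- and convert bytes to
 * a BLCKSZ-page count with a branch-free ceiling division.
 */
static BlockNumber
estimate_file_pages(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not stat file \"%s\": %m", filename)));

    return (st.st_size + (BLCKSZ - 1)) / BLCKSZ;
}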
On Mon, Feb 07, 2011 at 03:39:39PM +0900, Itagaki Takahiro wrote: > On Sun, Feb 6, 2011 at 09:01, Noah Misch <noah@leadboat.com> wrote: > > Most "parse analysis"-type bits of DoCopy() move to BeginCopy(). > > It would be possible to move more FROM-only or TO-only members in BeginCopy() > into their BeginCopyFrom/To(), but that is just a code refactoring that requires > more code rearrangement. It should be done in another attempt, if needed at all. Agreed.

> > CopyState.processed is gone, with the row count now bubbling up > > through return values. > > I removed it from CopyState because it represents "number of inserted rows" > in COPY FROM. However, from the viewpoint of the API, the number of lines in > the input file is more suitable. To avoid the confusion, I moved it > from a common member to a COPY FROM-only variable. Perhaps I'm missing something. The new API does not expose a "processed" count at all; that count is used for the command tag of a top-level COPY. This part of the patch is just changing how we structure the code to maintain that tally, and it has no implications for observers outside copy.c. Right?

> > For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo(). I'm > > skeptical about this one. It's not required for correctness, and the relkind > > checks for COPY FROM remain in CopyFrom(). > > I think error checks should be done in the early phase, so I moved the check > to BeginCopyTo(). However, the relkind check in COPY FROM is needed only for > the COPY FROM command, because there the relation is the insertion target. For the APIs, > the relation is used as a template for the input file. So, we cannot perform > the check in BeginCopyFrom(). The choice of where to put them does not affect anything outside of copy.c, and placement in DoCopyTo() would make the symmetry between the TO and FROM code paths easier to follow. Not a major concern, though.

> > file_fdw uses CopyFromErrorCallback() to give errors the proper context. The > > function uses template strings like "COPY %s, line %d", where %s is the name of > > the relation being copied. Presumably file_fdw and other features using this > > API would wish to customize that error message prefix, and the relation name > > might not be apropos at all. How about another argument to BeginCopyFrom, > > specifying an error prefix to be stashed in the CopyState? > > I changed "COPY %s, .." to "relation %s, ..." because the first string is > the relation name anyway. We could have another prefix argument, but I think > it would add little information to the errors. That's perhaps an improvement for file_fdw, but not for regular COPY. My comment originated with a faulty idea that file_fdw's internal use of COPY was an implementation detail that users should not need to see. Looking now, the file_fdw documentation clearly explains the tie to COPY, even referring users to the COPY documentation. I no longer see a need to hide the fact that the "foreign" source is a PostgreSQL COPY command. The error messages are fine as they were. Some future client of the new API may wish to hide its internal COPY use, but there's no need to design for that now.

> We also have many "COPY" in other messages, but they won't actually be used, > because they are messages for invalid arguments and file_fdw will have its own > validator function. All invalid arguments will be filtered in CREATE commands. Agreed; "could not read from COPY file: %m" appears to be the primary one liable to happen in practice.
The greater failure with that one is that, given a query reading from multiple file_fdw tables, you can't tell which file had a problem. This issue runs broader than the patch at hand; I will start another thread to address it. Let's proceed with this patch, not changing any error messages. If other discussion concludes that the desired behavior requires an enhancement to this new API, a followup commit can implement that.

> >> !             /* Reset the per-tuple exprcontext */ > >> !             ResetPerTupleExprContext(estate); > >> ! > >> !             /* Switch into its memory context */ > >> !             MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > > > Shouldn't a switch to this context happen inside NextCopyFrom(), then again for > > the calls to heap_form_tuple/HeapTupleSetOid? I found previous discussion on > > this matter, but no conclusion. > > In my understanding, NextCopyFrom() should use the CurrentMemoryContext provided > by the caller. The file_fdw will use the executor's per-tuple context for it.

In a direct call to COPY FROM, all of NextCopyFrom() uses the per-tuple context of CopyState->estate. We reset that context before each call to NextCopyFrom(). This is true before (ignoring code movement) and after the patch. A file_fdw NextCopyFrom() call will use the per-tuple context of the executor performing a foreign scan. Allocations will arise primarily in type input functions. ExecEvalExpr(), used to acquire default values, will still use the per-tuple context of CopyState->estate. That per-tuple context will never get reset explicitly, so default value computations leak until EndCopyFrom(). I see memory context use was discussed already, but I don't see the aforementioned specific details addressed: http://archives.postgresql.org/message-id/AANLkTikfiM1qknr9=tL+xemBLAJ+YJoLhAG3XsH7mfwH@mail.gmail.com The use of ExecEvalExpr() in NextCopyFrom() also seems contrary to the execnodes.h comment you quoted in that thread ("CurrentMemoryContext should be set to ecxt_per_tuple_memory before calling ExecEvalExpr() --- see ExecEvalExprSwitchContext()."). NextCopyFrom() passes CopyState->estate to ExecEvalExpr, but the current context may be the per-tuple context of a different executor state.

> Another issue with the automatic context switch is that the next call would > discard the previous values while the caller is unaware. True; we would need to document that pointers stored in the "values" argument of NextCopyFrom() are valid only until the next call. Does that already work with file_fdw's usage pattern, or would file_fdw need to make copies?

Three hunk-specific comments (diff is between previously-reviewed copy.c and current master plus today's patch): > *** a/src/backend/commands/copy.c > --- b/src/backend/commands/copy.c > *************** > *** 175,193 **** typedef struct CopyStateData > * The definition of input functions and default expressions are stored > * in these variables. > */ > EState *estate; > AttrNumber num_defaults; > bool file_has_oids; > FmgrInfo oid_in_function; > Oid oid_typioparam; > ! FmgrInfo *in_functions; > ! Oid *typioparams; > ! int *defmap; > ExprState **defexprs; /* array of default att expressions */ > } CopyStateData; > > /* DestReceiver for COPY (SELECT) TO */ > typedef struct > { > DestReceiver pub; /* publicly-known function pointers */ > CopyState cstate; /* CopyStateData for the command */ > --- 175,193 ---- > * The definition of input functions and default expressions are stored > * in these variables.
> */ This block comment remains roughly half correct. Again, I think a small comment on every field below should replace it. > EState *estate; > AttrNumber num_defaults; > bool file_has_oids; > FmgrInfo oid_in_function; > Oid oid_typioparam; > ! FmgrInfo *in_functions; /* array of input functions for each attrs */ > ! Oid *typioparams; /* array of element types for in_functions */ > ! int *defmap; /* array of default att numbers */ > ExprState **defexprs; /* array of default att expressions */ > } CopyStateData; > > /* DestReceiver for COPY (SELECT) TO */ > typedef struct > { > DestReceiver pub; /* publicly-known function pointers */ > CopyState cstate; /* CopyStateData for the command */ > *************** > *** 1268,1283 **** BeginCopy(bool is_from, > --- 1268,1289 ---- > } > > /* > * Release resources allocated in a cstate. > */ > static void > EndCopy(CopyState cstate) > { > + if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not close file \"%s\": %m", > + cstate->filename))); In the write case, this is not an improvement -- failure in fclose(3) almost always arises from the attempt to flush buffered writes. Note how most other call sites checking the return value of FreeFile() use an error message like the original one. The change to the read case seems fine. > + > MemoryContextDelete(cstate->copycontext); > pfree(cstate); > } > > /* > * Setup CopyState to read tuples from a table or a query for COPY TO. > */ > static CopyState > *************** > *** 1400,1424 **** DoCopyTo(CopyState cstate) > } > > /* > * Clean up storage and release resources for COPY TO. > */ > static void > EndCopyTo(CopyState cstate) > { > - if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > - ereport(ERROR, > - (errcode_for_file_access(), > - errmsg("could not close file \"%s\": %m", > - cstate->filename))); > - > - /* > - * Close the relation or query. We can release the AccessShareLock we got. > - */ > if (cstate->queryDesc != NULL) > { > /* Close down the query and free resources. */ > ExecutorEnd(cstate->queryDesc); > FreeQueryDesc(cstate->queryDesc); > PopActiveSnapshot(); > } > > --- 1406,1421 ---- > *************** > *** 1681,1746 **** void > CopyFromErrorCallback(void *arg) > { > CopyState cstate = (CopyState) arg; > > if (cstate->binary) > { > /* can't usefully display the data */ > if (cstate->cur_attname) > ! errcontext("COPY %s, line %d, column %s", > cstate->cur_relname, cstate->cur_lineno, > cstate->cur_attname); > else > ! errcontext("COPY %s, line %d", > cstate->cur_relname, cstate->cur_lineno); > } > else > { > if (cstate->cur_attname && cstate->cur_attval) > { > /* error is relevant to a particular column */ > char *attval; > > attval = limit_printout_length(cstate->cur_attval); > ! errcontext("COPY %s, line %d, column %s: \"%s\"", > cstate->cur_relname, cstate->cur_lineno, > cstate->cur_attname, attval); > pfree(attval); > } > else if (cstate->cur_attname) > { > /* error is relevant to a particular column, value is NULL */ > ! errcontext("COPY %s, line %d, column %s: null input", > cstate->cur_relname, cstate->cur_lineno, > cstate->cur_attname); > } > else > { > /* error is relevant to a particular line */ > if (cstate->line_buf_converted || !cstate->need_transcoding) > { > char *lineval; > > lineval = limit_printout_length(cstate->line_buf.data); > ! 
errcontext("COPY %s, line %d: \"%s\"", > cstate->cur_relname, cstate->cur_lineno, lineval); > pfree(lineval); > } > else > { > /* > * Here, the line buffer is still in a foreign encoding, and > * indeed it's quite likely that the error is precisely a > * failure to do encoding conversion (ie, bad data). We dare > * not try to convert it, and at present there's no way to > * regurgitate it without conversion. So we have to punt and > * just report the line number. > */ > ! errcontext("COPY %s, line %d", > cstate->cur_relname, cstate->cur_lineno); > } > } > } > } > > /* > * Make sure we don't print an unreasonable amount of COPY data in a message. > --- 1678,1743 ---- > CopyFromErrorCallback(void *arg) > { > CopyState cstate = (CopyState) arg; > > if (cstate->binary) > { > /* can't usefully display the data */ > if (cstate->cur_attname) > ! errcontext("relation %s, line %d, column %s", > cstate->cur_relname, cstate->cur_lineno, > cstate->cur_attname); > else > ! errcontext("relation %s, line %d", > cstate->cur_relname, cstate->cur_lineno); > } > else > { > if (cstate->cur_attname && cstate->cur_attval) > { > /* error is relevant to a particular column */ > char *attval; > > attval = limit_printout_length(cstate->cur_attval); > ! errcontext("relation %s, line %d, column %s: \"%s\"", > cstate->cur_relname, cstate->cur_lineno, > cstate->cur_attname, attval); > pfree(attval); > } > else if (cstate->cur_attname) > { > /* error is relevant to a particular column, value is NULL */ > ! errcontext("relation %s, line %d, column %s: null input", > cstate->cur_relname, cstate->cur_lineno, > cstate->cur_attname); > } > else > { > /* error is relevant to a particular line */ > if (cstate->line_buf_converted || !cstate->need_transcoding) > { > char *lineval; > > lineval = limit_printout_length(cstate->line_buf.data); > ! errcontext("relation %s, line %d: \"%s\"", > cstate->cur_relname, cstate->cur_lineno, lineval); > pfree(lineval); > } > else > { > /* > * Here, the line buffer is still in a foreign encoding, and > * indeed it's quite likely that the error is precisely a > * failure to do encoding conversion (ie, bad data). We dare > * not try to convert it, and at present there's no way to > * regurgitate it without conversion. So we have to punt and > * just report the line number. > */ > ! errcontext("relation %s, line %d", > cstate->cur_relname, cstate->cur_lineno); > } > } > } > } > > /* > * Make sure we don't print an unreasonable amount of COPY data in a message. > *************** > *** 1782,1798 **** limit_printout_length(const char *str) > */ > static uint64 > CopyFrom(CopyState cstate) > { > HeapTuple tuple; > TupleDesc tupDesc; > Datum *values; > bool *nulls; > - bool done = false; > ResultRelInfo *resultRelInfo; > EState *estate = cstate->estate; /* for ExecConstraints() */ > TupleTableSlot *slot; > MemoryContext oldcontext = CurrentMemoryContext; > ErrorContextCallback errcontext; > CommandId mycid = GetCurrentCommandId(true); > int hi_options = 0; /* start with default heap_insert options */ > BulkInsertState bistate; > --- 1779,1794 ---- > *************** > *** 1906,1936 **** CopyFrom(CopyState cstate) > bistate = GetBulkInsertState(); > > /* Set up callback to identify error line number */ > errcontext.callback = CopyFromErrorCallback; > errcontext.arg = (void *) cstate; > errcontext.previous = error_context_stack; > error_context_stack = &errcontext; > > ! 
while (!done) > { > bool skip_tuple; > Oid loaded_oid = InvalidOid; > > CHECK_FOR_INTERRUPTS(); > > /* Reset the per-tuple exprcontext */ > ResetPerTupleExprContext(estate); > > /* Switch into its memory context */ > MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > ! done = !NextCopyFrom(cstate, values, nulls, &loaded_oid); > ! if (done) > break; > > /* And now we can form the input tuple. */ > tuple = heap_form_tuple(tupDesc, values, nulls); > > if (loaded_oid != InvalidOid) > HeapTupleSetOid(tuple, loaded_oid); > > --- 1902,1931 ---- > bistate = GetBulkInsertState(); > > /* Set up callback to identify error line number */ > errcontext.callback = CopyFromErrorCallback; > errcontext.arg = (void *) cstate; > errcontext.previous = error_context_stack; > error_context_stack = &errcontext; > > ! for (;;) > { > bool skip_tuple; > Oid loaded_oid = InvalidOid; > > CHECK_FOR_INTERRUPTS(); > > /* Reset the per-tuple exprcontext */ > ResetPerTupleExprContext(estate); > > /* Switch into its memory context */ > MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > > ! if (!NextCopyFrom(cstate, values, nulls, &loaded_oid)) > break; > > /* And now we can form the input tuple. */ > tuple = heap_form_tuple(tupDesc, values, nulls); > > if (loaded_oid != InvalidOid) > HeapTupleSetOid(tuple, loaded_oid); > > *************** > *** 2015,2031 **** CopyFrom(CopyState cstate) > */ > if (hi_options & HEAP_INSERT_SKIP_WAL) > heap_sync(cstate->rel); > > return processed; > } > > /* > ! * Setup CopyState to read tuples from a file for COPY FROM. > */ > CopyState > BeginCopyFrom(Relation rel, > const char *filename, > List *attnamelist, > List *options) > { > CopyState cstate; > --- 2010,2032 ---- > */ > if (hi_options & HEAP_INSERT_SKIP_WAL) > heap_sync(cstate->rel); > > return processed; > } > > /* > ! * CopyGetAttnums - build an integer list of attnums to be copied > ! * > ! * The input attnamelist is either the user-specified column list, > ! * or NIL if there was none (in which case we want all the non-dropped > ! * columns). > ! * > ! * rel can be NULL ... it's only used for error reports. > */ > CopyState > BeginCopyFrom(Relation rel, This is just a verbatim copy of the CopyGetAttnums() header comment. (The middle paragraph happens to be true, though.) > const char *filename, > List *attnamelist, > List *options) > { > CopyState cstate; > *************** > *** 2208,2224 **** BeginCopyFrom(Relation rel, > MemoryContextSwitchTo(oldcontext); > > return cstate; > } > > /* > * Read next tuple from file for COPY FROM. Return false if no more tuples. > * > ! * valus and nulls arrays must be the same length as columns of the > * relation passed to BeginCopyFrom. Oid of the tuple is returned with > * tupleOid separately. > */ > bool > NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid) > { > TupleDesc tupDesc; > Form_pg_attribute *attr; > --- 2209,2225 ---- > MemoryContextSwitchTo(oldcontext); > > return cstate; > } > > /* > * Read next tuple from file for COPY FROM. Return false if no more tuples. > * > ! * values and nulls arrays must be the same length as columns of the > * relation passed to BeginCopyFrom. Oid of the tuple is returned with > * tupleOid separately. > */ > bool > NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid) > { > TupleDesc tupDesc; > Form_pg_attribute *attr; > *************** > *** 2453,2474 **** NextCopyFrom(CopyState cstate, Datum *values, bool *nulls, Oid *tupleOid) > /* > * Clean up storage and release resources for COPY FROM. 
> */ > void > EndCopyFrom(CopyState cstate) > { > FreeExecutorState(cstate->estate); > > - if (cstate->filename != NULL && FreeFile(cstate->copy_file)) > - ereport(ERROR, > - (errcode_for_file_access(), > - errmsg("could not close file \"%s\": %m", > - cstate->filename))); > - > /* Clean up storage */ > EndCopy(cstate); > } > > > /* > * Read the next input line and stash it in line_buf, with conversion to > * server encoding. > --- 2454,2469 ----
On Mon, 7 Feb 2011 21:00:53 +0900 Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Mon, Feb 7, 2011 at 16:01, Shigeru HANADA <hanada@metrosystems.co.jp> wrote: > > This patch is based on the latest FDW API patches, which are posted in > > another thread "SQL/MED FDW API", and on copy_export-20110104.patch, which > > was posted by Itagaki-san. > > I have questions about estimate_costs(). > > * What value does baserel->tuples have? > Foreign tables are never analyzed for now. Is the number correct? No, baserel->tuples is always 0, and baserel->pages is 0 too. In addition, width and rows are set to 100 and 1, as default values. I think ANALYZE support is needed for PostgreSQL's standard optimizations to work for foreign scans. At present, estimation for foreign tables is awful.

> * Your previous measurement showed a much higher startup_cost. > When you removed ReScan, it took a long time but the planner didn't choose > materialized plans. It might come from lower startup costs. I tested joining the file_fdw tables accounts and branches, which were initialized with "pgbench -i -s 10" and exported to csv files. At first, I tried adding random_page_cost (4.0) to startup_cost as the cost of opening the file (though it's groundless), but materialized was not chosen. After updating pg_class.reltuples to the correct value, hash join was chosen for the same query. After disabling hash join, materialized was finally chosen. ISTM that choosing a simple nested loop comes from a wrong estimate of the loop count, not from the startup cost. IMHO, supporting ANALYZE (PG-style statistics) is necessary to make PostgreSQL generate sane plans.

> * Why do you use lstat() in it? > Even if the file is a symlink, we will read the linked file in the > succeeding copy. So, I think it should be stat() rather than lstat(). Good catch! A fixed version is attached. Regards, -- Shigeru Hanada
Attachment
On 02/07/2011 01:39 AM, Itagaki Takahiro wrote: > > >> file_fdw uses CopyFromErrorCallback() to give errors the proper context. The >> function uses template strings like "COPY %s, line %d", where %s is the name of >> the relation being copied. Presumably file_fdw and other features using this >> API would wish to customize that error message prefix, and the relation name >> might not be apropos at all. How about another argument to BeginCopyFrom, >> specifying an error prefix to be stashed in the CopyState? > I changed "COPY %s, .." to "relation %s, ..." because the first string is > the relation name anyway. We could have another prefix argument, but I think > it would add little information to the errors. > > We also have many "COPY" in other messages, but they won't actually be used, > because they are messages for invalid arguments and file_fdw will have its own > validator function. All invalid arguments will be filtered in CREATE commands. These changes have broken the regression tests. The attached patches (one for the core regression tests and one for file_fdw) fix that. But I don't know that your change is terribly helpful. I rather like Noah's idea better, if we need to make a change. cheers andrew
Attachment
Here is a revised patch that includes jagged csv support. A new exported function is named NextCopyFromRawFields.

On Mon, Feb 7, 2011 at 21:16, Noah Misch <noah@leadboat.com> wrote: > Perhaps I'm missing something. The new API does not expose a "processed" count > at all; that count is used for the command tag of a top-level COPY. This part > of the patch is just changing how we structure the code to maintain that tally, > and it has no implications for observers outside copy.c. Right? True, but the counter is tightly bound to the INSERT side of COPY FROM. | copy.c (1978) | * We count only tuples not suppressed by a BEFORE INSERT trigger; I think it is cleaner to separate it from CopyState, because CopyState is used as a file reader rather than a writer. However, if there are objections, I'd revert it.

>> I changed "COPY %s, .." to "relation %s, ..." > My comment originated with a faulty idea that file_fdw's internal use of COPY > was an implementation detail that users should not need to see. Looking now, > the file_fdw documentation clearly explains the tie to COPY, even referring > users to the COPY documentation. I no longer see a need to hide the fact that > the "foreign" source is a PostgreSQL COPY command. The error messages are fine > as they were. OK, I reverted the changes. User-visible changes might be more important, as pointed out by Andrew.

> ExecEvalExpr(), used to acquire default values, will still use the > per-tuple context of CopyState->estate. That per-tuple context will never get > reset explicitly, so default value computations leak until EndCopyFrom(). Ah, I see. I didn't see the leak because we rarely use the per-tuple memory context in the estate. We just use CurrentMemoryContext in many cases. But the leak could occur, and the code is misleading. I moved ResetPerTupleExprContext() into NextCopyFrom(), but CurrentMemoryContext is still used in it for the result values. Another possible design might be to pass EState as an argument of NextCopyFrom and remove estate from CopyState. It seems a much cleaner way in terms of control flow, but it requires more changes in file_fdw. Comments?

> This block comment remains roughly half correct. Again, I think a small comment > on every field below should replace it. > > This is just a verbatim copy of the CopyGetAttnums() header comment. (The > middle paragraph happens to be true, though.) Silly mistakes. Maybe they came from too many 'undo's in my editor. -- Itagaki Takahiro
Attachment
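A consumer of the raw-fields entry point might look like the following sketch, assuming the NextCopyFromRawFields() interface as the patch appears to define it (raw, unconverted text fields plus a per-line count); the function name read_jagged_file() is illustrative:

/*
 * Sketch: reading a "jagged" file, where lines may have differing
 * numbers of fields.  Fields come back as raw strings for the current
 * line; a NULL entry represents a null field.
 */
static void
read_jagged_file(CopyState cstate)
{
    char      **fields;
    int         nfields;

    while (NextCopyFromRawFields(cstate, &fields, &nfields))
    {
        int         i;

        for (i = 0; i < nfields; i++)
        {
            const char *raw = fields[i];    /* may be NULL */

            /* convert or validate raw field text here */
            (void) raw;
        }
        /* nfields can differ from line to line -- hence "jagged" */
    }
}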
On Tue, Feb 08, 2011 at 09:25:29PM +0900, Itagaki Takahiro wrote: > Here is a revised patch that includes jagged csv support. > A new exported function is named NextCopyFromRawFields. Seems a bit incongruous to handle the OID column in that function. That part fits with the other per-column code in NextCopyFrom().

> On Mon, Feb 7, 2011 at 21:16, Noah Misch <noah@leadboat.com> wrote: > > Perhaps I'm missing something. The new API does not expose a "processed" count > > at all; that count is used for the command tag of a top-level COPY. This part > > of the patch is just changing how we structure the code to maintain that tally, > > and it has no implications for observers outside copy.c. Right? > > True, but the counter is tightly bound to the INSERT side of COPY FROM. > | copy.c (1978) > | * We count only tuples not suppressed by a BEFORE INSERT trigger; > > I think it is cleaner to separate it from CopyState, > because CopyState is used as a file reader rather than a writer. > However, if there are objections, I'd revert it. No significant objection. I just wished to be clear on whether it was pure refactoring, or something more.

> > ExecEvalExpr(), used to acquire default values, will still use the > > per-tuple context of CopyState->estate. That per-tuple context will never get > > reset explicitly, so default value computations leak until EndCopyFrom(). > > Ah, I see. I didn't see the leak because we rarely use the per-tuple memory > context in the estate. We just use CurrentMemoryContext in many cases. > But the leak could occur, and the code is misleading. > I moved ResetPerTupleExprContext() into NextCopyFrom(), but > CurrentMemoryContext is still used in it for the result values. The code still violates the contract of ExecEvalExpr() by calling it with CurrentMemoryContext != econtext->ecxt_per_tuple_memory.

> Another possible design might be to pass EState as an argument of > NextCopyFrom and remove estate from CopyState. It seems a much cleaner way > in terms of control flow, but it requires more changes in file_fdw. > Comments? Seems sensible and more consistent with the typical structure of executor code. Do we place any constraints on sharing of EState among different layers like this? I could not identify any, offhand.

The new API uses EState for two things. First, BeginCopyFrom() passes it to ExecPrepareExpr(). Instead, let's use expression_planner() + ExecInitExpr() and require that we've been called with a memory context of suitable longevity. Things will break anyway if BeginCopyFrom()'s CurrentMemoryContext gets reset too early. This way, we no longer need an EState in BeginCopyFrom(). Second, NextCopyFrom() sends the per-output-tuple ExprContext of the EState to ExecEvalExpr(). It really needs a specific ExprContext, not an EState. A top-level COPY has a bijection between input and output tuples, making the distinction unimportant. GetPerTupleExprContext() is wrong for a file_fdw scan, though. We need the ExprContext of the ForeignScanState or another of equivalent lifecycle. NextCopyFrom() would then require that it be called with CurrentMemoryContext == econtext->ecxt_per_tuple_memory.

Sorry, I missed a lot of these memory details on my first couple of reviews. nm
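The expression_planner() + ExecInitExpr() suggestion in concrete terms -- both functions exist in the tree today; the helper name and surrounding assumptions are mine:

#include "optimizer/planner.h"
#include "executor/executor.h"

/*
 * Sketch: prepare a column-default expression for later ExecEvalExpr()
 * calls without requiring an EState in BeginCopyFrom().  Assumes the
 * current memory context lives at least as long as the CopyState.
 */
static ExprState *
prepare_default_expr(Node *raw_default)
{
    Expr       *expr;

    /* planner-style preprocessing (constant folding etc.) */
    expr = expression_planner((Expr *) raw_default);

    /* initialize for execution, with no parent PlanState */
    return ExecInitExpr(expr, NULL);
}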
On Wed, Feb 9, 2011 at 03:49, Noah Misch <noah@leadboat.com> wrote: >> A new exported function is named NextCopyFromRawFields. > Seems a bit incongruous to handle the OID column in that function. That part > fits with the other per-column code in NextCopyFrom(). Hmmm, I thought oid columns should be separated from user columns, but it might be a confusing interface. I removed the explicit oid output from NextCopyFromRawFields. file_fdw won't use oids in any case, though.

> The code still violates the contract of ExecEvalExpr() by calling it with > CurrentMemoryContext != econtext->ecxt_per_tuple_memory. I'm not sure whether we have such a contract, because the caller might want to get the expression result in a long-lived context. But anyway, using an external ExprContext looks cleaner. The new prototype is:

bool NextCopyFrom(
    [IN] CopyState cstate, ExprContext *econtext,
    [OUT] Datum *values, bool *nulls, Oid *tupleOid)

Note that econtext can be NULL if no default values are used. Since file_fdw won't use default values, we can just pass NULL for it.

> Sorry, I missed a lot of these memory details on my first couple of reviews. You did great reviews! Thank you very much. -- Itagaki Takahiro
Attachment
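With the econtext argument, a file_fdw-style caller would hold up its end of the contract roughly like this sketch; the ForeignScanState plumbing is assumed from the pending FDW API patches, and per the discussion below the caller must already be in econtext->ecxt_per_tuple_memory when defaults are in play:

/*
 * Sketch: per-row read under the new prototype.  The caller resets its
 * scan's per-tuple context and switches into it before calling
 * NextCopyFrom().
 */
static TupleTableSlot *
next_foreign_row(ForeignScanState *node, CopyState cstate)
{
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
    ExprContext *econtext = node->ss.ps.ps_ExprContext;
    MemoryContext oldcontext;

    ExecClearTuple(slot);
    ResetExprContext(econtext);
    oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

    if (NextCopyFrom(cstate, econtext,
                     slot->tts_values, slot->tts_isnull, NULL))
        ExecStoreVirtualTuple(slot);

    MemoryContextSwitchTo(oldcontext);
    return slot;
}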
On Wed, Feb 09, 2011 at 02:55:26PM +0900, Itagaki Takahiro wrote: > On Wed, Feb 9, 2011 at 03:49, Noah Misch <noah@leadboat.com> wrote: > > The code still violates the contract of ExecEvalExpr() by calling it with > > CurrentMemoryContext != econtext->ecxt_per_tuple_memory. > > I'm not sure whether we have such a contract, because the caller might > want to get the expression result in a long-lived context.

execQual.c has this comment:

/* ----------------------------------------------------------------
 *		ExecEvalExpr routines
 *		...
 *		The caller should already have switched into the temporary memory
 *		context econtext->ecxt_per_tuple_memory. The convenience entry point
 *		ExecEvalExprSwitchContext() is provided for callers who don't prefer
 *		to do the switch in an outer loop. We do not do the switch in these
 *		routines because it'd be a waste of cycles during nested expression
 *		evaluation.
 * ----------------------------------------------------------------
 */

Assuming that comment is accurate, ExecEvalExpr constrains us; any default values must get allocated in econtext->ecxt_per_tuple_memory. To get them in a long-lived allocation, the caller can copy the datums or supply a long-lived ExprContext. Any CurrentMemoryContext works when econtext == NULL, of course. I'd suggest enhancing your new paragraph in the NextCopyFrom() header comment like this:

- * 'econtext' is used to evaluate default expression for each columns not
- * read from the file. It can be NULL when no default values are used, i.e.
- * when all columns are read from the file.
+ * 'econtext' is used to evaluate default expression for each columns not read
+ * from the file. It can be NULL when no default values are used, i.e. when all
+ * columns are read from the file. If econtext is not NULL, the caller must have
+ * switched into the temporary memory context econtext->ecxt_per_tuple_memory.

> But anyway, > using an external ExprContext looks cleaner. The new prototype is: > > bool NextCopyFrom( > [IN] CopyState cstate, ExprContext *econtext, > [OUT] Datum *values, bool *nulls, Oid *tupleOid) Looks good. > Note that econtext can be NULL if no default values are used. > Since file_fdw won't use default values, we can just pass NULL for it. Nice. Good thinking.

One small point: > --- 2475,2504 ---- > * provided by the input data. Anything not processed here or above > * will remain NULL. > */ > + /* XXX: End of only-indentation changes. */ > + if (num_defaults > 0) > + { > + Assert(econtext != NULL); > + for (i = 0; i < num_defaults; i++) This could be better written as "Assert(num_defaults == 0 || econtext != NULL);".

From a functional and code structure perspective, I find this ready to commit. (I assume you'll drop the XXX: indent only comments on commit.) Kevin, did you want to do that performance testing you spoke of? Thanks, nm
On Wed, Feb 9, 2011 at 2:03 AM, Noah Misch <noah@leadboat.com> wrote:
> From a functional and code structure perspective, I find this ready to
> commit. (I assume you'll drop the XXX: indent-only comments on commit.)
> Kevin, did you want to do that performance testing you spoke of?

OK, so is this Ready for Committer, or are we still working on it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 12, 2011 at 01:12, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 9, 2011 at 2:03 AM, Noah Misch <noah@leadboat.com> wrote:
>> From a functional and code structure perspective, I find this ready to
>> commit. (I assume you'll drop the XXX: indent-only comments on commit.)
>> Kevin, did you want to do that performance testing you spoke of?
>
> OK, so is this Ready for Committer, or are we still working on it?

Basically, we have no more tasks until the FDW core API is applied. The
COPY API and file_fdw patches are waiting for it.

If we extend them a little more, I'd raise two items:
* Should we print foreign table names in error messages?
  http://archives.postgresql.org/pgsql-hackers/2011-02/msg00427.php
* The COPY encoding patch was rejected, but using client_encoding is
  logically wrong for file_fdw. We might need a subset of that patch
  for file_fdw.

-- 
Itagaki Takahiro
Robert Haas <robertmhaas@gmail.com> wrote:
> Noah Misch <noah@leadboat.com> wrote:
>> From a functional and code structure perspective, I find this
>> ready to commit. (I assume you'll drop the XXX: indent-only
>> comments on commit.) Kevin, did you want to do that performance
>> testing you spoke of?
>
> OK, so is this Ready for Committer, or are we still working on it?

I can run some benchmarks to compare COPY statements with and without the
patch this weekend. Noah, does it make more sense to do that with just the
copy_export-20110209.patch file applied, or in combination with some other
FDW patch(es)?

-Kevin
On Fri, Feb 11, 2011 at 11:31 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> On Sat, Feb 12, 2011 at 01:12, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Feb 9, 2011 at 2:03 AM, Noah Misch <noah@leadboat.com> wrote:
>>> From a functional and code structure perspective, I find this ready to
>>> commit. (I assume you'll drop the XXX: indent-only comments on commit.)
>>> Kevin, did you want to do that performance testing you spoke of?
>>
>> OK, so is this Ready for Committer, or are we still working on it?
>
> Basically, we have no more tasks until the FDW core API is applied.
> The COPY API and file_fdw patches are waiting for it.
>
> If we extend them a little more, I'd raise two items:
> * Should we print foreign table names in error messages?
>   http://archives.postgresql.org/pgsql-hackers/2011-02/msg00427.php
> * The COPY encoding patch was rejected, but using client_encoding is
>   logically wrong for file_fdw. We might need a subset of that patch
>   for file_fdw.

It sounds to me like that second issue is a showstopper. I think we either
need to reopen discussion on that patch and come up with a resolution that
is acceptable ASAP, or we need to punt file_fdw to 9.2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> Basically, we have no more tasks until the FDW core API is
> applied. The COPY API and file_fdw patches are waiting for it.

This is something I've found confusing about this patch set, to the point
of not knowing what to test, exactly. The COPY API patch and the
patch-on-patch for it both applied cleanly *without any of the other
patches* and seemed to run fine, even though the post with a patch-on-patch
for the COPY API said that several other patches needed to be applied
first. In spite of having tried to follow the posts for all the FDW
threads, I'm still confused enough about the relationship between these
patches to be unsure what to test.

My top priority for testing would be to confirm that there is no adverse
impact on existing COPY performance from the refactoring.

-Kevin
On Fri, Feb 11, 2011 at 10:31:08AM -0600, Kevin Grittner wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
> > Noah Misch <noah@leadboat.com> wrote:
> >> From a functional and code structure perspective, I find this
> >> ready to commit. (I assume you'll drop the XXX: indent-only
> >> comments on commit.) Kevin, did you want to do that performance
> >> testing you spoke of?
> >
> > OK, so is this Ready for Committer, or are we still working on it?
>
> I can run some benchmarks to compare COPY statements with and
> without the patch this weekend. Noah, does it make more sense to do
> that with just the copy_export-20110209.patch file applied, or in
> combination with some other FDW patch(es)?

I'd say, run them with this patch alone. The important thing is to not
penalize existing COPY users.

Incidentally, the "did you want ... ?" was a genuine question. I see very
little performance risk here, so the tests could be quite cursory, even
absent entirely.

nm
Noah Misch <noah@leadboat.com> wrote:
> I'd say, run them with this patch alone. The important thing is
> to not penalize existing COPY users.

Sounds good.

> Incidentally, the "did you want ... ?" was a genuine question. I
> see very little performance risk here, so the tests could be quite
> cursory, even absent entirely.

From what I've seen, I tend to agree. If there's a committer ready to go
over this, I would say that it might be worth waiting for the benchmark
results against the patch from the day before yesterday before "pulling
the trigger" on it; but proceed on the basis that it's a near-certainty
that it will test out OK.

-Kevin
Noah Misch <noah@leadboat.com> wrote:
> I'd say, run them with this patch alone. The important thing is
> to not penalize existing COPY users. Incidentally, the "did you
> want ... ?" was a genuine question. I see very little performance
> risk here, so the tests could be quite cursory, even absent
> entirely.

In two hours of testing with a 90GB production database, the copy patch on
top of HEAD ran 0.6% faster than HEAD for pg_dumpall (generating identical
output files), but feeding that into an empty cluster with psql ran 8.4%
faster with the patch than without! I'm going to repeat that latter test
with more attention to whether everything made it in OK. (That's not as
trivial to check as the dump phase.)

Do you see any reason that COPY FROM should be significantly *faster* with
the patch? Are there any particular things I should be checking for
problems?

-Kevin
On Sat, Feb 12, 2011 at 03:42:17PM -0600, Kevin Grittner wrote:
> In two hours of testing with a 90GB production database, the copy
> patch on top of HEAD ran 0.6% faster than HEAD for pg_dumpall
> (generating identical output files), but feeding that into an empty
> cluster with psql ran 8.4% faster with the patch than without!
> I'm going to repeat that latter test with more attention to whether
> everything made it in OK. (That's not as trivial to check as the
> dump phase.)
>
> Do you see any reason that COPY FROM should be significantly
> *faster* with the patch?

No. Up to, say, 0.5% wouldn't be too surprising, but 8.4% is surprising.
What is the uncertainty of that figure?

> Are there any particular things I should
> be checking for problems?

Nothing comes to mind.

Thanks,
nm
Noah Misch <noah@leadboat.com> wrote:
> On Sat, Feb 12, 2011 at 03:42:17PM -0600, Kevin Grittner wrote:
>> Do you see any reason that COPY FROM should be significantly
>> *faster* with the patch?
>
> No. Up to, say, 0.5% wouldn't be too surprising, but 8.4% is
> surprising.
> What is the uncertainty of that figure?

With a few more samples, it's not that high. It's hard to dodge around the
maintenance tasks on this machine to get good numbers, so I can't really
just set something up to run overnight to get numbers in which I can have
complete confidence, but (without putting statistical probabilities around
it) I feel very safe in saying there isn't a performance *degradation*
with the patch.

I got four restores of the 90GB data with the patch and four without. I
made sure it was during windows without any maintenance running, did a
fresh initdb for each run, and made sure that the disk areas were the same
for each run. The times for each version were pretty tightly clustered
except for each having one (slow) outlier. If you ignore the outlier for
each, there is *no overlap* between the two sets -- the slowest of the
non-outlier patched times is faster than the fastest non-patched time.

With the patch, compared to without -- best time is 9.8% faster, average
time without the outliers is 6.9% faster, average time including outliers
is 4.3% faster, outlier is 0.8% faster.

Even with just four samples each, since I was careful to minimize
distorting factors, that seems like plenty to have confidence that there
is no performance *degradation* from the patch. If we want to claim some
particular performance *gain* from it, I would need to arrange a dedicated
machine and script maybe 100 runs each way to be willing to offer a number
for public consumption.

Unpatched:
real    17m24.171s
real    16m52.892s
real    16m40.624s
real    16m41.700s

Patched:
real    15m56.249s
real    15m47.001s
real    15m3.018s
real    17m16.157s

Since you said that a cursory test, or no test at all, should be good
enough given the low risk of performance regression, I didn't book a
machine and script a large test run, but if anyone feels that's justified,
I can arrange something.

-Kevin
On 02/12/2011 05:33 PM, Noah Misch wrote:
> On Sat, Feb 12, 2011 at 03:42:17PM -0600, Kevin Grittner wrote:
>> In two hours of testing with a 90GB production database, the copy
>> patch on top of HEAD ran 0.6% faster than HEAD for pg_dumpall
>> (generating identical output files), but feeding that into an empty
>> cluster with psql ran 8.4% faster with the patch than without!
>> I'm going to repeat that latter test with more attention to whether
>> everything made it in OK. (That's not as trivial to check as the
>> dump phase.)
>>
>> Do you see any reason that COPY FROM should be significantly
>> *faster* with the patch?
> No. Up to, say, 0.5% wouldn't be too surprising, but 8.4% is
> surprising. What is the uncertainty of that figure?

We have seen in the past that changes which might be expected to slow
things down slightly can have the opposite effect. For example, see
<http://people.planetpostgresql.org/andrew/index.php?/archives/37-Puzzling-results.html>
where Tom commented:

    Yeah, IME it's not unusual for microbenchmark results to move a
    percent or three in response to any code change at all, even
    unrelated ones. I suppose it's from effects like critical loops
    breaking across cache lines differently than before.

cheers

andrew
On Sun, Feb 13, 2011 at 12:41:11PM -0600, Kevin Grittner wrote:
> Unpatched:
> real    17m24.171s
> real    16m52.892s
> real    16m40.624s
> real    16m41.700s
>
> Patched:
> real    15m56.249s
> real    15m47.001s
> real    15m3.018s
> real    17m16.157s
>
> Since you said that a cursory test, or no test at all, should be
> good enough given the low risk of performance regression, I didn't
> book a machine and script a large test run, but if anyone feels
> that's justified, I can arrange something.

Based on this, I've taken the liberty of marking the patch Ready for
Committer. Thanks.
On Mon, Feb 14, 2011 at 22:06, Noah Misch <noah@leadboat.com> wrote:
> On Sun, Feb 13, 2011 at 12:41:11PM -0600, Kevin Grittner wrote:
>> Unpatched:
>> real    17m24.171s
>> real    16m52.892s
>> real    16m40.624s
>> real    16m41.700s
>>
>> Patched:
>> real    15m56.249s
>> real    15m47.001s
>> real    15m3.018s
>> real    17m16.157s
>>
>> Since you said that a cursory test, or no test at all, should be
>> good enough given the low risk of performance regression, I didn't
>> book a machine and script a large test run, but if anyone feels
>> that's justified, I can arrange something.
>
> Based on this, I've taken the liberty of marking the patch Ready for
> Committer.

Thank you very much for performance testing and reviewing! The result is
interesting because I didn't intend any performance optimization. In any
case, having no performance regression is enough for the purpose.

-- 
Itagaki Takahiro
On Tue, Feb 8, 2011 at 00:30, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> Fixed version is attached.

I reviewed your latest git version, which is a bit newer than the attached
patch:
http://git.postgresql.org/gitweb?p=users/hanada/postgres.git;a=commit;h=0e1a1e1b0e168cb3d8ff4d637747d0ba8f7b8d55

The code still works with a small adjustment, but needs to be rebased on
the latest master, especially for extension support and the COPY API
changes.

Here is a list of comments and suggestions:

* You might have forgotten some combinations or unspecified options in
  file_fdw_validator(). For example, the format == NULL or !csv && header
  cases. I've not tested all cases, but please recheck the validations
  used in BeginCopy().

* estimate_costs() needs its own row estimation rather than using baserel.
> > What value does baserel->tuples have?
> > Foreign tables are never analyzed for now. Is the number correct?
> No, baserel->tuples is always 0, and baserel->pages is 0 too.
> In addition, width and rows are set to 100 and 1, as default values.
  It means baserel is not reliable at all, right? If so, we need an
  alternative solution in estimate_costs(). We adjust statistics with the
  runtime relation size in estimate_rel_size(). Also, we use
  get_rel_data_width() for tables that have not been analyzed. We could
  use a similar technique in file_fdw, too. (See the sketch after this
  message.)

* Should use int64 for the file byte size (or uint32 in blocks).
  unsigned long might be 32-bit; ulong is not portable.

* An Oid list (list_make1_oid) might be handier than Const to hold the
  relid in FdwPlan.fdw_private.

* I prefer FileFdwExecutionState to FileFdwPrivate, because there are
  two different 'fdw_private' variables in FdwPlan and FdwExecutionState.

* The comment in fileIterate seems wrong. It should be
  /* Store the next tuple as a virtual tuple. */, right?

* #include <sys/stat.h> is needed.

-- 
Itagaki Takahiro
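To make the estimation and portability points concrete, here is a sketch
of what a stat()-based estimate_costs() might look like. The width
arithmetic, the fallback size, and the cpu_tuple_cost multiplier are
illustrative assumptions, not the patch's actual numbers:

    #include <sys/stat.h>

    /*
     * Sketch: derive a row estimate from the file size and an average
     * row width instead of trusting the never-analyzed baserel.
     */
    static void
    estimate_costs(const char *filename, RelOptInfo *baserel,
                   Cost *startup_cost, Cost *total_cost)
    {
        struct stat stat_buf;
        BlockNumber pages;
        double      ntuples;
        int         tuple_width;

        /* off_t st_size avoids the 32-bit "unsigned long" trap above */
        if (stat(filename, &stat_buf) < 0)
            stat_buf.st_size = 10 * BLCKSZ;     /* arbitrary fallback */

        pages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
        tuple_width = MAXALIGN(baserel->width) +
                      MAXALIGN(sizeof(HeapTupleHeaderData));
        ntuples = clamp_row_est((double) stat_buf.st_size / tuple_width);

        *startup_cost = baserel->baserestrictcost.startup;
        *total_cost = *startup_cost + seq_page_cost * pages +
                      cpu_tuple_cost * 10.0 * ntuples;
    }

The 10.0 factor stands in for the observation that parsing a text row
costs considerably more CPU than scanning a heap tuple.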
On Wed, 16 Feb 2011 16:48:33 +0900
Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Tue, Feb 8, 2011 at 00:30, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> > Fixed version is attached.
>
> I reviewed your latest git version, which is a bit newer than the attached
> patch:
> http://git.postgresql.org/gitweb?p=users/hanada/postgres.git;a=commit;h=0e1a1e1b0e168cb3d8ff4d637747d0ba8f7b8d55
>
> The code still works with a small adjustment, but needs to be rebased on
> the latest master, especially for extension support and the COPY API
> changes.
>
> Here is a list of comments and suggestions:

Thanks for the comments. A revised version of the patch is attached.

> * You might have forgotten some combinations or unspecified options in
>   file_fdw_validator(). For example, the format == NULL or !csv && header
>   cases. I've not tested all cases, but please recheck the validations
>   used in BeginCopy().

Right, I've revised the validation based on BeginCopy(), and added
regression tests for the validation.

> * estimate_costs() needs its own row estimation rather than using baserel.
> > > What value does baserel->tuples have?
> > > Foreign tables are never analyzed for now. Is the number correct?
> > No, baserel->tuples is always 0, and baserel->pages is 0 too.
> > In addition, width and rows are set to 100 and 1, as default values.
> It means baserel is not reliable at all, right?

Right, tables which have not been ANALYZEd have default stats in baserel.
But getting the # of records would need another parse of the file...

> If so, we need an alternative solution in estimate_costs(). We adjust
> statistics with the runtime relation size in estimate_rel_size(). Also,
> we use get_rel_data_width() for tables that have not been analyzed. We
> could use a similar technique in file_fdw, too.

Ah, using get_relation_data_width(), the exported version of
get_rel_data_width(), seems to help the estimation. I'll research around
it a little more. By the way, is adding ANALYZE support for foreign
tables a reasonable idea for this issue?

> * Should use int64 for the file byte size (or uint32 in blocks).
>   unsigned long might be 32-bit; ulong is not portable.
>
> * An Oid list (list_make1_oid) might be handier than Const to hold the
>   relid in FdwPlan.fdw_private.
>
> * I prefer FileFdwExecutionState to FileFdwPrivate, because there are
>   two different 'fdw_private' variables in FdwPlan and FdwExecutionState.
>
> * The comment in fileIterate seems wrong. It should be
>   /* Store the next tuple as a virtual tuple. */, right?
>
> * #include <sys/stat.h> is needed.

Fixed all of the above.

Regards,
-- 
Shigeru Hanada
Attachment
Shigeru HANADA <hanada@metrosystems.co.jp> writes:
> [ 20110218-file_fdw.patch ]

I've adjusted this to fit the extensions infrastructure and the committed
version of the FDW API patch, and applied it.

>> * You might have forgotten some combinations or unspecified options in
>>   file_fdw_validator(). For example, the format == NULL or !csv && header
>>   cases. I've not tested all cases, but please recheck the validations
>>   used in BeginCopy().

> Right, I've revised the validation based on BeginCopy(), and added
> regression tests for the validation.

This approach struck me as entirely unmaintainable. I modified the core
COPY code to allow its option validation code to be called directly. (A
sketch of the resulting shape follows this message.)

>> If so, we need an alternative solution in estimate_costs(). We adjust
>> statistics with the runtime relation size in estimate_rel_size(). Also,
>> we use get_rel_data_width() for tables that have not been analyzed. We
>> could use a similar technique in file_fdw, too.

> Ah, using get_relation_data_width(), the exported version of
> get_rel_data_width(), seems to help the estimation. I'll research around
> it a little more. By the way, is adding ANALYZE support for foreign
> tables a reasonable idea for this issue?

I did some quick hacking so that the numbers are at least a little bit
credible, but of course without ANALYZE support the qualification
selectivity estimates are likely to be pretty bogus.

I am not sure whether there's much of a use-case for supporting ANALYZE,
though. I would think that if someone is going to read the same file in
multiple queries, they'd be far better off importing the data into a real
table. In any case, it's too late to worry about that for 9.1. I suggest
waiting to see what sort of real-world usage file_fdw gets before we worry
about whether it needs ANALYZE support.

			regards, tom lane
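The delegation Tom describes presumably looks something like the sketch
below. The core entry point is ProcessCopyOptions() in the committed tree,
but treat the exact signature, and the option-splitting details, as
assumptions rather than the applied code:

    #include "postgres.h"
    #include "access/reloptions.h"
    #include "commands/copy.h"
    #include "commands/defrem.h"
    #include "fmgr.h"

    /*
     * Sketch: validate file_fdw's own options here, and hand everything
     * else to the core COPY validation code instead of duplicating
     * BeginCopy()'s checks.
     */
    Datum
    file_fdw_validator(PG_FUNCTION_ARGS)
    {
        List       *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
        List       *other_options = NIL;
        ListCell   *cell;

        foreach(cell, options_list)
        {
            DefElem    *def = (DefElem *) lfirst(cell);

            if (strcmp(def->defname, "filename") == 0)
                continue;           /* file_fdw-specific, checked here */
            other_options = lappend(other_options, def);
        }

        /* let the core COPY code complain about anything it dislikes */
        ProcessCopyOptions(NULL, true /* is_from */, other_options);

        PG_RETURN_VOID();
    }

The payoff is that any future COPY option automatically becomes a valid
file_fdw option, with no validator changes to keep in sync.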
Is this right?

postgres=# \d+ agg_text
        Foreign table "public.agg_text"
 Column |   Type   | Modifiers | Storage  | Description
--------+----------+-----------+----------+-------------
 a      | smallint |           | plain    |
 b      | text     |           | extended |
Server: file_server
Has OIDs: no

It says the agg_text foreign table is using extended storage for the text
field. If it's in-file, how can it be classified as potentially TOASTed?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
Thom Brown <thom@linux.com> writes:
> Is this right?

> postgres=# \d+ agg_text
>         Foreign table "public.agg_text"
>  Column |   Type   | Modifiers | Storage  | Description
> --------+----------+-----------+----------+-------------
>  a      | smallint |           | plain    |
>  b      | text     |           | extended |
> Server: file_server
> Has OIDs: no

> It says the agg_text foreign table is using extended storage for the
> text field. If it's in-file, how can it be classified as potentially
> TOASTed?

It's meaningless, but I don't think it's worth trying to suppress the
meaningless column(s) ...

			regards, tom lane