Thread: textarray option for file FDW
I've been waiting for the latest FDW patches as patiently as I can, and I've been reviewing them this morning, in particular the file_fdw patch and how it interacts with the newly exposed COPY API. Overall it seems to be a whole lot cleaner, and the wholesale duplication of the copy code is gone, so it's much nicer and cleaner. So now I'd like to add a new option to it: "textarray". This option would require that the foreign table have exactly one field, of type text[], and would compose all the field strings read from the file for each record into the array (however many there are). This would require a few changes to contrib/file_fdw/file_fdw.c and a few changes to src/backend/commands/copy.c, which I can probably have done in fairly short order, Deo Volente. This will allow something like: CREATE FOREIGN TABLE arr_text ( t text[] ) SERVER file_server OPTIONS (format 'csv', filename '/path/to/ragged.csv',textarray 'true'); SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there FROM arr_text; Thoughts? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > ... So now I'd like to add a > new option to it: "textarray". This option would require that the > foreign table have exactly one field, of type text[], and would compose > all the field strings read from the file for each record into the array > (however many there are). Why is this a good thing? It seems like it would accomplish little except to defeat the SQL type system entirely. regards, tom lane
On 01/15/2011 12:44 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> ... So now I'd like to add a >> new option to it: "textarray". This option would require that the >> foreign table have exactly one field, of type text[], and would compose >> all the field strings read from the file for each record into the array >> (however many there are). > Why is this a good thing? It seems like it would accomplish little > except to defeat the SQL type system entirely. > > We have discussed previously allowing this sort of thing. It's not a new proposal at all. My use case is that I have a customer who reads in data like this (using my patch to allow ragged CSV input, which you previously objected to) and then performs a sophisticated battery of validity tests on the data before loading it into its final destination. To do that their requirement is that we not error out on reading the data, so we load the data into a table that is all text fields. In fact, having COPY read in a text array is *your* suggestion: <http://archives.postgresql.org/pgsql-hackers/2009-09/msg00547.php>. This is simply a proposal to implement that via FDW, which makes it easy to avoid any syntax issues. cheers andrew
On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan <andrew@dunslane.net> wrote: > "textarray". This option would require that the foreign table have exactly > one field, of type text[], and would compose all the field strings read from > the file for each record into the array (however many there are). > > CREATE FOREIGN TABLE arr_text ( > t text[] > ) SERVER file_server > OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true'); FOREIGN TABLE has stable definitions, so there are some issues that doesn't exist in COPY command: - when the type of the last column is not a text[] - when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN BTW, those options could be specified in column options rather than table options. But we don't have column option syntax in the current proposal. CREATE FOREIGN TABLE arr_text ( i integer OPTION (column 1), -- column position in csv file t text[] OPTION (column 'all the rest'), d date OPTION (column 2) ) SERVER file_server OPTIONS (format 'csv', filename '/path/to/ragged.csv'); -- Itagaki Takahiro
On 01/15/2011 01:24 PM, Itagaki Takahiro wrote: > On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan<andrew@dunslane.net> wrote: >> "textarray". This option would require that the foreign table have exactly >> one field, of type text[], and would compose all the field strings read from >> the file for each record into the array (however many there are). >> >> CREATE FOREIGN TABLE arr_text ( >> t text[] >> ) SERVER file_server >> OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true'); > FOREIGN TABLE has stable definitions, so there are some issues > that doesn't exist in COPY command: > - when the type of the last column is not a text[] > - when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN I intended that this would be checked for validity at runtime, in BeginCopy(). If the table doesn't match the requirement an error would be raised. I don't think it's a big problem. > BTW, those options could be specified in column options rather than table > options. But we don't have column option syntax in the current proposal. > > CREATE FOREIGN TABLE arr_text ( > i integer OPTION (column 1), -- column position in csv file > t text[] OPTION (column 'all the rest'), > d date OPTION (column 2) > ) SERVER file_server > OPTIONS (format 'csv', filename '/path/to/ragged.csv'); Well, ok, but is there any proposal on the table to do that? cheers andrew
Tom Lane <tgl@sss.pgh.pa.us> writes: > Why is this a good thing? It seems like it would accomplish little > except to defeat the SQL type system entirely. It simply allows to have the classical ETL problem solved directly in the database server, in SQL. That's huge. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 01/15/2011 12:29 PM, Andrew Dunstan wrote: > > I've been waiting for the latest FDW patches as patiently as I can, > and I've been reviewing them this morning, in particular the file_fdw > patch and how it interacts with the newly exposed COPY API. Overall it > seems to be a whole lot cleaner, and the wholesale duplication of the > copy code is gone, so it's much nicer and cleaner. So now I'd like to > add a new option to it: "textarray". This option would require that > the foreign table have exactly one field, of type text[], and would > compose all the field strings read from the file for each record into > the array (however many there are). This would require a few changes > to contrib/file_fdw/file_fdw.c and a few changes to > src/backend/commands/copy.c, which I can probably have done in fairly > short order, Deo Volente. This will allow something like: > > CREATE FOREIGN TABLE arr_text ( > t text[] > ) SERVER file_server > OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray > 'true'); > SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there > FROM arr_text; > > A WIP patch is attached. It's against Shigeru Hanada's latest FDW patches. It's surprisingly tiny. Right now it probably leaks memory like a sieve, and that's the next thing I'm going to chase down. cheers andrew
Attachment
On 01/15/2011 07:41 PM, Andrew Dunstan wrote: > > > On 01/15/2011 12:29 PM, Andrew Dunstan wrote: >> >> I've been waiting for the latest FDW patches as patiently as I can, >> and I've been reviewing them this morning, in particular the file_fdw >> patch and how it interacts with the newly exposed COPY API. Overall >> it seems to be a whole lot cleaner, and the wholesale duplication of >> the copy code is gone, so it's much nicer and cleaner. So now I'd >> like to add a new option to it: "textarray". This option would >> require that the foreign table have exactly one field, of type >> text[], and would compose all the field strings read from the file >> for each record into the array (however many there are). This would >> require a few changes to contrib/file_fdw/file_fdw.c and a few >> changes to src/backend/commands/copy.c, which I can probably have >> done in fairly short order, Deo Volente. This will allow something like: >> >> CREATE FOREIGN TABLE arr_text ( >> t text[] >> ) SERVER file_server >> OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray >> 'true'); >> SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there >> FROM arr_text; >> >> > > A WIP patch is attached. It's against Shigeru Hanada's latest FDW > patches. It's surprisingly tiny. Right now it probably leaks memory > like a sieve, and that's the next thing I'm going to chase down. > > Updated patch attached, that should use memory better. cheers andrew