Thread: textarray option for file FDW

textarray option for file FDW

From
Andrew Dunstan
Date:
I've been waiting for the latest FDW patches as patiently as I can, and 
I've been reviewing them this morning, in particular the file_fdw patch 
and how it interacts with the newly exposed COPY API. Overall it seems 
to be a whole lot cleaner, and the wholesale duplication of the copy 
code is gone, so it's much nicer and cleaner. So now I'd like to add a 
new option to it: "textarray". This option would require that the 
foreign table have exactly one field, of type text[], and would compose 
all the field strings read from the file for each record into the array 
(however many there are). This would require a few changes to 
contrib/file_fdw/file_fdw.c and a few changes to 
src/backend/commands/copy.c, which I can probably have done in fairly 
short order, Deo Volente. This will allow something like:
   CREATE FOREIGN TABLE arr_text (        t text[]   )  SERVER file_server   OPTIONS (format 'csv', filename
'/path/to/ragged.csv',textarray   'true');   SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there   FROM
arr_text;

Thoughts?

cheers

andrew


Re: textarray option for file FDW

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> ... So now I'd like to add a 
> new option to it: "textarray". This option would require that the 
> foreign table have exactly one field, of type text[], and would compose 
> all the field strings read from the file for each record into the array 
> (however many there are).

Why is this a good thing?  It seems like it would accomplish little
except to defeat the SQL type system entirely.
        regards, tom lane


Re: textarray option for file FDW

From
Andrew Dunstan
Date:

On 01/15/2011 12:44 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> ... So now I'd like to add a
>> new option to it: "textarray". This option would require that the
>> foreign table have exactly one field, of type text[], and would compose
>> all the field strings read from the file for each record into the array
>> (however many there are).
> Why is this a good thing?  It seems like it would accomplish little
> except to defeat the SQL type system entirely.
>
>             


We have discussed previously allowing this sort of thing. It's not a new 
proposal at all.

My use case is that I have a customer who reads in data like this (using 
my patch to allow ragged CSV input, which you previously objected to) 
and then performs a sophisticated battery of validity tests on the data 
before loading it into its final destination. To do that their 
requirement is that we not error out on reading the data, so we load the 
data into a table that is all text fields.

In fact, having COPY read in a text array is *your* suggestion: 
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg00547.php>. 
This is simply a proposal to implement that via FDW, which makes it easy 
to avoid any syntax issues.

cheers

andrew


Re: textarray option for file FDW

From
Itagaki Takahiro
Date:
On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan <andrew@dunslane.net> wrote:
> "textarray". This option would require that the foreign table have exactly
> one field, of type text[], and would compose all the field strings read from
> the file for each record into the array (however many there are).
>
>   CREATE FOREIGN TABLE arr_text (
>        t text[]
>   )  SERVER file_server
>   OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true');

FOREIGN TABLE has stable definitions, so there are some issues
that doesn't exist in COPY command:
- when the type of the last column is not a text[]
- when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN

BTW, those options could be specified in column options rather than table
options. But we don't have column option syntax in the current proposal.

  CREATE FOREIGN TABLE arr_text (
       i integer OPTION (column 1), -- column position in csv file
       t text[]  OPTION (column 'all the rest'),
       d date    OPTION (column 2)
  )  SERVER file_server
  OPTIONS (format 'csv', filename '/path/to/ragged.csv');

--
Itagaki Takahiro


Re: textarray option for file FDW

From
Andrew Dunstan
Date:

On 01/15/2011 01:24 PM, Itagaki Takahiro wrote:
> On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan<andrew@dunslane.net>  wrote:
>> "textarray". This option would require that the foreign table have exactly
>> one field, of type text[], and would compose all the field strings read from
>> the file for each record into the array (however many there are).
>>
>>    CREATE FOREIGN TABLE arr_text (
>>         t text[]
>>    )  SERVER file_server
>>    OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true');
> FOREIGN TABLE has stable definitions, so there are some issues
> that doesn't exist in COPY command:
> - when the type of the last column is not a text[]
> - when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN


I intended that this would be checked for validity at runtime, in 
BeginCopy(). If the table doesn't match the requirement an error would 
be raised. I don't think it's a big problem.

> BTW, those options could be specified in column options rather than table
> options. But we don't have column option syntax in the current proposal.
>
>    CREATE FOREIGN TABLE arr_text (
>         i integer OPTION (column 1), -- column position in csv file
>         t text[]  OPTION (column 'all the rest'),
>         d date    OPTION (column 2)
>    )  SERVER file_server
>    OPTIONS (format 'csv', filename '/path/to/ragged.csv');


Well, ok, but is there any proposal on the table to do that?

cheers

andrew




Re: textarray option for file FDW

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Why is this a good thing?  It seems like it would accomplish little
> except to defeat the SQL type system entirely.

It simply allows to have the classical ETL problem solved directly in
the database server, in SQL.  That's huge.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: textarray option for file FDW

From
Andrew Dunstan
Date:

On 01/15/2011 12:29 PM, Andrew Dunstan wrote:
>
> I've been waiting for the latest FDW patches as patiently as I can,
> and I've been reviewing them this morning, in particular the file_fdw
> patch and how it interacts with the newly exposed COPY API. Overall it
> seems to be a whole lot cleaner, and the wholesale duplication of the
> copy code is gone, so it's much nicer and cleaner. So now I'd like to
> add a new option to it: "textarray". This option would require that
> the foreign table have exactly one field, of type text[], and would
> compose all the field strings read from the file for each record into
> the array (however many there are). This would require a few changes
> to contrib/file_fdw/file_fdw.c and a few changes to
> src/backend/commands/copy.c, which I can probably have done in fairly
> short order, Deo Volente. This will allow something like:
>
>    CREATE FOREIGN TABLE arr_text (
>         t text[]
>    )  SERVER file_server
>    OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray
>    'true');
>    SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there
>    FROM arr_text;
>
>

A WIP patch is attached. It's against Shigeru Hanada's latest FDW
patches. It's surprisingly tiny. Right now it probably leaks memory like
a sieve, and that's the next thing I'm going to chase down.

cheers

andrew

Attachment

Re: textarray option for file FDW

From
Andrew Dunstan
Date:

On 01/15/2011 07:41 PM, Andrew Dunstan wrote:
>
>
> On 01/15/2011 12:29 PM, Andrew Dunstan wrote:
>>
>> I've been waiting for the latest FDW patches as patiently as I can,
>> and I've been reviewing them this morning, in particular the file_fdw
>> patch and how it interacts with the newly exposed COPY API. Overall
>> it seems to be a whole lot cleaner, and the wholesale duplication of
>> the copy code is gone, so it's much nicer and cleaner. So now I'd
>> like to add a new option to it: "textarray". This option would
>> require that the foreign table have exactly one field, of type
>> text[], and would compose all the field strings read from the file
>> for each record into the array (however many there are). This would
>> require a few changes to contrib/file_fdw/file_fdw.c and a few
>> changes to src/backend/commands/copy.c, which I can probably have
>> done in fairly short order, Deo Volente. This will allow something like:
>>
>>    CREATE FOREIGN TABLE arr_text (
>>         t text[]
>>    )  SERVER file_server
>>    OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray
>>    'true');
>>    SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there
>>    FROM arr_text;
>>
>>
>
> A WIP patch is attached. It's against Shigeru Hanada's latest FDW
> patches. It's surprisingly tiny. Right now it probably leaks memory
> like a sieve, and that's the next thing I'm going to chase down.
>
>

Updated patch attached, that should use memory better.

cheers

andrew



Attachment