Re: COPY Transform support - Mailing list pgsql-hackers

From NikhilS
Subject Re: COPY Transform support
Date
Msg-id d3c4af540804030659h70e1ed5aw57305b0e69e056e@mail.gmail.com
Whole thread Raw
In response to COPY Transform support  (Dimitri Fontaine <dfontaine@hi-media.com>)
Responses Re: COPY Transform support  (PFC <lists@peufeu.com>)
List pgsql-hackers
Hi, <br /><br /><br /><div class="gmail_quote">On Thu, Apr 3, 2008 at 6:47 PM, Dimitri Fontaine <<a
href="mailto:dfontaine@hi-media.com">dfontaine@hi-media.com</a>>wrote:<br /><blockquote class="gmail_quote"
style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Here's a proposal for
COPYto support the T part of an ETL, that is adding the<br /> capability for COPY FROM to Transform the data it
gets.<br/><br /> The idea is quite simple: adding to COPY FROM the option to run a function on<br /> the data before to
calldatatype_in functions. This needs some syntax<br /> addition to be worked out at the COPY side, then the COPY code
willhave to<br /> run the given function on the read data and consider giving the output of it<br /> to current COPY
code(datatype input function).<br /><br /> The function could either get the data as text or bytea, and would have
to<br/> return either text or bytea. bytea seems the more sensible choice, as long as<br /> we don't lose encoding
informationthere, which I'm not sure about.<br /><br /> The syntax could be something like:<br />  COPY mytable FROM
'/my/file.txt'WITH COLUMN x CONVERT USING myfunc;<br /><br /> I tried to only add keywords already present in [1],
whilegetting something<br /> meaningfull... and x is intended to be the column number, counting from 1.<br />  [1] <a
href="http://www.postgresql.org/docs/8.3/static/sql-keywords-appendix.html"
target="_blank">http://www.postgresql.org/docs/8.3/static/sql-keywords-appendix.html</a><br/><br /> Comments?<br
/><fontcolor="#888888">--<br /> dim<br /></font></blockquote></div><br />+1<br /><br />Data transformation while doing
adata load is a requirement now and then. Considering that users will have to do mass updates *after* the load
completesto mend the data to their liking should be reason enough to do this while the loading is happening. I think to
goabout it the right way we should support the following:<br /><br /> <br />* The ability to provide per-column
transformationexpressions<br /><br /> COPY mytable (col1 transform to "col1 + 10", col2 transform to "'Post' ||
'greSQL'",col3...) FROM ..<br /><br />* The ability to use any kind of expressions while doing the transformation <br
/>The transformation expression should be any expression (basically ExecEvalExpr) that can be evaluated to give a
resultingvalue and obviously a corresponding is_null value too. It should and could be system in-built functions (e.g.
UPPER,TRIM, TO_CHAR, TO_NUMBER etc.) or user defined functions too<br /><br clear="all" />* The transformation
expressioncan refer to other columns involved in the load. So that when the current row is extracted from the input
file,the current values should be used to generate the new resultant values before doing a heap_form_tuple. E.g.<br
/><br/>(col1 transform "col1 + 10", col2 transform "col1 * col2", col3 transform "UPPER(col1 || col3)",...)<br /><br
/>Ihave spent some thoughts on how to do this and will be happy to share the same if the list is interested.
Personally,I think data transformation using such expressions is a pretty powerful and important activity while doing
thedata load itself.<br /><br />Regards,<br />Nikhils <br />-- <br />EnterpriseDB <a
href="http://www.enterprisedb.com">http://www.enterprisedb.com</a>

pgsql-hackers by date:

Previous
From: "Greg Sabino Mullane"
Date:
Subject: Re: modules
Next
From: Aidan Van Dyk
Date:
Subject: Re: modules