Re: COPY formatting - Mailing list pgsql-hackers

From Chris Browne
Subject Re: COPY formatting
Date
Msg-id 60u10nz47l.fsf@dev6.int.libertyrms.info
In response to COPY formatting  (Karel Zak <zakkr@zf.jcu.cz>)
List pgsql-hackers
andrew@dunslane.net (Andrew Dunstan) writes:
> Karel Zak wrote:
>
>> Hi,
>>
>> in TODO is item: "* Allow dump/load of CSV format". I don't think
>> it's a clean idea. Why CSV and not something else? :-)
>>
>> And why not allow users full control of the format via their own
>> function? It means something like:
>> COPY tablename [ ( column [, ...] ) ]
>>     TO { 'filename' | STDOUT }
>>     [ [ WITH ]
>>          [ BINARY ]
>>          [ OIDS ]
>>          [ DELIMITER [ AS ] 'delimiter' ]
>>          [ NULL [ AS ] 'null string' ]
>>          [ FORMAT funcname ] ]
>>            ^^^^^^^^^^^^^^^
>>
>> The formatting function API can be pretty simple:
>>
>> text *my_copy_format(text *attrdata, int direction, int nattrs,
>>                      int attr, oid attrtype, oid relation)
>>
>> -- it's pseudocode of course; it should use the standard fmgr
>> interface.
>> It's probably interesting for the non-binary COPY version.
>
> Interesting ... The alternative might be an external program to munge
> CSVs and whatever other format people want to support and then call
> the existing COPY - either in bin or contrib. I have seen lots of
> people wanting to import CSVs, and that's even before we get a Windows
> port.

I know Jan Wieck has been working on something like this, with a bit
of further smarts...

- By having, alongside, a table definition, the table can be created
  concurrently;

- A set of mapping functions can be used, so that if, for instance,
  the program generating the data was Excel, and you have a field with
  values like 37985, 38045, or 38061, they can respectively be mapped
  to '2004-01-01', '2004-03-01', and '2004-03-17';

- It can load whatever data is loadable, and use Ethernet-like
  backoffs when it encounters bad records so that it loads all the data
  that is good, and leaves a bundle of 'crud' that is left over.
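
To make the mapping-function idea concrete, here is a minimal sketch in
Python. It is purely illustrative and is not Jan's Tcl prototype; the name
serial_to_iso and the epoch parameter are assumptions. The three values
quoted above line up with a plain day count starting at 1900-01-01, so that
is the default. Files written by Excel's own 1900 date system are commonly
converted with 1899-12-30 as the base instead, which is why the epoch is
left configurable.

```python
from datetime import date, timedelta

# Assumption: serial 0 corresponds to 1900-01-01, which reproduces the three
# example values from the list above. Pass a different epoch (for example
# date(1899, 12, 30)) if the source program counts days differently.
DEFAULT_EPOCH = date(1900, 1, 1)


def serial_to_iso(serial, epoch=DEFAULT_EPOCH):
    """Map a spreadsheet-style day count to an ISO 'YYYY-MM-DD' string."""
    return (epoch + timedelta(days=int(serial))).isoformat()


if __name__ == "__main__":
    for serial in (37985, 38045, 38061):
        print(serial, "->", serial_to_iso(serial))
```

A loader would apply a function like this per column before handing the
row to COPY, so the database only ever sees a well-formed date literal.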
 

He had been prototyping it in Tcl; I'm not sure how far a port to C
has gotten.  It looked pretty neat; it sure seems better to put the
"cleverness" in userspace than to try to increase the complexity of
the postmaster...
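
As a rough userspace sketch of the "Ethernet-like backoff" loading mentioned
in the list above: this is one possible reading of the idea, not Jan's
implementation. The assumption is that a batch load is all-or-nothing (as a
COPY that hits a bad record is), so the loader halves the batch size after a
failure, doubles it again after a success, and sets aside any single record
that still fails. The names load_with_backoff, load_batch, and fake_load are
hypothetical.

```python
def load_with_backoff(rows, load_batch, max_batch=64):
    """Load rows in batches, shrinking the batch on failure.

    load_batch(chunk) is assumed to be all-or-nothing: it either loads
    every row in chunk or raises and loads none of them.
    Returns (loaded_rows, rejected_rows).
    """
    loaded, rejected = [], []
    size, i = max_batch, 0
    while i < len(rows):
        chunk = rows[i:i + size]
        try:
            load_batch(chunk)
        except Exception:
            if len(chunk) == 1:
                # A single row that fails on its own is the 'crud'.
                rejected.append(chunk[0])
                i += 1
            else:
                # Back off: retry the same position with half the batch.
                size = len(chunk) // 2
            continue
        loaded.extend(chunk)
        i += len(chunk)
        # Recover: grow the batch again after a clean load.
        size = min(size * 2, max_batch)
    return loaded, rejected


def fake_load(batch):
    """Stand-in for a real COPY call: rejects any non-integer row."""
    for row in batch:
        int(row)  # raises ValueError on a bad record


if __name__ == "__main__":
    rows = ["1", "2", "x", "4", "5", "", "7", "8"]
    good, crud = load_with_backoff(rows, fake_load, max_batch=4)
    print("loaded:", good)
    print("left over:", crud)
```

With a real database, load_batch would wrap one COPY in its own transaction,
and the rejected rows would be written to a file for later inspection.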
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/linuxxian.html
Have you heard of the new Macsyma processor?  It has three instructions --
LOAD, STORE, and SKIP IF INTEGRABLE.

