On 10/09/2013 11:23 AM, Dimitri Fontaine wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I don't see at all that your suggested alternative has any advantages over
>> what's been written. If you can say NULL FOR (foo) as '""', how will you
>> specify the null for some other column(s)? Are we going to have multiple
>> such clauses? It looks like a real mess.
> Basically the CSV files don't have out-of-band NULLs and it's then a
> real mess. In the new pgloader version I've been adding per-column NULL
> processing, where NULL can be either an empty string, any number of
> space characters or any constant string such as "\N" or "****".
>
> I first added a global per-file NULL representation setting, but that's
> not flexible enough to make any sense really. The files we have to
> import are way too "creative" in their formats.
>
> In my view, we can slowly deprecate pgloader by including such features
> in the core code or make pgloader and the like non-optional parts of
> the external data loading tool chain.
>
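
To make the per-column NULL idea above concrete, here is a minimal
sketch of what such matching might look like when done in a client-side
loader. This is not pgloader's actual code or configuration syntax; the
names make_null_rule and load_rows, the column names and the sample data
are all hypothetical, and "any number of space characters" is assumed
here to mean one or more spaces.

```python
import csv
import io


def make_null_rule(empty=False, blanks=False, literal=None):
    """Build a predicate deciding whether a raw CSV field means NULL.

    Hypothetical helper: the three switches mirror the three cases named
    in the quoted text (empty string, only spaces, a constant string).
    """
    def is_null(value):
        if empty and value == "":
            return True
        # Assumption: "blanks" means one or more space characters.
        if blanks and value != "" and value.strip(" ") == "":
            return True
        if literal is not None and value == literal:
            return True
        return False
    return is_null


def load_rows(text, columns, rules):
    """Parse CSV text, mapping fields to None according to per-column rules."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        row = {}
        for name, value in zip(columns, record):
            rule = rules.get(name)  # columns without a rule are never NULL
            row[name] = None if rule is not None and rule(value) else value
        rows.append(row)
    return rows


# Illustrative input: each column uses a different NULL convention.
SAMPLE = "1,,\\N\n2,   ,****\n"
COLUMNS = ["id", "name", "note"]
RULES = {
    "name": make_null_rule(empty=True, blanks=True),
    "note": make_null_rule(literal="\\N"),
}

ROWS = load_rows(SAMPLE, COLUMNS, RULES)
```

Note that "****" in the note column stays a plain string, because only
"\N" was declared as NULL for that column; a single per-file setting
could not express that distinction.
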
The CSV code was somewhat controversial when adopted, and was never
intended to cater for all cases. I think it was accepted because it gave
good coverage of a large number of common cases without huge additional
code complexity. I think we drew the line in about the right place for
what we support, although we've extended it modestly over the years. I
seriously doubt that it will ever fully replace a utility like pgloader,
and I'm not sure that's a desirable goal in the first place.

cheers
andrew