Re: COPY enhancements - Mailing list pgsql-hackers

From Greg Smith
Subject Re: COPY enhancements
Date
Msg-id alpine.GSO.2.01.0909120324010.9961@westnet.com
In response to Re: COPY enhancements  (Josh Berkus <josh@agliodbs.com>)
Responses Re: COPY enhancements
List pgsql-hackers
On Fri, 11 Sep 2009, Josh Berkus wrote:

> I've been thinking about it, and can't come up with a really strong case
> for wanting a user-defined table if we settle the issue of having a
> strong key for pg_copy_errors.  Do you have one?

No, but I'd think that if the user table was only allowed to be the exact 
same format as the system one it wouldn't be that hard to implement--once 
the COPY syntax is expanded at least.  I'm reminded of how Oracle EXPLAIN 
PLANs get logged into the PLAN_TABLE by default, but you can specify "INTO 
table" to put them somewhere else.  You'd basically be doing the same thing 
but with a different destination relation.

> After some thought, I think that Andrew's feature *is* generally
> applicable, if done as IGNORE COLUMN COUNT (or, more likely,
> column_count=ignore).  I can think of a lot of data sets where column
> count is jagged and you want to do ELT instead of ETL.

Exactly, the ELT approach gives you so many more options for cleaning up 
the data that I think it would be used more if it weren't so hard to 
do in Postgres right now.

> As opposed to Tom, Peter and Heikki vetoing things because the feature 
> gain doesn't justify the maintenance burden?  That's your real choice. 
> Adding a framework for manageable syntax extensions means that we can be 
> more liberal about what we justify as an extension.

I don't think you're addressing the distinction I was trying to make.  The 
work to make the *syntax* for COPY easier to extend is an unfortunate 
requirement for all these new bits; no argument from me that using GUCs 
for everything is just too painful.

What I was suggesting is that the first set of useful features required 
for what you're calling the ELT load path is both small and well 
understood.  An implementation of the stuff I see a constant need for 
could get banged out so fast that trying to completely generalize it on 
the first pass has a questionable return.

While complicated, COPY is a pretty walled off command of around 3500 
lines of code, and the hackery required here is pretty small.  For 
example, it turns out we do already have the code to get it to ignore 
column overruns, and it's all of 50 new lines--much of which is 
shared with code that does other error ignoring bits too.  It's easy to 
make a case for a grand future extensibility cleanup, but that's really 
not necessary to provide a significant benefit for the cases I 
mentioned.  And I would guess the maintenance burden of a more general 
solution has to be higher than a simple implementation of the feature list 
I gave in my last message.
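
To make "ignoring column overruns" concrete, here's a rough sketch of the 
behavior in Python.  This is only an illustration of the idea, not the 
COPY code itself (which is C); the function name, the error tuple layout, 
and the choice to pad short rows with NULLs are all assumptions made up 
for the example.

```python
import csv
import io


def load_tolerant(text, expected_cols, delimiter=","):
    """Parse delimited text, tolerating rows with the wrong column count.

    Hypothetical helper for illustration only; not PostgreSQL code.
    Returns (rows, errors): rows are tuples of exactly expected_cols
    fields, errors describe every line that had to be adjusted.
    """
    rows, errors = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) > expected_cols:
            # Column overrun: keep the leading fields, log what was dropped.
            errors.append((line_no, "extra columns dropped", fields[expected_cols:]))
            fields = fields[:expected_cols]
        elif len(fields) < expected_cols:
            # Short row: pad with None (standing in for NULL) and log it.
            errors.append((line_no, "missing columns padded", []))
            fields = fields + [None] * (expected_cols - len(fields))
        rows.append(tuple(fields))
    return rows, errors


sample = "1,alice,NY\n2,bob\n3,carol,LA,extra,junk\n"
rows, errors = load_tolerant(sample, expected_cols=3)
```

Every input line ends up loaded, and the errors list plays the role an 
error table would: the cleanup happens afterwards, which is the ELT 
approach rather than ETL.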

In short:  there's a presumption that adding any error-ignoring code would 
require significant contortions.  I don't think that's really true though, 
and would like to keep open the possibility of accepting some simple but 
useful ad-hoc features in this area, even if they don't solve every 
possible problem in this space just yet.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

