Re: COPY enhancements - Mailing list pgsql-hackers

From Greg Smith
Subject Re: COPY enhancements
Msg-id alpine.GSO.2.01.0909111651530.7278@westnet.com
In response to Re: COPY enhancements  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, 11 Sep 2009, Tom Lane wrote:

> If you believe that somebody might think of a new per-column COPY 
> behavior in the future, then the same issue is going to come up again.

While Andrew may have given up on a quick hack to work around his recent 
request, I don't have that luxury.  We've already had to add two new 
behaviors here to COPY in our version and I expect more in the future. 
The performance of every path to get data into the database besides COPY 
is too miserable for us to use anything else, and the current 
inflexibility makes it useless for anything but the cleanest input data.

The full set of new behavior I'd like to see would allow adjusting:

-Accept or reject rows with extra columns?
-Accept or reject rows that are missing columns at the end?
--Fill them with the default for the column (if available) or NULL?
-Save rejected rows?
--To a single system table?
--To a user-defined table?
--To the database logs?

The user-defined table for rejects is obviously exclusive of the system 
one; either of those would be fine from my perspective.
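
To make the options in the list above concrete, here is a minimal Python sketch of the per-row decisions they imply. It is not PostgreSQL code and not the Truviso implementation; `CopyPolicy`, `load_rows`, and every field name are hypothetical, and the reject "table" is just an in-memory list standing in for whichever destination gets chosen.

```python
from dataclasses import dataclass


@dataclass
class CopyPolicy:
    """Hypothetical knobs mirroring the list above; not real COPY options."""
    accept_extra_columns: bool = False    # accept rows with extra columns?
    accept_missing_columns: bool = False  # accept rows missing trailing columns?
    fill_with_default: bool = True        # fill with column default, else NULL
    save_rejects: bool = True             # keep rejected rows somewhere?


def load_rows(rows, columns, defaults, policy):
    """Apply the policy to each input row.

    Returns (accepted, rejected). Each rejected entry is a tuple of
    (line number, original row, reason).
    """
    accepted, rejected = [], []
    width = len(columns)
    for lineno, row in enumerate(rows, start=1):
        row = list(row)
        if len(row) > width:
            if not policy.accept_extra_columns:
                if policy.save_rejects:
                    rejected.append((lineno, row, "extra columns"))
                continue
            row = row[:width]  # accepted: drop the surplus fields
        elif len(row) < width:
            if not policy.accept_missing_columns:
                if policy.save_rejects:
                    rejected.append((lineno, row, "missing columns"))
                continue
            for col in columns[len(row):]:
                # A column with no default gets None (NULL) either way.
                row.append(defaults.get(col) if policy.fill_with_default else None)
        accepted.append(row)
    return accepted, rejected


columns = ["id", "name", "qty"]
defaults = {"qty": 0}
rows = [["1", "a", "5"], ["2", "b"], ["3", "c", "7", "x"]]

lenient = CopyPolicy(accept_missing_columns=True)
accepted, rejected = load_rows(rows, columns, defaults, lenient)
```

With the `lenient` policy the short row is padded with the default for `qty` and the long row lands in `rejected`; with a default `CopyPolicy()` both malformed rows are rejected.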

I wasn't really pleased with the "if it's not the most general solution 
possible we're not interested" tone of Andrew's other COPY-change thread 
this week.  I don't think there are *that* many common requests here that 
they can't all be handled by specific implementations, and the scope creep 
of launching into a general framework for adding them is just going to 
lead to nothing useful getting committed.  If you want something really 
complicated, drop into a PL-based solution.  The stuff I list above I see 
regular requests for at *every* PG installation I've ever been involved 
in, and it would be fantastic if they were available out of the box.

But I think it's quite reasonable to say the COPY syntax needs to be 
overhauled to handle all these.  The two changes we've made at Truviso 
both use GUCs to control their behavior, and I'm guessing Aster did that 
too for the same reasons we did:  it's easier to do and makes for cleaner 
upstream merges.  That approach doesn't really scale well though to many 
options, and when considered for core the merge concerns obviously go 
away.  (The main reason I haven't pushed for us to submit our 
customizations here is that I know perfectly well the GUC-based UI isn't 
acceptable, but I haven't been able to get a better one done yet.)

This auto-partitioning stuff is interesting if the INSERT performance of it 
can be made reasonable.  I think Emmanuel is too new to the community 
process here to realize that there's little hope of the two features getting 
committed or even reviewed together.  If I were reviewing this I'd just 
kick it back as "separate these cleanly into separate patches where the 
partitioning one depends on the logging one" before even starting to look 
at the code; it's too much stuff to consume properly in one gulp.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

