Re: RETURNING syntax for COPY - Mailing list pgsql-hackers

From Ryan Kelly
Subject Re: RETURNING syntax for COPY
Date
Msg-id 20130508181659.GA9650@llserver.lakeliving.com
Whole thread Raw
In response to Re: RETURNING syntax for COPY  (David Fetter <david@fetter.org>)
Responses Re: RETURNING syntax for COPY  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Wed, May 05/08/13, 2013 at 10:55:40AM -0700, David Fetter wrote:
> On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
> > Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > > On 08.05.2013 19:44, Tom Lane wrote:
> > >> No there isn't; what you suggest would require FE/BE protocol
> > >> extensions, making it several orders of magnitude more work than the
> > >> other thing.
> > 
> > > I'd imagine that the flow would go something like this:
> > 
> > > BE    FE
> > 
> > > CopyInResponse
> > >     CopyData
> > >     CopyData
> > >     ...
> > >     CopyDone
> > > RowDescription
> > > DataRow
> > > DataRow
> > > CommandComplete
> > 
> > That would require the backend to buffer the entire query response,
> > which isn't a great idea.  I would expect that such an operation would
> > need to interleave CopyData to the backend with DataRow responses.  Such
> > a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> > of work (at both ends) for extremely debatable value.
> > 
> > The general idea of COPY is to load data as fast as possible,
> 
> With utmost respect, that is one of several use cases, and any change
> would need to keep that use case unburdened.  A sometimes overlapping
> set of use cases move data in and out of the database in a simple
> manner.  In some of these, people might wish to trade some performance
> for the feature.

99% of my uses at work for COPY are as a general data import and export
facility. I often find myself loading CSV files into our database for
analysis and further cleanup, and then use COPY to output queries as CSV
files for consumption by other members of the business.

The recent work for (PRE|POST)PROCESSOR options to COPY is indicative of
the fact that users are not merely using COPY to "load data as fast as
possible".

Other discussions around a COMPRESSED option are more than just a
performance enhancement, in my view, as I oftern receive files
compressed and decompressing the data is just another step standing in
the way of myself importing the data into the database.

Additionally, once I have the data imported, I often take many steps to
cleanup and format the data, prior to applying actual typing to a table
(which invariably fails due to invalid dates, and other nonsense).

COPY ... RETURNING would certainly be useful to apply additional
transformations to the data before finally sending it to its ultimate
destination.

> A particular example would be one where there are several tables to be
> loaded, some with generated columns that the future ones would depend
> on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
> the burden is much higher as it requires DDL permission in general
> each time.

I find using the FDW machinery to perform many queries to be much slower
than importing the data once and then running my queries. There is also
no ability to use indexes.

> > so weighing it down with processing options seems like a pretty
> > dubious idea even if the implementation were easy.
> 
> Totally agreed that the "fast load/unload" code path must not be
> affected by any such changes.

Agreed here as well.

-Ryan P. Kelly




pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Terminology issue: suffix tree
Next
From: Stephen Frost
Date:
Subject: Re: RETURNING syntax for COPY