Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Msg-id 7b97c5a40912291856k21b8a2x6da061aad0ea9089@mail.gmail.com
In response to Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think there's clear support for a version of COPY that returns rows
> like a SELECT statement, particularly for use with CTEs.  There seems
> to be support both for a mode that returns text[] or something like it
> and also for a mode that returns a defined record type.  But that all
> seems separate from what you're proposing here, which is a
> considerably lower-level facility which seems like it would not be of
> much use to ordinary users, but might be of some value to tool
> implementors - or perhaps you'd disagree with that characterization?
>

This is in the other direction: freeing COPY from the restriction that
it can only put bytes into two places:

* A network socket (e.g. stdout)
* A file (as superuser)

Instead, it can hand off bytes to an arbitrary UDF, which can handle
them in any way.  A clean design should be able to subsume at least the
existing simple behaviors and enable more.  It might also suggest how
to decouple at least a few components of COPY, which could help the
long-term cleanup effort there.
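
As a rough illustration of that idea (a Python sketch, not the patch
itself), the destination can be modeled as any callable that accepts a
chunk of bytes.  The names copy_to, file_sink and collecting_sink are
hypothetical, and the row formatting is deliberately simplified: it
writes NULL as \N and separates columns with tabs, but does no escaping.

```python
import io
from typing import Callable, Iterable

# Hypothetical names throughout; this is not PostgreSQL's COPY code.
Sink = Callable[[bytes], object]


def copy_to(rows: Iterable[tuple], sink: Sink) -> int:
    """Format each row as a tab-separated text line and hand it to sink.

    Simplified on purpose: NULL becomes \\N, but tabs, newlines and
    backslashes inside values are not escaped.
    """
    count = 0
    for row in rows:
        line = "\t".join("\\N" if v is None else str(v) for v in row) + "\n"
        sink(line.encode("utf-8"))
        count += 1
    return count


def file_sink(fileobj) -> Sink:
    """An existing destination (a file) expressed as an ordinary sink."""
    return fileobj.write


def collecting_sink(chunks: list) -> Sink:
    """An arbitrary function as the destination: here it keeps the chunks."""
    return chunks.append
```

With this shape, the file and socket cases are just two sinks among
many, and the producer does not need to know which one it was given.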

> Anyway, my specific reaction to your suggestions in the email that I
> quoted is that it seems a bit baroque and that I'm not really sure
> what it's useful for in practice.  I'm certainly not saying it ISN'T
> useful, because I can't believe that you would have gone to the
> trouble to work through all of this unless you had some ideas about
> nifty things that could be done with it, but I think maybe we need to
> back up and start by talking about the problems you're trying to
> solve, before we get too far down into a discussion of implementation
> details.

At Truviso this is used as a piece of our replication solution.  The
submitted patches show how enhancing dblink allows postgres to copy
directly from one node to another.  Truviso uses it to write bytes
directly to a libpq connection (see the dblink patch) that is in the
open COPY state, achieving direct cross-node bulk loading for the
purposes of replication.
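
The relay pattern can be sketched as follows.  This is an in-memory
stand-in and talks to no server: FakeCopyInConnection and relay are
hypothetical names, and the methods only echo the roles of libpq's
PQputCopyData and PQputCopyEnd.  It is not the dblink patch.

```python
class FakeCopyInConnection:
    """Stand-in for a connection that has already entered COPY IN state.

    Hypothetical class: the method names mirror the roles of libpq's
    PQputCopyData / PQputCopyEnd, but nothing here touches a network.
    """

    def __init__(self):
        self.received = bytearray()
        self.copy_open = True
        self.error = None

    def put_copy_data(self, chunk: bytes) -> None:
        if not self.copy_open:
            raise RuntimeError("connection is not in COPY IN state")
        self.received += chunk

    def put_copy_end(self, error=None) -> None:
        # A non-None error models asking the receiver to abort the COPY.
        self.error = error
        self.copy_open = False


def relay(source_chunks, dest) -> int:
    """Forward each chunk produced on the source node to the destination.

    Returns the number of bytes forwarded.  The COPY is always closed:
    normally on success, with an error message if forwarding fails.
    """
    total = 0
    try:
        for chunk in source_chunks:
            dest.put_copy_data(chunk)
            total += len(chunk)
    except Exception as exc:
        dest.put_copy_end(error=str(exc))
        raise
    dest.put_copy_end()
    return total
```

The point of the sketch is that the bytes never pass through a client
application: whatever the source emits goes straight to the open COPY
on the other node.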

One could imagine many ETL or data warehouse offloading applications
that would be enabled by letting arbitrary code handle the bytes.
That said, this patch achieves nothing that writing some middleware
could not accomplish: it is just convenient to have, and likely more
efficient than application middleware doing the same thing.

fdr

