Re: RETURNING syntax for COPY - Mailing list pgsql-hackers
From | Jim Nasby |
---|---|
Subject | Re: RETURNING syntax for COPY |
Date | |
Msg-id | 518AA62E.8080708@nasby.net Whole thread Raw |
In response to | Re: RETURNING syntax for COPY ("Jonathan S. Katz" <jonathan.katz@excoventures.com>) |
Responses |
Re: RETURNING syntax for COPY
|
List | pgsql-hackers |
On 5/8/13 12:54 PM, Jonathan S. Katz wrote: > On May 8, 2013, at 1:16 PM, Tom Lane wrote: > >> Heikki Linnakangas <hlinnakangas@vmware.com> writes: >>> On 08.05.2013 19:44, Tom Lane wrote: >>>> No there isn't; what you suggest would require FE/BE protocol >>>> extensions, making it several orders of magnitude more work than the >>>> other thing. >> >>> I'd imagine that the flow would go something like this: >> >>> BE FE >> >>> CopyInResponse >>> CopyData >>> CopyData >>> ... >>> CopyDone >>> RowDescription >>> DataRow >>> DataRow >>> CommandComplete >> >> That would require the backend to buffer the entire query response, >> which isn't a great idea. I would expect that such an operation would >> need to interleave CopyData to the backend with DataRow responses. Such >> a thing could possibly be built on COPY_BOTH mode, but it would be a lot >> of work (at both ends) for extremely debatable value. >> >> The general idea of COPY is to load data as fast as possible, so weighing >> it down with processing options seems like a pretty dubious idea even if >> the implementation were easy. > > There are cases that I indeed want to load data very quickly, but I want to perform an operation on it immediately after,e.g. removing bad data that was immediately added from that copy. For instance, I do have this scenario: > > WITH new_data AS ( > COPY FROM ... > RETURNING id, field_to_check > ) > DELETE FROM table > USING new_data > WHERE > table.id = new_data.id AND > new_data.field_to_check ~* 'bad data'; > > Now I can take care of that all in one step, and I know I'm only removing fields I just added. This comes up when I amimporting external files from other sources where I may not necessarily want all of the rows or some of the rows containbad data. > > This also presumes that COPY works in a CTE, which I'm not sure it does (and I will do the TIAS test after I hit send onthis message). What you're really asking for here is some kind of stream processing capability. There are spin-offs of Postgres that providethat capability (I know Neil Conway worked on some). Those are geared completely around stream processing, but I thinkit would be extremely interesting to provide some minimal support for that in community Postgres. Using your use case as an example, something like this would be very interesting: COPY table FROM ... WHERE field_to_check !~* 'bad data' ; In this case we're just applying a simple WHERE clause against each incoming row. Perhaps what I'm suggesting could be implemented with a CTE, but I'm not sure it makes sense to do it the way you propose,at least not initially. A CTE would provide so much flexibility that it'd be difficult for the optimizer to be efficientabout it. Something like a WHERE clause directly on COPY would be a lot easier to handle. As someone mentioned,FDW might be another option there. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
pgsql-hackers by date: