Re: COPY FROM WHEN condition - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: COPY FROM WHEN condition |
Date | |
Msg-id | b87e80a4-15dc-079c-d3d4-41ce03f55480@2ndquadrant.com Whole thread Raw |
In response to | COPY FROM WHEN condition (Surafel Temesgen <surafel3000@gmail.com>) |
Responses |
RE: COPY FROM WHEN condition
Re: COPY FROM WHEN condition |
List | pgsql-hackers |
Hi, I've taken a quick look at this on the way back from pgconf.eu, and it seems like a nice COPY improvement in a fairly good shape. Firstly, I think it's a valuable because it allows efficiently importing a subset of data. Currently, we either have to create an intermediate table, copy all the data into that, do the filtering, and then insert the subset into the actual target table. Another common approach that I see in practice is using file_fdw to do this, but it's not particularly straight-forward (having to create the FDW servers etc. first) not efficient (particularly for large amounts of data). This feature allows skipping this extra step (at least in simpler ETL cases). So, I like the idea and I think it makes sense. A couple of comments regarding the code/docs: 1) I think this deserves at least some regression tests. Plenty of tests already use COPY, but there's no coverage for the new piece. So let's add a new test suite, or maybe add a couple of tests into copy2.sql. 2) In copy.sqml, the new item is defined like this <term><literal>WHEN Clause</literal></term> I suggest we use just <term><literal>WHEN</literal></term>, that's what the other items do (see ENCODING). The other thing is that this does not say what expressions are allowed in the WHEN clause. It seems pretty close to WHEN clause for triggers, which e.g. mentions that subselects are not allowed. I'm pretty sure that's true here too, because it fails like this (118 = T_SubLink): test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10); ERROR: unrecognized node type: 118 So, the patch needs to detect this, produce a reasonable error message and document the limitations in copy.sqml, just like we do for CREATE TRIGGER. 3) For COPY TO, the WHEN clause is accepted but ignored, leading to confusing cases like this: test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10); COPY 151690 So, it contains subselect, but unlike COPY FROM it does not fail (because we never execute it). The fun part is that the expression is logically false, so a user might expect it to filter rows, yet we copy everything. IMHO we need to either error-out in these cases, complaining about WHEN not being supported for COPY TO, or make it work (effectively treating it as a simpler alternative to COPY (subselect) TO). 4) There are some minor code style issues in copy.c - the variable is misnamed as when_cluase, there are no spaces after 'if' etc. See the attached patch fixing this. AFAICS we could just get rid of the extra when_cluase variable and mess with the cstate->whenClause directly, depending on how (3) gets fixed. 5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it requires it to be 'WHEN (expr)'. I suggest we do the same thing here, requiring the parentheses. 6) The skip logic in CopyFrom() seems to be slightly wrong. It does work, but the next_record label is defined after CHECK_FOR_INTERRUPTS() so a COPY will not respond to Ctrl-C unless it finds a row matching the WHEN condition. If you have a highly selective condition, that's a bit inconvenient. It also skips MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); so I wonder what the heap_form_tuple() right after the next_record label will use for tuples right after a skipped one. I'd bet it'll use the oldcontext (essentially the long-lived context), essentially making it a memory leak. So I suggest to get rid of the next_record label, and use 'continue' instead of the 'goto next_record'. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: