Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: COPY FROM WHEN condition
Date
Msg-id b87e80a4-15dc-079c-d3d4-41ce03f55480@2ndquadrant.com
Whole thread Raw
In response to COPY FROM WHEN condition  (Surafel Temesgen <surafel3000@gmail.com>)
Responses RE: COPY FROM WHEN condition
Re: COPY FROM WHEN condition
List pgsql-hackers
Hi,

I've taken a quick look at this on the way back from pgconf.eu, and it
seems like a nice COPY improvement in a fairly good shape.

Firstly, I think it's a valuable because it allows efficiently importing
a subset of data. Currently, we either have to create an intermediate
table, copy all the data into that, do the filtering, and then insert
the subset into the actual target table. Another common approach that I
see in practice is using file_fdw to do this, but it's not particularly
straight-forward (having to create the FDW servers etc. first) not
efficient (particularly for large amounts of data). This feature allows
skipping this extra step (at least in simpler ETL cases).

So, I like the idea and I think it makes sense.


A couple of comments regarding the code/docs:


1) I think this deserves at least some regression tests. Plenty of tests
already use COPY, but there's no coverage for the new piece. So let's
add a new test suite, or maybe add a couple of tests into copy2.sql.


2) In copy.sqml, the new item is defined like this

    <term><literal>WHEN Clause</literal></term>

I suggest we use just <term><literal>WHEN</literal></term>, that's what
the other items do (see ENCODING).

The other thing is that this does not say what expressions are allowed
in the WHEN clause. It seems pretty close to WHEN clause for triggers,
which e.g. mentions that subselects are not allowed. I'm pretty sure
that's true here too, because it fails like this (118 = T_SubLink):

    test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
    ERROR:  unrecognized node type: 118

So, the patch needs to detect this, produce a reasonable error message
and document the limitations in copy.sqml, just like we do for CREATE
TRIGGER.


3) For COPY TO, the WHEN clause is accepted but ignored, leading to
confusing cases like this:

    test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
    COPY 151690

So, it contains subselect, but unlike COPY FROM it does not fail
(because we never execute it). The fun part is that the expression is
logically false, so a user might expect it to filter rows, yet we copy
everything.

IMHO we need to either error-out in these cases, complaining about WHEN
not being supported for COPY TO, or make it work (effectively treating
it as a simpler alternative to COPY (subselect) TO).


4) There are some minor code style issues in copy.c - the variable is
misnamed as when_cluase, there are no spaces after 'if' etc. See the
attached patch fixing this.

AFAICS we could just get rid of the extra when_cluase variable and mess
with the cstate->whenClause directly, depending on how (3) gets fixed.


5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
requiring the parentheses.


6) The skip logic in CopyFrom() seems to be slightly wrong. It does
work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
so a COPY will not respond to Ctrl-C unless it finds a row matching the
WHEN condition. If you have a highly selective condition, that's a bit
inconvenient.

It also skips

    MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

so I wonder what the heap_form_tuple() right after the next_record label
will use for tuples right after a skipped one. I'd bet it'll use the
oldcontext (essentially the long-lived context), essentially making it
a memory leak.

So I suggest to get rid of the next_record label, and use 'continue'
instead of the 'goto next_record'.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Narayanan V
Date:
Subject: Re: Conflicting option checking in pg_restore
Next
From: legrand legrand
Date:
Subject: Re: Multiple Wait Events for extensions