Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: COPY FROM WHEN condition
Date
Msg-id 8c9edfdf-6fad-7c68-8969-e1e315baa7a4@2ndquadrant.com
Whole thread Raw
In response to Re: COPY FROM WHEN condition  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: COPY FROM WHEN condition
Re: COPY FROM WHEN condition
List pgsql-hackers
On 11/23/18 12:14 PM, Surafel Temesgen wrote:
> 
> 
> On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
>     So, what about using FILTER here? We already use it for aggregates when
>     filtering rows to process.
> 
> i think its good idea and describe its purpose more. Attache is a
> patch that use FILTER instead

Thanks, looks good to me. A couple of minor points:

1) While comparing this to the FILTER clause we already have for
aggregates, I've noticed the aggregate version is

    FILTER '(' WHERE a_expr ')'

while here we have

    FILTER '(' a_expr ')'

For a while I was thinking that maybe we should use the same syntax
here, but I don't think so. The WHERE bit seems rather unnecessary and
we probably implemented it only because it's required by SQL standard,
which does not apply to COPY. So I think this is fine.


2) The various parser checks emit errors like this:

    case EXPR_KIND_COPY_FILTER:
        err = _("cannot use subquery in copy from FILTER condition");
        break;

I think the "copy from" should be capitalized, to make it clear that it
refers to a COPY FROM command and not a copy of something.


3) I think there should be regression tests for these prohibited things,
i.e. for a set-returning function, for a non-existent column, etc.


4) What might be somewhat confusing for users is that the filter uses a
single snapshot to evaluate the conditions for all rows. That is, one
might do this

    create or replace function f() returns int as $$
        select count(*)::int from t;
    $$ language sql;

and hope that

    copy t from '/...' filter (f() <= 100);

only ever imports the first 100 rows - but that's not true, of course,
because f() uses the snapshot acquired at the very beginning. For
example INSERT SELECT does behave differently:

    test=# copy t from '/home/user/t.data' filter (f() < 100);
    COPY 81
    test=# insert into t select * from t where f() < 100;
    INSERT 0 19

Obviously, this is not an issue when the filter clause references only
the input row (which I assume will be the primary use case).

Not sure if this is expected / appropriate behavior, and if the patch
needs to do something else here.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Should new partitions inherit their tablespace from their parent?
Next
From: Tomas Vondra
Date:
Subject: Re: Tid scan improvements