Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From Andres Freund
Subject Re: COPY FROM WHEN condition
Date
Msg-id 20190122183557.6246fitlswmt725p@alap3.anarazel.de
Whole thread Raw
In response to Re: COPY FROM WHEN condition  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2019-01-22 18:35:21 +0100, Tomas Vondra wrote:
> On 1/21/19 11:15 PM, Tomas Vondra wrote:
> > On 1/21/19 7:51 PM, Andres Freund wrote:
> >> I'm *not* convinced by this. I think it's bad enough that we do this for
> >> normal COPY, but for WHEN, we could end up *never* resetting before the
> >> end. Consider a case where a single tuple is inserted, and then *all*
> >> rows are filtered.  I think this needs a separate econtext that's reset
> >> every round. Or alternatively you could fix the code not to rely on
> >> per-tuple not being reset when tuples are buffered - that actually ought
> >> to be fairly simple.
> >>
> > 
> > I think separating the per-tuple and per-batch contexts is the right
> > thing to do, here. It seems the batching was added somewhat later and
> > using the per-tuple context is rather confusing.
> > 
> 
> OK, here is a WIP patch doing that. It creates a new "batch" context,
> and allocates tuples in it (instead of the per-tuple context). The
> per-tuple context is now reset always, irrespectedly of nBufferedTuples.
> And the batch context is reset every time the batch is emptied.
> 
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.
> 
> Overall, separating the contexts makes it quite a bit clearer. I'm not
> entirely happy about the per-tuple context being "implicit" (hidden in
> executor context) while the batch context being explicitly created, but
> there's not much I can do about that.

I think the extra copy is probably OK for now - as part of the pluggable
storage series I've converted COPY to use slots, which should make that
less of an issue - I've not done that, but I actually assume we could
remove the whole batch context afterwards.

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Refactoring the checkpointer's fsync request queue
Next
From: Arthur Zakirov
Date:
Subject: Re: [PROPOSAL] Shared Ispell dictionaries