Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From David Rowley
Subject Re: COPY FROM WHEN condition
Date
Msg-id CAKJS1f8vjLPCpb-YqqixzZcKnjku2iWdZWpkionAiARN4h9s6w@mail.gmail.com
In response to Re: COPY FROM WHEN condition  (Andres Freund <andres@anarazel.de>)
Responses Re: COPY FROM WHEN condition
List pgsql-hackers
On Tue, 2 Apr 2019 at 13:59, Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-04-02 13:41:57 +1300, David Rowley wrote:
> > On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres@anarazel.de> wrote:
> > > Thanks! I'm not quite clear whether you're planning to continue working on
> > > this, or whether this is a handoff? Either is fine with me, just trying
> > > to avoid unnecessary work / delay.
> >
> > I can, if you've not. I was hoping to gauge if you thought the
> > approach was worth pursuing.
>
> I think it's worth pursuing, with the caveats below. I'm going to focus
> on docs the not-very-long rest of today, but I definitely could work on
> this afterwards. But I also would welcome any help. Let me know...

I'm looking now. I'll post something when I get it into better
shape than it is now.

> > > It still seems wrong to me to just perform a second hashtable search
> > > here, given that we've already done the partition dispatch.
> >
> > The reason I thought this was a good idea is that if we use the
> > ResultRelInfo to buffer the tuples then there's no end to how many
> > tuple slots can exist as the code in copy.c has no control over how
> > many ResultRelInfos are created.
>
> To me those aren't contradictory - we're going to have a ResultRelInfo
> for each partition either way, but there's nothing preventing copy.c
> from cleaning up subsidiary data in it.  What I was thinking is that
> we'd just keep track of a list of ResultRelInfos with bulk insert slots,
> and occasionally clean them up. That way we avoid the secondary lookup,
> while also managing the amount of slots.

The problem that I see with that is you can't just add to that list
when the partition changes. You must check whether the ResultRelInfo is
already in the list, since we could change partitions and then change
back again. For a list with just a few elements, checking
list_member_ptr should be pretty cheap, but I chose, somewhat
arbitrarily, to keep buffers for only the last 16 partitions. I don't
think checking list_member_ptr on a 16-element list is likely to be
faster than a hash table lookup, do you?
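A standalone C sketch of the list bookkeeping discussed above, to show what the membership check involves. It is an illustration, not code from copy.c or the patch: `RelInfo`, `ActiveList`, `member_ptr()` and `note_partition()` are hypothetical names, `member_ptr()` only mimics the linear pointer-equality scan that `list_member_ptr()` performs, and emptying the whole list when the 16-entry cap is hit is a simplification standing in for flushing and freeing the buffers (the thread does not settle how entries would be evicted).

```c
#include <assert.h>
#include <stdbool.h>

/* Cap on tracked partitions; 16 is the figure mentioned in the post. */
#define MAX_BUFFERED_PARTS 16

/* Hypothetical stand-in for a ResultRelInfo; only its address matters. */
typedef struct RelInfo
{
    unsigned int partoid;
} RelInfo;

/* Hypothetical list of partitions that currently hold buffered tuples. */
typedef struct ActiveList
{
    RelInfo *items[MAX_BUFFERED_PARTS];
    int      n;          /* number of tracked partitions */
    int      cleanups;   /* times the list was emptied because it was full */
} ActiveList;

/* Linear pointer-equality scan, similar in spirit to list_member_ptr(). */
static bool
member_ptr(const ActiveList *l, const RelInfo *ri)
{
    for (int i = 0; i < l->n; i++)
    {
        if (l->items[i] == ri)
            return true;
    }
    return false;
}

/*
 * Called whenever the target partition changes.  Without the member_ptr()
 * check, switching A -> B -> A would append A a second time.
 */
static void
note_partition(ActiveList *l, RelInfo *ri)
{
    if (member_ptr(l, ri))
        return;                 /* already tracked */

    if (l->n == MAX_BUFFERED_PARTS)
    {
        /* Simplification: stand-in for flushing and freeing all buffers. */
        l->n = 0;
        l->cleanups++;
    }
    l->items[l->n++] = ri;
    assert(l->n <= MAX_BUFFERED_PARTS);
}
```

With this, calling note_partition() for A, then B, then A again leaves two entries rather than three, and every call scans up to 16 entries before deciding whether to append.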

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


