Re: Making "COPY partitioned_table FROM" faster - Mailing list pgsql-hackers

From David Rowley
Subject Re: Making "COPY partitioned_table FROM" faster
Date
Msg-id CAKJS1f9ZZ-ZaLq4rbbJWYHKC9-VG0GOnNb4VdOZSG2=3fHJ5Sg@mail.gmail.com
Whole thread Raw
In response to Re: Making "COPY partitioned_table FROM" faster  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Making "COPY partitioned_table FROM" faster
List pgsql-hackers
On 30 July 2018 at 20:33, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Two more thoughts:
>
> - Add some tests.  The if (nBufferedTuples > 0) that flushes the tuples
> when the partition changes is not currently exercised.

That seems like a good idea. In fact, it uncovered a bug around
ConvertPartitionTupleSlot() freeing the previously stored tuple out
the slot which resulted in a crash. I didn't notice before because my
test had previously not required any tuple conversions.

While ensuring my test was working correctly I noticed that I had a
thinko in the logic that decided if another avgTuplesPerPartChange
calculation was due.  The problem was that the (cstate->cur_lineno -
RECHECK_MULTI_INSERT_THRESHOLD) is unsigned and when cur_lineno is
below RECHECK_MULTI_INSERT_THRESHOLD it results in an underflow which
makes the if test always true until cur_lineno gets up to 1000. I
considered just making all those counters signed, but chickened out
when it came to changing "processed". I thought it was quite strange
to have "processed" unsigned and the other counters that do similar
things signed.  Of course, signed would have enough range, but it
would mean changing the return type of CopyFrom() which I wasn't
willing to do.

In the end, I just protected the if test so that it only calculates
the average again if cur_lineno is at least
RECHECK_MULTI_INSERT_THRESHOLD. This slightly changes when
avgTuplesPerPartChange is first set, but it'll still be at least 1000
tuples before we start doing multi-inserts. Another option would be to
cast the expression as int64... I'm a bit undecided what's best here.

> - With proute becoming a function-level variable,
> cstate->partition_tuple_routing is obsolete and could be removed.  (No
> point in saving this in cstate if it's only used in one function anyway.)

Good idea. I've removed that.

I've attached a delta patch based on the v4 patch.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Jesper Pedersen
Date:
Subject: Re: partition tree inspection functions
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: Explain buffers wrong counter with parallel plans