Re: Making "COPY partitioned_table FROM" faster - Mailing list pgsql-hackers

From David Rowley
Subject Re: Making "COPY partitioned_table FROM" faster
Date
Msg-id CAKJS1f_qk1AAzLiKtHatzkbD-nE0PYn4y-Nt98h2VjTC9K=oDg@mail.gmail.com
Whole thread Raw
In response to Re: Making "COPY partitioned_table FROM" faster  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Making "COPY partitioned_table FROM" faster  (Simon Riggs <simon@2ndquadrant.com>)
Re: Making "COPY partitioned_table FROM" faster  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 25 July 2018 at 04:37, Simon Riggs <simon@2ndquadrant.com> wrote:
> I don't see any need here for another GUC, nor even a command option.
> The user has already indicated their use case to us:

I agree.

> We know that the common case for RANGE partitioned tables is to load
> into the one current partition. We also know that the partition might
> change as we load data, when the data loaded crosses the partition
> boundary, so we need this optimization to be adaptive. Not all data
> loads follow that rule, so we also need the adpative algorithm to shut
> off for those cases.
>
> We also know that the common case for HASH partitions is to load into
> all partitions at once, since hash is specifically designed to spread
> data out across partitions. It is almost never true that we would want
> to load one partition at a time, so it seems easy to turn the
> optimization off if we use this type of partitioning. Or better, we
> need work done to improve that case also, but that is outside the
> current scope of this patch.
>
> If we have multi-level partitions, if any of the levels includes a
> Hash, then turn it off.
>
> LIST partitions are less likely to have a clear pattern, so I would
> treat them like HASH and assume the data is not sorted by partition.
>
> So for this patch, just add an "if (RANGE)" test.

I agree RANGE partition is probably the most likely case to benefit
from this optimisation, but I just don't think that HASH could never
benefit and LIST probably sits somewhere in the middle.

HASH partitioning might be useful in cases like partitioning by
"sensor_id".  It does not seem that unreasonable that someone might
want to load all the data for an entire sensor at once.

The v3 version of the patch also fixes the very small performance
regression for the (probably quite likely) worst-case situation.  New
performance is about 3.5% faster instead of 0.5-1% slower. So likely
there's no longer any need to consider this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Explain buffers wrong counter with parallel plans
Next
From: Amit Langote
Date:
Subject: Re: PartitionDispatch's partdesc field