Re: Skip partition tuple routing with constant partition key - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Skip partition tuple routing with constant partition key
Date
Msg-id CA+HiwqFbMSLDMinPRsGQVn_gfb-bMy0J2z_rZ0-b9kSfxXF+Ag@mail.gmail.com
Whole thread Raw
In response to Re: Skip partition tuple routing with constant partition key  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Skip partition tuple routing with constant partition key
RE: Skip partition tuple routing with constant partition key
List pgsql-hackers
On Tue, May 18, 2021 at 10:28 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 18 May 2021 at 01:31, Amit Langote <amitlangote09@gmail.com> wrote:
> > Hmm, does this seem common enough for the added complexity to be worthwhile?
>
> I'd also like to know if there's some genuine use case for this. For
> testing purposes does not seem to be quite a good enough reason.
>
> A slightly different optimization that I have considered and even
> written patches before was to have ExecFindPartition() cache the last
> routed to partition and have it check if the new row can go into that
> one on the next call.  I imagined there might be a use case for
> speeding that up for RANGE partitioned tables since it seems fairly
> likely that most use cases, at least for time series ranges will
> always hit the same partition most of the time.   Since RANGE requires
> a binary search there might be some savings there.  I imagine that
> optimisation would never be useful for HASH partitioning since it
> seems most likely that we'll be routing to a different partition each
> time and wouldn't save much since routing to hash partitions are
> cheaper than other types.  LIST partitioning I'm not so sure about. It
> seems much less likely than RANGE to hit the same partition twice in a
> row.
>
> IIRC, the patch did something like call ExecPartitionCheck() on the
> new tuple with the previously routed to ResultRelInfo. I think the
> last used partition was cached somewhere like relcache (which seems a
> bit questionable).   Likely this would speed up the example case here
> a bit. Not as much as the proposed patch, but it would likely apply in
> many more cases.
>
> I don't think I ever posted the patch to the list, and if so I no
> longer have access to it, so it would need to be done again.

I gave a shot to implementing your idea and ended up with the attached
PoC patch, which does pass make check-world.

I do see some speedup:

-- creates a range-partitioned table with 1000 partitions
create unlogged table foo (a int) partition by range (a);
select 'create unlogged table foo_' || i || ' partition of foo for
values from (' || (i-1)*100000+1 || ') to (' || i*100000+1 || ');'
from generate_series(1, 1000) i;
\gexec

-- generates a 100 million record file
copy (select generate_series(1, 100000000)) to '/tmp/100m.csv' csv;

Times for loading that file compare as follows:

HEAD:

postgres=# copy foo from '/tmp/100m.csv' csv;
COPY 100000000
Time: 31813.964 ms (00:31.814)
postgres=# copy foo from '/tmp/100m.csv' csv;
COPY 100000000
Time: 31972.942 ms (00:31.973)
postgres=# copy foo from '/tmp/100m.csv' csv;
COPY 100000000
Time: 32049.046 ms (00:32.049)

Patched:

postgres=# copy foo from '/tmp/100m.csv' csv;
COPY 100000000
Time: 26151.158 ms (00:26.151)
postgres=# copy foo from '/tmp/100m.csv' csv;
COPY 100000000
Time: 28161.082 ms (00:28.161)
postgres=# copy foo from '/tmp/100m.csv' csv;
COPY 100000000
Time: 26700.908 ms (00:26.701)

I guess it would be nice if we could fit in a solution for the use
case that houjz mentioned as a special case.  BTW, houjz, could you
please check if a patch like this one helps the case you mentioned?




--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS
Next
From: Amit Langote
Date:
Subject: Re: Skip partition tuple routing with constant partition key