Re: Skip partition tuple routing with constant partition key - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Skip partition tuple routing with constant partition key |
Date | |
Msg-id | CA+HiwqFbMSLDMinPRsGQVn_gfb-bMy0J2z_rZ0-b9kSfxXF+Ag@mail.gmail.com Whole thread Raw |
In response to | Re: Skip partition tuple routing with constant partition key (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Skip partition tuple routing with constant partition key
RE: Skip partition tuple routing with constant partition key |
List | pgsql-hackers |
On Tue, May 18, 2021 at 10:28 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Tue, 18 May 2021 at 01:31, Amit Langote <amitlangote09@gmail.com> wrote: > > Hmm, does this seem common enough for the added complexity to be worthwhile? > > I'd also like to know if there's some genuine use case for this. For > testing purposes does not seem to be quite a good enough reason. > > A slightly different optimization that I have considered and even > written patches before was to have ExecFindPartition() cache the last > routed to partition and have it check if the new row can go into that > one on the next call. I imagined there might be a use case for > speeding that up for RANGE partitioned tables since it seems fairly > likely that most use cases, at least for time series ranges will > always hit the same partition most of the time. Since RANGE requires > a binary search there might be some savings there. I imagine that > optimisation would never be useful for HASH partitioning since it > seems most likely that we'll be routing to a different partition each > time and wouldn't save much since routing to hash partitions are > cheaper than other types. LIST partitioning I'm not so sure about. It > seems much less likely than RANGE to hit the same partition twice in a > row. > > IIRC, the patch did something like call ExecPartitionCheck() on the > new tuple with the previously routed to ResultRelInfo. I think the > last used partition was cached somewhere like relcache (which seems a > bit questionable). Likely this would speed up the example case here > a bit. Not as much as the proposed patch, but it would likely apply in > many more cases. > > I don't think I ever posted the patch to the list, and if so I no > longer have access to it, so it would need to be done again. I gave a shot to implementing your idea and ended up with the attached PoC patch, which does pass make check-world. I do see some speedup: -- creates a range-partitioned table with 1000 partitions create unlogged table foo (a int) partition by range (a); select 'create unlogged table foo_' || i || ' partition of foo for values from (' || (i-1)*100000+1 || ') to (' || i*100000+1 || ');' from generate_series(1, 1000) i; \gexec -- generates a 100 million record file copy (select generate_series(1, 100000000)) to '/tmp/100m.csv' csv; Times for loading that file compare as follows: HEAD: postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000 Time: 31813.964 ms (00:31.814) postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000 Time: 31972.942 ms (00:31.973) postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000 Time: 32049.046 ms (00:32.049) Patched: postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000 Time: 26151.158 ms (00:26.151) postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000 Time: 28161.082 ms (00:28.161) postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000 Time: 26700.908 ms (00:26.701) I guess it would be nice if we could fit in a solution for the use case that houjz mentioned as a special case. BTW, houjz, could you please check if a patch like this one helps the case you mentioned? -- Amit Langote EDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: