Re: Skip partition tuple routing with constant partition key - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Skip partition tuple routing with constant partition key |
Date | |
Msg-id | CA+HiwqFWGNSy1zwcno=AuLLScCFV64P3BYt+EXqqwM1DYhQ16Q@mail.gmail.com Whole thread Raw |
In response to | RE: Skip partition tuple routing with constant partition key ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Responses |
RE: Skip partition tuple routing with constant partition key
|
List | pgsql-hackers |
Hou-san, On Thu, May 20, 2021 at 7:35 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > From: Amit Langote <amitlangote09@gmail.com> > Sent: Wednesday, May 19, 2021 9:17 PM > > I guess it would be nice if we could fit in a solution for the use case that houjz > > mentioned as a special case. BTW, houjz, could you please check if a patch like > > this one helps the case you mentioned? > > Thanks for the patch! > I did some test on it(using the table you provided above): Thanks a lot for doing that. > 1): Test plain column in partition key. > SQL: insert into foo select 1 from generate_series(1, 10000000); > > HEAD: > Time: 5493.392 ms (00:05.493) > > AFTER PATCH(skip constant partition key) > Time: 4198.421 ms (00:04.198) > > AFTER PATCH(cache the last partition) > Time: 4484.492 ms (00:04.484) > > The test results of your patch in this case looks good. > It can fit many more cases and the performance gain is nice. Hmm yeah, not too bad. > 2) Test expression in partition key > > create or replace function partition_func(i int) returns int as $$ > begin > return i; > end; > $$ language plpgsql immutable parallel restricted; > create unlogged table foo (a int) partition by range (partition_func(a)); > > SQL: insert into foo select 1 from generate_series(1, 10000000); > > HEAD > Time: 8595.120 ms (00:08.595) > > AFTER PATCH(skip constant partition key) > Time: 4198.421 ms (00:04.198) > > AFTER PATCH(cache the last partition) > Time: 12829.800 ms (00:12.830) > > If add a user defined function in the partition key, it seems have > performance degradation after the patch. Oops. > I did some analysis on it, for the above testcase , ExecPartitionCheck > executed three expression 1) key is null 2) key > low 3) key < top > In this case, the "key" contains a funcexpr and the funcexpr will be executed > three times for each row, so, it bring extra overhead which cause the performance degradation. > > IMO, improving the ExecPartitionCheck seems a better solution to it, we can > Calculate the key value in advance and use the value to do the bound check. > Thoughts ? This one seems bit tough. ExecPartitionCheck() uses the generic expression evaluation machinery like a black box, which means execPartition.c can't really tweal/control the time spent evaluating partition constraints. Given that, we may have to disable the caching when key->partexprs != NIL, unless we can reasonably do what you are suggesting. > Besides, are we going to add a reloption or guc to control this cache behaviour if we more forward with this approach ? > Because, If most of the rows to be inserted are routing to a different partition each time, then I think the extra ExecPartitionCheck > will become the overhead. Maybe it's better to apply both two approaches(cache the last partition and skip constant partitionkey) > which can achieve the best performance results. A reloption will have to be a last resort is what I can say about this at the moment. -- Amit Langote EDB: http://www.enterprisedb.com
pgsql-hackers by date: