Re: Skip partition tuple routing with constant partition key - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Skip partition tuple routing with constant partition key
Date
Msg-id CA+HiwqFWGNSy1zwcno=AuLLScCFV64P3BYt+EXqqwM1DYhQ16Q@mail.gmail.com
Whole thread Raw
In response to RE: Skip partition tuple routing with constant partition key  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: Skip partition tuple routing with constant partition key
List pgsql-hackers
Hou-san,

On Thu, May 20, 2021 at 7:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> From: Amit Langote <amitlangote09@gmail.com>
> Sent: Wednesday, May 19, 2021 9:17 PM
> > I guess it would be nice if we could fit in a solution for the use case that houjz
> > mentioned as a special case.  BTW, houjz, could you please check if a patch like
> > this one helps the case you mentioned?
>
> Thanks for the patch!
> I did some test on it(using the table you provided above):

Thanks a lot for doing that.

> 1): Test plain column in partition key.
> SQL: insert into foo select 1 from generate_series(1, 10000000);
>
> HEAD:
> Time: 5493.392 ms (00:05.493)
>
> AFTER PATCH(skip constant partition key)
> Time: 4198.421 ms (00:04.198)
>
> AFTER PATCH(cache the last partition)
> Time: 4484.492 ms (00:04.484)
>
> The test results of your patch in this case looks good.
> It can fit many more cases and the performance gain is nice.

Hmm yeah, not too bad.

> 2) Test expression in partition key
>
> create or replace function partition_func(i int) returns int as $$
>     begin
>         return i;
>     end;
> $$ language plpgsql immutable parallel restricted;
> create unlogged table foo (a int) partition by range (partition_func(a));
>
> SQL: insert into foo select 1 from generate_series(1, 10000000);
>
> HEAD
> Time: 8595.120 ms (00:08.595)
>
> AFTER PATCH(skip constant partition key)
> Time: 4198.421 ms (00:04.198)
>
> AFTER PATCH(cache the last partition)
> Time: 12829.800 ms (00:12.830)
>
> If add a user defined function in the partition key, it seems have
> performance degradation after the patch.

Oops.

> I did some analysis on it, for the above testcase , ExecPartitionCheck
> executed three expression 1) key is null 2) key > low 3) key < top
> In this case, the "key" contains a funcexpr and the funcexpr will be executed
> three times for each row, so, it bring extra overhead which cause the performance degradation.
>
> IMO, improving the ExecPartitionCheck seems a better solution to it, we can
> Calculate the key value in advance and use the value to do the bound check.
> Thoughts ?

This one seems bit tough.  ExecPartitionCheck() uses the generic
expression evaluation machinery like a black box, which means
execPartition.c can't really tweal/control the time spent evaluating
partition constraints.  Given that, we may have to disable the caching
when key->partexprs != NIL, unless we can reasonably do what you are
suggesting.

> Besides, are we going to add a reloption or guc to control this cache behaviour if we more forward with this approach
?
> Because, If most of the rows to be inserted are routing to a different partition each time, then I think the extra
ExecPartitionCheck
> will become the overhead. Maybe it's better to apply both two approaches(cache the last partition and skip constant
partitionkey)
 
> which can achieve the best performance results.

A reloption will have to be a last resort is what I can say about this
at the moment.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Force disable of SSL renegociation in the server
Next
From: Joe Conway
Date:
Subject: Re: Freenode woes