RE: Skip partition tuple routing with constant partition key - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Skip partition tuple routing with constant partition key
Date
Msg-id OS0PR01MB5716B2AFF51ECB0775A77480942A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Skip partition tuple routing with constant partition key  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Skip partition tuple routing with constant partition key
List pgsql-hackers

From: Amit Langote <amitlangote09@gmail.com>
Sent: Wednesday, May 19, 2021 9:17 PM
> I gave a shot to implementing your idea and ended up with the attached PoC
> patch, which does pass make check-world.
> 
> I do see some speedup:
> 
> -- creates a range-partitioned table with 1000 partitions create unlogged table
> foo (a int) partition by range (a); select 'create unlogged table foo_' || i || '
> partition of foo for values from (' || (i-1)*100000+1 || ') to (' || i*100000+1 || ');'
> from generate_series(1, 1000) i;
> \gexec
> 
> -- generates a 100 million record file
> copy (select generate_series(1, 100000000)) to '/tmp/100m.csv' csv;
> 
> Times for loading that file compare as follows:
> 
> HEAD:
> 
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 31813.964 ms (00:31.814)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 31972.942 ms (00:31.973)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 32049.046 ms (00:32.049)
> 
> Patched:
> 
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 26151.158 ms (00:26.151)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 28161.082 ms (00:28.161)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 26700.908 ms (00:26.701)
>
> I guess it would be nice if we could fit in a solution for the use case that houjz
> mentioned as a special case.  BTW, houjz, could you please check if a patch like
> this one helps the case you mentioned?

Thanks for the patch!
I did some test on it(using the table you provided above):

1): Test plain column in partition key.
SQL: insert into foo select 1 from generate_series(1, 10000000);

HEAD:
Time: 5493.392 ms (00:05.493)

AFTER PATCH(skip constant partition key)
Time: 4198.421 ms (00:04.198)

AFTER PATCH(cache the last partition)
Time: 4484.492 ms (00:04.484)

The test results of your patch in this case looks good.
It can fit many more cases and the performance gain is nice.

-----------
2) Test expression in partition key

create or replace function partition_func(i int) returns int as $$
    begin
        return i;
    end;
$$ language plpgsql immutable parallel restricted;
create unlogged table foo (a int) partition by range (partition_func(a));

SQL: insert into foo select 1 from generate_series(1, 10000000);

HEAD
Time: 8595.120 ms (00:08.595)

AFTER PATCH(skip constant partition key)
Time: 4198.421 ms (00:04.198)

AFTER PATCH(cache the last partition)
Time: 12829.800 ms (00:12.830)

If add a user defined function in the partition key, it seems have
performance degradation after the patch. 

I did some analysis on it, for the above testcase , ExecPartitionCheck
executed three expression 1) key is null 2) key > low 3) key < top
In this case, the "key" contains a funcexpr and the funcexpr will be executed
three times for each row, so, it bring extra overhead which cause the performance degradation.

IMO, improving the ExecPartitionCheck seems a better solution to it, we can
Calculate the key value in advance and use the value to do the bound check.
Thoughts ?

------------

Besides, are we going to add a reloption or guc to control this cache behaviour if we more forward with this approach
?
Because, If most of the rows to be inserted are routing to a different partition each time, then I think the extra
ExecPartitionCheck
will become the overhead. Maybe it's better to apply both two approaches(cache the last partition and skip constant
partitionkey)
 
which can achieve the best performance results.

Best regards,
houzj






pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Skip partition tuple routing with constant partition key
Next
From: Michael Paquier
Date:
Subject: Force disable of SSL renegociation in the server