Thread: Proposal: Partitioning Advisor for PostgreSQL

Proposal: Partitioning Advisor for PostgreSQL

From: Yuzuko Hosoya
Date: Thu, May 24, 2018
Hello,

I'm Yuzuko Hosoya. This is my first time participating in a PostgreSQL project.

I have been developing a partitioning advisor prototype with Julien Rouhaud.
It will be a new feature of HypoPG[1], a PostgreSQL extension, and will help
with partitioning design tuning.  Currently, HypoPG only supports index design
tuning: it lets users define hypothetical indexes on real tables and shows the
resulting queries' plans/costs with EXPLAIN as if the indexes were actually built.
Since declarative partitioning is greatly improved in PostgreSQL 11 and will keep
improving in later versions, there is an emerging need to support partitioning
design tuning as well.  This is why we are working on a partitioning advisor.
We plan to release the first version for PostgreSQL 11, and then improve it for
PostgreSQL 12.


Overview of partitioning advisor
---------------------------------------
- Partitioning advisor allows users to define multiple hypothetical partitioning
  schemes on real tables and real data
- PostgreSQL can show the resulting queries' plans/costs with EXPLAIN using the
  hypothetical partitioning schemes

Users can quickly check how their queries would behave if some tables were
partitioned, and try different partitioning schemes (for instance, to trade off
query efficiency vs. maintenance efficiency).


The partitioning advisor works as follows:

Usage
---------
0. Consider this target table, t1
    #= CREATE TABLE t1 (a int, b text);
    #= INSERT INTO t1 SELECT i, 'test' FROM generate_series(1,299) i;
    #= EXPLAIN SELECT * FROM t1;
          QUERY PLAN
   -----------------------------------------------------
   Seq Scan on t1  (cost=0.00..4.99 rows=299 width=9)
   (1 row)

1. Partition the target table hypothetically
    #= SELECT * FROM hypopg_partition_table('t1','partition by range(a)');
   hypopg_partition_table() defines the hypothetical range-partitioned table 't1'
   with partition key 'a' and stores this information in backend-local memory.

2. Create hypothetical partitions
    #= SELECT * FROM hypopg_add_partition('t1_1','partition of t1 for values from (1) to (100)');
    #= SELECT * FROM hypopg_add_partition('t1_2','partition of t1 for values from (100) to (300)');
   hypopg_add_partition() defines the hypothetical partitions t1_1 and t1_2
   according to their bounds 'from (1) to (100)' and 'from (100) to (300)'
   respectively, and stores this information in backend-local memory.

3. PostgreSQL can show resulting queries' plan/cost with EXPLAIN
    #= EXPLAIN SELECT * FROM t1;
                        QUERY PLAN
   ---------------------------------------------------------------
   Append  (cost=0.00..7.49 rows=299 width=9)
     ->  Seq Scan on t1 t1_1  (cost=0.00..1.99 rows=99 width=9)
     ->  Seq Scan on t1 t1_2  (cost=0.00..4.00 rows=200 width=9)
   (3 rows)
  PostgreSQL retrieves the hypothetical partitioning schemes from HypoPG, and
  if the referenced table is defined as a hypothetical partitioned table, it
  creates plans using them.

This is a simple example.  In addition, the advisor can simulate range/list/hash
partitioning, partition pruning, N-way joins, and partition-wise join/aggregation.
It is already helpful for users designing partitioning schemes.
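For instance, partition pruning can be observed with the same hypothetical
scheme (a sketch; exact costs depend on the statistics, but with the bounds
defined above only the first partition should survive pruning):

```sql
#= EXPLAIN SELECT * FROM t1 WHERE a < 50;
-- With the hypothetical scheme above, the scan of t1_2 should be
-- pruned away, leaving only the t1_1 scan under the Append node.
```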


Current implementation
------------------------
We mainly use get_relation_info_hook().  In this hook we inject the hypothetical
partitioning schemes according to the user's definitions.  First, we do
everything that expand_inherited_tables() does.  Specifically, we expand
root->simple_rte_array and root->parse->rtable, rewrite the target table's
RangeTblEntry as a partitioned table, and create RangeTblEntries and
AppendRelInfos for all hypothetical partitions.  Besides that, we set each
hypothetical partition's name into rte->alias->aliasname at this time so that
EXPLAIN displays it.  Then we rewrite the RelOptInfos as needed.  Specifically,
we add the partition information normally set at set_relation_partition_info()
to hypothetical partitioned tables, and set rel->tuples and rel->pages for the
hypothetical partitions.


However, PostgreSQL isn't designed to have hypothetical tables, so the
implementation faces several blockers, listed below.  We'd like to discuss
these topics.

Topics of discussion
---------------------
- Expanding partition's RTE
We have to do everything that expand_inherited_tables() does for hypothetical
partitions.  But since there is no hook around there, we use
get_relation_info_hook() as mentioned above.  As a result, we cannot simulate
UPDATE queries correctly, because inheritance_planner(), which handles UPDATE
queries, is called before get_relation_info_hook().  We would therefore like
to add a hook in expand_inherited_tables() to expand hypothetical partitions.

- Showing hypothetical partition's name with EXPLAIN
We set hypothetical partition's name into rte->alias->aliasname as I mentioned
above.  In this case, it is displayed with original table name like this:
   #= EXPLAIN SELECT * FROM t1;
                        QUERY PLAN
    ---------------------------------------------------------------
    Append  (cost=0.00..7.49 rows=299 width=9)
      ->  Seq Scan on t1 t1_1  (cost=0.00..1.99 rows=99 width=9)
      ->  Seq Scan on t1 t1_2  (cost=0.00..4.00 rows=200 width=9)
    (3 rows)
t1 is the original table name and t1_* are the hypothetical partitions' names.
We would therefore like to add a hook to retrieve a hypothetical partition's
name, similar to explain_get_index_name_hook().

- Estimating stats
This is complicated because hypothetical partitions have no data.  Currently,
we compute a hypothetical partition's size using clauselist_selectivity(),
applying its partition bound to the original table's statistics.  As a result,
the estimates have low accuracy, especially if there is a WHERE clause.  We
plan to improve this during development, but for now we don't have good ideas.
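Concretely, the size estimate for a hypothetical partition corresponds to what
the planner would estimate for the original table restricted by the partition
bound used as a qual (a sketch against the example above):

```sql
-- The row estimate derived for hypothetical partition t1_1 is roughly
-- what the planner estimates here: the original table's statistics
-- filtered by the partition bound expressed as a qual.
#= EXPLAIN SELECT * FROM t1 WHERE a >= 1 AND a < 100;
-- In the earlier example output, t1_1 was estimated at rows=99 this way.
```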

- Reducing extension's code
To implement get_relation_info_hook, we have to copy/paste many static
functions defined in core.  In addition, we have to slightly adapt some
functions, such as RelationGetPartitionDesc() and RelationGetPartitionKey()
called from set_relation_partition_info(), which retrieve information from
the system catalogs.  We have to adapt them because the objects defined by
the HypoPG functions mentioned above don't exist in the system catalogs and
are not attached to any existing Relation, so we cannot call
RelationGetPartitionDesc() and RelationGetPartitionKey() as-is.  Specifically,
we copy/paste and modify set_relation_partition_info() and the functions it
calls (find_partition_scheme() and set_baserel_partition_key_exprs()) to get
the hypothetical partition information.  We would therefore like to add hooks
in set_relation_partition_info() to obtain a hypothetical
PartitionDesc/PartitionKey, and to change find_partition_scheme() and
set_baserel_partition_key_exprs() to take a PartitionKey rather than a
Relation as their argument.


For now it's a working prototype, and we'd be happy to get some feedback.
We'd also be very interested if anyone has good ideas on how to solve the
problems we're facing, and on how such a feature could be better integrated
into core with new hooks or other infrastructure.


[1] https://github.com/HypoPG/hypopg

Best regards,
----
Yuzuko Hosoya
NTT Open Source Software Center




Re: Proposal: Partitioning Advisor for PostgreSQL

From: Ashutosh Bapat
Date: Wed, May 30, 2018
On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya
<hosoya.yuzuko@lab.ntt.co.jp> wrote:
> Hello,
>
> I'm Yuzuko Hosoya. This is my first PostgreSQL project participation.
>
> I have been developing partitioning advisor prototype with Julien Rouhaud.
> It will be a new feature of HypoPG[1], which is a PostgreSQL extension, and
> will help partitioning design tuning.  Currently, HypoPG only supports index
> design tuning; it allows users to define hypothetical indexes for real tables and
> shows resulting queries' plan/cost with EXPLAIN as if they were actually constructed.
> Since declarative partitioning will be greatly improved in PostgreSQL 11 and further
> versions, there are emerging needs to support partitioning design tuning. This is
> why we are working on partitioning advisor.  We plan to release the first version
> of partitioning advisor for PostgreSQL 11, and then, improve it for PostgreSQL 12.

+10 for this feature.  Finding out that partitioning doesn't work as expected
only after implementing it is rather painful, since partitioning and
unpartitioning a table are time-consuming right now.

>
>
> Overview of partitioning advisor
> ---------------------------------------
> - Partitioning advisor allows users to define multiple hypothetical partitioning
>   schemes on real tables and real data
> - PostgreSQL can show resulting queries' plan/cost with EXPLAIN using hypothetical
>   partitioning schemes
> Users can quickly check how their queries would behave if some tables were
> partitioned, and try different partitioning schemes (for instance, to optimize some
> queries efficiency Vs. maintenance efficiency).
>
>
> Partitioning advisor works as follows:
>
> Usage
> ---------
> 0. Consider this target table, t1
>     #= CREATE TABLE t1 (a int, b text);
>     #= INSERT INTO t1 SELECT i, 'test' FROM generate_series(1,299) i ;
>     #= EXPLAIN SELECT * FROM t1;
>           QUERY PLAN
>    -----------------------------------------------------
>    Seq Scan on t1  (cost=0.00..4.99 rows=299 width=9)
>    (1 row)
>
> 1. Partition the target table hypothetically
>     #= SELECT * FROM hypopg_partition_table('t1','partition by range(a)');
>    The hypopg_partition_table() defines hypothetical range partitioned table 't1'
>    by the partition key 'a' and stores these information into backend local memory.
>
> 2. Create hypothetical partitions
>     #= SELECT * FROM hypopg_add_partition('t1_1','partition of t1 for values from (1) to (100)');
>     #= SELECT * FROM hypopg_add_partition('t1_2','partition of t1 for values from (100) to (300)');
>    The hypopg_add_partition() defines hypothetical partitions t1_1 and t1_2 according
>    to their bounds 'from (1) to (100)' and 'from (100) to (300)' respectively, and stores
>    these information into backend local memory.
>
> 3. PostgreSQL can show resulting queries' plan/cost with EXPLAIN
>     #= EXPLAIN SELECT * FROM t1;
>                         QUERY PLAN
>    ---------------------------------------------------------------
>    Append  (cost=0.00..7.49 rows=299 width=9)
>      ->  Seq Scan on t1 t1_1  (cost=0.00..1.99 rows=99 width=9)
>      ->  Seq Scan on t1 t1_2  (cost=0.00..4.00 rows=200 width=9)
>    (3 rows)
>   PostgreSQL retrieves hypothetical partitioning schemes from HypoPG.
>   And then if the referred table is defined as hypothetical partitioned table,
>   PostgreSQL creates plans using them.
>
> This is a simple example. In addition, it enables us to simulate range/list/hash
> partitioning, partition pruning, N-way join and partition-wise join/aggregation.
> It is already helpful for users to design partitioning schemes.
>
>
> Current implementation
> ------------------------
> We mainly use get_relation_info_hook().  What we do in this hook is to inject
> hypothetical partitioning schemes according to user definition.  At first, we do
> all processes that are done at expand_inherited_tables().  Specifically, we
> expand root->simple_rte_array and root->parse->rtable, rewrite target
> table's RangeTblEntry as a partitioned table, and create RangeTblEntries and
> AppendRelInfos for all hypothetical partitions.  Besides that, we set hypothetical
> partition's name into rte->alias->aliasname at this time to display hypothetical
> partition's name with EXPLAIN.  And then, we rewrite RelOptInfo as needed.
> Specifically, we add partition information, which is set at set_relation_partition_info(),
> to hypothetical partitioned tables, and set rel->tuples and rel->pages for
> hypothetical partitions.
>
>
> However, PostgreSQL isn't designed to have hypothetical tables,

I agree.  But there are light-weight tables like foreign tables, views, and
partitioned tables themselves.  These kinds of tables do not have any storage
associated with them.  We could implement a semi-hypothetical partitioned
table using these three.  I say semi-hypothetical because we would need to
create some real objects, but ones that do not take actual storage.  The idea
is to create a partitioned table whose foreign-table partitions point to
views simulating the partitions.  The steps are:
1. Create one view per partition, selecting the rows of the unpartitioned
table that would fall in the partition simulated by that view; something like
SELECT * FROM unpartitioned_table WHERE <partition constraint>.
2. Create the partitioned table.
3. Create foreign-table partitions that point to the views created in the
first step.
4. ANALYZE the foreign tables and the partitioned table.

Now if we EXPLAIN the query on the unpartitioned table, redirecting it to the
partitioned table, we would get EXPLAIN plans as if the query were running on
the partitioned table.  We would need to zero out the FDW costs so that the
cost of accessing a foreign table comes out the same as accessing a local
table; that's mostly a matter of setting the right FDW cost options.
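The steps above can be sketched like this (a rough sketch, not a tested
recipe: the server and object names are illustrative, and it assumes a
postgres_fdw server looping back to the current database with its costs
zeroed out):

```sql
-- Assumption: postgres_fdw is available; 'loopback' points back at the
-- current database ('mydb' is a placeholder), with FDW costs zeroed so
-- foreign scans cost roughly the same as local scans.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'mydb', fdw_startup_cost '0', fdw_tuple_cost '0');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

-- 1. One view per simulated partition
CREATE VIEW t1_1_v AS SELECT * FROM t1 WHERE a >= 1 AND a < 100;
CREATE VIEW t1_2_v AS SELECT * FROM t1 WHERE a >= 100 AND a < 300;

-- 2. The partitioned table itself (no storage of its own)
CREATE TABLE t1_part (a int, b text) PARTITION BY RANGE (a);

-- 3. Foreign-table partitions pointing at the views
CREATE FOREIGN TABLE t1_p1 PARTITION OF t1_part
    FOR VALUES FROM (1) TO (100)
    SERVER loopback OPTIONS (table_name 't1_1_v');
CREATE FOREIGN TABLE t1_p2 PARTITION OF t1_part
    FOR VALUES FROM (100) TO (300)
    SERVER loopback OPTIONS (table_name 't1_2_v');

-- 4. Gather statistics, then EXPLAIN against the partitioned table
ANALYZE t1_p1;
ANALYZE t1_p2;
EXPLAIN SELECT * FROM t1_part WHERE a < 50;
```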

Since we would be creating and dropping some real objects, we may want to
create temporary objects (there is no support for temporary foreign tables
for now, but maybe that's a desirable feature), or create them in a different
database to reduce catalog bloat.  Similarly, we won't be able to create
indexes on the foreign tables, but maybe we could simulate those with the
hypothetical index feature.

This method doesn't need any core changes that would be useful only for this
extension.  Supporting temporary foreign tables and declarative indexes on
foreign tables could be proposed as separate features acceptable to the
community.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: Proposal: Partitioning Advisor for PostgreSQL

From: Julien Rouhaud
Date: Tue, Jun 12, 2018
Hi Ashutosh,

Thanks for answering!  And I'm sorry for the time it took me to reply.

On Wed, May 30, 2018 at 5:44 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya
>>
>> However, PostgreSQL isn't designed to have hypothetical tables,
>
> I agree. But there are light-weight tables like foreign tables, views
> and partitioned tables themselves. These kinds of tables do not have
> any storage associated with them. We could implement semi-hypothetical
> partitioned table using these three. The reason I say it's
> semi-hypothetical since we will need to create some real objects, but
> which do not take actual storage. The idea is to create partitioned
> table with foreign table partitions which point to views simulating
> partitions. The steps are
> 1. Create views one per partition which select data from the
> unpartitioned table that would fall in a partition simulated by that
> view. So something like SELECT * FROM unpartitioned_table WHERE
> partition constraint for that partition.
> 2. Create partitioned table
> 3. Create foreign table partitions that point to the views created in
> the first step.
> 4. ANALYZE the foreign tables and the partitioned table
>
> Now if we EXPLAIN the query on unpartitioned table by redirecting it
> to the partitioned table, we would get the EXPLAIN plans as if the
> query is running on the partitioned table. We will need to zero out
> the FDW costs, so that the cost of accessing foreign table comes out
> to be same as accessing a local table. That's mostly setting the right
> FDW GUCs.
>
> Since we are creating and dropping some real objects, may be we want
> to create temporary objects (we don't have support to create temporary
> foreign tables for now, but may be that's desirable feature) or create
> them in a different database to reduce catalog bloat. Similarly we
> won't be able to create indexes on the foreign table, but may be we
> could simulate those using hypothetical indexes feature.
>
> This method doesn't need any core changes which are useful only for
> this extension. Supporting temporary foreign table and declarative
> indexes on foreign tables may be seen as separate features and
> acceptable in the community.

I both like and dislike this idea.  The good thing is that it's way less
hacky than what we did in our prototype, and it also works out of the box.
However, the problem I have with this approach is that the generated plans
will be quite different from real partitioning.  The main features such as
partition pruning or partition-wise join will probably work, but you'll
always have a ForeignScan as the primary path, and I think that will
drastically limit the planner and the usability.

I'm also not a fan of making core changes for a single extension's benefit,
but I think that many of the blockers could be solved with only slight
changes in the core code (for instance, don't use a Relation as a function
parameter just to get the underlying PartitionKey, but pass the PartitionKey
directly at the top level).  For the rest, I'm not sure yet exactly what
would need to be changed (the partitioning code has moved around quite a lot
lately, and it's hard to stay up to date), or whether such changes could also
be useful for other purposes.


Re: Proposal: Partitioning Advisor for PostgreSQL

From: Ashutosh Bapat
Date:
On Tue, Jun 12, 2018 at 12:21 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> I both like and dislike this idea.  The good thing is that it's way
> less hacky than what we did in our prototype, and it's also working
> out of the box.  However, the problem I have with this approach is
> that the generated plans will be quite different from real
> partitioning,  The main features such as partition pruning or
> partition-wise join will probably work, but you'll always have a
> ForeignScan as the primary path and I think that it'll drastically
> limit the planner and the usability.

AFAIR, there is a hook with which we can change the EXPLAIN output, so we
could change the ForeignScan label.  But I don't remember that hook off the
top of my head, and a brief look at the EXPLAIN code didn't reveal anything;
maybe there isn't any hook.  In that case we may be able to add one, or use a
CustomScan or something like that.  I agree that seeing a ForeignScan in the
plan is not a good thing.

Anyway, the work involved in my proposal may not be worth the utility we'd
get out of this extension, so it may not be worth pursuing further.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: Proposal: Partitioning Advisor for PostgreSQL

From: Dilip Kumar
Date: Tue, Jun 12, 2018
On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya <hosoya.yuzuko@lab.ntt.co.jp> wrote:
> Hello,
>
> I'm Yuzuko Hosoya. This is my first PostgreSQL project participation.
>
> I have been developing partitioning advisor prototype with Julien Rouhaud.
> It will be a new feature of HypoPG[1], which is a PostgreSQL extension, and
> will help partitioning design tuning.  Currently, HypoPG only supports index
> design tuning; it allows users to define hypothetical indexes for real tables and
> shows resulting queries' plan/cost with EXPLAIN as if they were actually constructed.
> Since declarative partitioning will be greatly improved in PostgreSQL 11 and further
> versions, there are emerging needs to support partitioning design tuning. This is
> why we are working on partitioning advisor.  We plan to release the first version
> of partitioning advisor for PostgreSQL 11, and then, improve it for PostgreSQL 12.


Interesting.

> - Estimating stats
> It is complicated because hypothetical partition has no data.  Currently, we compute
> hypothetical partition's size using clauselist_selectivity() according to their partition
> bound and original table stats.  As a result, estimate is done with low accuracy,
> especially if there is WHERE clause.  We will improve during developing, but for now,
> we don't have good ideas.

I haven't read the patch yet, but I'm curious: suppose we have a table that
is already loaded with some data.  Now, if I create hypothetical partitions
on it, will we create any statistics (MCVs, histograms) for the hypothetical
partitions?  In this case we already have the data in the main table, and we
also have the partition boundaries for the hypothetical partitions.  I am not
sure whether you are already doing this or it's an open item.


>
> [1] https://github.com/HypoPG/hypopg
>
> Best regards,
> ----
> Yuzuko Hosoya
> NTT Open Source Software Center

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Partitioning Advisor for PostgreSQL

From: Julien Rouhaud
Date:
Hi,

On Tue, Jun 12, 2018 at 11:14 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya <hosoya.yuzuko@lab.ntt.co.jp>
> wrote:
>>
>> This is
>> why we are working on partitioning advisor.  We plan to release the first
>> version
>> of partitioning advisor for PostgreSQL 11, and then, improve it for
>> PostgreSQL 12.
>>
>
> Interesting.

Thanks!

>> - Estimating stats
>> It is complicated because hypothetical partition has no data.  Currently,
>> we compute
>> hypothetical partition's size using clauselist_selectivity() according to
>> their partition
>> bound and original table stats.  As a result, estimate is done with low
>> accuracy,
>> especially if there is WHERE clause.  We will improve during developing,
>> but for now,
>> we don't have good ideas.
>
>
> I haven't yet read the patch but curious to know.  Suppose we have table
> which is already loaded with some data.  Now, if I create  hypothetical
> partitions on that will we create any stat data (mcv, histogram) for
> hypothetical table? because, in this case we already have the data from the
> main table and we also have partition boundary for the hypothetical table.
> I am not sure you are already doing this or its an open item?


For now we're simply using the original table's statistics and appending the
partition bounds as a qual on the hypothetical partition.  It gives good
results if the query doesn't have quals on the table, or for simple cases
where the selectivity functions understand that an expression such as

(id BETWEEN 1 AND 1000000) AND (id < 6)

will return only 5 rows, while they can't for expressions like

(id IN (x,y...)) AND (id < z)

In this second case, the estimates are therefore quite wrong for now.  I
think we'd have no other choice than to generate hypothetical statistics
according to the partition bounds, and compute selectivity based only on the
query quals.  It's definitely not simple to do, but it should be doable with
the hooks currently available.
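
The same behavior can be reproduced on a plain table, independent of HypoPG
(a sketch; the exact row estimates depend on the gathered statistics):

```sql
CREATE TABLE est_demo (id int);
INSERT INTO est_demo SELECT generate_series(1, 1000000);
ANALYZE est_demo;

-- Range quals: the combined selectivity works out well, so the
-- estimate should be close to the real 5 rows.
EXPLAIN SELECT * FROM est_demo WHERE id BETWEEN 1 AND 1000000 AND id < 6;

-- IN-list plus range qual: the two selectivities are combined as if
-- independent, so the estimate no longer tracks the real row count well.
EXPLAIN SELECT * FROM est_demo WHERE id IN (1, 2, 500000) AND id < 6;
```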