Re: Proposal: Partitioning Advisor for PostgreSQL - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Proposal: Partitioning Advisor for PostgreSQL
Date
Msg-id CAOBaU_bwG6hX9xOZ3n7xq43ztEwZR4bS78aFj12vfLDL7rPffw@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: Partitioning Advisor for PostgreSQL  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
Hi,

On Tue, Jun 12, 2018 at 11:14 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya <hosoya.yuzuko@lab.ntt.co.jp>
> wrote:
>>
>> This is
>> why we are working on partitioning advisor.  We plan to release the first
>> version
>> of partitioning advisor for PostgreSQL 11, and then, improve it for
>> PostgreSQL 12.
>>
>
> Interesting.

Thanks!

>> - Estimating stats
>> It is complicated because hypothetical partition has no data.  Currently,
>> we compute
>> hypothetical partition's size using clauselist_selectivity() according to
>> their partition
>> bound and original table stats.  As a result, estimate is done with low
>> accuracy,
>> especially if there is WHERE clause.  We will improve during developing,
>> but for now,
>> we don't have good ideas.
>
>
> I haven't yet read the patch but curious to know.  Suppose we have table
> which is already loaded with some data.  Now, if I create  hypothetical
> partitions on that will we create any stat data (mcv, histogram) for
> hypothetical table? because, in this case we already have the data from the
> main table and we also have partition boundary for the hypothetical table.
> I am not sure you are already doing this or its an open item?


For now we're simply using the original table statistics, and
appending the partition bounds as qual on the hypothetical partition.
It'll give good result if the query doesn't have quals for the table,
or for simple cases where selectivity functions understand that
expressions such as

(id BETWEEN 1 AND 1000000) AND (id < 6)

will return only 5 rows, while they can't for expressions like

(id IN (x,y...)) AND (id < z)

In this second case, the estimates are for now therefore quite wrong.
I think that we'd have no other choice than to generate hypothetical
statistics according to the partition bounds, and only compute
selectivity based on the query quals.  It's definitely not simple to
do, but it should be doable with the hooks currently available.


pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Proposal: Partitioning Advisor for PostgreSQL
Next
From: Geoff Winkless
Date:
Subject: late binding of shared libs for C functions