Home > mailing lists

Re: Proposal: Partitioning Advisor for PostgreSQL - Mailing list pgsql-hackers

From	Julien Rouhaud
Subject	Re: Proposal: Partitioning Advisor for PostgreSQL
Date	June 12, 2018 16:28:15
Msg-id	CAOBaU_bwG6hX9xOZ3n7xq43ztEwZR4bS78aFj12vfLDL7rPffw@mail.gmail.com Whole thread Raw
In response to	Re: Proposal: Partitioning Advisor for PostgreSQL (Dilip Kumar <dilipbalaut@gmail.com>)
List	pgsql-hackers

Tree view

Hi,

On Tue, Jun 12, 2018 at 11:14 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Thu, May 24, 2018 at 4:16 PM, Yuzuko Hosoya <hosoya.yuzuko@lab.ntt.co.jp>
> wrote:
>>
>> This is
>> why we are working on partitioning advisor.  We plan to release the first
>> version
>> of partitioning advisor for PostgreSQL 11, and then, improve it for
>> PostgreSQL 12.
>>
>
> Interesting.

Thanks!

>> - Estimating stats
>> It is complicated because hypothetical partition has no data.  Currently,
>> we compute
>> hypothetical partition's size using clauselist_selectivity() according to
>> their partition
>> bound and original table stats.  As a result, estimate is done with low
>> accuracy,
>> especially if there is WHERE clause.  We will improve during developing,
>> but for now,
>> we don't have good ideas.
>
>
> I haven't yet read the patch but curious to know.  Suppose we have table
> which is already loaded with some data.  Now, if I create  hypothetical
> partitions on that will we create any stat data (mcv, histogram) for
> hypothetical table? because, in this case we already have the data from the
> main table and we also have partition boundary for the hypothetical table.
> I am not sure you are already doing this or its an open item?

For now we're simply using the original table statistics, and
appending the partition bounds as qual on the hypothetical partition.
It'll give good result if the query doesn't have quals for the table,
or for simple cases where selectivity functions understand that
expressions such as

(id BETWEEN 1 AND 1000000) AND (id < 6)

will return only 5 rows, while they can't for expressions like

(id IN (x,y...)) AND (id < z)

In this second case, the estimates are for now therefore quite wrong.
I think that we'd have no other choice than to generate hypothetical
statistics according to the partition bounds, and only compute
selectivity based on the query quals.  It's definitely not simple to
do, but it should be doable with the hooks currently available.

pgsql-hackers by date:

From: Dilip Kumar
Date: 12 June 2018, 15:14:02
Subject: Re: Proposal: Partitioning Advisor for PostgreSQL

From: Geoff Winkless
Date: 12 June 2018, 16:48:14
Subject: late binding of shared libs for C functions

Re: Proposal: Partitioning Advisor for PostgreSQL - Mailing list pgsql-hackers

Previous

Next