Re: Merging statistics from children instead of re-sampling everything - Mailing list pgsql-hackers

From	Andrey Lepikhov
Subject	Re: Merging statistics from children instead of re-sampling everything
Date	June 30, 2021 15:55:38
Msg-id	82fcba0a-7c50-c714-2a6b-f2677affe65d@postgrespro.ru
In response to	Merging statistics from children instead of re-sampling everything (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Merging statistics from children instead of re-sampling everything
List	pgsql-hackers

Sorry, I forgot to CC pgsql-hackers.
On 29/6/21 13:23, Tomas Vondra wrote:
> Because sampling is fairly expensive, especially if you have to do it 
> for large number of child relations. And you'd have to do that every 
> time *any* child triggers autovacuum, pretty much. Merging the stats is 
> way cheaper.
> 
> See the other thread linked from the first message.
Maybe I didn't describe my idea clearly.
Partitioning is most commonly used for large tables.
I propose storing a sampling reservoir for each partition, replacing it
whenever that partition's statistics are updated, and merging the
reservoirs to build statistics for the parent table.
A reservoir can be spilled into a tuplestore on disk, or stored in the
parent table.
In the case of complex inheritance we can store sampling reservoirs only
for the leaves.
You may consider this idea a fantasy, but the statistics-merging
approach has an extensibility problem with other types of statistics.
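
To make the reservoir idea concrete, here is a rough sketch in Python
(not PostgreSQL code) of how per-partition reservoirs might be merged
into one sample for the parent. It assumes each reservoir is a uniform
random sample of its partition and that the partition row counts are
known. The function name merge_reservoirs and its parameters are
hypothetical, chosen only for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def merge_reservoirs(
    partitions: Sequence[Tuple[int, Sequence[object]]],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> List[object]:
    """Merge per-partition reservoirs into one sample for the parent.

    partitions: (row_count, reservoir) pairs. Each reservoir is assumed
    to be a uniform sample without replacement from its partition.
    """
    if rng is None:
        rng = random.Random()

    # Rows of each partition that have not yet been "drawn".
    remaining = [row_count for row_count, _ in partitions]
    # Shuffled copies, so pop() yields a random reservoir row.
    pools = [list(reservoir) for _, reservoir in partitions]
    for pool in pools:
        rng.shuffle(pool)

    total = sum(remaining)
    target = min(sample_size, total)
    merged: List[object] = []

    while len(merged) < target:
        # Choose a partition with probability proportional to the
        # number of its rows not yet drawn, as if sampling the parent
        # row by row without replacement.
        pick = rng.randrange(total)
        for i, count in enumerate(remaining):
            if pick < count:
                break
            pick -= count
        if not pools[i]:
            raise ValueError("reservoir %d is too small for this sample size" % i)
        merged.append(pools[i].pop())
        remaining[i] -= 1
        total -= 1

    return merged
```

With reservoirs of size min(k, partition row count) and a parent
sample of at most k rows, no reservoir can run dry, so the usual
statistics-building code could then run on the merged sample as if it
had been drawn from the parent directly.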
> 
> 
> On 6/29/21 9:01 AM, Andrey Lepikhov wrote:
>> On 30/3/21 03:51, Tomas Vondra wrote:
>>> Of course, that assumes the merge is cheaper than processing the list of
>>> statistics, but I find that plausible, especially if the list needs to
>>> be processed multiple times (e.g. when considering different join
>>> orders, filters and so on).
>> I think your approach has a chance. But I didn't understand: why do 
>> you merge statistics? I think we could merge only the samples of each 
>> child and build statistics as usual.
>> Error of a sample merging procedure would be quite limited.
>>
> 


-- 
regards,
Andrey Lepikhov
Postgres Professional
