Re: Merging statistics from children instead of re-sampling everything - Mailing list pgsql-hackers

From Andrey V. Lepikhov
Subject Re: Merging statistics from children instead of re-sampling everything
Msg-id f1435fb9-e78d-bd43-ec2a-e1477db1ab32@postgrespro.ru
In response to Re: Merging statistics from children instead of re-sampling everything  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Merging statistics from children instead of re-sampling everything
List pgsql-hackers
On 2/11/22 20:12, Tomas Vondra wrote:
> 
> 
> On 2/11/22 05:29, Andrey V. Lepikhov wrote:
>> On 2/11/22 03:37, Tomas Vondra wrote:
>>> That being said, this thread was not really about foreign partitions,
>>> but about re-analyzing inheritance trees in general. And sampling
>>> foreign partitions doesn't really solve that - we'll still do the
>>> sampling over and over.
>> IMO, to solve the problem we should do two things:
>> 1. Avoid repeated partition scans in the case of an inheritance tree.
>> 2. Avoid re-analyzing everything when only a small subset of 
>> partitions is actively changing.
>>
>> For (1) I can imagine a solution like multiplexing: at the stage of 
>> deciding which relations to scan, group them and prepare the scan 
>> parameters so that multiple samples are made in one shot.
> I'm not sure I understand what you mean by multiplexing. The term 
> usually means "sending multiple signals at once" but I'm not sure how 
> that applies to this issue. Can you elaborate?

I mean making a set of samples in one scan: one sample for the plain 
table, another for its parent, and so on according to the inheritance 
tree, and caching these samples in memory. We can calculate all the 
parameters of the reservoir sampling method needed to do this.
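A minimal sketch of the one-scan idea, assuming plain Algorithm R reservoir sampling rather than whatever sampler ANALYZE actually uses: each row read from a table is offered to that table's own reservoir and to the reservoir of every ancestor, so a single pass over each partition yields a uniform sample for every level of the tree. The names `Reservoir` and `sample_tree`, and the dict-based tree and row representation, are hypothetical illustrations, not PostgreSQL code.

```python
import random
from typing import Dict, List, Optional


class Reservoir:
    """Fixed-size uniform sample over a stream of rows (Algorithm R)."""

    def __init__(self, size: int, rng: random.Random) -> None:
        self.size = size
        self.rng = rng
        self.seen = 0                 # rows offered so far
        self.rows: List[tuple] = []

    def offer(self, row: tuple) -> None:
        self.seen += 1
        if len(self.rows) < self.size:
            self.rows.append(row)     # reservoir not full yet
        else:
            # Keep the new row with probability size / seen.
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.rows[j] = row


def sample_tree(parent_of: Dict[str, Optional[str]],
                scan: Dict[str, List[tuple]],
                size: int, seed: int = 0) -> Dict[str, Reservoir]:
    """Scan each relation that holds rows exactly once, feeding every row
    to its own reservoir and to the reservoirs of all its ancestors.

    parent_of maps each relation to its parent (None for the root);
    scan maps each relation that physically holds rows to those rows.
    """
    rng = random.Random(seed)
    reservoirs = {rel: Reservoir(size, rng) for rel in parent_of}
    for rel_with_rows, rows in scan.items():
        # The relation itself plus every ancestor up to the root.
        chain = []
        rel = rel_with_rows
        while rel is not None:
            chain.append(reservoirs[rel])
            rel = parent_of[rel]
        for row in rows:              # the single scan of this relation
            for res in chain:
                res.offer(row)
    return reservoirs
```

Because each reservoir treats its input as one stream, the parent's sample stays uniform over the union of its children regardless of the order in which partitions are scanned. The memory cost of caching is one reservoir per relation in the tree.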

> sample might be used for estimation of clauses directly.
You mean using them in difficult cases, such as estimating grouping 
over an APPEND?
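To illustrate why a stored sample helps with grouping over an APPEND: per-partition n_distinct values cannot simply be added, because the same value may occur in several partitions, but a uniform sample of the whole APPEND (for example a parent-level reservoir) can be fed to a distinct-value estimator directly. The sketch below applies the Haas and Stokes "Duj1" estimator, n*d / (n - f1 + f1*n/N), which is the formula ANALYZE uses for a single table's sample. The function name, and the assumption that the sample is uniform across all children, belong to this sketch only.

```python
from collections import Counter
from typing import Hashable, Sequence


def estimate_ndistinct(sample: Sequence[Hashable], total_rows: int) -> float:
    """Haas-Stokes Duj1 estimate of the number of distinct values among
    total_rows rows, given a uniform random sample of those rows."""
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)                                  # distinct values seen
    f1 = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    # n - f1 >= 0, and the second term is positive whenever f1 > 0,
    # so the denominator is always positive for a non-empty sample.
    denom = n - f1 + f1 * n / total_rows
    return n * d / denom
```

With the grouping key values taken from a parent-level sample and `total_rows` set to the estimated row count of the whole APPEND, the result approximates the number of groups. When every sampled value is unique the estimate equals `total_rows`, and when the sample covers all rows it equals the observed distinct count.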
> 
> But it requires storing the sample somewhere, and I haven't found a good 
> and simple way to do that. We could serialize that into bytea, or we 
> could create a new fork, or something, but what should that do with 
> oversized attributes (how would TOAST work for a fork) and/or large 
> samples (which might not fit into 1GB bytea)? 
This feature looks like metadata about the database, so it can be 
stored in a separate relation. It is not obvious that we need it for 
every relation, for example for relations with large samples. I think 
it can be controlled by a table parameter.
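The size concern can be made concrete with a little arithmetic. ANALYZE samples 300 * statistics_target rows, so the default target of 100 gives 30,000 rows and the maximum target of 10,000 gives 3,000,000 rows; at an average of 500 bytes per row the latter is about 1.5GB, beyond what a single bytea value can hold. A sketch of the check a per-table parameter might gate, where `store_sample` stands in for a hypothetical table option (no such option exists) and the size estimate ignores TOAST and per-row overhead:

```python
# Approximate upper bound on a single varlena (bytea) value: about 1GB.
MAX_VARLENA_BYTES = 1 << 30

# ANALYZE's sample size is 300 rows per unit of statistics target.
ROWS_PER_TARGET_UNIT = 300


def sample_rows(statistics_target: int) -> int:
    """Number of rows ANALYZE samples for a given statistics target."""
    return ROWS_PER_TARGET_UNIT * statistics_target


def should_store_sample(statistics_target: int, avg_row_bytes: int,
                        store_sample: bool) -> bool:
    """Decide whether to persist a sample as one bytea value.

    store_sample models a hypothetical per-table parameter; the size
    estimate (rows * average width) ignores TOAST and header overhead.
    """
    if not store_sample:
        return False
    estimated = sample_rows(statistics_target) * avg_row_bytes
    return estimated < MAX_VARLENA_BYTES
```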

-- 
regards,
Andrey Lepikhov
Postgres Professional
