Thread: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra
Hi,

While reviewing the thread about issues with auto-analyze on partitioned
tables [1] I remembered that the way we build statistics on the parent
is somewhat expensive, because it samples rows from the children again.

It's true we sample a much smaller number of rows from each partition
(proportional to its size), so it's not as expensive as simply running
ANALYZE on all partitions individually. But it's not free either, and in
some cases (e.g. with partitioning by time) it may touch old data that
is not touched by regular queries, so it's likely not cached etc. That
somewhat defeats the idea of using partitioning to "segregate" old data.

One reason why the ANALYZE costs are not a huge issue in practice is
that we're not actually triggering it - changes_since_analyze is not
updated for the parent, so autoanalyze does not realize it needs to do
anything, and we never rebuild the statistics :-( But as shown in [2],
that may lead to poor estimates (and bad plans) in cases where we rely on
the parent's stats (probably due to not expanding the list of children
early enough).

(Note: I wonder if we might simply not rely on the parent stats at all,
and always consult the children's stats directly - but with many children
that's likely problematic/expensive, and it has the same issues as the
merging.)

The other thread [1] attempts to fix that by incrementing the counters
for the parent too, but that'll make this much worse. Firstly, with
multi-level partitioning we'll have to sample the children repeatedly,
essentially once for each level. Secondly, it's tricky to control when
exactly the counters are propagated to the parent, making the parent
analyzes more frequent. Not great.

Even if we do a great job in [1] and come up with smart heuristics,
there will always be cases where it does not work too well and we either
analyze too late or too often.

Note: Propagating changes_since_analyze is only part of the story, as it
does not do anything about stats after attach/detach of a partition.


This attempts to approach the problem from the other end - instead of
tightly controlling when to analyze the parent, it makes the analyze
much cheaper. That means we don't need to worry too much about doing the
analyze too often, and we can update the stats more aggressively.

So, how does it work? Well, it simply fetches the statistics for all the
children and merges them together. That's fairly straightforward for most
statistics types (a rough SQL sketch of the simple cases follows the list):

1) stawidth, stanullfrac - Those are trivial to merge.

2) stacorrelation - Not sure, I've used a weighted average for now.
Perhaps we could store the "internal" counters (sumX, sumX2) which would
allow us to calculate the regular estimate for the parent.

3) stadistinct - This is quite problematic. We only have the per-child
estimates, and it's not clear if there's any overlap. For now I've just
summed them up, because that's safer / similar to what we do for gather
merge paths etc. Maybe we could improve this by estimating the overlap
somehow (e.g. from MCV lists / histograms). But honestly, I doubt the
estimates based on a tiny sample of each child are any better. I suppose
we could introduce a column option determining how to combine ndistinct
(similar to how we can override n_distinct itself).

4) MCV - It's trivial to build a new "parent" MCV list, although it may
be too large (in which case we cut it at the statistics target, and copy
the remaining bits to the histogram).

5) histograms - A bit more complicated, because it requires dealing with
overlapping bins, so we may have to "cut" and combine them in some way.
If we assume that cutting a bin into K parts means each part has 1/K of
the tuples (no matter where exactly we cut it), then it's trivial and it
works just fine in practice. That's because with N children, each bin
actually represents 1.0/(target*N) of the tuples, so the errors are
quite limited.
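
To illustrate points 1-3, here is a very rough SQL sketch of the merging
for the simple scalar statistics, reading the per-child entries from the
pg_stats view (the patch itself does this in C; the sketch uses a weighted
average by child row count and simply sums n_distinct the way the PoC
does, ignoring the MCV/histogram slots entirely):

-- merge per-child stats for column "j" of partitioned table "p"
-- (using the p/p0/p1 example shown below), weighting by row count
SELECT sum(c.reltuples * s.null_frac)   / sum(c.reltuples) AS null_frac,
       sum(c.reltuples * s.avg_width)   / sum(c.reltuples) AS avg_width,
       sum(c.reltuples * s.correlation) / sum(c.reltuples) AS correlation,
       -- only meaningful for positive values; a negative n_distinct
       -- encodes a fraction of the row count
       sum(s.n_distinct)                                   AS n_distinct
  FROM pg_inherits i
  JOIN pg_class     c ON c.oid = i.inhrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_stats     s ON s.schemaname = n.nspname
                     AND s.tablename  = c.relname
                     AND s.attname    = 'j'
                     AND NOT s.inherited
 WHERE i.inhparent = 'p'::regclass;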


The attached patch is a PoC - it should work, but there's plenty of room
for improvement. It only deals with "regular" per-column statistics, not
with extended stats (but I don't see why it wouldn't work for them too).

It adds a new analyze option "MERGE" which does not sample the children
but instead just combines the statistics. So the example from [2] looks
like this:

======================================================================
create table p (i integer, j integer) partition by list  (i);

create table p0 partition of p for values in (0);
create table p1 partition of p for values in (1);

insert into p select 0,generate_series(1,1000);
insert into p select 1,generate_series(1,1000);

analyze p0;
analyze p1;

create table q (i integer);
insert into q values (0);
analyze q;

test=# explain select * from q join p using (i) where j
between 1 and 500;
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=1.02..49.82 rows=5 width=8)
   Hash Cond: (p.i = q.i)
   ->  Append  (cost=0.00..45.00 rows=1000 width=8)
         ->  Seq Scan on p0 p_1  (cost=0.00..20.00 rows=500 width=8)
               Filter: ((j >= 1) AND (j <= 500))
         ->  Seq Scan on p1 p_2  (cost=0.00..20.00 rows=500 width=8)
               Filter: ((j >= 1) AND (j <= 500))
   ->  Hash  (cost=1.01..1.01 rows=1 width=4)
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4)
(9 rows)

test=# analyze (merge) p;
ANALYZE
test=# explain select * from q join p using (i) where j
between 1 and 500;
                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=1.02..54.77 rows=500 width=8)
   Hash Cond: (p.i = q.i)
   ->  Append  (cost=0.00..45.00 rows=1000 width=8)
         ->  Seq Scan on p0 p_1  (cost=0.00..20.00 rows=500 width=8)
               Filter: ((j >= 1) AND (j <= 500))
         ->  Seq Scan on p1 p_2  (cost=0.00..20.00 rows=500 width=8)
               Filter: ((j >= 1) AND (j <= 500))
   ->  Hash  (cost=1.01..1.01 rows=1 width=4)
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4)
(9 rows)

======================================================================

FWIW I'm not sure we need the separate MERGE mode, but it's an easy way
to allow both the regular and "merge" approach, so that it's possible to
experiment and compare the behavior.

One issue is that this would require some coordination between the
parent/child analyzes. Essentially, any time a child is analyzed, the
parent should rebuild the stats (to merge from the new child stats).
This is similar to the issue of analyzing the parent too often because
we don't know when exactly the counters get updated, but it's much
cheaper to merge the stats, so it's much less problematic.


regards


[1] https://commitfest.postgresql.org/32/2492/

[2]
https://www.postgresql.org/message-id/CAM-w4HO9hUHvJDVwQ8%3DFgm-znF9WNvQiWsfyBjCr-5FD7gWKGA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Merging statistics from children instead of re-sampling everything

From: Justin Pryzby
Thanks for taking a fresh look at this.

As you've written it, this can apply to either/both partitioned or inheritance.
I imagine when "MERGE" goes away, this should apply only to partitioned tables.
(Actually, personally I would advocate to consider applying it to *both*, but I
don't think that's been the tendency over the last 4 years.  I wrote here about
some arguably-gratuitous differences between inheritance and partitioning.
https://www.postgresql.org/message-id/20180601221428.GU5164@telsasoft.com)

> +     if (*mcv_items > default_statistics_target)
> +             n = default_statistics_target;

It should use any non-default stats target of the parent's column

> +         * ignore anything but valid leaf relatiins with data, but release

sp: relatiins.

> +                elog(WARNING, "stats for %d %d not found",
> +                                        RelationGetRelid(rels[j]), vacattrstats[i]->attr->attnum);

should be %u %d

This code duplication is undesirable:
> +    /* Log the action if appropriate */
> +     * Determine which columns to analyze



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra

On 3/29/21 8:36 PM, Justin Pryzby wrote:
> Thanks for taking a fresh look at this.
> 
> As you've written it, this can apply to either/both partitioned or inheritence.
> I imagine when "MERGE" goes away, this should apply only to partitioned tables.
> (Actually, personally I would advocate to consider applying it to *both*, but I
> don't think that's been the tendency over the last 4 years.  I wrote here about
> some arguably-gratuitous differences between inheritence and partitioning.
> https://www.postgresql.org/message-id/20180601221428.GU5164@telsasoft.com)
> 

Haven't thought about that too much at this point, but I don't see any
reason not to apply it to both cases. I might be missing something,
but the fact that with declarative partitioning we analyze the children,
while with inheritance we don't, seems a bit strange. Not sure.

>> +     if (*mcv_items > default_statistics_target)
>> +             n = default_statistics_target;
> 
> It should use any non-default stats target of the parent's column
> 

Yeah. That's a simplification - the non-PoC code would have to look at the
per-column statistics targets for the target relation / all the children,
and make some calculation based on that. I ignored that in the PoC.

>> +         * ignore anything but valid leaf relatiins with data, but release
> 
> sp: relatiins.
> 
>> +                elog(WARNING, "stats for %d %d not found",
>> +                                        RelationGetRelid(rels[j]), vacattrstats[i]->attr->attnum);
> 
> should be %u %d
> 
> This code duplication is undesirable:
>> +    /* Log the action if appropriate */
>> +     * Determine which columns to analyze

Yeah. Fine for a PoC, but needs to be cleaned up in a future patch.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra

On 3/29/21 9:24 PM, Tomas Vondra wrote:
> 
> 
> On 3/29/21 8:36 PM, Justin Pryzby wrote:
>> Thanks for taking a fresh look at this.
>>
>> As you've written it, this can apply to either/both partitioned or inheritence.
>> I imagine when "MERGE" goes away, this should apply only to partitioned tables.
>> (Actually, personally I would advocate to consider applying it to *both*, but I
>> don't think that's been the tendency over the last 4 years.  I wrote here about
>> some arguably-gratuitous differences between inheritence and partitioning.
>> https://www.postgresql.org/message-id/20180601221428.GU5164@telsasoft.com)
>>
> 
> Haven't thought about that too much at this point, but I don't see any
> reason to not apply it only to both cases. I might be missing something,
> but the fact that with declarative partitioning we analyze the children,
> while with inheritance we don't, seems a bit strange. Not sure.
> 

BTW I'm not sure the "merge" will / should go away. What I'd expect is
that we'll keep it, and you can do either "ANALYZE" or "ANALYZE
(MERGE)". The former does regular sampling, while the latter does the
proposed merging of stats.

For the autovacuum, I think a new autovacuum_analyze_merge GUC and
reloption would work - chances are people will want to set the default
and then maybe enable merging only for some cases. Not sure.
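
For illustration, usage might look something like this (hypothetical
names - neither the GUC nor the reloption exists yet):

-- hypothetical: cluster-wide default
SET autovacuum_analyze_merge = off;

-- hypothetical: opt a single partitioned table into stats merging
ALTER TABLE p SET (autovacuum_analyze_merge = on);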


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra
Hi,

I'd like to point out two less obvious things, about how this relates to
Tom's proposal [1] and patch [2] from 2015. Tom approached the problem
from a different direction, essentially allowing Var to be associated
with a list of statistics instead of just one.

So it's a somewhat orthogonal solution, and it has pros and cons. The
pro is that it can ignore statistics for eliminated partitions, thus
producing better estimates. The con is that it requires all the places
dealing with VariableStatData to assume there's a list, not just one,
making the code more complex and more CPU expensive (with sufficiently
many partitions).

However, it seems to me we could easily combine those two things - we
can merge the statistics (the way I proposed here), so that each Var
still has just one VariableStatData. That'd mean the various places don't
need to start dealing with a list, and it'd still allow ignoring stats
for eliminated partitions.

Of course, that assumes the merge is cheaper than processing the list of
statistics, but I find that plausible, especially if the list needs to be
processed multiple times (e.g. when considering different join orders,
filters and so on).

Haven't tried, but might be worth exploring in the future.


regards


[1] https://www.postgresql.org/message-id/7363.1426537103@sss.pgh.pa.us

[2] https://www.postgresql.org/message-id/22598.1425686096@sss.pgh.pa.us

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: Andrey Lepikhov
Sorry, I forgot to CC pgsql-hackers.
On 29/6/21 13:23, Tomas Vondra wrote:
> Because sampling is fairly expensive, especially if you have to do it 
> for large number of child relations. And you'd have to do that every 
> time *any* child triggers autovacuum, pretty much. Merging the stats is 
> way cheaper.
> 
> See the other thread linked from the first message.
Maybe I didn't describe my idea clearly.
Partitioning is most commonly used for large tables.
I propose to store a sampling reservoir for each partition, replace it
whenever the partition's statistics are updated, and merge the reservoirs
to build statistics for the parent table.
It could be spilled into a tuplestore on disk, or stored in the parent table.
In the case of complex inheritance we can store sampling reservoirs only
for the leaves.
You can consider this idea speculative, but the statistics-merging
approach has an extensibility problem with other types of statistics.
> 
> 
> On 6/29/21 9:01 AM, Andrey Lepikhov wrote:
>> On 30/3/21 03:51, Tomas Vondra wrote:
>>> Of course, that assumes the merge is cheaper than processing the list of
>>> statistics, but I find that plausible, especially the list needs to be
>>> processed multiple (e.g. when considering different join orders, filters
>>> and so on).
>> I think your approach have a chance. But I didn't understand: why do 
>> you merge statistics? I think we could merge only samples of each 
>> children and build statistics as usual.
>> Error of a sample merging procedure would be quite limited.
>>
> 


-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra
On 6/30/21 2:55 PM, Andrey Lepikhov wrote:
> Sorry, I forgot to send CC into pgsql-hackers.
> On 29/6/21 13:23, Tomas Vondra wrote:
>> Because sampling is fairly expensive, especially if you have to do it 
>> for large number of child relations. And you'd have to do that every 
>> time *any* child triggers autovacuum, pretty much. Merging the stats 
>> is way cheaper.
>>
>> See the other thread linked from the first message.
> Maybe i couldn't describe my idea clearly.
> The most commonly partitioning is used for large tables.
> I suppose to store a sampling reservoir for each partition, replace on 
> update of statistics and merge to build statistics for parent table.
> It can be spilled into tuplestore on a disk, or stored in a parent table.
> In the case of complex inheritance we can store sampling reservoirs only 
> for leafs.
> You can consider this idea as an imagination, but the merging statistics 
> approach has an extensibility problem on another types of statistics.
 >

Well, yeah - we might try that too, of course. This is simply exploring 
the "merge statistics" idea from [1], which is why it does not even 
attempt to do what you suggested. We may explore the approach with 
keeping per-partition samples, of course.

You're right that maintaining per-partition samples and merging those might 
solve (or at least reduce) some of the problems, e.g. eliminating most 
of the I/O that'd be needed for sampling. And yeah, it's not entirely 
clear how to merge some of the statistics types (like ndistinct). But 
for a lot of the basic stats it works quite nicely, I think.

I'm sure there'll be some complexity due to handling large / toasted 
values, etc. And we probably need to design this for large hierarchies 
(IMHO it should work with 10k partitions, not just 100), in which case 
it may still be quite a bit more expensive than merging the stats.

So maybe we should really support both, and combine them somehow?

regards


[1] https://www.postgresql.org/message-id/CAM-w4HO9hUHvJDVwQ8%3DFgm-znF9WNvQiWsfyBjCr-5FD7gWKGA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra
On 6/30/21 17:15, Tomas Vondra wrote:
> On 6/30/21 2:55 PM, Andrey Lepikhov wrote:
>> Sorry, I forgot to send CC into pgsql-hackers.
>> On 29/6/21 13:23, Tomas Vondra wrote:
>>> Because sampling is fairly expensive, especially if you have to do it
>>> for large number of child relations. And you'd have to do that every
>>> time *any* child triggers autovacuum, pretty much. Merging the stats
>>> is way cheaper.
>>>
>>> See the other thread linked from the first message.
>> Maybe i couldn't describe my idea clearly.
>> The most commonly partitioning is used for large tables.
>> I suppose to store a sampling reservoir for each partition, replace on
>> update of statistics and merge to build statistics for parent table.
>> It can be spilled into tuplestore on a disk, or stored in a parent table.
>> In the case of complex inheritance we can store sampling reservoirs
>> only for leafs.
>> You can consider this idea as an imagination, but the merging
>> statistics approach has an extensibility problem on another types of
>> statistics.
>>
> 
> Well, yeah - we might try that too, of course. This is simply exploring
> the "merge statistics" idea from [1], which is why it does not even
> attempt to do what you suggested. We may explore the approach with
> keeping per-partition samples, of course.
> 
> You're right maintaining a per-partition samples and merging those might
> solve (or at least reduce) some of the problems, e.g. eliminating most
> of the I/O that'd be needed for sampling. And yeah, it's not entirely
> clear how to merge some of the statistics types (like ndistinct). But
> for a lot of the basic stats it works quite nicely, I think.
> 
> I'm sure there'll be some complexity due to handling large / toasted
> values, etc. And we probably need to design this for large hierarchies
> (IMHO it should work with 10k partitions, not just 100), in which case
> it may still be quite a bit more expensive than merging the stats.
> 
> So maybe we should really support both, and combine them somehow?
> 

I've been thinking about this PoC patch regularly since I submitted it a
year ago, and I still think merging the statistics is an interesting
idea. It may be a huge win in various scenarios, like:

1) Multi-level partitioning hierarchies, where analyze of each level has
to re-sample all the leaf relations, causing a lot of I/O.

2) Partitioning with foreign/remote partitions, where analyze has to
retrieve significant amounts of data from the remote node over network
(so a different kind of I/O).

These issues will only get worse as the number of partitions used by
systems in the wild grows - we continuously reduce the per-partition
overhead, so people are likely to leverage that by using more of them.

But I don't have a very good idea what to do about statistics that we
can't really merge. For some types of statistics it's rather tricky to
reasonably merge the results - ndistinct is a simple example, although
we could work around that by building and merging hyperloglog counters.
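
As a rough illustration of the HLL idea (a sketch assuming the third-party
postgresql-hll extension, reusing the p0/p1 tables from the first message;
in reality the sketches would be built from the ANALYZE sample and stored
with the per-child stats):

CREATE EXTENSION IF NOT EXISTS hll;

-- per-child ndistinct sketches for column "j"
CREATE TABLE ndistinct_sketches AS
SELECT 'p0'::text AS child, hll_add_agg(hll_hash_integer(j)) AS sketch FROM p0
UNION ALL
SELECT 'p1'::text, hll_add_agg(hll_hash_integer(j)) FROM p1;

-- parent-level ndistinct estimate: union the sketches, no re-sampling
SELECT hll_cardinality(hll_union_agg(sketch)) FROM ndistinct_sketches;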

What's trickier are extended statistics - I can't quite imagine merging
of functional dependencies, mvdistinct, etc. So if there are extended
stats we'd have to do the sampling anyway. (Some of the extended
statistics can also be defined only on the parent relation, in which
case we have nothing to merge. But that seems like a rare case.)

In any case, I don't have a very good plan for how to move this patch
forward, so unless people have some interesting ideas I'll mark it as
returned with feedback. It's been lurking in the CF for ~1 year ...


However, it'd be a mistake to discard the approach proposed by
Andrey Lepikhov - storing samples for individual relations, and then
just using those while analyzing the parent relations. That is more
expensive, but it does not have the issues with merging some types of
statistics, and so on.

It also seems interesting for the dynamic sampling PoC patch [1],
which does sampling as part of the query planning. In that context the
cost of collecting the sample is clearly a major issue, and storing the
sample somewhere would help a lot. But the question is how/where to
store it. Joins are another issue, because we can't just build two
random samples. But let's discuss that in the other thread [1].


[1] https://commitfest.postgresql.org/36/3211/


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: Andrey Lepikhov
On 21/1/2022 01:25, Tomas Vondra wrote:
> But I don't have a very good idea what to do about statistics that we
> can't really merge. For some types of statistics it's rather tricky to
> reasonably merge the results - ndistinct is a simple example, although
> we could work around that by building and merging hyperloglog counters.
I think, as a first step along this way, we can reduce the number of
pulled tuples. We don't really need to pull all tuples from a remote
server. To construct a reservoir, we can pull only a sample of tuples.
The reservoir method needs only a few parameters to return a sample,
just as when reading tuples locally. Also, to fetch such partial samples
asynchronously, we can obtain the size of each partition in a preliminary
step of the analysis.
In my opinion, even this solution can reduce the severity of the problem
drastically.

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra

On 2/10/22 12:50, Andrey Lepikhov wrote:
> On 21/1/2022 01:25, Tomas Vondra wrote:
>> But I don't have a very good idea what to do about statistics that we
>> can't really merge. For some types of statistics it's rather tricky to
>> reasonably merge the results - ndistinct is a simple example, although
>> we could work around that by building and merging hyperloglog counters.
>
> I think, as a first step on this way we can reduce a number of pulled
> tuples. We don't really needed to pull all tuples from a remote server.
> To construct a reservoir, we can pull only a tuple sample. Reservoir
> method needs only a few arguments to return a sample like you read
> tuples locally. Also, to get such parts of samples asynchronously, we
> can get size of each partition on a preliminary step of analysis.
> In my opinion, even this solution can reduce heaviness of a problem
> drastically.
> 

Oh, wow! I hadn't realized we're fetching all the rows from foreign
(postgres_fdw) partitions. For local partitions we already handle this,
because they use the usual acquire function, with a reservoir
proportional to the partition size. I had assumed we used TABLESAMPLE to
fetch just a small fraction of rows from FDW partitions, and I agree
doing that would be a pretty huge benefit.

I actually tried hacking that together - there are a couple of problems
with that (e.g. determining what fraction to sample using bernoulli/system),
but in principle it seems quite doable. Some minor changes to the FDW
API may be necessary, not sure.
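
For example, instead of fetching every row, the sampling could issue
something like this against the remote server (just a sketch - postgres_fdw
does not currently do this, and picking a sensible sampling fraction is one
of the open problems mentioned above; table/columns as in the earlier p1
example):

-- page-level sample, cheap but clustered
SELECT i, j FROM public.p1 TABLESAMPLE SYSTEM (1.5);

-- row-level sample, more uniform but visits every page
SELECT i, j FROM public.p1 TABLESAMPLE BERNOULLI (1.5);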

Not sure about the async execution - that seems way more complicated,
and the sampling reduces the total cost, async just parallelizes it.


That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: "Andrey V. Lepikhov"
On 2/11/22 03:37, Tomas Vondra wrote:
> That being said, this thread was not really about foreign partitions,
> but about re-analyzing inheritance trees in general. And sampling
> foreign partitions doesn't really solve that - we'll still do the
> sampling over and over.
IMO, to solve the problem we should do two things:
1. Avoid repeated partition scans in the case of an inheritance tree.
2. Avoid re-analyzing everything when only a small subset of partitions
is actively changing.

For (1) I can imagine a solution like multiplexing: at the stage of
defining which relations to scan, group them and prepare the scan
parameters so that multiple samples are built in one pass.
It looks like we need separate logic for the analysis of partitioned
tables - we should form and cache samples for each partition before an
analysis.
It requires a prototype to understand the complexity of such a solution,
and it can be done separately from (2).

Task (2) is more difficult to solve. Here we can store samples from each
partition in the values[] field of pg_statistic, or in a specific table
which stores a 'most probable values' snapshot of each table.
The most difficult problem here, as you mentioned, is the ndistinct value.
Is it possible to store not an exactly calculated value of ndistinct, but
an 'expected value', based on analysis of samples and histograms on the
partitions? Such a value could also solve the problem of estimating the
grouping of a SETOP result (joins of them, etc.), where we only have
statistics on the sources of the union.

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra

On 2/11/22 05:29, Andrey V. Lepikhov wrote:
> On 2/11/22 03:37, Tomas Vondra wrote:
>> That being said, this thread was not really about foreign partitions,
>> but about re-analyzing inheritance trees in general. And sampling
>> foreign partitions doesn't really solve that - we'll still do the
>> sampling over and over.
> IMO, to solve the problem we should do two things:
> 1. Avoid repeatable partition scans in the case inheritance tree.
> 2. Avoid to re-analyze everything in the case of active changes in small 
> subset of partitions.
> 
> For (1) i can imagine a solution like multiplexing: on the stage of 
> defining which relations to scan, group them and prepare parameters of 
> scanning to make multiple samples in one shot.
 >> It looks like we need a separate logic for analysis of partitioned
> tables - we should form and cache samples on each partition before an 
> analysis.
> It requires a prototype to understand complexity of such solution and 
> can be done separately from (2).
> 

I'm not sure I understand what you mean by multiplexing. The term 
usually means "sending multiple signals at once" but I'm not sure how 
that applies to this issue. Can you elaborate?

I assume you mean something like collecting a sample for a partition 
once, and then keeping and reusing the sample for future ANALYZE runs, 
until invalidated in some sense.

Yeah, I agree that'd be useful - and not just for partitions, actually. 
I've been pondering something like that for regular tables, because the 
sample might be used for estimation of clauses directly.

But it requires storing the sample somewhere, and I haven't found a good 
and simple way to do that. We could serialize it into a bytea, or we 
could create a new fork, or something, but what should that do with 
oversized attributes (how would TOAST work for a fork) and/or large 
samples (which might not fit into a 1GB bytea)?


> Task (2) is more difficult to solve. Here we can store samples from each 
> partition in values[] field of pg_statistic or in specific table which 
> stores a 'most probable values' snapshot of each table.

I think storing samples in pg_statistic is problematic, because values[] 
is subject to 1GB limit etc. Not an issue for small MCV list of a single 
attribute, certainly an issue for larger samples. Even if the data fit, 
the size of pg_statistic would explode.

> Most difficult problem here, as you mentioned, is ndistinct value. Is it 
> possible to store not exactly calculated value of ndistinct, but an 
> 'expected value', based on analysis of samples and histograms on 
> partitions? Such value can solve also a problem of estimation of a SETOP 
> result grouping (joining of them, etc), where we have statistics only on 
> sources of the union.
> 

I think ndistinct is a problem only when merging final estimates. But if 
we're able to calculate and store some intermediate results, we can 
easily use HLL and merge that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: Robert Haas
On Wed, Jun 30, 2021 at 11:15 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> You're right maintaining a per-partition samples and merging those might
> solve (or at least reduce) some of the problems, e.g. eliminating most
> of the I/O that'd be needed for sampling. And yeah, it's not entirely
> clear how to merge some of the statistics types (like ndistinct). But
> for a lot of the basic stats it works quite nicely, I think.

It feels like you might in some cases get very different answers.
Let's say you have 1000 partitions. In each of those partitions, there
is a particular value that appears in column X in 50% of the rows.
This value differs for every partition. So you can imagine for example
that in partition 1, X = 1 with probability 50%; in partition 2, X = 2
with probability 50%, etc. There is also a value, let's say 0, which
appears in 0.5% of the rows in every partition. It seems possible that
0 is not an MCV in any partition, or in only some of them, but it
might be more common overall than the #1 MCV of any single partition.
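
A rough way to reproduce that scenario (hypothetical table names, scaled
to 1000 children so the global value 0 really is more frequent overall
than any per-child favourite) might be:

CREATE TABLE t (p int, x int) PARTITION BY LIST (p);

DO $$
BEGIN
   FOR k IN 1..1000 LOOP
      EXECUTE format('CREATE TABLE t_%s PARTITION OF t FOR VALUES IN (%s)', k, k);
   END LOOP;
END $$;

-- per child: ~50% of rows have x = child number, ~0.5% have x = 0,
-- and the rest are (mostly) unique noise
INSERT INTO t
SELECT k,
       CASE WHEN random() < 0.500 THEN k
            WHEN random() < 0.010 THEN 0
            ELSE (random() * 1e9)::int END
FROM generate_series(1, 1000) k, generate_series(1, 10000);

-- after analyzing the children, check whether 0 made it into any
-- per-child MCV list at all
SELECT count(*)
  FROM pg_stats
 WHERE schemaname = 'public' AND tablename LIKE 't\_%' AND attname = 'x'
   AND (most_common_vals::text::int[]) @> ARRAY[0];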

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Merging statistics from children instead of re-sampling everything

From: "Andrey V. Lepikhov"
On 2/11/22 20:12, Tomas Vondra wrote:
> 
> 
> On 2/11/22 05:29, Andrey V. Lepikhov wrote:
>> On 2/11/22 03:37, Tomas Vondra wrote:
>>> That being said, this thread was not really about foreign partitions,
>>> but about re-analyzing inheritance trees in general. And sampling
>>> foreign partitions doesn't really solve that - we'll still do the
>>> sampling over and over.
>> IMO, to solve the problem we should do two things:
>> 1. Avoid repeatable partition scans in the case inheritance tree.
>> 2. Avoid to re-analyze everything in the case of active changes in 
>> small subset of partitions.
>>
>> For (1) i can imagine a solution like multiplexing: on the stage of 
>> defining which relations to scan, group them and prepare parameters of 
>> scanning to make multiple samples in one shot.
> I'm not sure I understand what you mean by multiplexing. The term 
> usually means "sending multiple signals at once" but I'm not sure how 
> that applies to this issue. Can you elaborate?

I propose to make a set of samples in one scan: one sample for the plain 
table, another for its parent, and so on, up the inheritance tree. And 
cache these samples in memory. We can calculate all the parameters of the 
reservoir method to do it.

> sample might be used for estimation of clauses directly.
You mean, to use them in difficult cases, such as the estimation of
grouping over an Append?
> 
> But it requires storing the sample somewhere, and I haven't found a good 
> and simple way to do that. We could serialize that into bytea, or we 
> could create a new fork, or something, but what should that do with 
> oversized attributes (how would TOAST work for a fork) and/or large 
> samples (which might not fit into 1GB bytea)? 
This feature looks like meta-info about a database. It could be stored in 
a separate relation. It is not obvious that we need to use it for every 
relation - for example, with large samples. I think it could be controlled 
by a table parameter.

-- 
regards,
Andrey Lepikhov
Postgres Professional



Re: Merging statistics from children instead of re-sampling everything

From: Tomas Vondra

On 2/14/22 11:22, Andrey V. Lepikhov wrote:
> On 2/11/22 20:12, Tomas Vondra wrote:
>>
>>
>> On 2/11/22 05:29, Andrey V. Lepikhov wrote:
>>> On 2/11/22 03:37, Tomas Vondra wrote:
>>>> That being said, this thread was not really about foreign partitions,
>>>> but about re-analyzing inheritance trees in general. And sampling
>>>> foreign partitions doesn't really solve that - we'll still do the
>>>> sampling over and over.
>>> IMO, to solve the problem we should do two things:
>>> 1. Avoid repeatable partition scans in the case inheritance tree.
>>> 2. Avoid to re-analyze everything in the case of active changes in 
>>> small subset of partitions.
>>>
>>> For (1) i can imagine a solution like multiplexing: on the stage of 
>>> defining which relations to scan, group them and prepare parameters 
>>> of scanning to make multiple samples in one shot.
>> I'm not sure I understand what you mean by multiplexing. The term 
>> usually means "sending multiple signals at once" but I'm not sure how 
>> that applies to this issue. Can you elaborate?
> 
> I suppose to make a set of samples in one scan: one sample for plane 
> table, another - for a parent and so on, according to the inheritance 
> tree. And cache these samples in memory. We can calculate all parameters 
> of reservoir method to do it.
> 

I doubt keeping the samples just in memory is a good solution. Firstly, 
there's the question of memory consumption. Imagine a large partitioned 
table with 1-10k partitions. If we keep a "regular" sample (30k rows) 
per partition, that's 30M-300M rows. If each row needs 100B, that's 
3-30GB of data.

Sure, maybe we could keep smaller per-partition samples, large enough to 
get a merged sample of 30k rows. But then you can also have higher 
statistics target values, the rows can be larger, etc.

So a couple of GB per inheritance tree can easily happen. And this data 
may not be used all that often, so keeping it in memory may be wasteful.

But maybe you have an idea how to optimize the sizes of the per-partition 
samples? In principle we need

   30k * size(partition) / size(total)

for each partition, but the trouble is partitions may be detached, data 
may be deleted from some partitions, etc.

Also, what would happen after a restart? If we lose the samples, we'll 
have to resample everything anyway - and after a restart the system is 
usually fairly busy, so that's not a great timing.

So IMHO the samples need to be serialized, in some way.

>> sample might be used for estimation of clauses directly.
> You mean, to use them in difficult cases, such of estimation of grouping 
> over APPEND ?

That's one example, yes. But the sample might be used even to estimate 
complex conditions on a single partition (there's plenty of stuff we 
can't estimate from MCV/histogram).

>> But it requires storing the sample somewhere, and I haven't found a 
>> good and simple way to do that. We could serialize that into bytea, or 
>> we could create a new fork, or something, but what should that do with 
>> oversized attributes (how would TOAST work for a fork) and/or large 
>> samples (which might not fit into 1GB bytea)? 
> This feature looks like meta-info over a database. It can be stored in 
> separate relation. It is not obvious that we need to use it for each 
> relation, for example, with large samples. I think, it can be controlled 
> by a table parameter.
> 

Well, a separate catalog is one of the options. But I don't see how that 
deals with large samples, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Merging statistics from children instead of re-sampling everything

From: "Andrey V. Lepikhov"
On 2/14/22 20:16, Tomas Vondra wrote:
> 
> 
> On 2/14/22 11:22, Andrey V. Lepikhov wrote:
>> On 2/11/22 20:12, Tomas Vondra wrote:
>>>
>>>
>>> On 2/11/22 05:29, Andrey V. Lepikhov wrote:
>>>> On 2/11/22 03:37, Tomas Vondra wrote:
>>>>> That being said, this thread was not really about foreign partitions,
>>>>> but about re-analyzing inheritance trees in general. And sampling
>>>>> foreign partitions doesn't really solve that - we'll still do the
>>>>> sampling over and over.
>>>> IMO, to solve the problem we should do two things:
>>>> 1. Avoid repeatable partition scans in the case inheritance tree.
>>>> 2. Avoid to re-analyze everything in the case of active changes in 
>>>> small subset of partitions.
>>>>
>>>> For (1) i can imagine a solution like multiplexing: on the stage of 
>>>> defining which relations to scan, group them and prepare parameters 
>>>> of scanning to make multiple samples in one shot.
>>> I'm not sure I understand what you mean by multiplexing. The term 
>>> usually means "sending multiple signals at once" but I'm not sure how 
>>> that applies to this issue. Can you elaborate?
>>
>> I suppose to make a set of samples in one scan: one sample for plane 
>> table, another - for a parent and so on, according to the inheritance 
>> tree. And cache these samples in memory. We can calculate all 
>> parameters of reservoir method to do it.
>>
> 
> I doubt keeping the samples just in memory is a good solution. Firstly, 
> there's the question of memory consumption. Imagine a large partitioned 
> table with 1-10k partitions. If we keep a "regular" sample (30k rows) 
> per partition, that's 30M-300M rows. If each row needs 100B, that's 
> 3-30GB of data.
I'm talking about caching a sample only for the time it is needed within 
the current ANALYZE operation. Imagine 3 levels of a partitioned table. On 
each partition you create and keep three different samples (we can do 
it in one scan). The sample for the plain table we can use immediately and 
then destroy it.
For the sample of a partition at the second level of the hierarchy, we can 
save a copy to disk for future use (maybe a repeated analyze); the 
in-memory data used to form a reservoir has a limited size and can be 
destroyed immediately. At the third level we can use the same logic.
So, at any moment we only use as many samples as there are levels in the 
hierarchy. IMO, that isn't a large number.

> the trouble is partitions may be detached, data may be deleted from
> some partitions, etc.
Because statistics don't have a strong relation to the data, we can use two 
strategies: in the case of an explicit 'ANALYZE <table>' we can recalculate 
all samples for all partitions, but in the autovacuum case or an implicit 
analysis we can use not-so-old versions of the samples, and samples of 
detached (but not destroyed) partitions, on the optimistic assumption that 
this doesn't change the statistics drastically.

> So IMHO the samples need to be serialized, in some way.
Agreed

> Well, a separate catalog is one of the options. But I don't see how that 
> deals with large samples, etc.
I think we can design a fallback to the previous approach in the case of 
very large tuples, like the switch from HashJoin to NestedLoop when we 
estimate that we don't have enough memory.

-- 
regards,
Andrey Lepikhov
Postgres Professional



Fwd: Merging statistics from children instead of re-sampling everything

From: Damir Belyalov
> 3) stadistinct - This is quite problematic. We only have the per-child
> estimates, and it's not clear if there's any overlap. For now I've just
> summed it up, because that's safer / similar to what we do for gather
> merge paths etc. Maybe we could improve this by estimating the overlap
> somehow (e.g. from MCV lists / histograms). But honestly, I doubt the
> estimates based on tiny sample of each child are any better. I suppose
> we could introduce a column option, determining how to combine ndistinct
> (similar to how we can override n_distinct itself).
>
> 4) MCV - It's trivial to build a new "parent" MCV list, although it may
> be too large (in which case we cut it at statistics target, and copy the
> remaining bits to the histogram)

I think there is one approach to solving the problem of calculating the MCV
and distinct statistics.
To do this, you need to calculate the density of the sample distribution and
store it, for example, in some slot.
Then, when merging statistics, we would sum up the densities of all
partitions as functions and get a new density.
From the new density, you can find out which values are the most common and
how many are distinct.

To calculate the partition densities, you could use "Kernel Density
Estimation" -
https://www.statsmodels.org/dev/examples/notebooks/generated/kernel_density.html

The approach may be very inaccurate and difficult to implement, but it
solves the problem.

Regards,
Damir Belyalov
Postgres Professional