Thread: Re: [PoC] Partition path cache

Re: [PoC] Partition path cache

From
Andy Fan
Date:
Bykov Ivan <I.Bykov@modernsys.ru> writes:

> Our customers use databases (1-10 TB) with big tables. Often these tables are
> big and split on sections. For example, we have tables with almost
> thousand sections. In most cases, those sections have a similar set of indexes
> and contain similar data. Often this partitioned table has multilevel structure
> (for example, a per-year section has a per-quarter section, and a per-quarter
> section in turn has a per-month simple relation sections).
>
> During the analysis of the planning procedure, we found that the planner
> we found that the planner in PostgreSQL 15.7 spents a lot of time building
> access paths.
..
> We backported new access path build algorithms from PostgreSQL 17 (which
> optimizes match_pathkeys_to_index()) and it takes effect:
> planner spent 1090 ms for planning query at first time and 320 ms for
> second time.
>
> But we still think that planners make unnecessary jobs when building all
> types of paths for every section. So we implemented a feature named
> “partition path cache” (see next section), and now planner spent 970 ms for
> planning query at the first time and 240 ms for the second time.

>
> Partition path cache
> ====================
> The partition path cache aims to speed up planning for partition scan paths.
>
> Path cache doesn't copy and transform Path nodes directly due to the absence of
> Path nodes copy functions. The main aim of this patch is to prove the assumption
> that partitions of the same relation that have similar sets of indexes may use
> similar access path types.

This sounds like an interesting idea, I like it because it omit the needs
for "global statistics" effort for partitioned table since it just use
the first partition it knows. Of couse it has its drawback that "first"
partition can't represent other partitions.

One of the Arguments of this patch might be "What if other partitions
have a pretty different statistics from the first partition?". If I were
you, I might check all the used statistics on this stage and try to find
out a similar algorithms to prove that the best path would be similar
too. This can happens once when the statistics is gathered. However this
might be not easy.

--
Best Regards
Andy Fan




RE: [PoC] Partition path cache

From
Bykov Ivan
Date:
Hello

> This sounds like an interesting idea, I like it because it omit the needs for "global statistics" effort for
partitionedtable since it just use the first partition it knows. Of couse it has its drawback that "first"
 
> partition can't represent other partitions.

This method uses global statistics for all partitions. 
The cache uses standard path building functions (it calculates selectivity for path), but it avoids calling all of them
forthe second and later partitions in a group.
 

The concept is similar to the GEQO method used for joins.
We skip creating some path variants if building all paths would take too long.

> One of the Arguments of this patch might be "What if other partitions have a pretty different statistics from the
firstpartition?". If I were you, I might check all the used statistics on this stage and try to find out a similar
algorithmsto > prove that the best path would be similar too. This can happens once when the statistics is gathered.
Howeverthis might be not easy.
 

Yes, maybe we can split partitions by groups not only by available index lists but also by some statistical property
ranges.

--
Best Regards
Ivan Bykov