Re: extended stats on partitioned tables - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: extended stats on partitioned tables |
Date | |
Msg-id | 689accb1-76e8-5115-6ed0-336749cdca57@enterprisedb.com Whole thread Raw |
In response to | extended stats on partitioned tables (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
On 11/4/21 12:19 AM, Justin Pryzby wrote: > On Wed, Nov 03, 2021 at 11:48:44PM +0100, Tomas Vondra wrote: >> On 10/8/21 12:45 AM, Justin Pryzby wrote: >>> On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote: >>>> On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote: >>>>> On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote: >>>>>> It seems like your patch should also check "inh" in examine_variable and >>>>>> statext_expressions_load. >>>>> >>>>> I tried adding that - I mostly kept my patches separate. >>>>> Hopefully this is more helpful than a complication. >>>>> I added at: https://commitfest.postgresql.org/35/3332/ >>>>> >>>> >>>> Actually, this is confusing. Which patch is the one we should be >>>> reviewing? >>> >>> It is confusing, but not as much as I first thought. Please check the commit >>> messages. >>> >>> The first two patches are meant to be applied to master *and* backpatched. The >>> first one intends to fixes the bug that non-inherited stats are being used for >>> queries of inheritance trees. The 2nd one fixes the regression that stats are >>> not collected for inheritence trees of partitioned tables (which is the only >>> type of stats they could ever possibly have). >> >> I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do >> the (!rte->inh) check in the rel->statlist loops. AFAICS both places >> could do that right at the beginning, because it does not depend on the >> statistics object at all, just the RelOptInfo. > > I probably did this to make the code change small, to avoid indentin the whole > block. But indenting the block is not necessary. It's possible to do something like this: if (!rel->inh) return 1.0; or whatever is the "default" result for that function. > >>> And the 3rd+4th patches (Tomas' plus my changes) allow collecting both >>> inherited and non-inherited stats, only in master, since it requires a catalog >>> change. It's a bit confusing that patch #4 removes most what I added in >>> patches 1 and 2. But that's exactly what's needed to collect and apply both >>> inherited and non-inherited stats: the first two patches avoid applying stats >>> collected with the wrong inheritence. That's also what's needed for the >>> patchset to follow the normal "apply to master and backpatch" process, rather >>> than 2 patches which are backpatched but not applied to master, and one which >>> is applied to master and not backpatched.. >>> >> >> Yeah. Af first I was a bit confused because after applying 0003 there >> are both the fixes and the "correct" way, but then I realized 0004 >> removes the unnecessary bits. > > This was to leave your 0003 (mostly) unchanged, so you can see and/or apply my > changes. They should be squished together. > Yep. >> The one thing 0003 still needs is to rework the places that need to >> touch both inh and !inh stats. The patch simply does >> >> for (inh = 0; inh <= 1; inh++) { ... } >> >> but that feels a bit too hackish. But if we don't know which of the two >> stats exist, I'm not sure what to do about it. > > There's also this: > > On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote: >> + /* create only the "stxdinherit=false", because that always exists */ >> + datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = ObjectIdGetDatum(false); >> >> That'd be confusing for partitioned tables, no? >> They'd always have an row with no data. >> I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE). >> (not ObjectIdGetDatum). >> Then, that affects the loops which delete the tuples - neither inh nor !inh is >> guaranteed, unless you check relkind there, too. > > Maybe the for inh<=1 loop should instead be two calls to new functions factored > out of get_relation_statistics() and RemoveStatisticsById(), which take "bool > inh". > Well, yeah. That's part of the strange 1:2 mapping between the stats definition and data. Although, even with regular stats we have such mapping, except the "definition" is the pg_attribute row. >> And I'm not sure we do the right thing after removing children, for example >> (that should drop the inheritance stats, I guess). > > Do you mean for inheritance only ? Or partitions too ? > I think for partitions, the stats should stay. > And for inheritence, they can stay, for consistency with partitions, and since > it does no harm. > I think the behavior should be the same as for data in pg_statistic, i.e. if we keep/remove those, we should do the same thing for extended statistics. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: