Re: extended stats on partitioned tables - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: extended stats on partitioned tables
Date
Msg-id 20210925214610.GW831@telsasoft.com
Whole thread Raw
In response to Re: extended stats on partitioned tables  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: extended stats on partitioned tables  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:
> > Do you think it's possible to backpatch a fix to handle partitioned tables
> > specifically ?
> > 
> > The "tuple already updated" error which I reported and which was fixed by
> > 859b3003 involved inheritence children.  Since partitioned tables have no data
> > themselves, the !inh check could be relaxed.  It's not totally clear to me if
> > the correct statistics would be used in that case.  I suppose the wrong
> > (inherited) stats would be wrongly applied affect queries FROM ONLY a
> > partitioned table, which seems pointless to write and also hard for the
> > estimates to be far off :)
> 
> Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
> never build stats with and without inheritance at the same time (for the
> same rel). The 859b3003de ensures that by only building extended stats in
> the (!inh) case, but we might tweak that based on relkind. See the attached
> patch. But I wonder if there are cases that might be hurt by this - that'd
> be a regression too, of course.

I think we should leave the inheritance case alone, since it hasn't changed in
2 years, and building stats on the table ONLY is a legitimate interpretation,
and it's as good as we can do without the catalog change.

But the partitioned case used to work, and there's no utility in selecting FROM
ONLY a partitioned table, so we might as well build the stats including its
partitions.

I don't think anything would get worse for the partitioned case.
Obviously building inherited ext stats could change plans - that's the point.
It's weird that the stats objects which existed for 18 months before being
"built" after the patch was applied, but no so weird that the release notes
wouldn't be ample documentation.

If building statistics caused the plan to change undesirably, the solution
would be to drop the stats object, of course.

+               build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
                                                                                                     
 

It's weird to build inherited extended stats for partitioned tables but not for
inheritence parents.  We could be clever and say "build inherited ext stats for
inheritence parents only if we didn't insert any stats for the table itself
(because it's empty)".  But I think that's fragile: a single tuple in the
parent table could cause stats to be built there instead of on its heirarchy,
and the extended stats would be used for *both* FROM and FROM ONLY, which is an
awful combination.

Since do_analyze_rel is only called once for partitioned tables, I think you
could write that as:

/* Do not build inherited stats (since the catalog cannot support it) except
 * for partitioned tables, for which numrows==0 and have no non-inherited stats */
build_ext_stats = !inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;

-- 
Justin



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: extended stats on partitioned tables
Next
From: Justin Pryzby
Date:
Subject: Re: extended stats on partitioned tables