Re: [HACKERS] WITH clause in CREATE STATISTICS - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] WITH clause in CREATE STATISTICS
Date
Msg-id b6b1f3a1-281a-20bc-d961-60604046c73d@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] WITH clause in CREATE STATISTICS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] WITH clause in CREATE STATISTICS  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 5/12/17 4:46 AM, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Hmm, yeah, makes sense.  Here's a patch for this approach.
> 
> ...
> 
> Also, while reading the docs changes, I got rather ill from the
> inconsistent and not very grammatical handling of "statistics" as a
> noun, particularly the inability to decide from one sentence to the next
> if that is singular or plural.  In the attached, I fixed this in the
> ref/*_statistics.sgml files by always saying "statistics object" instead.
> If people agree that that reads better, we should make an effort to
> propagate that terminology elsewhere in the docs, and into error messages,
> objectaddress.c output, etc.
> 

I'm fine with the 'statistics object' wording. I've been struggling with 
this bit while working on the patch, and I agree it sounds better and 
getting it consistent is definitely worthwhile.
>
> Although I've not done anything about it here, I'm not happy about the
> handling of dependencies for stats objects.  I do not think that cloning
> RemoveStatistics as RemoveStatisticsExt is sane at all.  The former is
> meant to deal with cleaning up pg_statistic rows that we know will be
> there, so there's no real need to expend pg_depend overhead to track them.
> For objects that are only loosely connected, the right thing is to use
> the dependency system; in particular, this design has no real chance of
> working well with cross-table stats.  Also, it's really silly to have
> *both* this hard-wired mechanism and a pg_depend entry; the latter is
> surely redundant if you have the former.  IMO we should revert
> RemoveStatisticsExt and instead handle things by making stats objects
> auto-dependent on the individual column(s) they reference (not the whole
> table).
> 
> I'm also of the opinion that having an AUTO dependency, rather than
> a NORMAL dependency, on the stats object's schema is the wrong semantics.
> There isn't any other case where you can drop a non-empty schema without
> saying CASCADE, and I'm mystified why this case should act that way.
> 

Yeah, it's a bit frankensteinian ...
>
> Lastly, I tried the example given in the CREATE STATISTICS reference page,
> and it doesn't seem to work.  Without the stats object, the two queries
> are both estimated at one matching row, whereas they really produce 100
> and 0 rows respectively.  With the stats object, they seem to both get
> estimated at ~100 rows, which is a considerable improvement for one case
> but a considerable disimprovement for the other.  If this is not broken,
> I'd like to know why not.  What's the point of an expensive extended-
> stats mechanism if it can't get this right?
> 

I assume you're talking about the functional dependencies and in that 
case that's expected behavior, because f. dependencies do assume the 
conditions are "consistent" with the functional dependencies.

This is an inherent limitation of functional dependencies, and perhaps a 
price for the simplicity of this statistics type. If your application 
executes queries that are likely not consistent with this, don't use 
functional dependencies.

The simplicity is why dependencies were implemented first, mostly to 
introduce all the infrastructure. The other statistics types (MCV, 
histograms) don't have this limitation, but those did not make it into pg10.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Alexander Kuzmenkov
Date:
Subject: [HACKERS] PoC: full merge join on comparison clause
Next
From: Rahila Syed
Date:
Subject: Re: [HACKERS] Adding support for Default partition in partitioning