Re: [HACKERS] extended statistics: n-distinct - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] extended statistics: n-distinct
Msg-id 20170322210345.zoqj4tmdyoh23mxm@alvherre.pgsql
In response to Re: [HACKERS] extended statistics: n-distinct  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: extended statistics: n-distinct  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Kyotaro HORIGUCHI wrote:

> At Mon, 20 Mar 2017 16:02:20 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
> <20170320190220.ixlaueanxegqd5gr@alvherre.pgsql>

> > This is a new thread to present a version of the n-distinct patch that
> > IMO is close enough to commit.  There are some work items still.
> > There's some discussion on the topic of cross-column statistics:
> > https://wiki.postgresql.org/wiki/Cross_Columns_Stats
> > 
> > This problem is important enough that Kyotaro Horiguchi submitted
> > another patch that does the same thing:
> > https://www.postgresql.org/message-id/flat/20150828.173334.114731693.horiguchi.kyotaro%40lab.ntt.co.jp
> > This patch aims to provide the same functionality, keeping the design
> > general enough that other kinds of statistics can be added later (such
> > as functional dependencies, histograms and MCVs, all of which have been
> > previously submitted as patches by Tomas).
> 
> I may be stupid but I don't get the picture here, specifically
> about the relation to Tomas's patch. Does this work as
> infrastructure for Tomas's mv patch? Or in some other
> relationship?

Well, this patch is Tomas' first patch, which I've reviewed and reworked
-- I changed some things that weren't properly finished, cleaned up the
code, made it all more robust, and made sure the sane cases work sanely
while the others are rejected promptly (rather than throwing bogus error
messages at a later time, or crashing).

I didn't review your own n-distinct patch.  I don't think there's any
common code, but it would be very useful if you could try your test
scenarios and make sure they are handled sanely by this patch.

Regarding your question:

> Do you plan to correct the estimation of joins
> perplexed by strong correlations?

There is a later patch in Tomas' series which I would like to get to
before PG10 closes, but it's not this patch.  It needs to be rebased on
top of this one.

Attached is v30, which includes some more cleanup.  Detailed commits can
be seen here:
https://github.com/2ndQuadrant/postgres/commits/dev/mvstats-ndistinct
In particular, this includes code from Tomas to consider mixing
ndistinct estimates from multiple multivariate statistic objects, which
is better than the old approach of only using the estimate when a
perfect match was found.  However, I lobotomized Tomas' selfuncs.c code
and I need to revert that part before pushing -- essentially I
removed examine_variable() processing, which seemed a bit on the
expensive side for what we were doing, but that was a silly mistake.
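
To illustrate the difference between "perfect match only" and "mixing"
estimates, here is a minimal sketch in Python.  It is not the patch's C
code: the function name estimate_groups, the greedy "largest covering
statistic first" rule, and all the numbers are assumptions made up for
illustration.  It only shows why combining a multivariate ndistinct
estimate with per-column estimates for the leftover columns can beat
falling back to the plain independence product.

```python
def estimate_groups(columns, per_column, multivariate, nrows, mix=True):
    """Estimate the number of distinct groups for a set of columns.

    columns      -- column names being grouped on (hypothetical)
    per_column   -- dict: column name -> single-column ndistinct
    multivariate -- dict: frozenset of columns -> multi-column ndistinct
    nrows        -- table row count, used as an upper bound
    mix          -- True: combine partial matches; False: exact match only
    """
    remaining = set(columns)
    estimate = 1.0

    if not mix:
        # Old approach: use a multivariate estimate only on a perfect match.
        exact = multivariate.get(frozenset(columns))
        if exact is not None:
            return min(exact, nrows)
    else:
        # Mixing approach (assumed greedy rule): repeatedly apply the
        # statistic that covers the most of the still-unmatched columns.
        while True:
            best = None
            for cols, ndistinct in multivariate.items():
                if len(cols) >= 2 and cols <= remaining:
                    if best is None or len(cols) > len(best[0]):
                        best = (cols, ndistinct)
            if best is None:
                break
            estimate *= best[1]
            remaining -= best[0]

    # Columns not covered by any statistic are treated as independent.
    for col in remaining:
        estimate *= per_column[col]

    return min(estimate, nrows)


# Made-up example: a and b are perfectly correlated, c is independent.
per_column = {"a": 100, "b": 100, "c": 50}
multivariate = {frozenset({"a", "b"}): 100}
nrows = 1_000_000

old = estimate_groups(["a", "b", "c"], per_column, multivariate, nrows, mix=False)
new = estimate_groups(["a", "b", "c"], per_column, multivariate, nrows, mix=True)
```

With these numbers the exact-match-only variant finds no statistic on
(a, b, c) and multiplies the three per-column counts, while the mixing
variant uses the (a, b) estimate and multiplies only by the count for c.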

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


