pgsql: Correctly calculate "MCV frequency" for a unique column. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Correctly calculate "MCV frequency" for a unique column.
Date
Msg-id E1vwl2L-001nu0-2u@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Correctly calculate "MCV frequency" for a unique column.

In commit bd3e3e9e5, I over-hastily used 1 / rel->rows as the assumed
frequency of entries in a column that ANALYZE has found to be unique.
However, rel->rows is the number of table rows that are estimated to
pass the query's restriction conditions, so that we got a too-large
result if the query has selective restrictions.  What I should have
used is 1 / rel->tuples, since that is the estimated total number of
table rows.  The pre-existing code path that digs a frequency out of
the histogram produces a frequency relative to the whole table, so
surely this new alternative code path must do so as well.  Any
correction needed on the basis of selectivity must be done by the
user of the mcv_freq value.

Fixing this causes all the regression test plans changed by bd3e3e9e5
to revert to what they had been, except for the first change in
join.out.  As I correctly argued in bd3e3e9e5, in that test case we
have no stats and should not risk a hash join.  Evidently I was less
correct to argue that the other changes were improvements.

Reported-by: Joel Jacobson <joel@compiler.org>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/341b723c-da45-4058-9446-1514dedb17c1@app.fastmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/d80b0225010fd407c784bbecde116a28198b6eab

Modified Files
--------------
src/backend/utils/adt/selfuncs.c             |   9 +-
src/test/regress/expected/join.out           |  16 +-
src/test/regress/expected/partition_join.out | 432 ++++++++++++++-------------
3 files changed, 231 insertions(+), 226 deletions(-)


pgsql-committers by date:

Previous
From: Fujii Masao
Date:
Subject: pgsql: psql: Show comments in \dRp+, \dRs+, and \dX+ psql meta-commands
Next
From: Michael Paquier
Date:
Subject: pgsql: Fix set of issues with extended statistics on expressions