Thread: no MCV list of tiny table with unique columns

no MCV list of tiny table with unique columns

From: Justin Pryzby

We have a report query which joins (multiple times, actually) against this
trivial, tiny table:

ts=# \d bsm_to_switch
Table "public.bsm_to_switch"
 Column | Type | Modifiers
--------+------+-----------
 bsm    | text | not null
 switch | text | not null

ts=# SELECT length(bsm), length(switch) FROM bsm_to_switch;
 length | length
--------+--------
     10 |      6
     10 |      6
(2 rows)

The column values are distinct.

I believe the join is being (badly) underestimated, leading to a crappy plan
involving multiple nested loop joins, which takes 2.5 hours instead of a
handful of seconds; I believe that might be resolved by populating its MCV
list..

..however, on reading commands/analyze.c, the issue is that these columns have no
duplicates; postgres also decides that, since the number of distinct values is
greater than 10% of the total number of rows, ndistinct should be -1
(meaning it scales with the table size).  That's fine, except that it then
effectively precludes populating the MCV list.

|    if (nmultiple == 0)
|    {
|        /*
|         * If we found no repeated non-null values, assume it's a unique
|         * column; but be sure to discount for any nulls we found.
|         */
|        stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);
|    }
|    else if (track_cnt < track_max && toowide_cnt == 0 &&
|             nmultiple == track_cnt)
|    {
|        /*
|         * Our track list includes every value in the sample, and every
|         * value appeared more than once.  Assume the column has just
|         * these values.  (This case is meant to address columns with
|         * small, fixed sets of possible values, such as boolean or enum
|         * columns.  If there are any values that appear just once in the
|         * sample, including too-wide values, we should assume that that's
|         * not what we're dealing with.)
|         */
|        stats->stadistinct = track_cnt;
|    }

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
 attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
 bsm     | f         |         0 |        11 |         -1 |
 switch  | f         |         0 |         7 |         -1 |
(2 rows)
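
(Aside, for anyone reading along: a negative n_distinct means "this fraction of
the rows is distinct", so the effective estimate scales with reltuples.  A rough
sketch of recovering it from the catalogs; the join below assumes no other
relation is named bsm_to_switch:)

-- Sketch: how a negative n_distinct is interpreted at estimation time.
-- Assumes 'bsm_to_switch' is a unique relation name in this database.
SELECT s.attname,
       s.n_distinct,
       CASE WHEN s.n_distinct < 0
            THEN -s.n_distinct * c.reltuples   -- fraction of the row count
            ELSE s.n_distinct                  -- absolute number of values
       END AS effective_ndistinct
FROM pg_stats s
JOIN pg_class c ON c.relname = s.tablename
WHERE s.tablename = 'bsm_to_switch';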


Any ideas?  I tried setting n_distinct=2, but that seems to not have any effect
within ANALYZE itself.
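
(The override was applied with something along these lines; this is a
reconstruction rather than a paste from my session:)

-- Reconstruction, not verbatim: per-column n_distinct override.
-- ANALYZE then stores the supplied value, but this alone does not
-- make it build an MCV list (see the pg_stats output below).
ALTER TABLE bsm_to_switch ALTER COLUMN bsm    SET (n_distinct = 2);
ALTER TABLE bsm_to_switch ALTER COLUMN switch SET (n_distinct = 2);
ANALYZE bsm_to_switch;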

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
 attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
 bsm     | f         |         0 |        11 |          2 |
 switch  | f         |         0 |         7 |          2 |
(2 rows)

Thanks in advance.

Justin


Re: no MCV list of tiny table with unique columns

From: Tom Lane

Justin Pryzby <pryzby@telsasoft.com> writes:
> I believe the join is being (badly) underestimated, leading to a crappy plan
> involving multiple nested loop joins, which takes 2.5 hours instead of a
> handful of seconds; I believe that might be resolved by populating its MCV
> list..

With only two rows in the table, I'm not real sure why you'd need an MCV
list.  Could we see the actual problem query (and the other table
schemas), rather than diving into the code first?

            regards, tom lane


Re: no MCV list of tiny table with unique columns

From: Justin Pryzby

On Wed, Nov 02, 2016 at 04:05:46PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > I believe the join is being (badly) underestimated, leading to a crappy plan
> > involving multiple nested loop joins, which takes 2.5 hours instead of a
> > handful of seconds; I believe that might be resolved by populating its MCV
> > list..
>
> With only two rows in the table, I'm not real sure why you'd need an MCV
> list.  Could we see the actual problem query (and the other table
> schemas), rather than diving into the code first?

Sigh, yes, but understand that it's a legacy report which happens to currently
be near the top of my list of things to improve:

https://explain.depesz.com/s/5rN6

The relevant table is involved three times:

Seq Scan on two_november mike_oscar (cost=0.000..1.020 rows=2 width=18) (actual time=0.010..0.010 rows=2 loops=1)
Seq Scan on echo_oscar foxtrot (cost=0.000..209.860 rows=6,286 width=13) (actual time=0.014..2.271 rows=5,842 loops=1)
Seq Scan on two_november xray_yankee_alpha (cost=0.000..1.020 rows=2 width=18) (actual time=0.017..0.019 rows=2 loops=1)

Justin


Re: no MCV list of tiny table with unique columns

From: Tom Lane

Justin Pryzby <pryzby@telsasoft.com> writes:
>> With only two rows in the table, I'm not real sure why you'd need an MCV
>> list.  Could we see the actual problem query (and the other table
>> schemas), rather than diving into the code first?

> Sigh, yes, but understand that it's a legacy report which happens to currently
> be near the top of my list of things to improve:
> https://explain.depesz.com/s/5rN6

Hmm, I wonder what you have join_collapse_limit and from_collapse_limit
set to.  There's an awful lot of tables in that query.
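
(Those can be checked, and raised per session, like this; both default to 8:)

-- Check the planner's collapse limits (both default to 8):
SHOW join_collapse_limit;
SHOW from_collapse_limit;

-- They can be raised for the current session, e.g.:
SET join_collapse_limit = 12;
SET from_collapse_limit = 12;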

Also, it seems like most of the rowcount misestimations have to do with
inheritance child tables, eg

  Append (cost=0.000..50,814.990 rows=2,156 width=36) (actual time=9.054..1,026.409 rows=429,692 loops=1)
    Seq Scan on delta_mike golf_six (cost=0.000..0.000 rows=1 width=36) (actual time=0.009..0.009 rows=0 loops=1)
      Filter: ((four_charlie >= 'alpha_six'::timestamp without time zone) AND (four_charlie <= 'four_three'::timestamp without time zone) AND (echo_tango('seven_november'::text, four_charlie) >= 'november_golf'::double precision) AND (echo_tango('seven_november'::text, four_charlie) <= 'papa_quebec'::double precision))
    Index Scan using bravo on papa_two four_delta (cost=0.430..50,814.990 rows=2,155 width=36) (actual time=9.043..848.063 rows=429,692 loops=1)
      Index Cond: ((four_charlie >= 'alpha_six'::timestamp without time zone) AND (four_charlie <= 'four_three'::timestamp without time zone))
      Filter: ((echo_tango('seven_november'::text, four_charlie) >= 'november_golf'::double precision) AND (echo_tango('seven_november'::text, four_charlie) <= 'papa_quebec'::double precision))

There's not a lot of point in worrying about your two-row table when these
other estimates are off by multiple orders of magnitude.  In this
particular case my first bet would be that the planner has no idea about
the selectivity of the conditions on "echo_tango('seven_november'::text,
four_charlie)".  Reformulating that, or maybe making an index on it just
so that ANALYZE will gather stats about it, could help.
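
(A sketch of the kind of expression index meant here, using the anonymized
names from the plan above; the real table, column, and function differ, and
the index name is a placeholder:)

-- Sketch only, with anonymized/placeholder names: an index on the
-- filter expression, so that ANALYZE gathers statistics for it.
CREATE INDEX papa_two_expr_idx
    ON papa_two (echo_tango('seven_november'::text, four_charlie));
ANALYZE papa_two;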

            regards, tom lane


Re: no MCV list of tiny table with unique columns

From: Justin Pryzby

On Wed, Nov 02, 2016 at 07:48:23PM -0400, Tom Lane wrote:

> There's not a lot of point in worrying about your two-row table when these
> other estimates are off by multiple orders of magnitude.  In this
> particular case my first bet would be that the planner has no idea about
> the selectivity of the conditions on "echo_tango('seven_november'::text,
> four_charlie)".  Reformulating that, or maybe making an index on it just
> so that ANALYZE will gather stats about it, could help.

Thanks, you're exactly right.  That's date_trunc('hour') BTW.

We actually already have a "new way" of doing that which avoids date_trunc, so
now I just have to get it in place for 100+ old reports..

I thought I had tried that before, but I think I was confusing myself, and
tried putting the index on the parent, which ends up with no stats since it's
empty.
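
(Roughly what ended up in place, with placeholder names since the real schema
is anonymized above; the index has to go on the non-empty child tables, and the
expression's statistics then show up in pg_stats under the index's name:)

-- Placeholder names (child_tbl, ts_col): the real ones are anonymized above.
-- date_trunc('hour', ...) on timestamp without time zone is immutable,
-- so it can be used directly in an expression index.
CREATE INDEX child_tbl_hour_idx ON child_tbl (date_trunc('hour', ts_col));
ANALYZE child_tbl;

-- After ANALYZE, statistics for the indexed expression are listed in
-- pg_stats under the index's name rather than the table's:
SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'child_tbl_hour_idx';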

With indices+analyze:
 Sort  (cost=189014.28..189014.28 rows=1 width=785) (actual time=25063.831..25063.886 rows=328 loops=1)
 ...

BTW:
join_collapse_limit | 8
from_collapse_limit | 8

..and changing them doesn't seem to have any effect.  By my count there's 11
tables, not counting a few that are used multiple times..

Thanks again.

Justin