Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters - Mailing list pgsql-performance

From Tom Lane
Subject Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
Date
Msg-id 6083.1579200949@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters  (Cosmin Prund <cprund@gmail.com>)
Responses Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters
List pgsql-performance
Cosmin Prund <cprund@gmail.com> writes:
> Running the same query with a different "Ver" produces a proper plan.

Oh, that *is* interesting.

After studying the code a bit more I see why this is possible when I
originally thought not.  The case that you are interested in is one that
has special handling -- it's a "lower-order ScalarArrayOpExpr" in the
terms of the code.  This means that get_index_paths will actually produce
two index paths, one with the IN clause as an indexqual and one without,
because it expects that they have different sort behaviors [1].  So then
we do have a chance for a cost-based choice, making it possible for the
estimated selectivity of the higher-order clause to affect the outcome.

I'm still a bit surprised that it wouldn't choose the alternative with
the IN ... but if the estimated number of rows matching just the first
column is small enough, it might see the paths as having indistinguishable
costs, and then it's down to luck which it chooses.

> There are 25 valid values for "Ver" in this database. I ran the query for
> all of them. The only one miss-behaving is "92". I ran the query with
> random values for Ver (invalid values), the query plan always attempts to
> use the index using both values.
> I looked into "most_common_values" in pg_stats, this value (92) is not in
> that list.

Are the other 24 all in the list?

> Finally I ran "ANALYZE" again and now the problem went away. Running the
> query with Ver=92 uses the proper plan. I'm not happy with this - I know I
> haven't solved the problem (I've ran ANALYZE multiple times before).

Maybe increasing the stats target for the "Ver" column would help.  It
sounds like you want to get to a point where all the valid values are
given in the MCV list, so that the estimates for them will be accurate.

            regards, tom lane

[1] Right at the moment, it seems like that's wrong and we could just
generate one path.  Need to study this.



pgsql-performance by date:

Previous
From: Michael Lewis
Date:
Subject: Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters
Next
From: Cosmin Prund
Date:
Subject: Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters