Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters - Mailing list pgsql-performance

From Cosmin Prund
Subject Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters
Date
Msg-id CAGU4dz-jq678=qccivjxJsrfG+jb0H-tLuHnPoJ=6BszDKJikQ@mail.gmail.com
Whole thread Raw
In response to Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters  (Michael Lewis <mlewis@entrata.com>)
Responses Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters  (Michael Lewis <mlewis@entrata.com>)
Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hello Michael and hello again Tom, sorry for mailing you directly. I just hit Reply in gmail - I expected the emails to have a reply-to=Pgsql. Apparently they do not.

Running the same query with a different "Ver" produces a proper plan. Here's a non-redacted example (Ver=91):

EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 91) and ("LucrareBugetDateId" in (10,11));
                                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using "IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" on "LucrareBugetDate"  (cost=0.56..4.95 rows=2 width=13) (actual time=3.617..3.631 rows=2 loops=1)
   Index Cond: (("LucrareBugetVersiuneId" = 91) AND ("LucrareBugetDateId" = ANY ('{10,11}'::integer[])))
   Buffers: shared hit=9 read=3
 Planning time: 0.223 ms
 Execution time: 3.663 ms
(5 rows)

I have reindex everything, not just this INDEX.

"reltuples" for this table is 41712436.

> I'd be curious of the fraction in the MCVs frequency list in stats indicates that rows with Ver = 92 are rare and therefore the index on only Ver column is sufficient to find the rows quickly.

There are 25 valid values for "Ver" in this database. I ran the query for all of them. The only one miss-behaving is "92". I ran the query with random values for Ver (invalid values), the query plan always attempts to use the index using both values.
I looked into "most_common_values" in pg_stats, this value (92) is not in that list.
Finally I ran "ANALYZE" again and now the problem went away. Running the query with Ver=92 uses the proper plan. I'm not happy with this - I know I haven't solved the problem (I've ran ANALYZE multiple times before).


On Thu, 16 Jan 2020 at 19:00, Michael Lewis <mlewis@entrata.com> wrote:
Does the behavior change with different values of Ver column? I'd be curious of the fraction in the MCVs frequency list in stats indicates that rows with Ver = 92 are rare and therefore the index on only Ver column is sufficient to find the rows quickly. What is reltuples for this table by the way?

I also wonder if the situation may be helped by re-indexing the "index on both columns" to remove any chance of issues on bloat in the index. Which order are the columns by the way? If Ver is first, is there also an index on only id column?. Since you aren't on v12, you don't get to re-index concurrently but I assume you know the work around of create concurrently (different name), drop concurrently (old one), and finally rename new index.

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
Next
From: Laurenz Albe
Date:
Subject: Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters