Home > mailing lists

Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters - Mailing list pgsql-performance

From	Cosmin Prund
Subject	Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters
Date	January 16, 2020 18:15:09
Msg-id	CAGU4dz-jq678=qccivjxJsrfG+jb0H-tLuHnPoJ=6BszDKJikQ@mail.gmail.com Whole thread
In response to	Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters (Michael Lewis <mlewis@entrata.com>)
Responses	Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
List	pgsql-performance

Tree view

Hello Michael and hello again Tom, sorry for mailing you directly. I just hit Reply in gmail - I expected the emails to have a reply-to=Pgsql. Apparently they do not.

Running the same query with a different "Ver" produces a proper plan. Here's a non-redacted example (Ver=91):

EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from "LucrareBugetDate" where ("LucrareBugetVersiuneId" = 91) and ("LucrareBugetDateId" in (10,11));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using "IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" on "LucrareBugetDate" (cost=0.56..4.95 rows=2 width=13) (actual time=3.617..3.631 rows=2 loops=1)
Index Cond: (("LucrareBugetVersiuneId" = 91) AND ("LucrareBugetDateId" = ANY ('{10,11}'::integer[])))
Buffers: shared hit=9 read=3
Planning time: 0.223 ms
Execution time: 3.663 ms
(5 rows)

I have reindex everything, not just this INDEX.

"reltuples" for this table is 41712436.

> I'd be curious of the fraction in the MCVs frequency list in stats indicates that rows with Ver = 92 are rare and therefore the index on only Ver column is sufficient to find the rows quickly.

There are 25 valid values for "Ver" in this database. I ran the query for all of them. The only one miss-behaving is "92". I ran the query with random values for Ver (invalid values), the query plan always attempts to use the index using both values.

I looked into "most_common_values" in pg_stats, this value (92) is not in that list.

Finally I ran "ANALYZE" again and now the problem went away. Running the query with Ver=92 uses the proper plan. I'm not happy with this - I know I haven't solved the problem (I've ran ANALYZE multiple times before).

On Thu, 16 Jan 2020 at 19:00, Michael Lewis <mlewis@entrata.com> wrote:

Does the behavior change with different values of Ver column? I'd be curious of the fraction in the MCVs frequency list in stats indicates that rows with Ver = 92 are rare and therefore the index on only Ver column is sufficient to find the rows quickly. What is reltuples for this table by the way?

I also wonder if the situation may be helped by re-indexing the "index on both columns" to remove any chance of issues on bloat in the index. Which order are the columns by the way? If Ver is first, is there also an index on only id column?. Since you aren't on v12, you don't get to re-index concurrently but I assume you know the work around of create concurrently (different name), drop concurrently (old one), and finally rename new index.

pgsql-performance by date:

From: Tom Lane
Date: 16 January 2020, 17:27:39
Subject: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters

From: Laurenz Albe
Date: 16 January 2020, 18:20:12
Subject: Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters

Re: Bad query plan decision when using multiple column index -postgresql uses only first column then filters - Mailing list pgsql-performance

Previous

Next