Re: Optimizer choosing the wrong plan - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Optimizer choosing the wrong plan
Date
Msg-id CAMkU=1w2MDqZOvwqeB56vPdc1b0JOsBwPx3xcA8MvF4vzQLdZQ@mail.gmail.com
Whole thread Raw
In response to Optimizer choosing the wrong plan  (Viswanath <M.Viswanath16@gmail.com>)
Responses Re: Optimizer choosing the wrong plan  (Jim Finnerty <jfinnert@amazon.com>)
List pgsql-performance
On Mon, Nov 26, 2018 at 5:11 AM Viswanath <M.Viswanath16@gmail.com> wrote:
*Postgres server version -  9.5.10*
*RAM - 128 GB*
*WorkMem 64 MB*

*Problematic query with explain :*
*Query 1 (original):*
explain analyse SELECT myTable1.ID FROM myTable1 LEFT JOIN myTable2 ON
myTable1.ID=myTable2.ID WHERE  ((((myTable1.bool_val = true) AND
(myTable1.small_intval IN (1,2,3))) AND ((*myTable2.bigint_val = 1*) AND
(myTable1.bool_val = true))) AND (((myTable1.ID >= 1000000000000) AND
(myTable1.ID <= 1999999999999)) ))  ORDER BY 1 DESC , 1 NULLS FIRST  LIMIT
11;

There is no point doing a LEFT JOIN when the NULL-extended rows get filtered out later.

Also, ordering by the same column twice is peculiar, to say the least.


The table myTable2 contains *12701952* entries. Out of which only *86227* is
not null and *146* entries are distinct.

I assume you mean the column myTable2.ID has that many not null and distinct?
 

The above query returns 0 rows since 'myTable2.bigint_val = 1' criteria
satisfies nothing. It takes 6 seconds for execution as the planner chooses*
myTable1.ID column's index*.

More importantly, it chooses the index on myTable2.ID.  It does also use the index on myTable1.ID, but that is secondary.

The ideal index for this query would probably be a composite index on myTable2 (bigint_val, id DESC);
The planner would probably choose to use that index, even if the statistics are off.

I tried running *vacuum analyse* table many times, tried changing the
*statistics target of the column to 250 (since there are only 149 distinct
values)*. But none worked out. The planner thinks that there are *1727* rows
that matches the condition *myTable2.bigint_val = 1* but there are none.

It would interesting if you can upgrade a copy of your server to v11 and try it there.  We made changes to ANALYZE in that version which were intended to improve this situation, and it would be nice to know if it actually did so for your case.  

Also, increasing statistics target even beyond 250 might help.  If every one of the observed value is seen at least twice, it will trigger the system to assume that it has observed all distinct values that exist.  But if any of the values are seen exactly once, that causes it to take a different path (the one which got modified in v11).

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: dsa_allocate() faliure
Next
From: Jakub Glapa
Date:
Subject: Re: dsa_allocate() faliure