Re: A limit clause can cause a poor index choice - Mailing list pgsql-general

From Mohamed Wael Khobalatte
Subject Re: A limit clause can cause a poor index choice
Date
Msg-id CABZeWdxy7Ryp-ETb4gm+o0B4X38e=RVkdMypWNDAybEo6uwGPg@mail.gmail.com
Whole thread Raw
In response to A limit clause can cause a poor index choice  (Nick Cleaton <nick@cleaton.net>)
Responses Re: A limit clause can cause a poor index choice
Re: A limit clause can cause a poor index choice
List pgsql-general
Hi Nick, 

I believe a second ordering, by id desc, will get your query to use the right index, and shouldn't be functionally different from what you would expect.  

```
select * from test_orders where

customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])

order by o_date desc, id desc limit 10;
```

I didn't look closely as to why from your data though. I'll leave it to more experienced people to comment as to why the planner misjudged your query badly. What happens when you raise the limit? Say to a 1000?

On Tue, May 19, 2020 at 3:00 PM Nick Cleaton <nick@cleaton.net> wrote:
The attached script builds a 10G test table which demonstrates a
problem that we have in production with postgresql 12.3-1.pgdg18.04+1
on ubuntu linux. Indexes:

test_orders_o_date_idx btree(o_date)
test_orders_customer_id_o_date_idx btree(customer_id, o_date)

We query for the most recent orders for sets of customers, and
sometimes none of those customers have any orders and the results are
empty:

explain analyze select * from test_orders where
customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
order by o_date desc;

    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=24848.96..24870.67 rows=8686 width=1839) (actual
time=1.101..1.102 rows=0 loops=1)
   Sort Key: o_date DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using test_orders_customer_id_o_date_idx on
test_orders  (cost=0.43..17361.20 rows=8686 width=1839) (actual
time=1.047..1.047 rows=0 loops=1)
         Index Cond: (customer_id = ANY
('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
 Planning Time: 3.821 ms
 Execution Time: 1.174 ms
(7 rows)

So far so good. But if we add a limit clause to the query then the
plan goes very wrong:

explain analyze select * from test_orders where
customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
order by o_date desc limit 10;

      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1660.98 rows=10 width=1839) (actual
time=4990.424..4990.424 rows=0 loops=1)
   ->  Index Scan Backward using test_orders_o_date_idx on test_orders
 (cost=0.43..1442355.43 rows=8686 width=1839) (actual
time=4990.423..4990.423 rows=0 loops=1)
         Filter: (customer_id = ANY
('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
         Rows Removed by Filter: 5000000
 Planning Time: 0.063 ms
 Execution Time: 4990.435 ms


Is there something we can adjust to get it to prefer
test_orders_customer_id_o_date_idx even when there's a limit clause ?

pgsql-general by date:

Previous
From: Nick Cleaton
Date:
Subject: A limit clause can cause a poor index choice
Next
From: Michael Lewis
Date:
Subject: Re: A limit clause can cause a poor index choice