Re: [PERFORM] Querying with multicolumn index - Mailing list pgsql-performance

From Eric Jiang
Subject Re: [PERFORM] Querying with multicolumn index
Date
Msg-id CAOfJSTxCfzoKxZ72nFV-Ow8mSch8b6C8qS6efH_1_Oi7SA7acQ@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] Querying with multicolumn index  (Daniel Blanch Bataller <daniel.blanch.bataller@gmail.com>)
Responses Re: [PERFORM] Querying with multicolumn index
List pgsql-performance
We aren't using any special planner settings - all enable_* options are "on".

On Sat, Dec 10, 2016 at 12:06 AM, Daniel Blanch Bataller
<daniel.blanch.bataller@gmail.com> wrote:
> As a quick fix: Have you considered dropping ix_updates_time index?

We do occasionally want to use ix_updates_time, although not very often.

> I’ve been able to reproduce the issue, but with bigger numbers than you. When I dropped ix_updates_time it went much
muchfaster. It uses updates_driver_id_time_idx index instead. 
>
> For some reason the planner does not make a good estimation in this case. Can you show us EXPLAIN (ANALYZE, BUFFERS)
beforeand after dropping ix_updates_time index? Can you show us too the output of SHOW shared_buffers; ? 

Here's EXPLAIN (ANALYZE, BUFFERS) with the above bad query on a cold cache:

 Limit  (cost=0.09..0.70 rows=1 width=56) (actual
time=244846.915..244846.915 rows=1 loops=1)
   Buffers: shared hit=3999254 read=57831
   I/O Timings: read=242139.661
   ->  Index Scan Backward using ix_updates_time on updates
(cost=0.09..271696.74 rows=442550 width=56) (actual
time=244846.913..244846.913 rows=1 loops=1)
         Filter: (driver_id = 30132)
         Rows Removed by Filter: 5316811
         Buffers: shared hit=3999254 read=57831
         I/O Timings: read=242139.661
 Total runtime: 244846.946 ms

and after dropping ix_updates_time:

 Limit  (cost=0.11..0.98 rows=1 width=56) (actual time=2.270..2.271
rows=1 loops=1)
   Buffers: shared hit=1 read=4
   I/O Timings: read=2.230
   ->  Index Scan Backward using updates_driver_id_time_idx on updates
 (cost=0.11..382307.69 rows=442550 width=56) (actual time=2.270..2.270
rows=1 loops=1)
         Index Cond: (driver_id = 30132)
         Buffers: shared hit=1 read=4
         I/O Timings: read=2.230
 Total runtime: 2.305 ms

and `SHOW shared_buffers;`

 shared_buffers
----------------
 244MB

> I suspect the issue has to do with low shared_buffers configuration and cache misses, and maybe some costs suboptimal
configurationI’ll try to find it out, if anyone can enlighten us it will be very welcomed. 
>
>
> P.S. Meanwhile If you still need 'time' index, you can create an index using ‘time' and ‘customer_id' in this order.

Did you mean an index on (time, driver_id)? I did:

CREATE INDEX CONCURRENTLY ix_updates_time_driver_id ON updates
("time", driver_id)

but seems like the planner will use it for driver_id having more than
~300k rows:

 Limit  (cost=0.11..0.79 rows=1 width=56) (actual
time=115.051..115.052 rows=1 loops=1)
   Buffers: shared hit=20376
   ->  Index Scan Backward using ix_updates_time_driver_id on updates
(cost=0.11..302189.90 rows=443924 width=56) (actual
time=115.048..115.048 rows=1 loops=1)
         Index Cond: (driver_id = 30132)
         Buffers: shared hit=20376
 Total runtime: 115.091 ms

It does seem faster than when having an index on just "time", but
still not optimal.

Really appreciate everyone's help!

--
Eric Jiang, DoubleMap
eric@doublemap.com | www.doublemap.com


pgsql-performance by date:

Previous
From: Daniel Blanch Bataller
Date:
Subject: Re: [PERFORM] Querying with multicolumn index
Next
From: Tomas Vondra
Date:
Subject: Re: [PERFORM] Querying with multicolumn index