Re: [PERFORM] Querying with multicolumn index - Mailing list pgsql-performance

From Daniel Blanch Bataller
Subject Re: [PERFORM] Querying with multicolumn index
Date
Msg-id 36867AC4-DDD8-464E-A0F7-066B4C88300C@gmail.com
Whole thread Raw
In response to Re: [PERFORM] Querying with multicolumn index  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PERFORM] Querying with multicolumn index  (Daniel Blanch Bataller <daniel.blanch.bataller@gmail.com>)
List pgsql-performance
Hi all,

Thomas is absolutely right, the distribution I synthetically made, had 6M records but very old, 9M old, as you can see
ithad to skip 9M records before finding a suitable record using time index.  

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY "time" DESC LIMIT 1;
                                                                         QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..0.65 rows=1 width=36) (actual time=3827.807..3827.807 rows=1 loops=1)
   Buffers: shared hit=24592 read=99594 written=659
   ->  Index Scan Backward using updates_time_idx on updates  (cost=0.44..1284780.53 rows=6064800 width=36) (actual
time=3827.805..3827.805rows=1 loops=1) 
         Filter: (driver_id = 100)
         Rows Removed by Filter: 9000000
         Buffers: shared hit=24592 read=99594 written=659
 Planning time: 0.159 ms
 Execution time: 3827.846 ms
(8 rows)


Here you have my tests where I was able to reproduce the problem using default settings on 9.6, 9.5 and 9.3. 9.6 and
9.5choose the wrong index, while 9.3 didn’t. (update: 9.5 didn’t fail last time)  



However when I tried to add more than one value with this strange distribution ~ 30% of distribution to one value the
indexbad choice problem didn’t happen again in none of the different versions. 

I Hope this helps. Regards,

Daniel Blanch.


> El 10 dic 2016, a las 21:34, Tomas Vondra <tomas.vondra@2ndquadrant.com> escribió:
>
> Hi,
>
> On 12/10/2016 12:51 AM, Tom Lane wrote:
>> Eric Jiang <eric@doublemap.com> writes:
>>> I have a query that I *think* should use a multicolumn index, but
>>> sometimes isn't, resulting in slow queries.
>>
>> I tried to duplicate this behavior, without success.  Are you running
>> with nondefault planner parameters?
>>
>
> My guess is this is a case of LIMIT the matching rows are uniformly distributed in the input data. The planner likely
concludesthat for a driver with a lot of data we'll find the first row using ix_updates_time very quickly, and that it
willbe cheaper than inspecting the larger multi-column index. But imagine a driver with a lots of data long time ago.
Thatbreaks the LIMIT fairly quickly. 
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Attachment

pgsql-performance by date:

Previous
From: Eric Jiang
Date:
Subject: Re: [PERFORM] Querying with multicolumn index
Next
From: ProPAAS DBA
Date:
Subject: [PERFORM] Isolation of tx logs on VMware