Re: index problems (again) - Mailing list pgsql-general

From Geoff Winkless
Subject Re: index problems (again)
Date
Msg-id CAEzk6ffQwpZUSjsrs98igJzqU9AU6H0HH45UaruLtXN0HnAVJg@mail.gmail.com
Whole thread Raw
In response to Re: index problems (again)  (Victor Yegorov <vyegorov@gmail.com>)
Responses Re: index problems (again)  (Victor Yegorov <vyegorov@gmail.com>)
Re: index problems (again)  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
On 7 March 2016 at 11:48, Victor Yegorov <vyegorov@gmail.com> wrote:
> 2016-03-07 13:38 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
>>
>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>> BETWEEN 20160219 AND 20160221;
>
>
> Will it help if you'll add `count(*)` to your query like this:
>
>     SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
> 20160221;

Thanks for the reply.

Yes, that does work around the problem, sort-of (although it's only
using the scdate-only index, since it needs all the data):

 Aggregate  (cost=1242.59..1242.60 rows=1 width=4)
   ->  Index Scan using legs_scdate_idx on legs  (cost=0.43..1170.62
rows=14394 width=4)
         Index Cond: ((scdate >= 20160219) AND (scdate <= 20160221))

Unfortunately the cost of changing all the code that uses MIN() in
this way would be higher than just adding an extra index :(

I suppose the thought is that for selecting just the MIN() value, by
traipsing through the index you immediately find the lowest match - so
for a dataset where scdate cardinality is higher, this would make
sense; indeed if I give this query a value with scdate in the low
range of the table it returns quickly (although still slower than when
it uses the scdate index).

It seems to me that the weighting the planner applied to this MIN()
rule is too high, or perhaps it needs to pay more attention to the
statistics of the indexes for the WHERE clauses?

Even given that, I still don't see why the (scdate,sc_id) index isn't
perfect for this; it allows the planner to use sc_id for MIN() while
using scdate to restrict the values. Three values to look up from the
index-only.

If I manually change the query to do what I hoped the planner would do for me:

SELECT LEAST(( SELECT MIN(sc_id) FROM legs WHERE scdate =20160219), (
SELECT MIN(sc_id) FROM legs WHERE scdate =20160220), ( SELECT
MIN(sc_id) FROM legs WHERE scdate =20160221));

it returns in 16ms - and uses the (scdate_sc_id_idx) index as
expected; again though, I can't really justify changing all the code
to do that instead.

Geoff


pgsql-general by date:

Previous
From: Victor Yegorov
Date:
Subject: Re: index problems (again)
Next
From:
Date:
Subject: Re: Does a call to a language handler provide a context/session, and somewhere to keep session data?