Re: index problems (again) - Mailing list pgsql-general

From Victor Yegorov
Subject Re: index problems (again)
Date
Msg-id CAGnEboip1fNgh9hH9qgKC4L_rur9N6yWo23VPHMf9iX8_Jc3vA@mail.gmail.com
Whole thread Raw
In response to Re: index problems (again)  (Geoff Winkless <pgsqladmin@geoff.dj>)
Responses Re: index problems (again)  (Geoff Winkless <pgsqladmin@geoff.dj>)
List pgsql-general
2016-03-07 15:01 GMT+02:00 Geoff Winkless <pgsqladmin@geoff.dj>:
Unfortunately the cost of changing all the code that uses MIN() in
this way would be higher than just adding an extra index :(

I suppose the thought is that for selecting just the MIN() value, by
traipsing through the index you immediately find the lowest match - so
for a dataset where scdate cardinality is higher, this would make
sense; indeed if I give this query a value with scdate in the low
range of the table it returns quickly (although still slower than when
it uses the scdate index).

It seems to me that the weighting the planner applied to this MIN()
rule is too high, or perhaps it needs to pay more attention to the
statistics of the indexes for the WHERE clauses?

Even given that, I still don't see why the (scdate,sc_id) index isn't
perfect for this; it allows the planner to use sc_id for MIN() while
using scdate to restrict the values. Three values to look up from the
index-only.

Your `sc_id` and `scdate` columns are correlated.

Planner has no such knowledge and assumes columns being independent. Your `scdate` predicate is
estimate to return 14394 rows (based on the EXPLAIN of your first post). I think, that this corresponds to
a quite small portion of your table, less than 1% (based on `Rows Removed by Filter: 4068865` from the 
same EXPLAIN). Under uniform distribution, these 14394 rows can be anywhere in the table.
Therefore, reading min values in the order of your PK is optimal, as you're expected to hit a rows that
matches given conditions quite soon.

Problem is — your predicate matches a bunch of rows towards the end of the table, which causes Postgres
to read a big portion of your index before it finds the row that fits.


Right now (9.5 and earlier versions) I do not know of any options that would not require fixing your queries.


P.S. Maybe `Upper pathification` patch, that is being considered for 9.6, can deal with such cases.


--
Victor Y. Yegorov

pgsql-general by date:

Previous
From:
Date:
Subject: Re: Does a call to a language handler provide a context/session, and somewhere to keep session data?
Next
From: Jan de Visser
Date:
Subject: Re: Does a call to a language handler provide a context/session, and somewhere to keep session data?