On Mon, Mar 06, 2017 at 12:17:22PM +0000, Dinesh Chandra 12108 wrote:
> Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;
>
>
> schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals |
most_common_freqs| histogram_bounds | correlation
>
>
"evidence"|"point"|"domain_class_id"|f|0|8|10|"{7,9,2,11,43,3,1,10,4,17}"|"{0.9322,0.0451333,0.0145,0.00393333,0.00183333,0.00146667,0.0005,0.0003,6.66667e-05,6.66667e-05}"|""|0.889078
On Fri, Mar 03, 2017 at 12:44:07PM +0000, Dinesh Chandra 12108 wrote:
> -> Index Scan using point_domain_class_id_index on point p (cost=0.00..1483472.70 rows=1454751 width=16)
(actualtime=27.265..142101.1 59 rows=1607491 loops=1)
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id. In which case, why isn't it just faster?
I missed your response until now, and can't see that anybody else responded,
but I suspect the issue is that the *table* is highly correlated WRT this
column, but the index may not be, probably due to duplicated index keys.
postgres only stores statistics on expression indices, and falls back to
correlation of table column of a simple indices.
If you're still fighting this, would you send result of:
SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;
or,
SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;
if there's much repetition in the index keys, then PG's planner thinks an index
scan has low random_page_cost, and effective_cache_size has little effect on
large tables, and it never uses bitmap scan, which blows up if the index is
fragmented and has duplicate keys. The table reads end up costing something
like 1454751*random_page_cost nonsequential reads and fseek() calls when it
thinks it'll cost only 1454751*16*seq_page_cost.
Is the query much faster if you first reindex point_domain_class_id_index ?
This has come up before, see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520D6610.8040907@emulex.com
>
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA11880@telsasoft.com
>
https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou1kn@4ax.com
Justin