Re: again on index usage - Mailing list pgsql-hackers

From Daniel Kalchev
Subject Re: again on index usage
Date
Msg-id 200201080922.LAA02480@dcave.digsys.bg
Whole thread Raw
In response to Re: again on index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: again on index usage  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>>>Tom Lane said:> It's difficult to tell from this what it thinks the selectivity of the> ipdate index would be, since
therows estimate includes the effect of> the ipaddr and router restrictions.  What do you get from just> > explain>
SELECTsum(input) FROM iplog_gate200112 > WHERE >   '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02
00:00:00+02'A    ND >   '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02';
 

Same result (sorry, should have included this originally):


Aggregate  (cost=47721.72..47721.72 rows=1 width=8) ->  Seq Scan on iplog_gate200112  (cost=0.00..47579.54 rows=56873
width=8)

> If you say "set enable_seqscan to off", does that change the plan?

Yes. As expected (I no longer have the problem of NaN estimates :)

Aggregate  (cost=100359.71..100359.71 rows=1 width=8) ->  Index Scan using iplog_gate200112_ipdate_idx on
iplog_gate200112 
 
(cost=0.00..100217.52 rows=56873 width=8)

My belief is that the planner does not want to use index due to low value 
dispersion of the indexed attribute. When splitting the table into several 
smaller tables, index is used.

This bites me, because each such query takes at least 3 minutes and the script 
that generates these needs to execute few thousands queries.
> BTW, the planner does not associate function calls with indexes.  If you> want to have the ipaddr index considered
forthis query, you need to write> ipaddr <<= '193.68.240.0/20' not network_subeq(ipaddr, '193.68.240.0/20').> (But
IIRC,that only works in 7.2 anyway, not earlier releases :-()
 

This is what I though too, but using the ipdate index will be sufficient.

I understand my complaint is not a bug, but rather question of proper planner 
optimization (it worked 'as expected' in 7.0). Perhaps the planner should 
consider the total number of rows, as well as the dispersion factor. With the 
dispersion being around 1.5% and total rows 1.7 million this gives about 25k 
rows with the same value - large enough to trigger sequential scan, as far as 
I understand it, but the cost of scanning 1.7 million rows sequentially is 
just too high.

By the way, the same query takes approx 10 sec with set enable_seqscan to off.

Daniel



pgsql-hackers by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: ON ERROR triggers
Next
From: Holger Krug
Date:
Subject: Re: ON ERROR triggers