Thread: a question about row estimation in postgres
I have a rankings table and it has 1302 rows in total. I am a bit confused by how postgres (8.2.11) calculates the cardinality for this rankings table based on < predicates on gradrate attribute.
select histogram_bounds from pg_stats where attname = 'gradrate' and tablename = 'rankings'; histogram_bounds ------------------------------------{8,33,40,46,55,61,69,75,81,90,118}
explain SELECT * FROM rankings WHERE gradrate < 11; QUERY PLAN ---------------------------------------------------------------------------------Index Scan using gradrate_idx on rankings (cost=0.00..44.24 rows=11 width=196) Index Cond: (gradrate < 11::double precision) (2 rows)
explain select * from rankings where gradrate < 10;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using gradrate_idx on rankings (cost=0.00..32.24 rows=7 width=196)
Index Cond: (gradrate < 10::double precision)
(2 rows)
Following the formula outlined in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html
Both gradrate 10 and gradrate 11 would fall in the first bucket.
Shouldn't the row estimation be:
(11 - 8) / (33 - 8) / 10 * 1302 = 15.624
and
(10 - 8) / (33 - 8) / 10 * 1302 = 10.416
instead of 11 and 7?
Perhaps I am missing something. I'd appreciate if you can point it out. Thanks!
--
Reynold Xin
Dne 20.3.2011 08:20, Reynold Xin napsal(a): > Following the formula outlined > in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html > > Both gradrate 10 and gradrate 11 would fall in the first bucket. > > Shouldn't the row estimation be: > > (11 - 8) / (33 - 8) / 10 * 1302 = 15.624 > and > (10 - 8) / (33 - 8) / 10 * 1302 = 10.416 > > instead of 11 and 7? > > Perhaps I am missing something. I'd appreciate if you can point it out. > Thanks! I haven't checked the exact estimation algorithm implementation, but I've noticed you're using the exact cardinality (1302). The planner does not have this exact information, it has to work with the value stored in pg_class.reltuples - what does this query return? select reltuples from rankings where relname = 'rankings' I guess it will be about 900. Tomas
> Following the formula outlined > in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html > Both gradrate 10 and gradrate 11 would fall in the first bucket. > Shouldn't the row estimation be: > (11 - 8) / (33 - 8) / 10 * 1302 = 15.624 > and > (10 - 8) / (33 - 8) / 10 * 1302 = 10.416 > instead of 11 and 7? > Perhaps I am missing something. I'd appreciate if you can point it out. IIRC, you need to subtract the rows that are mcv's from the row count. ie, you're not working with the full 1302 rows, only the ones that are not mcv's. Best, Nathan