Thread: a question about row estimation in postgres

a question about row estimation in postgres

From

Reynold Xin

Date:

20 March 2011, 20:13:00

I have a rankings table and it has 1302 rows in total. I am a bit confused by how postgres (8.2.11) calculates the cardinality for this rankings table based on < predicates on gradrate attribute.

select histogram_bounds from pg_stats where attname = 'gradrate' and tablename = 'rankings';         histogram_bounds
------------------------------------{8,33,40,46,55,61,69,75,81,90,118}

explain SELECT * FROM rankings WHERE gradrate < 11;                                  QUERY PLAN                                   
---------------------------------------------------------------------------------Index Scan using gradrate_idx on rankings  (cost=0.00..44.24 rows=11 width=196)  Index Cond: (gradrate < 11::double precision)
(2 rows)

explain select * from rankings where gradrate < 10;

QUERY PLAN

--------------------------------------------------------------------------------

Index Scan using gradrate_idx on rankings (cost=0.00..32.24 rows=7 width=196)

Index Cond: (gradrate < 10::double precision)

(2 rows)

Following the formula outlined in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html

Both gradrate 10 and gradrate 11 would fall in the first bucket.

Shouldn't the row estimation be:

(11 - 8) / (33 - 8) / 10 * 1302 = 15.624

and

(10 - 8) / (33 - 8) / 10 * 1302 = 10.416

instead of 11 and 7?

Perhaps I am missing something. I'd appreciate if you can point it out. Thanks!

Reynold Xin

Re: a question about row estimation in postgres

From

Tomas Vondra

Date:

20 March 2011, 21:06:12

Dne 20.3.2011 08:20, Reynold Xin napsal(a):
> Following the formula outlined
> in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html
>
> Both gradrate 10 and gradrate 11 would fall in the first bucket.
>
> Shouldn't the row estimation be:
>
> (11 - 8) / (33 - 8) / 10 * 1302 = 15.624
> and
> (10 - 8) / (33 - 8) / 10 * 1302 = 10.416
>
> instead of 11 and 7?
>
> Perhaps I am missing something. I'd appreciate if you can point it out.
> Thanks!

I haven't checked the exact estimation algorithm implementation, but
I've noticed you're using the exact cardinality (1302). The planner does
not have this exact information, it has to work with the value stored in
pg_class.reltuples - what does this query return?

  select reltuples from rankings where relname = 'rankings'


I guess it will be about 900.

Tomas

Re: a question about row estimation in postgres

From

Nathan Boley

Date:

20 March 2011, 21:35:55

> Following the formula outlined
> in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html
> Both gradrate 10 and gradrate 11 would fall in the first bucket.
> Shouldn't the row estimation be:
> (11 - 8) / (33 - 8) / 10 * 1302 = 15.624
> and
> (10 - 8) / (33 - 8) / 10 * 1302 = 10.416
> instead of 11 and 7?
> Perhaps I am missing something. I'd appreciate if you can point it out.

IIRC, you need to subtract the rows that are mcv's from the row count.
ie, you're not working with the full 1302 rows, only the ones that are
not mcv's.

Best,
Nathan