Re: strange row count estimates with conditions on multiple column - Mailing list pgsql-general

From Tom Lane
Subject Re: strange row count estimates with conditions on multiple column
Date
Msg-id 13285.1289962995@sss.pgh.pa.us
Whole thread Raw
In response to strange row count estimates with conditions on multiple column  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: strange row count estimates with conditions on multiple column  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-general
Tomas Vondra <tv@fuzzy.cz> writes:
> Results from (A) and (B) seem strange to me because AFAIK there are no
> multi-column statistics available, and accoring to this thread

> http://archives.postgresql.org/pgsql-hackers/2009-03/msg00052.php

> the single-column estimates are not multiplied (which would be OK only
> in case of statistically independent columns).

You're misreading that thread: it's discussing row inequality
comparisons, as in your example (D).  Row equality comparisons are the
same as a bunch of per-column equality comparisons, which is why (A) and
(B) behave the same, and for that you *will* get a multiplication of the
assumed-independent clause selectivities.

> I'm not quite sure why (C) has an estimate of 1.

It's smart enough to see that each of the clauses is a range constraint
on the variable, so you get fairly tight estimates on the number of
matches ... and then those two small selectivities are multiplied
together.  It does not however notice that the range bounds are actually
equal, which would allow it to convert the estimate to a simple equality
estimate, which in many cases (including this one) would be better.
I think we've discussed special-casing that, but it doesn't look like
anybody got around to it yet.  It's a little bit tricky to do because
the range estimator doesn't really distinguish < from <= --- which
normally doesn't matter a lot, but it does when you're considering
"x >= 33 and x <= 33" versus "x > 33 and x < 33".

> And I do have exactly the same problem with the estimate in (D). Where
> the heck did 227232 come from?

It doesn't recognize that this case is a range comparison (which was a
point made in the thread you cited).  So you get a dumb multiplication
of the selectivities for col_a >= 33 and col_a <= 33.

            regards, tom lane

pgsql-general by date:

Previous
From: Glen Parker
Date:
Subject: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
Next
From: Tom Lane
Date:
Subject: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!