Re: Introducing floating point cast into filter drastically changes row estimate - Mailing list pgsql-bugs

From: Merlin Moncure
Subject: Re: Introducing floating point cast into filter drastically changes row estimate
Msg-id: CAHyXU0wcNjxwyefE8Vd8cUsHK2yeA_VpAUgnEnF1-VkKXKD=eA@mail.gmail.com
In response to: Re: Introducing floating point cast into filter drastically changes row estimate (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
On Wed, Oct 24, 2012 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> The following query runs fine: it estimates the returned rows pretty well:
>> postgres=# explain analyze  select * from foo where i > 100 and i < 10000;
>
>> ...but if you introduce a floating point cast, it drastically changes
>> the returned rows (why?):
>
>> postgres=# explain analyze  select * from foo where i::float8 > 100
>> and i::float8 < 10000;
>
> The planner has stats about "i", but none about "i::float8", so you're
> getting a default estimate in the second case.  It does, however,
> realize that you're applying a range restriction condition to
> "i::float8", and the default selectivity estimate for that is
> intentionally pretty small.

Yeah -- I have a case where a large number of joins are happening that
have a lot of filtering based on expressions and things like that.  I
didn't write the SQL, but the characteristics are pretty typical for
code in this particular branch of the application.  Unfortunately, the
effects multiply (both in the where clause and in the various joins)
and the row count estimates quickly degrade to 1, which is off by
orders of magnitude.  The planner then favors materialization and
nestloops, which leads to basically unbounded query times given that
the 'inner' scan is a seqscan.  Disabling nestloops fixes the issue,
but now the server favors hash joins (in this particular case it's ok,
but the hash memory usage is quite high).
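
Concretely, the workaround we're running with is just the session-level
switch, something like this (the work_mem figure below is only
illustrative, not what we actually use):

set enable_nestloop = off;  -- keep the planner off nestloops driven by rows=1 estimates
set work_mem = '256MB';     -- per hash node memory budget; example value only
-- ... run the problem query ...
reset enable_nestloop;
reset work_mem;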

(See here: http://explain.depesz.com/s/gmL for an example of a real-world
explain ... the "Seq Scan on yankee_bravo (cost=0.000..727319.040
rows=14 width=71) (actual time=.. rows= loops=)" returns thousands of
rows, not 14.)
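
For anyone wanting to reproduce the simpler case from my original
report, something along these lines should do it (the table definition
and row count here are improvised, not exactly what I posted):

create table foo as select generate_series(1, 1000000) as i;
analyze foo;

explain analyze select * from foo where i > 100 and i < 10000;
-- estimate tracks the real row count: the planner has stats on "i"

explain analyze select * from foo where i::float8 > 100 and i::float8 < 10000;
-- same rows come back, but the estimate collapses to the default
-- range selectivity because there are no stats on "i::float8"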

I've been thinking about this all morning and I think there's a
fundamental problem here: the planner is using low-confidence
estimates to pick plans that should really only be used when the
estimate is relatively precise.  In particular, I think the broad
rule should be that row estimates derived from default selectivity
are capped, say at the lesser of 1000 or the greatest known value if
the estimate is otherwise constrained.
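
As an aside, one way to paper over the specific case above is to give
the planner real statistics for the expression: analyze gathers
statistics on indexed expressions, so an expression index (even one
that never gets used in a plan) should bring the estimate back in
line.  A sketch against the toy table above:

create index foo_i_float8_idx on foo ((i::float8));  -- expression index
analyze foo;   -- now also collects statistics on i::float8

explain analyze select * from foo where i::float8 > 100 and i::float8 < 10000;
-- the row estimate should now line up with the plain integer version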

merlin
