Re: Bad Planner Statistics for Uneven distribution. - Mailing list pgsql-performance

From Tom Lane
Subject Re: Bad Planner Statistics for Uneven distribution.
Date
Msg-id 27058.1153587838@sss.pgh.pa.us
In response to Re: Bad Planner Statistics for Uneven distribution.  ("Guillaume Smet" <guillaume.smet@gmail.com>)
List pgsql-performance
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> Isn't there any way to make PostgreSQL have a better estimation here:
> ->  Index Scan using models_brands_brand on models_brands
> (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008
> rows=0 loops=303)
>            Index Cond: (brand = $0)

Note that the above plan extract is pretty misleading, because it
doesn't account for the implicit "LIMIT 1" of an EXISTS() clause.
What the planner is *actually* imputing to this plan is 216410.97/92372
cost units, or about 2.34.  However, that same scaling applies to the
seqscan variant as well.
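
To make the arithmetic concrete, here is a sketch of the sort of query
under discussion (the table and column names are guesses reconstructed
from the plan extract above, not Kevin's actual schema):

    EXPLAIN ANALYZE
    SELECT b.brand_id
    FROM   brands b
    WHERE  EXISTS (SELECT 1
                   FROM   models_brands mb
                   WHERE  mb.brand = b.brand_id);

Since the EXISTS() subplan can stop as soon as it finds one matching
row, the cost actually charged per probe is roughly
total_cost / estimated_rows = 216410.97 / 92372, or about 2.34, not the
full 216410.97 printed on the EXPLAIN line.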

I think the real issue with Kevin's example is that when doing an
EXISTS() on a brand_id that doesn't actually exist in models_brands,
the seqscan plan has worst-case behavior (i.e., it scans the whole
table) while the indexscan plan still manages to be cheap.  Because his
brands table has so many brand_ids that never appear in models_brands,
that worst case dominates the results.  I'm not sure how we could
factor that risk into the cost estimates.  The EXISTS code could
probably special-case it reasonably well for the simplest seqscan and
indexscan subplans, but I don't see what to do with more general
subqueries (like joins).
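
As a rough illustration of that asymmetry (the literal brand values
below are placeholders, and the standalone LIMIT 1 queries just mimic
what the EXISTS() subplan does internally):

    -- a brand value that does have matching rows: both plans stop after
    -- the first hit, so either one is cheap in practice
    EXPLAIN ANALYZE
    SELECT 1 FROM models_brands WHERE brand = 42 LIMIT 1;

    -- a brand value with no matching rows: the seqscan must read the
    -- whole table before it can conclude there are none, while the
    -- indexscan still does a single cheap probe into an empty key range
    EXPLAIN ANALYZE
    SELECT 1 FROM models_brands WHERE brand = 99999 LIMIT 1;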

            regards, tom lane
