Re: benchmarking the query planner - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: benchmarking the query planner
Date
Msg-id 49423C06.EE98.0025.0@wicourts.gov
Whole thread Raw
In response to Re: benchmarking the query planner  ("Nathan Boley" <npboley@gmail.com>)
Responses Re: benchmarking the query planner
List pgsql-hackers
>>> "Nathan Boley" <npboley@gmail.com> wrote: 
> Can anyone suggest a good data set to test this sort of question on?
Where we have the biggest problem with bad estimates is on complex
searches involving many joins where the main criterion limiting the
result set is a name.  The estimate based on the histogram is often
very low (e.g. 2) when the actual result set is several hundred. 
While several hundred is far short of 1% of the table, the best plan
for a result set of that size is very different than the best plan for
two rows.
Some numbers follow to give an idea of the shape of data where current
techniques sometimes do poorly.  We use a "searchName" column which
puts the name components from various columns into a canonical format;
this is what is indexed and searched.  The search is usually a LIKE
with the high order portion being six to ten characters followed by
the wild card.
Total rows in table: 32,384,830
There are 9,958,969 distinct values.
There is one value present in over 1% of the rows, with 433,578 rows.
There are ten values present in over 0.1% of the rows:433578140398135489112088 64069 63158 44656 36499 35896 35819
The 100th most common value is present in 4847 rows.
There are 186 rows with over 0.01% of the rows.
Based on my experience, we would need better estimates for ranges with
200 to 300 rows to improve our plans for the problem cases.  I'd be
happy to have it scan the whole table during our nightly VACUUM
ANALYZE if that would get me statistics which would improve the
estimates to that degree without a huge increase in plan time.
Which raises the issue, if we could get better statistics by passing
the whole table, why not do that when VACUUM ANALYZE is run?
-Kevin


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: benchmarking the query planner
Next
From: "Douglas McNaught"
Date:
Subject: Re: WIP: default values for function parameters