Re: Query optimizer 8.0.1 (and 8.0) - Mailing list pgsql-hackers

From pgsql@mohawksoft.com
Subject Re: Query optimizer 8.0.1 (and 8.0)
Date
Msg-id 16638.24.91.171.78.1107956759.squirrel@mail.mohawksoft.com
Whole thread Raw
In response to Re: Query optimizer 8.0.1 (and 8.0)  (Ron Mayer <ron@cheapcomplexdevices.com>)
Responses Re: Query optimizer 8.0.1 (and 8.0)  (Oleg Bartunov <oleg@sai.msu.su>)
Re: Query optimizer 8.0.1 (and 8.0)  (Tzahi Fadida <tzahi_ml@myrealbox.com>)
List pgsql-hackers
I wrote a message caled "One Big trend vs multiple smaller trends in table
statistics" that, I think, explains what we've been seeing.


> pgsql@mohawksoft.com wrote:
>>
>> In this case, the behavior observed could be changed by altering the
>> sample size for a table. I submit that an arbitrary fixed sample size is
>> not a good base for the analyzer, but that the sample size should be
>> based
>> on the size of the table or some calculation of its deviation.
>>
>
>    Mark,
>
> Do you have any evidence that the Sample Size had anything to do
> with the performance problem you're seeing?

Sample size is only a bandaid for the issue, however, more samples always
provide more information.


>
> I also do a lot with the complete Census/TIGER database.
>
> Every problem I have with the optimizer comes down to the
> fact that the data is loaded (and ordered on disk) by
> State/County FIPS codes, and then queried by zip-code
> or by city name.  Like this:
>
>      Alabama    36101 [hundreds of pages with zip's in 36***]
>      Alaska     99686 [hundreds of pages with zip's in 9****]
>      Arizona    85701 [hundreds of pages with zip's in 855**]
>
> Note that the zip codes are *NOT* sequential.

Again, read "One Big Trend..." and let me know what you think. I think it
describes exactly the problem that we see.

For now, the solution that works for me is to seriously up the value of
"targrows" in analyze.c. It makes it take longer, and while the stats are
not "correct" because they are not designed to detect these sorts of
patterns, a larger sample allows them to be "less wrong" enough to give a
better hint to the planner.




pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Is there a way to make VACUUM run completely outside
Next
From: "Gavin M. Roy"
Date:
Subject: Re: [GENERAL] PHP/PDO Database Abstraction Layer