Re: Overhauling GUCS - Mailing list pgsql-hackers

From: Gregory Stark
Subject: Re: Overhauling GUCS
Msg-id: 87iqwetvj5.fsf@oxford.xeocode.com
In response to: Re: Overhauling GUCS ("Greg Sabino Mullane" <greg@turnstep.com>)
Responses: Re: Overhauling GUCS ("Greg Sabino Mullane" <greg@turnstep.com>)
List: pgsql-hackers
"Greg Sabino Mullane" <greg@turnstep.com> writes:

>> Also, I'd actually assert that "10" seems to be perfectly adequate for
>> the majority of users.  That is, the number of users where I've
>> recommended increasing d_s_t for the whole database is smaller than the
>> number where I don't, and of course we never hear from most users at
>> all.  So I'm pretty happy recommending "Leave the default.  If you
>> encounter problem queries, increase it to 100, and analyse the database."
>
> Really? I'm the opposite: I never leave a client's setting at 10, that's
> just asking for trouble. Making it 100 *after* you encounter problem
> queries is reactive; I prefer being proactive. 

Have you ever measured the system speed before and after?
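
(For what it's worth, here is the sort of before/after check I have in mind;
the table and column names are made up, and timing a bare EXPLAIN mostly
measures planning, since nothing is executed.)

    \timing on

    SET default_statistics_target = 10;
    ANALYZE my_table;                 -- rebuild statistics at the low target
    EXPLAIN SELECT * FROM my_table WHERE col1 = 42 AND col2 LIKE 'abc%';

    SET default_statistics_target = 100;
    ANALYZE my_table;                 -- rebuild statistics at the high target
    EXPLAIN SELECT * FROM my_table WHERE col1 = 42 AND col2 LIKE 'abc%';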

> Nor is a setting of 10 "perfectly adequate":

What percentage of your plans actually change with the larger statistics? How
many for the better? How many were massively improved?

I suspect you're looking at a single-digit percentage slowdown in planning
across the board. In exchange: (a) if you have simple queries, you probably
see none of them improve; (b) if you have moderately complex queries, you
probably get some single-digit percentage of them improving; and (c) if you
have very complex queries, you probably have a handful which see massive
improvements. Across the internet there are a whole lot more applications of
type (a) than of the other two...

> Frankly, I'd be shocked if there is any significant difference at all
> compared to the actual query run time.

Well, you might start preparing to be shocked. Note that retrieving the
statistics is itself a query, so it's not hard for that cost to be comparable
to the cost of a similarly simple query. It's not hard for a simple query
touching multiple columns to read more statistics records than it reads rows
from the actual data. And things can look much worse if those statistics are
TOASTed and require further lookups and/or decompression...
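
(If you want to see how big the stored statistics actually get, something
like the query below works. Caveats: pg_statistic is only readable by
superusers, the slot layout shown (slot 1 for MCVs, slot 2 for the histogram)
is only the typical case for scalar columns, and the table name is made up.)

    -- How large are the per-column statistics for a (hypothetical) table?
    -- Compare these sizes against the table's own average row width.
    SELECT a.attname,
           pg_column_size(s.stavalues1)  AS slot1_bytes,  -- usually the MCV list
           pg_column_size(s.stanumbers1) AS slot1_freq_bytes,
           pg_column_size(s.stavalues2)  AS slot2_bytes   -- usually the histogram
    FROM pg_statistic s
    JOIN pg_attribute a ON a.attrelid = s.starelid AND a.attnum = s.staattnum
    WHERE s.starelid = 'my_table'::regclass;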

> The orders of magnitude speed up of certain queries when the d_s_t goes
> above 98 is what spawned my original thread proposing a change to 100:
>
> http://markmail.org/message/tun3a3juxlsyjbsw
>
> While it's easy to get bogged down in theory about what things
> d_s_t should measure, the optimal size of buckets, etc., it's still
> a severe performance regression bug that should be fixed, IMO.

It was, three months ago:

http://archives.postgresql.org/pgsql-committers/2008-03/msg00129.php

100 is no longer a magic threshold for LIKE queries (in CVS HEAD)
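
(If you want to check this on your own version, the per-column form below
avoids touching the global setting; the table and column names are again made
up. Compare the estimated row counts in the two EXPLAIN outputs.)

    -- Does the row estimate for a prefix LIKE still jump between targets?
    ALTER TABLE my_table ALTER COLUMN name SET STATISTICS 99;
    ANALYZE my_table;
    EXPLAIN SELECT * FROM my_table WHERE name LIKE 'abc%';

    ALTER TABLE my_table ALTER COLUMN name SET STATISTICS 100;
    ANALYZE my_table;
    EXPLAIN SELECT * FROM my_table WHERE name LIKE 'abc%';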

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

