Re: Why does a simple query not use an obvious index? - Mailing list pgsql-performance
From | Greg Stark |
---|---|
Subject | Re: Why does a simple query not use an obvious index? |
Date | |
Msg-id | 87ekloidls.fsf@stark.xeocode.com Whole thread Raw |
In response to | Re: Why does a simple query not use an obvious index? (Mr Pink <mr_pink_is_the_only_pro@yahoo.com>) |
Responses |
Re: Why does a simple query not use an obvious index?
Re: Why does a simple query not use an obvious index? |
List | pgsql-performance |
[I'm actually responding to the previous post from Tom Lane, but I've deleted it and the archives seem to be down again.] The assumption being made is that the first provided result is representative of all future results. I don't see any reason that making this assumption of all stable functions should be less scary than making the assumption about user provided parameters. However I have the complementary reaction. I find peeking at the first bind parameter to be scary as hell. Functions seem slightly less scary. On Oracle Peeking at bind parameters is a feature explicitly intended for DSS data warehouse type systems. The use of placeholders there was purely for security and programming ease, not efficiency, since the queries are only planned executed a small number of times per plan. These are systems that suffered enormously without the parameter values. They often involved full table scans or bitmap index scans and without the statistics produced awful plans. For OLTP systems peeking at placeholders is more a danger than a benefit. The query will be executed thousands of times and if it's planned based on a single unusual value initially the entire system could fail. Consider the following scenario which isn't farfetched at all. In fact I think it well describes my current project: I have a table with a few million records. 99% of the time users are working with only a few hundred records at most. There's an index on the column they're keying off of. 1% of the key values have an unusually large number of records. Without peeking at placeholders the system should see that virtually all the key values are well under the threshold for an index scan to be best. So it always uses an index scan. 1% of the time it takes longer than that it would have with a sequential scan, but only by a small factor. (On the whole we're probably still better off avoiding the cache pollution anyways.) With peeking at placeholders 99% of the backends would perform the same way. However 1 backend in 100 sees one of these unusual values for its first query. This backend will use a sequential scan for *every* request. Executing a sequential table scan of this big table once a second this backend will drive the entire system into the ground. This means every time I start the system up I stand a small but significant chance of it just completely failing to perform properly. Worse, apache is designed to periodically start new processes, so at any given time the system could just randomly fall over and die. I would rather incur a 10% penalty on every query than have a 1% chance of it keeling over and dieing. Given this I would when I upgrade to 8.0 have to ensure that my application driver is either not using placeholders at all (at the protocol level -- I always prefer them at the api level) or ensure that postgres is *not* peeking at the value. I like the feature but I just want to be sure that it's optional. -- greg
pgsql-performance by date: