Re: Why does a simple query not use an obvious index? - Mailing list pgsql-performance

From Greg Stark
Subject Re: Why does a simple query not use an obvious index?
Date
Msg-id 87ekloidls.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: Why does a simple query not use an obvious index?  (Mr Pink <mr_pink_is_the_only_pro@yahoo.com>)
Responses Re: Why does a simple query not use an obvious index?
Re: Why does a simple query not use an obvious index?
List pgsql-performance

[I'm actually responding to the previous post from Tom Lane, but I've deleted
it and the archives seem to be down again.]


The assumption being made is that the first provided result is representative
of all future results. I don't see any reason that making this assumption of
all stable functions should be less scary than making the assumption about
user provided parameters.

However I have the complementary reaction. I find peeking at the first
bind parameter to be scary as hell. Functions seem slightly less scary.

On Oracle Peeking at bind parameters is a feature explicitly intended for DSS
data warehouse type systems. The use of placeholders there was purely for
security and programming ease, not efficiency, since the queries are only
planned executed a small number of times per plan. These are systems that
suffered enormously without the parameter values. They often involved full
table scans or bitmap index scans and without the statistics produced awful
plans.

For OLTP systems peeking at placeholders is more a danger than a benefit. The
query will be executed thousands of times and if it's planned based on a
single unusual value initially the entire system could fail.

Consider the following scenario which isn't farfetched at all. In fact I think
it well describes my current project:

I have a table with a few million records. 99% of the time users are working
with only a few hundred records at most. There's an index on the column
they're keying off of. 1% of the key values have an unusually large number of
records.

Without peeking at placeholders the system should see that virtually all the
key values are well under the threshold for an index scan to be best. So it
always uses an index scan. 1% of the time it takes longer than that it would
have with a sequential scan, but only by a small factor. (On the whole we're
probably still better off avoiding the cache pollution anyways.)

With peeking at placeholders 99% of the backends would perform the same way.
However 1 backend in 100 sees one of these unusual values for its first query.
This backend will use a sequential scan for *every* request. Executing a
sequential table scan of this big table once a second this backend will drive
the entire system into the ground.

This means every time I start the system up I stand a small but significant
chance of it just completely failing to perform properly. Worse, apache is
designed to periodically start new processes, so at any given time the system
could just randomly fall over and die.

I would rather incur a 10% penalty on every query than have a 1% chance of it
keeling over and dieing. Given this I would when I upgrade to 8.0 have to
ensure that my application driver is either not using placeholders at all (at
the protocol level -- I always prefer them at the api level) or ensure that
postgres is *not* peeking at the value.

I like the feature but I just want to be sure that it's optional.

--
greg

pgsql-performance by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: seqscan instead of index scan
Next
From: Greg Stark
Date:
Subject: Re: seqscan instead of index scan