Data point on the competition regarding selectivity of unknown parameters - Mailing list pgsql-hackers

From Greg Stark
Subject Data point on the competition regarding selectivity of unknown parameters
Date
Msg-id 87r7pmsdij.fsf@stark.xeocode.com
Whole thread Raw
Responses Re: Data point on the competition regarding selectivity of unknown parameters  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

One of the things I think has to change with postgres is the default
selectivity assumptions for inequality operators. They're way to high
currently. Probably the single most frequently asked question on -performance
and -general are people asking why Postgres isn't using their index. And while
some of the cases are caused by other things, easily half the time it's simply
Postgres making unreasonably pessimistic assumptions about the selectivity of
inequalities.

Just out of curiosity I checked what Oracle does. This took my a lot of effort
so I hope you find the data useful. 

The short of it is that Oracle assumes 5% for a single inequality. It assumes
.25% for a range query, but that could just be the product of two 5%s. I would
guess it's not handling range queries as a special case.

This actually is a startling coincidence. I was already planning to argue for
precisely 5% myself as a reasonable compromise. I think even lower values make
sense but at least 5% would be low enough to consistently cause index scans,
which seems to match users' expectations.


Methodology:

This is with Oracle 10g (aka 10.1.0.2.0) on Linux.

I created a table with 100,000 records containing a single integer column
populated with integers ranging from 1 to 100,000 and ran "analyze table test2
compute statistics" on it. This is vaguely equivalent to "vacuum analyze full"
as far as gathering statistics. It tells it to scan the entire table, not to
use any sampling.

I then ran
explain plan for select 1 from test2 where a > :0;select cardinality from plan_table;

The results were 5000.
I did the same thing for "where a>:0 and a<:1"

The results were 250.

I intend to try some other distributions and other where clauses tomorrow. But
for now it's time for some sleep.

-- 
greg



pgsql-hackers by date:

Previous
From: Reini Urban
Date:
Subject: plperl regression tests
Next
From: "Simon Riggs"
Date:
Subject: Re: Forward zeroing of pg_clog